Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving

Wang, Zhenhua; Wu, Zheng; Hu, Shiguang; Yuan, Fujiang; Yang, Junye

doi:10.3390/wevj16090544

Open AccessArticle

Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving

by

Zhenhua Wang

^1,*

,

Zheng Wu

¹,

Shiguang Hu

²,

Fujiang Yuan

^3,4

and

Junye Yang

^3,4

¹

College of Information Technology, Nanjing Police University, Nanjing 210023, China

²

Equipment Management and UAV Engineering College, Air Force Engineering University, Xi’an 710051, China

³

School of Computer Science and Technology, Taiyuan Normal University, Jinzhong 030619, China

⁴

Shanxi Key Laboratory of Intelligent Optimization Computing and Blockchain Technology, Jinzhong 030619, China

^*

Author to whom correspondence should be addressed.

World Electr. Veh. J. 2025, 16(9), 544; https://doi.org/10.3390/wevj16090544

Submission received: 20 August 2025 / Revised: 17 September 2025 / Accepted: 19 September 2025 / Published: 22 September 2025

(This article belongs to the Special Issue Recent Advances in Autonomous Vehicles)

Download

Browse Figures

Versions Notes

Abstract

In recent years, achieving safety, efficiency, and comfort among interactive automated driving has been a formidable challenge. Model-based approaches, such as game-theoretic and robust control methods, often result in overly cautious decisions or suboptimal solutions. In contrast, learning-based techniques typically demand high computational resources and lack interpretability. At the same time, simpler strategies that rely on static assumptions tend to underperform in rapidly evolving traffic environments. To address these limitations, we propose a novel game-based MPC-DDP framework that integrates game-theoretic predictions of human-driven vehicle (HDV) with a Dynamic Differential Programming (DDP) solver under a receding-horizon setting. Our method dynamically adjusts the autonomous vehicle’s (AV) control inputs in response to real-time human-driven vehicle (HDV) behavior. This enables an effective balance between safety and efficiency. Experimental evaluations on lane-change and intersection scenarios demonstrate that the proposed approach achieves smoother trajectories, higher average speeds when needed, and larger safety margins in high-risk conditions. Comparisons against state-of-the-art baselines confirm its suitability for complex, interactive driving environments.

Keywords:

autonomous vehicle; interactive driving; risk potential field; model predictive control

1. Introduction

In recent years, autonomous driving has advanced rapidly, with many research concentrating on self-driving technology. The field of autonomous driving typically includes perception [1,2,3,4,5,6], planning [7,8,9,10,11,12], and control [13,14]. Besides, some works rely on the blockchain-based technology to enhance the autonomous driving [15,16]. However, interactive driving scenarios still pose significant challenges for motion planning. Among these, ensuring safety is the top priority. In Figure 1, an autonomous vehicle (AV) coexists on the road with multiple human-driven vehicle (HDV), creating an environment where insufficient caution could lead to collisions. The AV must continually evaluate potential hazards and generate safe trajectories during the interactions.

Model-based methods have been extensively explored. For instance, game-theoretic techniques attempt to predict the actions of other drivers [17,18], assuming a particular driving style for them. However, real-world driver behavior can deviate from such assumptions, often resulting in overly conservative planning. Another model-based approach is robust control [19], which expands the feasible region to address uncertainties. Yet, this enlargement may inadvertently include regions that are actually unsafe. Existing game-aware MPC frameworks (e.g., Stackelberg or mean-field game integrated MPC) have demonstrated the potential of incorporating interaction-aware predictions into motion planning. However, these approaches often rely on simplified dynamics or linear approximations, which can limit their ability to handle nonlinear vehicle behaviors and complex cost structures in real-time. Moreover, their solutions tend to be overly conservative in high-risk scenarios, leading to reduced efficiency.

Learning-based methods have also garnered considerable interest [20,21,22,23], typically requiring large-scale training datasets. Consequently, they demand substantial computational resources. Moreover, such methods often function as black boxes [24], offering limited transparency into their decision-making processes. Other advanced learning-based concepts, such as context-assisted learning [25,26] and Large Language Models [27], could only achieve simple driving-related tasks in current stage, even though they have good capabilities in other domains [28].

Besides the variety of limitations in above methods, effectively balancing safety, efficiency, and flexibility in interactive driving remains an open problem. Therefore, an approach that can both overcome above limitations and maintain the flexible driving driving is needed. Optimization-based approaches, such as Particle Swarm Optimization (PSO) [29,30,31], Quadratic Programming (QP) [32], and Adaptive Searching (AS) [33], and other optimization methods [34,35] have been adopted in certain motion-planning contexts for efficient and flexible driving. Nevertheless, these methods can be limited by local minima issues or require carefully tuned cost functions to manage safety margins and driver comfort.

To address these limitations, we propose a novel method based on game theory [36,37], model predictive control (MPC) [38,39] integrated with dynamic differential programming (DDP) [40]. Our approach leverages the strengths of game theory, MPC and DDP. Game theory provides the flexible decision-makings based on the relative states over the HDV, even under high-risk scenarios. The MPC provides a receding-horizon framework that continuously updates the control policy based on real-time measurements and predicted traffic evolutions, while DDP efficiently handles nonlinear system dynamics and complex cost structures through a backward-forward optimization procedure. By dynamically adjusting the control inputs to account for interactions with surrounding HDVs, our MPC-DDP framework effectively balances safety and efficiency. In high-risk conditions, the method adopts lower speeds and larger inter-vehicle gaps, whereas in less hazardous situations, it increases speed and maintains tighter following distances. As a result, the proposed MPC-DDP algorithm offers a flexible, transparent, and computationally feasible solution for interactive driving scenarios. The main contributions of this work are:

We introduce a novel game-based MPC-DDP framework that leverages dynamic differential programming to solve the optimal control problem in interactive driving environments. Our formulation uses advanced matrix representations and a recursive backward-forward pass to efficiently compute real-time control policies.
We demonstrate that our MPC-DDP method adapts to diverse driving scenarios. For example, in lane-change scenarios, it achieves higher average speeds and tighter gaps to enable efficient merging, whereas in intersection scenarios, it maintains lower speeds and larger gaps to maximize safety.
Extensive simulation results confirm that our proposed method outperforms state-of-the-art benchmark algorithms such as PSO, QP, and AS, by producing smoother trajectories for improved ride comfort, optimizing speed for greater efficiency, and ensuring collision-free operation.

Unlike existing hybrid MPC approaches that primarily combine MPC with approximate solvers or learning-based modules, our framework uniquely integrates DDP into a game-theoretic MPC setting, thereby achieving both nonlinear optimization efficiency and adaptive interaction modeling.

Furthermore, our proposed game-based MPC-DDP framework distinguishes itself by embedding game-theoretic predictions into a receding-horizon control structure solved via Dynamic Differential Programming (DDP). This integration not only improves computational efficiency for nonlinear dynamics but also enables adaptive behaviors—choosing conservative strategies in high-risk situations while maintaining efficiency and comfort in low-risk conditions.

In summary, existing methods exhibit complementary strengths but also face significant drawbacks in interactive driving. Game-theoretic and robust control approaches often yield overly conservative or suboptimal behaviors; learning-based methods, while flexible, demand extensive computational resources and lack interpretability; and conventional optimization methods such as PSO, QP, and AS either suffer from local optima, linearization errors, or sensitivity to parameter tuning. Unlike these approaches, our proposed game-based MPC-DDP framework explicitly integrates game-theoretic reasoning with second-order optimization via Dynamic Differential Programming (DDP). This integration enables efficient handling of nonlinear vehicle dynamics while adaptively balancing safety, efficiency, and comfort across diverse traffic scenarios.

In the following sections, we introduce the related works, the system model, the risk field formulation, the evolution mechanism, and the MPC optimization problem. Experimental results validate the effectiveness of our proposed approach in interactive driving scenarios.

2. Related Works

This section reviews the challenges and existing approaches for interactive autonomous driving, focusing on optimization-based methods and their limitations, and introduces multi-agent reinforcement learning for driving scenarios.

2.1. Challenges in Interactive Autonomous Driving

Interactive autonomous driving presents numerous challenges, particularly in achieving flexible driving strategies that adapt to diverse traffic scenarios. The AV must first process raw images using sensing technologies such as laser scanning [41] and point clouds [42], or rely on different processed images such as RGB [43], to perform semantic segmentation [44,45,46] for interactive driving. Semantic segmentation provides categorized objects whose specific types and positions can be obtained [47,48,49,50], thereby assisting the interactive driving process. Once encounter the uncertain scenarios, such as special weathers [51], that lead the incompletion of the images, diffusion models are used to fix the incomplete images [52]. In dynamic interactive environments, vehicles must operate safely while interacting with HDVs [53,54,55]. The risk of collisions increases when steering responses are delayed or judgment errors occur, which necessitates the adoption of distinct strategies for different scenarios. For example, maneuvers such as lane changes and overtaking require rapid, decisive actions [56,57,58,59] to maintain both efficiency and tight car-following, whereas more complex or high-risk environments, such as intersections, demand conservative approaches to maximize safety. In such cases, the vehicle must balance safety, efficiency, and ride comfort by dynamically switching between aggressive and cautious strategies based on the specific driving context. Many existing systems struggle to achieve this flexibility in real time, making it a critical challenge to ensure safe operation in complex and uncertain traffic conditions. Optimization-based approaches offer a promising solution to these challenges, as they enable a systematic formulation of the control problem. Besides, optimization-based approaches allow for real-time re-optimization that adapts to evolving traffic dynamics.

2.2. Shortcomings of Current Optimization-Based Approaches

Recent research in autonomous driving has explored various optimization-based methods for trajectory planning and control. PSO has been employed in several studies to address global optimization challenges in driving environments [60,61]. While PSO is advantageous in its ability to explore a large solution space, it often suffers from slow convergence and may become trapped in local optima, leading to suboptimal control performance in fast-changing traffic conditions.

QP techniques have also been widely utilized in autonomous driving applications, particularly for real-time collision avoidance and control under convex constraints [62,63]. However, QP-based approaches typically rely on linear or convex approximations of the underlying nonlinear vehicle dynamics and cost functions, which can compromise solution accuracy in driving scenarios.

AS methods have been proposed to improve the tuning of control parameters and to adaptively search the solution space [33,64]. Despite their flexibility, AS methods are often sensitive to the choice of parameters and can exhibit inconsistent performance when dealing with significant uncertainties in traffic interactions.

While the aforementioned optimization-based approaches have advanced autonomous driving, they exhibit critical limitations in real-time interactive scenarios. As shown in Table 1, PSO’s slow convergence and susceptibility to local optima hinder its real-time applicability. QP methods often rely on convex approximations that fail to capture the full nonlinearity of vehicle dynamics and interaction costs, compromising both performance and safety in highly dynamic settings. Although flexible, AS techniques are sensitive to parameter tuning and can yield inconsistent performance under significant uncertainty. Beyond these, learning-based methods (such as, reinforcement learning) typically function as black boxes, lacking interpretability, and demand substantial computational resources for training and inference, raising concerns about reliability and real-time feasibility. These collective shortcomings in real-time performance, interpretability, and handling of nonlinear dynamics and uncertainties motivate our proposed game-theoretic MPC-DDP framework, which leverages a dynamic differential programming solver to efficiently and transparently address these challenges within a receding-horizon optimal control structure.

Game theory, with its equilibrium model, provides flexible decision-making capabilities, allowing the identification of optimal actions that satisfy both traffic participants based on their current states. Meanwhile, DDP offers a promising alternative by using a second-order Taylor series expansion to approximate both the system dynamics and the cost function, which enables it to capture the nonlinear characteristics of the system more accurately. Through its iterative backward-forward process, DDP computes optimal control corrections based on the full nonlinear model while efficiently updating the nominal trajectory using insights derived from game theory. This approach enables the MPC-DDP framework to dynamically adjust control inputs with greater precision, ensuring both robust and flexible control performance under various traffic conditions.

While hybrid MPC approaches have been proposed by combining MPC with heuristic optimization or machine learning to enhance efficiency, they typically overlook explicit game-theoretic reasoning and second-order optimization. Our method fills this gap by embedding DDP into a game-aware MPC formulation, thereby bridging strategic interaction modeling and efficient nonlinear control optimization.

2.3. Multi-Agent Reinforcement Learning for Driving Scenarios

In the field of Multi-Agent Reinforcement Learning (MARL), numerous studies in recent years have explored its application in autonomous driving decision-making and control. As shown in Table 2. Hua et al. [65] provides a systematic review of the latest advancements and future challenges of MARL in the control of Connected Autonomous Vehicles (CAVs), highlighting its potential to handle dynamic interactions and enhance traffic flow and fuel efficiency in critical scenarios such as fleet coordination, lane changes, and unsignalized intersections. Addressing safety challenges in practical MARL deployment, Zheng et al. [66] proposes a secure MARL approach based on Stackelberg models and two-layer optimization. It designs the CSQ and CS-MADDPG algorithms, significantly improving reward and safety performance across diverse driving scenarios including merging, roundabouts, and intersections. Wang et al. [67] models energy management and eco-driving for plug-in hybrid vehicles as a multi-agent cooperative task, proposing a MARL-based cooperative control strategy that significantly reduces fuel consumption while ensuring safe following distances. Chen et al. [68] focuses on cooperative decision-making between two vehicles in dynamic random highway environments, proposing a fair cooperative MARL method that effectively balances maintaining convoy formation with enabling free overtaking, enhancing system adaptability and collaborative efficiency under variable traffic conditions. Yadav et al. [69] provides a comprehensive review of MARL applications in the CAV domain, systematically organizing current research issues, methodologies, and future directions, offering a clear research framework for subsequent developments in this field.

However, existing MARL methods still face numerous challenges, including high sample complexity, insufficient training stability, weak safety guarantees, and poor policy interpretability. Particularly in safety-critical scenarios, black-box decision mechanisms struggle to provide reliable behavioral verification and constraint satisfaction assurance. In contrast, the proposed optimization method based on game theory and model predictive control (MPC) in this paper, while exhibiting strong model dependency, features explicit safety constraint handling mechanisms, high computational efficiency, and good interpretability. It is suitable for interactive driving tasks demanding high safety and real-time performance. Both approaches possess distinct advantages, and future research may explore complementary integration by combining the adaptability of MARL with the safety properties of optimization methods.

3. Methodology

This section describes the overall framework of our game-based MPC-DDP method for interactive driving. The proposed methodology comprises three main components:

Interactive Driving Environment: We define the vehicle models for the autonomous vehicle (AV) and the surrounding human-driven vehicles (HDVs) in a complex multi-vehicle environment.
Game-Based Interaction: We model the interaction among vehicles using a hierarchical game formulation, in which the AV acts as a leader and HDVs as followers. The payoff functions are expressed in matrix form.
MPC-DDP: We integrate the game-based interaction into a receding-horizon Model Predictive Control framework solved via Dynamic Differential Programming (DDP), with detailed linearization and quadratic approximations.

3.1. Overall Framework

Figure 2 illustrates our overall approach. The proposed Game-based MPC-DDP framework is a unified approach that integrates game-theoretic prediction with MPC and DDP to address interactive driving challenges. First, sensor data is collected and processed to estimate the states of both the AV and HDV. A game-theoretic module then computes the optimal responses for HDVs using a Stackelberg formulation, predicting how they will behave in reaction to the AV’s actions. These predicted HDV actions are incorporated into a receding-horizon MPC formulation, where DDP is used to efficiently linearize the system dynamics and quadratically approximate the cost function. Through a backward-forward pass, the MPC-DDP optimization computes an optimal control sequence for the AV that balances safety, efficiency, and comfort. Finally, the optimal control commands is applied to the AV, and the process is repeated in real time, enabling the AV to continuously adapt to the dynamic traffic environment while leveraging the predicted behavior of other vehicles.

3.2. Interactive Driving Environment

In our environment, the set of vehicles is given by

V = {AV} \cup H,

(1)

where the AV is the autonomous vehicle and

H = {{HDV}_{1}, {HDV}_{2}, \dots, {HDV}_{N}}

(2)

is the set of N human-driven vehicles. The discrete-time index is denoted by

k \in {0, 1, \dots}

with sampling interval

Δ t

.

For each vehicle

i \in V

, the state vector is defined as

x_{i} (k) = [\begin{matrix} x_{i} (k) \\ y_{i} (k) \\ v_{i} (k) \\ θ_{i} (k) \end{matrix}] \in R^{4},

(3)

where

x_{i} (k)

and

y_{i} (k)

denote the Cartesian coordinates,

v_{i} (k)

is the speed, and

θ_{i} (k)

is the heading angle. The control input for vehicle i is given by

u_{i} (k) = [\begin{matrix} a_{i} (k) \\ ω_{i} (k) \end{matrix}] \in R^{2},

(4)

with

a_{i} (k)

representing the longitudinal acceleration and

ω_{i} (k)

the steering rate.

We adopt a simplified bicycle model for the dynamics. The evolution of the state is governed by

x_{i} (k + 1) = f_{i} (x_{i} (k), u_{i} (k)),

(5)

and a typical instantiation is:

\begin{matrix} x_{i} (k + 1) & = x_{i} (k) + Δ t v_{i} (k) \cos θ_{i} (k), \\ y_{i} (k + 1) & = y_{i} (k) + Δ t v_{i} (k) \sin θ_{i} (k), \\ v_{i} (k + 1) & = v_{i} (k) + Δ t a_{i} (k), \\ θ_{i} (k + 1) & = θ_{i} (k) + Δ t ω_{i} (k) . \end{matrix}

(6)

The AV’s state is denoted by

x_{AV} (k)

and each HDV’s state is denoted by

x_{{HDV}_{j}} (k)

,

j = 1, \dots, N

. We define the joint state as

X (k) = [\begin{matrix} x_{AV} (k) \\ x_{{HDV}_{1}} (k) \\ ⋮ \\ x_{{HDV}_{N}} (k) \end{matrix}] \in R^{4 (N + 1)} .

(7)

3.3. Game-Based Interaction

In interactive driving, each vehicle’s decision-making can be modeled as part of a dynamic game. We assume that the AV acts as the leader while the HDVs serve as followers. At each game stage

m \in {0, 1, \dots, M - 1}

(with M denoting the game horizon), each vehicle chooses its control input in order to minimize a cost function.

For vehicle i, the stage cost is defined as

ℓ_{i} (x_{i} (m), u_{i} (m)) = {(x_{i} (m) - x_{i, ref} (m))}^{⊤} Q_{i} (x_{i} (m) - x_{i, ref} (m)) + u_{i} {(m)}^{⊤} R_{i} u_{i} (m),

(8)

where

Q_{i} \in R^{4 \times 4}

and

R_{i} \in R^{2 \times 2}

are positive-definite weighting matrices, and

x_{i, ref} (m)

is the reference state.

In addition, to capture interaction effects (e.g., collision avoidance), we define an interaction cost

φ_{i} (x_{i} (m), {x_{j} (m)}_{j \neq i}) = \sum_{j \neq i} ϕ_{i j} I \{∥ x_{i} (m) - x_{j} {(m) ∥}_{2} < d_{safe}\},

(9)

where

ϕ_{i j} > 0

is a penalty coefficient,

d_{safe}

is a safety distance, and

I {\cdot}

is the indicator function.

Thus, the overall cost for vehicle i over the game horizon is

J_{i} = \sum_{m = 0}^{M - 1} [ℓ_{i} (x_{i} (m), u_{i} (m)) + φ_{i} (x_{i} (m), {x_{j} (m)}_{j \neq i})] .

(10)

For each HDV j, given the AV’s proposed control

u_{AV} (m)

, the best response is computed as

u_{{HDV}_{j}}^{*} (m) = \arg \min_{u_{{HDV}_{j}}} J_{{HDV}_{j}} (x_{{HDV}_{j}} (m), u_{{HDV}_{j}} (m), x_{- {HDV}_{j}} (m), u_{- {HDV}_{j}} (m)) .

(11)

The AV, anticipating these responses, solves

u_{AV}^{*} (m) = \arg \min_{u_{AV}} J_{AV} (x_{AV} (m), u_{AV} (m), {x_{{HDV}_{j}} (m)}_{j = 1}^{N}) .

(12)

$x_{i} (m) \in R^{4}$ : state vector of vehicle i at game stage m, defined as $x_{i} (m) = {[x_{i} (m), y_{i} (m), v_{i} (m), θ_{i} (m)]}^{⊤}$ , where $x_{i}, y_{i}$ are Cartesian coordinates, $v_{i}$ is longitudinal speed, and $θ_{i}$ is heading angle.
$u_{i} (m) \in R^{2}$ : control input vector of vehicle i, given by $u_{i} (m) = {[a_{i} (m), ω_{i} (m)]}^{⊤}$ , with $a_{i} (m)$ the longitudinal acceleration and $ω_{i} (m)$ the steering rate.
$x_{i, ref} (m)$ : reference state of vehicle i at stage m.
$Q_{i} \in R^{4 \times 4}$ , $R_{i} \in R^{2 \times 2}$ : positive-definite weighting matrices penalizing deviations from the reference state and excessive control efforts, respectively.
$ℓ_{i} (x_{i} (m), u_{i} (m))$ : stage cost of vehicle i, measuring tracking error and control effort (Equation (8)).
$φ_{i j} > 0$ : penalty coefficient for potential collision between vehicle i and vehicle j.
$d_{safe}$ : predefined safety distance threshold.
$I {\cdot}$ : indicator function, equal to 1 if the condition holds and 0 otherwise.
$ϕ_{i} (x_{i} (m), {x_{j} (m)}_{j \neq i})$ : interaction cost penalizing violation of safety distance (Equation (9)).
$J_{i}$ : cumulative cost of vehicle i over the game horizon, defined as the sum of stage and interaction costs (Equation (10)).
$u_{{HDV}_{j}}^{*} (m)$ : best-response control action of human-driven vehicle j, obtained by minimizing its cumulative cost $J_{{HDV}_{j}}$ given the AV’s proposed action (Equation (11)).
$u_{AV}^{*} (m)$ : optimal action of the autonomous vehicle, determined by anticipating HDVs’ best responses (defined later in Equation (12)).

Algorithm 1 outlines the game-based interaction procedure.

Algorithm 1 Game-Based Interaction Procedure using Stackelberg model.

1:: Input: Joint state $X (0)$ , game horizon M, cost functions ${J_{i}}$ .
2:: for $m = 0$ to $M - 1$ do
3:: AV proposes a tentative control $u_{AV} (m)$ .
4:: for each HDV $j = 1, \dots, N$ in parallel do
5:: Compute best response using (11).
6:: end for
7:: AV updates its decision using (12).
8:: Update joint state (13).
9:: end for

X (m + 1) = f (X (m), U (m)) .

(13)

3.4. MPC-DDP Formulation

We now embed the game-based interaction into a Model Predictive Control (MPC) framework solved via Dynamic Differential Programming (DDP). Let the prediction horizon be T steps. Define the joint state

Z (k) = {[\begin{matrix} x_{AV}^{⊤} (k) & x_{{HDV}_{1}}^{⊤} (k) & \dots & x_{{HDV}_{N}}^{⊤} (k) \end{matrix}]}^{⊤} \in R^{n_{z}},

(14)

and the joint control input

U (k) = {[\begin{matrix} u_{AV}^{⊤} (k) & u_{{HDV}_{1}}^{⊤} (k) & \dots & u_{{HDV}_{N}}^{⊤} (k) \end{matrix}]}^{⊤} \in R^{n_{u}} .

(15)

The stage cost is defined as

C (Z (k), U (k)) = {(Z (k) - Z_{ref})}^{⊤} Q_{z} (Z (k) - Z_{ref}) + U {(k)}^{⊤} R_{u} U (k) + Ψ (Z (k)),

(16)

where

Q_{z} \in R^{n_{z} \times n_{z}}

and

R_{u} \in R^{n_{u} \times n_{u}}

are weighting matrices,

Z_{ref}

is the desired joint state, and

Ψ (\cdot)

is an extra penalty term (derived from the game-based interaction) to enforce safety constraints.

Thus, the total cost over the horizon is

J = \sum_{k = 0}^{T - 1} C (Z (k), U (k)) .

(17)

To solve this MPC problem in real time, we use DDP. In the backward pass, we linearize the dynamics about a nominal trajectory

{Z^{*} (k), U^{*} (k)}

as

Z (k + 1) \approx A (k) Z (k) + B (k) U (k),

(18)

where

A (k) \in R^{n_{z} \times n_{z}}

and

B (k) \in R^{n_{z} \times n_{u}}

are the Jacobians:

A (k) = \frac{𝜕 f}{𝜕 Z} |_{(Z^{*} (k), U^{*} (k))}, B (k) = \frac{𝜕 f}{𝜕 U} |_{(Z^{*} (k), U^{*} (k))} .

(19)

The stage cost is quadratically approximated about the nominal point

(Z^{*} (k), U^{*} (k))

. Let

δ Z (k) = Z (k) - Z^{*} (k)

and

δ U (k) = U (k) - U^{*} (k)

denote the deviations of the state and control from their nominal trajectories, respectively. The quadratic approximation of the cost variation

δ C (k) = C (Z (k), U (k)) - C (Z^{*} (k), U^{*} (k))

is given by:

U_{new} (k) = U^{*} (k) + Δ U (k),

(20)

Equation (20) updates the control input by adding the correction term

Δ U (k)

to the nominal control

U^{*} (k)

, thereby generating the new control sequence

U_{new} (k)

.

Z^{*} (k) \leftarrow Z_{new} (k), U^{*} (k) \leftarrow U_{new} (k) .

(21)

Equation (21) resets the nominal state and control trajectories to their updated values after each iteration, ensuring consistency in the forward–backward optimization procedure.

The stage cost is quadratically approximated about

(Z^{*} (k), U^{*} (k))

:

δ C (k) \approx \frac{1}{2} δ Z {(k)}^{⊤} Q_{zz} (k) δ Z (k) + δ Z {(k)}^{⊤} Q_{z} (k) + \frac{1}{2} δ U {(k)}^{⊤} R_{uu} (k) δ U (k) + δ Z {(k)}^{⊤} Q_{zu} (k) δ U (k),

(22)

Equation (22) represents the local quadratic approximation of the stage cost in terms of the state deviation

δ Z (k)

and control deviation

δ U (k)

. The matrices

Q_{z z} (k)

,

Q_{z} (k)

,

R_{u u} (k)

, and

Q_{z u} (k)

capture the second-order and cross-term contributions.

Q_{z z} (k) = Q_{z} + A {(k)}^{⊤} V_{zz} (k + 1) A (k),

(23)

Equation (23) defines the Hessian of the value function with respect to the state, combining the immediate state cost

Q_{z}

with the propagated effect from the next step via system dynamics

A (k)

.

Q_{z} (k) = Q_{z} (Z^{*} (k) - Z_{ref}) + A {(k)}^{⊤} V_{z} (k + 1),

(24)

Equation (24) provides the gradient of the value function with respect to the state deviation, accounting for the tracking error from the reference

Z_{ref}

and the backward-propagated gradient

V_{z} (k + 1)

.

R_{uu} (k) = R_{u} + B {(k)}^{⊤} V_{zz} (k + 1) B (k),

(25)

Equation (25) gives the Hessian of the value function with respect to the control input, where

R_{u}

is the immediate control penalty and the second term captures the effect of control on future states via

B (k)

.

Q_{zu} (k) = A {(k)}^{⊤} V_{zz} (k + 1) B (k) .

(26)

Equation (26) defines the cross-term matrix between state and control deviations, reflecting how control actions affect future states in the quadratic expansion.

The optimal control correction is computed via.

Δ U (k) = - Q_{uu} {(k)}^{- 1} Q_{zu} {(k)}^{⊤} Δ Z (k),

(27)

Equation (27) derives the optimal control adjustment

Δ U (k)

by minimizing the quadratic cost, where

Q_{u u} {(k)}^{- 1}

serves as the control gain weighting.

And the value function is updated recursively:

V_{zz} (k) = Q_{zz} (k) - Q_{zu} (k) Q_{uu} {(k)}^{- 1} Q_{zu} {(k)}^{⊤},

(28)

Equation (28) updates the second-order term of the value function with respect to the state, incorporating the effect of optimal feedback control.

V_{z} (k) = Q_{z} (k) - Q_{zu} (k) Q_{uu} {(k)}^{- 1} Q_{u} (k) .

(29)

Equation (29) updates the first-order term of the value function with respect to the state, where the contribution of the optimal control is explicitly subtracted.

Algorithm 2 outlines the MPC-DDP procedure. In this formulation, the cost function is augmented by penalties from the game-based interaction. The backward pass computes a local quadratic approximation of the cost-to-go function and the optimal control correction. Meanwhile, the forward pass simulates the new trajectory. Only the first control is applied in a receding-horizon fashion.

Algorithm 2 MPC-DDP for Interactive Driving.

1:: Input: Current joint state $Z (0)$ , horizon T, initial control sequence ${U^{*} (k)}_{k = 0}^{T - 1}$ .
2:: for iteration = 1 to maxIters do
3::    Backward Pass:
   1.     Linearize dynamics using (18)
   2.     Quadratically approximate the stage cost as in (22).
   3.     Recursively compute cost-to-go using (29) and (28)
4:: Forward Pass: For $k = 0$ to $T - 1$ , update the control using (27) and set (20) then
simulate to obtain the new state trajectory ${Z_{new} (k)}$ .
5:: Update nominal trajectory (21).
6:: Convergence Check: If $J_{old} - J_{new} < ϵ$ , break.
7:: end for
8:: Apply: Execute $U^{*} (0)$ and shift the horizon.

3.5. Computational Complexity and Scalability of DDP

The computational complexity of the DDP solver primarily arises from the backward pass, which involves iterative linearization and quadratic approximation of the value function over the prediction horizon T. The complexity per iteration is

O (T \cdot (n_{z}^{3} + n_{u}^{3}))

, where

n_{z}

and

n_{u}

denote the dimensions of the joint state and control vectors, respectively. While this indicates polynomial scaling with the number of vehicles, practical real-time implementation requires careful management of the horizon length T and state dimensionality.

To enhance scalability in multi-vehicle scenarios, we employ a parallelized implementation of the backward pass and leverage sparse matrix operations where possible. Additionally, we adopt a receding-horizon framework that limits the effective interaction range to nearby vehicles, thereby reducing the effective value of

n_{z}

and

n_{u}

. Future work will explore decentralized or hierarchical formulations to further improve scalability.

4. Experimental Evaluation

The simulations were conducted to evaluate the safety, stability, and efficiency of the proposed method. They were run on a computer operating on Ubuntu 18.04.6 LTS, featuring a 12th generation, 16-thread Intel^® Core™ i5-12600KF CPU, an NVIDIA GeForce RTX 3070Ti GPU, and 16 GB of RAM. All simulation results were generated using MATLAB R2024b.

To verify the effectiveness of our proposed game-based MPC-DDP, we designed two distinct simulation scenarios that reflect typical yet challenging driving maneuvers. In the first scenario, illustrated in Figure 3a, the AV initiates a lane-change maneuver from its current lane to the adjacent lane, while the HDV travels at a steady speed in that lane. As shown, the AV’s path must merge safely behind or in front of the HDV without collisions or abrupt accelerations. In the second scenario, illustrated in Figure 3b, the AV approaches and traverses a four-way intersection, while HDVs enter from both the top and bottom roads. The AV must navigate across the intersection, maintaining a safe distance from the crossing HDVs and adapting to potential high-risk interactions. In both scenarios, the HDVs drive with stable behavior that does not explicitly respond to the AV, thereby highlighting the AV’s need to anticipate and adapt to possible interactions. Finally, to underscore the superior performance of the proposed game-based MPC-DDP, we compared it against other popular benchmark algorithms under these two scenarios.

4.1. Implementation Details and Parameters

4.1.1. DDP Algorithm Parameters

The DDP optimization algorithm employs several critical parameters that directly impact convergence behavior and computational efficiency. Table 3 summarizes these parameters with their theoretical justifications.

The convergence criteria employ multiple checks to ensure algorithm reliability:

\begin{matrix} ∥ \nabla J ∥ & < \sqrt{ϵ_{tol}} \end{matrix}

(30)

\begin{matrix} | J^{(k + 1)} - J^{(k)} | & < ϵ_{tol} \end{matrix}

(31)

\begin{matrix} ∥ g_{ineq} ∥_{\infty} & < ϵ_{tol} \end{matrix}

(32)

where

J^{(k)}

denotes the cost at iteration k, and

g_{ineq}

represents inequality constraint violations.

4.1.2. Safety Constraints and Limits

Safety-critical autonomous driving applications require carefully designed constraint sets. Table 4 details our safety constraint implementation.

The collision avoidance constraint is formulated as a smooth barrier function to maintain differentiability:

ϕ_{i j} (t) = d_{safe} - ∥ p_{i} (t) - p_{j} (t) ∥ + ϵ \log (1 + e^{- \frac{∥ p_{i} (t) - p_{j} (t) ∥ - d_{safe}}{ϵ}})

(33)

where

ϵ = 0.1

m provides smooth approximation near the constraint boundary.

4.1.3. Game-Theoretic Parameters

The Stackelberg game formulation requires careful parameter tuning to model realistic human driving behavior. Table 5 presents our game-theoretic configuration.

The HDV cost function incorporates the Intelligent Driver Model (IDM) structure:

J_{HDV} = \int_{0}^{T} [Q_{speed} {(v - v_{0})}^{2} + Q_{gap} {(s^{*} - s)}^{2} + R_{HDV} u^{T} u] d t

(34)

where

s^{*}

is the desired gap distance given by:

s^{*} = s_{0} + T_{h} v + \frac{v Δ v}{2 \sqrt{a_{\max} b_{comf}}}

(35)

4.1.4. Discretization Scheme

The continuous-time dynamics are discretized using fourth-order Runge-Kutta integration with adaptive step size control:

\begin{matrix} x_{k + 1} & = x_{k} + \frac{Δ t}{6} (k_{1} + 2 k_{2} + 2 k_{3} + k_{4}) \\ k_{1} & = f (x_{k}, u_{k}) \\ k_{2} & = f (x_{k} + \frac{Δ t}{2} k_{1}, u_{k}) \\ k_{3} & = f (x_{k} + \frac{Δ t}{2} k_{2}, u_{k}) \\ k_{4} & = f (x_{k} + Δ t k_{3}, u_{k}) \end{matrix}

(36)

(1) Computational Complexity

The algorithm’s computational complexity scales as

O (n^{2} T)

where n is the number of vehicles and T is the prediction horizon. Memory requirements scale as

O (n T)

. Empirical timing results on an Intel i7-10700K processor show average computation times of

55 \pm 30

ms for scenarios with up to 8 vehicles.

(2) Numerical Stability Measures

To ensure reliable operation across diverse scenarios, we implement several critical numerical stability measures that maintain algorithm robustness under varying conditions. The adaptive regularization scheme continuously monitors the Hessian condition number throughout the optimization process, automatically applying additional regularization terms when

κ (H) > 10^{8}

to prevent numerical ill-conditioning. This dynamic approach maintains computational stability without unnecessarily constraining well-conditioned problems.

The warm starting strategy initializes each optimization cycle with the solution from the previous time step, shifted forward by one time step, which significantly reduces the number of required iterations and improves convergence reliability. Constraint scaling normalizes all constraint functions to similar magnitudes, typically within the range

[0.1, 10]

, which prevents numerical precision issues that arise when constraints differ by several orders of magnitude. Finally, a fallback mechanism automatically engages emergency braking protocols if the optimization fails to converge within the allocated computational time budget, ensuring system safety under all circumstances.

(3) Reproducibility Guidelines

To facilitate accurate reproduction of our experimental results, we provide comprehensive documentation and resources that enable other researchers to implement and validate our approach. The complete parameter specification includes all numerical values, initialization procedures, and algorithmic choices documented, with explicit justification for each design decision based on theoretical analysis or empirical validation.

Our reference implementation provides a complete MATLAB codebase that implements the full algorithm, including all optimization routines, constraint handling mechanisms, and safety protocols. This implementation will be made available through a public GitHub repository upon paper acceptance, with comprehensive documentation and usage examples. The codebase includes a comprehensive suite of benchmark scenarios that represent standard test cases with documented expected outputs, enabling researchers to validate their own implementations against our reference results.

Performance baselines provide computational timing benchmarks measured on standardized hardware configurations, allowing researchers to assess the efficiency of their implementations and identify potential optimization opportunities. The modular code architecture facilitates easy modification of individual algorithmic components while preserving the overall framework integrity.

The modular code structure allows researchers to easily modify individual components (e.g., cost functions, constraint sets, or vehicle models) while maintaining the core algorithmic framework.

4.1.5. Parameter Sensitivity Analysis

Table 6 presents the sensitivity of key performance metrics to parameter variations, providing guidance for parameter tuning in different applications.

This analysis demonstrates the robustness of the algorithm to moderate parameter variations, with safety metrics showing particularly low sensitivity to parameter changes.

4.2. Effective Driving in Lane-Changing Scenario

4.2.1. Lane-Changing Scenario Setup

In this scenario, the AV performs a lane-change maneuver from its current lane to an adjacent lane where a HDV is traveling at a steady speed. The AV must anticipate the HDV’s motion and plan a safe merge behind or in front of the HDV. Table 7 summarizes the initial conditions for both vehicles.

In our implementation, the AV starts in the lower lane (centered at

y = 0

) with an initial speed of 15 m/s and heading

θ = 0

. Meanwhile, the HDV occupies the adjacent lane (centered at

y = 3.5

) with the same initial speed of 15 m/s and the same heading. The lane width is set to 3.5 m, and both vehicles are assumed to follow the bicycle model described in Section 3. The AV’s objective is to change lanes safely by anticipating the HDV’s motion, avoiding collisions, and maintaining comfortable accelerations and steering rates.

4.2.2. Simulation Results in Lane-Changing Scenario

Figure 4 illustrates the lane-change scenario in which the AV (blue trajectory) merges from its initial lane into the adjacent lane occupied by a HDV (red trajectory). The solid black lines represent lane boundaries, and the dashed line indicates the lane center. Observing the AV’s path (blue circles), one can see a smooth lateral transition from

y = 0

to

y = 3.5

, demonstrating that the proposed game-based MPC-DDP method effectively plans a continuous lane-change maneuver without abrupt steering or acceleration. This smoothness not only maintains passenger comfort, but also helps reduce the risk of collisions with the HDV or other vehicles.

From a safety perspective, there is no point along the trajectory where the AV crosses the HDV’s path with insufficient spacing, indicating that the method anticipates the HDV’s motion and avoids encroaching on its safety envelope. Additionally, the gradual convergence of the AV’s y-position to the target lane center (around

y = 3.5

) reflects the method’s ability to execute a comfortable lane change, rather than a sudden or aggressive swerve. This trajectory analysis highlights how the proposed approach ensures both safety through collision-free motion and sufficient inter-vehicle spacing and comfort through smooth steering and controlled acceleration.

Figure 5 provides additional metrics for the lane-change scenario, highlighting the performance of our game-based MPC-DDP method. In the top subplot, the blue curve indicates the relative speed between the AV and HDV, whereas the red curve shows their relative distance. The left y-axis corresponds to the relative speed, and the right y-axis to the relative distance. Notably, the minimum relative distance observed in this simulation is approximately 19.8 m, a margin that is sufficiently large to avoid collisions and ensure safety. This relatively high margin reflects the algorithm’s ability to anticipate the HDV’s motion and adjust the AV’s acceleration to maintain a comfortable gap.

Meanwhile, the relative speed curve transitions smoothly from around

1.5

m/s to nearly 0 m/s as time progresses, illustrating how the AV gradually converges to the HDV’s speed without abrupt changes. In the bottom subplot, the speed scatter plot shows that the HDV maintains a roughly constant speed of around 15 m/s, while the AV’s speed initially rises above 15 m/s before gently settling to match the HDV. The absence of sharp spikes in the AV’s speed profile underscores both comfort by limiting excessive accelerations and stability by avoiding oscillatory or reactive behavior. Together, these results confirm that our proposed method ensures safety by keeping the vehicles well separated while providing a smooth driving experience through gradual speed adaptation.

Figure 6 compares the speed and acceleration distributions of both the AV and HDV in the lane-change scenario. In the left plot, we show a speed histogram for the AV and HDV. Most of the AV’s speeds are clustered between 14.8 m/s and 15.2 m/s, reflecting the AV’s smooth adaptation toward the HDV’s speed. Meanwhile, the HDV’s speed remains relatively constant at around 15 m/s, with minimal variance. This narrow band of speeds for the AV, centered around the HDV’s speed, indicates that the proposed method avoids large deviations or oscillations, thus contributing to a more comfortable ride.

In the right plot, we display the acceleration distributions for the AV and HDV. The HDV has a single representative acceleration point, corresponding to its nearly constant speed, while the AV’s accelerations spread within a moderate range, ensuring it can merge safely. Notably, the majority of the AV’s accelerations are relatively small in magnitude less than 1 m/s², which verifies that the lane-change maneuver is performed without aggressive throttle or braking inputs. The absence of high positive or negative acceleration spikes also attests to passenger comfort and ride stability. Overall, these distribution analyses confirm that our game-based MPC-DDP method yields both a narrow speed band around the HDV’s velocity and moderate acceleration profiles, thus enhancing safety, smoothness, and overall driving quality.

4.3. Simulation Results in Intersection’s Driving

In this intersection scenario, the AV enters from the left side of the crossroad, while a HDV approaches from the top. Unlike the lane-change scenario, the HDV’s speed here varies slightly to reflect more realistic human driving behavior. Table 8 summarizes the initial positions, speed ranges, and headings for both vehicles. The AV starts at a moderate speed and must pass safely through the intersection, anticipating any HDV speed fluctuations and avoiding collisions.

As indicated in Table 8, the AV’s heading is 0 rad, meaning it travels in the positive x-direction from

x = - 20

to cross the intersection. Meanwhile, the HDV’s heading is set to

- \frac{π}{2}

. Although the HDV’s average speed is around 15 m/s, to make the simulation close to the real-world driving, it can fluctuate within the

14.5

–

15.5

m/s range.

Figure 7 illustrates the intersection scenario, in which the AV travels from left to right, while the HDV moves from top to bottom. The dashed lines indicate approximate intersection boundaries or lane demarcations. Notably, both vehicles maintain collision-free trajectories, highlighting the effectiveness of our game-based MPC-DDP method in handling potentially high-risk crossing maneuvers.

In particular, the AV’s path shows only minor deviations from a straight line, indicating that it does not need to make abrupt turns or stops when crossing the intersection. Meanwhile, the HDV passes steadily from the top to the bottom portion of the road. Despite the HDV’s slight speed variations, the AV anticipates and accounts for these fluctuations in real time, thus maintaining a safe longitudinal and lateral distance. This result underscores the method’s capabilities, allowing the AV to plan ahead and avoid late-reactive maneuvers. The absence of sudden changes in the AV’s trajectory or the HDV’s speed attests to the stability and robustness of the game-based MPC-DDP framework.

Figure 8 depicts the relative speed and distance as well as the speed scatter plot for the intersection scenario, highlighting how our game-based MPC-DDP method balances safety and efficiency. In the top subplot, the blue curve shows the relative speed between the AV and HDV, while the red curve shows their relative distance over time. Initially, the AV accelerates to a higher speed, allowing it to pass through the risky intersection area more quickly. After crossing, the AV gradually reduces its speed, converging closer to the HDV’s velocity. Throughout this process, the minimum relative distance is approximately 1.9 m, ensuring that no collisions occur despite the high-speed crossing.

The bottom subplot provides a speed scatter plot for both the AV and the HDV. As shown, the AV’s speed peaks around the middle of the simulation, then decreases back to a lower level, reflecting its strategy of briefly speeding up to clear the intersection and then resuming safer, more moderate speeds. Meanwhile, the HDV maintains a relatively steady pace, unaffected by the AV’s maneuver. This pattern underscores the flexibility of our method, which not only avoids collisions but also minimizes the time spent in high-risk zones by taking advantage of higher speeds when needed. Overall, these metrics confirm that the proposed game-based MPC-DDP achieves both efficiency through timely intersection crossing and safety through maintaining a minimum distance of 1.9 m and avoiding collisions.

4.4. Benchmark Comparisons for Performance Evaluation

Figure 9 presents bar charts comparing MPC integrated with four popular optimization methods, including PSO [61], QP [63], and AS [33] in terms of average speed, average relative distance, and acceleration variability under two scenarios: lane-change in left side and intersection in right side. Each subplot highlights how our proposed game-based MPC-DDP adapts differently depending on the scenario’s requirements.

Lane-Change Scenario: In the left portion of each subplot, our game-based MPC_DDP achieves both the highest average speed, for example, around 15.2 m/s, and the smallest average gap, such as approximately 3.0 m, compared to the other methods which average around 14.6–15.0 m/s and maintain gaps of 3.5–4.0 m. This behavior is particularly suitable for lane-changing, where higher efficiency and tighter car-following are desirable. By traveling faster and maintaining a closer distance to the HDV, the AV minimizes travel time and merges smoothly without causing unnecessary slowdowns. Furthermore, the acceleration variance for MPC_DDP in this scenario is relatively large, for example, in the range of 0.35–0.40 (m/s²)², whereas other methods typically remain below 0.25 (m/s²)². This broader range indicates that MPC_DDP is more flexible in selecting acceleration strategies, enabling quick adaptation and optimization of both speed and gap under dynamic traffic conditions.
Intersection Scenario: In the right portion of each subplot, MPC_DDP shows the lowest average speed, for example, around 12.5 m/s, and the largest average gap, such as approximately 6.0 m, relative to the other methods which generally maintain speeds of 13.0–14.0 m/s and gaps of 4.0–5.0 m. This conservative, safety-oriented strategy suits the more hazardous intersection environment, where the AV must avoid high-speed conflicts and preserve a larger safety margin. By reducing its speed and increasing the gap, MPC_DDP lowers collision risk and can respond more effectively to unexpected HDV maneuvers. The acceleration variance also remains higher in this scenario, for example, around 0.30–0.35 (m/s²)², again illustrating the AV’s ability to choose from a broader range of accelerations. This flexibility enables the AV to decelerate quickly or accelerate when needed, thereby enhancing overall safety and stability at the intersection.

As shown in Table 9, in both scenarios, the acceleration variability of MPC_DDP remains moderate, implying smooth speed and steering changes. This balance further underscores our method’s capacity to adapt its driving style to be aggressive enough to ensure efficiency in the lane-change case, but cautious enough to ensure safety at the intersection.

4.5. Comprehensive Statistical Evaluation and Benchmark Comparisons

To demonstrate our algorithm’s performance in terms of statistical rigor, generalization capability, and diverse conditions, we conducted a large-scale comprehensive simulation study. This section details the experimental design and compares the proposed Game-MPC-DDP algorithm against three state-of-the-art benchmarks through comprehensive statistical analysis: Mixed-Integer Quadratic Programming MPC (MIQP-MPC) [62], Nonlinear MPC (NMPC) [38], and Deep Reinforcement Learning (Deep RL) [21].

4.5.1. Experimental Design and Setup

To ensure the claims are statistically sound and generalizable, we designed a full-factorial experiment encompassing a wide spectrum of driving scenarios, environmental conditions, and behavioral uncertainties. The experiment consists of 2880 unique scenario configurations, each repeated 30 times via Monte Carlo simulation to account for stochasticity, resulting in a total of 86,400 experimental runs. The experimental factors are as follows:

Scenarios (4 types): Lane Change, Intersection, Highway Merge, and Roundabout, covering major interactive driving challenges.
HDV Configurations (4 levels): 2, 4, 6, and 8 surrounding HDVs to test scalability and complexity.
HDV Driving Styles (4 types):
-
Aggressive: Time headway = 1.0 s, desired speed = 28 m/s, max acceleration = 3.0 m/s².
-
Normal: Time headway = 1.5 s, desired speed = 25 m/s, max acceleration = 2.0 m/s².
-
Conservative: Time headway = 2.0 s, desired speed = 22 m/s, max acceleration = 1.5 m/s².
-
Mixed: A combination of the above styles within a single scenario.
Initial Conditions (5 types): Low density (50 m spacing), Medium density (30 m), High density (15 m), Mixed speeds (15–30 m/s), and an Adversarial setting where HDVs exhibit blocking behaviors.
Weather Conditions (3 types):
-
Clear: Visibility 100%, friction coefficient 1.0.
-
Rain: Visibility 70%, friction coefficient 0.6, speed factor 0.8.
-
Fog: Visibility 30%, friction coefficient 0.9, speed factor 0.6.

This design provides high statistical power (

>

0.95 at

α = 0.05

to detect effects as small as 0.2 standard deviations), ensures comprehensive coverage of real-world variables, and allows for robust validation of our method’s performance.

4.5.2. Statistical Analysis of Safety Performance

The primary metric for evaluation is safety, quantified by the minimum distance between the AV and any HDV during an interaction. Table 10 presents the aggregated safety results across all 86,400 experiments, demonstrating a decisive advantage of the proposed method.

The results show that our method maintains an average minimum distance 6–8 times larger than the benchmarks. This directly translates into superior safety outcomes: zero collisions were recorded in all 2880 scenarios, compared to collision rates of 9.5% to 17.5% for other methods. Furthermore, the number of near misses (distance < 2 m) and Time-to-Collision (TTC) violations for our approach are negligible, approaching zero. The statistical significance of these differences is confirmed with p-values < 0.001 using Welch’s t-test, and the effect size (Cohen’s

d > 2.0

) indicates a very large practical significance.

4.5.3. Performance Under Diverse Conditions

Scalability with Number of HDVs: As shown in Figure 10, the computation time of our method exhibits a sub-linear increase, growing only from 53 ms (2 HDVs) to 55 ms (8 HDVs). More importantly, its safety performance (minimum distance) remains consistently high regardless of complexity, while the performance of other methods degrades significantly with more HDVs.
Robustness to Weather Conditions: Figure 10 illustrates that our method maintains a near-100% success rate (no collisions) across all weather conditions. In contrast, the success rates of benchmark methods, particularly in challenging fog conditions, drop to between 82% and 88%. This demonstrates the superior robustness of our game-theoretic prediction integrated with DDP optimization in adverse perceptual conditions.
Adaptation to Driving Styles and Density: The proposed method successfully adapts its strategy across all HDV driving styles and traffic densities. In aggressive, high-density settings, it prioritizes safety by maintaining larger gaps, while in normal conditions, it efficiently balances safety and traffic flow. The variance in performance metrics across different initial conditions was minimal for our method, confirming its robustness.

4.6. Discussion and Limitations

While the proposed game-based MPC-DDP framework demonstrates effective performance in the tested interactive scenarios, its performance relies on the assumption that human-driven vehicles (HDVs) act as rational agents in a Stackelberg game equilibrium. This model, although useful for structured prediction, represents a simplification of real-world driving behavior. In practice, human drivers may exhibit behaviors that deviate from this assumption due to factors such as unpredictable aggressiveness, hesitation, distraction, or varying levels of risk tolerance.

Such deviations could potentially impact the prediction accuracy of the game-theoretic module, especially in highly adversarial or ambiguous situations. For instance, an overly aggressive HDV might not yield as predicted, while a hesitant one might create unnecessary conservatism in the AV’s plan.

Nevertheless, the receding-horizon nature of the MPC framework provides inherent robustness to moderate prediction errors by frequently re-planning based on updated observations. Furthermore, the safety-oriented cost terms (e.g., the risk potential field and collision avoidance penalty

Ψ (\cdot)

) are designed to explicitly penalize any trajectories that encroach on safety margins, thereby mitigating the consequences of imperfect behavioral predictions.

A promising direction for future work involves enhancing the behavioral model by integrating adaptive or probabilistic driver models that can identify different driving styles (e.g., aggressive, conservative) in real-time. This would allow the AV to adjust its interaction strategy dynamically, leading to even more robust and human-like cooperative driving in mixed traffic environments.

5. Conclusions

In this paper, we proposed a game-based MPC-DDP framework to tackle interactive autonomous driving challenges, particularly in lane-change and intersection scenarios. By integrating a game-theoretic prediction module with a Dynamic Differential Programming (DDP) solver under a receding-horizon control scheme, our method balances safety, efficiency, and comfort more effectively than existing approaches. Experimental results demonstrated that the proposed framework achieves higher average speeds and smaller inter-vehicle gaps when appropriate, while adopting conservative maneuvers in high-risk conditions to maintain larger safety margins. For future work, we intend to expand the proposed approach to handle multiple HDVs with diverse driving behaviors, thereby modeling more complex and realistic traffic interactions. Although the current framework demonstrates strong performance in single-HDV interaction scenarios, extending it to multi-HDV environments presents several critical challenges, primarily encompassing the following aspects. First, computational complexity increases significantly with the number of HDVs, as the game strategy space grows exponentially, potentially compromising real-time performance. To address this, we plan to explore hierarchical optimization structures or distributed computing approaches, combined with approximate dynamic programming to enhance computational efficiency. Second, the stability of equilibrium strategies in multi-agent games remains a critical concern. Nash equilibria or Stackelberg equilibria may be non-existent or non-unique under certain conditions. Therefore, we consider introducing learning-based behavioral prediction mechanisms to enhance adaptability to diverse driving styles and develop more robust equilibrium selection strategies. Additionally, common conflict coupling phenomena in multi-vehicle interactions—such as simultaneous lane changes or intersecting paths at intersections—require more refined coordination mechanisms. We propose employing conflict graph modeling or dynamic priority allocation strategies, combined with real-time replanning techniques to ensure system safety. Finally, the diversity of driving behaviors (e.g., aggressive, conservative) significantly impacts interaction strategies. Future work will integrate a driver behavior recognition module and design adaptive cost functions to dynamically respond to different behavioral patterns.

Author Contributions

Conceptualization: Z.W. (Zhenhua Wang) and Z.W. (Zheng Wu); methodology: Z.W. (Zheng Wu); software, Z.W. (Zheng Wu); validation: J.Y., Z.W. (Zheng Wu), and S.H.; formal analysis: Z.W. (Zheng Wu); investigation: Z.W. (Zheng Wu) and Z.W. (Zhenhua Wang); resources: F.Y.; data curation: F.Y.; writing—original draft preparation: Z.W. (Zhenhua Wang) and Z.W. (Zheng Wu); writing—review and editing: S.H.; visualization: S.H.; supervision: Z.W. (Zhenhua Wang); project administration: Z.W. (Zhenhua Wang); funding acquisition: J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to express their sincere gratitude to the Fundamental Research Funds for the Central Universities Special Fund Project (Grant No. LGZD202501), the General Research Projects in Philosophy and Social Sciences of Colleges and Universities in Jiangsu Province (Grant No. 2025SJYB0084), the Harbin Xinguang Optic-electronics Technology Co., Ltd. Horizontal Research Project (Grant No. 2024320107003397) for providing funds to support this study.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhu, Q.; Xiao, J.; Fan, L. IndoorMS: A Multispectral Dataset for Semantic Segmentation in Indoor Scene Understanding. IEEE Sens. J. 2025, 25, 19837–19847. [Google Scholar] [CrossRef]
Song, Z.; Liu, L.; Jia, F.; Luo, Y.; Jia, C.; Zhang, G.; Yang, L.; Wang, L. Robustness-aware 3D object detection in autonomous driving: A review and outlook. IEEE Trans. Intell. Transp. Syst. 2024, 25, 15407–15436. [Google Scholar] [CrossRef]
Lin, Z.; Zhang, Q.; Tian, Z.; Yu, P.; Lan, J. DPL-SLAM: Enhancing Dynamic Point-Line SLAM Through Dense Semantic Methods. IEEE Sens. J. 2024, 24, 14596–14607. [Google Scholar] [CrossRef]
Lin, Z.; Zhang, Q.; Tian, Z.; Yu, P.; Ye, Z.; Zhuang, H.; Lan, J. Slam2: Simultaneous localization and multimode mapping for indoor dynamic environments. Pattern Recognit. 2025, 158, 111054. [Google Scholar] [CrossRef]
Zhu, Q.; Cai, Y.; Fan, L. Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images. In Proceedings of the 2024 7th International Conference on Sensors, Signal and Image Processing, Shenzhen, China, 22–24 November 2024; pp. 90–96. [Google Scholar]
Cui, J.; Yuan, C.; Zhang, D. Multi-scale feature aggregation with hierarchical semantics and uncertainty assessment: Enabling high-accuracy visual retrieval. J. Supercomput. 2025, 81, 1141. [Google Scholar] [CrossRef]
Reda, M.; Onsy, A.; Haikal, A.Y.; Ghanbari, A. Path planning algorithms in the autonomous driving system: A comprehensive review. Robot. Auton. Syst. 2024, 174, 104630. [Google Scholar] [CrossRef]
Chen, L.; Wu, P.; Chitta, K.; Jaeger, B.; Geiger, A.; Li, H. End-to-end autonomous driving: Challenges and frontiers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10164–10183. [Google Scholar] [CrossRef]
Teng, S.; Hu, X.; Deng, P.; Li, B.; Li, Y.; Ai, Y.; Yang, D.; Li, L.; Xuanyuan, Z.; Zhu, F.; et al. Motion planning for autonomous driving: The state of the art and future perspectives. IEEE Trans. Intell. Veh. 2023, 8, 3692–3711. [Google Scholar] [CrossRef]
Yuan, F.; Lin, Z.; Tian, Z.; Chen, B.; Zhou, Q.; Yuan, C.; Sun, H.; Huang, Z. Bio-inspired hybrid path planning for efficient and smooth robotic navigation. Int. J. Intell. Robot. Appl. 2025, 1–31. [Google Scholar] [CrossRef]
Xu, L.; Yuan, C.; Jiang, Z. Multi-strategy enhanced secret bird optimization algorithm for solving obstacle avoidance path planning for mobile robots. Mathematics 2025, 13, 717. [Google Scholar] [CrossRef]
Lin, Z.; Tian, Z.; Zhang, Q.; Zhuang, H.; Lan, J. Enhanced visual slam for collision-free driving with lightweight autonomous cars. Sensors 2024, 24, 6258. [Google Scholar] [CrossRef]
Tsai, J.; Chang, Y.T.; Chen, Z.Y.; You, Z. Autonomous Driving Control for Passing Unsignalized Intersections Using the Semantic Segmentation Technique. Electronics 2024, 13, 484. [Google Scholar] [CrossRef]
Barruffo, L.; Caiazzo, B.; Petrillo, A.; Santini, S. A GoA4 control architecture for the autonomous driving of high-speed trains over ETCS: Design and experimental validation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 5096–5111. [Google Scholar] [CrossRef]
Yuan, F.; Zuo, Z.; Jiang, Y.; Shu, W.; Tian, Z.; Ye, C.; Yang, J.; Mao, Z.; Huang, X.; Gu, S.; et al. AI-Driven Optimization of Blockchain Scalability, Security, and Privacy Protection. Algorithms 2025, 18, 263. [Google Scholar] [CrossRef]
He, Z.; Xu, R.; Wang, B.; Meng, Q.; Tang, Q.; Shen, L.; Tian, Z.; Duan, J. Integrated Blockchain and Federated Learning for Robust Security in Internet of Vehicles Networks. Symmetry 2025, 17, 1168. [Google Scholar] [CrossRef]
Qin, Z.; Ji, A.; Sun, Z.; Wu, G.; Hao, P.; Liao, X. Game theoretic application to intersection management: A literature review. IEEE Trans. Intell. Veh. 2024, 10, 2589–2607. [Google Scholar] [CrossRef]
Zhang, J.; Guo, X.; Fu, Z.; Liu, Y.; Ding, Y. Non-Cooperative Game Theory Based Driver-Automation Shared Steering Control Considering Driver Steering Behavior Characteristics. IEEE Internet Things J. 2024, 11, 28465–28479. [Google Scholar] [CrossRef]
Wang, J.; Zhou, A.; Liu, Z.; Peeta, S. Robust cooperative control strategy for a platoon of connected and autonomous vehicles against sensor errors and control errors simultaneously in a real-world driving environment. Transp. Res. B Methodol. 2024, 184, 102946. [Google Scholar] [CrossRef]
Lin, Z.; Tian, Z.; Zhang, Q.; Ye, Z.; Zhuang, H.; Lan, J. A Conflicts-Free, Speed-Lossless KAN-Based Reinforcement Learning Decision System for Interactive Driving in Roundabouts. IEEE Trans. Intell. Transp. Syst. 2025, Early Access, 1–14. [Google Scholar] [CrossRef]
Tian, Z.; Zhao, D.; Lin, Z.; Flynn, D.; Zhao, W.; Tian, D. Balanced reward-inspired reinforcement learning for autonomous vehicle racing. In Proceedings of the 6th Annual Learning for Dynamics & Control Conference, Oxford, UK, 15–17 July 2024; pp. 628–640. [Google Scholar]
Tian, Z.; Lin, Z.; Zhao, D.; Zhao, W.; Flynn, D.; Ansari, S.; Wei, C. Evaluating Scenario-based Decision-making for Interactive Autonomous Driving Using Rational Criteria: A Survey. arXiv 2025, arXiv:2501.01886. [Google Scholar]
Yang, Q.; Chen, J.; Yuan, C.; Na, L.; Chen, W.; Sun, J.; Cai, Y. Spatial distribution prediction of environmental microbial communities based on deep learning combined with MAXENT model. In Proceedings of the Fourth International Conference on Electronics Technology and Artificial Intelligence (ETAI 2025), Harbin, China, 21–23 February 2025; Volume 13692, pp. 954–959. [Google Scholar]
Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
Zhou, X.; Ye, W.; Lee, Z.; Zou, L.; Zhang, S. Valuing Training Data via Causal Inference for In-Context Learning. IEEE Trans. Knowl. Data Eng. 2025, 37, 3824–3840. [Google Scholar] [CrossRef]
Zhou, X.; Ye, W.; Wang, Y.; Jiang, C.; Lee, Z.; Xie, R.; Zhang, S. Enhancing In-Context Learning via Implicit Demonstration Augmentation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11–16 August 2024; Volume 1: Long Papers. pp. 2810–2828. [Google Scholar]
Zhou, X.; Zhang, M.; Lee, Z.; Ye, W.; Zhang, S. HaDeMiF: Hallucination Detection and Mitigation in Large Language Models. In Proceedings of the The Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
Xiao, N.; Yuan, C.; Pei, Y.; Xue, W.; Cai, Y. A study of artificial intelligence in writing assessment for secondary school students: A comparative analysis based on the GPT-4 and human raters. Educ. Stud. 2025, 1–23. [Google Scholar] [CrossRef]
Kraiem, H.; Flah, A.; Mohamed, N.; Alowaidi, M.; Bajaj, M.; Mishra, S.; Sharma, N.K.; Sharma, S.K. Increasing electric vehicle autonomy using a photovoltaic system controlled by particle swarm optimization. IEEE Access 2021, 9, 72040–72054. [Google Scholar] [CrossRef]
Li, Q.; Tian, Z.; Wang, X.; Yang, J.; Lin, Z. Adaptive Field Effect Planner for Safe Interactive Autonomous Driving on Curved Roads. arXiv 2025, arXiv:2504.14747. [Google Scholar] [CrossRef]
Li, Q.; Tian, Z.; Wang, X.; Yang, J.; Lin, Z. Efficient and Safe Planner for Automated Driving on Ramps Considering Unsatisfication. arXiv 2025, arXiv:2504.15320. [Google Scholar] [CrossRef]
Jeong, D.; Choi, S.B. Efficient trajectory planning for autonomous vehicles using quadratic programming with weak duality. IEEE Trans. Intell. Veh. 2023, 9, 2878–2892. [Google Scholar] [CrossRef]
Rontsis, N.; Goulart, P.J.; Nakatsukasa, Y. An active-set algorithm for norm constrained quadratic problems. Math. Program. 2022, 193, 447–483. [Google Scholar] [CrossRef]
Liu, Y.; Tian, Z.; Zhou, Q.; Huang, Z.; Sun, H. An ACO-MPC Framework for Energy-Efficient and Collision-Free Path Planning in Autonomous Maritime Navigation. In Proceedings of the 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 21–23 March 2025; pp. 344–354. [Google Scholar]
Wang, Y.; Zhang, H.; Yuan, C.; Li, X.; Jiang, Z. An efficient scheduling method in supply chain logistics based on network flow. Processes 2025, 13, 969. [Google Scholar] [CrossRef]
Ji, K.; Orsag, M.; Han, K. Lane-merging strategy for a self-driving car in dense traffic using the stackelberg game approach. Electronics 2021, 10, 894. [Google Scholar] [CrossRef]
Zheng, L.; Wang, X.; Li, F.; Mao, Z.; Tian, Z.; Peng, Y.; Yuan, F.; Yuan, C. A Mean-Field-Game-Integrated MPC-QP Framework for Collision-Free Multi-Vehicle Control. Drones 2025, 9, 375. [Google Scholar] [CrossRef]
Batkovic, I.; Gupta, A.; Zanon, M.; Falcone, P. Experimental validation of safe mpc for autonomous driving in uncertain environments. IEEE Trans. Control. Syst. Technol. 2023, 31, 2027–2042. [Google Scholar] [CrossRef]
Liu, Y.; Tian, Z.; Yang, J.; Lin, Z. Data-Driven Evolutionary Game-Based Model Predictive Control for Hybrid Renewable Energy Dispatch in Autonomous Ships. In Proceedings of the 2025 4th International Conference on New Energy System and Power Engineering (NESP), Fuzhou, China, 25–27 April 2025; pp. 482–490. [Google Scholar]
Jallet, W.; Bambade, A.; Mansard, N.; Carpentier, J. Constrained differential dynamic programming: A primal-dual augmented lagrangian approach. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 13371–13378. [Google Scholar]
Zhao, Y.; Wang, D.; Zhu, Q.; Fan, L.; Bao, Y. Algorithm-Driven Extraction of Point Cloud Data Representing Bottom Flanges of Beams in a Complex Steel Frame Structure for Deformation Measurement. Buildings 2024, 14, 2847. [Google Scholar] [CrossRef]
Zhu, Q.; Fan, L.; Weng, N. Advancements in point cloud data augmentation for deep learning: A survey. Pattern Recognit. 2024, 153, 110532. [Google Scholar] [CrossRef]
Zhu, Q.; Fang, Y.; Fan, L. MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery. arXiv 2024, arXiv:2412.06211. [Google Scholar] [CrossRef]
Zhu, Q.; Jiang, Y.; Fan, L. ClassWise-CRF: Category-Specific Fusion for Enhanced Semantic Segmentation of Remote Sensing Imagery. arXiv 2025, arXiv:2504.21491. [Google Scholar]
Zhu, Q.; Weng, N.; Fan, L.; Cai, Y. Enhancing environmental monitoring through multispectral imaging: The WasteMS dataset for semantic segmentation of lakeside waste. In Proceedings of the International Conference on Multimedia Modeling, Nara, Japan, 8–10 January 2025; pp. 362–372. [Google Scholar]
Zhu, Q.; Cao, J.; Cai, Y.; Fan, L. Evaluating the impact of point cloud colorization on semantic segmentation accuracy. In Proceedings of the 2024 IEEE 8th International Conference on Vision, Image and Signal Processing (ICVISP), Kunming, China, 27–29 December 2024; pp. 1–5. [Google Scholar]
Zhou, J.; Wu, Y.; Zhang, Y.; Zhang, Y.; Liu, Y.; Huang, B.; Yuan, C. SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection. In Proceedings of the 2025 10th International Conference on Information and Network Technologies (ICINT), Melbourne, Australia, 12–14 March 2025; pp. 158–162. [Google Scholar]
Zhu, Q.; Cai, Y.; Fan, L. Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images. arXiv 2024. [Google Scholar] [CrossRef]
Zhu, Q.; Fang, Y.; Cai, Y.; Chen, C.; Fan, L. Rethinking scanning strategies with vision mamba in semantic segmentation of remote sensing imagery: An experimental study. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 18223–18234. [Google Scholar] [CrossRef]
Zhu, Q.; Cai, Y.; Fang, Y.; Yang, Y.; Chen, C.; Fan, L.; Nguyen, A. Samba: Semantic segmentation of remotely sensed images with state space model. Heliyon 2024, 10, e38495. [Google Scholar] [CrossRef]
Zhu, Z.; Li, X.; Zhai, J.; Hu, H. PODB: A learning-based polarimetric object detection benchmark for road scenes in adverse weather conditions. Inf. Fusion 2024, 108, 102385. [Google Scholar] [CrossRef]
Zhu, Z.; Li, X.; Ma, Q.; Zhai, J.; Hu, H. FDNet: Fourier transform guided dual-channel underwater image enhancement diffusion network. Sci. China Technol. Sci. 2025, 68, 1100403. [Google Scholar] [CrossRef]
Liu, Y.; Wu, Y.; Li, W.; Cui, Y.; Wu, C.; Guo, G. Designing External Displays for Safe AV-HDV Interactions: Conveying Scenarios Decisions of Intelligent Cockpit. In Proceedings of the 2023 7th CAA International Conference on Vehicular Control and Intelligence (CVCI), Changsha, China, 27–29 October 2023; pp. 1–8. [Google Scholar]
Liang, J.; Tan, C.; Yan, L.; Zhou, J.; Yin, G.; Yang, K. Interaction-Aware Trajectory Prediction for Safe Motion Planning in Autonomous Driving: A Transformer-Transfer Learning Approach. arXiv 2024, arXiv:2411.01475. [Google Scholar] [CrossRef]
Gong, B.; Wang, F.; Lin, C.; Wu, D. Modeling HDV and CAV mixed traffic flow on a foggy two-lane highway with cellular automata and game theory model. Sustainability 2022, 14, 5899. [Google Scholar] [CrossRef]
Yao, Z.; Deng, H.; Wu, Y.; Zhao, B.; Li, G.; Jiang, Y. Optimal lane-changing trajectory planning for autonomous vehicles considering energy consumption. Expert Syst. Appl. 2023, 225, 120133. [Google Scholar] [CrossRef]
Liu, Y.; Zhou, B.; Wang, X.; Li, L.; Cheng, S.; Chen, Z.; Li, G.; Zhang, L. Dynamic lane-changing trajectory planning for autonomous vehicles based on discrete global trajectory. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8513–8527. [Google Scholar] [CrossRef]
Chai, R.; Tsourdos, A.; Chai, S.; Xia, Y.; Savvaris, A.; Chen, C.P. Multiphase overtaking maneuver planning for autonomous ground vehicles via a desensitized trajectory optimization approach. IEEE Trans. Ind. Inform. 2022, 19, 74–87. [Google Scholar] [CrossRef]
Palatti, J.; Aksjonov, A.; Alcan, G.; Kyrki, V. Planning for safe abortable overtaking maneuvers in autonomous driving. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 508–514. [Google Scholar]
Daryina, A.N.; Prokopiev, I.V. Parametric optimization of unmanned vehicle controller by PSO algorithm. Procedia Comput. Sci. 2021, 186, 787–792. [Google Scholar] [CrossRef]
Fernandes, P.B.; Oliveira, R.; Neto, J.F. Trajectory planning of autonomous mobile robots applying a particle swarm optimization algorithm with peaks of diversity. Appl. Soft Comput. 2022, 116, 108108. [Google Scholar] [CrossRef]
Quirynen, R.; Safaoui, S.; Di Cairano, S. Real-time mixed-integer quadratic programming for vehicle decision-making and motion planning. IEEE Trans. Control. Syst. Technol. 2024, 33, 77–91. [Google Scholar] [CrossRef]
Zhang, Z.; Zhang, L.; Deng, J.; Wang, M.; Wang, Z.; Cao, D. An enabling trajectory planning scheme for lane change collision avoidance on highways. IEEE Trans. Intell. Veh. 2021, 8, 147–158. [Google Scholar] [CrossRef]
Fan, W.; He, H.; Lu, B. Online Active Set-Based Longitudinal and Lateral Model Predictive Tracking Control of Electric Autonomous Driving. Appl. Sci. 2021, 11, 9259. [Google Scholar] [CrossRef]
Hua, M.; Qi, X.; Chen, D.; Jiang, K.; Liu, Z.E.; Sun, H.; Zhou, Q.; Xu, H. Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects. IEEE Trans. Autom. Sci. Eng. 2025, 22, 16266–16286. [Google Scholar] [CrossRef]
Zheng, Z.; Gu, S. Safe Multiagent Reinforcement Learning With Bilevel Optimization in Autonomous Driving. IEEE Trans. Artif. Intell. 2025, 6, 829–842. [Google Scholar] [CrossRef]
Wang, Y.; Wu, Y.; Tang, Y.; Li, Q.; He, H. Cooperative energy management and eco-driving of plug-in hybrid electric vehicle via multi-agent reinforcement learning. Appl. Energy 2023, 332, 120563. [Google Scholar] [CrossRef]
Chen, S.; Wang, M.; Song, W.; Yang, Y.; Fu, M. Multi-agent reinforcement learning-based decision making for twin-vehicles cooperative driving in stochastic dynamic highway environments. IEEE Trans. Veh. Technol. 2023, 72, 12615–12627. [Google Scholar] [CrossRef]
Yadav, P.; Mishra, A.; Kim, S. A Comprehensive Survey on Multi-Agent Reinforcement Learning for Connected and Automated Vehicles. Sensors 2023, 23, 4710. [Google Scholar] [CrossRef]

Figure 1. Illustration of interactive driving.

Figure 2. Game-Based MPC-DDP architecture for interactive driving.

Figure 3. Considered simulation scenarios. (a) the lane-changing scenario; (b) driving into intersection.

Figure 4. Scenario 1: Trajectories of the autonomous vehicle (AV, blue) and human-driven vehicle (HDV, red) during a lane-change scenario. The AV starts in the lower lane (y = 0 m) and merges into the adjacent lane (y = 3.5 m) occupied by the HDV. Solid black lines indicate lane boundaries; the dashed line represents the target lane center. The smooth trajectory demonstrates effective collision avoidance and comfortable merging behavior.

Figure 5. Top side: Performance metrics during lane-change: Top subplot shows the relative speed (blue, left axis) and relative distance (red, right axis) between the AV and HDV over time. The minimum relative distance is approximately 19.8 m, ensuring safety. Bottom subplot displays the speed profiles of the AV (blue) and HDV (red), illustrating smooth speed adaptation by the AV to match the HDV’s velocity.

Figure 6. Left side: Distribution analysis of speed and acceleration during lane−change: Left histogram shows the speed distribution of the AV (blue) and HDV (red), with the AV converging smoothly to the HDV’s speed. Right plot displays acceleration distributions, indicating that the AV’s accelerations are moderate (mostly below 1 m/s), ensuring passenger comfort and stable merging.

Figure 7. Scenario 2: Trajectories of the AV (blue) and HDV (red) in an intersection scenario. The AV enters from the left (x = −20 m) and moves rightward; the HDV enters from the top (y = 20 m) and moves downward. Dashed lines indicate the intersection area. The AV maintains a nearly straight path with minimal deviation, demonstrating effective collision avoidance and real-time adaptation to the HDV’s motion.

Figure 8. Top side: Performance metrics during intersection crossing: Top subplot shows the relative speed (blue) and relative distance (red) between the AV and HDV. The minimum relative distance is approximately 1.9 m, ensuring safety despite high-speed interaction. Bottom subplot displays the speed profiles: the AV (blue) briefly accelerates to clear the intersection, then decelerates, while the HDV (red) maintains a steady speed.

Figure 9. Comparative performance of the proposed MPC-DDP method against PSO, QP, and AS algorithms in lane-change (left) and intersection (right) scenarios. Metrics include average speed, average relative distance, and acceleration variance. MPC-DDP achieves higher speed and tighter gaps in lane-change, and lower speed and larger gaps in intersection, demonstrating scenario-aware adaptability.

Figure 10. Comprehensive Analysis of Safety Distance and Computational Efficiency in Autonomous Driving Algorithms Across Multiple Scenarios.

Table 1. Comparison of Optimization Methods in Autonomous Driving.

Method	Representative Works	Strengths	Limitations	Issues to Address
Particle Swarm Optimization	Daryina et al. [60], Fernandes et al. [61]	Good exploration of large solution space	Slow convergence; prone to local optima	Improve convergence speed; enhance robustness in fast-changing traffic scenarios
Quadratic Programming	Quirynen et al. [62], Zhang et al. [63]	Efficient real-time control; handles convex constraints well	Relies on linear/convex approximations; reduced accuracy for nonlinear dynamics	Incorporate more accurate nonlinear dynamics modeling while maintaining real-time capability
Active Search	Rontsis et al. [33], Fan et al. [64]	Flexible parameter tuning; adaptive search	Sensitive to parameter choices; inconsistent under high uncertainty	Develop robust parameter adaptation; ensure consistent performance under traffic uncertainties

Table 2. Comparison of related work on MARL for autonomous driving.

Reference	Application Scenario	Main Contribution	Limitations/Issues
Hua et al. [65]	CAV control (fleet coordination, lane change, intersections)	Systematic review of MARL in CAVs; emphasizes dynamic interaction handling, traffic flow, and fuel efficiency	Lacks algorithmic innovations; challenges in real-world deployment not deeply addressed
Zheng et al. [66]	Safety in merging, roundabouts, intersections	Proposes secure MARL via Stackelberg game and two-layer optimization; designs CSQ and CS-MADDPG for improved safety and rewards	Focused mainly on safety; scalability and adaptability under highly dynamic environments need further validation
Wang et al. [67]	Energy management and eco-driving of hybrid vehicles	Formulates energy management as cooperative MARL; achieves reduced fuel consumption and ensures safe following	Limited to eco-driving; generalization to diverse driving tasks not verified
Chen et al. [68]	Cooperative decision-making in highway environments	Proposes fair cooperative MARL for convoy formation vs. free overtaking; enhances adaptability and efficiency	Limited to two-vehicle scenarios; scalability to larger multi-agent systems remains uncertain
Yadav et al. [69]	General review of MARL in CAV domain	Provides comprehensive survey of research issues, methods, and future directions; offers research framework	Review-oriented; does not provide concrete solutions or experiments

Table 3. DDP Algorithm Parameters and Justification.

Parameter	Value	Description	Justification
$N_{\max}$	10	Maximum iterations	Balance between convergence quality and real-time constraints
$ϵ_{tol}$	$1 \times 10^{- 3}$	Convergence tolerance	Ensures numerical stability while avoiding over-optimization
$λ_{reg}$	$1 \times 10^{- 2}$	Hessian regularization	Prevents ill-conditioning based on vehicle dynamics analysis
$α_{ls}$	0.5	Line search factor	Armijo criterion for sufficient descent
$δ_{fd}$	$1 \times 10^{- 6}$	Finite difference step	Forward difference for Jacobian computation

Table 4. Safety Constraints and Physical Limits.

Constraint Type	Value	Description	Standard Reference
Collision Avoidance
$d_{safe}$	2.0 m	Minimum safe distance	Vehicle length + reaction margin
$d_{coll}$	1.0 m	Collision threshold	Conservative estimate for control failure
${TTC}_{crit}$	2.0 s	Critical time-to-collision	European NCAP standard
Vehicle Dynamics
$a_{\max}$	3.0 m/s²	Maximum acceleration	Tire friction limit (dry asphalt)
$a_{\min}$	−4.0 m/s²	Maximum deceleration	Emergency braking capability
${\dot{ω}}_{\max}$	0.5 rad/s	Maximum steering rate	Actuator saturation limit
$v_{\max}$	35.0 m/s	Maximum velocity	Highway speed limit compliance
Comfort Constraints
$a_{comfort}$	2.0 m/s²	Comfort acceleration limit	Passenger comfort threshold
$j_{\max}$	2.0 m/s³	Maximum jerk	Longitudinal comfort standard
$a_{lat, \max}$	4.0 m/s²	Maximum lateral acceleration	Lateral comfort limit

Table 5. Game-Theoretic Parameters and Cost Function Weights.

Parameter	Value	Description
Game Structure
M	5	Game prediction horizon
$N_{leader}$	3	Leader optimization iterations
$N_{follower}$	2	Follower response iterations
$ϵ_{Nash}$	$1 \times 10^{- 2}$	Nash equilibrium tolerance
AV (Leader) Cost Weights
$Q_{track}$	$diag ([10, 10, 1, 5])$	State tracking weights $[x, y, v, θ]$
$R_{ctrl}$	$diag ([1, 10])$	Control effort weights $[a, ω]$
$λ_{safety}$	100	Safety penalty weight
$λ_{comfort}$	10	Comfort penalty weight
HDV (Follower) Driving Style Weights
Aggressive:
$Q_{speed}$	15	Speed preference weight
$Q_{gap}$	5	Gap maintenance weight
$T_{h}$	1.0 s	Desired time headway
Normal:
$Q_{speed}$	10	Speed preference weight
$Q_{gap}$	10	Gap maintenance weight
$T_{h}$	1.5 s	Desired time headway
Conservative:
$Q_{speed}$	5	Speed preference weight
$Q_{gap}$	20	Gap maintenance weight
$T_{h}$	2.0 s	Desired time headway

Table 6. Parameter Sensitivity Analysis.

Parameter	$\pm 50 %$ Change	Safety Impact	Comfort Impact
$λ_{safety}$	$50 \to 150$	$\pm 5 %$	$\pm 15 %$
$d_{safe}$	$1.5 \to 3.0$ m	$\pm 2 %$	$\pm 8 %$
$N_{\max}$	$5 \to 15$	$\pm 1 %$	$\pm 3 %$
M (game horizon)	$3 \to 7$	$\pm 3 %$	$\pm 12 %$

Table 7. Initial Conditions for the Lane-Changing Scenario.

Vehicle	Initial x (m)	Initial y (m)	Initial Speed (m/s)	Heading $θ$ (rad)
AV	0	0	15	0
HDV	20	3.5	15	0

Table 8. Initial Conditions for the Intersection Scenario.

Vehicle	Initial x (m)	Initial y (m)	Speed Range (m/s)	Heading (rad)	Comment
AV	$- 20$	0	15 (fixed)	0	Approaches from left
HDV	0	20	$14.5$ – $15.5$	$- \frac{π}{2}$	Approaches from top

Table 9. Key Requirements and Our Method’s Advantages in Each Scenario.

Scenario	Requirement	Proposed Method’s Result	Advantage
Lane-Change	High efficiency and tight car-following	Highest average speed; smallest gap to HDV	Minimizes travel time; merges smoothly
Intersection	Reduced speed in risky environment	Lowest average speed; largest gap to HDV	Avoids collisions; improves safety margin

Table 10. Comparative Safety Performance Metrics (Mean ± Std. Dev.).

Method	Min Distance (m)	Collision Rate	Near Misses (<2 m)	TTC Violations
Game-MPC-DDP (Ours)	10.02 ± 7.24	0.000 ± 0.000	0.1 ± 0.3	0.0 ± 0.1
MIQP-MPC	1.40 ± 0.67	0.095 ± 0.009	2.8 ± 1.2	1.9 ± 0.8
NMPC	1.54 ± 0.76	0.140 ± 0.012	2.5 ± 1.4	1.7 ± 0.9
Deep RL	1.20 ± 0.68	0.175 ± 0.015	3.4 ± 1.6	2.3 ± 1.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, Z.; Wu, Z.; Hu, S.; Yuan, F.; Yang, J. Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving. World Electr. Veh. J. 2025, 16, 544. https://doi.org/10.3390/wevj16090544

AMA Style

Wang Z, Wu Z, Hu S, Yuan F, Yang J. Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving. World Electric Vehicle Journal. 2025; 16(9):544. https://doi.org/10.3390/wevj16090544

Chicago/Turabian Style

Wang, Zhenhua, Zheng Wu, Shiguang Hu, Fujiang Yuan, and Junye Yang. 2025. "Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving" World Electric Vehicle Journal 16, no. 9: 544. https://doi.org/10.3390/wevj16090544

APA Style

Wang, Z., Wu, Z., Hu, S., Yuan, F., & Yang, J. (2025). Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving. World Electric Vehicle Journal, 16(9), 544. https://doi.org/10.3390/wevj16090544

Article Menu

Game-Aware MPC-DDP for Mixed Traffic: Safe, Efficient, and Comfortable Interactive Driving

Abstract

1. Introduction

2. Related Works

2.1. Challenges in Interactive Autonomous Driving

2.2. Shortcomings of Current Optimization-Based Approaches

2.3. Multi-Agent Reinforcement Learning for Driving Scenarios

3. Methodology

3.1. Overall Framework

3.2. Interactive Driving Environment

3.3. Game-Based Interaction

3.4. MPC-DDP Formulation

3.5. Computational Complexity and Scalability of DDP

4. Experimental Evaluation

4.1. Implementation Details and Parameters

4.1.1. DDP Algorithm Parameters

4.1.2. Safety Constraints and Limits

4.1.3. Game-Theoretic Parameters

4.1.4. Discretization Scheme

(1) Computational Complexity

(2) Numerical Stability Measures

(3) Reproducibility Guidelines

4.1.5. Parameter Sensitivity Analysis

4.2. Effective Driving in Lane-Changing Scenario

4.2.1. Lane-Changing Scenario Setup

4.2.2. Simulation Results in Lane-Changing Scenario

4.3. Simulation Results in Intersection’s Driving

4.4. Benchmark Comparisons for Performance Evaluation

4.5. Comprehensive Statistical Evaluation and Benchmark Comparisons

4.5.1. Experimental Design and Setup

4.5.2. Statistical Analysis of Safety Performance

4.5.3. Performance Under Diverse Conditions

4.6. Discussion and Limitations

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI