To ensure high-fidelity tracking accuracy and realistic dynamic response, a nonlinear dynamic bicycle model is employed as the predictive model for the MPC controller. Unlike kinematic models, this model explicitly accounts for tire slip angles and lateral dynamics, which are critical in ramp merging maneuvers.
3.4.1. Modeling of the Cooperative Control Problem
Based on the system architecture, the vehicle state vector at time step $t$ is defined as $x = [X, Y, \varphi, v_x, v_y, \omega]^T$, where $(X, Y)$ denotes the global position, $\varphi$ is the yaw angle, $v_x$ and $v_y$ are the longitudinal and lateral velocities in the vehicle body frame, and $\omega$ is the yaw rate. The control input vector is defined as $u = [a, \delta]^T$, representing the longitudinal acceleration and the front wheel steering angle, respectively.
The continuous-time nonlinear dynamics are governed by the following differential equations, utilizing the parameters defined in Table 5:

$$\begin{aligned}
\dot{X} &= v_x \cos\varphi - v_y \sin\varphi, \\
\dot{Y} &= v_x \sin\varphi + v_y \cos\varphi, \\
\dot{\varphi} &= \omega, \\
\dot{v}_x &= a + v_y \omega, \\
\dot{v}_y &= \frac{F_{yf} + F_{yr}}{m} - v_x \omega, \\
\dot{\omega} &= \frac{l_f F_{yf} - l_r F_{yr}}{I_z}.
\end{aligned}$$

The lateral tire forces $F_{yf}$ and $F_{yr}$ are approximated using a linear tire model, which is valid for the small-slip-angle operational range of ramp merging:

$$F_{yf} = C_f\left(\delta - \frac{v_y + l_f\omega}{v_x}\right), \qquad F_{yr} = -C_r\,\frac{v_y - l_r\omega}{v_x},$$

where $m$ is the vehicle mass, $I_z$ is the yaw moment of inertia, $l_f$ and $l_r$ are the distances from the center of mass to the front and rear axles, and $C_f$ and $C_r$ are the equivalent cornering stiffnesses of the front and rear tires, as listed in Table 5.
This section establishes a receding horizon optimization method where each vehicle in the multi-vehicle system is independently equipped with an MPC controller. Only the first optimal control action is executed in each control cycle, and the optimal policy is recalculated in the next cycle based on the latest ego-vehicle state, surrounding vehicle states, and the updated reference trajectory. To ensure real-time performance, the coordination adopts a non-iterative (single-shot) information exchange protocol per time step to minimize communication latency. Although this introduces minor prediction mismatches, the Receding Horizon Control (RHC) mechanism inherently compensates for these errors by re-optimizing trajectories at the subsequent step.
In this study, the number of planned reference trajectories for each vehicle is set to 1. Based on the architecture of the integrated decision-control scheme and combined with the problem formulation of cooperative control, the objective function of cooperative controller i is given as follows:
where $u_i(k|t)$ and $x_i(k|t)$ denote the control and state variables of vehicle $i$ at the $k$-th step within the prediction horizon, respectively, and $x_i^{\mathrm{ref}}(t)$ represents the reference trajectory planned for vehicle $i$ at time $t$. Here, $J_{\mathrm{track}}$ denotes the tracking accuracy, $J_{\mathrm{fuel}}$ denotes the fuel consumption, and $J_{\mathrm{comf}}$ denotes performance indicators such as driving comfort. To prevent erratic vehicle behavior and ensure smooth control despite potential sub-optimalities in the neural approximation, the cost function explicitly penalizes high-frequency control fluctuations. Specifically, the weight matrix $R_\Delta$ (associated with $J_{\mathrm{comf}}$) imposes heavy penalties on the rate of change of control inputs ($\Delta u_i(k|t) = u_i(k|t) - u_i(k-1|t)$), thereby physically constraining the DNN to output smooth and continuous trajectories.
To achieve the desired cooperative behavior, the MPC optimization problem is formulated with a multi-objective cost function that explicitly balances tracking accuracy, energy efficiency, driving comfort, and safety. The full objective function $J$ over the prediction horizon $N_p$ is defined as follows:

$$J = \sum_{k=0}^{N_p-1}\Big( \big\|x(k|t) - x^{\mathrm{ref}}(k|t)\big\|_Q^2 + \big\|u(k|t)\big\|_R^2 + \big\|\Delta u(k|t)\big\|_{R_\Delta}^2 + \rho\, P_{\mathrm{col}}\big(x(k|t)\big) \Big),$$

where $x(k|t)$ and $x^{\mathrm{ref}}(k|t)$ are the predicted and reference state vectors at step $k$, respectively. $u(k|t)$ is the control input vector, and $\Delta u(k|t)$ represents the rate of change of control inputs, which serves as the smoothness term to ensure passenger comfort. $P_{\mathrm{col}}(\cdot)$ is a soft penalty function for collision avoidance constraints (as defined in Equation (16)) with a high penalty weight $\rho$. The weighting matrices $Q$, $R$, and $R_\Delta$ are defined as diagonal matrices to normalize and prioritize the different objectives. The specific numerical values for all weight terms used in the simulation are reported in Table 3.
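The structure of this cost can be sketched in a few lines of code. The weights below are placeholders for the values reported in Table 3, and `mpc_cost` and `collision_penalty` are hypothetical names introduced only for illustration:

```python
import numpy as np

# Hypothetical weighting values; the actual weights appear in Table 3.
Q = np.diag([1.0, 1.0, 0.5, 0.1, 0.1, 0.1])   # state-tracking weights
R = np.diag([0.1, 0.1])                        # control-effort weights
RD = np.diag([1.0, 10.0])                      # smoothness (rate-of-change) weights
RHO = 1e4                                      # collision soft-penalty weight

def mpc_cost(x_pred, x_ref, u_seq, u_last, collision_penalty):
    """Multi-objective horizon cost: tracking + effort + smoothness
    + soft collision penalty, mirroring the structure in the text."""
    J = 0.0
    for k in range(len(u_seq)):
        e = x_pred[k] - x_ref[k]
        du = u_seq[k] - (u_last if k == 0 else u_seq[k - 1])
        J += e @ Q @ e                 # ||x - x_ref||_Q^2
        J += u_seq[k] @ R @ u_seq[k]   # ||u||_R^2
        J += du @ RD @ du              # ||delta u||_{R_delta}^2
        J += RHO * collision_penalty(x_pred[k])
    return float(J)
```

With a perfectly tracked reference, constant inputs, and no collision risk, only the control-effort term remains, which makes the weighting of each term easy to verify in isolation.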
To ensure control effectiveness and safety, four types of constraint conditions are established.
Equality and inequality constraints are formulated based on vehicle dynamics, traffic legality in ramp merging scenarios, and driving safety on ramps. The vehicle dynamics model $f(\cdot)$ serves as the equality constraint relating the control vector $u(k|t)$ and the state vector $x(k|t)$:

$$x(k+1|t) = f\big(x(k|t),\, u(k|t)\big).$$
For traffic regulation constraints, let $g_{\mathrm{road}}(x, e)$ denote a scalar function of the vehicle state vector $x$ and the local road feature vector $e$:

$$g_{\mathrm{road}}\big(x(k|t), e\big) \le 0.$$
In research on multi-vehicle cooperative driving and collision avoidance control, a nonlinear vector function is introduced to describe the Feature Circle distance constraint between vehicles and thereby handle safe inter-vehicle interaction. This constraint function depends on the ego-vehicle state $x_i$ and the state vectors $x_j$ of surrounding vehicles, and serves as a key component in the optimization solution of Model Predictive Control (MPC).
Specifically, the reference trajectory of the ego-vehicle and the predicted trajectories of surrounding vehicles together constitute the basic parameters for solving the vehicle optimization problem at time $t$. Based on the optimal control sequence and the current system state, the controller infers the predicted state trajectory and, on this basis, establishes collision constraints for the vehicle over the $N_p$ prediction steps. The nonlinear inequality constraint for multi-vehicle collision avoidance (where $j \in \mathcal{N}_i$, the set of surrounding vehicles) is designed to ensure geometric separation between the ego-vehicle and surrounding vehicles within the time domain.
To accurately convert the rectangular geometric shape of the vehicle into computable mathematical constraints, the Feature Circle approximation method is adopted. The collision avoidance constraints between vehicles are defined as the following system of nonlinear inequalities:

$$\big\|p_i^{\alpha}(k|t) - p_j^{\beta}(k|t)\big\|_M^2 \ge D^2, \qquad \alpha, \beta \in \{f, r\},$$

In the above equation, $x_i$ and $x_j$ denote the states of the ego-vehicle and the obstacle vehicle (i.e., surrounding vehicle $j$), respectively, and $M$ is the weight matrix. This constraint essentially requires that the squared distance between the front/rear Feature Circles of the ego-vehicle and those of the surrounding vehicle must be greater than the square of the Feature Circle diameter $D$, thereby ensuring no physical contact.
The position coordinates of the centers of the Feature Circles, $p^{f}$ (front circle) and $p^{r}$ (rear circle), are derived from the ego-vehicle's centroid state and heading angle, with their kinematic relationship defined as:

$$p^{f} = \begin{bmatrix} X + l_a\cos\varphi \\ Y + l_a\sin\varphi \end{bmatrix}, \qquad p^{r} = \begin{bmatrix} X - l_b\cos\varphi \\ Y - l_b\sin\varphi \end{bmatrix}.$$

Herein, $l_a$ and $l_b$ denote the offset distances of the centers of the vehicle's Feature Circles relative to the ego-vehicle's centroid, forward and backward along the longitudinal axis, respectively; $D$ is the Feature Circle diameter. To extract heading angle information from the high-dimensional state vector for calculating geometric positions, sparse matrices $\Gamma_1$ and $\Gamma_2$ are introduced. All elements of $\Gamma_1$ are zero except for the element at the first row and third column, which is set to 1; all elements of $\Gamma_2$ are zero except for the element at the second row and third column.
The flexibility of this constraint model stems from its parameter configuration: when $l_a = l_b = 0$, the four quadratic inequalities in Equation (20) degenerate into the same form, i.e., the single-circle constraint model; when $l_a, l_b > 0$, the four distinct quadratic inequalities constrain the front and rear Feature Circles of the ego-vehicle against those of the surrounding vehicle pairwise, thereby accurately covering the collision risk associated with the rectangular vehicle body. This nonlinear constraint is ultimately integrated into the MPC optimization problem and acts synergistically with the control saturation constraints, ensuring that the trajectory generated within the prediction horizon not only satisfies dynamic feasibility but also possesses rigorous collision avoidance capability.
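A minimal check of the double-Feature-Circle constraint might look as follows; the offsets `la`, `lb` and the diameter `D` take illustrative values, and the helper names are ours, not the paper's:

```python
import math

def circle_centers(X, Y, phi, la, lb):
    """Front/rear Feature Circle centers, offset along the heading axis."""
    front = (X + la * math.cos(phi), Y + la * math.sin(phi))
    rear = (X - lb * math.cos(phi), Y - lb * math.sin(phi))
    return front, rear

def feature_circle_ok(ego, other, la=1.3, lb=1.3, D=2.0):
    """All four pairwise circle-center distances must exceed diameter D.

    ego/other = (X, Y, phi). With la = lb = 0 the four inequalities
    collapse into the single-circle model described in the text.
    """
    for pe in circle_centers(*ego, la, lb):
        for po in circle_centers(*other, la, lb):
            if (pe[0] - po[0]) ** 2 + (pe[1] - po[1]) ** 2 < D ** 2:
                return False  # some circle pair overlaps: collision risk
    return True
```

For example, two vehicles 10 m apart on the same lane axis satisfy all four inequalities, while a 1 m gap violates them.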
For control input saturation constraints, the lower bound $u_{\min}$ and upper bound $u_{\max}$ of the control input for vehicle $i$ are specified as:

$$u_{\min} \le u_i(k|t) \le u_{\max}.$$
To prevent optimization infeasibility in tight merging scenarios, the collision avoidance constraints (Equation (15)) are implemented as soft constraints using a heavy penalty function (generalized exterior-point method). This ensures that the solver always finds a solution, even if doing so temporarily penalizes a safety-buffer violation. In extreme cases where the merging gap closes unexpectedly, the collision avoidance constraints (Equation (17)) force the optimization to output a deceleration or braking command. If the trajectory becomes infeasible, a backup safety mechanism is triggered to bring the vehicle to a stop until a safe gap becomes available. Furthermore, to ensure the stability of the DMPC over an infinite horizon, a terminal constraint term $J_N\big(x(N_p|t)\big)$ is incorporated into the cost function as a soft penalty:

$$J_N = \rho_N\,\big\|\max\big(0,\, h(x(N_p|t))\big)\big\|^2,$$

where $\rho_N$ is a large penalty coefficient and $h(\cdot)$ collects the inequalities defining the feasible safety region of Equation (19), enforcing the terminal state to reside within that region.
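The exterior-point softening described above can be sketched directly. The function name and the penalty weight below are illustrative, not the paper's implementation:

```python
def exterior_penalty(g_val, rho=1e5):
    """Generalized exterior-point penalty for an inequality g(x) >= 0:
    exactly zero when the constraint holds, quadratic in the violation
    otherwise, so the solver always has a finite, differentiable cost."""
    violation = min(g_val, 0.0)
    return rho * violation * violation

def collision_soft_cost(d_sq, D=2.0, rho=1e5):
    """Soft cost for the circle-distance margin g = d^2 - D^2."""
    return exterior_penalty(d_sq - D * D, rho)
```

Because the penalty vanishes inside the feasible region, it leaves the optimum unchanged whenever a collision-free trajectory exists, and only trades off safety margin against feasibility when the gap closes.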
3.4.2. Neural Network-Based Policy Solving and Optimization
Although the nonlinear MPC model formulated in Section 3.4.1 can effectively handle vehicle dynamics and complex collision avoidance constraints, its online numerical solution faces an enormous computational burden in high-density ramp traffic scenarios, making it difficult to meet the stringent (near-)real-time response requirements of autonomous driving systems. To address this issue, this section introduces an offline-trained deep neural network (DNN) policy $\pi_\theta$ that approximates the online MPC optimization process, thereby resolving the real-time performance bottleneck. To ensure controller continuity, the input layer accepts a fixed number of nearest surrounding vehicles. A sorting mechanism based on Euclidean distance ensures that the most critical interaction targets are consistently retained in the input sequence across consecutive time steps. Prior to being fed into the network, all input state variables (e.g., velocity, gap distance) are normalized to the range $[0, 1]$ using Min-Max Normalization. This standardization prevents gradient saturation and ensures balanced sensitivity across different state features. The detailed architecture of the policy network is presented in Table 4.
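The distance-based sorting and Min-Max normalization can be sketched as below. The feature set, the per-feature bounds, and the padding rule for missing neighbours are assumptions made for illustration only:

```python
import math

# Hypothetical per-feature (lo, hi) bounds for Min-Max normalization;
# the paper normalizes velocities, gap distances, etc. before the input layer.
BOUNDS = {"dx": (-200.0, 200.0), "dy": (-20.0, 20.0), "v": (0.0, 30.0)}

def minmax(val, lo, hi):
    """Clip to [lo, hi] and scale to [0, 1]."""
    return (min(max(val, lo), hi) - lo) / (hi - lo)

def build_observation(ego, others, n_keep=4):
    """ego/others are (x, y, v) tuples. Sort neighbours by Euclidean
    distance so the same input slots carry the most critical targets
    across consecutive steps, keep the n_keep nearest, normalize."""
    ranked = sorted(others,
                    key=lambda o: math.hypot(o[0] - ego[0], o[1] - ego[1]))
    obs = [minmax(ego[2], *BOUNDS["v"])]
    for o in ranked[:n_keep]:
        obs += [minmax(o[0] - ego[0], *BOUNDS["dx"]),
                minmax(o[1] - ego[1], *BOUNDS["dy"]),
                minmax(o[2], *BOUNDS["v"])]
    # Pad empty slots with a neutral "far away" value when fewer exist
    obs += [1.0] * (3 * (n_keep - min(n_keep, len(ranked))))
    return obs
```

The fixed output length keeps the MLP input dimension constant regardless of how many vehicles are actually nearby.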
To construct a robust dataset for approximating the global optimal policy, we employed a ‘Teacher–Student’ learning paradigm. The high-precision nonlinear MPC (the ‘Teacher’) was deployed on offline servers to solve optimal control problems over 5000 episodes, generating a comprehensive set of state-action pairs.
The data generation protocol incorporated strict randomization to ensure the network’s generalization capability across diverse ramp merging scenarios:
- 1.
Scenario Initialization: For each episode, the ego vehicle and surrounding vehicles were initialized with random states. The initial speeds were sampled from the operational range $[0, U_{\mathrm{limit}}]$ (e.g., 0–30 m/s), and initial positions were randomized to cover various conflict levels.
- 2.
Traffic Density and Gaps: To simulate realistic traffic variations, the arrival intervals of mainline vehicles were generated using a shifted negative exponential distribution, while ramp vehicle generation followed a Poisson distribution. This approach naturally created a wide range of inter-vehicle gaps and traffic density conditions, forcing the MPC to resolve complex merging conflicts.
- 3.
Data Split: The collected dataset, consisting of the optimal trajectory sequences, was randomly partitioned into a training set (80%) and a validation set (20%). The validation set was strictly isolated from the training process to monitor loss convergence and assess the generalization performance of the policy network $\pi_\theta$.
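The 80/20 partition above can be reproduced with a seeded shuffle; the function name and the seed are ours, not the paper's:

```python
import random

def split_dataset(pairs, train_frac=0.8, seed=42):
    """Randomly partition state-action pairs into training and validation
    sets; the validation subset is never seen during training."""
    idx = list(range(len(pairs)))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    cut = int(train_frac * len(pairs))
    train = [pairs[i] for i in idx[:cut]]
    val = [pairs[i] for i in idx[cut:]]
    return train, val
```

A fixed seed makes the split reproducible across training runs, which matters when comparing network architectures on the same validation set.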
Compared with real-time MPC, which has to shorten the prediction horizon or reduce the number of iterations due to constraints on on-board computing power, the trained neural network can learn the long-term temporal planning characteristics of the ideal MPC. More importantly, the neural network converts the complex optimization iteration process into matrix operation-based forward inference, increasing the decision frequency to the millisecond level. This effectively avoids insufficient vehicle response and abrupt changes in control actions caused by online computation delays, achieving real-time approximation of the globally optimal policy.
At time $t$, given the observation $o_t \in \mathcal{O}$, the network outputs the action sequence $U_t$ according to the policy $\pi_\theta: \mathcal{O} \rightarrow \mathcal{A}^{N_p}$. Herein, $\mathcal{O}$ denotes the vehicle observation state set, and $\mathcal{A}$ denotes the vehicle admissible control set. The output dimension of the network is determined by the product of the action dimension $m$ and the prediction horizon $N_p$, and is partitioned by the acting time step into $u(0|t)$, $u(1|t)$, ⋯, $u(N_p-1|t)$. In the single Multi-Layer Perceptron (MLP) network adopted in this study, the output layer is structured sequentially to match the total dimension $m \cdot N_p$. Specifically, the 1st to $m$-th neurons correspond to the approximately optimal action executed at time step 0 within the virtual time horizon, the $(m+1)$-th to $2m$-th neurons correspond to the action executed at time step 1, and this pattern continues for the entire sequence. The $k$-th step action $u(k|t)$ and the vehicle state $x(k|t)$ are input into the dynamical system $f: \mathcal{X} \times \mathcal{A} \rightarrow \mathcal{X}$ (where $\mathcal{X}$ denotes the vehicle feasible state set) to recursively propagate to the vehicle state $x(k+1|t)$ at the next time step. $d_{\mathcal{O}}$ represents the distribution of observations, and $\theta$ denotes the parameters of the to-be-optimized policy network $\pi_\theta$. The geometric layout of the road network and the initial configuration of the vehicles are illustrated in Figure 7.
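The output-layer partitioning and the recursive state propagation can be sketched as follows, assuming an action dimension of 2 and a horizon of 10 purely for illustration:

```python
import numpy as np

M_ACT, N_P = 2, 10  # assumed action dimension and prediction horizon

def partition_output(flat):
    """Split the MLP's flat output (length M_ACT * N_P) into per-step
    actions: neurons 1..M_ACT form the step-0 action, neurons
    M_ACT+1..2*M_ACT the step-1 action, and so on."""
    flat = np.asarray(flat)
    assert flat.size == M_ACT * N_P
    return flat.reshape(N_P, M_ACT)

def rollout(x0, flat, f):
    """Recursively propagate the state through the dynamics
    x(k+1) = f(x(k), u(k)) using the partitioned action sequence."""
    traj, x = [x0], x0
    for u in partition_output(flat):
        x = f(x, u)
        traj.append(x)
    return traj
```

A toy dynamics function makes the recursion easy to trace: each step consumes one row of the reshaped output.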
To provide a clear overview of the implementation, the step-by-step execution procedure of the proposed online distributed NN-MPC framework is summarized in Algorithm 2. This table details how the dynamic topology extraction, neural network acceleration, and optimization refinement are integrated within each control cycle.
| Algorithm 2: Distributed NN-MPC with Dynamic Topology |
| | Input: Ego state $x_i$, Neighbor states $x_j$, Reference $x_i^{\mathrm{ref}}$ |
| | Output: Optimal control $u_i^*$ |
| 1: | Initialize: Construct Dynamic Topology Graph $G(t)$ |
| 2: | for each time step k = 0 to N − 1 do |
| 3: | Step 1: Neighbor Extraction |
| 4: | Select $N_{\mathrm{neighbor}}$ nearest vehicles based on Equation (8); |
| 5: | Step 2: Neural Approximation (Hot-start) |
| 6: | Predict initial guess $U_t^0 = \pi_\theta(o_t)$; |
| 7: | Step 3: Optimization Refinement (if needed) |
| 8: | // Minimize Cost (Equation (10)) |
| 9: | Subject to: |
| 10: | // Vehicle Dynamics (Equation (11)) |
| 11: | // Feature Circle Collision Constraints (Equation (15)) |
| 12: | // Actuator Limits (Equation (19)) |
| 13: | Execute the first control action $u_i^*(0|t)$ and update $G(t)$; |
| 14: | end for |
This study approximates the collision avoidance constraints described in Equation (17) by adding a constraint violation penalty term to the objective function, using a differentiable activation function $\sigma(\cdot)$ for approximation:

$$P_{\mathrm{col}}\big(x(k|t)\big) = \sum_{j \in \mathcal{N}_i}\; \sum_{\alpha,\beta \in \{f,r\}} \sigma\Big(D^2 - \big\|p_i^{\alpha}(k|t) - p_j^{\beta}(k|t)\big\|^2\Big).$$
Notably, although the neural network policy significantly improves computational speed, it is essentially a data-driven approximate solution: unlike numerical optimizers, its outputs cannot rigorously guarantee hard constraint satisfaction. To ensure driving safety under extreme operating conditions, this study introduces a terminal Safety Check mechanism at the output of the policy network in the engineering implementation. Specifically, when the control commands output by the network cause the predicted trajectory to violate the collision avoidance constraints defined in Equation (19), the system overrides the network output and strictly executes a rule-based Maximum Braking Strategy (AEB). This ensures fail-safe operation without incurring the computational latency of numerical re-optimization. This hybrid architecture, which uses the neural network for efficient planning and deterministic rules for safety boundaries, effectively balances real-time performance and operational safety.
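A minimal version of such a safety check might look like this; the braking command, the helper signatures, and the constraint test are assumptions for illustration, not the paper's exact rules:

```python
def safe_control(u_seq, x0, step, violates, max_brake=(-8.0, 0.0)):
    """Terminal Safety Check: roll the network's action sequence through
    the dynamics; if any predicted state violates a collision constraint,
    override with a rule-based maximum-braking (AEB-style) command.

    u_seq:    sequence of (accel, steering) actions from the network
    step:     dynamics function x(k+1) = step(x(k), u(k))
    violates: predicate returning True on a constraint violation
    """
    x = x0
    for u in u_seq:
        x = step(x, u)
        if violates(x):
            return max_brake  # fail-safe override
    return u_seq[0]           # trajectory safe: execute the first NN action
```

Because the check is a single forward rollout plus a predicate per step, it adds negligible latency compared with numerical re-optimization.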
Although the specific loss curves are not plotted for brevity, the validation loss converged to the same order of magnitude as the training loss, indicating no significant overfitting. Furthermore, the high open-loop tracking accuracy (as shown in the simulation results) confirms that the DNN has learned the generalized control policy rather than memorizing specific trajectories.