Article

Diffusion-Guided Model Predictive Control for Signal Temporal Logic Specifications

Department of Information and Telecommunication Engineering, Incheon National University, Incheon 22012, Republic of Korea
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(3), 551; https://doi.org/10.3390/electronics15030551
Submission received: 30 December 2025 / Revised: 24 January 2026 / Accepted: 25 January 2026 / Published: 27 January 2026
(This article belongs to the Special Issue Real-Time Path Planning Design for Autonomous Driving Vehicles)

Abstract

We study control synthesis under Signal Temporal Logic (STL) specifications for driving scenarios where strict rule satisfaction is not always feasible and human experts exhibit context-dependent flexibility. We represent such behavior using robustness slackness—learned rule-wise lower bounds on STL robustness—and introduce sub-goals that encode intermediate intent in the state/output space (e.g., lane-level waypoints). Prior learning-based MPC–STL methods typically infer slackness with VAE priors and plug it into MPC, but these priors can underrepresent multimodal and rare yet valid expert behaviors and do not explicitly model intermediate intent. We propose a diffusion-guided MPC–STL framework that jointly learns slackness and sub-goals from demonstrations and integrates both into STL-constrained MPC. A conditional diffusion model generates pairs of (rule-wise slackness, sub-goal) conditioned on features from the ego vehicle, surrounding traffic, and road context. At run time, a few denoising steps produce samples for the current situation; slackness values define soft STL margins, while sub-goals shape the MPC objective via a terminal (optionally stage) cost, enabling context-dependent trade-offs between rule relaxation and task completion. In closed-loop simulations on held-out highD track-driving scenarios, our method improves task success and yields more realistic lane-changing behavior compared to imitation-learning baselines and MPC–STL variants using CVAE slackness or strict rule enforcement, while remaining computationally tractable for receding-horizon MPC in our experimental setting.

1. Introduction

Robotics is increasingly permeating diverse sectors, spanning both civilian and industrial applications, and is becoming integral to everyday life. Service robots are now prevalent in public spaces, interacting with individuals and delivering services. Within robotics, autonomous driving is a particularly dynamic area that continues to attract extensive research attention.
A recurring challenge in real-world autonomy is rule-governed decision making. In many robotic domains, rules range from basic collision avoidance in navigation to complex traffic regulations in driving. While such rules are primarily designed for safety, strict adherence to all rules is not always feasible or even desirable. In autonomous driving, for instance, a safe and efficient maneuver may require temporarily relaxing a rule—such as changing lanes in dense traffic, deciding whether to stop or proceed at a yellow light, or briefly crossing a lane marker to avoid a stalled vehicle. These situations require robots to evaluate trade-offs between partially conflicting objectives and to make nuanced decisions about how strictly each rule should be respected in the current context. Designing controllers that systematically support such context-dependent flexibility remains a core problem.
Model Predictive Control (MPC) is a powerful paradigm for autonomous control due to its ability to optimize trajectories online under dynamics and constraints [1]. MPC naturally combines an objective function characterizing desired behavior with constraints encoding safety or physical limitations, and has demonstrated strong performance across a wide range of robotic applications, including whole-body control of humanoid robots [2,3,4] and driving-related control tasks. Despite these strengths, designing effective MPC controllers is challenging: expert behavior must be encoded through cost terms, constraint margins, and tuning parameters, and these choices must generalize across diverse scenarios. In driving, for example, expert decisions such as whether to decelerate, follow closely, or change lanes depend on traffic interactions and road context, making manual tuning of MPC parameters highly nontrivial.
Imitation learning (IL) offers an appealing alternative by learning control policies directly from expert demonstrations [5,6]. In autonomous driving, end-to-end IL approaches map observations and high-level commands to control actions and can capture rich nonlinear behaviors [7,8]. However, pure IL does not inherently guarantee compliance with safety-critical rules, and performance can degrade under distribution shift or rare configurations. This motivates hybrid approaches that retain explicit rule representations while leveraging learning for human-like behavior.
In this work, we study control synthesis in the presence of rule specifications, building on our prior MPC–STL framework [9]. We represent rules using Signal Temporal Logic (STL) [10,11], a formal language for specifying temporal properties of real-valued signals. STL has been widely adopted for robotic task specification and control synthesis [12,13,14,15,16]. A key benefit of STL is its robustness degree, which quantifies how strongly a trajectory satisfies a rule, enabling rule margins to be incorporated into optimization-based controllers.
Rather than prescribing rule priorities manually, our earlier work learned robustness slackness from expert demonstrations [9]. Slackness values define rule-wise lower bounds on STL robustness, capturing how strictly experts tend to satisfy each rule in context. In that framework, a Conditional Variational Autoencoder (CVAE) [17] was trained to map contextual features to slackness values, which were then plugged into STL-constrained MPC to enable selective, flexible rule compliance. While effective, the CVAE prior can be limiting when expert behavior is inherently multimodal, as in driving (e.g., both braking and lane changing can be valid responses), and our prior framework did not explicitly represent intermediate sub-goals (e.g., target lanes or waypoints) that organize longer-horizon decisions.
Concurrently, diffusion models have emerged as expressive generative models that can represent complex multimodal distributions. Beyond vision, diffusion-based policies have recently been applied to robot control by modeling distributions over actions or trajectories conditioned on observations [18]. Their ability to capture diverse behaviors makes diffusion a natural candidate for learning human-like decision variables in rule-governed domains.
Motivated by these developments, we propose a diffusion-guided MPC–STL framework that extends our previous approach in two key ways. First, we replace the CVAE with a conditional diffusion model to learn a richer distribution over rule-wise robustness slackness conditioned on the current traffic context. Second, we augment the learned outputs with sub-goals that encode intermediate intent in the state/output space (e.g., lane-level targets or waypoints). The diffusion model jointly predicts slackness and sub-goals, and the predicted sub-goals are injected into the MPC objective (terminal and, optionally, stage costs), guiding the optimizer toward human-like intermediate targets while enforcing STL constraints with learned margins.
At run time, given the current feature vector, the conditional diffusion model generates samples of rule-wise slackness and sub-goals. The slackness values define soft margins for STL constraints, and the sub-goals shape the MPC optimization, resulting in a closed-loop controller that combines the expressive, multimodal modeling capacity of diffusion with the interpretability and structure of STL-constrained MPC. More broadly, our framework bridges symbolic task specifications (STL) and continuous control by learning context-dependent “soft” margins and intent signals while preserving constrained receding-horizon optimization. This principle extends beyond driving to robotics domains that must balance formal safety/task constraints with multimodal human preferences.
The main contributions of this paper are as follows:
  • We formulate an STL-constrained MPC framework in which both rule-wise robustness slackness and intermediate sub-goals are learned from demonstrations as context-dependent decision variables.
  • We propose a conditional diffusion model that jointly generates robustness slackness and sub-goals, providing improved multimodal coverage over VAE-based slackness learning and yielding a more diverse set of feasible MPC–STL plans under the same context.
  • We demonstrate in highD-based track-driving simulations that diffusion-guided MPC–STL improves task success and induces more realistic lane-changing behavior compared to imitation-learning baselines and MPC–STL variants (CVAE slackness and strict STL enforcement), while remaining computationally tractable for receding-horizon control in our experimental setting.

2. Related Work

Temporal logic specifications in planning and control. A large body of work studies trajectory optimization and MPC under temporal logic, most prominently Linear Temporal Logic (LTL). Early formulations rely on mixed-integer programs to encode finite- or infinite-horizon specifications for continuous systems [19,20,21], and sampling or graph(tree)-based planners for co-safe LTL variants [22]. Signal Temporal Logic (STL) has subsequently been integrated into MPC to optimize robustness of satisfaction [23], address uncertainty via probabilistic predicates [24], and avoid combinatorial search using successive convexification [25]. Beyond single-agent settings, distributed STL planning has been explored for multi-robot teams, improving scalability while preserving task-level correctness [15]. More recently, learning-augmented formulations that optimize STL robustness in closed loop have shown promise for improving computational efficiency and empirical satisfaction [16].
Learning for MPC and rule flexibility. Coupling learning with MPC has been widely investigated for model identification, residual dynamics, and task-specific performance [26,27,28]. In rule-governed domains (e.g., driving), strict adherence to all rules may be infeasible; related efforts therefore either learn temporal-logic structure from data or plan under partially unsatisfiable specifications. On the learning side, Kong et al. infer reactive parameter STL (rPSTL) formulae directly from labeled trajectories, discovering discriminative temporal-logic properties and exposing causal, spatial, and temporal relations useful for monitoring and design tuning [29]. On the planning side, minimum-violation methods explicitly search for trajectories that relax low-priority requirements when all constraints cannot be met [30]. Our prior work learned robustness slackness—rule-wise lower bounds on STL robustness inferred from demonstrations—and embedded them into MPC–STL via a conditional VAE mapping from context to slackness values [9]. While effective, VAE-based priors can suffer from limited expressiveness in inherently multimodal settings and may underrepresent rare but valid expert behaviors; moreover, they did not explicitly capture intermediate sub-goals that structure longer-horizon decisions. In contrast, the present work employs a conditional diffusion model to jointly predict robustness slackness and sub-goals, which are then integrated into the MPC–STL constraints and objective, respectively.
Imitation learning for autonomous driving. Imitation learning (IL) provides an alternative to manual cost design by fitting policies to expert data [6]. In end-to-end driving, Conditional Imitation Learning (CIL) conditions policy outputs on high-level commands, improving intersection handling and intent disambiguation [7]. Although IL methods can capture complex mappings from observations to actions, they provide limited mechanisms for explicit rule enforcement and safety constraint handling under distribution shift, motivating hybrid approaches that retain constraint-based structure.
Diffusion models for control and driving. Diffusion models have recently emerged as expressive generative priors for sequential decision making in robotics. Diffusion Policy learns action distributions conditioned on observations and has demonstrated strong multimodal control from demonstrations [18]. In autonomous driving, diffusion-based planners have been explored for closed-loop planning with explicit guidance mechanisms [31], as well as diffusion-based driving trajectory generation/planning frameworks [32]. Recent work also studies more efficient planning-time sampling/optimization with diffusion, e.g., using truncated diffusion within search to improve exploration and performance in closed-loop driving benchmarks [33]. These results suggest that diffusion priors can mitigate mode-averaging limitations of VAEs and provide richer multimodal representations of expert strategies. In contrast to diffusion planners that directly generate trajectories/actions, our diffusion model serves as a multimodal prior over decision variables (slackness and sub-goals) that parameterize a downstream constrained MPC–STL optimizer.
Recent RL-based decision making and eco-driving. Recent studies have investigated end-to-end reinforcement-learning decision modules for challenging road geometries (e.g., consecutive sharp turns), and multi-objective eco-driving strategies that jointly consider safety and energy objectives for intelligent (hybrid/electric) vehicles [34,35]. These approaches are complementary to our setting: they primarily rely on reward design and policy optimization, whereas our work focuses on rule-governed control with explicit STL robustness constraints embedded in MPC.
Positioning of our contribution. Compared to prior MPC–STL works that either (i) encode logic via MILP or convexified surrogates [23,25], (ii) rely on fixed rule weights or pre-specified priorities [30], or (iii) learn only rule-wise slackness with VAE priors [9], our approach introduces a conditional diffusion model that jointly predicts (a) rule-wise robustness slackness and (b) state-space sub-goals from expert demonstrations and current context. The learned slackness values define soft margins in the STL constraints, and the learned sub-goals are injected into the MPC objective (including the terminal term). This diffusion-guided MPC–STL preserves the constraint-based structure of MPC–STL while capturing multimodal expert preferences and intermediate intent, enabling flexible yet rule-aware trajectories that align more closely with human driving behavior.

3. Preliminaries

3.1. System Model

We consider a continuous-time nonlinear control system

$$\dot{x}_t = f(x_t, u_t), \tag{1}$$

where $x_t \in \mathcal{X} \subseteq \mathbb{R}^{n_x}$ and $u_t \in \mathcal{U} \subseteq \mathbb{R}^{n_u}$. With a fixed sampling period $dt$, we use the discretized model

$$x_{n+1} = f(x_n, u_n), \quad n \in \mathbb{N}, \tag{2}$$

and write $u_{H,n} = \{u_n, \ldots, u_{n+H-1}\}$ for a length-$H$ input sequence starting from time $n$. The resulting state rollout from $x_n$ under $u_{H,n}$ is denoted by $x(x_n, u_{H,n}) = \{x_n, \ldots, x_{n+H}\}$.
We define a signal over a finite horizon as the state–input sequence

$$\xi(x_n, u_{H,n}) = \big((x_n, u_n), \ldots, (x_{n+H-1}, u_{n+H-1})\big), \tag{3}$$

and, with a slight abuse of notation, also write $\xi(n)$ for a signal that starts at the discrete time $n$.

3.2. Signal Temporal Logic (STL)

Signals and predicates. We evaluate STL over the discrete-time signal induced by (2). With a slight abuse of notation, we write $\xi(t)$ for the signal at time index $t \in \mathbb{N}$, i.e., $\xi(t) = (x_t, u_t)$.
An STL predicate is a real-valued function $\mu : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}$, and the atomic proposition is $\mu(x(t), u(t)) > 0$ (or $\mu(x_t, u_t) > 0$ in discrete time). Typical examples include distance-to-obstacle margins, lane-boundary margins, or speed-limit residuals.
Syntax (boolean layer). STL formulas $\varphi$ are built from predicates using boolean and temporal operators:

$$\varphi ::= \mu \mid \lnot\varphi \mid \varphi_1 \land \varphi_2 \mid \varphi_1 \lor \varphi_2 \mid G_{[a,b]}\varphi \mid \varphi_1\, U_{[a,b]}\, \varphi_2,$$

where $G_{[a,b]}$ is "globally" on $[a,b]$ and $U_{[a,b]}$ is "until" on $[a,b]$. Boolean satisfaction is denoted by $(\xi, t) \models \varphi$ and is defined in the standard way, e.g., $(\xi, t) \models \mu \iff \mu(x_t, u_t) > 0$, and $(\xi, t) \models G_{[a,b]}\varphi \iff \forall t' \in [t+a, t+b],\ (\xi, t') \models \varphi$.
Quantitative semantics (robustness). Beyond true/false, STL provides a robustness degree $\rho^{\varphi}(\xi, t) \in \mathbb{R}$ measuring signed distance to violation: positive values mean satisfaction (larger is safer), negative values mean violation (more negative is worse). The robustness is defined recursively:

$$\begin{aligned}
\rho^{\mu}(\xi, t) &= \mu(\xi(t)),\\
\rho^{\lnot\varphi}(\xi, t) &= -\rho^{\varphi}(\xi, t),\\
\rho^{\varphi_1 \land \varphi_2}(\xi, t) &= \min\big(\rho^{\varphi_1}(\xi, t),\ \rho^{\varphi_2}(\xi, t)\big),\\
\rho^{\varphi_1 \lor \varphi_2}(\xi, t) &= \max\big(\rho^{\varphi_1}(\xi, t),\ \rho^{\varphi_2}(\xi, t)\big),\\
\rho^{G_{[a,b]}\varphi}(\xi, t) &= \min_{t' \in [t+a,\, t+b]} \rho^{\varphi}(\xi, t'),\\
\rho^{\varphi_1 U_{[a,b]} \varphi_2}(\xi, t) &= \max_{t' \in [t+a,\, t+b]} \min\Big(\rho^{\varphi_2}(\xi, t'),\ \min_{t'' \in [t,\, t']} \rho^{\varphi_1}(\xi, t'')\Big).
\end{aligned}$$

Hence, in the vector notation used later, $\rho^{\varphi_j}(\xi, t)$ simply means "the robustness of the $j$-th STL rule $\varphi_j$ at time $t$." In our implementation, temporal operators are evaluated over discrete indices with step $dt$; thus, the interval $[t+a, t+b]$ is interpreted as the index set $\{t+a, \ldots, t+b\}$ (after discretization), and the min/max are taken over these indices.
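To make the quantitative semantics concrete, the following minimal sketch (ours, not the paper's implementation) evaluates robustness for an atomic predicate, a conjunction, and the bounded "globally" operator over a discrete signal. The speed-limit predicate and its numbers are illustrative assumptions.

```python
def rob_pred(mu, signal, t):
    """Robustness of an atomic predicate at time t: rho = mu(x_t, u_t)."""
    x, u = signal[t]
    return mu(x, u)

def rob_and(rhos):
    """Conjunction: the minimum of the sub-formula robustness values."""
    return min(rhos)

def rob_globally(mu, signal, t, a, b):
    """G_[a,b] mu: minimum predicate robustness over indices t+a .. t+b."""
    return min(rob_pred(mu, signal, k) for k in range(t + a, t + b + 1))

# Illustrative example: rule "globally over [0,3], speed <= 30", i.e. mu = 30 - v.
signal = [((20.0 + 2.0 * k,), (0.0,)) for k in range(6)]  # state = (v,), input unused
mu_speed = lambda x, u: 30.0 - x[0]
rho = rob_globally(mu_speed, signal, t=0, a=0, b=3)  # min of 30 - v over v in {20,22,24,26}
```

Here `rho` evaluates to 4.0: the tightest margin occurs at the fastest sample in the window, matching the min-based semantics above.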

3.3. Robustness Slackness and Soft Satisfaction

Let $\varphi = [\varphi_1, \ldots, \varphi_N]$ be the set of STL rules. We introduce a rule-wise (per-rule) slackness vector $r = [r_1, \ldots, r_N] \in \mathbb{R}^N$. We say the signal satisfies all rules with slackness $r$ at time $t$ if

$$(\xi, t) \models (\varphi, r) \iff \rho^{\varphi_j}(\xi, t) > r_j, \quad \forall j \in \{1, \ldots, N\}. \tag{4}$$

Here, $r_j$ is a lower bound on the allowed robustness for rule $\varphi_j$: larger $r_j$ enforces a stricter margin; $r_j < 0$ allows a bounded violation. For a finite prediction horizon $H$, we use the worst-case robustness

$$\underline{\rho}^{\varphi_j}(\xi, n; H) = \min_{m \in [n,\, n+H-1]} \rho^{\varphi_j}(\xi, m), \tag{5}$$

and define soft satisfaction over the horizon by $(\xi, n) \models_H (\varphi_j, r_j)$ iff $\underline{\rho}^{\varphi_j}(\xi, n; H) > r_j$.
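The worst-case horizon robustness and the soft-satisfaction test can be sketched as follows (an illustration under our own assumptions, where `rho_fn` abstracts a per-rule robustness evaluator and the numbers are made up):

```python
def horizon_robustness(rho_fn, signal, n, H):
    """Worst-case robustness of one rule over steps n .. n+H-1."""
    return min(rho_fn(signal, m) for m in range(n, n + H))

def soft_satisfied(rho_fn, signal, n, H, r_j):
    """Soft satisfaction with slackness r_j: the rule holds iff the
    worst-case horizon robustness exceeds r_j (r_j < 0 tolerates
    a bounded violation)."""
    return horizon_robustness(rho_fn, signal, n, H) > r_j

# Illustrative per-step robustness values of one rule along a trajectory.
rhos = [0.5, -0.2, 0.3]
rho_fn = lambda sig, m: sig[m]
worst = horizon_robustness(rho_fn, rhos, n=0, H=3)   # -0.2 (brief violation)
```

With a relaxed bound `r_j = -0.5` the rule is softly satisfied despite the transient violation, whereas a strict bound `r_j = 0.0` rejects the same trajectory.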

3.4. Sub-Goals

Besides rule margins, we use sub-goals to encode intermediate intent (e.g., a target lane center, waypoint, or speed setpoint) over the horizon. We denote a sub-goal by $g \in \mathcal{G} \subseteq \mathbb{R}^{n_g}$, which may represent either a subset of the state (e.g., $(x, y)$ position) or a task/output quantity. In our MPC, sub-goals appear as stage/terminal terms in the objective, guiding the trajectory toward human-like intermediate targets while the STL constraints are enforced via (4).

4. Problem Formulation

We consider STL-constrained MPC where both the rule-wise margins of satisfaction and the behavioral intent (sub-goal) are learned functions of context. At each time $t$, given a feature vector $\phi_t$ extracted from the current state, neighbors, and map cues, we assume access to learned predictors

$$r_t = R(\phi_t) \in \mathbb{R}^N, \qquad g_t = S(\phi_t) \in \mathcal{G} \subseteq \mathbb{R}^{n_g},$$

which provide (i) a rule-wise robustness slackness vector and (ii) a target sub-goal, respectively. (Their concrete realization is described in Section 5.)
Let the prediction horizon be $H$. Denote $u_{H,t} = \{u_t, \ldots, u_{t+H-1}\}$, $x(x_t, u_{H,t}) = \{x_t, \ldots, x_{t+H}\}$, and the finite-horizon signal $\xi(x_t, u_{H,t})$ as in (3).
STL constraints with learned slackness. For $\varphi = [\varphi_1, \ldots, \varphi_N]$, we enforce

$$\underline{\rho}^{\varphi_j}\big(\xi(x_t, u_{H,t}), t; H\big) = \min_{m \in \{t, \ldots, t+H-1\}} \rho^{\varphi_j}\big(\xi(x_t, u_{H,t}), m\big) > r_{t,j}, \quad j = 1, \ldots, N. \tag{6}$$

Here, $r_{t,j}$ is a learned lower bound on the robustness of rule $\varphi_j$ over the horizon; allowing $r_{t,j} < 0$ permits bounded robustness violation when strict satisfaction is infeasible.
Goal-shaped MPC objective. Let $x_{t+H}$ be the terminal state under $u_{H,t}$. We define

$$J(x, u_{H,t}; g_t) = (S_x x_{t+H} - g_t)^\top W_g (S_x x_{t+H} - g_t) + \sum_{k=t}^{t+H-1} u_k^\top R\, u_k, \tag{7}$$

with $S_x \in \mathbb{R}^{n_g \times n_x}$, $W_g \succeq 0$, and $R \succ 0$. Typical choices for $S_x$ include (i) $S_x = I_{n_x}$ (full-state goal, $n_g = n_x$); (ii) $S_x = [\,I_{n_g}\ \ 0\,]$ (subset tracking, e.g., position only); (iii) a user-defined linear map to a task/output space. (If the goal is defined via a nonlinear output $y = h(x)$, one may replace $S_x x_{t+H}$ with $h(x_{t+H})$ in (7).)
MPC–STL. At each time t, the optimization in (8) searches for a control sequence u H , t that (a) steers the terminal state toward the context-dependent sub-goal g t via the quadratic objective in (7), while (b) satisfying all STL rules with learned per-rule margins r t , j over the prediction horizon, as enforced by (6). In words, the planner balances “go to the intended sub-goal” against “respect each rule with the required slackness,” where both the intent ( g t ) and the required margins ( r t ) are learned functions of the current context (see Section 5).
$$\begin{aligned}
\underset{u_{H,t}}{\text{minimize}}\quad & J\big(x(x_t, u_{H,t}), u_{H,t}; g_t\big)\\
\text{subject to}\quad & x_{k+1} = f(x_k, u_k), \quad k = t, \ldots, t+H-1,\\
& x_k \in \mathcal{X},\ u_k \in \mathcal{U}, \quad k = t, \ldots, t+H-1,\\
& \underline{\rho}^{\varphi_j}\big(\xi(x_t, u_{H,t}), t; H\big) > r_{t,j}, \quad j = 1, \ldots, N.
\end{aligned} \tag{8}$$
After solving (8), we apply only the first input u t , observe x t + 1 , and repeat the optimization in a receding-horizon manner.
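The goal-shaped cost in (7) is a standard quadratic form; a minimal numpy sketch (illustrative, not the authors' solver code, with all matrices and trajectories chosen by us) is:

```python
import numpy as np

def mpc_objective(x_traj, u_seq, g, S_x, W_g, R):
    """Goal-shaped MPC cost: terminal sub-goal tracking + input effort.
    x_traj: (H+1, n_x) state rollout; u_seq: (H, n_u) inputs; g: (n_g,) sub-goal."""
    e = S_x @ x_traj[-1] - g                    # terminal sub-goal error
    goal_cost = e @ W_g @ e                     # quadratic goal-tracking term
    effort = sum(u @ R @ u for u in u_seq)      # quadratic input penalty
    return goal_cost + effort

# Toy 2-D example: track only the first state component toward g = 3.
x_traj = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
u_seq = np.array([[1.0], [1.0]])
S_x = np.array([[1.0, 0.0]])                    # subset tracking (position only)
J = mpc_objective(x_traj, u_seq, np.array([3.0]), S_x,
                  W_g=np.array([[2.0]]), R=np.array([[0.5]]))
```

In the full problem (8), this objective is minimized subject to the dynamics, state/input sets, and the slackened STL constraints (6); the quadratic structure is what makes mixed-integer quadratic encodings tractable.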

5. Proposed Method

This section describes the concrete realization of the mappings R ( · ) and S ( · ) introduced in Section 4. Our framework combines a learned context-conditioned generative model with an STL-constrained MPC. Given the current context feature ϕ t , the learned model produces (i) rule-wise robustness slackness r t and (ii) a sub-goal g t . These outputs are then injected into the STL constraints and the MPC objective in (6)–(8), respectively, yielding trajectories that are both rule-aware and human-like.

5.1. Overview

Figure 1 illustrates the overall procedure. From expert demonstrations, we construct supervision signals for robustness slackness and sub-goals. We then train a conditional diffusion model to approximate the conditional distribution

$$p_\theta(r, g \mid \phi),$$

which captures multimodal expert preferences under the same context. At run time, the diffusion model samples $(r_t, g_t)$ conditioned on $\phi_t$, and the MPC solves (8) in a receding-horizon fashion.

5.2. Feature Description

We define a feature extractor $\phi(\cdot)$ that maps the current driving context (ego state, surrounding vehicles, and lane geometry) to a feature vector $\phi_t \in \mathbb{R}^{n_f}$. In the track-driving setup (Figure 2), the ego vehicle $V_{\mathrm{ego}}$ interacts with up to six nearby vehicles $V_{\mathrm{near}} = \{V_{lf}, V_{lr}, V_{cf}, V_{cr}, V_{rf}, V_{rr}\}$. The feature vector consists of (i) longitudinal distances to nearby vehicles $d_t = (d_{lf}, d_{lr}, d_{cf}, d_{cr}, d_{rf}, d_{rr})$, (ii) the lateral distance from the right lane boundary to the ego vehicle center, denoted by $d_{\mathrm{dev}}$, and (iii) the heading deviation from the lane direction, $\theta_{\mathrm{dev}}$. Unless otherwise stated, each $d_{(\cdot)}$ is a nonnegative longitudinal distance measured along the lane axis; whether the vehicle is in front or behind is encoded by the index (e.g., $lf$ vs. $lr$). If a corresponding nearby vehicle does not exist (e.g., no left-front vehicle), we set the distance to a predefined maximum value $d_{\max}$.
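A sketch of the feature assembly (ours, for illustration: the cap `D_MAX`, the slot names, and the dict-based interface are assumptions, not the paper's code):

```python
import numpy as np

D_MAX = 150.0  # assumed placeholder for the paper's d_max (absent-neighbor cap)
SLOTS = ["lf", "lr", "cf", "cr", "rf", "rr"]  # left/center/right x front/rear

def build_features(neighbor_dists, d_dev, theta_dev):
    """Assemble phi_t = (d_lf, ..., d_rr, d_dev, theta_dev).
    neighbor_dists: dict slot -> longitudinal distance; missing slots map
    to D_MAX, and all distances are clipped to D_MAX."""
    d = [min(neighbor_dists.get(s, D_MAX), D_MAX) for s in SLOTS]
    return np.array(d + [d_dev, theta_dev])

# Only a center-front and a (far) right-front vehicle are present.
phi = build_features({"cf": 40.0, "rf": 200.0}, d_dev=1.5, theta_dev=0.02)
```

The fixed slot ordering keeps the feature dimension constant regardless of how many neighbors are present, which the conditional model requires.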

5.3. Supervision from Demonstrations

Let $\Xi = \{\xi^i\}_{i=1}^{M}$ be a set of demonstrated signals, where $\xi_n^i = (x_n^i, u_n^i)$ denotes the state and control at step $n$. For each STL rule $\varphi_j$, we compute a rule-wise horizon robustness label as the worst-case (minimum) robustness over the next $H$ steps:

$$r_n^{i,j} = \min_{m \in \{n, \ldots, n+H-1\}} \rho^{\varphi_j}(\xi^i, m). \tag{9}$$

This quantity serves as the robustness slackness target for $\varphi_j$ at time $n$: it summarizes how strongly the demonstrated behavior satisfies (or violates) $\varphi_j$ over the planning horizon. Stacking over rules yields $r_n^i = [r_n^{i,1}, \ldots, r_n^{i,N}] \in \mathbb{R}^N$.
To capture intermediate intent, we additionally construct a sub-goal label from the demonstration. Using the same selection matrix $S_x$ as in (7), we set

$$g_n^i = S_x\, x_{n+H}^i \in \mathbb{R}^{n_g}. \tag{10}$$
This choice aligns the supervision with the MPC horizon and provides a simple, annotation-free proxy for short-horizon intent in highway driving. We note that this proxy can be imperfect: in highly interactive situations, the demonstrated terminal state may reflect a transient reactive outcome (e.g., yielding or braking) rather than a persistent intent, which can introduce label noise. Developing more robust intent representations—such as semantic goals (e.g., target-lane selection), time-indexed sub-goal sequences, or latent-intent discovery—is an important direction for future work.
Finally, we build paired training samples

$$\mathcal{D} = \big\{(\phi_n^i, y_n^i)\big\}, \qquad y_n^i = [\,r_n^i;\, g_n^i\,] \in \mathbb{R}^{N+n_g},$$

where $\phi_n^i = \phi(\xi_n^i)$ is the context feature at time step $n$.
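The two label constructions in (9) and (10) reduce to a windowed minimum and a projected lookup; a minimal sketch under our own toy data (not the paper's pipeline) is:

```python
import numpy as np

def slackness_label(rho, n, H):
    """r_n per (9): per-rule minimum robustness over steps n .. n+H-1.
    rho: (T, N) array of per-step, per-rule robustness of one demonstration."""
    return rho[n:n + H].min(axis=0)

def subgoal_label(x_demo, n, H, S_x):
    """g_n per (10): projection of the H-step-ahead demonstrated state."""
    return S_x @ x_demo[n + H]

# Toy demonstration: 5 steps, 2 rules; 6 states of dimension 2.
rho = np.array([[1.0, 0.0], [0.5, -1.0], [2.0, 3.0], [0.2, 0.1], [-5.0, -5.0]])
x_demo = np.arange(12.0).reshape(6, 2)
r_label = slackness_label(rho, n=0, H=3)                 # worst case per rule
g_label = subgoal_label(x_demo, n=1, H=3, S_x=np.array([[1.0, 0.0]]))
```

Note how a transient rule violation in the demonstration (the -1.0 entry) directly produces a negative slackness target, which is what later licenses the MPC to relax that rule in similar contexts.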

5.4. Conditional Diffusion Model for Joint Prediction

We model the conditional distribution of the joint output $y = [\,r;\, g\,] \in \mathbb{R}^{N+n_g}$ given context $\phi$ using a conditional diffusion model. Let $y_0$ denote a clean sample drawn from the empirical demonstration distribution. The forward noising process is defined by

$$q(y_k \mid y_{k-1}) = \mathcal{N}\big(\sqrt{1-\beta_k}\, y_{k-1},\ \beta_k I\big), \quad k = 1, \ldots, K,$$

where $\{\beta_k\}$ is a variance schedule. Equivalently, using $\bar{\alpha}_k = \prod_{\ell=1}^{k}(1-\beta_\ell)$, we can write $y_k = \sqrt{\bar{\alpha}_k}\, y_0 + \sqrt{1-\bar{\alpha}_k}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.
The reverse (denoising) model is parameterized by a neural network $\epsilon_\theta(y_k, k, \phi)$ that predicts the injected noise conditioned on $\phi$. We train $\epsilon_\theta$ using the standard denoising objective:

$$\mathcal{L}(\theta) = \mathbb{E}_{(\phi, y_0) \sim \mathcal{D},\ k \sim \mathrm{Unif}(\{1, \ldots, K\}),\ \epsilon \sim \mathcal{N}(0, I)}\Big[\big\| \epsilon - \epsilon_\theta(y_k, k, \phi) \big\|_2^2\Big]. \tag{11}$$

At run time, given $\phi_t$, we initialize $y_K \sim \mathcal{N}(0, I)$ and generate samples via an accelerated diffusion sampler. Unless stated otherwise, we use a DDIM sampler [36] with $K_{\mathrm{DDIM}} \ll K$ denoising steps (e.g., $K_{\mathrm{DDIM}} = 20$).
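The closed-form forward process and the denoising loss (11) can be sketched in numpy as follows. This is an illustration under our own assumptions: the linear schedule endpoints are conventional defaults, and `eps_net` stands in for the actual conditional network $\epsilon_\theta$ (a PyTorch model in the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

K = 100
betas = np.linspace(1e-4, 0.02, K)        # assumed linear variance schedule
alpha_bar = np.cumprod(1.0 - betas)       # \bar{alpha}_k, monotonically decreasing

def noise_sample(y0, k, eps):
    """Closed-form forward process: y_k = sqrt(abar_k) y0 + sqrt(1-abar_k) eps."""
    return np.sqrt(alpha_bar[k]) * y0 + np.sqrt(1.0 - alpha_bar[k]) * eps

def denoising_loss(eps_net, y0, phi):
    """Single-sample Monte Carlo estimate of the objective (11)."""
    k = int(rng.integers(0, K))           # uniform diffusion step
    eps = rng.standard_normal(y0.shape)   # injected Gaussian noise
    y_k = noise_sample(y0, k, eps)
    diff = eps - eps_net(y_k, k, phi)     # noise-prediction residual
    return float(diff @ diff)

# Sanity check with a stub network that always predicts zero noise.
loss = denoising_loss(lambda y, k, phi: np.zeros_like(y),
                      y0=np.zeros(4), phi=None)
```

Training averages this estimate over minibatches of $(\phi, y_0)$ pairs; DDIM sampling then reuses the same $\bar{\alpha}_k$ sequence on a subsampled step schedule.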

5.5. MPC–STL Synthesis with Diffusion Guidance

Given the predicted (or sampled) $(r_t, g_t)$ at time $t$, we solve the MPC–STL problem in (8) in a receding-horizon fashion. The learned slackness $r_t$ sets per-rule robustness margins through (6), while the learned sub-goal $g_t$ shapes the MPC objective via (7). To handle nonlinear dynamics, we employ local linearization around the current operating point. STL constraints are enforced using a robustness-based mixed-integer encoding (e.g., MIQP/MILP formulations commonly used in MPC–STL).
Candidate selection (optional). If we draw multiple diffusion samples $s = 1, \ldots, S$, we select the feasible candidate that yields the lowest optimal MPC objective value at time $t$:

$$s^\star \in \arg\min_{s \in \mathcal{I}_t} J_t^\star(s), \qquad \mathcal{I}_t := \big\{\, s \mid \text{(8) is feasible under } (r_t^{(s)}, g_t^{(s)}) \,\big\}, \tag{12}$$

where $J_t^\star(s)$ denotes the optimal objective value returned by solving (8) at time $t$ under candidate $s$. We then set $(r_t, g_t) = (r_t^{(s^\star)}, g_t^{(s^\star)})$. Infeasible candidates are discarded (or assigned $+\infty$ cost). If all candidates are infeasible, we fall back to a conservative default (e.g., $(r_{t-1}, g_{t-1})$ or a lane-following sub-goal with strict margins).
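The selection rule (12) is a feasibility-filtered argmin; a minimal sketch (ours, with `solve_mpc` as a stand-in for the Gurobi-based solver returning a cost and a feasibility flag) is:

```python
def select_candidate(candidates, solve_mpc):
    """Pick the feasible diffusion sample with the lowest optimal MPC cost.
    candidates: list of (r, g) pairs; solve_mpc(r, g) -> (cost, feasible).
    Returns the winning index, or None if every candidate is infeasible
    (signalling a fall-back to a conservative default)."""
    best, best_cost = None, float("inf")
    for s, (r, g) in enumerate(candidates):
        cost, feasible = solve_mpc(r, g)
        if feasible and cost < best_cost:
            best, best_cost = s, cost
    return best

# Stub solver: candidate with r == 1.0 is infeasible; cost decreases with r.
candidates = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
winner = select_candidate(candidates, lambda r, g: (10.0 - r, r != 1.0))
```

Returning `None` rather than raising keeps the fall-back logic (reusing $(r_{t-1}, g_{t-1})$ or a strict lane-following default) in one place in the control loop.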

5.6. Algorithm

Algorithm 1 summarizes our closed-loop controller.
Offline. From expert demonstrations, we compute supervision targets: (i) rule-wise slackness $r_n^i$ as the horizon-wise minimum robustness in (9), and (ii) sub-goals $g_n^i$ as the $H$-step-ahead terminal target projected by $S_x$ in (10). We then train a conditional diffusion model to approximate the conditional distribution $p_\theta([\,r;\,g\,] \mid \phi)$ via the denoising loss (11).
Online. At each time step $t$, we extract the context feature $\phi_t$ from the current ego state, surrounding vehicles, and lane geometry. The diffusion model performs a (truncated) reverse denoising process to generate one or multiple candidates $(r_t, g_t)$, capturing multimodal expert preferences. Given each candidate, we solve the MPC–STL problem (8): $r_t$ sets per-rule robustness margins in the STL constraints, while $g_t$ shapes the terminal objective. We apply only the first control input (receding-horizon execution), observe the next state, and repeat. If multiple candidates are sampled, we choose the feasible candidate with the lowest MPC optimal cost (Section 5.5).
Algorithm 1 Diffusion-guided MPC–STL with Learned Slackness and Sub-goals
 1: Offline: Construct dataset $\mathcal{D} = \{(\phi_n^i, y_n^i)\}$ using (9) and (10), where $y_n^i = [\,r_n^i;\, g_n^i\,]$.
 2: Offline: Train the conditional diffusion model $\epsilon_\theta(\cdot)$ by minimizing (11).
 3: Online: Initialize $x_0$, set $t = 0$.
 4: while not terminated do
 5:    Extract feature $\phi_t = \phi(x_t, \text{neighbors}, \text{map})$.
 6:    Sample $S$ candidates $\{y_0^{(s)}\}_{s=1}^{S}$ using a DDIM sampler: $y_0^{(s)} = [\,r_t^{(s)};\, g_t^{(s)}\,] \sim p_\theta(\cdot \mid \phi_t)$.
 7:    for $s = 1$ to $S$ do
 8:        Solve MPC–STL (8) with $(r_t^{(s)}, g_t^{(s)})$ to obtain $u_{H,t}^{\star,(s)}$ and optimal cost $J_t^\star(s)$ (mark infeasible if no solution).
 9:    end for
10:    Select $s^\star \in \arg\min_{s \in \mathcal{I}_t} J_t^\star(s)$ and set $u_{H,t}^\star = u_{H,t}^{\star,(s^\star)}$.
11:    Apply the first input $u_t^\star$, observe $x_{t+1}$.
12:    $t \leftarrow t + 1$.
13: end while

6. Experimental Results

6.1. Implementation Details and Dataset

All methods were implemented in Python (v3.10) with PyTorch (v2.7.1) for learning. The MPC problems were solved using Gurobi [37]. Experiments were conducted on a workstation equipped with an AMD R7-7700 CPU and an RTX 4080 Super GPU. Unless stated otherwise, the MPC prediction horizon is set to H = 20 and we use a single diffusion sample ( S = 1 ) per control step.
We use the highD dataset [38] as expert demonstrations. The highD recordings used in this work consist of 60 tracks, which we group into three subsets: highD dataset1 (tracks 1–20), highD dataset2 (tracks 21–40), and highD dataset3 (tracks 41–60). To avoid leakage across train/test, we construct the training set by extracting samples only from even-numbered tracks within each subset, and we perform all closed-loop evaluations on the remaining odd-numbered tracks that were never used to construct training samples. From each subset, we extract 5000 training samples from even-numbered tracks, resulting in 15,000 training pairs $(\phi_n^i, y_n^i)$ in total. Here, $\phi_n^i$ is the context feature at time step $n$ and $y_n^i = [\,r_n^i;\, g_n^i\,]$ contains (i) rule-wise robustness slackness targets computed from the horizon-wise minimum robustness in (9), and (ii) goal-point targets extracted from demonstrations using (10). We uniformly sample training pairs across the even-numbered tracks within each subset to avoid over-representing a small number of long trajectories. We do not use a separate validation split; all hyperparameters are fixed across experiments, and all reported closed-loop metrics are computed only on the held-out odd-numbered test tracks.
For the diffusion model, we normalize each component of y = [ r ; g ] to the range [ 1 , 1 ] using min–max statistics computed from the training set, and train the model in the normalized space. At inference time, we denormalize the generated outputs back to the original units before passing ( r t , g t ) to the MPC–STL solver. We train the diffusion model with K = 100 forward diffusion steps, a standard choice that yields stable training. Unless stated otherwise, we use a DDIM sampler [36] for diffusion inference with 20 denoising steps, which provides a practical quality–runtime trade-off. For a fair comparison, all imitation-learning baselines are trained using the same dataset with the same split and sample size.
On our workstation (AMD R7-7700 CPU, RTX 4080 Super GPU), the average Gurobi-based MPC solve time is approximately 0.12 s per control step, and DDIM sampling with 20 denoising steps takes approximately 0.165 s per sample (averaged over the evaluated scenarios). With the default setting S = 1 , this yields an end-to-end wall-clock time of about 0.285 s per control step (excluding minor overhead).
The proposed pipeline is computationally tractable for receding-horizon evaluation in our experimental setting; however, the current implementation (diffusion sampling + mixed-integer MPC–STL) may not meet strict real-time constraints at high control rates. Accordingly, we use a conservative default setting of S = 1 with DDIM inference (20 denoising steps) in the main experiments. While Algorithm 1 supports multimodal sampling with S > 1 , increasing S can improve robustness by enabling selection of the best feasible plan among multiple candidate pairs ( r t , g t ) , at a computation cost that scales approximately linearly with S because sampling and MPC–STL solves are repeated per candidate. With our measured timings, the per-step wall-clock time is approximately S · ( 0.165 + 0.12 ) s in a non-parallel setting without solver warm-starting. This overhead can be mitigated via fewer DDIM steps, solver warm-starting, parallel candidate evaluation, and faster optimization backends or compiled implementations.
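The S-candidate procedure described above can be sketched as follows; `sample_pair` and `solve_mpc_stl` are hypothetical stand-ins for the DDIM sampler and the Gurobi-based MPC–STL solver, and an infeasible solve is assumed to return `None`:

```python
def plan_step(context, sample_pair, solve_mpc_stl, S=1):
    """One receding-horizon step: draw S candidate (r, g) pairs, solve the
    MPC-STL problem for each, and keep the lowest-cost feasible plan.
    Wall-clock time scales roughly as S * (t_sample + t_solve)."""
    best = None
    for _ in range(S):
        r, g = sample_pair(context)          # DDIM sampling (~0.165 s here)
        plan = solve_mpc_stl(context, r, g)  # MPC-STL solve (~0.12 s here)
        if plan is not None and (best is None or plan.cost < best.cost):
            best = plan
    return best  # None signals that every candidate was infeasible
```

With S = 1 this reduces to a single sample-and-solve per control step; larger S trades computation for robustness, as discussed above.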
We use a kinematic unicycle model and do not explicitly impose near-limit vehicle stability constraints such as tire friction limits, sideslip, or detailed lateral dynamics. This choice matches the scope of highway driving scenarios in highD, where recorded maneuvers are typically smooth and far from handling limits. Extending the MPC layer with stability constraints (e.g., lateral-acceleration bounds and friction-circle/ellipse constraints) and higher-fidelity dynamics (e.g., a dynamic bicycle model with sideslip) is left for future work.

6.2. Baselines

We compare the proposed method against two representative alternatives:
(1) Imitation Learning (IL). We consider imitation-learning baselines that predict a length-H action sequence û_{H,t} = {û_t, …, û_{t+H−1}} from the current context feature φ_t (and oracle nearby-vehicle futures over the same horizon when used in the evaluation protocol), and execute them in closed loop in a receding-horizon manner by applying only the first action û_t at each step. The resulting ego trajectory is obtained by rolling out the applied actions through the same vehicle dynamics model for reporting and analysis. These baselines do not explicitly enforce STL constraints and do not solve an optimization problem.
(2) MPC–STL with CVAE slackness (MPC–STL (CVAE)) [9]. This baseline follows our previous framework where a conditional VAE predicts only the rule-wise robustness slackness r t from ϕ t , and an MPC–STL optimizer generates the control sequence under the predicted margins. In contrast, our proposed method uses a conditional diffusion model and jointly predicts both slackness and sub-goals, injecting the latter into the MPC objective as in (7).

6.3. System Description

We model vehicle dynamics using a unicycle model with state x_t = [x_t, y_t, θ_t, v_t] and control input u_t = [w_t, a_t], where w_t is the angular velocity and a_t is the acceleration:
$$\dot{x}_t = v_t \cos(\theta_t), \qquad \dot{y}_t = v_t \sin(\theta_t), \qquad \dot{\theta}_t = \kappa_1 v_t w_t, \qquad \dot{v}_t = \kappa_2 a_t.$$
For optimization, we linearize the dynamics around a reference point x̂ = [x̂, ŷ, θ̂, v̂] and obtain the first-order approximation x_{n+1} = A_n x_n + B_n u_n + C_n, with A_n, B_n, C_n defined as follows:
$$A_n = \begin{bmatrix} 1 & 0 & -\hat{v}\sin(\hat{\theta})\,dt & \cos(\hat{\theta})\,dt \\ 0 & 1 & \hat{v}\cos(\hat{\theta})\,dt & \sin(\hat{\theta})\,dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad B_n = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ \kappa_1 \hat{v}\,dt & 0 \\ 0 & \kappa_2\,dt \end{bmatrix}, \quad C_n = \begin{bmatrix} \hat{v}\sin(\hat{\theta})\,\hat{\theta}\,dt \\ -\hat{v}\cos(\hat{\theta})\,\hat{\theta}\,dt \\ 0 \\ 0 \end{bmatrix}.$$
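The matrices above can be assembled numerically as follows (a sketch; consistent with the text, the linearization is taken around a state reference with a zero reference input):

```python
import numpy as np

def linearized_dynamics(theta_hat, v_hat, dt, kappa1, kappa2):
    """Discrete-time linearization x_{n+1} = A_n x_n + B_n u_n + C_n of the
    unicycle model around the reference heading/speed (theta_hat, v_hat)."""
    s, c = np.sin(theta_hat), np.cos(theta_hat)
    A = np.array([
        [1.0, 0.0, -v_hat * s * dt, c * dt],
        [0.0, 1.0,  v_hat * c * dt, s * dt],
        [0.0, 0.0,  1.0,            0.0  ],
        [0.0, 0.0,  0.0,            1.0  ],
    ])
    B = np.array([
        [0.0,                 0.0        ],
        [0.0,                 0.0        ],
        [kappa1 * v_hat * dt, 0.0        ],
        [0.0,                 kappa2 * dt],
    ])
    C = np.array([
        v_hat * s * theta_hat * dt,
        -v_hat * c * theta_hat * dt,
        0.0,
        0.0,
    ])
    return A, B, C
```

A quick sign check: at the reference state with zero input, A_n x̂ + C_n reproduces the nonlinear Euler step x̂ + dt · f(x̂, 0) exactly.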
In all experiments, the sub-goal (goal point) is defined in the planar position space as g_t = [x_{g,t}, y_{g,t}] ∈ ℝ²; accordingly, we set n_g = 2 and use S_x = [I_{2×2} 0_{2×2}] to select (x, y) from x_t = [x_t, y_t, θ_t, v_t].

6.4. Rule Description

We consider five driving rules encoded as STL formulas φ = [φ_1, …, φ_5]. In our coordinate convention, larger y corresponds to the left side of the road, and smaller y corresponds to the right side.
1. Lane keeping (right/lower boundary): φ_1: y_t ≥ y_{l,min};
2. Lane keeping (left/upper boundary): φ_2: y_t ≤ y_{l,max};
3. Collision avoidance (neighbor vehicle bounding box): φ_3: (x_t ≤ x_{c,min}) ∨ (x_t ≥ x_{c,max}) ∨ (y_t ≤ y_{c,min}) ∨ (y_t ≥ y_{c,max});
4. Speed limit: φ_4: v_t ≤ v_th;
5. Slow down before the preceding vehicle: φ_5: (v_t ≤ v_u) U_{[t_a, t_b]} (x_t ≥ x_{c,min}), where x_{c,min} denotes the rear boundary of the preceding vehicle along the lane axis.
All rules are evaluated over the MPC horizon using the minimum robustness as in (6). Figure 3 illustrates the environment and notations used in the rules. In all experiments, the temporal parameters for φ_5 are set to t_a = 6 and t_b = 12.
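For the time-invariant rules, the horizon-wise minimum robustness of (6) reduces to simple margin computations. The sketch below illustrates this for φ_1–φ_4 under assumed straight-lane geometry; the until-based φ_5 is omitted for brevity, and the function is ours, not the authors' implementation:

```python
import numpy as np

def rule_robustness(traj, y_lmin, y_lmax, v_th, box):
    """Horizon-wise minimum robustness for phi_1..phi_4.
    traj: (N, 4) array with rows [x, y, theta, v];
    box: (xc_min, xc_max, yc_min, yc_max) of a neighbor vehicle."""
    x, y, v = traj[:, 0], traj[:, 1], traj[:, 3]
    xc_min, xc_max, yc_min, yc_max = box
    rho1 = np.min(y - y_lmin)        # phi_1: y_t >= y_l,min
    rho2 = np.min(y_lmax - y)        # phi_2: y_t <= y_l,max
    # phi_3 is a disjunction, so its pointwise robustness is the max of
    # the four box margins; the horizon value is the min over time.
    rho3 = np.min(np.max(np.stack(
        [xc_min - x, x - xc_max, yc_min - y, y - yc_max]), axis=0))
    rho4 = np.min(v_th - v)          # phi_4: v_t <= v_th
    return np.array([rho1, rho2, rho3, rho4])
```

In training, margins of this form (computed on demonstration trajectories) yield the rule-wise slackness targets; negative entries indicate that the expert itself violated the strict rule in that context.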

6.5. Simulation Results

Figure 4 shows representative closed-loop rollouts on the highD dataset. For each scenario, the conditional diffusion model predicts (i) a sub-goal (goal point) g t (top-left) and (ii) rule-wise robustness slackness r t (bottom-left). Given these predictions, the MPC–STL solver computes the optimized control sequence and the resulting trajectory (right). In the slackness plots, entries with r t , j < 0 are highlighted (red boxes), which indicates that the learned margins allow controlled (bounded) relaxation of the corresponding rules when necessary.
In Figure 4a, the predicted goal point lies in the left lane and the slackness relaxes φ 2 (upper/left boundary) and φ 5 (slow-down) relative to the other rules. Accordingly, the MPC executes a left-lane maneuver and maintains speed while approaching the preceding vehicle, reflecting the learned intent and context-dependent flexibility. In Figure 4b, the predicted goal point lies in the right lane and the slackness relaxes φ 1 (lower/right boundary), leading to a rightward lane change. Overall, these examples illustrate that the diffusion model captures both intermediate intent (goal points) and rule flexibility (slackness), and that MPC–STL converts them into human-like maneuvers in closed loop.

6.5.1. Comparison with Imitation Learning (IL)

We first compare our diffusion-guided MPC–STL against representative imitation-learning (IL) approaches. To ensure a fair comparison, we provide the same level of future information to all methods: in addition to the current context feature φ_t, we supply the future trajectories of nearby vehicles over the horizon (obtained from the held-out dataset). Concretely, each IL baseline receives as input
(φ_t, x_near^{H,t}) ↦ û_ego^{H,t},
where x_near^{H,t} denotes the future states of nearby vehicles (up to six vehicles) over the same horizon, and û_ego^{H,t} is the predicted action sequence of the ego vehicle. All IL baselines are executed in a receding-horizon manner: at each time step, they predict û_ego^{H,t} and apply only the first action û_t. For reporting and analysis, the executed ego trajectory is obtained by rolling out the applied actions through the same vehicle dynamics model. This oracle protocol removes prediction error as a confounder and focuses the comparison on the decision mechanism (optimization with constraints vs. direct action prediction).
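The receding-horizon execution protocol above can be sketched generically; `policy` and `step_dynamics` are hypothetical stand-ins for an IL baseline and the shared vehicle dynamics model:

```python
def closed_loop_rollout(policy, step_dynamics, x0, contexts):
    """Execute an IL baseline in receding-horizon fashion: at each step,
    predict a length-H action sequence but apply only the first action."""
    x, traj = x0, [x0]
    for ctx in contexts:
        u_seq = policy(ctx, x)          # predicted length-H action sequence
        x = step_dynamics(x, u_seq[0])  # roll out only the first action
        traj.append(x)
    return traj
```

Because only the executed trajectory is scored, small per-step action errors compound over the rollout, which is the error-accumulation effect discussed in the results.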
In our method, the same nearby-vehicle future trajectories are used inside the MPC–STL optimization (e.g., to evaluate collision-related STL predicates and constraints), whereas in IL they are used only as additional conditioning inputs, without explicit constraint evaluation or optimization. We emphasize that the goal of this comparison is not to claim IL as the closest architectural match, but to quantify the empirical gap between purely learned action prediction and constraint-aware receding-horizon planning under matched access to oracle neighbor futures.
We consider three IL variants: (i) Diffusion Policy [18], which generates a length-H ego action sequence via an action diffusion model conditioned on (φ_t, x_near^{H,t}); (ii) a Transformer–VAE, using a Transformer decoder and a conditional VAE; and (iii) LSTM–GMM [39], using an LSTM decoder with a Gaussian mixture output layer. All baselines are trained on the same number of samples (15,000) from the highD dataset and use the same train/test split as our method. For fairness, since our method does not use the ego vehicle’s past trajectory as input, we also remove past-trajectory inputs from all IL baselines.
The evaluation task is long-horizon track driving: the ego vehicle starts near one end of the track and must reach the opposite end. A rollout is counted as a success if the ego vehicle reaches the goal region without leaving the track boundaries and without colliding with any other vehicle; otherwise it is a failure. The primary metric is the success rate over repeated trials.
We evaluate in held-out highD scenarios across three disjoint subsets: highD dataset1 (tracks 1–20), highD dataset2 (tracks 21–40), and highD dataset3 (tracks 41–60). For each subset, we perform closed-loop rollouts on the test tracks that were not used for training sample extraction (i.e., odd-numbered tracks in the corresponding range). For each test scenario, we select one vehicle and designate it as the ego vehicle. Importantly, the ego vehicle does not follow the recorded trajectory; instead, it is controlled by the tested algorithm (our method or an IL baseline). The surrounding vehicles follow the dataset trajectories, and their future trajectories over the horizon are provided to the tested method as described above. We run 200 rollouts per subset for each method under the same evaluation protocol (scenario set and initialization procedure), resulting in 600 rollouts in total across the three subsets.
Table 1 reports the success statistics under the same test environments. Although all IL baselines are provided with oracle future trajectories of nearby vehicles, they can still fail in long-horizon rollouts due to error accumulation in the rolled-out (executed) ego trajectory and the absence of explicit rule enforcement. In contrast, our method achieves a higher success rate, consistent with explicitly solving an STL-constrained receding-horizon optimization problem at every step, which enforces safety rules and goal-reaching while adapting to the context-dependent slackness and sub-goal predicted by the diffusion model.
Figure 5 shows qualitative snapshots from a representative test scenario. The proposed method performs a smooth lane change and reaches the goal region while maintaining feasibility with respect to the MPC–STL constraints. In contrast, the diffusion-policy baseline progresses toward the goal but collides with a nearby vehicle near the terminal region. The Transformer–VAE reaches the goal; however, it exhibits an undesirable behavior of traveling along the dashed lane marking for an extended period. The LSTM–GMM baseline shows a similar tendency to linger on the dashed line and ultimately results in a collision. These snapshots highlight that, even with access to oracle nearby-vehicle futures, purely learned IL rollouts can exhibit long-horizon instability or unsafe maneuvers, whereas the proposed optimization-based controller yields more reliable, rule-aware behavior.

6.5.2. Comparison with MPC–STL with CVAE Slackness

We further compare the proposed diffusion-guided MPC–STL against two MPC–STL variants to isolate the impact of (i) the generative prior used to infer robustness slackness and (ii) learning sub-goals to shape the MPC objective. Specifically, we consider:
  • MPC–STL with CVAE slackness (MPC–STL (CVAE)) [9]: a conditional VAE predicts only the rule-wise robustness slackness r t from ϕ t , and the MPC–STL optimizer generates trajectories under the predicted margins (no learned sub-goal term).
  • Strict MPC–STL (no slackness): an MPC–STL formulation that enforces all STL rules with fixed strict margins (i.e., without learned robustness slackness), corresponding to strict rule compliance.
The proposed method infers both rule-wise slackness and a sub-goal, i.e., (r_t, g_t), from the conditional diffusion model, and uses g_t in the MPC–STL objective (Equation (7)). In contrast, MPC–STL (CVAE) and Strict MPC–STL do not infer sub-goals. To keep the goal term in the MPC objective identical across MPC–STL variants, we provide these two baselines with a simple heuristic sub-goal: a point located at a fixed look-ahead distance d_sg ahead of the ego vehicle along the (local) lane direction (aligned with the longitudinal axis in the highD coordinate frame). Thus, all MPC–STL methods share the same objective form, while differing only in how (r_t, g_t) are obtained (learned g_t for the proposed method vs. heuristic g_t for the baselines). This design isolates the effect of learning (r_t, g_t) (and the diffusion prior) from the choice of objective structure itself.
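The heuristic sub-goal given to the two baselines can be written compactly (a sketch; in the highD frame the local lane direction is essentially the longitudinal axis):

```python
import numpy as np

def heuristic_subgoal(ego_xy, lane_dir, d_sg):
    """Point located d_sg ahead of the ego along the local lane direction."""
    lane_dir = np.asarray(lane_dir, dtype=float)
    lane_dir = lane_dir / np.linalg.norm(lane_dir)  # make it a unit vector
    return np.asarray(ego_xy, dtype=float) + d_sg * lane_dir
```

Unlike the learned sub-goal, this heuristic always keeps the ego in its current lane, which is one reason the baselines exhibit fewer lane changes in the results below.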
We perform controlled closed-loop evaluations under the same evaluation protocol. We evaluate on three disjoint held-out subsets of highD scenarios: highD dataset1 (tracks 1–20), highD dataset2 (tracks 21–40), and highD dataset3 (tracks 41–60). For each subset, we construct 100 test trials, where each trial corresponds to a different initial condition and traffic context (i.e., different starting positions and surrounding vehicle configurations). For each trial and each method, we repeat the rollout 10 times to account for stochasticity when sampling is enabled in the learned generative models (Proposed and MPC–STL (CVAE)). Strict MPC–STL is deterministic; repeating yields identical outcomes, and we include it for consistency in reporting.
We evaluate (i) the success rate, where a rollout is successful if the ego vehicle reaches the goal region without leaving the track boundaries and without colliding with any other vehicle, and (ii) the lane-change rate, the fraction of rollouts in which the ego vehicle performs at least one lane change. This metric matters because lane changes occur with a non-negligible frequency in real driving data; for example, highD reports an average lane-change frequency on the order of 10^{−1} per vehicle (depending on the subset and filtering) [38]. Overly conservative planners that never change lanes may therefore succeed only in limited situations and can fail to make progress in dense traffic.
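The lane-change rate can be computed from the executed lateral positions; the sketch below assumes straight lanes separated by known lateral boundaries (which holds for highD-style highway tracks) and is an illustrative implementation, not the authors' evaluation code.

```python
import numpy as np

def lane_change_rate(rollouts, lane_boundaries):
    """Fraction of rollouts containing at least one lane change, where the
    lane index is the number of lateral boundaries below the ego's y."""
    changed = 0
    for ys in rollouts:  # ys: lateral positions of one closed-loop rollout
        lanes = np.searchsorted(lane_boundaries, ys)
        if np.any(lanes[1:] != lanes[:-1]):
            changed += 1
    return changed / len(rollouts)
```

A debounce (e.g., requiring the new lane index to persist for several steps) would make the metric robust to trajectories that merely graze a lane marking.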
Table 2 summarizes the quantitative comparison across the three subsets. The proposed method achieves the highest success rate and a higher lane-change rate than MPC–STL (CVAE), suggesting that jointly predicting slackness and sub-goals helps the optimizer resolve long-horizon dilemmas more consistently. In contrast, strict MPC–STL yields near-zero lane changes and the lowest success rate in our setting, suggesting that strict satisfaction of all rules can be overly conservative and may prevent progress in dense traffic.
Figure 6 provides a qualitative example illustrating the behavioral difference between the proposed method and MPC–STL (CVAE). In this scenario, MPC–STL (CVAE) remains in the current lane and does not execute a lane change, which limits progress toward the goal. In contrast, the proposed method predicts a sub-goal that encourages a lane change and, together with context-dependent slackness, enables the MPC–STL optimizer to perform a safe lane-change maneuver.
To highlight the practical benefit of using diffusion as a multimodal prior within the constrained MPC–STL loop, we visualize the diversity of MPC solutions induced by multiple samples from the learned generative models. In the proposed method, the conditional diffusion model jointly samples a pair ( r t , g t ) , i.e., rule-wise robustness slackness and a sub-goal, and MPC–STL then computes a trajectory for each sampled pair. In contrast, MPC–STL (CVAE) samples only r t from the CVAE prior (without learned sub-goals), and computes trajectories under the sampled slackness values.
Figure 7 compares the resulting sets of planned trajectories in representative highD scenarios. For this visualization only, we use S = 16 to reveal the diversity of candidate plans induced by each learned prior under the same context. The proposed diffusion-guided MPC–STL produces a noticeably more diverse set of feasible plans, capturing multiple plausible strategies (e.g., varying degrees of lateral motion and lane-change timing) under the same context. By contrast, the CVAE-based prior yields trajectories that are more concentrated around a single mode, resulting in limited behavioral diversity. This qualitative comparison supports our motivation for using diffusion in the MPC–STL setting: better mode coverage at the level of decision variables ( r , g ) translates into richer candidate plans, which is especially useful in driving contexts where multiple expert-like responses can be valid.

7. Conclusions

We presented a diffusion-guided MPC–STL framework that learns context-dependent rule flexibility and intermediate intent from expert demonstrations, and synthesizes closed-loop driving behaviors under STL constraints. Unlike prior learning-based MPC–STL approaches that predict only robustness slackness with VAE priors, our method employs a conditional diffusion model to jointly predict (i) a rule-wise robustness slackness vector that sets soft margins for STL satisfaction and (ii) a sub-goal that shapes the MPC objective. This coupling preserves the interpretability and constraint-based structure of STL specifications while enabling multimodal, human-like decision making by sampling decision variables and selecting feasible plans via receding-horizon optimization. More broadly, the proposed recipe—combining symbolic temporal-logic specifications with learned multimodal priors—offers a general pathway toward interpretable, constraint-aware controllers beyond driving.
Experiments on held-out highD scenarios showed that the proposed approach improves task success and induces more realistic lane-changing behavior compared to representative imitation-learning baselines, even when all methods are given oracle future trajectories of surrounding vehicles. Comparisons with MPC–STL (CVAE) and strict MPC–STL further indicated that jointly learning slackness and sub-goals yields more consistent long-horizon progress, whereas strict rule enforcement can lead to overly conservative behavior and reduced success in dense traffic. Qualitative results also suggested that diffusion-based sampling provides a richer set of plausible maneuver candidates than VAE-based slackness priors, translating into diverse feasible plans under the same context.
Several limitations and extensions remain for practical deployment. While the proposed pipeline is computationally tractable for receding-horizon evaluation in our experimental setting, the current implementation (diffusion sampling + mixed-integer MPC–STL) may not meet strict real-time constraints at high control rates; runtime can be reduced through fewer DDIM steps/samples, warm-starting, parallel candidate evaluation, and faster optimization backends or compiled implementations. The controller also depends on learned decision variables ( r t , g t ) and can be affected by prediction error or distribution shift; practical safeguards include projecting r t to a meaningful range, filtering outlier sub-goals, sampling multiple candidates and discarding infeasible ones, and reverting to a conservative fallback policy (e.g., strict MPC–STL or safe braking/keep-lane) when all candidates are infeasible. In addition, our evaluation provides oracle neighbor futures to isolate the decision-making mechanism; a deployable system must integrate a prediction module and handle uncertainty via multi-hypothesis prediction and uncertainty-aware constraint handling (e.g., scenario-based or chance-constrained MPC).
Beyond deployment concerns, our current study is intentionally scoped to structured highway track-driving with a fixed rule set. Extending to complex urban environments likely requires richer and more compositional specifications (e.g., context-conditioned rule-set selection or composition from STL templates) and semantic sub-goals/options (e.g., “yield to pedestrian, then turn”). Moreover, the formulation does not detect “unknown unknowns” where the specification itself becomes invalid (e.g., emergency vehicles or atypical hazards); a promising direction is a hierarchical architecture that augments MPC–STL with anomaly/out-of-distribution (OOD) and intent-reasoning modules to enable mode switching or activation of context-appropriate rule sets.

Author Contributions

Conceptualization, J.C. and K.C.; methodology, J.C. and K.C.; validation, J.C.; data curation, J.C.; writing—original draft preparation, J.C.; writing—review and editing, K.C.; visualization, J.C.; supervision, K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2025-RS-2023-00259678).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Camacho, E.F.; Bordons, C. Model Predictive Control; Advanced Textbooks in Control and Signal Processing; Springer: London, UK, 2013. [Google Scholar]
  2. Dantec, E.; Naveau, M.; Fernbach, P.; Villa, N.; Saurel, G.; Stasse, O.; Taix, M.; Mansard, N. Whole-body model predictive control for biped locomotion on a torque-controlled humanoid robot. In Proceedings of the 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids), Ginowan, Japan, 28–30 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 638–644. [Google Scholar]
  3. Kong, N.J.; Li, C.; Council, G.; Johnson, A.M. Hybrid iLQR model predictive control for contact implicit stabilization on legged robots. IEEE Trans. Robot. 2023, 39, 4712–4727. [Google Scholar] [CrossRef]
  4. Le Cleac’h, S.; Howell, T.A.; Yang, S.; Lee, C.Y.; Zhang, J.; Bishop, A.; Schwager, M.; Manchester, Z. Fast contact-implicit model predictive control. IEEE Trans. Robot. 2024, 40, 1617–1629. [Google Scholar] [CrossRef]
  5. Jang, E.; Irpan, A.; Khansari, M.; Kappler, D.; Ebert, F.; Lynch, C.; Levine, S.; Finn, C. Bc-z: Zero-shot task generalization with robotic imitation learning. In Proceedings of the 5th Conference on Robot Learning (CoRL), London, UK, 8–11 November 2021; PMLR: Cambridge, MA, USA, 2022; Volume 164, pp. 991–1002. [Google Scholar]
  6. Zare, M.; Kebria, P.M.; Khosravi, A.; Nahavandi, S. A survey of imitation learning: Algorithms, recent developments, and challenges. IEEE Trans. Cybern. 2024, 54, 7173–7186. [Google Scholar] [CrossRef] [PubMed]
  7. Codevilla, F.; Müller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-end driving via conditional imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4693–4700. [Google Scholar]
  8. Prakash, A.; Chitta, K.; Geiger, A. Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE/CVF: Piscataway, NJ, USA, 2021; pp. 7073–7083. [Google Scholar]
  9. Im, E.; Choi, M.; Cho, K. Model Predictive Control with Variational Autoencoders for Signal Temporal Logic Specifications. Sensors 2024, 24, 4567. [Google Scholar] [CrossRef] [PubMed]
  10. Maler, O.; Nickovic, D. Monitoring temporal properties of continuous signals. In FORMATS/FTRTFT; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3253, pp. 152–166. [Google Scholar]
  11. Donzé, A.; Maler, O. Robust satisfaction of temporal logic over real-valued signals. In FORMATS; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6246, pp. 92–106. [Google Scholar]
  12. Fainekos, G.E.; Kress-Gazit, H.; Pappas, G.J. Temporal logic motion planning for mobile robots. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005. [Google Scholar]
  13. Karaman, S.; Frazzoli, E. Complex mission optimization for multiple-UAVs using linear temporal logic. In Proceedings of the IEEE American Control Conference, Seattle, WA, USA, 11–13 June 2008. [Google Scholar]
  14. Wongpiromsarn, T.; Topcu, U.; Murray, R.M. Receding horizon temporal logic planning for dynamical systems. In Proceedings of the IEEE Conference on Decision and Control, Shanghai, China, 15–18 December 2009. [Google Scholar]
  15. Pant, Y.V.; Abbas, H.; Mangharam, R. Distributed trajectory planning for multi-rotor uavs with signal temporal logic objectives. In Proceedings of the 2022 IEEE Conference on Control Technology and Applications (CCTA), Trieste, Italy, 23–25 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 476–483. [Google Scholar]
  16. Meng, Y.; Fan, C. Signal temporal logic neural predictive control. IEEE Robot. Autom. Lett. 2023, 8, 7719–7726. [Google Scholar] [CrossRef]
  17. Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. Adv. Neural Inf. Process. Syst. 2015, 28, 3483–3491. [Google Scholar]
  18. Chi, C.; Feng, S.; Du, Y.; Xu, Z.; Cousineau, E.; Burchfiel, B.; Song, S. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. In Proceedings of the Robotics: Science and Systems (RSS), Daegu, Republic of Korea, 10–14 July 2023. [Google Scholar]
  19. Karaman, S.; Sanfelice, R.G.; Frazzoli, E. Optimal control of mixed logical dynamical systems with linear temporal logic specifications. In Proceedings of the IEEE Conference on Decision and Control, Cancun, Mexico, 9–11 December 2008. [Google Scholar]
  20. Kwon, Y.; Agha, G. LTLC: Linear temporal logic for control. In Hybrid Systems: Computation and Control; Springer: Berlin/Heidelberg, Germany, 2008; pp. 316–329. [Google Scholar]
  21. Wolff, E.M.; Topcu, U.; Murray, R.M. Optimization-based control of nonlinear systems with linear temporal logic specifications. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation, Hong Kong, China, 31 May–7 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 5319–5325. [Google Scholar]
  22. Cho, K. Learning-based path planning under co-safe temporal logic specifications. IEEE Access 2023, 11, 25865–25878. [Google Scholar] [CrossRef]
  23. Raman, V.; Donzé, A.; Maasoumy, M.; Murray, R.M.; Sangiovanni-Vincentelli, A.; Seshia, S.A. Model predictive control with signal temporal logic specifications. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 81–87. [Google Scholar]
  24. Sadigh, D.; Kapoor, A. Safe control under uncertainty. arXiv 2015, arXiv:1510.07313. [Google Scholar] [CrossRef]
  25. Mao, Y.; Acikmese, B.; Garoche, P.L.; Chapoutot, A. Successive convexification for optimal control with signal temporal logic specifications. In Proceedings of the 25th ACM International Conference on Hybrid Systems: Computation and Control, Milan, Italy, 4–6 May 2022; ACM: New York, NY, USA, 2022; pp. 1–7. [Google Scholar]
  26. Lenz, I.; Knepper, R.A.; Saxena, A. DeepMPC: Learning Deep Latent Features for Model Predictive Control. In Proceedings of the Robotics: Science and Systems, Rome, Italy, 13–17 July 2015. [Google Scholar]
  27. Carron, A.; Arcari, E.; Wermelinger, M.; Hewing, L.; Hutter, M.; Zeilinger, M.N. Data-driven model predictive control for trajectory tracking with a robotic arm. IEEE Robot. Autom. Lett. 2019, 4, 3758–3765. [Google Scholar] [CrossRef]
  28. Lin, Y.; McPhee, J.; Azad, N.L. Comparison of deep reinforcement learning and model predictive control for adaptive cruise control. IEEE Trans. Intell. Veh. 2020, 6, 221–231. [Google Scholar] [CrossRef]
  29. Kong, Z.; Jones, A.; Medina Ayala, A.; Aydin Gol, E.; Belta, C. Temporal logic inference for classification and prediction from data. In Proceedings of the International Conference on Hybrid Systems: Computation and Control, Berlin, Germany, 15–17 April 2014; ACM: New York, NY, USA, 2014; pp. 273–282. [Google Scholar]
  30. Castro, L.I.R.; Chaudhari, P.; Tumova, J.; Karaman, S.; Frazzoli, E.; Rus, D. Incremental sampling-based algorithm for minimum-violation motion planning. In Proceedings of the IEEE Conference on Decision and Control, Firenze, Italy, 10–13 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 3217–3224. [Google Scholar]
  31. Zheng, Y.; Liang, R.; Zheng, K.; Zheng, J.; Mao, L.; Li, J.; Gu, W.; Ai, R.; Li, S.E.; Zhan, X.; et al. Diffusion-based planning for autonomous driving with flexible guidance. arXiv 2025, arXiv:2501.15564. [Google Scholar] [CrossRef]
  32. Liao, B.; Chen, S.; Yin, H.; Jiang, B.; Wang, C.; Yan, S.; Zhang, X.; Li, X.; Zhang, Y.; Zhang, Q.; et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; Computer Vision Foundation/IEEE: Piscataway, NJ, USA, 2025; pp. 12037–12047. [Google Scholar]
  33. Yang, B.; Su, H.; Gkanatsios, N.; Ke, T.W.; Jain, A.; Schneider, J.; Fragkiadaki, K. Diffusion-es: Gradient-free planning with diffusion for autonomous driving and zero-shot instruction following. arXiv 2024, arXiv:2402.06559. [Google Scholar]
  34. Tong, H.; Chu, L.; Chen, Z.; Liu, Y.; Zhang, Y.; Hu, J. Multi-Objective Autonomous Eco-Driving Strategy: A Pathway to Future Green Mobility. Green Energy Intell. Transp. 2025, 4, 100279. [Google Scholar] [CrossRef]
  35. Li, T.; Ruan, J.; Zhang, K. The investigation of reinforcement learning-based End-to-End decision-making algorithms for autonomous driving on the road with consecutive sharp turns. Green Energy Intell. Transp. 2025, 4, 100288. [Google Scholar] [CrossRef]
  36. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021. [Google Scholar]
  37. Gurobi Optimization, Inc. Gurobi Optimizer Reference Manual. 2014. Available online: http://www.gurobi.com (accessed on 1 September 2025).
  38. Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems. In Proceedings of the International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2118–2125. [Google Scholar] [CrossRef]
  39. Mandlekar, A.; Xu, D.; Wong, J.; Nasiriany, S.; Wang, C.; Kulkarni, R.; Li, F.; Savarese, S.; Zhu, Y.; Martín-Martín, R. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation. In Proceedings of the 5th Conference on Robot Learning (CoRL), London, UK, 8–11 November 2021; PMLR: Cambridge, MA, USA, 2022; Volume 164, pp. 1678–1690. [Google Scholar]
Figure 1. Overview of the proposed diffusion-guided MPC–STL framework. Expert demonstrations are used to construct training targets for robustness slackness and sub-goals. A conditional diffusion model predicts these quantities in novel contexts, and MPC–STL generates controls using the learned margins and sub-goals.
Figure 2. Ego and nearby vehicles in the track-driving scenario.
Figure 3. Driving environment illustrating the defined STL rules φ .
Figure 4. Qualitative rollouts on the highD dataset. Top-left: diffusion-predicted goal point g_t. Bottom-left: diffusion-predicted robustness slackness r_t (red dashed boxes indicate r_{t,j} < 0). Right: resulting trajectory obtained by solving the MPC–STL problem with the predicted (r_t, g_t). (a) Predicted goal point and slackness suggest a left-lane maneuver. (b) Predicted goal point and slackness suggest a right-lane maneuver.
Figure 5. Qualitative snapshots in a held-out highD scenario for the long-horizon track-driving task. (a) Proposed diffusion-guided MPC–STL reaches the goal with a smooth lane change. (b) Diffusion policy collides near the goal region. (c) Transformer–VAE reaches the goal but exhibits prolonged motion along the dashed lane marking. (d) LSTM–GMM follows the dashed marking for an extended period and eventually collides.
Figure 6. Qualitative snapshots in a held-out highD scenario. The proposed method performs a lane change to make progress toward the goal, whereas MPC–STL (CVAE) stays in the lane and does not exploit the available maneuver. (a) Proposed (Diffusion-guided MPC–STL). (b) MPC–STL (CVAE) [9].
Figure 7. Diversity of planned trajectories induced by multimodal sampling (S = 16). Left: Proposed (Diffusion-guided MPC–STL). Multiple candidate trajectories are obtained by sampling multiple pairs (r_t, g_t) from the conditional diffusion model and solving MPC–STL for each sample. Right: MPC–STL (CVAE). Multiple candidate trajectories are obtained by sampling multiple slackness vectors r_t from the CVAE prior and solving MPC–STL (without learned sub-goals). Across representative scenarios (rows), the diffusion-guided approach yields a visibly broader set of feasible plans, reflecting improved multimodal coverage compared to the CVAE prior.
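The sample-then-solve procedure behind Figure 7 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_diffusion` and `solve_mpc_stl` are hypothetical stand-ins (the actual system runs a few denoising steps of a conditional diffusion model and solves the STL-constrained MPC with a commercial solver), and the dynamics, cost, and slackness dimensions here are placeholders.

```python
import random

S = 16  # number of multimodal samples per planning step, as in Figure 7

def sample_diffusion(context):
    """Hypothetical stand-in for the conditional diffusion sampler.
    Returns one (slackness, sub_goal) pair for the current driving context."""
    slackness = [random.uniform(-0.5, 0.5) for _ in range(4)]   # rule-wise STL margins r_t
    sub_goal = (context["x"] + random.uniform(20.0, 40.0),      # longitudinal waypoint
                context["y"] + random.choice([-3.5, 0.0, 3.5])) # lane-level lateral offset
    return slackness, sub_goal

def solve_mpc_stl(context, slackness, sub_goal):
    """Hypothetical stand-in for the MPC-STL solve.
    Returns a (trajectory, cost) pair; the real solve enforces the STL
    constraints softened by `slackness` and a terminal cost toward `sub_goal`."""
    cost = sum(abs(s) for s in slackness) + 0.1 * abs(sub_goal[1] - context["y"])
    trajectory = [(context["x"], context["y"]), sub_goal]  # placeholder two-point plan
    return trajectory, cost

def plan(context):
    # Sample S candidate (slackness, sub-goal) pairs, solve MPC-STL for each,
    # and execute the lowest-cost plan (the candidate set is what Figure 7 shows).
    candidates = []
    for _ in range(S):
        r_t, g_t = sample_diffusion(context)
        traj, cost = solve_mpc_stl(context, r_t, g_t)
        candidates.append((cost, traj))
    candidates.sort(key=lambda c: c[0])
    return candidates[0][1]

best = plan({"x": 0.0, "y": 0.0})
print(len(best))  # prints 2: start state plus selected sub-goal
```

In a receding-horizon loop only the first control of the chosen plan is applied before re-sampling, so the candidate diversity shown in Figure 7 is regenerated at every step.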
Table 1. Success rates (%) on held-out highD scenarios across three disjoint subsets: highD dataset1 (tracks 1–20), highD dataset2 (tracks 21–40), and highD dataset3 (tracks 41–60). All methods receive oracle future trajectories of nearby vehicles over the horizon. Results are computed over 200 rollouts per subset under identical initial conditions.
| Method | Dataset1 (1–20) | Dataset2 (21–40) | Dataset3 (41–60) |
|---|---|---|---|
| Proposed (Diffusion-guided) | 93.5 | 92.0 | 92.5 |
| Diffusion Policy [18] | 88.5 | 84.5 | 87.0 |
| Transformer–VAE | 84.5 | 81.5 | 83.5 |
| LSTM–GMM [39] | 81.5 | 77.0 | 80.5 |
Table 2. Comparison with MPC–STL variants on held-out highD trials across three disjoint subsets. For each subset (tracks 1–20/21–40/41–60), we evaluate 100 trials and repeat each trial 10 times per method. We report success rate (%) and lane-change rate (%) per subset.
| Method | Success (1–20) | Lane-Chg (1–20) | Success (21–40) | Lane-Chg (21–40) | Success (41–60) | Lane-Chg (41–60) |
|---|---|---|---|---|---|---|
| Proposed | 94.2 | 10.6 | 91.8 | 10.9 | 92.9 | 11.2 |
| MPC–STL (CVAE) [9] | 91.3 | 5.4 | 90.1 | 5.7 | 90.5 | 5.3 |
| Strict MPC–STL | 84.5 | 0.0 | 83.9 | 0.0 | 84.4 | 0.0 |

Share and Cite

MDPI and ACS Style

Choi, J.; Cho, K. Diffusion-Guided Model Predictive Control for Signal Temporal Logic Specifications. Electronics 2026, 15, 551. https://doi.org/10.3390/electronics15030551

