Article

Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems

by J. Ernesto Solanes 1,* and Aitana Francés-Falip 2
1 Instituto de Diseño y Fabricación, Universitat Politècnica de València, 46022 València, Spain
2 Digital Integrated Technologies and E-Health, Technological Institute for Children’s Products and Leisure (AIJU), 03440 Alicante, Spain
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(7), 1188; https://doi.org/10.3390/math14071188
Submission received: 14 March 2026 / Revised: 30 March 2026 / Accepted: 31 March 2026 / Published: 2 April 2026
(This article belongs to the Special Issue Advances in Robust Control Theory and Its Applications)

Abstract

Entropic soft-min relaxations are widely used to obtain smooth approximations of minimum operators in optimization, machine learning, and control. The accuracy of this approximation is governed by an inverse temperature (or sharpness) parameter that controls the trade-off between smoothness and fidelity, yet its principled selection is typically heuristic. This work studies the data-driven calibration of the inverse temperature parameter governing the entropic soft-min relaxation, with explicit guarantees on the relaxation error between the soft-min operator and the infimum of the cost function. After establishing monotonicity properties and approximation bounds for the relaxation error, we introduce a conformal calibration rule that selects the smallest inverse temperature ensuring that the approximation error satisfies a prescribed tolerance with distribution-free finite-sample validity. The resulting selector adapts to the distribution of candidate cost-vector geometries represented in the calibration sample, enabling regime-specific inverse temperature selection in heterogeneous settings. Numerical experiments, including an adaptive cruise control application with safety filtering, show that the proposed method accurately tracks oracle calibration inverse temperatures and achieves near-target coverage in the exchangeable setting covered by the theory, while an additional shifted evaluation illustrates the role of this assumption.

1. Introduction

Approximating non-smooth optimization operators by smooth surrogates is a fundamental technique in modern optimization, machine learning, and control. Many decision-making architectures require selecting or aggregating the minimum value among a set of candidate costs or actions. However, the minimum operator is non-differentiable and therefore difficult to integrate into gradient-based algorithms or smooth optimization pipelines [1,2,3,4]. To address this limitation, smooth relaxations are frequently introduced to replace the exact infimum with differentiable approximations.
Among these constructions, the entropic soft-min, also known as the log-sum-exp approximation, plays a particularly important role. Given a collection of candidate costs, the soft-min operator replaces the exact infimum with an exponential aggregation controlled by an inverse temperature (or sharpness) parameter. This relaxation is widely used in convex optimization, reinforcement learning, probabilistic inference, and receding-horizon control, where differentiability and numerical stability are desirable properties. As the inverse temperature parameter λ increases, the entropic relaxation becomes sharper and approaches the infimum of the cost function [2,5,6,7,8].
A central difficulty in practice is the selection of the inverse temperature parameter governing this relaxation. The inverse temperature controls the trade-off between approximation accuracy and numerical smoothness and is often chosen heuristically using a single global value. In heterogeneous settings, where candidate cost geometries vary across instances, such a choice may lead to poor approximation accuracy or excessive conservativeness [9,10,11,12,13].
The log-sum-exp and related entropic relaxations have been extensively studied in optimization and machine learning and are widely used in statistical learning, probabilistic inference, and entropy-regularized optimization [2,6,14,15,16].
Similar entropic mechanisms also appear in control and robotics applications, where soft-min aggregation is used to combine candidate control actions or trajectories in receding-horizon frameworks. Despite this widespread use, principled methods for selecting the inverse temperature parameter with explicit guarantees on the approximation error remain limited [9,10,11,17,18].
Existing approaches typically rely on worst-case theoretical bounds or empirical risk criteria. Worst-case bounds derived from the cardinality of the candidate set provide deterministic guarantees on the approximation error but often yield excessively conservative inverse temperatures that degrade numerical conditioning or smoothness. Consequently, selecting the entropic inverse temperature remains largely heuristic in many practical applications [1,2,9,11].
This work addresses the problem of selecting the entropic inverse temperature in a data-driven manner while maintaining explicit guarantees on the approximation quality of the soft-min operator. Our approach is based on conformal calibration, a distribution-free statistical framework that provides finite-sample validity under minimal assumptions [19,20,21,22,23]. By interpreting the relaxation error of the soft-min operator as a calibration score, we construct a conformal rule that selects the smallest inverse temperature ensuring that the approximation error remains below a prescribed tolerance with high probability.
Beyond providing statistical guarantees, the proposed calibration rule adapts to heterogeneity in the geometry of candidate cost vectors. In heterogeneous environments, different instances may exhibit distinct ambiguity structures. The conformal calibration mechanism captures this variability at the level of the calibration population through data-driven quantile estimation, enabling regime-adaptive inverse temperature selection across heterogeneous settings [24,25,26].
To analyze this approach, we establish structural properties of the entropic soft-min operator and its relaxation error, providing the analytical basis for formulating inverse temperature selection as a calibration problem [1,2].
The proposed method is evaluated through two numerical experiments. The first considers a heterogeneous benchmark with controlled candidate-set geometries. The second embeds the calibration mechanism into an adaptive cruise control problem with uncertain lead-vehicle prediction [10,27,28,29].
The main contributions of the current paper can be summarized as follows:
(a) Structural analysis of the entropic soft-min relaxation. Fundamental properties of the relaxation error are established, including monotonicity with respect to the inverse temperature parameter and approximation bounds characterizing the behavior of the relaxation.
(b) Conformal inverse temperature calibration. A distribution-free calibration rule is introduced that selects the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample validity.
(c) Adaptivity to heterogeneous candidate-set geometries. The proposed calibration rule selects regime-dependent inverse temperatures according to the geometry distributions represented in the calibration data.
(d) Numerical validation in control-oriented settings. Numerical experiments, including a heterogeneous benchmark and an adaptive cruise control application with safety filtering, illustrate the empirical behavior of the proposed calibration mechanism.
This paper proceeds as follows. In Section 2, the entropic soft-min relaxation is introduced and its basic properties are established. Section 3 analyzes the associated relaxation error and derives its main structural properties. In Section 4, finite-domain and asymptotic bounds for the relaxation error are presented. Section 5 introduces the conformal calibration procedure for inverse temperature selection. Numerical experiments are reported in Section 6, including a heterogeneous benchmark and an adaptive cruise control application. Finally, concluding remarks are provided in Section 7.

2. Mathematical Preliminaries and Entropic Relaxation

This section introduces the mathematical framework used throughout the paper and recalls the definition of the entropic soft-min operator, which will serve as the basic building block for the approximation analysis and calibration procedure developed in the subsequent sections.
Let $(\Phi, \mathcal{F}, \mu)$ be a measure space and let
$$\rho : \Phi \to \mathbb{R}$$
be a measurable cost function. Let $w : \Phi \to [0, \infty)$ be a probability density with respect to $\mu$ satisfying
$$\int_{\Phi} w(\phi)\, d\mu(\phi) = 1. \tag{1}$$
Let $W$ denote the probability measure on $(\Phi, \mathcal{F})$ induced by the density $w$ with respect to $\mu$, namely
$$W(d\phi) = w(\phi)\, d\mu(\phi).$$
Then (1) implies that $W$ is a probability measure. Let $\lambda > 0$ denote an inverse temperature parameter. For the values of $\lambda$ under consideration, define the associated partition function by
$$Z_{\lambda}(\rho) = \int_{\Phi} e^{-\lambda \rho(\phi)}\, W(d\phi), \tag{2}$$
and assume that $Z_{\lambda}(\rho) < \infty$.
Definition 1.
For $\lambda > 0$ and whenever $Z_{\lambda}(\rho) < \infty$, the entropic relaxation of $\rho$ is defined as
$$D_{\lambda}(\rho) = -\frac{1}{\lambda} \log Z_{\lambda}(\rho). \tag{3}$$
This corresponds to the classical log-sum-exp or entropic relaxation of the minimum operator widely used in convex optimization, statistical physics, and probabilistic inference [2,14,15]. The parameter λ controls the trade-off between smoothness and approximation accuracy, with larger values yielding sharper approximations of the infimum.
The operator D λ provides a smooth approximation of the infimum of ρ . As the parameter λ increases, the exponential weighting concentrates near low-cost regions of ρ , and the relaxation approaches the infimum of the cost function. When the infimum is attained, the concentration occurs around the minimizers.
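For intuition, the finite-domain case of this operator with uniform reference weights (the form used later in the numerical experiments) can be sketched in a few lines of Python. This is an illustrative translation, not the paper's code (the experiments are implemented in MATLAB); the max-shift trick is a standard numerical safeguard, not part of the definition.

```python
import numpy as np

def soft_min(rho, lam, weights=None):
    """Entropic soft-min D_lam(rho) = -(1/lam) * log E_W[exp(-lam * rho)].

    The costs are shifted by their minimum before exponentiating; the
    shift cancels exactly, so the value is unchanged but underflow for
    large lam is avoided.
    """
    rho = np.asarray(rho, dtype=float)
    if weights is None:
        weights = np.full(rho.size, 1.0 / rho.size)  # uniform reference W
    m = rho.min()
    z_shifted = np.sum(weights * np.exp(-lam * (rho - m)))  # = Z_lam * e^{lam*m}
    return m - np.log(z_shifted) / lam

rho = [0.0, 0.5, 2.0, 3.0]
# D_lam(rho) >= min(rho) for every lam, and tightens as lam grows.
print(soft_min(rho, 1.0))    # loose relaxation
print(soft_min(rho, 100.0))  # close to min(rho) = 0
```

Increasing `lam` sharpens the approximation toward the minimum, matching the concentration behavior described above.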
Whenever $0 < Z_{\lambda}(\rho) < \infty$, define the Gibbs probability measure associated with $\rho$ by
$$\Pi_{\lambda}(d\phi) = \frac{e^{-\lambda \rho(\phi)}}{Z_{\lambda}(\rho)}\, W(d\phi). \tag{4}$$
The following result establishes a basic property of the entropic relaxation.
Proposition 1.
For every $\lambda > 0$ such that $Z_{\lambda}(\rho) < \infty$,
$$D_{\lambda}(\rho) \geq \inf_{\phi \in \Phi} \rho(\phi). \tag{5}$$
Proof. 
Let
$$m = \inf_{\phi \in \Phi} \rho(\phi).$$
Then $\rho(\phi) \geq m$ for all $\phi \in \Phi$, hence
$$e^{-\lambda \rho(\phi)} \leq e^{-\lambda m}.$$
Integrating with respect to $W(d\phi)$ yields
$$Z_{\lambda}(\rho) \leq e^{-\lambda m}.$$
Substituting this inequality into (3) gives (5). □

3. Relaxation Error

The approximation accuracy of the entropic relaxation introduced in Definition 1 depends on the value of the inverse temperature parameter λ . Quantifying the discrepancy between log-sum-exp relaxations and the exact minimum has been studied in several contexts including convex optimization and probabilistic inference [2,5]. A natural way to quantify this approximation accuracy is to measure the discrepancy between the relaxed operator D λ ( ρ ) and the exact infimum of the cost function.
Definition 2.
For a cost function $\rho : \Phi \to \mathbb{R}$ and $\lambda > 0$, the relaxation error is defined as
$$E_{\lambda}(\rho) = D_{\lambda}(\rho) - \inf_{\phi \in \Phi} \rho(\phi). \tag{6}$$
By Proposition 1, the entropic relaxation always overestimates the infimum of the cost function. Consequently,
$$E_{\lambda}(\rho) \geq 0.$$
The relaxation error therefore provides a nonnegative measure of the approximation accuracy of the soft-min operator.
The dependence of the relaxation on the inverse temperature parameter can be characterized through the following monotonicity property.
Lemma 1.
Let $D_{\lambda}(\rho)$ denote the entropic relaxation defined in (3), and assume that $Z_{\lambda}(\rho) < \infty$ for the values of $\lambda$ under consideration. Then the mapping
$$\lambda \mapsto D_{\lambda}(\rho)$$
is nonincreasing. Consequently, for any $\lambda_1 < \lambda_2$,
$$E_{\lambda_2}(\rho) \leq E_{\lambda_1}(\rho).$$
Proof. 
The entropic relaxation admits the variational representation
$$D_{\lambda}(\rho) = \inf_{Q \ll W} \left\{ \mathbb{E}_Q[\rho] + \frac{1}{\lambda}\, \mathrm{KL}(Q \,\|\, W) \right\}, \tag{7}$$
where the infimum is taken over probability measures $Q$ on $(\Phi, \mathcal{F})$ that are absolutely continuous with respect to the reference probability measure $W$. Here
$$\mathbb{E}_Q[\rho] = \int_{\Phi} \rho(\phi)\, Q(d\phi)$$
denotes the expectation of $\rho$ under $Q$, and
$$\mathrm{KL}(Q \,\|\, W) = \int_{\Phi} \log \frac{dQ}{dW}\, Q(d\phi)$$
denotes the Kullback–Leibler divergence of $Q$ from $W$. This is the Gibbs variational principle (equivalently, the Donsker–Varadhan representation for the log-Laplace functional [30,31,32,33,34]).
Let $\lambda_1 < \lambda_2$. Since the function $\lambda \mapsto 1/\lambda$ is decreasing, for every admissible probability measure $Q \ll W$ we have
$$\mathbb{E}_Q[\rho] + \frac{1}{\lambda_2}\, \mathrm{KL}(Q \,\|\, W) \leq \mathbb{E}_Q[\rho] + \frac{1}{\lambda_1}\, \mathrm{KL}(Q \,\|\, W).$$
Taking the infimum over $Q \ll W$ on both sides yields
$$D_{\lambda_2}(\rho) \leq D_{\lambda_1}(\rho).$$
Therefore $D_{\lambda}(\rho)$ is nonincreasing in $\lambda$. Since $E_{\lambda}(\rho) = D_{\lambda}(\rho) - \inf_{\phi \in \Phi} \rho(\phi)$, the same monotonicity holds for the relaxation error. □
Under standard differentiability assumptions, and assuming that $0 < Z_{\lambda}(\rho) < \infty$ and differentiation under the integral sign is justified, the derivative satisfies
$$\frac{d}{d\lambda} D_{\lambda}(\rho) = -\frac{1}{\lambda^2}\, \mathrm{KL}(\Pi_{\lambda} \,\|\, W) \leq 0,$$
where $\Pi_{\lambda}$ denotes the Gibbs probability measure defined in (4). Here both $\Pi_{\lambda}$ and $W$ are probability measures on $(\Phi, \mathcal{F})$, and $\mathrm{KL}(\Pi_{\lambda} \,\|\, W)$ is understood in the standard measure-theoretic sense [14,15,33,35]. This identity is consistent with the Gibbs variational representation in (7) and provides an alternative justification of monotonicity.
Another useful structural property of the entropic relaxation is its concavity with respect to the cost function.
Theorem 1.
Let $\rho_1, \rho_2 : \Phi \to \mathbb{R}$ be such that
$$Z_{\lambda}(\rho_1) < \infty, \qquad Z_{\lambda}(\rho_2) < \infty.$$
Then, for any $t \in [0, 1]$,
$$D_{\lambda}\big(t \rho_1 + (1-t)\rho_2\big) \geq t\, D_{\lambda}(\rho_1) + (1-t)\, D_{\lambda}(\rho_2). \tag{9}$$
Proof. 
Using the definition of the partition function (2), we have
$$Z_{\lambda}\big(t \rho_1 + (1-t)\rho_2\big) = \int_{\Phi} e^{-\lambda (t \rho_1 + (1-t)\rho_2)}\, W(d\phi).$$
Applying Hölder’s inequality yields
$$Z_{\lambda}\big(t \rho_1 + (1-t)\rho_2\big) \leq Z_{\lambda}(\rho_1)^{t}\, Z_{\lambda}(\rho_2)^{1-t}.$$
Taking logarithms gives
$$\log Z_{\lambda}\big(t \rho_1 + (1-t)\rho_2\big) \leq t \log Z_{\lambda}(\rho_1) + (1-t) \log Z_{\lambda}(\rho_2).$$
Multiplying by $-1/\lambda$ and using (3) yields (9). □

4. Approximation Bounds

The structural properties established in Section 3 make it possible to characterize the approximation accuracy of the entropic relaxation. In particular, the relaxation error introduced in Definition 2 can be bounded under different assumptions on the candidate set and the cost function [2,5].
We first consider the case where the candidate set is finite. In the finite-domain setting $\Phi = \{1, \ldots, N\}$, we write $\rho_i := \rho(i)$ for $i = 1, \ldots, N$. Similarly, when needed, we write $w_i := W(\{i\})$.
Theorem 2.
Assume $\Phi = \{1, \ldots, N\}$ and $w_i = 1/N$ for $i = 1, \ldots, N$. Then for every $\lambda > 0$
$$E_{\lambda}(\rho) \leq \frac{\log N}{\lambda}. \tag{10}$$
Proof. 
Let $m = \min_i \rho_i$. Then
$$\frac{1}{N} \sum_{i=1}^{N} e^{-\lambda \rho_i} \geq \frac{1}{N}\, e^{-\lambda m}.$$
Using the definition of the entropic relaxation (3), we obtain
$$D_{\lambda}(\rho) \leq m + \frac{\log N}{\lambda}.$$
Combining this inequality with the definition of the relaxation error (6) yields the bound. □
The bound (10) provides a worst-case guarantee on the approximation accuracy of the soft-min operator. Although this bound is independent of the geometry of the cost function, it shows explicitly how the relaxation error decreases as the inverse temperature parameter λ increases.
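The bound is straightforward to check numerically. The following Python sketch (illustrative, with arbitrary Gaussian cost vectors; not the paper's MATLAB code) verifies $E_{\lambda}(\rho) \leq \log N / \lambda$ on random instances:

```python
import numpy as np

def relax_err(rho, lam):
    # E_lam(rho) = D_lam(rho) - min(rho), computed in min-shifted form
    # for numerical stability at large lam.
    m = rho.min()
    return -np.log(np.mean(np.exp(-lam * (rho - m)))) / lam

rng = np.random.default_rng(0)
N = 64
for lam in (1.0, 10.0, 100.0):
    # Worst observed relaxation error over 200 random cost vectors.
    worst = max(relax_err(rng.normal(size=N), lam) for _ in range(200))
    assert worst <= np.log(N) / lam + 1e-12   # Theorem 2 bound
    print(lam, worst, np.log(N) / lam)
```

The bound is deterministic, so it holds on every sampled instance; the gap between the worst observed error and $\log N / \lambda$ illustrates the conservativeness of the cardinality-based guarantee.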
More precise asymptotic behavior can be obtained in continuous domains under additional regularity assumptions on the cost function and the reference measure. In the remainder of this section we specialize to the following setting.
Assume that $\Phi \subseteq \mathbb{R}^d$ is an open set equipped with the Lebesgue measure $\mu$, and that the reference density $w$ is continuous and strictly positive in a neighborhood of the minimizer. Suppose further that $\rho : \Phi \to \mathbb{R}$ is twice continuously differentiable and admits a unique nondegenerate minimizer $\phi^* \in \Phi$, meaning that
$$\nabla \rho(\phi^*) = 0 \quad \text{and} \quad \nabla^2 \rho(\phi^*) \succ 0.$$
Theorem 3.
Under the assumptions above, as $\lambda \to \infty$,
$$D_{\lambda}(\rho) = \rho(\phi^*) + \frac{d}{2\lambda} \log \lambda - \frac{1}{\lambda} \log \frac{w(\phi^*)\, (2\pi)^{d/2}}{\det\!\big(\nabla^2 \rho(\phi^*)\big)^{1/2}} + o\!\left(\frac{1}{\lambda}\right).$$
The expansion follows from the classical Laplace method for exponentially weighted integrals [36,37,38].
An immediate consequence concerns the asymptotic decay rate of the relaxation error.
Corollary 1.
Under the assumptions of Theorem 3, the relaxation error satisfies
$$E_{\lambda}(\rho) = D_{\lambda}(\rho) - \rho(\phi^*) = O\!\left(\frac{\log \lambda}{\lambda}\right).$$
The asymptotic behavior of the relaxation error can also be characterized in the case where the minimum is attained at finitely many points.
Proposition 2.
Assume $\Phi = \{1, \ldots, N\}$ and $w_i = 1/N$ for $i = 1, \ldots, N$. Let
$$m = \min_{1 \leq i \leq N} \rho_i$$
and suppose that the minimum is attained at exactly $k$ indices.
Then, as $\lambda \to \infty$,
$$D_{\lambda}(\rho) = m + \frac{1}{\lambda} \log \frac{N}{k} + o\!\left(\frac{1}{\lambda}\right).$$
Consequently,
$$E_{\lambda}(\rho) = \frac{1}{\lambda} \log \frac{N}{k} + o\!\left(\frac{1}{\lambda}\right).$$
Proof. 
Let $m = \min_{1 \leq i \leq N} \rho_i$. Then
$$\frac{1}{N} \sum_{i=1}^{N} e^{-\lambda \rho_i} = \frac{1}{N} \left( k\, e^{-\lambda m} + \sum_{i : \rho_i > m} e^{-\lambda \rho_i} \right).$$
Since the set of indices with $\rho_i > m$ is finite, define
$$\delta := \min_{i : \rho_i > m} (\rho_i - m) > 0.$$
Then, for every index such that $\rho_i > m$,
$$e^{-\lambda \rho_i} = e^{-\lambda (m + \delta_i)} = e^{-\lambda m}\, e^{-\lambda \delta_i}, \qquad \delta_i := \rho_i - m \geq \delta.$$
Therefore,
$$\sum_{i : \rho_i > m} e^{-\lambda \rho_i} = e^{-\lambda m} \sum_{i : \rho_i > m} e^{-\lambda (\rho_i - m)} = O\!\big(e^{-\lambda (m + \delta)}\big) = o\!\big(e^{-\lambda m}\big)$$
as $\lambda \to \infty$.
It follows that
$$\frac{1}{N} \sum_{i=1}^{N} e^{-\lambda \rho_i} = \frac{k}{N}\, e^{-\lambda m}\, (1 + o(1)).$$
Substituting this into the definition of the entropic relaxation (3) gives
$$D_{\lambda}(\rho) = -\frac{1}{\lambda} \log\!\left( \frac{k}{N}\, e^{-\lambda m}\, (1 + o(1)) \right) = m + \frac{1}{\lambda} \log \frac{N}{k} + o\!\left(\frac{1}{\lambda}\right).$$
The second claim follows immediately from the definition of the relaxation error (6). □
The asymptotic expression in Proposition 2 clarifies an important point about the geometry of the relaxation error. For fixed $N$, the leading term
$$\frac{1}{\lambda} \log \frac{N}{k}$$
is largest when the minimum is attained at a single index ($k = 1$), and decreases as the number of exact minimizers increases. Thus, from the viewpoint of approximating the minimum value, the most unfavorable discrete geometry is not the presence of many exact minimizers, but rather a sharp configuration with a unique minimizer. This observation helps interpret the numerical experiments. Proposition 2 shows that, as the multiplicity of exact minimizers increases, the value-approximation error becomes smaller. In the experiments, a related effect appears in regimes with many candidates close to the minimum, which empirically tend to require a smaller inverse temperature to satisfy a prescribed tolerance on value approximation, even though they may appear more ambiguous from an action-selection perspective.
These bounds show that the approximation accuracy of the entropic relaxation improves as the inverse temperature parameter λ increases. In discrete settings, the decay rate can be characterized more precisely in terms of the multiplicity of the minimizers, as shown in Proposition 2. This reveals a direct connection between the geometry of the cost function and the sharpness required for the relaxation.
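The $\frac{1}{\lambda}\log\frac{N}{k}$ leading term of Proposition 2 is easy to observe numerically. The following Python sketch (an illustrative cost vector chosen here, not taken from the paper) compares the exact relaxation error against the leading term as $\lambda$ grows:

```python
import numpy as np

def relax_err(rho, lam):
    # E_lam(rho) in min-shifted form (finite domain, uniform weights).
    m = rho.min()
    return -np.log(np.mean(np.exp(-lam * (rho - m)))) / lam

# N = 8 candidates with k = 2 exact minimizers at the minimum value 0.
rho = np.array([0.0, 0.0, 1.0, 1.5, 2.0, 2.0, 3.0, 4.0])
N, k = rho.size, 2

for lam in (10.0, 50.0, 200.0):
    exact = relax_err(rho, lam)
    leading = np.log(N / k) / lam     # Proposition 2 leading term
    print(lam, exact, leading)        # the two columns converge
```

Already at moderate $\lambda$ the non-minimal candidates contribute only exponentially small corrections, so the exact error essentially equals $\frac{1}{\lambda}\log(N/k)$.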
However, the appropriate choice of λ depends on this geometry and is generally unknown in practice.

5. Conformal Calibration

The bounds derived in Section 4 characterize how the approximation error of the entropic relaxation depends on the inverse temperature parameter λ . In practice, however, the geometry of the cost function is typically unknown, and therefore these bounds cannot be used directly to determine an appropriate value of λ . This motivates a data-driven calibration procedure that selects the inverse temperature parameter from observed instances while providing finite-sample guarantees on the resulting relaxation error [19,20,22].
Suppose $\rho^{(1)}, \ldots, \rho^{(n)}$ are observed calibration instances and $\rho^{(n+1)}$ is a future test instance, and assume that
$$\rho^{(1)}, \ldots, \rho^{(n+1)}$$
are exchangeable.
Throughout this section we use superscripts to index problem instances and subscripts to index candidates within a given instance. In particular, $\rho^{(j)}$ denotes the $j$-th observed instance and $\rho_k^{(j)}$ denotes the $k$-th candidate cost within that instance.
For each instance we evaluate the relaxation error introduced in Definition 2. This motivates the following score function.
Definition 3.
For each observed instance $\rho^{(j)}$, define the score
$$S_j(\lambda) = E_{\lambda}(\rho^{(j)}). \tag{15}$$
Definition 4.
For a cost function $\rho$, define
$$\Lambda(\rho) := \inf\{\lambda > 0 : E_{\lambda}(\rho) \leq \tau\}.$$
In other words, Λ ( ρ ) is the smallest inverse temperature for which the relaxation error does not exceed the tolerance τ for that instance.
The score measures the discrepancy between the entropic relaxation and the exact infimum for the observed instance.
Given a candidate value of $\lambda$, we compute the empirical $(1-\alpha)$ quantile of these scores, $\hat{q}_{1-\alpha}(\lambda)$. For a given value of $\lambda$, let
$$S_1(\lambda), \ldots, S_n(\lambda)$$
denote the calibration scores defined in (15). Throughout the paper we assume that
$$\alpha \geq \frac{1}{n+1},$$
and we define the empirical $(1-\alpha)$ quantile as
$$\hat{q}_{1-\alpha}(\lambda) = S_{(k)}(\lambda), \qquad k = \big\lceil (1-\alpha)(n+1) \big\rceil, \tag{17}$$
where $S_{(1)}(\lambda) \leq \cdots \leq S_{(n)}(\lambda)$ denote the ordered scores. Under the assumption $\alpha \geq 1/(n+1)$, we have $k \leq n$, so the empirical quantile is well defined.
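A minimal Python sketch of this empirical quantile, using the conformal index $k = \lceil (1-\alpha)(n+1) \rceil$ rather than a plain sample quantile (illustrative code, not the paper's implementation):

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Order statistic S_(k) with k = ceil((1 - alpha) * (n + 1)).
    Requires alpha >= 1/(n + 1) so that k <= n."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = s.size
    if alpha < 1.0 / (n + 1):
        raise ValueError("alpha < 1/(n+1): quantile index would exceed n")
    k = math.ceil((1.0 - alpha) * (n + 1))
    return s[k - 1]          # order statistics are 1-indexed in the text

scores = np.arange(1.0, 11.0)            # n = 10 scores: 1, 2, ..., 10
print(conformal_quantile(scores, 0.10))  # k = ceil(0.9 * 11) = 10 -> 10.0
print(conformal_quantile(scores, 0.20))  # k = ceil(0.8 * 11) = 9  -> 9.0
```

The $(n+1)$ in the index (rather than $n$) is what yields the finite-sample guarantee of Theorem 4 below.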
The next result shows that this empirical quantile inherits the same monotonicity with respect to λ as the individual relaxation errors.
Lemma 2.
The function
$$\lambda \mapsto \hat{q}_{1-\alpha}(\lambda)$$
is nonincreasing.
Proof. 
By Lemma 1, for each observed instance $\rho^{(i)}$, the relaxation error
$$\lambda \mapsto E_{\lambda}(\rho^{(i)})$$
is nonincreasing. Therefore, for every $i = 1, \ldots, n$ and every $\lambda_1 < \lambda_2$, we have
$$S_i(\lambda_2) \leq S_i(\lambda_1).$$
Hence the entire sample of scores at $\lambda_2$ is componentwise no larger than the sample of scores at $\lambda_1$. It follows that the empirical $(1-\alpha)$ quantile cannot increase, namely
$$\hat{q}_{1-\alpha}(\lambda_2) \leq \hat{q}_{1-\alpha}(\lambda_1).$$
This proves the claim. □
The monotonicity of $\hat{q}_{1-\alpha}(\lambda)$ implies that the constraint
$$\hat{q}_{1-\alpha}(\lambda) \leq \tau$$
defines a threshold condition in $\lambda$, which makes it natural to select the smallest inverse temperature parameter satisfying the prescribed tolerance.
Definition 5.
Given a tolerance $\tau > 0$, define
$$\lambda^* = \inf\big\{\lambda > 0 : \hat{q}_{1-\alpha}(\lambda) \leq \tau\big\}. \tag{18}$$
The selected inverse temperature $\lambda^*$ corresponds to the smallest value of $\lambda$ for which the empirical upper quantile of the relaxation error does not exceed the prescribed tolerance.
The following result establishes a distribution-free guarantee for the resulting inverse temperature selection rule under the standard exchangeability assumption. In this context, exchangeability means that the joint distribution of the sample
$$\rho^{(1)}, \ldots, \rho^{(n+1)}$$
is invariant under permutations of the indices.
Lemma 3.
For $i = 1, \ldots, n$, let
$$\Lambda_i := \Lambda(\rho^{(i)})$$
with $\Lambda(\rho^{(i)})$ defined as in Definition 4, and let
$$\Lambda_{(1)} \leq \cdots \leq \Lambda_{(n)}$$
denote the ordered values of $\Lambda_1, \ldots, \Lambda_n$.
Then the inverse temperature defined in Definition 5 satisfies
$$\lambda^* = \Lambda_{(k)}, \qquad k = \big\lceil (1-\alpha)(n+1) \big\rceil.$$
Proof. 
By definition, for each $j = 1, \ldots, n$,
$$\Lambda_j \leq \lambda \iff E_{\lambda}(\rho^{(j)}) \leq \tau.$$
Therefore the condition
$$\hat{q}_{1-\alpha}(\lambda) \leq \tau$$
is equivalent to requiring that at least $k$ calibration instances satisfy
$$E_{\lambda}(\rho^{(j)}) \leq \tau.$$
Equivalently, at least $k$ instances satisfy
$$\Lambda(\rho^{(j)}) \leq \lambda.$$
The smallest $\lambda$ with this property is precisely $\Lambda_{(k)}$. □
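Lemma 3 suggests a direct implementation route: compute each per-instance threshold $\Lambda_i$ on a grid and take the $k$-th order statistic. A Python sketch under the finite-domain setting, with a grid search standing in for the exact infimum and synthetic Gaussian cost vectors as stand-in calibration data (illustrative parameters, not the paper's experiment):

```python
import math
import numpy as np

def relax_err(rho, lam):
    m = rho.min()
    return -np.log(np.mean(np.exp(-lam * (rho - m)))) / lam

def instance_threshold(rho, grid, tau):
    # Lambda(rho) on a grid: by Lemma 1, E_lam is nonincreasing in lam,
    # so the first grid point satisfying the tolerance is the smallest.
    m = rho.min()
    errs = -np.log(np.exp(-np.outer(grid, rho - m)).mean(axis=1)) / grid
    return grid[np.argmax(errs <= tau)]

def conformal_lambda(instances, grid, tau, alpha):
    # Lemma 3: lambda* is the k-th smallest per-instance threshold.
    thr = np.sort([instance_threshold(r, grid, tau) for r in instances])
    k = math.ceil((1.0 - alpha) * (len(thr) + 1))
    return thr[k - 1]

rng = np.random.default_rng(1)
# Grid reaches log(64)/0.10 ~ 41.6, so every instance has a feasible lambda.
grid = np.linspace(0.5, 120.0, 500)
cal = [rng.normal(size=64) for _ in range(300)]
lam_star = conformal_lambda(cal, grid, tau=0.10, alpha=0.10)
print(lam_star)
```

By construction, at least a $(1-\alpha)$ fraction of the calibration instances satisfy the tolerance at the selected $\lambda^*$; Theorem 4 below extends this to a new exchangeable test instance.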
Theorem 4.
Assume that the sample of cost functions
$$\rho^{(1)}, \ldots, \rho^{(n+1)}$$
is exchangeable and let the calibration scores be defined as in (15). Let $\hat{q}_{1-\alpha}(\lambda)$ denote the empirical quantile defined in (17), and let $\lambda^*$ be the inverse temperature defined in (18). Then the entropic relaxation computed with $\lambda^*$ satisfies
$$P\big( E_{\lambda^*}(\rho^{(n+1)}) \leq \tau \big) \geq 1 - \alpha. \tag{19}$$
Proof. 
Let
$$\Lambda_i = \Lambda(\rho^{(i)}), \qquad i = 1, \ldots, n+1.$$
Because the instances $\rho^{(1)}, \ldots, \rho^{(n+1)}$ are exchangeable, the variables
$$\Lambda_1, \ldots, \Lambda_{n+1}$$
are also exchangeable.
By Lemma 3, the selected inverse temperature satisfies
$$\lambda^* = \Lambda_{(k)},$$
where $\Lambda_{(1)} \leq \cdots \leq \Lambda_{(n)}$ are the ordered calibration values.
By the standard split conformal argument for exchangeable variables,
$$P\big( \Lambda_{n+1} \leq \Lambda_{(k)} \big) \geq 1 - \alpha.$$
By the definition of $\Lambda(\rho)$, the condition
$$\Lambda(\rho^{(n+1)}) \leq \lambda^*$$
implies
$$E_{\lambda^*}(\rho^{(n+1)}) \leq \tau.$$
Therefore
$$P\big( E_{\lambda^*}(\rho^{(n+1)}) \leq \tau \big) \geq 1 - \alpha. \qquad \square$$
The probability in (19) is taken over the joint randomness of the calibration sample and the test instance. Thus, the guarantee is marginal with respect to the data-generating process and does not hold conditionally on the realized calibration sample, which is the standard notion of validity in split conformal inference [19,20,21,22,39].
Remark 1.
Theorem 4 provides a distribution-free certificate for the approximation accuracy of the entropic relaxation. The guarantee holds under exchangeability of the observed instances and does not rely on parametric assumptions on the underlying distribution of cost functions.
The selected parameter λ * can therefore be interpreted as the smallest inverse temperature for which the relaxation error satisfies the tolerance constraint with probability at least 1 α under the exchangeability assumption.
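The marginal nature of the guarantee can be checked by Monte Carlo: repeatedly draw a fresh calibration set plus one test instance from the same distribution, calibrate $\lambda^*$, and record whether the test relaxation error meets the tolerance. A Python sketch with synthetic Gaussian cost vectors (illustrative sizes and parameters, not the paper's experimental setup):

```python
import math
import numpy as np

def relax_err(rho, lam):
    m = rho.min()
    return -np.log(np.mean(np.exp(-lam * (rho - m)))) / lam

def instance_threshold(rho, grid, tau):
    m = rho.min()
    errs = -np.log(np.exp(-np.outer(grid, rho - m)).mean(axis=1)) / grid
    return grid[np.argmax(errs <= tau)]   # smallest grid lambda meeting tau

rng = np.random.default_rng(2)
grid = np.linspace(0.5, 120.0, 100)       # reaches log(16)/0.10, so feasible
tau, alpha, n = 0.10, 0.10, 50
hits = []
for _ in range(200):
    batch = rng.normal(size=(n + 1, 16))  # n calibration rows + 1 test row
    thr = np.sort([instance_threshold(r, grid, tau) for r in batch[:n]])
    k = math.ceil((1.0 - alpha) * (n + 1))
    lam_star = thr[k - 1]
    hits.append(relax_err(batch[n], lam_star) <= tau)
print(np.mean(hits))   # empirical coverage, close to the 0.9 target
```

Because the guarantee is marginal, coverage is measured over the joint draw of calibration set and test instance, exactly as in this loop; conditioning on one fixed calibration sample would not be covered by Theorem 4.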
Remark 2.
The calibrated inverse temperature λ * is a single value determined by the calibration sample. Therefore, the proposed method is adaptive at the level of the calibration population, or at the regime level when calibration is performed separately across regimes. It does not produce an instance-dependent inverse temperature for each new test cost function. Extending the method toward covariate-aware or conditional calibration for individual instances is left for future work.

6. Numerical Experiments

This section presents two numerical experiments. The first considers a heterogeneous benchmark with prescribed candidate-set geometries. The second considers an adaptive cruise control problem with safety filtering, namely, the exclusion of candidate control sequences whose predicted rollouts violate the imposed safety constraints, and uncertain lead-vehicle prediction.
All simulations were implemented in MATLAB R2025b using fixed random seeds and paired evaluation across methods. In particular, whenever several methods are compared on the same episode, they are evaluated on the same initial state, the same realization of the exogenous process, and the same candidate set. This paired design eliminates Monte Carlo variability from the comparison and ensures that differences in performance are attributable only to the selected value of λ .
The coverage guarantees established in Theorem 4 hold under the exchangeability assumption on the sequence of cost functions, and the experiments are conducted mainly in this setting; Experiment 1 also includes a shifted evaluation that departs from it.
Figure 1 summarizes the experimental pipeline used in the numerical studies. Throughout the experiments, we consider the finite-domain version of the entropic relaxation
$$D_{\lambda}(\rho) = -\frac{1}{\lambda} \log\!\left( \frac{1}{N} \sum_{i=1}^{N} e^{-\lambda \rho_i} \right), \qquad E_{\lambda}(\rho) = D_{\lambda}(\rho) - \min_i \rho_i,$$
where $N$ denotes the number of feasible candidates in the current batch after the safety filtering step. Table 1 summarizes the inverse-temperature selection methods compared in the numerical experiments.
Unless otherwise stated, the quality of the selected inverse temperature is assessed through the empirical coverage
$$P\big( E_{\lambda}(\rho) \leq \tau \big),$$
the mean and upper quantiles of $E_{\lambda}$, and the absolute distance to the relevant oracle inverse temperature.
In the experiments below, adaptation is implemented at the regime level by calibrating separate inverse temperatures on regime-specific calibration samples. Thus, the observed adaptivity is distributional rather than instance-conditional.

6.1. Experiment 1: Heterogeneous Control Benchmark

We first consider a heterogeneous benchmark in which the geometry of the planner score vectors is prescribed explicitly. The purpose is to examine how the calibrated inverse temperature varies across regimes with different candidate-set geometries.

6.1.1. Setup

The first experiment considers a control-oriented benchmark in which the geometry of the planner score vectors is imposed explicitly. The underlying plant is the discrete-time linear system
$$x_{t+1} = A x_t + B u_t + w_t,$$
with
$$A = \begin{bmatrix} 1.0 & 0.18 \\ 0.0 & 1.0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.0 \\ 0.18 \end{bmatrix},$$
horizon length $T = 30$, action bound $|u_t| \leq U_{\max}$ with $U_{\max} = 2.5$, stage cost weights
$$Q = \mathrm{diag}(5.5,\, 0.9), \qquad R = 0.05,$$
and terminal cost
$$Q_f = \mathrm{diag}(8.0,\, 1.2).$$
The initial state is sampled as
$$x_0 = \begin{bmatrix} x_{0,1} \\ x_{0,2} \end{bmatrix}, \qquad x_{0,1} \sim \mathrm{Unif}[-r_x, r_x], \qquad x_{0,2} \sim \mathrm{Unif}[-r_v, r_v],$$
and the goal is sampled as
$$g \sim \mathcal{N}(0, \sigma_g^2).$$
The process noise has the form
$$w_t = \begin{bmatrix} 0 \\ \xi_t \end{bmatrix}, \qquad \xi_t \sim \mathcal{N}(0, \sigma_w^2),$$
with regime-dependent standard deviation $\sigma_w$.
The corresponding quadratic closed-loop cost functional is
$$J_{\mathrm{cl}} = \sum_{t=0}^{T-1} \Big[ (x_t - x_{\mathrm{ref}})^{\top} Q\, (x_t - x_{\mathrm{ref}}) + R\, u_t^2 \Big] + (x_T - x_{\mathrm{ref}})^{\top} Q_f\, (x_T - x_{\mathrm{ref}}),$$
where $x_{\mathrm{ref}} = (g, 0)^{\top}$ and $g$ denotes the goal position. This functional is used to evaluate the resulting closed-loop trajectories. As described below, the planner score vectors employed in the entropic soft-min relaxation are generated synthetically in order to control the geometry of the candidate set.
At each time step, a bank of $M = 64$ candidate actions is generated around a nominal proportional-derivative controller used only to define the center of the candidate bank. The nominal control law is
$$u_t^{\mathrm{nom}} = \mathrm{sat}_{[-U_{\max}, U_{\max}]}\big( -1.35\,(x_{t,1} - g) - 0.90\, x_{t,2} \big),$$
where $g$ denotes the goal position. Equivalently, the controller gains are $K_p = 1.35$ and $K_v = 0.90$. The same gains are used in all regimes and for all compared methods, and no additional gain-optimization step is performed in the reported experiments. Thus, the role of the nominal PD controller is not to provide a competing benchmark, but rather to generate a simple and interpretable reference command that drives the state toward the sampled goal while damping velocity. The candidate actions are sampled as
$$u_t^{(i)} = u_t^{\mathrm{nom}} + \eta_i, \qquad \eta_i \sim \mathcal{N}(0, \sigma_u^2),$$
with regime-dependent action dispersion $\sigma_u$.
The planner score vectors used in the experiment are constructed synthetically in order to control the geometry of the candidate set. In particular, the vector
$$\rho = (\rho_1, \ldots, \rho_M)$$
is generated directly from prescribed gap distributions rather than being derived from a simulated optimal control cost. This design isolates the behavior of the entropic soft-min operator and allows the ambiguity structure of the candidate set to be controlled explicitly.
The planner score vector is constructed explicitly in order to control the ambiguity structure. For each candidate bank, one candidate is assigned zero excess cost, a prescribed number of candidates are assigned near-minimizer gaps, and the remaining candidates are assigned far gaps. More precisely, if
$$\rho = (\rho_1, \ldots, \rho_M) \in \mathbb{R}^M$$
denotes the planner score vector, then one entry is set to the minimum value, a subset of cardinality $k_{\mathrm{near}}$ is assigned gaps sampled uniformly from a regime-dependent interval $[\underline{\delta}_{\mathrm{near}}, \bar{\delta}_{\mathrm{near}}]$, and the remaining entries are assigned gaps sampled uniformly from $[\underline{\delta}_{\mathrm{far}}, \bar{\delta}_{\mathrm{far}}]$. Lower scores are preferentially assigned to actions closer to the nominal command. The strength of this alignment is controlled by a regime-dependent parameter $a_{\mathrm{align}} \in [0, 1]$: values close to one strongly align the best planner scores with actions near $u_t^{\mathrm{nom}}$, whereas smaller values introduce weaker alignment and therefore more ambiguous soft selection.
To implement the alignment between planner scores and the nominal action, let
$$d_i = \big| u_t^{(i)} - u_t^{\mathrm{nom}} \big|$$
denote the distance of candidate $i$ from the nominal action. Let $\pi_{\mathrm{dist}}$ be the permutation that sorts candidates by increasing $d_i$, and let $\pi_{\mathrm{rand}}$ be a random permutation of $\{1, \ldots, M\}$.
The final permutation used to assign the score gaps is obtained through a convex ranking mixture controlled by the alignment parameter $a_{\mathrm{align}} \in [0, 1]$:
$$r_i = a_{\mathrm{align}}\, \mathrm{rank}_{\pi_{\mathrm{dist}}}(i) + (1 - a_{\mathrm{align}})\, \mathrm{rank}_{\pi_{\mathrm{rand}}}(i).$$
Candidates are then ordered by increasing $r_i$, and the generated score gaps are assigned following this order. In the rare case of ties in $r_i$, ties are broken uniformly at random. Thus, when $a_{\mathrm{align}} = 1$ the smallest planner scores correspond exactly to candidates closest to the nominal action, whereas when $a_{\mathrm{align}} = 0$ the assignment is random.
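The assignment step above can be sketched in Python with illustrative sizes and gap ranges (the paper's implementation is in MATLAB; for brevity this sketch uses one exact minimizer and far gaps only, omitting the near-gap subset):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 8
u_nom = 0.0
u = u_nom + rng.normal(0.0, 0.75, size=M)     # candidate actions

d = np.abs(u - u_nom)                         # distance to nominal action
rank_dist = np.empty(M)
rank_dist[np.argsort(d)] = np.arange(M)       # rank 0 = closest to u_nom
rank_rand = rng.permutation(M).astype(float)  # rank 0 = first in random order

a_align = 0.75                                # convex ranking mixture weight
r = a_align * rank_dist + (1.0 - a_align) * rank_rand

# Sorted gaps: one exact minimizer (gap 0) plus "far" gaps; gaps are
# assigned to candidates in order of increasing mixed rank r.
order = np.argsort(r)
gaps = np.concatenate(([0.0], np.sort(rng.uniform(0.9, 3.2, size=M - 1))))
rho = np.empty(M)
rho[order] = gaps

print(int(np.argmin(rho)), int(order[0]))  # the minimizer is order[0]
```

With `a_align = 1.0` the candidate closest to `u_nom` always receives the zero gap, while `a_align = 0.0` assigns gaps at random, reproducing the two extremes described in the text.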
The three regimes are defined by the following parameters:
  • Easy: $\sigma_w = 0.05$, $\sigma_g = 0.40$, $r_x = 3.5$, $r_v = 1.0$, $\sigma_u = 1.00$, $k_{\mathrm{near}} = 1$, $[\underline{\delta}_{\mathrm{near}}, \overline{\delta}_{\mathrm{near}}] = [0.00, 0.010]$, $[\underline{\delta}_{\mathrm{far}}, \overline{\delta}_{\mathrm{far}}] = [1.50, 5.50]$, $a_{\mathrm{align}} = 1.00$.
  • Moderate: $\sigma_w = 0.08$, $\sigma_g = 0.90$, $r_x = 4.0$, $r_v = 1.2$, $\sigma_u = 0.75$, $k_{\mathrm{near}} = 6$, $[\underline{\delta}_{\mathrm{near}}, \overline{\delta}_{\mathrm{near}}] = [0.00, 0.050]$, $[\underline{\delta}_{\mathrm{far}}, \overline{\delta}_{\mathrm{far}}] = [0.90, 3.20]$, $a_{\mathrm{align}} = 0.75$.
  • Hard: $\sigma_w = 0.12$, $\sigma_g = 1.40$, $r_x = 4.8$, $r_v = 1.5$, $\sigma_u = 0.55$, $k_{\mathrm{near}} = 24$, $[\underline{\delta}_{\mathrm{near}}, \overline{\delta}_{\mathrm{near}}] = [0.00, 0.090]$, $[\underline{\delta}_{\mathrm{far}}, \overline{\delta}_{\mathrm{far}}] = [0.25, 1.60]$, $a_{\mathrm{align}} = 0.45$.
For calibration, the inverse temperature is searched over the grid
$\lambda \in [0.5, 120]$
using 500 equally spaced values. The target coverage is $1 - \alpha = 0.9$ with tolerance $\tau = 0.10$. The global and regime-specific calibration sets each use 300 score vectors per regime, and the oracle calibration set uses 600 score vectors per regime. Evaluation is performed on 140 paired episodes per regime. The global fixed baseline is calibrated by pooling the regime-specific calibration samples, the mean-risk baseline uses the same pooled set, and the worst-case baseline is defined by
$\lambda = \log M / \tau, \qquad M = 64.$
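This rule can be read off from the standard log-sum-exp bound; a one-line check, assuming the relaxation error is the gap between the entropic soft-min and the exact minimum as used throughout the paper:

```latex
E_\lambda(\rho)
  \;=\; \min_i \rho_i + \frac{1}{\lambda}\log\sum_{i=1}^{M} e^{-\lambda \rho_i}
  \;=\; \frac{1}{\lambda}\log\sum_{i=1}^{M} e^{-\lambda\,(\rho_i - \min_j \rho_j)}
  \;\le\; \frac{\log M}{\lambda},
```

since each summand is at most one. Setting $\lambda = \log M / \tau$ therefore guarantees $E_\lambda(\rho) \le \tau$ for every score vector; with $M = 64$ and $\tau = 0.10$ this gives $\lambda \approx 41.589$, consistent with the worst-case distances reported in Table 3.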
Given a score vector ρ and candidate actions u t ( 1 ) , , u t ( M ) , the control applied to the plant is obtained through Gibbs aggregation,
$u_t = \sum_{i=1}^{M} \frac{\exp(-\lambda \rho_i)}{\sum_{j=1}^{M} \exp(-\lambda \rho_j)}\, u_t^{(i)}.$
For each evaluation batch we generate a fresh score vector, compute the relaxation error $E_\lambda(\rho)$, and record the indicator $\mathbf{1}\{E_\lambda(\rho) \le \tau\}$, where $\mathbf{1}\{\cdot\}$ denotes the indicator function. The reported coverage corresponds to the empirical probability of this event over all evaluation batches. This evaluation protocol matches the exchangeable sampling assumption used in Theorem 4.
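The calibration rule and the relaxation error can be sketched as follows. The closed form of $E_\lambda$ and the split-conformal rank $\lceil (n+1)(1-\alpha) \rceil$ are standard reconstructions (neither is restated in this section), and the function names are ours.

```python
import numpy as np

def relaxation_error(rho, lam):
    """E_lambda(rho) = min_i rho_i - softmin_lambda(rho), with
    softmin_lambda(rho) = -(1/lam) * log(sum_i exp(-lam * rho_i)).
    Shifting by the minimum keeps all exponentials in [0, 1]."""
    g = rho - rho.min()                          # nonnegative gaps
    return np.log(np.exp(-lam * g).sum()) / lam  # always in [0, log(M)/lam]

def conformal_lambda(cal_scores, lam_grid, alpha=0.1, tau=0.10):
    """Smallest lambda on the grid whose conformal (1 - alpha) quantile of
    the relaxation error over the calibration sample is <= tau."""
    n = len(cal_scores)
    k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)  # 0-based rank
    for lam in lam_grid:
        errs = np.sort([relaxation_error(rho, lam) for rho in cal_scores])
        if errs[k] <= tau:  # errors decrease in lambda: first hit is smallest
            return lam
    return lam_grid[-1]
```

Because $E_\lambda \le \log M / \lambda$, the worst-case choice $\lambda = \log M / \tau$ always satisfies the tolerance, whereas the conformal selector exploits the calibration sample to return a smaller, less conservative value.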

6.1.2. Results

Table 2 reports the calibrated inverse temperatures. The oracle values decrease from the easy regime to the hard regime. The proposed conformal selector matches the oracle values in all three regimes, whereas the global, mean-risk, and worst-case baselines deviate substantially, especially in the moderate and hard regimes.
The decrease of the oracle calibration inverse temperature from the easy regime to the hard regime is consistent with the fact that the approximation error concerns the minimum value rather than the identity of the minimizing action: as shown in Proposition 2, when several candidates attain or nearly attain the minimum value, the bias of the entropic relaxation decreases, so a smaller inverse temperature suffices to satisfy the prescribed tolerance on value approximation.
A regime-wise comparison of distances to the oracle is given in Table 3. The global baseline is nearly correct in the easy regime, but becomes markedly overconservative in the moderate and hard regimes. The worst-case baseline is consistently the most conservative, while the mean-risk baseline fails to track the oracle in a regime-dependent manner. By contrast, the proposed conformal selector coincides with the oracle by construction up to numerical resolution.
At the aggregate level, the proposed method attains empirical coverage 0.913 , close to the target value 0.9 , whereas the mean-risk baseline undercovers and the worst-case baseline overcovers. Coverage is computed over independently generated score vectors.
These results show that the proposed selector tracks the oracle calibration inverse temperature across regimes, whereas a single global inverse temperature does not.
Figure 2 summarizes the main aggregate metrics, and Figure 3 shows the regime-specific calibration curves $\hat{q}_{1-\alpha}(\lambda)$ together with the selected inverse temperatures. The crossing point with the tolerance $\tau$ occurs at different values of $\lambda$ in the three regimes, and the proposed method tracks these values accurately.

6.1.3. Shifted Evaluation and the Role of Exchangeability

We also consider a shifted evaluation derived from the moderate regime. In this setting, the test score vectors are generated from a sharper candidate geometry than those used in calibration, so the exchangeability assumption is violated.
More precisely, the conformal inverse temperature is calibrated on the moderate regime and then evaluated on a shifted regime whose oracle inverse temperature is substantially larger:
$\lambda_{\mathrm{prop}}^{\mathrm{moderate}} = 29.2375, \qquad \lambda_{\mathrm{oracle}}^{\mathrm{shift}} = 35.9429.$
Thus, the inverse temperature inherited from the calibration regime is too small for the shifted test distribution.
Table 4 reports the corresponding results. The empirical coverage of ProposedConf falls well below the nominal level in the shifted regime, with
$\mathbb{P}\big(E_\lambda(\rho) \le \tau\big) = 0, \qquad q_{0.95}(E_\lambda) = 0.1221 > \tau = 0.10,$
whereas the shift-specific oracle restores near-nominal behavior. This behavior is consistent with Theorem 4, whose guarantee is established under exchangeability. When the test distribution differs from the calibration distribution, the nominal coverage level need not be preserved.

6.2. Experiment 2: Application-Oriented Adaptive Cruise Control Benchmark

We next consider a longitudinal adaptive cruise control (ACC) problem with uncertain lead-vehicle prediction and safety filtering [10,27,28,29]. This experiment is used to assess the proposed calibration rule in a closed-loop control setting.
The ACC problem is chosen here because it combines three ingredients that make the soft-min inverse temperature meaningful in practice:
1. a finite set of candidate trajectories or control sequences;
2. uncertain predictive scores induced by the forecast of the lead vehicle;
3. a nontrivial trade-off among safety, comfort, and tracking performance [27,29,40].
This experiment compares the calibration-oriented inverse temperature with the inverse temperature preferred by closed-loop performance.

6.2.1. Setup

Let $d_t$ denote the distance to the lead vehicle, $v_t^e$ the ego speed, and $v_t$ the lead speed. The ego vehicle dynamics are modeled in discrete time using a standard longitudinal ACC kinematic model [27,29] as
$v_{t+1}^e = \max\{v_t^e + \Delta t\, u_t,\ 0\}, \qquad d_{t+1} = d_t + \Delta t\,(v_t - v_t^e),$
with sampling time $\Delta t = 0.2$ s and acceleration command
$u_t \in [u_{\min}, u_{\max}], \qquad u_{\min} = -4.5, \quad u_{\max} = 2.5.$
The planning horizon is H = 16 , the episode length is T = 60 , and the number of candidate sequences generated at each decision step is M = 100 . The reference speed is v ref = 20 m/s.
The desired following distance is
$d_{\mathrm{safe}}(v_t^e) = 8.0 + 1.1 \max\{v_t^e, 0\},$
and a hard minimum distance
$d_{\min} = 3.0$ m
is enforced throughout. The margin term uses
$\varepsilon = 0.35.$
At each decision step, infeasible candidates are removed by the safety filter described below, so the effective number of candidates may be smaller than $M$. Throughout the experiment we denote by $N \le M$ the number of feasible candidates that remain after filtering, which may vary across time steps and episodes.
A nominal acceleration is computed as
$u_t^{\mathrm{nom}} = \mathrm{sat}_{[u_{\min},\, u_{\max}]}\Big( 0.15\,\big(d_t - d_{\mathrm{safe}}(v_t^e)\big) + 0.50\,(v_t - v_t^e) + 0.20\,(v_{\mathrm{ref}} - v_t^e) \Big).$
Around this nominal command, M candidate sequences of length H are generated by adding Gaussian perturbations and then applying temporal smoothing. More precisely, each candidate sequence is first sampled as
$u_{t:t+H-1}^{(i)} = u_t^{\mathrm{nom}} \mathbf{1} + \eta^{(i)}, \qquad \eta_k^{(i)} \sim \mathcal{N}(0, \sigma_{\mathrm{cand}}^2),$
where $\sigma_{\mathrm{cand}}$ is regime-dependent. The resulting sequence is then smoothed by a moving-average filter of window length three and finally saturated componentwise to the interval $[u_{\min}, u_{\max}]$.
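Putting the nominal command and the candidate generator together gives the following sketch. The sign convention $u_{\min} = -4.5$ for braking and the `"same"`-mode handling of the moving-average boundary are our assumptions; the paper does not specify the filter's edge behavior.

```python
import numpy as np

# Assumed sign convention: u_min = -4.5 (braking), u_max = 2.5 (throttle).
U_MIN, U_MAX, H = -4.5, 2.5, 16

def nominal_acceleration(d, v_e, v_lead, v_ref=20.0):
    """Saturated linear feedback on gap error, relative speed, and speed
    error, mirroring the nominal-command expression above."""
    d_safe = 8.0 + 1.1 * max(v_e, 0.0)
    u = 0.15 * (d - d_safe) + 0.50 * (v_lead - v_e) + 0.20 * (v_ref - v_e)
    return float(np.clip(u, U_MIN, U_MAX))

def candidate_sequences(u_nom, M, sigma_cand, rng):
    """Gaussian perturbations around the nominal command, smoothed by a
    window-3 moving average, then saturated componentwise."""
    cand = u_nom + rng.normal(0.0, sigma_cand, size=(M, H))
    kernel = np.ones(3) / 3.0
    cand = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, cand)
    return np.clip(cand, U_MIN, U_MAX)
```

The smoothing step reduces high-frequency chatter in the sampled sequences before saturation, which is why the candidates remain comfortable to track.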
The lead-vehicle prediction is generated over the horizon by a stochastic acceleration model. Starting from the current lead speed v t , the predicted lead trajectory is propagated as
$v_{t+k+1} = \max\{v_{t+k} + \Delta t\, a_{t+k},\ 0\},$
where the acceleration a t + k is sampled according to the current traffic regime. At each prediction step, the lead vehicle may undergo nominal fluctuations, stop-and-go behavior, or hard braking events, with regime-dependent probabilities and magnitudes. Prediction uncertainty is incorporated by adding Gaussian noise with regime-dependent standard deviation.
Conformal calibration is performed only on feasible candidate sets. For a candidate sequence to be feasible, its predicted rollout must satisfy
$d_{t+k} > d_{\min}, \qquad k = 0, \ldots, H-1,$
along the entire prediction horizon. Infeasible candidates are discarded before the soft-min aggregation, calibration, and evaluation are applied. If no feasible candidate exists ($N = 0$), emergency braking $u_t = u_{\min}$ is applied; in this case the planner score vector $\rho$ is not defined and the entropic relaxation is not evaluated, so such time steps are excluded from the coverage computation of $\mathbb{P}(E_\lambda(\rho) \le \tau)$. Consequently, the reported coverage is conditional on the event $N \ge 1$. We also report the empirical frequency of $N = 0$ events.
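The safety filter can be sketched as a rollout check of the kinematic model against the predicted lead speeds. Names are illustrative; checking the gap at the current step $k = 0$ before applying the input is one reasonable reading of the horizon condition.

```python
# Feasibility check: the predicted gap must stay above d_min at every
# step k = 0, ..., H-1 of the horizon, using the kinematic model above.
D_MIN, DT = 3.0, 0.2

def rollout_feasible(u_seq, d0, v_e0, v_lead_pred):
    d, v_e = d0, v_e0
    for k, u in enumerate(u_seq):
        if d <= D_MIN:                        # gap condition at step k
            return False
        v_e = max(v_e + DT * u, 0.0)          # ego speed update
        d = d + DT * (v_lead_pred[k] - v_e)   # gap update
    return True

def feasible_set(candidates, d0, v_e0, v_lead_pred):
    """Indices of feasible candidates; soft-min aggregation, calibration,
    and evaluation are restricted to this set."""
    return [i for i, u in enumerate(candidates)
            if rollout_feasible(u, d0, v_e0, v_lead_pred)]
```

When `feasible_set` returns an empty list, the fallback described in the text applies the emergency command and the time step is excluded from coverage statistics.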
For each feasible candidate, the predictive score used by the planner is
$J_{\mathrm{pred}}^{(i)} = \sum_{k=0}^{H-1} \bigg[ w_{\mathrm{track}}\,(v_{\mathrm{ref}} - v_{t+k}^e)^2 + w_u\,\big(u_{t+k}^{(i)}\big)^2 + w_{\Delta u}\,\big(u_{t+k}^{(i)} - u_{t+k-1}^{(i)}\big)^2 + \frac{w_{\mathrm{margin}}}{d_{t+k} - d_{\min} + \varepsilon} \bigg],$
with weights
$w_{\mathrm{track}} = 1.2, \quad w_u = 0.06, \quad w_{\Delta u} = 0.45, \quad w_{\mathrm{margin}} = 3.5.$
In (28), the smoothness term is evaluated with the convention
$u_{t-1}^{(i)} := u_t^{(i)},$
so that the contribution at $k = 0$ is zero. Equivalently, the smoothness penalty acts only on increments within the predicted sequence.
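A sketch of the predictive score for one feasible candidate follows. Reading the margin term as the reciprocal barrier $w_{\mathrm{margin}}/(d - d_{\min} + \varepsilon)$ is our interpretation of the flattened expression; the $u_{-1} := u_0$ convention comes directly from the text.

```python
import numpy as np

W_TRACK, W_U, W_DU, W_MARGIN = 1.2, 0.06, 0.45, 3.5
D_MIN, EPS, V_REF = 3.0, 0.35, 20.0

def predictive_score(u_seq, v_e_traj, d_traj):
    """J_pred for one feasible candidate over the horizon. The margin term
    is assumed to be the barrier w_margin / (d - d_min + eps); setting
    u_{-1} := u_0 makes the k = 0 smoothness increment zero."""
    u_prev = np.concatenate([[u_seq[0]], u_seq[:-1]])  # u_{-1} := u_0
    return float(np.sum(
        W_TRACK * (V_REF - v_e_traj) ** 2      # speed tracking
        + W_U * u_seq ** 2                     # control effort
        + W_DU * (u_seq - u_prev) ** 2         # smoothness (increments only)
        + W_MARGIN / (d_traj - D_MIN + EPS)))  # distance-margin barrier
```

Note that a constant input sequence incurs no smoothness cost, which is exactly the behavior the $u_{-1} := u_0$ convention is designed to produce.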
Let $\rho_i = J_{\mathrm{pred}}^{(i)}$ denote the score of the $i$-th feasible candidate. The corresponding Gibbs weights are defined as
$p_i(\lambda) = \frac{\exp(-\lambda \rho_i)}{\sum_{j=1}^{N} \exp(-\lambda \rho_j)}.$
The control command applied to the system is obtained through a soft-argmin aggregation of the first control inputs of the feasible candidate sequences,
$u_t = \sum_{i=1}^{N} p_i(\lambda)\, u_t^{(i)}.$
This construction corresponds to the Gibbs policy associated with the entropic relaxation and yields a smooth interpolation between averaging ($\lambda$ small) and hard minimum selection ($\lambda \to \infty$). If the feasible set is empty, the fallback control is $u_t = u_{\min}$.
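The Gibbs aggregation admits a short, numerically stable implementation (function name ours); subtracting the minimum score before exponentiating prevents underflow of all weights at large $\lambda$.

```python
import numpy as np

def soft_argmin_control(rho, first_inputs, lam):
    """Gibbs weights p_i proportional to exp(-lam * rho_i), applied to the
    first control inputs of the feasible candidates. Shifting by min(rho)
    keeps at least one weight equal to one for any lam >= 0."""
    w = np.exp(-lam * (rho - rho.min()))
    p = w / w.sum()
    return float(p @ first_inputs)
```

At $\lambda = 0$ this returns the plain average of the candidate inputs, and as $\lambda$ grows it converges to the input of the minimum-score candidate, illustrating the interpolation described above.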
The real closed-loop cost uses the same structure as (28), with the addition of a collision penalty
$w_{\mathrm{coll}} = 2 \times 10^4$
whenever $d_t \le d_{\min}$, and a terminal penalty
$3\,(v_{\mathrm{ref}} - v_T^e)^2.$
Three lead-vehicle regimes are considered.
  • Easy: $\mu_v = 21.0$, $\sigma_v = 1.0$, $\sigma_a = 0.15$, $\sigma_{\mathrm{pred}} = 0.12$, $\sigma_{\mathrm{cand}} = 0.35$, with stop-and-go probability 0, hard-brake probability 0, hard-brake mean 1.2, target number of near-minimizers 1, near-gap range $[0.00, 0.015]$, mid-gap range $[0.20, 0.60]$, far-gap range $[1.20, 2.50]$, initial gap range $[18, 28]$, and ego-speed bias 0.0.
  • Moderate: $\mu_v = 16.0$, $\sigma_v = 2.0$, $\sigma_a = 0.45$, $\sigma_{\mathrm{pred}} = 0.45$, $\sigma_{\mathrm{cand}} = 0.55$, with stop-and-go probability 0.15, hard-brake probability 0.05, hard-brake mean 2.4, target number of near-minimizers 6, near-gap range $[0.00, 0.045]$, mid-gap range $[0.12, 0.45]$, far-gap range $[0.70, 1.80]$, initial gap range $[12, 22]$, and ego-speed bias 0.5.
  • Hard: $\mu_v = 11.0$, $\sigma_v = 2.8$, $\sigma_a = 0.85$, $\sigma_{\mathrm{pred}} = 0.90$, $\sigma_{\mathrm{cand}} = 0.85$, with stop-and-go probability 0.40, hard-brake probability 0.15, hard-brake mean 3.8, target number of near-minimizers 18, near-gap range $[0.00, 0.090]$, mid-gap range $[0.08, 0.28]$, far-gap range $[0.25, 0.80]$, initial gap range $[10, 18]$, and ego-speed bias 1.2.
The initial gap, lead speed, and ego speed are sampled from these regime-dependent distributions. For reproducibility, the calibration set sizes are 260 samples per regime for global and local calibration and 520 samples per regime for OracleCal, while the evaluation uses 140 paired episodes per regime. The calibration grid is
$\lambda \in [2, 80],$
and the oracle-performance grid is restricted to
$\lambda \in [8, 60],$
with an additional minimum coverage constraint of 0.75 .
Unless otherwise stated, the conformal calibration in this experiment uses target coverage level $1 - \alpha = 0.9$ and tolerance $\tau = 0.11$ for the relaxation error $E_\lambda(\rho)$. In particular, the worst-case baseline is defined as
$\lambda = \log M / \tau.$
For $M = 100$ and $\tau = 0.11$, this yields $\lambda = 41.8652$.
The application-level performance metrics recorded in the ACC experiment are summarized in Table 5. At the operator level, we also report empirical coverage, mean relaxation error, the empirical 0.95 quantile of $E_\lambda$, and the distances to OracleCal and OraclePerf.

6.2.2. Results

The calibrated inverse temperatures are reported in Table 6. The oracle-performance inverse temperatures are selected on the restricted grid $\lambda \in [8, 60]$ subject to a minimum empirical coverage level of 0.75. The oracle calibration inverse temperatures are well separated across regimes:
$\lambda_{\mathrm{oracle\,cal}}^{\mathrm{easy}} = 37.7422, \qquad \lambda_{\mathrm{oracle\,cal}}^{\mathrm{moderate}} = 30.1098, \qquad \lambda_{\mathrm{oracle\,cal}}^{\mathrm{hard}} = 21.7327.$
The proposed conformal selector closely tracks these values in all three regimes, whereas the global, mean-risk, and worst-case baselines remain substantially misaligned, especially in the hard regime. The spread of the oracle calibration inverse temperatures is 16.0095 .
Figure 4 shows the regime-specific calibration curves. Table 7 and Table 8 report aggregate results across all regimes. The oracle-performance inverse temperature is computed on a validation set, whereas the results reported in Table 7 and Table 8 are obtained on an independent test set. Therefore, OraclePerf need not minimize the reported test cost.
The ACC experiment shows that the proposed selector remains close to the oracle calibration inverse temperature and attains coverage close to the target level. It also shows that the inverse temperature selected for calibration need not coincide with the one preferred by closed-loop performance (see Figure 5). In the easy and moderate regimes, the performance-oriented value lies near the upper end of the admissible range, whereas in the hard regime a smaller value is preferred.

7. Concluding Remarks

This paper studied the principled selection of the inverse temperature parameter in entropic soft-min relaxations. Starting from the definition of the operator, structural properties of the associated relaxation error were established, including nonnegativity, monotonicity with respect to the inverse temperature parameter, and approximation bounds in finite and asymptotic regimes.
On this basis, a conformal calibration procedure was introduced to select the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample distribution-free validity. The resulting rule provides an explicit certificate on the approximation quality of the entropic relaxation under exchangeability of the observed instances.
The numerical experiments support the main claims of the paper. The heterogeneous control-oriented benchmark shows that the proposed conformal selector accurately tracks the oracle calibration inverse temperature in non-homogeneous settings where a single global inverse temperature is inadequate. In the same benchmark, an additional shifted evaluation illustrates that the finite-sample guarantee is tied to the exchangeable setting of Theorem 4: when the test distribution departs from the calibration distribution, the nominal coverage level is no longer guaranteed. The adaptive cruise control experiment demonstrates that the in-distribution behavior of the proposed method persists in a realistic control scenario with explicit safety filtering and uncertain prediction, thereby establishing practical relevance beyond synthetic operator tests.
At the same time, the application experiment clarifies an important conceptual point: an inverse temperature that is optimal for certifying the approximation quality of the entropic soft-min operator is not necessarily identical to the inverse temperature that minimizes the final task-level cost. This distinction does not weaken the role of conformal calibration; rather, it clarifies its purpose. The proposed method provides a certified, distribution-free, and regime-adaptive mechanism for selecting the soft-min inverse temperature itself, which can then be incorporated into broader optimization and control architectures.
The present method selects a single inverse temperature for a given calibration population, or a single inverse temperature per regime when regime-specific calibration is used. Extending this framework toward instance-conditional or covariate-aware conformal calibration constitutes a natural next step, especially in settings where side information is available at test time and finer-grained adaptation is desirable.

Author Contributions

Conceptualization, J.E.S.; Software, A.F.-F.; Formal analysis, J.E.S.; Investigation, J.E.S. and A.F.-F.; Data curation, A.F.-F.; Writing—original draft, J.E.S.; Writing—review and editing, J.E.S. and A.F.-F.; Supervision, J.E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from the European Union’s DIGITAL Europe Programme under Grant Agreement No. 101226207 (project AI-SECRETT), from the Spanish Government under Grant PID2024-156583OB-I00 (funded by MCIN/AEI/10.13039/501100011033), and from the Generalitat Valenciana under Grant CIGE/2024/195.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nesterov, Y. Smooth minimization of non-smooth functions. Math. Program. 2005, 103, 127–152. [Google Scholar] [CrossRef]
  2. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004; Available online: https://books.google.es/books?id=mYm0bLd3fcoC (accessed on 30 March 2026).
  3. Nishioka, A.; Kanno, Y. A feasible smoothing accelerated projected gradient method for nonsmooth convex optimization. Oper. Res. Lett. 2024, 57, 107181. [Google Scholar] [CrossRef]
  4. Palomar, D.P. Convex Optimization Theory. In Portfolio Optimization: Theory and Application; Cambridge University Press: Cambridge, UK, 2025; pp. 491–538. Available online: https://portfoliooptimizationbook.com/slides/slides-convex-optimization-theory.pdf (accessed on 30 March 2026).
  5. Blanchard, P.; Higham, D.J.; Higham, N.J. Accurately computing the log-sum-exp and softmax functions. IMA J. Numer. Anal. 2020, 41, 2311–2330. [Google Scholar] [CrossRef]
  6. Calafiore, G.C.; Gaubert, S.; Possieri, C. Log-Sum-Exp Neural Networks and Posynomial Models for Convex and Log-Log-Convex Data. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 827–838. [Google Scholar] [CrossRef]
  7. Zhang, S.; Tepedelenlioğlu, C.; Banavar, M.K.; Spanias, A. Max-consensus using the soft maximum. In Proceedings of the 2013 Asilomar Conference on Signals, Systems and Computers; IEEE: Piscataway, NJ, USA, 2013; pp. 433–437. [Google Scholar] [CrossRef]
  8. Nowzari, A.; Rabbat, M.G. Improved Bounds for Max Consensus in Wireless Networks. IEEE Trans. Signal Inf. Process. Netw. 2019, 5, 305–319. [Google Scholar] [CrossRef]
  9. Lefebvre, T.; Crevecoeur, G. On Entropy Regularized Path Integral Control for Trajectory Optimization. Entropy 2020, 22, 1120. [Google Scholar] [CrossRef] [PubMed]
  10. Lefebvre, T.; Crevecoeur, G. Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation. In Proceedings of the 2022 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM); IEEE Press: Piscataway, NJ, USA, 2022; pp. 401–408. [Google Scholar] [CrossRef]
  11. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Aggressive driving with model predictive path integral control. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2016; pp. 1433–1440. [Google Scholar] [CrossRef]
  12. Xu, H.; Xuan, J.; Zhang, G.; Lu, J. Trust region policy optimization via entropy regularization for Kullback–Leibler divergence constraint. Neurocomputing 2024, 589, 127716. [Google Scholar] [CrossRef]
  13. Tao, F.; Wu, M.; Cao, Y. Generalized Maximum Entropy Reinforcement Learning via Reward Shaping. IEEE Trans. Artif. Intell. 2024, 5, 1563–1572. [Google Scholar] [CrossRef]
  14. Wainwright, M.J.; Jordan, M.I. Graphical Models, Exponential Families, and Variational Inference. Found. Trends Mach. Learn. 2008, 1, 1–305. [Google Scholar] [CrossRef]
  15. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar] [CrossRef]
  16. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS’13: Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 2292–2300. [Google Scholar] [CrossRef]
  17. Bhole, A.; Filabadi, M.M.; Crevecoeur, G.; Lefebvre, T. Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions. arXiv 2025, arXiv:2512.06109. [Google Scholar] [CrossRef]
  18. Arriojas, A.; Adamczyk, J.; Tiomkin, S.; Kulkarni, R.V. Entropy Regularized Reinforcement Learning Using Large Deviation Theory. Phys. Rev. Res. 2023, 5, 023085. [Google Scholar] [CrossRef]
  19. Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World, 2nd ed.; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
  20. Lei, J.; G’Sell, M.; Rinaldo, A.; Tibshirani, R.J.; Wasserman, L. Distribution-Free Predictive Inference for Regression. J. Am. Stat. Assoc. 2018, 113, 1094–1111. [Google Scholar] [CrossRef]
  21. Balasubramanian, V.N.; Ho, S.S.; Vovk, V. Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications; Morgan Kaufmann: Burlington, MA, USA, 2014. [Google Scholar] [CrossRef]
  22. Romano, Y.; Patterson, E.; Candes, E. Conformalized Quantile Regression. In Proceedings of the Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32, Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/5103c3584b063c431bd1268e9b5e76fb-Paper.pdf (accessed on 30 March 2026).
  23. Zhou, X.; Chen, B.; Gui, Y.; Cheng, L. Conformal Prediction: A Data Perspective. arXiv 2025, arXiv:2410.06494. [Google Scholar] [CrossRef]
  24. Flovik, V. Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications. arXiv 2024, arXiv:2405.01978. [Google Scholar] [CrossRef]
  25. Lin, V.; Jang, K.J.; Dutta, S.; Caprio, M.; Sokolsky, O.; Lee, I. DC4L: Distribution shift recovery via data-driven control for deep learning models. In Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 15–17 July 2024; PMLR, Proceedings of Machine Learning Research; Abate, A., Cannon, M., Margellos, K., Papachristodoulou, A., Eds.; University of Oxford: Oxford, UK, 2024; Volume 242, pp. 1526–1538. Available online: https://proceedings.mlr.press/v242/lin24b.html (accessed on 30 March 2026).
  26. Danesh, M.H.; Wabartha, M.; Pineau, J.; Lin, H.C. Mitigating Distribution Shifts: Uncertainty-Aware Offline-to-Online Reinforcement Learning. 2025. Available online: https://openreview.net/forum?id=0WqAnYWi7H (accessed on 30 March 2026).
  27. Guo, L.; Ge, P.; Sun, D.; Qiao, Y. Adaptive Cruise Control Based on Model Predictive Control with Constraints Softening. Appl. Sci. 2020, 10, 1635. [Google Scholar] [CrossRef]
  28. Li, X.; Girard, A.; Kolmanovsky, I. Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach. In Proceedings of the 2025 IEEE 64th Conference on Decision and Control (CDC); IEEE: Piscataway, NJ, USA, 2025; pp. 3081–3088. [Google Scholar] [CrossRef]
  29. Wang, J.; Gong, X.; Wang, P.; Wang, Y.; Wang, R.; Guo, L.; Hu, Y.; Chen, H. A Stochastic Predictive Adaptive Cruise Control System with Uncertainty-Aware Velocity Prediction and Parameter Self-Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13900–13913. [Google Scholar] [CrossRef]
  30. Donsker, M.D.; Varadhan, S.R.S. Asymptotic evaluation of certain markov process expectations for large time, I. Commun. Pure Appl. Math. 1975, 28, 1–47. [Google Scholar] [CrossRef]
  31. Dembo, A.; Zeitouni, O. Large Deviations Techniques and Applications, 2nd ed.; Stochastic Modelling and Applied Probability; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
  32. Dupuis, P.; Ellis, R.S. A Weak Convergence Approach to the Theory of Large Deviations; Wiley Series in Probability and Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1997. [Google Scholar] [CrossRef]
  33. Ellis, R.S. Entropy, Large Deviations, and Statistical Mechanics; Grundlehren der Mathematischen Wissenschaften; Springer: New York, NY, USA, 1985. [Google Scholar] [CrossRef]
  34. Hartmann, C.; Richter, L.; Schütte, C.; Zhang, W. Variational Characterization of Free Energy: Theory and Algorithms. Entropy 2017, 19, 626. [Google Scholar] [CrossRef]
  35. Amari, S.I. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Tokyo, Japan, 2016. [Google Scholar] [CrossRef]
  36. Wong, R. Asymptotic Approximations of Integrals; Academic Press: Cambridge, MA, USA, 1989. [Google Scholar] [CrossRef]
  37. de Bruijn, N. Asymptotic Methods in Analysis; Bibliotheca Mathematica; Dover Publications: Garden City, NY, USA, 1981; Available online: https://books.google.es/books?id=_tnwmvHmVwMC (accessed on 30 March 2026).
  38. Temme, N.M. Uniform asymptotic methods for integrals. Indag. Math. 2013, 24, 739–765. [Google Scholar] [CrossRef]
  39. Oliveira, R.I.; Orenstein, P.; Ramos, T.; Romano, J.V. Split Conformal Prediction and Non-Exchangeable Data. J. Mach. Learn. Res. 2022, 25, 225:1–225:38. Available online: http://jmlr.org/papers/v25/23-1553.html (accessed on 30 March 2026).
  40. Chacko, P.J.; Krishna, S.M.; Haneesh, K.M.; Daya, J.L.F.; Stonier, A.A. Design and validation of minimal jerk lane changing manoeuvre for adaptive cruise control in electric vehicles. Discov. Appl. Sci. 2025, 7, 1363. [Google Scholar] [CrossRef]
Figure 1. Summary of the experimental pipeline used in the numerical studies. In both experiments, a calibration sample is first generated to select the inverse temperature parameter, after which the resulting value is assessed on paired evaluation episodes and compared with the considered baselines.
Figure 2. Experiment 1: aggregate comparison in the heterogeneous control-oriented benchmark. The panels report closed-loop cost, empirical coverage, average selected inverse temperature, mean distance to the oracle calibration inverse temperature, upper quantiles of the relaxation error, and excess conservativeness.
Figure 3. Experiment 1: regime-specific calibration curves. The proposed conformal inverse temperature tracks the oracle calibration inverse temperature in all three regimes, whereas the global inverse temperature fails to adapt as ambiguity increases.
Figure 4. Experiment 2: calibration curves in the ACC benchmark. The feasible-candidate formulation avoids collapse and yields distinct oracle calibration inverse temperatures across traffic regimes.
Figure 5. Experiment 2: aggregate comparison in the ACC benchmark. The proposed selector remains close to the oracle calibration inverse temperature and achieves near-target coverage, while OraclePerf identifies a distinct task-optimal inverse temperature.
Table 1. Summary of the inverse-temperature selection methods compared in the numerical experiments.
| Method | Calibration Data | Selection Rule | Availability |
|---|---|---|---|
| ProposedConf | Regime-specific calibration sample | Smallest $\lambda$ such that the empirical $(1-\alpha)$ quantile of $E_\lambda$ does not exceed $\tau$ | Experiments 1 and 2 |
| GlobalFixed | Pooled calibration sample across regimes | Same conformal rule applied after pooling all regimes | Experiments 1 and 2 |
| MeanRisk | Same sample as GlobalFixed | Smallest $\lambda$ such that the empirical mean of $E_\lambda$ does not exceed $\tau$ | Experiments 1 and 2 |
| WorstCase | No data-driven calibration | Deterministic rule $\lambda = \log(N)/\tau$ obtained from Theorem 2 | Experiments 1 and 2 |
| OracleCal | Large regime-specific calibration sample | Smallest feasible $\lambda$ computed on the oracle calibration set | Experiments 1 and 2 |
| OraclePerf | Validation set of simulated episodes | Value of $\lambda$ minimizing the empirical closed-loop cost subject to a minimum coverage requirement | Experiment 2 only |
Table 2. Experiment 1: calibrated inverse temperatures by regime.
| Regime | ProposedConf | GlobalFixed | MeanRisk | OracleCal |
|---|---|---|---|---|
| Easy | 36.1824 | 35.9429 | 27.0822 | 36.1824 |
| Moderate | 29.2375 | 35.9429 | 27.0822 | 29.2375 |
| Hard | 16.7846 | 35.9429 | 27.0822 | 16.7846 |
Table 3. Experiment 1: absolute distance to the oracle calibration inverse temperature.
| Method | Easy | Moderate | Hard |
|---|---|---|---|
| ProposedConf | 0.0000 | 0.0000 | 0.0000 |
| GlobalFixed | 0.2395 | 6.7054 | 19.1583 |
| MeanRisk | 9.1002 | 2.1553 | 10.2976 |
| WorstCase | 5.4065 | 12.3514 | 24.8043 |
Table 4. Experiment 1: shifted evaluation illustrating the role of exchangeability. The inverse temperature is calibrated on the moderate regime and evaluated on a shifted regime with sharper candidate geometry.
| Method | Coverage | $q_{0.95}(E_\lambda)$ | $\lvert \lambda - \lambda_{\mathrm{oracle}} \rvert$ |
|---|---|---|---|
| ProposedConf | 0.0000 | 0.1221 | 6.7054 |
| GlobalFixed | 0.9590 | 0.0999 | 0.0000 |
| MeanRisk | 0.0000 | 0.1315 | 8.8607 |
| WorstCase | 1.0000 | 0.0868 | 5.6459 |
| OracleShift | 0.9590 | 0.0999 | 0.0000 |
Table 5. Application-level performance metrics recorded in the ACC experiment.

| Metric | Definition |
|---|---|
| Real closed-loop cost | Accumulated stage cost over the episode, including tracking, control, smoothness, and margin terms, plus the collision penalty when d_t ≤ d_min |
| Collision rate | Fraction of time steps for which d_t ≤ d_min |
| Unsafe rate | Fraction of time steps for which d_t < d_safe(v_t^e) |
| Minimum realized distance | min_t d_t over the episode |
| Accumulated speed tracking error | ∑_t \|v_ref − v_t^e\| |
| Control energy | ∑_t u_t² |
| Total variation of the acceleration command | ∑_t \|u_t − u_{t−1}\| with u_{−1} = 0 |
| Accumulated jerk | ∑_t \|(u_t − u_{t−1})/Δt\| with u_{−1} = 0 |
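For concreteness, the tabulated metrics can be computed directly from the recorded trajectories. The sketch below assumes 0-indexed arrays and the convention u_{−1} = 0 for the first acceleration difference; the function name, signature, and dictionary keys are illustrative, not taken from the paper's code:

```python
import numpy as np

def acc_metrics(u, v_e, d, v_ref, dt, d_min, d_safe_fn):
    """Application-level ACC metrics (cf. Table 5) from episode trajectories:
    u: acceleration commands, v_e: ego speeds, d: inter-vehicle distances."""
    du = np.diff(u, prepend=0.0)  # u_t - u_{t-1} with the convention u_{-1} = 0
    return {
        "collision_rate": np.mean(d <= d_min),
        "unsafe_rate": np.mean(d < d_safe_fn(v_e)),
        "min_distance": d.min(),
        "speed_tracking_error": np.abs(v_ref - v_e).sum(),
        "control_energy": (u ** 2).sum(),
        "total_variation": np.abs(du).sum(),
        "jerk": np.abs(du / dt).sum(),
    }

# Toy episode with three time steps.
u = np.array([0.0, 1.0, 1.0])
v_e = np.array([0.0, 1.0, 2.0])
d = np.array([10.0, 5.0, 2.0])
metrics = acc_metrics(u, v_e, d, v_ref=2.0, dt=0.1,
                      d_min=3.0, d_safe_fn=lambda v: 4.0)
```

The closed-loop cost itself also depends on the stage-cost weights and the collision penalty, which are problem-specific and therefore omitted here.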
Table 6. Experiment 2: calibrated inverse temperatures in the ACC benchmark.

| Regime | ProposedConf | GlobalFixed | MeanRisk | WorstCase | OracleCal | OraclePerf |
|---|---|---|---|---|---|---|
| Easy | 37.5561 | 37.1838 | 28.2482 | 41.8652 | 37.7422 | 60.0000 |
| Moderate | 30.1098 | 37.1838 | 28.2482 | 41.8652 | 30.1098 | 60.0000 |
| Hard | 21.7327 | 37.1838 | 28.2482 | 41.8652 | 21.7327 | 8.0000 |
Table 7. Experiment 2: aggregate results across all regimes.

| Method | Cost | Collision Rate | Unsafe Rate | Min Gap | Coverage | \|λ − λ_cal\| | \|λ − λ_perf\| |
|---|---|---|---|---|---|---|---|
| ProposedConf | 8474.536 | 0.0000 | 0.699 | 14.581 | 0.938 | 0.0621 | 22.0223 |
| GlobalFixed | 8416.732 | 0.0000 | 0.706 | 14.421 | 0.906 | 7.6945 | 24.9387 |
| MeanRisk | 8459.397 | 0.0000 | 0.700 | 14.545 | 0.521 | 5.9570 | 27.9173 |
| WorstCase | 8399.067 | 0.0000 | 0.708 | 14.369 | 1.000 | 12.0036 | 23.3783 |
| OracleCal | 8474.531 | 0.0000 | 0.699 | 14.581 | 0.969 | 0.0000 | 21.9602 |
| OraclePerf | 8509.924 | 0.0000 | 0.698 | 14.634 | 0.949 | 21.9602 | 0.0000 |
Table 8. Experiment 2: supplementary aggregate application-level results across all regimes.

| Method | Speed Tracking Error | Control Energy | Total Variation | Jerk |
|---|---|---|---|---|
| ProposedConf | 460.190 | 257.917 | 47.493 | 237.465 |
| GlobalFixed | 458.115 | 257.767 | 47.909 | 239.545 |
| MeanRisk | 459.952 | 257.898 | 47.548 | 237.740 |
| WorstCase | 457.381 | 257.729 | 48.081 | 240.407 |
| OracleCal | 460.188 | 257.917 | 47.493 | 237.467 |
| OraclePerf | 460.231 | 258.272 | 47.808 | 239.040 |