1. Introduction
Approximating non-smooth optimization operators by smooth surrogates is a fundamental technique in modern optimization, machine learning, and control. Many decision-making architectures require selecting or aggregating the minimum value among a set of candidate costs or actions. However, the minimum operator is non-differentiable and therefore difficult to integrate into gradient-based algorithms or smooth optimization pipelines [1,2,3,4]. To address this limitation, smooth relaxations are frequently introduced to replace the exact infimum with differentiable approximations.
Among these constructions, the entropic soft-min, also known as the log-sum-exp approximation, plays a particularly important role. Given a collection of candidate costs, the soft-min operator replaces the exact infimum with an exponential aggregation controlled by an inverse temperature (or sharpness) parameter. This relaxation is widely used in convex optimization, reinforcement learning, probabilistic inference, and receding-horizon control, where differentiability and numerical stability are desirable properties. As the inverse temperature parameter increases, the entropic relaxation becomes sharper and approaches the infimum of the cost function [2,5,6,7,8].
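As a concrete illustration, the following minimal Python sketch (uniform weights over a finite candidate set; function and variable names are illustrative, not the paper's notation) computes the entropic soft-min stably and shows it approaching the exact minimum as the inverse temperature grows:

```python
import numpy as np

def soft_min(costs, lam):
    # Uniform-weight entropic soft-min: -(1/lam) * log(mean(exp(-lam * costs))),
    # computed with a max-shift (the log-sum-exp trick) for numerical stability.
    c = np.asarray(costs, dtype=float)
    m = c.min()
    return m - np.log(np.mean(np.exp(-lam * (c - m)))) / lam

costs = [1.0, 1.5, 2.0, 4.0]
for lam in (1.0, 10.0, 100.0):
    print(lam, soft_min(costs, lam))  # decreases toward min(costs) = 1.0
```

The max-shift is essential in practice: evaluating exp(-lam * costs) directly underflows for large inverse temperatures.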
A central difficulty in practice is the selection of the inverse temperature parameter governing this relaxation. The inverse temperature controls the trade-off between approximation accuracy and numerical smoothness and is often chosen heuristically using a single global value. In heterogeneous settings, where candidate cost geometries vary across instances, such a choice may lead to poor approximation accuracy or excessive conservativeness [9,10,11,12,13].
The log-sum-exp and related entropic relaxations have been extensively studied in optimization and machine learning and are widely used in statistical learning, probabilistic inference, and entropy-regularized optimization [2,6,14,15,16].
Similar entropic mechanisms also appear in control and robotics applications, where soft-min aggregation is used to combine candidate control actions or trajectories in receding-horizon frameworks. Despite this widespread use, principled methods for selecting the inverse temperature parameter with explicit guarantees on the approximation error remain limited [9,10,11,17,18].
Existing approaches typically rely on worst-case theoretical bounds or empirical risk criteria. Worst-case bounds derived from the cardinality of the candidate set provide deterministic guarantees on the approximation error but often yield excessively conservative inverse temperatures that degrade numerical conditioning or smoothness. Consequently, selecting the entropic inverse temperature remains largely heuristic in many practical applications [1,2,9,11].
This work addresses the problem of selecting the entropic inverse temperature in a data-driven manner while maintaining explicit guarantees on the approximation quality of the soft-min operator. Our approach is based on conformal calibration, a distribution-free statistical framework that provides finite-sample validity under minimal assumptions [19,20,21,22,23]. By interpreting the relaxation error of the soft-min operator as a calibration score, we construct a conformal rule that selects the smallest inverse temperature ensuring that the approximation error remains below a prescribed tolerance with high probability.
Beyond providing statistical guarantees, the proposed calibration rule adapts to heterogeneity in the geometry of candidate cost vectors at the level of the calibration population. In heterogeneous environments, different instances may exhibit distinct ambiguity structures. The conformal calibration mechanism captures this variability at the level of the calibration population through data-driven quantile estimation, enabling regime-adaptive inverse temperature selection across heterogeneous settings [24,25,26].
To analyze this approach, we establish structural properties of the entropic soft-min operator and its relaxation error, providing the analytical basis for formulating inverse temperature selection as a calibration problem [1,2].
The proposed method is evaluated through two numerical experiments. The first considers a heterogeneous benchmark with controlled candidate-set geometries. The second embeds the calibration mechanism into an adaptive cruise control problem with uncertain lead-vehicle prediction [10,27,28,29].
The main contributions of the current paper can be summarized as follows: (a) Structural analysis of the entropic soft-min relaxation. Fundamental properties of the relaxation error are established, including monotonicity with respect to the inverse temperature parameter and approximation bounds characterizing the behavior of the relaxation. (b) Conformal inverse temperature calibration. A distribution-free calibration rule is introduced that selects the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample validity. (c) Adaptivity to heterogeneous candidate-set geometries. The proposed calibration rule selects regime-dependent inverse temperatures according to the geometry distributions represented in the calibration data. (d) Numerical validation in control-oriented settings. Numerical experiments, including a heterogeneous benchmark and an adaptive cruise control application with safety filtering, illustrate the empirical behavior of the proposed calibration mechanism.
This paper proceeds as follows. In Section 2, the entropic soft-min relaxation is introduced and its basic properties are established. Section 3 analyzes the associated relaxation error and derives its main structural properties. In Section 4, finite-domain and asymptotic bounds for the relaxation error are presented. Section 5 introduces the conformal calibration procedure for inverse temperature selection. Numerical experiments are reported in Section 6, including a heterogeneous benchmark and an adaptive cruise control application. Finally, concluding remarks are provided in Section 7.
2. Mathematical Preliminaries and Entropic Relaxation
This section introduces the mathematical framework used throughout the paper and recalls the definition of the entropic soft-min operator, which will serve as the basic building block for the approximation analysis and calibration procedure developed in the subsequent sections.
Let (X, 𝒜, μ) be a measure space and let ρ : X → ℝ be a measurable cost function. Let w : X → [0, ∞) be a probability density with respect to μ satisfying

∫_X w dμ = 1. (1)

Let W denote the probability measure on X induced by the density w with respect to μ, namely W(A) = ∫_A w dμ for A ∈ 𝒜. Then (1) implies that W is a probability measure. Let λ > 0 denote an inverse temperature parameter. For the values of λ under consideration, define the associated partition function by

Z(λ) = ∫_X e^{−λρ(x)} w(x) dμ(x), (2)

and assume that Z(λ) ∈ (0, ∞).
Definition 1. For λ > 0 and whenever Z(λ) ∈ (0, ∞), the entropic relaxation of ρ is defined as

ρ_λ := −(1/λ) log Z(λ) = −(1/λ) log ∫_X e^{−λρ} dW. (3)

This corresponds to the classical log-sum-exp or entropic relaxation of the minimum operator widely used in convex optimization, statistical physics, and probabilistic inference [2,14,15]. The parameter λ controls the trade-off between smoothness and approximation accuracy, with larger values yielding sharper approximations of the infimum.
The operator ρ_λ provides a smooth approximation of the infimum of ρ. As the parameter λ increases, the exponential weighting concentrates near low-cost regions of X, and the relaxation approaches the infimum of the cost function. When the infimum is attained, the concentration occurs around the minimizers.
Whenever Z(λ) ∈ (0, ∞), define the Gibbs probability measure G_λ associated with λ by

dG_λ/dW = e^{−λρ} / Z(λ). (4)
The following result establishes a basic property of the entropic relaxation.
Proposition 1. For every λ > 0 such that Z(λ) ∈ (0, ∞),

ρ_λ ≥ inf_{x∈X} ρ(x). (5)

Proof. Let m = inf_{x∈X} ρ(x). Then ρ(x) ≥ m for all x ∈ X, hence e^{−λρ(x)} ≤ e^{−λm}. Integrating with respect to W yields Z(λ) ≤ e^{−λm}. Substituting this inequality into (3) gives (5). □
3. Relaxation Error
The approximation accuracy of the entropic relaxation introduced in Definition 1 depends on the value of the inverse temperature parameter λ. Quantifying the discrepancy between log-sum-exp relaxations and the exact minimum has been studied in several contexts including convex optimization and probabilistic inference [2,5]. A natural way to quantify this approximation accuracy is to measure the discrepancy between the relaxed operator ρ_λ and the exact infimum of the cost function.
Definition 2. For a cost function ρ and λ > 0, the relaxation error is defined as

E_λ(ρ) := ρ_λ − inf_{x∈X} ρ(x). (6)

By Proposition 1, the entropic relaxation always overestimates the infimum of the cost function. Consequently, E_λ(ρ) ≥ 0. The relaxation error therefore provides a nonnegative measure of the approximation accuracy of the soft-min operator.
The dependence of the relaxation on the inverse temperature parameter can be characterized through the following monotonicity property.
Lemma 1. Let ρ_λ denote the entropic relaxation defined in (3), and assume that Z(λ) ∈ (0, ∞) for the values of λ under consideration. Then the mapping

λ ↦ ρ_λ

is nonincreasing. Consequently, for any 0 < λ₁ ≤ λ₂,

E_{λ₂}(ρ) ≤ E_{λ₁}(ρ).

Proof. The entropic relaxation admits the variational representation

ρ_λ = inf_Q { E_Q[ρ] + (1/λ) KL(Q ‖ W) }, (7)

where the infimum is taken over probability measures Q on X that are absolutely continuous with respect to the reference probability measure W. Here E_Q[ρ] denotes the expectation of ρ under Q, and KL(Q ‖ W) denotes the Kullback–Leibler divergence of Q from W. This is the Gibbs variational principle (equivalently, the Donsker–Varadhan representation for the log-Laplace functional [30,31,32,33,34]).
Let 0 < λ₁ ≤ λ₂. Since the function λ ↦ 1/λ is decreasing, for every admissible probability measure Q we have

E_Q[ρ] + (1/λ₂) KL(Q ‖ W) ≤ E_Q[ρ] + (1/λ₁) KL(Q ‖ W).

Taking the infimum over Q on both sides yields ρ_{λ₂} ≤ ρ_{λ₁}. Therefore ρ_λ is nonincreasing in λ. Since E_λ(ρ) = ρ_λ − inf ρ, the same monotonicity holds for the relaxation error. □
Under standard differentiability assumptions, and assuming that Z(λ) ∈ (0, ∞) and differentiation under the integral sign is justified, the derivative satisfies

dρ_λ/dλ = −(1/λ²) KL(G_λ ‖ W) ≤ 0,

where G_λ denotes the Gibbs probability measure defined in (4). Here both G_λ and W are probability measures on X, and KL(G_λ ‖ W) is understood in the standard measure-theoretic sense [14,15,33,35]. This identity is consistent with the Gibbs variational representation in (7) and provides an alternative justification of monotonicity.
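The derivative identity can be checked numerically on a small discrete example. The following Python sketch (illustrative names, uniform reference weights; not the paper's code) compares a central finite difference of the relaxation against −KL(G_λ ‖ W)/λ²:

```python
import numpy as np

costs = np.array([0.3, 0.7, 1.2, 2.5])
w = np.full(4, 0.25)  # uniform reference weights W

def soft_min(lam):
    # entropic relaxation: -(1/lam) * log(sum_i w_i * exp(-lam * costs_i))
    m = costs.min()
    return m - np.log(np.sum(w * np.exp(-lam * (costs - m)))) / lam

def kl_gibbs_vs_w(lam):
    # Gibbs weights g_i proportional to w_i * exp(-lam * costs_i), then KL(G || W)
    g = w * np.exp(-lam * (costs - costs.min()))
    g /= g.sum()
    return np.sum(g * np.log(g / w))

lam, h = 2.0, 1e-5
fd = (soft_min(lam + h) - soft_min(lam - h)) / (2 * h)  # central difference
print(fd, -kl_gibbs_vs_w(lam) / lam**2)  # the two values nearly coincide
```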
Another useful structural property of the entropic relaxation is its concavity with respect to the cost function.
Theorem 1. Let ρ₁, ρ₂ be measurable cost functions and θ ∈ [0, 1] be such that the corresponding partition functions satisfy

Z_{ρ₁}(λ), Z_{ρ₂}(λ) ∈ (0, ∞).

Then, for any λ > 0,

(θρ₁ + (1 − θ)ρ₂)_λ ≥ θ (ρ₁)_λ + (1 − θ) (ρ₂)_λ. (9)

Proof. Using the definition of the partition function (2), we have

Z_{θρ₁+(1−θ)ρ₂}(λ) = ∫_X (e^{−λρ₁})^θ (e^{−λρ₂})^{1−θ} dW.

Applying Hölder’s inequality yields

Z_{θρ₁+(1−θ)ρ₂}(λ) ≤ Z_{ρ₁}(λ)^θ Z_{ρ₂}(λ)^{1−θ}.

Taking logarithms, multiplying by −1/λ, and using (3) yields (9). □
4. Approximation Bounds
The structural properties established in Section 3 make it possible to characterize the approximation accuracy of the entropic relaxation. In particular, the relaxation error introduced in Definition 2 can be bounded under different assumptions on the candidate set and the cost function [2,5].
We first consider the case where the candidate set is finite. In the finite-domain setting X = {1, …, N}, we write ρ_k = ρ(k) for k = 1, …, N. Similarly, when needed, we write w_k = w(k).
Theorem 2. Assume X = {1, …, N} and w_k = 1/N for k = 1, …, N. Then for every λ > 0,

0 ≤ E_λ(ρ) ≤ (log N)/λ. (10)

Proof. Using the definition of the entropic relaxation (3), we obtain

ρ_λ = −(1/λ) log( (1/N) Σ_{k=1}^N e^{−λρ_k} ) ≤ −(1/λ) log( (1/N) e^{−λ min_k ρ_k} ) = min_k ρ_k + (log N)/λ.

Combining this inequality with the definition of the relaxation error (6) yields the bound. □
The bound (10) provides a worst-case guarantee on the approximation accuracy of the soft-min operator. Although this bound is independent of the geometry of the cost function, it shows explicitly how the relaxation error decreases as the inverse temperature parameter λ increases.
More precise asymptotic behavior can be obtained in continuous domains under additional regularity assumptions on the cost function and the reference measure. In the remainder of this section we specialize to the following setting.
Assume that X ⊆ ℝ^d is an open set equipped with the Lebesgue measure μ, and that the reference density w is continuous and strictly positive in a neighborhood of the minimizer. Suppose further that ρ is twice continuously differentiable and admits a unique nondegenerate minimizer x* ∈ X, meaning that

∇ρ(x*) = 0 and ∇²ρ(x*) ≻ 0.

Theorem 3. Under the assumptions above, as λ → ∞,

ρ_λ = ρ(x*) + (d/(2λ)) log λ + O(1/λ).

The expansion follows from the classical Laplace method for exponentially weighted integrals [36,37,38].
An immediate consequence concerns the asymptotic decay rate of the relaxation error.
Corollary 1. Under the assumptions of Theorem 3, the relaxation error satisfies

E_λ(ρ) = (d/(2λ)) log λ + O(1/λ) as λ → ∞.

The asymptotic behavior of the relaxation error can also be characterized in the case where the minimum is attained at finitely many points.
Proposition 2. Assume X = {1, …, N} and w_i = 1/N for i = 1, …, N. Let

m = min_{1≤i≤N} ρ_i,

and suppose that the minimum is attained at exactly k indices. Then, as λ → ∞,

ρ_λ = m + (1/λ) log(N/k) + o(1/λ), and consequently E_λ(ρ) = (1/λ) log(N/k) + o(1/λ).

Proof. Let m = min_{1≤i≤N} ρ_i. Then

Σ_{i=1}^N e^{−λρ_i} = e^{−λm} Σ_{i=1}^N e^{−λ(ρ_i − m)}.

Since the set of indices with ρ_i > m is finite, define

Δ := min{ ρ_i − m : ρ_i > m } > 0.

Then, for every index such that ρ_i > m,

e^{−λ(ρ_i − m)} ≤ e^{−λΔ}.

Therefore,

Σ_{i=1}^N e^{−λ(ρ_i − m)} = k + O(e^{−λΔ})

as λ → ∞.
Substituting this into the definition of the entropic relaxation (3) gives

ρ_λ = m + (1/λ) log(N/k) + O(e^{−λΔ}/λ).

The second claim follows immediately from the definition of the relaxation error (6).
□
The asymptotic expression in Proposition 2 clarifies an important point about the geometry of the relaxation error. For fixed N, the leading term (1/λ) log(N/k) is largest when the minimum is attained at a single index (k = 1), and decreases as the number of exact minimizers increases. Thus, from the viewpoint of approximating the minimum value, the most unfavorable discrete geometry is not the presence of many exact minimizers, but rather a sharp configuration with a unique minimizer. This observation helps interpret the numerical experiments. Proposition 2 shows that, for a fixed number of candidates, the value-approximation error becomes smaller as the number of exact minimizers increases. In the experiments, a related effect appears in regimes with many candidates close to the minimum, which empirically tend to require a smaller inverse temperature to satisfy a prescribed tolerance on value approximation, even though they may appear more ambiguous from an action-selection perspective.
These bounds show that the approximation accuracy of the entropic relaxation improves as the inverse temperature parameter increases. In discrete settings, the decay rate can be characterized more precisely in terms of the multiplicity of the minimizers, as shown in Proposition 2. This reveals a direct connection between the geometry of the cost function and the sharpness required for the relaxation.
However, the appropriate choice of λ depends on this geometry and is generally unknown in practice.
5. Conformal Calibration
The bounds derived in Section 4 characterize how the approximation error of the entropic relaxation depends on the inverse temperature parameter λ. In practice, however, the geometry of the cost function is typically unknown, and therefore these bounds cannot be used directly to determine an appropriate value of λ. This motivates a data-driven calibration procedure that selects the inverse temperature parameter from observed instances while providing finite-sample guarantees on the resulting relaxation error [19,20,22].
Suppose ρ^(1), …, ρ^(n) are observed calibration instances and ρ^(n+1) is a future test instance, and assume that ρ^(1), …, ρ^(n+1) are exchangeable.
Throughout this section we use superscripts to index problem instances and subscripts to index candidates within a given instance. In particular, ρ^(j) denotes the j-th observed instance and ρ_k^(j) denotes the k-th candidate cost within that instance.
For each instance we evaluate the relaxation error introduced in Definition 2. This motivates the following score function.
Definition 3. For each observed instance ρ^(j), define the score

s_j(λ) := E_λ(ρ^(j)), j = 1, …, n. (15)

Definition 4. For a cost function ρ, define

λ*(ρ) := inf{ λ > 0 : E_λ(ρ) ≤ τ }.

In other words, λ*(ρ) is the smallest inverse temperature for which the relaxation error does not exceed the tolerance τ for that instance.
The score s_j(λ) measures the discrepancy between the entropic relaxation and the exact infimum for the observed instance.
Given a candidate value of λ, let s_1(λ), …, s_n(λ) denote the calibration scores defined in (15). Throughout the paper we assume that α ≥ 1/(n + 1), and we define the empirical (1 − α) quantile as

q̂_{1−α}(λ) := s_(⌈(n+1)(1−α)⌉)(λ), (17)

where s_(1)(λ) ≤ ⋯ ≤ s_(n)(λ) denote the ordered scores. Under the assumption α ≥ 1/(n + 1), we have ⌈(n + 1)(1 − α)⌉ ≤ n, so the empirical quantile is well defined.
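The empirical quantile in (17) is simply an order statistic. A minimal Python sketch (function name illustrative):

```python
import numpy as np

def conformal_quantile(scores, alpha):
    # Empirical (1 - alpha) quantile: the ceil((n+1)(1-alpha))-th order statistic.
    # Requires alpha >= 1/(n+1) so that the index does not exceed n.
    s = np.sort(np.asarray(list(scores), dtype=float))
    n = s.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    assert 1 <= k <= n, "alpha must satisfy alpha >= 1/(n+1)"
    return s[k - 1]

# n = 10 scores, alpha = 0.2: k = ceil(11 * 0.8) = 9, so the 9th smallest score
print(conformal_quantile(range(1, 11), alpha=0.2))  # 9.0
```

The "+1" in (n+1)(1−α) is what distinguishes the conformal quantile from a plain empirical quantile and is the source of the finite-sample guarantee.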
The next result shows that this empirical quantile inherits the same monotonicity with respect to λ as the individual relaxation errors.
Lemma 2. The function

λ ↦ q̂_{1−α}(λ)

is nonincreasing.
Proof. By Lemma 1, for each observed instance j, the relaxation error λ ↦ s_j(λ) is nonincreasing. Therefore, for every j and every 0 < λ₁ ≤ λ₂, we have

s_j(λ₂) ≤ s_j(λ₁).

Hence the entire sample of scores at λ₂ is componentwise no larger than the sample of scores at λ₁. It follows that the empirical (1 − α) quantile cannot increase, namely

q̂_{1−α}(λ₂) ≤ q̂_{1−α}(λ₁).

This proves the claim. □
The monotonicity of q̂_{1−α} implies that the constraint q̂_{1−α}(λ) ≤ τ defines a threshold condition in λ, which makes it natural to select the smallest inverse temperature parameter satisfying the prescribed tolerance.
Definition 5. Given a tolerance τ > 0, define

λ̂ := inf{ λ > 0 : q̂_{1−α}(λ) ≤ τ }. (18)

The selected inverse temperature λ̂ corresponds to the smallest value of λ for which the empirical upper quantile of the relaxation error does not exceed the prescribed tolerance.
The following result establishes a distribution-free guarantee for the resulting inverse temperature selection rule under the standard exchangeability assumption. In this context, exchangeability means that the joint distribution of the sample ρ^(1), …, ρ^(n+1) is invariant under permutations of the indices.
Lemma 3. For j = 1, …, n, let

λ*_j := λ*(ρ^(j)),

with λ*(·) defined as in Definition 4, and let

λ*_(1) ≤ ⋯ ≤ λ*_(n)

denote the ordered values of λ*_1, …, λ*_n. Then the inverse temperature λ̂ defined in Definition 5 satisfies

λ̂ = λ*_(⌈(n+1)(1−α)⌉).

Proof. By definition, for each j, s_j(λ) ≤ τ if and only if λ ≥ λ*_j. Therefore the condition q̂_{1−α}(λ) ≤ τ, with k := ⌈(n + 1)(1 − α)⌉, is equivalent to requiring that at least k calibration instances satisfy

s_j(λ) ≤ τ.

Equivalently, at least k instances satisfy

λ*_j ≤ λ.

The smallest λ with this property is precisely λ*_(k) = λ*_(⌈(n+1)(1−α)⌉). □
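Lemma 3 says that scanning a grid for the smallest feasible inverse temperature is equivalent to taking an order statistic of the per-instance thresholds. The following Python sketch checks both routes on synthetic uniform-weight cost vectors (all names and the specific grid are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def relax_error(costs, lam):
    # relaxation error of the uniform-weight soft-min: nonincreasing in lam
    c = np.asarray(costs, float)
    m = c.min()
    return -np.log(np.mean(np.exp(-lam * (c - m)))) / lam

tau, alpha = 0.05, 0.2
grid = np.linspace(0.1, 200.0, 500)
cals = [rng.uniform(0.0, 1.0, size=10) for _ in range(30)]  # calibration instances

# Route 1 (Definition 4, discretized): per-instance smallest feasible grid lambda
lam_star = np.array([grid[np.argmax([relax_error(c, l) <= tau for l in grid])]
                     for c in cals])

# Route 2 (Definition 5): smallest grid lambda whose empirical (1 - alpha)
# quantile of the calibration scores meets the tolerance
n = len(cals)
k = int(np.ceil((n + 1) * (1 - alpha)))
lam_hat = next(l for l in grid
               if np.sort([relax_error(c, l) for c in cals])[k - 1] <= tau)

print(lam_hat == np.sort(lam_star)[k - 1])  # the two routes coincide (Lemma 3)
```

The order-statistic route is far cheaper: it evaluates each instance once instead of re-sorting the whole score sample at every grid point.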
Theorem 4. Assume that the sample of cost functions ρ^(1), …, ρ^(n+1) is exchangeable and let the calibration scores be defined as in (15). Let q̂_{1−α}(λ) denote the empirical quantile defined in (17), and let λ̂ be the inverse temperature defined in (18). Then the entropic relaxation computed with λ̂ satisfies

P( E_{λ̂}(ρ^(n+1)) ≤ τ ) ≥ 1 − α. (19)

Proof. Because the instances ρ^(1), …, ρ^(n+1) are exchangeable, the variables λ*(ρ^(1)), …, λ*(ρ^(n+1)) are also exchangeable. By Lemma 3, the selected inverse temperature satisfies

λ̂ = λ*_(⌈(n+1)(1−α)⌉),

where λ*_(1) ≤ ⋯ ≤ λ*_(n) are the ordered calibration scores. By the standard split conformal argument for exchangeable variables,

P( λ*(ρ^(n+1)) ≤ λ*_(⌈(n+1)(1−α)⌉) ) ≥ 1 − α.

By the definition of λ*, the condition λ*(ρ^(n+1)) ≤ λ̂ implies E_{λ̂}(ρ^(n+1)) ≤ τ, which yields (19). □
The probability in (19) is taken over the joint randomness of the calibration sample and the test instance. Thus, the guarantee is marginal with respect to the data-generating process and does not hold conditionally on the realized calibration sample, which is the standard notion of validity in split conformal inference [19,20,21,22,39].
Remark 1. Theorem 4 provides a distribution-free certificate for the approximation accuracy of the entropic relaxation. The guarantee holds under exchangeability of the observed instances and does not rely on parametric assumptions on the underlying distribution of cost functions.
The selected parameter λ̂ can therefore be interpreted as the smallest inverse temperature for which the relaxation error satisfies the tolerance constraint with probability at least 1 − α under the exchangeability assumption.
Remark 2. The calibrated inverse temperature is a single value determined by the calibration sample. Therefore, the proposed method is adaptive at the level of the calibration population, or at the regime level when calibration is performed separately across regimes. It does not produce an instance-dependent inverse temperature for each new test cost function. Extending the method toward covariate-aware or conditional calibration for individual instances is left for future work.
6. Numerical Experiments
This section presents two numerical experiments. The first considers a heterogeneous benchmark with prescribed candidate-set geometries. The second considers an adaptive cruise control problem with safety filtering, namely, the exclusion of candidate control sequences whose predicted rollouts violate the imposed safety constraints, and uncertain lead-vehicle prediction.
All simulations were implemented in MATLAB R2025b using fixed random seeds and paired evaluation across methods. In particular, whenever several methods are compared on the same episode, they are evaluated on the same initial state, the same realization of the exogenous process, and the same candidate set. This paired design eliminates Monte Carlo variability from the comparison and ensures that differences in performance are attributable only to the selected value of λ.
The coverage guarantees established in Theorem 4 hold under the exchangeability assumption on the sequence of cost functions, and the experiments are conducted mainly under this assumption; Experiment 1 also includes a shifted evaluation in which exchangeability is violated.
Figure 1 summarizes the experimental pipeline used in the numerical studies. Throughout the experiments, we consider the finite-domain version of the entropic relaxation

ρ_λ = −(1/λ) log( (1/N) Σ_{k=1}^N e^{−λρ_k} ),

where N denotes the number of feasible candidates in the current batch after the safety filtering step.
Table 1 summarizes the inverse-temperature selection methods compared in the numerical experiments.
Unless otherwise stated, the quality of the selected inverse temperature is assessed through empirical coverage, the mean and upper quantiles of the relaxation error, and the absolute distance to the relevant oracle inverse temperature.
In the experiments below, adaptation is implemented at the regime level by calibrating separate inverse temperatures on regime-specific calibration samples. Thus, the observed adaptivity is distributional rather than instance-conditional.
6.1. Experiment 1: Heterogeneous Control Benchmark
We first consider a heterogeneous benchmark in which the geometry of the planner score vectors is prescribed explicitly. The purpose is to examine how the calibrated inverse temperature varies across regimes with different candidate-set geometries.
6.1.1. Setup
The first experiment considers a control-oriented benchmark in which the geometry of the planner score vectors is imposed explicitly. The underlying plant is the discrete-time linear system
with
horizon length
, action bound
with
, stage cost weights
and terminal cost
The initial state is sampled as
and the goal is sampled as
The process noise has the form
with regime-dependent standard deviation
.
The corresponding quadratic closed-loop cost functional is
where
and
g denotes the goal position. This functional is used to evaluate the resulting closed-loop trajectories. As described below, the planner score vectors employed in the entropic soft-min relaxation are generated synthetically in order to control the geometry of the candidate set.
At each time step, a bank of
candidate actions is generated around a nominal proportional-derivative controller used only to define the center of the candidate bank. The nominal control law is
where
g denotes the goal position. Equivalently, the controller gains are
and
. The same gains are used in all regimes and for all compared methods, and no additional gain-optimization step is performed in the reported experiments. Thus, the role of the nominal PD controller is not to provide a competing benchmark, but rather to generate a simple and interpretable reference command that drives the state toward the sampled goal while damping velocity. The candidate actions are sampled as
with regime-dependent action dispersion
.
The planner score vectors used in the experiment are constructed synthetically in order to control the geometry of the candidate set. In particular, the vector
is generated directly from prescribed gap distributions rather than being derived from a simulated optimal control cost. This design isolates the behavior of the entropic soft-min operator and allows the ambiguity structure of the candidate set to be controlled explicitly.
The planner score vector is constructed explicitly in order to control the ambiguity structure. For each candidate bank, one candidate is assigned zero excess cost, a prescribed number of candidates are assigned near-minimizer gaps, and the remaining candidates are assigned far gaps. More precisely, if
denotes the planner score vector, then one entry is set to the minimum value, a subset of cardinality
is assigned gaps sampled uniformly from a regime-dependent interval
, and the remaining entries are assigned gaps sampled uniformly from
. Lower scores are preferentially assigned to actions closer to the nominal command. The strength of this alignment is controlled by a regime-dependent parameter
: values close to one strongly align the best planner scores with actions near
, whereas smaller values introduce weaker alignment and therefore more ambiguous soft selection.
To implement the alignment between planner scores and the nominal action, let
denote the distance of candidate
i from the nominal action. Let
be the permutation that sorts candidates by increasing
, and let
be a random permutation of
.
The final permutation used to assign the score gaps is obtained through a convex ranking mixture controlled by the alignment parameter
:
Candidates are then ordered by increasing value of this mixed ranking, and the generated score gaps are assigned following this order. In the rare case of ties in the mixed ranking, ties are broken uniformly at random. Thus, when the alignment parameter equals one, the smallest planner scores correspond exactly to candidates closest to the nominal action, whereas when it equals zero the assignment is random.
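One possible implementation of the convex ranking mixture, sketched in Python under assumptions about naming and tie-breaking (the mixture weight `gamma` stands in for the paper's alignment parameter; the experiments themselves use MATLAB):

```python
import numpy as np

def alignment_ranks(dist_to_nominal, gamma, rng):
    # Convex mixture of the distance-based ranking and a random ranking.
    # gamma = 1: purely distance-aligned; gamma = 0: purely random assignment.
    n = len(dist_to_nominal)
    rank_by_dist = np.argsort(np.argsort(dist_to_nominal))  # 0 = closest to nominal
    rank_random = rng.permutation(n)
    mixed = gamma * rank_by_dist + (1.0 - gamma) * rank_random
    jitter = 1e-9 * rng.random(n)  # break rare exact ties uniformly at random
    return np.argsort(mixed + jitter)  # best-first order for assigning score gaps

d = np.array([0.2, 1.5, 0.7, 0.1])
print(alignment_ranks(d, gamma=1.0, rng=np.random.default_rng(0)))  # [3 0 2 1]
```

With gamma = 1 the order is exactly the distance order; intermediate values interpolate between the two rankings.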
The three regimes are defined by the following parameters:
For calibration, the inverse temperature is searched over the grid
using 500 equally spaced values. The target coverage is
with tolerance
. The global calibration set uses 300 score vectors per regime, the regime-specific calibration set also uses 300 score vectors per regime, and the oracle calibration set uses 600 score vectors per regime. Evaluation is performed on 140 paired episodes per regime. The global fixed baseline is calibrated by pooling the regime-specific calibration samples, the mean-risk baseline uses the same pooled set, and the worst-case baseline is defined by λ_wc = (log N)/τ, the smallest inverse temperature for which the deterministic bound (10) guarantees the tolerance.
Given a score vector ρ = (ρ_1, …, ρ_N) and candidate actions a_1, …, a_N, the control applied to the plant is obtained through Gibbs aggregation,

u = Σ_{i=1}^N w_i(λ) a_i, w_i(λ) = e^{−λρ_i} / Σ_{j=1}^N e^{−λρ_j}.

For each evaluation batch we generate a fresh score vector, compute the relaxation error E_λ̂(ρ), and record the indicator 1{E_λ̂(ρ) ≤ τ}, where 1{·} denotes the indicator function. The reported coverage corresponds to the empirical probability of this event over all evaluation batches. This evaluation protocol matches the exchangeable sampling assumption used in Theorem 4.
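The Gibbs aggregation step can be sketched in Python (the experiments themselves use MATLAB; function and variable names are illustrative):

```python
import numpy as np

def gibbs_aggregate(scores, actions, lam):
    # Soft-argmin aggregation: Gibbs weights over candidate scores, then a
    # weighted average of the candidate actions (computed with a max-shift).
    s = np.asarray(scores, dtype=float)
    logits = -lam * (s - s.min())      # shift for numerical stability
    w = np.exp(logits)
    w /= w.sum()                       # Gibbs weights, sum to 1
    return w @ np.asarray(actions, dtype=float)

scores = np.array([0.0, 0.1, 2.0])
actions = np.array([1.0, 0.8, -3.0])
print(gibbs_aggregate(scores, actions, lam=0.01))  # close to the plain average
print(gibbs_aggregate(scores, actions, lam=100.0)) # close to the best candidate
```

Small λ averages all candidates; large λ concentrates the weight on the lowest-score candidate, interpolating between smoothing and hard selection.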
6.1.2. Results
Table 2 reports the calibrated inverse temperatures. The oracle values decrease from the easy regime to the hard regime. The proposed conformal selector matches the oracle values in all three regimes, whereas the global, mean-risk, and worst-case baselines deviate substantially, especially in the moderate and hard regimes.
The oracle calibration inverse temperature decreases from the easy regime to the hard regime. This is consistent with the fact that the approximation error concerns the minimum value rather than the identity of the minimizing action.
As shown in Proposition 2, when several candidates attain or nearly attain the minimum value, the bias of the entropic relaxation decreases. Therefore, a smaller inverse temperature is sufficient to satisfy the prescribed tolerance on value approximation.
A regime-wise comparison of distances to the oracle is given in Table 3. The global baseline is nearly correct in the easy regime, but becomes markedly overconservative in the moderate and hard regimes. The worst-case baseline is consistently the most conservative, while the mean-risk baseline fails to track the oracle in a regime-dependent manner. By contrast, the proposed conformal selector coincides with the oracle by construction up to numerical resolution.
At the aggregate level, the proposed method attains empirical coverage close to the target value, whereas the mean-risk baseline undercovers and the worst-case baseline overcovers. Coverage is computed over independently generated score vectors.
These results show that the proposed selector tracks the oracle calibration inverse temperature across regimes, whereas a single global inverse temperature does not.
Figure 2 summarizes the main aggregate metrics, and Figure 3 shows the regime-specific calibration curves λ ↦ q̂_{1−α}(λ) together with the selected inverse temperatures. The crossing point with the tolerance τ occurs at different values of λ in the three regimes, and the proposed method tracks these values accurately.
6.1.3. Shifted Evaluation and the Role of Exchangeability
We also consider a shifted evaluation derived from the moderate regime. In this setting, the test score vectors are generated from a sharper candidate geometry than those used in calibration, so the exchangeability assumption is violated.
More precisely, the conformal inverse temperature is calibrated on the moderate regime and then evaluated on a shifted regime whose oracle inverse temperature is substantially larger:
Thus, the inverse temperature inherited from the calibration regime is too small for the shifted test distribution.
Table 4 reports the corresponding results. The empirical coverage of ProposedConf falls well below the nominal level in the shifted regime, whereas the shift-specific oracle restores near-nominal behavior. This behavior is consistent with Theorem 4, whose guarantee is established under exchangeability. When the test distribution differs from the calibration distribution, the nominal coverage level need not be preserved.
6.2. Experiment 2: Application-Oriented Adaptive Cruise Control Benchmark
We next consider a longitudinal adaptive cruise control (ACC) problem with uncertain lead-vehicle prediction and safety filtering [10,27,28,29]. This experiment is used to assess the proposed calibration rule in a closed-loop control setting.
The ACC problem is chosen here because it combines three ingredients that make the soft-min inverse temperature meaningful in practice:
- 1. a finite set of candidate trajectories or control sequences;
- 2. uncertain predictive scores induced by the forecast of the lead vehicle;
- 3. a nontrivial trade-off among safety, comfort, and tracking performance [27,29,40].
This experiment compares the calibration-oriented inverse temperature with the inverse temperature preferred by closed-loop performance.
6.2.1. Setup
Let
denote the distance to the lead vehicle,
the ego speed, and
the lead speed. The ego vehicle dynamics are modeled in discrete time using a standard longitudinal ACC kinematic model [
27,
29] as
with sampling time
s and acceleration command
The planning horizon is
, the episode length is
, and the number of candidate sequences generated at each decision step is
. The reference speed is
m/s.
The desired following distance is
and a hard minimum distance
is enforced throughout. The margin term uses
At each decision step, infeasible candidates are removed by the safety filter described below. Consequently, the effective number of candidates N may be smaller than M; throughout the experiment, N denotes the number of feasible candidates that remain after filtering, and it may vary across time steps and episodes.
A nominal acceleration is computed as
Around this nominal command,
M candidate sequences of length
H are generated by adding Gaussian perturbations and then applying temporal smoothing. More precisely, each candidate sequence is first sampled as
where
is regime-dependent. The resulting sequence is then smoothed by a moving-average filter of window length three and finally saturated componentwise to the interval
.
The lead-vehicle prediction is generated over the horizon by a stochastic acceleration model. Starting from the current lead speed
, the predicted lead trajectory is propagated as
where the acceleration
is sampled according to the current traffic regime. At each prediction step, the lead vehicle may undergo nominal fluctuations, stop-and-go behavior, or hard braking events, with regime-dependent probabilities and magnitudes. Prediction uncertainty is incorporated by adding Gaussian noise with regime-dependent standard deviation.
Conformal calibration is performed only on feasible candidate sets. For a candidate sequence to be feasible, its predicted rollout must satisfy the hard minimum-distance constraint along the entire prediction horizon; infeasible candidates are discarded before the soft-min aggregation is applied. If no feasible candidate exists, emergency braking is applied. In that case the planner score vector is not defined and the entropic relaxation is not evaluated, so such time steps are excluded from the coverage computation. The reported coverage is therefore conditional on the event that at least one feasible candidate exists, and we also report the empirical frequency of empty feasible sets.
For each feasible candidate, the predictive score used by the planner is defined in (28) with fixed weights. In (28), the smoothness term is evaluated with the convention that its contribution at the first control input is zero; equivalently, the smoothness penalty acts only on increments within the predicted sequence.
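The increment-only smoothness convention can be realised with `np.diff`, as in the following hedged sketch; the score components and weights are assumptions for illustration, not the exact definition in (28).

```python
import numpy as np

def planner_score(u, v_pred, v_ref, w_track=1.0, w_comf=0.1, w_smooth=0.05):
    """Illustrative candidate score. The smoothness penalty uses np.diff,
    i.e. only increments *within* the sequence, which realises the convention
    that the contribution at the first control input is zero."""
    tracking = np.sum((v_pred - v_ref) ** 2)   # reference-speed tracking error
    comfort = np.sum(u ** 2)                   # penalise large accelerations
    smooth = np.sum(np.diff(u) ** 2)           # increments only: zero at k = 0
    return w_track * tracking + w_comf * comfort + w_smooth * smooth
```

For a constant control sequence the smoothness term vanishes, as the convention requires.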
Each feasible candidate's score induces a Gibbs weight, defined as the normalized exponential of its negative score scaled by the inverse temperature. The control command applied to the system is obtained through a soft-argmin aggregation of the first control inputs of the feasible candidate sequences, weighted by these Gibbs weights. This construction corresponds to the Gibbs policy associated with the entropic relaxation and yields a smooth interpolation between uniform averaging (small inverse temperature) and hard minimum selection (large inverse temperature). If the feasible set is empty, the fallback emergency-braking control is applied.
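The Gibbs-weighted soft-argmin aggregation can be sketched as follows; the shift by the minimum score is a standard numerical-stability device and does not change the weights.

```python
import numpy as np

def soft_argmin_control(scores, first_inputs, beta):
    """Gibbs-weighted aggregation of the first control inputs.

    scores: N feasible candidate scores; first_inputs: the N corresponding
    first control inputs; beta: inverse temperature. beta -> 0 averages
    uniformly, beta -> infinity selects the input of the hard minimiser."""
    s = np.asarray(scores, dtype=float)
    logits = -beta * (s - s.min())          # shift by the minimum for stability
    w = np.exp(logits)
    w /= w.sum()                            # normalized Gibbs weights
    return float(np.dot(w, first_inputs))
```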
The real closed-loop cost uses the same structure as (28), with the addition of a collision penalty whenever the hard minimum distance is violated, and a terminal penalty. Three lead-vehicle regimes are considered.
Easy: stop-and-go probability 0, hard-brake probability 0, target number of near-minimizers 1, and regime-specific near-, mid-, far-, and initial-gap ranges and ego-speed bias.
Moderate: regime-specific stop-and-go probability, hard-brake probability and mean, target number of near-minimizers 6, and regime-specific gap ranges and ego-speed bias.
Hard: regime-specific stop-and-go probability, hard-brake probability and mean, target number of near-minimizers 18, and regime-specific gap ranges and ego-speed bias.
The initial gap, lead speed, and ego speed are sampled from these regime-dependent distributions. For reproducibility, the calibration set sizes are 260 samples per regime for global and local calibration and 520 samples per regime for OracleCal, while the evaluation uses 140 paired episodes per regime. The oracle-performance grid is a restricted subset of the calibration grid and is subject to an additional minimum coverage constraint.
Unless otherwise stated, the conformal calibration in this experiment uses a fixed target coverage level and a fixed tolerance for the relaxation error, from which the worst-case baseline inverse temperature is computed.
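Using the entropic soft-min in its unnormalized form, the conformal selection of the smallest admissible inverse temperature can be sketched as below; the function names, the grid-search structure, and the soft-min normalization convention are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def softmin(s, beta):
    """Entropic soft-min -(1/beta) * log(sum_i exp(-beta * s_i)),
    evaluated stably by shifting by the minimum score."""
    s = np.asarray(s, dtype=float)
    m = s.min()
    return m - np.log(np.exp(-beta * (s - m)).sum()) / beta

def conformal_beta(calib_scores, beta_grid, eps, alpha):
    """Smallest beta on the grid whose conformal (1 - alpha) quantile of the
    relaxation error min(s) - softmin_beta(s) is within tolerance eps."""
    n = len(calib_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal rank
    for beta in sorted(beta_grid):
        errs = np.sort([np.asarray(s).min() - softmin(s, beta) for s in calib_scores])
        if k <= n and errs[k - 1] <= eps:
            return beta                        # first (smallest) admissible beta
    return max(beta_grid)                      # fall back to the sharpest value
```

Because the relaxation error decreases monotonically in the inverse temperature, the first grid value meeting the tolerance is the smallest admissible one.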
The application-level performance metrics recorded in the ACC experiment are summarized in Table 5. At the operator level, we also report the empirical coverage, the mean relaxation error, the empirical quantile of the relaxation error, and the distances to OracleCal and OraclePerf.
6.2.2. Results
The calibrated inverse temperatures are reported in Table 6. The oracle-performance inverse temperatures are selected on the restricted grid subject to a minimum empirical coverage level. The oracle calibration inverse temperatures are well separated across regimes, and the proposed conformal selector closely tracks these values in all three regimes, whereas the global, mean-risk, and worst-case baselines remain substantially misaligned, especially in the hard regime.
Figure 4 shows the regime-specific calibration curves. Table 7 and Table 8 report aggregate results across all regimes. The oracle-performance inverse temperature is computed on a validation set, whereas the results reported in Table 7 and Table 8 are obtained on an independent test set; OraclePerf therefore need not minimize the reported test cost.
The ACC experiment shows that the proposed selector remains close to the oracle calibration inverse temperature and attains coverage close to the target level. It also shows that the inverse temperature selected for calibration need not coincide with the one preferred by closed-loop performance (see Figure 5): in the easy and moderate regimes, the performance-oriented value lies near the upper end of the admissible range, whereas in the hard regime a smaller value is preferred.
7. Concluding Remarks
This paper studied the principled selection of the inverse temperature parameter in entropic soft-min relaxations. Starting from the definition of the operator, structural properties of the associated relaxation error were established, including nonnegativity, monotonicity with respect to the inverse temperature parameter, and approximation bounds in finite and asymptotic regimes.
On this basis, a conformal calibration procedure was introduced to select the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample distribution-free validity. The resulting rule provides an explicit certificate on the approximation quality of the entropic relaxation under exchangeability of the observed instances.
The numerical experiments support the main claims of the paper. The heterogeneous control-oriented benchmark shows that the proposed conformal selector accurately tracks the oracle calibration inverse temperature in non-homogeneous settings where a single global inverse temperature is inadequate. In the same benchmark, an additional shifted evaluation illustrates that the finite-sample guarantee is tied to the exchangeable setting of Theorem 4: when the test distribution departs from the calibration distribution, the nominal coverage level is no longer guaranteed. The adaptive cruise control experiment demonstrates that the in-distribution behavior of the proposed method persists in a realistic control scenario with explicit safety filtering and uncertain prediction, thereby establishing practical relevance beyond synthetic operator tests.
At the same time, the application experiment clarifies an important conceptual point: an inverse temperature that is optimal for certifying the approximation quality of the entropic soft-min operator is not necessarily identical to the inverse temperature that minimizes the final task-level cost. This distinction does not weaken the role of conformal calibration; rather, it clarifies its purpose. The proposed method provides a certified, distribution-free, and regime-adaptive mechanism for selecting the soft-min inverse temperature itself, which can then be incorporated into broader optimization and control architectures.
The present method selects a single inverse temperature for a given calibration population, or a single inverse temperature per regime when regime-specific calibration is used. Extending this framework toward instance-conditional or covariate-aware conformal calibration constitutes a natural next step, especially in settings where side information is available at test time and finer-grained adaptation is desirable.