Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems

Solanes, J. Ernesto; Francés-Falip, Aitana

doi:10.3390/math14071188

Open AccessFeature PaperArticle

Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems

by

J. Ernesto Solanes

^1,*

and

Aitana Francés-Falip

²

¹

Instituto de Diseño y Fabricación, Universitat Politècnica de València, 46022 València, Spain

²

Digital Integrated Technologies and E-Health, Technological Institute for Children’s Products and Leisure (AIJU), 03440 Alicante, Spain

^*

Author to whom correspondence should be addressed.

Mathematics 2026, 14(7), 1188; https://doi.org/10.3390/math14071188

Submission received: 14 March 2026 / Revised: 30 March 2026 / Accepted: 31 March 2026 / Published: 2 April 2026

(This article belongs to the Special Issue Advances in Robust Control Theory and Its Applications)

Download

Browse Figures

Versions Notes

Abstract

Entropic soft-min relaxations are widely used to obtain smooth approximations of minimum operators in optimization, machine learning, and control. The accuracy of this approximation is governed by an inverse temperature (or sharpness) parameter that controls the trade-off between smoothness and fidelity, yet its principled selection is typically heuristic. This work studies the data-driven calibration of the inverse temperature parameter governing the entropic soft-min relaxation, with explicit guarantees on the relaxation error between the soft-min operator and the infimum of the cost function. After establishing monotonicity properties and approximation bounds for the relaxation error, we introduce a conformal calibration rule that selects the smallest inverse temperature ensuring that the approximation error satisfies a prescribed tolerance with distribution-free finite-sample validity. The resulting selector adapts to the distribution of candidate cost-vector geometries represented in the calibration sample, enabling regime-specific inverse temperature selection in heterogeneous settings. Numerical experiments, including an adaptive cruise control application with safety filtering, show that the proposed method accurately tracks oracle calibration inverse temperatures and achieves near-target coverage in the exchangeable setting covered by the theory, while an additional shifted evaluation illustrates the role of this assumption.

Keywords:

entropic soft-min relaxation; conformal calibration; distribution-free guarantees; parameter selection in optimization; log-sum-exp approximation

MSC:

93E20; 93C55; 90C25; 62G30

1. Introduction

Approximating non-smooth optimization operators by smooth surrogates is a fundamental technique in modern optimization, machine learning, and control. Many decision-making architectures require selecting or aggregating the minimum value among a set of candidate costs or actions. However, the minimum operator is non-differentiable and therefore difficult to integrate into gradient-based algorithms or smooth optimization pipelines [1,2,3,4]. To address this limitation, smooth relaxations are frequently introduced to replace the exact infimum with differentiable approximations.

Among these constructions, the entropic soft-min, also known as the log-sum-exp approximation, plays a particularly important role. Given a collection of candidate costs, the soft-min operator replaces the exact infimum with an exponential aggregation controlled by an inverse temperature (or sharpness) parameter. This relaxation is widely used in convex optimization, reinforcement learning, probabilistic inference, and receding-horizon control, where differentiability and numerical stability are desirable properties. As the inverse temperature parameter

λ

increases, the entropic relaxation becomes sharper and approaches the infimum of the cost function [2,5,6,7,8].

A central difficulty in practice is the selection of the inverse temperature parameter governing this relaxation. The inverse temperature controls the trade-off between approximation accuracy and numerical smoothness and is often chosen heuristically using a single global value. In heterogeneous settings, where candidate cost geometries vary across instances, such a choice may lead to poor approximation accuracy or excessive conservativeness [9,10,11,12,13].

The log-sum-exp and related entropic relaxations have been extensively studied in optimization and machine learning and are widely used in statistical learning, probabilistic inference, and entropy-regularized optimization [2,6,14,15,16].

Similar entropic mechanisms also appear in control and robotics applications, where soft-min aggregation is used to combine candidate control actions or trajectories in receding-horizon frameworks. Despite this widespread use, principled methods for selecting the inverse temperature parameter with explicit guarantees on the approximation error remain limited [9,10,11,17,18].

Existing approaches typically rely on worst-case theoretical bounds or empirical risk criteria. Worst-case bounds derived from the cardinality of the candidate set provide deterministic guarantees on the approximation error but often yield excessively conservative inverse temperatures that degrade numerical conditioning or smoothness. Consequently, selecting the entropic inverse temperature remains largely heuristic in many practical applications [1,2,9,11].

This work addresses the problem of selecting the entropic inverse temperature in a data-driven manner while maintaining explicit guarantees on the approximation quality of the soft-min operator. Our approach is based on conformal calibration, a distribution-free statistical framework that provides finite-sample validity under minimal assumptions [19,20,21,22,23]. By interpreting the relaxation error of the soft-min operator as a calibration score, we construct a conformal rule that selects the smallest inverse temperature ensuring that the approximation error remains below a prescribed tolerance with high probability.

Beyond providing statistical guarantees, the proposed calibration rule adapts to heterogeneity in the geometry of candidate cost vectors at the level of the calibration population. In heterogeneous environments, different instances may exhibit distinct ambiguity structures. The conformal calibration mechanism captures this variability at the level of the calibration population through data-driven quantile estimation, enabling regime-adaptive inverse temperature selection across heterogeneous settings [24,25,26].

To analyze this approach, we establish structural properties of the entropic soft-min operator and its relaxation error, providing the analytical basis for formulating inverse temperature selection as a calibration problem [1,2].

The proposed method is evaluated through two numerical experiments. The first considers a heterogeneous benchmark with controlled candidate-set geometries. The second embeds the calibration mechanism into an adaptive cruise control problem with uncertain lead-vehicle prediction [10,27,28,29].

The main contributions of the current paper can be summarized as follows: (a) Structural analysis of the entropic soft-min relaxation. Fundamental properties of the relaxation error are established, including monotonicity with respect to the inverse temperature parameter and approximation bounds characterizing the behavior of the relaxation. (b) Conformal inverse temperature calibration. A distribution-free calibration rule is introduced that selects the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample validity. (c) Adaptivity to heterogeneous candidate-set geometries. The proposed calibration rule selects regime-dependent inverse temperatures according to the geometry distributions represented in the calibration data. (d) Numerical validation in control-oriented settings. Numerical experiments, including a heterogeneous benchmark and an adaptive cruise control application with safety filtering, illustrate the empirical behavior of the proposed calibration mechanism.

This paper proceeds as follows. In Section 2, the entropic soft-min relaxation is introduced and its basic properties are established. Section 3 analyzes the associated relaxation error and derives its main structural properties. In Section 4, finite-domain and asymptotic bounds for the relaxation error are presented. Section 5 introduces the conformal calibration procedure for inverse temperature selection. Numerical experiments are reported in Section 6, including a heterogeneous benchmark and an adaptive cruise control application. Finally, concluding remarks are provided in Section 7.

2. Mathematical Preliminaries and Entropic Relaxation

This section introduces the mathematical framework used throughout the paper and recalls the definition of the entropic soft-min operator, which will serve as the basic building block for the approximation analysis and calibration procedure developed in the subsequent sections.

Let

(Φ, F, μ)

be a measure space and let

ρ : Φ \to R

be a measurable cost function. Let

w : Φ \to [0, \infty)

be a probability density with respect to

μ

satisfying

\int_{Φ} w (ϕ) d μ (ϕ) = 1 .

(1)

Let W denote the probability measure on

(Φ, F)

induced by the density w with respect to

μ

, namely

W (d ϕ) = w (ϕ) d μ (ϕ) .

Then (1) implies that W is a probability measure. Let

λ > 0

denote an inverse temperature parameter. For the values of

λ

under consideration, define the associated partition function by

Z_{λ} (ρ) = \int_{Φ} e^{- λ ρ (ϕ)} W (d ϕ),

(2)

and assume that

Z_{λ} (ρ) < \infty

.

Definition 1.

For

λ > 0

and whenever

Z_{λ} (ρ) < \infty

, the entropic relaxation of ρ is defined as

D_{λ} (ρ) = - \frac{1}{λ} log Z_{λ} (ρ) .

(3)

This corresponds to the classical log-sum-exp or entropic relaxation of the minimum operator widely used in convex optimization, statistical physics, and probabilistic inference [2,14,15]. The parameter

λ

controls the trade-off between smoothness and approximation accuracy, with larger values yielding sharper approximations of the infimum.

The operator

D_{λ}

provides a smooth approximation of the infimum of

ρ

. As the parameter

λ

increases, the exponential weighting concentrates near low-cost regions of

ρ

, and the relaxation approaches the infimum of the cost function. When the infimum is attained, the concentration occurs around the minimizers.

Whenever

0 < Z_{λ} (ρ) < \infty

, define the Gibbs probability measure associated with

ρ

by

Π_{λ} (d ϕ) = \frac{e^{- λ ρ (ϕ)}}{Z_{λ} (ρ)} W (d ϕ) .

(4)

The following result establishes a basic property of the entropic relaxation.

Proposition 1.

For every

λ > 0

such that

Z_{λ} (ρ) < \infty

,

D_{λ} (ρ) \geq inf_{ϕ \in Φ} ρ (ϕ) .

(5)

Proof.

Let

m = inf_{ϕ \in Φ} ρ (ϕ) .

Then

ρ (ϕ) \geq m

for all

ϕ \in Φ

, hence

e^{- λ ρ (ϕ)} \leq e^{- λ m} .

Integrating with respect to

W (d ϕ)

yields

Z_{λ} (ρ) \leq e^{- λ m} .

Substituting this inequality into (3) gives (5). □

3. Relaxation Error

The approximation accuracy of the entropic relaxation introduced in Definition 1 depends on the value of the inverse temperature parameter

λ

. Quantifying the discrepancy between log-sum-exp relaxations and the exact minimum has been studied in several contexts including convex optimization and probabilistic inference [2,5]. A natural way to quantify this approximation accuracy is to measure the discrepancy between the relaxed operator

D_{λ} (ρ)

and the exact infimum of the cost function.

Definition 2.

For a cost function

ρ : Φ \to R

and

λ > 0

, the relaxation error is defined as

E_{λ} (ρ) = D_{λ} (ρ) - inf_{ϕ \in Φ} ρ (ϕ) .

(6)

By Proposition 1, the entropic relaxation always overestimates the infimum of the cost function. Consequently,

E_{λ} (ρ) \geq 0 .

The relaxation error therefore provides a nonnegative measure of the approximation accuracy of the soft-min operator.

The dependence of the relaxation on the inverse temperature parameter can be characterized through the following monotonicity property.

Lemma 1.

Let

D_{λ} (ρ)

denote the entropic relaxation defined in (3), and assume that

Z_{λ} (ρ) < \infty

for the values of λ under consideration. Then the mapping

λ \mapsto D_{λ} (ρ)

is nonincreasing. Consequently, for any

λ_{1} < λ_{2}

,

E_{λ_{2}} (ρ) \leq E_{λ_{1}} (ρ) .

Proof.

The entropic relaxation admits the variational representation

D_{λ} (ρ) = inf_{Q ≪ W} (E_{Q} [ρ] + \frac{1}{λ} KL (Q ∥ W)),

(7)

where the infimum is taken over probability measures Q on

(Φ, F)

that are absolutely continuous with respect to the reference probability measure W. Here

E_{Q} [ρ] = \int_{Φ} ρ (ϕ) Q (d ϕ)

denotes the expectation of

ρ

under Q, and

KL (Q ∥ W) = \int_{Φ} log (\frac{d Q}{d W}) Q (d ϕ)

denotes the Kullback–Leibler divergence of Q from W. This is the Gibbs variational principle (equivalently, the Donsker–Varadhan representation for the log-Laplace functional [30,31,32,33,34]).

Let

λ_{1} < λ_{2}

. Since the function

λ \mapsto 1 / λ

is decreasing, for every admissible probability measure

Q ≪ W

we have

E_{Q} [ρ] + \frac{1}{λ_{2}} KL (Q ∥ W) \leq E_{Q} [ρ] + \frac{1}{λ_{1}} KL (Q ∥ W) .

Taking the infimum over

Q ≪ W

on both sides yields

D_{λ_{2}} (ρ) \leq D_{λ_{1}} (ρ) .

Therefore

D_{λ} (ρ)

is nonincreasing in

λ

. Since

E_{λ} (ρ) = D_{λ} (ρ) - {inf}_{ϕ \in Φ} ρ (ϕ)

, the same monotonicity holds for the relaxation error. □

Under standard differentiability assumptions, and assuming that

0 < Z_{λ} (ρ) < \infty

and differentiation under the integral sign is justified, the derivative satisfies

\frac{d}{d λ} D_{λ} (ρ) = - \frac{1}{λ^{2}} KL (Π_{λ} ∥ W) \leq 0 .

(8)

where

Π_{λ}

denotes the Gibbs probability measure defined in (4). Here both

Π_{λ}

and W are probability measures on

(Φ, F)

, and

KL (Π_{λ} ∥ W)

is understood in the standard measure-theoretic sense [14,15,33,35]. This identity is consistent with the Gibbs variational representation in (7) and provides an alternative justification of monotonicity.

Another useful structural property of the entropic relaxation is its concavity with respect to the cost function.

Theorem 1.

Let

ρ_{1}, ρ_{2} : Φ \to R

be such that

Z_{λ} (ρ_{1}) < \infty, Z_{λ} (ρ_{2}) < \infty .

Then, for any

t \in [0, 1]

,

D_{λ} (t ρ_{1} + (1 - t) ρ_{2}) \geq t D_{λ} (ρ_{1}) + (1 - t) D_{λ} (ρ_{2}) .

(9)

Proof.

Using the definition of the partition function (2), we have

Z_{λ} (t ρ_{1} + (1 - t) ρ_{2}) = \int_{Φ} e^{- λ (t ρ_{1} + (1 - t) ρ_{2})} W (d ϕ) .

Applying Hölder’s inequality yields

Z_{λ} (t ρ_{1} + (1 - t) ρ_{2}) \leq Z_{λ} {(ρ_{1})}^{t} Z_{λ} {(ρ_{2})}^{1 - t} .

Taking logarithms gives

log Z_{λ} (t ρ_{1} + (1 - t) ρ_{2}) \leq t log Z_{λ} (ρ_{1}) + (1 - t) log Z_{λ} (ρ_{2}) .

Multiplying by

- 1 / λ

and using (3) yields (9). □

4. Approximation Bounds

The structural properties established in Section 3 make it possible to characterize the approximation accuracy of the entropic relaxation. In particular, the relaxation error introduced in Definition 2 can be bounded under different assumptions on the candidate set and the cost function [2,5].

We first consider the case where the candidate set is finite. In the finite-domain setting

Φ = {1, \dots, N}

, we write

ρ_{i} : = ρ (i)

for

i = 1, \dots, N

. Similarly, when needed, we write

w_{i} : = W ({i})

.

Theorem 2.

Assume

Φ = {1, \dots, N}

and

w_{i} = 1 / N

for

i = 1, \dots, N

. Then for every

λ > 0

E_{λ} (ρ) \leq \frac{log N}{λ} .

(10)

Proof.

Let

m = {min}_{i} ρ_{i}

. Then

\frac{1}{N} \sum_{i = 1}^{N} e^{- λ ρ_{i}} \geq \frac{1}{N} e^{- λ m} .

Using the definition of the entropic relaxation (3), we obtain

D_{λ} (ρ) \leq m + \frac{log N}{λ} .

Combining this inequality with the definition of the relaxation error (6) yields the bound. □

The bound (10) provides a worst-case guarantee on the approximation accuracy of the soft-min operator. Although this bound is independent of the geometry of the cost function, it shows explicitly how the relaxation error decreases as the inverse temperature parameter

λ

increases.

More precise asymptotic behavior can be obtained in continuous domains under additional regularity assumptions on the cost function and the reference measure. In the remainder of this section we specialize to the following setting.

Assume that

Φ \subset R^{d}

is an open set equipped with the Lebesgue measure

μ

, and that the reference density w is continuous and strictly positive in a neighborhood of the minimizer. Suppose further that

ρ : Φ \to R

is twice continuously differentiable and admits a unique nondegenerate minimizer

ϕ^{*} \in Φ

, meaning that

\nabla ρ (ϕ^{*}) = 0 and \nabla^{2} ρ (ϕ^{*}) ≻ 0 .

Theorem 3.

Under the assumptions above, as

λ \to \infty

,

D_{λ} (ρ) = ρ (ϕ^{*}) + \frac{d}{2 λ} log λ - \frac{1}{λ} log (w (ϕ^{*}) {(2 π)}^{d / 2} det {(\nabla^{2} ρ (ϕ^{*}))}^{- 1 / 2}) + o (\frac{1}{λ}) .

(11)

The expansion follows from the classical Laplace method for exponentially weighted integrals [36,37,38].

An immediate consequence concerns the asymptotic decay rate of the relaxation error.

Corollary 1.

Under the assumptions of Theorem 3, the relaxation error satisfies

E_{λ} (ρ) = D_{λ} (ρ) - ρ (ϕ^{*}) = O (\frac{log λ}{λ}) .

(12)

The asymptotic behavior of the relaxation error can also be characterized in the case where the minimum is attained at finitely many points.

Proposition 2.

Assume

Φ = {1, \dots, N}

and

w_{i} = 1 / N

for

i = 1, \dots, N

. Let

m = min_{1 \leq i \leq N} ρ_{i}

and suppose that the minimum is attained at exactly k indices.

Then, as

λ \to \infty

,

D_{λ} (ρ) = m + \frac{1}{λ} log (\frac{N}{k}) + o (\frac{1}{λ}) .

(13)

Consequently,

E_{λ} (ρ) = \frac{1}{λ} log (\frac{N}{k}) + o (\frac{1}{λ}) .

(14)

Proof.

Let

m = {min}_{1 \leq i \leq N} ρ_{i}

. Then

\frac{1}{N} \sum_{i = 1}^{N} e^{- λ ρ_{i}} = \frac{1}{N} (k e^{- λ m} + \sum_{i : ρ_{i} > m} e^{- λ ρ_{i}}) .

Since the set of indices with

ρ_{i} > m

is finite, define

δ : = min_{i : ρ_{i} > m} (ρ_{i} - m) > 0 .

Then, for every index such that

ρ_{i} > m

,

e^{- λ ρ_{i}} = e^{- λ (m + δ_{i})} = e^{- λ m} e^{- λ δ_{i}}, δ_{i} : = ρ_{i} - m \geq δ .

Therefore,

\sum_{i : ρ_{i} > m} e^{- λ ρ_{i}} = e^{- λ m} \sum_{i : ρ_{i} > m} e^{- λ (ρ_{i} - m)} = O (e^{- λ (m + δ)}) = o (e^{- λ m})

as

λ \to \infty

.

It follows that

\frac{1}{N} \sum_{i = 1}^{N} e^{- λ ρ_{i}} = \frac{k}{N} e^{- λ m} (1 + o (1)) .

Substituting this into the definition of the entropic relaxation (3) gives

D_{λ} (ρ) = - \frac{1}{λ} log (\frac{k}{N} e^{- λ m} (1 + o (1))) = m + \frac{1}{λ} log (\frac{N}{k}) + o (\frac{1}{λ}) .

The second claim follows immediately from the definition of the relaxation error (6).

□

The asymptotic expression in Proposition 2 clarifies an important point about the geometry of the relaxation error. For fixed N, the leading term

\frac{1}{λ} log (\frac{N}{k})

is largest when the minimum is attained at a single index (

k = 1

), and decreases as the number of exact minimizers increases. Thus, from the viewpoint of approximating the minimum value, the most unfavorable discrete geometry is not the presence of many exact minimizers, but rather a sharp configuration with a unique minimizer. This observation helps interpret the numerical experiments. Proposition 2 shows that, for exact minimizer multiplicity, the value-approximation error becomes smaller as the number of exact minimizers increases. In the experiments, a related effect appears in regimes with many candidates close to the minimum, which empirically tend to require a smaller inverse temperature to satisfy a prescribed tolerance on value approximation, even though they may appear more ambiguous from an action-selection perspective.

These bounds show that the approximation accuracy of the entropic relaxation improves as the inverse temperature parameter

λ

increases. In discrete settings, the decay rate can be characterized more precisely in terms of the multiplicity of the minimizers, as shown in Proposition 2. This reveals a direct connection between the geometry of the cost function and the sharpness required for the relaxation.

However, the appropriate choice of

λ

depends on this geometry and is generally unknown in practice.

5. Conformal Calibration

The bounds derived in Section 4 characterize how the approximation error of the entropic relaxation depends on the inverse temperature parameter

λ

. In practice, however, the geometry of the cost function is typically unknown, and therefore these bounds cannot be used directly to determine an appropriate value of

λ

. This motivates a data-driven calibration procedure that selects the inverse temperature parameter from observed instances while providing finite-sample guarantees on the resulting relaxation error [19,20,22].

Suppose

ρ^{(1)}, \dots, ρ^{(n)}

are observed calibration instances and

ρ^{(n + 1)}

is a future test instance, and assume that

ρ^{(1)}, \dots, ρ^{(n + 1)}

are exchangeable.

Throughout this section we use superscripts to index problem instances and subscripts to index candidates within a given instance. In particular,

ρ^{(j)}

denotes the j-th observed instance and

ρ_{k}^{(j)}

denotes the k-th candidate cost within that instance.

For each instance we evaluate the relaxation error introduced in Definition 2. This motivates the following score function.

Definition 3.

For each observed instance

ρ^{(j)}

, define the score

S_{j} (λ) = E_{λ} (ρ^{(j)}) .

(15)

Definition 4.

For a cost function ρ, define

Λ (ρ) : = inf {λ > 0 : E_{λ} (ρ) \leq τ} .

In other words,

Λ (ρ)

is the smallest inverse temperature for which the relaxation error does not exceed the tolerance τ for that instance.

The score measures the discrepancy between the entropic relaxation and the exact infimum for the observed instance.

Given a candidate value of

λ

, we compute the empirical

(1 - α)

quantile of these scores,

{\hat{q}}_{1 - α} (λ) .

(16)

For a given value of

λ

, let

S_{1} (λ), \dots, S_{n} (λ)

denote the calibration scores defined in (15). Throughout the paper we assume that

α \geq \frac{1}{n + 1},

and we define the empirical

(1 - α)

quantile as

{\hat{q}}_{1 - α} (λ) = S_{(k)} (λ), k = ⌈(1 - α) (n + 1)⌉,

(17)

where

S_{(1)} (λ) \leq \dots \leq S_{(n)} (λ)

denote the ordered scores. Under the assumption

α \geq 1 / (n + 1)

, we have

k \leq n

, so the empirical quantile is well defined.

The next result shows that this empirical quantile inherits the same monotonicity with respect to

λ

as the individual relaxation errors.

Lemma 2.

The function

λ \mapsto {\hat{q}}_{1 - α} (λ)

is nonincreasing.

Proof.

By Lemma 1, for each observed instance

ρ^{(i)}

, the relaxation error

λ \mapsto E_{λ} (ρ^{(i)})

is nonincreasing. Therefore, for every

i = 1, \dots, n

and every

λ_{1} < λ_{2}

, we have

S_{i} (λ_{2}) \leq S_{i} (λ_{1}) .

Hence the entire sample of scores at

λ_{2}

is componentwise no larger than the sample of scores at

λ_{1}

. It follows that the empirical

(1 - α)

quantile cannot increase, namely

{\hat{q}}_{1 - α} (λ_{2}) \leq {\hat{q}}_{1 - α} (λ_{1}) .

This proves the claim. □

The monotonicity of

{\hat{q}}_{1 - α} (λ)

implies that the constraint

{\hat{q}}_{1 - α} (λ) \leq τ

defines a threshold condition in

λ

, which makes it natural to select the smallest inverse temperature parameter satisfying the prescribed tolerance.

Definition 5.

Given a tolerance

τ > 0

, define

λ^{*} = inf \{λ > 0 : {\hat{q}}_{1 - α} (λ) \leq τ\} .

(18)

The selected inverse temperature

λ^{*}

corresponds to the smallest value of

λ

for which the empirical upper quantile of the relaxation error does not exceed the prescribed tolerance.

The following result establishes a distribution-free guarantee for the resulting inverse temperature selection rule under the standard exchangeability assumption. In this context, exchangeability means that the joint distribution of the sample

ρ^{(1)}, \dots, ρ^{(n + 1)}

is invariant under permutations of the indices.

Lemma 3.

For

i = 1, \dots, n

, let

Λ_{i} : = Λ (ρ^{(i)})

with

Λ (ρ^{(i)})

defined as in Definition 4, and let

Λ_{(1)} \leq \dots \leq Λ_{(n)}

denote the ordered values of

Λ_{1}, \dots, Λ_{n}

.

Then the inverse temperature defined in Definition 5 satisfies

λ^{*} = Λ_{(k)}, k = ⌈(1 - α) (n + 1)⌉ .

Proof.

By definition, for each

j = 1, \dots, n

,

Λ_{j} \leq λ ⟺ E_{λ} (ρ^{(j)}) \leq τ .

Therefore the condition

{\hat{q}}_{1 - α} (λ) \leq τ

is equivalent to requiring that at least k calibration instances satisfy

E_{λ} (ρ^{(j)}) \leq τ .

Equivalently, at least k instances satisfy

Λ (ρ^{(j)}) \leq λ .

The smallest

λ

with this property is precisely

Λ_{(k)}

. □

Theorem 4.

Assume that the sample of cost functions

ρ^{(1)}, \dots, ρ^{(n + 1)}

is exchangeable and let the calibration scores be defined as in (15). Let

{\hat{q}}_{1 - α} (λ)

denote the empirical quantile defined in (17), and let

λ^{*}

be the inverse temperature defined in (18). Then the entropic relaxation computed with

λ^{*}

satisfies

P (E_{λ^{*}} (ρ^{(n + 1)}) \leq τ) \geq 1 - α .

(19)

Proof.

Let

Λ_{i} = Λ (ρ^{(i)}), i = 1, \dots, n + 1 .

Because the instances

ρ^{(1)}, \dots, ρ^{(n + 1)}

are exchangeable, the variables

Λ_{1}, \dots, Λ_{n + 1}

are also exchangeable.

By Lemma 3, the selected inverse temperature satisfies

λ^{*} = Λ_{(k)},

where

Λ_{(1)} \leq \dots \leq Λ_{(n)}

are the ordered calibration scores.

By the standard split conformal argument for exchangeable variables,

P (Λ_{n + 1} \leq Λ_{(k)}) \geq 1 - α .

By the definition of

Λ (ρ)

, the condition

Λ (ρ^{(n + 1)}) \leq λ^{*}

implies

E_{λ^{*}} (ρ^{(n + 1)}) \leq τ .

Therefore

P (E_{λ^{*}} (ρ^{(n + 1)}) \leq τ) \geq 1 - α . □

The probability in (19) is taken over the joint randomness of the calibration sample and the test instance. Thus, the guarantee is marginal with respect to the data-generating process and does not hold conditionally on the realized calibration sample, which is the standard notion of validity in split conformal inference [19,20,21,22,39].

Remark 1.

Theorem 4 provides a distribution-free certificate for the approximation accuracy of the entropic relaxation. The guarantee holds under exchangeability of the observed instances and does not rely on parametric assumptions on the underlying distribution of cost functions.

The selected parameter

λ^{*}

can therefore be interpreted as the smallest inverse temperature for which the relaxation error satisfies the tolerance constraint with probability at least

1 - α

under the exchangeability assumption.

Remark 2.

The calibrated inverse temperature

λ^{*}

is a single value determined by the calibration sample. Therefore, the proposed method is adaptive at the level of the calibration population, or at the regime level when calibration is performed separately across regimes. It does not produce an instance-dependent inverse temperature for each new test cost function. Extending the method toward covariate-aware or conditional calibration for individual instances is left for future work.

6. Numerical Experiments

This section presents two numerical experiments. The first considers a heterogeneous benchmark with prescribed candidate-set geometries. The second considers an adaptive cruise control problem with safety filtering, namely, the exclusion of candidate control sequences whose predicted rollouts violate the imposed safety constraints, and uncertain lead-vehicle prediction.

All simulations were implemented in MATLAB R2025b using fixed random seeds and paired evaluation across methods. In particular, whenever several methods are compared on the same episode, they are evaluated on the same initial state, the same realization of the exogenous process, and the same candidate set. This paired design eliminates Monte Carlo variability from the comparison and ensures that differences in performance are attributable only to the selected value of

λ

.

The coverage guarantees established in Theorem 4 hold under the exchangeability assumption on the sequence of cost functions. The experiments are conducted mainly under the exchangeability assumption of Theorem 4. Experiment 1 also includes a shifted evaluation.

Figure 1 summarizes the experimental pipeline used in the numerical studies. Throughout the experiments, we consider the finite-domain version of the entropic relaxation

D_{λ} (ρ) = - \frac{1}{λ} log (\frac{1}{N} \sum_{i = 1}^{N} e^{- λ ρ_{i}}), E_{λ} (ρ) = D_{λ} (ρ) - min_{i} ρ_{i},

(20)

where N denotes the number of feasible candidates in the current batch after the safety filtering step. Table 1 summarizes the inverse-temperature selection methods compared in the numerical experiments.

Unless otherwise stated, the quality of the selected inverse temperature is assessed through empirical coverage,

P (E_{λ} (ρ) \leq τ),

the mean and upper quantiles of

E_{λ}

, and the absolute distance to the relevant oracle inverse temperature.

In the experiments below, adaptation is implemented at the regime level by calibrating separate inverse temperatures on regime-specific calibration samples. Thus, the observed adaptivity is distributional rather than instance-conditional.

6.1. Experiment 1: Heterogeneous Control Benchmark

We first consider a heterogeneous benchmark in which the geometry of the planner score vectors is prescribed explicitly. The purpose is to examine how the calibrated inverse temperature varies across regimes with different candidate-set geometries.

6.1.1. Setup

The first experiment considers a control-oriented benchmark in which the geometry of the planner score vectors is imposed explicitly. The underlying plant is the discrete-time linear system

x_{t + 1} = A x_{t} + B u_{t} + w_{t},

(21)

with

A = [\begin{matrix} 1.0 & 0.18 \\ 0.0 & 1.0 \end{matrix}], B = [\begin{matrix} 0.0 \\ 0.18 \end{matrix}],

horizon length

T = 30

, action bound

| u_{t} | \leq U_{max}

with

U_{max} = 2.5

, stage cost weights

Q = diag (5.5, 0.9), R = 0.05,

and terminal cost

Q_{f} = diag (8.0, 1.2) .

The initial state is sampled as

x_{0} = [\begin{matrix} x_{0, 1} \\ x_{0, 2} \end{matrix}], x_{0, 1} \sim Unif [- r_{x}, r_{x}], x_{0, 2} \sim Unif [- r_{v}, r_{v}],

and the goal is sampled as

g \sim N (0, σ_{g}^{2}) .

The process noise has the form

w_{t} = [\begin{matrix} 0 \\ ξ_{t} \end{matrix}], ξ_{t} \sim N (0, σ_{w}^{2}),

with regime-dependent standard deviation

σ_{w}

.

The corresponding quadratic closed-loop cost functional is

J_{cl} = \sum_{t = 0}^{T - 1} ({(x_{t} - x^{ref})}^{⊤} Q (x_{t} - x^{ref}) + R u_{t}^{2}) + {(x_{T} - x^{ref})}^{⊤} Q_{f} (x_{T} - x^{ref}),

(22)

where

x^{ref} = {[\begin{matrix} g & 0 \end{matrix}]}^{⊤}

and g denotes the goal position. This functional is used to evaluate the resulting closed-loop trajectories. As described below, the planner score vectors employed in the entropic soft-min relaxation are generated synthetically in order to control the geometry of the candidate set.

At each time step, a bank of

M = 64

candidate actions is generated around a nominal proportional-derivative controller used only to define the center of the candidate bank. The nominal control law is

u_{t}^{nom} = {sat}_{[- U_{max}, U_{max}]} (- 1.35 (x_{t, 1} - g) - 0.90 x_{t, 2}),

(23)

where g denotes the goal position. Equivalently, the controller gains are

K_{p} = 1.35

and

K_{v} = 0.90

. The same gains are used in all regimes and for all compared methods, and no additional gain-optimization step is performed in the reported experiments. Thus, the role of the nominal PD controller is not to provide a competing benchmark, but rather to generate a simple and interpretable reference command that drives the state toward the sampled goal while damping velocity. The candidate actions are sampled as

u_{t}^{(i)} = u_{t}^{nom} + η_{i}, η_{i} \sim N (0, σ_{u}^{2}),

with regime-dependent action dispersion

σ_{u}

.

The planner score vectors used in the experiment are constructed synthetically in order to control the geometry of the candidate set. In particular, the vector

ρ = (ρ_{1}, \dots, ρ_{M})

is generated directly from prescribed gap distributions rather than being derived from a simulated optimal control cost. This design isolates the behavior of the operator of the entropic soft-min and allows the ambiguity structure of the candidate set to be controlled explicitly.

The planner score vector is constructed explicitly in order to control the ambiguity structure. For each candidate bank, one candidate is assigned zero excess cost, a prescribed number of candidates are assigned near-minimizer gaps, and the remaining candidates are assigned far gaps. More precisely, if

ρ = (ρ_{1}, \dots, ρ_{M}) \in R^{M}

denotes the planner score vector, then one entry is set to the minimum value, a subset of cardinality

k_{near}

is assigned gaps sampled uniformly from a regime-dependent interval

[{\underset{̲}{δ}}_{near}, {\bar{δ}}_{near}]

, and the remaining entries are assigned gaps sampled uniformly from

[{\underset{̲}{δ}}_{far}, {\bar{δ}}_{far}]

. Lower scores are preferentially assigned to actions closer to the nominal command. The strength of this alignment is controlled by a regime-dependent parameter

a_{align} \in [0, 1]

: values close to one strongly align the best planner scores with actions near

u_{t}^{nom}

, whereas smaller values introduce weaker alignment and therefore more ambiguous soft selection.

To implement the alignment between planner scores and the nominal action, let

d_{i} = | u_{t}^{(i)} - u_{t}^{nom} |

denote the distance of candidate i from the nominal action. Let

π_{dist}

be the permutation that sorts candidates by increasing

d_{i}

, and let

π_{rand}

be a random permutation of

{1, \dots, M}

.

The final permutation used to assign the score gaps is obtained through a convex ranking mixture controlled by the alignment parameter

a_{align} \in [0, 1]

:

r_{i} = a_{align} {rank}_{π_{dist}} (i) + (1 - a_{align}) {rank}_{π_{rand}} (i) .

Candidates are then ordered by increasing

r_{i}

, and the generated score gaps are assigned following this order. In the rare case of ties in

r_{i}

, ties are broken uniformly at random. Thus, when

a_{align} = 1

the smallest planner scores correspond exactly to candidates closest to the nominal action, whereas when

a_{align} = 0

the assignment is random.

The three regimes are defined by the following parameters:

Easy:

$σ_{w} = 0.05, σ_{g} = 0.40, r_{x} = 3.5, r_{v} = 1.0, σ_{u} = 1.00,$

$k_{near} = 1, [{\underset{̲}{δ}}_{near}, {\bar{δ}}_{near}] = [0.00, 0.010],$

$[{\underset{̲}{δ}}_{far}, {\bar{δ}}_{far}] = [1.50, 5.50], a_{align} = 1.00 .$
Moderate:

$σ_{w} = 0.08, σ_{g} = 0.90, r_{x} = 4.0, r_{v} = 1.2, σ_{u} = 0.75,$

$k_{near} = 6, [{\underset{̲}{δ}}_{near}, {\bar{δ}}_{near}] = [0.00, 0.050],$

$[{\underset{̲}{δ}}_{far}, {\bar{δ}}_{far}] = [0.90, 3.20], a_{align} = 0.75 .$
Hard:

$σ_{w} = 0.12, σ_{g} = 1.40, r_{x} = 4.8, r_{v} = 1.5, σ_{u} = 0.55,$

$k_{near} = 24, [{\underset{̲}{δ}}_{near}, {\bar{δ}}_{near}] = [0.00, 0.090],$

$[{\underset{̲}{δ}}_{far}, {\bar{δ}}_{far}] = [0.25, 1.60], a_{align} = 0.45 .$

For calibration, the inverse temperature is searched over the grid

λ \in [0.5, 120]

using 500 equally spaced values. The target coverage is

1 - α = 0.9

with tolerance

τ = 0.10

. The global calibration set uses 300 score vectors per regime, the regime-specific calibration set also uses 300 score vectors per regime, and the oracle calibration set uses 600 score vectors per regime. Evaluation is performed on 140 paired episodes per regime. The global fixed baseline is calibrated by pooling the regime-specific calibration samples, the mean-risk baseline uses the same pooled set, and the worst-case baseline is defined by

λ = \frac{log M}{τ}, M = 64 .

Given a score vector

ρ

and candidate actions

u_{t}^{(1)}, \dots, u_{t}^{(M)}

, the control applied to the plant is obtained through Gibbs aggregation,

u_{t} = \sum_{i = 1}^{M} \frac{exp (- λ ρ_{i})}{\sum_{j = 1}^{M} exp (- λ ρ_{j})} u_{t}^{(i)} .

(24)

For each evaluation batch we generate a fresh score vector, compute the relaxation error

E_{λ} (ρ)

, and record the indicator

1 {E_{λ} (ρ) \leq τ}

, where

1 {\cdot}

denotes the indicator function. The reported coverage corresponds to the empirical probability of this event over all evaluation batches. This evaluation protocol matches the exchangeable sampling assumption used in Theorem 4.

6.1.2. Results

Table 2 reports the calibrated inverse temperatures. The oracle values decrease from the easy regime to the hard regime. The proposed conformal selector matches the oracle values in all three regimes, whereas the global, mean-risk, and worst-case baselines deviate substantially, especially in the moderate and hard regimes.

The oracle calibration inverse temperature decreases from the easy regime to the hard regime. This is consistent with the fact that the approximation error concerns the minimum value rather than the identity of the minimizing action.

As shown in Proposition 2, when several candidates attain or nearly attain the minimum value, the bias of the entropic relaxation decreases. Therefore, a smaller inverse temperature is sufficient to satisfy the prescribed tolerance on value approximation.

A regime-wise comparison of distances to the oracle is given in Table 3. The global baseline is nearly correct in the easy regime, but becomes markedly overconservative in the moderate and hard regimes. The worst-case baseline is consistently the most conservative, while the mean-risk baseline fails to track the oracle in a regime-dependent manner. By contrast, the proposed conformal selector coincides with the oracle by construction up to numerical resolution.

At the aggregate level, the proposed method attains empirical coverage

0.913

, close to the target value

0.9

, whereas the mean-risk baseline undercovers and the worst-case baseline overcovers. Coverage is computed over independently generated score vectors.

These results show that the proposed selector tracks the oracle calibration inverse temperature across regimes, whereas a single global inverse temperature does not.

Figure 2 summarizes the main aggregate metrics, and Figure 3 shows the regime-specific calibration curves

{\hat{q}}_{1 - α} (λ)

together with the selected inverse temperatures. The crossing point with the tolerance

τ

occurs at different values of

λ

in the three regimes, and the proposed method tracks these values accurately.

6.1.3. Shifted Evaluation and the Role of Exchangeability

We also consider a shifted evaluation derived from the moderate regime. In this setting, the test score vectors are generated from a sharper candidate geometry than those used in calibration, so the exchangeability assumption is violated.

More precisely, the conformal inverse temperature is calibrated on the moderate regime and then evaluated on a shifted regime whose oracle inverse temperature is substantially larger:

λ_{prop}^{moderate} = 29.2375, λ_{oracle}^{shift} = 35.9429 .

Thus, the inverse temperature inherited from the calibration regime is too small for the shifted test distribution.

Table 4 reports the corresponding results. The empirical coverage of ProposedConf falls well below the nominal level in the shifted regime, with

P (E_{λ} (ρ) \leq τ) = 0, q_{0.95} (E_{λ}) = 0.1221 > τ = 0.10,

whereas the shift-specific oracle restores near-nominal behavior. This behavior is consistent with Theorem 4, whose guarantee is established under exchangeability. When the test distribution differs from the calibration distribution, the nominal coverage level need not be preserved.

6.2. Experiment 2: Application-Oriented Adaptive Cruise Control Benchmark

We next consider a longitudinal adaptive cruise control (ACC) problem with uncertain lead-vehicle prediction and safety filtering [10,27,28,29]. This experiment is used to assess the proposed calibration rule in a closed-loop control setting.

The ACC problem is chosen here because it combines three ingredients that make the soft-min inverse temperature meaningful in practice:

1.: a finite set of candidate trajectories or control sequences;
2.: uncertain predictive scores induced by the forecast of the lead vehicle;
3.: a nontrivial trade-off among safety, comfort, and tracking performance [27,29,40].

This experiment compares the calibration-oriented inverse temperature with the inverse temperature preferred by closed-loop performance.

6.2.1. Setup

Let

d_{t}

denote the distance to the lead vehicle,

v_{t}^{e}

the ego speed, and

v_{t}^{ℓ}

the lead speed. The ego vehicle dynamics are modeled in discrete time using a standard longitudinal ACC kinematic model [27,29] as

v_{t + 1}^{e} = max {v_{t}^{e} + Δ t u_{t}, 0}, d_{t + 1} = d_{t} + Δ t (v_{t}^{ℓ} - v_{t}^{e}),

(25)

with sampling time

Δ t = 0.2

s and acceleration command

u_{t} \in [u_{min}, u_{max}], u_{min} = - 4.5, u_{max} = 2.5 .

The planning horizon is

H = 16

, the episode length is

T = 60

, and the number of candidate sequences generated at each decision step is

M = 100

. The reference speed is

v_{ref} = 20

m/s.

The desired following distance is

d_{safe} (v_{t}^{e}) = 8.0 + 1.1 max {v_{t}^{e}, 0},

(26)

and a hard minimum distance

d_{min} = 3.0 m

is enforced throughout. The margin term uses

ε = 0.35 .

At each decision step, infeasible candidates are removed by the safety filter described below. Consequently, the effective number of candidates may be smaller than M. Throughout the experiment we denote by N the number of feasible candidates that remain after filtering. Thus

N \leq M

may vary across time steps and episodes.

A nominal acceleration is computed as

u_{t}^{nom} = {sat}_{[u_{min}, u_{max}]} (0.15 (d_{t} - d_{safe} (v_{t}^{e})) + 0.50 (v_{t}^{ℓ} - v_{t}^{e}) + 0.20 (v_{ref} - v_{t}^{e})) .

(27)

Around this nominal command, M candidate sequences of length H are generated by adding Gaussian perturbations and then applying temporal smoothing. More precisely, each candidate sequence is first sampled as

u_{t : t + H - 1}^{(i)} = u_{t}^{nom} 1 + η^{(i)}, η_{k}^{(i)} \sim N (0, σ_{cand}^{2}),

where

σ_{cand}

is regime-dependent. The resulting sequence is then smoothed by a moving-average filter of window length three and finally saturated componentwise to the interval

[u_{min}, u_{max}]

.

The lead-vehicle prediction is generated over the horizon by a stochastic acceleration model. Starting from the current lead speed

v_{t}^{ℓ}

, the predicted lead trajectory is propagated as

v_{t + k + 1}^{ℓ} = max {v_{t + k}^{ℓ} + Δ t a_{t + k}^{ℓ}, 0},

where the acceleration

a_{t + k}^{ℓ}

is sampled according to the current traffic regime. At each prediction step, the lead vehicle may undergo nominal fluctuations, stop-and-go behavior, or hard braking events, with regime-dependent probabilities and magnitudes. Prediction uncertainty is incorporated by adding Gaussian noise with regime-dependent standard deviation.

Conformal calibration is performed only on feasible candidate sets. For a candidate sequence to be feasible, its predicted rollout must satisfy

d_{t + k} > d_{min}, k = 0, \dots, H - 1,

along the entire prediction horizon. Infeasible candidates are discarded before the soft-min aggregation is applied. If no feasible candidate exists, emergency braking

u_{t} = u_{min}

is applied. When no feasible candidate exists (

N = 0

), the planner score vector

ρ

is not defined and the entropic relaxation is not evaluated. Such time steps are therefore excluded from the coverage computation of

P (E_{λ} (ρ) \leq τ)

. Consequently, the reported coverage is conditional on the event

N \geq 1

. We also report the empirical frequency of

N = 0

events. Infeasible candidates are removed before calibration and evaluation.

For each feasible candidate, the predictive score used by the planner is

J_{pred}^{(i)} = \sum_{k = 0}^{H - 1} [w_{track} {(v_{ref} - v_{t + k}^{e})}^{2} + w_{u} {(u_{t + k}^{(i)})}^{2} + w_{Δ u} {(u_{t + k}^{(i)} - u_{t + k - 1}^{(i)})}^{2} + \frac{w_{margin}}{d_{t + k} - d_{min} + ε}],

(28)

with weights

w_{track} = 1.2, w_{u} = 0.06, w_{Δ u} = 0.45, w_{margin} = 3.5 .

In (28), the smoothness term is evaluated with the convention

u_{t - 1}^{(i)} : = u_{t}^{(i)},

so that the contribution at

k = 0

is zero. Equivalently, the smoothness penalty acts only on increments within the predicted sequence.

Let

ρ_{i} = J_{pred}^{(i)}

denote the score of the i-th feasible candidate. The corresponding Gibbs weights are defined as

p_{i} (λ) = \frac{exp (- λ ρ_{i})}{\sum_{j = 1}^{N} exp (- λ ρ_{j})} .

(29)

The control command applied to the system is obtained through a soft-argmin aggregation of the first control inputs of the feasible candidate sequences,

u_{t} = \sum_{i = 1}^{N} p_{i} (λ) u_{t}^{(i)} .

(30)

This construction corresponds to the Gibbs policy associated with the entropic relaxation and yields a smooth interpolation between averaging (

λ

small) and hard minimum selection (

λ \to \infty

). If the feasible set is empty, the fallback control is

u_{t} = u_{min}

.

The real closed-loop cost uses the same structure as (28), with the addition of a collision penalty

w_{coll} = 2 \times 10^{4}

whenever

d_{t} \leq d_{min}

, and a terminal penalty

3 {(v_{ref} - v_{T}^{e})}^{2} .

Three lead-vehicle regimes are considered.

Easy:

$μ_{v^{ℓ}} = 21.0, σ_{v^{ℓ}} = 1.0, σ_{a^{ℓ}} = 0.15, σ_{pred} = 0.12, σ_{cand} = 0.35,$

with stop-and-go probability 0, hard-brake probability 0, hard-brake mean $- 1.2$ , target number of near-minimizers 1, near-gap range $[0.00, 0.015]$ , mid-gap range $[0.20, 0.60]$ , far-gap range $[1.20, 2.50]$ , initial gap range $[18, 28]$ , and ego-speed bias $0.0$ .
Moderate:

$μ_{v^{ℓ}} = 16.0, σ_{v^{ℓ}} = 2.0, σ_{a^{ℓ}} = 0.45, σ_{pred} = 0.45, σ_{cand} = 0.55,$

with stop-and-go probability $0.15$ , hard-brake probability $0.05$ , hard-brake mean $- 2.4$ , target number of near-minimizers 6, near-gap range $[0.00, 0.045]$ , mid-gap range $[0.12, 0.45]$ , far-gap range $[0.70, 1.80]$ , initial gap range $[12, 22]$ , and ego-speed bias $0.5$ .
Hard:

$μ_{v^{ℓ}} = 11.0, σ_{v^{ℓ}} = 2.8, σ_{a^{ℓ}} = 0.85, σ_{pred} = 0.90, σ_{cand} = 0.85,$

with stop-and-go probability $0.40$ , hard-brake probability $0.15$ , hard-brake mean $- 3.8$ , target number of near-minimizers 18, near-gap range $[0.00, 0.090]$ , mid-gap range $[0.08, 0.28]$ , far-gap range $[0.25, 0.80]$ , initial gap range $[10, 18]$ , and ego-speed bias $1.2$ .

The initial gap, lead speed, and ego speed are sampled from these regime-dependent distributions. For reproducibility, the calibration set sizes are 260 samples per regime for global and local calibration and 520 samples per regime for OracleCal, while the evaluation uses 140 paired episodes per regime. The calibration grid is

λ \in [2, 80],

and the oracle-performance grid is restricted to

λ \in [8, 60],

with an additional minimum coverage constraint of

0.75

.

Unless otherwise stated, the conformal calibration in this experiment uses target coverage level

1 - α = 0.9

and tolerance

τ = 0.11

for the relaxation error

E_{λ} (ρ)

. In particular, the worst-case baseline is defined as

λ = \frac{log M}{τ} .

For

M = 100

and

τ = 0.11

, this yields

λ = 41.8652

.

The application-level performance metrics recorded in the ACC experiment are summarized in Table 5. At the operator level, we also report empirical coverage, mean relaxation error, the empirical

0.95

quantile of

E_{λ}

, and the distances to OracleCal and OraclePerf.

6.2.2. Results

The calibrated inverse temperatures are reported in Table 6. The oracle-performance inverse temperatures are selected on the restricted grid

λ \in [8, 60]

subject to a minimum empirical coverage level of

0.75

. The oracle calibration inverse temperatures are well separated across regimes:

λ_{oracle - cal}^{easy} = 37.7422, λ_{oracle - cal}^{moderate} = 30.1098, λ_{oracle - cal}^{hard} = 21.7327 .

The proposed conformal selector closely tracks these values in all three regimes, whereas the global, mean-risk, and worst-case baselines remain substantially misaligned, especially in the hard regime. The spread of the oracle calibration inverse temperatures is

16.0095

.

Figure 4 shows the regime-specific calibration curves. Table 7 and Table 8 report aggregate results across all regimes. The oracle-performance inverse temperature is computed on a validation set, whereas the results reported in Table 7 and Table 8 are obtained on an independent test set. Therefore, OraclePerf need not minimize the reported test cost.

The ACC experiment shows that the proposed selector remains close to the oracle calibration inverse temperature and attains coverage close to the target level. It also shows that the inverse temperature selected for calibration need not coincide with the one preferred by closed-loop performance (see Figure 5). In the easy and moderate regimes, the performance-oriented value lies near the upper end of the admissible range, whereas in the hard regime a smaller value is preferred.

7. Concluding Remarks

This paper studied the principled selection of the inverse temperature parameter in entropic soft-min relaxations. Starting from the definition of the operator, structural properties of the associated relaxation error were established, including nonnegativity, monotonicity with respect to the inverse temperature parameter, and approximation bounds in finite and asymptotic regimes.

On this basis, a conformal calibration procedure was introduced to select the smallest inverse temperature ensuring that the relaxation error satisfies a prescribed tolerance with finite-sample distribution-free validity. The resulting rule provides an explicit certificate on the approximation quality of the entropic relaxation under exchangeability of the observed instances.

The numerical experiments support the main claims of the paper. The heterogeneous control-oriented benchmark shows that the proposed conformal selector accurately tracks the oracle calibration inverse temperature in non-homogeneous settings where a single global inverse temperature is inadequate. In the same benchmark, an additional shifted evaluation illustrates that the finite-sample guarantee is tied to the exchangeable setting of Theorem 4: when the test distribution departs from the calibration distribution, the nominal coverage level is no longer guaranteed. The adaptive cruise control experiment demonstrates that the in-distribution behavior of the proposed method persists in a realistic control scenario with explicit safety filtering and uncertain prediction, thereby establishing practical relevance beyond synthetic operator tests.

At the same time, the application experiment clarifies an important conceptual point: an inverse temperature that is optimal for certifying the approximation quality of the entropic soft-min operator is not necessarily identical to the inverse temperature that minimizes the final task-level cost. This distinction does not weaken the role of conformal calibration; rather, it clarifies its purpose. The proposed method provides a certified, distribution-free, and regime-adaptive mechanism for selecting the soft-min inverse temperature itself, which can then be incorporated into broader optimization and control architectures.

The present method selects a single inverse temperature for a given calibration population, or a single inverse temperature per regime when regime-specific calibration is used. Extending this framework toward instance-conditional or covariate-aware conformal calibration constitutes a natural next step, especially in settings where side information is available at test time and finer-grained adaptation is desirable.

Author Contributions

Conceptualization, J.E.S.; Software, A.F.-F.; Formal analysis, J.E.S.; Investigation, J.E.S. and A.F.-F.; Data curation, A.F.-F.; Writing—original draft, J.E.S.; Writing—review and editing, J.E.S. and A.F.-F.; Supervision, J.E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from the European Union’s DIGITAL Europe Programme under Grant Agreement No. 101226207 (project AI-SECRETT), from the Spanish Government under Grant PID2024-156583OB-I00 (funded by MCIN/AEI/10.13039/501100011033), and from the Generalitat Valenciana under Grant CIGE/2024/195.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Nesterov, Y. Smooth minimization of non-smooth functions. Math. Program. 2005, 103, 127–152. [Google Scholar] [CrossRef]
Boyd, S.; Vandenberghe, L. Convex Optimization; Number pt. 1 in Berichte über Verteilte Messysteme; Cambridge University Press: Cambridge, UK, 2004; Available online: https://books.google.es/books?id=mYm0bLd3fcoC (accessed on 30 March 2026).
Nishioka, A.; Kanno, Y. A feasible smoothing accelerated projected gradient method for nonsmooth convex optimization. Oper. Res. Lett. 2024, 57, 107181. [Google Scholar] [CrossRef]
Palomar, D.P. Convex Optimization Theory. In Portfolio Optimization: Theory and Application; Cambridge University Press: Cambridge, UK, 2025; pp. 491–538. Available online: https://portfoliooptimizationbook.com/slides/slides-convex-optimization-theory.pdf (accessed on 30 March 2026).
Blanchard, P.; Higham, D.J.; Higham, N.J. Accurately computing the log-sum-exp and softmax functions. IMA J. Numer. Anal. 2020, 41, 2311–2330. [Google Scholar] [CrossRef]
Calafiore, G.C.; Gaubert, S.; Possieri, C. Log-Sum-Exp Neural Networks and Posynomial Models for Convex and Log-Log-Convex Data. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 827–838. [Google Scholar] [CrossRef]
Zhang, S.; Tepedelenlioğlu, C.; Banavar, M.K.; Spanias, A. Max-consensus using the soft maximum. In Proceedings of the 2013 Asilomar Conference on Signals, Systems and Computers; IEEE: Piscataway, NJ, USA, 2013; pp. 433–437. [Google Scholar] [CrossRef]
Nowzari, A.; Rabbat, M.G. Improved Bounds for Max Consensus in Wireless Networks. IEEE Trans. Signal Inf. Process. Netw. 2019, 5, 305–319. [Google Scholar] [CrossRef]
Lefebvre, T.; Crevecoeur, G. On Entropy Regularized Path Integral Control for Trajectory Optimization. Entropy 2020, 22, 1120. [Google Scholar] [CrossRef] [PubMed]
Lefebvre, T.; Crevecoeur, G. Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation. In Proceedings of the 2022 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM); IEEE Press: Piscataway, NJ, USA, 2022; pp. 401–408. [Google Scholar] [CrossRef]
Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E.A. Aggressive driving with model predictive path integral control. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); IEEE: Piscataway, NJ, USA, 2016; pp. 1433–1440. [Google Scholar] [CrossRef]
Xu, H.; Xuan, J.; Zhang, G.; Lu, J. Trust region policy optimization via entropy regularization for Kullback–Leibler divergence constraint. Neurocomputing 2024, 589, 127716. [Google Scholar] [CrossRef]
Tao, F.; Wu, M.; Cao, Y. Generalized Maximum Entropy Reinforcement Learning via Reward Shaping. IEEE Trans. Artif. Intell. 2024, 5, 1563–1572. [Google Scholar] [CrossRef]
Wainwright, M.J.; Jordan, M.I. Graphical Models, Exponential Families, and Variational Inference. Found. Trends Mach. Learn. 2008, 1, 1–305. [Google Scholar] [CrossRef]
Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar] [CrossRef]
Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS’13: Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2; Curran Associates Inc.: Red Hook, NY, USA, 2013; pp. 2292–2300. [Google Scholar] [CrossRef]
Bhole, A.; Filabadi, M.M.; Crevecoeur, G.; Lefebvre, T. Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions. arXiv 2025, arXiv:2512.06109. [Google Scholar] [CrossRef]
Arriojas, A.; Adamczyk, J.; Tiomkin, S.; Kulkarni, R.V. Entropy Regularized Reinforcement Learning Using Large Deviation Theory. Phys. Rev. Res. 2023, 5, 023085. [Google Scholar] [CrossRef]
Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World, 2nd ed.; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
Lei, J.; G’Sell, M.; Rinaldo, A.; Tibshirani, R.J.; Wasserman, L. Distribution-Free Predictive Inference for Regression. J. Am. Stat. Assoc. 2018, 113, 1094–1111. [Google Scholar] [CrossRef]
Balasubramanian, V.N.; Ho, S.S.; Vovk, V. Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications; Morgan Kaufmann: Burlington, MA, USA, 2014. [Google Scholar] [CrossRef]
Romano, Y.; Patterson, E.; Candes, E. Conformalized Quantile Regression. In Proceedings of the Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32, Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/5103c3584b063c431bd1268e9b5e76fb-Paper.pdf (accessed on 30 March 2026).
Zhou, X.; Chen, B.; Gui, Y.; Cheng, L. Conformal Prediction: A Data Perspective. arXiv 2025, arXiv:2410.06494. [Google Scholar] [CrossRef]
Flovik, V. Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications. arXiv 2024, arXiv:2405.01978. [Google Scholar] [CrossRef]
Lin, V.; Jang, K.J.; Dutta, S.; Caprio, M.; Sokolsky, O.; Lee, I. DC4L: Distribution shift recovery via data-driven control for deep learning models. In Proceedings of the 6th Annual Learning for Dynamics & Control Conference, 15–17 July 2024; PMLR, Proceedings of Machine Learning Research; Abate, A., Cannon, M., Margellos, K., Papachristodoulou, A., Eds.; University of Oxford: Oxford, UK, 2024; Volume 242, pp. 1526–1538. Available online: https://proceedings.mlr.press/v242/lin24b.html (accessed on 30 March 2026).
Danesh, M.H.; Wabartha, M.; Pineau, J.; Lin, H.C. Mitigating Distribution Shifts: Uncertainty-Aware Offline-to-Online Reinforcement Learning. 2025. Available online: https://openreview.net/forum?id=0WqAnYWi7H (accessed on 30 March 2026).
Guo, L.; Ge, P.; Sun, D.; Qiao, Y. Adaptive Cruise Control Based on Model Predictive Control with Constraints Softening. Appl. Sci. 2020, 10, 1635. [Google Scholar] [CrossRef]
Li, X.; Girard, A.; Kolmanovsky, I. Safe Adaptive Cruise Control Under Perception Uncertainty: A Deep Ensemble and Conformal Tube Model Predictive Control Approach. In Proceedings of the 2025 IEEE 64th Conference on Decision and Control (CDC); IEEE: Piscataway, NJ, USA, 2025; pp. 3081–3088. [Google Scholar] [CrossRef]
Wang, J.; Gong, X.; Wang, P.; Wang, Y.; Wang, R.; Guo, L.; Hu, Y.; Chen, H. A Stochastic Predictive Adaptive Cruise Control System with Uncertainty-Aware Velocity Prediction and Parameter Self-Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13900–13913. [Google Scholar] [CrossRef]
Donsker, M.D.; Varadhan, S.R.S. Asymptotic evaluation of certain markov process expectations for large time, I. Commun. Pure Appl. Math. 1975, 28, 1–47. [Google Scholar] [CrossRef]
Dembo, A.; Zeitouni, O. Large Deviations Techniques and Applications, 2nd ed.; Stochastic Modelling and Applied Probability; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
Dupuis, P.; Ellis, R.S. A Weak Convergence Approach to the Theory of Large Deviations; Wiley Series in Probability and Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1997. [Google Scholar] [CrossRef]
Ellis, R.S. Entropy, Large Deviations, and Statistical Mechanics; Grundlehren der Mathematischen Wissenschaften; Springer: New York, NY, USA, 1985. [Google Scholar] [CrossRef]
Hartmann, C.; Richter, L.; Schütte, C.; Zhang, W. Variational Characterization of Free Energy: Theory and Algorithms. Entropy 2017, 19, 626. [Google Scholar] [CrossRef]
Amari, S.I. Information Geometry and Its Applications; Applied Mathematical Sciences; Springer: Tokyo, Japan, 2016. [Google Scholar] [CrossRef]
Wong, R. Asymptotic Approximations of Integrals; Academic Press: Cambridge, MA, USA, 1989. [Google Scholar] [CrossRef]
de Bruijn, N. Asymptotic Methods in Analysis; Bibliotheca Mathematica; Dover Publications: Garden City, NY, USA, 1981; Available online: https://books.google.es/books?id=_tnwmvHmVwMC (accessed on 30 March 2026).
Temme, N.M. Uniform asymptotic methods for integrals. Indag. Math. 2013, 24, 739–765. [Google Scholar] [CrossRef]
Oliveira, R.I.; Orenstein, P.; Ramos, T.; Romano, J.V. Split Conformal Prediction and Non-Exchangeable Data. J. Mach. Learn. Res. 2022, 25, 225:1–225:38. Available online: http://jmlr.org/papers/v25/23-1553.html (accessed on 30 March 2026).
Chacko, P.J.; Krishna, S.M.; Haneesh, K.M.; Daya, J.L.F.; Stonier, A.A. Design and validation of minimal jerk lane changing manoeuvre for adaptive cruise control in electric vehicles. Discov. Appl. Sci. 2025, 7, 1363. [Google Scholar] [CrossRef]

Figure 1. Summary of the experimental pipeline used in the numerical studies. In both experiments, a calibration sample is first generated to select the inverse temperature parameter, after which the resulting value is assessed on paired evaluation episodes and compared with the considered baselines.

Figure 2. Experiment 1: aggregate comparison in the heterogeneous control-oriented benchmark. The panels report closed-loop cost, empirical coverage, average selected inverse temperature, mean distance to the oracle calibration inverse temperature, upper quantiles of the relaxation error, and excess conservativeness.

Figure 3. Experiment 1: regime-specific calibration curves. The proposed conformal inverse temperature tracks the oracle calibration inverse temperature in all three regimes, whereas the global inverse temperature fails to adapt as ambiguity increases.

Figure 4. Experiment 2: calibration curves in the ACC benchmark. The feasible-candidate formulation avoids collapse and yields distinct oracle calibration inverse temperatures across traffic regimes.

Figure 5. Experiment 2: aggregate comparison in the ACC benchmark. The proposed selector remains close to the oracle calibration inverse temperature and achieves near-target coverage, while OraclePerf identifies a distinct task-optimal inverse temperature.

Table 1. Summary of the inverse-temperature selection methods compared in the numerical experiments.

Method	Calibration Data	Selection Rule	Availability
ProposedConf	Regime-specific calibration sample	Smallest $λ$ such that the empirical $(1 - α)$ quantile of $E_{λ}$ does not exceed $τ$	Experiments 1 and 2
GlobalFixed	Pooled calibration sample across regimes	Same conformal rule applied after pooling all regimes	Experiments 1 and 2
MeanRisk	Same sample as GlobalFixed	Smallest $λ$ such that the empirical mean of $E_{λ}$ does not exceed $τ$	Experiments 1 and 2
WorstCase	No data-driven calibration	Deterministic rule $λ = log (N) / τ$ obtained from Theorem 2	Experiments 1 and 2
OracleCal	Large regime-specific calibration sample	Smallest feasible $λ$ computed on the oracle calibration set	Experiments 1 and 2
OraclePerf	Validation set of simulated episodes	Value of $λ$ minimizing the empirical closed-loop cost subject to a minimum coverage requirement	Experiment 2 only

Table 2. Experiment 1: calibrated inverse temperatures by regime.

Regime	ProposedConf	GlobalFixed	MeanRisk	OracleCal
Easy	36.1824	35.9429	27.0822	36.1824
Moderate	29.2375	35.9429	27.0822	29.2375
Hard	16.7846	35.9429	27.0822	16.7846

Table 3. Experiment 1: absolute distance to the oracle calibration inverse temperature.

Method	Easy	Moderate	Hard
ProposedConf	0.0000	0.0000	0.0000
GlobalFixed	0.2395	6.7054	19.1583
MeanRisk	9.1002	2.1553	10.2976
WorstCase	5.4065	12.3514	24.8043

Table 4. Experiment 1: shifted evaluation illustrating the role of exchangeability. The inverse temperature is calibrated on the moderate regime and evaluated on a shifted regime with sharper candidate geometry.

Method	Coverage	$q_{0.95} (E_{λ})$	$\| λ - λ_{oracle} \|$
ProposedConf	0.0000	0.1221	6.7054
GlobalFixed	0.9590	0.0999	0.0000
MeanRisk	0.0000	0.1315	8.8607
WorstCase	1.0000	0.0868	5.6459
OracleShift	0.9590	0.0999	0.0000

Table 5. Application-level performance metrics recorded in the ACC experiment.

Metric	Definition
Real closed-loop cost	Accumulated stage cost over the episode, including tracking, control, smoothness, and margin terms, plus the collision penalty when $d_{t} \leq d_{min}$
Collision rate	Fraction of time steps for which $d_{t} \leq d_{min}$
Unsafe rate	Fraction of time steps for which $d_{t} < d_{safe} (v_{t}^{e})$
Minimum realized distance	${min}_{t} d_{t}$ over the episode
Accumulated speed tracking error	$\sum_{t} \| v_{ref} - v_{t}^{e} \|$
Control energy	$\sum_{t} u_{t}^{2}$
Total variation of the acceleration command	$\sum_{t} \| u_{t} - u_{t - 1} \|$ with $u_{- 1} = 0$
Accumulated jerk	$\sum_{t} \| (u_{t} - u_{t - 1}) / Δ t \|$ with $u_{- 1} = 0$

Table 6. Experiment 2: calibrated inverse temperatures in the ACC benchmark.

Regime	ProposedConf	GlobalFixed	MeanRisk	WorstCase	OracleCal	OraclePerf
Easy	37.5561	37.1838	28.2482	41.8652	37.7422	60.0000
Moderate	30.1098	37.1838	28.2482	41.8652	30.1098	60.0000
Hard	21.7327	37.1838	28.2482	41.8652	21.7327	8.0000

Table 7. Experiment 2: aggregate results across all regimes.

Method	Cost	Unsafe Rate	Min Gap	Coverage	$\| λ - λ_{cal} \|$	$\| λ - λ_{perf} \|$
ProposedConf	8474.536	0.699	14.581	0.938	0.0621	22.0223
GlobalFixed	8416.732	0.706	14.421	0.906	7.6945	24.9387
MeanRisk	8459.397	0.700	14.545	0.521	5.9570	27.9173
WorstCase	8399.067	0.708	14.369	1.000	12.0036	23.3783
OracleCal	8474.531	0.699	14.581	0.969	0.0000	21.9602
OraclePerf	8509.924	0.698	14.634	0.949	21.9602	0.0000

Table 8. Experiment 2: supplementary aggregate application-level results across all regimes.

Method	Speed Tracking Error	Control Energy	Total Variation	Jerk
ProposedConf	460.190	257.917	47.493	237.465
GlobalFixed	458.115	257.767	47.909	239.545
MeanRisk	459.952	257.898	47.548	237.740
WorstCase	457.381	257.729	48.081	240.407
OracleCal	460.188	257.917	47.493	237.467
OraclePerf	460.231	258.272	47.808	239.040

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Solanes, J.E.; Francés-Falip, A. Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems. Mathematics 2026, 14, 1188. https://doi.org/10.3390/math14071188

AMA Style

Solanes JE, Francés-Falip A. Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems. Mathematics. 2026; 14(7):1188. https://doi.org/10.3390/math14071188

Chicago/Turabian Style

Solanes, J. Ernesto, and Aitana Francés-Falip. 2026. "Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems" Mathematics 14, no. 7: 1188. https://doi.org/10.3390/math14071188

APA Style

Solanes, J. E., & Francés-Falip, A. (2026). Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems. Mathematics, 14(7), 1188. https://doi.org/10.3390/math14071188

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Regime-Adaptive Conformal Calibration of Entropic Soft-Min Relaxations for Heterogeneous Optimization Problems

Abstract

1. Introduction

2. Mathematical Preliminaries and Entropic Relaxation

3. Relaxation Error

4. Approximation Bounds

5. Conformal Calibration

6. Numerical Experiments

6.1. Experiment 1: Heterogeneous Control Benchmark

6.1.1. Setup

6.1.2. Results

6.1.3. Shifted Evaluation and the Role of Exchangeability

6.2. Experiment 2: Application-Oriented Adaptive Cruise Control Benchmark

6.2.1. Setup

6.2.2. Results

7. Concluding Remarks

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI