Article

An Equivalence Theorem and A Sequential Algorithm for A-Optimal Experimental Designs on Manifolds

Key Laboratory of Advanced Theory and Application in Statistics and Data Science—MOE, School of Statistics, East China Normal University, Shanghai 200062, China
*
Author to whom correspondence should be addressed.
Axioms 2025, 14(6), 436; https://doi.org/10.3390/axioms14060436
Submission received: 29 April 2025 / Revised: 26 May 2025 / Accepted: 29 May 2025 / Published: 2 June 2025
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics)

Abstract

Selecting input data points in the context of high-dimensional, nonlinear, and complex data in Riemannian space is challenging. While optimal experimental design theory is well-established in Euclidean space, its extension to Riemannian manifolds remains underexplored. Li and Del Castillo recently obtained new theoretical results on D-optimal and G-optimal designs on Riemannian manifolds. This paper follows their framework to investigate A-optimal designs on such manifolds. We prove an equivalence theorem for A-optimality under the manifold regularization model. Based on this result, a sequential algorithm for identifying A-optimal designs on manifold data is developed. Numerical studies using both synthetic and real datasets show the validity of the proposed method.

1. Introduction

Supervised learning models frequently rely on large labeled datasets to perform well, but labeling these datasets can be time-consuming and costly. A key challenge is how to select the most informative data points to improve model efficiency while minimizing labeling costs. This issue is closely related to the design of experiments (DOE), which focuses on selecting optimal designs based on statistical criteria. In traditional DOE, it is often assumed that the statistical model is linear and the data lie in Euclidean space, with criteria usually focused on either parameter estimation or predictive performance (see Kiefer and Wolfowitz [1], Fedorov [2], Silvey [3], and Pukelsheim [4] for more details).
Modern tasks such as image recognition and text categorization often involve high-dimensional, nonlinear data. Manifold learning, or nonlinear dimensionality reduction, addresses these challenges by introducing the manifold structure. Basically, manifold learning assumes that although data reside in high-dimensional ambient spaces, they lie on lower-dimensional manifolds that offer a more compact representation (see, e.g., Roweis and Saul [5], Tenenbaum et al. [6], Donoho and Grimes [7], Coifman et al. [8], Belkin et al. [9]). Extensive research supports this assumption, showing that high-dimensional data are often sparse in ambient spaces, with most of the data concentrated on lower-dimensional manifolds. For a good review of existing manifold learning methods, see Meilă and Zhang [10]. Given this, traditional DOE methods, which rely on Euclidean assumptions, often fail to account for the complexities of such nonlinear data. Thus, optimal experimental design on manifolds, which considers the true underlying manifold structure, has become an essential tool in fields like biomedicine and computer vision. Manifold-based approaches enhance both the efficiency and accuracy of experiments, reducing labeling requirements while maximizing information gain.
While some previous studies have implemented optimal experimental design criteria as active learning strategies for high-dimensional data (e.g., He [11], Chen et al. [12], Alaeddini et al. [13]), these efforts lack theoretical justification for applying such designs on Riemannian manifolds. A pioneering study by Li and Del Castillo [14] explored D/G-optimal experimental designs on Riemannian manifolds, proving that D-optimal and G-optimal designs are equivalent in this context. They also proposed a lower bound for the maximum prediction variance and developed an efficient algorithm to find the optimal experimental design on manifold data.
While D-optimal designs are widely used due to their computational simplicity, A-optimal designs are still highly valuable, especially in situations where precise parameter estimation is crucial. Jones et al. [15] showed that A-optimal designs are generally preferred over D-optimal designs for early-stage experiments because they more directly align with the goal of identifying active factors by minimizing the average variance of parameter estimates, leading to better statistical properties, less collinearity, and improved prediction accuracy, especially when the two designs differ. These advantages become particularly evident in complex design spaces, where A-optimal designs lead to more stable optimization outcomes (Stallrich et al. [16]). Building on the above benefits, in this paper, we extend the framework of Li and Del Castillo [14] to investigate A-optimal designs on Riemannian manifolds. We prove an equivalence theorem for A-optimality under the manifold regularization model and propose a sequential algorithm to identify A-optimal experimental designs for manifold data. Numerical experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method.
The remainder of this paper is organized as follows. Section 2 reviews classical optimal experimental design in Euclidean space and the manifold regularization model from Belkin et al. [9] and Li and Del Castillo [14]. Section 3 presents the main theoretical results, including the equivalence theorem for A-optimal designs on manifolds and a sequential algorithm. Section 4 provides numerical studies to validate the proposed methods. Section 5 concludes the paper with a summary, and Section 6 provides some potential directions for future research. Additional simulation results are provided in the Supplementary Materials.

2. Preliminaries and Literature Review

2.1. Classical Optimal Experimental Design on Euclidean Space

The classical optimal experimental design theory on Euclidean space was introduced by Kiefer and Wolfowitz [1]. This framework uses optimization techniques to identify the optimal design that maximizes the efficiency of statistical inference under a given model. For a comprehensive treatment of optimal design theory, see Silvey [3] and Pukelsheim [4].
Consider the general linear regression model:
$$y = \sum_{j=1}^{p} g_j(\mathbf{x})\,\beta_j + \epsilon,$$
where $y \in \mathbb{R}$ is the response variable, $\mathbf{x} = (x_1, \ldots, x_m)^{\top} \in \mathbb{R}^m$ is the vector of input variables, $g_1, \ldots, g_p$ are known functions, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^{\top}$ is the vector of unknown parameters, and $\epsilon$ is a random error term, assumed to follow a normal distribution with mean 0 and variance $\sigma^2$.
Given a set of $n$ design points $\{\mathbf{x}_i = (x_{i,1}, \ldots, x_{i,m})^{\top}\}_{i=1,\ldots,n}$, suppose the corresponding response values are $\{y_i\}_{i=1,\ldots,n}$. The matrix form of the above model is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$
where $\mathbf{y} = (y_1, \ldots, y_n)^{\top}$,
$$\mathbf{X} = \begin{pmatrix} g_1(\mathbf{x}_1) & \cdots & g_p(\mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ g_1(\mathbf{x}_n) & \cdots & g_p(\mathbf{x}_n) \end{pmatrix},$$
and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^{\top}$. The least squares estimate of $\boldsymbol{\beta}$ is given by
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},$$
and the covariance matrix of $\hat{\boldsymbol{\beta}}$ is
$$\operatorname{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^{\top}\mathbf{X})^{-1}.$$
The Fisher information matrix for this linear regression model is $\mathbf{M} = \frac{1}{n}\mathbf{X}^{\top}\mathbf{X}$.
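To make these quantities concrete, the following minimal sketch (in Python, using a hypothetical one-factor quadratic basis; it is an illustration, not part of the paper's methodology) computes the least squares estimate, its covariance, and the Fisher information matrix from a small design.

```python
import numpy as np

# Hypothetical one-factor quadratic model g(x) = (1, x, x^2)', so p = 3.
def g(x):
    return np.array([1.0, x, x**2])

x_design = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # n = 5 design points (illustrative)
sigma2 = 1.0                                         # assumed error variance

X = np.vstack([g(x) for x in x_design])              # n x p model matrix
M = X.T @ X / len(x_design)                          # Fisher information matrix M = X'X / n

# Simulated responses, least squares estimate, and its covariance
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=len(x_design))
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # (X'X)^{-1} X'y
cov_beta = sigma2 * np.linalg.inv(X.T @ X)           # sigma^2 (X'X)^{-1}
print(beta_hat, np.diag(cov_beta))
```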
In classical optimal design theory, exact designs are approximated by continuous designs, i.e., a probability measure with finite support points [3]. Let
$$\xi = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{n_0} \\ q_1 & q_2 & \cdots & q_{n_0} \end{pmatrix},$$
where $q_i \geq 0$ and $\sum_{i=1}^{n_0} q_i = 1$, denote a (continuous) design. The optimal design theory seeks to find the optimal design $\xi^*$ that optimizes a certain function of the Fisher information matrix, i.e.,
$$\xi^* = \arg\min_{\xi} \Phi\left(\mathbf{M}(\xi)\right),$$
where $\Phi(\cdot)$ is the criterion function. Clearly, the optimal design $\xi^*$ depends on the specific optimality criterion $\Phi$. Numerous optimality criteria have been proposed in the literature, many of which are categorized as alphabetic optimality criteria, named after letters of the alphabet. Common criteria include D-optimality, where $\Phi(\mathbf{M}(\xi)) = |\mathbf{M}(\xi)|^{-1}$, A-optimality, where $\Phi(\mathbf{M}(\xi)) = \operatorname{tr}\left(\mathbf{M}(\xi)^{-1}\right)$, and G-optimality, where $\Phi(\mathbf{M}(\xi)) = \max_{\mathbf{x}} \mathbf{g}(\mathbf{x})^{\top}\mathbf{M}(\xi)^{-1}\mathbf{g}(\mathbf{x})$ with $\mathbf{g}(\mathbf{x}) = (g_1(\mathbf{x}), \ldots, g_p(\mathbf{x}))^{\top}$. From a statistical perspective, D-optimal designs minimize the volume of the confidence ellipsoid for $\boldsymbol{\beta}$ [17]. A-optimal designs minimize the average of the variances of the parameter estimators, and G-optimal designs minimize the maximum prediction variance across the design region [3,4].
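To make the criteria concrete, the short sketch below evaluates the D-, A-, and G-criteria for a given continuous design; the quadratic basis, support points, and weights are purely illustrative.

```python
import numpy as np

def g(x):                         # same hypothetical quadratic basis as above
    return np.array([1.0, x, x**2])

# An illustrative continuous design: support points with weights summing to one
support = np.array([-1.0, 0.0, 1.0])
q = np.array([0.25, 0.5, 0.25])

M = sum(qi * np.outer(g(zi), g(zi)) for qi, zi in zip(q, support))
M_inv = np.linalg.inv(M)

d_crit = 1.0 / np.linalg.det(M)                      # D-optimality: |M(xi)|^{-1}
a_crit = np.trace(M_inv)                             # A-optimality: tr(M(xi)^{-1})
grid = np.linspace(-1.0, 1.0, 201)                   # candidate region for G-optimality
g_crit = max(g(x) @ M_inv @ g(x) for x in grid)      # max_x g(x)' M(xi)^{-1} g(x)
print(d_crit, a_crit, g_crit)
```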

2.2. Manifold Learning and Manifold Regularization Model

Manifold learning is a form of dimensionality reduction that assumes the data distribution in high-dimensional space lies on a lower-dimensional manifold. The goal is to ensure that the reduced data preserve the geometric properties and structure inherent in the manifold of the high-dimensional space. Let $\mathcal{M}$ be a manifold in an $m$-dimensional space, a proper subset of $\mathbb{R}^m$, i.e., $\mathcal{M} \subset \mathbb{R}^m$. Manifold learning aims to map this manifold into a lower-dimensional space $\mathbb{R}^n$ with $n < m$, while retaining the intrinsic geometry of the manifold. Several well-known manifold dimensionality reduction algorithms, such as Locally Linear Embedding (LLE) and Isomap, have been proposed and extensively studied; see Roweis and Saul [5], Tenenbaum et al. [6], Donoho and Grimes [7], Coifman et al. [8], and Belkin et al. [9] for examples. For a recent comprehensive review, see Meilă and Zhang [10].
In contrast to conventional nonlinear dimensionality reduction methods, our work focuses on developing an optimal design methodology. We extend the framework proposed by Li and Del Castillo [14], which investigated D- and G-optimal designs in the manifold regularization model introduced by Belkin et al. [9]. Below, we briefly review the model setup; for technical details, see [14] and the references therein.
Let $\mathcal{X} \subset \mathbb{R}^d$ represent the input space and $\{(\mathbf{x}_i, y_i)\}_{i=1,\ldots,l} \subset \mathcal{X} \times \mathbb{R}$ the labeled training data. Let $P$ be the joint distribution generating the labeled data, and $P_X$ the marginal distribution that generates the unlabeled data $\{\mathbf{x}_i\}_{i=l+1,\ldots,n} \subset \mathcal{X}$. In traditional machine learning, the goal is to learn a function $f: \mathcal{X} \to \mathbb{R}$ that maps instances to labels. Belkin et al. [9] extended this framework to Riemannian manifolds by assuming that the conditional distribution $P(y|\mathbf{x})$ varies smoothly along the manifold supporting the data distribution $P_X$. Specifically, if two points $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}$ are close in geodesic distance, their label probabilities $P(y|\mathbf{x}_1)$ and $P(y|\mathbf{x}_2)$ are similar. They proposed a semi-supervised learning framework with a double-regularization objective function:
$$\hat{f} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{l} V\left(\mathbf{x}_i, y_i, f\right) + \lambda_A \|f\|_{\mathcal{H}_K}^2 + \lambda_I \|f\|_{I}^2, \tag{1}$$
where $V$ is a known loss function, $\mathcal{H}_K$ is a Hilbert space with a Mercer kernel $K$, and $\|f\|_{\mathcal{H}_K}^2$ and $\lambda_A$ represent the non-smoothness penalty term and penalty quantity in the ambient space, respectively. The term $\|f\|_{I}^2$ penalizes the non-smoothness of $f$ along the geodesics of the intrinsic manifold, ensuring that the learned function aligns more closely with the underlying data distribution, while $\lambda_I$ is the corresponding penalty weight for the manifold's intrinsic geometry.
The penalty term $\|f\|_{I}^2$ serves as a roughness penalty related to the distribution $P_X$ on the manifold. To illustrate this, consider a specific case where $\mathcal{M}$, the support of $P_X$, is a compact manifold. Then, we can define $\|f\|_{I}^2$ as
$$\int_{\mathbf{x} \in \mathcal{M}} \left\|\nabla_{\mathcal{M}} f\right\|^2 \, dP_X = \int_{\mathbf{x} \in \mathcal{M}} f \, \Delta_{\mathcal{M}} f \, dP_X,$$
where $\nabla_{\mathcal{M}} f$ is the gradient of $f$ along the manifold $\mathcal{M}$, and $\Delta_{\mathcal{M}}$ is the Laplace–Beltrami operator. In most real-world applications, however, the distribution $P_X$ is unknown, and one can replace it with an empirical estimate of the marginal distribution. Significant work has focused on scenarios where $P_X$ is supported on a compact manifold $\mathcal{M} \subset \mathbb{R}^p$, and a finite set of points is used to infer the manifold structure (Roweis and Saul [5], Tenenbaum et al. [6], Donoho and Grimes [7], and Coifman et al. [8]). Given these assumptions, it has been demonstrated (see, for instance, Belkin [18]) that the problem (1) can be reformulated as
$$\hat{f} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{l} V\left(\mathbf{x}_i, y_i, f\right) + \lambda_A \|f\|_{\mathcal{H}_K}^2 + \lambda_I \mathbf{f}^{\top} L \mathbf{f}, \tag{2}$$
where $\mathbf{f} = [f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)]^{\top}$ is the vector of function values at the data points, and $L$ is the Laplacian matrix associated with the graph $G$ that is constructed from the labeled and unlabeled data points $\{\mathbf{x}_i\}_{i=1,\ldots,n}$. The graph Laplacian matrix $L$ encodes the connectivity relationships among the data points and serves as an approximation to the Laplace–Beltrami operator on the continuous manifold $\mathcal{M}$.
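For readers who want to see how such a graph Laplacian is typically built in practice, a minimal sketch follows; the k-nearest-neighbour construction and heat-kernel weights are common choices from the manifold learning literature, not necessarily those used in [9] or [14].

```python
import numpy as np

def knn_graph_laplacian(X, k=6, heat_t=1.0):
    """Unnormalized graph Laplacian L = D - W from a symmetrized kNN graph.

    Heat-kernel edge weights are one common choice (an assumption here; other
    weightings from the manifold learning literature would also work).
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                   # k nearest neighbours (skip self)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / heat_t)
    W = np.maximum(W, W.T)                                   # symmetrize the adjacency graph
    return np.diag(W.sum(axis=1)) - W                        # L = D - W

# Example: points sampled near a circle embedded in R^3 (hypothetical data)
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
pts = np.c_[np.cos(theta), np.sin(theta), 0.05 * rng.normal(size=200)]
L = knn_graph_laplacian(pts)
```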
By the representer theorem, the solution to the infinite-dimensional problem in Equation (2) can be expressed as a finite sum over the labeled and unlabeled points, as follows:
$$f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i K\left(\mathbf{x}_i, \mathbf{x}\right), \tag{3}$$
where $K$ is the Mercer kernel associated with the ambient space $\mathcal{H}_K$. Denote $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_n]^{\top}$ as the vector of parameters $\alpha_i$. The representer formula in (3) enables the derivation of a specific expression for the parameter estimates.
To discuss the optimal experimental design for the manifold regularization model (2), we consider a sequential experimental design problem, beginning with no labeled data. Let the set of labeled points at the $k$-th iteration be denoted as $\{\mathbf{z}_i\}_{i=1}^{k} \subseteq \{\mathbf{x}_i\}_{i=1}^{n}$, and the vector of their corresponding response values as $\mathbf{y} = (y_1, \ldots, y_k)^{\top}$.
To simplify, we adopt the setting in Li and Del Castillo [14] for the choice of $\|f\|_{\mathcal{H}_K}^2$ and $V$ in (2), where the squared loss function is used for $V$, and the RKHS $\mathcal{H}_K$ with the $L_2$ norm is chosen for $\|f\|_{\mathcal{H}_K}^2$. This leads to the Laplacian Regularized Least Squares (LapRLS) problem (Belkin [18]):
$$\hat{f} = \arg\min_{f \in \mathcal{H}_K} \sum_{i=1}^{k} \left(y_i - f(\mathbf{z}_i)\right)^2 + \lambda_A \|f\|_{\mathcal{H}_K}^2 + \lambda_I \mathbf{f}^{\top} L \mathbf{f}. \tag{4}$$
By substituting the finite sum from Equation (3), i.e., $f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i K(\mathbf{x}_i, \mathbf{x})$, into Equation (4), we obtain the objective function for estimating the parameters $\boldsymbol{\alpha}$:
$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha} \in \mathbb{R}^n} \left(\mathbf{y} - \mathbf{K}_{XZ}^{\top}\boldsymbol{\alpha}\right)^{\top}\left(\mathbf{y} - \mathbf{K}_{XZ}^{\top}\boldsymbol{\alpha}\right) + \lambda_A \boldsymbol{\alpha}^{\top}\mathbf{K}\boldsymbol{\alpha} + \lambda_I \boldsymbol{\alpha}^{\top}\mathbf{K} L \mathbf{K}\boldsymbol{\alpha}, \tag{5}$$
where
$$\mathbf{K}_{XZ} = \begin{pmatrix} K(\mathbf{x}_1, \mathbf{z}_1) & \cdots & K(\mathbf{x}_1, \mathbf{z}_k) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_n, \mathbf{z}_1) & \cdots & K(\mathbf{x}_n, \mathbf{z}_k) \end{pmatrix}_{n \times k},$$
and
$$\mathbf{K} = \begin{pmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_n) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_n, \mathbf{x}_1) & \cdots & K(\mathbf{x}_n, \mathbf{x}_n) \end{pmatrix}_{n \times n}.$$
Taking the derivative of Equation (5) with respect to $\boldsymbol{\alpha}$ and setting it equal to zero gives
$$\hat{\boldsymbol{\alpha}} = \left(\mathbf{K}_{XZ}\mathbf{K}_{XZ}^{\top} + \lambda_A \mathbf{K} + \lambda_I \mathbf{K} L \mathbf{K}\right)^{-1} \mathbf{K}_{XZ}\,\mathbf{y}. \tag{6}$$
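A minimal sketch of this closed-form estimate, assuming an RBF kernel and illustrative regularization weights (the helper names and defaults are ours, not from [14] or [18]):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix K(a, b) = exp(-gamma ||a - b||^2); the default gamma is arbitrary."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def laprls_alpha(X_all, Z_labeled, y, L, lam_A=0.01, lam_I=0.1, gamma=1.0):
    """Closed-form LapRLS coefficients following Equation (6):
    alpha = (K_XZ K_XZ' + lam_A K + lam_I K L K)^{-1} K_XZ y."""
    K = rbf_kernel(X_all, X_all, gamma)            # n x n Gram matrix over all points
    K_XZ = rbf_kernel(X_all, Z_labeled, gamma)     # n x k kernel between all and labeled points
    A = K_XZ @ K_XZ.T + lam_A * K + lam_I * K @ L @ K
    return np.linalg.solve(A, K_XZ @ y)

def laprls_predict(alpha, X_all, X_new, gamma=1.0):
    """Representer expansion of Equation (3): f(x) = sum_i alpha_i K(x_i, x)."""
    return rbf_kernel(X_new, X_all, gamma) @ alpha
```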
To study optimal design theory, we further consider a linear model and a linear kernel for $\mathcal{H}_K$ without loss of generality. Then, the model parameters $\boldsymbol{\beta}$ can be estimated as follows:
$$\hat{\boldsymbol{\beta}} = \mathbf{X}^{\top}\hat{\boldsymbol{\alpha}} = \mathbf{X}^{\top}\left(\mathbf{X}\mathbf{Z}_k^{\top}\mathbf{Z}_k\mathbf{X}^{\top} + \lambda_A \mathbf{X}\mathbf{X}^{\top} + \lambda_I \mathbf{X}\mathbf{X}^{\top} L \mathbf{X}\mathbf{X}^{\top}\right)^{-1}\mathbf{X}\mathbf{Z}_k^{\top}\mathbf{y} = \left(\mathbf{Z}_k^{\top}\mathbf{Z}_k + \lambda_A \mathbf{I}_p + \lambda_I \mathbf{X}^{\top} L \mathbf{X}\right)^{-1}\mathbf{Z}_k^{\top}\mathbf{y}, \tag{7}$$
where
$$\mathbf{Z}_k = \begin{pmatrix} \mathbf{g}(\mathbf{z}_1)^{\top} \\ \vdots \\ \mathbf{g}(\mathbf{z}_k)^{\top} \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{g}(\mathbf{x}_1)^{\top} \\ \vdots \\ \mathbf{g}(\mathbf{x}_n)^{\top} \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix},$$
and $\mathbf{g}(\mathbf{x}) = (g_1(\mathbf{x}), \ldots, g_p(\mathbf{x}))^{\top}$. Note that in this case, the model is linear in the parameters $\boldsymbol{\beta}$, though the function $\mathbf{g}(\mathbf{x})$ is generally nonlinear in $\mathbf{x}$.
As shown by He [11], since the regularization parameters $\lambda_A$ and $\lambda_I$ are typically small, we can approximate the covariance matrix of the parameter estimates in the following way [14]:
$$\operatorname{cov}(\hat{\boldsymbol{\beta}}) \approx \sigma^2 \left(\mathbf{Z}_k^{\top}\mathbf{Z}_k + \lambda_A \mathbf{I}_p + \lambda_I \mathbf{X}^{\top} L \mathbf{X}\right)^{-1}. \tag{8}$$
This equation provides a formula for the information matrix, which is essential for studying optimal design.
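Under the linear-kernel setting just described, the approximate covariance and the corresponding information matrix can be assembled directly; a short sketch follows (variable names and default weights are ours).

```python
import numpy as np

def approx_info_matrix(Z_k, X, L, lam_A=0.01, lam_I=0.1):
    """Approximate information matrix Z_k'Z_k + lam_A I_p + lam_I X'LX from Equation (8).

    Z_k: (k, p) matrix with rows g(z_i)'; X: (n, p) matrix with rows g(x_i)';
    L: (n, n) graph Laplacian. Regularization weights are illustrative defaults.
    """
    p = X.shape[1]
    return Z_k.T @ Z_k + lam_A * np.eye(p) + lam_I * X.T @ L @ X

def approx_cov_beta(Z_k, X, L, sigma2=1.0, lam_A=0.01, lam_I=0.1):
    """cov(beta_hat) ~ sigma^2 (Z_k'Z_k + lam_A I_p + lam_I X'LX)^{-1}, as in Equation (8)."""
    return sigma2 * np.linalg.inv(approx_info_matrix(Z_k, X, L, lam_A, lam_I))
```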

3. Main Results

3.1. The A-Optimality Criterion

Consider an infinite set of points $\mathbf{x}$ uniformly distributed on a Riemannian manifold $\mathcal{M} \subset \mathbb{R}^d$. Let $\xi$ represent a (continuous) design (i.e., a probability measure) on $\mathcal{M}$ with finite support, given by
$$\xi = \begin{pmatrix} \mathbf{z}_1 & \mathbf{z}_2 & \cdots & \mathbf{z}_{n_0} \\ q_1 & q_2 & \cdots & q_{n_0} \end{pmatrix},$$
where $\mathbf{z}_i \in \mathcal{M}$, $q_i \geq 0$ and $\sum_{i=1}^{n_0} q_i = 1$. Here, $\xi(\mathbf{z}_i) = q_i$ indicates that the proportion of experimental effort allocated to the point $\mathbf{z}_i$ is $q_i$.
By Equation (8), for any design $\xi$, the information matrix for the LapRLS model (4) can be defined as
$$\mathbf{M}_{\mathrm{Lap}}(\xi) = \int_{\mathbf{z} \in \mathcal{X}} \xi(\mathbf{z})\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\, d\mathbf{z} + \lambda_A \mathbf{I}_p + \lambda_I \int_{\mathbf{x} \in \mathcal{M}} \mathbf{g}(\mathbf{x})\,\Delta_{\mathcal{M}}\mathbf{g}(\mathbf{x})^{\top}\, d\mu, \tag{9}$$
where $\mathcal{X} \subseteq \mathcal{M}$ is the experimental region on the manifold $\mathcal{M}$ and $\mu$ is the uniform measure on $\mathcal{M}$. Since the last two terms in the information matrix depend only on the overall manifold structure and not on the selected dataset, we introduce the following simplification for convenience:
$$\mathbf{C} = \lambda_A \mathbf{I}_p + \lambda_I \int_{\mathbf{x} \in \mathcal{M}} \mathbf{g}(\mathbf{x})\,\Delta_{\mathcal{M}}\mathbf{g}(\mathbf{x})^{\top}\, d\mu.$$
By substituting the expression for $\mathbf{C}$ into Equation (9), the equation simplifies as follows:
$$\mathbf{M}_{\mathrm{Lap}}(\xi) = \int_{\mathbf{z} \in \mathcal{X}} \xi(\mathbf{z})\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\, d\mathbf{z} + \mathbf{C}. \tag{10}$$
We can now define the A-optimality criterion under the LapRLS model on manifolds. An experimental design $\xi^*$ is said to be A-optimal under the LapRLS model (4) if it minimizes the trace of the inverse of the information matrix, i.e.,
$$\xi^* = \arg\min_{\xi} \operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right),$$
where the information matrix $\mathbf{M}_{\mathrm{Lap}}(\xi)$ is given by Equation (10). Similar to the statistical meaning of A-optimal designs on Euclidean spaces, an A-optimal design under the LapRLS model (4) minimizes the sum or average of the variances of the parameter estimators $\hat{\boldsymbol{\beta}}$, as seen from Equation (8).
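For a design with finitely many support points, the criterion is straightforward to evaluate once $\mathbf{C}$ is available (or approximated via the graph Laplacian); a brief sketch with hypothetical inputs:

```python
import numpy as np

def a_criterion(weights, G_support, C):
    """tr(M_Lap^{-1}(xi)) for a discrete design, with M_Lap(xi) as in Equation (10):
    M_Lap(xi) = sum_i q_i g(z_i) g(z_i)' + C."""
    M = sum(q * np.outer(gz, gz) for q, gz in zip(weights, G_support)) + C
    return np.trace(np.linalg.inv(M))

# Tiny illustrative call: three support points for a quadratic basis, diagonal stand-in for C
G_support = [np.array([1.0, -1.0, 1.0]), np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])]
print(a_criterion([0.3, 0.4, 0.3], G_support, 0.01 * np.eye(3)))
```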

3.2. Equivalence Theorem for A-Optimal Designs on Manifolds

Before proving the equivalence theorem for A-optimal designs under the LapRLS model on manifolds, we first present the following lemma (Proposition 1 in [14]), which establishes the linearity property of the information matrix.
Lemma 1. 
The information matrix $\mathbf{M}_{\mathrm{Lap}}(\cdot)$ is linear with respect to convex combinations of experimental designs. Specifically, for any two designs $\xi_1$ and $\xi_2$ on $\mathcal{M}$, and for any $0 < \alpha < 1$, the following holds:
$$\mathbf{M}_{\mathrm{Lap}}(\xi_3) = (1-\alpha)\,\mathbf{M}_{\mathrm{Lap}}(\xi_1) + \alpha\,\mathbf{M}_{\mathrm{Lap}}(\xi_2),$$
where $\xi_3 = (1-\alpha)\xi_1 + \alpha\xi_2$ is the convex combination of $\xi_1$ and $\xi_2$.
Fedorov [2] established the equivalence theorem for classical linear optimality criteria in Euclidean spaces. We now adapt his approach to prove an equivalence theorem for A-optimal designs under the LapRLS model on manifolds.
It is well known that the trace function $\operatorname{tr}(\cdot)$ satisfies the following properties:
$$\operatorname{tr}(\mathbf{A} + \mathbf{B}) = \operatorname{tr}(\mathbf{A}) + \operatorname{tr}(\mathbf{B}), \quad \operatorname{tr}(k\mathbf{A}) = k\operatorname{tr}(\mathbf{A}), \quad \operatorname{tr}(\mathbf{A}) \geq 0,$$
where $\mathbf{A}$ and $\mathbf{B}$ are positive semidefinite matrices. Now, consider the matrix
$$\mathbf{A} = (1-\alpha)\mathbf{A}_1 + \alpha\mathbf{A}_2,$$
where $0 \leq \alpha \leq 1$, and $\mathbf{A}_1$ and $\mathbf{A}_2$ are positive semidefinite matrices. It is known that
$$\mathbf{A}^{-1} \preceq (1-\alpha)\mathbf{A}_1^{-1} + \alpha\mathbf{A}_2^{-1},$$
where the matrix inequality denotes the positive semidefinite (Loewner) ordering (i.e., $\mathbf{A} \preceq \mathbf{B}$ means $\mathbf{B} - \mathbf{A}$ is positive semidefinite). By applying the trace function to both sides of the inequality, we get
$$\operatorname{tr}\left(\mathbf{A}^{-1}\right) \leq (1-\alpha)\operatorname{tr}\left(\mathbf{A}_1^{-1}\right) + \alpha\operatorname{tr}\left(\mathbf{A}_2^{-1}\right),$$
which shows that $\operatorname{tr}(\mathbf{A}^{-1})$ is a convex function of $\mathbf{A}$ over a convex set of positive semidefinite matrices.
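The convexity just used can also be checked numerically on random positive definite matrices; the following is only a sanity check, not a proof.

```python
import numpy as np

# Numerical check that tr(A^{-1}) is convex on the positive definite cone:
# tr(((1-a)A1 + aA2)^{-1}) <= (1-a) tr(A1^{-1}) + a tr(A2^{-1}).
rng = np.random.default_rng(2)

def random_spd(p=4):
    B = rng.normal(size=(p, p))
    return B @ B.T + p * np.eye(p)       # strictly positive definite, so inverses exist

A1, A2 = random_spd(), random_spd()
for a in np.linspace(0.0, 1.0, 11):
    lhs = np.trace(np.linalg.inv((1 - a) * A1 + a * A2))
    rhs = (1 - a) * np.trace(np.linalg.inv(A1)) + a * np.trace(np.linalg.inv(A2))
    assert lhs <= rhs + 1e-9
```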
Using these properties and Lemma 1, we can derive the following equivalence theorem for A-optimal designs under the LapRLS model on manifolds.
Theorem 1. 
(Equivalence Theorem of A-Optimality on Manifolds) The following statements are equivalent:
(1)
The experimental design $\xi^*$ is A-optimal under the LapRLS model (4), i.e., $\xi^*$ minimizes $\operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right)$;
(2)
The experimental design $\xi^*$ minimizes
$$\sup_{\mathbf{z} \in \mathcal{X}} \operatorname{tr}\left[\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C}\right)\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right];$$
(3)
$$\sup_{\mathbf{z} \in \mathcal{X}} \operatorname{tr}\left[\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi^*)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C}\right)\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi^*)\right] = \operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi^*)\right);$$
where the information matrix $\mathbf{M}_{\mathrm{Lap}}(\xi)$ is defined in Equation (10).
Proof. 
(i) Proof of (1) ⇒ (2) and (3).
Let $\xi^*$ be the design that minimizes $\operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right)$, and let $\xi$ be an arbitrary design, not necessarily optimal. Define the new design $\xi_0$ as a convex combination of $\xi^*$ and $\xi$:
$$\xi_0 = (1-\alpha)\xi^* + \alpha\xi,$$
where $0 \leq \alpha \leq 1$.
By Lemma 1, the information matrix for $\xi_0$ is a convex combination of the information matrices for $\xi^*$ and $\xi$:
$$\mathbf{M}_{\mathrm{Lap}}(\xi_0) = (1-\alpha)\,\mathbf{M}_{\mathrm{Lap}}(\xi^*) + \alpha\,\mathbf{M}_{\mathrm{Lap}}(\xi).$$
Since $\xi^*$ minimizes $\operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right)$, this minimality can be expressed in terms of derivatives. Differentiating $\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi_0)$ with respect to $\alpha$ and simplifying the notation by writing $\mathbf{M}$ instead of $\mathbf{M}_{\mathrm{Lap}}$, we get
$$\frac{\partial \mathbf{M}^{-1}(\xi_0)}{\partial \alpha} = -\mathbf{M}^{-1}(\xi_0)\left[\mathbf{M}(\xi) - \mathbf{M}(\xi^*)\right]\mathbf{M}^{-1}(\xi_0).$$
When $\alpha = 0$, we have $\xi_0 = \xi^*$. Using the convexity of $\operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right)$, we can conclude that
$$\left.\frac{\partial \operatorname{tr}\left(\mathbf{M}^{-1}(\xi_0)\right)}{\partial \alpha}\right|_{\alpha=0} = -\operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\left(\mathbf{M}(\xi) - \mathbf{M}(\xi^*)\right)\mathbf{M}^{-1}(\xi^*)\right] \geq 0.$$
Now, assume without loss of generality that the design $\xi$ consists of only a single point $\mathbf{z} \in \mathcal{X}$. Substituting the information matrix of this singleton design into the above equation, we obtain
$$\operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C} - \mathbf{M}(\xi^*)\right)\mathbf{M}^{-1}(\xi^*)\right] \leq 0.$$
Rearranging the terms leads to
$$\operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi^*)\right] \leq \operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right).$$
Next, we decompose the information matrix for any design $\xi$ and apply the properties of the trace function and the interchangeability of integration and summation:
$$\begin{aligned}
\operatorname{tr}\left(\mathbf{M}^{-1}(\xi)\right) &= \operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\,\mathbf{M}(\xi)\,\mathbf{M}^{-1}(\xi)\right] \\
&= \operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\left(\int \xi(\mathbf{z})\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\, d\mathbf{z} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi)\right] \\
&= \operatorname{tr}\left[\int \xi(\mathbf{z})\,\mathbf{M}^{-1}(\xi)\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\mathbf{M}^{-1}(\xi)\, d\mathbf{z} + \mathbf{M}^{-1}(\xi)\,\mathbf{C}\,\mathbf{M}^{-1}(\xi)\right] \\
&= \int \xi(\mathbf{z})\operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\mathbf{M}^{-1}(\xi)\right] d\mathbf{z} + \operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\,\mathbf{C}\,\mathbf{M}^{-1}(\xi)\right].
\end{aligned}$$
Using the form above, we derive the following inequality:
$$\sup_{\mathbf{z} \in \mathcal{X}} \operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi)\right] \geq \sup_{i} \operatorname{tr}\left[\mathbf{M}^{-1}(\xi)\left(\mathbf{g}(\mathbf{z}_i)\mathbf{g}(\mathbf{z}_i)^{\top} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi)\right] \geq \operatorname{tr}\left(\mathbf{M}^{-1}(\xi)\right) \geq \operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right).$$
Finally, combining the inequalities from Equations (11) and (12), we can derive the implications stated in parts (2) and (3) of the theorem.
(ii) Proof of (2) ⇒ (1).
We will prove the result by contradiction. Assume that $\xi^*$ does not satisfy (1), i.e.,
$$\operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right) > \inf_{\xi} \operatorname{tr}\left(\mathbf{M}^{-1}(\xi)\right).$$
According to the definition of the infimum, there exists a design ξ such that
$$\frac{\partial \operatorname{tr}\left(\mathbf{M}^{-1}(\xi_0)\right)}{\partial \alpha} < 0, \quad \text{where } \xi_0 = (1-\alpha)\xi^* + \alpha\xi.$$
Substituting $\alpha = 0$, we obtain the following expression based on Equation (11):
$$\begin{aligned}
\left.\frac{\partial \operatorname{tr}\left(\mathbf{M}^{-1}(\xi_0)\right)}{\partial \alpha}\right|_{\alpha=0} &= \operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right) - \operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\,\mathbf{M}(\xi)\,\mathbf{M}^{-1}(\xi^*)\right] \\
&= \operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right) - \operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\left(\int \xi(\mathbf{z})\,\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top}\, d\mathbf{z} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi^*)\right] \\
&\geq \operatorname{tr}\left(\mathbf{M}^{-1}(\xi^*)\right) - \sup_{\mathbf{z} \in \mathcal{X}} \operatorname{tr}\left[\mathbf{M}^{-1}(\xi^*)\left(\mathbf{g}(\mathbf{z})\mathbf{g}(\mathbf{z})^{\top} + \mathbf{C}\right)\mathbf{M}^{-1}(\xi^*)\right] \\
&\geq 0.
\end{aligned}$$
However, this contradicts our original assumption in Equation (13), which completes the proof.
(iii) Proof of (3) ⇒ (1).
We continue by assuming, for the sake of contradiction, that ξ * does not satisfy (1). Following a similar argument as in the previous proof in (ii), we can derive the contradiction between Equations (13) and (14). The proof is complete.    □
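A small numerical illustration of condition (3) can be obtained on a discrete candidate set: starting from the uniform design and repeatedly moving mass toward the candidate with the largest directional value (a standard Wynn-type vertex-direction rule, not the ODOEM2 algorithm of Section 3.3), the supremum on the left of (3) and $\operatorname{tr}\left(\mathbf{M}_{\mathrm{Lap}}^{-1}(\xi)\right)$ approach each other. The basis, candidate grid, and the surrogate $\mathbf{C}$ below are hypothetical.

```python
import numpy as np

def g(x):                                            # hypothetical quadratic basis
    return np.array([1.0, x, x**2])

cand = np.linspace(-1.0, 1.0, 101)                   # discrete candidate set
G = np.vstack([g(x) for x in cand])
C = 0.01 * np.eye(3)                                 # stand-in for lam_A I_p + lam_I * (manifold term)
q = np.full(len(cand), 1.0 / len(cand))              # start from the uniform design

def info(qw):
    return (G.T * qw) @ G + C                        # M_Lap(xi) = sum_i q_i g(z_i) g(z_i)' + C

for t in range(5000):
    M_inv = np.linalg.inv(info(q))
    # directional value tr[M^{-1}(g g' + C)M^{-1}] = g' M^{-2} g + tr(M^{-1} C M^{-1})
    phi = np.einsum('ij,jk,ik->i', G @ M_inv, M_inv, G) + np.trace(M_inv @ C @ M_inv)
    j = int(np.argmax(phi))
    step = 1.0 / (t + 2)
    q *= (1.0 - step)                                # shrink all current weights ...
    q[j] += step                                     # ... and move mass toward candidate j

M_inv = np.linalg.inv(info(q))
sup_phi = (np.einsum('ij,jk,ik->i', G @ M_inv, M_inv, G) + np.trace(M_inv @ C @ M_inv)).max()
print(sup_phi, np.trace(M_inv))                      # the two values should nearly coincide
```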

3.3. Sequential Algorithm with Finite Candidate Points

Li and Del Castillo [14] proposed the ODOEM (Optimal Design of Experiments on Manifolds) algorithm (Algorithm 1), which is based on the D-optimality criterion under the LapRLS model. This algorithm is designed for scenarios where the experimental region is defined by a discrete set of candidate points.
Algorithm 1 ODOEM with discrete candidate points under the D-optimality criterion.
  • Require: $\lambda_A$, $\lambda_I$, kernel function $K$, discrete candidate points $\{\mathbf{x}_i\}_{i=1}^{n}$ within the experimental region, and initial labeled set $\mathcal{L}_k = \{(\mathbf{z}_i, y_i)\}_{i=1}^{k}$.
  • Ensure: D-optimal design on the manifold $\mathcal{M}$.
  •     Define the unlabeled set $\mathcal{U}_k = \{\mathbf{x}_i\}_{i=1}^{n} \setminus \{\mathbf{z}_i\}_{i=1}^{k}$.
  •     Compute the initial information matrix from the labeled set:
    $$\mathbf{M}_{\mathrm{Lap}}(\mathcal{L}_k) = \mathbf{K}_{XZ}\mathbf{K}_{XZ}^{\top} + \lambda_A \mathbf{K} + \lambda_I \mathbf{K} L \mathbf{K}.$$
  •     while more data points need to be labeled do
  •         1. Find $\mathbf{z}_{k+1}$ such that
    $$\mathbf{z}_{k+1} = \arg\max_{\mathbf{u} \in \mathcal{U}_k} \mathbf{K}_{X\mathbf{u}}^{\top}\,\mathbf{M}_{\mathrm{Lap}}^{-1}(\mathcal{L}_k)\,\mathbf{K}_{X\mathbf{u}},$$
        where $\mathbf{K}_{X\mathbf{u}} = [K(\mathbf{x}_1, \mathbf{u}), K(\mathbf{x}_2, \mathbf{u}), \ldots, K(\mathbf{x}_n, \mathbf{u})]^{\top}$.
  •         2. Update the labeled and unlabeled sets:
    $$\mathcal{L}_{k+1} = \mathcal{L}_k \cup \{(\mathbf{z}_{k+1}, y_{k+1})\}, \quad \mathcal{U}_{k+1} = \mathcal{U}_k \setminus \{\mathbf{z}_{k+1}\}.$$
  •         3. Compute the updated information matrix $\mathbf{M}_{\mathrm{Lap}}(\mathcal{L}_{k+1})$, set $k = k + 1$, and repeat Steps 1–3.
  •     end while
Inspired by the equivalence theorem for A-optimality under the LapRLS model as presented in Theorem 1, we now introduce a modified version of the original ODOEM algorithm, referred to as Algorithm 2 or ODOEM2. This new algorithm adapts the ODOEM framework, which was originally designed for the D-optimal criterion, to the A-optimal criterion in scenarios where the experimental region consists of a discrete set of candidate points.
Algorithm 2 ODOEM2 with discrete candidate points under the A-optimality criterion.
  • Require: $\lambda_A$, $\lambda_I$, kernel function $K$, discrete candidate points $\{\mathbf{x}_i\}_{i=1}^{n}$ within the experimental region, and initial labeled set $\mathcal{L}_k = \{(\mathbf{z}_i, y_i)\}_{i=1}^{k}$.
  • Ensure: A-optimal design on the manifold $\mathcal{M}$.
  •     Define the unlabeled set $\mathcal{U}_k = \{\mathbf{x}_i\}_{i=1}^{n} \setminus \{\mathbf{z}_i\}_{i=1}^{k}$.
  •     Compute the initial information matrix from the labeled set:
    $$\mathbf{M}_{\mathrm{Lap}}(\mathcal{L}_k) = \mathbf{K}_{XZ}\mathbf{K}_{XZ}^{\top} + \lambda_A \mathbf{K} + \lambda_I \mathbf{K} L \mathbf{K}.$$
  •     while more data points need to be labeled do
  •         1. Find $\mathbf{z}_{k+1}$ such that
    $$\mathbf{z}_{k+1} = \arg\sup_{\mathbf{u} \in \mathcal{U}_k} \operatorname{tr}\left[\mathbf{M}_{\mathrm{Lap}}^{-1}(\mathcal{L}_k)\left(\mathbf{K}_{X\mathbf{u}}\mathbf{K}_{X\mathbf{u}}^{\top} + \mathbf{C}\right)\mathbf{M}_{\mathrm{Lap}}^{-1}(\mathcal{L}_k)\right],$$
        where $\mathbf{K}_{X\mathbf{u}} = [K(\mathbf{x}_1, \mathbf{u}), K(\mathbf{x}_2, \mathbf{u}), \ldots, K(\mathbf{x}_n, \mathbf{u})]^{\top}$.
  •         2. Update the labeled and unlabeled sets:
    $$\mathcal{L}_{k+1} = \mathcal{L}_k \cup \{(\mathbf{z}_{k+1}, y_{k+1})\}, \quad \mathcal{U}_{k+1} = \mathcal{U}_k \setminus \{\mathbf{z}_{k+1}\}.$$
  •         3. Compute the updated information matrix $\mathbf{M}_{\mathrm{Lap}}(\mathcal{L}_{k+1})$, set $k = k + 1$, and repeat Steps 1–3.
  •     end while
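A compact Python sketch of the selection loop in Algorithm 2 is given below; it assumes the kernel quantities of Section 2.2, takes $\mathbf{C} = \lambda_A \mathbf{K} + \lambda_I \mathbf{K} L \mathbf{K}$ so that $\mathbf{M}_{\mathrm{Lap}}(\mathcal{L}_k) = \mathbf{K}_{XZ}\mathbf{K}_{XZ}^{\top} + \mathbf{C}$ as in the listing, and replaces the informal stopping rule with a fixed labeling budget. The variable names and these simplifications are ours; the paper's implementation was in Matlab.

```python
import numpy as np

def odoem2_select(K, L, labeled_idx, budget, lam_A=0.01, lam_I=0.1):
    """Sequential A-optimal point selection in the spirit of Algorithm 2 (ODOEM2).

    K: (n, n) Gram matrix over all candidate points; L: (n, n) graph Laplacian;
    labeled_idx: indices of initially labeled points; budget: number of points to add.
    """
    n = K.shape[0]
    labeled = list(labeled_idx)
    C = lam_A * K + lam_I * K @ L @ K                 # ambient/manifold penalty part of M_Lap
    for _ in range(budget):
        K_XZ = K[:, labeled]                          # n x k kernel block for labeled points
        M = K_XZ @ K_XZ.T + C                         # M_Lap(L_k) = K_XZ K_XZ' + lam_A K + lam_I K L K
        M_inv = np.linalg.inv(M)
        const = np.trace(M_inv @ C @ M_inv)           # constant in u, kept to mirror the criterion
        unlabeled = [i for i in range(n) if i not in labeled]
        # score(u) = tr[M^{-1}(K_Xu K_Xu' + C) M^{-1}] = K_Xu' M^{-2} K_Xu + const
        scores = [K[:, u] @ M_inv @ M_inv @ K[:, u] + const for u in unlabeled]
        labeled.append(unlabeled[int(np.argmax(scores))])
    return labeled
```

Note that, as in Algorithm 2, the selection scores depend only on the kernel and the graph Laplacian, not on the observed responses, so the label of each chosen point can be acquired after it is selected.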
In many practical scenarios, the intrinsic manifold structure of high-dimensional data is unknown and needs to be inferred from the data themselves. To estimate this structure, an adjacency graph is typically employed, where the local distances between points approximate the geodesic distances along the manifold. The Laplace–Beltrami operator in the information matrix is then approximated by the graph Laplacian matrix L associated with the adjacency graph; see Roweis and Saul [5], Tenenbaum et al. [6], Donoho and Grimes [7], Coifman et al. [8], and Belkin et al. [9].

4. Simulation Study

We consider two simulation scenarios (synthetic datasets and real-world high-dimensional image datasets) to evaluate our ODOEM2 algorithm (Algorithm 2), which is based on A-optimality under the LapRLS model.
As demonstrated in [14], the D-optimal ODOEM algorithm (Algorithm 1) has several advantages: (a) Unlike classical D-optimal designs without considering the manifold structure, which tend to cluster instances in specific regions, ODOEM ensures a more uniform distribution of instances across the manifold, leading to improved overall fitting performance. (b) ODOEM adapts effectively to various manifold structures, and offers superior and stable fitting performance in various scenarios, including both noisy and noise-free cases. (c) ODOEM consistently outperforms other existing algorithms in terms of both fitting accuracy and prediction performance. Please refer to Section 5 of [14].
Due to the strong performance of ODOEM, we mainly compare ODOEM2 to ODOEM in this section. Other existing algorithms are also compared in Section 5. The primary distinction between the two algorithms is the change in the optimality criterion. A notable advantage of the A-optimality criterion is its ability to weight the coefficients, offering greater flexibility in certain applications. As highlighted by Jones et al. [15], A-optimal designs often result in better statistical properties of the model estimation, reduced collinearity, and improved prediction accuracy. Our results indicate that ODOEM2, which finds A-optimal designs under the LapRLS model, performs similarly to the D-optimal ODOEM algorithm, showing comparable or slightly better results in terms of fitting accuracy, prediction, and computing time.

4.1. Synthetic Manifold Datasets

We generate two distinct two-dimensional manifold datasets: one based on a Möbius strip and the other on a saddle surface. Each dataset consists of 400 instances, and 100 instances are selected using the ODOEM and ODOEM2 algorithms, respectively. For both datasets, we visualize the corresponding manifolds in three-dimensional Euclidean space, as shown in Figure 1a,d. The colors on the manifolds represent the true response values $\{y_i\}_{i=1}^{n}$ or their estimates $\{\hat{y}_i\}_{i=1}^{n}$ obtained from different experimental designs. The true response values $\{y_i\}_{i=1}^{n}$ are given by the equation
$$y = \sin(2u) + \cos(3v),$$
where $u \in [0, 2\pi)$ and $v \in [0, 2\pi)$. The red numbers on the manifolds indicate the order of the 100 labeled instances selected by the respective design algorithms.
In both ODOEM and ODOEM2, we set the parameters $\lambda_A = 0.01$ and $\lambda_I = -\ln(k/n)$, as in [14], where $k$ represents the number of labeled points at the current iteration and $n$ is the total number of candidate points. This choice is intended to ensure numerical stability; see Section 5 and a sensitivity analysis in the Supplementary Materials. The small fixed value of $\lambda_A$ helps reduce overfitting and simplifies model tuning, while the decreasing sequence for $\lambda_I$ adapts the penalty on the manifold structure as more labeled points are added: higher values of $\lambda_I$ at early iterations help guide learning with few labeled points, while lower values in later iterations allow the model to better adapt to the growing labeled set, thereby improving overall performance. For the kernel function $K$, we use a nonlinear Radial Basis Function (RBF) kernel with the range parameter set to 0.01. Belkin et al. [9] proposed the manifold regularization framework, assuming that data points close in geodesic distance on the manifold should have similar response values. The locality property of the RBF kernel aligns well with this assumption, as it assigns high weights only to points that are very close, thereby enforcing strong local smoothness on the manifold. The RBF kernel can approximate any continuous function arbitrarily well, which makes it particularly suitable for nonlinear modeling of high-dimensional manifold data ([19], Chapter 4). Additionally, by mapping manifold data into an infinite-dimensional reproducing kernel Hilbert space (RKHS), the RBF kernel can represent features better (Jayasumana et al. [20]).
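For readers who wish to reproduce a comparable synthetic setting, a sketch of one way to generate the Möbius-strip candidates, the response, and the kernel is shown below; the exact strip parameterization, the 20 × 20 grid over $(u, v)$, and the reading of "range parameter 0.01" as the RBF exponent coefficient are our assumptions, not specifications from the paper.

```python
import numpy as np

# 20 x 20 = 400 candidate instances on a standard Moebius-strip parameterization
# (an assumption; the paper does not spell out its parameterization here), with
# response y = sin(2u) + cos(3v) for u, v in [0, 2*pi).
u = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
v = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
U, V = np.meshgrid(u, v)

w = V / (2.0 * np.pi) - 0.5                      # map v to a width coordinate in [-0.5, 0.5)
x1 = (1.0 + w * np.cos(U / 2.0)) * np.cos(U)
x2 = (1.0 + w * np.cos(U / 2.0)) * np.sin(U)
x3 = w * np.sin(U / 2.0)
points = np.c_[x1.ravel(), x2.ravel(), x3.ravel()]   # 400 points in R^3

response = np.sin(2.0 * U).ravel() + np.cos(3.0 * V).ravel()

def rbf_kernel(A, B, gamma=0.01):
    """RBF kernel exp(-gamma ||a - b||^2); treating the range parameter 0.01 as gamma is an assumption."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf_kernel(points, points)                   # 400 x 400 Gram matrix over the candidates
```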
In Figure 1, the red numbers and surface colors in panels (b) and (e) represent 100 labeled instances and their fitted response values based on ODOEM (Algorithm 1). Similarly, the red numbers and surface colors in panels (c) and (f) represent 100 labeled instances and their fitted response values based on ODOEM2 (Algorithm 2). We observe that in both the Möbius strip and saddle surface examples, the instances selected by ODOEM2 are evenly distributed across the manifolds. The selected regions and outcomes of ODOEM and ODOEM2 are similar, but there are slight differences in the sequence of labeled instances, as indicated by the red numbers on the manifolds. Additionally, as demonstrated in the simulations by Li and Del Castillo [14], classical optimal designs that do not consider the manifold structure tend to cluster instances in specific regions. In contrast, both ODOEM and ODOEM2 significantly improve performance by selecting points more uniformly across the manifold.

4.2. Real Dataset: Columbia Object Image Library

In this subsection, we evaluate the performance of Algorithms 1 and 2 on real-world manifold learning problems using the Columbia Object Image Library (COIL-20). COIL-20 (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php (accessed on 9 April 2025)) consists of grayscale images of 20 different objects, each captured at 5-degree intervals. The database offers two versions: the unprocessed version, which includes both the object and background, and the processed version, where the background has been removed. For this study, we select the processed version, which contains 1440 standardized images, each of size 32 × 32 pixels. In this dataset, $\{\mathbf{x}_i\}_{i=1}^{n}$ represents the object images, while $\{y_i\}_{i=1}^{n}$ denotes the corresponding object angles relative to the observer. The objective is to estimate the angle of each object relative to the observer based on its image.
We choose images of four different objects in COIL-20 as illustrations: ‘Lucky Cat’, ‘Car’, ‘Pill Box’, and ‘Bottle’. For each object, we applied both ODOEM (Algorithm 1) and ODOEM2 (Algorithm 2) to choose labeled instances and trained the LapRLS model to predict the object’s angle. Similar to the synthetic manifold experiments, we set the range parameter of the RBF kernel to 0.01, $\lambda_A = 0.01$, and $\lambda_I = -\ln(k/n)$ in both algorithms. A sensitivity analysis of these hyperparameters is provided in the Supplementary Materials.
For the ‘Lucky Cat’ data and the ‘Car’ data, Figure 2 and Figure 3 show the first four images selected by Algorithms 1 and 2 for model training. Both algorithms select images with larger dispersion in terms of angles, which is beneficial for learning. Notably, the four images selected by ODOEM2 (A-optimal design under the LapRLS model) exhibit slightly greater angular dispersion than those selected by ODOEM (D-optimal design under the LapRLS model).
Figure 4 shows a comparison of the fitting performance of the algorithms, measured by the mean squared error (MSE). In addition to ODOEM and ODOEM2, we compare the MSEs with several traditional sequential data selection methods from Section 5 of Li and Del Castillo [14], including RBF kernel regression with D-optimality (D), RBF kernel regression with random sampling (RBF Random), RBF kernel regression with minimax (RBF Minmax), and SVM with manifold adaptive experimental design (SVM MAED). The ODOEM and ODOEM2 algorithms perform much better than the other methods. The two algorithms perform similarly, though the A-optimal design under the LapRLS model performs slightly less well with a small number of labeled points. However, once the labeling proportion exceeds 0.2, its performance surpasses that of the D-optimal design under the LapRLS model. Also, classical optimal designs that ignore manifold structure perform significantly worse than the D-optimal and A-optimal designs under the LapRLS model. This highlights the effectiveness of ODOEM2 in manifold learning tasks. The results for the ‘Pill Box’ data and the ‘Bottle’ data show similar behavior and are provided in the Supplementary Materials.
To better demonstrate the advantages of ODOEM2, we evaluate additional statistics to compare ODOEM and ODOEM2. Specifically, we compute their Area Under the Curve of Average MSE ($\mathrm{AUC}_{\mathrm{avg\,MSE}}$) ([21], Chapter 2, pp. 25–55) and worst-case and best-case mean squared error (Worst/Best MSE) [22] for the ‘Lucky Cat’ data. For each statistic, lower values are preferred. The Worst/Best MSE is computed when the number of labeled data points is 50 (percentage = 70%). The results are
$$\mathrm{AUC}_{\mathrm{avg\,MSE}}: \ \text{ODOEM} = 1.2940 \times 10^{6}, \quad \text{ODOEM2} = 4.5170 \times 10^{5},$$
$$\text{Worst MSE}: \ \text{ODOEM} = 1.4484 \times 10^{4}, \quad \text{ODOEM2} = 6.3364 \times 10^{3},$$
$$\text{Best MSE}: \ \text{ODOEM} = 1.1039 \times 10^{4}, \quad \text{ODOEM2} = 2.9515 \times 10^{3}.$$
These results show that ODOEM2 slightly outperforms ODOEM in terms of overall prediction error, achieving not only a lower average MSE but also better worst-case and best-case performance.
Finally, we compare the execution times of ODOEM and ODOEM2, both of which were implemented in Matlab R2023b. Figure 5 shows the distribution of execution times and the total execution time for each iteration. On average, Algorithm 2 runs 30.60% faster than Algorithm 1 (ODOEM: Mean = 0.0613 s per iteration, ODOEM2: Mean = 0.0425 s per iteration). Despite this difference, both algorithms are relatively fast.

5. Conclusions

In this paper, we reviewed the existing theoretical framework for D-optimal and G-optimal experimental designs under the LapRLS model on Riemannian manifolds. Building on this foundation, we extended the theory by investigating A-optimal experimental designs under the LapRLS model on manifolds. This extension led to the derivation of a new equivalence theorem specifically for A-optimality under the LapRLS model. Furthermore, we proposed a novel sequential algorithm tailored to efficiently find A-optimal experimental designs when only a finite set of candidate data points is available. We then compared our algorithm with the D-optimal design algorithm from Li and Del Castillo [14] in numerical studies on synthetic and real-world datasets. The results demonstrate the effectiveness of the proposed manifold-based A-optimal design algorithm. In particular, the results on real-world datasets in Section 4.2 show that while the two algorithms perform similarly, our method (Algorithm 2) achieves a slightly more uniform selection of image angles, marginally better prediction performance, and slightly faster runtime compared to Algorithm 1. These findings highlight the potential of A-optimal designs under the LapRLS model for improving experimental outcomes on Riemannian manifolds.

6. Discussion

Future research could focus on exploring optimal designs under other general optimality criteria, such as $\phi_p$-optimality (see, e.g., Chapter 6 of Pukelsheim [4]). Another direction for future work is to extend the results for the LapRLS regularization model to a broader range of models, including those with a more general loss function $V$ in Equation (2), and models that are nonlinear in their parameters $\boldsymbol{\beta}$. This could offer deeper insights and more flexibility in addressing complex experimental design problems, especially in high-dimensional and nonlinear settings.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/axioms14060436/s1, Figure S1: Sensitivity analysis on the COIL-20 Lucky Cat image dataset evaluated by MSE: (a) Sensitivity to choices of the range parameter of the RBF kernel. (b) Sensitivity to choices of the hyperparameter $\lambda_A$. (c) Sensitivity to choices of the hyperparameter $\lambda_I$. Figure S2: Top row: The first four bottle images selected by ODOEM (left) and ODOEM2 (right). Bottom row: The first four pill-box images selected by ODOEM (left) and ODOEM2 (right). The true angle is labeled on top of each image. Compared to ODOEM, the images selected by ODOEM2 exhibit greater dispersion in viewing angles. Figure S3: MSE comparisons among different algorithms on the bottle and the pill-box COIL-20 image datasets. The horizontal axis represents the proportion of labeled images for each object. It can be seen that ODOEM2 consistently outperforms all other algorithms in both examples.

Author Contributions

Writing—original draft, J.Z. and Y.W. Simulation Study—J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by “National Key R&D Program of China 2024YFA1016200”, “National Natural Science Foundation of China grants 12271166, W2412023, and 32030063” and “Natural Science Foundation of Inner Mongolia Autonomous Region of China 2025QN01044”.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kiefer, J.; Wolfowitz, J. The Equivalence of Two Extremum Problems. Can. J. Math. 1960, 12, 363–366. [Google Scholar] [CrossRef]
  2. Fedorov, V.V. Design of Experiments for Linear Optimality Criteria. Theory Probab. Its Appl. 1971, 16, 189–195. [Google Scholar] [CrossRef]
  3. Silvey, S.D. Optimal Design; Chapman and Hall: London, UK, 1980. [Google Scholar]
  4. Pukelsheim, F. Optimal Design of Experiments; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2006. [Google Scholar]
  5. Roweis, S.T.; Saul, L.K. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [PubMed]
  6. Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef] [PubMed]
  7. Donoho, D.; Grimes, C. Hessian Eigenmaps: Locally Linear Embedding Techniques for High Dimensional Data. Proc. Natl. Acad. Sci. USA 2003, 100, 5591–5596. [Google Scholar] [CrossRef] [PubMed]
  8. Coifman, R.; Lafon, S.; Lee, A.; Maggioni, M.; Nadler, B.; Warner, F.; Zuker, S. Geometric Diffusions as a Tool for Harmonic Analysis and Structure Definition of Data: Diffusion Maps. Proc. Natl. Acad. Sci. USA 2005, 102, 7426–7431. [Google Scholar] [CrossRef] [PubMed]
  9. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. J. Mach. Learn. Res. 2006, 7, 2399–2434. [Google Scholar]
  10. Meilă, M.; Zhang, H. Manifold Learning: What, How, and Why. Annu. Rev. Stat. Its Appl. 2024, 11, 393–417. [Google Scholar] [CrossRef]
  11. He, X. Laplacian Regularized d-optimal Design for Active Learning and its Application to Image Retrieval. IEEE Trans. Image Process. 2010, 19, 254–263. [Google Scholar] [PubMed]
  12. Chen, C.; Chen, Z.; Bu, J.; Wang, C.; Zhang, L.; Zhang, C. G-optimal Design with Laplacian Regularization. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010; Volume 1, pp. 413–418. [Google Scholar]
  13. Alaeddini, A.; Craft, E.; Meka, R.; Martinez, S. Sequential Laplacian Regularized V-optimal Design of Experiments for Response Surface Modeling of Expensive Tests: An Application in Wind Tunnel Testing. IISE Trans. 2019, 51, 559–576. [Google Scholar] [CrossRef]
  14. Li, H.; Del Castillo, E. Optimal Design of Experiments on Riemannian Manifolds. J. Am. Stat. Assoc. 2024, 119, 875–886. [Google Scholar] [CrossRef]
  15. Jones, B.; Allen-Moyer, K.; Goos, P. A-Optimal versus D-Optimal Design of Screening Experiments. J. Qual. Technol. 2020, 53, 369–382. [Google Scholar] [CrossRef]
  16. Stallrich, J.; Allen-Moyer, K.; Jones, B. D- and A-Optimal Screening Designs. Technometrics 2023, 65, 492–501. [Google Scholar] [CrossRef]
  17. Domagni, F.K.; Hedayat, A.S.; Sinha, B.K. D-optimal saturated designs for main effects and interactions in 2k-factorial experiments. Stat. Theory Relat. Fields 2024, 8, 186–194. [Google Scholar] [CrossRef]
  18. Belkin, M. Problems of Learning on Manifolds. Ph.D. Thesis, The University of Chicago, Chicago, IL, USA, 2003. [Google Scholar]
  19. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001. [Google Scholar]
  20. Jayasumana, S.; Hartley, R.; Salzmann, M.; Li, H.; Harandi, M. Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 2464–2477. [Google Scholar] [CrossRef] [PubMed]
  21. Zhou, Z.-H. Machine Learning; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
  22. Singh, V.P.; Bokam, J.K.; Singh, S.P. Best-case, worst-case and mean integral-square-errors for reduction of continuous interval systems. Int. J. Artif. Intell. Pattern Recognit. 2020, 17, 17–28. [Google Scholar] [CrossRef]
Figure 1. Comparison of ODOEM and ODOEM2 designs on the Möbius strip and the saddle surface: (a,d): The colors represent the true response values defined on the Möbius strip and the saddle surface. (b,e): A hundred labeled instances (red numbers) and fitted response values (surface colors) by the ODOEM algorithm based on D-optimality under the LapRLS model (Algorithm 1). (c,f): A hundred labeled instances (red numbers) and fitted response values (surface colors) by the ODOEM2 algorithm based on A-optimality under the LapRLS model (Algorithm 2).
Figure 2. First four images selected by ODOEM. The true angle is labeled on top of each image. (Left) ‘Lucky Cat’ data, (Right) ‘Car’ data.
Figure 3. First four images selected by ODOEM2. The true angle is labeled on top of each image. (Left) ‘Lucky Cat’ data, (Right) ‘Car’ data.
Figure 4. Comparisons of MSEs between Algorithms 1 (ODOEM) and 2 (ODOEM2). The horizontal axis represents the proportion of images labeled for each object. (Left) ‘Lucky Cat’ data, (Right) ‘Car’ data.
Figure 5. Boxplots of execution time of each iteration of Algorithms 1 and 2.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

