Determining the subspaces for additional sampling is a critical step in adaptive sampling. This section describes how a sample information matrix is constructed and how the correlation coefficients between the samples in that matrix are used to determine the subspaces for additional sampling. In addition, to mitigate potential overfitting by the global approximate model during sampling, local approximate models are constructed in the identified subspaces so that the local regions of the target space are described accurately. Finally, combining these two methods, the information matrix-based dynamically embedded approximate models (IM-DEAM) method is proposed.
2.1. Adaptive Sampling with the Information Matrix
For the sample set X = {x_1, x_2, …, x_m}^T, this study uses the Kriging model to construct an approximate model. Kriging combines a global model and a local component:

$$y(\mathbf{x}) = F(\mathbf{x})^{T}\boldsymbol{\beta} + Z(\mathbf{x}) \tag{1}$$
where F(x) is a polynomial function of x of order 0, 1, or 2; β is the coefficient vector (when F(x) is a unit column vector, β is a constant); and Z(x) is a random process with mean 0 and variance σ².
If F(x) is a unit column vector, Equation (1) takes the ordinary Kriging form, in which Y = {y_1, y_2, …, y_m}^T is the m-dimensional column vector of sample fitness values and R is the correlation matrix.
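For reference, the standard ordinary Kriging predictor of the DACE formulation is sketched below. The symbols r(x) (the vector of correlations between a new point x and the m samples) and **1** (an m-vector of ones) are introduced here for illustration, and the authors' own equation may use different notation:

$$
\hat{y}(\mathbf{x}) = \hat{\beta} + \mathbf{r}(\mathbf{x})^{T}\mathbf{R}^{-1}\left(\mathbf{Y} - \mathbf{1}\hat{\beta}\right),
\qquad
\hat{\beta} = \frac{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{Y}}{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}},
\qquad
\hat{\sigma}^{2} = \frac{1}{m}\left(\mathbf{Y} - \mathbf{1}\hat{\beta}\right)^{T}\mathbf{R}^{-1}\left(\mathbf{Y} - \mathbf{1}\hat{\beta}\right)
$$

with r(x) = [R(θ, x, x_1), …, R(θ, x, x_m)]^T. At a sample point x = x_i, r(x_i) is the i-th column of R, so r(x_i)^T R^{-1} selects the i-th entry and ŷ(x_i) = β̂ + (y_i − β̂) = y_i; that is, the predictor interpolates the samples.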
For the detailed derivation of Kriging, see DACE [26]. This section focuses on the correlation function and the correlation matrix R.
Under the stationarity assumption, and in the absence of prior knowledge, the covariance of two points is a function only of their distance and does not depend on their position in the design space. In a sense, this is a non-informational and non-discriminatory assumption [27]. The information matrix R (i.e., the correlation matrix) of the samples consists of the correlation coefficients between the samples. It characterises how strongly the samples are correlated, that is, the degree to which they influence one another. The correlation function depends on distance and is used to calculate the correlation coefficients between samples. Its general form is as follows:
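$$R(\boldsymbol{\theta}, \mathbf{x}_i, \mathbf{x}_j) = \prod_{d=1}^{D} R_d\left(\theta_d,\, x_i^{d} - x_j^{d}\right) \tag{3}$$

where x_i is the i-th sample in the initial sample set, θ_d is the hyperparameter of the correlation function in the d-th dimension, and D is the number of variables.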
For convenience, R(θ, x_i, x_j) is denoted as R_ij hereafter. The correlation coefficients between samples are currently most often calculated with the following exponential function:

$$R_{ij} = \exp\left(-\sum_{d=1}^{D} \theta_d \left| x_i^{d} - x_j^{d} \right|^{p}\right) \tag{4}$$
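The information matrix R is then obtained:

$$\mathbf{R} = \begin{bmatrix} R_{11} & \cdots & R_{1m} \\ \vdots & \ddots & \vdots \\ R_{m1} & \cdots & R_{mm} \end{bmatrix} \tag{5}$$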
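A minimal NumPy sketch of the exponential correlation function (Equation (4)) and the resulting information matrix R follows. The function name `information_matrix` and the array layout (one sample per row) are assumptions for illustration, and θ is passed in directly rather than estimated by maximum likelihood as DACE would do:

```python
import numpy as np

def information_matrix(X, theta, p=2.0):
    """Information (correlation) matrix with R_ij = exp(-sum_d theta_d * |x_i^d - x_j^d|^p).

    X: (m, D) array, one sample per row; theta: (D,) hyperparameters;
    p: smoothness exponent (p = 2 gives the Gaussian correlation function).
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Pairwise absolute differences per dimension, shape (m, m, D)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    # Weighted sum over dimensions, then the exponential
    return np.exp(-np.sum(theta * diff ** p, axis=-1))

# Three hypothetical 2-D samples
X = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 1.0]])
R = information_matrix(X, theta=[1.0, 1.0])
print(np.round(R, 4))
```

The diagonal is always 1 (a sample is perfectly correlated with itself), R is symmetric, and the distant third sample ends up with a correlation below 0.01 to the first one.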
Equation (4) contains two unknowns: θ_d and p. The value of θ_d indicates how far the influence of the d-th variable extends, and the value of p governs the smoothness of the function. To show clearly how the two parameters affect the correlation function, the one-dimensional (1D) exponential correlation function is plotted for different values of θ and p in Figure 2.
For all curves in Figure 2a,b, R_ij decreases as the distance increases; that is, the correlation between samples weakens with distance, and two samples farther apart than a certain distance are regarded as uncorrelated. In addition, a larger θ makes the correlation coefficient decay more rapidly, i.e., it lowers the distance threshold beyond which samples are uncorrelated.
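The distance threshold mentioned above can be made explicit in 1D: setting exp(−θ d^p) = ε and solving for d gives d* = (ln(1/ε)/θ)^(1/p). The sketch below uses ε = 0.01, the same value as the judgment threshold R_ij > ε in the adaptive sampling step of this section; using it here, and the θ values chosen, are assumptions for illustration only:

```python
import math

def correlation_distance_threshold(theta, p=2.0, eps=0.01):
    """1-D distance d* at which exp(-theta * d**p) equals eps.

    Samples farther apart than d* have a correlation coefficient below eps.
    """
    return (math.log(1.0 / eps) / theta) ** (1.0 / p)

# Larger theta -> faster decay -> smaller threshold distance
thresholds = {theta: correlation_distance_threshold(theta) for theta in (1.0, 10.0, 100.0)}
for theta, d in thresholds.items():
    print(f"theta = {theta:6.1f}  ->  d* = {d:.3f}")
```

For the Gaussian case (p = 2), multiplying θ by 100 shrinks d* by a factor of 10, since d* scales as θ^(−1/2).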
Figure 2a,b shows that the Gaussian correlation function (p = 2) is parabolic near the origin. This implies that, for continuously differentiable problems, the Gaussian correlation function yields a smoother model. Therefore, in this study, the Gaussian correlation function is used to calculate the sample information matrices and to construct the global Kriging model.
Leave-one-out (LOO) cross-validation (CV) errors [28] have been found to estimate local prediction errors to some extent. A small LOO-CV error at x_i implies that the model accuracy is insensitive to the removal of x_i, that is, the approximate model is already well fitted around x_i. Conversely, a large LOO-CV error indicates that the region around x_i does not contain enough points, so that the model accuracy is significantly affected by the loss of x_i.
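The LOO-CV errors described above can be computed by brute force: refit the model without x_i and compare its prediction at x_i with y_i. The sketch below assumes an ordinary Kriging model with the Gaussian correlation (p = 2) and a fixed θ; in DACE, θ would instead be estimated by maximum likelihood, and the small `nugget` added to the diagonal for numerical stability is likewise an assumption. All function names are hypothetical:

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """Gaussian correlation (p = 2) between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=-1))

def fit_ordinary_kriging(X, Y, theta, nugget=1e-10):
    """Ordinary Kriging (constant trend) with a fixed, user-supplied theta."""
    X, Y, theta = (np.asarray(a, dtype=float) for a in (X, Y, theta))
    R = gaussian_corr(X, X, theta) + nugget * np.eye(len(X))
    ones = np.ones(len(X))
    # beta_hat = (1^T R^-1 Y) / (1^T R^-1 1)
    beta = ones @ np.linalg.solve(R, Y) / (ones @ np.linalg.solve(R, ones))
    # gamma = R^-1 (Y - 1 beta_hat), reused for every prediction
    gamma = np.linalg.solve(R, Y - beta * ones)
    return X, theta, beta, gamma

def predict(model, Xq):
    """Kriging predictor y_hat(x) = beta_hat + r(x)^T gamma."""
    X, theta, beta, gamma = model
    r = gaussian_corr(np.atleast_2d(np.asarray(Xq, dtype=float)), X, theta)
    return beta + r @ gamma

def loo_errors(X, Y, theta):
    """Absolute leave-one-out error at each sample (one refit per sample)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m = len(X)
    errors = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i
        model = fit_ordinary_kriging(X[keep], Y[keep], theta)
        errors[i] = abs(predict(model, X[i])[0] - Y[i])
    return errors

# Illustrative 1-D data (hypothetical test function, not the one used in the paper)
X = np.linspace(0.0, 4.0, 9)[:, None]
Y = np.sin(3.0 * X[:, 0])
theta = np.array([2.0])
errs = loo_errors(X, Y, theta)
model = fit_ordinary_kriging(X, Y, theta)
print(np.round(errs, 3))
```

The samples with the largest entries in `errs` are the candidates for X_error in the adaptive sampling step. Brute-force refitting costs m model fits; it is used here only because it is the easiest version to check.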
Figure 3 depicts the flowchart of adaptive sampling with the information matrix. The global Kriging model provides the correlation coefficient R_ij between x_i and x_j, and LOO cross-validation on this model provides the error of each sample in the sample set X. Based on the LOO errors, the samples with large errors, X_error, are identified, together with the optimal sample x_best ∈ X. Using the information matrix R, the samples that satisfy the judgment condition R_ij > ε (ε = 0.01) with respect to x_best and to each sample in X_error are then collected into the set X_{R_ij>ε}. Finally, each subspace S_sub centred on a sample x_i ∈ {X_error, x_best} and containing its set X_{R_ij>ε} is taken as a sampling space, in which new sample points x_new ∈ X_new are generated by design of experiments (DOE).
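The subspace selection described above can be sketched as follows. Several details are not specified in the text and are assumptions here: X_error is taken as the `n_error` samples with the largest LOO-CV errors, x_best is the sample with the smallest fitness (minimisation), each S_sub is an axis-aligned box centred on its centre sample and just wide enough to contain every sample with R_ij > ε, and a basic Latin hypercube design stands in for the unspecified DOE. Function names are hypothetical:

```python
import numpy as np

def sampling_subspaces(X, Y, R, loo_err, n_error=2, eps=0.01):
    """Boxes S_sub for additional sampling, as (centre_index, lower, upper) tuples.

    Centres: the n_error samples with the largest LOO-CV error (X_error) plus the
    sample with the smallest fitness (x_best, minimisation assumed). Each box is
    centred on its centre sample and just wide enough to contain every sample
    whose correlation with the centre exceeds eps.
    """
    X = np.asarray(X, dtype=float)
    error_idx = np.argsort(loo_err)[::-1][:n_error]
    best_idx = int(np.argmin(Y))
    boxes = []
    for c in [*error_idx.tolist(), best_idx]:
        # X_{R_ij > eps}: always includes the centre itself because R_cc = 1
        correlated = np.flatnonzero(R[c] > eps)
        half_width = np.max(np.abs(X[correlated] - X[c]), axis=0)
        # If no other sample is correlated, half_width is 0 and the box collapses
        # to a point; a practical version would need a minimum width (not specified).
        boxes.append((c, X[c] - half_width, X[c] + half_width))
    return boxes

def lhs_in_box(lower, upper, n, rng):
    """Basic Latin hypercube design with n points inside [lower, upper]."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    d = lower.size
    # One point per stratum in each dimension, strata shuffled independently per dimension
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    for k in range(d):
        u[:, k] = u[rng.permutation(n), k]
    return lower + u * (upper - lower)

# Illustrative 1-D example (all numbers hypothetical)
X = np.array([[0.0], [0.3], [1.0], [3.0]])
Y = np.array([1.0, 0.5, 2.0, -1.0])
R = np.exp(-5.0 * (X - X.T) ** 2)        # Gaussian information matrix, theta = 5
loo_err = np.array([0.1, 0.4, 0.9, 0.05])
boxes = sampling_subspaces(X, Y, R, loo_err)
rng = np.random.default_rng(0)
X_new = np.vstack([lhs_in_box(lo, up, 3, rng) for _, lo, up in boxes])
```

In this toy case the box around x = 1.0 extends to x = 0.3 (R ≈ 0.086 > ε) but not to x = 0.0 (R ≈ 0.0067 < ε), while the best sample at x = 3.0 has no correlated neighbours, which illustrates the degenerate-box case noted in the comments. Boxes may also extend beyond the design space, so new points would normally be clipped to the design bounds.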
2.2. Embedded Approximate Models
In Section 2.1, LOO-CV based adaptive sampling directly estimates the prediction errors. When LOO-CV based adaptive sampling is applied repeatedly, the estimated LOO-CV errors help identify regions of interest that guide local exploitation during the adaptive sampling process [15,29]. However, most of the new samples tend to cluster in a small number of subspaces, which can lead to overfitting. As shown in Figure 4, the adaptive sample points are mainly concentrated in the “oscillating region” x ∈ [0, 1.6] on the left, while the “flat region” x ∈ [1.6, 4.0] on the right contains few samples. Because the global approximate model is fitted to the samples of the whole design space, most of which lie in the “oscillating region”, the optimal “model” it obtains reflects the oscillating characteristics, and similar oscillations continue into the “flat region”.
The mathematical function in
Figure 4 is expressed as:
Clearly, the adaptive sample points (Ada-points) mainly cluster in the “oscillating region” x ∈ [0, 1.6]. Although this makes the global Kriging model fit well in x ∈ [0, 1.6], serious overfitting occurs in the “flat region” x ∈ [1.6, 4.0], which can directly cause the algorithm to miss the optimal solution. To avoid this, a method for constructing local embedded approximate models is proposed in this section. Figure 5 depicts the flowchart of this method. The major steps are as follows:
- (1) Obtain the centre sample points of all the subspaces S_sub;
- (2) Eliminate subspaces that have the same centre point;
- (3) Calculate the correlation coefficients between the centre points of the subspaces, and merge adjacent subspaces whose correlation coefficients are ≥ σ (σ = 0.8 in this paper);
- (4) Construct local embedded approximate models in the integrated regions.
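Steps (2) and (3) above can be sketched as follows, using a (centre index, lower, upper) box representation for each subspace. The text does not say how chains of correlated subspaces are handled, so this sketch assumes transitive merging via union-find and represents each integrated region by the bounding box of its member subspaces; step (4), fitting a local Kriging model to the samples inside each region, is omitted. Names are hypothetical:

```python
import numpy as np

def merge_subspaces(boxes, R, sigma=0.8):
    """Merge sampling subspaces whose centre points are strongly correlated.

    boxes: list of (centre_index, lower, upper); R: information matrix of the samples.
    Returns a list of (centre_indices, lower, upper) for the integrated regions.
    """
    # Step (2): keep one subspace per distinct centre point
    unique = {}
    for c, lo, up in boxes:
        unique.setdefault(c, (np.asarray(lo, dtype=float), np.asarray(up, dtype=float)))
    centres = list(unique)

    # Step (3): union-find over centres with R[c_a, c_b] >= sigma (transitive merging assumed)
    parent = list(range(len(centres)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a in range(len(centres)):
        for b in range(a + 1, len(centres)):
            if R[centres[a], centres[b]] >= sigma:
                parent[find(a)] = find(b)

    # Each group becomes one region: the bounding box of its member subspaces
    groups = {}
    for a, c in enumerate(centres):
        groups.setdefault(find(a), []).append(c)
    merged = []
    for members in groups.values():
        lows = np.array([unique[c][0] for c in members])
        ups = np.array([unique[c][1] for c in members])
        merged.append((members, lows.min(axis=0), ups.max(axis=0)))
    return merged

# Hypothetical example: centres 1 and 2 are strongly correlated, centre 3 is isolated,
# and centre 2 appears twice (e.g. it was both a large-error point and a repeated pick)
R = np.eye(4)
R[1, 2] = R[2, 1] = 0.85
boxes = [(2, [0.3], [1.7]), (1, [-0.4], [1.0]), (2, [0.3], [1.7]), (3, [3.0], [3.0])]
regions = merge_subspaces(boxes, R)
for members, lo, up in regions:
    print(members, lo, up)
```

Union-find keeps the merge order-independent: if A is correlated with B and B with C, all three end up in one region even when A and C are weakly correlated. Whether the authors merge transitively is not stated, so a pairwise-only rule would be an equally valid reading.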
After the dynamic local embedded approximate models (EAM) are constructed, the resulting fit to the function in Figure 4 is shown in Figure 6. The local approximate models (EAM1, EAM2, EAM3 and EAM4) are embedded in four different regions, namely x ∈ [0, 1.6], x ∈ [0.8, 2.4], x ∈ [1.6, 3.2], and x ∈ [3.2, 4.0], and the EAMs fit well not only in the oscillating region x ∈ [0, 1.6] but also in the other three regions. In fact, the global Kriging model in Figure 4 did not miss the optimum during the optimization process, but the overfitting that occurred in x ∈ [1.6, 3.2] is unacceptable. Such overfitting cannot be predicted, and in most cases it would cause the algorithm to miss the optimum and the optimization to fail, a risk that is not worth taking. The local embedded approximate models not only achieve a better fit in the oscillating or optimal region but also capture the local features of the remaining regions, so that the algorithm can find the actual optimum.
Figure 7 depicts the IM-DEAM process. First, based on the correlation coefficients provided by the information matrix, the sampling spaces around the points with large LOO-CV errors and around the optimal point of the current sample set are obtained. Then, the global approximate model is updated with the new samples, and this process is repeated until the stopping criterion is met. Meanwhile, the sampling spaces generated during adaptive sampling are integrated. Finally, the dynamically embedded approximate models are constructed in the integrated sampling spaces.