Nested Maximum Entropy Designs for Computer Experiments

Abstract: Computer experiments with multiple levels of accuracy are now widely applied in science and engineering. This paper introduces a class of nested maximum entropy designs for such computer experiments. A multi-layer DETMAX algorithm is proposed to construct nested maximum entropy designs. Based on nested maximum entropy designs, we also propose an integer-programming procedure to specify the sample sizes in multi-fidelity computer experiments. Simulated annealing techniques are used to tackle the complex optimization problems arising in the proposed methods. Illustrative examples show that the proposed nested entropy designs can yield better prediction results than nested Latin hypercube designs in the literature and that the proposed sample-size determination method is effective.


Introduction
With the rapid development of computer simulation technology, computer experiments have been widely used in the manufacturing industry, system engineering, natural science, and other fields [1,2]. Statistical designs for computer experiments have received considerable attention [3][4][5][6]. In many real applications, multi-fidelity computer experiments with different levels of accuracy are often encountered. More accurate experiments require longer computational time, while faster experiments have relatively low accuracy, so it is inefficient to study them separately. Therefore, many authors have studied statistical analysis for integrating multi-fidelity computer experiments with different levels of accuracy [7][8][9][10]. The experimental design issue for such computer experiments has been investigated by [11][12][13][14][15], and many others.
Shannon entropy is a basic concept of information theory [16]. Ref. [17] introduced the use of Shannon entropy as a measure of experimental information for spatial design. He argued that the experiment that minimizes the expected entropy, which is the entropy of the posterior distribution, can provide the largest amount of information for prediction. Ref. [18] proved that minimizing the posterior entropy is equivalent to maximizing the entropy of the prior distribution. The maximum entropy criterion was subsequently adopted as one of the major approaches for computer experiments [1]. Ref. [19] applied the DETMAX algorithm [20] to efficiently construct maximum entropy designs. Ref. [21] proposed a sequential framework for conducting computer experiments with the maximum entropy criterion. However, to the best of our knowledge, there is no research on the maximum entropy design for multi-fidelity computer experiments.
In this paper, we introduce a class of nested maximum entropy designs with multi-layer structures for multi-fidelity computer experiments. Unlike [14]'s nested Latin hypercube designs, a nested maximum entropy design allows for flexibility in sample sizes, as the sample size of a larger layer does not need to be a multiple of that of a smaller one. Since computer experiments with higher accuracy are more important, we first consider the optimization of lower layers in the nested maximum entropy designs. Based on a layer-by-layer optimization strategy [11,22], a multi-layer DETMAX algorithm is proposed to construct such nested maximum entropy designs. The algorithm begins by generating a maximum entropy design for the lowest layer. Subsequently, we fix the design points optimized in lower layers and optimize the current layer according to the maximum entropy criterion, step by step, until the whole design is completely optimized. Based on nested maximum entropy designs, we also propose an integer-programming procedure to specify the sample sizes in multi-accuracy computer experiments under a budget constraint. Simulated annealing techniques [23] are adopted to tackle the complex optimization problems in the proposed approaches. Illustrative examples are presented to show the effectiveness of our methods.
The contributions of this paper are summarized as follows. First, we introduce a new type of model-based design for multi-fidelity computer experiments based on information entropy. Second, our methods are flexible in the sample sizes of multi-fidelity computer experiments. Third, this paper is the first to provide an entropy-based strategy to determine the sample sizes of multi-fidelity computer experiments.
The rest of this paper is organized as follows. Section 2 reviews the concept of maximum entropy designs for a single level of accuracy. In Section 3, the DETMAX algorithm is extended to construct nested maximum entropy designs. Section 4 deals with the sample-size determination of multi-accuracy computer experiments. Section 5 provides numerical examples. We end this paper with some concluding remarks in Section 6.

Review of Maximum Entropy Designs
In this section, we give a review of maximum entropy designs. Consider the following Kriging model,

y(x) = f(x)' β + Z(x), (1)

where f(x) = (f_1(x), . . . , f_m(x))' is a prespecified vector of regression functions, β = (β_1, . . . , β_m)' is a vector of unknown regression coefficients, and Z(x) is a stationary Gaussian process with zero mean, variance σ², and correlation function R(x_1 − x_2 | φ) for x_1 = (x_11, . . . , x_1p)' and x_2 = (x_21, . . . , x_2p)', where φ is a vector of correlation parameters. Let D = {x_1, . . . , x_n} and y = (y(x_1), . . . , y(x_n))' represent a design with n runs and the corresponding vector of response values, respectively. Ref. [18] used the expected change in information to evaluate the design D. Since entropy is the negative of information, maximizing the expected change in information is equivalent to maximizing the entropy of the responses at the points in the design, denoted by H(Y_D). In the context of Gaussian process models, the design-relevant part of H(Y_D) is log(det(σ²R))/2, where R is the n × n correlation matrix whose (i, j)th entry is R(x_i − x_j | φ). Therefore, a maximum entropy design D maximizes the determinant of the covariance matrix of the responses y at the points in the design [2],

max_D det(σ²R). (2)
Because σ² does not depend on the design D, (2) is equivalent to

max_D det(R). (3)

Here the vector of correlation parameters, φ, in R is assumed to be known.
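As a concrete illustration of criterion (3) (this sketch is not part of the paper; the Gaussian correlation family and all function names are our assumptions), the following code evaluates log det(R) for a design, which is the quantity a maximum entropy design maximizes:

```python
import numpy as np

def gaussian_corr(X, phi):
    """Correlation matrix R with (i, j)th entry
    exp(-sum_l phi_l * (x_il - x_jl)^2), a common choice of R(. | phi)."""
    diff = X[:, None, :] - X[None, :, :]              # (n, n, p) pairwise differences
    return np.exp(-np.einsum('ijl,l->ij', diff**2, phi))

def log_entropy_criterion(X, phi):
    """log det(R): the design-dependent part of the entropy H(Y_D)."""
    R = gaussian_corr(X, phi)
    _, logdet = np.linalg.slogdet(R)                  # numerically stable log-determinant
    return logdet
```

Since R has unit diagonal and is positive semidefinite, det(R) ≤ 1, so the criterion is non-positive and larger (closer to 0) for well-spread designs.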

Construction of Nested Maximum Entropy Designs
In this section, we extend the maximum entropy design to the case of multiple layers and propose the corresponding construction algorithms.

Nested Maximum Entropy Designs
Nested designs with multiple layers are usually used for multi-fidelity computer experiments [11,12,22,24,25]. Assume that we have a computer experiment with K levels, where the accuracy declines gradually from level 1 to level K. For each k = 1, . . . , K, the Kriging model for computer experiments at the kth level of accuracy is

y_k(x) = f(x)' β_k + Z_k(x),

where f(x) and β_k are straightforward extensions of those in (1), and Z_k(x) is a stationary Gaussian process with zero mean. Let D^(k) = {x_1, . . . , x_{n_k}} denote the kth layer of the nested design for each k = 1, . . . , K, with n_1 < · · · < n_K and D^(1) ⊂ D^(2) ⊂ · · · ⊂ D^(K), and let R_k denote the correlation matrix of D^(k). The vector s = (n_1, . . . , n_K) represents the structure of D^(K). Please note that D^(k) with smaller k is used for computer experiments with higher accuracy, which are more important. Similar to some definitions of optimal nested designs [11,22], we call D^(K) = {x_1, . . . , x_{n_K}} a nested maximum entropy design if the following conditions hold: the first layer D^(1) is a maximum entropy design that maximizes det(R_1); for each k = 2, . . . , K, with D^(k−1) fixed, D^(k) maximizes det(R_k) among all n_k-point designs containing D^(k−1). By this definition, a nested maximum entropy design can be constructed by a sequential algorithm; see Algorithm 1.

Algorithm 1 Construction of a nested maximum entropy design with K layers
Initialization: Set k = 1 and randomly construct the first layer D^(1). Optimize D^(1) under the maximum entropy criterion to obtain D^(1)_best.
Recursive step:
for k = 2, . . . , K do
   Fix D^(k−1)_best and randomly augment it to an n_k-point design D^(k), the kth layer of the design with structure s.
   Maximize the entropy criterion det(R_k) corresponding to D^(k) by optimizing the points in D^(k) not contained in D^(k−1)_best, obtaining D^(k)_best.
end for
Output: D^(K)_best.
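The layer-by-layer logic of Algorithm 1 can be sketched as follows. This is a simplified illustration, not the paper's implementation: the inner optimizer is a crude random search standing in for the DETMAX excursions of the next subsection, and the Gaussian correlation, unit-cube design region, and function names are our assumptions.

```python
import numpy as np

def logdet_corr(X, phi):
    """log det of the Gaussian correlation matrix of design X."""
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.einsum('ijl,l->ij', diff**2, phi))
    return np.linalg.slogdet(R)[1]

def nested_random_search(sizes, p, phi, n_trials=200, seed=0):
    """Layer-by-layer construction: layer k keeps the points of layer k-1
    fixed and only optimizes the newly added points, so the layers nest."""
    rng = np.random.default_rng(seed)
    design = np.empty((0, p))
    for n_k in sizes:                       # sizes = (n_1, ..., n_K), increasing
        best, best_val = None, -np.inf
        for _ in range(n_trials):
            # candidate: fixed lower-layer points plus random new points
            cand = np.vstack([design, rng.random((n_k - len(design), p))])
            val = logdet_corr(cand)if False else logdet_corr(cand, phi)
            if val > best_val:
                best, best_val = cand, val
        design = best
    return design
```

For example, `nested_random_search((10, 20), p=2, phi=np.array([3.0, 3.0]))` returns a 20-point second layer whose first 10 rows form the first layer.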

Multi-Layer DETMAX Algorithm
This subsection presents optimization algorithms for constructing each layer of a nested maximum entropy design.
A maximum entropy design can be obtained by a DETMAX-based algorithm [19]. The design is improved through a series of "excursions": appropriate points are added to or removed from the current design to increase its det(R), until the det(R) of the resulting n-point design can no longer be increased. Except for the initial and final designs, which have exactly n points, the designs constructed during an excursion may have more or fewer than n points. We extend this algorithm to the multi-layer case.
The flow chart in Figure 1 describes the procedure of the multi-layer DETMAX algorithm. The layer-by-layer optimization strategy [11,22] is adopted here. First, the initial design of the first layer, denoted as D^(1)_0, is randomly generated and then optimized. If D^(1)_best, obtained through a series of excursions, meets the stopping condition, the first layer is complete; otherwise, its optimization continues. Subsequently, the second layer is optimized with the first layer fixed. Each layer is optimized in turn until the last layer is finished, and then D^(K)_best is output.

Figure 1. The procedure of the multi-layer DETMAX algorithm.
We give the details for optimizing each layer in the above algorithm. Suppose we now optimize the kth layer of the design, D^(k). One excursion starts with the n_k-point design and ends when the number of points in D^(k) reaches exactly n_k again. Let F_k denote the failure set, i.e., the collection of designs generated on unsuccessful excursions for layer k. The procedure for making excursions is described as follows.
Step 1. Add a point at which the variance function σ²_{0|D^(k)} is largest, or subtract the point corresponding to the maximum element of the diagonal of R_k^{-1}.
Step 2. The current design D^(k) has n'_k points. If n'_k > n_k, remove a point if D^(k) is not in F_k, and add a point otherwise. If n'_k < n_k, add a point if D^(k) is not in F_k, and remove a point otherwise. The design updated by this step has n^new_k points and correlation matrix R^new_k.
Step 3. If n^new_k = n_k, the excursion ends: if det(R^new_k) exceeds the current best value for this layer, accept the new design; otherwise, place all the designs generated on this excursion into F_k. Go to Step 1.
In Step 1, the choice between adding and subtracting a point is made randomly. The best point x_0 to add is obtained by maximizing σ²_{0|D^(k)}, the conditional variance of the response at x_0 given the current design,

σ²_{0|D^(k)} = σ²(1 − r(x_0)' R_k^{-1} r(x_0)),

where r(x_0) is the vector of correlations between x_0 and the points of D^(k). To determine the best site x_0 to add to the current design, we adopt a grid search procedure [26] for p = 2 and the simulated annealing algorithm [23] for p ≥ 3 (see Algorithm 2). The point subtracted in Step 1 is restricted to points not contained in D^(k−1)_best, so that the nested structure is preserved.
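The conditional variance used to score candidate points can be computed directly. The sketch below (an illustration under the simple-kriging form with known mean; the function names are ours) implements σ²(1 − r' R^{-1} r) for the Gaussian correlation family:

```python
import numpy as np

def cond_variance(x0, X, phi, sigma2=1.0):
    """Conditional (kriging) variance sigma^2 * (1 - r' R^{-1} r) of the
    response at x0, given the responses at the current design X."""
    diff = X[:, None, :] - X[None, :, :]
    R = np.exp(-np.einsum('ijl,l->ij', diff**2, phi))   # correlation matrix of X
    r = np.exp(-np.sum(phi * (X - x0)**2, axis=1))      # correlations with x0
    return sigma2 * (1.0 - r @ np.linalg.solve(R, r))
```

The variance is zero at an existing design point and approaches σ² far from all design points, which is why maximizing it pushes the added point away from the current design.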

Algorithm 2 Simulated annealing in excursions
Step 0: Input the starting temperature T = T_0 > 0, the ending temperature T_end > 0, the length L of the Markov chain, the search step size λ, Boltzmann's constant k_0 = 1, the reduction factor α (0 < α < 1), and an initial solution x = x^(0), which is randomly generated.
while T > T_end do
   for i = 1, . . . , L do
      Step 1: Generate x_new = x + λu, where u is sampled from N_p(0, 1).
      Step 2: Compute Δ = σ²_{0|D^(k)}(x_new) − σ²_{0|D^(k)}(x).
      if Δ > 0 then
         Accept x_new: set x = x_new.
      else
         Accept x_new with probability exp(Δ/(k_0 T)); otherwise keep x.
      end if
   end for
   Step 3: T = αT.
end while
Step 4: Output the best solution x.
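A runnable version of this annealing scheme is sketched below for a generic objective f to be maximized. It is an illustration, not the paper's code: clipping proposals to the unit cube is our assumption about the design region, and the default parameter values are ours.

```python
import numpy as np

def simulated_annealing(f, x0, T0=1.0, T_end=1e-3, L=50, lam=0.1,
                        alpha=0.9, seed=0):
    """Maximize f: Gaussian moves of scale lam, Metropolis acceptance with
    Boltzmann constant k0 = 1, geometric cooling T <- alpha * T after each
    Markov chain of length L."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    fx = f(x)
    best, fbest = x.copy(), fx
    T = T0
    while T > T_end:
        for _ in range(L):
            # propose a move and keep it inside the unit cube (assumed region)
            x_new = np.clip(x + lam * rng.standard_normal(x.size), 0.0, 1.0)
            f_new = f(x_new)
            # accept uphill moves always, downhill moves with prob exp(delta/T)
            if f_new > fx or rng.random() < np.exp((f_new - fx) / T):
                x, fx = x_new, f_new
                if fx > fbest:
                    best, fbest = x.copy(), fx
        T *= alpha
    return best, fbest
```

In the excursions, f would be the conditional variance σ²_{0|D^(k)} of the current layer.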
Since the design points are bounded, the determinant of the corresponding covariance matrix is bounded. By the monotone convergence theorem, our algorithm converges to a limit after a sufficiently large number of iterations. However, because the problem is nonconvex, like other design construction problems [27], the limit may not be the global solution. To better approximate the global solution, the above algorithm can be run repeatedly with several random initial designs, with the best of the resulting designs taken as the final output. Several two-dimensional nested maximum entropy designs constructed by the proposed algorithm are shown in Figure 2.

Sample-Size Determination of Multi-Accuracy Computer Experiments
A related issue to experimental design is sample-size determination. Both sample-size determination and experimental design must be carried out before data are obtained. Sample-size determination should be considered earlier than experimental design, since the latter is usually conducted with given sample sizes. The problem of sample-size determination in computer experiments has attracted much attention in the literature; see [28][29][30], among others. However, these studies focused on computer experiments with one level of accuracy, and there is little work for the case of more than one level. In this section, we propose a method to determine the sample sizes of multi-accuracy computer experiments based on the entropy criterion.
There is no data available when we implement sample-size determination. For multi-fidelity computer experiments, we consider the maximum entropy of possible nested designs with different sample sizes. We first introduce the concept of integrative entropy, which is an extension of (3). For a K-layer nested maximum entropy design D with structure s = (n_1, . . . , n_K), the integrative entropy of D is defined by

En(n_1, . . . , n_K) = Σ_{k=1}^K w_k log(det(R_k)), (4)

where w_k is a non-negative weight and R_k is the correlation matrix of layer k for k = 1, . . . , K.
For a computer experiment with K levels of accuracy, let b_k denote the cost of a run at the kth level, k = 1, . . . , K, and assume that the total budget is B. We specify the sample sizes n_1, . . . , n_K by maximizing the integrative entropy under the budget constraint, i.e., solving the optimization problem

max En(n_1, . . . , n_K), (5)
s.t. Σ_{k=1}^K b_k n_k ≤ B, n_1 < · · · < n_K, n_k ∈ N, k = 1, . . . , K,

where N denotes the set of non-negative integers. The above optimization problem is a nonlinear knapsack problem [31]. Many techniques exist for this class of problems, such as the branch-and-bound algorithm, dynamic programming, and the decomposition method. Please note, however, that the objective function in (5) is very complicated. We adopt the simulated annealing algorithm to solve this integer-programming problem, since it avoids local optima, is highly flexible, and has good convergence properties. Due to the complexity of this problem, we run Algorithm 3 from multiple initial points and output the solution with the greatest objective function value. Let randint(1, m) denote an integer randomly chosen from {1, . . . , m}, m ∈ N.
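Since Algorithm 3 is not reproduced here, the sketch below shows one plausible annealing scheme for this integer program under our own assumptions: a move perturbs one randomly chosen n_k by ±1, infeasible moves (budget or nesting violations) are rejected, and the starting structure, parameter defaults, and function names are all illustrative.

```python
import numpy as np

def sample_sizes_sa(entropy_fn, b, B, K, T0=1.0, T_end=1e-3, L=30,
                    alpha=0.9, seed=0):
    """Simulated annealing over integer structures s = (n_1, ..., n_K)
    maximizing entropy_fn(s) subject to b @ s <= B and n_1 < ... < n_K."""
    rng = np.random.default_rng(seed)
    s = np.arange(1, K + 1)                 # assumed feasible start (1, 2, ..., K)
    assert s @ b <= B, "budget too small for the assumed start"
    val = entropy_fn(s)
    best, best_val = s.copy(), val
    T = T0
    while T > T_end:
        for _ in range(L):
            cand = s.copy()
            k = rng.integers(K)             # randint-style coordinate choice
            cand[k] += rng.choice([-1, 1])  # perturb one sample size by +/-1
            feasible = (cand @ b <= B and cand[0] >= 1
                        and np.all(np.diff(cand) > 0))
            if not feasible:
                continue                    # reject infeasible structures
            v = entropy_fn(cand)
            if v > val or rng.random() < np.exp((v - val) / T):
                s, val = cand, v
                if val > best_val:
                    best, best_val = s.copy(), val
        T *= alpha
    return best, best_val
```

In practice, entropy_fn(s) would evaluate (5) by constructing a nested maximum entropy design with structure s; any cheap surrogate increasing in each n_k exhibits the same trade-off against the budget.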

Illustrations
In this section, we provide several examples to illustrate the effectiveness of the proposed methods. Examples 1, 2, and 3 demonstrate the prediction performance of the proposed nested maximum entropy designs, which are constructed by applying the algorithm in Section 3. Example 4 presents an application of the proposed sample-size determination method in Section 4.
Here we consider the case of K = 2. In Examples 1-3, let D_h = {x^h_1, . . . , x^h_{n_1}} and D_l = {x^l_1, . . . , x^l_{n_2}} represent the design sets of the high-accuracy experiment (HE) with n_1 runs and of the low-accuracy experiment (LE) with n_2 runs, respectively. We denote the HE response associated with D_h by y_h and the LE response associated with D_l by y_l. The prediction model in [32] is used, and the corresponding predictor of y_h is denoted by ŷ_h. The FNLHD [11] is compared with the proposed entropy design. Prediction performance is evaluated with the empirical mean squared prediction error,

MSPE = (1/10000) Σ_{i=1}^{10000} (y_h(x_i) − ŷ_h(x_i))²,

where {x_1, . . . , x_10000} is generated by Latin hypercube sampling [33].
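The empirical MSPE above is straightforward to compute; the short helper below (an illustration with hypothetical function names) evaluates it for any predictor and true response over a test sample:

```python
import numpy as np

def empirical_mspe(predictor, truth, X_test):
    """Empirical MSPE: mean of (y_h(x_i) - yhat_h(x_i))^2 over the test set."""
    y = np.array([truth(x) for x in X_test])        # true HE responses
    yhat = np.array([predictor(x) for x in X_test]) # predicted responses
    return float(np.mean((y - yhat) ** 2))
```

With X_test a large Latin hypercube sample, smaller values indicate better prediction, which is the basis of the comparisons in Examples 1-3.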

Concluding Remarks
In this paper, we have introduced a new class of nested designs, nested maximum entropy designs, for multi-fidelity computer experiments. Such designs possess flexible run numbers in each layer and can provide a considerable amount of information for prediction. A multi-layer DETMAX algorithm has been proposed to construct nested maximum entropy designs. The related maximum entropy criterion has been used to determine the sample sizes for each level of accuracy in multi-fidelity computer experiments.
There are some limitations to our work. Due to the complexity of the optimization problem under the entropy criterion, the proposed algorithms can only handle relatively simple cases, such as low dimensions and small run sizes. In addition, the proposed approaches can be extended in several directions. First, the corresponding designs for finite design regions [35] can be studied in the future. Second, our methods can be modified to accommodate both qualitative and quantitative factors [9,36,37]. Third, sequential frameworks [21,32,38,39] for multi-fidelity computer experiments can be developed with the proposed entropy criterion.

Acknowledgments:
The authors gratefully acknowledge the editors and reviewers for their professional comments.

Conflicts of Interest:
The authors declare no conflict of interest.