A Surrogate Model Based Multi-Objective Optimization Method for Optical Imaging System

: An optimization model for the optical imaging system was established in this paper. It combined the modern design of experiments (DOE) method known as Latin hypercube sampling (LHS), Kriging surrogate model training, and the multi-objective optimization algorithm NSGA-III into the optimization of a triplet optical system. Compared with the methods that rely mainly on optical system simulation, this surrogate model-based multi-objective optimization method can achieve a high-accuracy result with signiﬁcantly improved optimization efﬁciency. Using this model, case studies were carried out for two-objective optimizations of a Cooke triplet optical system. The results showed that the weighted geometric spot diagram and the maximum ﬁeld curvature were reduced 5.32% and 11.59%, respectively, in the ﬁrst case. In the second case, where the initial parameters were already optimized by Code-V, this model further reduced the weighted geometric spot diagram and the maximum ﬁeld curvature by another 3.53% and 4.33%, respectively. The imaging quality in both cases was considerably improved compared with the initial design, indicating that the model is suitable for the optimal design of an optical system.


Introduction
The optimal design of an optical imaging system [1,2] is a vital problem for designing modern complex optical equipment. In the past, the optimization of optical systems relied mainly on the engineers' experience, which can only provide very limited guidance for the optimal design of some modern optical equipment. The design space of an optical imaging system is decided by the number and the variance of desired design parameters. With the optical system becoming more and more complex, the modern optimal design has developed into a process heavily relying on computer calculation to find an optimal point/solution in a design space with high dimensionality. The difficulty of this process is often affected by the number of required design parameters, the variance of these parameters, and the acceptable tolerance of the optimal design solution.
With the increase in a design space's dimension, the successful search for an optimal point for both local and global optimization algorithms becomes more and more difficult. With a rather high-level space dimension, the local search tends to fall into those suboptimal designs, and the number of calculations required for a global search would increase exponentially. The computer power and existing algorithms commonly used for copying the single-order improvements may not be sufficient to prevent the so-called "curse of dimensionality" caused by the amount of calculation generated by the high-level dimension [3]. Therefore, optimal optical designs are limited to either a finite search in the global space or gradient-based searches that tend to become stuck in local optima. The current revival of surrogate models-based optimization algorithms [4][5][6] provides a possibility of overcoming the "curse of dimensionality". Currently, surrogate models have been used in the design of optical thin films [7], nanostructures [8][9][10][11], and meta-surfaces [12][13][14] and have shown some promising results.
For complex systems, the calculation cost is very high in the case of many parameters to be optimized and objective functions (dimension disaster). This paper proposes a multi-objective optimization method based on the Kriging surrogate model [15][16][17] for an optical imaging system; for a specific system, it greatly reduces the calculation cost in the optimization process and helps to conduct a more comprehensive search in the global design space. This surrogate model has a large number of applications in aerodynamics [18], weather prediction [19], and the structural reliability [20] of aircraft. Compared with the conventional methods, which mainly rely on optical system simulation using ray-tracingbased programs, the surrogate model-based method can greatly reduce the calculation cost and provide a possibility of using the saved computer power for more comprehensive searches in the design space. Section 2 of this paper will introduce the methodology used in the proposed model. The process flow of the model and the methods involved, including the experimental design method, surrogate model, and multi-objective optimization algorithm, will be introduced in detail in this section. Two case studies using the proposed method to optimize the design of a Cooke triplet system were carried out, and the results are presented in Section 3. A conclusion is given in Section 4.

Overview of Optical Imaging System Design
The optical imaging system usually consists of a series of well-designed sequential lenses with constraints in manufacturing, physical size, tolerances, and cost. The excellent performance of the system is typically realized through a careful iterative process, including the definition of performance objectives and optical constraints, construction and minimization of an appropriate merit function comprising these objectives, and constraints to realize the optimum design of the optical system, and then a prediction of the realized performance with a tolerance analysis of the design [21]. The aim of the optimum design of the optical lens under several physical and system constraints is to obtain a series of optimal lens variables with a satisfactory optical performance, such as a low aberration. Optimal variables in lens design include targets of the lens, such as element material, surface curvatures, surface aspherical coefficients, element thicknesses, and spacings.
A merit function in the optical design procedure is defined as the measure of optical quality, typically with zero indicating "perfection" of the optical system. The value of the merit function is calculated through the process of ray tracing and optical analyses in an optical system. Computers became widely used in optical design because of the high computational complexity of ray tracing [22,23]. However, the guidance and intervention of competent users is critical in achieving an optimized and well-balanced design solution; even modern high-speed computers with extreme processing power can be applied in the design process [24].
With high-order aspherical surfaces or more optimization variables implemented in modern lens designs processes, the optimization process is becoming further sophisticated with new techniques, such as integrating manufacturing tolerances into optimization in order to achieve minimal performance degradation with as-built lenses [25,26] or incorporating computational photography steps into the lens design stage [27][28][29]. In general, optimization algorithms applied to optical systems can be divided into classical gradientbased optimization algorithms based on the least-squares (LS) method [30][31][32][33][34][35] and modern optimization algorithms based on the analogy with natural evolution.
The application of the classical LS method in the optimization of optical systems was first proposed by Rosen and Eldert [30]; since then, a considerable number of researchers have applied or modified this method in different fields. The appealing reason for the application of the LS method in the merit function is the preservation of the information relating to the distribution of the various aberrations. Kidger [36] defined a value, referred to as a step length, with the target of controlling and limiting the changes of the constructional parameters in the optical system, and formed the damped least-squares (DLS) method. After that, numerous methods, including altering the additive damping in the DLS into multiplicative damping [31], were proposed to improve the convergence of the DLS. Except for the LS methods, Spencer [37] has specified that computers could only be regarded as a tool capable of offering optical designers temporary solutions because qualitative judgments and compromises were required in the optimization of optical systems. A novel concept of aberrations brought up by David S. Grey [38,39] is prominent, and this is principally due to the practical realization of his computer program, where a novel orthonormal theory of aberrations was applied in the optimization of optical systems. Moreover, the orthonormalization in this theory was improved through the Gram-Schmidt transformation proposed by Pegis et al. [40]. The fundamental ideas forming the concept of simulated annealing originated from Metropolis et al. [41] and were suggested by Gelatt et al. [42] to be used as an optimization method in various systems, such as optical. Glatzel's adaptive optimization method, described by Glatzel and Wilson [43] and Rayces [44], is the first optimization method where the number of aberrations is smaller than that of variable constructional parameters.
Modern evolutionary optimization algorithms primarily comprise genetic algorithms (GAs) and evolution strategies (ESs). GAs can be applied to solve complicated search and optimization problems with the implementation of adaptive methods, which are mainly based on a simplified genetic processes simulation [45][46][47]. The simple genetic algorithm (SGA) proposed by Goldberg [48] only consists of the most fundamental elements that every genetic algorithm must have. These elements include the individual population, the individual's merit function selection, the crossover to create a new progeny, and the arbitrary mutation of a new progeny. The adaptive steady-state genetic algorithm used for the construction of the genetic algorithm for the optimization of optical systems was defined by Davis [49], and each genetic algorithm consists of three modules: the evaluation module, the population module, and the reproduction module. Evolution strategies (ESs) were developed by Schwefel [50] with the target of solving parameter optimization problems and mainly consist of the two-membered evolution strategy and the multimembered evolution strategy algorithms that mimic the natural selection principle.
One of the most essential differences between classical and modern optimization algorithms is the optimum searched by these approaches; the classical optimization algorithms can only search for local optimal results, while the modern algorithms attempt to search for the global optimum. The theory behind the optimum difference is that the classical optimization algorithms do not allow for the deterioration of the merit function, so they cannot escape the first local optimum they find. As for the evolutionary algorithms, even though they cannot find the global optimum all the time, they can find adequately good results close to the global optimum [51].
In addition to the above-mentioned approaches, the study of applying machine learning based on deep neural networks (DNNs) [4,[52][53][54][55] in optical system design became prominent in recent years. Yang et al. [52] demonstrated that the approach of neural network-based deep learning can immediately generate a good starting point in freeform reflective imaging systems. Hegde [4] has proven that the combination of applying DNN as a surrogate model and optical optimization can improve the efficiency of optimization, with a 90% decrease in evaluated function budget compared with optimization without a surrogate model. After that, Hegde [53] extended his work into the field of deep convolutional neural networks (CNNs) and proved that the trained networks can reach a much faster convergence in solving inverse scattering global optimization problems.
In this paper, a Cooke triplet lens is implemented for the optimization problem. Even with only three lenses, the optimization problem related to the curvature of surfaces, thickness of element and airspace, and selection of element glass is not trivial. Moreover, optimization of the triplet offers constructive insight concerning the characteristics of appropriate optimization algorithms.

Overview of Surrogate-Based Modelling
A surrogate model is also referred to as a "metamodel", "response surface model", "approximation model", or "emulator" in different research fields. In complex computer simulations, finding more data requires additional experiments, which would result in extensive material or economic cost as well as computational expense. Consequently, obtaining an analytical form of derivatives or the objective function is relatively challenging. However, the derivation of the information from a surrogate model is comparatively easier, as the analytical form is known and, hence, is cheaper to evaluate. Building through the sampled data that are obtained by evaluating a set of sample points in the target space via expensive analysis code, a surrogate model can be used to efficiently predict the output of the code at any unknown point [56].
According to Anthony et al. [67] and Balabanov and Haftka [68], PRSM can be applied in aircraft design. Kriging is based on the idea that a surrogate can be represented as a realization of a stochastic process. This idea was first proposed in the field of geostatistics by Krige [69] and Matheron [70]. It gained popularity after being used for the design and analysis of computer experiments by Sacks, Welch, Mitchell, and Wynn [60]. Kriging is also known as a Gaussian process regression in the field of machine learning [71,72]. Kriging is used for process flowsheet simulations [73], design simulations [74], pharmaceutical process simulations [75], and feasibility analysis [76]. Radial basis functions have been developed for the interpolation of scattered multivariate data. RBFs are used for feasibility analysis [77] and parameter estimation [78]. ANN is used for process modelling [79], process control [80], and optimization [81,82]. SVR is shown to achieve comparable accuracy with that of other surrogates [83]. SVR models are accurate as well as fast in prediction; however, the time required to build this model is high because finding the unknown parameters requires solving a quadratic programming problem. This added complexity hinders the popularity of SVR [6].
Among them, Kriging has earned popularity in the fields of aerodynamic design optimization [84][85][86][87][88] and structural and multidisciplinary optimization [89,90]. Generally, geostatistical interpolation methods that calculate the spatial autocorrelation between measurements and utilize the spatial structure of measurements around the prediction location comprise universal Kriging, ordinary Kriging, and co-Kriging [91]. Isotropy (uniform values in all directions) is assumed during the Kriging process unless anisotropy is specified. Consequently, comparisons between isotropic and anisotropic semi-variogramderived surfaces are not often made. Thus far, the application of anisotropy within Kriging has been shown to be superfluous for local-and regional-scale modelling, although Luo et al. [90] hypothesized that it may be more useful for meso-and macro-scale modelling.
According to the properties of surrogate-based models, Kriging is quite suitable for the multi-objective optimization of optical systems with high dimensions; hence, in this paper, the surrogate-based model applied to the triplet is Kriging.

Design of Experiments (DOE)
Defined as a process for choosing a series of sample points in the design space and with a general target of gaining maximum information from a constrained set of samples, design of experiments (DOE) can be divided into two categories: classical and modern techniques. The classical DOE originated from the random error that exists in a nonrepeatable laboratory experiment (e.g., experimental chemistry and agricultural yield studies), while modern DOE, which includes the deterministic computer simulations, can eliminate the influence of non-repeatability. Therefore, to provide a more convincing result with non-repeatable experiments, classical DOE approaches mainly involve designs of fractional-factorial [92,93], full-factorial [94], Box-Behnken [95], and central composite [96], which normally locate sample points at the boundaries of the target space. In order to obtain the tendency of information accurately, modern DOE primarily employs space-filling designs, and the approaches in modern DOE mainly include Latin hypercube sampling (LHS) [97,98], pseudo-Monte Carlo sampling [99], quasi-Monte Carlo sampling [100], and orthogonal array sampling [101].
Modern DOE is also distinguished from classical DOE in the aspect of choosing the probability distribution functions of design parameters. In modern DOE, the probability of design parameters can be distributed uniformly and non-uniformly (e.g., Gaussian, Weibull); on the contrary, the possible values of a design parameter in classical DOE are typically assumed to be distributed uniformly between the lower and upper extremes. Additionally, the data generated in the design and analysis of computer experiments (DACE) [6,[102][103][104][105] study of an optical imaging system can be applied in surrogate functions, normally expressed as response surface approximations [106], to assist the optimization process. Considering the complex relationships among input design parameters and imaging quality in the design of optical imaging systems, the independent sample points in the design and analysis of computer experiments (DACE) make it possible to utilize parallel computing, either on a multiprocessor computer or over a network [107].
Providentially, a perennial study in mathematical formulation leveraged by the progress in computer power enabled techniques developed for DACE to be successfully employed in various problems (e.g., design of energy and aerospace [108][109][110] systems, manufacturing [111], bioengineering [112,113], and decision under uncertainty [114]). Such techniques comprise a series of methodologies for generating a surrogate model, which can be used to substitute the expensive simulation code. The aim is to build an estimate of the response that is as accurate as possible under a limited number of expensive simulations [115].
Among the modern DOE methods, Metropolis and Ulam [99] first applied pseudo-Monte Carlo sampling into the field of computer simulations in 1949, with the utilization of a pseudo-random number generation algorithm aimed to imitate an indeed random natural procedure. Pseudo-Monte Carlo sampling, also known as Monte Carlo (MC) sampling, is suitable for convex but not rectangular design spaces, whereas the employment in high-dimensional and non-convex design spaces is rather difficult.
Quasi-Monte Carlo sampling [100], also named low-discrepancy sampling, has a common characteristic with pseudo-Monte Carlo sampling in that both approaches were developed for multidimensional integration. One of the fundamental differences between them is that quasi-Monte Carlo sampling can almost generate uniform samplings in a high-dimensional space with the employment of a deterministic algorithm [116]. Stemming from MC sampling, the stratified Monte Carlo sampling method [117] can create a more uniform sampling and offer superior overall coverage of the design space.
Developed by McKay et al. [118] as a substitute for pseudo-Monte Carlo sampling, Latin hypercube sampling (LHS) is one of the most widely and prevalently used spacefilling methods for DOE. Under certain assumptions associated with the function to be sampled, Latin hypercube sampling provides a more accurate estimate of the mean value of the function than does MC sampling. As a result, the LHS can estimate less error in the mean value than the mean value estimated with MC sampling, under the condition of an equal number of samples. Another attractive aspect of the Latin hypercube design is that it allows the user to tailor the number of samples to the available computational budget. That is, a Latin hypercube design can be configured with any number of samples and is not restricted to sample sizes that are specific multiples or powers of n.
However, with a considerable number of design variables, it is challenging for the Latin hypercube design to provide a good coverage of the entire high-dimensional design space. In order to break this curse of dimensionality, constructing space-filling designs in low-dimensional projections is a promising approach. Such approaches comprise randomized orthogonal arrays [117], orthogonal array-based Latin hypercube designs [118], and the construction of orthogonal Latin hypercube designs [119]. The introduction of orthogonality into the Latin hypercube design is directly beneficial in fitting data with polynomial models. In addition, orthogonality can be considered as a stepping-stone to designs that are spacefilling in low-dimensional projections [120].
Latin hypercube designs [98,121] have become particularly popular among all strategies mentioned above for computer experiments. According to Viana [115], the Latin hypercube design has a close growth rate in publications with DACE. Further evidence of the popularity of the Latin hypercube design is the number and diversity of the reported applications in which the LHS is used. For example, with the dedication of evaluating applications of surrogate modeling, the Latin hypercube design appears in eight out of the sixteen chapters in the book edited by Koziel and Leifsson [122]. On account of the advantages and popularity of the Latin hypercube design, it was chosen as the DACE method in this paper.

Multi-Objective Optimization (MOO)
In practical engineering, problems encountered by engineers with multiple objectives are known as multi-objective problems (MOPs), and MOPs with at least four objectives are casually known as many-objective problems (MaOPs) [123]. Multi-objective evolutionary algorithms (MOEAs) are typically applied to solve MOPs, which can be divided into decomposition-based [123][124][125][126], indicator-based [127,128], and Pareto-based [129][130][131] algorithms. However, it should be pointed out that the MOEAs confront three challenges when handling MaOPs, namely, dominance resistance (DR) phenomenon, dimensional curse, and visualization difficulty [132]. To solve the first challenge efficiently, three methods have been introduced, including modification of the Pareto dominance relation, an indicator-based approach, and enhanced diversity management [133].
Even though these methods can deal with MOPs effectively, there are still high computational burdens. The third approach for MaOPs is to enhance diversity management. For example, the NSGA-II [134] algorithm managed the activation and deactivation of the crowding distance to maintain diversity. As one of the Pareto-based algorithms, NSGA-III [135,136] achieved great success in practical application, which replaced the crowding distance operator in the NSGA-II with a clustering operator and used a set of well-distributed reference points to guarantee diversity. Although the NSGA-III algorithm can achieve good diversity, its performance needs to be improved by remedying deficiency or expanding application.
In this paper, the multi-objective algorithm NSGA-III is adopted in the model proposed. NSGA-III has been widely applied to different areas, such as the economic dispatch problem [137] and the ship hull form optimization [138]. This algorithm does not need to convert multiple targets into a single one. It can directly optimize multiple targets at the same time and provide a non-dominated solution set as output. From this solution set, designers can search for the optimal solutions according to their optimization focus and strategy. The multi-objective optimization method proposed in this paper has great potential to be used in the design process of complex high-precision optical systems [15,135,136].

Methodology
Compared with the above-mentioned methods/models, the surrogate model-based multi-objective optimization method presented in this paper is a data-driven method. It trains the surrogate model using a relatively small set of sample data before optimizing the design. Sample points are chosen using the DOE algorithms, and the data set can then be obtained by simulation at sample points using optical tracking methods, such as ray tracing. Benefiting from the surrogate model, this method has a low calculation cost. This is especially useful when the amount of calculation is very high, such as for the optimization of multiple objectives in a design space with high dimensionality. Figure 1 shows the specific process steps of the method proposed in this paper, including:

1.
Decide on design parameters, including their ranges: the key design parameters that affect the performance of the optical system need to be decided first.

2.
Experimental design: based on the number and range of parameters given in step 1, DOE needs to be carried out to decide the sample points in the design space and provide information including the number of samples and their distribution.

3.
Sample points calculation: the ray-tracing-based program is then used to complete the calculation at each sample point and provide the interested targets required in the optimization. 4.
Surrogate model training: the surrogate model can be trained using the output from the sample points in step 3. The accuracy of the trained model is estimated, and more sample points are required if the accuracy cannot meet the requirements. 5.
Multi-objective optimization design: the multi-objective optimization algorithm is used at this step to optimize the design based on the prediction of the surrogate model and provide the final Pareto solution set as output. 6.
Decision making: the final optimal design can then be chosen from the Pareto solution set depending on the desired design focus and strategy.
be obtained by simulation at sample points using optical tracking methods, such as ray tracing. Benefiting from the surrogate model, this method has a low calculation cost. This is especially useful when the amount of calculation is very high, such as for the optimization of multiple objectives in a design space with high dimensionality. Figure 1 shows the specific process steps of the method proposed in this paper, including:

Process Flow
1. Decide on design parameters, including their ranges: the key design parameters that affect the performance of the optical system need to be decided first.

DOE Method
For optical imaging system design, the relationships among input design parameters and imaging quality are very complex. It would be prohibitively time-consuming to perform all the possible computer experiments in order to comprehend these relationships or find the optimal design. The statistical design of experiments is a technique that can be used to design a limited number of samples that could reflect the design space information.
For the conventional optical imaging system, the number of design parameters is in the range of 10 1 to 10 2 . Among the experimental design methods for computer

DOE Method
For optical imaging system design, the relationships among input design parameters and imaging quality are very complex. It would be prohibitively time-consuming to perform all the possible computer experiments in order to comprehend these relationships or find the optimal design. The statistical design of experiments is a technique that can be used to design a limited number of samples that could reflect the design space information.
For the conventional optical imaging system, the number of design parameters is in the range of 10 1 to 10 2 . Among the experimental design methods for computer experiments discussed in Section 1.3, the Latin hypercube design was applied in this model. As a modern and popular method for space-filling experimental design, LHS is a type of stratified Monte Carlo (MC), which allows the experimental designer total freedom in selecting the number of designs to run (as long as it is greater than the number of parameters). The Latin hypercube design is suitable for computer experiments with considerably large dimensions of the design space and has the advantage that the number of samples is not limited by the number of design parameters. Its operation process is simple and flexible and meets the requirement of reducing the sample scale in the case of a large number of design parameters. At present, the Latin hypercube design has become particularly popular among strategies for computer experiments [98].
In view of the above-mentioned advantages, LHS was chosen as the DOE algorithm for the computer experiment of the optical imaging system. The Latin hypercube design requires the designer to specify the number of parameters and their ranges, as well as the number of sample points to run. Assuming that the dimension of the design space is n, the number of sample points to be extracted is ns, and the value range in a certain dimension is x ∈ [l i , u i ](i = 1, 2, . . . , n), where, l i is the lower limit for the i-th parameter, and u i is the upper limit for the i-th parameter. The main steps of the LHS experimental design are as follows:

1.
Give the scale of sampling ns.

2.
Divide the value range [l i , u i ] of each dimension parameter x i into ns intervals equally, then the design space can be divided into ns × n sub-areas.

3.
Randomly generate a matrix X with the order of ns × n. Each column of this matrix is a random arrangement from 1 to ns (elements are X i,j , i = 1 · · · ns, j = 1 · · · n, which are random integers in the range from 1 to ns). The matrix X is called a Latin hypercube.

4.
Each row of the matrix X corresponds to a selected small hypercube, which is a sample point. The normalized value of the i-th sample point for the j-th parameter can be calculated as: x i,j = X i,j − 0.5 /ns. The actual value of the parameter for each sample point can be obtained by mapping x i,j into the design space considering the actual range. experiments [98].
In view of the above-mentioned advantages, LHS was chosen as the DOE algorithm for the computer experiment of the optical imaging system. The Latin hypercube design requires the designer to specify the number of parameters and their ranges, as well as the number of sample points to run. Assuming that the dimension of the design space is n, the number of sample points to be extracted is , and the value range in a certain dimension is ∈ [ , ]( = 1,2, … , ), where, is the lower limit for the i-th parameter, and is the upper limit for the i-th parameter. The main steps of the LHS experimental design are as follows: 1. Give the scale of sampling . 2. Divide the value range [ , ] of each dimension parameter into intervals equally, then the design space can be divided into × sub-areas. 3. Randomly generate a matrix with the order of × . Each column of this matrix is a random arrangement from 1 to (elements are , , = 1 ⋯ , = 1 ⋯ , which are random integers in the range from 1 to ). The matrix is called a Latin hypercube. 4. Each row of the matrix corresponds to a selected small hypercube, which is a sample point. The normalized value of the i-th sample point for the j-th parameter can be calculated as: , = ( , − 0.5)/ .
The actual value of the parameter for each sample point can be obtained by mapping , into the design space considering the actual range. Figure 2 shows the results of extracting 200 sample points in a two-dimensional space (each dimension range is [0, 1]) and 500 sample points in a three-dimensional space (each dimension range is [0, 1]) using the LHS method. The LHS method can guarantee that the number of projections of samples in each dimension of the design parameters is equal to the number of samples, and the projections have uniform distribution in each dimension.

Kriging Surrogate Model
The Kriging surrogate model originated in the areas of mining and geostatistics, which involve temporally and spatially correlated data. The unique characteristic of Kriging stems from its ability to combine global and local modeling. The Kriging surrogate model is one of the unbiased models with the smallest estimation variance, which could provide efficient and reliable prediction. Extensive reviews of the Kriging model used in simulation, sensitivity analysis, and optimization in the design process can be found in [139]. Due to its high accuracy and good performance for complex nonlinear problems, the Kriging surrogate model was chosen as the surrogate model to provide prediction for the imaging quality of the optical imaging system in this paper.
The Kriging model consists of two parts, the regression model and the stochastic process. The regression model represents the global tendency of the analyzed function, and the stochastic process represents the spatial correlations in the design space of interest [140].
where x is n-dimensional vector, y is the unknown function of β i f i (regression model) and Z(x) (stochastic process). The regression models with polynomials of orders 0, 1 and 2 were adopted here and detailed in Table 1. Table 1. Regression models.

Orders
Number . . , n, j = 1, . . . , n Z(x) represents a local deviation from the regression model and is the realization of a stationary, normally distributed Gauss random process with zero mean, variance, and non-zero covariance. The covariance matrix of Z(x) is given by: where σ 2 is the process variance, and R is an ns × ns symmetric correlation matrix. In addition, R x (i) , x (j) is the spatial correlation function between any two points x (i) and x (j) of ns sample points. A popular Gaussian correlation function is used here, and the function can be expressed: where θ k is the k th element of the correlation vector parameter θ. The regression term ∑ k i=1 β i f i (x) can choose a constant value, a linear model, or a quadratic model. In this paper, the quadratic model was adopted here. In the implementations, x is normalized by subtracting the mean from each variable, and then dividing the values of each variable by its standard deviation: The Kriging predictor is:ŷ where F is a matrix that can be written as: When the order of the regression models is 0, F is a column vector of length ns filled with ones. Y the column vector with responses of sample points, and r(x) is the correlation vector, which can be written as: For a give parameter θ,μ andσ 2 can be calculated as: The uncertainty of the predicted value of the Kriging model can be expressed as: Due toμ,σ 2 , r(x), and correlation matrix R being dependent on the parameter θ, the Kriging model is trained by finding a parameter θ that maximizes the following likelihood function. Unlike the deep neural networks, the goodness-of-fit of the Kriging model is not clearly defined. For the Kriging model, the value of ln(Likelihood) has a similar effect to the goodness-of-fit. A larger value of ln(Likelihood) represents a better-fitting effect of the Kriging model.
The process to find a parameter θ that maximizes the likelihood function is to solve an unconstrained optimization problem. For the Kriging model in the present paper, optimization algorithms such as the Genetic Algorithm (GA) [141], Particle Swarm Optimization (PSO) [142], and the pattern search algorithm [143] are used for this purpose. The GA and the PSO was chosen when the dimension was lower than 10 [144].
To verify the reliability of the surrogate model, it is important to test the model using test sample points, based on different error evaluation criteria, such as the average relative error, root-mean-square error, and correlation coefficient. The definition formula of these criteria are as follows: Correlation Coe f f icient =

NSGA-III Multi-Objective Optimization Algorithm
Most multi-objective optimization algorithms using evolutionary optimization methods have demonstrated their efficiency in various practical problems involving mostly two and three objectives. There is a growing need for developing multi-objective optimization algorithms for handling optimization problems with more objectives. The multi-objective optimization algorithm used in this paper is the Non-dominated Sorting Genetic Algorithm III (NSGA-III) [136], which is an upgrade from NSGA-II [134]. NSGA-III is a referencepoint-based many-objective evolutionary algorithm that emphasizes population members that are non-dominated, yet close to a set of supplied reference points.
The framework of NSGA-III is basically the same as that of NSGA-II. The biggest change in NSGA-III is the use of the well-distributed reference points to maintain a good diversity of the population. Therefore, it shows a better diversity and convergence. It also uses the simulated binary crossover (SBX) [145], mutation operator (polynomial mutation), and Pareto sorting in the process and selects population in the key layer L, using a niching algorithm rather than the crowding distance method used in NSGA-II. In order to deal with the constraint problem, the model used here also adopted the penalty method, which means a certain penalty value would be added to the individual for triggering the constraint depending on its adaptability.
The steps of using the NSGA-III algorithm are as follows: 1.
Generate the initial population P o , which contains N randomly generated individuals.

2.
Conduct binary competition selection, simulated binary crossover, and mutation operations on individuals in the initial population to generate N offspring populations Q o .

3.
Merge the parent and child populations. The number of individuals in the new population is 2N.

4.
Apply a fast non-dominated sorting technique on the population to obtain the individuals' order and carry on selecting the next generation population P1.

5.
Decide whether the conditions have been reached for terminating the iteration. If it is, output the individuals; otherwise, go to step 2.

Case Studies of a Cooke Triplet System
Two case studies focusing on a Cooke triplet optical system were carried out using the method introduced in Section 2. In the first case, the optimization was carried out on a classic Cooke triplet system simply to prove that the model proposed can be applied to an optical system. The second case starts the optimization from a system that has been optimized using a commercial software, CODE-V (version: codev 10.8) [146], in order to show that the model can further improve the results.

Case 1
A Cooke triplet system that consists of three lenses was used as the optimization design subject, as shown in Figure 3. The geometric shape and the distance between the lens were selected as the design parameters. The optimization's objectives were to minimize the maximum field curvature (DIS) and the geometric spot diagram (RMS) of the Cooke triplet system. The front and back curvature, thickness, and spacing of each lens were chosen as the design parameters. In total, there were 12 of them (not including D1, which is the distance to the system's origin), as shown in Figure 4. The initial design parameters from which the optimization started are listed in Table 2.         Since there are two objectives, minimizing DIS and RMS, this was a two-objective optimization. Here, DIS was treated as a single value, while the three RMS were combined into one by using weighting factors, as seen in Equations (14) and (15). The weighting factors w 1 , w 2 , and w 3 used here were 0.3, 0.35, and 0.35, respectively. Aim2 The range of variation for each design parameter was set as ±1% of the initial value. The LHS method was used to choose 1200 sample points within this 12-dimension design space. Figure 5 shows the projections of these sample points in a 2D space (S1 X S2) and a 3D space (D2 X D3 X D4). Appl The range of variation for each design parameter was set as ±1% of the initial value. The LHS method was used to choose 1200 sample points within this 12-dimension design space. Figure 5 shows the projections of these sample points in a 2D space (S1 X S2) and a 3D space (D2 X D3 X D4).
A commercial software, CODE-V [146], was used to carry out the calculation at these sample points using a ray-tracing-based method and to provide the DIS and RMS at these sample points. Of the sample results, 95% were randomly selected as the training data for the Kriging surrogate model, and the remaining 5% were used for testing. Table 3 shows the evaluation results of each target value based on the 5% testing samples. From the table, the Correlation Coefficients are all close to 1. The Relative Errors are less than 1% except RMS1, which is 5%. Multi-objective optimization was conducted using the NSGA-III algorithm. The A commercial software, CODE-V [146], was used to carry out the calculation at these sample points using a ray-tracing-based method and to provide the DIS and RMS at these sample points. Of the sample results, 95% were randomly selected as the training data for the Kriging surrogate model, and the remaining 5% were used for testing. Table 3 shows the evaluation results of each target value based on the 5% testing samples. From the table, the Correlation Coefficients are all close to 1. The Relative Errors are less than 1% except RMS1, which is 5%. Multi-objective optimization was conducted using the NSGA-III algorithm. The population number was set at 1000, and the evolutionary generation was set at 2000. Figure 6 shows the Pareto frontier at different evolution steps in the evolution process. The shape of the frontier tends to stabilize after about 100 generations (last is 2000). The final Pareto frontier solution set and the initial design are shown in the Figure 7. Since this was a multi-objective analysis, the final result was not unique but a set of nondominated solutions. It is very obvious that the optimization process significantly reduced both the weighted RMS and DIS. The final optimal solution can be chosen from the solution set depending on the design focus and strategy. For example, the strategy here was to minimize the weighted RMS providing the DIS an acceptable level. Here, the level was set at 1.10. One final solution, shown as a blue star in Figure 7, can then be chosen from the solution set.
A comparison of the DIS and RMS before and after the optimization is shown in Table   Figure 6. Pareto frontier evolution process.
The final Pareto frontier solution set and the initial design are shown in the Figure 7. Since this was a multi-objective analysis, the final result was not unique but a set of nondominated solutions. It is very obvious that the optimization process significantly reduced both the weighted RMS and DIS. The final optimal solution can be chosen from the solution set depending on the design focus and strategy. The final Pareto frontier solution set and the initial design are shown in the Figure 7. Since this was a multi-objective analysis, the final result was not unique but a set of nondominated solutions. It is very obvious that the optimization process significantly reduced both the weighted RMS and DIS. The final optimal solution can be chosen from the solution set depending on the design focus and strategy. For example, the strategy here was to minimize the weighted RMS providing the DIS an acceptable level. Here, the level was set at 1.10. One final solution, shown as a blue star in Figure 7, can then be chosen from the solution set.
A comparison of the DIS and RMS before and after the optimization is shown in Table   Figure 7. Comparison of the Pareto solution set and its initial state.
For example, the strategy here was to minimize the weighted RMS providing the DIS an acceptable level. Here, the level was set at 1.10. One final solution, shown as a blue star in Figure 7, can then be chosen from the solution set.
A comparison of the DIS and RMS before and after the optimization is shown in Table 4. The optimized solution reduced RMS by 5.32% and DIS by 11.59%. It significantly improved the performance of the Cooke triplet from its original design. The values of the 12 design parameters before and after the optimization are listed in Table 5. Since the optimized solution was obtained from the Kriging surrogate model, not from an actual calculation, it was put into CODE-V for an actual calculation as a double check. The results are shown in Table 6. The deviation for DIS and weighted RMS between the optimized solution and the CODE-V calculation is less than 0.5%. The maximum deviation for an individual RMS is 3.7%.

Case 2
Since CODE V has its own built-in optimization module and it has been used as an industrial standard, a case study was carried out to show that the model presented here can further improve the CODE-V's optimization result. The CODE-V optimized values are shown in Tables 7 and 8. These were used as the starting point of the optimization process in Case 2. The other settings were all the same as Case 1. Based on the prediction of the Kriging surrogate model for the testing data, the Correlation Coefficient of the prediction results were all greater than 0.971, and the Relative Errors were less than 3% except RMS1, which was 5.6%.
Multi-objective optimization was conducted using the NSGA-III algorithm, with the setting of 1000 population and 2000 evolutionary generations. Figure 8 shows the final Pareto frontier solution set with the initial state. Appl The other settings were all the same as Case 1. Based on the prediction of the Kriging surrogate model for the testing data, the Correlation Coefficient of the prediction results were all greater than 0.971, and the Relative Errors were less than 3% except RMS1, which was 5.6%.
Multi-objective optimization was conducted using the NSGA-III algorithm, with the setting of 1000 population and 2000 evolutionary generations. Figure 8 shows the final Pareto frontier solution set with the initial state. As seen from Figure 8, although CODE-V has optimized its output, the model presented here can still further improve the optimization design. If the DIS value is chosen as 0.63 as an acceptable value, the final optimization solution can be obtained from the As seen from Figure 8, although CODE-V has optimized its output, the model presented here can still further improve the optimization design. If the DIS value is chosen as 0.63 as an acceptable value, the final optimization solution can be obtained from the solution set. The design parameters and targeted values before and after the optimization are listed in Tables 9 and 10. The optimized solution further improved the performance of the Cooke triplet, with a 3.53% reduction in weighted RMS and a 4.33% reduction in the DIS.

Conclusions
An optimization model based on a surrogate model and a multi-objective optimization algorithm for an optical imaging system was established in this paper. The use of a surrogate model can significantly reduce the calculation cost but still keep a high level of accuracy, especially when the design space has a large dimension. Another advantage of this model is the ability to optimize multiple objectives simultaneously during the optimization process. This is achieved by using a multi-objective optimization algorithm. With the surrogate model and the multi-objective optimization algorithm, this model can significantly improve the efficiency of optical design.
Two case studies of optimizing a Cooke triplet optical system were carried out with twelve design parameters and two optimization objectives: Case 1 showed that the optimized result from the model significantly improved the imaging quality of the initial design, with a reduction of 5.32% in RMS and 11.59% in DIS. Further verification conducted using CODE-V showed that the deviation from an actual calculation was less than 0.5%. Case 2 used an optimized result from CODE-V as the starting point and showed that the optimization from the model presented further reduced the weighted RMS by 3.53% and the DIS by 4.33%.
As a result, the model presented in this paper is suitable for the optimization of optical system design, and it can further improve the optimization results from CODE-V. It has great potential to be used in the design process of complex high-precision optical systems.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The study did not report any data.

Conflicts of Interest:
The authors declare no conflict of interest.