Towards a Better Basis Search through a Surrogate Model-Based Epistasis Minimization for Pseudo-Boolean Optimization

Abstract: Epistasis, which indicates the difficulty of a problem, can be used to evaluate the basis of the space in which the problem lies. However, calculating epistasis may be challenging as it requires all solutions to be searched. In this study, a method for constructing a surrogate model, based on deep neural networks, that estimates epistasis is proposed for basis evaluation. The proposed method is applied to the Variant-OneMax problem and the NK-landscape problem. The method is able to make successful estimations on a similar level to basis evaluation based on actual epistasis, while significantly reducing the computation time. In addition, when compared to the epistasis-based basis evaluation, the proposed method is found to be more efficient.


Introduction
In terms of computation, various challenges such as black-box problems still exist. Multiple attempts have been made to resolve the difficulties by constructing surrogate models based on deep learning [1,2].
When we use a basis other than the standard basis, the structure of the problem space can be quite different from the original one. Mbarek et al. [3,4] changed the standard basis of a vector space to ensure the efficient performance of an algorithm. Seo et al. [5] modified the problem space by nontrivial encoding through a method that changes the standard basis. The effects of basis change on a genetic algorithm (GA) in binary encoding have been investigated [6]. Furthermore, it has been shown that this method can fundamentally change the problem space in graph problems and thereby affect the performance of GAs. Several studies have measured problem difficulty from the viewpoint of epistasis [7,8]. To search for a basis that smooths the ruggedness of the problem space, Lee and Kim [9,10] proposed a method that used a meta-GA, and an epistasis-based basis evaluation method that largely reduced the computational time compared with the meta-GA. In order to calculate epistasis accurately, all feasible solutions first need to be searched. However, searching all solutions becomes challenging as the complexity of the problem increases. They resolved this issue by estimating the epistasis from samples of the solutions.
However, estimating the epistasis using samples of the solutions still requires a large computational cost. Therefore, in this study, deep neural networks (DNNs) are used to construct an epistasis-estimating surrogate model to remarkably reduce the computational cost.
The remainder of the paper is organized as follows. The background is explained in Section 2. In Section 3, related studies are introduced. First, a method of smoothing the ruggedness of the problem space using a meta-GA is described, and then a study related to basis evaluation based on actual epistasis is described. Section 4 discusses the deep learning-based epistasis estimation method proposed in this study. In Section 5, the results of applying the basis change to a GA are analyzed. We draw conclusions in Section 6.

Backgrounds and Test Problems
In this section, we describe various backgrounds needed to understand the rest of this paper. In Sections 2.1 and 2.2, we describe basis and epistasis, respectively. We explain how we calculated epistasis in Section 2.3. Sections 2.4 and 2.5 discuss the deep learning and surrogate models, respectively. In Section 2.6, we introduce test problems used in our experiments.

Basis
In linear algebra, a basis of a vector space is a set of linearly independent vectors that span the vector space [11,12]. In other words, they are vectors that give every vector in the vector space a unique representation as a linear combination. The basis can be defined as follows.
[Basis] A basis $B$ of a vector space $V$ over a field $F$ is a linearly independent subset of $V$ that spans $V$. A subset $B$ of $V$ is a basis if it satisfies the following two conditions:
• Linear independence property: for every finite subset $\{b_1, b_2, \ldots, b_n\}$ of $B$ and every $a_1, a_2, \ldots, a_n$ in $F$, if $a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = 0$, then necessarily $a_1 = a_2 = \cdots = a_n = 0$;
• Spanning property: for every vector $v$ in $V$, there exist $b_1, b_2, \ldots, b_n$ in $B$ and $a_1, a_2, \ldots, a_n$ in $F$ such that $v = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$.
The standard basis for the vector space $Z^n$ is $e_1, e_2, \ldots, e_n$, where $e_i$ is a column vector whose $i$-th component is 1 and whose remaining components are 0. If the standard basis is expressed as a 0-1 matrix, it becomes the identity matrix. Replacing the identity matrix with another invertible matrix corresponds to changing the standard basis to another basis.
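As a small illustration of the last point, the following sketch checks whether a 0-1 matrix is invertible, and hence whether its rows form a valid basis. It assumes that arithmetic is carried out modulo 2, as is natural for binary encoding; the function name `is_basis_gf2` and the example matrices are hypothetical.

```python
def is_basis_gf2(matrix):
    """Return True if the square 0-1 matrix is invertible modulo 2 (assumed arithmetic)."""
    n = len(matrix)
    rows = [row[:] for row in matrix]  # work on a copy
    for col in range(n):
        # Find a row at or below the diagonal with a 1 in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] == 1), None)
        if pivot is None:
            return False  # no pivot: the vectors are linearly dependent
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row (addition mod 2 is XOR).
        for r in range(n):
            if r != col and rows[r][col] == 1:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return True


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # the standard basis
changed = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]    # invertible: another valid basis
singular = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]   # two equal rows: not a basis
```

Gaussian elimination is used here only because it is the simplest exact test; any other invertibility check would serve equally well.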

Epistasis
In GAs, epistasis refers to any type of gene interaction. A more complex gene interaction implies more complex problems and greater epistasis. On the other hand, if the gene interaction is independent, the epistasis becomes zero.
A method proposed by Lee and Kim [10] used the epistasis proposed by Davidor [13]. Equations for calculating the epistasis are provided in Section 2.3.
Typically, the standard basis is used in binary encoding. When the evaluation functions in the standard basis with complex interdependencies of basis vectors are changed to another basis, with the aim of smoothing the ruggedness of the problem space, the calculation of the evaluation functions becomes simpler and the epistasis decreases as well. However, as all the feasible solutions need to be searched first to calculate the epistasis, the epistasis of the sampled solutions was calculated in [10] and, based on this, an estimation was made for the actual epistasis.

How to Calculate Epistasis
In this paper, we used the formula proposed by Davidor [13] to calculate epistasis. The equations referred to in Section 2.2 are as follows. A string $S$ is composed of $l$ elements $(s_1, s_2, \ldots, s_l)$, where $s_i \in \{0, 1\}$ for each $i = 1, 2, \ldots, l$.
The Grand Population $\Gamma$ is the set of all strings of length $l$. Let $Pop$ denote a sample of $\Gamma$, where the sample is selected uniformly and with replacement. $N_{Pop}$ is the size of the sample $Pop$, and $v(S)$ is the fitness of a string $S$.
The average fitness value of $Pop$ is:
$$\bar{V} = \frac{1}{N_{Pop}} \sum_{S \in Pop} v(S).$$
The number of strings $S$ in $Pop$ that match $s_i = a$ is denoted by $N_i(a)$, and $Pop_{s_i = a}$ is the set of all strings in $Pop$ with the allele $a$ in their $i$-th position. The average allele value $A_i(a)$ is given by:
$$A_i(a) = \frac{1}{N_i(a)} \sum_{S \in Pop_{s_i = a}} v(S).$$
The excess allele value $X_i(a)$ is defined by:
$$X_i(a) = A_i(a) - \bar{V}.$$
The excess genic value $X(S)$ is:
$$X(S) = \sum_{i=1}^{l} X_i(s_i).$$
The genic value $A(S)$ of a string $S$ is defined as:
$$A(S) = X(S) + \bar{V}.$$
Finally, with the epistasis of a string given by $\varepsilon(S) = v(S) - A(S)$, the epistasis variance is:
$$\sigma_{\varepsilon}^2 = \frac{1}{N_{Pop}} \sum_{S \in Pop} \left( v(S) - A(S) \right)^2.$$
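The quantities named above (average fitness, allele values, excess values, genic value, and epistasis variance) can be computed from a sample in a few lines. The sketch below assumes the commonly used forms of Davidor's definitions, namely that the excess allele value is the allele value minus the average fitness, that the genic value is the average fitness plus the sum of excess allele values, and that the epistasis variance is the mean squared difference between fitness and genic value. The function name `epistasis_variance` is hypothetical.

```python
from itertools import product


def epistasis_variance(pop, fitness):
    """Epistasis variance of a sample `pop` (sequence of 0/1 tuples) under the assumed definitions."""
    n = len(pop)
    length = len(pop[0])
    values = [fitness(s) for s in pop]
    mean_v = sum(values) / n  # average fitness of Pop

    # Excess allele value X_i(a) = A_i(a) - mean fitness, per position and allele.
    excess = []
    for i in range(length):
        per_allele = {}
        for a in (0, 1):
            matching = [v for s, v in zip(pop, values) if s[i] == a]
            if matching:  # only alleles that occur in the sample, i.e. N_i(a) > 0
                per_allele[a] = sum(matching) / len(matching) - mean_v
        excess.append(per_allele)

    total = 0.0
    for s, v in zip(pop, values):
        genic = mean_v + sum(excess[i][s[i]] for i in range(length))  # A(S)
        total += (v - genic) ** 2
    return total / n


# Using the whole Grand Population as the "sample" for two tiny problems.
all_3bit = list(product((0, 1), repeat=3))
all_2bit = list(product((0, 1), repeat=2))
onemax_eps = epistasis_variance(all_3bit, sum)                     # genes act independently
parity_eps = epistasis_variance(all_2bit, lambda s: s[0] ^ s[1])   # genes interact fully
```

OneMax has independent genes, so its epistasis variance is zero, whereas the two-bit parity function cannot be explained by single-gene effects at all and yields a positive value.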

Deep Learning
Deep learning [14] is a high-level abstraction method that uses a combination of various non-linear techniques. This technology is currently being used in various domains of the modern society. In particular, it has brought significant developments in fields like computer vision, speech recognition, and natural language processing.
A deep neural network (DNN) is one kind of artificial neural network (ANN) that consists of various hidden layers between the input layer and output layer. Regardless of the linearity, a DNN can model complex relationships that change from the input to the output. In addition, there are various ANN models such as convolutional neural network (CNN) [15], recurrent neural network [16], and restricted Boltzmann machine [17].

Surrogate Model
A surrogate model is a model used to replace tasks such as complex and time-consuming calculations [18]. Several real-world simulations are extremely time-consuming or difficult to implement in a realistic manner. These simulations may take days or even weeks to produce results, so running them many times is almost infeasible. In such cases, surrogate models are used in place of the simulations. Sreekanth and Datta [19] built a surrogate model for coastal aquifer management. Eason and Cremaschi [20] used an ANN-based surrogate model for a chemical simulation.
Surrogate models are also used in the fields of mathematics and optimization. Some studies have been conducted to optimize various objective functions using surrogate models [21-26]. Specifically, these studies attempted to optimize expensive black-box problems. In particular, several studies have proposed the use of Walsh-based surrogate models to reduce the computational cost associated with pseudo-Boolean problems [27-29].
Similarly, these models have been adopted to reduce computational costs in several other fields as well. Deep learning techniques can also be used to replace black-box evaluation. To this end, surrogate models using deep learning have been developed [2,30-35]. Deep learning can be a powerful method for estimating evaluation values that are difficult to compute in a realistic manner. This study intends to reduce the computation time by constructing a DNN-based surrogate model that estimates the epistasis corresponding to a basis.

Test Problems
This study tested the same problems as those in the experiments conducted by Lee and Kim [10]. The OneMax problem is the maximization of the number of 1s in a binary vector. Variant-OneMax is defined as the problem that maximizes the number of 1s in the binary vectors that are changed from the standard basis to another basis. When $B_S$ is the standard basis and $B$ is another basis, the change from $B_S$ to $B$ can be expressed as an invertible matrix $[T]^{B}_{B_S}$. When we change the basis, a vector $v$ can be represented as
$$[v]_B = [T]^{B}_{B_S} [v]_{B_S}.$$
The optimal solution to the Variant-OneMax problem is the one in which all elements are 1.
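A minimal sketch of the Variant-OneMax evaluation follows. It assumes that the fitness of a vector is the number of 1s in the product of the basis-change matrix and the vector, computed modulo 2; the exact formulation used in the experiments is not reproduced here, and the names `change_basis`, `variant_onemax`, and the matrix `T` are hypothetical.

```python
def change_basis(t_matrix, v):
    """Multiply a 0-1 matrix by a 0-1 vector, with arithmetic modulo 2 (assumed)."""
    return [sum(t * x for t, x in zip(row, v)) % 2 for row in t_matrix]


def variant_onemax(t_matrix, v):
    """Number of 1s in the vector after the basis change."""
    return sum(change_basis(t_matrix, v))


# A hypothetical invertible 0-1 matrix standing in for the basis change.
T = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]

all_ones_score = variant_onemax(T, [1, 1, 1])  # all-ones input is NOT optimal here
optimum_score = variant_onemax(T, [1, 0, 1])   # this input maps to the all-ones vector
```

The example shows why the problem is harder than OneMax in the standard basis: the input that maps to the all-ones vector is no longer the all-ones input.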
The NK-landscape problem defined by Kauffman [36] comprises a chromosome $S$ of length $N$, in which each gene depends on $K$ other genes. The fitness contribution $f_i$ of the gene at locus $i$ depends on the allele $S_i$ and $K$ other alleles $S_{i_1}, S_{i_2}, \ldots, S_{i_K}$. The fitness $f$ of a point $S = (S_1, S_2, \ldots, S_N)$ can then be expressed as follows:
$$f(S) = \frac{1}{N} \sum_{i=1}^{N} f_i(S_i, S_{i_1}, S_{i_2}, \ldots, S_{i_K}).$$
As the NK-landscape problem is an NP-complete problem, it is difficult to find the global optimum, and therefore, it is employed extensively in the optimization field [37]. Additionally, the level of difficulty of the problem can be adjusted through $N$, which represents the overall size of the landscape, and $K$, which controls the number of its local hills and valleys. The higher $K$ is, the more rugged the problem space is.
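The NK-landscape fitness can be sketched as below. The choice of random neighbours, the uniform random contributions in [0, 1), and the lazily filled lookup tables are assumptions made for illustration, since the instance generation used in the experiments is not specified here; `make_nk_landscape` is a hypothetical name.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Build an NK-landscape fitness function with random neighbours (assumed setup)."""
    rng = random.Random(seed)
    # For each locus, choose K other loci that it depends on.
    neighbors = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One contribution table per locus, keyed by the (K+1)-bit pattern; filled on demand
    # so that large K does not require 2^(K+1) entries up front.
    tables = [{} for _ in range(n)]

    def fitness(s):
        total = 0.0
        for i in range(n):
            key = (s[i],) + tuple(s[j] for j in neighbors[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()  # contribution f_i in [0, 1)
            total += tables[i][key]
        return total / n  # average of the N contributions

    return fitness


nk_fitness = make_nk_landscape(n=10, k=3)
point = (0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
value = nk_fitness(point)
```

Because each contribution is stored once it has been drawn, repeated evaluations of the same point return the same fitness within one landscape instance.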

Prior Work on Searching Basis
In this section, we introduce prior work on basis search. We describe the work on finding a good basis using a meta-GA, and the GA combined with epistasis-based basis evaluation.

Basis Searching with a Meta-GA
A good basis can improve the performance of a GA. Lee and Kim [9] proposed a method of using a meta-GA to search for a good basis.
The performance improvement was demonstrated for the NK-landscape problem by changing the standard basis to another basis obtained using the meta-GA. However, the time complexity of searching the basis with their meta-GA was found to be $O(2^{n^2})$. Hence, obtaining a good basis using the meta-GA is not practical, as it is extremely time-intensive.

Epistasis-based Basis Evaluation
Lee and Kim [10] proposed an epistasis-based basis evaluation method and subsequently applied it to a GA in order to search for a good basis. They converted the complex problem into a simpler one by changing the basis.
Their epistasis-based basis evaluation method was conducted as follows. For a given basis, a sampled solution set $S$ was obtained from the problem, following which $S'$ was obtained from $S$ by the basis change. The epistasis of $S'$ was then calculated. A lower epistasis value indicated that the basis was more appropriate for the problem. When the gene size was represented by $l$ and the size of the sampled solution set $S$ was represented by $s$, the time required for calculating the epistasis was $O(l^2 s)$.
Next, a GA using the epistasis evaluation was described. A basis could be represented as an invertible matrix. If the $E_i$ are elementary matrices, any invertible matrix $T$ can be expressed as
$$T = E_1 E_2 \cdots E_k.$$
Each basis was represented by a variable-length string encoding that denotes the product of the elementary matrices [38].
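One way to picture this encoding is sketched below. It assumes arithmetic modulo 2, where the only nontrivial elementary matrices are row swaps and row additions, and it represents each gene as a tuple such as `("swap", i, j)` or `("add", i, j)`. This gene format and the name `decode_basis` are hypothetical and are not taken from [38].

```python
def decode_basis(genes, n):
    """Decode a variable-length list of elementary operations into an n x n 0-1 matrix.

    The result equals the product E_1 E_2 ... E_k of the encoded elementary matrices,
    computed modulo 2 (assumed). Left-multiplying by an elementary matrix is a row
    operation, so the last gene is applied to the identity first.
    """
    matrix = [[int(r == c) for c in range(n)] for r in range(n)]
    for op, i, j in reversed(genes):
        if op == "swap":
            matrix[i], matrix[j] = matrix[j], matrix[i]
        elif op == "add":
            if i == j:
                raise ValueError("adding a row to itself is not an elementary operation")
            matrix[i] = [a ^ b for a, b in zip(matrix[i], matrix[j])]  # row_i += row_j
        else:
            raise ValueError(f"unknown operation: {op}")
    return matrix


genes = [("add", 0, 1), ("swap", 1, 2)]
basis = decode_basis(genes, 3)
```

Since every gene decodes to an invertible elementary matrix, any string of genes yields an invertible matrix, so crossover and mutation on the string can never produce an invalid basis.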
The parents were aligned according to the Wagner-Fischer algorithm [39], after which a uniform crossover operator was applied. For selection, a tournament operator was applied to select the best solution out of three candidates. The mutation operator applied either insertion, deletion, or replacement. Replacement was performed by replacing the parent generation with the offspring generation.
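For readers unfamiliar with it, the Wagner-Fischer algorithm is the standard dynamic program for the edit distance between two sequences. The sketch below computes only the distance, with unit costs for insertion, deletion, and substitution; how the resulting table was used to align variable-length parents in [10] is not reproduced here.

```python
def edit_distance(a, b):
    """Wagner-Fischer dynamic program for the edit distance between two sequences."""
    m, n = len(a), len(b)
    # dist[i][j] is the edit distance between a[:i] and b[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i symbols
    for j in range(n + 1):
        dist[0][j] = j  # insert all j symbols
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[m][n]
```

The function works on any sequences with comparable elements, so it applies equally to character strings and to lists of encoded elementary operations.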
Experiments were conducted on the Variant-OneMax problem and the NK-landscape problem, which are described in Section 2.6. After the GA was run to identify another basis for each problem, the basis was changed to the one found by the GA, and the optimal solutions were sought by searching each problem one hundred times independently. Thereafter, the experimental results were compared for three cases: the case where no basis change was conducted, the case where a meta-GA was applied, and the case where the epistasis-based basis evaluation method was used. The meta-GA showed better results overall. However, as the computation of the meta-GA is time-intensive, it is difficult to use in practice. The epistasis-based basis evaluation method yielded better results than the case with no basis change, and required less time than the meta-GA. Thus, considering both the experimental results and the processing time, the epistasis-based basis evaluation method was shown to be the most efficient method. Further, the epistasis calculated after the basis change was lower than that calculated before it, thereby indicating the effectiveness of the presented model. However, this method was still time-consuming, as it required calculating the epistasis for each solution in the solution set. We aim to substantially reduce this computational cost by estimating epistasis using deep learning.

Proposed Method Based on Surrogate Model
In this section, we describe the proposed epistasis estimation method based on a surrogate model. We present how to estimate epistasis using deep learning in Section 4.1. In Section 4.2, we introduce our GA combined with the proposed surrogate model.

Surrogate Model-Based Epistasis Estimation Using Deep Learning
In Section 3.2, we confirmed that the complexity of a problem on a basis could be represented by its epistasis. Because searching all the solutions to calculate the epistasis is infeasible, the actual epistasis was estimated using a sampled solution set. However, as this sampling-based estimation of the epistasis was still time-consuming, we expected that deep learning could be used to further reduce the computation time.
We intended to apply a deep learning model in which the basis and the estimated epistasis from Section 3.2 were considered to be the input and output, respectively. In practice, it is nearly impossible to search all the solutions to calculate epistasis. However, if the epistasis can be successfully estimated using a deep learning model, the deep learning model can be viewed as an objective function for estimating the epistasis. A surrogate model estimates an unknown objective function based on the accumulated input and output data. Therefore, the corresponding deep learning model can be considered to be a surrogate model.
Kim et al. [40] and Kim and Kim [41] introduced an epistasis estimation method using a DNN model. Experiments were conducted using a DNN and a CNN to determine which deep learning model was more suitable for epistasis estimation. The hyperparameters for the DNN and CNN models are detailed in Table 1, and the structure of the CNN model used is shown in Figure 1. The data used in our experiments were from the populations obtained by the GA experiments of Lee and Kim [10] for searching the basis on each problem. Further, 10-fold cross-validation was conducted after removing duplicated data within the dataset. The results of the experiment are summarized in Table 2. The ratio shown there was calculated by $100 \times \frac{E_s - E_d}{E_s}$ (%), where $E_s$ denotes the epistasis calculated by sampling, and $E_d$ represents the epistasis estimated using deep learning. A lower ratio implies that the epistasis of a sampled solution set can be successfully replaced by the epistasis estimated using deep learning.
According to the experimental results presented in Table 2, there was no significant difference between the ratio for the DNN and the ratio for the CNN. However, the CNN model required 6.5 times more training time than the DNN model. When we considered both the estimation results and the training time, the DNN model was more efficient.
Various hyperparameters can be configured in the DNN model, and the performance of the DNN model may vary depending on the hyperparameter configuration. Kim et al. [40] used the dropout technique [42], which is commonly used to regularize a DNN and reduce overfitting. The initial weights also affect the model performance; two popular initialization techniques are "Xavier" [43] and "He" [44]. Finally, the experiment was conducted using the best configuration: three layers, the "Xavier" initializer, and a dropout rate of 0.5.
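To make the configuration concrete, the following sketch shows a forward pass of a small fully connected network with Xavier (Glorot) uniform initialization and dropout at rate 0.5. It is not the trained model from the experiments: the layer widths, the ReLU activation, the reading of "three layers" as three dense layers, and the flattening of the basis matrix into the input vector are all assumptions, the hyperparameters of Table 1 are not reproduced, and training is omitted. All function names are hypothetical.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform initialization: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def dense(x, weights, biases):
    """Fully connected layer: weights is indexed as [input][output]."""
    return [sum(xi * w[j] for xi, w in zip(x, weights)) + biases[j]
            for j in range(len(biases))]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted dropout: active only during training, identity at inference time."""
    if not training:
        return x
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def build_model(input_size, hidden_sizes, rng):
    """Return a list of (weights, biases) pairs, ending in a single-output layer."""
    sizes = [input_size] + list(hidden_sizes) + [1]
    return [(xavier_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]


def predict(model, basis_matrix, rng=None, training=False, rate=0.5):
    """Estimate a scalar (standing in for the epistasis) from a 0-1 basis matrix."""
    x = [float(bit) for row in basis_matrix for bit in row]  # flatten the matrix
    for weights, biases in model[:-1]:
        x = dropout(relu(dense(x, weights, biases)), rate, rng, training)
    weights, biases = model[-1]
    return dense(x, weights, biases)[0]


rng = random.Random(0)
model = build_model(input_size=9, hidden_sizes=(16, 8), rng=rng)  # hypothetical widths
estimate = predict(model, [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

With untrained weights the output is meaningless as an epistasis value; the sketch only illustrates where the initializer and the dropout rate enter the model.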

A Genetic Algorithm with Our Surrogate Model
In this section, we present a GA to find a good basis with our surrogate model. As shown in Figure 2, we replaced the part of calculating epistasis in our GA with the proposed surrogate model. It is known that any basis is representable as an invertible matrix [10]. Every invertible matrix can be expressed as a product of elementary matrices. In our GA, we used a variable-length string for the encoding of the elementary matrix product as a way to represent a basis.
For selection, we applied a tournament operator three times in order to get three solutions. The best among the three became a parent. As many parents as the population size were gathered, the parents were randomly paired, and a crossover operator was applied to each pair. Within each pair, the shorter string was stretched to match the length of the other string: "-" symbols were inserted into it at the positions that minimize the Hamming distance between the two strings. Two offspring were generated by applying a uniform crossover to the aligned strings and then removing the "-" symbols. The mutation operator applied one of insertion, deletion, and replacement. The probability of applying the mutation operator was 0.05 for each gene and 0.2 for each individual. Afterwards, fitness was evaluated by applying the proposed surrogate model. Finally, the parent population was replaced by the offspring population.
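The stretch-and-cross step described above can be sketched as follows. The sketch assumes that the shorter parent is the one that receives the "-" symbols and that gap positions are chosen by a small dynamic program minimizing the number of mismatching positions; the function names `stretch` and `aligned_uniform_crossover` are hypothetical, and the authors' actual implementation may differ.

```python
import random

GAP = "-"


def stretch(short, long):
    """Insert GAP symbols into `short` so it matches len(long) with minimal Hamming distance."""
    m, n = len(short), len(long)
    inf = float("inf")
    # cost[i][j]: fewest symbol mismatches when short[:i] is placed within long[:j].
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        cost[0][j] = 0
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            place = cost[i - 1][j - 1] + (short[i - 1] != long[j - 1])
            skip = cost[i][j - 1]  # put a gap at position j (inf if no room is left)
            cost[i][j] = min(place, skip)
    # Trace back from the end to recover where the gaps go.
    aligned, i = [], m
    for j in range(n, 0, -1):
        if i > 0 and cost[i][j] == cost[i - 1][j - 1] + (short[i - 1] != long[j - 1]):
            aligned.append(short[i - 1])
            i -= 1
        else:
            aligned.append(GAP)
    aligned.reverse()
    return aligned


def aligned_uniform_crossover(p1, p2, rng):
    """Align the parents, swap each column with probability 0.5, then drop the gaps."""
    if len(p1) <= len(p2):
        a, b = stretch(p1, p2), list(p2)
    else:
        a, b = list(p1), stretch(p2, p1)
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return [g for g in child1 if g != GAP], [g for g in child2 if g != GAP]


offspring = aligned_uniform_crossover(list("ac"), list("abc"), random.Random(1))
```

Aligning before crossing keeps matching genes in the same column, so a uniform crossover exchanges corresponding genes rather than arbitrary ones, and the offspring may differ in length from both parents.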

Results of Experiments and Discussion
We introduce our computing environments and dataset in Section 5.1. In Section 5.2, we present the experimental results when we applied the proposed method to the Variant-OneMax problem and the NK-landscape problem. In Section 5.3, we describe how well the proposed method estimates epistasis.

Test Environments and Dataset
Our experiments were conducted in the computing environments presented in Table 3. Further, GPU computation was used for tensor calculation to reduce the training time of the DNN model. The amount of data was reduced after removing the duplicated data from Table 2. If the training process is conducted repeatedly on a small amount of data, it may result in overfitting. To prevent overfitting, additional data were generated in a manner similar to the one used by Lee and Kim [10]. The amount of data generated for each problem was 100 times the amount of data before the duplication removal; the generated data were then used for the DNN model training.

Results
We conducted experiments on the Variant-OneMax and NK-landscape problems. As shown in Figure 3, the experiments were split into two parts. The first part involved searching for another basis for each problem through a GA, whereas the second part consisted of finding the optimal solution for each problem through another GA after changing to the basis that had been found. In this study, we used the DNN model trained on the bases and the epistasis values that were used in the experiments by Lee and Kim [10].
On the Variant-OneMax problem, experiments were conducted for N = 20 and N = 30. These experiments were conducted under the same conditions as in [41], and the same result was obtained. In both cases, the basis was searched for using a GA. Then, after changing to the basis obtained by the GA, the optimal solution was sought by independently searching for solutions one hundred times on each of the problem instances. For finding the best solution to the problem, our GA used one-point crossover and bit-flip mutation with probability 0.05. The remaining parts, including selection and replacement, were the same as in the GA for finding a good basis in Section 4.2. This process was repeated 10 times to obtain the average values.
Table 4 shows how well the optimal solution could be found when the estimation method using the DNN was applied to the Variant-OneMax problem. In Figures 4 and 5, "Original" corresponds to the case where the optimal solution is sought without applying the basis change. "Epistasis" corresponds to the case of finding the optimal solution using the epistasis-based basis evaluation method proposed by Lee and Kim [10]. "DNN" is the result of applying the method proposed in this study. Considering the results in Figure 4, "Epistasis" provided the best results among all the cases. "DNN" provided better results than "Original" but performed slightly worse than "Epistasis." However, in Table 4, "DNN" showed results almost identical to those of "Epistasis."
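The two variation operators of the solution-search GA mentioned above, one-point crossover and bit-flip mutation with probability 0.05, can be sketched as follows. The function names are hypothetical, and selection and replacement are omitted.

```python
import random


def one_point_crossover(p1, p2, rng):
    """Cut both equal-length parents at one random point and exchange the tails."""
    point = rng.randint(1, len(p1) - 1)  # cut strictly inside the string
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def bit_flip(individual, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


rng = random.Random(0)
zeros, ones = [0] * 20, [1] * 20
child_a, child_b = one_point_crossover(zeros, ones, rng)
mutant = bit_flip(child_a, rng)
```

With an all-zero and an all-one parent, each child is a run of one value followed by a run of the other, which makes the cut point easy to see when experimenting with the operators.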
Considering the computation time results in Figure 5, "Epistasis" required 40 to 60 times more computation time than "DNN."
Similar to the Variant-OneMax problem, on the NK-landscape problem the basis was searched for using a GA, and the basis was then changed to the one obtained. The independent GA search, conducted 100 times, was then repeated ten times. The results of the experiments are summarized in Table 5.
In Figures 6-8, "Original," "Epistasis," and "DNN" illustrate the same trends as obtained from the Variant-OneMax problem. Figure 6 compares the entire population averages. The best population quality was obtained when "Epistasis" was used. "DNN" had better results than "Original" but showed slightly worse results than "Epistasis." However, the comparison results for the optimal solutions were more important, because the best solution of the population was taken as the optimal solution. In Figure 7, "DNN" generally found better optimal solutions than "Original," and found almost identical or slightly worse solutions when compared with "Epistasis." In Figure 8, when we compared the computing time, "Epistasis" required 46 to 87 times more time than "DNN."
In this study, we confirmed only how well the best solution of each problem was found, and we did not perform a statistical analysis. Derrac et al. [45] explained how to use statistical tests to compare evolutionary algorithms. In future work, it would be valuable to check the significance of our improvement through a statistical analysis.

Epistasis Estimation Based on Basis
This section examines the effectiveness of the epistasis estimation method based on the DNN. Figures 9-11 compare the epistasis estimated with the DNN against the epistasis calculated from sampled solutions during the basis search using a GA. In Figure 9, the estimation results using the DNN model appear similar to the calculated epistasis for the Variant-OneMax problem. However, Figures 10 and 11 show that, for the NK-landscape problem, the difference between the DNN model estimation and the calculated epistasis increases as the problem complexity grows. This tendency suggests the following explanation for why the population quality obtained by "DNN" was lower than that obtained by "Epistasis" for the cases of (N = 30, K = 10) and (N = 30, K = 20). As the Variant-OneMax problem is relatively simple, estimating the epistasis from the basis did not present any significant difficulty; on the NK-landscape problem, however, the estimation by the DNN became more difficult as the problem complexity increased.

Conclusions
Lee and Kim [10] introduced an epistasis-based basis evaluation method; in this study, we instead propose a surrogate model-based epistasis estimation method, which uses deep neural networks (DNNs) for epistasis estimation. The proposed method was applied to the two types of problems discussed in Section 2.6, and experiments were conducted. The results were compared with those obtained using the epistasis-based basis evaluation method. Regarding the optimal solution search, estimating the epistasis of the basis using the DNN model was nearly always better than using no basis change. In addition, the method using the DNN showed similar or slightly worse results than the epistasis-based basis evaluation. In particular, on the NK-landscape problem, the method using the DNN provided optimal solutions that were quite similar to those obtained using the epistasis-based basis evaluation. Moreover, using the DNN, the time required for the epistasis estimation was up to 87 times shorter than that of the conventional sampling-based calculation. Although our method spent time constructing a surrogate model using a DNN, this time was only for pre-processing, and it was a low overhead that was much less than the time taken by the previous method [10], which repeatedly calculated epistasis. Therefore, by applying our method, a successful estimation of the epistasis was achieved, along with an effective reduction in the computation time (see Table 2 and Figures 5 and 8).
The contributions of our study can be described as follows. Kim and Yoon [6] presented the effect of basis change. Lee and Kim [9] obtained a good basis by using a meta-GA, and also obtained better results on the NK-landscape problem using this basis. However, the meta-GA is time intensive and is therefore very impractical. To address this issue, they proposed an epistasis-based basis evaluation method [10] that showed better efficiency than the meta-GA method; however, their method was still computationally intensive. To this end, we proposed a surrogate model method of estimating the epistasis by using a DNN model. The proposed method showed similar qualities as the epistasis-based basis evaluation method while significantly reducing the computation time. Thus, our epistasis estimation using the DNN was more efficient than the previous methods; furthermore, our method was found to be more practical.
However, the DNN model had difficulty estimating the epistasis as the problem complexity increased. Two possible reasons are that the DNN was trained on estimated epistasis values instead of the actual calculated values, which resulted in a lower estimation accuracy, and that each problem may require different optimal hyperparameters for the DNN. Thus, in the future, we intend to improve the results of the present study by improving the method of searching for better solutions or by developing separate DNN models for each problem.