Solution of Mixed-Integer Optimization Problems in Bioinformatics with Differential Evolution Method

The solution of the so-called mixed-integer optimization problem is an important challenge for modern life sciences. A wide range of methods has been developed for its solution, including metaheuristics approaches. Here, a modification is proposed of the differential evolution entirely parallel (DEEP) method introduced recently that was successfully applied to mixed-integer optimization problems. The triangulation recombination rule was implemented and the recombination coefficients were included in the evolution process in order to increase the robustness of the optimization. The deduplication step included in the procedure ensures the uniqueness of individual integer-valued parameters in the solution vectors. The developed algorithms were implemented in the DEEP software package and applied to three bioinformatic problems. The application of the method to the optimization of predictors set in the genomic selection model in wheat resulted in dimensionality reduction such that the phenotype can be predicted with acceptable accuracy using a selected subset of SNP markers. The method was also successfully used to optimize the training set of samples for such a genomic selection model. According to the obtained results, the developed algorithm was capable of constructing a non-linear phenomenological regression model of gene expression in developing a Drosophila eye with almost the same average accuracy but significantly less standard deviation than the linear models obtained earlier.


Introduction
Modern life sciences, such as bioinformatics, heavily rely on numerical methods and computer algorithms. Optimization problems in which some unknown parameters of mathematical models are to be determined from data arise in many practical and theoretical studies, such as the identification of genes, proteins and regulatory networks, association between genotype and phenotype, prediction of agronomically important traits, and others. Unknown parameters often represent not only coefficients of floating point type, but also indices of some kind and thus are integer numbers. Consequently, the solution of so-called mixed-integer optimization problems is an important challenge for modern bioinformatics.
A wide range of methods has been developed for solution of such problems. Particularly, stochastic optimizers can be employed to obtain a global minimum of objective function given diverse biological data of a big volume. Different types of simulated annealing (SA) and genetic algorithms (GA) were successfully applied to biological problems [1]. Modern evolutionary algorithms, such as evolution strategies (ESs) or differential evolution (DE) [2], can outperform other methods in the estimation of parameters of several biological models [3][4][5]. In a recent study [6], it was shown that DE is robust against structural bias that may cause the optimization procedure to favor certain regions of the search space over others. The development of new methods for solution of mixed-integer optimization problem continues. In [7], mixed-integer evolution strategies (MIES) were presented, which are natural extensions of ES for mixed-integer optimization problems. A two-timescale duplex neurodynamic approach to mixed-integer optimization, based on a biconvex optimization problem reformulation with additional bilinear equality or inequality constraints was proposed in [8]. A genetic algorithm for model-based mixedinteger optimization (GAMBIT) was developed and extended with a parameterless scheme in [9]. Several approaches to mixed-integer optimization can be found within a commercial general algebraic modeling system (GAMS) framework [10]. Other alternatives include the covariance matrix adaptation evolutionary strategy CMA-ES [11], mixed-integer sequential quadratic programming (MISQP) [12,13] and a Bayesian network-based approach with the mixed-integer Bayesian optimization algorithm (MIBOA) by [14]. A method for the solution of very large mixed-integer dynamic optimization (MIDO) based on an parallel cooperative scatter search metaheuristic with new mechanisms of self-adaptation and specific extensions to handle large mixed-integer problems was introduced in [15].
The global optimization methods that may depend on problem-specific assumptions introduce difficulties in adaptation to other types of problems together with challenges in efficient implementation suitable in different areas. The suboptimal algorithm of temperature reduction, termed the cooling schedule, in SA can increase the computational time and lower the accuracy of the final solution. The correct choice of mutation, recombination and selection operators is crucial for success of GA application. The evolutionary algorithms are highly dependent on the parameters of the evolution model. A wide range of metaheuristic approaches have been developed for parameter estimation in biology. For example, enhanced scatter search, proposed in [16], is computationally expensive but is capable of obtaining integer and real parameters with high accuracy and outperforming the state-of-the-art methods [17].
Differential evolution (DE) [2,18] is a metaheuristic algorithm proposed by Rainer Storn and Kenneth Price. In the standard DE, population is randomly initialized and evolved using mutation and crossover operators to produce trial offspring together with a selection operator that accepts or rejects those trials into the population, comparing their values of the objective function.
The standard DE works with floating point parameters; therefore, DE algorithms for discrete parameters [19] were proposed. A modified binary DE (MBDE) algorithm and logical mutation operator was proposed in [20] based on the structure of binary bit strings. In a binary-adapted DE (BADE), developed in [21], the scaling factor is considered the probability that the scaled difference bit becomes "1". In a discrete DE algorithm, proposed in [22], the real value obtained from the DE operators is converted to binary values using the sigmoid constraint function. The works [23,24] used the adaptive DE algorithm for feature selection, where parameters in DE are automatically adapted depending on the task. A hybrid approach, proposed in [25], combined strong exploratory properties of differential evolution with a modified artificial bee colony observation process for trait selection. In [26] the authors developed differential evolution with a binary mutation scheme for feature selection. A novel adapted mixed-integer differential evolution algorithm was designed in [27]. In [28], an improved binary difference evolution algorithm was developed for the feature selection problem for general classification problems in data mining.
In this work, we propose a modification of the differential evolution entirely parallel (DEEP) method introduced recently [29,30]. DEEP was successfully applied in mixedinteger optimization problems, such as the identification of the biogeographic origins of highly mixed individuals implemented in the online ancestry prediction tool reAdmix [31], regression and dynamical models of flowering time in chickpea [32][33][34] and mungbean [35]. The DEEP method operates on real-valued parameters that can be converted to integers, using either rounding operation or array index transformation in which the resulting integers are the indices in the sorted array of real parameters. The rounding is better suited for many practical tasks, as a drawback of another transformation is that a relatively small change in real values can lead to a big perturbation of the solution. DEEP implements a parameter adaptation scheme based on preserving the diversity of solutions [36] that was developed for classic DE strategies. The local search capabilities of DE can be enhanced by including additional differences in the mutation. Consequently, in order to increase the robustness of the procedure, the triangulation recombination rule was implemented based on [37]. To control the diversity of the integer-valued parameters in intermediate and final solution vectors, the deduplication step was included based on [38]. To implement a new adaptation scheme, the control parameters of the resulting algorithm are made separate for each vector in the population and included in the evolution process.
Thus the contribution of the paper is three fold: (i) DEEP was enhanced with the triangulation recombination rule and deduplication step; (ii) the control parameters were included in the evolution process for self-adaptation; (iii) the resulting method implemented in the open-source software [39] was applied to three bioinformatic problems. The first two of them, namely the optimization of the set predictors and the compilation of the training sample, are important to reduce the dimensionality of the genomic selection model. The third example illustrates the construction of the regression model of gene expression in analytic form.
We compared the results for the genomic selection model with those for the DEEP method without the proposed enhancements and the genetic algorithm and the results for the third example were compared with a state-of-the-art package gramEvol specifically designed for building models in analytic form using the genetic algorithm for optimization [40].
The next sections of the paper give detailed description of the developed methods, test problems and results. The papers on modifications of the DE are surveyed in Section 2. The formal problem is stated in Section 3.1, and the differential evolution entirely parallel method and proposed enhancements are presented in Sections 3.2-3.4. The bioinformatic problems considered in this research are stated in Sections 3.5-3.8. The results are presented in Section 4 followed by their discussion in Section 5, and Section 6 provides the conclusions.

Related Work
Numerous variants of DE have been developed to improve the reliability, scalability, accuracy, convergence speed and other aspects of the method. Parameter adaptation methods were introduced that resulted in self-adaptive DE variants, such as SaDE [41] and ensemble-based SAEDE [42]. The mutation rule was improved by using the best member from the selected fraction of individuals and an external archive containing suboptimal individuals JADE [43]. Success history-based parameter adaptation SHADE was introduced in [44]. The classic mutation operator DE/rand/1 was enhanced by attaching a benchmark factor to the basis vector in IMMSADE [45]. A crossover probability sorting mechanism and dynamic population reduction strategy was proposed in EJADE [46]. Scale factor and crossover probability obeyed the Cauchy distribution and uniform distribution, respectively, and were adaptively adjusted, according to their respective weighted Lehmer means in ACDE [47]. Several ways for handling constraints imposed for parameters were introduced and investigated [48].
Different variants of improved parameter control were introduced, such as a multiple exponential recombination [49] and encoding of control parameters in individuals [50]. Improved individual-based parameter setting and selection strategy was developed in IDEI [51]. A DE variant with novel control parameter adaptation that includes a grouping strategy for adjusting the scaling factor and crossover probability and a parabolic reduction rule for changing NP was suggested in PaDE [52]. Mechanisms of population size adaption were also developed. In [53], the algorithm was proposed that evaluated the current population diversity based on the European distance and adjusted the NP size according to the evaluation results. The size of the population was also dynamically adjusted in APTS [54]. The population was divided into subpopulations in mDE-bES [55]. The population was clustered in multiple tribes in sTDE-dR [56]. Improved mutation and crossover strategies were proposed, such as a novel framework based on the proximity characteristics among the individual solutions in Pro-DE [57], a variant of the classical DE/current-to-best/1 scheme that uses the best of a selected group individual in MDE-pBX [58], a collective information-powered DE proposed in CIPDE [59], an improved DE algorithm with dual mutation strategies collaboration termed DMCDE [60], DE with a fitness and diversity ranking-based mutation operator called FDDE [61], ReDE [62] for which a revised mutation strategy for DE was introduced, RNDE [63] which used a new mutation operator, and DE/neighbor/1 which balanced the exploration and exploitation ability of the DE process. Other examples include DE with eigenvector-based crossover [64], DE with multiple exponential crossover [65], a switched parameter DE [66], a classification-based mutation strategy DE [67], and an orthogonal crossover (OX) operator DE [68].
Various offspring generation strategies were combined to produce an algorithm with enhanced properties, such as DE with composite strategies and parameters CoDE [65], DE that selects mutation strategies and control parameters from the strategy and parameter pools, respectively, to produce successful offspring population EPSDE [69], the framework with two mechanisms, namely, the multiple variants adaptive selecting mechanism and the multiple variants adaptive solutions preserving mechanisms [70], and improved ensemble of differential evolution variants (IEDEV) [71]. A method with multiple variant coordination was proposed in [72], the archiving of promising directions was suggested [73], and IMO-CADE was introduced in [74], where mutation operators were selected adaptively.

Problem Statement
Let us consider mixed-integer optimization problem (1) that combines unknowns that take real and integer values.
.., K} is the vector of parameters of the model of the length K that is to be found to give the minimal value to the objective function Q and satisfy the constraints with upper and lower limits c U and c L , respectively. Q computes the deviation of the model with parameters x from experimental data u. The subset I holds the indices of parameters that are integer numbers.

Differential Evolution Entirely Parallel Method
Differential evolution (DE) was proposed in [18] as an effective stochastic method for function minimization. DE operates on a set called population of parameter vectors called individuals that are initially generated randomly. The size of population NP is constant, and on each iteration, the set of parameter vectors may be referred to as generation. The differential evolution entirely parallel (DEEP) method [29,30,92] combines enhancements proposed in the literature as well as some specifically developed modifications. These enhancements include the two-step scheme for adaptation of algorithmic parameters based on the control of the population diversity developed in [36], an optional scatter search step [16] that is performed each Ξ iteration, where Ξ is a control parameter, and a selection rule [93] for consideration of several different objective functions. To avoid premature convergence, the predefined number of individuals Ψ that were unable to update the parameters after the predefined number of iterations Θ are substituted with the same number of ones with the lowest values of the objective function. To satisfy the constraints in the form of inequalities imposed for the model parameters, the infeasible solutions that might be generated by the method are substituted with the values uniformly sampled from the constrained interval.
DE operates on floating point parameters. The conversion from real to integer values can be performed using one of the two approaches implemented in the DEEP method. The first option employs the rounding operation that substitutes a real value with the nearest integer number. The second approach involves the following operations: • The values are sorted in ascending order. • The index of the parameter in the floating point array becomes the value of the parameter in the integer array.
The rounding method is more convenient for the solution of practical problems because in the second procedure, a relatively small change in the real value of the parameter can lead to a relatively important perturbation of the solution. The local search capabilities of classic DE strategies can be improved by including additional differences in the mutation formula. This was done using the triangulation recombination rule based on [37].
As the parameter adaptation scheme based on preserving the population diversity [36] was developed for classic DE strategies, another approach is introduced in this work. To avoid equally valued integer parameters in the trial and final solution vectors, the deduplication procedure is performed on each iteration based on [38]. The control parameters of the triangulation recombination rule are made different for each individual vector in the population. They are included in the evolution process and updated on each iteration with the same formulae.
The operation of the method, including the two enhancements proposed in this work, is described in the Algorithm 1 insertion. The input to the algorithm includes the following: • The specification of the problem to be solved: -The objective function; -The data with which the solution is to be compared; -Constraints for the parameters; • Control parameters of the algorithm.
The execution starts with the initialization step that randomly assigns the values satisfying the given constraints c L k < x k < c U k , k = 1, ..., K to a set of NP parameter vectors X 0 = v 0 i |i = 1..., NP for starting generation G = 0. The main loop of the algorithm consists of recombination, evaluation and selection steps. Calculations are terminated when the stopping criterion is satisfied and can be defined by the maximal allowed number of generations G max or the number of consecutive iterations during which the objective function variation is less than a predefined small value. Objective function Q for several vectors is calculated in parallel.
The number of individuals in population NP, the frequencies Θ and Ξ, the number Ψ of substituted individuals and stopping criterion parameter G max are the main control parameters of the method that are to be set by the user.
The output of the algorithm contains the values of unknowns that provides the minimal value of the objective function.  6) Update scaling coefficients and weights (7), (8)

Triangulation Recombination Rule
In order to increase the robustness of the procedure, the triangulation recombination rule was implemented based on [37]. The trial vector v G+1 where b, t, and r are random indices sampled from the uniform distribution so that x G b , x G t , and x G r are the best, better and the worst vectors with respect to objective function Q; w G j,i and p G j,i are the normalized and nonnormalized weights, respectively; and F G j,i are the scaling coefficients.
To adaptively change the weights and scaling coefficients, they are evolved using the same formulae (4)-(6): p G+1 where F G+1 :,i and p G+1 :,i are trial vectors of scaling coefficients and nonnormalized weights, respectively, and F G and p G are the sets of all such vectors for the current generation G.

Deduplication
To avoid equal integer-valued parameters in intermediate and final solution vectors, the deduplication step for each offspring was included based on [38]. The algorithm performs the following operations for each trial vector v G+1 The execution of the algorithm is illustrated in Table 1  The time need to complete the deduplication procedure depends on the number of parameters K and, with the fast sorting algorithm, is estimated as O(2K(log(2K)).

Genomic Selection
More recently, it has become clear that improved crop production can only be achieved if the potential of varieties can be assessed at different locations and seasons and in combination with a range of management options that cannot be tested in vivo [94][95][96]. Thus, world-class breeding programs have reformed their yield improvement strategy by integrating traditional yield estimates with gene-phenotype-system modeling tools [97][98][99][100]. This integration allows in silico identification of genotypes associated with phenotypes that are likely to improve crop production [101][102][103]. Such approaches led to a 3-4% increase in yields per year under the Australian sorghum program and ≈1% growth under the U.S. Northern Production Zone corn program. These techniques have not only proven to be effective, but are rapidly improving [100].
Breeding value assessment models suffer from the so-called curse of dimensionality-a scenario in which the number of possible SNP markers is significantly larger compared to the available observations [104].
The standard approach to calculating genomic breeding value is to use the best linear unbiased predictor (BLUP), using a linear model and ridge regression [105]. Bayesian methods have also found widespread use [104]. In this work, an implementation provided in rrBLUP package [106] for R software system [107] is utilized.
Here, we apply the developed algorithm to two independent problems with two different datasets.

Optimization of the Set of Predictors
Ridge regression helps to overcome the dimensionality problem, but the set of predictors can be optimized to obtain the same accuracy with less SNPs used [28]. Thus the optimization problem is formulated to select the subset of predictors that maximize the correlation between the estimated and measured phenotype. To reduce overfitting, 4-fold cross-validation is utilized, and the objective function equals the mean Pearson's correlation coefficient for the model built by rrBLUP [106].
The algorithm was applied to a wheat dataset [108] phenotyped for baking quality traits that was split into training and test samples of 1516 and 300 elements, respectively, and genotyped for 5021 SNP markers.

Compilation of Training Sample
An important problem in the wide adoption of genomic selection is that the accuracy of the prediction of the phenotype for new genotypes is hard to estimate beforehand. It was shown that it is possible to achieve a substantial gain in accuracy by careful compilation of the sample on which the genomic selection model is trained. As a result, EthAcc (estimated theoretical accuracy) was proposed in [109] as an objective function for the training sample optimization. EthAcc is based on causal quantitative trait loci assessed from a genome-wide association study. EthAcc does not evaluate the training set but the genomic prediction accuracy based on the training set. The choice of the best training set from the historical panel of genotyped and phenotyped samples is a key task in the evaluation of newly created breeding resources.
The developed algorithm was applied to a wheat dataset provided in [109] containing information on 1368 SNPs and 237 samples that were phenotyped for the heading date. The algorithm and data for EthAcc computation were taken from the original paper [109].

Regression Model of Gene Expression
Gene expression controls all biological processes and gene regulation can be modeled in several ways: differential equations for dynamical models and linear or non-linear regression in statistical models [110]. The fruit fly Drosophila is an ideal model organism for different biological studies. In [111], the robustness in development was investigated using quantitative spatial gene expression in the morphogenetic furrow (MF), a wave of differentiation that patterns the developing eye disc. The MF passes from the posterior to the anterior of the disc over a period of two days (90 min per adjacent row of cells). The method of hybridization chain reaction (HCR) was used to quantitatively measure spatial gene expression, co-staining for several genes simultaneously. Three dimensional images of mounted, HCR stained eye discs were acquired on a Zeiss LSM 780 laser scanning microscope. Gene transcripts were detected within image stacks after processing.
The gene atonal plays a central role in establishing the correct eye structure. It is known that its expression is under the control of hedgehog and Delta genes. In [111], a regulation was modeled by linear regression function. A more general approach called grammatical evolution (GE) [112] allows the user to determine the analytic form of functional dependence by minimizing the deviation of model solution from data. In GE, as in a genetic programming method, the function is constructed according to the rules of the given context-free grammar from "words" that are integer-valued vectors of parameters.
The data represent a quantitative description of gene expression, presented in the form of bursts of intensities obtained by the processing of images of the eye discs of fruit fly Drosophila larvae. The information is provided for 85 discs in the form of the table for each processed sample with one row for each cell in a disc, approximately 1000 cells per disc. The rows contain values of measured gene expression.

Experimental Setup
The described algorithm was implemented in C programming language in the framework of the open-source software DEEP [39]. For each case study, the experimental setup contained the formulation of the objective function Q to be minimized, the upper c U k and lower c L k limits for unknown model parameters and selected control parameters for the algorithm.
The control parameters G max = 300, NP = 200, Ξ = 5, Ψ = 2 and Θ = 24 were chosen to be the same as those used in the successful previous applications of the method [32,35]. The objective function Q used in the evaluation step in Algorithm 1 measures the deviation of the model solution from provided experimental data. In practice, any executable program written in any programming language can be utilized; two R-scripts and a program written in C++ were used in the case studies described in Sections 3.6-3.8, respectively.
The specification of the problem is stored in the INI-file that is loaded by the DEEP software.

Optimization of the Set of Predictors
The developed algorithm was used to optimize the set of SNPs that serve as predictors in a linear regression model of genomic selection. The integer-valued parameters represent the indices of SNPs that are included in the model that was built using the rrBLUP package for R statistical software. The Pearson's correlation coefficient between predicted and measured phenotypes was used as an objective function. Several runs with different seeds for a random number generator were performed that took, on average, 28.7 hours on a cluster node equipped with 2xIntel Xeon CPU E5-2697 v3, 2.60GHz.
Each resulting model reduced the number of needed SNPs from 5021 to 1800-3800, with the correlation coefficient ranging from 0.87 to 0.88 and from 0.74 to 0.76 for training and test data, respectively. The accuracy is not dependent on the number of predictors in these results (see Figure 1). The average number of selected SNPs was 2939, and the average correlation coefficient is 0.75 for the test data and 0.875 for the training data. The smallest model contains 1456 SNPs, and the correlation coefficient equals 0.74 for the test set.
The results obtained with the developed method were compared with those for the base DEEP method without proposed enhancements. The new method outperforms the base one in terms of the smallest model, as the base method needs 2400 SNPs instead of 1456 with almost the same correlation coefficient, equal to 0.75, for the test set. Different combination of SNPs were obtained in different runs, and each SNP was selected in aa different number of combinations (see Figure 2). The average frequency of the SNP selection was 65%, and 264 SNPs were selected in more than 80% combinations. Figure 2. The number of occurrences for each SNP in obtained solutions. The colors are made random for better visual appearance.

Compilation of Training Sample
The DEEP method with proposed modifications was applied to a problem of compilation of the set of samples from the given dataset to be used for training of regression model for genomic selection using EthAcc as an objective function. The implementation of the EthAcc computation algorithm and additional data were taken from the original paper [109]. The integer-valued parameters of the model represented the indices of the observations to be included in training. Several runs were performed with different initial values for the random number generator, with the mean runtime of 42 minutes on a cluster node equipped with 2xIntel Xeon CPU E5-2697 v3, 2.60 GHz.
The value of the objective function was reduced almost twice for a typical run according to the convergence curve (see Figure 3). The value of the objective function EthAcc was significantly positively correlated (r = 0.92, p-value= 9.20 × 10 −13 ) with the accuracy of the final model expressed as the correlation between the predicted and measured phenotypes (see Figure 4), as was expected in [109].  The results obtained with the developed method were compared with those for the base DEEP method without proposed enhancements and the state-of-the-art genetic algorithm implemented in package GA [113] for R statistical software. According to the Mann-Whitney criterion, the comparison was statistically significant (p-value < 0.05); consequently, the developed method outperformed the competitors in all cases (see Table 2). Table 2. Comparison of results for the problem of compilation of training sample for the method developed in this work (new), for DEEP without enhancements with DE/rand/1/bin, DE/rand/1/bin, DE/best/1/bin and DE/best/1/exp strategies and for GA with real-valued (GA/rvd) and binary encoding (GA/bin), also with a additional local search step (GA/rvd/loc and GA/bin/loc).

Regression Model of Gene Expression
The developed algorithm was used to optimize the analytic form of the phenomenological function that describes how the expression of the atonal gene depends on the expression of hedgehog and Delta genes in cells of morphogenetic furrow of developing eye disc. The data on gene expression were split into k = 10 folds for cross validation. The model was trained on k − 1 parts of the sample and tested on a remaining part.
Parameters of the mixed-integer optimization problem include integer indices interpreted according to the grammatical evolution rules and real-valued coefficients. The best solution (9) with respect to the root mean squared (RMS) difference between predicted and measured gene expression suggests the non-linear dependence between gene expression levels (see Figure 6).
adj is an adjusted coefficient of determination, HH, DL and ATO are expressions of genes hedgehog, Delta and atonal, respectively. The average accuracy of the obtained non-linear solutions (R 2 adj = 0.25) is slightly better than that of linear regressions (R 2 adj = 0.24) obtained in [111], but the deviation of accuracy is statistically significantly smaller (σ non−lin = 0.03 < 0.10 = σ lin ) according to the F-test (p-value= 8.4 × 10 −11 ). We compared the obtained results with a state-of-the-art method for grammatical evolution implemented in the gramEvol [40] package for R statistical software [107]. We used default recommended settings 40, 200 and 300 for chromosome size, population size and iterations, respectively. The default mutation probability was set to 10/(1 + chromosomesize) ≈ 0.244. The mean adjusted coefficient of determination for the developed method 0.25 was statistically significantly better than that of the competitor, 0.21, according to the Mann-Whitney criterion (U = 806.0, p = 1.47 × 10 −7 ).

Discussion
Feature selection methods are often subdivided in two main groups [114,115]: filtering and wrapper methods. The filtering methods select features based on information content or statistics and do not depend on any learning algorithm. Thus, the filtering techniques are generally less computationally expensive, and are fast, and suitable for large datasets. Moreover, they are easily applicable to various learning algorithms. For ranking features, filtering methods are commonly used, such as Fisher's score, T statistic, entropy-based feature ranking, Bhattacharya distance, etc. [116,117]. However, the filtering methods ignore the interaction of features and the performance of the classification algorithm with the selected features, while the wrapper methods evaluate subsets of features based on the classification efficiency, which is highly dependent on the specific algorithm used. It is known that wrapper methods, such as, for example, SVM (SVM-R), cooperative index (CI), K-top scoring pair (k-TSP), microarray significance analysis (SAM), positive synergy index (PSI) and greedy forward selection (GFS) [118,119] are more accurate than filtering methods, but require more computation and are more likely to lead to overfitting. In [28], an improved binary differential evolution (BDE) algorithm was developed for the feature selection problem for general classification problems in data mining.
The differential evolution entirely parallel (DEEP) method introduced recently [29,30] is capable of finding integer-valued parameters that can represent indices of selected features. In this work, the method was enhanced by the deduplication step based on [38] to avoid equal integer-valued parameters in the intermediate and final solution vectors and by the triangulation recombination rule based on [37] to increase the robustness of the procedure.
The implementation of DEEP [39] was applied to mixed-integer problems, such as the optimization of the predictors set and training set for genomic selection and the gene regulation regression model in symbolic form.
The accuracy of regression models in genomic selection is suppressed by the fact that the number of SNP markers is usually much larger than the number of observations-the so-called curse of dimensionality [104]. Consequently, the dimensionality reduction and training set optimization are important challenges.
The ability of the developed algorithm to optimize the set of SNP predictors for genomic selection was evaluated on the wheat dataset presented in [108]. The developed algorithm successfully constructed a model with fewer predictors and acceptable accuracy, thereby reducing the dimensionality of the problem. The average number of selected SNPs was 2939, and the average correlation coefficient was 0.75 for the test data and 0.875 for the training data. The small set of 9 markers-IWA1008, IWA1273, IWA1637, IWA1870, IWA1929, IWA4784, IWA837, IWA1360, and IWA4243-was selected in more than 87% of solutions, which suggests their high impact on the modeled phenotype.
The developed algorithm was successfully used to determine the optimal training set for genomic selection for the heading date in wheat, using the maximization of EthAcc (estimated theoretical accuracy) as an objective function [109]. The mean number of samples in the resulting training set was 70 out of 237 records with the mean accuracy of 0.75; 28 samples were included in 70% of solutions.
The DEEP method with proposed modifications was used to obtain a non-linear phenomenological model of the expression of atonal gene expression under the control of hedgehog and Delta genes in the developing Drosophila eye, using the grammatical evolution (GE) approach [112]. The quantitative data on gene expression obtained by processing of images of the eye discs were used for model construction [111]. The obtained solutions are highly non-linear, and the adjusted coefficient of determination showed almost the same average accuracy but significantly less deviation than the linear models obtained earlier.
These problems represent important areas of application of mathematical modeling in biology. The prediction of important phenotypic characteristics based on the combination of genetic markers can be used in selection, as in the tested case study, and also in biomedicine for early screening of diseases. The models of the regulation of gene expression help to decipher the mechanism of development. The successful application of the method to these diverse problems supports its applicability in other biological tasks.

Conclusions
The differential evolution entirely parallel method was enhanced by the deduplication step and by the triangulation recombination rule for solving mixed-integer optimization problems.
The application of the method resulted in dimensionality reduction in the genomic selection model in wheat such that the phenotype can be predicted with acceptable accuracy, using a selected subset of SNP markers. The method was also successfully used to optimize the training set of samples for such a genomic selection model.
According to the obtained results, the developed algorithm is capable of constructing a non-linear phenomenological model of gene expression in the developing Drosophila eye with almost the same average accuracy but significantly less deviation than the linear models obtained earlier.

Data Availability Statement:
The data analyzed in this study are available in Zenodo at 10.5281/zenodo.5792935.