Information 2017, 8(4), 115; doi:10.3390/info8040115
Article
Predicting DNA Motifs by Using Multi-Objective Hybrid Adaptive Biogeography-Based Optimization
^{1}
College of Information Science & Technology, Hainan University, No. 58 Renmin Avenue, Hai’kou 570228, China
^{2}
State Key Laboratory of Marine Resource Utilization in the South China Sea, Hainan University, No. 58 Renmin Avenue, Hai’kou 570228, China
*
Correspondence: Tel.: +8689866187056
Received: 7 August 2017 / Accepted: 18 September 2017 / Published: 21 September 2017
Abstract
The computational discovery of DNA motifs is one of the most important problems in molecular biology and computational biology, and it has not yet been resolved in an efficient manner. In previous research, we solved the single-objective motif discovery problem (MDP) with biogeography-based optimization (BBO) and obtained excellent results. In this study, we apply a multi-objective biogeography-based optimization algorithm to the multi-objective motif discovery problem, which refers to the discovery of novel transcription factor binding sites in DNA sequences. For this, we propose an improved multi-objective hybridization of adaptive biogeography-based optimization with differential evolution (DE), namely MHABBO, to predict motifs from DNA sequences. In the MHABBO algorithm, the fitness function is redefined based on the distribution information among the habitat individuals and the Pareto dominance relation. Based on the relationship between the cost of the fitness function and the average cost in each generation, the MHABBO algorithm adaptively changes the migration probability and the mutation probability. Additionally, the mutation procedure is modified by combining it with the DE algorithm, and the migration operator is made dependent on the number of iterations to meet motif discovery requirements. Furthermore, the immigration and emigration rates are modified based on a cosine curve. The algorithm can therefore generate promising candidate solutions. Statistical comparisons with the DEPT and MOGAMOD approaches on three commonly used datasets are provided, which demonstrate the validity and effectiveness of the MHABBO algorithm. Compared with some typical existing approaches, the MHABBO algorithm performs better in terms of the quality of the final solutions.
Keywords:
multi-objective optimization; motif discovery; transcription factor binding site; hybrid adaptive biogeography-based optimization
1. Introduction
The motif discovery problem (MDP) in molecular biology is to find similar regions common to each sequence in a given set of DNA, RNA, or protein sequences [1]. It is an important problem for locating binding sites and finding conserved regions in unaligned sequences. From a computational point of view, finding motifs in many sequences is an NP-hard problem. Many methods, such as statistical and probabilistic methods, have been applied to the MDP and have achieved excellent results. In recent years, as evolutionary algorithms have developed and their advantages have become apparent, they have also gradually been applied to the MDP.
Evolutionary computation (EC) is an optimization method based on the principles of biological evolution and has gained increasing attention in recent years. EC has certain advantages for motif discovery [2]. Evolutionary algorithms (EAs) carry out a global search and have relatively low sensitivity to initial conditions [3]. They are comparatively flexible in how solutions are represented and evaluated, and do not require knowledge about the problem to which they are being applied. EC methods that have been successfully applied to the motif discovery problem include the genetic algorithm (GA) [4], the bacterial foraging optimization algorithm integrating taboo search (TSBFO) [5], the estimation of distribution algorithm with differential evolution (DE/EDA) [6], evolutionary multi-objective optimization (DEPT) [7], the multi-objective artificial bee colony (MOABC) algorithm [8], the multi-objective genetic algorithm (MOGAMOD) [9], the Non-dominated Sorting Genetic Algorithm III (NSGA-III) [10] and the multi-objective evolutionary algorithm based on decomposition (MOEA/D) [11]. The biogeography-based optimization (BBO) algorithm (Simon, 2008) [12] is a nature-inspired computational technique based on the mathematical models of biogeography. As a population-based stochastic algorithm, the BBO algorithm generates the next generation of the population by simulating the migration of biological species. Because information is shared during migration, the BBO algorithm has good exploitation ability. The BBO algorithm performs well on the single-objective motif discovery problem [13,14,15] and has also been modified to solve multi-objective optimization problems (MOPs) [16,17,18,19]. However, in these papers, the BBO algorithm has not yet been applied to the multi-objective motif discovery problem.
In [20], we applied a hybridization of the adaptive biogeography-based optimization algorithm and differential evolution (DE) to multi-objective optimization problems (MOPs), and achieved excellent performance in terms of the convergence and the distribution of solutions.
Building on our previous research, the aim of this paper is to apply the BBO algorithm to the multi-objective motif discovery problem. As far as we know, this is the first time that multi-objective biogeography-based optimization has been applied to the multi-objective MDP. In this paper, a new algorithm named MHABBO is presented and used to solve the multi-objective MDP, and a comparative study with other algorithms on twelve datasets is then presented.
The motivation for proposing MHABBO for MDP in this research is threefold. First, based on the above literature review, there have been several successful applications based on multiobjective biogeographybased optimization (MBBO). Second, we have proposed a new MBBO algorithm and achieved excellent performance on multiobjective benchmark functions [20]. Finally, we will try to apply it to solve multiobjective MDP.
The key contributions of this paper are as follows. Firstly, we propose a new approach called MHABBO based on BBO algorithm to predict motifs. In the MHABBO algorithm, the migration in the BBO is implemented with the number of iterations to avoid the presence of a very stable local minimum. Secondly, motivated by the work described in References [21,22,23,24], the mutation is performed by integrating with DE to produce new feasible solutions. Simultaneously, the parameters of migration probability and mutation probability are adaptively changed. Furthermore, the immigration and emigration rates based on the cosine curve are modified. Finally, we apply MHABBO algorithm to the multiobjective motif discovery problem.
Compared with the DEPT and MOGAMOD approaches on three commonly used datasets, the MHABBO algorithm performs better, or at least comparably, in terms of the quality of the final solutions. Statistical comparisons with some typical existing approaches demonstrate the validity and effectiveness of the MHABBO algorithm. Experimental results show that the obtained Pareto solutions approximate the Pareto-optimal front and have good diversity and a uniform distribution.
2. Motif Discovery Problem
In this paper, we use the same objectives as those used in Reference [9] to find many long and strong motifs. The multi-objective motif discovery problem is converted into the following three-objective optimization problem: maximize similarity, maximize motif length and maximize support. These three objectives for the MDP are defined as follows [7]:
(1) Support: Support indicates how strongly the candidate motifs support the consensus motif, which is built from the candidate motifs. The level of support is measured by the similarity rate of each candidate motif to the consensus motif, that is, the fraction of positions at which the candidate motif and the consensus motif carry the same nucleotide. When the similarity rate is larger than 50%, the subsequence corresponding to the candidate motif counts towards the Support. For example, if the consensus motif is GACCTTTTGCAATCCTGG, the candidate motif of sequence 1, GACCACTTGCAGTCTTAG, has 13 nucleotides identical to the consensus motif, and the consensus motif has 18 nucleotides, so its similarity rate is 13/18 = 72%.
(2) Motif Length: The motif length is the number of nucleotides in the consensus motif. In the example, the motif length is 18. According to the real datasets used in this paper, the value of the motif length is limited to between 5 and 60.
(3) Similarity: The similarity objective function of a motif is defined as the average of the dominance values of all position weight matrix columns and is calculated by Equation (1). The dominance value $dv(i)$ of column $i$ is the frequency of the dominant nucleotide in that column and is calculated by Equation (2), where $f(b,i)$ is the frequency of nucleotide $b$ in column $i$ and $length = l$ is the motif length:
$$Similarity(Motif)=\frac{\sum_{i=1}^{l}dv(i)}{length}$$
$$dv(i)=\max_{b}\{f(b,i)\},\quad i=1,\dots ,l$$
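The support objective can be illustrated with a short Python sketch. It is only a minimal illustration of the definition above, not the authors' MATLAB implementation; the function names `similarity_rate` and `count_support` are hypothetical, and the sketch assumes that candidate and consensus motifs have equal length.

```python
def similarity_rate(candidate, consensus):
    """Fraction of positions where candidate and consensus share a nucleotide."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have the same length")
    matches = sum(a == b for a, b in zip(candidate, consensus))
    return matches / len(consensus)


def count_support(candidates, consensus, threshold=0.5):
    """Number of candidate motifs whose similarity rate exceeds the threshold."""
    return sum(similarity_rate(c, consensus) > threshold for c in candidates)


# The example from the text: 13 of 18 positions agree.
consensus = "GACCTTTTGCAATCCTGG"
candidate = "GACCACTTGCAGTCTTAG"
rate = similarity_rate(candidate, consensus)
print(round(rate, 2))  # 0.72
```

With the 50% threshold stated above, this candidate counts towards the support of the consensus motif.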
To better understand the similarity objective function, an example is used to illustrate it. Firstly, a position weight matrix (PWM) is generated from the motif patterns found by MHABBO in every sequence. Then, the percentage of occurrence of each nucleotide at each motif position is calculated (see Table 1). The highest value of each matrix column is selected. The similarity is obtained by averaging all these dominance values. In this example, the similarity value is computed as 0.81 (81%) by using Equation (1):
(1 + 1 + 0.5 + 0.75 + 1 + 1 + 0.75 + 0.5 + 1 + 0.5 + 0.75 + 1 + 0.75 + 1 + 0.5 + 1 + 0.5 + 1)/18 = 0.81.
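The same calculation can be sketched in Python. Since Table 1 is not reproduced here, the sketch checks the arithmetic with the dominance values listed above and uses a small made-up alignment (`toy_motifs`) to show how the dominance values are obtained from aligned candidate motifs. The function names are hypothetical.

```python
from collections import Counter


def dominance_values(motifs):
    """Per column, the relative frequency of the most common nucleotide."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all candidate motifs must have the same length")
    n = len(motifs)
    return [
        Counter(m[i] for m in motifs).most_common(1)[0][1] / n
        for i in range(length)
    ]


def similarity(motifs):
    """Average dominance value over all columns (Equation (1))."""
    dv = dominance_values(motifs)
    return sum(dv) / len(dv)


# Dominance values quoted in the worked example of the text.
dv_text = [1, 1, 0.5, 0.75, 1, 1, 0.75, 0.5, 1,
           0.5, 0.75, 1, 0.75, 1, 0.5, 1, 0.5, 1]
print(round(sum(dv_text) / len(dv_text), 2))  # 0.81

# Hypothetical alignment of four candidate motifs of length 4.
toy_motifs = ["ACGT", "ACGA", "ACTA", "AGTA"]
print(dominance_values(toy_motifs))  # [1.0, 0.75, 0.5, 0.75]
print(similarity(toy_motifs))        # 0.75
```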
3. MHABBO Algorithm
In this section, we describe the MHABBO algorithm for the motif discovery problem in detail. In the MHABBO algorithm, the migration in the BBO is implemented with the number of iterations, the mutation is performed by integrating with DE to produce new feasible solutions. Simultaneously the parameters about migration probability and mutation probability are adaptively changed. Meanwhile, the immigration rate and emigration rate based on a cosine curve are modified.
First, we describe the representation of the individuals in our algorithms. Because each individual contains the necessary information used to form a possible motif, an individual is represented as the motif length and the starting location ${s}_{i}$ of the potential motif on all the sequences. Representation of an individual is shown as Table 2. This representation is the same as that used in [9].
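As a rough illustration of this representation, the following Python sketch stores an individual as a motif length plus one starting location per sequence and extracts the candidate motifs it encodes. The class and function names are hypothetical, and 0-based start positions are an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Individual:
    motif_length: int
    starts: List[int]  # assumed 0-based start position s_i in each sequence


def candidate_motifs(individual, sequences):
    """Cut the candidate motif out of every sequence."""
    if len(individual.starts) != len(sequences):
        raise ValueError("one start position per sequence is required")
    motifs = []
    for start, seq in zip(individual.starts, sequences):
        end = start + individual.motif_length
        if start < 0 or end > len(seq):
            raise ValueError("motif does not fit inside the sequence")
        motifs.append(seq[start:end])
    return motifs


# Made-up sequences for illustration only.
sequences = ["TTGACCAA", "GGACCTTT", "ACGACCGT"]
ind = Individual(motif_length=4, starts=[2, 1, 2])
print(candidate_motifs(ind, sequences))  # ['GACC', 'GACC', 'GACC']
```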
3.1. Migration Operator for the MDP
The sharing of features between solutions is represented as immigration and emigration between the islands. The immigration rate λ and the emigration rate µ of each solution are used to probabilistically share features between solutions. Motivated by the work in [25], these parameters are modified based on the cosine curve. The immigration rate and emigration rate of each individual are changed respectively by Equations (3) and (4). In which NP is the size of population.
$${\lambda}_{i}=\frac{1}{2}(\mathrm{cos}(\frac{i}{NP}\times \pi \frac{\pi}{2})+1).$$
$${\mu}_{i}=\frac{1}{2}(\mathrm{cos}(\frac{i}{NP}\times \pi \frac{\pi}{2})+1).$$
Motivated by blended migration operator in [26], in our algorithm, the coefficient of solution H_{i} is related to the number of iterations. The modified migration operator is based on the following considerations. First, blend combination operators have been widely used in other optimization algorithms. Second, good solutions will be not degraded due to migration. Besides, poor solutions can still accept a lot of new features from good solutions. The migration operator is designed to accelerate the speed of convergence based on the number of iterations. Modified migration is defined as:
$${H}_{i}(j)=\frac{t}{{t}_{\mathrm{max}}}{H}_{i}(j)+\left(1-\frac{t}{{t}_{\mathrm{max}}}\right){H}_{k}(j).$$
where H_{i} is the immigrating island, H_{k} is the emigrating island, H_{i}(j) is the jth dimension of the ith solution, t is the current iteration number, and ${t}_{\mathrm{max}}$ is the maximum number of iterations. Equation (5) means that a new solution after migration is comprised of two components: features retained from itself and features migrated from another solution. The weight of the solution's own features grows with the iteration count, which accelerates the convergence of the algorithm. The modified migration operator is described as follows (Algorithm 1):
Algorithm 1: Migration for the MDP (MigrationDo(H, ${P}_{modif}$))
Input: Initial population H and migration probability ${P}_{modif}$
Output: The population H that has been optimized by migration
For i = 1 to NP // NP is the size of population
    If rand < ${P}_{modif}$
        Use ${\lambda}_{i}$ to probabilistically decide whether to immigrate to ${H}_{i}$
        If $rand(0,1)<{\lambda}_{i}$ then
            For k = 1 to NP
                Select the emigrating island ${H}_{k}$ with probability $\propto {\mu}_{k}$
                If $rand(0,1)<{\mu}_{k}$ then
                    For j = 1 to N_{d} // N_{d} is the dimension size
                        ${H}_{i}(j)=\frac{t}{{t}_{\mathrm{max}}}{H}_{i}(j)+\left(1-\frac{t}{{t}_{\mathrm{max}}}\right){H}_{k}(j)$
                    End for
                End if
            End for
        End if
    End if
End for
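The blended migration step can be sketched in Python as follows. This is a simplified reading of Algorithm 1, not the authors' code: instead of looping over every k, it picks a single emigrating habitat by roulette-wheel selection with weights proportional to the emigration rates. The rates `lam` and `mu` are passed in as plain lists, and rounding the blended values back to integer lengths and start positions is left out. All function and parameter names are hypothetical.

```python
import random


def blend(h_i, h_k, t, t_max):
    """Equation (5): weight t/t_max on H_i and (1 - t/t_max) on H_k."""
    w = t / t_max
    return [w * a + (1.0 - w) * b for a, b in zip(h_i, h_k)]


def migrate(population, lam, mu, p_modif, t, t_max, rng):
    """One migration pass; returns a new population (lists of floats).

    Assumes at least two habitats and at least one positive emigration rate.
    """
    n = len(population)
    new_pop = [list(h) for h in population]
    for i in range(n):
        # Skip habitat i unless it is chosen for modification and immigration.
        if rng.random() >= p_modif or rng.random() >= lam[i]:
            continue
        others = [idx for idx in range(n) if idx != i]
        weights = [mu[idx] for idx in others]
        k = rng.choices(others, weights=weights, k=1)[0]
        new_pop[i] = blend(population[i], population[k], t, t_max)
    return new_pop


print(blend([10, 20], [0, 40], t=25, t_max=100))  # [2.5, 35.0]
```

Early in the run (small t) the immigrating habitat takes most of its features from the emigrating habitat; near t_max it keeps almost all of its own features.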
3.2. Mutation Operator for the MDP
Although the hybridization of the BBO with DE has achieved many good results [27,28,29], they incorporate DE into the migration procedure for singleobjective optimization problems. In MHABBO algorithm, DE is incorporated into the mutation procedure for multiobjective optimization problems. The algorithm helps to find the nondominated solutions. A mutated individual (${H}_{i}(j)$) is generated according to Equation (6)
$${H}_{i}(j)={H}_{i}(j)+{c}_{1}\times ({H}_{best}(j)-{H}_{i}(j))+{c}_{1}\times ({H}_{r1}(j)-{H}_{r2}(j))$$
where ${H}_{i}(j)$ is the solution selected for mutation, ${c}_{1}$ is the mutation scaling factor, whose value is usually set between 0.1 and 0.15, ${H}_{r1}(j)$ and ${H}_{r2}(j)$ are two randomly selected solutions, and ${H}_{best}(j)$ is the best solution in this generation. In the MHABBO algorithm, this mutation scheme tends to increase the diversity of the population. The modified mutation operator is described as follows (Algorithm 2):
Algorithm 2: Mutation for the MDP (MutationDo(H, ${P}_{muta}$))
Input: The population H optimized by migration and mutation probability ${P}_{muta}$
Output: The population H that has been optimized by mutation
For i = 1 to NP // NP is the size of population
    Select mutating habitat ${H}_{i}$ with probability $\propto {P}_{muta}$
    If ${H}_{i}$ is selected, then
        For j = 1 to N_{d} // N_{d} is the dimension size
            ${H}_{i}(j)={H}_{i}(j)+{c}_{1}\times ({H}_{best}(j)-{H}_{i}(j))+{c}_{1}\times ({H}_{r1}(j)-{H}_{r2}(j))$
        End for
    End if
End for
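A minimal Python sketch of this DE-style mutation is given below. It follows Equation (6) as written, but it is an illustration under assumptions: the two random habitats are drawn without replacement from the habitats other than i, the population is assumed to contain at least three habitats, and repairing mutated values to valid integer lengths and positions is omitted. The function names are hypothetical.

```python
import random


def de_mutate(h_i, h_best, h_r1, h_r2, c1):
    """Equation (6): move towards the best habitat plus a random difference."""
    return [
        x + c1 * (b - x) + c1 * (r1 - r2)
        for x, b, r1, r2 in zip(h_i, h_best, h_r1, h_r2)
    ]


def mutate(population, best, p_muta, c1, rng):
    """One mutation pass over the population; returns a new population."""
    n = len(population)
    new_pop = [list(h) for h in population]
    for i in range(n):
        if rng.random() >= p_muta:
            continue  # habitat i is not selected for mutation
        r1, r2 = rng.sample([idx for idx in range(n) if idx != i], 2)
        new_pop[i] = de_mutate(population[i], best,
                               population[r1], population[r2], c1)
    return new_pop


print(de_mutate([1.0, 2.0], [3.0, 2.0], [5.0, 0.0], [1.0, 4.0], c1=0.5))
# [4.0, 0.0]
```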
3.3. Adaptive BBO for MDP
Modification probability factors and mutation probability factors in the BBO algorithm are denoted as ${P}_{modif}$ and ${P}_{muta}$ respectively. The two factors with ranges between 0 and 1 are set by users. The settings of the parameters are related to the experience of the user, and they may be unfavorable for the selection of migration individual and mutation individual. In order to choose better migration individual and mutation individual, these parameters are changed dynamically with the fitness function.
In the adaptive BBO algorithm, the parameters of modification probability and mutation probability are altered according to Equations (7) and (8). In Equations (7) and (8), constant factor k_{1}, k_{2}, k_{3} and k_{4} which range between 0 and 1 are set by users. Usually k_{1} = 0.4, k_{2} = 0.95, k_{3} = 0.1, k_{4} = 0.25.
$${P}_{modif}=\begin{cases}{k}_{1}\times \dfrac{MaxCost-FitnessCost}{MaxCost-AvgCost}, & FitnessCost\ge AvgCost\\ {k}_{2}, & FitnessCost<AvgCost\end{cases}$$
$${P}_{muta}=\begin{cases}{k}_{3}\times \dfrac{MaxCost-FitnessCost}{MaxCost-AvgCost}, & FitnessCost\ge AvgCost\\ {k}_{4}, & FitnessCost<AvgCost\end{cases}$$
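Equations (7) and (8) share the same form, so both can be sketched with one hypothetical Python helper. The handling of the degenerate case MaxCost = AvgCost, where the ratio is undefined, is an assumption of this sketch and is not specified in the text.

```python
def adaptive_probability(fitness_cost, avg_cost, max_cost, k_scaled, k_const):
    """Piecewise rule of Equations (7) and (8).

    k_scaled plays the role of k1 (or k3); k_const plays the role of k2 (or k4).
    """
    if fitness_cost >= avg_cost:
        span = max_cost - avg_cost
        if span == 0:
            # Assumption: all costs are equal, so fall back to the scale factor.
            return k_scaled
        return k_scaled * (max_cost - fitness_cost) / span
    return k_const


# With k1 = 0.4 and k2 = 0.95 as suggested in the text:
print(adaptive_probability(8, 6, 10, 0.4, 0.95))   # 0.2
print(adaptive_probability(4, 6, 10, 0.4, 0.95))   # 0.95
print(adaptive_probability(10, 6, 10, 0.4, 0.95))  # 0.0
```

Under this rule a habitat whose cost is below the average receives the constant probability, while for the remaining habitats the probability shrinks linearly to zero as the cost approaches the maximum.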
3.4. The Redefinition of the Fitness Function
In this paper, we propose the multi-objective MHABBO algorithm for the multi-objective motif discovery problem. Generally speaking, multi-objective optimization problems are solved through Pareto nondominated sorting and crowding distance sorting of different solutions. The fitness function is determined based on the Pareto dominance relation in [30]. However, considering only the Pareto dominance relation is not enough; if the distribution of solutions is also included, the definition of the fitness function will be more reasonable. Our definition originates from the SPEA2 algorithm [31], which uses both the Pareto dominance relationship and the density relationship between different solutions as the fitness function. The fitness function is therefore redefined based on the density information and the Pareto dominance relationship among the habitats. That is, we employ a nondominated sorting approach to determine the nondominated rank of individuals. Specifically, the Pareto dominance relationship refers to the number of nondominated solutions that dominate an individual. The density of each individual is calculated by the k nearest neighbor method. For any individual ${H}_{i}=({H}_{i1},{H}_{i2}\dots ,{H}_{in})$ in the habitat population $H=\{{H}_{1},{H}_{2}\dots ,{H}_{NP}\}$, its fitness function is defined by Equation (9) as follows.
$$\begin{array}{l}F({H}_{i})=\sum_{{H}_{i}\in H,\,{H}_{j}\prec {H}_{i}}\frac{1}{NP}\left|\{k\mid {H}_{j},{H}_{k}\in H\wedge k\ne j\wedge k\ne i\wedge {H}_{j}\prec {H}_{k}\}\right|+D(i);\\ D(i)=\frac{1}{{\sigma}_{i}^{e}+2},\quad e=\lfloor \sqrt{NP+\overline{N}}\rfloor ;\\ i,j,k,e\in \{1,2,\dots ,NP\}\end{array}$$
where ${H}_{i},{H}_{j},{H}_{k}\in H$ are habitats, ${\sigma}_{i}^{e}$ is the distance in the objective space between the habitat ${H}_{i}$ and ${H}_{e}$, its e-th nearest neighbor, and the operator $|\cdot|$ is the cardinality of the set. e is the integer part of the square root of the sum of the population size NP and the elitism number $\overline{N}$. According to Equation (9), the fitness function $F({H}_{i})$ is the sum of $D(i)$ and the average, over the total population, of the number of dominated habitats. Here, the number of dominated habitats is the total number of other habitats that are dominated by the individuals dominating ${H}_{i}$. Therefore, the lower the degree to which habitat ${H}_{i}$ is dominated, the smaller its fitness function; when the dominance term of ${H}_{i}$ is 0, ${H}_{i}$ is a nondominated habitat.
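One possible reading of Equation (9) can be sketched in Python as follows. The sketch assumes that all objectives are maximized, that $H_j \prec H_i$ means "$H_j$ dominates $H_i$", and that $\sigma_i^e$ is the Euclidean distance to the e-th nearest neighbor in objective space. The clamp of e to NP - 1 is an added safeguard for small populations, and at least two habitats are required. The function names and the toy objective vectors are hypothetical.

```python
import math


def dominates(a, b):
    """a dominates b when a is no worse in every objective and better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def fitness(objs, elite_count):
    """Dominance term plus density term D(i) for every habitat (lower is better)."""
    n = len(objs)
    e = min(math.isqrt(n + elite_count), n - 1)  # clamp is an assumption
    scores = []
    for i in range(n):
        raw = 0.0
        for j in range(n):
            if j != i and dominates(objs[j], objs[i]):
                # Count the other habitats that the dominator H_j dominates.
                strength = sum(
                    1 for k in range(n)
                    if k != i and k != j and dominates(objs[j], objs[k])
                )
                raw += strength / n
        dists = sorted(math.dist(objs[i], objs[k]) for k in range(n) if k != i)
        density = 1.0 / (dists[e - 1] + 2.0)
        scores.append(raw + density)
    return scores


# Toy two-objective example; (3, 3) is the only nondominated point.
objs = [(3, 3), (2, 2), (1, 1), (3, 1)]
scores = fitness(objs, elite_count=0)
print(scores[0])  # 0.25
```

In this toy run the nondominated point receives only its density term, so it has the smallest fitness value in the population.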
3.5. Main Procedure of MHABBO for Multi-Objective Motif Discovery Problem
Firstly, the fitness function on the basis of density information and Pareto dominance relation is redefined, then the modified migration procedure and the mutation procedure are merged into the BBO. Furthermore, related parameters in the BBO such as modification probability and mutation probability, emigration rate and immigration rate, are altered. The procedure of the MHABBO is described in Algorithm 3.
Algorithm 3: The main pseudocode of MHABBO algorithm for multi-objective MDP
Input: The sequences S
Output: Support, motif length, similarity, and the nondominated consensus motif instance and corresponding PWM.
1. Init(number of iterations, elitism parameter keep, migration probability ${P}_{modif}$, mutation probability ${P}_{muta}$, etc.)
2. $P\Leftarrow $ GenerateInitialRandomPopulation()
3. EvaluateFitness(${H}_{i}$) for each habitat ${H}_{i}$ in $P$ according to Equation (9)
4. While the halting criterion is not satisfied do
5.     Elite $\Leftarrow P$(1:keep)
6.     Compute ${\lambda}_{i},{\mu}_{i}$ for each habitat ${H}_{i}$ according to Equations (3) and (4)
7.     $P\Leftarrow $ MigrationDo($P$, ${P}_{modif}$) // Algorithm 1
8.     $P\Leftarrow $ MutationDo($P$, ${P}_{muta}$) // Algorithm 2
9.     EvaluateFitness(${H}_{i}$)
10.     SortPopulation($P$)
11.     $P\Leftarrow $ ReplaceWorstbyElites($P$, Elite)
12.     $P\Leftarrow $ ClearDuplicates($P$)
13.     [maximum cost, minimum cost, average cost] $\Leftarrow $ EvaluateCostItems()
14.     [${P}_{modif}$, ${P}_{muta}$] $\Leftarrow $ updateProbability() // Equations (7) and (8)
15. End while
We generate the initial population for three different targets. That is, a solution is randomly generated when the length and support is different. Besides, we get the similarity between this candidate motif and consensus motif. Each solution has these three different indicators, including the length, support and similarity. That is, each solution reflects multiple different objectives. The function GenerateInitialRandomPopulation() in Algorithm 3 is described as follows:
GenerateInitialRandomPopulation() 

In the evaluation of the fitness function, the degrees of Pareto domination and distribution information between different solutions are reflected by fitness function. That is, the Pareto nondominated sorting is equivalent to the ranking of the value of fitness function. The MHABBO measures the Pareto dominance relationship and density information between different solutions as fitness function. We employ nondominated sorting approach to determine the nondominated rank of individuals. The function EvaluateFitness(${H}_{i}$) in Algorithm 3 is described as follows:
EvaluateFitness(${H}_{i}$) 

4. Simulation and Analysis
4.1. Simulation, Comparison and Discussion
4.1.1. Results Comparisons with Other Methods
To demonstrate the feasibility of the MHABBO algorithm for the MDP, the MHABBO algorithm is compared with MOGAMOD and DEPT. Experiments are carried out on a number of real sequence datasets selected from the TRANSFAC database [32]. Motif instances from the sequences of each dataset have already been tagged, so these datasets are used as a benchmark for the discovery of TFBSs [33]. The properties of the datasets are shown in Table 3. Every real dataset corresponds to a living organism. More concretely, three of the datasets are from the fly (those beginning with dm), three from the human (hm), three from the mouse (mus), and three from yeast (yst). Datasets with different numbers of sequences and different sizes (nucleotides per sequence) are selected to ensure that our algorithm works with several types of instances. For example, the yst04r dataset contains 7 sequences of 1000 bps each, and its 7 motif instances range in length from 5 to 25. The hm03r dataset contains 10 sequences of 1500 bps each, and its 15 motif instances range in length from 14 to 46. By using datasets from different species, we test whether the new algorithm can obtain meaningful motifs across several types of biological data. The running times are also given in Table 3. The algorithm has been implemented in MATLAB R2014b. All experiments were performed on Windows 10 with a 64-bit Intel(R) Core(TM) i5-6200U CPU (2.30 GHz) and 12 GB of RAM.
Parameters used by the MHABBO algorithm are shown in Table 4. The MHABBO algorithm has been run 5 times for each dataset with different random seeds. The top 20 results obtained by a nondominated sort over the 5 runs are recorded. Due to the length limit of the article, we only list the parameters used for the yst04r dataset: the DE mutation scheme is DE/rand/1/bin, the population size is 100, the maximal generation number is 100, the number of variables in each individual is 8, the motif length is between 5 and 25, the habitat modification probability is 0.75, the mutation probability is 0.05, the elitism parameter is 10, the scaling factor is 0.01, and the factors k_{1}, k_{2}, k_{3} and k_{4} are 0.4, 0.95, 0.05 and 0.1, respectively. The other test problems use parameters similar to those of the yst04r dataset. The parameters that differ include the number of variables in each individual, the mutation probability and the modification probability. We assume a motif instance is correctly discovered if the predicted binding site is within 3 bps of the true binding site.
The performance of MHABBO for the MDP has been compared with other different methods such as MOGAMOD and DEPT. Comparisons of the motif predicted by MHABBO and corresponding Support, Similarity and Motif Length with other methods for yst04r are shown in Table 5. Comparisons of three objectives and the motif predicted by MHABBO with other methods for yst08r are shown in Table 6. Comparisons of three objectives and the motif predicted by MHABBO with different methods for hm03r are shown in Table 7. The motifs predicted with “*” in the Table 6 indicate that the motif predicted is consistent with the known motif instance.
These tables show that MHABBO achieves better results than MOGAMOD and solutions similar to those obtained by DEPT, and that several motif instances predicted by MHABBO are very similar to the known motif instances. We therefore conclude that the MHABBO algorithm can predict meaningful motifs and is a promising method for multi-objective motif discovery. As the length of the predicted motifs increases, the similarity does not obviously decrease. From Table 5 and Table 6, we observe that some motifs are predicted by only one of the algorithms. The reason is that different search strategies explore different regions of the search space. Hence, MHABBO is chosen for motif discovery. Additionally, the MHABBO algorithm can not only predict some motifs found by other known methods but also find novel motifs. However, the accuracy of the predicted motifs is not high enough. One reason is that the performance of MHABBO is influenced by the random selection of an SIV (suitability index variable) during migration and mutation between the islands. Another reason is that the definition of the fitness function considers only some factors, which may reduce accuracy.
4.1.2. The Consensus Motifs Obtained by MHABBO Algorithm
Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens [37]. A sequence logo provides a richer and more precise description of, for example, a binding site than a consensus sequence does. WebLogo is a web-based application designed to make the generation of sequence logos as easy as possible, so the consensus motifs predicted by our algorithm on the different datasets are rendered with WebLogo in Table 8.
4.1.3. Representation of the Pareto Fronts Obtained by MHABBO Algorithm
To give a visual perspective on the results, we show the graphs corresponding to the solutions obtained by MHABBO for each dataset (see Figure 1). The graphs show the Pareto front points (blue points) obtained by running the algorithm configured with the optimal parameters. The motif length is represented on the X-axis, the similarity on the Y-axis, and the support on the Z-axis. Furthermore, we show the projection of each point onto the planes XY (purple points), XZ (red points) and ZY (yellow points).
The Pareto fronts in Figure 1 show that MHABBO achieves a good distribution of solutions on these datasets. For example, there are 7 motif instances with lengths between 9 and 54 in the hm16r dataset, and most of these motif instances have a length of about 20. It can be seen from Figure 1 that the length of most obtained solutions is also about 20, so the distribution of the obtained solutions is consistent with the distribution of the standard solution. Neither an increase in the length of the predicted motifs nor an increase in their support leads to an obvious decrease in similarity. The results demonstrate that the proposed MHABBO algorithm is competitive in the quantity and the distribution of final solutions. The results also illustrate the distribution of the solutions and the convergence to the Pareto-optimal front, which indicates that our approach performs well on the multi-objective MDP.
4.2. Metrics to Assess Performance
Performance metrics return a scalar quantity that reflects the quality of solutions. For each tool T and each dataset D, we have the set of known binding sites and the set of predicted binding sites. The correctness of T on D can be assessed both at the nucleotide level and at the motif level. Many metrics can be used to measure the quality of MDP solutions [30], for example, the nucleotide-level sensitivity (nSn), the nucleotide-level positive predictive value (nPPV), the nucleotide-level correlation coefficient (nCC), the nucleotide-level performance coefficient (nPC), the motif-level correlation coefficient (mCC) and the motif-level F-score [35]. The following metrics are used in this paper: the nPC and the F-score.
4.2.1. The Nucleotide-Level Performance Coefficient (nPC)
To measure the prediction accuracy of methods with respect to motif location, we have used the nucleotide-level performance coefficient (nPC). It was also adopted by Tompa et al. to evaluate binding site predictions in their single motif discovery benchmark study. The nPC is defined as follows:
$$n\mathrm{PC}=\frac{n\mathrm{TP}}{n\mathrm{TP}\text{}+\text{}n\mathrm{FN}\text{}+\text{}n\mathrm{FP}}$$
Here, nTP is the number of nucleotide positions in both known sites and predicted sites, while nFN is the number of nucleotide positions in known sites but not in predicted sites, nFP is the number of nucleotide positions not in known sites but in predicted sites.
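The nPC can be sketched in Python by expanding each site into the set of nucleotide positions it covers. The representation of a site as a `(sequence_id, start, length)` tuple and the function names are assumptions made for this illustration, as is returning 0.0 when both site sets are empty.

```python
def site_positions(sites):
    """Expand (sequence_id, start, length) sites into covered positions."""
    return {
        (seq_id, pos)
        for seq_id, start, length in sites
        for pos in range(start, start + length)
    }


def npc(known_sites, predicted_sites):
    """Nucleotide-level performance coefficient nTP / (nTP + nFN + nFP)."""
    known = site_positions(known_sites)
    predicted = site_positions(predicted_sites)
    n_tp = len(known & predicted)   # in both known and predicted sites
    n_fn = len(known - predicted)   # known but not predicted
    n_fp = len(predicted - known)   # predicted but not known
    total = n_tp + n_fn + n_fp
    return n_tp / total if total else 0.0


# A predicted site shifted by 2 bp against a known site of length 8.
print(npc([(0, 10, 8)], [(0, 12, 8)]))  # 0.6
```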
To further measure the efficiency of this algorithm, the nPC values obtained by the MHABBO algorithm on the different datasets are compared with those of 14 other algorithms. The results obtained by the fifteen algorithms on the different datasets are given in Figure 2.
It can be seen from Figure 2 that the nPCs obtained by the MHABBO algorithm on datasets 1 to 9 are significantly better than those of the other fourteen algorithms. The exception is the yeast (yst) datasets, on which the nPCs obtained by the MHABBO algorithm are worse than those of several algorithms. The algorithm thus shows better performance on higher organisms than on simpler organisms. It can be concluded that the performance of the algorithm does not obviously decrease as the dimension of the problem increases.
4.2.2. F-Score
To assess the performance of our algorithm at the motif level, Precision, Recall and F-score are adopted on the basis of Equation (10) [38], where the operator $|\cdot|$ is the cardinality of the set. The candidate motif instances obtained by MHABBO need to be verified by biological experiments. We hope to have a high Precision and a high Recall. The F-score is a tradeoff between Precision and Recall.
$$\begin{array}{l}Precision=\frac{\left|correct\text{\hspace{0.17em}}motif\right|}{\left|motif\text{\hspace{0.17em}}found\right|},\text{\hspace{1em}}Recall=\frac{\left|correct\text{\hspace{0.17em}}motif\right|}{\left|true\text{\hspace{0.17em}}motif\right|},\\ F\text{-}score=2\times \frac{Precision\times Recall}{Precision+Recall}\end{array}$$
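These motif-level metrics can be sketched in Python as follows. The sketch combines Equation (10) with the correctness criterion stated in Section 4.1.1 (a predicted site within 3 bps of the true site). Representing a site as a `(sequence_id, start)` pair, matching each true site at most once, and returning 0.0 for empty inputs are assumptions of this illustration, and the function name is hypothetical.

```python
def motif_level_scores(true_sites, predicted_sites, tolerance=3):
    """Return (precision, recall, f_score) for (sequence_id, start) sites."""
    matched = set()  # indices of true sites already matched
    correct = 0
    for seq_id, start in predicted_sites:
        for idx, (true_seq, true_start) in enumerate(true_sites):
            if (idx not in matched and true_seq == seq_id
                    and abs(true_start - start) <= tolerance):
                matched.add(idx)
                correct += 1
                break
    precision = correct / len(predicted_sites) if predicted_sites else 0.0
    recall = correct / len(true_sites) if true_sites else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Made-up sites: one of two predictions lies within 3 bp of a true site.
true_sites = [(0, 100), (1, 50), (2, 200)]
predicted = [(0, 102), (1, 60)]
p, r, f = motif_level_scores(true_sites, predicted)
print(p, round(r, 3), round(f, 3))  # 0.5 0.333 0.4
```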
Average results (precision (P), recall (R) and F-score (F)) obtained by MHABBO on the twelve datasets are shown in Table 9. The comparisons of MHABBO with other methods [33] on the three datasets are given in Table 10.
Table 10 shows the average results of these algorithms over 5 runs. According to the F-score, MHABBO is the best of all twelve algorithms on the hm03r and mus02r datasets, and it is better than ABBO/DE/GEN, which addresses the single-objective motif discovery problem, on these two datasets. However, it is worse on yst08r than MEME, MEME3, ABBO/DE/GEN and MOTIFSAMPLE. The algorithm thus shows better performance on higher organisms than on simpler organisms. The experiments demonstrate the validity of the proposed MHABBO algorithm for multi-objective motif discovery problems.
Assessing the performance of the MHABBO algorithm at the nucleotide level and at the motif level yields similar results. That is to say, the performance of MHABBO does not worsen as the number of dimensions of the problem increases, and the algorithm can obtain more significant motifs. The results also show that the algorithm converges better on higher organisms than on simpler organisms.
5. Conclusions and Future Research
Since multiobjective biogeography-based optimization has not previously been applied to the multiobjective motif discovery problem, we propose a hybrid multiobjective optimization algorithm named MHABBO to solve the three-objective motif discovery problem on the basis of our previous research work. Compared with the existing methods, the proposed algorithm has the following advantages. Firstly, the redefinition of fitness evaluation based on MOEA simplifies the multiobjective optimization problem and uses the Pareto dominance relationship to preserve population diversity. Secondly, the modified migration operations speed up the convergence of the algorithm, and the mutation is performed by integrating with DE to produce new feasible solutions. In this way, population diversity can be maintained. Finally, the robustness of the algorithm is enhanced by adaptively changing the parameters related to the BBO algorithm.
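The Pareto dominance relationship mentioned above can be sketched as follows. This is a minimal, generic illustration and not the MHABBO fitness redefinition itself: it assumes that each candidate is scored by the three quantities listed in the result tables (support, length, similarity), that all three are maximized, and the candidate values below are hypothetical.

```python
from typing import List, Tuple

# (support, length, similarity); all three are assumed to be maximized
Solution = Tuple[float, float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions: List[Solution]) -> List[Solution]:
    """Keep the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other != s)
    ]


# Hypothetical candidates, for illustration only
candidates = [(6, 5, 0.85), (3, 5, 0.80), (2, 19, 0.82), (3, 5, 0.93)]
front = non_dominated(candidates)
print(front)
```

Here `(3, 5, 0.80)` is dropped because `(6, 5, 0.85)` is at least as good in every objective and better in two, while the remaining three candidates are mutually non-dominated.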
Statistical comparisons with some typical existing approaches on several commonly used datasets are provided. The main work of this paper is as follows. Firstly, the motif instances obtained by the MHABBO algorithm on three commonly used datasets are compared with those of five other algorithms. Secondly, according to the PWMs corresponding to the obtained motif instances on twelve commonly used datasets, the logos of the motif instances are generated using the online WebLogo software. Thirdly, the Pareto fronts of the obtained motif instances on twelve commonly used datasets are drawn according to the three objectives of the motif discovery problem. Finally, based on the nPC and F-score measures, the new algorithm is compared with other classical algorithms.
The experiments indicate that the MHABBO algorithm outperforms the other algorithms on the hm03r and mus02r datasets. The Pareto fronts obtained by MHABBO show that the proposed algorithm is competitive in terms of both convergence to the Pareto-optimal front and the distribution of the final solutions. They also show that, in terms of convergence, the algorithm performs better on higher organisms than on simpler organisms. These results demonstrate the validity and effectiveness of the proposed MHABBO algorithm for predicting motifs from DNA sequences.
In this paper, we mainly discuss the multiobjective motif discovery problem. In the future, we will continue to improve the multiobjective BBO algorithm, and we will try to combine NSGA-III or MOEA/D with the BBO algorithm for the motif discovery problem. Additionally, in our earlier work, we discussed the portfolio optimization problem with second-order stochastic dominance constraints based on the BBO algorithm [46], and we will try to apply the multiobjective BBO algorithm to the multiobjective portfolio optimization problem [47,48] in the future.
Acknowledgments
We gratefully acknowledge the support of the Joint Funds of the National Natural Science Foundation of China (No. 61462022, No. 11561017), the Major Science and Technology Project of Hainan Province (ZDKJ2016015), the Natural Science Foundation of Hainan province of China (No. 20156226) and the Hainan University scientific research foundation (No. kyqd1533) in carrying out this research.
Author Contributions
Siling Feng contributed the new processing method, and conceived and designed the experiments; Siling Feng and Ziqiang Yang performed the experiments; Siling Feng and Ziqiang Yang analyzed the data; Mengxing Huang contributed analysis tools; and Siling Feng wrote the paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Patrik, D. What are DNA sequence motifs. Nat. Biotechnol. 2006, 24, 423–425.
2. Lones, M.A.; Yo, Y.; Tyrrell, A.M. The Evolutionary Computation Approach to Motif Discovery in Biological Sequences. In Proceedings of the 7th Annual Workshop on Genetic and Evolutionary Computation (GECCO’052005), Washington, DC, USA, 25–29 June 2005.
3. Lou, Y.; Li, J.; Jin, L.; Li, G. A CoEvolutionary Algorithm Based on Elitism and Gravitational Evolution Strategies. J. Comput. Inf. Syst. 2012, 7, 2741–2750.
4. Che, D.; Song, Y.; Rashedd, K. MDGA: Motif discovery using a genetic algorithm. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO 2005), Washington, DC, USA, 25–29 June 2005; pp. 447–452.
5. Shao, L.; Chen, Y. Bacterial Foraging Optimization Algorithm Integrating Tabu Search for Motif Discovery. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2009), Washington, DC, USA, 1–4 November 2009; pp. 415–418.
6. Shao, L.; Chen, Y.; Abraham, A. Motif Discovery using Evolutionary Algorithms. In Proceedings of the International Conference of Soft Computing and Pattern Recognition (SOCPAR 2009), Malacca, Malaysia, 4–7 December 2009; pp. 420–425.
7. GonzalezÁlvarez, D.L.; VegaRodriguez, M.A.; GomezPulido, J.A.; SanchezPérez, J.M. Predicting DNA Motifs by Using Evolutionary Multiobjective Optimization. IEEE Trans. Syst. Man Cybern. 2012, 42, 913–925.
8. GonzálezÁlvarez, D.L.; VegaRodríguez, M.A.; Pulido, J.A.G.; SánchezPérez, J.M. Finding Motifs in DNA Sequences Applying a Multiobjective Artificial Bee Colony (MOABC) Algorithm. In Proceedings of the 9th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO), Torino, Italy, 27–29 April 2011; pp. 89–100.
9. Kaya, M. MOGAMOD: Multiobjective genetic algorithm for motif discovery. Int. J. Expert Syst. Appl. 2009, 36, 1039–1047.
10. Deb, K.; Jain, H. An evolutionary manyobjective optimization algorithm using referencepointbased nondominated sorting approach. Part I: Solving problems with box constraints. IEEE Trans. Evol. Comput. 2014, 18, 577–601.
11. Zhang, Q.; Li, H. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 2014, 11, 712–731.
12. Simon, D. Biogeographybased optimization. IEEE Trans. Evol. Comput. 2008, 12, 702–771.
13. Feng, S.L.; Zhu, Q.X.; Gong, X.J.; Zhong, S. BiogeographyBased Optimization for Motif Discovery Problem. J. Comput. Inf. Syst. 2013, 9, 6999–7010.
14. Feng, S.L.; Zhu, Q.X.; Gong, X.J.; Zhong, S. Hybridizing BiogeographyBased Optimization with Differential Evolution for Motif Discovery Problem. ICIC Express Lett. 2013, 7, 3343–3348.
15. Feng, S.L.; Zhu, Q.X.; Zhong, S.; Gong, X.J. Hybridizing Adaptive BiogeographyBased Optimization with Differential Evolution for Motif Discovery Problem. Sens. Transducers 2014, 162, 233–237.
16. Chutima, P.; Wong, N. A Pareto biogeographybased optimisation for multi objective twosided assembly line sequencing problems with a learning effect. Comput. Ind. Eng. 2014, 69, 89–104.
17. E Silva, M.D.A.C.; Coelho, L.D.S.; Lebensztajn, L. Multi objective biogeographybased optimization based on predatorprey approach. IEEE Trans. Magn. 2012, 48, 951–954.
18. Ma, H.; Su, S.; Simon, D.; Fei, M. Ensemble multiobjective biogeographybased optimization with application to automated warehouse scheduling. Eng. Appl. Artif. Intell. 2015, 44, 79–90.
19. Goudos, S.K.; Plets, D.; Liu, N.; Martens, L.; Joseph, W. A multiobjective approach to indoor wireless heterogeneous networks planning based on biogeographybased optimization. Comput. Netw. 2015, 91, 564–576.
20. Feng, S.; Yang, Z.; Huang, M. Hybridizing Adaptive BiogeographyBased Optimization with Differential Evolution for MultiObjective Optimization Problems. Information 2017, 8, 83.
21. Jadon, S.S.; Tiwari, R.; Sharma, H.; Bansal, J.C. Hybrid Artificial Bee Colony algorithm with Differential Evolution. Appl. Soft Comput. 2017, 58, 11–24.
22. Loris, V.; Marco, S. Coupling response surface and differential evolution for parameter identification problems. Comput.Aided Civil Infrastruct. Eng. 2015, 30, 376–393.
23. Trivedi, A.; Srinivasan, D.; Biswas, S.; Reindl, T. Hybridizing genetic algorithm with differential evolution for solving the unit commitment scheduling problem. Swarm Evol. Comput. 2015, 23, 50–64.
24. Wang, X.; Xu, Z. Multiobjective optimization algorithm based on biogeography with chaos. Int. J. Hybrid Inf. Technol. 2014, 7, 225–234.
25. Ma, H.; Simon, D. Analysis of migration models of biogeographybased optimization using Markov theory. Eng. Appl. Artif. Intell. 2011, 24, 1052–1060.
26. Ma, H.; Simon, D. Blended biogeographybased optimization for constrained optimization. Eng. Appl. Artif. Intell. 2011, 24, 517–525.
27. Boussaïd, I.; Chatterjee, A.; Siarry, P.; AhmedNacer, M. Twostage update biogeographybased optimization using differential evolution algorithm (DBBO). Comput. Oper. Res. 2011, 38, 1188–1198.
28. Cai, Z. A Novel Hybrid Biogeographybased with Differential Mutation. In Proceedings of the International Conference on Electronic & Mechanical Engineering and Information Technology, Harbin, China, 12–14 August 2011; pp. 2710–2714.
29. Gong, W.; Cai, Z.; Ling, C.X. DE/BBO: A hybrid differential evolution with biogeographybased optimization for global numerical optimization. Soft Comput. 2010, 15, 645–665.
30. Bi, X.; Wang, J.; Li, B. Multiobjective optimization based on hybrid biogeographybased optimization. Syst. Eng. Electron. 2014, 36, 179–186.
31. Zitzler, E.; Laumanns, M.; Thiele, L. SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. In Proceedings of the Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, Athens, Greece, 19–21 September 2001; pp. 95–100.
32. Wingender, E.; Dietze, P.; Karas, H.; Knüppel, R. TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996, 24, 238–241.
33. Tompa, M.; Li, N.; Bailey, T.L.; Church, G.M.; De Moor, B.; Eskin, E.; Favorov, A.V.; Frith, M.C.; Fu, Y.; Kent, W.J.; et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 2005, 23, 137–144.
34. Hughes, J.D.; Estep, P.W.; Tavazoie, S.; Church, G.M. Computational identification of cisregulatory elements associated with functionally coherent groups of genes in Saccharomyces cerevisiae. J. Mol. Biol. 2000, 296, 1205–1214.
35. Bailey, T.L.; Elkan, C. The value of prior knowledge in discovering motifs with MEME. In Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology, Cambridge, UK, 16–19 July 1995; AAAI Press: Menlo Park, CA, USA, 1995; pp. 21–29.
36. Pavesi, G.; Mereghetti, P.; Mauri, G.; Pesole, G. Weeder Web: Discovery of transcription factor binding sites in a set of sequences from coregulated genes. Nucleic Acids Res. 2004, 32, W199–W203.
37. WebLogo 3. Available online: http://weblogo.threeplusone.com/create.cgi (accessed on 19 September 2017).
38. Li, G.; Chan, T.M.; Leung, K.S.; Lee, K.H. A Cluster Refinement Algorithm for Motif Discovery. IEEE Trans. Comput. Biol. Bioinform. 2010, 7, 1–15.
39. Sinha, S.; Tompa, M. YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 2003, 31, 3586–3588.
40. Favorov, A.V.; Gelfand, M.S.; Gerasimova, A.V.; Mironov, A.A.; Makeev, V.J. Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length and its validation on the ArcA binding sites. In Proceedings of the Fourth International Conference on Bioinformatics of Genome Regulation and Structure (BGRS 2004), Novosibirsk, Russia, 25–30 July 2004.
41. Régnier, M.; Denise, A. Rare events and conditional events on random strings. Discrete Math. Theor. Comput. Sci. 2004, 6, 191–214.
42. Thijs, G.; Lescot, M.; Marchal, K.; Rombauts, S.; De Moor, B.; Rouze, P.; Moreau, Y. A higherorder background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 2001, 17, 1113–1122.
43. Eskin, E.; Pevzner, P. Finding composite regulatory patterns in DNA sequences. Bioinformatics 2002, 18 (Suppl. 1), S354–S363.
44. Workman, C.T.; Stormo, G.D. ANNSpec: A method for discovering transcription factor binding sites with improved specificity. In Pacific Symposium on Biocomputing; Altman, R., Dunker, A.K., Hunter, L., Klein, T.E., Eds.; Stanford University: Stanford, CA, USA, 2000; pp. 467–478.
45. Ao, W.; Gaudet, J.; Kent, W.J.; Muttumu, S.; Mango, S.E. Environmentally induced foregut remodeling by PHA4/FoxA and DAF12/NHR. Science 2004, 305, 1743–1746.
46. Ye, T.; Yang, Z.; Feng, S. BiogeographyBased Optimization of the Portfolio Optimization Problem with Second Order Stochastic Dominance Constraints. Algorithms 2017, 10, 100.
47. Sawik, B. Survey of multiobjective portfolio optimization by linear and mixed integer programming. In Applications of Management Science; Lawrence, K.D., Kleinman, G., Eds.; Emerald Group Publishing Limited: Bingley, UK, 2013; Volume 16, pp. 55–79.
48. Sawik, B. A Review of MultiCriteria Portfolio Optimization by Mathematical Programming. In Recent Advances in Computational Finance; Dash, G.H., Thomaidis, N., Eds.; Nova Science Publishers: New York, NY, USA, 2013; pp. 149–172.
Figure 1.
Representation of the Pareto fronts obtained by MHABBO: (a) the distribution of solutions on the DM01g dataset; (b) the distribution of solutions on DM04g; (c) the distribution of solutions on DM05g; (d) the distribution of solutions on HM03r; (e) the distribution of solutions on HM04r; (f) the distribution of solutions on HM16g; (g) the distribution of solutions on MUS02r; (h) the distribution of solutions on MUS07g; (i) the distribution of solutions on MUS11m; (j) the distribution of solutions on YST03m; (k) the distribution of solutions on YST04r; (l) the distribution of solutions on YST08r.
Figure 2.
Comparisons of nPC value obtained by MHABBO with other algorithms on different test problems (Dm01g = 1, Dm04g = 2, Dm05g = 3, Hm03r = 4, Hm04m = 5, Hm16g = 6, Mus02r = 7, Mus07g = 8, Mus11m = 9, Yst03m = 10, Yst04r = 11 and Yst08r = 12).
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  

A  0  1  0.25  0.25  0  0  0  0.25  0  0.25  0.75  1  0  0  0.25  0  0.25  0 
C  0  0  0.5  0.75  0  0  0  0.25  0  0.5  0  0  0.25  1  0.5  0  0.25  0 
T  0  0  0.25  0  1  1  0.75  0.5  0  0.25  0.25  0  0.75  0  0.25  1  0  0 
G  1  0  0  0  0  0  0.25  0  1  0  0  0  0  0  0  0  0.5  1 
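The position matrix above can be read column by column: each column gives the relative frequency of A, C, T and G at one motif position. The following minimal Python sketch derives a consensus string and a similarity-style score from it. The names `pfm`, `consensus` and `mean_max_frequency` are introduced only for this example, and taking the similarity as the mean of the per-column maximum frequency is an assumption in the spirit of MOGAMOD-style formulations, not necessarily the exact definition used by MHABBO.

```python
# Position frequency matrix copied from the table above
# (rows A, C, T, G; 18 motif positions).
pfm = {
    "A": [0, 1, 0.25, 0.25, 0, 0, 0, 0.25, 0, 0.25, 0.75, 1, 0, 0, 0.25, 0, 0.25, 0],
    "C": [0, 0, 0.5, 0.75, 0, 0, 0, 0.25, 0, 0.5, 0, 0, 0.25, 1, 0.5, 0, 0.25, 0],
    "T": [0, 0, 0.25, 0, 1, 1, 0.75, 0.5, 0, 0.25, 0.25, 0, 0.75, 0, 0.25, 1, 0, 0],
    "G": [1, 0, 0, 0, 0, 0, 0.25, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.5, 1],
}
width = len(pfm["A"])


def consensus(matrix):
    """Most frequent base at each position."""
    return "".join(
        max(matrix, key=lambda base: matrix[base][i]) for i in range(width)
    )


def mean_max_frequency(matrix):
    """Mean over positions of the largest base frequency (assumed similarity)."""
    return sum(max(row[i] for row in matrix.values()) for i in range(width)) / width


# Each column of a frequency matrix should sum to 1.
column_sums = [sum(row[i] for row in pfm.values()) for i in range(width)]

print(consensus(pfm))
print(round(mean_max_frequency(pfm), 3))
```

Since every entry is a multiple of 0.25, the matrix is consistent with four aligned motif instances.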
Motif Length  Seq. 0  Seq. 1  Seq. 2  …  Seq. n 

length  S0  S1  S2  ...  Sn 
Dataset  #Sequence  Length  #Instance  #Width of Motifs  Time (s) 

Dm01g  4  1500  7  13–28  50 
Dm04g  4  2000  9  10–26  51 
Dm05g  5  2500  14  6–21  58 
Hm03r  10  1500  15  14–46  42 
Hm04m  13  2000  11  7–44  37 
Hm16g  7  3000  7  9–54  38 
Mus02r  9  1000  12  10–33  38 
Mus07g  4  1500  4  15–33  53 
Mus11m  12  500  15  6–27  42 
Yst03m  8  500  18  6–24  44 
Yst04r  7  1000  7  5–25  39 
Yst08r  11  1000  14  12–49  39 
MHABBO  DEPT  MOGAMOD 

Population Size: 100  Population Size: 200  Population Size: 200 
Migration Probability: 0.75  Crossover Probability: 0.25  Crossover: SPX with probability 0.6 
Mutation Probability: 0.05  Mutation Factor: 0.03  Mutation Factor: 0.5 
Maxgen: 100 Elitism parameter: 10  Selection Scheme: Rand/1/Binomial  Parents choose: Binary Tournament New Generation Selection: Elitist 
Scaling factor c_{1}: 0.01  
k_{1}: 0.4, k_{2}: 0.95, k_{3}: 0.05; k_{4}: 0.1 
Method  Support  Length  Similarity  Predicted Motif 

AlignACE [34]  N/A  10  N/A  C ATTCCA 
MEME [35]  N/A  11  N/A  C ATTCCCC 
Weeder [36]  N/A  10  N/A  TTTTCT CA 
MOGAMOD [9]  5  14  0.84  C A CTTCCACTAA 
6  14  0.77  C ATTCCTCTAT  
DEPT  5  22  0.854  TAAATCTTTTACTTTTTTTTCT 
6  19  0.842  CTAATTCATTCTTTTTCAA  
7  15  0.847  TTTCT CAAACACA  
MHABBO  6  5  0.85  AAATC* 
2  19  0.82  GAGCAAGAAGCCAATGAAA  
2  10  0.8  TAACCAAGAA*  
3  5  0.93  TTTCT 
Method  Support  Length  Similarity  Predicted Motif 

AlignACE  N/A  11  N/A  CACCCA ACAC 
N/A  12  N/A  T ATT CACT A  
MEME  N/A  11  N/A  CACCCA ACAC 
Weeder  N/A  10  N/A  ACACCCA AC 
MOGAMOD  7  15  0.84  C ACT T CCT 
8  14  0.83  CCA AAAAA C  
8  13  0.85  ACACCCA ACATC  
DEPT  7  20  0.84  TCAATTTTTTTTTTCTATTC 
8  19  0.83  TTATTTTTTTCTCTTTC  
8  15  0.85  CCATATTTCTTCTA  
MHABBO  2  40  0.74  CACTACAATTGCTTTGAGTGGTGTATTCTCAGTCGCCAAG 
3  16  0.75  GGTGTATGTCCTAATA*  
3  34  0.68  AACCAGACAAAC*AAAAGAAAAAAAAAATTAAAAG  
2  31  0.81  AGAACAAAAAAAAAAAAAAAAAAAAAAAAAA 
Method  Support  Length  Similarity  Predicted Motif 

AlignACE  N/A  13  N/A  T T ATAAAAAA 
MEME  N/A  20  N/A  A T TA ATAAAA AAAAAC 
Weeder  N/A  10  N/A  T ATCACT 
MOGAMOD  7  22  0.74  TATCATCCCT CCTA ACACAA 
7  18  0.82  T ACTCT TCCCTA TCT  
10  11  0.74  TTTTTTCACCA  
10  10  0.79  CCCA CTTA  
10  9  0.81  A T TCC  
DEPT  7  22  0.78  A CTTA T CCT ACACA A A 
9  12  0.83  A TCTCA T CC  
10  9  0.85  T A ACTCA  
MHABBO  2  29  0.85  ATCATAGGACCTCCCTTGCTTCCCAATGG 
2  25  0.76  CCTTTTATTGTTCTATT*  
2  13  0.85  AATTAGGAGACAA*  
3  36  0.68  AACAACAAAAGATAAAAAGTCAAATGAATGAACTCA 
Dataset  Predicted Motif 

Dm01g  
Dm04g  
Dm05g  
Hm03r  
Hm04m  
Hm16g  
Mus02r  
Mus07g  
Mus11m  
Yst03m  
Yst04r  
Yst08r 
Table 9.
Average results (precisions (P), recalls (R) and F-scores (F)) of MHABBO on the twelve datasets.
Algorithm  Metric  Dm01g  Dm04g  Dm05g  Hm03r  Hm04m  Hm16g  Mus02r  Mus07g  Mus11m  Yst03m  Yst04r  Yst08r

MHABBO  P  3/100  2/20  4/20  6/20  8/20  5/10  8/20  2/10  8/20  8/20  4/10  5/20
MHABBO  R  3/7  2/9  4/14  6/15  8/10  5/7  8/12  2/4  8/15  8/18  4/7  5/14
MHABBO  F  0.06  0.14  0.24  0.34  0.53  0.59  0.5  0.29  0.46  0.42  0.47  0.29
Table 10.
Comparisons of MHABBO with other methods on the three datasets: average results (precisions (P), recalls (R) and F-scores (F)).
Algorithm for MDP  Metric  Hm03  Mu02  Yst08

YMF [39]  P  0/25  1/12  0/11
YMF [39]  R  0/15  1/12  0/14
YMF [39]  F  0  0.08  0
SeSiMCMC [40]  P  1/10  0/9  0/21
SeSiMCMC [40]  R  1/15  0/12  0/14
SeSiMCMC [40]  F  0.08  0  0
QuickScore [41]  P  0/22  1/22  3/56
QuickScore [41]  R  0/15  1/12  3/14
QuickScore [41]  F  0  0.06  0.08
MITRA [43]  P  0/10  0/9  1/12
MITRA [43]  R  0/15  0/12  1/14
MITRA [43]  F  0  0  0.08
Improbizer [45]  P  1/20  0/18  1/22
Improbizer [45]  R  1/15  0/12  1/14
Improbizer [45]  F  0.06  0  0.06
AlignACE [34]  P  0/14  0/0  9/41
AlignACE [34]  R  0/15  0/12  9/14
AlignACE [34]  F  0  0  0.33
MEME [35]  P  1/12  2/14  6/11
MEME [35]  R  1/15  2/12  6/14
MEME [35]  F  0.074  0.154  0.48
MOTIFSAMPLE [42]  P  0/21  1/18  7/9
MOTIFSAMPLE [42]  R  0/15  1/12  7/14
MOTIFSAMPLE [42]  F  0  0.07  0.61
ANNSPEC [44]  P  0/13  1/32  7/26
ANNSPEC [44]  R  0/15  1/12  7/14
ANNSPEC [44]  F  0  0.05  0.35
MEME3 [35]  P  0/7  0/0  9/17
MEME3 [35]  R  0/15  0/12  9/14
MEME3 [35]  F  0  0  0.58
ABBO/DE/GEN [15]  P  5/30  5/30  8/30
ABBO/DE/GEN [15]  R  5/15  5/12  8/14
ABBO/DE/GEN [15]  F  0.22  0.24  0.36
MHABBO  P  6/20  8/20  5/20
MHABBO  R  6/15  8/12  5/14
MHABBO  F  0.34  0.5  0.29
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).