An Opposition-Based Learning CRO Algorithm for Solving the Shortest Common Supersequence Problem

As a non-deterministic polynomial hard (NP-hard) problem, the shortest common supersequence (SCS) problem is normally solved by heuristic or metaheuristic algorithms. One type of metaheuristic algorithms that has relatively good performance for solving SCS problems is the chemical reaction optimization (CRO) algorithm. Several CRO-based proposals exist; however, they face such problems as unstable molecular population quality, uneven distribution, and local optimum (premature) solutions. To overcome these problems, we propose a new approach for the search mechanism of CRO-based algorithms. It combines the opposition-based learning (OBL) mechanism with the previously studied improved chemical reaction optimization (IMCRO) algorithm. This upgraded version is dubbed OBLIMCRO. In its initialization phase, the opposite population is constructed from a random population based on OBL; then, the initial population is generated by selecting molecules with the lowest potential energy from the random and opposite populations. In the iterative phase, reaction operators create new molecules, where the final population update is performed. Experiments show that the average running time of OBLIMCRO is more than 50% less than the average running time of CRO_SCS and its baseline algorithm, IMCRO, for the desoxyribonucleic acid (DNA) and protein datasets.

Compared to other heuristic algorithms, CRO algorithms have achieved better results in solving the SCS problem. The CRO algorithm is inspired by the process of obtaining low-potential energy molecules from chemical reactions in nature. The algorithm calls the solution of the problem a molecule, which is obtained from an iterative process in which four reaction operators are used to generate new molecules performing local and global search. To solve the SCS problem, Saifullah et al. proposed the CRO_SCS algorithm, a CRO-based proposal with the inclusion of a check and repair function to repair molecules after iterations [11]. The operator checks each character of the generated superstring and records the number of illegal characters. If the number of violations exceeds a certain standard, the string is discarded. If the number of violations is less than a predefined value, it is repaired. Compared to AOC, ABC, IBS and DR, CRO_SCS is able to shorten the average length of SCS and reduce the average running time.
In order to further improve the performance of the CRO algorithm in solving the SCS problem, a new algorithm based on CRO_SCS, the IMCRO algorithm, was proposed in [12]. It introduced the circular shift operator and the two-step crossover operator in the decomposition reaction and the intermolecular collision reaction. Compared to CRO_SCS, the average length of the shortest common superstring calculated by IMCRO is shorter by 1.02 characters on average.
Even though these CRO-based algorithms have shown important performance gains, they can only perform iterative searches based on the current solution. On the one hand, they are affected by the randomness of the initial population, while on the other they face the problems of slow convergence and premature convergence [13]. The oppositionbased learning (OBL) mechanism has arisen as a powerful technique for overcoming these problems. In OBL, the current solution and the reverse solution are examined at the same time during the search process, which improves the diversity of the population and at the same time expands the search range of the algorithm. By doing this the algorithm has more opportunities to escape from the current search area, improving its global search ability. This can effectively solve the premature optimization problem of CRO-based algorithms and achieve convergence in fewer iterative steps, which accelerates the convergence speed [14].
This article introduces the OBL into IMCRO, proposing an opposition-based learning CRO algorithm dubbed OBLIMCRO. The main contribution of this paper is as follows: • Proposing an OBL function to solve the SCS problem by generating an opposite solution corresponding to the original one. When a new solution is produced, its reverse solution will be generated by the OBL function. • Introducing the OBL mechanism into the construction of the population by mixing random molecules and opposite molecules, which can be used to optimize the quality of the population and enrich the diversity of the population. • Proposing an improved CRO-based algorithm with the OBL mechanism in the initialization stage and the iteration stage, resulting in speeding up the convergence and solving the premature trap.
The rest of the paper is organized as follows: related works are summarized in Section 2; the design details of OBLIMCRO are presented in Section 3, where the OBL mechanism for population generation and updating is depicted; to illustrate the performance of OBLIMCRO, an evaluation strategy is defined and the resulting experiments and analysis are shown in Section 3.2.3; finally, the conclusions of the paper are drawn in Section 5.

Related Work
David Maier first defined the SCS problem in 1976 [15], which was proved to be NP-complete for sequences over an alphabet of size five. It has been widely used in bioinformatics and computer science. Different algorithms have been proposed to solve the SCS problem, including heuristic algorithms such as [8], ABC [7], IBS [9], DR [10], and CRO_SCS [11].
The CRO algorithm was originally proposed by Lam and Li in 2010 [16], and was inspired by the process of chemical reactions. In short, it is based on the fact that a chemical reaction undergoes sub-reactions where certain intermediate states correspond to a reaction. The molecule becomes more stable when the molecule's energy is lower than it was in the prior state.
Following the CRO principle, the CRO_SCS algorithm effectively solves the SCS problem. Compared to other heuristic algorithms, the average length of the shortest common superstring obtained by CRO_SCS is smaller under the same conditions. However, as a heuristic algorithm it faces the problems of slower convergence speed and premature maturity. The IMCRO algorithm [12] is an improved algorithm based on CRO_SCS. It introduces the two-step crossover and circular shift operators for the intermolecular collision and the decomposition reactions, respectively. The use of the reaction operator expands the degree of molecular structure change, improving overall algorithm performance. The premature problem is addressed by the algorithm's local search and global search capabilities. Compared to CRO_SCS, the average length of the shortest common superstring output by IMCRO is reduced by 1.02 characters.
OBL is an optimization strategy used in machine learning that was proposed by Tizhoosh in 2005. The main idea of the algorithm is to simultaneously evaluate both the current solution and its opposite solution during the search process. The algorithm then selects the solution closer to the optimal solution from among both the current and opposite solutions. It then performs subsequent searches, reaching the optimal solution in fewer iterations and speeding up the search process.
Meta-heuristic algorithms use OBL to obtain better results in terms of optimizing the population quality and accelerating the speed of convergence. The work presented in [21] combines the harmony search algorithm (HS) and the OBL mechanism to construct the opposition-based learning global harmony search algorithm (OLGHS). The OBL mechanism participates in the establishment of the initial harmony library, and is considered in the initialization process. For forward and reverse solutions, it chooses the proper solution to establish the initial harmony library. This is in order to ensure that the quality of the initial harmony library is appropriate as well as to solve the premature problem by optimizing the quality of the harmony library.
Several novel CRO algorithms combined with OBL have been proposed. Examples include the quasi-opposition learning chemical reaction optimization algorithm (QOCRO) [13], used to solve the reaction energy scheduling problem, and the elite opposition-based learning chemical reaction difference (EOLCRDE) optimization algorithm [22], used to solve the power dispatch problem.
To the best of our knowledge, the OBL mechanism has not previously been used to solve SCS problems. This work introduces OBL in IMCRO (our previous work) to solve the aforementioned problem.

Opposition-Based Learning
The OBL strategy included in IMCRO is intended to generate "high-quality" solutions to accelerate the iteration speed. In each iteration of the heuristic algorithm, when a new solution (e.g., the forward solution) is produced its reverse solution is provided by OBL as well. The forward solution and the reverse solution are compared and the better one joins the population. This selection process is performed before starting the next iteration, and is equivalent to expanding the search range.
Notice that by applying the OBL mechanism the quality of the population is optimized and the diversity of the population is enriched. This results in improving the algorithm's global search ability, allowing it to avoid falling into the local optimum trap.

Oblimcro Framework
The same as IMCRO, OBLIMCRO has three stages, initialization, iteration, and finalization, as shown in Figure 1. The general procedure is described in Algorithm 1, while the stages of the framework are detailed in the following subsections.

Initialization Stage
Initialization is the first stage of OBLIMCRO, where the molecules are initialized. Initial values of molecules include PopSize, KELossRate, MoleColl, buffer, Initial KE, α, and β. Therein, Popsize is the number of all feasible solutions, KELossRate is the maximum percentage of KE reduction, MoleColl is used to determine whether the chemical reaction is inter-molecule or uni-molecule, KE is the initial kinetic energy of the molecule, and α and β are the thresholds for intensification and diversification.
Molecules in the initial population group are generated with OBL. First, the initial molecule group, RX = {x 1 , x 2 , . . . , x n }, is randomly generated; then, its opposite group, OX = {ox 1 , ox 2 , , ox n | ox i = obl(x i )}, is generated with the function obl based on the OBL mechanism. Afterwards, the random and opposite group are mixed as MX, where MX = RX ∪ OX. The individuals in MX are sorted according to their potential energy and the PopSize molecules with the lowest potential energy are selected to form the initial population, denoted as IX. Figure 2 shows an example of the initialization process, where the number of molecules in RX, OX, MX, and IX is PopSize.
The construction of RX, OX, MX, and IX takes place in the initialization stage. RX is constructed with a random inserting operation, as depicted in Figure 3. Therein, the initial superstring C is randomly generated as an array the elements of which are composed of letters in the string alphabet. The element of S i is randomly inserted into C, where S i is one string of the input string set S and 1 ≤ i ≤ PopSize . This random inserting operation is performed on all strings in S to obtain the initial population with PopSize molecules. For further processing, each supersequence is encoded as a set of integer values. For instance, Σ is an alphabet set and Σ = {a, c, g, t}. It is encoded as I = {0, 1, 2, 3}, where each element of Σ is encoded as an integer in I in the corresponding position of the set. Then, the set {0, 1, 2, 3, 2, 1, 3, 0} can represent the supersequence E = {a, c, t, g, t, c, g, a}, as shown in Figure 4.
OX is obtained by the OBL mechanism after the initialization of RX. For each molecule X with n elements in RX, where X = {x 1 , x 2 , . . . , x n }, its opposite molecule OX is obtained using the function obl(), as defined in Formula (1). Therein, the function o(x) obtains the inverse number of the integer x. For a real number x the feasible region of which is [a,b], its inverse number ox is defined in Formula (2). Taking a molecule X = {0, 1, 2, 3, 2, 1, 3, 0} with a feasible region [0,3] as an example, according to Formula (1), the opposite molecule is OX = {3, 2, 1, 0, 1, 2, 0, 3}.
After obtaining RX and OX, MX is obtained by performing the collective union operation on RX and OX, where MX = RX ∪ OX = {x 1 , x 2 , . . . , x n , ox 1 , ox 2 , . . . , ox n } with the scale as 2*PopSize. The molecules are sorted according to the molecular potential energy of the molecules in MX. Half of the molecules in MX with the lowest potential energy are selected to construct IX, which is the initial molecule population.

Iteration Stage
The iteration stage consists of two subtasks, reaction and repair. There are four main operators used in the action subtask: on-wall ineffective collision, decomposition, intermolecular ineffective collision, and synthesis. On-wall ineffective collision is realized by the collision of a single molecule. Then, one element of the molecule is randomly selected and changed to form a new molecule. The operator is used for local search. Decomposition refers to a molecule colliding with the wall and splitting into two or more molecules, and is usually used for global search. Inter-molecular ineffective collision is similar to on-wall ineffective collision. Synthesis is the combination of two molecules into a new molecule, which is equivalent to the inverse reaction of decomposition.
The four operators fall into two types, uni-molecule reaction and inter-molecule reaction. Therein, uni-molecule reactions include on-wall ineffective collision and decomposition, while inter-molecule reactions include inter-molecular ineffective collision and synthesis. The four operators in this stage are the same as those defined in [12].
At the start of each iteration step, a parameter t is randomly generated. This parameter determines whether uni-molecule reactions or inter-molecule reactions are triggered. In particular, uni-molecule reactions are triggered when t > MoleColl; otherwise, inter-molecule reactions are triggered. The specific operator is further determined according to values of α and β. In uni-molecule reactions, decomposition is triggered if (NumHit − MinHit) > α; otherwise, on-wall ineffective collision is triggered. In inter-molecule reactions, synthesis is triggered if KE ≤ β; otherwise, inter-molecular ineffective collision is triggered.
Through the processing of the chemical operators a new molecule, X i , is created based on the molecule X i and an updating process is triggered, as shown in Algorithm 2. First, the opposite molecule, OX i , is created according to Formula (1). The potential energy of X i and OX i are compared and the molecule with lower potential energy is chosen to be returned to the initial population for updating.

Algorithm 2 Update based on OBL.
Input: new molecule X i .
if UX i matches stopping criteria then 10: output UX i ; else 12: put UX i into IX to update the initial group; end if Similar to that in IMCRO, a repair function to check this solution is triggered when a solution is obtained. If the requirement of the problem is satisfied, the obtained solution is repaired by a repair algorithm until the termination condition is met. Then, it enters the final stage, where the best solution is output and returned when the stopping criteria are met; otherwise, iteration is triggered again and the reactions are repeated.

Finalization
At the end of each iteration, termination conditions are checked and a new molecule is obtained through the OBL-based updating process (Algorithm 2). If the conditions are met, either a new molecule is output as the result or the iteration continues. There are two termination conditions. One is that the potential energy of the new molecule is less than the specified threshold, and the other is that the number of iterations reaches a certain upper limit. If either of the two conditions is met, it is deemed to have met the termination condition.

Experiments and Evaluation
In order to evaluate the performance and efficiency of OBLIMCRO for solving the SCS problem, we carried out a set of experiments where OBLIMCRO was compared to IMCRO and CRO_SCS, as well as to heuristic algorithms such as ACO, IBS, and DR.

Configuration of Experiments
The execution environment for the compared algorithms was a Windows-based personal computer with an i5-4210U CPU and 4.0GB RAM. We adopted the same parameters for the experiments as those in IMCRO [12]. In particular, MoleColl was 0.2, KELossRate was 0.6, and PopSize was 20.
The datasets were the same as those used in IMCRO [12], including a random DRM dataset with fifteen instances and a real DRL dataset with eleven instances. There were fifteen DNA sequences in the random DRM dataset, with each instance containing four different types of characters (Σ = 4). There were six DNA sequences and five protein sequences in the real DRL dataset. There were twenty different types of characters (Σ = 20) in each protein sequence.
In the experiment, each compared algorithm was run twenty times with the different datasets. For each instance we obtained 200 different results, including the length of SCS and the execution time. The average length and the average execution time for the SCS were obtained from these 200 results. After repeating the process described above, results such as the average SCS length, represented as L, and the average execution time, represented as T, were obtained. The value of L of the specialized algorithm alg for the instance ins is denoted as L alg (ins). Meanwhile, the value of T of the specialized algorithm alg for the instance ins is denoted as T alg (ins); alg is one of the compared algorithms, e.g., alg ∈ {DR, IBS, ACO, CRO − SCS, I MCRO, OBLI MCRO}, and ins is one type of the datasets, e.g., ins ∈ DRM ∪ DRL.
The maximum number of iterations for the CRO operations and the current potential energy exceeding the threshold were the two stopping criteria for the execution of the compared algorithms. Specifically, the maximum number of iterations was 500 and the threshold of the potential energy was specialized according to the structure of the particular instance. As depicted in Algorithm 1, if one of the two criteria were satisfied, the algorithm was stopped and the final solution was output.

Results and Analysis
The objective of the compared algorithms was to obtain the shortest average SCS length and the minimum average execution time. Tables 1-4 show the results of the algorithms with these two types of datasets. Therein, n is the instance's string number and k is the string length. The emboldened values in each table are the best results.  Table 1 shows that L OBLI MCRO (ins) < L alg (ins), where ins ∈ DRM and alg ∈ {DR, IBS, ACO, CRO − SCS, I MCRO}. Table 2 Table 4 shows that T OBLI MCRO (ins) ≤ T alg (ins), where ins ∈ DRL and alg ∈ {DR, IBS, ACO, CRO_SCS, I MCRO}.
The results shown in Tables 1 and 3 indicate that OBLIMCRO obtains the minimum value in both average SCS length and average execution time for DNA instances in real datasets. However, OBLIMCRO does not considerably overwhelm IMCRO and CRO_SCS for protein instances, while OBLIMCRO, IMCRO and CRO_SCS achieve smaller values of the average SCS length in comparison with the other algorithms. In particular, in comparison with IMCRO OBLIMCRO can reduce L for DNA instances, as shown in Formula (3), where RC is the average reduced value. In DRM, RC with OBLCRO is 2.11, where num = 15. On the other hand, RC with OBLCRO is 2.24 in DRL, where num = 6.
Except for the reduction in average SCS length, the reduction in average running time is better. In the random set, the average running time of the OBLIMCRO algorithm is the lowest among the three CRO algorithms. Compared with the CRO_SCS algorithm, after using the OBLIMCRO algorithm the average running time is shortened by 52.8%. Compared with IMCRO, after using the OBLIMCRO algorithm the average running time is shortened by 50.9%. In the real set, a substantial improvement in the average running time was found. Compared with CRO_SCS, the average running time of OBLIMCRO is shortened by 54.3%, and compared with IMCRO, the running time of OBLIMCRO is shortened by 50.3%. It can be seen that whether using experimental data in a random set or using a real set, after using OBLIMCRO the running time can be shortened by more than half when compared with CRO_SCS and IMCRO. The average reduction in average running time, ∆T, for T can be obtained using Formula (4). (3) To prove the credibility of the results above, we applied a statistical significance test to L OBLI MCRO on L CRO_SCS and L I MCRO using t-tests [23]. The p-value of the t-test was 2.83 × 10 −5 and 0.0013 for the random datasets (Table 1), respectively, and 0.04 and 0.0164 for the real datasets (Table 3), respectively. All of these p-values are smaller than 0.05, which means that the difference in the average SCS length between OBLIMCRO and IMCRO or CRO_SCS is statistically significant. The results of the t-tests statistically verify the credibility of the results, showing that OBLIMCRO can efficiently reduce the average SCS length for most of the datasets in comparison with IMCRO and CRO_SCS.
Next, we applied a statistical significance test to T OBLI MCRO on T CRO_SCS and T I MCRO using t-tests, as well. For the random dataset (Table 1), the p-value of the t-test was 0.12 and 0.13 respectively, while for the real datasets (Table 3) it was 0.021 and 0.024 respectively. As all of these p-values are smaller than 0.05, the difference in the average running time between OBLIMCRO and IMCRO or CRO_SCS is statistically significant. The results of the t-tests statistically verifies the credibility of te results, showing that in comparison with IMCRO and CRO_SCS, OBLIMCRO can efficiently reduce the running time required to resolve the SCS problem.
The number of iterations required to reach the best solution with the CRO algorithms is shown in Tables 5 and 6, and is related to the PE threshold of each instance. For the three CRO algorithms, the number of CRO operations per iteration is less than 500, which is configured as the maximum iterations. This indicates that the three CRO algorithms converge invariantly.

Conclusions
We proposed a novel algorithm by introducing an OBL mechanism into IMCRO to solve the SCS problem. The proposed algorithm, named OBLIMCRO, aims to tackle common problems found in state-of-the-art algorithms for SCS. The OBL mechanism participates in the process of constructing and updating the population, which optimizes the quality of the population that is used to obtain the final solution. Experimental results show that for DNA proteins in both random and real datasets, the average running time is reduced and the average length required to solve the SCS is shortened. The results indicate that with the help of OBL, OBLIMCRO can achieve more optimal solutions with a faster convergence speed.