Idorna: an Interacting Domain-based Tool for Designing Rna-rna Interaction Systems

RNA-RNA interactions play a crucial role in gene regulation in living organisms. They have gained increasing interest in the field of synthetic biology because of their potential applications in medicine and biotechnology. However, few novel regulators based on RNA-RNA interactions with desired structures and functions have been developed due to the challenges of developing design tools. Recently, we proposed a novel tool, called iDoDe, for designing RNA-RNA interacting sequences by first decomposing RNA structures into interacting domains and then designing each domain using a stochastic algorithm. However, iDoDe did not provide an optimal solution because it still lacks a mechanism to optimize the design. In this work, we have further developed the tool by incorporating a genetic algorithm (GA) to find an RNA solution with maximized structural similarity and minimized hybridized RNA energy, and renamed the tool iDoRNA. A set of suitable parameters for the genetic algorithm were determined and found to be a weighting factor of 0.7, a crossover rate of 0.9, a mutation rate of 0.1, and the number of individuals per population set to 8. We demonstrated the performance of iDoRNA in comparison with iDoDe by using six RNA-RNA interaction models. It was found that iDoRNA could efficiently generate all models of interacting RNAs with far more accuracy and required far less computational time than iDoDe. Moreover, we compared the design performance of our tool against existing design tools using forty-four RNA-RNA interaction models. The results showed that the performance of iDoRNA is better than RiboMaker when considering the ensemble defect, the fitness score and computation time usage. However, it appears that iDoRNA is outperformed by NUPACK and RNAiFold 2.0 when considering the ensemble defect. Nevertheless, iDoRNA can still be an useful alternative tool for designing novel RNA-RNA interactions in synthetic biology research. The source code of iDoRNA can be downloaded from the site


Introduction
Currently, RNA-RNA interactions have gained much interest as they play a crucial role in gene regulation for both prokaryotes and eukaryotes [1][2][3][4].Systems that are based on RNA-RNA interactions have been utilized in various applications, including medicine, agriculture and industry.For instance, gene inhibition systems based on the interaction of small interfering RNAs (siRNAs) with their target genes have been widely used for suppression of cancer-related genes as a means for cancer therapy [5,6].A system employing antisense RNA developed for controlling petunia flower color is a good example of an RNA-RNA interaction application in agriculture [7].In addition, a regulatory system based on small RNAs (sRNAs) has been developed in an Escherichia coli (E.coli) system for controlling, tuning and monitoring specific genes involving response to toxins, DNA damage and cell death [8].This programmed bacterium has great potential for industrial applications.With its great potentials, RNA-RNA interactions have already become a topic of interest among synthetic biologists who are exploring and developing new artificial RNA-RNA interaction-based systems with broader applications.
For successful development of RNA-RNA interaction-based regulation systems, computational tools that can help in the design of targeted RNA sequences and structures are needed.Computational design tools can help in minimizing adverse regulation resulting from undesired RNA interactions.Moreover, they can reduce the time spent in laborious experimental studies.So far, the various computational tools that have been developed for designing RNA molecules can be divided into two groups: first, tools that specifically aim to design structure-based RNAs; e.g., RNAInverse [9], RNA-SSD [10], INFO-RNA [11], MODENA [12], NUPACK [13], and RNAiFold [14].Since the functions of the required RNAs are directly related to their structures, the purpose of these design tools is to provide the best RNA sequences that can fold into the target secondary structures given by a user.These structure-based RNAs have been used as gene regulators in response to external signals such as a riboswitch [15,16] and a thermometer-RNA [17] having metabolites and temperature as the external signals, respectively.The other group of computational design tools is principally for designing antisense-based RNAs (i.e., siRNAs) used for suppression of target genes.Thus, these tools take into consideration the context sequences of siRNA and its accessibility to target mRNA.Tools of this type include siDirect [18], DEQOR [19], AsiDesigner [20], DSIR [21], and DEsi [22].Although the abovementioned RNA design tools have been useful, they are limited to designing single RNA molecules which in turn limits their applications.
In the field of synthetic biology, RNA-RNA interaction is used as a foundation for construction of a few biological regulators in synthetic organisms.For example, cis-and trans-encoded RNAs have been employed to create various regulators; e.g., on/off switches [23,24], a comparator [25] and logic gates [26].These devices serve as diverse molecular controllers necessary for the creation of programmable cells.Despite the increasing interest in RNA-RNA interactions, novel devices and systems based on these molecular interactions are still limited in number.The design of novel RNA-RNA interaction-based systems requires careful consideration of conformational changes in RNA structures resulting from intermolecular and intramolecular interactions.The conformational changes in RNA structures before and after their interaction dictate the functions.Thus, a good RNA-RNA interaction design tool should provide reliable sequences of two RNAs, which can form multi-state structures at both folding states before and after RNA-RNA interaction [24,27]."Multi-state structures" refers to two RNA changeable molecules that can fold into three distinct structures; i.e., two independent structures of two RNA molecules that further interact to form a third independent hybridized RNA molecule structure.Nevertheless, only a few RNA tools such as NUPACK [13], RNAiFold [14,28], RiboMaker [29] and iDoDe [30] are able to cope with this challenge.NUPACK is a remarkable design tool that provides nucleotide sequences for supporting the inverse folding problem.The tool uses a substructure decomposition strategy for dividing the target structures and finding a proper sequence by minimizing the ensemble defects [13].Although, this tool is able to design multi-state target structures that support the RNA-RNA interaction design problem, it still requires a user-assigned substructure for initialization.RNAiFold is another outstanding tool that uses constraint programming (CP), and a large neighborhood search (LNS) to design hybridized RNA structures.The design method attempts to find optimal sequences of two RNA molecules whose MFE hybridized structure is identical to a target hybridized structure.However, RNAiFold does not support multi-state structures [14,28].Recently, RiboMaker has been developed for providing RNA sequences of two molecules corresponding to multi-state target structures [26,29].This tool applies the evolutionary algorithm of Monte Carlo Simulated Annealing (MCSA) to optimize a thermodynamic function in order to design conformation-based riboregulation; i.e., bacterial small RNAs (sRNAs) in the 5 1 -untranslated region of its target mRNA.At the same time, we proposed another design tool, namely interacting domain-based design (iDoDe), that provides RNA sequences with a set of intermolecular and intramolecular folded structures [30].The iDoDe program applies a domain-based decomposition and a stochastic algorithm proposed by Zhang [31] to generate suitable RNA-RNA interactions.We showed that iDoDe was able to reduce the search space and unwanted interactions, and provided RNA sequences which can fold into a given structure for simple RNA-RNA interactions without any unwanted interactions.However, iDoDe falls short in satisfying the design of complex RNA-RNA interactions, mainly due to its stochastic approach.To overcome this limitation, an optimization algorithm had to be incorporated into the iDoDe algorithm to help efficiently design these RNA-RNA interactions.
Several optimization methods have been applied to the field of molecular biology including expectation maximization [32][33][34], simulated annealing [35][36][37], machine learning [38,39], linear programming [40][41][42], and the genetic algorithm (GA) [43][44][45][46][47].Among them, the GA is a well-known heuristic and evolutionary algorithm used in many RNA-based tools to solve diverse RNA problems.Computationally, the GA evolves solutions of a population through a recurrent searching process of genetic operations including selection, crossover, and mutation.Each RNA solution or individual is assigned a fitness value indicating its goodness, which serves as an objective function.For example, in the RNA alignment tools, the GA seeks and clusters similar mRNAs in a genome through repeatedly determining their similarity and distance [46,48].On the other hand, some RNA structure prediction tools use the GA to find a near optimal secondary structure by minimizing the free energy of RNA structures [47,49].In the case of RNA design tools, the GA was also used to find near optimal RNA sequences corresponding to user's requirement [12,50].Consequently, with the help of the GA, it is possible to gain a near optimal solution of a given RNA-RNA interaction.
Using a genetic algorithm, we have developed an improved tool called iDoRNA that helps to design interacting domain-based RNA-RNA interactions.The iDoRNA program can provide a near optimal RNA-RNA interaction set corresponding to the given requirements.In this article, we show that iDoRNA can efficiently design various sets of RNA-RNA interaction systems; e.g., S1-A1, and crRNA-taRNA.In comparison with iDoDe, iDoRNA performs much better in terms of both accuracy and computational time usage.When compared with the benchmark tools; i.e., NUPACK, RNAiFold 2.0 and RiboMaker, iDoRNA performs better than RiboMaker, but is outperformed by NUPACK and RNAiFold 2.0.

Description of the Algorithm of iDoRNA
The iDoRNA algorithm developed in this work combines our previous algorithm used in iDoDe [30] and a genetic algorithm (GA) [51] to help design a near optimal RNA individual for a given RNA-RNA interaction.The workflows of iDoDe and iDoRNA are shown in Figure 1a,b.In this section, we describe the details of the iDoRNA algorithm.

Representation of an RNA Individual
The representation of an RNA individual is illustrated in Figure 2.An RNA individual is a set of two single-stranded RNAs (R1, R2) that can fold into themselves and bind to each other to form a hybridized RNA (HR) possessing a specific structure and function.R1, R2 and HR are strings of nucleotide sequences; i.e., R1.seq, R2.seq, and HR.seq, respectively.There are five properties used to characterize an RNA individual including (i) minimum free energy RNA structure, (ii) Hamming distance, (iii) similarity score, (iv) stability score and (v) fitness score.In iDoRNA, a population size (n) of the individuals is specified for GA optimization, and thus the number of total RNA individual is n per each population.

Initialization
To design an RNA-RNA interaction system with iDoRNA, a user must provide the desired lengths and secondary structures of R1, R2 and HR that make up the system.The user then provides additional constraints regarding nucleotide sequences on specific regions such as start codons and ribosome binding sites (RBS) by assigning uppercase letters of IUPAC codes to the constrained regions while leaving non-specified bases with the letter "N".To represent the secondary structure of each RNA, a dot-bracket notation is used, where "." refers to an unpaired base, "(" and ")" refer to paired bases and "&" is a separator between R1 and R2 within the structure representation of the HR.
With the given user's input information, iDoRNA begins to design the sequences of R1, R2 and HR by first generating an initial population of n individuals using the initialization method of iDoDe as briefly described below.The details of the initialization method are also described in the Supplementary Information.

Initialization
To design an RNA-RNA interaction system with iDoRNA, a user must provide the desired lengths and secondary structures of R1, R2 and HR that make up the system.The user then provides additional constraints regarding nucleotide sequences on specific regions such as start codons and ribosome binding sites (RBS) by assigning uppercase letters of IUPAC codes to the constrained regions while leaving non-specified bases with the letter "N".To represent the secondary structure of each RNA, a dot-bracket notation is used, where "." refers to an unpaired base, "(" and ")" refer to paired bases and "&" is a separator between R1 and R2 within the structure representation of the HR.
With the given user's input information, iDoRNA begins to design the sequences of R1, R2 and HR by first generating an initial population of n individuals using the initialization method of iDoDe as briefly described below.The details of the initialization method are also described in the Supplementary Information.

Initialization
To design an RNA-RNA interaction system with iDoRNA, a user must provide the desired lengths and secondary structures of R1, R2 and HR that make up the system.The user then provides additional constraints regarding nucleotide sequences on specific regions such as start codons and ribosome binding sites (RBS) by assigning uppercase letters of IUPAC codes to the constrained regions while leaving non-specified bases with the letter "N".To represent the secondary structure of each RNA, a dot-bracket notation is used, where "." refers to an unpaired base, "(" and ")" refer to paired bases and "&" is a separator between R1 and R2 within the structure representation of the HR.
With the given user's input information, iDoRNA begins to design the sequences of R1, R2 and HR by first generating an initial population of n individuals using the initialization method of iDoDe as briefly described below.The details of the initialization method are also described in the Supplementary Information.
Step 1: The target structures of R1, R2, and HR provided by the user are first divided into a set of domains (D), defined as a substructure representing either interacting or non-interacting segments.A set of domains consists of D 1 , D 2 , . . ., D m , where the subscript m is the number of domains.The details and the examples of the interacting domain-based decomposition are described in the Supplementary Method and the Supplementary Figure S1, respectively.Each domain D is a string of bases: r 1 , r 2 , . . ., r , where r {a, c, g, u, A, C, G, U} and the subscript is the domain length.
Step 2: In this step, all non-specified bases originally assigned by the user as "N" in all domains are randomly replaced with lowercase RNA bases (a, c, g, or u) by using the Domain-based Design (DD) algorithm [31].
Step 3: All domains (D 1 to D m ) are concatenated into the 3 RNA strands; R1, R2, and HR, indicated by R1.seq, R2.seq, and HR.seq, respectively.Each concatenated single-stranded RNA (R1.seq or R2.seq) is a string of L bases: r 1 , r 2 , . . ., r L in the 5'-3' direction, where the subscript L is the RNA length.On the other hand, the concatenated hybridized RNA strand (HR.seq) is a combined string of R1.seq and R2.seq separated by the symbol "&".This set of R1, R2 and HR is referred to the first RNA individual.
Step 4: Steps 2 and 3 are repeated n times to generate concatenated RNA strands of n individuals.

Evaluation
Each RNA individual is evaluated for its five properties mentioned in Section 2.1.1.The definition and the calculation of each property are given below:

Minimum Free Energy RNA Structure
A minimum free energy RNA structure is a secondary structure with the minimal free energy (MFE) of a designed RNA sequence represented by a dot-bracket notation.Given R1.seq, R2.seq, and HR.seq as input, the MFE secondary structures of R1, R2, and HR are predicted using the Vienna RNA package [52].Within this package, RNAfold is used to predict the structures of R1 and R2 (R1.str and R2.str), and RNAduplex is used to predict the structure of HR (HR.str).Additionally, the value of minimal free energy of the HR structure (MFE hr ) is further used to calculate the stability of a designed RNA individual.

Hamming Distance
In iDoRNA, the Hamming distance is applied to measure the difference in the structures of a designed individual with the target structures specified by the user.It is determined by the summation of distinct positions of the given individuals relative to the target.A position with a distinct structure is assigned a value 1; otherwise it is assigned a value 0. For instance, the Hamming distance of the hybridized RNA shown below has a value of 4 because there are four distinct positions between the required and designed structures as indicated by the red positions.For each RNA individual, there are three values of the Hamming distances: R1.hd, R2.hd, and HR.hd.

Similarity Score
The similarity score (f sim ) is the value that indicates how similar the structures of a designed RNA individual is to the target structures.The f sim is calculated by Equation (1): where HD i is the hamming distance of the i-th RNA and L i indicates the length of the i-th RNA.Note that f sim has a value ranging from 0 to 1; where 0 indicates a poorly designed structure while 1 indicates a well designed structure of an RNA individual.

Stability Score
The stability score (f sta ) is the relative stability of an RNA individual.In this study, f sta was calculated based on the ratio of hybridization energy of an MFE structure of a designed HR to the reference energy.Previous studies have shown that the hybridization energy of hybridized RNAs has a highly negative correlation with biological activity [24].Thus, in this study, the hybridization energy of hybridized RNA structure was used to represent the stability property of designed RNAs in our tool.In general, a lower hybridization energy would lead to a more stable hybridized RNA, which is more biologically active.Nevertheless, too low a hybridization energy could lead to a hybridized RNA with too high a rigidity in the resulting structure and a consequent lack of the necessary conformational dynamic properties, which in turn lead to a poor biological activity [53].To avoid the hybridization energy from being too low during energy minimization, we set the reference hybridization energy (MFE re f ) as the target energy.Since the structural rigidity of a RNA duplex depends on its %GC content, we set the target energy as a function of %GC content and length of the RNA duplex, which can be defined by users.Throughout of this work, we set the %GC content at 50, which is an average content in Escherichia coli genome [54].Thus, the calculation of the stability score (f sta ) is performed using the following mathematical expressions: where MFE GC,100% and MFE GC,0% are the hybridization energy of the duplex structure in the required HR at 100% and 0% GC content, respectively.These values represent the highest and the lowest stability of a given length of required HR structure.These 2 energies were obtained from Equations ( 4) and ( 5), respectively: MFE GC,0% " ´0.89pN bp q `5.02 where N bp is the total number of paired bases (or HR length) of the required HR structure.All constants were obtained from linear regression of relationship between calculated MFE (by RNAcofold [52]) of double-stranded segments and their length (N bp ) for the case of 100% and 0% GC contents.Note that f sta has the value from 0 to 1; 0 indicates the least stable while 1 indicates the highest stable structure of an RNA individual when compared with 50% GC-containing RNA duplex of the same size.Thus, an RNA individual with f sta near 1 would have a high hybridization energy and thus high biological activity.

Fitness Score
To realize the objective design, iDoRNA simultaneously optimizes two RNA properties, similarity and stability of designed RNA structures, by maximizing the fitness score (F), represented by f sim and f sta .The fitness score is expressed as shown in Equation ( 6): F " pω Sim ˆfsim `p1 ´ωSim q ˆfsta q (6) where ω Sim is a weighting factor of similarity.By this expression, F ranges from 0 to 1, where 1 indicates a perfectly designed RNA individual.In iDoRNA, F is maximized during the GA optimization.

Reproduction
Reproduction is the module that determines an optimal individual based on GA which follows a step of natural selection; i.e., crossover and mutation.To find an optimal RNA individual, new RNA individuals are repeatedly rebuilt by genetic operations for improvement of designed RNAs.The steps of reproduction based on GA are summarized in Figure 3, and described in details as follows.

Reproduction
Reproduction is the module that determines an optimal individual based on GA which follows a step of natural selection; i.e., crossover and mutation.To find an optimal RNA individual, new RNA individuals are repeatedly rebuilt by genetic operations for improvement of designed RNAs.The steps of reproduction based on GA are summarized in Figure 3, and described in details as follows.

Selection Step
The number of individuals (n) obtained from the mutation step (called new children) are first combined with the n individuals of a previous population.The 2n individuals of the combined population are ranked in descending order of the fitness score by the merge sort algorithm [55].The top n individuals are used as a parent pool.Note that for the first generation, the initial population of n individuals obtained from the initiation step is directly used as the parent pool.Next, the n-parents (or n/2-parent pairs) are selected from the parent pool using a probabilistic function, defined as the fractional fitness of the parent.This probability of a parent is calculated by the ratio of the fitness score of the parent to the sum of the finest scores of all parents.Then, pseudo-random numbers (Nrand) in the range of [0, 1] are generated one at a time, and used for choosing n parents for the crossover.If an Nrand value is in a probability space of a parent, such a parent is selected.Thus, the chance of a parent being selected is directly proportional to its fractional fitness.Note that a parent pair must not include the same parent meaning parent 1 cannot pair with parent 1.

Crossover Step
Having selected n-parents, the sequences of R1 and R2 in each parent are improved by the recombination of corresponding RNA domains (D) between each parent pair.To choose which D is to be recombined, we set up a crossover rate (Cr) as a parameter for recombination acceptance ranging from 0 to 1. Repeatedly, Nrand is randomly generated one at every D of an individual and compared to Cr.A given D is accepted for a recombination between a parent pair when Nrand is less than Cr.Thus, a crossover operation frequently occurs if Cr is set high, and vice versa.If a D is accepted for recombination, a position on the D (between r1 and rl) is randomly generated to choose a recombination point.An illustration of crossing over of a parent consisting three RNA domains (D1-D3) with different l lengths is shown below.It can be seen that the recombination only occurs on D1 and D3 because the Nrand for each domain is less than a given Cr.After the crossover is finished, two new individuals called child 1 and child 2 are obtained.

Selection Step
The number of individuals (n) obtained from the mutation step (called new children) are first combined with the n individuals of a previous population.The 2n individuals of the combined population are ranked in descending order of the fitness score by the merge sort algorithm [55].The top n individuals are used as a parent pool.Note that for the first generation, the initial population of n individuals obtained from the initiation step is directly used as the parent pool.Next, the n-parents (or n/2-parent pairs) are selected from the parent pool using a probabilistic function, defined as the fractional fitness of the parent.This probability of a parent is calculated by the ratio of the fitness score of the parent to the sum of the finest scores of all parents.Then, pseudo-random numbers (N rand ) in the range of [0,1] are generated one at a time, and used for choosing n parents for the crossover.If an N rand value is in a probability space of a parent, such a parent is selected.Thus, the chance of a parent being selected is directly proportional to its fractional fitness.Note that a parent pair must not include the same parent meaning parent 1 cannot pair with parent 1.

Crossover Step
Having selected n-parents, the sequences of R1 and R2 in each parent are improved by the recombination of corresponding RNA domains (D) between each parent pair.To choose which D is to be recombined, we set up a crossover rate (Cr) as a parameter for recombination acceptance ranging from 0 to 1. Repeatedly, N rand is randomly generated one at every D of an individual and compared to Cr.A given D is accepted for a recombination between a parent pair when N rand is less than Cr.Thus, a crossover operation frequently occurs if Cr is set high, and vice versa.If a D is accepted for recombination, a position on the D (between r 1 and r l ) is randomly generated to choose a recombination point.An illustration of crossing over of a parent consisting three RNA domains (D1-D3) with different l lengths is shown below.It can be seen that the recombination only occurs on D1 and D3 because the N rand for each domain is less than a given Cr.After the crossover is finished, two new individuals called child 1 and child 2 are obtained.

Mutation
Step n-Children obtained from the crossover step are subjected to mutation for further improvement of R1 and R2 sequences.In this step, for every single base position of R1 and R2 in each child, a Nrand is generated and used to pick a point of mutation by comparing it with a mutation rate (Mr).Mr is defined as a parameter for mutation acceptance, which is in the range of 0 to 1.If the Nrand value of a base position is less than a given Mr, then such a base is to be mutated by altering it to one of the other possible bases.Once, the mutation step is finished, a new set of n-children is obtained.These children are subsequently ranked in descending order according to fitness scores, and used as a new population for further improvement in the next iteration.

Termination Criteria
In iDoRNA, there are two termination criteria: (i) the highest value of the fitness score of a population is unchanged for 30 consecutive runs and (ii) the generation number reaches a defined maximum number of iterations (Nterm).In this work, we set Nterm to 1000 generations.Therefore, iDoRNA is terminated when one of the two criteria is met, and then provides a best predicted (or near optimal) RNA individual as output.

Parameter Optimization
Important parameters involving the genetic algorithm were investigated to see their effects on the design performances of iDoRNA.These parameters include weighting factor of similarity (sim), population size (n), crossover rate (Cr), and mutation rate (Mr).An RNA-RNA interaction system of crR12-taR12 [23] was used as a target design.The optimal value of each parameter is chosen based on the fitness score and computational time.

Performance Assessment
For performance assessment, forty-four different RNA-RNA interaction models with different levels of structural complexity were used (Supplementary Data S1).These models can be divided into two groups; actual RNAs and artificial RNAs.Actual RNAs (models N01-N07) are the RNA structures and specific sequences of a pair of RNAs that function in natural systems.On the other hand, artificial RNAs (models A01-A37) are the RNA structures that we made up.
For the artificial models, models A01-A04 are simple models taken from Thaiprasit et al. [30].They are simple RNA structures consisting of linear segments and hairpin-loops structures (R1 and R2 in Supplementary Data S1 and S2) in the intramolecular folding state and different hybridized

Mutation
Step n-Children obtained from the crossover step are subjected to mutation for further improvement of R1 and R2 sequences.In this step, for every single base position of R1 and R2 in each child, a N rand is generated and used to pick a point of mutation by comparing it with a mutation rate (Mr).Mr is defined as a parameter for mutation acceptance, which is in the range of 0 to 1.If the N rand value of a base position is less than a given Mr, then such a base is to be mutated by altering it to one of the other possible bases.Once, the mutation step is finished, a new set of n-children is obtained.These children are subsequently ranked in descending order according to fitness scores, and used as a new population for further improvement in the next iteration.

Termination Criteria
In iDoRNA, there are two termination criteria: (i) the highest value of the fitness score of a population is unchanged for 30 consecutive runs and (ii) the generation number reaches a defined maximum number of iterations (N term ).In this work, we set N term to 1000 generations.Therefore, iDoRNA is terminated when one of the two criteria is met, and then provides a best predicted (or near optimal) RNA individual as output.

Parameter Optimization
Important parameters involving the genetic algorithm were investigated to see their effects on the design performances of iDoRNA.These parameters include weighting factor of similarity (ω sim ), population size (n), crossover rate (Cr), and mutation rate (Mr).An RNA-RNA interaction system of crR12-taR12 [23] was used as a target design.The optimal value of each parameter is chosen based on the fitness score and computational time.

Performance Assessment
For performance assessment, forty-four different RNA-RNA interaction models with different levels of structural complexity were used (Supplementary Data S1).These models can be divided into two groups; actual RNAs and artificial RNAs.Actual RNAs (models N01-N07) are the RNA structures and specific sequences of a pair of RNAs that function in natural systems.On the other hand, artificial RNAs (models A01-A37) are the RNA structures that we made up.
For the artificial models, models A01-A04 are simple models taken from Thaiprasit et al. [30].They are simple RNA structures consisting of linear segments and hairpin-loops structures (R1 and R2 in Supplementary Data S1 and S2) in the intramolecular folding state and different hybridized structures (HR in Supplementary Data S1 and S2) at the intermolecular folding state, where all R1 molecules of models A01-A04 are linear segments having a primary RNA structure with increasing lengths (10 to 15 nt), whereas R2 molecules represent either a primary structure (i.e., linear segment in model A01) or a secondary structure (i.e., hairpin-loops in models A02-A04) in the size of 10 to 30 bp.At the intermolecular folding state, interactions between R1 and R2 molecules give hybridized structures of double-stranded RNA, except for models A03-A04 that also exhibit hairpin-loop structure in R2.
Unlike models A01-A04, models A05-A37 were built from the set of benchmarking single stranded RNAs given by Garcia et al. [14,28].These single stranded RNAs are regulatory RNAs that are able to function in Nature.In this study, we created the thirty-three artificial models of RNA-RNA interaction from the set of single stranded RNA molecules as follows.Firstly, sequences and dot-bracket structures of all benchmarking set available on RNAiFold [14] web server were downloaded.Secondly, thirty-three sets whose lengths are less than 250 nucleotides were selected for providing an input of RNA-RNA interaction model.Thirdly, an RNA strand of each set was divided into two strands using the following criteria: (1) if the number of paired bases of such RNA is less than 15 pairs, the RNA strand was split at the position that yields the highest number of pair bases of HR structure, (2) if the number of paired bases of such RNA is more than 15 pairs, the RNA strand was divided equally.Fourthly, dot-bracket structures of all individual RNAs and hybridized RNA were determined by using RNAfold and RNAcofold of Vienna RNA package 2.1.8[52], and used as artificial test models (Supplementary Data S1).
For the tool assessment, iDoRNA was first compared its performance with iDoDe, and then against three benchmark tools; i.e., NUPACK [13], RNAiFold 2.0 [28] and RiboMaker [29].Models N01-N02 and A01-A04 were used for the first comparison with 30 independent trials, while all forty-four test models were used to perform the latter with 10 independent trials.The best individuals were analyzed using ensemble defect, fitness scores and computational time usage as criteria.When comparing with iDoRNA, iDoDe randomly designs each RNA model using the same computational time required in iDoRNA.The RNA individuals with the highest fitness score was compared with the optimal score of the RNA individual suggested by iDoRNA.In addition, the success rate, which is defined as the ratio of RNA individuals with perfect design (F = 1) to the total number of the designed RNA individuals of both methods were compared to assess the performance.For the comparison of iDoRNA with the benchmarks, the following procedure was used.
For NUPACK and RNAiFold 2.0, the tools were performed on design option that allows only one target structure of either individual RNA or hybridized RNA.In this study, targets of design for NUPACK and RNAiFold 2.0 were prepared from sequences and secondary structure of hybridized RNA.For design RNA-RNA interaction using RiboMaker, the same input target structures and constrained sequences used for iDoRNA were used as input.The default parameters; iteration numbers of 5000 and Monte Carlo temperature of 1 were used.In this work, all of the designs were performed on a machine with an Intel ® Core™ i7-4790 Processor 1600 MHz and 16 GB RAM.

Results and Discussion
In this study, we have incorporated a genetic algorithm (GA) into our previous RNA tool (iDoDe) to help optimally design a given RNA-RNA interaction system.This tool is renamed to iDoRNA, which stands for interacting domain-based design tool of RNA-RNA interaction.Herein, we present the effects of the parameters on design performance and, later, compare the performance of iDoRNA with that of iDoDe and benchmark iDoRNA against three other available design tools: i.e., NUPACK, RNAiFold 2.0 and RiboMaker.

Suitable Parameters for iDoRNA
To optimize the performance of iDoRNA, we studied the sensitivity of some parameters to the tools' performances in designing an RNA-RNA interaction system.These parameters include weighting factor (ω sim ), crossover (Cr) and mutation (Mr) rates, and population size (n).To maintain the computational time and the existing complex structure, the RNA-RNA interaction system used for this test is the crRNA-taRNA system (model N02), which contains basic structural elements: unpaired nucleotide, paired base, hairpin loop, bulge, and internal loop.

Effect of the Weighting Factor of Similarity
To study the sensitivity of the weighting factor of similarity on the iDoRNA performance, we set the values of ω sim at 0.3, 0.5, 0.7, and 1.0 while keeping the values of Cr, Mr, and n constant at 0.9, 0.1 and 10, respectively.The performance was determined by observing the similarity and stability scores of the RNA individuals at various generations (Figure 4).At the generation zero, the properties of the initial RNA individuals generated directly from the iDoDe algorithm are fairly diverse, with the similarity score ranging from 0.65 to 0.96, and the stability scores ranging from 0.7 to 1 (Figure 4a).It is noteworthy that the best individual at this initial population was found to possess the similarity and stability scores of 0.906 and 0.965, respectively.After these sets of individuals have gone through several rounds of natural selection with the genetic algorithm, the properties of all individuals have been improved with the number of generations (Figure 4b-d).These improvements can be observed by the movement of the RNA individuals' properties towards the upper right-hand corner of the plots where both the similarity and stability scores are high, except for the case where the weighting factor equals to 1.It can also be seen that the properties of the RNA individuals designed with ω sim at 0.3, 0.5, and 0.7 are similar.However, the ω sim of 0.7 was chosen as the suitable parameter for further study because it provides a relatively faster convergence and better overall properties.for this test is the crRNA-taRNA system (model N02), which contains basic structural elements: unpaired nucleotide, paired base, hairpin loop, bulge, and internal loop.

Effect of the Weighting Factor of Similarity
To study the sensitivity of the weighting factor of similarity on the iDoRNA performance, we set the values of sim at 0.3, 0.5, 0.7, and 1.0 while keeping the values of Cr, Mr, and n constant at 0.9, 0.1 and 10, respectively.The performance was determined by observing the similarity and stability scores of the RNA individuals at various generations (Figure 4).At the generation zero, the properties of the initial RNA individuals generated directly from the iDoDe algorithm are fairly diverse, with the similarity score ranging from 0.65 to 0.96, and the stability scores ranging from 0.7 to 1 (Figure 4a).It is noteworthy that the best individual at this initial population was found to possess the similarity and stability scores of 0.906 and 0.965, respectively.After these sets of individuals have gone through several rounds of natural selection with the genetic algorithm, the properties of all individuals have been improved with the number of generations (Figure 4b-d).These improvements can be observed by the movement of the RNA individuals' properties towards the upper right-hand corner of the plots where both the similarity and stability scores are high, except for the case where the weighting factor equals to 1.It can also be seen that the properties of the RNA individuals designed with sim at 0.3, 0.5, and 0.7 are similar.However, the sim of 0.7 was chosen as the suitable parameter for further study because it provides a relatively faster convergence and better overall properties.

Effect of the Crossover and the Mutation Rates
To study the sensitivity of the crossover and the mutation rate parameters on the iDoRNA performance, we first varied the values of Cr from 0.1 to 0.9 while keeping Mr constant at 0.1 and the population size of 10.The plots of the fitness score versus the generation indicated that Cr of greater

Effect of the Crossover and the Mutation Rates
To study the sensitivity of the crossover and the mutation rate parameters on the iDoRNA performance, we first varied the values of Cr from 0.1 to 0.9 while keeping Mr constant at 0.1 and the population size of 10.The plots of the fitness score versus the generation indicated that Cr of greater than 0.5 appeared to give fast convergence (Figure 5a).The Cr of 0.9 was chosen for further study, as it requires the least number of generations to reach the optimum.Conversely, when we increased the values of Mr from 0.1 to 0.9 while keeping the Cr constant at 0.9, the required number of generations to achieve the optimum was increased (Figure 5b).Thus, the most suitable value of the mutation rate was 0.1.Notably, the sensitivity of the design performance to the mutation rate is more than that to the crossover rate, indicating that a slight change of the mutation rate would affect the tool's performance greatly, so this parameter should be considered carefully when performing a design of a more complex RNA-RNA interaction system.5a).The Cr of 0.9 was chosen for further study, as it requires the least number of generations to reach the optimum.Conversely, when we increased the values of Mr from 0.1 to 0.9 while keeping the Cr constant at 0.9, the required number of generations to achieve the optimum was increased (Figure 5b).Thus, the most suitable value of the mutation rate was 0.1.Notably, the sensitivity of the design performance to the mutation rate is more than that to the crossover rate, indicating that a slight change of the mutation rate would affect the tool's performance greatly, so this parameter should be considered carefully when performing a design of a more complex RNA-RNA interaction system.

Effect of the Population Size
To see how the population size affects the performance of iDoRNA, we varied the population sizes from 4 to 100 RNA individuals per population.For each population size, ten independent runs were performed, and the optimal RNA solutions and the averaged computational time were compared (Table 1).It can be seen that both the optimum fitness and the required computational time increase as n is increased.However, the optimum fitness reached its maximum at 0.996 when n was equal to or greater than 8. Thus, the population size of eight RNA individuals was preferable since it required the least computational time while still providing the maximum fitness.In summary, optimum values of the weighting factor, the crossover rate, the mutation rate and the population size were 0.7, 0.9, 0.1, and 8, respectively.These values was then used in the subsequent assessment of the iDoRNA tool.

Design Performance of iDoRNA
To elucidate the design ability of iDoRNA, we employed the program to design six artificial RNA-RNA interaction models with increasing structural complexity.Thirty independent runs were performed for each model using the suitable set of GA parameters obtained from the previous step.

Effect of the Population Size
To see how the population size affects the performance of iDoRNA, we varied the population sizes from 4 to 100 RNA individuals per population.For each population size, ten independent runs were performed, and the optimal RNA solutions and the averaged computational time were compared (Table 1).It can be seen that both the optimum fitness and the required computational time increase as n is increased.However, the optimum fitness reached its maximum at 0.996 when n was equal to or greater than 8. Thus, the population size of eight RNA individuals was preferable since it required the least computational time while still providing the maximum fitness.In summary, optimum values of the weighting factor, the crossover rate, the mutation rate and the population size were 0.7, 0.9, 0.1, and 8, respectively.These values was then used in the subsequent assessment of the iDoRNA tool.* The values shown are the average and its standard deviation from 10 independent runs.

Design Performance of iDoRNA
To elucidate the design ability of iDoRNA, we employed the program to design six artificial RNA-RNA interaction models with increasing structural complexity.Thirty independent runs were performed for each model using the suitable set of GA parameters obtained from the previous step.Table 2 shows that iDoRNA was able to perfectly design models A01-04 as seen by the fitness score of 1.However, for models N01 and N02 that are more complex in structure, our tool was able to design them with the fitness score of 0.994 and 0.996, respectively.When carefully considering the secondary structures of all 30 optimal RNAs of models N01 and N02 obtained by the tool, it was found that the design errors causing imperfect fitness occurred at the internal loops of the hybridized RNAs.While the internal loop of model N01 was found to contain an extra base within the internal loop, model N02 had a missing internal loop.Thus, this design flaw should be subjected to further improvement of the tool such as the addition of a repair step as a part of the GA.Although iDoRNA did not provide a perfect design for models N01 and N02, the fitness scores were still very high, ensuring the highly stable structures, very similar in structure to their respective target models.Regarding the computational time, it should be noted that the time required to reach an optimum increases with the increasing complexity of the RNA-RNA interaction system.While requiring less than 7 s for designing models A01-A04, the tool needed about 22 and 25 s to provide the optimal designs for models N01 and N02, respectively.This is reasonable because the computational time required in the reproduction step is linearly dependent on the lengths of RNAs as well as the number of interacting domains.In conclusion, iDoRNA can efficiently design all 6 RNA-RNA interaction models with high fitness in short time.

Comparison of the Design Performance between iDoDe and iDoRNA
The design performances of iDoRNA was compared with iDoDe to see how much the GA helped increasing the performance in term of the designing accuracy as well as the computational time usage (Table 3).The six artificial RNA-RNA interaction models were used as the targets.For a fair comparison, we compared the design accuracy of the RNA solutions obtained from the two tools with the same computational time.In doing so, we used the total time required to run iDoRNA for each RNA model as the termination criterion for the iDoDerun.The values of the GA parameters used were the same as those in the previous section.The design results are summarized in Table 3.It is that iDoRNA performed much better than iDoDe for all test models.The optimal RNA individuals provided by iDoRNA for all models were almost ideal, and only failed to reach the fitness score of 1 for model N02.The success rate also indicates the design power of iDoRNA against iDoDe.It should be noted is that while iDoRNA can provide a perfect design with a 100% success rate for five out of the six models, iDoDe does not provide a perfect solution for models A03-A04 and N01-N02, mainly due to the random nature of the iDoDe design.Apparently, the RNA solutions obtained from iDoDe did not have maximum fitness, thus they might yield unstable and dissimilar structures with their target RNA-RNA interaction, especially, at the required bulges and internal-loops of HR.Additionally, we repeatedly found was unwanted hairpin-loops on linear segments in the cases of models N01 and N02.Therefore, with incorporation of GA in iDoRNA, we could overcome these problems of iDoDe.Furthermore, instead of fixing the time, we attempted to run iDoDe for 1000 trials to see if we could obtain a successful design for each model.It was clearly demonstrated that iDoDe still fails to provide a perfect design for models A03-A04 and N01-N02 (Table 4).It can be concluded that iDoRNA can give more accurate RNA solutions in a shorter time for any RNA-RNA interaction based system.Thus, this shall be a promising tool for designing RNA-RNA interaction systems for synthetic biology applications.structures and graphical structure by different visualization methods (NUPACK by NUPACK [13], RNAiFold 2.0 and RiboMaker by VARNA [61], and iDoRNA by RNAfold 2.0 and RNAcofold [52]).

Design Method
During the initialization step, NUPACK, RNAiFold 2.0, and iDoRNA separate the target structures into substructures by different decomposition methods before sequence generation.On the contrary, RiboMaker generates the random sequences of all structures without a decomposition step.Among optimization methods, only constraint-based programing (CP) used in RNAiFold 2.0 is a non-heuristic method which designs sequences based on defined constraints, whereas the other methods are based on a heuristic method.All the tools, except for NUPACK which uses its own prediction method, use additional methods from Vienna RNA package [52] for predicting a possible structure of the designed sequences.
To benchmark our tool with the other existing tools, forty-four test models containing different lengths and characteristics of secondary structural elements (Supplementary Data S1) were designed.Among these test models, N01-N07 are models built from the reported structures of previous laboratory experiments, which represent actual RNA-RNA interaction; A01-A04 are simple artificial models representing the classical secondary structural elements: unpaired nucleotides, paired bases, and hairpin loops; and A05-A37 are the artificial models built from structural RNA sets.In this work, we compared the performance of iDoRNA with NUPACK, RNAiFold 2.0 and RiboMaker using the same computational environment.To avoid inverse folding times in large and complex structures, we used RNAiFold 2.0 with a Large Neighborhood Search (LNS) method, and set the time constraint of each run to within 2 hours.Criteria for comparison include ensemble defects, fitness scores and computational times.Note that the fitness score was used as the criterion to compare the performance between RiboMaker and iDoRNA only.This is because F score could not be determined due to the unavailability of secondary structures of R1 and R2 provided by NUPACK and RNAiFold 2.0.The results of interacting domain patterns from each test model obtained from the interacting domain-based decomposition algorithm is shown in Supplementary Data S3, while the best design results for all test models from iDoRNA is reported in Supplementary Data S4.The performance comparison between these design tools is shown in Table 5 and Supplementary Data S5 which are discussed as follows.
Firstly, Table 5 shows that when using ensemble defects as the criteria, NUPACK outperforms RNAiFold 2.0, RiboMaker and iDoRNA in designing all 44 test models.The ensemble defect results indicate that all designed RNA sequences from NUPACK are able to fold into the HR structures with less average number of incorrectly paired nucleotides at equilibrium than those of the other tools.This is not surprising because the design objective of NUPACK is to minimize the ensemble defects [13] while RNAiFold 2.0 and iDoRNA use other different objectives (Table 4).Secondly, while NUPACK and RNAiFold 2.0 require HR as input, both RiboMaker and iDoRNA require input containing a set of R1, R2 and HR.Thus, it is noteworthy to carefully consider the performance differences between iDoRNA with RiboMaker.When using ensemble defect as the criterion, we found that iDoRNA shows better performance than RiboMaker in designing all test models.Furthermore, it is seen that iDoRNA also outperforms RiboMaker when considering the F-score as the criterion.This is because the objective function of RiboMaker includes the energies of hybridization region, individual RNA structures, and hamming distance, and it does not allow the specification of all single-stranded structures [29] which could lead to a lower similarity and thus the fitness score.On the contrary, the objective function of iDoRNA is to maximize the fitness score which is formulated from two RNA properties; i.e., structural similarity and stability Nevertheless, it can also be seen that iDoRNA could only provide the designs with a perfect F score (F = 1) for all four simple models (A01-A04), but not for the actual and the remaining artificial models (Supplementary Data S5).Finally, when considering the computational time, it is shown that the performance of iDoRNA is better than RiboMaker but worse than NUPACK and RNAiFold 2.0 (See Supplementary Data S5).

Figure 2 .
Figure 2. The representation of RNA individuals.2.1.2.The iDoRNA Algorithm iDoRNA consists of three main modules including initialization, evaluation and reproduction as shown in Figure 1b.The description of each module is detailed as follows.

Figure 2 .
Figure 2. The representation of RNA individuals.2.1.2.The iDoRNA Algorithm iDoRNA consists of three main modules including initialization, evaluation and reproduction as shown in Figure 1b.The description of each module is detailed as follows.

Figure 2 .
Figure 2. The representation of RNA individuals.

Figure 3 .
Figure 3. Schematic diagram of the reproduction module.

Figure 3 .
Figure 3. Schematic diagram of the reproduction module.

Figure 4 .
Figure 4.The evolution of RNA solution at different ω sim (ω sim = 0.3, 0.5, 0.7, and 1.0 depicted with a solid circle, a clear diamond, a solid square, and a solid triangle) presented on (a) initial population, (b) 5th generation, (c) 15th generation, and (d) 30th generation.

Table 1 .
Effect of different population sizes (n) on computational time and fitness score.

Table 1 .
Effect of different population sizes (n) on computational time and fitness score.

Table 2 .
Representation of the design of iDoRNA on computational time and fitness score corresponding to number of IDs.The values shown are the average and its standard deviation from 30 independent runs. *

Table 3 .
Comparison of RNA design performance between iDoDeand iDoRNA.
* iDoDe was performed using the same computational time as iDoRNA.** iDoDe was performed for 1000 runs.

Table 4 .
Comparison of features between iDoRNA and the other tools for designing RNA-RNA interaction.