DNA Paired Fragment Assembly Using Graph Theory

DNA fragment assembly is a computationally demanding problem, owing both to the structure of the data and to its volume. It is therefore important to develop algorithms that produce high-quality assemblies while using computer resources efficiently. Such an algorithm, based on graph theory, is introduced in the present article. We first determine the overlaps between DNA fragments, obtaining the edges of a directed graph; with this information, we then construct an adjacency list with some particularities. Using the adjacency list, it is possible to obtain the DNA contigs (groups of assembled fragments forming a contiguous element) using graph theory. We performed a set of experiments on real DNA data and compared our results to those obtained with common assemblers (Edena and Velvet). Finally, we searched for the contigs in the original genome, both in our results and in those of Edena and Velvet.


Introduction
Each monomer comprising the DNA polymer is formed from a pentose, a phosphate group and one of four nitrogenous bases: adenine, guanine, cytosine and thymine. In 1953, James Watson and Francis Crick [1] discovered the double-helix spatial structure of the DNA molecule. This double chain is coiled around a single axis, and the strands are attached by hydrogen bridges between pairs of opposite bases. The direction of the polymer chain is determined by the pentose carbon atoms 5′ and 3′ (the beginning and ending positions of a DNA strand are denoted as 5′ and 3′, respectively; a DNA strand is read in the 5′ to 3′ direction, and the complementary strand runs in the opposite direction). The bases in each pair are complementary: the content of adenine (A) is the same as the content of thymine (T), and the cytosine (C) content is the same as that of guanine (G) (Figure 1).
In 1975, Frederick Sanger [2] proposed a DNA sequencing technique that involved detecting small dark bands in a thin gel using electrophoresis. Sanger proposed cutting the DNA molecule at specific points in the sequence with restriction enzymes [3]. This method is slow and costly; with each digestion, the sample must be divided, and each new division must be cloned to obtain a sufficient amount of material. To reduce processing time, Sanger et al. [4,5] proposed splitting the DNA sequences at random points. The disadvantage of this method is that the order of the fragments is unknown, generating an NP-complete (non-deterministic polynomial time) [6] computational problem. This method is known as the shotgun technique. Rodger Staden [7] proposed a method to assemble the genome using a computer. As the DNA fragments are produced from many copies of the original genome, more than one fragment comes from the same region. The DNA fragments should be processed while looking for overlaps, coincidences at the extremes of the fragments. The total number of bases in the fragments divided by the total number of bases in the complete genome is called the coverage. If the coverage is high enough, it is possible to rebuild the genome, but it is difficult to obtain high-quality results because false overlaps can be generated due to sequencing errors, sample contamination with foreign DNA and chimeras (the cloning process is carried out using host bacteria, and sometimes the DNA of the sample is concatenated with that of the host bacteria, producing what is known as a chimera). In our experiments, we noticed that some DNA sections might not be sampled and that complete genome reconstruction would therefore not be possible. What can be obtained is a set of contigs covering most of the genome, and it is the job of a molecular biologist to assemble the contigs using other techniques.
Later, James Weber and Eugene Myers [4] proposed the generation of paired fragments in the shotgun process. Traditional sequencing starts from the 5′ end of each piece of DNA, and only a limited number of bases can be sequenced. In NGS (Next-Generation Sequencing), the number of bases is always the same, yielding fixed-size fragments; however, the DNA pieces sequenced are generally longer. The goal of the paired-fragments procedure is to obtain a sequence from the 3′ end as well as one from the 5′ end, obtaining two fragments from the same piece of DNA. This would help to establish an interval in the DNA sequence associated with the paired fragments. Unfortunately, sequencing from the 3′ end is difficult and produces a relatively large number of sequencing errors.
DNA sequencing technology has advanced, decreasing costs and processing time. Today, there are databases with information about many genomes, including the human genome and those of many disease-causing microorganisms [8]. Steven Salzberg [9] developed a comparative study of DNA sequence assemblers; tests were carried out with fragment sets obtained using Illumina equipment (Illumina: http://www.illumina.com/, Illumina, Inc., San Diego, CA, USA). The fragment lengths produced by NGS technology were in the range of 50 to 150 bases [10]. The Salzberg study was developed in 2012; nevertheless, the paradigm remains the same today, and the study is therefore still applicable. The primary metric in the evaluation was N50 (the length of the shortest contig among the largest contigs that together cover at least half of the total assembly). Salzberg employed four organisms in the test, including a Staphylococcus aureus problem. These results are particularly interesting because the tests we present in this article were obtained using a problem with the same bacterium. Salzberg did not include in his research the Edena assembler (http://www.genomic.ch/edena.php, Genomic Research Institute, Geneva, Switzerland), developed by David Hernández [11]. In our comparative study, this assembler is an important reference.
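The N50 metric used in these comparisons can be computed directly from a list of contig lengths. The following sketch (in Python; the language choice is ours, as the cited studies do not prescribe one) implements the standard formulation: the length of the shortest contig among the largest contigs that together cover at least half of the assembly.

```python
def n50(contig_lengths):
    """N50: the largest L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

print(n50([100, 80, 70, 50]))  # 80: 100 + 80 = 180 >= 300 / 2
```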
Initially, greedy algorithms [12] (a greedy algorithm, at each step, tries to generate the optimal solution for that step without considering the rest of the problem) were used to find the order of the fragments; later, the de Bruijn graph [13] was introduced, with different values for the k-mer (a section of k consecutive bases) [13]. Most of the assemblers available today are based on de Bruijn graphs. In our comparative study, we include the Velvet assembler (http://www.ebi.ac.uk/~zerbino/velvet/, EMBL-EBI, Cambridge, UK), developed by Zerbino [14], which applies de Bruijn graphs.
Parsons et al. [15] proposed an optimization with a genetic algorithm to maximize the sum of the fragment overlaps, but the obtained contigs were relatively short and the processing speed was low. Later, Mallén-Fullerton and Fernández-Anaya [16] suggested a reduction of the fragment-assembly problem to the traveling salesman problem (TSP), which has been studied extensively. Solution methods with relatively good efficiency exist for the TSP. Applying heuristics and algorithms, they obtained optimal solutions for several commonly used benchmarks, and for the first time, a real-world problem of this kind was solved using optimization. Using graph theory and setting the appropriate objective functions, Mallén et al. [17] developed a new assembly method from the perspective of a directed graph, looking for a reduction of the algorithm's complexity.
In this publication, taking Mallén et al. [17] as a starting point, we developed a new algorithm working with the paired fragments (two records, identified by "/1" and "/2", respectively, that together define an interval of the DNA sequence) resulting from the sequencing output of the Illumina equipment. In our initial tests, we found some contigs that were not located in the published genome for the same organism, even though these contigs were properly obtained. Empirically, we found that when these contigs were split at certain points, all of the pieces could, in most cases, be found in the genome. Using the paired-fragment information, we could find the locations where a contig should be split to increase the quality of the assembly.
To obtain the overlaps between DNA fragments, we used a Trie [17]. Using the overlaps as edges, a directed graph was obtained, and we could build a set of contigs improving the N50 [9] of the previous release [17]. The interval of the paired fragments was a critical factor in our success.
In our experiments, we worked with the sequencing data of real-life organisms obtained from Illumina equipment, maximizing the lengths of the contigs as the objective function.
In Section 2 of this paper, we present the applied graph theory elements, data structures and algorithms. In Section 3, we describe the developed algorithms that solve the assembly problem. Section 4 sets out the application of this new model to a real-life problem, comparing our assembler with other assemblers (Velvet and Edena), and finally, in Section 5, we present our conclusions and ideas for future work.

Generalities
A graph G = (V, E) is a set of vertices V and edges E; the vertices are linked by the edges, and each end of an edge touches exactly one vertex. Some graphs are non-directed; nevertheless, if a direction is required, each edge becomes an ordered pair of vertices, and the result is called a directed graph. One paradigm for DNA fragment assembly using overlapping fragments is based on a graph. Several models have been developed, and these have had variable outcomes with respect to the algorithm's complexity and efficiency and the quality of the results in the assembled DNA. The algorithms based on graph theory include de Bruijn graphs [13], Eulerian paths [10], Hamiltonian paths [10] and depth-first search (DFS) [17].

The Shotgun Technique
In the sequencing process, a sample of DNA is broken into many fragments [10]. Next-Generation Sequencing (NGS) produces very short fragments, all the same size, usually between 25 and 500 bases. The cuts are made in random places, usually producing millions of fragments. Each fragment is sequenced, and the results are stored in the FASTA file format [18] (a plain-text format used to represent and store genetic information). Figure 2 illustrates the shotgun technique [10]. Each fragment overlaps with other fragments; the fragments are the graph vertices, and the numbers of overlapped bases are the edge weights. In Figure 3, a graph model example is shown.

Pair Generation
The overlaps between fragments are obtained using a Trie data structure [19]. With this data structure, it is possible to calculate only the real overlaps, without checking all of the possible pairs or the overlaps that are smaller than a predefined length. In the worst-case scenario, the search complexity of the Trie is O(l), where l is the fragment length. If n is the number of fragments in the problem, the complexity to obtain all the overlaps is O(nl). In addition, the Trie has the advantage of storing no duplicate data [20].
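For reference, the overlap relation that the Trie accelerates can be stated as a naive pairwise check. This Python sketch (the function name and the minimum-length parameter are illustrative assumptions) returns the longest suffix of one fragment that matches a prefix of another:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Longest suffix of `a` equal to a prefix of `b`;
    overlaps shorter than `min_len` are ignored (returns 0)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

print(overlap("ACCGTC", "GTCGGA"))  # 3 (the shared bases "GTC")
```

Checking all pairs this way costs O(n²l²) overall, which is what makes the O(nl) Trie approach attractive for millions of fragments.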

Adjacency List
A proper way to manage sparse graphs, such as the one in our case, is the adjacency list data structure [21]. We found a linked list appropriate for our problem, with a header section acting as a dictionary. Each header value is a starting point for the adjacency elements stored in the same data structure, which actually contains a set of adjacency lists. The input and output degrees of each vertex are calculated while the overlaps are loaded. Each overlap has a preceding and a subsequent fragment, which determine the direction of the edges of the directed graph. The vertices with a zero input degree are the starting points for a path. The path with the largest sum of weights is a contig in our algorithms.
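A minimal sketch of this structure, with hypothetical fragment names and weights, could look as follows; the in- and out-degrees are accumulated while the overlap edges are loaded, as described above:

```python
from collections import defaultdict

# Hypothetical overlaps: (preceding fragment, subsequent fragment, weight).
edges = [("f1", "f2", 5), ("f2", "f3", 4), ("f4", "f2", 3)]

adj = defaultdict(list)   # header dictionary -> adjacency lists
in_deg = defaultdict(int)
out_deg = defaultdict(int)

for u, v, w in edges:
    adj[u].append((v, w))  # edge direction: preceding -> subsequent
    out_deg[u] += 1
    in_deg[v] += 1

# Vertices with a zero input degree are the starting points for paths.
starts = sorted(v for v in adj if in_deg[v] == 0)
print(starts)  # ['f1', 'f4']
```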
Because of the fragment overlaps, each base in a contig can appear in more than one fragment, as shown in Figure 4. The consensus is the number of times that a nucleotide appears in the same position of a contig. We considered a minimum consensus to accept a base in a contig: the base found most frequently in each position of a contig is accepted if its frequency is over the minimum consensus previously defined. In the example presented in Figure 4, a minimum consensus of four is used.
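The consensus rule can be sketched as follows (Python, with illustrative column data): each contig position collects the bases contributed by the fragments covering it, and a base is accepted only when its count reaches the minimum consensus.

```python
from collections import Counter

def consensus(columns, min_consensus=4):
    """Keep, per position, the most frequent base when its count
    reaches the minimum consensus; otherwise mark the position 'N'."""
    result = []
    for column in columns:
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count >= min_consensus else "N")
    return "".join(result)

# Three positions covered by 5, 5 and 2 fragments, respectively.
print(consensus([["A"] * 5, ["C"] * 4 + ["T"], ["G"] * 2]))  # "ACN"
```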


Specific Characteristics
Our objective is to obtain the paths that maximize the sum of the weights of the edges contained in the path [17]. Each path starts at a node with a zero input degree and ends at a node with a zero output degree. Notice that several starting nodes can connect to a shared group of nodes; in this case, we keep only the path with the maximum edge sum. From the selected paths, it is possible to assemble a set of contigs.
In the assembly process for the contig, the consensus value [17] must be selected. In the experiments presented in this article, we used a consensus of 4. Bases with a consensus lower than the specified value are discarded. A consensus value greater than or equal to the specified value indicates a good-quality base, as shown in Figure 4. Usually, the lowest consensus values are found at the extremes of the contig.
To verify the quality of the contigs, we searched for them in the original genome. Those not located were separated into several segments, and each one was searched for in the genome. Some of these fragments were located, while a few were not, indicating that the prediction of the split point was not always accurate. The existence of paired fragments in the contig is a guarantee that the content of the interval is correct, and this gave us the criterion for the split point with respect to the interval of the paired fragments.
Figure 5 shows two intervals; the split point of the contig is outside the intervals. All of the elements outside of the intervals can be discarded because there is no confirmation from the paired fragments ("/1", "/2"). It should be noted that the N50 [9] value will be reduced, but the new contigs are of better quality.


Algorithms
To resolve the assembly problem, we propose the following sequence (Figure 6):

Step 1. Build a Trie (prefix tree) [19] to identify the overlapped fragments, trying to get the greatest value from the overlapping while resolving duplicates, fragments without overlap and repeated bases within the same element.
Step 2. From the list of overlapping fragments, which comprises a directed graph, an adjacency list is built [21], with which we can calculate the input and output degrees of each vertex.
Step 3. Using the input and output degree information of the vertices, we can detect those that are the head of a path, with a zero input degree, and build the path to those with a zero output degree, accumulating the overlapping values of each edge. This accumulated value can change if a new path reaches an intermediate vertex of the path with a value greater than the previous one.
Step 4. Finally, walking in reverse, beginning with the last vertex, rebuild the route marked with the greatest overlapping values until a vertex with a zero input degree is reached. This path is a contig.


Trie
The Trie data structure was developed by Fredkin [22]. It is a tree structure in which stored texts share common prefixes, and these prefixes become a search index for new texts. A new text follows its matching prefix and extends the branch of the tree with its suffix. In our problem, the prefixes and suffixes are letters from the alphabet A, C, G, T.
In the building process, the first node, with four locations, is empty and ready to receive the four elements A, C, G, T. Each fragment is accommodated depending on its letter values. For example, the first string is AGGTCGA, and it creates new blank nodes as it is inserted. The next one, AGGTTTC, is accommodated from AGGT; the next fragment, AGGCCTC, is accommodated from AGG. Figure 7 shows the resulting tree. Duplicate fragments are easily handled because they do not provide new values to the branches and are eliminated. The construction of the Trie is, in the worst case, O(ln) [19], when there are no coincidences between the fragments (l is the fragment length and n is the number of fragments). In our problem data, the number of coincidences is large. Algorithm 1 describes the process of building the Trie.
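A minimal sketch of this insertion process in Python (the dictionary-of-dictionaries node representation is an assumption of this illustration; the article does not fix one) inserts each fragment base by base and discards exact duplicates:

```python
def trie_insert(root: dict, fragment: str) -> bool:
    """Insert `fragment` into the Trie; each node is a dict keyed by
    A, C, G, T. Returns False when the fragment is an exact duplicate."""
    node = root
    created = False
    for base in fragment:
        if base not in node:
            node[base] = {}
            created = True
        node = node[base]
    is_new = created or "$" not in node
    node["$"] = True  # end-of-fragment marker
    return is_new

root = {}
print(trie_insert(root, "AGGTCGA"))  # True
print(trie_insert(root, "AGGTTTC"))  # True (shares the prefix AGGT)
print(trie_insert(root, "AGGTCGA"))  # False (duplicate, eliminated)
```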

Algorithm 1. Building the Trie.

1. For each fragment

Algorithm 2 shows the search for the overlapped fragments. In the worst case, it is O(l) for a single fragment, where l is the length of the fragment, because, at that point, the branch of the tree has been walked through completely.

Algorithm 2. Searching a fragment in the Trie.

1. For each fragment (l)
1.1 For each suffix (l ← l − 1, until the overlap limit value)
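Assuming a dictionary-of-dictionaries Trie representation (an illustrative choice of this sketch, not prescribed by the article), the search tries the suffixes of a fragment, from the longest downward, against the prefixes stored in the Trie:

```python
def prefix_exists(root: dict, text: str) -> bool:
    """True when `text` labels a path starting at the Trie root."""
    node = root
    for base in text:
        if base not in node:
            return False
        node = node[base]
    return True

def find_overlaps(root: dict, fragment: str, min_overlap: int):
    """Overlap lengths k such that the suffix of length k of `fragment`
    is a prefix of some fragment stored in the Trie."""
    return [k for k in range(len(fragment) - 1, min_overlap - 1, -1)
            if prefix_exists(root, fragment[-k:])]

# Build a tiny Trie holding two fragments.
root = {}
for frag in ("GTCGGA", "TCGGAT"):
    node = root
    for base in frag:
        node = node.setdefault(base, {})

print(find_overlaps(root, "ACCGTC", 2))  # [3, 2]
```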

Adjacency List
This data structure is built by taking the list of edges. For each fragment, the list is a conventional linked data structure.

Contig Assembly
In this step, the longest path is sought, starting from all of the vertices with a zero input degree, these being the potential starting points of a path that could eventually become a contig. The walkthrough is carried out until a vertex with a zero output degree is found, while accumulating the overlapping values. Once a path is finished, the walkthrough shifts to the next vertex with a zero input degree, and the process is repeated. If a node that has been used in another path is detected, the process evaluates both values and keeps the greater one as the result. Once the complete graph has been processed, each starting node yields a contig. Algorithm 3 shows the process used to obtain the greatest weight of a path.

Algorithm 3. Contigs assembly.

1. For all elements with Din = 0

While assembling the branch, a content counter is generated for each column; at the end of the assembly process, the consensus is reviewed with the counter, and it is possible to remove the sections not in compliance with the predefined value.
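A simplified sketch of this walkthrough in Python (it greedily follows the heaviest outgoing edge and, for brevity, omits the re-evaluation of nodes shared between paths described above; the fragment names and weights are illustrative):

```python
def walk_paths(adj, in_deg, out_deg):
    """From every vertex with Din = 0, follow the heaviest outgoing
    edge until a vertex with Dout = 0, accumulating overlap weights."""
    contigs = []
    for start in (v for v in in_deg if in_deg[v] == 0):
        path, total, v = [start], 0, start
        while out_deg.get(v, 0) > 0:
            v, w = max(adj[v], key=lambda edge: edge[1])  # heaviest edge
            path.append(v)
            total += w
        contigs.append((total, path))
    return contigs

adj = {"f1": [("f2", 5)], "f2": [("f3", 4)], "f4": [("f2", 3)]}
in_deg = {"f1": 0, "f2": 2, "f3": 1, "f4": 0}
out_deg = {"f1": 1, "f2": 1, "f3": 0, "f4": 1}
print(walk_paths(adj, in_deg, out_deg))
# [(9, ['f1', 'f2', 'f3']), (7, ['f4', 'f2', 'f3'])]
```

Here the two starting vertices reach a shared group of nodes; keeping only the heavier path, as the text requires, would retain the first result.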
In the FASTA file containing the original fragments [18], the paired values are identified as "/1" and "/2"; these identifiers are used to confirm the contig by searching for the interval inside the contig. Algorithm 4 describes how the paired fragments confirm intervals.

2. For each fragment in the contig

Figure 8a shows the case of a contig with continuity between the first confirmation interval and the subsequent interval. If there is no continuity between the confirmation intervals, then there is a split point, as shown in Figure 8b. This split point produces two contigs. The elements at the extremes are also removed because there is no confirmation interval.
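The effect of this confirmation step can be sketched as follows (Python; the interval coordinates are illustrative): merging the confirmation intervals of the paired fragments leaves the confirmed pieces, and any gap between them is a split point.

```python
def confirmed_pieces(intervals):
    """Merge the paired-fragment confirmation intervals; each maximal
    covered stretch is one confirmed piece, and every gap between two
    pieces is a split point of the contig."""
    pieces, start, end = [], None, None
    for lo, hi in sorted(intervals):
        if end is None or lo > end:      # a gap: close the current piece
            if end is not None:
                pieces.append((start, end))
            start, end = lo, hi
        else:                            # overlapping/contiguous interval
            end = max(end, hi)
    if end is not None:
        pieces.append((start, end))
    return pieces

print(confirmed_pieces([(5, 40), (30, 55), (70, 90)]))
# [(5, 55), (70, 90)] -> the contig splits between positions 55 and 70
```

The unconfirmed extremes (positions before 5 and after 90 in this example) are trimmed, exactly as in Figure 8b.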


Experiments
The application of the algorithms was carried out with the Staphylococcus aureus problem, taken from GAGE (Genome Assembly Gold-Standard Evaluations, 2011, [9]).

To obtain an improvement in the execution time, it would be advisable to eliminate the transitive edges of the graph. With this action, the data volume would be reduced significantly.
The identification of the ordering of the contigs could represent valuable information for molecular biologists, and it could probably be obtained by looking for paired fragments that span multiple contigs. The records with undetermined values ("N") could probably also contribute some information about the contig ordering.
Finally, the scaffolding process (the assembly of contigs into larger structures) could generate important results for biologists, including a likely new step of detecting overlaps between contigs, giving rise to some adjustments in the quality assurance of the assembly.

Figure 1. Reading a strand in the direct and inverse-complementary directions.

Figure 2. The shotgun process for a DNA sequence and the assembly with overlaps.


Figure 4. Consensus effect in the contig assembly.


Figure 5. Assembly ensured with the intervals of paired fragments.
