Agronomy 2014, 4(1), 13-33; doi:10.3390/agronomy4010013
Published: 17 January 2014
Abstract: We present a draft genome sequence for enset (Ensete ventricosum) available via the Sequence Read Archive (accession number SRX202265) and GenBank (accession number AMZH01). Enset feeds 15 million people in Ethiopia, but is arguably the least studied African crop. Our sequence data suggest a genome size of approximately 547 megabases, similar to the 523-megabase genome of the closely related banana (Musa acuminata). At least 1.8% of the annotated M. acuminata genes are not conserved in E. ventricosum. Furthermore, enset contains genes not present in banana, including reverse transcriptases and virus-like sequences as well as a homolog of the RPP8-like resistance gene. We hope that availability of genome-wide sequence data will stimulate and accelerate research on this important but neglected crop.
1. Introduction
Enset (Ensete ventricosum) is one of the most important crop plants grown in Ethiopia, where it makes a major contribution to the food security of the country, feeding at least 15 million people. It buffers food deficits during dry spells and recurrent drought and has been dubbed the “tree against hunger”. Enset is a multi-purpose crop, with all parts of the plant being utilized for human food, animal forage, medicine, or ornamental uses. Furthermore, it has the capacity for high yield, can be stored for long periods, and can be harvested at any time of the year and at any stage over a period of several years, thereby offering advantages over seasonal crops.
The genus Ensete falls within the botanical family Musaceae, which also includes bananas and plantains (genus Musa). Enset is susceptible to some of the same diseases that threaten banana, including bacterial wilt caused by Xanthomonas campestris pathovar musacearum. Unlike banana, the main edible parts of the enset plant are the starchy corm and pseudostem. The genome of enset is diploid with n = 9, while the recently published doubled-haploid banana genome sequence has n = 11.
There are many clones and landraces of enset in Ethiopia [1,3]. A collection of more than 600 clones and landraces from major enset-growing areas of Ethiopia has been assembled and conserved ex situ by the Southern Agricultural Research Institute at Areka, and some of these differ in important agronomic characteristics and tolerance to disease. Some attempts at molecular characterization of enset clones or landraces have been made using amplified fragment length polymorphism (AFLP) [8,9] and random amplified polymorphic DNA (RAPD) techniques [10,11], revealing the existence of genetic diversity and, therefore, the potential for improvement by breeding, if suitable markers were available. However, despite its importance and value, enset has been relatively neglected by scientific research and is arguably the least-studied African crop. There is an urgent need for efficient improvement of this crop. Our aim was to help accelerate enset research and crop improvement by providing draft genome sequence data and identifying single-nucleotide polymorphisms (SNPs) that might serve as molecular markers for marker-assisted breeding. We also aimed to investigate genetic similarity between enset and banana and thus to assess the usefulness of banana genomic resources for application to enset.
2. Results and Discussion
2.1. Whole-Genome Sequencing
We generated 40.4 gigabases of whole-genome shotgun sequence data from the enset genome, consisting of 202 million pairs of 100-nucleotide Illumina sequence reads. The sequence reads are freely available from the Sequence Read Archive under accession number SRX202265. Our approach was similar to that of Davey and colleagues, who recently re-sequenced the banana B genome (M. balbisiana) using 281 million pairs of 100-nucleotide Illumina sequence reads. Their attempt at de novo assembly yielded a highly fragmented genome assembly consisting of a large number of short contigs. However, they were able to gain insights into the B genome by aligning their sequence reads against the previously sequenced A genome (M. acuminata) and calling a consensus alignment. Likewise, we used both de novo sequence assembly (that is, without using a reference genome sequence) and an approach based upon alignment of reads against the banana A-genome reference sequence, as described in the sections below. Our aligned enset genomic sequence reads covered 47% of the M. acuminata reference genome sequence (247 out of 523 Mb). This is less than the coverage by Davey and colleagues’ alignment of M. balbisiana reads against the same reference genome, which covered 341 out of 523 Mb (65%), perhaps not surprisingly given the larger evolutionary distance between enset and the Musa species.
To check for contamination, we aligned our enset genomic sequence reads against all of the 2735 available complete prokaryotic genomes using the Burrows-Wheeler Aligner (BWA). We found that 8.27% of our sequence reads were alignable against prokaryotic sequences. The genome sequences showing the greatest coverage were Pseudomonas fluorescens SBW25 and Methylobacterium radiotolerans JCM 2831 (GenBank: CP001001), with sequence reads covering 30.6% and 33.5% of the lengths of their genomes, respectively. These prokaryotic sequences possibly originate from endophytes and/or epiphytes associated with the plant, even though we attempted to clean and sterilize the surface of the plant material by wiping with ethanol. We note that in the study by Davey and colleagues there was also some bacterial sequence present in the M. balbisiana genomic re-sequencing data: 3.03% of Davey’s data aligned to the prokaryotic genome sequences, with coverage of 94.3% of the Propionibacterium acnes 266 chromosome and 60.8% of the Serratia marcescens WW4 chromosome. Therefore, it seems that bacterial contamination of plant genome sequence data is not unique to our study. We also note that the depth of coverage of any single bacterial genome by “plant” genomic reads is very low: no more than 2.03× for the P. fluorescens and M. radiotolerans genomes and no more than 9.1× for the P. acnes and S. marcescens genomes mentioned above, and, therefore, not enough to be effectively assembled de novo.
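The paragraph above relies on two different measures: breadth of coverage (the fraction of a genome covered by at least one read) and depth of coverage (the average number of reads covering each position). As an illustration only, the following minimal Python sketch computes both from a list of per-base depths. The function name and the toy depth values are hypothetical and are not taken from the study's own pipeline.

```python
from typing import Sequence, Tuple


def breadth_and_depth(per_base_depth: Sequence[int]) -> Tuple[float, float]:
    """Return (breadth, mean depth) for one reference sequence.

    breadth    = fraction of positions covered by at least one read
    mean depth = total aligned bases divided by the reference length
    """
    length = len(per_base_depth)
    if length == 0:
        raise ValueError("reference sequence has zero length")
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return covered / length, sum(per_base_depth) / length


# Hypothetical toy reference of 10 positions (not real data).
toy_depths = [0, 0, 3, 3, 2, 0, 0, 1, 1, 0]
breadth, mean_depth = breadth_and_depth(toy_depths)
print(f"breadth = {breadth:.0%}, mean depth = {mean_depth:.1f}x")
```

A genome can therefore show substantial breadth while its depth stays low, which is the situation described for the bacterial genomes here.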
2.2. Estimation of the Enset Genome Length
Based on alignment against enset nuclear DNA sequences available in the GenBank database (Table 1), we estimate the depth of coverage as 67.67×. Given that we generated a total of 37.05 gigabases of sequence data (after removing prokaryote-matching reads), this would indicate a genome size of approximately 547 megabases. This is close to the haploid genome size of 523 megabases for the closely related M. acuminata.
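The estimate above is the ratio of total plant-derived sequence to depth of coverage. A minimal sketch of that arithmetic, using only the two figures quoted in this section (the function name is hypothetical):

```python
def estimate_genome_size_mb(total_bases: float, depth: float) -> float:
    """Genome size in megabases = total sequenced bases / depth of coverage."""
    if depth <= 0:
        raise ValueError("depth of coverage must be positive")
    return total_bases / depth / 1e6


plant_bases = 37.05e9   # bases remaining after removing prokaryote-matching reads
median_depth = 67.67    # median depth of coverage from Table 1

size_mb = estimate_genome_size_mb(plant_bases, median_depth)
print(f"Estimated genome size: {size_mb:.1f} Mb")
```

The result falls within a megabase of the approximately 547 megabases reported in the text.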
2.3. Conservation of Protein-Coding Sequences between Enset and Banana
To identify which banana protein-coding genes are conserved in enset, we aligned our enset shotgun sequence reads against the 36,542 M. acuminata coding sequences identified by D’Hont and colleagues using BWA. The advantage of this approach is that it is not confounded by incomplete assembly of, or gene prediction in, the enset data. The frequency distribution for breadth of coverage across these 36,542 sequences is shown in Figure 1. The breadths of coverage follow a bimodal distribution with peaks close to zero and close to 100% coverage. The peak close to zero corresponds to banana genes that are either absent from the enset genome or so divergent that the corresponding enset sequences fail to align. There are 662 (1.8%) banana protein-coding sequences that have zero coverage by the aligned enset data and are, therefore, absent, or very divergent, in enset. The Supplementary Data includes a spreadsheet indicating the breadths of coverage of each M. acuminata gene.
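The breadth-of-coverage summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function name, the input layout (gene identifier mapped to covered bases and gene length), and the toy gene records are all hypothetical.

```python
from typing import Dict, List, Tuple


def summarize_breadths(
    genes: Dict[str, Tuple[int, int]], n_bins: int = 10
) -> Tuple[List[int], List[str]]:
    """Bin genes by breadth of coverage and list genes with zero coverage.

    genes maps a gene ID to (covered_bases, gene_length). Integer arithmetic
    is used for binning so that bin edges are not affected by rounding.
    """
    histogram = [0] * n_bins
    zero_coverage = []
    for gene_id, (covered, length) in genes.items():
        if length <= 0 or not 0 <= covered <= length:
            raise ValueError(f"invalid coverage record for {gene_id}")
        if covered == 0:
            zero_coverage.append(gene_id)
        # A fully covered gene goes into the last bin rather than past it.
        histogram[min(covered * n_bins // length, n_bins - 1)] += 1
    return histogram, zero_coverage


# Hypothetical toy records: (covered bases, gene length).
toy_genes = {
    "geneA": (0, 900),
    "geneB": (30, 1200),
    "geneC": (1190, 1200),
    "geneD": (600, 600),
    "geneE": (450, 900),
}
histogram, absent = summarize_breadths(toy_genes)
print(histogram)  # counts pile up in the first and last bins (bimodal)
print(absent)

# Proportion reported in the text: 662 of 36,542 coding sequences.
print(f"{100 * 662 / 36542:.1f}% of coding sequences have zero coverage")
```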
|Table 1. Depths of coverage of previously published enset nuclear DNA sequences. The median depth of coverage is 67.67×.|
|GenBank accession number and description||Depth of coverage (×)|
|HM118700.1 TCP-1-eta subunit gene||80.71|
|HM118740.1 mRNA capping enzyme large subunit family protein gene||79.26|
|HM118605.1 electron transport protein gene||79.06|
|HM118577.1 ATP:citrate lyase gene||75.76|
|HM118779.1 succinoaminoimidazole-carboximide ribonucleotide synthetase family||74.08|
|HM118753.1 methylcrotonyl-CoA carboxylase beta chain-like gene||72.01|
|HM118766.1 annexin-like protein gene||71.61|
|HM118805.1 initiation factor 2B family protein gene||68.05|
|HM118660.1 zeaxanthin epoxidase gene||67.67|
|HM118646.1 CASP protein-like gene, partial sequence||65.98|
|HM118632.1 endoribonuclease dicer protein-like gene, partial sequence||65.39|
|HM118673.1 Na/H antiporter gene||65.16|
|HM118591.1 stomatal cytokinesis defective protein gene||64.52|
|HM118819.1 DNA polymerase delta catalytic subunit gene||63.05|
|HM118713.1 NAD+ synthase domain protein gene||61.95|
|HM118619.1 non-phototropic hypocotyl 3-like gene, partial sequence||61.72|
|HM118686.1 DUF89 family protein gene||57.14|
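The median quoted in the caption of Table 1 can be reproduced directly from the 17 tabulated depths. The sketch below also shows, purely as an illustration that is not part of the original analysis, how the implied genome size would shift if the lowest or highest single-locus depth were used instead of the median; the variable and function names are hypothetical.

```python
from statistics import median

# Depths of coverage copied from Table 1 (17 loci).
TABLE1_DEPTHS = [
    80.71, 79.26, 79.06, 75.76, 74.08, 72.01, 71.61, 68.05, 67.67,
    65.98, 65.39, 65.16, 64.52, 63.05, 61.95, 61.72, 57.14,
]
PLANT_BASES = 37.05e9  # total sequence after removing prokaryote-matching reads


def implied_size_mb(depth: float) -> float:
    """Genome size (Mb) implied by a given depth of coverage."""
    return PLANT_BASES / depth / 1e6


median_depth = median(TABLE1_DEPTHS)
print(f"median depth: {median_depth}x")
print(f"size at highest depth: {implied_size_mb(max(TABLE1_DEPTHS)):.0f} Mb")
print(f"size at lowest depth:  {implied_size_mb(min(TABLE1_DEPTHS)):.0f} Mb")
```

The spread between the two extremes gives a rough sense of how approximate a coverage-based genome-size estimate is.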
2.4. Heterozygosity and Single-Nucleotide Polymorphisms (SNPs)
Single-nucleotide polymorphisms (SNPs) can be valuable markers for crop improvement but have not previously been reported for enset. Given the very fragmented nature of our de novo assembly of the enset genome, we followed the example of Davey and colleagues by performing SNP calling against the high-quality reference genome sequence of M. acuminata. To do the alignment, we used BWA and only considered sequence reads that uniquely align to a single genomic location. By aligning the enset shotgun sequence reads against this banana genome sequence, we were able to identify 30,287 sites at which there was an approximately 50:50 ratio between the two most frequent aligned nucleotides (where the most abundant base accounts for between 49% and 51% of the aligned bases and where coverage is at least 10×). These sites are distributed over the whole genome (see Figure 2) and occur on average every 17.3 kb. If we are less stringent and include all sites where the frequency of the most abundant base is between 48% and 52%, then the number of heterozygous sites increases to 76,416, a density of one site per 6.8 kb of banana genome. See Figure 3 for an example of such a locus, containing three heterozygous sites. See the Supplementary Data for a list of these heterozygous sites. The rationale for using the banana genome as a reference sequence for identifying heterozygous SNPs is that the banana reference genome sequence is much more contiguous and better annotated than the enset de novo genome sequence. However, one limitation of this approach is that it will fail to identify heterozygous sites that fall within enset-specific sequences. We found that alignment between enset genomic sequence reads and the banana reference genome sequence covered only 47% of the banana genome and occurred much more frequently in genes than in intergenic regions, as also observed by Davey and colleagues for alignment of M. balbisiana genomic reads against the same reference genome.
To circumvent this limitation, we also generated lists of heterozygous sites called on the enset de novo assembly; these can be found in the Supplementary Data.
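The criterion for a candidate heterozygous site (coverage of at least 10× and a most-abundant base accounting for 49% to 51% of aligned bases, or 48% to 52% under the relaxed setting) can be expressed as a short filter. The following Python sketch is an illustration under assumptions: the function names, the per-site base-count dictionary, and the example counts are hypothetical, and the sketch does not reproduce the study's actual SNP-calling code.

```python
from typing import Dict, Optional, Tuple


def call_heterozygous(
    counts: Dict[str, int],
    min_depth: int = 10,
    low: float = 0.49,
    high: float = 0.51,
) -> Optional[Tuple[str, str]]:
    """Return the two most frequent bases if the site looks heterozygous.

    counts maps each base to the number of aligned reads supporting it.
    A site passes when total depth >= min_depth and the most abundant base
    accounts for a fraction of the aligned bases between low and high.
    """
    depth = sum(counts.values())
    if depth < min_depth or len(counts) < 2:
        return None
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_count), (second_base, second_count) = ranked[0], ranked[1]
    if second_count == 0:
        return None
    if low <= top_count / depth <= high:
        return top_base, second_base
    return None


def mean_spacing_kb(genome_bp: float, n_sites: int) -> float:
    """Average distance between sites, in kilobases."""
    return genome_bp / n_sites / 1000


# Hypothetical base counts at three sites.
print(call_heterozygous({"A": 26, "G": 25, "C": 0, "T": 0}))   # balanced alleles
print(call_heterozygous({"A": 48, "G": 2}))                    # likely homozygous
print(call_heterozygous({"A": 515, "G": 485}, low=0.48, high=0.52))

# Site densities over the 523-Mb banana reference, as quoted in the text.
print(f"{mean_spacing_kb(523e6, 30287):.1f} kb, {mean_spacing_kb(523e6, 76416):.1f} kb")
```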
2.5. De Novo Assembly of the Enset Genome Sequence
Although alignment of raw sequence reads against the banana reference genome sequence is useful for identifying SNPs and sequences conserved between both plant species, we required a de novo assembly of the enset data in order to examine gene order and to identify enset sequences that are not present in the banana genome. Our assembly had a total length of 459.5 megabases. This represents 84% of the estimated enset genome size of 547 megabases and is 97.3% of the length of the recently published banana genome assembly of 472.2 megabases. Given that our estimate of the enset genome size based on sequence coverage is very approximate, and assuming that the enset genome is of similar size to the banana genome, this suggests that our de novo assembly represents nearly complete coverage of the enset genome.
The enset genome sequence assembly is available via the GenBank database under accession number AMZH01. Due to restrictions on the numbers of contigs and super-contigs that GenBank can accept within a whole-genome shotgun project, GenBank only includes the enset contigs and super-contigs that are at least five kilobases in length. The full assembly, including contigs and super-contigs of between 200 and 5000 nucleotides, is available via Figshare. Approximately 70% of the enset genome assembly is alignable against the banana genome sequence, and average nucleotide sequence identity is 89.90% over the alignable sequence, as judged by the dnadiff tool in the MUMmer software package.
Given that about 8% of our genomic sequence reads actually originated from prokaryotes rather than from the plant, we checked our de novo assembly for prokaryotic sequences by performing Basic Local Alignment Search Tool nucleotide (BLASTN) searches against the 2735 available complete prokaryotic genomes. A total of 81,795 bp (0.018%) of the enset de novo assembly matched prokaryotic genome sequences. These sequences were removed from the data submitted to GenBank (accession AMZH01).
We performed a preliminary annotation of the enset genome assemblies using FGENESH to predict protein-coding genes; summary statistics are given in Table 2, and the protein sequences, their genomic coordinates, results of BLASTP searches against the M. acuminata proteome, and the results of functional prediction using PfamScan are available via Figshare (the file was too large to be included in the Supplementary Data). Of 42,749 predicted proteins, 9967 did not have any significant sequence similarity to the banana proteome detectable by BLASTP. It should be noted that, due to the fragmented nature of the draft de novo assembly, the number of predicted genes is likely to be significantly over-estimated, as some gene models are split between multiple contigs. We used RfamScan to identify non-coding RNA genes, including microRNAs, which are listed in Table 3, and we used RepeatMasker to search for matches to repeat sequences (Table 4), as described in the Experimental Section. Overall, the enset assembly was predicted to have a greater repeat content (32.65%) than the banana A genome (20.31%).
Gene order was highly conserved between banana and enset, at least over the scale of tens of kilobases, as exemplified in Figure 4, which shows an alignment of the longest enset super-contig against banana chromosome 5. However, we did identify some differences in gene-content between the two genomes as described in the following sections.
|Table 2. Assembly statistics.|
|Statistic||Complete assembly||Subset of assembly submitted to GenBank (AMZH00000000.1)|
|Number of scaffolds||123,779||14,787|
|N50 scaffold length (bp)||11,149||13,657|
|NG50 scaffold length (bp)||9,954||n.a. *|
|Shortest scaffold (bp)||200||5,000|
|Longest scaffold (bp)||105,416||103,995|
|Sum of scaffold lengths (bp)||458,655,998||172,241,963|
|Mean scaffold length (bp)||3,705||15,952|
|Median scaffold length (bp)||1,056||13,404|
|Number of contigs||259,028||19,109|
|N50 contig length (bp)||8,724|
|NG50 contig length (bp)||2,428||n.a. *|
|Shortest contig (bp)||201||5,000|
|Longest contig (bp)||56,178||56,178|
|Sum of contig lengths (bp)||390,884,093||163,735,150|
|Mean contig length (bp)||1,509||8,568|
|Median contig length (bp)||555||7,448|
|Number of gene models||42,749||23,423|
|Mean length of predicted protein (aa)||311.64||353.84|
|G + C (%)||38.95||39.14|
* NG50 lengths were calculated on the basis of an estimated genome length of 547 Mb. The total length of the scaffolds submitted to GenBank (under accession AMZH00000000.1) was less than 50% of this estimated length (172.24 Mb versus 273.5 Mb); therefore, it is not possible to calculate NG50 length for this dataset.
|Table 3. Non-coding RNAs in the enset genome assembly predicted using Rfam version 11.|
|GenBank accession number||Scaffold name||Start and end positions||Strand||Rfam ID (and accession number)||Rfam scan E value|
|Table 4. Overview and classification of the repeats present in the enset genome and comparison with those in the M. acuminata genome.|
|Ensete ventricosum||Musa acuminata|
2.6. Enset-Specific Genes Include Reverse Transcriptases, Viral Sequences, and a Putative Disease-Resistance Gene
Among the enset genes not conserved in the M. acuminata genome are several predicted to encode reverse transcriptases (Pfam accession PF00078). Reverse transcriptases are characteristic of several classes of mobile elements, including retroviruses, such as the banana streak virus. The phylogenetic relationships of these reverse transcriptases are shown in Figure 5, which indicates that they fall into two distinct clades. One of these clades (in the lower part of Figure 5) includes two genes from banana along with two from enset. However, the other clade (the upper part of Figure 5) includes no known sequences from Musa species, but includes sequences from several other monocot and dicot plants.
Similarly, the enset genome encodes at least 14 predicted proteins containing the integrase core domain (Pfam: PF00665), while the banana genome encodes only one (see Figure 6). The integrase core domain is involved in integration of a copy of a viral genome into the host chromosome. The enset genome also encodes at least 19 predicted retrotransposon gag proteins (Pfam: PF03732) with no closely related sequence in banana (Figure 7).
It has been shown that the genomes of some Musa species contain endogenous retroviruses that are integrated into the host chromosome. The genome of E. ventricosum contains several sequences that resemble retrovirus sequences and therefore may represent endogenous integrated viruses. Specifically, an M. balbisiana sequence containing eBSOLV (endogenous Obino l’Ewai virus) sequence (GenBank: HE983609) is highly conserved in E. ventricosum, though this sequence is absent from the M. acuminata genome. Similarly, E. ventricosum contains sequences with 86% nucleotide identity to a 2.25-kb fragment of banana streak UA virus (GenBank: AEC49874) and 79% identity to a 1.1-kb fragment of the sugarcane bacilliform virus (SCBV) BT20231 (GenBank: FJ439799). It is not clear whether any of these virus sequences represent viruses that can become infectious as they can in Musa species.
Other enset proteins not found in the banana genome include a protein (GenBank: KB218027) that shares 42% amino-acid identity with Arabidopsis thaliana protein At1g53350, annotated as an RPP8-like resistance protein. Examples such as this are candidates for future studies on disease resistance in enset and perhaps even for introgression into banana.
3. Experimental Section
The E. ventricosum plant was grown from seed purchased from Jungle Seeds (Wallington, UK). We extracted genomic DNA using the DNeasy Plant Mini Kit supplied by Qiagen (Manchester, UK). We sequenced genomic DNA using an Illumina HiSeq 2500, according to the manufacturer’s instructions. We used a single lane of an eight-lane flowcell and generated 202 million pairs of 100-nucleotide reads with a mean insert length of approximately 350 nucleotides.
For alignment of sequence reads against reference sequences, we used BWA version 0.7.5a-r405 and visualized BWA alignments using the Integrative Genomics Viewer (IGV). For de novo assembly we used SOAPdenovo version 1.05. Prior to assembly, we removed all sequence reads that contained “N”s. Calculations of N50 and NG50 were based on the definitions of these two statistics stated by Assemblathon.
We used BLAST and MUMmer for pairwise alignments of assembled sequences and reference sequences and visualized BLAST alignments using the Artemis Comparison Tool (ACT). We used MEGA5 for phylogenetic analysis.
To identify repeat sequences, we used RepeatMasker version open-4.0.1 [26,34,35] in default mode run with RMBLAST version 2.2.27+ against the customized library of M. acuminata repeats (1903 sequences) from Hřibová and colleagues [36,37]. This is the same library of banana-specific repeats used in the M. balbisiana genome re-sequencing project.
For ab initio gene prediction from our de novo genome assembly, we used FGENESH v.3.1.1 with parameters tuned for ‘monocot plant’.
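For readers unfamiliar with the two statistics, N50 is the length of the shortest sequence in the set of longest sequences that together contain at least half of the assembled bases, and NG50 applies the same rule using half of the estimated genome size instead of half of the assembly length. A minimal Python sketch follows; the function name and the toy lengths are hypothetical and this is not the script used for Table 2.

```python
from typing import Iterable, Optional


def n50_or_ng50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 (genome_size omitted) or NG50 (genome_size given).

    Returns None when the sequences together never reach half of the
    reference total, in which case NG50 is undefined.
    """
    ordered = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(ordered)
    target = total / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None


# Hypothetical toy assembly of six sequences (290 bp in total).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50_or_ng50(toy_lengths))                     # N50
print(n50_or_ng50(toy_lengths, genome_size=400))    # NG50
print(n50_or_ng50(toy_lengths, genome_size=1000))   # undefined: assembly too short
```

The last call returns None because the toy assembly is shorter than half of the assumed genome size, which is the same reason NG50 is reported as not available for the GenBank subset in Table 2.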
4. Conclusions
Here we present the first genome-wide sequencing study of enset (Ensete ventricosum). We have identified more than 1000 candidate SNPs, and by using less stringent criteria, many more candidates could be identified. These data will be useful as a reference sequence for future “omics studies” on this neglected crop. Armed with this initial draft genome sequence, we can now extend our studies to genotypic variation among different Ethiopian varieties of enset, both cultivated and wild.
Acknowledgments
This work was funded in part by the Wellcome Trust Biomedical Informatics Hub at the University of Exeter. We are grateful to Eva Hřibová for making available the library of banana repeat sequences.
Conflicts of Interest
The authors declare no conflict of interest.
References and Notes
- Brandt, S.A.; Spring, A.; Hiebsch, C.; McCabe, J.T.; Tabogie, E.; Diro, M.; Wolde-Michael, G.; Yntiso, G.; Shigeta, M.; Tesfaye, S. The “Tree Against Hunger” Enset-Based Agricultural Systems in Ethiopia; American Association for the Advancement of Science: Washington, DC, USA, 1997; pp. 1–58.
- Pijls, L.T.J.; Timmer, A.A.M.; Wolde-Gebriel, Z.; West, C.E. Cultivation, preparation and consumption of ensete (Ensete ventricosum) in Ethiopia. J. Sci. Food Agric. 1995, 67, 1–11.
- Asfaw, B.T. Studies on Landraces Diversity in vivo and in vitro Regeneration of Enset: (Enset ventricosum Welw.); Köster: Milan, Lombardy, Italy, 2002; p. 127.
- Biruma, M.; Pillay, M.; Tripathi, L.; Blomme, G.; Abele, S.; Mwangi, M.; Bandyopadhyay, R.; Muchunguzi, P.; Kassim, S.; Nyine, M.; et al. Banana Xanthomonas wilt: A review of the disease, management strategies and future research directions. Afr. J. Biotechnol. 2007, 6, 953–962.
- Cheesman, E. Classification of the bananas: The genus ensete horan. Kew Bull. 1947, 2, 97–106.
- D’Hont, A.; Denoeud, F.; Aury, J.-M.J.; Baurens, F.-C.F.; D’Hont, A.; Carreel, F.; Garsmeur, O.; Noel, B.; Bocs, S.; Droc, G.; et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012, 488, 213–217.
- Ethiopian Institute of Agricultural Research (EIAR). Enset Research and Development Experiences in Ethiopia. In Proceedings of Enset National Workshop, Wolkite, Ethiopia, 19–20 August 2010; Yesuf, M., Hunduma, T., Eds.; Ethiopian Institute of Agricultural Research (EIAR): Addis Ababa, Ethiopia, 2012.
- Tsegaye, A. On Indigenous Production, Genetic Diversity and Crop Ecology of Enset (Ensete ventricosum (Welw.) Cheesman). Ph.D. Thesis, Wageningen University, Wageningen, The Netherlands, 22 April 2002; p. 198.
- Negash, A.; Niehof, A. The significance of enset culture and biodiversity for rural household food and livelihood security in southwestern Ethiopia. Agric. Human Values 2004, 21, 61–71.
- Birmeta, G.; Nybom, H.; Bekele, E. RAPD analysis of genetic diversity among clones of the Ethiopian crop plant Ensete ventricosum. Euphytica 2002, 124, 315–325.
- Birmeta, G.; Nybom, H.; Bekele, E. Distinction between wild and cultivated enset (Ensete ventricosum) gene pools in Ethiopia using RAPD markers. Hereditas 2004, 140, 139–148.
- Davey, M.W.; Gudimella, R.; Harikrishna, J.A.; Sin, L.W.; Khalid, N.; Keulemans, J. A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genomics 2013, 14.
- National Center for Biotechnology Information. Available online: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/ (accessed on 22 December 2013).
- Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25, 1754–1760.
- Silby, M.W.; Cerdeño-Tárraga, A.M.; Vernikos, G.S.; Giddens, S.R.; Jackson, R.W.; Preston, G.M.; Zhang, X.-X.; Moon, C.D.; Gehrig, S.M.; Godfrey, S.A.C.; et al. Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens. Genome Biol. 2009, 10, R51.
- Copeland, A.; Lucas, S.; Lapidus, A.; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Pitluck, S.; Kiss, H.; et al; US DOE Joint Genome Institute, Walnut Creek, CA, USA. Unpublished work. 2008.
- Brzuszkiewicz, E.; Weiner, J.; Wollherr, A.; Thürmer, A.; Hüpeden, J.; Lomholt, H.B.; Kilian, M.; Gottschalk, G.; Daniel, R.; Mollenkopf, H.-J.; Meyer, T.F.; Brüggemann, H. Comparative genomics and transcriptomics of Propionibacterium acnes. PLoS One 2011, 6, e21581.
- Chung, W.-C.; Chen, L.-L.; Lo, W.-S.; Kuo, P.-A.; Tu, J.; Kuo, C.-H. Complete genome sequence of Serratia marcescens WW4. Genome Announc. 2013, 1, e0012613.
- Li, H.; Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010, 26, 589–595.
- Mammadov, J.; Aggarwal, R.; Buyyarapu, R.; Kumpatla, S. SNP markers and their impact on plant breeding. Int. J. Plant Genomics 2012, 2012, 728398.
- Studholme, D. Ensete ventricosum Genome Sequence. Available online: http://figshare.com/articles/Ensete_ventricosum_genome_sequence/894306 (accessed on 6 January 2014).
- Kurtz, S.; Phillippy, A.; Delcher, A.L.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004, 5, R12.
- Solovyev, V. Statistical Approaches in Eukaryotic Gene Prediction. In Handbook of Statistical Genetics; John Wiley & Sons, Ltd.: Chichester, West Sussex, UK, 2004; pp. 97–159.
- Finn, R.D.; Bateman, A.; Clements, J.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Heger, A.; Hetherington, K.; Holm, L.; Mistry, J.; et al. Pfam: The protein families database. Nucleic Acids Res. 2013, 42, D222–D230.
- Gardner, P.P.; Daub, J.; Tate, J.; Moore, B.L.; Osuch, I.H.; Griffiths-Jones, S.; Finn, R.D.; Nawrocki, E.P.; Kolbe, D.L.; Eddy, S.R.; et al. Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 2011, 39, D141–D145.
- Tempel, S.; Repeatmasker, U. Using and understanding RepeatMasker. Methods Mol. Biol. 2012, 859, 29–51.
- Earl, D.; Bradnam, K.; St John, J.; Darling, A.; Lin, D.; Fass, J.; Yu, H.O.K.; Buffalo, V.; Zerbino, D.R.; Diekhans, M.; et al. Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Res. 2011, 21, 2224–2241.
- Chabannes, M.; Baurens, F.-C.; Duroy, P.-O.; Bocs, S.; Vernerey, M.-S.; Rodier-Goud, M.; Barbe, V.; Gayral, P.; Iskra-Caruana, M.-L. Three infectious viral species lying in wait in the banana genome. J. Virol. 2013, 87, 8624–8637.
- Muller, E.; Dupuy, V.; Blondin, L.; Bauffe, F.; Daugrois, J.-H.; Nathalie, L.; Iskra-Caruana, M.-L. High molecular variability of sugarcane bacilliform viruses in Guadeloupe implying the existence of at least three new species. Virus Res. 2011, 160, 414–419.
- Thorvaldsdóttir, H.; Robinson, J.T.; Mesirov, J.P. Integrative Genomics Viewer (IGV): High-Performance genomics data visualization and exploration. Briefings Bioinforma. 2013, 14, 178–192.
- Luo, R.; Liu, B.; Xie, Y.; Li, Z.; Huang, W.; Yuan, J.; He, G.; Chen, Y.; Pan, Q.; Liu, Y.; et al. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience 2012, 1.
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
- Carver, T.J.; Rutherford, K.M.; Berriman, M.; Rajandream, M.-A.; Barrell, B.G.; Parkhill, J. ACT: The Artemis Comparison Tool. Bioinformatics 2005, 21, 3422–3423.
- Tarailo-Graovac, M.; Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 2009, 4.
- RepeatMasker. Available online: http://www.repeatmasker.org (accessed on 20 December 2013).
- Hribová, E.; Neumann, P.; Matsumoto, T.; Roux, N.; Macas, J.; Dolezel, J. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing. BMC Plant Biol. 2010, 10, 204.
- Institute of Experimental Botany. Available online: http://wwwueb.asuch.cas.cz/Olomouc1/banana-sequencing-data/BananaREP.tar.gz (accessed on 20 December 2013).
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).