PCR-DGGE Analysis: Unravelling Complex Mixtures of Badnavirus Sequences Present in Yam Germplasm

Badnaviruses (family Caulimoviridae, genus Badnavirus) have emerged as serious pathogens especially affecting the cultivation of tropical crops. Badnavirus sequences can be integrated in host genomes, complicating the detection of episomal infections and the assessment of viral genetic diversity in samples containing a complex mixture of sequences. Yam (Dioscorea spp.) plants are hosts to a diverse range of badnavirus species, and recent findings have suggested that mixed infections occur frequently in West African yam germplasm. Historically, the determination of the diversity of badnaviruses present in yam breeding lines has been achieved by cloning and sequencing of polymerase chain reaction (PCR) products. In this study, the molecular diversity of partial reverse transcriptase (RT)-ribonuclease H (RNaseH) sequences from yam badnaviruses was analysed using PCR-dependent denaturing gradient gel electrophoresis (PCR-DGGE). This resulted in the identification of complex ‘fingerprints’ composed of multiple sequences of Dioscorea bacilliform viruses (DBVs). Many of these sequences show high nucleotide identities to endogenous DBV (eDBV) sequences deposited in GenBank, and fall into six monophyletic species groups. Our findings highlight PCR-DGGE as a powerful tool in badnavirus diversity studies enabling a rapid indication of sequence diversity as well as potential candidate integrated sequences revealed by their conserved nature across germplasm.


Introduction
Badnaviruses are plant pararetroviruses (family Caulimoviridae, genus Badnavirus) infecting a broad range of economically important crop plants and have emerged as serious pathogens especially affecting the cultivation of tropical crops, such as banana, black pepper, cacao, citrus, sugarcane, taro and yam [1]. The development of reliable diagnostic tests for badnaviruses is challenging due to high serological and genetic heterogeneity, as experienced in a wide range of crops [2][3][4][5]. The discovery of integrated badnavirus sequences in some host plant genomes of agricultural and horticultural crops complicates the fool-proof diagnosis of episomal infections using nucleic acid-based methods as experienced for banana streak viruses (BSVs) in Musa spp. [6][7][8], and their existence poses further challenges for taxonomy, safe movement of germplasm, and disease management (reviewed by [9]).
The aim of this study was to determine: (1) whether PCR-DGGE could be used as a rapid technique to screen for differences in complex mixtures of episomal and endogenous badnavirus sequences present in West African yam landraces and breeding lines; and (2) whether this approach would enable the simple identification of integrated badnavirus sequences conserved across germplasm. Historically, this was achieved by cloning and sequencing of PCR products using the generic badnavirus primer pair Badna-forward primer/reverse primer (FP/-RP) [59]. This approach, however, may fail to capture all the sequence diversity, because the clones selected may not be representative of the total diversity present within a tested plant. Restriction fragment length polymorphism (RFLP) analysis has also been used to study yam badnavirus diversity [3,35,60], but this technique resolves diversity less well than DGGE. Moreover, neither technique (cloning of PCR products or RFLP) allows episomal and endogenous badnavirus sequences to be distinguished.
This study is the first to demonstrate the value of PCR-DGGE (hereafter referred to as DGGE) for unravelling complex sequence mixtures present in badnavirus PCR products amplified from yam breeding lines (Figure 1). This approach generated 114 nucleotide sequences from yam germplasm, which have been deposited in the GenBank database under accession numbers KY555456 to KY555569. These DBV sequences fall into six monophyletic species groups and represent several unique DGGE profiles. Interestingly, many bands are conserved across germplasm, suggesting integrated sequences, and many sequences identified in this study show high nucleotide identities to eDBV sequences. We anticipate that the ability to link DGGE profiles to badnavirus sequences will assist the future identification of badnavirus diversity by comparing sample profiles with standard profile markers.

Plant Material
Yam leaf samples of breeding lines and landraces (47 samples in total) of Dioscorea alata (n = 11), Dioscorea bulbifera (n = 1), Dioscorea cayenensis (n = 1), Dioscorea dumetorum (n = 3), Dioscorea esculenta (n = 2) and Dioscorea rotundata (n = 23) used in this study were collected from screen-houses at the International Institute of Tropical Agriculture (IITA, Ibadan, Nigeria) and placed in polythene bags (14 cm × 14 cm). The breeding lines were a subset of the IITA collection samples (n = 127), consisting of first filial (F1) generations of West African D. rotundata breeding lines (n = 112) and D. rotundata landraces (n = 15). Details of samples from these collections are given in the Supplementary materials of Seal et al. [38]. Selected yam breeding lines and landraces of D. rotundata (n = 6) were provided by the IITA, and tubers were grown in a quarantine aphid-proof glasshouse at the Natural Resources Institute (NRI, Chatham Maritime, UK), as described by Mumford and Seal [61]. Individual leaf samples were collected from each plant in small polythene bags (10 cm × 15 cm) and processed immediately. Plants TDr 89/02475A and B, and TDr 1892A and B (TDr, Dioscorea rotundata accession), are clones of the same yam accessions.

Total Nucleic Acid Extraction from Yam Leaves and PCR Amplification of Badnavirus Sequences
Total nucleic acids were extracted from fresh yam leaf tissue (~100 mg) using a modified cetyltrimethylammonium bromide (CTAB) method as described by Kenyon et al. [3]. Total nucleic acids were screened for sequences typical of the genus Badnavirus by PCR using the degenerate primer set Badna-forward primer (FP) and Badna-reverse primer (RP) designed by Yang et al. [59]. The Badna-FP/-RP primers amplify a 579-bp region (528 bp excluding primer sequences and representing only complete codons) of the reverse transcriptase (RT)-ribonuclease H (RNaseH) gene used for taxonomic assessment of badnaviruses [62]. To confirm the suitability of DNA for PCR amplification, all DNA samples were first screened using primers targeting the yam actin gene as described by Silva et al. [63]. PCR amplifications were set up in 25-µL reactions containing 1 µL of template (20 ng), 0.5 µM of each primer, 0.25 mM of each deoxynucleotide triphosphate (dNTP), 1 U DreamTaq DNA polymerase and 1× DreamTaq Green buffer (Thermo Scientific, Loughborough, UK) containing 2 mM MgCl2. The cycling conditions were 94 °C for 2 min, followed by 40 cycles of 94 °C for 20 s, 55 °C for 30 s and 72 °C for 1 min, and a final extension at 72 °C for 10 min. PCR products were analysed by electrophoresis through 1.5% (w/v) agarose gels containing 1× RedSafe nucleic acid stain (iNtRON Biotechnology, Gyeonggi-do, Korea) in 0.5× Tris-borate-EDTA (TBE) buffer. To generate PCR products for DGGE analysis, either purified and diluted Badna-FP/-RP products of the expected size ('nested') or diluted total nucleic acid extracts ('direct') were used as template.
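The reaction composition above can be turned into pipetting volumes with C1V1 = C2V2. The sketch below is illustrative only: the stock concentrations (10 µM primers, 10 mM of each dNTP, 10× buffer and 5 U/µL polymerase) are assumptions, not values taken from the protocol, and should be replaced with those of the reagents actually used.

```python
"""Per-reaction volumes for the 25-uL Badna-FP/-RP PCR, from C1*V1 = C2*V2.

All stock concentrations below are ASSUMPTIONS for illustration; replace
them with the concentrations of your own reagents.
"""

REACTION_UL = 25.0  # reaction volume from the protocol

# component: (final concentration, assumed stock concentration), same units per row
COMPONENTS = {
    "Badna-FP": (0.5, 10.0),       # uM; assumed 10 uM working stock
    "Badna-RP": (0.5, 10.0),       # uM; assumed 10 uM working stock
    "dNTPs (each)": (0.25, 10.0),  # mM; assumed 10 mM-each dNTP mix
    "buffer": (1.0, 10.0),         # x;  assumed 10x DreamTaq Green buffer
}
POLYMERASE_U = 1.0         # units per reaction (from the protocol)
POLYMERASE_U_PER_UL = 5.0  # assumed enzyme stock concentration
TEMPLATE_UL = 1.0          # 1 uL template (~20 ng), from the protocol


def reaction_volumes(reaction_ul: float = REACTION_UL) -> dict:
    """Return uL of each component per reaction; water makes up the balance."""
    vols = {name: final * reaction_ul / stock
            for name, (final, stock) in COMPONENTS.items()}
    vols["polymerase"] = POLYMERASE_U / POLYMERASE_U_PER_UL
    vols["template"] = TEMPLATE_UL
    vols["water"] = reaction_ul - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    return vols


if __name__ == "__main__":
    for name, ul in reaction_volumes().items():
        print(f"{name:14s} {ul:7.3f} uL")
```

With these assumed stocks, each 25-µL reaction needs 1.25 µL of each primer, 0.625 µL of dNTP mix, 2.5 µL of buffer, 0.2 µL of polymerase and 1 µL of template, leaving about 18.2 µL of water.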
Both Badna-FP and Badna-RP primers were modified by the addition of a GC clamp (5′ CGC CCG CCG CGC GCG GCG GGC GGG GCG GGG GCA CGG GGG GAT GCC ITT YGG IIT IAA RAA YGC ICC 3′ and 5′ CGC CCG CCG CGC GCG GCG GGC GGG GCG GGG GCA CGG GGG GCC AYT TRC AIA CIS CIC CCC AIC C 3′, respectively), generating a product of 619 bp. PCR reactions using the Badna-FP/-RP GC-clamp primers used the following cycling conditions: one cycle at 95 °C for 5 min, then 35 cycles of 94 °C for 1 min, 55 °C for 45 s and 72 °C for 2 min, followed by a final extension at 72 °C for 10 min. Before DGGE analysis, PCR products were confirmed to be of the correct size by agarose gel electrophoresis. All sequencing in this study was performed by Source BioScience (Nottingham, UK). All primers were synthesised by the Sigma oligo service (Sigma-Aldrich, Irvine, UK). reSource (Source BioScience, Nottingham, UK) or Sure Clean (Bioline, London, UK) purification kits were used to clean PCR products prior to sequencing or cloning.
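The primer sequences given above can be checked in a few lines: the clamp is 40 nt, so attaching it to one primer lengthens the 579-bp amplicon to 619 bp, and counting the mixed IUPAC positions shows how many distinct oligonucleotides each degenerate primer contains. This is a minimal sketch; treating inosine (I) as a single non-degenerate base is an assumption of the counting scheme, since inosine pairs with several bases but is one synthesised nucleoside.

```python
"""Check the GC-clamp primer lengths, the clamped amplicon size and the
number of distinct oligos encoded by each degenerate Badna primer.

ASSUMPTION: inosine (I) is counted as one synthesised base, not a mixed position.
"""
from math import prod

# 40-nt GC clamp, written in 10-nt chunks for easy checking against the text
GC_CLAMP = "CGCCCGCCGC" "GCGCGGCGGG" "CGGGGCGGGG" "GCACGGGGGG"
BADNA_FP = "ATGCCITTYGGIITIAARAAYGCICC"
BADNA_RP = "CCAYTTRCAIACISCICCCCAICC"

# IUPAC code -> number of bases it represents during synthesis
IUPAC_FOLD = {"A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
              "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
              "B": 3, "D": 3, "H": 3, "V": 3, "N": 4}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in a degenerate primer pool."""
    return prod(IUPAC_FOLD[base] for base in primer.upper())


UNCLAMPED_PRODUCT_BP = 579  # Badna-FP/-RP amplicon size from the text

clamped_fp = GC_CLAMP + BADNA_FP  # clamp sits at the 5' end
clamped_rp = GC_CLAMP + BADNA_RP
clamped_product_bp = UNCLAMPED_PRODUCT_BP + len(GC_CLAMP)

if __name__ == "__main__":
    print("GC clamp:", len(GC_CLAMP), "nt")
    print("Badna-FP:", len(BADNA_FP), "nt, degeneracy", degeneracy(BADNA_FP))
    print("Badna-RP:", len(BADNA_RP), "nt, degeneracy", degeneracy(BADNA_RP))
    print("Clamped product:", clamped_product_bp, "bp")
```

Under this convention each primer has three two-fold mixed positions (Y, R and Y in Badna-FP; Y, R and S in Badna-RP), so each is a pool of eight oligonucleotide species.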

Denaturing Gradient Gel Electrophoresis (DGGE)
DGGE was performed using the INGENYphorU-2×2 apparatus (INGENY, Goes, The Netherlands) according to the manufacturer's instructions and following the procedural comments in the protocol by Green et al. [44]. Gradient gels containing 6.5% (v/v) polyacrylamide (37.5:1 acrylamide:bis-acrylamide) (National Diagnostics, Atlanta, USA) were cast using a peristaltic pump (Rietschle Thomas, Schopfheim, Germany) and a gradient maker (INGENY), with denaturing gradients from 35 to 50% (top to bottom) unless stated otherwise (where 100% denaturant is 7 M urea and 40% (v/v) deionised formamide), in 1× Tris-acetate-EDTA (TAE) electrophoresis buffer. Samples (20 µL) were loaded on a stacking gel. Electrophoresis was performed at 80 V and 60 °C for 18 h. Gels were stained with 1× SYBR Gold nucleic acid gel stain (Invitrogen, Life Technologies, Paisley, UK) in 1× TAE for 30 min at room temperature, destained in deionised water and visualised on a UV transilluminator (G-box Chemi HR16, Syngene, Cambridge, UK). Bands of interest were excised from DGGE gels using a sterile scalpel, and DNA was eluted by soaking in 100 µL of molecular grade water (Sigma) at 4 °C overnight. Aliquots were diluted 1:10, re-amplified by PCR using the Badna-FP/-RP primer pair and purified. Purified PCR products were cloned using the pGEM-T Easy vector system (Promega, Southampton, UK) according to the manufacturer's instructions and sequenced using the standard primers SP6 and T7. To obtain a consensus sequence and control for cross-contamination, two clones from each excised DGGE band showing a different migration pattern were sequenced.
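The denaturant definition above (100% = 7 M urea plus 40% (v/v) formamide) fixes the composition of the 35% and 50% solutions used to pour the gradient. A minimal sketch of the arithmetic follows; the 40% acrylamide stock concentration and the assumption that the gradient maker produces a linear gradient are illustrative, not taken from the protocol.

```python
"""Amounts of denaturant and acrylamide for the 35% and 50% DGGE solutions.

From the protocol: 100% denaturant = 7 M urea + 40% (v/v) deionised formamide,
and gels contain 6.5% polyacrylamide. ASSUMPTIONS: a 40% acrylamide stock and
an approximately linear gradient from the gradient maker.
"""

UREA_G_PER_MOL = 60.06
UREA_M_AT_100 = 7.0           # mol/L urea at 100% denaturant
FORMAMIDE_PCT_AT_100 = 40.0   # % (v/v) formamide at 100% denaturant
GEL_PCT = 6.5                 # % polyacrylamide in the gel
ACRYLAMIDE_STOCK_PCT = 40.0   # assumed 37.5:1 stock concentration


def gel_solution(denaturant_pct: float, volume_ml: float = 100.0) -> dict:
    """Urea (g), formamide (mL) and acrylamide stock (mL) for one solution."""
    frac = denaturant_pct / 100.0
    return {
        "urea_g": UREA_M_AT_100 * frac * UREA_G_PER_MOL * volume_ml / 1000.0,
        "formamide_ml": FORMAMIDE_PCT_AT_100 / 100.0 * frac * volume_ml,
        "acrylamide_stock_ml": GEL_PCT / ACRYLAMIDE_STOCK_PCT * volume_ml,
    }


def denaturant_at(depth: float, top: float = 35.0, bottom: float = 50.0) -> float:
    """Denaturant % at relative depth 0 (top) to 1 (bottom) of a linear gradient."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    return top + (bottom - top) * depth


if __name__ == "__main__":
    for pct in (35, 50):
        amounts = gel_solution(pct)
        print(pct, {k: round(v, 2) for k, v in amounts.items()})
```

For 100 mL, the 35% solution needs about 14.7 g urea and 14 mL formamide, and the 50% solution about 21.0 g urea and 20 mL formamide; halfway down a linear 35-50% gel the denaturant concentration would be 42.5%.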

Sequence Analysis and Phylogeny
Yam badnavirus partial reverse transcriptase-ribonuclease H (RT-RNaseH) nucleotide sequences generated from plasmid clones were analysed using MEGA version 6.0 [64]. The Badna-FP/-RP primer and vector sequences were removed, and the edited sequences were used for Basic Local Alignment Search Tool (BLAST) similarity searches of the National Center for Biotechnology Information (NCBI) GenBank databases [65]. Multiple alignments of the partial RT-RNaseH sequences were performed using CLUSTALW with default settings in MEGA version 6.0, and phylogenetic trees were constructed in MEGA using the maximum likelihood method with the Kimura 2-parameter model [66]. The robustness of the trees was assessed by generating bootstrap consensus trees from 1000 replicates. A nucleotide percent similarity matrix (Table S1) was generated using Multiple Alignment using Fast Fourier Transform (MAFFT) [67,68]. Protein sequences were aligned using CLUSTAL OMEGA [69,70] and further processed in BioEdit version 7.2.5 [71]. According to the International Committee on Taxonomy of Viruses (ICTV), sequences of the genus Badnavirus that differ by more than 20% in the nucleotide sequence of the partial RT-RNaseH coding region meet the species demarcation criterion [62]. Eighty-nine yam badnavirus partial RT-RNaseH sequences (see Figure 4 for accession numbers) and the following virus sequences were obtained from GenBank and used for comparative analyses: Banana streak OL virus (BSOLV, AJ002234); Cacao swollen shoot virus (CSSV, AJ781003); Commelina yellow mottle virus (ComYMV, NC_001343); Rice tungro bacilliform virus (RTBV, X57924); Sugarcane bacilliform MO virus (SCBMOV, M89923); and Taro bacilliform virus (TaBV, AF357836). For consistency, the grouping system reported by Kenyon et al. [3] was adopted in this study. One new group (U12) reported by Umber et al. [39] and three new groups (T13-T15) described by Bömer et al. [33] were also added to the phylogenetic analysis (Figure 4).
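To make the 20% demarcation criterion and the Kimura 2-parameter model concrete, the sketch below computes the uncorrected p-distance and the K2P distance, d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not MEGA's implementation: sites with gaps or ambiguity codes are skipped pairwise (an assumption; MEGA offers several gap treatments), and the ICTV check is applied to the uncorrected nucleotide difference.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def distances(seq1: str, seq2: str) -> tuple:
    """Return (p-distance, K2P distance) for two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # pairwise deletion of gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    P, Q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    # K2P is undefined (saturated) when either log argument is <= 0
    k2p = math.inf if w1 <= 0 or w2 <= 0 else -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return P + Q, k2p


def exceeds_species_threshold(seq1: str, seq2: str, threshold: float = 0.20) -> bool:
    """True if the uncorrected nucleotide difference is above the 20% threshold."""
    return distances(seq1, seq2)[0] > threshold
```

In practice the two sequences would be the aligned 528-bp partial RT-RNaseH regions of two clones; the K2P value is always at least as large as the p-distance because it corrects for multiple substitutions at the same site.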

DGGE Resolves a Complex Mixture of Badnavirus Sequences Present in Dioscorea Species
To evaluate the potential of DGGE for unravelling the complexity of DBV diversity, all yam DNA samples scoring PCR-positive for badnavirus sequences with the generic badnavirus primer pair Badna-FP/-RP (n = 47) were selected. Yang et al. [59] designed this degenerate primer pair based on the consensus of the RT and RNaseH coding regions of badnavirus sequences published at the time. These primers are widely used in badnavirus research and have also proved functional in several DBV diversity studies, including those by Bousalem et al. [34], Kenyon et al. [3] and Seal et al. [38]. DGGE analysis of Badna-FP/-RP-positive PCR products required the addition of a GC clamp to one of the two primers. A GC clamp is usually positioned at the 5′ end of the forward primer [44]. Because of the degeneracy of the generic badnavirus primer set, we tested attaching the GC clamp to the Badna forward primer and, separately, to the Badna reverse primer. A subset of eight yam samples grown in the NRI quarantine glasshouse was selected, and the variability of DBV sequences amplified with the two GC-clamped primers was compared by DGGE. Distinctive bands that were sharp and intense were excised, cloned and sequenced (Figure 2).
Viruses 2017, 9, 181 6 of 24
Both Badna GC-clamped primers amplified products of the expected size (619 bp); only sample TDr 96/00629 failed in both reactions, probably because of PCR inhibitors (Figure 2A,B). 'Direct' PCR amplification used the GC-clamped primers with DNA extracts as template, whereas under 'nested' conditions purified Badna-FP/-RP products were used as template and re-amplified. The 'direct' approach improved the resolution of the PCR products analysed by DGGE and, when fewer PCR cycles were applied (e.g., 25 cycles), produced distinct but less intense bands on agarose gels. The resulting relatively sharp and intense DGGE bands could be excised under minimal UV exposure. Conversely, very intense PCR bands resulting from more PCR cycles (35-40 cycles) led to increased smearing and less distinct banding patterns during DGGE analysis. Twenty-one bands were excised and successfully processed further (Figure 2).
To obtain phylogenetic information from the DGGE analysis, the excised bands were re-amplified, cloned and sequenced. Cloning reduced the likelihood of obtaining mixed sequences from a single band of interest, which can arise because different sequences may migrate closely on DGGE gels. With the exception of some samples, two clones per excised DGGE band were sequenced, and both clone sequences are presented unless they were 100% identical to each other. The clone sequence data set (Table 1) was coded as follows: the first two letters denote the country of origin (NG = Nigeria); 'b' denotes breeding line samples and 'l' landrace samples; the following number denotes the excised DGGE band; the next letter denotes the clone (a = clone a, b = clone b); and the last two letters refer to the Dioscorea host species (e.g., Dr = Dioscorea rotundata).
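The naming scheme described above is regular enough to parse mechanically, which can help when tabulating clone sequences. The regular expression below is a hypothetical helper written for this scheme, not part of the study's workflow; only the host codes Dr (D. rotundata) and Da (D. alata), which can be read from the text, are expanded, and other codes are returned unchanged.

```python
import re
from typing import NamedTuple, Optional

CLONE_ID = re.compile(
    r"^(?P<country>[A-Z]{2})"  # country of origin, e.g. NG = Nigeria
    r"(?P<kind>[bl])"          # b = breeding line, l = landrace
    r"(?P<band>\d+)"           # excised DGGE band number
    r"(?P<clone>[ab])?"        # clone a or b (absent when only one clone is listed)
    r"(?P<host>D[a-z])$"       # Dioscorea host code, e.g. Dr
)

SAMPLE_TYPE = {"b": "breeding line", "l": "landrace"}
# Only host codes that can be read from the text are expanded.
HOST = {"Dr": "Dioscorea rotundata", "Da": "Dioscorea alata"}


class CloneId(NamedTuple):
    country: str
    sample_type: str
    band: int
    clone: Optional[str]
    host: str


def parse_clone_id(name: str) -> CloneId:
    """Split a DGGE clone name such as 'NGl12bDr' into its fields."""
    m = CLONE_ID.match(name)
    if m is None:
        raise ValueError(f"not a DGGE clone identifier: {name!r}")
    return CloneId(
        country=m["country"],
        sample_type=SAMPLE_TYPE[m["kind"]],
        band=int(m["band"]),
        clone=m["clone"],
        host=HOST.get(m["host"], m["host"]),
    )
```

For example, `parse_clone_id("NGl12bDr")` yields band 12, clone b, landrace, D. rotundata, and `parse_clone_id("NGb60Da")` yields band 60 with no clone letter.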
For DGGE bands 1-21, the duplicate clone sequences were >98% identical to each other, except for bands 8, 16 and 18. Only one clone was sequenced successfully for each of DGGE bands 4 and 5 (Figure 2 and Table 1). Although DGGE bands from different plant samples migrating at the same position in the gel usually yielded >99% identical sequences (e.g., bands 10 and 21), >99% identical sequences sometimes migrated to different positions. This was the case, for example, for bands 12 and 13, which showed 99-100% nucleotide identity to the eDBV12 clone sequence S1a4Dr (KF829956, [39]) (Table 1). Two of the four clone sequences generated from DGGE bands 12 and 13, NGl12bDr and NGl13aDr, are 100% identical across the 528-bp region used for taxonomic assessment of badnaviruses [62]. However, they differ in their primer-derived sequences (determined by examining the cloned excised band sequences), so their migration to different positions in DGGE is likely a result of the substantial degeneracy of the Badna-FP/-RP primer pair.
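The case of NGl12bDr and NGl13aDr can be illustrated with a toy comparison: two amplicons that are identical once the primer-derived ends are removed, but not over their full length. The sequences below are hypothetical and the helper names are illustrative; trimming 26 and 24 nt corresponds to the Badna-FP and Badna-RP primer lengths, whereas the 528-bp taxonomic region additionally trims the sequence to complete codons.

```python
FP_LEN, RP_LEN = 26, 24  # lengths of Badna-FP and Badna-RP (nt)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned, equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two aligned sequences of equal, non-zero length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def strip_primer_ends(amplicon: str) -> str:
    """Remove the primer-derived 5' and 3' ends of a Badna-FP/-RP amplicon."""
    return amplicon[FP_LEN:len(amplicon) - RP_LEN]


if __name__ == "__main__":
    # Hypothetical toy amplicons: identical internal region, different primer-derived ends.
    internal = "ATGGCTCCT" * 20
    clone_1 = "A" * FP_LEN + internal + "C" * RP_LEN
    clone_2 = "G" + "A" * (FP_LEN - 1) + internal + "C" * (RP_LEN - 1) + "T"
    print(percent_identity(strip_primer_ends(clone_1), strip_primer_ends(clone_2)))  # 100.0
    print(round(percent_identity(clone_1, clone_2), 2))
```

Because DGGE separates whole molecules by their melting behaviour, differences confined to the primer-derived ends are invisible in the trimmed comparison yet remain part of the molecule that migrates in the gel.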
The reproducibility of the DGGE technique was tested by running samples from two individual plants each of accessions TDr 89/02475 and TDr 1892. The DGGE patterns of these biological replicates were identical with both primer combinations tested (Figure 2). Nine DGGE bands obtained with the Badna-F GC-clamp primer (Figure 2A) included DBV sequences assigned to four monophyletic groups according to the phylogenetic analysis presented in Figure 4: K9 (bands 1-3), K8 (bands 4-7 and clone NGb8bDr from band 8), U12 (band 9) and T13 (clone NGb8aDr from band 8). Several further low-intensity bands were excised but could not be processed successfully. The majority of sequences from DGGE bands 10-21, amplified with the Badna-R GC-clamp primer (Figure 2B), clustered into group K8 (Table 1). Bands 12 and 13 represent sequences of group U12 that migrated to two distinct positions. The only group K9 sequence was obtained from band 16 (NGb16bDr); however, the second clone from this band, NGb16aDr, represented a K8 sequence. Compared with the results for the Badna-F GC-clamp primer, these findings suggest that group K9 sequences may be under-represented in PCR amplifications using the Badna-R GC-clamp primer. We therefore used the Badna-F GC-clamp primer in all subsequent DGGE analyses.
In a study by Seal et al. [38], all D. rotundata breeding lines (n = 112) and landraces (n = 15) maintained at the IITA scored PCR-positive for badnavirus sequences (details of samples are given in the Supplementary materials of [38]). In this study, DGGE was used to examine the diversity of the badnavirus sequences present within a subset of these samples (Table 1). Our DGGE analysis revealed several unique bands but also many bands shared by a high proportion of the yam material tested. At least 15 distinct bands were differentiated by DGGE and, following sequencing, could be assigned to different partial RT-RNaseH sequences. The resulting dataset (Table 1) reflects the complexity of the badnavirus sequences present in Dioscorea species.
Twenty-one DGGE bands from the gel presented in Figure 3 were successfully cloned and sequenced. Several sequences (DGGE bands) appear to be common to all D. rotundata breeding lines. This is the case, for example, for sequences NGl14bDr, NGb30bDr and NGb55aDr, which all cluster in species group K8 (Table 1), are 100% identical to BfA103Dc (AM503393, [34]) and are 99% identical to known endogenous K8 sequences such as S2h9Dr (KF829997, [39]). BLAST matches with 99-100% identity to BfA103Dc were identified for a further 23 DGGE clone sequences (Table 1). In addition, DGGE patterns can be compared across the samples tested, and similarities or differences are easily observed. For example, the crosses TDr 89/02475 × TDr 97/00777 and TDr 99/02793 × TDr 1892 showed identical DGGE patterns, suggesting the presence of the same set of DBV sequences.
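Pattern comparisons of this kind are often quantified by scoring each lane as a set of band positions and computing a similarity coefficient. The sketch below uses the Dice coefficient, one common choice for fingerprint data, which is not a method reported in this study; the lane names and band-position classes are hypothetical and assume bands have already been matched across lanes by eye or by gel-analysis software.

```python
from itertools import combinations


def dice(a: frozenset, b: frozenset) -> float:
    """Dice similarity of two DGGE lanes scored as sets of band positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_dice(lanes: dict) -> dict:
    """Dice similarity for every pair of lanes."""
    return {(x, y): dice(lanes[x], lanes[y]) for x, y in combinations(lanes, 2)}


def identical_groups(lanes: dict) -> list:
    """Group lanes whose band patterns are identical."""
    groups = {}
    for name, bands in lanes.items():
        groups.setdefault(frozenset(bands), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical scoring: band-position classes assigned by the person scoring the gel.
    lanes = {
        "cross_1": frozenset({5, 10, 11, 14}),
        "cross_2": frozenset({5, 10, 11, 14}),
        "landrace_1": frozenset({5, 10, 12}),
    }
    print(pairwise_dice(lanes))
    print(identical_groups(lanes))  # [['cross_1', 'cross_2']]
```

A Dice value of 1.0 corresponds to the identical patterns described for the two crosses; lanes sharing some but not all bands fall between 0 and 1 (here 4/7 for cross_1 versus landrace_1).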
Eleven D. alata cross-breeding lines were also analysed in this study (Figure S1). Nine of these 11 lines contain a double band composed of sequences NGb60Da and NGb61Da, which were 92% identical to GyJT2Dt (AM503389, [34]) and 93% identical to NG1Da (AM944571, [35]), respectively. Sequence NGb60Da appears to be common not only to the majority of D. alata breeding lines tested in this study but also to most of the D. rotundata material tested (e.g., sequence NGb6aDr corresponding to band 6 in Figure 2). In addition, eight of the 11 cross-breeding lines contain a second, poorly resolved double band that included sequence NGb63Da, which is 99% identical to SB42Da (AM072696; isolated from a D. alata plant in the Solomon Islands in 2000 [34]) and falls into monophyletic group K1 (Table 1). Both double bands appear to be conserved across most of the D. alata breeding lines analysed in this study, suggesting the presence of integrated sequences in D. alata germplasm.

Phylogenetic Diversity of Dioscorea Badnavirus Sequences
All sequences produced in this study were subjected to BLAST similarity searches of the NCBI GenBank databases; the nearest matches and percent identities are given in Table 1. Phylogenetic analysis of the 527-528-bp-long partial RT-RNaseH nucleotide sequences showed that the 114 sequences fall within six monophyletic groups according to the proposed classification of yam badnaviruses [3,33,34,39]: K1, K5, K8, K9, U12 and T13 (Figure 4 and Table 1). A nucleotide percent similarity matrix was generated using MAFFT and is presented in Table S1. We included all DGGE-derived badnavirus sequences in the phylogenetic analysis, as this adds another layer of information by giving an approximate indication of the prevalence of a particular sequence in the material tested.

Monophyletic Group Assignment of Sequences Identified in This Study
Five DBV sequences, each derived from an individual DGGE band, clustered into monophyletic group K1 described by Kenyon et al. [3]. Two sequences originating from D. dumetorum and D. esculenta samples share 99% nucleotide identity with FJ65bDe (AM072660) and FJ75cDe (AM072663), which were sampled in Fiji in 1999 [34]. Three further sequences, from two D. alata samples and one D. dumetorum sample, showed 99% nucleotide identity to SB42Da (AM072696), isolated from a D. alata plant in the Solomon Islands in 2000 [34]. Six DBV sequences clustered into monophyletic group K5; five of these originated from D. rotundata samples and share 98-99% nucleotide identity with the eDBV5 clone S1g6Dr (KF829974, [39]) and NGl1950Dr (KX008589, [33]). The latter sequence, NGl1950Dr, was amplified from a D. rotundata plant by rolling circle amplification (RCA) and was therefore considered most probably to be episomal. Most DGGE-derived sequences (58 of 114) clustered into monophyletic group K8, the majority of which originated from D. rotundata samples (Figure 4 and Table 1). Twenty-nine of the 114 DBV sequences were assigned to monophyletic group K9. Thirteen DBV sequences originating from D. rotundata samples clustered into monophyletic group U12 described by Umber et al. [39]. Only one sequence identified by DGGE fell into monophyletic group T13 described in [33]. This sequence (NGb8aDr) appears to comigrate with DGGE sequence NGb8bDr, which clusters in monophyletic group K8 and is common to most of the D. rotundata germplasm tested in this study. A full description of the relationships between sequences is given in Appendix A.

Conservation of Amino Acid Motifs in Partial RT-RNaseH Badnavirus Sequences
BLAST similarity searches showed that most of the partial RT-RNaseH sequences derived from DGGE bands had close identity to existing badnavirus sequences (Table 1). Analysis of the deduced amino acid sequences of the badnavirus sequences identified in this study and of published sequences of other badnaviruses revealed distinctive conserved and semi-conserved regions of the family Caulimoviridae: the 'FIAVYIDDILVFS' motif [73,74] at positions 17-29 and the 'LKTTKGLRSWLGILNYAR' motif [35] at positions 95-112 of the protein deduced from the 528-bp partial RT-RNaseH region. Moreover, the protein alignment presented in Figure S2 allows the simple identification of single amino acid changes or major differences between the phylogenetic groups. Clear patterns specific to each monophyletic species group can be observed; for example, valine at position 98 is present in all sequences clustering in group U12. All protein sequences analysed in this study were compared with the reference sequence of Dioscorea bacilliform alata virus (DBALV or DaBVa, X94576-X94581, [74]).
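Motif positions such as 17-29 and 95-112 refer to the 176-residue protein deduced from the 528-bp region (528/3 = 176). A minimal sketch for locating the two motifs in an ungapped deduced sequence and reading the residue at a given 1-based position follows; it assumes numbering starts at the first codon of the 528-bp region, and the mismatch tolerance for semi-conserved motifs is a hypothetical parameter. Position 98 falls inside the second motif window.

```python
MOTIF_1 = "FIAVYIDDILVFS"       # reported at positions 17-29
MOTIF_2 = "LKTTKGLRSWLGILNYAR"  # reported at positions 95-112


def find_motif(protein: str, motif: str, max_mismatches: int = 0) -> list:
    """Return 1-based (start, end) of every window matching `motif`
    with at most `max_mismatches` substitutions (no gaps allowed)."""
    hits = []
    for i in range(len(protein) - len(motif) + 1):
        window = protein[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + len(motif)))
    return hits


def residue_at(protein: str, position: int) -> str:
    """Residue at a 1-based position of an ungapped protein sequence."""
    if not 1 <= position <= len(protein):
        raise IndexError("position outside the sequence")
    return protein[position - 1]
```

For sequences taken from a gapped alignment such as Figure S2, alignment columns would first need to be mapped to ungapped positions of the reference before these helpers give comparable numbers.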

Figure 4. … X57924) were added, as well as representative sequences of all monophyletic groups described by Bousalem et al. [34] (where DBV-D: Dioscorea bacilliform virus D), by Umber et al. [39] and by Kenyon et al. [3], denoted by U12 and K1-K11, respectively. Three novel monophyletic groups, T13-15, described by Bömer et al. [33] were also included. Sequences depicted in bold represent partial RT-RNaseH sequences of characterised episomal full-length DBV genomes currently available in GenBank. The phylogenetic tree was divided into sub-groups, with groups K8 and U12 presented in (A); K9 and T13 in (B); and DBV-D, K1-11, T14, T15 and the outgroups in (C). Bootstrap analysis used 1000 replicates with a cut-off value of 70%. The scale bars show the number of substitutions per base.

Potential of DGGE in Badnavirus Diversity Studies and Identification of Potential Integrated Sequences
In this study, the variation in complex (e)DBV sequence mixtures was evaluated rapidly and robustly across a subset of the 127 IITA D. rotundata samples, all of which scored Badna-positive by PCR in a study by Seal et al. [38]. The results presented here show that DGGE is a very useful technique for diversity profiling of amplified DBV partial RT-RNaseH sequences. This is in agreement with previous reports on the ability of DGGE to differentiate highly similar DNA sequences in diversity studies of viruses, bacteria and phytoplankton [54,75]. A particular strength for badnavirus diversity studies, however, is apparent from the similarities in DGGE banding profiles between breeding lines of the same Dioscorea species, which indicate highly conserved sequences that appear to represent eDBVs. For example, the crosses TDr 89/02475 × TDr 97/00777 and TDr 99/02793 × TDr 1892 show identical DGGE patterns (Figure 3), suggesting the presence of the same set of DBV sequences, most of which are likely to be endogenous, and ultimately indicating a very similar genetic background of these samples.
The DGGE banding profiles in this study, and the clone sequences derived from them, depict the vast complexity of DBV sequences present in yam germplasm. Given this complexity and the difficulty of cloning closely migrating bands, however, precise analysis of all DBV sequences present in a given sample by DGGE is not straightforward. Our efforts therefore focused on cloning clearly defined bands. For example, the majority of D. rotundata breeding lines share the common DGGE bands 5, 10 and 11 (Figure 2). Sequences from these and other bands cluster closely together, and a total of 22 sequences share 99-100% nucleotide identity with BfA103Dc (AM503393, [34]). It appears probable that these bands represent integrated sequences, as the presence of such highly similar episomal viruses in all the different material would not be expected. Episomal badnavirus sequences infecting other crops have been shown to be highly diverse: for example, the genetic diversity of four banana streak virus (BSV) isolates from Australia ranged from 21.8 to 33.6% in the amino acid sequences of the ribonuclease H domain in open reading frame 3 (ORF3) [24], and six BSVs from East Africa showed nucleotide diversity of up to 18% in the RT-RNaseH-coding domain [41]. Further evidence is that BfA103Dc was shown to be free of episomal viruses by immunosorbent electron microscopy (ISEM) and ELISA [38]. Southern blot hybridisation using the common bands as probes could confirm whether DGGE bands 5, 10 and 11 represent common ancient integrated sequences in the yam host genomes rather than an unusually homogeneous episomal viral infection.
The presence of more than one band in every DGGE lane reveals that every leaf sample tested contained more than one badnavirus sequence. Sequences representing more than one badnavirus species in single-leaf samples have been reported before, through sequencing of cloned PCR products that yielded clones of several different badnavirus species from a single leaf sample [3,35,60]. This study reveals that in D. rotundata this is in fact the norm rather than the exception, as also indicated by our previous study, which detected mixed infections using RCA to amplify episomal badnavirus sequences [33]. In this context, we propose DGGE as a complementary method to RCA, enabling the rapid identification of potentially integrated sequences while screening for badnavirus diversity.
Testing DGGE as a tool for analysing complex badnavirus sequence mixtures in samples of yam breeding lines was motivated by the assumption that the previous approach, cloning and sequencing of PCR products, is an inefficient way to reveal full sequence diversity. Such techniques require the selection and sequencing of a large number of clones to unravel all the sequence diversity present in a given sample, as also demonstrated for BSV [76]. Likewise, RFLP was reported to be of limited use for studying yam badnavirus diversity, since sequences with different RFLP patterns were found clustered in the same species group [35]. The practical advantage of DGGE is therefore that it rapidly detects sequence diversity in uniformly sized PCR products, eliminating the need for labour-intensive screening of redundant clones [77]. Equally, when a large number of samples containing multiple badnavirus sequences must be analysed, DGGE is the most appropriate technique for a diversity study in terms of cost and practicality, compared with direct sequencing of clones following PCR or with RFLP.
Conversely, DGGE suffers from some methodological pitfalls and inherent practical limitations. One concern is the detection limit: Muyzer et al. [45] suggested that any target DNA representing less than 1% of the total target pool is unlikely to be detected by DGGE. Low badnavirus titres could therefore prevent the detection of true episomal DBV sequences by DGGE, compared with endogenous sequences potentially present in higher copy numbers in some yam germplasm. A related caveat is that discrete fingerprint bands may not always be apparent when analysing highly diverse samples, leading to smeared or poorly resolved patterns [44]. Another limitation is the comigration of DNA molecules of different sequence, as reported by Ercolini [49] and also experienced in this study. Although DGGE was able to separate sequences with high similarity, closely migrating bands were difficult to excise accurately, particularly when resolution was reduced by smeared backgrounds, as seen in several samples in Figure 3. Additionally, some technical problems were encountered in carrying out the DGGE technique, namely the general handling of fragile DGGE gels and persistent background smearing that made band scoring problematic. Several modifications were made to the PCR and DGGE conditions, including the primer concentrations, the choice of Taq polymerase, the DGGE gradient range and the volume of PCR product loaded per gel lane. In this context, we found the procedural comments on the DGGE technique given by Green et al. [44] useful. Different GC-clamped primers (Badna-FP versus Badna-RP) were also tried, but the comparison shown in Figure 2 indicated that the GC-clamped Badna-RP primer is of limited suitability for the DBV diversity study, because of the potential under-representation of badnavirus sequences clustering in the K9 monophyletic group.
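The 1% detection limit cited from Muyzer et al. [45] implies a simple check: a template is at risk of being missed whenever its share of the amplifiable target pool falls below 1%. The sketch below assumes, purely for illustration, that amplification is proportional to starting copy number (real PCR is subject to amplification bias), and it uses hypothetical copy numbers for endogenous and episomal templates in one sample.

```python
from typing import Dict, List

DETECTION_LIMIT = 0.01  # ~1% of the target pool, after Muyzer et al. [45]


def relative_abundance(copies: Dict[str, float]) -> Dict[str, float]:
    """Share of the total template pool contributed by each sequence."""
    total = sum(copies.values())
    return {name: n / total for name, n in copies.items()}


def likely_undetected(copies: Dict[str, float], limit: float = DETECTION_LIMIT) -> List[str]:
    """Templates whose share of the pool is below the detection limit."""
    return sorted(name for name, share in relative_abundance(copies).items() if share < limit)


# Hypothetical starting copy numbers in a single PCR (not measured in this study)
copies = {
    "endogenous_K8": 6000.0,
    "endogenous_K9": 3500.0,
    "endogenous_U12": 450.0,
    "episomal_low_titre": 50.0,
}

print(likely_undetected(copies))  # ['episomal_low_titre']
```

Here the low-titre episomal template makes up 0.5% of the pool and would be expected to fall below the DGGE detection limit, even though it is present.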
In future, a worthwhile improvement would be the inclusion of a reference DGGE standard, i.e., a profile created from pooled clones of distinct DGGE bands. Clone sequences or PCR re-amplification products would be analysed by DGGE in a lane adjacent to the original sample PCR product from which they were excised. This would assist the typing of unknown populations and eliminate the need for DGGE band cloning and subsequent sequencing. Moreover, the degeneracy of the Badna-FP/-RP primers is expected to contribute to the poor resolution (the smearing experienced in this study), potentially creating multiple products per template sequence that differ only in their primer binding sites. Hence, the design of primers specific to individual monophyletic groups should be considered in future; these could improve DGGE resolution and at the same time reduce complexity by focusing the analysis on the most prevalent and interesting DBV species.
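Typing a sample lane against the proposed reference standard amounts to matching each band's migration distance to the nearest reference band within a tolerance. The sketch below is a hypothetical illustration: the reference band names, the distances in millimetres and the 1 mm tolerance are assumptions, not values from this study. Because different sequences can comigrate, a positional match is only presumptive, and unmatched bands (returned as None) would still need cloning and sequencing.

```python
from typing import Dict, List, Optional


def match_band(distance_mm: float, reference: Dict[str, float], tol_mm: float = 1.0) -> Optional[str]:
    """Name of the closest reference band within `tol_mm`, or None if none is close enough."""
    best_name: Optional[str] = None
    best_diff = tol_mm
    for name, ref_mm in reference.items():
        diff = abs(distance_mm - ref_mm)
        if diff <= best_diff:
            best_name, best_diff = name, diff
    return best_name


def type_lane(sample_mm: List[float], reference: Dict[str, float],
              tol_mm: float = 1.0) -> Dict[float, Optional[str]]:
    """Assign every band in a sample lane to a reference band (or None)."""
    return {d: match_band(d, reference, tol_mm) for d in sample_mm}


# Hypothetical reference standard built from pooled clones of known group membership
reference = {"K8_std": 42.0, "K9_std": 55.5, "U12_std": 61.0}

print(type_lane([42.4, 55.0, 70.2], reference))
# {42.4: 'K8_std', 55.0: 'K9_std', 70.2: None}
```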

DGGE-Captured Badnavirus Diversity
To investigate in depth the diversity of badnavirus sequences present in the yam samples analysed in this study, DGGE analysis was applied and representative bands were cloned and sequenced. All sequenced bands were shown to be partial RT-RNaseH sequences of badnaviruses, and the 112 DBV sequences analysed clustered into six (K1, K5, K8, K9, U12 and T13) of the 15 putative species groups identified by Bömer et al. [33], Bousalem et al. [34], Kenyon et al. [3] and Umber et al. [39]. A comparison of the amino acid sequences deduced from the partial RT-RNaseH region, together with those of other members of the family Caulimoviridae, revealed conserved and semi-conserved regions similar to those of previously published badnavirus sequences [3,33-35,38,73,78-80]. In particular, a stretch of the conserved 'FIAVYIDDILVFS' region of the RT in the C-terminal part of the ORF3 polyprotein was observed, as was the semi-conserved 'LKTTKGLRSWLGILNYAR' region (Figure S2). This confirms that DNA extracted from the Dioscorea spp. samples contained sequences belonging to the genus Badnavirus of the family Caulimoviridae [73,78-81].
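Locating the conserved RT region in a deduced amino acid sequence can be approximated by sliding the 'FIAVYIDDILVFS' motif along the protein and counting mismatches. This is a minimal sketch, not the alignment procedure used for Figure S2: the two-mismatch allowance and the example fragment are hypothetical, and a real analysis would rely on a proper multiple alignment.

```python
from typing import List, Tuple

RT_MOTIF = "FIAVYIDDILVFS"  # conserved RT region named in the text


def find_motif(protein: str, motif: str, max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatch count) for every window within the mismatch limit."""
    hits = []
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical translated partial RT fragment with two substitutions in the motif
fragment = "MKSGRFIAVYVDDILIFSKNE"

print(find_motif(fragment, RT_MOTIF))  # [(5, 2)]
```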
Sequence analyses of the partial RT-RNaseH domain in this work support the classification and diversity of yam badnaviruses proposed by Kenyon et al. [3] and subsequently substantiated by Bousalem et al. [34], Umber et al. [39] and Bömer et al. [33]. The maximum nucleotide variability recorded within the partial RT-RNaseH coding region among the 112 yam badnavirus sequences determined in this study was 37%. This is within the range of intergroup nucleotide diversity reported for groups K1-K11 (23.1-39.4%) of yam badnaviruses [3]. High diversity levels have also been reported for badnaviruses in other plant hosts: 21.8-33.6% amino acid sequence diversity in the ribonuclease H domain of ORF3 among four BSV isolates from Australia [24], a maximum nucleotide diversity of 28% among Ugandan BSV isolates [2], a maximum nucleotide diversity of 29.4% between complete genomes of CSSV isolates [82], and up to 33.5% nucleotide variability within the partial RT-RNaseH sequences of sugarcane bacilliform virus (SCBV) isolates [5].
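The section does not state how the 37% maximum variability was computed. One common way to obtain such a figure, assumed here for illustration, is to take the largest uncorrected pairwise distance (p-distance, i.e., 100% minus percent identity) over all pairs in a nucleotide alignment, skipping gapped sites pairwise. The three aligned sequences below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped sites
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def max_variability(alignment: Dict[str, str]) -> Tuple[float, Tuple[str, str]]:
    """Largest pairwise p-distance in the alignment and the pair that produces it."""
    return max((p_distance(alignment[i], alignment[j]), (i, j))
               for i, j in combinations(alignment, 2))


# Hypothetical 10-nt aligned fragments
alignment = {
    "seq_A": "ATGCATGCAT",
    "seq_B": "ATGCATGCAA",
    "seq_C": "TTGCAAGCTT",
}

dist, pair = max_variability(alignment)
print(f"{dist:.0%} between {pair}")  # 40% between ('seq_B', 'seq_C')
```

Other tools may apply distance corrections or treat gaps differently, so figures computed this way are not directly comparable with every published value.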
Two sequences determined in this study (NGl65De and NGb53Dr) shared <60% nucleotide identity with any previously identified yam badnavirus sequence group but were similar to the divergent sequence groups K12 and K13 reported by Kenyon et al. [3] from South Pacific yams. These sequences clustered closer to RTBV, a more distantly related member of the family Caulimoviridae, than to any member of the genus Badnavirus. These sequence groups may represent divergent badnaviruses, ancient endogenous pararetrovirus sequences or new genera within the family Caulimoviridae [3]. Further research is needed to characterise the nature of these sequences.

Endogenous Badnavirus Partial RT-RNaseH Sequences
To date, four of the 15 yam badnavirus species groups identified have been reported to contain eDBV sequences, namely K5, K8, K9 and U12 [38,39]. Phylogenetic analyses of the cloned DGGE bands revealed that all sequences originating from DGGE bands common to the majority of D. rotundata samples (for example NGb30bDr and NGb1aDr) clustered into either the K8 or the K9 species group (Figure 4). These common bands show 99-100% nucleotide identity to several other sequences identified in this study, as well as to integrated DBV sequences described by Umber et al. [39]; hence, it seems probable that they represent eDBV sequences. A number of DGGE band sequences from D. rotundata shared 99% nucleotide identity with either eDBV9 clone S1h2Dr (KF829975, [39]) or eDBV8 clone S2h9Dr (KF829997, [39]), suggesting that these sequences represent integrated sequences in the samples tested. Several further DGGE band sequences from D. rotundata clustered with the other two yam badnavirus/eDBV species groups, K5 and U12 (Figure 4). For example, sequences NGl9a/bDr, NGl12a/bDr and NGl13a/bDr of the U12 species group came from bands present in several D. rotundata samples. They share 99-100% nucleotide identity with eDBV12 clone S1a4Dr (KF829956, [39]), suggesting that these could also be integrated sequences in the samples tested. Equally, NGb22a/bDr, NGb23aDr, NGb24aDr and NGb52Dr could also represent integrated sequences, as they share 98-99% nucleotide identity with eDBV5 clone S1g6Dr (KF829974, [39]).
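The inference used in this paragraph, that a DGGE band sequence whose nearest GenBank match is an eDBV clone at 99% identity or more is a candidate integrated sequence, can be written as a best-hit rule. In this sketch the eDBV accessions are those cited above, but the identity values are hypothetical, the 99% threshold simply mirrors the text, and the rule flags candidates rather than proving integration.

```python
from typing import Dict, Set, Tuple

# eDBV reference clones cited in the text (Umber et al. [39])
EDBV_REFS: Set[str] = {"KF829975", "KF829997", "KF829956", "KF829974"}


def nearest_match(identities: Dict[str, float]) -> Tuple[str, float]:
    """Reference accession with the highest percent identity to the query."""
    best = max(identities, key=identities.get)
    return best, identities[best]


def is_candidate_edbv(identities: Dict[str, float],
                      edbv_refs: Set[str] = EDBV_REFS,
                      min_identity: float = 99.0) -> bool:
    """True if the nearest match is an eDBV clone at or above `min_identity` percent."""
    best, identity = nearest_match(identities)
    return best in edbv_refs and identity >= min_identity


# Hypothetical percent identities of two band sequences to GenBank references
band_1 = {"KF829975": 99.3, "KF829997": 81.0, "AM503389": 84.5}
band_2 = {"KF829975": 88.0, "KX008596": 99.0}  # nearest match is episomal DBRTV1

print(is_candidate_edbv(band_1), is_candidate_edbv(band_2))  # True False
```

The reference set is a user choice; for instance, the particle-free BfA103Dc sequence (AM503393) could be added to it, since the text treats high identity to that sequence as a further indicator of integration.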
In summary, 28 sequences had eDBV sequences as their nearest match, with nucleotide identities above 99%. Twenty-two sequences were also identified that share 99-100% identity with the badnavirus particle-free BfA103Dc (AM503393, [34]) and are therefore likely to represent integrated sequences, as described above. A further 32 sequences clustering in monophyletic group K8 are highly conserved among themselves (nearest match GyJT2DT, AM503389, [34]), suggesting that these sequences are also integrated. Taken together, at least 82 of the 112 sequences (73%) in this study most probably originate from integrated sequences. We therefore propose that DGGE is very effective at revealing eDBV sequences when testing plant germplasm.
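The count of probably integrated sequences is a simple sum over three categories that the text presents as non-overlapping (28 + 22 + 32). Assuming no sequence is counted twice, the tally and its share of all 112 sequences can be checked as follows.

```python
# Category sizes as given in the text; assumed to be mutually exclusive
categories = {
    "nearest match an eDBV, >99% identity": 28,
    "99-100% identity to particle-free BfA103Dc (AM503393)": 22,
    "conserved K8 group (nearest match GyJT2DT, AM503389)": 32,
}
TOTAL_SEQUENCES = 112

likely_integrated = sum(categories.values())
share = likely_integrated / TOTAL_SEQUENCES

print(likely_integrated, f"{share:.1%}")  # 82 73.2%
```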
In this study, the breeding lines were derived from true seed yam, and this material has been grown over successive cropping seasons at the IITA. The true seed material should have been free of episomal badnavirus particles, as yam badnaviruses have not been reported to be seed-transmissible [62]. However, origin from true seed does not automatically exclude the presence of episomal virus. Vertical transmission of integrated copies via seed is possible, as reported for endogenous PVCV in petunia, where the integrated virus becomes activated by direct transcription of integrated PVCV sequences arranged in a tandem array [22]. Vertical transmission of activatable eDBVs cannot be excluded in yams, and more research on this potential risk is needed. Nevertheless, the presence of the common DGGE bands across the yam lines analysed in this study suggests that these most likely represent eDBV sequences, whereas additional DGGE bands detected only in some lines potentially represent new infections acquired during propagation of the material in the field over many cultivation seasons. Further analysis of the DGGE sequences by Southern blotting will help to determine whether these sequences are diverse eDBVs or represent episomal viruses acquired through screen-house and field propagation.

Conclusions
In summary, a workflow combining PCR and DGGE for the rapid and efficient unravelling of complex mixtures of potentially episomal and endogenous badnavirus sequences has been developed (Figure 1). Here, we used specific PCR primers to amplify a partial sequence of the badnavirus RT-RNaseH coding region to investigate the molecular diversity of (e)DBV sequences in selected West African yam germplasm. PCR products were resolved using DGGE, giving characteristic banding patterns for each yam line examined. This is the first description of this approach for assessing badnavirus sequences present in Dioscorea spp. germplasm, and it resulted in the identification of complex DGGE profiles representing multiple DBV sequences. A total of 112 yam badnavirus sequences were generated (GenBank accession numbers KY555456 to KY555569), falling into six monophyletic species groups. The conserved nature of several DGGE-derived yam badnavirus sequences, together with their high nucleotide identities to eDBV sequences deposited in GenBank, suggests that the majority of West African yam germplasm contains a mixture of integrated badnavirus sequences. Our findings highlight DGGE as a very useful technique for the rapid indication of badnavirus sequence diversity in samples containing multiple eDBVs, providing a snapshot of the diversity between genomes. The approach taken in this study for the rapid identification of potential candidate integrated badnavirus sequences, indicated by their conserved nature across germplasm, should have wide application to the ever-increasing number of plant species found to contain sequences of this previously understudied but important genus of plant viruses.
Supplementary Materials: The following are available online at www.mdpi.com/1999-4915/9/7/181/s1, Figure S1: Denaturing gradient gel electrophoresis (DGGE) analysis of partial RT-RNaseH badnavirus sequences from eleven samples consisting of cross-breeding lines of D. alata, Figure S2: Protein alignment from deduced amino acid sequences of partial RT-RNaseH nucleotide sequences of 114 yam badnavirus sequences determined in this study together with other members of the family Caulimoviridae.

Appendix A
Five DBV sequences, each derived from an individual DGGE band, clustered into monophyletic group K1 described by Kenyon et al. [3]. Sequences NGl71Dd from D. dumetorum and NGl73De from D. esculenta both share 99% nucleotide identity with FJ65bDe (AM072660) and FJ75cDe (AM072663), which were sampled in Fiji in 1999 by Lebas and described by Bousalem et al. [34] as representative sequences of sub-group A of the DeBV-A (Dioscorea esculenta bacilliform virus A) monophyletic group. Sequences NGb63Da and NGl72Da from D. alata, as well as NGl70Db from D. dumetorum, show 99% nucleotide identity with SB42Da (AM072696), isolated from a D. alata plant in the Solomon Islands in 2000 by Lebas and described by Bousalem et al. [34] as a representative sequence of DeBV-A sub-group B.
Only one sequence identified by DGGE fell into the monophyletic group T13 described in our previous study [33]. Sequence NGb8aDr has 99% identity to the episomal Dioscorea bacilliform rotundata (RT) virus 1 (DBRTV1) sequence originating from D. rotundata (KX008596, [33]), which is also available as a full genome sequence (GenBank KX008574). However, the second clone of this DGGE band clustered into group K8, indicating cross-contamination with a closely migrating band.
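Cross-contamination of this kind can be screened for by checking whether all clones picked from one excised band fall into the same monophyletic group. In the sketch below, the band labels are hypothetical shorthand derived from the clone names; the T13/K8 split for band NGb8 is taken from the text, while the other entries are illustrative.

```python
from typing import Dict, List


def inconsistent_bands(clone_groups: Dict[str, List[str]]) -> List[str]:
    """Bands whose sequenced clones were assigned to more than one group."""
    return sorted(band for band, groups in clone_groups.items() if len(set(groups)) > 1)


# Group assignment of each sequenced clone, keyed by excised band (hypothetical labels)
clone_groups = {
    "NGb8": ["T13", "K8"],   # second clone fell into K8, as described above
    "NGl9": ["U12", "U12"],
    "NGb22": ["K5", "K5"],
}

print(inconsistent_bands(clone_groups))  # ['NGb8']
```

A flagged band is not necessarily contaminated during handling; it may also contain two comigrating sequences, so re-running its clones by DGGE next to the original lane can help distinguish the two cases.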