Development of a Versatile, Near Full Genome Amplification and Sequencing Approach for a Broad Variety of HIV-1 Group M Variants

Near full genome sequencing (NFGS) of HIV-1 is required to assess the genetic composition of HIV-1 strains comprehensively. Population-wide, it enables a determination of the heterogeneity of HIV-1 and the emergence of novel/recombinant strains, while for each individual it constitutes a diagnostic instrument to assist targeted therapeutic measures against viral components. There is still a lack of robust and adaptable techniques for efficient NFGS from miscellaneous HIV-1 subtypes. Using rational primer design, a broad primer set was developed for the amplification and sequencing of diverse HIV-1 group M variants from plasma. Using pure subtypes as well as diverse, unique recombinant forms (URF), variable amplicon approaches were developed for NFGS comprising all functional genes. Twenty-three different genomes composed of subtypes A (A1), B, F (F2), G, CRF01_AE, CRF02_AG, and CRF22_01A1 were successfully determined. The NFGS approach was robust irrespective of viral loads (≥306 copies/mL) and amplification method. Third-generation sequencing (TGS), single genome amplification (SGA), cloning, and bulk sequencing yielded similar outcomes concerning subtype composition and recombinant breakpoint patterns. The introduction of a simple and versatile near full genome amplification, sequencing, and cloning method enables broad application in phylogenetic studies of diverse HIV-1 subtypes and can contribute to personalized HIV therapy and diagnosis.


Introduction
HIV-1 full genome sequencing is a challenging task due to the broad degree of HIV-1 genomic diversity worldwide. Four major groups of HIV-1 exist, groups M, N, O, and P [1][2][3], which are highly divergent and trace back to separate cross-species transmission events from primates to humans [4]. HIV-1 group M has spread globally, causing more than 85% of global HIV infections and can be subdivided into nine subtypes (A-D, F-H, J, and K), at least six sub-subtypes of A and F (A1-A6, F1-F2), currently at least 98 circulating recombinant forms (CRFs) [5], and numerous unique recombinant forms (URFs) [6]. CRFs and URFs are composed of two or more (sub)-subtypes; while CRFs have been identified in at least three epidemiologically unlinked individuals, URFs are defined as recombinants without evidence of onward transmission [5]. The continuous emergence of novel HIV-1 M strains [6][7][8][9] including sequences that remain unclassified or can be attributed to ancient strains [10] poses a significant challenge for HIV-1 molecular surveillance, diagnosis, and therapy [11,12]. Consequently, robust and versatile amplification and sequencing protocols are needed that apply to such strains over the near full genome (NFG).
The vast majority of sequences in the Los Alamos National Library (LANL) HIV sequence database are derived from partial genome sequencing [13]. Partial genome sequencing obscures subtype determinations and limits the identification of epidemiologic signatures and recombination breakpoints, which can be distributed across the entire viral genome [14,15]. To overcome these limitations, PCR-based near full genome sequencing (NFGS) protocols have been introduced, which were mostly applied in cohorts restricted to one or a few subtypes [16][17][18][19][20]. Protocols have been optimized for next-generation sequencing (NGS) [21,22], or the use of oligonucleotide probes combined with real-time PCR [23,24]. The coverage of multiple subtypes has been addressed by universal NFGS approaches [22,25]. Universal approaches rely on primer binding to semi-conserved regions in the HIV genome, and limited coverage and efficiencies can still occur depending on viral load and subtype. The combination of NGS with HIV-specific probes increased detection sensitivity and sequencing depth [26]; however, the detection of emerging and rapidly evolving strains is expected to be challenging. NGS approaches provide large-scale sequencing with in-depth coverage and analyses of minority variants; however, their usage is limited in resource-constrained settings due to the required equipment and expertise, and do not allow downstream processing of the amplicons and cloning. There is still a need for robust and straightforward NFGS protocols suited for a broad diversity of HIV-1 M subtypes and the generation of NFG amplicons adaptable to downstream biological applications.
A flexible NFGS approach has therefore been established by our team using rationally designed primers that bind at highly conserved domains across major HIV-1 group M subtypes. The set of primers enables nested PCRs to span single or multiple functional gene regions. It provides NFG amplification and sequencing through one-amplicon, two-amplicon or multiple-amplicon approaches. Pure HIV-1 M subtypes and highly diverse HIV-1 M URFs from Cameroon, characterized in detail in Banin et al. [27], were used to validate the protocol, which was applied to bulk sequencing, cloning, single-genome amplification (SGA), and third-generation sequencing (TGS).

Ethical Clearance
This study was conducted in accordance with the Declaration of Helsinki and the protocol was approved by the Institutional Ethical Review Board of the Cameroon Ministry of Public Health and by the Institutional Review Board at New York University School of Medicine (NYUSoM), New York, USA on April 6, 2018 (protocol: i09-0431). Before sample collection, informed consent was obtained from the study participants, who were all part of a cohort of HIV positive individuals; this cohort is monitored at the Medical Diagnostic Center (MDC) in Yaoundé, Cameroon in collaboration with the New York University School of Medicine (NYUSoM), New York, USA.

Study Samples
Whole blood samples were collected at the MDC between 2000 and 2015. They were shipped under standard regulatory conditions for biological materials to NYUSoM for processing of plasma and peripheral blood mononuclear cells (PBMCs) using Ficoll gradient centrifugation (Histopaque, Sigma-Aldrich, St. Louis, MO, USA), and stored at −80 • C. Study samples were selected based on previously identified URF sequences through partial genome sequencing in our group and on the availability of plasma volumes ≥500 µL [28][29][30].

RNA Extraction
Viral RNA was extracted from 500 µL HIV-1 positive plasma using QIAamp Viral RNA Minikit (Qiagen, Valencia, CA, USA). Virions were concentrated by ultracentrifugation at 14,000× g for 2 h at 4 • C. All but 140 µL of the supernatant was removed before addition of lysis buffer and subsequent RNA extraction according to the manufacturer's instructions. Extracted RNA (30 µL) was either immediately reverse transcribed or stored at −80 • C.

cDNA Synthesis
Reverse transcription of RNA was performed using SuperScript III (Life Technologies, Carlsbad, CA, USA) with primer OFM19 [31] corresponding to HXB2 position 9632-9604. If amplicons within the 5' half of the HIV-1 genome could not be obtained using cDNA generated with OFM19 primer, HIV 6352 rev [25] was used as an alternative. For the generation of cDNA, 5.25 µL nuclease-free water (Promega) was mixed with 0.75 µL reverse primer OFM19 or HIV 6352 rev (20µM), 3 µL (10 mM) dNTP mix and 30 µL extracted template RNA. The mix was incubated at 65 • C for 5 min and removed on ice for 1 min. 12 µL 5× first strand buffer, 3 µL (100 mM) DTT solution, 3 µL (40 U/µL) RNaseOUT, and 3 µL (200 U/µL) SuperScript III Reverse Transcriptase were added to the mix to get a total volume of 60 µL. The mix was vortexed and incubated at 50 • C for 1 h followed by an increase to 55 • C for 1 h. SuperScript III Reverse Transcriptase was inactivated at 70 • C for 15 min. Remaining RNA was digested by adding 3 µL RNase H (2 U/µL) and incubation at 37 • C for 20 min.

Primer Design
The rational design of universal HIV-1 group M amplification and sequencing primers was based on full genome multiple sequence alignments with reference strains downloaded from the LANL database. Our approach is related to PrimerDesign-M, an alignment-based primer design tool from the LANL database, which also served as the first guide to identify appropriate genomic regions for some of our primers. Generally, most primers were designed manually based on sequence alignments and primer criteria as follows. An HIV-1 full genome alignment comprised of 480 reference sequences from all major group M (sub)-subtypes and CRFs [32] was shortened to a representative 40 strain-panel. The latter included four references per eight (sub)-subtypes A1, A2, B, C, D, F1, F2, G, and two CRFs, 01_AE, and 02_AG (Results 3.2), which make up to the most dominant (sub-)subtypes/CRFs prevailing the five inhabited continents of the world.
The procedure was similar for the design of amplification and sequencing primers with the only difference of targeted regions. Amplification primers mostly targeted conserved domains adjacent or at the ends of the genes of interest. Sequencing primers had no spatial constraints; however, appropriate distances (~500 bp) between consecutive primer binding sites should provide sufficient sequence overlap between the multiple complementary sequence fragments (≥100 bp overlap) while aiming for the lowest possible number of primers to cover NFGS. Preferentially, primers were designed to have a length of 18-30 base pairs with 40-60% G/C content and a melting temperature (Tm) of 50-65 • C. Primer pairs should be within 5 • C difference in Tm and exempt of complementary regions to avoid heterodimers. Primers with long stretches of identical base pairs (especially G and C), hairpin structures, or self-complementary regions (homodimers) were avoided, as can be analyzed with a primer design software such as IDT OligoAnalyzer (https://www.idtdna.com/calc/analyzer). If possible, primers should start and end with one or two G/C base pairs. The 3' end of the primer was most critical to accurately match the homologous regions of the majority of reference sequences. Few ambiguous base pairs were tolerated in the central region of the primer sequence and up to three ambiguity characters were introduced for ≤70% conserved sites per primer. Since accurate primer design and positive prediction do not guarantee efficient, practical implementation, trial and error have to succeed. Amplification primers were screened with at least three study samples (cDNA templates), and if possible, in conjunction with already established pairing primers. Primer selection was based upon efficient PCR amplification yielding strong and clean single bands without smear or evidence of unspecific amplification products in at least one NFG PCR or at least 1/3 of PCR reactions amplifying smaller regions. Out of >70 newly tested amplification primers, 26 primers passed the selection criteria (Results 3.2). Sequencing primers were screened with at least two study samples, i.e., bulk-amplified nested PCR products with single strong bands that enabled positive sequencing reactions using established sequencing primers. Out of >90 newly tested sequencing primers, 64 primers worked sufficiently to determine sequences ≥700 bp in length (quality value ≥16, Macrogen sequencing service, New York, NY, USA) from half of the test samples (Results 3.2) [33].
The PCR reactions were performed with the following cycling conditions: initial denaturation step of 30 s at 98 • C and 30 cycles of 10 s at 98 • C (denaturation), 15 s at 53 • C (annealing), and elongation at 68 • C for 1 min per 1 kb of the amplification product. After 30 cycles, a final elongation step was performed at 68 • C, and the mix stored at 4 • C. The PCR products were analyzed by gel electrophoresis at 120 V for 30 min on a 0.8% agarose gel. To exclude/minimize unspecific amplification and cross-contamination, samples from different participants were processed separately or sequentially, and nucleotide removal treatments were applied frequently. Multiple sequence alignments (MEGA5.2) and highlighter plots (Los Alamos HIV Database) were done with current and historical PCR products from our lab to exclude cross-contamination.

Sanger Sequencing and Sequence Editing
Sanger Sequencing of amplified PCR products was done by Macrogen (Macrogen Corp., New York, NY, USA). Assembly and editing of the sequences were done with DNAStar package programs SeqMan Pro and EditSeq (Lasergene, Madison, WI, USA).

Simplot and Recombinant Drawing Tool
The Simplot software package version 3.2 (Window size 200, step size 20 and 250 bootstrap replicate) was used for breakpoint determination. Schematic illustrations of the various full genomes were done using the recombinant drawing tool provided by the Los Alamos HIV sequence database. A bootstrap support of 70% between a subtype reference strain and a query sequence was used as criteria to assign a subtype to a breakpoint region. If bootstrap values remained below 70% using several pure and CRF reference subtypes, the region was considered unidentified.

Third-Generation Sequencing
Third-generation sequencing (TGS) was performed using Oxford Nanopore MinIon flow cells, capable of high-throughput sequencing in real-time and generating long DNA reads. TGS was carried out at the NYU sequencing and genomic unit. Unsheared HIV DNA from 12 samples was quantitated with the qubit HS DNA kit (Thermo Fisher Scientific). DNA was normalized to 1 µg in 45 uL of buffer and individually input into the Oxford Nanopore PCR Barcoding kit (EXP-PBC001, R9 Version) according to the protocol outlined by Oxford Nanopore (PBGE_9007_V7_revA_16May2016\ PBGE_9007_v7_revJ_16May2016). Each sample was barcoded with a unique adapter index provided in the kit (using the NEB LongAmp Taq master mix, NEB # M0287S) and run with the following PCR conditions: Initial Denaturation for 3 min at 95 • C, Denaturation for 15 s at 95 • C, Annealing for 15 s at 62 • C, Extension for 12 min at 65 • C, Denaturation to Extension was repeated for an additional 14 cycles to compensate for the large fragment sizes (15 cycles total), and one final extension at the end for 14 min at 65 • C. After PCR, each sample was cleaned with Ampure XP (Beckman Coulter) and barcoded samples were quantitated with the Qubit HS DNA kit. Twelve samples were normalized to 110 ng each and combined into one pool for input into the Nanopore Sequencing kit from Oxford Nanopore (SQK-NSK007). The library prep was followed, as described in the protocol above. The pooled library was eluted for 30 min at 37 • C. The final library was quantitated with the qubit HS DNA kit and added onto a primed SpotON Flow Cell (Oxford Nanopore, #FLO-MIN105). The SpotON Flow cell was run on the MinION Mk 1 (#MIN-MAP002) with the script NC_48Hr_Sequencing_Run_FLO_MIN105. FASTQ files were generated using poretools (version 0.6.0) (PMID: 25143291) and general quality controls were performed. Barcodes were trimmed using cutadapt (version 1.9.1) (http://journal.embnet.org/index.php/embnetjournal/article/view/200) and all the reads were mapped to HIV-I (HXB2) reference genome (K03455.1) for an initial quality assessment using the Burrows-Wheeler Aligner (BWA) (version 0.7.13) mem algorithm [34] and visualized using Integrative Genomics Viewer (IGV) software packages [35]. Phylogenetic analyses were done using MEGA5.2 software [36]. Maximum likelihood phylogenetic trees were generated using 1000 bootstrap replicates [37]. Any HIV unspecific reads or minor HIV variants that clustered with a viral population from an unrelated sample were removed. TGS (sub)populations were averaged to consensus (con) sequences using Consensus Maker (Los Alamos National Library (LANL) Database) (www.hiv.lanl.gov) or SeqMan Pro.

Single-Gnome Amplification
Single genome amplifications (SGA) are PCRs performed on endpoint-diluted cDNA to preclude amplification artifacts such as template switching that can occur when mixtures of genetically diverse templates are used. Template cDNA dilutions that resulted in no more than 30% positive nested PCR reactions were assumed to contain one amplifiable cDNA template per positive PCR (>80% likelihood) [31]. Specifically, ten nested PCR reactions were performed per cDNA dilution. Multiple cDNA dilutions in the range between 1:2 and 1:1,000,000 were tested for each sample. Dilutions with no more than three positive PCR reactions were considered for SGA analyses and the positive amplicons subjected to direct sequencing. SGAs were performed using High-Fidelity PrimeSTAR GXL DNA polymerase (Clontech, Mountain View, CA, USA) using the same PCR and cycling conditions as described under 2.5.

Data Storage and Documentation
Near full genome sequences are available from GenBank with accession numbers MK086109-MK086132. Whole sets of TGS sequences are available upon request. Correspondence, data and material requests should be addressed to Ralf Duerr (Ralf.Duerr@nyumc.org).

Design of an Adaptable NFG Amplification Strategy
The goal was to establish a flexible and straightforward methodology for NFG amplification, cloning, and sequencing applicable to the broad variety of group M subtypes and recombinant forms. The protocol should be suitable for several downstream methods including high-throughput sequencing (NGS or TGS), SGA, cloning, and bulk sequencing, and applicable to modern practice in well-equipped labs as well as to basic procedures in resource-limited countries. The NFGS approaches aimed to cover >8.5 kb including all encoding HIV-1 M genes, i.e., from gag to nef. The primary goal was the reliable and consistent determination of NFGS of highly diverse group M strains irrespective of the number of amplicons needed ( Figure 1). The first option was the amplification of HIV-1 NFG in a single PCR using primers targeting the 5' UTR and 3' UTR regions. Also, we established a two-amplicon and multiple-amplicon approach. The two-amplicon approach targeted two overlapping "half genome" (HG) sequences. The 5' HG region (HG1) included gag and pol, while the 3' HG region (HG2) included env and nef. The accessory genes (vif, vpr, vpu) were included in both half genomes in their overlapping part. The multiple amplicon strategies aimed to cover near full genomes using three to five overlapping amplicons from a selection of eight PCR constructs in total. Each PCR construct encompassed complete genomic units of encoding genes (env also separated into gp120 and gp41).
Also, we established a two-amplicon and multiple-amplicon approach. The two-amplicon approach targeted two overlapping "half genome" (HG) sequences. The 5' HG region (HG1) included gag and pol, while the 3' HG region (HG2) included env and nef. The accessory genes (vif, vpr, vpu) were included in both half genomes in their overlapping part. The multiple amplicon strategies aimed to cover near full genomes using three to five overlapping amplicons from a selection of eight PCR constructs in total. Each PCR construct encompassed complete genomic units of encoding genes (env also separated into gp120 and gp41).

Rational Primer Design for NFG Amplification and Sequencing of Diverse Group M Viruses
NFGS approaches on our highly diverse Cameroonian URF samples had only limited success using published NFG primers [16,25] (see below). Therefore, rational primer design was performed using full genome reference sequence alignments with diverse HIV-1 group M strains downloaded from the LANL database (Table 1). Initially, an alignment composed of 480 reference sequences was used belonging to several known pure and CRF clades covering the major branches of each subtype [32]. Subsequently, the alignment was reduced and simplified to a representative 40 strain-panel comprising four reference sequences per eight (sub)-subtypes A1, A2, B, C, D, F1, F2, G, and two CRFs, 01_AE, and 02_AG ( Figure 2). The designed amplification primers mostly targeted conserved domains adjacent or at the ends of the HIV-1 M genes of interest under consideration of primer characteristics such as melting temperature, GC content and secondary structure (using IDT web application and/or PrimerDesign-M, LANL database). The newly designed primers were combined with published primers [25,31] that exhibited significant amplification efficiencies in our Cameroonian cohort ( Table 1). The newly developed primers targeted regions in the HIV-1 M genome, which were highly conserved across several reference sequences of diverse subtypes, as shown for the NFG and two HG amplicon approaches (Figure 2; Figures S1 and S2). Of significant importance, the terminal nucleotides of the primer binding sites were highly conserved, whereas a few internal positions exhibited moderate variance for some primers/reference strains. The binding sites of three primers for half genome amplifications were fully conserved across the 40 reference strains panel (Figures S1 and S2). Overall, we designed/selected 26 new amplification primers, complemented by ten primers from previous studies [25,31] that proved to be efficient. They were used for the amplification of 15 different constructs including four NFGs, three HGs, and eight smaller regions within the full genome (Figures 1 and 3, Table 1). Subsequently, the alignment was reduced and simplified to a representative 40 strain-panel comprising four reference sequences per eight (sub)-subtypes A1, A2, B, C, D, F1, F2, G, and two CRFs, 01_AE, and 02_AG ( Figure 2). The designed amplification primers mostly targeted conserved domains adjacent or at the ends of the HIV-1 M genes of interest under consideration of primer characteristics such as melting temperature, GC content and secondary structure (using IDT web application and/or PrimerDesign-M, LANL database). The newly designed primers were combined with published primers [25,31] that exhibited significant amplification efficiencies in our Cameroonian cohort ( Table 1). The newly developed primers targeted regions in the HIV-1 M genome, which were highly conserved across several reference sequences of diverse subtypes, as shown for the NFG and two HG amplicon approaches (Figure 2; Figures S1 and S2). Of significant importance, the terminal nucleotides of the primer binding sites were highly conserved, whereas a few internal positions exhibited moderate variance for some primers/reference strains. The binding sites of three primers for half genome amplifications were fully conserved across the 40 reference strains panel (Figures S1 and S2). Overall, we designed/selected 26 new amplification primers, complemented by ten primers from previous studies [25,31] that proved to be efficient. They were used for the amplification of 15 different constructs including four NFGs, three HGs, and eight smaller regions within the full genome (Figures 1 and 3, Table 1).

Figure 2.
Conserved primer binding sites for HIV-1 near full genome amplification. Multiple sequence alignments of primer binding sites across ten different pure (sub)-subtypes and CRFs, as relevant for the near full genome amplification approach 1. For each (sub)-subtype/CRF, four reference sequences were selected that broadly cover the respective clade; for the binding site of primer 3'UTR rev 1, only two A2 and F2 reference sequences were available from the LANL database. The numbering of primer binding regions is based on the HxB2 reference strain (GenBank: K03455). Positions (pos) of primer binding sites are shown in brackets (HxB2 pos. 596-9542). Orange and green arrows along the HxB2 genome map indicate first round and second round primer positions, respectively. Black dots and colored letters indicate matches and mismatches with primer nucleotides shown on top, respectively.  Sequencing primers were designed based on the same criteria as described above for amplification primers. They were designed to generate fragments with ≥100 bp overlap when sequences ≥700 bp are obtained ( Figure S3). Seventy-one sequencing primers were applied including MassRuler DNA ladder was used for amplicon size determination. The amplicons shown in the gels were obtained from the following samples: NYU6541-6 (NFG1), NYU129-5 (NFG2), LB089-1 (NFG3 and env), NYU1122-1 (NFG4, HG1, and HG1a), NYU119-3 (HG2, pol, and gag), NYU1999 (gagpol), NYU6556-3 (HG1b), BDHS33 (vif /gp120), NYU124-2 (gp120), and MDC131-1 (gp41/nef and vifvpu).
Sequencing primers were designed based on the same criteria as described above for amplification primers. They were designed to generate fragments with ≥100 bp overlap when sequences ≥700 bp are obtained ( Figure S3). Seventy-one sequencing primers were applied including 43 newly designed forward primers and 21 newly designed reverse primers, complemented by seven reverse primers published by other groups [25,31] (Tables 2 and 3). Fourteen sequencing primers were most frequently used based on their broad reactivity with our diverse study samples ( Figure S3). These 14 primers provided complete sequence coverage from HxB2 positions 596 to 9604 (~9.1 kb) in NFG one-amplicon approaches. For two-amplicon approaches, eight primers were used to sequence the 1st HG (HxB2 position 776 to 5956;~5.2 kb) and six primers for the 2nd HG (HxB2 position 5037 to 9533;~4.6 kb).

Subtype-Independent Amplification and Sequencing of HIV-1 Near Full Genomes
Initially, we screened published NFGS nested primers for suitability with our cohort samples from Cameroon. Nested PCRs with a primer set previously optimized for NFGS of clade C viruses [16] did not result in any NFG PCR products using three study samples including #6541-6, which worked for most other NFGS approaches tested in this study. However, using a nested primer set recently optimized for multiple HIV-1 subtypes [25], we obtained positive reactions for #6541-6 and two more study samples out of 23 tested specimens (Table 4). Based on these results, we screened the newly designed primers as well as combinations of our and published primers to optimize NFGS across multiple HIV-1 group M subtypes (Table 1).
To establish and optimize the method, we first used four study samples from infections with pure (sub)-subtype viruses (B and F2; #Strain0526 and #LB002-1), a CRF (CRF02_AG; #6506-8), and a URF (#6541-6) (based on preliminary and published partial genome sequencing data of Duerr and Nyambi labs [29,40]) (Figure 4). Subtypes F2 and CRF02_AG were chosen, because of their high prevalence in Cameroon [41]. Two of those constructs could be amplified with a one-amplicon approach, the other two with a two-amplicon approach (Table 4). Subsequently, we analyzed 19 more samples, including another subtype F2 virus (#LB022), another CRF02_AG virus (#LB006-1), and different highly diverse URF samples from Cameroon ( Table 4). The 18 URF genomes were composed of subtypes A (A1), F (F2), G, and recombinants CRF01_AE, CRF02_AG, and CRF22_01A1, and revealed non-uniform mosaic NFG sequences [27].     Out of 23 samples, 11 NFGS were obtained with a one-amplicon approach. Another 11 samples needed two amplicons and one sample at least three amplicons for NFGS determination ( Table 4). The one-amplicon approaches were based on nested PCRs with primers targeting the 5' UTR and 3' UTR ends to obtain approximately 9.1 kb products in a single PCR reaction (Figure 1). The four optimized NFGS primer pairs (NFG1 to NFG 4) obtained yields between 13% and 21.7 % (Table 1;  Table 4); the best yield was obtained with the NFG1 approach based on four newly designed primers (21.7% yield). Only one sample (#6541-6) could be amplified using all four NFGS approaches ( Table  4). The four one-amplicon approaches complemented each other to an overall 43% yield. In the twoamplicon approach, NFGS were obtained from two overlapping half genomes. Half genome 1 (HG1) consisted of gag, pol and the accessory genes vif, vpr, and vpu (~5.2 kb), whereas the second half genome (HG2) included vif, vpr, vpu, env, and nef (~4.6 kb) (Figure 1 and Table 1). HG1a and b had 45% and 80% yield, respectively, and complemented each other except for one sample, to achieve a 95% yield in combination. Of interest, HG2, which was obtained using a mix of newly designed primers and published primers [25,31] achieved 100% yield and proved to be the most efficient nested primer combination in our set of study samples ( Table 4). The amplification of smaller regions for the multiple-amplicon approach included gag (1.5 kb), gagpol (4.5 kb), pol (3.2 kb), vifvpu (1.0 kb), vif/gp120 (3.0 kb), env (3.4 kb), gp120 (1,7 kb), and gp41/nef (1.8 kb) (Figure 1; Tables 1 and 4) with yields >55%. Since plasma viral load (PVL) has been described as a critical parameter for successful NFG sequencing [16,17,21], we analyzed whether the viral load of a sample predicted the number of amplicons needed for characterizing the NFG. Using the diverse study set of Cameroonian URF viruses, no association was observed between the number of amplicons needed for NFGS and PVL ( Figure S4). The lowest PVL in our set of study samples with determined NFGS was 306 cps/mL.

Comparative NFGS Analysis Using Different Amplification, Cloning, and Sequencing Strategies
To validate our NFGS approach, different amplification, cloning, and sequencing procedures including bulk amplification, high-throughput/deep sequencing, and SGA were applied. The NFGS Out of 23 samples, 11 NFGS were obtained with a one-amplicon approach. Another 11 samples needed two amplicons and one sample at least three amplicons for NFGS determination ( Table 4). The one-amplicon approaches were based on nested PCRs with primers targeting the 5' UTR and 3' UTR ends to obtain approximately 9.1 kb products in a single PCR reaction (Figure 1). The four optimized NFGS primer pairs (NFG1 to NFG 4) obtained yields between 13% and 21.7 % (Table 1; Table 4); the best yield was obtained with the NFG1 approach based on four newly designed primers (21.7% yield). Only one sample (#6541-6) could be amplified using all four NFGS approaches ( Table 4). The four one-amplicon approaches complemented each other to an overall 43% yield. In the two-amplicon approach, NFGS were obtained from two overlapping half genomes. Half genome 1 (HG1) consisted of gag, pol and the accessory genes vif, vpr, and vpu (~5.2 kb), whereas the second half genome (HG2) included vif, vpr, vpu, env, and nef (~4.6 kb) ( Figure 1 and Table 1). HG1a and b had 45% and 80% yield, respectively, and complemented each other except for one sample, to achieve a 95% yield in combination. Of interest, HG2, which was obtained using a mix of newly designed primers and published primers [25,31] achieved 100% yield and proved to be the most efficient nested primer combination in our set of study samples ( Table 4). The amplification of smaller regions for the multiple-amplicon approach included gag (1.5 kb), gagpol (4.5 kb), pol (3.2 kb), vifvpu (1.0 kb), vif /gp120 (3.0 kb), env (3.4 kb), gp120 (1,7 kb), and gp41/nef (1.8 kb) (Figure 1; Tables 1 and 4) with yields >55%. Since plasma viral load (PVL) has been described as a critical parameter for successful NFG sequencing [16,17,21], we analyzed whether the viral load of a sample predicted the number of amplicons needed for characterizing the NFG. Using the diverse study set of Cameroonian URF viruses, no association was observed between the number of amplicons needed for NFGS and PVL ( Figure S4). The lowest PVL in our set of study samples with determined NFGS was 306 cps/mL.

Comparative NFGS Analysis Using Different Amplification, Cloning, and Sequencing Strategies
To validate our NFGS approach, different amplification, cloning, and sequencing procedures including bulk amplification, high-throughput/deep sequencing, and SGA were applied. The NFGS outcome was compared across platforms using sample NYU6541-6, which gave positive results for most amplification approaches. Bulk sequencing is the traditional method for the genetic determination of viral variants that make up ≥20% of a viral population in a sample, e.g., it is still the standard for genotypic drug resistance testing [41,42]. High-throughput sequencing enables more in-depth analyses and can assess the diversity of viral populations in a sample. As proof of principle, we used the portable MinION TGS technology (Oxford Nanopore Technologies) with the capacity to determine long reads in real-time [43]. Despite limitations in accuracy and depth [44,45], it gave an estimate of whether our diverse amplicon approaches are suited for high-throughput/deep sequencing ( Figure 5 and Table 5) [27]. outcome was compared across platforms using sample NYU6541-6, which gave positive results for most amplification approaches. Bulk sequencing is the traditional method for the genetic determination of viral variants that make up ≥20% of a viral population in a sample, e.g., it is still the standard for genotypic drug resistance testing [40,41]. High-throughput sequencing enables more indepth analyses and can assess the diversity of viral populations in a sample. As proof of principle, we used the portable MinION TGS technology (Oxford Nanopore Technologies) with the capacity to determine long reads in real-time [42]. Despite limitations in accuracy and depth [43,44], it gave an estimate of whether our diverse amplicon approaches are suited for high-throughput/deep sequencing ( Figure 5 and Table 5) [27].   Since amplification artifacts such as recombination events can occur in vitro during bulk amplification, single genome amplification (SGA) [31] was used for validation of our results. Six samples were tested for SGA including NFG and HG constructs. Our results confirmed the results reported by other groups [46,47] that major viral populations detected by deep or bulk sequencing show a good match with SGA sequences with comparable genetic coverage ( Figure 5 and Table 5) [27]. Specifically, in vitro recombination rates <1.6% were reported for bulk amplification [46,47]. As a precaution, TGS minority variants with <2% prevalence that were not confirmed by SGA were consequently excluded in our analyses. In contrast to high-throughput sequencing such as NGS or TGS, bulk sequencing and SGA allow the cloning of NFG constructs, which enables further functional characterization in downstream biological assays, such as replicative fitness or phylogenetic drug resistance tests [48][49][50]. An NFGS plasmid generated by Topo cloning of an NFG bulk amplicon was included in the comparative analysis, which revealed comparable characteristics to bulk sequencing as evident by <0.1% genetic distance and similar recombination breakpoint patterns ( Figure 5). A comparative Simplot Bootscan analysis was performed to underline the significance of the introduced NFGS approach for accurate subtype/recombinant form determination across platforms. Bulk sequencing, cloning, SGA, and TGS (consensus sequence) yielded similar NFGS results with genetic distances <2% across platforms and overall comparable recombinant breakpoints patterns ( Figure 5).

Discussion
Here, we report a simple and efficient method for the amplification and sequencing of HIV-1 near full genomes from plasma to cover a broad range of recombinant (URF and CRF) viruses and pure group M (sub)-subtypes. Due to the flexible approach based on one, two, or more amplicons, 100% of the study samples were genotyped over the near full genome. Recent studies performing NFGS out of plasma achieved different efficiencies in the range of 45-100% [16][17][18][20][21][22]25,46], from which most approaches remained below 100% efficiency and/or were restricted to specific subtypes. A one-amplicon approach obtained 67% NFGS efficiency in a subtype C cohort [16], whereas a two-amplicon approach applied to samples of diverse clades yielded 90-92% success rates [25,46]. A study in a US clade B cohort used an alternative two-or three-amplicon strategy to obtain 76% yield for samples with viral loads >10,000 cps/mL. A four-amplicon strategy achieved 93% yield when applied to diverse HIV-1 subtypes and groups [21], however when used in an Indian cohort with clade C and AC recombinant samples, the yield dropped to 45% despite adjustments to an optimized six amplicon strategy [18]. The results of these studies imply and confirm our findings in achieving higher percentages of NFGS when diverse amplicon approaches are complementarily used. In our study, the 43% efficiency of the one-amplicon approach, based on four different NFG protocols, increased to 95% when a two-amplicon strategy based on two different half genome protocols was used, and eventually achieved 100% with the help of multiple amplicons. Two NGS-based methods were recently introduced, which also obtained 100% NFGS rates. They were applied in a clade B cohort using a four amplicon strategy [20] or in a highly diverse study population using mixed four-and six-amplicon NGS libraries [22]. The latter study used an elegant method based on degenerate primers fused to common adapter sequences for efficient library generation, which enabled broad coverage of HIV subtypes and groups; however, the need of NGS equipment and expert knowledge limits its application in resource-constrained countries. Also, the method required considerable optimizations with regards to RNA extraction and cDNA synthesis for suitability with clinical samples.
For NFGS, a viral load threshold of 3,000 cps/mL [16,21] or even 10,000 cps/mL [16,17,21] has been reported. NGS can generate full genome coverage at viral loads >log3.5 cps/mL, while even most sensitive NGS/probes approach gradually decline to partial genome coverage for <log3.5 cps/mL [22,26]. Of interest, we obtained NFGS over a broad range of PVL down to~300 cps/mL, although it is worth mentioning that in our study, we only had one sample below 3,000 cps/mL PVL included, and plasma was routinely concentrated~3.5 fold (from 500 µL to 140 µL) before RNA extraction. Other than that, our method needed no additional optimizations. Other groups reported that a substantially higher yield could be obtained using Benzonase pretreatment of plasma before RNA extraction to reduce the background of human RNA and DNA and by increasing the reverse transcription temperature from 42 • C to 47 • C [22]. Doubling the amount of RT enzyme from 200 U to 400 U significantly increased the critical RT step for NFGS in another study [17].
Despite the limitation in sample size (23 subjects), the analysis of a range of pure (sub)-subtypes and recombinant forms including A (A1), B, F (F2), G, CRF01_AE, CRF02_AG, and CRF22_01A1 implies a broad applicability of the presented NFGS approach across diverse (sub)-subtypes/recombinants and geographic regions of the world. Universal usage of our approach is further corroborated by the set of primers, designed to bind several most prevalent pure (sub)-subtypes and recombinant forms on the globe. Nevertheless, future studies are needed to show that the primers and the NFGS strategy are robust on larger sample sets from diverse regions of the world. The introduced procedure of rational primer design coupled with a flexible amplicon strategy allows researchers to improve the protocol further according to their needs.
Our approach provides a universal toolbox that can be used and adapted for different amplification and sequencing applications and is easy to handle for laboratories in resource-limited countries. Besides the flexible methodology, it allows both NFGS as well as focused molecular studies for distinct genomic regions. Since our method does not necessarily depend on high-throughput sequencing (NGS/TGS), which usually determines a one-way road towards genetic read-out, the SGA or bulk amplification can be used for the generation of molecular clones and subsequent downstream applications in biological assays. Notably, the introduced combination of primers from different studies [25,31] including our newly designed primers brought up highly efficient nested PCR primer pairs that may prove useful for purposes beyond NFGS and subtype monitoring. For example, we achieved 100% yield in our HG2 approach, which includes the amplification of the complete env region and accessory genes. It could be a useful alternative for env SGA, pseudovirus generation, screening for neutralizing antibody resistant strains, and genotypic tropism analysis. Another construct encompassing whole pol, gag, and accessory genes (HG1b) was obtained with 80% amplification yield when using four newly designed nested primers. The latter represents a promising approach for genotypic drug resistance testing in research and clinical care covering several essential target sites. Although not tested in the present study, we expect our NFG amplification approaches to performing well with state-of-the-art NGS systems to capture minority variants in quasispecies populations, as we showed recently for the sensitive Illumina MiSeq platform using a related amplification approach restricted to the reverse transcriptase region of pol [41].
Hence, the proposed protocols have broad applicability and can form the basis for downstream applications in phylogenetic and molecular studies, and further project-specific optimizations of primer combinations. While the studied in-depth sequencing approach based on MinION technology needs further optimization regarding accuracy and depth, it indicates future implications in real-time NFGS on-site.

Conclusions
In summary, we designed and validated a robust and adaptable protocol for NFGS from plasma, capable of amplifying and sequencing a broad diversity of HIV-1 group M strains including pure (sub)-subtypes and mosaic viral strains (URF and CRF). Despite the limited number of 23 samples, our approach suggests broad applicability based on the semi-conserved primer design and the flexible amplicon approach. Future studies using larger sample numbers from different geographic regions and epidemics will need to confirm its suitability in global HIV surveillance. The introduced NFGS approach may prove valuable to complement existing strategies in structural and functional HIV research including regions with rapid evolution and emergence of novel recombinant viral strains.