HIV-1 Envelope Glycoprotein Amino Acids Signatures Associated with Clade B Transmitted/Founder and Recent Viruses

Background: HIV-1 transmitted/founder viruses (TF) are selected during the acute phase of infection from a multitude of virions present during transmission. They possess the capacity to establish infection and viral dissemination in a new host. Deciphering the discrete genetic determinants of infectivity in their envelope may provide clues for vaccine design. Methods: One hundred twenty-six clade B HIV-1 consensus envelope sequences from untreated acute and early infected individuals were compared to 105 sequences obtained from chronically infected individuals using next generation sequencing and molecular analyses. Results: We identified an envelope amino acid signature associated with TF viruses. They are more likely to have an isoleucine (I) in position 841 instead of an arginine (R). This mutation of R to I (R841I) in the gp41 cytoplasmic tail (gp41CT), specifically in lentivirus lytic peptide segment 1 (LLP-1), is significantly enriched compared to chronic viruses (OR = 0.2, 95% CI (0.09-0.44), p = 0.00001). Conversely, a mutation of lysine (K) to isoleucine (I) located in position six (K6I) of the envelope signal peptide was enriched in chronic viruses compared to TF (OR = 3.26, 95% CI (1.76-6.02), p = 0.0001). Conclusions: The highly conserved gp41 CT LLP-1 domain plays a major role in virus replication by mediating intracellular traffic and Env incorporation into virions through interaction with the viral matrix protein. The presence of an isoleucine in gp41 in the TF viruses' envelope may sustain its role in the successful establishment of infection during the acute stage.


Introduction
HIV-1 genetic diversity due to frequent mutation rates, polymorphisms, recombination events and altered pattern of glycosylation within the envelope (Env) [1] drives HIV-1 escape from broadly

HIV-1 RNA Extraction
One hundred (100) microliters of serum from each HIV-1-infected subject was used for HIV-1 RNA extraction on the BioRobot MDx automated extraction platform with the QIAamp® Virus BioRobot® MDx Kit (Qiagen, Valencia, CA, USA). To meet the minimum 350-µL sample volume required for automated BioRobot extraction, we diluted each 100-µL serum sample with 250 µL of Dulbecco's Modified Eagle's Medium (DMEM; Sigma-Life Science, Oakville, Ontario, Canada). Extraction was conducted automatically according to the manufacturer's protocol. One positive and one negative control serum were included in each extraction run for quality control. The extracted RNA suspension (approximately 60-80 µL) was immediately used for reverse transcription or stored at −80 °C for later reference.
The RNA product (10 µL) was first denatured at 65 °C for 5 min in a thermocycler (Applied Biosystems (ABI) GeneAmp PCR System 9700). The denatured RNA (5 µL) was added to the 45-µL reaction mix containing primers, RNaseOUT and Platinum® Taq DNA polymerase. The reaction mix was then placed in an ABI thermocycler for cDNA synthesis. The thermal profile was as follows: 53 °C for 30 min and 94 °C for 2 min, followed by 40 cycles of 94 °C for 2 min for denaturing, 55 °C for 30 s for annealing and 68 °C for 4 min for extension, then 68 °C for 5 min with a final hold at 4 °C. The PCR product was immediately amplified (nested PCR) or stored at 4-8 °C for future use.

Second Amplification
After RT-PCR, the full-length HIV-1 envelope gene (GP160) was amplified by nested polymerase chain reaction (nested PCR) using primers that covered a fragment of 3.10 kb.
For amplification in the ABI 9700 thermocycler, the following thermal profile was used: 94 °C for 2 min, followed by 45 cycles of denaturing at 94 °C for 15 s, annealing at 55 °C for 30 s and extension at 68 °C for 2 min, followed by 68 °C for 7 min and a hold at 4 °C. The PCR products were immediately visualized on a 1% agarose gel by electrophoresis, purified using DNA purification kits from QIAGEN and stored at −20 °C before sequencing.

DNA Sequencing and Sequence Assembly
Two microliters of purified input DNA were quantified on a NanoDrop and the appropriate concentration was established. In addition, 5 µL (0.2 ng/µL) of purified input DNA was quantified with the PicoGreen dsDNA Quantification Reagent using iQ™5 Optical System Software (Bio-Rad Laboratories Ltd., Ontario, Canada). Full-length gp160 of the HIV-1 envelope gene was sequenced by next generation sequencing (NGS) on a MiSeq instrument (Illumina, San Diego, CA, USA) with a MiSeq® Reagent Kit (Illumina, San Diego, CA, USA). The Nextera XT DNA library prep kit (Illumina, San Diego, CA, USA) was used for library preparation according to the manufacturer's protocols. Run data were processed using MiSeq Reporter, the bioinformatics data analysis software built into the MiSeq. The Nextera XT workflow comprised the following steps, which were strictly followed: (1) tagmentation of genomic DNA, (2) PCR amplification, (3) PCR clean-up, (4) library normalization and (5) library pooling for MiSeq sequencing. After cycle sequencing, the gigabases of data produced by the MiSeq were transferred to and stored in BaseSpace Sequence Hub, a secure cloud-based genomics computing environment.

Data Management and Analysis
Sequencing was performed by cycle sequencing on the Illumina MiSeq. The data produced were inspected with the Sequencing Analysis Viewer (SAV) as recommended by the manufacturer. Individual sequence fragments were assembled using IVA (Iterative Virus Assembler) and a consensus sequence was identified for each specimen. Only sequences that represented more than 1% of the viral population were retained for subsequent analyses, to reduce potential recombination artifacts that may influence viral sequence diversity. Consensus sequences obtained from NGS and IVA assembly of the original data and those obtained from the Los Alamos HIV-1 sequence database were aligned using ClustalW as implemented in Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets (MEGA7.0) software (www.megasoftware.net) [44]. A reference "HXB2" (GenBank accession number: K03455.1) envelope sequence, gp160 (amino acid residues 512-856 of full genome numbering), was included in the alignment. Each sequence was also submitted to the online blastx tool (http://blast.ncbi.nlm.nih.gov) to identify the potential protein products encoded by the nucleotide query. This blastx search ensured that all sequences corresponded to the amplified full-length gp160 HIV-1 envelope. All sequences containing ambiguities or gaps were excluded from subsequent analyses.
We used the online Los Alamos sequence database tools to determine the characteristics of the HIV-1 variable regions (GP120 V1 to V5 loops). This online tool reports the number of N-linked glycosylation sites, the loop length and the V3 loop net charge (NC); we used the default settings, in which the charge is computed with KRH = (+) and DE = (−). The following link was used: https://www.hiv.lanl.gov/content/sequence/VAR_REG_CHAR/index.html.
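The three loop characteristics described above can be illustrated with a minimal sketch. It assumes the usual N-X-[S/T] sequon definition (X being any residue except proline) and the KRH = (+), DE = (−) charge convention quoted in the text; it is not the Los Alamos tool itself, and the function names and the example sequence are illustrative only and are not taken from the study data.

```python
import re

POSITIVE = set("KRH")  # residues counted as +1 (KRH = (+) convention from the text)
NEGATIVE = set("DE")   # residues counted as -1 (DE = (-) convention from the text)


def loop_length(seq: str) -> int:
    """Number of residues in a loop sequence, ignoring alignment gaps."""
    return len(seq.replace("-", ""))


def net_charge(seq: str) -> int:
    """Net charge as (count of K, R, H) minus (count of D, E)."""
    seq = seq.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def count_sequons(seq: str) -> int:
    """Count potential N-linked glycosylation sites, N-X-[S/T] with X != P."""
    seq = seq.upper().replace("-", "")
    # A lookahead is used so that overlapping sequons are all counted.
    return len(re.findall(r"N(?=[^P][ST])", seq))


# Illustrative V3-like sequence (hypothetical input, not a study sequence).
example_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
summary = {
    "length": loop_length(example_v3),
    "net_charge": net_charge(example_v3),
    "sequons": count_sequons(example_v3),
}
print(summary)
```

Applying the same three functions to every V1 to V5 loop of every sequence would give the per-category distributions that the ridge plots summarize.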
HIV-1 envelope amino acid sequence logos and frequencies for the three categories of infection stage (TF, RC and CH) were determined using WebLogo versions 2.8 and 3 (http://weblogo.berkeley.edu/logo.cgi) [45,46]. The WebLogo tools generate sequence logos, graphical representations of the patterns within a multiple sequence alignment. The HIV-1 envelope sequences subjected to WebLogo analyses (N = 231) were first submitted to multiple sequence alignment using MEGA 7.0 software. The HXB2 Env GP160 sequence was included in the alignment for numbering purposes [15,29,31,47-49]. Aligned envelope sequences for each category of viruses (chronic, transmitted/founder and recent) were then downloaded separately as FASTA files. The sequence files for each category of infection were subsequently uploaded individually and analyzed in the WebLogo online software. WebLogo identifies and counts the amino acids found at each position along the Env sequence (1-856) and reports their frequency in the population. Sequence data for each category of infection were then downloaded in plain-text output format, which reports the total count of each amino acid at each position of the aligned Env sequences. We thus generated three files of amino acid counts, one for each category of viruses (TF, RC and CH), compared them statistically, position by position and amino acid by amino acid, and reported the significant differences. We referred to the HXB2 Env sequence included as reference in the MEGA 7.0 multiple sequence alignment to identify the exact position of each amino acid in Env, with the "without (w/o) gaps" box checked. The localization (sub-regions or domains of Env) of any amino acid change refers to HXB2 numbering, as summarized in Supplemental Table S2.
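The per-position counting step described above can be sketched as follows. This is a simplified illustration that assumes the sequences are already aligned to equal length; it does not reproduce WebLogo's output format, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def column_counts(aligned_seqs):
    """Return one Counter of residues per alignment column."""
    if not aligned_seqs:
        return []
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [Counter(column) for column in zip(*aligned_seqs)]


def residue_frequency(counts, position, residue):
    """Frequency of a residue at a 1-based alignment position."""
    column = counts[position - 1]
    total = sum(column.values())
    return column[residue] / total if total else 0.0


# Toy alignments (hypothetical, far shorter than a real Env alignment).
chronic = ["MRVKGI", "MRVKGI", "MRVKGK", "MRVMGI"]
founder = ["MRVKGK", "MRVKGI"]

ch_counts = column_counts(chronic)
tf_counts = column_counts(founder)
print(residue_frequency(ch_counts, 6, "I"), residue_frequency(tf_counts, 6, "I"))
```

Comparing the two frequencies at a given position, category against category, is the input to the statistical tests described in the next section.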
Envelope nucleotide and amino acid sequences for all full-length HIV-1 transmitted founder, recent and chronic viruses were deposited and are available in the GenBank sequence database under the accession MK076153-MK076292. HIV-1 envelope sequence data qualifiers are also available in

Statistical Analyses
We used descriptive statistics, the mean and nucleotide composition across the HIV-1 envelope gp160 length to estimate the amino acid differences between transmitted/founder, recent and chronic sequences, using the HXB2 envelope sequence as a reference. Descriptive statistics were reported as proportions for qualitative variables and as means or medians for quantitative variables, together with ridge plots. The Kruskal-Wallis, Wilcoxon and chi-squared tests were used to compare the different parameters by type of infection. Wald test statistics from logistic regression models were also used. Stata version 14, R version 3.5.1 and SPSS version 24 were used as statistical software. A p-value less than 0.05 was considered statistically significant. Moreover, the p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure.
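For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal sketch of the standard step-up procedure. It is not the authors' code (the analyses were run in Stata, R and SPSS), and the p-values used are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four position-wise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

A position would then be reported as a signature only if its adjusted p-value stays below the 0.05 threshold.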

Ethics Approval and Consent to Participate
Ethical approval was given by the "Comité d'éthique et de la recherche (CÉR) des Centres hospitaliers affiliés à l'Université de Montréal (CHUM)"; Number: 2015-5569, CE14-344CA. It has been renewed yearly since 2015 by our Institutional Review Board. All samples were anonymized before use in this study. No nominal information was used for analysis or data management. This manuscript does not contain any individual data in any form.

Results
A total (N) of 757 specimens were included. Acute and early HIV-1 infections were defined, respectively, as EIA-p24 antigen positive and Western blot (WB) negative, and as WB positive with the presence of HIV-1 antibody and qualified by a recent infection testing algorithm (RITA) [37,39] (Figure 1). Chronic clade B HIV-1 envelope sequences were selected from the Los Alamos HIV sequence database and were complemented by chronic sequences obtained from the LSPQ serobank collection (Figure 1, Supplemental Table S1).
From acute HIV-1-infection serum samples (N = 469) classified as TF viruses, we obtained 98 individual consensus clade B HIV-1 envelope sequences after a molecular evolutionary genetic analysis. Five of the one hundred three sequences were excluded because they had short envelope amino acid sequence lengths (<856 amino acids) after MEGA 7 multiple sequence alignment. The nested RT-PCR success rate for acute infection samples was 23% (102/469), as presented in Figure 1.
From the early HIV-1 infection samples (N = 240), whose viruses were classified as recent HIV viruses, twenty-eight (28) HIV-1 consensus envelope sequences were obtained (Figure 1). This result corresponded to an RT-PCR amplification success rate of 15% (36/240). Eight (8) non-B HIV-1 envelope sequences were excluded from molecular analysis (Figure 1).
Of forty-eight (48) chronic HIV-1 infection samples from the LSPQ serobank collection, only two HIV-1 envelope sequences were finally obtained after analysis, which corresponds to an HIV RNA amplification success rate of 4% (Figure 1).
The distribution of clade B HIV-1 envelope sequences (one sequence per individual) was as follows: chronic (CH): 45.46% (N = 105 included); TF viruses: 42.42% (N = 98 included, N = 4 non-B HIV-1 excluded) and recent (RC): 12.12% (N = 28 included, N = 8 non-B HIV-1 excluded), as shown in Figure 1 and Supplemental Table S3. The total of 105 chronic HIV-1 clade B envelope sequences comprised two HIV-1 envelope sequences from the LSPQ serobank collection and LANL chronic HIV-1 clade B envelope sequences (Figure 1). The background information on the selected LANL chronic HIV-1 envelope sequences is available in Supplemental Table S4.
Thus, a total of 231 clade B HIV-1 full-length consensus envelope sequences were included in this analysis.

Figure 2.
Ridge plot comparing the number of N-glycosylation sites in the HIV-1 envelope variable loops between TF, RC and chronic (CH) viruses. The boxes represent density plots of the number of N-glycosylation sites for: Env V1 loop (a), Env V2 loop (b), Env V1V2 loop (c), Env V3 loop (d), Env V4 loop (e) and Env V5 loop (f). In each box, the top (green), middle (blue) and bottom (yellow) densities represent the number of N-glycosylation sites for CH, RC and TF virus envelope sequences, respectively. The X-axis represents the number of N-glycosylation sites per loop and the Y-axis the density of sequences for each timeline category of viruses (CH, RC and TF). As shown in Figure 2d, the differences in the number of Env V3 loop N-glycosylation sites between CH and TF, p = 0.026; RC and TF, p = 0.004 and CH and RC, p = 0.05 were statistically significant using the Wald test with a logistic regression model.
Figure 3.
The boxes represent the density plot of: Env V1 loop lengths (a), Env V2 loop lengths (b), Env V1V2 loop length (c), Env V3 loop length (d), Env V4 loop length (e) and Env V5 loop length (f) for CH, RC and TF viruses respectively. For each box, the top (green), middle (blue) and bottom (yellow) represent, respectively, TF, RC and CH virus sequences. The X-axis presents sequence loop lengths and the Y-axis the loop length density for each timeline category of viruses (CH, RC and TF). As presented in Figure 3f, the differences in the HIV-1 Env GP120 V5 loop lengths between RC and TF, p = 0.003 and CH and RC, p = 0.004 are statistically significant using the Wald test with a logistic regression model.

The positive net charge of the HIV-1 Env GP120 loop 3 (V3) also differed significantly between CH (median/range: 5 (3, 6)), RC (median/range: 4 (3, 5.5)) and TF (median/range: 4 (3, 5)), p = 0.040, and specifically between CH (median/range: 5 (3, 6)) and TF (median/range: 4 (3, 5)) viruses (OR = 0.82, 95% CI (0.69-0.98), p = 0.038) using the Kruskal-Wallis test (Figure 4; Supplemental Table S5).

Figure 4.
The box represents a density plot of the V3 positive net charge. The top (green), middle (blue) and bottom (yellow) represent the TF, RC and CH viruses' V3 sequences net charge, respectively. The X-axis represents the number of charges for the HIV-1 Env gp120 V3 loop and the Y-axis represents the density of sequence charges of HIV-1-infected individuals for the CH, RC and TF viruses respectively. As shown in Figure 4, the difference in the V3 positive net charge was significant between CH, RC and TF, p = 0.04. Importantly, the difference in HIV-1 Env V3 loop net charge was statistically significant between CH and TF viruses, p = 0.03 using the Wald test with a logistic regression model. No significant difference was observed between RC and TF, p > 0.05.

Clade B HIV-1 Envelope Amino Acids Signatures Associated to Transmitted/Founders and Recent Viruses Compared to Chronic
The second objective of this study was to screen full-length HIV-1 envelope sequences to identify genetic characteristics (mutation patterns) associated with transmitted/founder and recent viruses compared to chronic ones. As presented in Figures 5, 6 and Table 1, two genetic signatures were identified.
The first significant difference in amino acid enrichment between CH and TF was observed in the HIV-1 envelope gp41 cytoplasmic tail, specifically in the lentivirus lytic peptide 1 (LLP-1). It concerns a substitution of an arginine (R) by an isoleucine (I) at position 841 (R841I) in reference to HXB2 Env sequence numbering (Figure 5, Table 1).
The second genetic signature was identified in the HIV-1 envelope signal peptide (SP). It concerns a substitution of lysine (K) by an isoleucine (I) at position six of HXB2 numbering (Figure 6, Table 1). The K6I substitution in the Env SP was highly enriched in chronic viruses, 79.04% (83/105), compared to TF viruses, 53.60% (52/97), OR = 3.26, 95% CI (1.76-6.02), p = 0.0001 using the chi-squared (Chi 2) test.

Figure 6.
Genetic signature identified in the HIV-1 envelope signal peptide (SP) associated with clade B HIV-1 chronic compared to TF viruses using WebLogo. The X axis represents the sequence and the identities of the amino acids (AA) composing the SP (direction N to C). The Y axis represents the normalized AA frequency identified at each position of the SP sequence for each category of infection. The top line box represents the chronic (CH) virus envelope SP sequences (N = 105) and the bottom line the TF viruses (N = 98). As indicated for WebLogo analysis, the overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. The isoleucine (I) amino acid signature was localized at position six of the alignment (HXB2 position K6I) and is identified by a red asterisk.
Other, less significant amino acid mutation patterns that distinguish TF from CH HIV-1 envelope sequences were also found in the GP120 C1 region and the V1 and V5 loops, and in the GP41 fusion peptide (FP), Kennedy Epitope (KE), loop and fusion peptide proximal region (FPPR; Table 1).
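The odds ratio reported above for K6I can be checked from the counts given in the text (83 of 105 chronic and 52 of 97 TF sequences carrying isoleucine). The sketch below uses the standard Wald interval on the log odds ratio, which is an assumption: the exact routine behind the published interval is not specified, so the computed bounds may differ slightly from the reported 1.76-6.02. The function name is illustrative.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: group 1 with / without the residue
    c, d: group 2 with / without the residue
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# K6I counts from the text: chronic 83/105, TF 52/97.
k6i_or, k6i_low, k6i_high = odds_ratio_wald(83, 105 - 83, 52, 97 - 52)
print(f"OR = {k6i_or:.2f}, 95% CI ({k6i_low:.2f}-{k6i_high:.2f})")
```

The point estimate rounds to the published value of 3.26, which confirms that the chronic group is the numerator of the reported ratio.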

HIV-1 Envelope Genetic Signatures Among Transmitted/Founder and Recent Viruses Compared to Chronic
Four important genetic signatures were also identified when combining TF and RC compared to chronic viruses (Figures 7-10). The first one was localized in the GP120 V1 loop at position 153 (Figure 7) and did not constitute a change. However, glutamic acid (E; 153E) was highly enriched in CH viruses, 89.42% (93/104), compared to 65.60% (82/125) in RC (+TF) ones, OR = 4.43, 95% CI (2.16-9.05), p = 0.000001 using the Chi 2 test.
The third signature was localized at position 621 of the HIV-1 Env GP41 loop domain. It consisted of the substitution of glutamine (Q) by aspartic acid (D; Figure 9). Aspartic acid was present in 15.38% (16/104) of CH and 41.60% (52/125) of RC (+TF) sequences, OR = 0.25, 95% CI (0.13-0.48), p = 0.00001 using the Chi 2 test. The last amino acid mutation patterns that distinguish chronic from recent viruses were localized in the HIV-1 Env GP41 cytoplasmic tail, specifically at position 751 (Figure 10). They were localized between the NF-κB activation (NA) and the highly immunogenic region, also called the Kennedy Epitope (KE; Figure 10).
The complete profile of statistically significant clade B HIV-1 envelope amino acid genetic signatures that distinguish recent from chronic viruses is summarized in Table 2.

Discussion
The main objective of the current study was to determine the characteristics of the clade B HIV-1 envelope variable loops in terms of sequence length, number of N-glycosylation sites and net charge. It also aimed at identifying the principal amino acid signatures associated with TF and RC virus strains compared to chronic viruses. The mutation patterns of TF and RC HIV-1 envelope glycoproteins determine the success of viral transmission and its evolution during HIV-1 infection. Identifying such genetic signatures may help improve HIV-1 prevention and inform vaccine design. The current study included 103 clade B HIV-1 consensus envelope sequences from untreated individuals in different cohorts available in the Los Alamos sequence databases (Figure 1), in addition to two sequences derived from chronically infected LSPQ serobank samples. To limit selection bias in the LANL sequences, we carefully identified consensus sequences (one per patient) from clearly untreated, chronically HIV-1-infected individuals from North America (United States of America and Canada) that had been previously included in published articles [1,9]. We were unable to obtain more chronic HIV-1 envelope sequences from study participants in the LSPQ serobank collection, which would have allowed comparisons between TF and CH sequences derived in the same context. This was due to the lower amplification success rate obtained in this study for those samples.
Multiple factors may have affected the HIV-1 envelope amplification success rate, including sample quality (such as long-term storage), viral RNA extraction procedures, primers and enzymes, as well as the viral loads of infected individuals (VL < 20,000 copies/mL). Amplification is known to be challenging depending on the length of the HIV-1 genome fragment to be amplified, and specifically for the Env gene [9,50-52].

Clade B HIV-1 Envelope Variable Loop Characteristics
The first objective of the current study was to characterize HIV-1 TF viruses envelope variable regions, which include the V1/V2, V3, V4 and V5 loop lengths, their number of N-linked glycosylation sites and the V3 loop positive net charge.
Our results show that the V3 loops of TF viruses carried significantly fewer N-glycosylation sites than those of chronic viruses (Figure 2 and Table S5). The Env V3 loops of TF viruses were also less positively charged than those of chronic viruses (Figure 4 and Table S5). This observation confirms earlier findings of a decreased positive net charge of TF virus V3 loop sequences compared to chronic ones [53-58]. The positive net charge of the HIV-1 envelope hypervariable loop three modulates the viral phenotype and tropism [59] at different stages of infection. The lower charge of the TF virus V3 loop may constitute a regulating factor of viral phenotype during transmission.
In this study, the V1/V2 loop length and number of N-glycosylation sites did not differ between TF and CH HIV-1 virus envelopes, in contrast to earlier studies [60]. A shorter V1/V2 length and fewer N-glycosylation sites have been associated with TF viruses in previous studies [60]. Most of these characteristics have been observed for clades A, C and D of HIV-1 [1,14,58,61]. This could reflect a difference among clades, as our study compared clade B HIV-1 Env V1 and V2 loops [14,57,58].
We also observed that the HIV-1 Env GP120 loop 5 (V5) of TF viruses was significantly shorter than that of RC and CH viruses (Figure 3 and Table S1). The V5 loop has been found to be necessary for maintaining viral structural integrity, its disruption negatively affecting virus assembly and virus entry [62], and to constitute a neutralizing determinant recognized by broadly neutralizing monoclonal antibodies [63,64]. It also participates in the formation of the CD4 binding site (CD4bs) [65]. The shorter loop length of TF compared to RC and CH viruses suggests that modeling of the V5 loop length at the acute stage of infection plays an important role in the virus transmission process and subsequently in disease progression.

Clade B HIV-1 Envelope Amino Acids Signatures Associate to Transmitted/Founder and Recent Viruses
The second objective of this study was to identify specific mutation patterns across the HIV-1 envelope that may be considered as a genetic signature of TF viruses. We first compared the TF viruses envelope sequences derived of acutely HIV-1-infected individuals (Fiebig stage 1 to 2) [66,67] with those from chronically infected individuals. The first important point mutation identified consisted of substitution of an arginine (R) by an isoleucine (I) at position 959 of alignment and referred to HXB2 numbering to position 841 (R841I; Figure 5). This mutation was localized in the C-terminal of the cytoplasmic (CT/D), specifically in the lentivirus lytic peptide segment 1 (LLP-1) [68,69]. The cytoplasmic tail or domain of GP41 is important for HIV-1 replication and pathogenesis by regulating rapid clathrin-mediated endocytosis that induces low levels of Env expression on cell surface [30,70,71]. This phenomenon contributes to limiting humoral immune pressure to HIV-1 [30]. It is also known that GP41 CT contributes to Env incorporation into virions by interacting with viral matrix protein [37] and also for cellular-transcription factor NF-κB activation [30]. The CTs of HIV-1 of GP41 have also been shown to have an impact on gp120 and ECD conformation and mutations in this domain also impacted recognition and neutralization of antibody [15,32,72].
The R841I signature associated to clade B HIV-1 TF viruses was localized at position 841 of the GP41 cytoplasmic tail in this LLP-1. It was reported that the LLP-1 mutations affect Env association with lipid rafts [31,70] and reduce Env incorporation, infectivity and the replication process for certain viral phenotypes [73]. The results of our current study reveal the selection of isoleucine by TF viruses (Table 1), which may contribute to HIV-1 gp41 CT functions. Earlier studies have highlighted the importance of gp41 cytoplasmic tail domain (CD) in HIV-1 in transmission and pathogenesis [30]. Lee, S. F. et al. (2002) previously demonstrated that a single deletion of one of the two adjacent valine residues located at position 832 and 833 and Ile-830, Ala-836 and Ile-840 significantly contributed to the reduction of Env steady-state expression [74]. The R841I substitution identified in the acute stage of infection of our study may constitute a key factor to enhance LLP-1 functions. It would be necessary to evaluate its functional implications in the HIV-1 transmission process and viral replication.
The second important amino acid signature identified was K6I, located within the signal peptide and highly enriched in chronic vs. TF viruses (Figure 6, Table 1). The negative selection of this mutation in TF viruses may represent a strategy for virus resistance to early immune responses. Gnanakaran et al. (2011) [1] showed that a histidine signature at position 12 (H12) in the signal peptide was highly enriched in TF viruses compared to chronic HIV-1 envelope sequences. The histidine normally located at position 12 was substituted by arginine (R) or proline (P) during acute infection [1,75]. This H12 signature was found to increase envelope incorporation into pseudoviruses in vitro [75,76]. In the current study, isoleucine at position 6 (K6I) was instead strongly selected in chronic compared to TF viruses, making it an amino acid signature enriched during disease progression toward chronic infection.
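Enrichment of a residue in one group relative to another is commonly summarized as an odds ratio with a 95% confidence interval computed from a 2x2 table. The sketch below uses the Wald (Woolf) interval on the log odds ratio; this choice is an assumption made for illustration, since the statistical procedure is not described in this section, and the counts shown are hypothetical rather than the study's data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a = group 1 sequences with the residue,  b = group 1 sequences without it
    c = group 2 sequences with the residue,  d = group 2 sequences without it
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; they are not taken from Table 1.
estimate, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=10, d=90)
print(f"OR = {estimate:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

An odds ratio above 1 indicates that the residue is more frequent in group 1, and a value below 1 that it is more frequent in group 2. For tables with small or zero cells, an exact method such as Fisher's exact test would be a more suitable choice than this approximation.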
The HIV-1 envelope signal peptide plays an important role in virus interaction with host cells during transmission and in viral evolution toward the chronic stage. It contributes to increased Env gp120 transport, secretion and expression of Env on the cell membrane surface [77]. Previous studies reported that natural variation in the N-terminal signal peptide (SP) of the HIV envelope significantly affects the antigenicity, molecular mass and glycosylation of mature gp120 and its interaction with DC-SIGN [78]. The SP is also likely subject to antibody-mediated immune pressure [77]. The differences between our results and those of Gnanakaran et al. (2011) may be explained by the phylogenetic analysis methods and the sizes of the sequence datasets used to estimate amino acid signatures. Gnanakaran et al. used consensus and corrected phylogenetic tree analyses [1,76], whereas our study used the online WebLogo application to map and estimate the amino acid signatures. In addition, the definition of TF viruses differed. The current study considered TF viruses to be sequences from HIV-1-infected individuals sampled during acute infection at Fiebig stages 1 to 2 [40,41], whereas Gnanakaran et al. (2011) considered as recent viruses those identified at an early stage of HIV-1 infection covering Fiebig stages 2 to 5 [1,76].
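A sequence logo is built from per-position residue frequencies. The sketch below shows that idea in a simplified form: it tabulates residue frequencies at one alignment position for two groups and computes the information content in bits. It is not the WebLogo implementation; no small-sample correction is applied, the sequences are assumed to be aligned, gap-free and of equal length, and the toy fragments are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(sequences, position):
    """Relative frequency of each residue at a 1-based alignment position."""
    residues = [seq[position - 1] for seq in sequences]
    counts = Counter(residues)
    total = len(residues)
    return {residue: n / total for residue, n in counts.items()}


def information_content(freqs, alphabet_size=20):
    """Information in bits: log2(alphabet size) minus the Shannon entropy.

    Simplified: no small-sample correction is applied in this sketch.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy


# Hypothetical aligned fragments for illustration; not sequences from the study.
tf_like = ["MRVKEK", "MRVKEK", "MRVKGI", "MRVKEK"]
chronic_like = ["MRVKEI", "MRVKEI", "MRVKEK", "MRVKGI"]

for label, group in (("TF-like", tf_like), ("chronic-like", chronic_like)):
    freqs = column_frequencies(group, 6)
    print(label, freqs, round(information_content(freqs), 2))
```

Because the frequencies are computed within each group, the outcome depends on how sequences are assigned to groups, which is one reason why different groupings or dataset sizes can produce different signatures.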
We reasoned that identifying HIV-1 envelope genetic signatures very early after infection might allow better identification of important genetic polymorphisms arising during transmission. Therefore, we compared chronic virus envelope sequences with those of TF and recent viruses to determine specific mutation patterns. Four significant HIV-1 envelope amino acid signatures were identified (Figures 7-10 and Table 2). Three were highly associated with chronic viruses and one with recent viruses. The mutation patterns associated with chronic viruses consisted of a glutamic acid (E) at position 153 in the V1 loop (153E), a methionine (M) to isoleucine (I) substitution at position 24 in the signal peptide (M24I) and an aspartic acid (D) to valine (V) substitution in the cytoplasmic tail (D751V; Figures 7, 9 and 10 and Table 2). The amino acid signature associated with recent and TF viruses together was located in the gp41 ectodomain, specifically in the loop domain (Figure 9 and Table 2). It consisted of a glutamine (Q) to aspartic acid (D) substitution at position 621 (Q621D). The envelope genetic signatures identified for chronic and recent viruses reflect mutations accumulated during disease progression.
In summary, we identified an important HIV-1 envelope amino acid signature in the gp41 cytoplasmic tail, specifically in the lentivirus lytic peptide segment, associated with TF compared to chronic viruses. Phenotypic studies would be valuable to further evaluate the role of the isoleucine substitution (R841I), along with other frequent mutations, in viral Env function. Overall, this study provides new evidence on the genetic characteristics of HIV-1 envelope sequences associated with clade B TF, RC and CH viruses. As with other genetic analyses, the WebLogo results for sequence genetic profiles could have differed if different groupings had been considered. Careful verification that all sequences have the same length and contain no gaps should also be considered.
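The verification mentioned above (equal sequence lengths and absence of gaps) could be automated with a short check such as the following sketch. The function name, the gap characters and the example identifiers are assumptions made for illustration.

```python
def check_alignment(sequences, gap_chars="-."):
    """Return a list of problems found; an empty list means the checks passed.

    sequences: mapping of sequence name -> sequence string.
    """
    problems = []
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        problems.append(f"unequal lengths: {sorted(lengths)}")
    for name, seq in sequences.items():
        n_gaps = sum(seq.count(g) for g in gap_chars)
        if n_gaps:
            problems.append(f"{name}: {n_gaps} gap character(s)")
    return problems


# Hypothetical input for illustration only.
example = {"seq_A": "MRVKEK", "seq_B": "MRV-EK", "seq_C": "MRVKE"}
for problem in check_alignment(example):
    print(problem)
```

Running such a check before generating logos makes it easier to confirm that position-wise comparisons refer to the same residue in every sequence.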

Conclusions
The current study identified distinct point mutation patterns in the HIV-1 envelope, specifically in the gp41 cytoplasmic tail lentivirus lytic peptide segment 1 (gp41 CT_LLP-1), that were significantly associated with TF viruses. The LLP-1 domain of the gp41 CT plays an important role in virus replication and pathogenesis. The R841I mutation identified in this segment may be considered a specific genetic signature, and its phenotypic properties during HIV-1 transmission merit further study. HIV-1 transmission is complex and multifactorial, and the mutation profiles identified here may not be the only factors contributing to transmission. Nevertheless, understanding and identifying such early envelope molecular determinants may provide clues for the design of an HIV vaccine.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/11/1012/s1, Table S1: Defining of the different timeline categories of HIV-1 infection status and referred nomenclatures reported in manuscript, Table S2: Limit of HIV-1 envelope sub-regions and domains referred to HXB2 Env gp160 sequence numbering, Table S3: HIV-1 TF viruses envelope sequences data qualifiers, Table S4: Background information of LANL clade B HIV-1 chronic viruses envelope sequences, Table S5: Descriptive statistics of HIV-1 envelope variable regions characteristics including the number of N-glycosylation sites, the loop lengths and the V3 loop net charge.