Epstein-Barr Virus (EBV) Is Mostly Latent and Clonal in Angioimmunoblastic T Cell Lymphoma (AITL)

Simple Summary
Angioimmunoblastic T cell lymphoma (AITL) is the most common peripheral T cell lymphoma encountered in Europe. It is a non-Hodgkin's lymphoma with a poor prognosis. The Epstein-Barr virus (EBV) is detected in more than 90% of biopsies, especially in large B lymphocytes. To date, the role of EBV in this pathology is still debated. The aim of our study was to analyze whole viral genomes in AITL compared to other EBV-associated lymphomas. We observed that two viral strains were mainly found in AITL, one of which appeared to be associated with poor post-diagnosis survival. Furthermore, the virus was found to be clonal and latent in all cases of AITL; for one biopsy, the virus was both latent and most likely replicative, depending on the cells. On the whole, these results support a role for EBV in AITL.

Abstract
The Epstein-Barr virus (EBV) is associated with angioimmunoblastic T cell lymphoma (AITL), a peripheral T cell lymphoma with a poor prognosis, in at least 90% of cases. The role of EBV in this pathology is unknown. Using next-generation sequencing, we sequenced the entire EBV genome in biopsies from 18 patients with AITL, 16 patients with another EBV-associated lymphoma, and 2 controls. We chose an EBV target capture method, given the high specificity of this technique, followed by a second capture to increase sensitivity. We identified two main viral strains in AITL, one of them associated with the mutations BNRF1 S542N and BZLF1 A206S and with mutations in the EBNA-3 and LMP-2 genes. This strain was characterized in patients with short post-diagnosis survival. The main mutations found in AITL in the most mutated latency and tegument genes were identified and discussed. We showed that the virus was clonal in all the AITL samples, suggesting that it may be involved in this pathology. Additionally, EBV was latent in all the AITL samples; for one sample only, the virus was found to be latent and probably replicative, depending on the cells. These various elements support the role of EBV in AITL.


Introduction
The Epstein-Barr virus (EBV) is a ubiquitous human γ-herpesvirus that infects more than 95% of the adult population. EBV primary infection corresponds to the infection of the oropharyngeal epithelial cells, where it actively replicates, and of B cells, where it remains as morphous infiltrate of eosinophils and plasma cells, large immunoblasts, B cells, histiocytes, epithelioid cells, and atypical T cells with abundant clear cytoplasm. The perivascular expansion of follicular dendritic cells (FDCs) and abundant arborizing endothelial venules are observed. Neoplastic cells, often less abundant than the reactive background, are localized in close proximity to the endothelial venules. Molecular analysis revealed that the cell of origin belongs to an effector T-cell subset, the follicular T helper (TFH) cell, which plays a key role in B-cell activation and differentiation in the germinal center [19,20].
EBV-positive cells are detected in 85-95% of AITL biopsies, the virus being principally located in large B-cell blasts [30][31][32][33][34], which may resemble Hodgkin's RS cells and are distributed throughout tissues [35]. In some cases, the neoplastic T cells may also be infected. The role of EBV in AITL is still uncertain, and several hypotheses coexist. Some authors argue that EBV reactivation occurs as a consequence of the immunodeficient state created by AITL, thereby favoring the expansion of TFH and B cells and playing a role in the development of the tumor microenvironment. Others propose that EBV itself drives the development of AITL by activating TFH cells [36]. The detection of EBV-positive cells early in the disease course, together with the fact that EBV-positive B-cell proliferation may occur during AITL progression, suggests that this virus may play a role in the development of AITL [31].
Here, using next-generation sequencing (NGS), we characterized the viral genomic alterations in EBV-associated AITL. For these patients, we described the mutations in the most mutated latency and tegument genes, and we demonstrated that EBV was in a latent state and in a clonal form in all the AITL tumor biopsy samples, which is consistent with a direct role for EBV in AITL.

Patients
This study was conducted retrospectively on 34 frozen EBV-positive biopsies collected at the initial diagnosis from patients hospitalized at Limoges University Hospital, France, between 2000 and 2015. All the lymphoma cases were initially diagnosed after examination by a pathologist and reviewed independently by another pathologist using WHO criteria [14]. The median age was 64 ± 15.59 years, and the sex ratio was 0.49. Details of this patient cohort are provided in Table 1. Furthermore, one EBV-positive inflammatory reactive biopsy, from a 73-year-old woman without hematological malignancy, and a serum sample collected from a 13-year-old boy hospitalized for symptomatic infectious mononucleosis (IM) were studied as controls.
Informed consent was obtained from all the patients to analyze their samples, and the study was approved by the Ethics Committee of the Institutional Review Board.

EBER In Situ Hybridization
In order to select EBV-positive AIL samples, EBV was detected by in situ hybridization (ISH). The formalin-fixed paraffin-embedded (FFPE) tissue sections used for this detection were first deparaffinized, then rehydrated in a graded series of xylene and alcohol and deproteinized with proteinase K (ThermoFisher Scientific, Illkirch-Graffenstaden, France). They were subsequently incubated with the Ventana EBER 1 DNP Probe ® (Roche Diagnostics, Meylan, France; catalog number 760-1209) used for EBER hybridization, followed by staining with the Ventana ISH iVIEW blue detection kit ® (Roche Diagnostics; catalog number 760-097). Images of the EBER staining obtained for control 1 (absence of EBER detection) and for a patient with AITL are shown in Figure S1.

B and T Clonality Determination
B-cell clonality was evaluated according to the van Dongen publication [37] after amplification of the VDJ region in heavy chains, using the consensus IGHJ primer and IGHV primer families (FR1, FR2, or FR3), as well as in light chains, using the IGkJ and IGkV primer families, following the amplification protocol described by the authors. TCR clonality was determined by TCRβ and TCRγ gene amplification using, respectively, Vβ or Dβ family primers and the Jβ primer family for TCRβ, or the Vγ and Jγ primer families for TCRγ. All the primers were purchased from Sigma-Aldrich, Saint-Quentin Fallavier, France. The PCR products obtained from the Ig and TCR gene rearrangements were analyzed by heteroduplex and GeneScanning analysis (Applied Biosystems, ThermoFisher Scientific, Illkirch-Graffenstaden, France). For heteroduplex analysis, PCR products obtained with unlabeled primers (Sigma-Aldrich) were denatured at a high temperature (95 °C for 5 min), followed by low-temperature rapid random renaturation (4 °C for 1 h). The products were then submitted to electrophoresis on a 6% polyacrylamide gel to distinguish between homo- and heteroduplexes. For GeneScanning analysis, the fluorochrome-labeled single-strand (denatured) PCR products were size-separated in a denaturing polyacrylamide sequencing gel or a capillary sequencing polymer (Applied Biosystems) and detected via automated scanning with a laser (Applied Biosystems).

DNA Extractions and Generic Amplification of Serum DNA
DNA was extracted from the cell lines (10^6 cells) and frozen tissue samples (6 sections of 10 µm each) using the DNeasy Blood and Tissue Kit ® (Qiagen, Les Ulis, France; catalog no. 69504) according to the manufacturer's recommendations. The DNA concentration was determined with the Qubit 2.0 Fluorometer TM (Life Technologies, Villebon-sur-Yvette, France). DNA was extracted from the serum using the NucliSENS EasyMAG platform TM (BioMérieux, Marcy-l'Etoile, France). Because the amount of viral DNA was too low in the serum samples, a generic amplification was conducted using the TruePrime WGA kit ® (Ozyme, Saint-Cyr-l'Ecole, France; catalog no. SYG351025) according to the manufacturer's instructions. This technique, based on multiple displacement amplification, uses two enzymes: the newly discovered DNA primase TthPrimPol and the highly processive Phi29 DNA polymerase. Full-length genomes of the EBV type 1 (NC_007605) and EBV type 2 (NC_009334) prototypes were used by Roche (Madison, WI, USA) to design the EBV probes. Overlapping 100-120 mer DNA probes were designed so as to cover the EBV genomes at least five times. The coverage was estimated at almost 99.7% and 99.9% of the EBV1 and EBV2 genomes, respectively. The probes did not match the human hg19 genome (GRCh38.p13), as determined by the SSAHA algorithm. A probe was considered to match the genome if there were fewer than five single-base insertions, deletions, or substitutions between the probe and the genome. The vast majority of probes were unique, with a few probes that had a greater degree of multi-locus homology to increase the coverage in all regions.

Illumina Library Construction and Whole EBV Genome Sequencing
The overall experiment was conducted according to the manufacturer's protocol for the NimbleGen SeqCap EZ Library SR ® (Roche). Two micrograms of each DNA sample were fragmented using the Bioruptor Sonicator TM (Diagenode, Liège, Belgium) to a fragment size range of 100 to 400 bp, using the following settings for the cell lines and frozen biopsies: volume, 100 µL; temperature, 4 °C; number of cycles, 13. The fragmented samples were subsequently used to synthesize libraries with the KAPA Library Preparation Kit ® for the Illumina NGS platform (KAPA Biosystems, Roche, catalog no. 07137974001) according to the manufacturer's recommendations. First, the fragments were submitted to the end-repair process to obtain blunt ends. Then, they were "A-tailed" by the addition of a non-templated adenine, adaptor-ligated, and index-tagged. The libraries were enriched by 8 cycles of ligation-mediated PCR (pre-LM-PCR). The final size selection of the library was achieved by a single AMPure XP Paramagnetic Bead ® (Agencourt, Beckman Coulter Genomics, Villepinte, France, catalog no. A63881) cleanup, targeting a final library size of 300 to 500 bp. The libraries underwent a qualitative (final size distribution) and quantitative assay using the High Sensitivity DNA Labchip kit ® (Agilent Technologies, Les Ulis, France, catalog no. 5067-626) on the 2100 Bioanalyzer (Agilent). The obtained libraries were pooled in equimolar quantities to a total of 1 microgram and hybridized with EBV biotinylated probes at 47 °C for 24 h. The hybridized fragments containing the targeted genes were adsorbed on magnetic streptavidin beads, and the uncaptured DNA fragments were removed by washing. Enrichment of the eluted fragments was then performed by 5 cycles of ligation-mediated PCR (middle LM-PCR). A second capture, a washing step, and a post-LM-PCR of 14 cycles were conducted to increase the hybridization yield, as AITL lymphomas are known to contain low copy numbers of EBV. The final concentration of each capture pool was verified on the 2100 Bioanalyzer, and sequencing was performed using Illumina technology with paired-end sequenced DNA libraries. Two Illumina sequencing platforms were used, depending on the viral load before library synthesis. The Illumina MiSeq Reagent Kit ® V2 (2 × 250 bp paired-end sequencing) and the Illumina NextSeq Reagent Kit ® (2 × 75 bp paired-end sequencing) (Illumina, ICM, Hôpital Pitié, Paris, France) were used, respectively, for samples with a viral load greater than and less than 100,000 copies/µg DNA. To validate the sequencing workflow, the MiSeq personal sequencer (Illumina, Evry, France) was used to resequence B95-8, Jijoye, P3HR1, and Raji, whose sequences had been previously published.

De Novo Assembly
VirAmp, a Galaxy-based viral genome assembly pipeline, was used to generate scaffolds from the assembly of sequencing read pairs with a de Bruijn graph algorithm. Briefly, after stringent quality control and host decontamination steps, the Velvet de novo assembler is called to generate contigs. From these contigs, the VirAmp pipeline assembles longer scaffolds, summarizes assembly statistics, and displays the contigs' alignment to the reference genome in a Circos graph [38]. The scaffolds generated by VirAmp were oriented to the reference genomes (NC_007605 for type 1 or NC_009334 for type 2) with BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 10 June 2021) [39], and linear genome sequences were built accordingly by resolving overlaps with 5′ priority and by filling the gaps with Ns.

EBV Typing
As the type 1 and type 2 EBV sequences mostly diverge in the EBNA-2 and EBNA-3 genes, the EBV type was determined by comparing the read mapping performance on these regions when the type 1 or type 2 sequences were used as a reference. For each sample, the ratio of the number of EBV1-mapped reads to the number of EBV2-mapped reads in the EBNA-2 and EBNA-3 regions was computed. A visual examination of read alignments was also performed in the Integrative Genomics Viewer (IGV) [45] to confirm the specificity of the read alignments over these regions.
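The ratio-based typing described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, and the decision threshold of 1.0 is an assumption, since the text does not state the cut-off that was applied to the ratio.

```python
def ebna_read_ratio(ebv1_reads: int, ebv2_reads: int) -> float:
    """Ratio of reads mapped to EBV-1 versus EBV-2 over the EBNA-2/EBNA-3 regions."""
    if ebv2_reads == 0:
        # No type 2 reads: infinite ratio if type 1 reads exist, undefined otherwise.
        return float("inf") if ebv1_reads > 0 else float("nan")
    return ebv1_reads / ebv2_reads


def call_ebv_type(ebv1_reads: int, ebv2_reads: int, threshold: float = 1.0) -> str:
    """Call the EBV type from the read ratio (threshold is an assumed value)."""
    ratio = ebna_read_ratio(ebv1_reads, ebv2_reads)
    if ratio > threshold:
        return "EBV-1"
    if ratio < threshold:
        return "EBV-2"
    # Covers an exact tie and the undefined (NaN) case.
    return "undetermined"
```

In practice a borderline ratio would still call for the visual check of the alignments mentioned in the text.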

Mutation Analysis and Clonality Assessment
The annotation of individual VCF files was performed using snpEff [46] with the reference GenBank files NC_007605 (for type 1) or NC_009334 (for type 2). Variations found in major repeats (internal repeats and terminal repeats) were discarded. Custom Python scripts (available on request) were run to classify variants according to variation nature (synonymous or non-synonymous; substitution, insertion, or deletion), variant site homogeneity (heterogeneous if the variant allele frequency is between 5% and 95%), and affected protein function (according to the classification of Tarbouriech et al., 2006) [47]. The proportion of heterogeneous sites was used to assess the clonality of each sample.

Phylogenetic Analysis of EBV Genomes
The homogeneous variations reported in the VCF files were parsed to generate whole genome or specific gene sequences for each sample. Multiple sequence alignments were obtained with MAFFT [48], and phylogenetic trees were built using the BioPython Phylo module functions to implement the UPGMA algorithm (Unweighted Pair Group Method with Arithmetic Mean).
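To make the clustering step concrete, the sketch below implements UPGMA in plain Python on already-aligned sequences. It is not the BioPython code used in the study: the function names are hypothetical, and the p-distance (proportion of differing sites, ignoring gaps and Ns) is an assumed distance measure, since the text does not specify the distance model.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing aligned positions, skipping gaps and Ns."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-N" and y not in "-N"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs: dict):
    """Build a UPGMA tree (nested tuples of sample names) from aligned sequences."""
    names = list(seqs)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    dist = {(i, j): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            # Size-weighted mean = arithmetic mean over all leaf pairs.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        # Drop every distance that involves one of the two merged clusters.
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))
```

For example, `upgma({"A": "ACGTACGT", "B": "ACGTACGA", "C": "TGCAACGT"})` groups A with B first, because they differ at a single site, and then joins C.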

Quantitative PCRs
The EBV viral load and albumin concentration were determined by TaqMan qPCR. The primer and probe sequences for EBV BMRF1 and human albumin gene amplification, designed by means of Primer-BLAST, were, respectively: BMRF1s (forward), 5′-CCGGCCTGAATTTGTTAAGC-3′, BMRF1as (reverse), 5′-CTTGGGCATCAACAGCACC-3′, BMRF1p (probe), 5′-AATCATCTGCTCGTTCCTCAGCC-3′, and AlbR (forward), 5′-AAACTCATGGGAGCTGCTGGTT-3′, AlbS (reverse), 5′-GCTGTCATCTCTTGTGGGCTGT-3′, and Albp (probe), 5′-CCTGTCATGCCCACACAAATCTCTCC-3′. The samples (100 ng DNA) were analyzed in duplicate using the TaqMan masterMix ® (Roche Life Science, catalog no. 04535286001) on a Rotor-Gene Q TM (Qiagen) at 95 °C for 10 min, followed by 45 cycles at 95 °C for 30 s and 60 °C for 1 min. Quantification was performed against standards obtained by inserting the amplified sequence (BMRF1 or albumin) into the pCR2.1 TA cloning vector ® (Invitrogen, Villebon-sur-Yvette, France). After purification of the two constructs with the Qiagen Plasmid Maxi kit ® (Qiagen), the copy number was calculated from the concentration determined by OD260 measurement. Then, serial 10-fold dilutions were made to prepare the standard curves. Knowing that there are two copies of the albumin gene per cell, the albumin copy number obtained for each sample was used to determine the number of viruses per cell.
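The standard-curve quantification described above can be sketched as a least-squares fit of Ct against log10(copy number), followed by interpolation of unknown samples. The function names are hypothetical, and the numbers in the usage note are made-up illustrative values, not data from this study.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Interpolate the copy number of an unknown sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)
```

With a hypothetical 10-fold dilution series from 10^2 to 10^6 copies, `fit_standard_curve` returns the slope and intercept, and `copies_from_ct` is then applied separately to the BMRF1 and albumin curves.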

Analysis of Whole EBV Genome Sequences
The complete analysis concerned 36 patient samples (34 patients with lymphoproliferative disease, all selected because of EBER RNA detection in their biopsies, one control with a reactive adenopathy without any EBV pathology, and one control with a primary infection) and seven cell lines. The patient description is shown in Table 1. To obtain complete EBV genome sequences, a target capture method was used because viral loads may be low and because the EBV genome shares several homologies with the human genome. For 10 samples, the viral load was <100,000 copies/µg DNA, and a very low depth was obtained by MiSeq sequencing. Therefore, these samples were sequenced using NextSeq sequencing. The use of two complementary approaches to analyze the results, namely reference mapping and de novo assembly, allowed us to obtain more complete and accurate whole EBV genomes. The overall results are reported in Table 2. The mean read number per sample was 10,889,606, with a mean depth of 4136. An average of 96% of the reads mapped to the EBV genome, and all the sequences were analyzed after removing the small percentage of reads mapping to the hg19 human genome. Using the de novo assembly, the genomes obtained on the MiSeq sequencer were assembled into contigs created by VirAmp and aligned to the reference genome to generate scaffolds. Seventeen whole genome sequences, including eight from AITL, were determined and deposited in NCBI GenBank (accession numbers MH837512 to MH837528). The individual sample accession numbers are listed in Table 3. The coverage profile was similar for all samples and greater than 92%. The mean percent GC content was 58.99%. The Fastq reads obtained for all the samples have been deposited in NCBI GenBank (BioProject ID PRJNA505149).

Some AITL Strains Exhibited Similar Distribution Patterns of Mutations
The whole genome sequences obtained for the patients were mapped to the reference genomes NC_007605.1 for EBV-1 and NC_009334.1 for EBV-2. Given that the latent EBNA-2 and -3 genes are the most divergent genes between EBV-1 and -2, we used the ratio "number of reads mapping to EBV-1 EBNA-2, -3 genes/number of reads mapping to EBV-2 EBNA-2, -3 genes" to determine the EBV type of each sample (Table 2). All but one patient sample (AIL18, which came from a woman native to North Africa) harbored EBV-1. The raw reads were then aligned to the EBV-1 reference genome, and genetic variations were detected using the VarScan tool (Figure 1 and Table S1). Figure 2 illustrates the relative genetic variations among the genomes for the different strains compared to the reference. For the AITL patients, although no consensus sequence of the EBV genome was found, it is noticeable that the AIL1/6/7/8/10/11/12/13/14/15/17 patients had very close strains. The same observation was true for the AIL3/4/5/9 patients. It is also noteworthy that two strains were particularly mutated: AIL2 and AIL16.

AITL Biopsies Revealed EBV in a Clonal Form
To determine whether the virus was clonal in biopsies, we calculated the heterogeneity, i.e., the number of heterogeneous positions, for each sample (Table S1). Because the sequence depth obtained by deep sequencing was high, we considered a position as heterogeneous if the variant frequency was between 20 and 94%, as proposed by Kwok et al. [49]. A low level of heterogeneity supports the monoclonal origin of a strain, whereas high heterogeneity is due to the presence of various strains. Here, to determine clonality, we chose a cut-off of 0.2% heterogeneity, which is more stringent than the 0.5% cut-off chosen by Kwok et al. Under this condition, Control1 and PI1 presented heterogeneous EBV strains, and the cell lines harbored monoclonal strains (Figure 3). Interestingly, all the AITL samples tested contained EBV in clonal form. Only four patient biopsies (PTTL1, PTBL3, PTBL4, and LPL1) showed heterogeneity in favor of a non-clonal virus. We then correlated these findings to the B and/or T clonality for each AITL sample and found that all but one sample (AIL3) showed T-cell clonality (94.4%), while B-cell clonality was found for five samples (27.7%) (Figure 3).
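The clonality rule above can be sketched in a few lines. Two points are assumptions made for illustration only: the heterogeneity percentage is computed here over the variant positions supplied to the function (the text does not state the denominator), and a sample is called clonal when it falls strictly below the 0.2% cut-off. The function names are hypothetical.

```python
def heterogeneity_percent(variant_freqs, low=20.0, high=94.0):
    """Percentage of positions whose variant frequency (in %) lies in [low, high]."""
    if not variant_freqs:
        return 0.0
    n_heterogeneous = sum(low <= f <= high for f in variant_freqs)
    return 100.0 * n_heterogeneous / len(variant_freqs)


def is_clonal(variant_freqs, cutoff_percent=0.2):
    """Call the virus clonal when heterogeneity is below the cut-off (assumed strict)."""
    return heterogeneity_percent(variant_freqs) < cutoff_percent
```

Passing `cutoff_percent=0.5` reproduces the less stringent cut-off attributed to Kwok et al. in the text.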

EBV Was Almost Always Latent in AITL Biopsies
In order to establish whether EBV was in a latent or a replicative state in the biopsies studied, we determined the viral load in each sample by quantitative PCR (qPCR), targeting the BMRF1 gene. The human albumin gene was also quantified in each sample using the same protocol in order to calculate the number of viruses per cell. Based on the publication of Hsieh et al. [50], a virus can be considered latent if there are fewer than 20 copies per cell; a higher intracellular viral load is indicative of active replication. According to this calculation, only one AITL biopsy (AIL5) exhibited viral replication, while the others contained latent EBV (Table 4). Active replication was also found in other lymphoma samples (CTCL1, NK/TL2, PTBL1, DLBCL3, and ARL2).
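The per-cell normalization and the 20-copy threshold can be expressed as a short sketch. The function names are hypothetical, and treating a value of exactly 20 copies per cell as replicative is an assumption, since the text only defines the latent case.

```python
def ebv_copies_per_cell(bmrf1_copies: float, albumin_copies: float) -> float:
    """EBV genomes per cell, using two albumin gene copies per cell."""
    cells = albumin_copies / 2.0
    if cells <= 0:
        raise ValueError("albumin copy number must be positive")
    return bmrf1_copies / cells


def viral_state(copies_per_cell: float, latent_threshold: float = 20.0) -> str:
    """Classify the virus as latent below the threshold, replicative otherwise."""
    return "latent" if copies_per_cell < latent_threshold else "replicative"
```

For instance, 1000 BMRF1 copies measured alongside 2000 albumin copies correspond to 1000 cells and therefore to one EBV genome per cell, which is classified as latent.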

Phylogenetic Analysis Confirmed the Existence of at Least Two Different Groups of EBV among AITL Biopsies
The phylogenetic study was conducted for all the sequenced strains, based on the multiple nucleotide sequence alignment of whole genomes (Figure 4). Unsurprisingly, it clearly separated AIL18, the only type 2 EBV of this series. Furthermore, the AIL2 and AIL16 strains were clearly different from the vast majority of the AITL strains, which were categorized into two different groups: strain 1 and strain 2. Strain 1 was found in patients AIL1/6/7/8/10/12/13/14/15/17, and strain 2 corresponded to patients AIL3/4/5/9, whose proximity to each other and to CTCL1 was noticeable. The AIL3/4/5/9 patients had an average survival of 2.4 years after their diagnosis, compared with 7.1 years for the AIL1/6/7/8/10/11/12/13/14/15/17 patients; this difference was significant, as demonstrated by the Kaplan-Meier curve (Table 1 and Figure 5). Overall, phylogenetic analysis showed that there was no EBV strain specific to AITL. Phylogenetic analysis was also conducted on each of the latency genes. Interestingly, the two groups already described were clearly separated for the EBNA-3A, EBNA-3B, EBNA-3C, and LMP-2 genes (Figure 6). Figure S2 shows the phylogenetic trees obtained for the EBNA-1, EBNA-2, and EBNA-LP genes.
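For readers unfamiliar with the survival comparison mentioned above, the sketch below shows how a Kaplan-Meier survival curve is estimated. It is a generic plain-Python illustration with a hypothetical function name; the software used by the authors is not stated, and any input values shown are toy numbers, not patient data from Table 1.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where a death occurred.

    times  : follow-up durations (e.g., years after diagnosis)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve
```

Each group (for example, the two strain groups) would be passed through the estimator separately, and the resulting curves compared with a test such as the log-rank test.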

Mutations Occurred Mostly in Latency Genes and Secondarily in Tegument Genes
The mean numbers of single nucleotide variations (SNVs) and of insertions and deletions (INDELs) were not higher in the AITL group than in the other lymphomas (respectively, 265 and 402 for SNVs and 5 and 7 for INDELs). The analysis of non-synonymous mutations according to nine main gene categories (namely latency, replication, membrane glycoprotein, tegument, capsid, transcription, metabolism, packaging, and unknown function) showed that the majority of changes were located in latency genes, though a large number of mutations also occurred in tegument genes (Figure 7). Conversely, capsid and transcription genes had the lowest number of variations. For each gene, we calculated the average number of non-synonymous mutations for the "AITL" group versus the "other lymphoma" group. The results showed a significant difference between these two groups and are reported in Table 5. In addition to the latency and tegument proteins, it can be noted that some replication proteins, such as the BKRF3 protein, were more mutated in the AITL group. We also looked for variations in proteins implicated in the switch from latency to the lytic cycle, mainly Rta, a product of the BRLF1 gene, and Zta or ZEBRA, the BZLF1-encoded protein. The patients AIL2/3/4/5/9/16 harbored the BRLF1 mutation S542N and, except for AIL2, the BZLF1 mutation A206S (Figure 8). These two mutations were positioned in CD8 epitope sites, while the other patients harbored BRLF1 A290D, V479I, and P486S. It is noteworthy that AIL2 exhibited nine BZLF1 mutations (not represented in Figure 8).

The majority of strains originating from the AITL biopsies contained four EBNA-1 mutations (E16Q, G18E, E24D, and G27S), located in the Gly-Arg domain, which is implicated in the EBNA-1-dependent DNA replication and partitioning of the EBV episomes in dividing cells (Figure 9). The region implicated in the activation of other latency gene expression (amino acids [aa] 61-89) carried the mutation T85A, present in almost all the strains derived from AITL, and the mutations V70A and Q74P. Furthermore, the mutation T585P, which occurs in the dimerization domain of the protein and is located in important CD4 and CD8 recognition epitopes, was also present in all but one AITL strain. All AITL strains, except AIL2 and AIL11, presented a threonine at the signature codon 487 and therefore belonged to the P-thrV subtype, as described by Gutierrez et al. [51].
Among the three categories of EBNA-2 domains critical for its transcription regulation function, two are particularly mutated among the AITL strains: the self-association domain 3 (SAD3) and one of the nuclear localization signals (NLS). Indeed, SAD3 (aa 148-214), especially, carried the mutations R163G, Q185R, M196I, T204S, and the conservative duplication L211, which were present in a large number of AITL biopsies. Similarly, the highly represented mutations H316N, on the one hand, and E476G and P478S, as well as S485P or the S485 frameshift, on the other hand, were located, respectively, on NLS1 (aa 284-341) and NLS2 (aa 471-487). Some of these mutations affected the epitopes recognized mainly by CD8 cells.
EBNA-LP contained a variable number of 66 amino acid repeats corresponding to the W1 and W2 exons of IR1, followed by a C-terminal non-repetitive domain encoded by the exons Y1 and Y2. Conserved regions were determined in the C extremity (CR1 to CR3, implicated in EBNA-2 binding) and in the N-terminal region (CR4, CR5). Although one must be prudent with regard to results obtained for repeats with fragmentation sequencing, the sequences obtained here showed significantly more substitutions in AITL EBNA-LPs compared to the others. The mutations observed for the AITL biopsies (H88N/Q/R, V94E and V101I) were all localized at exon Y2 and concerned the majority of strains.
It is noticeable that for these three proteins the AIL2 strain had an identical profile to the PI or control1 strains without any of the described mutations.

Two Tegument Genes, BNRF1 and BBRF2, Were Especially Mutated in AITL Biopsies
The major mutations concerning BNRF1 and BBRF2 are reported in Figure 8. The BBRF2 protein forms a hetero-complex with the tegument protein BSRF1, which mediates the viral envelopment [52]. For this protein, the AITL patients showed statistically more mutations than the other patients; A176S is particularly common in AILs.
The major tegument protein BNRF1 can bind the histone chaperone DAXX (death-domain associated protein-6) and histones H3.3 and H4 to form a stable quaternary complex. BNRF1 also carries a PurM-like domain (aa 610-976) and a GATase domain (aa 1037-1318). The BNRF1 mutations detected were unevenly distributed among the strains. For example, the two mutations P580T and S587R, which occurred in the DAXX interaction domain (DID; aa 360-600), concerned seven AITL biopsies. Similarly, among the three mutations (A762V, N797S, and S861C) that affect the PurM-like domain of the protein, two were found in 12 AITL biopsies. AIL3/4/5/9 exhibited a mutation profile different from the other strains. No mutations concerned the CD4 or CD8 epitopes.

Discussion
Although AITL is an uncommon, aggressive disease, it is one of the more frequent subtypes of peripheral T cell lymphoma (PTCL) in the western world. EBV is found in as many as 95% of cases [53], which highlights the close relationship between EBV and AITL. To date, however, whether EBV infection plays a role in AITL pathogenesis remains unclear, and diverse assumptions exist. For this reason, we decided to study the complete sequence of the EBV genome in AITL patients, compared to patients with other lymphomas.
Since EBV was discovered and characterized in Burkitt lymphoma (BL) [54], a B-cell lymphoma, it has been considered to be involved in other B-cell neoplasms. In these proliferations, EBV-encoded latent proteins have been shown to directly promote immortalization and proliferation through NF-κB pathway stimulation and increased anti-apoptotic gene expression. Concerning T cell lymphomas, and especially AITL, the role of EBV is probably much more complex and diverse. It has been shown that EBV is mainly present in AITL in B cells [32,55,56]; so, it is assumed that it plays an indirect role in the infected tumor cells through modification of the tumor microenvironment. Under these conditions, it is essential to know whether the virus is latent or replicative, as the proteins involved and the mechanisms of action are different. Unfortunately, for the biopsies we examined, there was not enough material to determine which cells carried the virus. For these samples, with a particular interest in AITL biopsies, we explored the viral latent or replicative state, and we found that all but one AITL sample harbored only the latent virus (Table 4). To our knowledge, very few papers have examined whether the virus was latent in AITL. Unlike our results, Smith et al., arguing that EBV-positive cell numbers were greater than expected in reactive tissue and that infected cell nuclei were larger than average, reported an active replicative state of EBV [22]. Our method for determining whether the virus was latent [50] seems more accurate and is corroborated by the fact that small EBER RNAs were found in all the AITL cases. Interestingly, only one AITL patient (AIL5) had a high viral load, with an estimated copy number of 12,074 per cell, which is consistent with a replicative state of the virus. In this case, given the presence of EBERs, it can be assumed that some cells carry latent virus, while it is replicative in others.
In addition to this work, we recently published a study of viral transcriptomes for seven of these patients (AIL2/3/7/11/14/15/16) and others, showing that the virus was in latency II and, more specifically, in latency IIc [57]. These results call into question the hypothesis that the virus could act in disease progression through cytokine and chemokine modulation. Indeed, in this hypothesis, the immunodeficient state resulting from the disease would lead to viral reactivation, thereby playing a role in the development of the tumor microenvironment [53]. It is more likely that EBV persists in a life-long latent state in infected cells and that this presence, in conjunction with other carcinogens, may promote the evolution to cancer [58].
For each sample, we then calculated the heterogeneity of the obtained sequence, namely the percentage of heterogeneous sites in relation to the genome size [59]. The variability threshold that we chose was 0.2%. This threshold, which is clearly below the threshold chosen by Kwok et al., was validated by the results obtained for the cell lines tested, which were considered as monoclonal controls, and for the primary infection sample and the inflammatory reactive biopsy, which were considered as polyclonal controls (Figure 3). The overall heterogeneity for AITL was between 0.015% (for AIL6 and AIL13) and 0.171% (for AIL2). The 18 AITL samples tested showed less than 0.2% variability, indicating that for these samples the heterogeneity was too low to be due to infection with various strains. As mentioned by others [49], rare mutations may spontaneously arise in viral genomes during clonal expansion. The low number of heterogeneous positions observed is consistent with this hypothesis, and the monoclonal origin of EBV can be attributed to it. This result, found for all AITL samples, makes it possible to consider a role for EBV in this pathology, which would here be more than a simple passenger, contrary to what is suggested by others [53,60]. As expected for this pathology, T lymphocytes were clonal in all but one sample, while B clonality was found in five samples. Unfortunately, due to a lack of material, we were unable to visualize in which cells the virus was present.
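The heterogeneity calculation described above (percentage of heterogeneous sites relative to genome size, compared with a 0.2% threshold) can be illustrated with a minimal Python sketch. The function names, the rounded genome length of 172,000 bp, and the site counts are assumptions chosen for illustration; how a site is called heterogeneous is not restated in this section and is left outside the sketch.

```python
THRESHOLD_PERCENT = 0.2  # variability threshold stated in the text


def heterogeneity_percent(n_heterogeneous_sites, genome_length):
    """Percentage of heterogeneous sites relative to the genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if n_heterogeneous_sites < 0:
        raise ValueError("n_heterogeneous_sites must be non-negative")
    return 100.0 * n_heterogeneous_sites / genome_length


def below_threshold(heterogeneity, threshold=THRESHOLD_PERCENT):
    """True when heterogeneity is too low to suggest infection with several strains."""
    return heterogeneity < threshold


if __name__ == "__main__":
    GENOME_LENGTH = 172_000  # assumed, rounded EBV genome size in bp
    # Hypothetical site counts, not taken from Table S1.
    for sample, n_sites in [("sample_A", 30), ("sample_B", 290), ("sample_C", 900)]:
        h = heterogeneity_percent(n_sites, GENOME_LENGTH)
        print(f"{sample}: {h:.3f}% -> below threshold: {below_threshold(h)}")
```

Under these assumptions, the extreme values reported for AITL (0.015% and 0.171%) both fall below the threshold, in line with the interpretation given in the text.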
Overall, we did not observe any strain specific to the AITL biopsies. After alignment against the EBV-1 and EBV-2 reference strains, the overall results showed that only one patient (AIL18), a native of North Africa, had a type 2 EBV. The low number of EBV-2 strains in this series is not surprising, given that the majority of the patients originate from France and that EBV-2 is probably less pathogenic than EBV-1. Comparative analysis of all the mutations obtained for the AITL patients and the others does not reveal a strain characteristic of AITL (Figure 2). This figure shows that the mutation profiles obtained for the AIL1/6/7/8/10/11/12/13/14/15/17 strains are very close. Likewise, the mutation patterns of the AIL3/4/5/9 strains are quite similar; this is particularly visible in the EBNA-3A and -3B regions. It can also be noted that the strains of AIL16, and especially of AIL2, are particularly mutated. The phylogenetic analysis carried out on entire sequences supports these different observations (Figure 4). Although many other factors are involved in disease progression, it is interesting to note that the AIL3/4/5/9/16 patients, whose viral strains are close, had an average survival of 2.1 years after their diagnosis, while it was 7.1 years for the AIL1/6/7/8/10/11/12/13/14/15/17 patients (Table 1). As demonstrated by the phylogeny and regarding the latency genes, the main differences between these two groups of strains relate to EBNA-3A, EBNA-3B, EBNA-3C, and LMP-2 (Figure 6). For the AIL2/3/4/5/9/16 strains, we showed that there was an S542N mutation on the BRLF1 gene encoding the Rta viral transcription factor. This mutation is positioned on the transactivation domain of the protein. It has also been described by Farrell P.J. (unpublished data) in T-cell disease in Paris.
Similarly, the AIL3/4/5/9/16 strains carry the A206S mutation on the BZLF1 gene, a mutation already reported by other authors [61,62], located on the dimerization domain of the viral transcription factor Zta. These two mutations (BRLF1 S542N and BZLF1 A206S) affecting important domains can cause a change in viral behavior. Note that the other strains featured the mutations V479I, P486S on the activation accessory domain and A290D on the DNA-binding domain of BRLF1. These mutations have already been reported [62,63].
Our initial idea that there may be a viral strain specific to AITL led us to analyze the main mutations found on viral genomes. Analysis of genome variation locations showed that the coding regions accounted for more than 70% of the mutations, while the EBER and microRNA regions were conserved. Consistent with previous reports [8,59,64], we observed a higher frequency of non-synonymous SNPs in latent genes (Figure 7), followed by the genes encoding tegument proteins. Although the mean number of mutations was not higher for AITL than for the other lymphomas, comparing, for each gene, the number of non-synonymous mutations in the AITL group with that in the other lymphomas showed a significant difference for some genes, especially the EBNA-1, EBNA-2, and EBNA-LP latency genes and the tegument BBRF2 and BNRF1 genes (Table 5).
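A per-gene comparison of this kind can be illustrated with a small Python sketch. The statistical test used for Table 5 is not restated in this section; the sketch below assumes, purely for illustration, a two-sided Fisher exact test on a 2x2 table of carriers and non-carriers of a given mutation in the two groups. The function name and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. AITL, other lymphomas); columns are
    carriers and non-carriers of a mutation.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k carriers in the first group.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: mutation carried by 12 of 18 AITL samples
    # and by 3 of 16 samples from other lymphomas.
    p = fisher_exact_two_sided(12, 6, 3, 13)
    print(f"two-sided p-value: {p:.4f}")
```

An exact test is a natural choice for groups of this size, where expected cell counts can be too small for a chi-square approximation; any multiple-testing correction across genes would need to be applied separately.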
EBNA-1 is of particular interest because it is the only EBV antigen consistently expressed in EBV-associated malignancies [1], being expressed in all forms of latency, and because of its major function in viral episome maintenance in latently infected cells. In this regard, it is intriguing that most AITL strains (15/17) carry four substitutions in the Gly-Arg domain (aa 8-67), which is mainly involved in the replication of the EBV episomes and their partitioning between the two dividing cells; these substitutions may therefore modify these functions (Figure 9). Apart from these mutations, the T85A substitution, already described by Borozan et al. [62] and found in all our AITLs except AIL2, could play an important role in proliferation because it is positioned in a region involved in the transcriptional activation of other latency genes. The V70A and Q74P mutations present in 10 AITL samples, which have not been described to our knowledge, may contribute to this effect, being positioned in the same region. Finally, the T585P mutation was found in all AITL samples except AIL2. The DNA-binding domain of EBNA-1 was recently described as forming a hexameric ring, with the oligomeric interface pivoting around residue T585 [65]. T585 is a position subject to substitutions. Mutations at this residue affected EBNA-1-dependent DNA replication and episome maintenance [66]. Polymorphism at this position is often found in NPC tumors and Burkitt's lymphoma [65]. We compared the sequences obtained for our patients to the sequences obtained for the four LCLs (DPL, KREB2, CoAN, and MLEB2) that we established from non-hematological subjects [57]. Among the different interesting mutations found in our present patients and concerning different genes, only EBNA-1 T85A and EBNA-1 T585P were found in the LCLs. These mutations therefore seem to be widespread in the French population.
The changes observed in the EBNA-1 sequences fell principally within the known T-cell epitopes: all mutations but E24D and G27S are located in CD8 epitopes, and V70A, Q74P, T85A, and T585P are positioned in CD4 epitopes. These results imply that immune pressure could be in part responsible for these changes. Based on the amino acid present at EBNA-1 position 487, EBV has been classified into five subtypes: P-ala (wild type B95-8 subtype), P-thr, V-leu, V-val, and V-pro [51,67]. Considering these five subtypes, it is noteworthy that 15 of the 17 AITL EBV-1 positive patients harbored a P-thr subtype, which was also the most prevalent in the other samples. V-val is reported to be dominant in Asian regions, while the P-thr subtype is most commonly observed in the peripheral blood of European and Australian subjects, as well as in African tumors [63].
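The subtype assignment based on EBNA-1 residue 487 can be expressed as a simple lookup, sketched below in Python. The mapping of residues to subtype names follows the five subtypes listed above; the function name is an assumption, and the sketch presumes that the protein sequence is already numbered in B95-8 reference coordinates (in practice, variable repeat lengths would require alignment to the reference first).

```python
# Subtype names from the text, keyed by the one-letter code of residue 487.
SUBTYPE_BY_RESIDUE_487 = {
    "A": "P-ala",  # wild type B95-8 subtype
    "T": "P-thr",
    "L": "V-leu",
    "V": "V-val",
    "P": "V-pro",
}


def ebna1_subtype(protein_sequence, position=487):
    """Return the EBNA-1 subtype for a sequence in B95-8 coordinates.

    Returns None when the residue at the given position does not
    correspond to one of the five described subtypes.
    """
    if len(protein_sequence) < position:
        raise ValueError("sequence is shorter than the requested position")
    residue = protein_sequence[position - 1].upper()  # positions are 1-based
    return SUBTYPE_BY_RESIDUE_487.get(residue)


if __name__ == "__main__":
    # Toy sequence (not a real EBNA-1 sequence) with threonine at position 487.
    toy_sequence = "G" * 486 + "T" + "G" * 154
    print(ebna1_subtype(toy_sequence))
```

Returning `None` for an unexpected residue, rather than forcing a subtype, makes sequencing gaps or novel variants at this position visible to downstream steps.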
The latent protein EBNA-2 acts mainly as a transcription factor and therefore comprises the transactivation domains (TAD), self-association domains (SAD), and nuclear localization signals (NLS) important for its function. The main mutations affecting EBNA-2 in the AITL group impact the SAD3 domain or the NLS domains and could thus modify its regulatory function. Many of these mutations, particularly the E476G mutation, present among others in the E2A subtype [68], have been described in French as well as English, Eurasian, American [62], and Asian [69] populations. To our knowledge, only the H316N mutation located on the NLS1 domain and the R163G mutation positioned on the SAD3 domain have not been previously described, although position 163 was found to be very frequently mutated (R163M substitution) by Wang et al. [68]. These two mutations are highly represented in AITL.
EBNA-LP is a latent protein that acts principally as an EBNA-2 coactivator and therefore has an important role in B-cell immortalization. It is notable that this was the most significantly mutated protein in the AITL group compared to the other patients. The main substitutions observed for our patients (H88N/Q/R, T93P, V94E, and V101I) were grouped and located at the C-terminal extremity, between CR4 and CR5. Most published viral sequences do not carry these mutations, apart from V94E and V101I, which were described by Ba Abdullah et al. to characterize subgroup C [70].
Two tegument proteins were significantly more mutated in AITL patients than in other patients: BBRF2 and BNRF1 (Figure 8). Recently, it was shown that BBRF2 can associate with BSRF1, another tegument protein, and that the heterocomplex formed has a role in the binding of EBV nucleocapsids to the Golgi membrane during secondary envelopment. Therefore, BBRF2 appears to play an important role in viral infectivity [71]. The main mutation found (A176S) is highly represented in AITL compared to other pathologies. However, it has already been described, including in healthy subjects [4,61].
The major tegument protein BNRF1, through its binding to the antiviral histone chaperone DAXX and to histones H3.3 and H4, is implicated in the establishment of latency and cell immortalization [72]. The stable quaternary complex formed is responsible for its localization to the PML-nuclear bodies involved in the antiviral intrinsic resistance and the transcriptional repression of host cells [73]. The mutations that we have highlighted in AITL concerned only a subgroup of patients. Most of these mutations have been previously described [4,62]. However, some of them could alter the behavior of the protein, such as those located in the DID, namely P580T and S587R, or those affecting the PurM-like domain of the protein (A762V, N797S, and S861C).
BKRF3, a viral uracil-DNA glycosylase, participates in the repair of viral DNA and prevents its mutagenesis. For this protein, mutations that severely affect viral replication are unlikely to be retained. In this study, the E128D substitution was present in all AITL patients except AIL2. The presence of this mutation could modify virus-cell relationships.
In the end, there is probably no viral mutation specific to AITL. However, it is possible that a combination of mutations affecting several genes, even if each has already been reported, could modify the behavior of the virus. With this in mind, it would be interesting to study how the virus found in AIL3/4/5/9 behaves within the cell.

Conclusions
In summary, comparison of the whole viral sequences obtained for the 18 AITL patients with those of other patients identified a strain that was poorly represented within the AITLs but seemed to be associated with poor outcome. Furthermore, we demonstrated that the virus was clonal and latent in all the AITL biopsies analyzed. These various elements suggest that the virus is involved in this pathology.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14122899/s1, Figure S1: Results obtained for EBER ISH staining; Figure S2: Phylogenetic tree obtained after nucleotide sequence alignment of the different strains; Table S1: Genetic Variations and Heterogeneity Determination for All Samples.

Data Availability Statement:
The seventeen whole genome sequences generated were deposited in NCBI GenBank (No. MH837512 to MH837528). All the FASTQ reads obtained have been deposited in NCBI GenBank (BioProject ID PRJNA505149).