IMGT® Biocuration and Comparative Analysis of Bos taurus and Ovis aries TRA/TRD Loci

The adaptive immune response provides the vertebrate immune system with the ability to recognize and remember specific pathogens, to generate immunity, and to mount a stronger attack each time the pathogen is encountered. T cell receptors (TR) are the antigen receptors of the adaptive immune response expressed by T cells; they specifically recognize processed antigens, presented as peptides by the highly polymorphic major histocompatibility (MH) proteins. TR are divided into two groups, αβ and γδ, which express distinct TR containing either α and β, or γ and δ chains, respectively. The TRα locus (TRA) and TRδ locus (TRD) of the bovine (Bos taurus) and the sheep (Ovis aries) have recently been described and annotated by IMGT® biocurators. The aim of the present study is to present the results of the biocuration and to compare the genes of the TRA/TRD loci of these ruminant species, using the Homo sapiens repertoire as a reference. The comparative analysis shows similarities but also differences, including the fact that the TRA/TRD locus of these two species is about three times larger than that of humans and therefore contains many more genes, which may reflect duplications and/or deletions during evolution.


Introduction
The adaptive immune response arose in jawed vertebrates or gnathostomata more than 450 million years ago. It is characterized by the remarkable specificity and the extreme diversity of its antigen receptors [1]. These antigen receptors of the adaptive immune response are the immunoglobulins (IG) or antibodies of the B cells and plasmocytes [2], and the T cell receptors (TR) of the T cells [3]. The IG recognize antigens in their native form, whereas the TR recognize processed antigens, which are presented as peptides by the major histocompatibility (MH) proteins.
T cell receptors (TR) are divided into two groups, αβ and γδ, which express distinct TR containing either α and β, or γ and δ chains, respectively [3]. Each TR chain comprises a variable and a constant domain. The variable domain is the result of one rearrangement between variable (V) and joining (J) genes for α and γ chains, and two consecutive rearrangements between diversity (D) and J genes then between V and partially rearranged D-J genes for β and δ chains. After transcription, the V-(D)-J sequence is spliced to the constant (C) gene to give the final transcript [3].
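The rearrangement rules described above can be summarized in a small lookup. This is only an illustrative sketch: the dictionary and function names are assumptions introduced here, and only the gene types joined for each chain come from the text.

```python
# Rearrangement steps per TR chain, as described in the text:
# one V-J rearrangement for alpha and gamma chains, and two consecutive
# rearrangements (D-J, then V to D-J) for beta and delta chains.
REARRANGEMENT_STEPS = {
    "alpha": [("V", "J")],
    "gamma": [("V", "J")],
    "beta": [("D", "J"), ("V", "D-J")],
    "delta": [("D", "J"), ("V", "D-J")],
}


def transcript_layout(chain: str) -> list[str]:
    """Return the order of gene types in the final transcript of a chain.

    Hypothetical helper: the V-(D)-J sequence is followed by the C gene,
    to which it is spliced after transcription.
    """
    steps = REARRANGEMENT_STEPS[chain]
    uses_d_gene = any("D" in step for step in steps)
    return ["V", "D", "J", "C"] if uses_d_gene else ["V", "J", "C"]


for chain in ("alpha", "beta", "gamma", "delta"):
    print(chain, len(REARRANGEMENT_STEPS[chain]), "-".join(transcript_layout(chain)))
```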
The human TRα (TRA) locus consists of a cluster of 56 TRAV genes located upstream (in 5') of a J-C cluster composed of sixty-one TRAJ and one TRAC [3]. The TRδ (TRD) locus is nested in the TRA locus between the TRAV and the TRAJ genes [3]. This locus comprises [...]
A comparison was performed based on the number of genes in the locus, the number of genes per subgroup (potential germline repertoire), the locus representation, the functionality of genes and the CDR lengths. Potential duplications and/or deletions that may have occurred during evolution can be highlighted by this kind of comparison.

Annotation of TRA/TRD Loci
The two TRA/TRD loci were annotated following the pipeline described in the Materials and Methods. The results of the annotation described below are summarized in Table 1. The information regarding the genome assemblies and the boundaries is provided in Supplementary Table S1.

Comparison with Previous Studies
Regarding the sequences and the number of gaps, the quality of the latest assemblies (this study) is better than that of the assemblies used in previous studies. For the bovine, the entire locus is localized on chromosome 10 and there are only seven gaps. In all the previous assemblies, genes are found on unplaced scaffolds and there are more than 260 gaps, except for [11]. On the other hand, many more genes have been described in previous studies (cf. Table 2). For the sheep, the entire locus is localized on chromosome 7 and there are eighteen gaps. In the previous assembly, genes are found on unplaced scaffolds and there are more than 80 gaps. Unlike for cattle, fewer genes have been described in previous studies (cf. Table 3).
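As an illustration of how a gap count of this kind can be obtained, the sketch below counts runs of N in a nucleotide sequence. It assumes that assembly gaps are encoded as stretches of N, which is a common convention but is not stated in this section; the function name and the toy sequence are hypothetical.

```python
import re


def count_gaps(sequence: str, min_length: int = 1) -> int:
    """Count runs of 'N' (case-insensitive) that are at least min_length long.

    Assumption: each run of N stands for one assembly gap.
    """
    runs = re.finditer(r"[Nn]+", sequence)
    return sum(1 for run in runs if len(run.group()) >= min_length)


# Toy sequence with two runs of N (not real locus data).
toy_locus = "ACGTNNNNACGTNNACGT"
print(count_gaps(toy_locus))                 # counts both runs
print(count_gaps(toy_locus, min_length=3))   # ignores the 2-nt run
```

The `min_length` parameter is included because short runs of N can represent ambiguous bases rather than true gaps; whether such a filter is appropriate depends on the assembly.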
Two full assemblies are available (ARS-UCD1.2 for Bos taurus and Oar_rambouillet_v1.0 for Ovis aries), both qualified as "representative genome", and in each of them the TRA/TRD locus has been fully localized on a single chromosome with fewer gaps than in previous IMGT annotated genomic sequences. IMGT000049 and IMGT000048 are therefore considered as IMGT reference loci. This has allowed the establishment of the bovine and sheep TRA/TRD gene nomenclature, as well as the evaluation of the functionality of genes. The previous IMGT genomic sequences were re-annotated accordingly and the allelic variants were determined based on nucleotide differences in the core region (V-REGION, D-REGION, J-REGION, C-REGION).
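The principle of comparing core regions nucleotide by nucleotide can be sketched as follows. This is a simplified illustration, not the IMGT procedure: it assumes two core-region sequences of equal length that are already aligned, and the function name and toy sequences are hypothetical.

```python
def nucleotide_differences(reference: str, candidate: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, candidate base) for each mismatch.

    Assumption: both sequences are aligned and have the same length.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(reference.upper(), candidate.upper())
    return [(pos, ref, alt) for pos, (ref, alt) in enumerate(pairs, start=1) if ref != alt]


# Toy core regions differing at one position (not real allele sequences).
differences = nucleotide_differences("ACGTACGT", "ACGTACTT")
print(differences)
print("candidate differs from the reference:", bool(differences))
```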

Comparison of the TRA J-C-CLUSTER
The number of TRAJ genes is similar in human and bovine, and there are 19 more genes in sheep (cf. Table 1). Two TRAJ genes (TRAJ51 and TRAJ55) are missing in cattle and sheep compared to humans, and there are two TRAJ8 genes in cattle and sheep whereas there is only one in human (cf. Table 4). The 19 supplementary TRAJ genes found in sheep result from a duplication (or triplication for some genes) of the genes from TRAJ29 to TRAJ39, which may be due to a sequencing error or to an amplification. Regarding functionality, the TRAC genes are functional and few TRAJ genes are pseudogenes (P) in human and bovine (3-4 and 4-6, depending on alleles, respectively). In contrast, there are more pseudogenes in sheep, mostly because of the duplicated genes (11 P out of 13 are duplicated genes) (cf. Table 4).

Table 4. IMGT Potential germline repertoires of the TRAJ sets in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries). Columns: Sets, Homo sapiens, Bos taurus, Ovis aries.

At the genomic level, each TRAC gene consists of several exons whose sizes are the same in all species except for exon 4, which is untranslated (EX4UTR) (cf. Figure 1). On the other hand, the size of the introns varies according to the species, especially between human and bovine/sheep. In humans, the intron between exon 1 (EX1) and exon 2 (EX2) and the intron between EX2 and exon 3 (EX3) are shorter, while the intron between EX3 and EX4UTR is longer, compared to bovine and sheep. Each TRAC gene encodes a similar protein of 142 AA, with EX1 encoding the constant domain, EX2 and the 5' part of EX3 encoding the connecting region, the middle of EX3 encoding the transmembrane region and the 3' part of EX3 encoding the cytoplasmic region (cf. Figure 2). Nevertheless, the structure of EX1 differs: there are fewer AA in the E and F strands and more AA in the G strand of human TRAC compared to bovine/sheep.

[29]. The AA between parentheses at the beginning of EX1, EX2 and EX3 corresponds to the first codon resulting from a splicing frame 1 (sf1

Comparison of the TRD D-J-C-CLUSTER
The number of TRDJ genes is the same in human, bovine and sheep, but there are more TRDD genes in bovine and sheep (nine against three in human) (cf. Table 1). Regarding functionality, the TRDC genes are functional, few TRDD genes are ORF in bovine and sheep (three and four, respectively) (cf. Table 5) and one TRDJ gene is ORF in both bovine and sheep (TRDJ2) (cf. Table 6).

Table 5. IMGT Potential germline repertoires of the TRDD sets in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries). Columns: Sets, Homo sapiens, Bos taurus, Ovis aries. For each TRDD set, in each species, the number of TRDD genes by functionality and, between parentheses, the number of alleles are shown. F: functional; O: ORF. Data available in IMGT Repertoire (IG and TR) http://www.imgt.org/IMGTrepertoire/ > Locus and genes > Potential germline repertoires > TRDV, TRDD and TRDJ > Human, ibid. Bovine, ibid. Sheep.

Unlike TRAC, the size of the exons of TRDC varies depending on the species, except for EX1 (cf. Figure 3). EX2 is shorter and EX3 is longer in human than in bovine and sheep. Likewise, the size of the introns varies according to the species. Each TRDC gene encodes a similar protein of 155-156 AA, with EX1 encoding the constant domain, EX2 and the 5' part of EX3 encoding the connecting region, and the 3' part of EX3 encoding the transmembrane region (cf. Figure 4).

Comparison of the V-CLUSTER
The size of the V-CLUSTER (which describes the principal set of TRAV/TRDV genes) varies between species (cf. Figure 5). The V-CLUSTER is less extensive in human (56 genes over 900 kb) than in bovine and sheep (221 genes over 2200 kb and 346 genes over 2700 kb, respectively), which is consistent with the number of genes in these species. Regarding the functionality of the V genes, functional genes are more numerous than pseudogenes in human and in bovine. However, there are more pseudogenes in sheep.

Colors are according to the IMGT color menu for genes (http://www.imgt.org/IMGTScientificChart/RepresentationRules/colormenu.php#h1_28): green, functional genes; yellow, ORF genes; red, pseudogenes. The dotted line in Bos taurus indicates the distance in kb between two genes not represented at scale. Data available in IMGT Repertoire (IG and TR) http://www.imgt.org/IMGTrepertoire/ > Locus and genes > Locus representations > TRA, ibid. TRD > Human, ibid. Bovine, ibid. Sheep.
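The V-CLUSTER figures quoted in the text (number of genes and approximate size in kb) can be turned into a gene density and a size ratio relative to human. The short calculation below uses only those three pairs of values; the variable and function names are illustrative.

```python
# (number of V genes, approximate V-CLUSTER size in kb), as quoted in the text.
V_CLUSTER = {
    "Homo sapiens": (56, 900),
    "Bos taurus": (221, 2200),
    "Ovis aries": (346, 2700),
}


def genes_per_100kb(gene_count: int, size_kb: float) -> float:
    """Average number of genes per 100 kb of V-CLUSTER."""
    return 100 * gene_count / size_kb


human_size_kb = V_CLUSTER["Homo sapiens"][1]
for species, (gene_count, size_kb) in V_CLUSTER.items():
    density = genes_per_100kb(gene_count, size_kb)
    size_ratio = size_kb / human_size_kb
    print(f"{species}: {density:.1f} genes per 100 kb, {size_ratio:.1f}x the human size")
```

With these values the bovine and sheep V-CLUSTERs are not only larger than the human one but also more densely populated with genes.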

Comparison of the TRAV genes
All subgroups were defined according to those of the human genome. A phylogenetic tree with one representative gene per subgroup (except for TRAVA, TRAVB and TRAVC, highly degenerated pseudogenes present only in human) for the human, the bovine and the sheep was created in order to highlight the distance between the species within a subgroup (cf. Figure 6). This phylogenetic tree shows that, for the two species, the genes of a subgroup are grouped in the same branch as the corresponding human gene. Nonetheless, some subgroups are missing in both cattle and sheep (TRAV7, TRAV15, TRAV30, TRAV31, TRAV32, TRAVA, TRAVB and TRAVC) and one only in sheep (TRAV40), there are new subgroups in bovine and sheep (TRAV43, TRAV44 and TRAV45), and three subgroups are intermingled: TRAV4, TRAV26 and TRAV44 (cf. Supplementary Figure S3). However, there is less than 75% identity among the genes of these three subgroups for a given species, so they cannot be considered as genes belonging to the same subgroup.
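The 75% identity criterion mentioned above can be illustrated with a minimal percent-identity calculation on two aligned sequences. This is a sketch under stated assumptions: the sequences are already aligned, gaps are written as "-" or ".", and gapped columns are excluded from the comparison. How identity is actually computed for subgroup assignment (region compared, gap handling) is not specified in this section, and the function names are hypothetical.

```python
GAP_CHARACTERS = {"-", "."}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in GAP_CHARACTERS and b not in GAP_CHARACTERS
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100 * matches / len(columns)


def same_subgroup(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """Apply the identity threshold quoted in the text (75%)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy aligned sequences (not real TRAV sequences).
print(percent_identity("ACGTACGT", "ACGTACGA"))
print(same_subgroup("ACGTACGT", "ACGTACGA"))
print(same_subgroup("AAAAAAAA", "AAAATTTT"))
```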
The number of TRAV genes varies depending on the species. There are fewer genes in human than in bovine and fewer genes in bovine than in sheep (cf. Table 1). The number of genes per subgroup also varies according to the species (cf. Table 7). In humans there are one or two genes per subgroup, except for TRAV8 and TRAV12 (eight and three genes, respectively), while in cattle and sheep some subgroups are highly expanded. In sheep, there are six subgroups with more than 20 genes (TRAV8, TRAV13, TRAV22, TRAV23, TRAV25 and TRAV44) and three subgroups with more than 10 genes (TRAV9, TRAV14 and TRAV43), whereas in bovine there are only five subgroups with more than 10 genes (TRAV22, TRAV23, TRAV25, TRAV44 and TRAV45). In addition, as shown in the phylogenetic tree (cf. Figure 6), eight subgroups are absent in both species and one subgroup is missing only in sheep.
The CDR lengths are relatively well conserved between the different species (cf. Table 8). The most important differences are in bovine where for some subgroups there are two or three different lengths (TRAV10, TRAV20, TRAV22 and TRAV38) and for three human subgroups in which the CDR length is different from bovine and sheep (TRAV11, TRAV35 and TRAV39). These differences are shown in red in Table 8. For two subgroups (TRAV17 and TRAV18) the bovine has some genes with the same CDR lengths as human (in blue) and some with the same CDR lengths as sheep (in green).

Table 7. IMGT Potential germline repertoires of the TRAV subgroups in human (Homo sapiens), bovine (Bos taurus) and sheep (Ovis aries).

Comparison of the TRDV genes
As for the TRAV genes, the subgroups were defined according to those of the human genome and a phylogenetic tree with all genes was created (cf. Figure 7). This phylogenetic tree shows that, except for the TRDV1 subgroup, the genes are grouped in the same branch as the corresponding human gene. The TRDV1 subgroup, however, is divided into two branches, even though there is more than 75% identity between all those genes.

Tree generated using NGPhylogeny.fr [25] (with MAFFT [26] and PhyML [27] programs) and iTOL v4 [37].
As for the TRAV genes, the number of TRDV genes varies depending on the species. There are fewer genes in human than in bovine and fewer genes in bovine than in sheep (cf. Table 1). There are two new subgroups in bovine and sheep compared to human (TRDV4 and TRDV5), and the TRDV1 subgroup is much larger in cattle and sheep, with 50 and 84 genes, respectively, compared to 1 in human (cf. Table 9). Contrary to the TRAV genes, the CDR lengths are not conserved between human and bovine/sheep for the TRDV2 and TRDV3 subgroups (cf. Table 10). For the TRDV1 subgroup, there are several different lengths in bovine and sheep (nine and five, respectively) due to the high number of genes in this subgroup. There are also genes lacking CDR2-IMGT and part of CDR3-IMGT (deletion of nine amino acids (AA), not shown in Table 10). This particularity was already described in bovine by [11] and is present in sheep too. Four genes are concerned in bovine (three in-frame and one out-of-frame (P with frameshift)) and eight in sheep (six in-frame and two out-of-frame). The in-frame genes are shown in Figure 8.


Analysis of the cDNA Sequences
The last step of the biocuration pipeline consists of the automatic annotation of the cDNAs available in the IMGT/LIGM-DB database with the IMGT/Automat tool [21]: 176 cDNA sequences for cattle and 102 for sheep were annotated. This annotation made it possible to highlight the transcription of approximately 50% (cattle) and 40% (sheep) of the germline genes. Interestingly, TRAJ54, which has a stop codon in position 1 of the J-REGION, and TRDV1-13, which has a stop codon in position 108, the last position of the V-REGION, have been found rearranged in productive sequences (with no stop codon and an in-frame junction) in accession numbers JX065661 (http://www.imgt.org/ligmdb/result.action?accessionNumber=JX065661) and BC113229 (http://www.imgt.org/ligmdb/result.action?accessionNumber=BC113229), respectively, showing that the stop codon was trimmed during the rearrangement.
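The notion of a productive sequence used above (no stop codon and an in-frame junction) can be illustrated with a deliberately simplified check. The sketch assumes the input is a coding sequence starting in frame, treats "in frame" as a length that is a multiple of three, and uses toy sequences rather than the real TRAJ54 or TRDV1-13 sequences; it is not the IMGT/Automat procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_sequence: str) -> bool:
    """Simplified check: length is a multiple of 3 and no in-frame stop codon."""
    sequence = coding_sequence.upper()
    if len(sequence) % 3 != 0:
        return False
    codons = (sequence[i:i + 3] for i in range(0, len(sequence), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Toy germline 3' end whose last codon is a stop codon (TAG).
germline_end = "TGTGCTTAG"
# Toy rearrangement: the last two nucleotides are trimmed and two others are added.
rearranged_end = germline_end[:-2] + "GG"

print(is_productive(germline_end))
print(is_productive(rearranged_end))
```

In this toy case the trimming removes part of the germline stop codon, so the rearranged sequence no longer contains it, which mirrors the situation described for the two accessions.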

Discussion
This study was carried out in order to highlight the differences between the IMGT® annotation and previously published data, and to compare the TRA/TRD loci of bovine and sheep with the human locus. The annotation of each locus followed the pipeline defined in Materials and Methods. The expertise that follows this pipeline makes it possible to establish the TRA/TRD germline repertoire according to the IMGT® nomenclature, as well as the IMGT® reference directory (IMGT® reference sequences used by IMGT® tools) of each locus, and thus to obtain sequence, gene and structure data. For each gene analyzed, more than 200 pieces of information are available in IMGT® databases, tools and web pages. The data obtained after the biocuration were compared with the data of the human TRA/TRD loci, using the data entered in IMGT Repertoire.
The two loci in the latest assemblies have fewer gaps than in the previous studies and are each localized on a single chromosome, with no genes on unplaced scaffolds (cf. Tables 2 and 3). This, together with an expected positional organization of genes in the locus, is a basic requirement for the annotation of a complete locus with a definitive nomenclature in IMGT®. Because we rely on publicly available data, the quality of the annotation depends on the quality of these data.
It is worth noting that the nomenclature presented in this manuscript for the loci and species in question is definitive and will not change in the future. Once the IMGT biocuration team obtains a genomic assembly covering the whole locus (no contigs, no scaffolds), a reference assembly is established, which gives rise to the definitive IMGT nomenclature. Subsequent assemblies may become available, either for the same individual or for other individuals (constituting novel haplotypes in the latter case), but they will not affect the original nomenclature.
During the analysis of the TRA/TRD locus in bovine and sheep, it was noted that the general organization of the locus is conserved and is similar to the human one, even if the V-CLUSTER is more extensive (cf. Figure 5). It should be noted that the IMGT® unique nomenclature, based on subgroup assignment and position of genes within the locus, is a valuable help in highlighting similarities or differences in locus organization.
The results show that some subgroups are missing and three new subgroups were described in bovine and sheep compared to human. Some subgroups are more represented in bovine and in sheep than in human, which may indicate potential duplications during evolution. This can also explain the difference in the proportion of functional genes: the duplicated subgroups in bovine and sheep contain a large proportion of pseudogenes, resulting in a higher number of pseudogenes compared to human. Another indication of duplication during evolution is the large number of TRDV1 genes (50 in bovine and 66 in sheep) compared to 1 in human [13].
In the TRAV genes, there is only one CDR length for most human, bovine and sheep subgroups, except for six bovine subgroups (TRAV10, TRAV17, TRAV18, TRAV20, TRAV22 and TRAV38) (cf. Table 8), while in the TRDV1 subgroup there are several lengths (cf. Table 10) and even some genes without CDR2-IMGT (cf. Figure 8).
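Identifying subgroups with more than one CDR length amounts to grouping genes by subgroup and collecting the distinct length combinations. The sketch below shows this on placeholder data: the subgroup names and lengths are invented for illustration and do not reproduce Table 8 or Table 10, and the function name is hypothetical.

```python
from collections import defaultdict


def subgroups_with_variable_cdr(genes):
    """Return the subgroups whose genes show more than one CDR length combination.

    `genes` is an iterable of (subgroup, cdr_lengths) pairs, where cdr_lengths
    is a sequence such as (CDR1 length, CDR2 length).
    """
    lengths_by_subgroup = defaultdict(set)
    for subgroup, cdr_lengths in genes:
        lengths_by_subgroup[subgroup].add(tuple(cdr_lengths))
    return sorted(name for name, seen in lengths_by_subgroup.items() if len(seen) > 1)


# Placeholder data only (hypothetical subgroups and lengths).
toy_genes = [
    ("SUBGROUP_A", (6, 5)),
    ("SUBGROUP_A", (6, 5)),
    ("SUBGROUP_B", (7, 5)),
    ("SUBGROUP_B", (6, 5)),
]
print(subgroups_with_variable_cdr(toy_genes))
```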
It would be interesting to see if these specificities (expansion of the TRDV1 subgroup and of the TRAV subgroups, absence of CDR2-IMGT for some TRDV1 genes, etc.) are also found in other ruminant species.
Veterinary species are valuable models for immunological and medical research. The comparison of the TRA/TRD locus of bovine and sheep presented here provides a global view of the TRA/TRD locus in Bovidae and will be a useful resource for analyzing the TRA/TRD locus in species not yet analyzed. The work carried out and the use of the methodology established for the analysis of the TRB locus [19] show that this procedure can be used to facilitate the analysis of IG (IGH, IGK and IGL) and TR (TRA, TRB, TRD and TRG) loci among different species.

Abbreviations
The following abbreviations are used in this manuscript: