IMGT® Homo sapiens IG and TR Loci, Gene Order, CNV and Haplotypes: New Concepts as a Paradigm for Jawed Vertebrates Genome Assemblies

IMGT®, the international ImMunoGeneTics information system®, created in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS), marked the advent of immunoinformatics, a new science which emerged at the interface between immunogenetics and bioinformatics for the study of the adaptive immune responses. IMGT® is based on a standardized nomenclature of the immunoglobulin (IG) and T cell receptor (TR) genes and alleles from fish to humans and on the IMGT unique numbering for the variable (V) and constant (C) domains of the immunoglobulin superfamily (IgSF) of vertebrates and invertebrates, and for the groove (G) domain of the major histocompatibility (MH) and MH superfamily (MhSF) proteins. IMGT® comprises 7 databases, 17 tools and more than 25,000 pages of web resources for sequences, genes and structures, based on the IMGT Scientific chart rules generated from the IMGT-ONTOLOGY axioms and concepts. IMGT® reference directories are used for the analysis of NGS high-throughput expressed IG and TR repertoires (natural, synthetic and/or bioengineered) and for bridging sequences, two-dimensional (2D) and three-dimensional (3D) structures. This manuscript focuses on the new IMGT® concepts of Homo sapiens IG and TR loci, gene order, copy number variation (CNV) and haplotypes, as a paradigm for jawed vertebrate genome assemblies.


Introduction
The adaptive immune response of the jawed vertebrates (or gnathostomata), which appeared in evolution about 450 million years ago, is characterized by a remarkable immune specificity and memory, which are the properties of the B and T cells owing to the extreme diversity of their antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR), respectively [1]. In humans and other mammals, an IG consists of two identical light chains (kappa (IGK) or lambda (IGL)) and two identical heavy chains (IGH) [2], while a TR consists of two chains, either alpha (TRA) and beta (TRB), or gamma (TRG) and delta (TRD) [3]. Each IG and TR chain comprises a variable domain (V-DOMAIN), which determines the specificity for the antigen, and a constant region (C-REGION) composed of one, three or four constant domains (C-DOMAIN) depending on the chain type [4,5]. The V-DOMAIN results from the genomic rearrangement of variable (V), diversity (D) and joining (J) genes for the IGH, TRB and TRD chains (encoding a V-D-J-REGION), and of V and J genes for the IGK, IGL, TRA and TRG chains (encoding a V-J-REGION) [1][2][3][4][5]. Additional mechanisms (N diversity during the rearrangements and, for the IG, somatic hypermutations [5]) contribute to the extreme diversity of the IG and TR (theoretically 10¹² IG and TR per individual, a figure limited only by the number of B and T cells that an organism is genetically programmed to produce) [1]. IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org) (accessed on 22 February 2022) (Figure 1) [1][2][3][4][5], was created in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS) to characterize the genes and alleles involved in the IG and TR synthesis of vertebrate species from fish to human, and to standardize and manage the huge and complex diversity of IG and TR sequences and structures.
The founding of IMGT® marked the birth of immunoinformatics [1], a new science which emerged at the interface between immunogenetics and bioinformatics. For the first time, IG and TR genes (V, D, J and C) were officially recognized as "genes", in the same way as conventional genes [2,3,6,7]. This major breakthrough allowed genes and data of the complex and highly diversified adaptive immune responses to be managed in genomic databases and tools [1]. IMGT®, the international ImMunoGeneTics information system®, has been online since 1995: the first Internet connection of IMGT/LIGM-DB occurred at the 9th International Congress of Immunology (ICI) in San Francisco, CA, USA (23-29 July 1995), marking the 7-year anniversary of the first Internet France-USA connection of 28 July 1988. The IG and TR nomenclature [1][2][3][4][5][6][7], based on the internationally acknowledged expertise of the Laboratoire d'ImmunoGénétique Moléculaire (LIGM), has been endorsed since 1992 by the World Health Organization-International Union of Immunological Societies (WHO-IUIS) [8,9], making IMGT® the global reference in immunogenetics and immunoinformatics.
Biomolecules 2022, 12, x FOR PEER REVIEW 2 of 49
Figure 1. IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org (accessed on 22 February 2022) [1,5].
IMGT® comprises seven IMGT databases (shown as cylinders), seventeen online IMGT tools (shown as rectangles) and the IMGT Web resources (more than 25,000 pages, the 'IMGT Marie-Paule page') (not shown), for genes (in yellow), sequences (in green) and structures (in blue), all available from the IMGT® Home page. IMGT/mAb-DB has been online since 4 December 2009. IMGT/HighV-QUEST for next-generation sequencing (NGS) high-throughput sequence analysis, created in October 2010, has been available on the web since 22 November 2010. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org) (accessed on 22 February 2022).
This detailed identification, description and classification of the human IG and TR loci, genes and alleles [2,3], using the IMGT Scientific chart rules, is the result of a huge work of annotation and expert analysis, by LIGM, of tens of thousands of nucleotide sequences from phages, cosmids or contigs submitted by their authors to the generalist nucleotide databases (EMBL database, now European Nucleotide Archive (ENA) [123], GenBank [124] and DNA Data Bank of Japan (DDBJ) [125]). The annotated sequences were integrated into the newly created IMGT/LIGM-DB [10,11], using the EMBL/GenBank/DDBJ accession numbers in order to facilitate interoperability with the generalist nucleotide databases. The Nature and Science papers on the human genome sequencing [126,127], published in 2001, contain limited information on the genes of the adaptive immune responses. However, a careful analysis of the maps published in these papers allowed us to confirm the chromosomal localizations of the seven main loci, IGH, IGK and IGL (for the immunoglobulins) and TRA, TRB, TRG and TRD (for the T cell receptors), which had been described in 2001 in the Immunoglobulin FactsBook [2] and the T cell receptor FactsBook [3], respectively, and determined by an analysis of translocations involving the IG and/or TR loci in leukemia and lymphoma (http://www.imgt.org/IMGTrepertoire/GenesClinical/translocation/human/overview/Hu_overviewpart1.html) (accessed on 22 February 2022).

Extension to Mus musculus and Fish (Chondrichthyes and Teleostei)
Based on this paradigm of the human loci (IMGT nomenclature, IMGT unique numbering, IMGT standardized keywords and labels), the seven mouse (Mus musculus) loci, with a total of 625 genes (377 IG and 248 TR), were characterized [128][129][130][131]. The analysis of IG genes of four different Chondrichthyes species and twenty-two different Teleostei species confirmed that the IG and TR paradigm was applicable to fish; however, most sequences were at that time unmapped and were assigned a provisional nomenclature with the letter S [132,133]. The Chondrichthyes and Teleostei light chain, which is neither kappa nor lambda, was defined as 'iota', encoded by genes of the IG iota (IGI) locus, which includes the IGIV, IGIJ and IGIC groups (http://www.imgt.org/IMGTrepertoire/LocusGenes/genetable/Teleostei/#IGIV) (accessed on 22 February 2022).

Homo sapiens and Mus musculus Data Availability Online
Since 1998, novel Homo sapiens and Mus musculus genes and alleles have been announced in 'IMGT® Creations and updates' and validated by the IUIS NOM IMGT-NC [114]. Since 2003, IMGT/GENE-DB has provided direct links (accessible from the Query page) which encode the most frequent requests as URLs: (i) for a given gene {(1) IMGT/GENE-DB entry, (2) IMGT/GENE-DB reference sequences of the alleles of the gene in FASTA format, (3) IMGT/LIGM-DB label sequences in FASTA format, (4) tables of known IMGT/LIGM-DB cDNA or rearranged gDNA sequences, or of known IMGT/3Dstructure-DB entries}, or (ii) for the genes of a group {(1) IMGT/GENE-DB reference sequences of the genes of the group in FASTA format, (2) IMGT/LIGM-DB label sequences in FASTA format}. There are also direct links to IMGT/GENE-DB and generalist genomic database entries in two formats, HTML tables and CSV.
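As an illustration of how such FASTA downloads can be consumed, the following minimal Python sketch reads reference sequences and splits their headers. It assumes the pipe-delimited header layout of IMGT reference FASTA files, in which the first five fields are taken to be the accession number, the IMGT gene and allele name, the species, the functionality and the label; this field order should be checked against the IMGT documentation before use. The function names and the sample records (accessions and sequences) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ImgtFastaHeader:
    accession: str       # field 1 (assumed): accession number
    allele: str          # field 2 (assumed): IMGT gene and allele name, e.g. 'IGHV1-2*01'
    species: str         # field 3 (assumed)
    functionality: str   # field 4 (assumed): e.g. 'F', 'ORF' or 'P'
    label: str           # field 5 (assumed): e.g. 'V-REGION'

    @property
    def gene(self) -> str:
        """Gene name = allele name without its '*NN' allele suffix."""
        return self.allele.split("*", 1)[0]


def parse_imgt_header(line: str) -> ImgtFastaHeader:
    """Split a '>'-header on '|' and keep the first five fields."""
    fields = [f.strip() for f in line.lstrip(">").split("|")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 '|'-separated fields: {line!r}")
    return ImgtFastaHeader(*fields[:5])


def read_fasta(text: str) -> List[Tuple[ImgtFastaHeader, str]]:
    """Return (header, sequence) pairs; wrapped sequence lines are joined."""
    records: List[Tuple[ImgtFastaHeader, str]] = []
    header: Optional[ImgtFastaHeader] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = parse_imgt_header(line), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example (accessions and sequences are made up)
sample = """>XX000001|IGHV1-2*01|Homo sapiens|F|V-REGION|
CAGGTGCAGC
TGGTGCAG
>XX000002|IGHV1-2*02|Homo sapiens|F|V-REGION|
CAGGTGCAGCTG
"""
for hdr, seq in read_fasta(sample):
    print(hdr.gene, hdr.allele, hdr.functionality, len(seq))
```

Grouping the records by `gene` then gives, for each gene, the set of its alleles, which mirrors the per-gene FASTA request described above.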
As of 25 November 2021, IMGT/GENE-DB data included 732 Homo sapiens IG and TR genes (with links to HGNC, NCBI Gene, Ensembl, GenAtlas, GeneCards and UniProt) and 916 Mus musculus IG and TR genes (with links to MGI and NCBI Gene). The information for each IMGT/GENE-DB entry includes: IMGT gene functionality, IMGT gene definition (for Homo sapiens and Mus musculus IG and TR), HGNC gene definition (identical to the IMGT gene definition), number of alleles, chromosomal localization and IMGT/LIGM-DB reference sequence(s) for allele *01. IMGT/GENE-DB is updated weekly, with downloads available in different formats in the "IMGT downloads" section.

IUIS NOM IMGT-NC Reports for New IG and TR Loci Gene and Allele Names
With the increase in genome sequencing and assembly, the starting point for IG and TR gene identification, description and classification has moved from individual sequences (researchers' submission to generalist databases) to the IG and TR locus identification in NCBI Whole Genome Assemblies (WGS) (submitted by sequencing groups and analyzed by researchers).
In order to allow researchers to go ahead with expression studies and to publish their data with IMGT gene names even if the loci have not yet been annotated in IMGT® or in other specialist databases, the IUIS NOM Sub-Committee [114] has created the IUIS NOM IMGT-NC Reports. This initiative allows scientists to propose IMGT gene names for new IG and TR variable (V), diversity (D), joining (J) and constant (C) genes and alleles of a given locus of a given species to the IUIS Sub-Committee for approval, based on the IMGT Scientific chart rules and the IMGT-ONTOLOGY concepts of classification (CLASSIFICATION axiom).
The submission of an IUIS NOM IMGT-NC Report requires that each gene sequence has an accession number in a generalist database (with its localization if the original sequence is large) and that each V, D, J or C gene sequence has been mapped (cloned from a bacterial artificial chromosome (BAC), fosmid, cosmid or phage, or extracted from a referenced genome assembly) (Figure 2).
Recent examples of veterinary IG and TR loci from genome assemblies, analyzed by scientists using gene and allele names validated by the IUIS NOM IMGT-NC [114], include: dog (Canis lupus familiaris) [134], the first veterinary species with the seven loci identified; cat (Felis catus) TR loci [135]; rabbit (Oryctolagus cuniculus) TRA locus [136]; dolphin (Tursiops truncatus) [137]; and Salmonids, including salmon (Salmo salar) and trout (Oncorhynchus mykiss), IGH duplicated loci [138,139] and TRA/TRD locus [140]. These examples of different species and loci have been key elements in setting the submission criteria and steps of the now well-established IUIS NOM IMGT-NC Reports [114]. They also confirm that databases using these data (for analysis or biocuration) need to cite and link to the original IUIS report to guarantee interoperability. This is illustrated by the links to the IUIS reports in IMGT® Creations and updates (http://www.imgt.org/IMGTinformation/creations/) (accessed on 20 February 2022), which follow data annotation by the IMGT biocurators for data entry in IMGT®, as described in Section 6.

Locus in Genome Assembly
Before starting IMGT biocuration of a new IG or TR locus of a veterinary species, information is collected in 'Locus in genome assembly' (Figure 3). For an easier comparison between loci of different species, and/or between loci of different genome assemblies (or of different haplotypes, including CNV), the IDENTIFICATION axiom has been enriched by the implementation of 'IMGT locus ID' and 'IMGT/LIGM-DB locus reference sequence (ID)' (Figure 3).

IMGT Locus ID and IMGT/LIGM-DB Locus Reference Sequence
An 'IMGT locus ID' comprises the 6-letter (or 9-letter) code from the genus and species (or subspecies) Latin names (IMGT taxon abbreviations), the locus type and a chronological increasing number, separated by underscores, for example, Macmul_IGL_2 ( Figure 3).
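The composition of an 'IMGT locus ID' can be sketched in a few lines of Python. The sketch assumes, by inference from the example Macmul (Macaca mulatta), that the taxon code is made of the first three letters of each Latin name, capitalized on the genus; this rule is an assumption, not a documented IMGT algorithm, and the official IMGT taxon abbreviation should be preferred when it exists. The function names are hypothetical.

```python
from typing import Optional


def taxon_abbreviation(genus: str, species: str, subspecies: Optional[str] = None) -> str:
    """6-letter (or 9-letter) taxon code, ASSUMING it is built from the
    first three letters of each Latin name (inferred from 'Macmul')."""
    parts = [genus, species] + ([subspecies] if subspecies else [])
    for name in parts:
        if len(name) < 3 or not name.isalpha():
            raise ValueError(f"unexpected Latin name: {name!r}")
    return parts[0][:3].capitalize() + "".join(p[:3].lower() for p in parts[1:])


def imgt_locus_id(genus: str, species: str, locus_type: str, number: int,
                  subspecies: Optional[str] = None) -> str:
    """Join taxon code, locus type and chronological number with underscores."""
    if number < 1:
        raise ValueError("the chronological number starts at 1")
    return "_".join([taxon_abbreviation(genus, species, subspecies),
                     locus_type.upper(), str(number)])


print(imgt_locus_id("Macaca", "mulatta", "IGL", 2))  # Macmul_IGL_2
```

Keeping the chronological number as a separate, validated argument makes it explicit that a second locus of the same type in the same taxon (for example a second assembly) receives the next integer rather than a new code.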
An 'IMGT/LIGM-DB locus reference sequence' is an IMGT accession number ('IMGT' followed by 6 digits) which identifies the IMGT/LIGM-DB flat file containing an IG or TR locus (or part of it) extracted from an NCBI genome assembly and presented in its own 5′ to 3′ locus orientation. As a locus may have, on the chromosome, a forward (or 'Watson') (FWD) or a reverse (REV) orientation (IMGT® http://www.imgt.org (accessed on 20 February 2022), IMGT Web resources > IMGT Index > Genomic orientation), the sequence in the IMGT accession number flat file is either unchanged (direct) relative to the sequence on the chromosome, for an FWD locus, or reverse-complemented, for a REV locus. For example, the rhesus macaque (Macaca mulatta) IGL locus has a reverse (REV) orientation on chromosome 10, and the IMGT/LIGM-DB locus reference sequence IMGT000062 is reverse-complemented relative to the sequence on chromosome 10.
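The orientation rule can be expressed as a short Python sketch: the locus segment is copied unchanged for an FWD locus and reverse-complemented for a REV locus, so that the result always reads 5′ to 3′ in the locus orientation. The 1-based inclusive coordinate convention and the function names are assumptions made for the example.

```python
# Complement table; characters other than A/C/G/T/N are left unchanged
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_locus(chromosome: str, start: int, end: int, orientation: str) -> str:
    """Return the locus in its own 5'->3' orientation.

    start/end: 1-based inclusive chromosome coordinates (assumed convention).
    orientation: 'FWD' or 'REV', the locus orientation on the chromosome.
    """
    if not 1 <= start <= end <= len(chromosome):
        raise ValueError("coordinates outside the chromosome sequence")
    segment = chromosome[start - 1:end]
    if orientation == "FWD":
        return segment                      # unchanged (direct)
    if orientation == "REV":
        return reverse_complement(segment)  # reverse-complemented
    raise ValueError("orientation must be 'FWD' or 'REV'")


toy_chromosome = "AAACCCGGGTTT"  # toy sequence, not real data
print(extract_locus(toy_chromosome, 1, 4, "FWD"), extract_locus(toy_chromosome, 1, 4, "REV"))
```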
The information from 'Locus in genome assembly' (

IMGT LOCUS-UNIT Label and Qualifiers
The label IMGT-LOCUS-UNIT (DESCRIPTION axiom) was created to describe a locus, isolated from a genome assembly, in an IMGT accession number flat file. The definitions of the IMGT-LOCUS-UNIT and of its qualifiers are given in Table 1.
Table 1. The IMGT-LOCUS-UNIT label and its associated qualifiers and definitions.

IMGT New Label and Associated Qualifiers | Definition
IMGT_locus_name | Name of an IMGT-LOCUS-UNIT, which includes the genus and species Latin names and the IMGT locus type (i.e., in higher vertebrates: IGH, IGK, IGL, TRA, TRB, TRG, TRD)

IMGT Locus 5′ and 3′ Bornes
The IMGT Locus 5′ borne (IMGT_locus_5prime_borne) and the IMGT Locus 3′ borne (IMGT_locus_3prime_borne) (Table 2) are defined for a standardized comparison of IG and TR locus delimitation across species. The IMGT bornes are genes coding for a protein (other than IG or TR), conserved between species and located upstream of the first gene (for the IMGT 5′ borne) or downstream of the last gene (for the IMGT 3′ borne) of an IG or TR locus (IMGT® http://www.imgt.org (accessed on 20 February 2022), IMGT Web resources > IMGT Repertoire (IG and TR) > 1. Locus and genes > 3. Locus descriptions > Locus bornes: IGH, IGK, IGL, TRA, TRB, TRG, TRD). If IMGT bornes are not yet identified or are too distant to be included in the locus sequence, a minimal 10 kb sequence is added upstream of the first IG or TR gene in 5′ and/or downstream of the last IG or TR gene in 3′. A preliminary overview of the IG and TR locus 5′ and 3′ bornes is shown in Table 2.
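The delimitation rule can be sketched as follows: a borne position is used when one is available, otherwise at least 10 kb is kept beyond the first or last IG or TR gene. In this Python sketch, coordinates are assumed to be 1-based inclusive and expressed in the locus 5′ to 3′ orientation; whether a borne is "too distant" is left to the caller, since no distance threshold is given here. All names are hypothetical.

```python
from typing import Optional, Tuple

MIN_FLANK = 10_000  # minimal flanking sequence (bp) when no borne is used


def locus_limits(first_gene_start: int, last_gene_end: int, seq_length: int,
                 borne5_start: Optional[int] = None,
                 borne3_end: Optional[int] = None,
                 flank: int = MIN_FLANK) -> Tuple[int, int]:
    """Return (start, end) of the locus sequence, 1-based inclusive.

    If a 5' (or 3') borne position is given, the locus extends to it;
    otherwise `flank` bp are kept beyond the first (or last) IG/TR gene,
    clipped to the available sequence.
    """
    if not 1 <= first_gene_start <= last_gene_end <= seq_length:
        raise ValueError("gene coordinates outside the sequence")
    if borne5_start is not None:
        start = min(borne5_start, first_gene_start)
    else:
        start = max(1, first_gene_start - flank)
    if borne3_end is not None:
        end = max(borne3_end, last_gene_end)
    else:
        end = min(seq_length, last_gene_end + flank)
    return start, end


print(locus_limits(50_000, 900_000, 1_000_000))                       # no bornes
print(locus_limits(50_000, 900_000, 1_000_000, borne5_start=20_000))  # 5' borne only
```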

IMGT/GENE-DB Localization in Genome Assemblies
The section "LOCALIZATION IN GENOME ASSEMBLIES" (Figure 4), integrated into IMGT/GENE-DB in 2015, allows users to query, for a given species and a given locus, the IMGT IG or TR genes of a given genome assembly. The query Species: Macaca mulatta|AG07107 (AG07107 is the isolate) and Locus: IGH locus shows the availability of IMGT/GENE-DB biocurated genes for the assembly 'Mmul_10, NCBI', 'Primary Assembly', 'Full chromosome 7' (Figure 4A). The list of genes known to belong to the locus but not localized (NL) in the assembly is also provided in this section, as their absence may correspond to polymorphism by copy number variation or insertion/deletion, or to gaps in the assembly.
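The not localized (NL) list amounts to a set difference between the genes known to belong to the locus and the genes localized in a given assembly. The Python sketch below keeps the order of the known-gene list (for example, locus gene order); the function name and the gene lists are hypothetical, and the result alone does not distinguish CNV, insertion/deletion or assembly gaps.

```python
from typing import Iterable, List


def not_localized(known_locus_genes: Iterable[str],
                  localized_genes: Iterable[str]) -> List[str]:
    """Genes known to belong to the locus but absent from the assembly,
    returned in the order of the known-gene list."""
    found = set(localized_genes)
    return [gene for gene in known_locus_genes if gene not in found]


# Hypothetical gene lists, for illustration only
known = ["IGHV1-2", "IGHV1-3", "IGHV1-69D", "IGHV3-7"]
in_assembly = ["IGHV3-7", "IGHV1-3", "IGHV1-2"]
print(not_localized(known, in_assembly))
```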

IGH Locus Representation
The Homo sapiens IGH locus is located on chromosome 14, at the telomeric extremity of the long arm, at band 14q32.33 [2]. The reverse (REV) orientation of the locus on the chromosome was determined by the analysis of translocations involving the IGH locus in leukemia and lymphoma. The Homo sapiens IGH locus spans 1250 kb [2,5] (Figure 5). It consists of 123 to 129 IGHV genes depending on the haplotype, 27 IGHD genes belonging to seven subgroups, nine IGHJ genes and, in the most frequent haplotype, 11 IGHC genes [2,5]. Eighty-two to 89 IGHV genes belong to seven subgroups, whereas 43 pseudogenes, which are too divergent to be assigned to subgroups, have been assigned to clans [2,5] (Figure 5).

Figure 5.
Representation of the human IGH locus at 14q32.33 (REV orientation on the chromosome) [2,5]. The boxes representing the genes are not to scale. Exons are not shown. Switch sequences are represented by a filled circle upstream of the IGHC genes. Pseudogenes that could not be assigned to subgroups with functional genes are designated by a Roman numeral between parentheses, corresponding to the clans, followed by a hyphen and a number for the localization from 3′ to 5′ in the locus [2]. IMGT® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) > 1. Locus and genes > 2. Locus representations > IGH: Human (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org) (accessed on 20 February 2022).

IGH Locus Gene Order
The relative positions (locus gene order) of the IGHV, IGHD, IGHJ and IGHC genes in the Homo sapiens IGH locus are shown from its 5′ end to its 3′ end (Table 3). Gene order is according to the IMGT Locus gene order (IMGT® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > IGH). The number '0' indicates that the relative position is unknown. The three most recently identified genes (IGHV1-68D, IGHV(III)-67-4D and IGHV(III)-67-3D) are numbered as insertions with a dot (gene order 17.1, 17.2 and 17.3) to preserve comparisons with the reference ruler adopted in the description of the Homo sapiens IGH polymorphisms [5]. Genes of the related proteins of interest (RPI) used as markers in the locus are indicated with their orientation.
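Because inserted genes carry dotted gene orders, sorting genes by order needs a little care. This Python sketch parses each order as integers on both sides of the dot (so 17.1, 17.2 and 17.3 fall between 17 and 18) and, as a design choice of the example, places '0' (relative position unknown) after all positioned genes. The names gene_at_17, gene_at_18 and unplaced_gene are hypothetical placeholders.

```python
from typing import List, Tuple


def order_key(order: str) -> Tuple[int, int, int]:
    """Sort key for an IMGT gene order such as '17', '17.2' or '0'.

    '0' (position unknown) sorts last. The part after the dot is compared
    as an integer, not as a decimal fraction.
    """
    major, _, minor = order.partition(".")
    major_n, minor_n = int(major), int(minor or 0)
    return (1 if major_n == 0 else 0, major_n, minor_n)


def sort_by_gene_order(genes: List[Tuple[str, str]]) -> List[str]:
    """Sort (gene name, gene order) pairs from the 5' to the 3' end of the locus."""
    return [name for name, _ in sorted(genes, key=lambda g: order_key(g[1]))]


genes = [("gene_at_18", "18"), ("IGHV(III)-67-3D", "17.3"), ("unplaced_gene", "0"),
         ("IGHV1-68D", "17.1"), ("gene_at_17", "17"), ("IGHV(III)-67-4D", "17.2")]
print(sort_by_gene_order(genes))
```

Parsing the suffix as an integer rather than as a float means that a hypothetical order 17.10 would sort after 17.9 instead of colliding with 17.1.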
The gene IGHV(II)-20-1 (gene order 129) is an exception in that it is represented only by a V-RS: its V-REGION is replaced by the Alu Homo-sapiens_IGH_Alu-20-1, preceded by an undetermined region and the Alu Homo-sapiens_IGH_Alu-20-3 (accession number AC245166 in IMGT/LIGM-DB).
Table 3. Homo sapiens IGH locus: IMGT gene order and copy number variations (CNV).

IMGT Gene Name | Functionality | IMGT Gene Order in Locus | IMGT Gene Copy Number Variations (CNV) Nomenclature, RPI Aliases (If Recent Changes) | Orientation in Locus

Seven CNVs are listed in Table 3: six for the IGHV genes (CNV1 to CNV6) and one (CNV7) for the IGHC genes. The IMGT CNV nomenclature comprises the genus and species (Latin names in italics) (e.g., Homo sapiens), the locus (e.g., IGH) and the CNV number (e.g., CNV1). A CNV is delimited by a 5prime gene and a 3prime gene.
The IMGT standardized definition of a CNV comprises the group; then, between parentheses, the order of the start gene (the gene which follows the CNV-5prime), a dash and the order of the end gene (the gene which precedes the CNV-3prime); then the total number of genes involved in the CNV (between the 5prime and 3prime genes, including RPI gene(s) if present), followed, between parentheses, by the number of IG or TR genes per functionality (and the number of RPI if present). For instance, the definition of Homo sapiens IGH CNV1 is 'IGHV(17-20)7(3F,4P)' (Table 3). The letters 'i' for insertion, 'd' for deletion and 'e' for exchange, added to the CNV number, indicate the status of a given gene in a given haplotype for a given CNV, for instance, CNV1i [5] (pp. 39-40).
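Since the CNV definition follows a fixed pattern, it can be parsed and checked automatically. The following Python sketch reads definition strings such as 'IGHV(17-20)7(3F,4P)', keeps the functionality codes as written (F, P, RPI, and combined codes such as FO or OP) and verifies that the per-code counts add up to the total. The regular expressions and names are illustrative assumptions; the 'i', 'd' and 'e' status suffixes are not handled.

```python
import re
from dataclasses import dataclass
from typing import Dict

# group, (start order-end order), total, (counts per functionality)
CNV_RE = re.compile(r"^(?P<group>[A-Z]+)\((?P<start>\d+)-(?P<end>\d+)\)"
                    r"(?P<total>\d+)\((?P<counts>[^()]*)\)$")
COUNT_RE = re.compile(r"^(\d+)([A-Z]+)$")  # e.g. '3F', '1OP', '2RPI'


@dataclass
class CnvDefinition:
    group: str              # e.g. 'IGHV'
    start_order: int        # order of the gene following the CNV-5prime
    end_order: int          # order of the gene preceding the CNV-3prime
    total: int              # number of genes in the CNV, RPI included
    counts: Dict[str, int]  # functionality code -> number of genes


def parse_cnv(definition: str) -> CnvDefinition:
    m = CNV_RE.match(definition.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a CNV definition: {definition!r}")
    counts: Dict[str, int] = {}
    for item in m["counts"].split(","):
        c = COUNT_RE.match(item)
        if c is None:
            raise ValueError(f"bad count item: {item!r}")
        counts[c.group(2)] = counts.get(c.group(2), 0) + int(c.group(1))
    cnv = CnvDefinition(m["group"], int(m["start"]), int(m["end"]),
                        int(m["total"]), counts)
    # Consistency check: the per-functionality counts must add up to the total
    if sum(counts.values()) != cnv.total:
        raise ValueError(f"counts {counts} do not sum to {cnv.total}")
    return cnv


for d in ["IGHV(17-20)7(3F,4P)", "IGHC(203-211)9(7F,1OP,1P)",
          "IGKV(1-36)36(16F,4O,14P,1FO,1FP)"]:
    print(parse_cnv(d))
```

Note that the start and end values are gene orders, not counts: in CNV1, orders 17 to 20 cover seven genes because the dotted insertions 17.1 to 17.3 fall within that interval.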

IMGT CNV Haplotypes Illustrated with the Homo sapiens IGH Locus
IMGT CNV haplotypes are described based on the variability of the number of genes present for a given CNV. The description of the CNV is achieved by comparison with the IGH locus representation [2,5] (IMGT® http://www.imgt.org (accessed on 20 February 2022), IMGT Web resources > IMGT Repertoire (IG and TR) > 1. Locus and genes > 2. Locus representation > IGH: Human), whose horizontal main line is conventionally referred to as 'haplotype A'.
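Comparing a haplotype with haplotype A for a given CNV can be sketched as two set differences: genes of haplotype A missing from the haplotype are marked 'd' (deletion) and genes present only in the haplotype are marked 'i' (insertion). This Python sketch is an illustration under that simple assumption; the 'e' (exchange) status is not inferred, and the gene names are hypothetical.

```python
from typing import Dict, Sequence


def cnv_gene_status(haplotype_a: Sequence[str], haplotype: Sequence[str]) -> Dict[str, str]:
    """Status of CNV genes in a haplotype relative to haplotype A:
    'd' for genes of haplotype A absent from the haplotype,
    'i' for genes present only in the haplotype ('e' is not inferred)."""
    ref, other = set(haplotype_a), set(haplotype)
    status = {gene: "d" for gene in haplotype_a if gene not in other}
    status.update({gene: "i" for gene in haplotype if gene not in ref})
    return status


# Hypothetical CNV content of two haplotypes
hap_a = ["V_a", "V_b", "V_c"]
hap_x = ["V_a", "V_c", "V_d", "V_e"]
print(cnv_gene_status(hap_a, hap_x))
print(len(hap_x) - len(hap_a))  # change in the number of genes for this CNV
```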
A well-characterized CNV example is the Homo sapiens IGH CNV3 IGHV(87-112)26(8F,16P,2RPI), for which six haplotypes, A to F [5,141], correspond to polymorphic amplifications of genes found in individuals of different populations. These polymorphisms are described as insertions/deletions between IGHV4-34 (86, CNV3-5prime) and IGHV4-28 (113, CNV3-3prime) (Table 4). Haplotype A is from GRCh37 and corresponds to the main line of the IMGT Locus Representation [2] (Figure 5). Haplotype B is from GRCh38 and corresponds to BAC clone sequences [141] from the CHORI-17 BAC library. The CNV corresponds to the amplification of a motif of three or four genes: the first one is a pseudogene belonging to the IGHV(II) clan, the second one is a functional gene belonging to the IGHV3 subgroup (in blue) or to the IGHV4 subgroup (in green), and the third one (or the fourth, if GOLGA4P1 or GOLGA4P2 (in yellow) or IGHV4-30-1 is present) is a pseudogene of the IGHV3 subgroup (Table 4).
The Homo sapiens IGH locus on chromosome 14 (14q32.33) is characterized by a remarkable IGH CNV, CNV7 IGHC(203-211)9(7F,1OP,1P), with seven haplotypes A to G (Table 4), six of which (haplotypes B to G) correspond to multigene deletions I to VI (Figure 6), identified on both chromosomes 14 in healthy individuals lacking several subclasses [2,5]. The multigene deletions of haplotypes B to G (either identical or different on the two chromosomes of a given individual) are designated I to VI according to the chronological order in which they were found (reviewed in [5]). Deletion I, first identified by the absence of the Gm1 allotypes in a 70-year-old healthy Tunisian woman (TAK3) homozygous for that deletion [142,143], allowed the ordering of the Homo sapiens IGHC genes in the IGH locus [144,145]. Deletions I and II [142,143,146] (haplotypes B and C), found in healthy individuals from consanguineous families, involve highly homologous spots of recombination [147], as also described in a healthy individual (T17) homozygous for deletion III (haplotype D) and lacking IgA1, IgG2, IgG4 and IgE [148].
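The requirement that deletions be present on both chromosomes 14 follows from a simple rule: an individual lacks a gene only if neither chromosome carries it, so the missing genes are the intersection of the genes deleted on each haplotype, whether the two deletions are identical (homozygous) or different. The Python sketch below illustrates this rule; the deletion contents used are hypothetical and do not describe the actual extent of deletions I to VI.

```python
from typing import AbstractSet, Set


def absent_genes(deleted_on_chr_a: AbstractSet[str],
                 deleted_on_chr_b: AbstractSet[str]) -> Set[str]:
    """Genes missing from an individual: those deleted on both chromosomes 14."""
    return set(deleted_on_chr_a) & set(deleted_on_chr_b)


# Hypothetical deletion contents, for illustration only
deletion_x = {"IGHG1", "IGHA1"}
deletion_y = {"IGHA1", "IGHG2"}
print(absent_genes(deletion_x, deletion_y))  # two different deletions
print(absent_genes(deletion_x, deletion_x))  # homozygous deletion
print(absent_genes(deletion_x, set()))       # heterozygous carrier
```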

IGK Locus Representation
The Homo sapiens IGK locus is located on chromosome 2, on the short arm, at band 2p11.2 [2]. The reverse (REV) orientation of the locus on the chromosome was determined by the analysis of translocations involving the IGK locus in leukemia and lymphoma. The Homo sapiens IGK locus spans 1820 kb [2,5] (Figure 7). It consists of 76 IGKV genes belonging to seven subgroups, five IGKJ genes and a unique IGKC gene [2,5] (Figure 7). The 76 IGKV genes are organized in two clusters separated by 800 kb [2,5]. The IGKV distal cluster (in 5′ of the locus and in the most centromeric position) spans 400 kb and comprises 36 genes. The IGKV proximal cluster (in 3′ of the locus, closer to IGKC, and in the most telomeric position) spans 600 kb and comprises 40 genes [2,5] (Figure 7).

IGK Locus Representation
The Homo sapiens IGK locus is located on chromosome 2, on the short arm, at band 2p11.2 [2]. The orientation of the locus reverse (REV) on the chromosome has been determined by the analysis of translocations, involving the IGK locus, in leukemia and lymphoma. The Homo sapiens IGK locus spans 1820 kb [2,5] (Figure 7). The human IGK locus consists of 76 IGKV genes belonging to seven subgroups, five IGKJ genes and a unique IGKC gene [2,5] (Figure 7). The 76 IGKV genes are organized in two clusters separated by 800 kb [2,5]. The IGKV distal cluster in 5 of the locus and in the most centromeric position) spans 400 kb and comprises 36 genes. The IGKV proximal cluster (in 3 of the locus, closer to IGKC, and in the most telomeric position) spans 600 kb and comprises 40 genes [2,5] (Figure 7).

Figure 7.
Representation of the human IGK locus at 2p12 (REV orientation on the chromosome) [2,5]. The boxes representing the genes are not to scale. Exons are not shown. The IGKV genes of the proximal V-CLUSTER are designated by a number for the subgroup, followed by a hyphen and a number for the localization from 3′ to 5′ in the locus. The IGKV genes of the distal duplicated V-CLUSTER are designated by the same numbers as the corresponding genes in the proximal V-CLUSTER, with the letter D added. Arrows show the polarity of the IGKV genes that are in the opposite orientation to the J-C-CLUSTER [2]. IMGT ® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) 1. Locus and genes > 2. Locus representations > IGK: Human (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org) (accessed on 20 February 2022).
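The naming rule in the Figure 7 legend can be expressed as a small parser. This is a hypothetical Python sketch, assuming that the letter D is placed right after the subgroup number (as in the pair IGKV1-39 / IGKV1D-39); parse_igkv and counterpart_name are illustrative names, and other IMGT name forms are not handled.

```python
import re
from typing import NamedTuple

# Covers only names of the form described in the Figure 7 legend, assuming the
# letter D follows the subgroup number (as in IGKV1-39 / IGKV1D-39).
# Other IMGT name forms are outside this sketch.
IGKV_NAME = re.compile(r"IGKV(?P<subgroup>\d+)(?P<distal>D?)-(?P<position>\d+)")


class IgkvName(NamedTuple):
    subgroup: int
    distal: bool   # True for a gene of the duplicated distal V-CLUSTER
    position: int  # localization number, counted from 3' to 5' in the locus


def parse_igkv(name: str) -> IgkvName:
    match = IGKV_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported IGKV gene name: {name!r}")
    return IgkvName(int(match["subgroup"]), match["distal"] == "D", int(match["position"]))


def counterpart_name(name: str) -> str:
    """Name the corresponding gene would carry in the other V-CLUSTER.

    The name is built from the rule only; it does not check that such a gene exists."""
    gene = parse_igkv(name)
    marker = "" if gene.distal else "D"
    return f"IGKV{gene.subgroup}{marker}-{gene.position}"


if __name__ == "__main__":
    print(parse_igkv("IGKV1D-39"))
    print(counterpart_name("IGKV1-39"), counterpart_name("IGKV1D-39"))
```

counterpart_name only applies the naming rule: since the proximal V-CLUSTER has 40 genes and the distal one 36, a generated name may not correspond to an existing gene, so any real use would have to check the result against the IMGT gene tables.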

IGK Locus Gene Order, CNV and Haplotypes
IGK gene order follows the IMGT Locus gene order [2,5]. The Homo sapiens IGK CNV1 IGKV(1-36)36(16F,4O,14P,1FO,1FP) corresponds to two haplotypes (Table 5). The first one (haplotype A), by far the most common in human populations, is characterized by the presence of the distal cluster in 5′ of the IGK locus, in the most centromeric position (Figure 7). This distal cluster results from the duplication of 36 genes of the proximal cluster and spans 400 kb. Haplotype B, which lacks the distal cluster, has been found only once [2].
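The effect of IGK CNV1 on the gene content can be worked out from the counts given above. The Python sketch below assumes that haplotype A carries all 76 IGKV genes and that haplotype B lacks exactly the 36 genes of the distal cluster; the function names are illustrative, and the B/B genotype is included only as arithmetic, haplotype B having been observed once.

```python
from typing import Dict, Tuple

# Figures taken from the text: 76 IGKV genes in total, of which the 36 genes of
# the distal V-CLUSTER correspond to CNV1 IGKV(1-36)36(16F,4O,14P,1FO,1FP).
IGKV_TOTAL = 76
CNV1_GENES = 36
CNV1_FUNCTIONALITY: Dict[str, int] = {"F": 16, "O": 4, "P": 14, "FO": 1, "FP": 1}

# Haplotype A carries the distal cluster; haplotype B lacks it.
HAS_DISTAL_CLUSTER = {"A": True, "B": False}


def igkv_genes(haplotype: str) -> int:
    """IGKV genes on one chromosome 2 carrying the given haplotype."""
    return IGKV_TOTAL if HAS_DISTAL_CLUSTER[haplotype] else IGKV_TOTAL - CNV1_GENES


def igkv_genes_diploid(haplotype_1: str, haplotype_2: str) -> int:
    """IGKV genes in a genotype combining two haplotypes (one per chromosome 2)."""
    return igkv_genes(haplotype_1) + igkv_genes(haplotype_2)


def functional_genes_in_cnv() -> Tuple[int, int]:
    """Lower/upper bounds of functional genes in CNV1.

    Assumption: "FO" and "FP" genes are counted as functional only in the upper
    bound, on the reading that their functionality depends on the allele."""
    lower = CNV1_FUNCTIONALITY["F"]
    upper = lower + CNV1_FUNCTIONALITY["FO"] + CNV1_FUNCTIONALITY["FP"]
    return lower, upper


if __name__ == "__main__":
    for genotype in [("A", "A"), ("A", "B"), ("B", "B")]:
        print("/".join(genotype), igkv_genes_diploid(*genotype))
    print("functional genes in CNV1:", functional_genes_in_cnv())
```

Under these assumptions the A/A, A/B and B/B genotypes carry 152, 116 and 80 IGKV genes, and the CNV contains between 16 and 18 functional genes depending on how the FO and FP genes are counted.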

IGL Locus Representation
The Homo sapiens IGL locus is located on chromosome 22, on the long arm, at band 22q11.2 [2]. The forward (FWD) orientation of the locus on the chromosome has been determined by the analysis of translocations involving the IGL locus in leukemia and lymphoma. The Homo sapiens IGL locus spans 1050 kb [2,5] (Figure 8). It consists of 73-74 IGLV genes belonging to 11 subgroups, and of 7 to 11 IGLJ and 7 to 11 IGLC genes depending on the haplotype, each IGLC gene being preceded by one IGLJ gene [2,5] (Figure 8). The IGLV genes, which span 900 kb, define three distinct V-CLUSTERs (A, B and C) based on their subgroup content [149] (Figure 8).

Figure 8.
… [149]. Pseudogenes that could not be assigned to subgroups with functional genes are designated by a Roman numeral between parentheses, corresponding to the clans, followed by a hyphen and a number for the localization from 3′ to 5′ in the locus. IMGT ® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) 1. Locus and genes > 2. Locus representations > IGL: Human (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org) (accessed on 20 February 2022).
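The IGLV designations described in the Figure 8 legend combine two forms: a subgroup number for genes assigned to a subgroup, and a clan written as a Roman numeral between parentheses for pseudogenes that could not be assigned to a subgroup with functional genes. A hypothetical Python parser for these two forms could look like this; IGLV(I)-20 and IGLV3-21 are used as illustrative names, and names carrying further suffixes are not handled.

```python
import re
from typing import Dict

ROMAN_VALUES: Dict[str, int] = {"I": 1, "V": 5, "X": 10}

# Hypothetical helper: IGLV gene of a subgroup ("IGLV3-21") or, for pseudogenes not
# assigned to a subgroup, of a clan written as a Roman numeral ("IGLV(I)-20").
IGLV_NAME = re.compile(r"IGLV(?:(?P<subgroup>\d+)|\((?P<clan>[IVX]+)\))-(?P<position>\d+)")


def roman_to_int(numeral: str) -> int:
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive notation: a smaller symbol before a larger one (IV = 4, IX = 9).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def parse_iglv(name: str) -> Dict[str, int]:
    match = IGLV_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported IGLV gene name: {name!r}")
    position = int(match["position"])  # localization number, from 3' to 5' in the locus
    if match["clan"] is not None:
        return {"clan": roman_to_int(match["clan"]), "position": position}
    return {"subgroup": int(match["subgroup"]), "position": position}


if __name__ == "__main__":
    print(parse_iglv("IGLV(I)-20"), parse_iglv("IGLV3-21"))
```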

IGL Locus Gene Order, CNV and Haplotypes
IGL gene order is according to the IMGT Locus gene order (IMGT ® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > IGL).
One, two, three or four additional IGLC genes, each one most probably preceded by one IGLJ gene, have been shown to characterize IGLC haplotypes with 8, 9, 10 or 11 genes [150,151] (Table 6). Although these genes have not yet been systematically sequenced, the polymorphisms are strongly supported by restriction fragment length polymorphism (RFLP) and Southern blot analyses [150,151].

… involving J and C genes is in pale blue. In the haplotype representation, CNV-5prime and CNV-3prime, as well as the duplicated genes present in haplotypes B, C, D and E, are in orange. A pale orange color indicates that these positions correspond to insertions in other haplotypes.

TRB Locus Representation
The Homo sapiens TRB locus is located on chromosome 7, on the long arm, at band 7q34 [3]. The forward (FWD) orientation of the locus on the chromosome has been determined by the analysis of translocations involving the TRB locus in leukemia and lymphoma. The Homo sapiens TRB locus spans 620 kb [3] (Figure 9). It consists of 64-67 TRBV genes belonging to 32 subgroups. Except for TRBV30, which is localized downstream of the TRBC2 gene in the inverted orientation of transcription, all the other TRBV genes are located upstream of a duplicated D-J-C-cluster, which comprises, for the first part, TRBD1, six TRBJ and the TRBC1 gene, and for the second part, TRBD2, eight TRBJ and the TRBC2 gene [3] (Figure 9). MOXDP2 (monooxygenase DBH-like 2) (5′ borne, opposite orientation relative to the locus) is located upstream of PRSS58 (serine protease 58, trypsinogen-like TRYX3, TRY1; opposite orientation relative to the locus), which has been identified 41 kb upstream of TRBV1 (P). EPHB6 (EPH receptor B6) (3′ borne, direct orientation relative to the locus) has been identified 41 kb downstream of TRBV30 (F), the most 3′ gene in the locus.

TRB Locus Gene Order, CNV and Haplotypes
A polymorphism by insertion/deletion of 3 genes between the TRBV4-2 and TRBV7-2 genes, encompassing 21 kb, has been described in the human TRB locus [152,153].
It corresponds to haplotype A (L36092) and haplotype B (L36190) and involves three TRBV genes: the pseudogene TRBV3-2 and the functional TRBV4-3 and TRBV6-3 genes [154]. The CNV has been defined as Homo sapiens TRB CNV1 TRBV(11-14)4(3F,1P) (Table 7). A second CNV, Homo sapiens TRB CNV2 T4-T8(70-74)5(nr), involves trypsinogen-like genes localized between TRBV29-1 and TRBD1 (Table 7). Two haplotypes have been described, haplotype B having a deletion of two genes, T7 and T8. Detailed sequence analysis of this CNV and the characterization of new haplotypes may provide markers of the evolution of the TRB locus between populations and between species.

TRA Locus Representation
The Homo sapiens TRA locus is located on chromosome 14, on the long arm, at band 14q11.2 [3]. The forward (FWD) orientation of the locus on the chromosome has been determined by the analysis of translocations involving the TRA and TRD loci in leukemia and lymphoma. The Homo sapiens TRA locus spans 1000 kb [3] (Figure 10). It consists of 54 TRAV genes belonging to 41 subgroups, 61 TRAJ genes spread over 71 kb, and a unique TRAC gene [3] (Figure 10). Such a spread of J genes over a large region is unusual and has not been observed in the other IG or TR loci. Moreover, the TRD locus is nestled in the TRA locus between the TRAV and TRAJ genes [3] (Figure 10). V-J rearrangements in the TRA locus therefore result in the deletion of the TRD D-J-C cluster genes localized on the same chromosome. This occurs in two steps. First, the TRD D-J-C cluster is deleted by a rearrangement between deltaRec (a sequence located upstream of the cluster) and pseudoJalpha (a sequence located downstream of the cluster); this rearrangement generates a T cell receptor excision circle (TREC), a biomarker of normal T cell development. Second, a TRAV gene rearranges to a TRAJ gene.
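The consequence of the nesting of the TRD genes in the TRA locus can be illustrated with a toy model in which a deletional rearrangement keeps the two joined elements and removes everything between them. The Python sketch below uses placeholder element names (TRAV-a, TRAJ-x, etc.) and a reduced gene list, so it shows only the ordering logic of the two steps, not the real gene content or the junction mechanism.

```python
from typing import List, Tuple

# Reduced 5'->3' model of the TRA/TRD region with placeholder element names
# (hypothetical, not the real gene content or order): the TRD D-J-C cluster,
# flanked by deltaRec and pseudoJalpha, lies between the TRAV and TRAJ genes.
LOCUS: List[str] = ["TRAV-a", "TRAV-b", "deltaRec", "TRDD", "TRDJ", "TRDC",
                    "pseudoJalpha", "TRAJ-x", "TRAJ-y", "TRAC"]


def deletional_rearrangement(locus: List[str], upstream: str,
                             downstream: str) -> Tuple[List[str], List[str]]:
    """Join `upstream` to `downstream`, deleting everything in between.

    Returns (rearranged chromosome, excised elements). The excised elements
    correspond to the excision circle; junction processing is not modelled."""
    i, j = locus.index(upstream), locus.index(downstream)
    if i >= j:
        raise ValueError("the upstream element must lie 5' of the downstream element")
    return locus[: i + 1] + locus[j:], locus[i + 1 : j]


if __name__ == "__main__":
    # Step 1: deltaRec-pseudoJalpha rearrangement removes the TRD D-J-C cluster (TREC).
    step1, trec = deletional_rearrangement(LOCUS, "deltaRec", "pseudoJalpha")
    # Step 2: TRAV to TRAJ rearrangement on the same chromosome.
    step2, excised = deletional_rearrangement(step1, "TRAV-b", "TRAJ-y")
    print("TREC content:", trec)
    print("after V-J:", step2)
```

In this model the first step excises TRDD, TRDJ and TRDC, and a TRAV to TRAJ join performed directly on the unrearranged list removes them as well: any V-J rearrangement on a chromosome deletes the TRD D-J-C cluster nestled between the two partners.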
No 5′ borne conserved between species has been identified upstream of TRAV1-1 (F), the most 5′ gene in the locus. DAD1 (defender against cell death) (3′ borne) has been identified 13 kb downstream of TRAC (F), the most 3′ gene in the locus (Figure 10).

Figure 10.
Representation of the human TRA locus at 14q11.2 (FWD orientation on the chromosome) [3]. The boxes representing the genes are not to scale. Exons are not shown. The TRAV genes are designated by a number for the subgroup, followed, whenever there are several genes belonging to the same subgroup, by a hyphen and a number for their relative localization in the locus. Numbers increase from 5′ to 3′ in the locus. The TRD genes are nestled in the TRA locus [3]. IMGT ® http://www.imgt.org (accessed on 20 February 2020), IMGT Repertoire (IG and TR) 1. Locus and genes > 2. Locus representations > TRA: Human (With permission from M-P. Lefranc …)

TRA/TRD Locus Gene Order, CNV and Haplotypes
Although no CNV or haplotype has been described for the Homo sapiens TRA/TRD locus, the corresponding columns are available to provide a frame for future descriptions (Table 8).

TRG Locus Representation
The Homo sapiens TRG locus is located on chromosome 7, on the short arm, at band 7p14 [3]. The reverse (REV) orientation of the locus on the chromosome has been determined by the analysis of chromosome 7 inversions inv(7)(p14-q34) involving the TRG and TRB loci in ataxia-telangiectasia patients and in leukemia. The Homo sapiens TRG locus spans 160 kb [3,115] (Figure 11). It consists of 12-15 TRGV genes belonging to 6 subgroups, upstream of a duplicated J-C cluster, which comprises, for the first part, three TRGJ and the TRGC1 gene, and for the second part, two TRGJ and the TRGC2 gene [3,115] (Figure 11). TRGV9, expressed in 80-95% of the human peripheral γδ T cells, is the only member of subgroup 2.
TRGV10 and TRGV11, single members of subgroups 3 and 4, respectively, have been found rearranged and transcribed, but they are ORFs that cannot be expressed in a gamma chain, owing to a splicing defect of the pre-messenger RNA [3]. AMPH (amphiphysin) (5′ borne) has been identified 16 kb upstream of TRGV1 (ORF), the most 5′ gene in the locus. STARD3NL (STARD3 N-terminal like) (3′ borne) has been identified 9.4 kb downstream of TRGC2 (F), the most 3′ gene in the locus.

TRG Gene Order, CNV and Haplotypes
TRG gene order is according to the IMGT Locus gene order (IMGT ® http://www.imgt.org (accessed on 20 February 2022), IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > TRG). The total number of TRG genes per haploid genome is 19 or 22, of which 11 to 13 are functional (Table 9) [3]. Polymorphisms in the number of TRGV genes and in the exon number of the TRGC2 gene have been described in different populations [155][156][157][158][159][160][161]. A variation in the number of TRGV subgroup genes (from seven to ten) has been observed [156,157,161].
These allelic polymorphisms, which result from the deletion of V4 and V5 or from the insertion of an additional V gene, V3P, between V3 and V4, can be detected by restriction fragment length polymorphism (RFLP) [115,156,157,161]. The two TRGC genes, which are 16 kb apart, result, with their associated TRGJ genes, from a recent duplication in the locus. However, the two duplicated units show several structural differences [115]. TRGJP1, TRGJ1 and TRGC1 cross-hybridize to TRGJP2, TRGJ2 and TRGC2, respectively [158][159][160][161], whereas TRGJP has no equivalent in the duplicated TRGJP2-J2-C2 cluster [160]. The TRGC1 gene has three exons [158], whereas the TRGC2 gene has four or five exons, owing to the duplication of a region that includes exon 2 [155]. The allelic polymorphism of the TRGC2 gene, with duplication (C2(2x)) or triplication (C2(3x)) of exon 2, can be identified by RFLP [155]. Exon 2 of the TRGC1 gene encodes a cysteine involved in the interchain disulfide bridge [159], whereas this cysteine is not conserved in exon 2 of the human TRGC2 gene. Enhancer and silencer sequences have been characterized 6.5 kb downstream of the TRGC2 gene [162].
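The relation between the copy number of exon 2 and the exon number of the TRGC genes can be written as a one-line rule. The sketch below assumes, as a simplification of the text, that one copy of exon 2 corresponds to the three-exon TRGC1 structure and that each extra copy of the duplicated region adds one exon; the function name trgc_exon_count is illustrative.

```python
from typing import Dict


def trgc_exon_count(exon2_copies: int) -> int:
    """Exon number of a TRGC gene given the copy number of exon 2.

    Simplification based on the text: one copy gives the three-exon TRGC1
    structure, and each additional copy of the duplicated region adds one exon."""
    if exon2_copies < 1:
        raise ValueError("exon 2 is expected at least once")
    return 3 + (exon2_copies - 1)


EXON2_COPIES: Dict[str, int] = {"TRGC1": 1, "TRGC2 C2(2x)": 2, "TRGC2 C2(3x)": 3}

if __name__ == "__main__":
    for gene, copies in EXON2_COPIES.items():
        print(gene, trgc_exon_count(copies))
```

With this rule, C2(2x) and C2(3x) correspond to the four- and five-exon TRGC2 forms.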

IUIS NOM IMGT-NC Validation of IMGT ® Creations and Updates
IMGT ® Creations and updates (http://www.imgt.org/IMGTinformation/creations/) (accessed on 20 February 2022) are published on the IMGT site after completion of the biocuration of new IG and TR loci, genes and/or alleles.
The lists of the data available for validation by IUIS NOM IMGT-NC are given in Table 10. The IUIS NOM validation consists of checking the conformity of the data to the IUIS NOM IMGT-NC requirements for nomenclature assignment and to the IMGT Scientific chart rules based on CLASSIFICATION (gene and allele names), NUMEROTATION (IMGT unique numbering) and DESCRIPTION (labels), and the conformity of their presentation in the IMGT Repertoire (IG and TR) [1][2][3][114].
The seven IG and TR loci of the dog (Canis lupus familiaris) and Rhesus monkey (Macaca mulatta) have been fully annotated. Species for which most of the IG and TR loci are annotated include cat (Felis catus), bovine (Bos taurus), sheep (Ovis aries) and goat (Capra hircus). Standardized IMGT biocuration led to the comparative study of the T cell receptor beta locus of veterinary species [163] based on the Homo sapiens TRB locus and to a comparative analysis of Bos taurus and Ovis aries TRA/TRD loci [164]. A recent comparative study on the evolution of the TRG locus in mammals [165] has highlighted the benefit of using the same IMGT standards, for the same locus, across species.
If an 'IMGT ® Creations and updates' entry corresponds to genes previously approved in an 'IUIS NOM IMGT-NC Report', a link to the report on the IUIS site is provided [114]. This includes the reports of inferred alleles (new potential alleles deduced by inference from high-throughput sequencing of expressed repertoires) submitted by the "Inferred Allele Review Committee" (IARC) working group of the "Adaptive Immune Receptor Repertoire" (AIRR) community [166].

Table 10. IMGT ® Creations and updates: List of the web pages validated by IUIS NOM IMGT-NC. Entries are given for the IGH, IGK, IGL, TRB, TRA, TRD and TRG loci, in that order.
Locus representation: IGH, IGK, IGL, TRB, TRA/TRD, TRG
Locus bornes: IGH, IGK, IGL, TRB, TRA/TRD, TRG
Locus description: IGH, IGK, IGL, TRB, TRA/TRD, TRG
Locus gene order: IGH, IGK, IGL, TRB, TRA/TRD, TRG
Locus in genome assembly: IGH, IGK, IGL, TRB, TRA/TRD, TRG
Gene table (V): IGHV, IGKV, IGLV, TRBV, TRAV, TRDV, TRGV
Gene table (D): IGHD, nr, nr, TRBD, nr, TRDD, nr
Gene table (J): IGHJ, IGKJ, IGLJ, TRBJ, TRAJ, TRDJ, TRGJ
Gene table (C): IGHC, IGKC, IGLC, TRBC, TRAC, TRDC, TRGC
Potential germline repertoire (V, D, J or V, J): IGHV, IGHD, IGHJ; IGKV, nr, IGKJ; IGLV, nr, IGLJ; TRBV, TRBD, TRBJ; TRAV, nr, TRAJ; TRDV, TRDD, TRDJ; TRGV, nr, TRGJ
Author Contributions: Conceptualization, methodology, validation, investigation, resources, data curation, writing, review and editing, visualization, project administration, ontology, funding acquisition, M.-P.L. and G.L. All authors have read and agreed to the published version of the manuscript.