Immunoglobulins or Antibodies: IMGT® Bridging Genes, Structures and Functions

IMGT®, the international ImMunoGeneTics® information system founded in 1989 by Marie-Paule Lefranc (Université de Montpellier and CNRS), marked the advent of immunoinformatics, a new science at the interface between immunogenetics and bioinformatics. For the first time, the immunoglobulin (IG) or antibody and T cell receptor (TR) genes were officially recognized as ‘genes’, on the same footing as conventional genes. This major breakthrough has allowed the entry, in genomic databases, of the IG and TR variable (V), diversity (D) and joining (J) genes and alleles of Homo sapiens and of other jawed vertebrate species, based on the CLASSIFICATION axiom. The second major breakthrough has been the IMGT unique numbering and the IMGT Collier de Perles for the V and constant (C) domains of the IG and TR and other proteins of the IG superfamily (IgSF), based on the NUMEROTATION axiom. IMGT-ONTOLOGY axioms and concepts bridge genes, sequences, structures and functions, between biological and computational spheres in the IMGT® system (Web resources, databases and tools). They provide the IMGT Scientific chart rules to identify, describe and analyse the IG complex molecular data, the huge diversity of repertoires, the genetic (alleles, allotypes, CNV) polymorphisms, the IG dual function (paratope/epitope, effector properties), and antibody humanization and engineering.


Introduction 2. Immunoglobulin (IG) or Antibody Molecular Genetics
H-gamma1 and L-kappa, written in small letters in the text, correspond to the IMGT standardized keywords (IDENTIFICATION) [1,30,31]. The two light chains are identical and the two heavy chains are identical (the different colours are only used for a better visualization). (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

Table 1. Immunoglobulin (IG) receptor, chain and domain structure labels and correspondence with sequence labels [34]. IMGT standardized labels are in capital letters [1,34,35].

Table 1 column headers: IG structure labels (IMGT/3Dstructure-DB) [57-59]; sequence labels (IMGT/LIGM-DB).

The Fc region, formed by the C-terminal domains CH2-CH3 of the two heavy H-gamma1 chains, interacts with effector molecules such as the complement component C1q and the Fc receptors [2]. The binding of these effector molecules to the Fc of antibodies coated at the surface of foreign antigens triggers elimination processes. Activation of the classical complement cascade generates a variety of potent biological molecules, which promote phagocytosis, chemotaxis and formation of the membrane attack complex, resulting in cell lysis. The pathway is triggered by the interaction of C1, a protein complex of C1q, C1r and C1s, with antigen-antibody complexes. It is the C1q head region which interacts directly with the immunoglobulin Fc. Binding of antibody-antigen complexes or aggregated immunoglobulins to the Fc receptors triggers cell functions which play important roles against pathogenic agents as well as in the regulation of antibody production.

B Cell Differentiation
Immunoglobulins are expressed as membrane immunoglobulins (mIG) on the surface of the B lymphocytes (mature B cells and memory B cells) as part of a B cell receptor (BcR) or as soluble immunoglobulins secreted by plasma cells [2] (Figure 3).

Figure 3. B cell differentiation with the antigen independent phase in the bone marrow and the antigen dependent phase in the secondary lymphoid organs [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
The different stages of B cell differentiation, from the hematopoietic stem cells into mature B cells which express IgM and IgD, occur in the bone marrow and are antigen independent [2] (Figure 3). The final differentiation stages, from the mature B cells into memory cells or plasma cells that express or secrete IG of various classes or subclasses, occur in the germinal centers of the secondary lymphoid organs, are antigen dependent, and generally require cooperation between B and T cells [2] (Figure 3).

Membrane Immunoglobulins and B Cell Receptor
The B cell receptor BcR (Figure 4) comprises an antigen recognition unit constituted by a membrane immunoglobulin (mIG), on mature B cells (IgM, IgD) or on memory B cells (IgM, IgA, IgG, IgE), anchored in the membrane, and a signalling coreceptor constituted by two heterodimers CD79A (Ig-alpha, mb-1, MB-1)/CD79B (Ig-beta, B29). The CD79A/CD79B dimers ensure the signal transmission when the membrane IG binds to an antigen. The CD79A and CD79B chains are each composed of a single IgSF C-like domain, exist at the cell surface as a disulfide-linked heterodimer and contain, in their cytoplasmic region (CY), an immunoreceptor tyrosine-based activation motif (ITAM) (Figure 4) (IMGT® http://www.imgt.org, IMGT Repertoire (RPI) > IMGT RPI entries from gene to protein > IgSF other than IG or TR > CD79A; ibid > CD79B).
Biomedicines 2019, 7, x FOR PEER REVIEW

Figure 4. Immunoglobulin (IG) or antibody, here IgM, as a monomer H2L2, anchored in the membrane of the B cell (membrane IG or mIG) and the CD79 signalling coreceptors constituted of two heterodimers CD79A/CD79B (BcR = mIG + CD79 coreceptor). VH, CH1, CH2, CH3 and CH4 indicate the domains of the H-mu chains of the IgM. Depending on the light chain type, L-kappa or L-lambda, VL and CL correspond to V-kappa and C-kappa, or to V-lambda and C-lambda, respectively. ITAM motifs are indicated by the letters YLYL for tyrosyl (Y) and leucyl (L) amino acids. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

The ITAM (Figure 4) are rich in tyrosines, with the consensus (D/E)xxYxx(L/I)x6-8Yxx(L/I) [102]. Cross-linking of the BcR induces the tyrosine phosphorylation of the ITAM in the cytoplasmic region of CD79A and CD79B, and the signalling cascade leading to B cell activation, by recruitment of signalling molecules which belong to at least two families of protein tyrosine kinases (PTK), the Syk family and the Tec family, and provide signal transmission.
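The ITAM consensus quoted above can be read as a regular expression, with x standing for any amino acid and x6-8 for a spacer of six to eight residues. The following is a minimal sketch under that reading; the function name `find_itam` and the toy sequence are hypothetical illustrations and do not correspond to a real CD79A or CD79B sequence.

```python
import re

# Consensus from the text: (D/E)xxYxx(L/I)x6-8Yxx(L/I), x being any amino acid.
ITAM_PATTERN = re.compile(r"[DE]..Y..[LI].{6,8}Y..[LI]")


def find_itam(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ITAM_PATTERN.finditer(sequence)]


# Hypothetical fragment built to contain one motif with a 7-residue spacer.
toy_cytoplasmic = "GGG" + "DAAYAAL" + "GGGGGGG" + "YAAI" + "GGG"
# Same fragment with a 5-residue spacer, which falls outside x6-8.
toy_short_spacer = "GGG" + "DAAYAAL" + "GGGGG" + "YAAI" + "GGG"

print(find_itam(toy_cytoplasmic))
print(find_itam(toy_short_spacer))
```

Only the first toy fragment matches, since its spacer lies within the six to eight residues allowed by the consensus.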

Secreted IG
Secreted IgG, IgD and IgE are monomeric, whereas IgM occurs as a pentamer. IgA occurs predominantly as a monomer in the serum and as a dimer in seromucous secretions.

IG Light Chain Types
The two light chain types, L-kappa and L-lambda, are common to all five classes. Either light chain type can associate with any of the heavy chain types, but in any particular immunoglobulin, both light and both heavy chains are identical. The kappa to lambda ratio in the serum of healthy individuals is approximately 2 to 1. Four lambda isotypes have been identified by the presence or absence of serological markers (Mcg, Kern and Oz) [2] (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 2. Proteins and alleles > 5. Isotypes: Human (Homo sapiens) IGLC). Three Km allotypes have been characterized [83,109,110] (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 2. Proteins and alleles > 4. Allotypes: Human (Homo sapiens) IGKC).

IG Variable Domains
The basic structure of an immunoglobulin (IG) or antibody comprises two identical heavy chains, associated with two identical light chains, kappa or lambda (Figure 5). Each chain folds up into domains of approximately 100 to 110 amino acids. There are two domains for a light chain, and four or five domains for a heavy chain. The N-terminal domain of the light and heavy chains is the variable (V) domain, which exhibits an enormous diversity between different IG. Each V domain comprises a beta-sheet framework (FR-IMGT) [33] supporting three hypervariable loops or complementarity determining regions (CDR-IMGT) 1, 2 and 3 [32], which are spatially close to each other and constitute the antigen recognition and binding site (Figure 2). The variable domain of a light chain, designated V-KAPPA or V-LAMBDA depending on the light chain type, is a V-J-REGION encoded by two rearranged genes (IGKV and IGKJ, or IGLV and IGLJ, respectively) (Table 1).

IG Constant Domains
The other domains, designated as constant (C) domains, are identical between chains from the same class, subclass and with the same allotypes. The constant region or C-REGION of the heavy chain is encoded by one of the IGHC genes, and comprises three or four constant domains (CH1, CH2 and CH3, with a flexible hinge region between CH1 and CH2, for the H-gamma, H-alpha and H-delta chains of the IgG, IgA and IgD; CH1 to CH4 for the H-mu and H-epsilon chains) [118]. The hinge region located between the CH1 and CH2 of the H-gamma chains is encoded by one exon (for H-gamma1, H-gamma2 and H-gamma4) or several exons (2 for H-delta, 2-5, usually 4, for H-gamma3 [119,120]). In IgM and IgE, the CH2 replaces the hinge, and the CH3 and CH4 correspond to the CH2 and CH3 in IgG, IgD and IgA (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and genes > 5. Gene exon/intron organization IGHC: Human) [2]. The C-REGION of the light chain is encoded by the IGKC gene (for the kappa chains), or one of the IGLC genes (for the lambda chains) [2], and comprises a single constant domain (C-KAPPA or C-LAMBDA, respectively).
In intact immunoglobulins, the domains usually associate into pairs through multiple non-covalent lateral interactions. However, the CH2 of the IgG, IgD and IgA, and the equivalent CH3 of the IgM and IgE, are unpaired but stabilized with interposed N-linked, branched carbohydrate chains. All human IG heavy chains are glycosylated [121] (IMGT® http://www.imgt.org, IMGT Education > IMGT Lexique > Immunoglobulin (IG) or antibody glycosylation). The number of potential N-glycosylation sites per heavy chain is reported in Table 3. The positions of the Asn (N) of the potential N-glycosylation sites are highlighted in the IGHC Protein display (N in green) or are identified by the underlined N-X-S/T (Asn-X-Ser/Thr) motif (where X is any amino acid except Pro). O-glycosylation characterizes the hinge of the H-delta and H-alpha1 chains.
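The N-X-S/T motif described above lends itself to a simple pattern search. The sketch below is an illustration only, not the method used by the IGHC Protein display; the function name and the toy sequence are hypothetical. A lookahead is used so that overlapping motifs are all reported.

```python
import re

# N-X-S/T motif, X being any amino acid except proline (P).
# The lookahead keeps the match one residue long, so overlapping motifs are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def potential_n_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues found in an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


toy_sequence = "AANGSAANPTAANATAA"  # hypothetical, not an IGHC sequence
print(potential_n_glycosylation_sites(toy_sequence))
```

In the toy sequence, the Asn followed by Pro is skipped, which mirrors the exclusion of Pro at the X position.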
The molecular structure of the IG chains in domains (12 for IgG, IgD and IgA, 14 for IgM and IgE) is extensively used for the construction and expression of engineered antibodies (IMGT ® http://www.imgt.org, IMGT/mAb-DB) [60].
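The domain counts quoted above (12 and 14) follow from the chain composition given earlier in this section: a monomer has two heavy and two light chains, each light chain has one V and one C domain, and each heavy chain has one V domain plus three or four C domains depending on the class. A minimal sketch of that arithmetic (the names are illustrative):

```python
# Number of constant domains per heavy chain, as stated in the text.
CH_DOMAINS_PER_HEAVY_CHAIN = {"IgG": 3, "IgA": 3, "IgD": 3, "IgM": 4, "IgE": 4}
LIGHT_CHAIN_DOMAINS = 2  # one V domain and one C domain


def domains_per_monomer(ig_class: str) -> int:
    """Total number of domains in an H2L2 monomer of the given class."""
    heavy_chain_domains = 1 + CH_DOMAINS_PER_HEAVY_CHAIN[ig_class]  # VH + CH
    return 2 * heavy_chain_domains + 2 * LIGHT_CHAIN_DOMAINS


for ig_class in CH_DOMAINS_PER_HEAVY_CHAIN:
    print(ig_class, domains_per_monomer(ig_class))
```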

IG Molecular Synthesis Characteristics
The variable domain of a heavy chain, designated as VH, is a V-D-J-REGION encoded by three rearranged genes (IGHV, IGHD and IGHJ) (Table 1). In humans, the genes encoding the heavy chains and the light chains, kappa and lambda, are located in the IGH, IGK and IGL loci on chromosome 14 (14q32.33), 2 (2p11.2) and 22 (22q11.2), respectively [2]. The synthesis of the IG heavy and light chains requires gene rearrangements, at the DNA level [122-124], in the IGH, IGK and IGL loci during the B cell differentiation in bone marrow (Figure 3). The synthesis of the chains of the antigen receptors, immunoglobulin (IG) or antibody [1,2], and T cell receptors (TR) [1,3], includes several molecular mechanisms that occur at the DNA level and are unique to the B and T cells (Figure 6): (a) combinatorial V-D-J or V-J rearrangements of the variable (V), diversity (D) and joining (J) genes, (b) N-diversity resulting from the exonuclease trimming at the ends of the V, D, and J genes and the random addition of nucleotides at the V-(D)-J junctions before the gene ligation, and (c) later, during B cell differentiation, for the IG, somatic hypermutations (SHM) in the rearranged V-(D)-J genes. During the transcription in B or T cells, the rearranged V-(D)-J gene which encodes the V domain is spliced to a C gene that encodes the C region. Chronologically in B cells, the synthesis of the H-mu chains precedes that of the light chains.
The IGH locus comprises variable (V), diversity (D), joining (J) and constant (C) genes. The variable domain of a heavy chain, or VH, is a V-D-J-REGION generated by the junction at the DNA level of three genes: a variable gene IGHV, a diversity gene IGHD and a joining gene IGHJ (Figure 7). The synthesis requires two successive rearrangements. First, one of the IGHD genes is joined to one of the IGHJ genes with deletion of the intermediary DNA as an excision loop, then one of the IGHV genes is joined to the partially rearranged D-J gene to generate a completely rearranged IGHV-D-J gene. This second rearrangement is also accompanied by the formation of an excision loop which is cleaved off. The rearranged IGHV-D-J gene is transcribed with the IGHM gene, the most 5′ IGHC gene in the locus, into an IGHV-D-J-M (or IGHV-D-J-Cmu) pre-messenger RNA. The IGHM gene encodes the four domains (CH1 to CH4) of the H-mu constant region. After splicing of the pre-messenger RNA, translation of the messenger RNA, and elimination of the signal peptide by a peptidase in the endoplasmic reticulum, a mature H-mu chain is produced [2].

Figure 7, steps (c) to (e). (c) The RNA sequences corresponding to the introns and to the non-used IGHJ genes are excised by splicing, and a mature messenger which comprises the spliced coding regions and the 5′ and 3′ untranslated sequences, is obtained. (d) The messenger RNA is translated into a polypeptide chain by the ribosomes. (e) The signal peptide is cleaved off by a peptidase following the entry of the polypeptide chain in the endoplasmic reticulum, and a mature H-mu chain is produced. In DNA and pre-messenger RNA, L for Leader corresponds to L-PART1 and L-PART2, and in spliced messenger RNA to L-REGION [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

Synthesis of the L-Kappa and L-Lambda Chains: V-J Rearrangements in the IGK and IGL Loci
The kappa (IGK) locus and the lambda (IGL) locus comprise variable (V), joining (J) and constant (C) genes. The variable domain of an L-kappa or L-lambda chain, V-KAPPA or V-LAMBDA, is a V-J-REGION generated by the junction, at the DNA level, of two genes, a variable gene and a joining gene, with deletion of the intermediary DNA to create a rearranged IGKV-J gene in the IGK locus (Figure 8), or an IGLV-J gene in the IGL locus.

Figure 8. Synthesis of an L-kappa chain [2]. (a) At the DNA level, one of the IGKV genes is joined to one of the five IGKJ genes, with deletion of the intermediary DNA, to create a rearranged IGKV-J gene. (b) The rearranged IGKV-J sequence is transcribed with the IGKC gene into an IGKV-J-C pre-messenger RNA. (c) The RNA sequences corresponding to the introns and to the non-used IGKJ genes are excised by splicing, and a mature messenger which comprises the spliced coding regions, and the 5′ and 3′ untranslated sequences, is obtained. (d) The messenger RNA is translated into a polypeptide chain by the ribosomes. (e) The signal peptide is cleaved off by a peptidase following the entry of the polypeptide chain in the endoplasmic reticulum, and a mature L-kappa chain is produced. In DNA and pre-messenger RNA, L for Leader corresponds to L-PART1 and L-PART2, and in spliced messenger RNA to L-REGION [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
The rearranged IGKV-J (or IGLV-J) gene is transcribed with the IGKC gene (or one of the IGLC genes) into an IGKV-J-C (or IGLV-J-C) pre-messenger RNA. The unique IGKC gene, or one of the functional IGLC genes, with their single exon, encodes the single domain of the constant region of the L-kappa or L-lambda chains, respectively. After splicing of the pre-messenger RNA, translation of the messenger RNA and elimination of the signal peptide from the polypeptide chain in the endoplasmic reticulum, a mature L-kappa (or L-lambda) chain is produced [2].

The diversity of the variable domains of the immunoglobulin chains arises mainly from combinatorial diversity, V-J and V-D-J junctional diversity and somatic hypermutations [2]. In addition, within an immunoglobulin, the pairing of the variable domains of the heavy chain (VH) and of the light chain (VL, V-KAPPA or V-LAMBDA), to form the antigen recognition and binding site, creates an additional degree of diversity [2].

Combinatorial Diversity
The combinatorial diversity is created by the somatic V-D-J and V-J rearrangements. The somatic IGKV-J, IGLV-J and IGHV-D-J rearrangements require the presence of recombination signal (RS) sequences which are located 3′ of the V genes, 5′ of the J genes, and on both sides of the D genes [2,125] (Figure 9). The RS comprise two highly conserved motifs, a palindromic heptamer and a nonamer rich in "a" or "t", separated by a non-conserved spacer of 12 ± 1 or 23 ± 1 nucleotides. They are recognized by the recombinase enzymes (recombination activating 1 (RAG1) and recombination activating 2 (RAG2)) [126,127]. Efficient rearrangements occur between RS with spacers of different lengths, that is one with a 12 ± 1 spacer, and another one with a 23 ± 1 spacer (12/23 joining rule) [128] (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and genes > 7. Gene tables > Recombination signals sequence logos; ibid: IMGT Education > Tutorials > Immunoglobulins and B cells > Molecular genetics of immunoglobulins). The potential repertoire resulting from the combinatorial diversity depends on the number of V, D and J genes, and on their functionality.
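The 12/23 joining rule and the dependence of the potential repertoire on gene numbers can be made concrete with a short sketch. The function names are hypothetical, and the gene counts in the example are placeholders chosen for illustration, except for the five IGKJ genes mentioned in the legend of Figure 8; actual numbers of functional genes should be taken from the IMGT gene tables.

```python
def spacer_class(spacer_length: int):
    """Classify an RS spacer as 12 (12 +/- 1), 23 (23 +/- 1) or None."""
    if 11 <= spacer_length <= 13:
        return 12
    if 22 <= spacer_length <= 24:
        return 23
    return None


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """True if one RS has a 12 +/- 1 spacer and the other a 23 +/- 1 spacer."""
    return {spacer_class(spacer_a), spacer_class(spacer_b)} == {12, 23}


def potential_combinations(n_v: int, n_j: int, n_d: int = 1) -> int:
    """Number of V-(D)-J combinations; n_d defaults to 1 for V-J rearrangements."""
    return n_v * n_d * n_j


# Placeholder counts (assumptions for illustration, not IMGT gene table values).
print(potential_combinations(n_v=40, n_d=25, n_j=6))  # heavy chain, V-D-J
print(potential_combinations(n_v=35, n_j=5))          # kappa chain, V-J
```

The product only gives the potential combinatorial repertoire; junctional diversity and somatic hypermutations, described below, add to it.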

Junctional Diversity
The junctional diversity is represented by the V-J junction diversity of the L-kappa chains, which creates variability of the amino acid at position 115 (IMGT unique numbering) of the rearranged CDR3 [125,129], and by the N-diversity (N, for nucleotides), essentially observed at the V-D-J junctions of the IG heavy chains, which represents the major source of the CDR3 diversity. In 1982, Alt and Baltimore proposed a model that explains the mechanism of N-diversity [130]. It results from the excision of nucleotides by an exonuclease at the ends of the V, D and J genes during rearrangement (3′ end of the V-REGION, 5′ end of the J-REGION and/or both ends of the D-REGION), followed by the random addition of nucleotides by the DNA nucleotidylexotransferase (DNTT, terminal deoxynucleotidyltransferase, TdT) [131]. This addition of nucleotides preferentially involves 'g' nucleotides and is template independent. If the ends of the coding regions are intact (no deletion due to the exonuclease), P-nucleotides may be observed adjacent to these coding regions [132]. P-nucleotides are short sequences of 1 to 3 nucleotides, palindromic (inverted repeat) to the intact coding end (3′ end of the V-REGION, 5′ end of the J-REGION and/or both ends of the D-REGION). P-nucleotides result from the dissymmetric opening of the hairpin formed at the extremities of the coding regions during V-J or V-D-J rearrangements [133].
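Because P-nucleotides are an inverted repeat of the intact coding end, they can be derived as the reverse complement of the terminal nucleotides of that end. The sketch below illustrates this relation; the function names and the example coding ends are hypothetical and are not taken from IMGT reference sequences.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lower-case nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def p_nucleotides_after_3prime_end(coding_end: str, n: int) -> str:
    """P-nucleotides following an intact 3' coding end (e.g. 3' end of a V-REGION)."""
    if not 1 <= n <= 3:
        raise ValueError("the text describes P-nucleotides of 1 to 3 nucleotides")
    return reverse_complement(coding_end[-n:])


def p_nucleotides_before_5prime_end(coding_start: str, n: int) -> str:
    """P-nucleotides preceding an intact 5' coding end (e.g. 5' end of a J-REGION)."""
    if not 1 <= n <= 3:
        raise ValueError("the text describes P-nucleotides of 1 to 3 nucleotides")
    return reverse_complement(coding_start[:n])


v_end = "tgtgcgaga"   # hypothetical intact 3' end of a V-REGION
j_start = "actac"     # hypothetical intact 5' end of a J-REGION
print(v_end + p_nucleotides_after_3prime_end(v_end, 2))      # ends in the palindrome gatc
print(p_nucleotides_before_5prime_end(j_start, 2) + j_start)  # starts with the palindrome gtac
```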

Somatic Hypermutations
Somatic hypermutations (SHM) appear during the B cell maturation in the germinal centers of the secondary lymphoid organs (spleen and lymph nodes) (Figure 3). They specifically affect the IG rearranged V-J and V-D-J genes during the antigen dependent stages of differentiation and represent a major mechanism for the generation of diversity of the variable domains of antibodies. This process of somatic hypermutation involves the introduction, at a high rate, of point mutations in rearranged VH and VL (V-KAPPA or V-LAMBDA) sequences [134-139]. Other genes expressed in B cells are not changed by the mechanisms of somatic hypermutation. SHM occurs at a frequency estimated at 10⁻³ per base in a B cell, which is roughly 10⁶ times more frequent than the rate of spontaneous mutations in other cells. Somatic hypermutation shares common mechanisms with class switch recombination (CSR): both are initiated by activation induced cytidine deaminase (AICDA, AID), which requires transcription to target single-strand DNA. AICDA induces the deamination of cytosine (c) to uracil (u). The replication system converts the uridine into thymidine, leading to a c > t transition (and a g > a transition on the opposite strand). In SHM, the uracil is removed by uracil DNA glycosylase (UNG), and this abasic site is further processed by either the DNA base excision repair (BER) pathway or the DNA mismatch repair (MMR) system. The DNA lesions are repaired by error-prone DNA polymerases, leading to nucleotide transitions and transversions [139].
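The c > t transition and the corresponding g > a transition on the opposite strand can be traced on a toy sequence. This is a minimal sketch of the deamination-then-replication route only (it does not model UNG, BER, MMR or error-prone polymerases); the function names and the five-nucleotide sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("acgtu", "tgcaa")  # u pairs like t


def opposite_strand(strand: str) -> str:
    """Opposite strand written 5' to 3' (reverse complement)."""
    return strand.translate(COMPLEMENT)[::-1]


def deaminate(strand: str, position: int) -> str:
    """AICDA step: the cytosine (c) at a 0-based position becomes uracil (u)."""
    if strand[position] != "c":
        raise ValueError("only cytosine is deaminated in this sketch")
    return strand[:position] + "u" + strand[position + 1:]


def replicate(strand: str) -> str:
    """Replication reads u as t, which fixes the c > t transition."""
    return strand.replace("u", "t")


original = "agcta"  # hypothetical stretch of a rearranged V gene
mutated = replicate(deaminate(original, 2))
print(original, "->", mutated)                                     # c > t
print(opposite_strand(original), "->", opposite_strand(mutated))   # g > a
```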

Coexpression of the Membrane H-Mu and H-Delta Chains
During its differentiation, an immature B cell becomes a naive mature B cell which simultaneously expresses membrane IgM and IgD (Figure 3). The VH domains of the H-mu and H-delta chains are identical, and are encoded by the same rearranged V-D-J-REGION. The expression of the H-delta isotype differs from that of the other isotypes in that it depends on a splicing mechanism and not on the class switching mechanism. IgD is coexpressed with IgM on the surface of naive mature B cells (the only case where two different IG classes are expressed by the same cell).
The IGHM and IGHD genes are located close to each other in the IGH locus. B cells which express IgM and IgD produce two types of pre-messenger RNA of the rearranged IGHV-D-J gene: the first ones end after the IGHM gene, and the second ones, which contain IGHM and IGHD, end after the IGHD gene and are about 20 kilobases (kb) long (the distance separating IGHM and IGHD being 6 kb in the human IGH locus). The pre-messenger RNA ending after the IGHM gene are spliced to produce mature IGHV-D-J-Cmu mRNA, translated into membrane H-mu chains. The pre-messenger RNA ending after the IGHD gene undergo splicing which removes the IGHM gene and produces mature IGHV-D-J-Cdelta mRNA, translated into membrane H-delta chains [140-142]. It is not excluded that this long pre-messenger RNA is also used to produce H-mu chains.

Expression of H-Gamma, H-Epsilon and H-Alpha Chains: Class Switch Recombination
The mature B cells which enter the lymph nodes express the IgM and IgD classes (Figure 3). After antigenic stimulation, activated B cells proliferate and can differentiate to produce other isotypes: the class switch recombination (CSR) occurs in the lymph nodes when B cells mature as a result of B and T cell cooperation. The activated B lymphocyte, through its major histocompatibility MH2 proteins, comes into contact with a CD4+ T lymphocyte. Recognition of the B cell peptide/MH2 (p/MH2) by the T cell receptor (TR) leads to T cell activation (expression of cytokines and of CD40LG on the T cell surface). The interaction between CD40 (TNFRSF5), constitutively expressed on the B cell, and its ligand CD40LG, expressed on the surface of the activated T cell, leads to the expression of cytokine receptors on B cells which, in the presence of the interleukins secreted by the T cell, provide the signal for the B cell to switch from IgM and IgD to IgG (IgG1, IgG2, IgG3 or IgG4), IgA (IgA1 or IgA2) or IgE. This switch results in a change of the constant region of the heavy chain, while maintaining the expression of the same antibody specificity. In the switch recombination, the rearranged IGHV-D-J gene, previously associated with the IGHM gene in an H-mu messenger RNA of a B cell expressing IgM, is brought into the proximity of one of the other IGHC genes, downstream (more 3′) in the locus, for example IGHG1 [2] (Figure 10).
Figure 10. Class switch IgM-IgG1: Smu-Sgamma1 recombination [2]. In a B cell which expresses IgM on the cell surface, a productive rearranged IGHV-D-J gene on one chromosome 14 is transcribed with the IGHM gene. Before the switch (S) recombination, all the IGHC genes are present in the IGH locus. During the switch recombination, a novel DNA rearrangement occurs in the IGH locus, between the Switch mu (Smu) sequence and another S sequence located 5′ of a more downstream IGHC gene (for example, Sgamma1 upstream of IGHG1). This leads to the deletion of the intermediary DNA and to the loss of the IGHC genes located between the two S sequences which recombine. The enhancer (E) located between the most 3′ IGHJ and Smu is retained during the switch recombination [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
The switch recombination occurs between switch sequences located about 2 kb 5′ of each IGHC gene (except IGHD) (Figure 10). The switch sequences, of about 2 kb, are composed of 20-80 nucleotide motifs repeated in tandem. These motifs contain short 'gggct' and 'gagct' repeats and, near the recombination site, 'tggg' or 'tgag'. The class switch involves the recombination of the Smu sequence with the S sequence of another IGHC gene, for example with a Sgamma sequence in the case of switching from IgM to IgG, resulting in the deletion of the IGHC genes located between Smu and the Sgamma of the IGHG gene used (Figure 10). This occurs by the formation of excision loops which are cleaved off [143-147]. For example, in the case of switching from IgM to IgG1 (Figure 10), the IGHM, IGHD and IGHG3 genes are deleted [2].
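The set of IGHC genes lost in a given switch follows from their order in the locus: every gene located between Smu and the S sequence of the target gene is deleted. The sketch below assumes the 5′ to 3′ order of the functional human IGHC genes shown in the list (pseudogenes omitted); this order is not stated in this section and should be checked against the IMGT locus representation. The function name is hypothetical.

```python
# Assumed 5' to 3' order of the functional human IGHC genes (pseudogenes omitted).
IGHC_ORDER = ["IGHM", "IGHD", "IGHG3", "IGHG1", "IGHA1",
              "IGHG2", "IGHG4", "IGHE", "IGHA2"]


def genes_deleted_by_switch(target_gene: str) -> list[str]:
    """IGHC genes deleted when Smu recombines with the S sequence of target_gene."""
    if target_gene == "IGHM":
        raise ValueError("IGHM is the gene expressed before the switch")
    if target_gene == "IGHD":
        raise ValueError("IGHD has no conventional switch sequence")
    # list.index raises ValueError for a gene name absent from the assumed order.
    return IGHC_ORDER[:IGHC_ORDER.index(target_gene)]


print(genes_deleted_by_switch("IGHG1"))  # the IgM to IgG1 example of the text
```

For the IgM to IgG1 switch, the result reproduces the deletion of IGHM, IGHD and IGHG3 given in the text.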
At the molecular level, CSR shares a number of features with SHM. It necessitates transcription of the C regions, which starts from a small exon (called I) located upstream of each switch region and results in a sterile J-C transcript which does not encode any protein. As mentioned above, CSR is also initiated by AICDA. Transcription allows the separation of both DNA strands, which are then targeted by deamination of multiple 'c' nucleotides by AICDA. This is followed by generation of single-strand breaks by the BER system (single-strand break by the apurinic/apyrimidinic endonuclease 1 (APE1) at a UNG abasic site), which may be converted into double-strand breaks by the MMR proteins. After loop excision of the intervening sequence, fusion of the switch regions is thought to be mediated by the non-homologous end-joining (NHEJ) system [148,149].

Expression of H-Delta Chains from IgM
Only a minority of normal plasma cells and rare B cell malignancies express exclusively IgD (IgM− IgD+ B cells). The low frequency has been explained by the lack of a recognizable switch sequence between IGHM (Cmu) and IGHD (Cdelta). However, a region, designated as sigma delta, contains a relatively high content of pentameric repeats with an extremely "g-rich" area and appears to function as a vestigial switch recombination site leading to the expression of delta chains in germinal center B cells and plasma cells [150-152].

Expression of Membrane and Secreted Immunoglobulins
Heavy chains of membrane and secreted immunoglobulins differ in their C-terminal region. The heavy chains of the membrane IG on the B cell surface have a hydrophobic C-terminal end which holds them anchored in the plasma membrane, whereas the heavy chains of the plasma cell secreted IG have a hydrophilic end [153]. Expression of membrane and secreted IG results from an alternative splicing of the heavy chain transcripts (Figure 11).
The C-terminal region of the membrane H-mu chain is encoded by two small exons, M1 and M2, located at about 2 kb in 3′ of the CH4 exon [154]. M1 encodes 39 amino acids, whereas M2 only encodes two amino acids (Figure 11). These 41 amino acids represent the anchor region of the membrane H-mu chain, which comprises an extracellular region (CO) of 13 amino acids between the CH4 domain and the membrane, a hydrophobic transmembrane region (TM) of 27 amino acids and a short cytoplasmic region (CY) of one amino acid. The C-terminal region of the secreted H-mu chain comprises 20 amino acids encoded by the 3′ end of the CH4 exon (designated as CHS) [2].
For the synthesis of a membrane H-mu chain, the poly A site located in 3′ of the M2 exon and the splicing site located in the CH4 exon, at the 5′ limit of CHS, are used (Figure 11). This splicing deletes the CHS sequence and its stop codon, as well as the sequence between CH4 and M1, and between M1 and M2. For the synthesis of a secreted H-mu chain, the poly A site located 103 bp from the 3′ end of the CH4 exon and the stop codon at the 3′ end of CH4 are used (Figure 11). One cell can therefore present the two H-mu RNA precursors, and the relative expression of a H-mu chain, membrane or secreted, depends on a control in the selection of the poly A site used [155]. The organization of the 3′ region of the IGHD gene differs due to the presence of a small independent CHS exon located at 1.9 kb in 3′ of the CH3 exon, which encodes the last nine amino acids of the secreted H-delta chains. The M1 and M2 exons, located at 0.8 kb and 1.1 kb in 3′ of CHS, respectively, encode the TM and CY; M1 encodes 53 amino acids whereas M2 encodes two amino acids. The expression of the membrane and secreted H-delta chain depends on the selection of the poly A site used: poly A in 3′ of the IGHD exon M2 for the synthesis of the membrane H-delta chain, or poly A in 3′ of CHS for the synthesis of the secreted H-delta chain [156].
The expression of the membrane and secreted H-gamma, H-alpha and H-epsilon chains follows the same mechanisms as those described for the H-mu chain, the CHS being part of a domain (CH3 or CH4 depending on the IGHC gene).
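The link between the poly A site used and the C-terminal region of the H-mu and H-delta chains can be summarized in a short Python sketch. The amino acid counts are those quoted above; the labels "proximal" (poly A site in 3′ of CHS) and "distal" (poly A site in 3′ of M2), the dictionary layout and the function name are assumptions made for illustration only.

```python
# Amino acid counts quoted in the text for the C-terminal coding regions.
C_TERMINAL_CODING = {
    "H-mu": {"secreted": {"CHS": 20}, "membrane": {"M1": 39, "M2": 2}},
    "H-delta": {"secreted": {"CHS": 9}, "membrane": {"M1": 53, "M2": 2}},
}

# Hypothetical labels for the two poly A sites: "proximal" is the site
# 3' of CHS, "distal" is the site 3' of the M2 exon.
FORM_BY_POLY_A = {"proximal": "secreted", "distal": "membrane"}

# Anchor region of the membrane H-mu chain: CO + TM + CY.
H_MU_ANCHOR = {"CO": 13, "TM": 27, "CY": 1}


def c_terminal_region(chain, poly_a_site):
    """Return (form, number of C-terminal amino acids) for a chain type
    and the poly A site selected (illustrative helper)."""
    form = FORM_BY_POLY_A[poly_a_site]
    coding = C_TERMINAL_CODING[chain][form]
    return form, sum(coding.values())


print(c_terminal_region("H-mu", "distal"))       # ('membrane', 41)
print(c_terminal_region("H-delta", "proximal"))  # ('secreted', 9)
# The 41 amino acids of the membrane H-mu anchor split into CO, TM and CY.
print(sum(H_MU_ANCHOR.values()))                 # 41
```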
Figure 11. Synthesis of a membrane and of a secreted H-mu chain [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

Allelic and Isotypic Exclusion. Rearrangement Chronology
The B cells, and the plasma cells which derive from them, display:
- allelic exclusion: in most cases, the only productive genes are either those of the paternal chromosome, or those of the maternal chromosome, but usually not the two together (functional haploidy);
- isotypic exclusion: a single type of light chain, L-kappa or L-lambda, and usually a single type of heavy chain belonging to a given subclass, are synthesised.
Molecular analysis has shown that the excluded allele is usually either not rearranged, or unproductively rearranged (IGK locus in B cells synthesising a lambda chain) [157,158].
During the B cell differentiation, the IGH locus on one chromosome 14 undergoes first a D-J, then a V-D-J rearrangement. A productive rearrangement allows the synthesis of a H-mu chain in the cytoplasm of the pre-B cells. The H-mu chain is expressed at the surface of the pre-B cells in association with a lambda-like chain (IGLL1) and a V-pre-B (VPREB1) chain, which constitute together the pre-B cell receptor [159,160] (Figure 3) [2].
The expression of the surface IgM at a later stage requires the synthesis of L-kappa or L-lambda chains, that is a productive V-J rearrangement of the IGK or IGL loci. It is the expression of the pre-B cell receptor at the surface of a pre-B cell which gives the signal which inhibits further IGHV-D-J rearrangements on chromosome 14 (Figure 12), and the signal which starts the light chain V-J rearrangements. Chronologically the V-J rearrangements of the IGK locus usually precede those of the IGL locus [157,158]. A chromosome 2 will be rearranged first. If the resulting rearrangement is productive, L-kappa chains will be synthesized, which will allow the expression of IgM, the other chromosome 2 and both chromosomes 22 remaining germline. If the first rearrangement is unproductive, the other chromosome 2, then the chromosome 22 will be rearranged until a productive rearrangement allows the synthesis of a light chain. Thus, in a B cell which produces antibodies, generally only one chromosome 14 is productively rearranged (expressing a productive H-mu chain) whereas only one chromosome 2 or 22 is productively rearranged (expressing a productive L-kappa or L-lambda chain). IG genes on the other chromosomes are either germline, or rearranged but unproductive, or deleted.
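The usual chronology of the light chain rearrangements can be sketched in Python as an ordered series of attempts that stops at the first productive rearrangement. This is a deliberately simplified model: the strict order (two chromosomes 2, then two chromosomes 22), the chromosome labels and the function name are assumptions for illustration, and secondary rearrangements or deletions are ignored.

```python
# Usual order of light chain V-J rearrangements (simplified assumption):
# IGK locus on both chromosomes 2 first, then IGL locus on chromosomes 22.
REARRANGEMENT_ORDER = [
    ("IGK", "chromosome 2 (first)"),
    ("IGK", "chromosome 2 (second)"),
    ("IGL", "chromosome 22 (first)"),
    ("IGL", "chromosome 22 (second)"),
]


def light_chain_outcome(productive):
    """Given the outcome (True = productive) of each successive attempt,
    return the light chain type expressed and the chromosomes rearranged.

    Chromosomes that were not attempted remain germline in this model.
    """
    attempted = []
    for (locus, chromosome), ok in zip(REARRANGEMENT_ORDER, productive):
        attempted.append(chromosome)
        if ok:
            chain = "L-kappa" if locus == "IGK" else "L-lambda"
            return chain, attempted
    return None, attempted  # no productive light chain rearrangement


print(light_chain_outcome([True]))
# ('L-kappa', ['chromosome 2 (first)'])
print(light_chain_outcome([False, False, True])[0])
# L-lambda
```

In the second call both chromosomes 2 are unproductively rearranged before a chromosome 22 yields a productive L-lambda chain, which corresponds to the excluded IGK locus observed in B cells synthesising a lambda chain.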

Regulation of the IG Gene Expression: Enhancers
In order to synthesize complete IG heavy and light chains, the rearranged IGHV-D-J and IGKV-J (or IGLV-J) genes are transcribed with the IGHM gene and with the IGKC (or one IGLC) genes, respectively. The transcription level, low in the first B cell development stages, becomes very high in the plasma cells which result from the clonal proliferation [161]. The V genes possess a promoter sequence in 5′ of L-PART1 (exon encoding the first part of the leader peptide) [162] and can be transcribed before they rearrange. These germline transcripts correspond to an opening, and therefore to better accessibility of the chromatin before the rearrangements. However, that transcription remains low. The C genes can also be transcribed from promoter sequences located upstream of the J genes but again the level of transcription is low and the transcripts are degraded in the nucleus [163].
Murine and human IG transcription enhancers were the first enhancers described in the DNA of eukaryotic cells [164-167]. By different approaches, several groups simultaneously showed that a DNA segment located between the most 3′ IGHJ and the switch Smu sequence is not only able to increase transcription, but also possesses the properties of the enhancers previously described in viruses. These enhancers (i) activate transcription whatever their orientation and their position (in 5′ or 3′) relative to the gene promoter, (ii) only activate the promoters located in cis, that is on the same chromosome, and (iii) activate in vitro genes other than those with which they are normally associated in vivo and increase their transcription, even when located at a distance of several kb [168].
The presence of an enhancer in the Homo sapiens IG loci has been demonstrated between the most 3′ IGHJ and the IGHM gene [169], and between the most 3′ IGKJ and the IGKC gene [170,171]. When an IGHV-D-J or IGKV-J rearrangement occurs, the promoter sequence in 5′ of the IGHV or IGKV genes is not modified, but this promoter is now closer to the enhancer sequences located in 3′ of the IGHJ or IGKJ genes. By decreasing the distance between the V gene promoter and the enhancer, the IGHV-D-J and IGKV-J rearrangements allow the interaction of factors binding to these sequences, and consequently an increased transcription of the IGHV-D-J-Cmu and IGKV-J-Ckappa transcripts.

During the switch recombination, the IGH enhancer, being localized at more than 1 kb upstream from the Smu sequence, is retained in the locus and can therefore be used for the expression of all the heavy chain classes and subclasses. A second enhancer has been described in 3′ of the IGKC gene [172]. Two 3′ enhancers have been characterized in the Homo sapiens IGH locus, one downstream of IGHA1, and another one downstream of IGHA2, within 25 kb of each gene, respectively [173]. These enhancers were duplicated along with part of the IGH locus [114,174-176], a duplication which occurred between about 30 and 60 million years ago. An enhancer has also been localized in the Homo sapiens IGL locus in 3′ of IGLC7, the most 3′ IGLC gene [177]. This enhancer consists of three modules located 6, 9.8 and 11.7 kb downstream of IGLC7 [178].

IgM
IgM represents about 10% of total serum immunoglobulins in human and is largely confined to the intravascular pool. It exists almost exclusively as a polymeric form (pentamer) made of five monomer units associated with the polypeptide designated as J (joining) chain [179]. IgM is the predominant antibody produced early in the immune response. Pentameric IgM is decavalent with small antigens but only pentavalent with larger antigens, presumably due to steric hindrance (Table 2).
A disulfide bridge connects the H-mu chains between CH2 and CH3. Disulfide bridges between the CH3 and the tailpieces of the different monomers are involved in the IgM polymerization. A single J chain is present per IG. This amounts to 1.5% of pentameric IgM. The conserved features of the J chain (16 kDa) are the presence of an N-glycosylation site Asn-Ile-Ser and of eight cysteines. Six of the cysteines form three intradisulfide bridges, and two are linked to the penultimate cysteine of two H-mu chains. The five 'Fc' (or paired CH3 and CH4 of each monomer) are arranged into a planar pentamer (Fc5) (IMGT/3Dstructure-DB PDB 2rcj) [180]. Electron microscope studies have revealed that uncomplexed IgM has a planar and "star" conformation with the 10 Fab arms protruding out from the Fc5. On binding to an antigenic surface, the F(ab')2 dislocate out of the plane of the central Fc5 disc, giving a "staple" or "crab-like" ("table-like") conformation [104]. In this latter conformation, IgM is a very efficient activator of the classical complement pathway. C1q interacts directly with the CH3 domain of the IgM.

IgD
IgD represents less than 1% of total serum immunoglobulins. IgD has a long and extended hinge region of 58 amino acids encoded by two exons, which allows great flexibility in the relative position of the two Fab arms. The hinge N-terminal half of 34 AA encoded by the first exon is heavily O-glycosylated (four to seven oligosaccharides). The hinge C-terminal half of 24 AA encoded by the second exon is rich in charged amino acids (2 Arg, 6 Lys, 9 Glu) and is very susceptible to proteolytic attack, which makes serum IgD unstable (Table 2).

IgG
IgG is the major antibody class in normal human serum forming about 70% of the total immunoglobulins. IgG is a monomer and is evenly distributed between intravascular and extravascular pools. IgG is the predominant antibody of the secondary immune responses. There are four subclasses in humans.
The effector molecules binding IgG Fc are C1q, the Fcgamma receptors (FcγR) present on the surface of many cells of the immune system, the neonatal Fc receptor (FCGRT, FcRn) which transports maternal IgG to the foetus, and the bacterial Fc receptors, protein A and protein G, which are believed to mask the bacteria through immobilized immunoglobulins on their surface.
C1q interacts directly with the CH2 domain of IgG. Binding to monomeric IgG is weak, but when several IgG bind to, and effectively aggregate at an antigenic surface, two or more C1q heads may bind simultaneously leading to tighter binding and activation of the complement cascade. There are marked differences in ability to activate complement: IgG1 and IgG3 activate well, IgG2 less well, IgG4 not at all (Table 2).
Three classes of human FcγR have been described: FcγRI (CD64) are receptors with high affinity for IgG Fc and possess three C-like extracellular domains, FcγRII (CD32) and FcγRIII (CD16) are receptors with lower affinity and possess two C-like domains [181]. FcγRI, FcγRII and FcγRIII on macrophages and Natural Killer (NK) cells mediate antibody-dependent cell-mediated cytotoxicity (ADCC) and phagocytosis, whilst FcγRI, FcγRII and possibly FcγRIII on neutrophils are able to trigger release of activated oxygen species. The cellular responses also comprise endocytosis, enhanced antigen presentation and regulation of the antibody production, depending on the particular FcγR isoform and the type of cell [181]. FcγRI (CD64) displays high affinity for monomeric human IgG1 and IgG3, whilst the affinity for human IgG4 is about 10-fold lower and human IgG2 does not bind. The human FcγRII (CD32) binds IgG1 and IgG3. IgG4 does not bind, whilst the binding of IgG2 is controlled by an allotypic determinant in certain forms of the receptor. FcγRI and FcγRII appear to recognize overlapping but non-identical sites in the lower hinge region of IgG.
The crystal structures of the Fc of human IgG1 in complex with Staphylococcus aureus protein A [182], or in complex with streptococcal protein G [183], and that of Fc of rat in complex with FCGRT (FcRn) [184] revealed binding sites at the interface between the CH2 and CH3 Fc domains. The crystal structure of the human IgG1 Fc fragment-FcγRIII complex shows that FcγRIII binds to the two CH2 domains and lower hinge of the Fc [185].

IgA1 and IgA2
IgA forms about 15-20% of total serum immunoglobulins where it occurs as a monomer. IgA is the predominant immunoglobulin in seromucous secretions such as saliva, tracheobronchial secretions, colostrum, milk and genitourinary secretions, where it is found in a dimeric form known as secretory IgA (sIgA). There are two subclasses of IgA, with IgA1 being the predominant (80-90%) subclass in serum. In contrast to serum IgA, secretory IgA shows roughly equal proportions of the two subclasses. The two IgA subclasses differ in the hinge: IgA1 has an effective structural hinge of 19 amino acids containing eight potential glycosylation sites, whereas IgA2 has a short structural hinge of six amino acids including five prolines which, by its nature, is resistant to proteolysis. A further peculiarity of IgA2 is that for most molecules (allotype A2m1), the light chain is disulfide bridged, not to the heavy chain but to the light chain of the other Fab unit. The CH2 domain of both IgA subclasses has seven cysteines. Two are involved in the usual intradomain disulfide bridge, another two in a second intradomain bridge and one is thought to be free, possibly for interaction with secretory component. The remaining two form inter-H disulfide bridges. There is a further intradomain disulfide linkage in CH1 in addition to the conserved domain disulfide.

Secretory IgA
The dimer involves J chain (16 kDa) and another polypeptide known as secretory component (SC) (70 kDa). Selective transport of polymeric IgA through epithelial cells depends on the incorporation of the J chain into the polymers. Two of the J chain cysteines are linked to the penultimate cysteine of the alpha chains. The J chain, which was identified initially in human IgA [186] amounts to 4% of dimeric human IgA. The SC, unlike IG and J chain which are produced by plasma cells, is synthesized in epithelial cells. With extra segments to attach it to the epithelial cell membrane, SC serves as a receptor for polymeric IG (poly-IG) containing J chain, i.e., IgA (or IgM). After endocytosis and transport, cleavage of the poly-IG/poly-IG receptor complex releases poly-IG (poly-IG with the J-chain) associated with SC. This process is particularly important for secretory IgA release. The poly-IG receptor is composed, in its poly-IG binding portion (i.e., SC) of five highly conserved C-like domains of approximately 100 amino acids. SC (70 kDa) probably interacts non-covalently with the Fc and J chain and forms a single disulfide bridge to one of the monomers of dimeric IgA.

IgA Effector Function
IgA can activate the alternative complement pathway and bind to specific FcαR. FcαR is present on monocytes, macrophages, neutrophils and eosinophils and can mediate ADCC, phagocytosis and degranulation. FcαR has two extracellular C-like domains and spans the membrane once. FcαR binds at a site in the IgA CH2 domain.

IgE
IgE [187-189], though a trace IG in serum, is found bound to receptors, specific for the IgE Fc, on the cell surface of mast cells and basophils in all individuals. IgE is involved in protection against helminthic parasites [190] but is most commonly associated with atopic allergies. The ability of IgE-Fc to undergo conformational changes is critical for IgE function [191]. IgE binds to two principal receptors, FCER1 (FcεRI, tetrameric IgE Fc receptor I), the "high affinity" receptor for the IgE Fc, on the surface of mast cells and basophils, and FCER2 (IgE Fc receptor II, FcεRII, CD23), the "low affinity" receptor for IgE Fc, a Ca2+-dependent C-type lectin and a B cell specific antigen [192].
FCER1 is tetrameric and consists of an alpha chain (FCER1A, IgE Fc binding site), a beta chain (FCER1B, which amplifies the signal), and two disulfide-linked gamma chains (FCER1G, where the downstream signal initiates). FCER1 is expressed on tissue mast cells, blood basophils, airway epithelial and smooth muscle cells and intestinal epithelial cells [193,194]. Aggregation of the receptors by binding of multivalent antigens, such as pollen, to prebound IgE results in cell degranulation and release of pre-formed mediators of inflammation, causing an allergic response and an immediate hypersensitivity response that, if intense, can cause anaphylactic shock and even death. The FCER1A binding site on IgE involves CH3 (next to the interface between CH2 and CH3). The crystal structure of the human IgE Fc and its high-affinity FCER1A reveals that the receptor binds one Fc asymmetrically. The CH3 of each chain of the Fc is bound to two different sites of the C-like domain [D2] of FCER1A (IMGT/3Dstructure-DB, PDB code: 1f6a) [195]. The IgE Fc is highly flexible, adopting an acutely bent conformation when unbound (IMGT/3Dstructure-DB, PDB code: 2y7q) [196], a partially bent conformation in a complex with omalizumab Fab (IMGT/3Dstructure-DB, PDB code: 5g64) [197], and a fully extended conformation in a complex with aεFab (IMGT/3Dstructure-DB, PDB code: 4j4p) [198] and with the 8D6 Fab (IMGT/3Dstructure-DB, PDB code: 6eyo) [199].
FCER2 (FcεRII, CD23), the low-affinity receptor for IgE, is present on monocytes, B cells, T cells, gut and airway epithelial cells, and plays a role in cytotoxicity against parasites such as schistosomes. FCER2 has essential roles in B cell growth and differentiation, and the regulation of IgE production [200-203]. It also exists as a soluble secreted form [204], then functioning as a potent mitogenic growth factor. The interaction between IgE and FCER2 appears to require the presence of the IgE CH2, CH3 and CH4 domains, the latter serving to promote the dimerization of the two epsilon chains, necessary for receptor binding. The crystal structure of IgE bound to its B cell FCER2 reveals a mechanism of reciprocal allosteric inhibition with the high affinity receptor FCER1 (IMGT/3Dstructure-DB, PDB code: 4gko) [205].
Antibodies classically bind antigens via their complementarity determining regions, but an alternative mode of interaction involving V-domain framework regions has been observed for some B cell "superantigens". The crystal structure of an antibody from an allergic individual, binding to the grass pollen allergen Phl p 7, has shown that both modes of interaction were employed simultaneously, with binding of two antigen molecules (IMGT/3Dstructure-DB, PDB code: 5otj) [206].

IMGT ® Standardized Genes and Alleles (Classification)
3.1.1. IG and TR Genes and Concepts of Classification: Birth of IMGT® and Immunoinformatics
The creation of IMGT® in 1989 by Marie-Paule Lefranc (LIGM, UM, CNRS), during the 10th Human Genome Mapping Workshop (HGM10, New Haven, CT, USA, 11-17 June 1989) gave birth to immunoinformatics, a new science at the interface between immunogenetics and bioinformatics [1]. Indeed, for the first time, immunoglobulin (IG) or antibody and T cell receptor (TR) variable (V), diversity (D), joining (J), and constant (C) genes were officially recognized as "genes" as well as were the conventional genes, with the entry of all genes of the Homo sapiens TRG locus in the HGM database [207,208]. This major breakthrough allowed IG and TR genes and alleles of the complex and highly diversified adaptive immune responses to be managed in genomic databases and tools. IMGT® gene and allele names are based on the concepts of classification of 'Group', 'Subgroup', 'Gene' and 'Allele' [37,38,84,85]. 'Group' classifies a set of genes which belong to the same multigene family, within the same species or between different species. For example, there are 10 groups for the IG of higher vertebrates: IGHV, IGHD, IGHJ, IGHC, IGKV, IGKJ, IGKC, IGLV, IGLJ, IGLC. 'Subgroup' identifies a subset of genes which belong to the same group, and which, in a given species, share at least 75% identity at the nucleotide level, e.g., Homo sapiens IGHV1 subgroup. Subgroups, genes and alleles are always associated with a species name [84,85]. An allele is a polymorphic variant of a gene, which is characterized by the mutations of its sequence at the nucleotide level, identified in its core sequence (V-REGION, D-REGION, J-REGION, C-REGION) and compared to the gene allele reference sequence, designated as allele *01.
For example, Homo sapiens IGHV1-2*01 is the allele *01 of the Homo sapiens IGHV1-2 gene that belongs to the Homo sapiens IGHV1 subgroup which itself belongs to the IGHV group [84,85] (Figure 13). For the IGH locus, the constant genes are designated by the letter (and, where applicable, the number) corresponding to the encoded isotypes (IGHM, IGHD, IGHG3, ...), instead of using the letter C. IG and TR genes and alleles are not italicized in publications.
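The hierarchy 'Group', 'Subgroup', 'Gene' and 'Allele' can be read directly from an allele name such as IGHV1-2*01. The following Python sketch is a minimal illustration and not the IMGT® implementation: the regular expression, the class and the function names are assumptions, and the pattern only covers IG variable gene names of the simple form shown in the example (it does not handle D, J or C genes, nor other naming variants).

```python
import re
from typing import NamedTuple


class IGAlleleName(NamedTuple):
    group: str      # e.g. IGHV
    subgroup: str   # e.g. IGHV1
    gene: str       # e.g. IGHV1-2
    allele: str     # e.g. *01


# Illustrative pattern: group (IGHV, IGKV or IGLV), subgroup number,
# hyphen, gene-specific part, then '*' and a two-digit allele number.
_NAME = re.compile(r"^(IG[HKL]V)(\d+)-([0-9A-Za-z-]+)\*(\d{2})$")


def parse_allele_name(name):
    """Split an IG variable allele name into its classification levels."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unsupported allele name: {name!r}")
    group, subgroup_number, _gene_part, allele_number = match.groups()
    gene = name.split("*")[0]
    return IGAlleleName(group, group + subgroup_number, gene, "*" + allele_number)


print(parse_allele_name("IGHV1-2*01"))
# IGAlleleName(group='IGHV', subgroup='IGHV1', gene='IGHV1-2', allele='*01')
```

A species name (here Homo sapiens) would be stored alongside the parsed name, since subgroups, genes and alleles are always associated with a species.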
IMGT human IG and TR gene names have been integrated in the CLASSIFICATION axiom [37,38] of IMGT-ONTOLOGY [29] (IMGT ® http://www.imgt.org, IMGT Index > IMGT-ONTOLOGY), on the NCBO BioPortal and, on the same site, in the HUGO ontology and in the National Cancer Institute (NCI) Metathesaurus. Since 2006, IMGT gene and allele names have been used for the description of the therapeutic mAb and FPIA from the WHO-INN programme [86,87]. Amino acid sequences of the IMGT human IG and TR constant genes (e.g., Homo sapiens IGHM, IGHG1, IGHG2) were provided to UniProt in 2008, and those of the IMGT human IG variable genes with their IMGT gene definition (e.g., P23083, Homo sapiens IGHV1-2, Immunoglobulin heavy variable 1-2), in 2016. There are reciprocal direct links between the gene entries of IMGT/GENE-DB [56], the IMGT ® gene database, and HGNC, NCBI Gene, Ensembl, Vega and UniProt.

Homo sapiens IG Genes and Concepts of Identification and Description
To date (July 25, 2020), 465 IG genes have been identified in Homo sapiens, 389 of them in the major loci and 76 in orphon sets [56] (Table 5). Genes of the major loci participate in the IG chain synthesis whereas those of the orphon sets do not. Given the complexity of the IG synthesis, which generates a huge diversity of sequences, thirty-two molecular entity types have been defined. They are identified with IMGT standardized keywords [30], based on the "GeneType", "ConfigurationType", and "MoleculeType" which define them (IDENTIFICATION axiom [30,31]). The ten most classical IG and TR entity types are shown in Table 6. To each molecular entity type corresponds a molecular entity prototype. These prototypes are described with IMGT standardized labels [34] (DESCRIPTION axiom [34,35]). The V-GENE and V-D-J-GENE are shown as examples (Figure 14). Other prototypes are available on the IMGT® web site (IMGT® http://www.imgt.org, IMGT Scientific chart > 1. Sequence and 3D structure identification and description > IMGT prototypes table). The ten molecular entity types are shown within the IG synthesis in Figure 15, bridging genes, sequences and proteins.

Table 6. The ten most classical IG and TR entity types and associated prototypes. Entity types are identified with IMGT standardized keywords [30] (IDENTIFICATION axiom [30,31]). Prototypes are described with IMGT standardized labels [34] (DESCRIPTION axiom [34,35]).

[Table 6 column headings: Configuration Type; Molecule Type; Functionality; Molecular Entity Prototype]
The functionality of undefined and germline entities is functional (F), open reading frame (ORF) or pseudogene (P) [30]. The functionality of rearranged entities is productive or unproductive [30] (Table 6).
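The rule relating the configuration type of an entity to its allowed functionality can be written as a small lookup. This Python sketch only encodes the rule stated above; the dictionary, the lowercase configuration labels and the function name are illustrative assumptions and do not reproduce the IMGT standardized keyword lists.

```python
# Allowed functionality values per configuration type, as stated in the text.
ALLOWED_FUNCTIONALITY = {
    "undefined": {"F", "ORF", "P"},
    "germline": {"F", "ORF", "P"},
    "rearranged": {"productive", "unproductive"},
}


def is_valid_functionality(configuration_type, functionality):
    """Check that a functionality value is consistent with the
    configuration type of a molecular entity (illustrative helper)."""
    if configuration_type not in ALLOWED_FUNCTIONALITY:
        raise ValueError(f"unknown configuration type: {configuration_type!r}")
    return functionality in ALLOWED_FUNCTIONALITY[configuration_type]


print(is_valid_functionality("germline", "ORF"))           # True
print(is_valid_functionality("rearranged", "productive"))  # True
print(is_valid_functionality("rearranged", "F"))           # False
```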

IGHC Multigene Deletions and Gene Order, IGHC and IGHV Copy Number Variation (CNV) Haplotypes
IGHC haplotypes have been identified in Homo sapiens, which correspond to IGHC copy number variations (IGHC CNV) with absence of the corresponding IG classes and subclasses in healthy individuals having IGHC deletions on both chromosomes (either homozygous for the same deletion, or heterozygous for two different deletions). These IGHC CNV deletions are designated I to VI according to the chronological order in which they were found (Figure 17).
The first two identified deletions, deletion I (del G1-EP1-A1-GP-G2-G4) [113,114] and deletion II (del EP1-A1-GP) [115], allowed ordering of the Homo sapiens IGHC genes by determining the relative positions of two cosmids [174,175]. Deletion III (del A1-GP-G2-G4-E) [116,238] includes the IGHE gene and corresponds to the complete absence of the IgA1, IgG2, IgG4 subclasses and of the IgE class.

Figure 16. Representation of the human IGH locus at 14q32.33 [2]. The boxes representing the genes are not to scale. Exons are not shown. Switch sequences are represented by a filled circle upstream of the IGHC genes. Pseudogenes which could not be assigned to subgroups with functional genes are designated by a Roman numeral between parentheses, corresponding to the clans, followed by a hyphen, and a number for the localization from 3′ to 5′ in the locus [2]. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
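The relation between the IGHC deletions carried on the two chromosomes 14 and the classes and subclasses that are absent can be sketched in Python. Only deletions I to III, whose gene content is given in the text, are included; the data structures and the function name are illustrative assumptions. An isotype is considered absent when its gene is deleted on both chromosomes, which covers both the homozygous and the heterozygous cases.

```python
# Gene content of the IGHC deletions described in the text
# ("IGH" prefix added to the abbreviated gene names).
DELETIONS = {
    "I": ["IGHG1", "IGHEP1", "IGHA1", "IGHGP", "IGHG2", "IGHG4"],
    "II": ["IGHEP1", "IGHA1", "IGHGP"],
    "III": ["IGHA1", "IGHGP", "IGHG2", "IGHG4", "IGHE"],
}

# Class or subclass encoded by each functional gene; the pseudogenes
# IGHEP1 and IGHGP encode no chain and are therefore not listed.
ISOTYPE = {
    "IGHG1": "IgG1", "IGHA1": "IgA1", "IGHG2": "IgG2",
    "IGHG4": "IgG4", "IGHE": "IgE",
}


def absent_isotypes(deletion_chr_a, deletion_chr_b):
    """Return the classes and subclasses absent in an individual who
    carries one IGHC deletion on each chromosome 14 (illustrative)."""
    lost_on_both = set(DELETIONS[deletion_chr_a]) & set(DELETIONS[deletion_chr_b])
    return sorted(ISOTYPE[gene] for gene in lost_on_both if gene in ISOTYPE)


print(absent_isotypes("III", "III"))  # ['IgA1', 'IgE', 'IgG2', 'IgG4']
print(absent_isotypes("I", "II"))     # ['IgA1']
```

For an individual homozygous for deletion III the sketch returns IgA1, IgG2, IgG4 and IgE, as described for that deletion.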

The IGHV cluster comprises several CNV reported in Figure 17B. As an example, the GRCh38 genome assembly, from the hydatidiform mole CHORI-17 BAC library, corresponds to a new haplotype (haplotype B) in the highly polymorphic region by insertion/deletion between IGHV4-34 and IGHV4-28 [247] (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 2. Locus representations > IGH Locus representation: Human (Homo sapiens) Polymorphism by insertion/deletion between IGHV4-34 and IGHV4-28 (haplotypes A to F)). In addition, eight CNV-containing haplotypes were identified from a panel of nine diploid genomes of diverse ethnic origin [247]. These polymorphisms confirm the diversity of genomic IGH alleles and/or CNV polymorphisms identified in extensive studies of different populations, as restriction fragment length polymorphism (RFLP) alleles [117,248-251].

IGH Orphons
Thirty-five IGH genes have been found outside the main locus in other chromosomal localizations. These genes, designated as orphons, cannot contribute to the synthesis of the immunoglobulin chains, even if they have an ORF. Nine IGHV orphons and 10 IGHD orphons have [...]
Figure 17. (A) IGHC CNV deletions, either identical or different, on both chromosomes, designated I to VI according to the chronological order in which they were found. Deletion I, first identified by the absence of the Gm1 allotypes in a 70-year-old healthy Tunisian woman (TAK3), homozygous for that deletion [113,114], allowed the ordering of the Homo sapiens IGHC genes in the IGH locus [174,175]. Deletions I and II [113-115], found in healthy individuals from consanguineous families, involve highly homologous spots of recombination [176], as also described in a healthy individual (T17) homozygous for deletion III and lacking IgA1, IgG2, IgG4 and IgE [116]. (B) Polymorphisms by insertion/deletion between IGHV4-34 and IGHV4-28 (haplotypes A to F). The distance between IGHV4-34 and IGHV4-28 (sequence length shown by the regular line) is indicated in kilobases (kb) for each haplotype. Haplotype A is from GRCh37 and corresponds to the main line of IMGT Locus Representation [2]. Haplotype B is from GRCh38 and corresponds to BAC clone sequences [247] from the CHORI-17 BAC library. Dotted lines indicate missing genes compared to the haplotype C. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

Potential Homo sapiens IGH Genomic Repertoire
The potential genomic IGH repertoire per haploid genome comprises 38 to 46 functional IGHV genes belonging to six or seven subgroups depending on the haplotypes [2,237], 23 IGHD, 6 IGHJ and, in the most frequent haplotype, nine IGHC genes (Table 7). The total number of human IGH genes per haploid genome in the major locus is 170-176, of which 76-84 genes are functional.
The potential repertoire of the Homo sapiens IGH V-CLUSTER is shown in Table 7A and that of the IGH D-J-C-CLUSTER in Table 7B. The IGHV subgroups (Table 7A) (Table 7B) are listed according to the gene order. All the IGHD genes have at least one in-frame reading frame without stop codons; however, four genes are assigned to ORF owing to an unusual 5′ D-HEPTAMER. Only the six functional IGHJ genes are shown. The IGHEP1 (4 P) and the IGHGP (2 ORF, 1 P) are listed owing to the structural organization of the IGHC genes in two duplicated clusters. Gene order is according to the IMGT Locus gene order (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > IGH).

Table 7. Cont. D-J-C-CLUSTER. Columns: IMGT gene names; Nb of alleles F; Nb of alleles ORF and P; Gene order.
The potential genomic IGK repertoire per haploid genome comprises 31-35 functional IGKV genes belonging to five subgroups, the five IGKJ genes and the unique IGKC gene [2,264]. One rare IGKV haplotype has been described which contains only the proximal cluster. This haplotype comprises the 40 proximal IGKV genes belonging to seven subgroups, of which 17-19 are functional and belong to five subgroups [2]. If both the proximal and distal IGKV clusters are present, the total number of human IGK genes per haploid genome is 82, of which 37 to 41 are functional. If only the proximal IGKV cluster is present, the total number of genes per haploid genome is 46, of which 23-25 genes are functional [2].
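The functional totals quoted for the IGH and IGK loci can be checked by adding the per-gene-type ranges given in the text. The following is a minimal sketch, assuming that each functional total is the simple sum of the functional V, D, J and C counts as quoted; the helper name `add_ranges` is hypothetical.

```python
def add_ranges(*ranges: tuple[int, int]) -> tuple[int, int]:
    """Add (minimum, maximum) ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


# Functional gene counts per haploid genome, as quoted in the text.
igh = add_ranges((38, 46), (23, 23), (6, 6), (9, 9))  # IGHV, IGHD, IGHJ, IGHC
igk = add_ranges((31, 35), (5, 5), (1, 1))            # IGKV, IGKJ, IGKC
igk_proximal_only = add_ranges((17, 19), (5, 5), (1, 1))

# Total gene count for the rare proximal-only IGK haplotype: 40 IGKV + 5 IGKJ + 1 IGKC.
igk_proximal_total = 40 + 5 + 1

if __name__ == "__main__":
    print("IGH functional:", igh)
    print("IGK functional:", igk)
    print("IGK functional (proximal cluster only):", igk_proximal_only)
    print("IGK total (proximal cluster only):", igk_proximal_total)
```

The sums reproduce the ranges given in the text: 76-84 for IGH, 37-41 for IGK with both clusters, and 23-25 of 46 genes when only the proximal IGKV cluster is present.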

IGK Orphons
Twenty-five IGKV orphons have been identified and sequenced: two on the short arm of chromosome two but outside of the major IGK locus, 12

Potential Homo sapiens IGK Genomic Repertoire
The potential repertoire of the Homo sapiens IGK V-CLUSTER is shown in Table 8A and that of the IGK J-C-CLUSTER in Table 8B. The IGKV subgroups (Table 8A) are listed per subgroup and inside each subgroup in an ascending numerical order, first the proximal cluster, then the distal cluster. Only the IGKV subgroups with at least one functional allele are shown. The CDR-IMGT lengths (in number of AA or codons) are shown between square brackets, with lengths of the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT separated by dots. The allele functionality is indicated by F: functional, [F]: when the accession number refers to genomic DNA, but not known as being germline or rearranged, ORF: open reading frame, P: pseudogene. In the column 'Nb of alleles P', the number of pseudogenes with V-REGION in-frame and the number of pseudogenes with frameshift(s) are shown separated by a dot, between parentheses. Seven novel IGKV alleles were characterized [265] following the sequencing of the IGK locus from the CH17 hydatidiform mole BAC library. Gene order is according to the IMGT Locus gene order (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > IGK).
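As an illustration of the CDR-IMGT length notation described above (three lengths between square brackets, separated by dots), the sketch below parses such a label into its three components. The function name `parse_cdr_lengths` and the example label '[6.3.9]' are hypothetical and are not taken from Table 8.

```python
def parse_cdr_lengths(label: str) -> dict[str, int]:
    """Parse a CDR-IMGT length label such as '[6.3.9]'.

    The three dot-separated numbers are the lengths (in amino acids or
    codons) of the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT, in that order.
    """
    text = label.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected a label between square brackets: {label!r}")
    parts = text[1:-1].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated lengths: {label!r}")
    names = ("CDR1-IMGT", "CDR2-IMGT", "CDR3-IMGT")
    return {name: int(part) for name, part in zip(names, parts)}


if __name__ == "__main__":
    print(parse_cdr_lengths("[6.3.9]"))
```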

Organization of the Homo sapiens IGL Locus
The Homo sapiens IGL locus is located on chromosome 22 [266], on the long arm, at band 22q11.2 [2]. The human IGL locus spans 1050 kb [2] (Figure 19). The orientation of the locus has been determined by the analysis of translocations, involving the IGL locus, in leukemia and lymphoma [267]. Sequencing of the long arm of chromosome 22 showed that it encompasses about 35 megabases of DNA and that the IGL locus is localized at six megabases from the centromere [268]. Although the correlation between DNA sequences and chromosomal bands had not yet been made, the localization of the IGL locus could be refined at 22q11.2. The human IGL locus consists of 73-74 IGLV genes [237,269-273], localized on 900 kb, seven to 11 IGLJ and seven to 11 IGLC genes depending on the haplotypes, each IGLC gene being preceded by one IGLJ gene [274-277]. Fifty-six to 57 genes belong to 11 subgroups, whereas 17 pseudogenes, which are too divergent to be assigned to subgroups, have been assigned to the clans. The potential genomic IGL repertoire per haploid genome comprises 29-33 functional IGLV genes in the 7-IGL gene haplotype [2,278,279]. One, two, three or four additional IGLC genes, each one probably preceded by one IGLJ, have been shown to characterize IGLC haplotypes with eight, nine, 10 or 11 genes [280,281], but these genes have not yet been sequenced. The total number of human IGL genes per haploid genome is 87-96, of which 37-43 are functional [2].
Figure 19. [...] [271]. Pseudogenes which could not be assigned to subgroups with functional genes are designated by a Roman numeral between parentheses, corresponding to the clans, followed by a hyphen, and a number for the localization from 3′ to 5′ in the locus. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

Potential Homo sapiens IGL Genomic Repertoire
The potential repertoire of the Homo sapiens IGL V-CLUSTER is shown in Table 9A and that of the IGL J-C-CLUSTER in Table 9B. The IGLV subgroups (Table 9A) are listed per subgroup and, inside each subgroup, in an ascending numerical order. Only the IGLV subgroups with at least one functional allele are shown. The CDR-IMGT lengths (in number of AA or codons) are shown between square brackets, with lengths of the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT separated by dots. The allele functionality is indicated by F: functional, ORF: open reading frame, P: pseudogene. In the column 'Nb of alleles P', the number of pseudogenes with V-REGION in-frame and the number of pseudogenes with frameshift(s) are shown separated by a dot, between parentheses. IGLV alleles increase the diversity of the lambda light chain repertoire in the human population [284-287]. Four novel IGLV alleles and one IGLC allele were characterized [265] following the sequencing of the IGL locus from the CH17 hydatidiform mole BAC library. Limited CNV polymorphism by insertion and/or deletion seems to indicate that the V-CLUSTER of the human IGL locus has undergone less evolutionary shuffling than the human IGH or IGK loci [284]. However, CNV with a variable number of additional J-C cassettes, from one to four, have been identified by Southern blot analysis. These additional J-C cassettes are localized between the J2-C2 and J3-C3 cassettes. Gene order is according to the IMGT Locus gene order (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and Genes > 3. Locus descriptions > Locus gene order > IGL).

The V domain strands and loops and their delimitations and lengths, based on the IMGT unique numbering for V domain [39-41,44-46], are shown in Table 10.
In the IG and TR V-DOMAIN, the three hypervariable loops BC, C'C" and FG involved in the ligand recognition (native antigen for IG and pMH for TR) are designated complementarity determining regions (CDR-IMGT), whereas the strands form the framework region (FR-IMGT), which includes FR1-IMGT, FR2-IMGT, FR3-IMGT and FR4-IMGT (Table 10). Correspondences between the IMGT unique numbering for V-DOMAIN [39-41] with other numberings, e.g., Kabat [118], or canonical structures [288-290], are available in the IMGT Scientific chart (IMGT® http://www.imgt.org, IMGT Scientific chart > Numbering > Correspondence between V numberings).
[...] [40,41] or C domain [42]. Arrows indicate the direction of the beta strands and their designations in 3D structures. Anchors are shown in squares. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).

V-DOMAIN Conserved Amino Acids
A V domain has five characteristic amino acids at given positions (positions with bold (online red) letters in the IMGT Colliers de Perles). Four of them are highly conserved and hydrophobic [81] and are common to the C domain: 23 (1st-CYS), 41 (CONSERVED-TRP), 89 (hydrophobic) and 104 (2nd-CYS). These amino acids contribute to the two major features shared by the V and C domains: the disulfide bridge (between the two cysteines 23 and 104) and the internal hydrophobic core of the domain (with the side chains of tryptophan W41 and amino acid 89). The fifth position, 118, is an anchor of the FG loop. It is occupied, in the V domains of IgSF other than IG or TR [19], by amino acids with diverse physicochemical properties [81]. In contrast, in IG and TR V-DOMAIN, position 118 is occupied by remarkably conserved amino acids, which consist of a phenylalanine or a tryptophan encoded by the J-REGION and are therefore designated J-TRP or J-PHE 118. The bulky aromatic side chains of J-TRP and J-PHE are internally orientated and structurally contribute to the V-DOMAIN hydrophobic core [40,41].
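The five characteristic positions of an IG or TR V-DOMAIN can be verified programmatically once a sequence has been numbered according to the IMGT unique numbering. The sketch below is a minimal illustration, not an IMGT tool. It assumes that the residues are supplied as a mapping from IMGT position to one-letter amino acid, and that 'hydrophobic' at position 89 means one of I, V, L, F, C, M, A (the amino acids with a positive hydropathy index). The names `EXPECTED` and `check_conserved_positions` are hypothetical.

```python
# Amino acids with a positive hydropathy index (assumption for position 89).
HYDROPHOBIC = set("IVLFCMA")

# Expected amino acids at the five characteristic positions of an IG or TR V-DOMAIN.
EXPECTED = {
    23: {"C"},         # 1st-CYS
    41: {"W"},         # CONSERVED-TRP
    89: HYDROPHOBIC,   # hydrophobic amino acid
    104: {"C"},        # 2nd-CYS
    118: {"F", "W"},   # J-PHE or J-TRP
}


def check_conserved_positions(residues: dict[int, str]) -> dict[int, bool]:
    """Report, for each characteristic position, whether the expected residue is found.

    `residues` maps IMGT positions to one-letter amino acid codes.
    A missing position is reported as False.
    """
    return {
        position: residues.get(position) in allowed
        for position, allowed in EXPECTED.items()
    }


if __name__ == "__main__":
    example = {23: "C", 41: "W", 89: "L", 104: "C", 118: "W"}  # toy input
    print(check_conserved_positions(example))
```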

V-DOMAIN Delimitation
A criterion used in the IMGT® characterization of a V domain is its delimitation taking into account the exon delimitations, whenever appropriate. This IMGT® genomic approach integrates the strands A and G, which are usually absent from structural alignments [45].

C Domain Definition and Main Characteristics
The C domain includes the C-DOMAIN of the IG and of the TR [2,3] and the C-LIKE-DOMAIN of the IgSF other than IG and TR. The C domain description of any receptor, any chain and any species is based on the IMGT unique numbering for C domain (C-DOMAIN and C-LIKE-DOMAIN) [42,44-46]. A C domain comprises about 90-100 amino acids and is made of seven antiparallel beta strands (A, B, C, D, E, F and G), linked by beta turns (AB, DE and EF), a transversal strand (CD) and two loops (BC and FG), and forming a sandwich of two sheets [ABED] [GFC] [42,44-46]. A C domain has a topology and a three-dimensional structure similar to that of a V domain but without the C' and C" strands and the C'C" loop, which is replaced by a transversal CD strand [42].

C Domain IMGT Colliers de Perles
The lengths of the strands and loops are visualized in the IMGT Colliers de Perles [48-52], on one layer (Figure 20B) [...]
[...] left) show, in the forefront, the GFC strands and, in the back, the ABED strands (located at the interface CH1/C-KAPPA), linked by the CD transversal strand. The IMGT Colliers de Perles were generated by the IMGT/Collier-de-Perles tool [51] integrated in IMGT/3Dstructure-DB [57-59]. Hydrogen bonds (green lines online) were automatically added from the experimental structural data. The disulfide bridge (orange line between C23 and C104) was also automatically added from the experimental structural data. Amino acids are shown in the one-letter abbreviation. Positions at which hydrophobic amino acids (hydropathy index with positive value: I, V, L, F, C, M, A) and tryptophan (W) are found in more than 50% of analysed sequences are shown online in blue. All prolines (P) are shown online in yellow. Hatched circles correspond to missing positions according to the IMGT unique numbering for C domain [42]. Arrows indicate the direction of the beta strands and their designations in 3D structures. Anchors are shown in squares. The ribbon representation of the 3D structures (on the right) was obtained using PyMOL (http://www.pymol.org). The identifiers of the chains to which the domains belong are 1n8z_B and 1n8z_A (IMGT® http://www.imgt.org, IMGT/3Dstructure-DB). (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
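The rule used to colour positions in blue (hydrophobic amino acids and tryptophan found in more than 50% of analysed sequences) can be sketched on a small alignment. This is an illustration of the stated rule only, not the IMGT/Collier-de-Perles implementation. It assumes equal-length, identically gapped sequences, counts gaps in the denominator, and reports 0-based column indices rather than IMGT positions. The name `blue_columns` is hypothetical.

```python
# Hydrophobic amino acids (positive hydropathy index) plus tryptophan, as listed in the text.
HYDROPHOBIC_OR_W = set("IVLFCMAW")


def blue_columns(aligned: list[str], threshold: float = 0.5) -> list[int]:
    """Return the 0-based columns where hydrophobic amino acids or W
    occur in more than `threshold` of the aligned sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(sequence) != length for sequence in aligned):
        raise ValueError("all aligned sequences must have the same length")
    total = len(aligned)
    columns = []
    for column in range(length):
        count = sum(1 for sequence in aligned if sequence[column] in HYDROPHOBIC_OR_W)
        if count / total > threshold:  # strictly more than 50% by default
            columns.append(column)
    return columns


if __name__ == "__main__":
    toy_alignment = ["VKC", "LSC", "ATS", "I.C"]  # toy data, '.' is a gap
    print(blue_columns(toy_alignment))
```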

C Domain Strands and Loops
The C domain strands, turns and loops and their delimitations and lengths, based on the IMGT unique numbering for C domain [42,44-46], are shown in Table 12. Correspondences between the IMGT unique numbering with other numberings (Eu, Kabat [118]) are available in the IMGT Scientific chart (IMGT® http://www.imgt.org, IMGT Scientific chart > Numbering > Correspondence between C numberings).
Table 12. C domain strands, turns and loops and IMGT positions and lengths, based on the IMGT unique numbering for C domain (C-DOMAIN and C-LIKE-DOMAIN) [42].

C Domain Conserved Amino Acids
A C domain has five characteristic amino acids at given positions (positions with bold (online red) letters in the IMGT Colliers de Perles). Four of them are highly conserved and hydrophobic [81] and are common to the V domain: 23 (1st-CYS), 41 (CONSERVED-TRP), 89 (hydrophobic) and 104 (2nd-CYS). As mentioned above, these amino acids contribute to the two major features shared by the V and C domains: the disulfide bridge (between the two cysteines 23 and 104) and the internal hydrophobic core of the domain (with the side chains of tryptophan W41 and amino acid 89). The fifth position, 118, is diverse and is characterized as being an FG loop anchor.

C Domain Genomic Delimitation
In IMGT®, the C domains (C-DOMAIN and C-LIKE-DOMAIN) are delimited taking into account the exon delimitation, whenever appropriate. The exon/intron organization of the Homo sapiens IGHC genes shows that each CH C-domain corresponds to one exon [2] (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and genes > 5. Gene exon/intron organization > IGHC > Human). As for the V domain, this IMGT® genomic approach integrates the strands A and G, which are usually absent from structural alignments [45].

C-REGION Protein Displays
The Protein display of the C-REGION of the IGKC and IGLC corresponds to the C-KAPPA and C-LAMBDA domains (Figure 24A,B). The Protein display of the C-REGION of the IGHC genes (Figure 24C) is shown per CH domain and comprises, in addition to the 3 or 4 CH domains, the hinge region for the H-alpha, H-delta and H-gamma chains and, for the membrane IG (mIG), the region CO + TM + CY (encompassing the connecting region (CO), the transmembrane region (TM) and the cytoplasmic region (CY) (Table 1) with delimitation of the exons M or M1 and M2 [2]) and, for the secreted IG (sIG), the CHS (expressed instead of CO + TM + CY).

Figure 24. [...] figure) is located between CH1 and CH2. It is encoded by one exon (in IGHG1, IGHG2, IGHG4, IGHGP genes), two exons (in IGHD gene), two to five exons (in IGHG3 gene, 4 for the most common haplotype, shown here). The hinge region of the IGHA1 and IGHA2 is fused in 5′ to the CH2 domain (encoded in a H-CH2 exon) [2]. The IGHGP, although an ORF, is shown for completeness. The C-KAPPA, C-LAMBDA and CH are numbered according to the IMGT unique numbering for C-DOMAIN and C-LIKE-DOMAIN [42].

IMGT/V-QUEST for Nucleotide Sequence Analysis
IMGT/V-QUEST [61-66] is the IMGT® online tool for IG and TR nucleotide sequence analysis [1]. The entry type corresponds to user nucleotide sequences of V domains (1-50 sequences per analysis), from rearranged gDNA or cDNA. IMGT/V-QUEST identifies the variable (V), diversity (D) and joining (J) genes in rearranged IG and TR sequences and, for the IG, the nucleotide (nt) mutations and amino acid (AA) changes resulting from somatic hypermutations, by comparison with the IMGT/V-QUEST reference directory sets (links available from the IMGT/V-QUEST Welcome page). The IMGT/V-QUEST reference directory sets include IMGT reference sequences (one per allele) from functional (F) genes and alleles, open reading frame (ORF) and pseudogene (P) alleles with in-frame V-REGION. The tool integrates IMGT/JunctionAnalysis [67,68] for the detailed characterization of the V-D-J or V-J junctions, and IMGT/Automat [69,70] for a complete sequence annotation. The IMGT/V-QUEST tool functionalities include: (1) Introduction of IMGT gaps, according to the IMGT unique numbering for V-DOMAIN (Section 4), (2) Identification of the closest V, D and J genes and alleles, according to the IMGT gene and allele nomenclature (Section 3) (e.g., for Homo sapiens [2,4]) (Figure 26A), (3) IMGT/JunctionAnalysis results [67,68] (Figure 26B), (4) Description of mutations and amino acid changes [65].
Users can choose the option 'Search for insertions and deletions in V-REGION' [65] (1-50 sequences per run) in Advanced parameters. The option 'Analysis of single chain Fragment variable (scFv)' [75], available in Advanced functionalities, allows the analysis of long read scFv sequences from combinatorial libraries containing two V-DOMAIN. Customized parameters and results provided by IMGT/V-QUEST and IMGT/JunctionAnalysis have been described elsewhere [61-66].
IMGT/V-QUEST is frequently used by clinicians for the analysis of the somatic hypermutations in leukemia, lymphoma and myeloma, and more particularly in chronic lymphocytic leukemia (CLL) [291-293], in which the percentage of mutations in the patient VH has a prognostic value. The sequences of the V-(D)-J junctions determined by IMGT/JunctionAnalysis are also used in the characterization of stereotypic patterns in CLL and B cell lymphoproliferations [291-294] and for the synthesis of junction-specific probes for the follow-up of residual diseases in leukemias and lymphomas.
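The notion of a percentage of mutations in a V-REGION, relative to the closest germline allele, can be illustrated with a simple position-by-position comparison of two gapped sequences. The sketch below is not the IMGT/V-QUEST algorithm. It assumes that both sequences are already gapped identically, that '.' denotes an IMGT gap, and that positions gapped in either sequence are skipped. The function name `v_region_identity` and the short test sequences are hypothetical.

```python
def v_region_identity(query: str, germline: str) -> tuple[int, int, float]:
    """Compare a gapped query V-REGION with a gapped germline V-REGION.

    Returns (number of nt differences, number of compared positions,
    percentage of identity). Positions where either sequence has a gap
    ('.') are not compared.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be gapped to the same length")
    compared = 0
    differences = 0
    for query_nt, germline_nt in zip(query.lower(), germline.lower()):
        if query_nt == "." or germline_nt == ".":
            continue
        compared += 1
        if query_nt != germline_nt:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    identity = 100.0 * (compared - differences) / compared
    return differences, compared, identity


if __name__ == "__main__":
    # Toy sequences, far shorter than a real V-REGION.
    differences, compared, identity = v_region_identity("cag...gtgcag", "cag...gtacag")
    print(f"{differences} difference(s) over {compared} nt, {identity:.2f}% identity")
```

The percentage of mutations is then 100 minus the percentage of identity. A real analysis would also need to handle ambiguous nucleotides and insertions or deletions, which this sketch ignores.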

IMGT/HighV-QUEST
IMGT/HighV-QUEST basic functionalities for NGS repertoire analysis: IMGT/HighV-QUEST [66,71-74] is the high-throughput version of IMGT/V-QUEST. It is so far the only online portal available for the direct analysis of long IG and TR sequences from next generation sequencing (NGS) [1]. The submitted entries are user long read nucleotide sequences of V domains (1,000,000 sequences per run). IMGT/HighV-QUEST uses the same algorithms and IMGT reference directories (and therefore provides the same degree of resolution and high quality results) as IMGT/V-QUEST. The tool works for the IG of any species for which an IMGT reference directory is available [1]. The option 'Analysis of single chain Fragment variable (scFv)' allows the analysis of scFv long read sequences which contain two V domains [75], and the repertoire analysis of phage display scFv combinatorial libraries [295,296]. The IMGT/HighV-QUEST basic functionalities include: (1) Introduction of IMGT gaps, according to the IMGT unique numbering for V-DOMAIN [41] (Section 4), (2) Identification of indels and their correction [65] (by default), (3) Identification of the closest V, D and J genes and alleles, according to the IMGT gene and allele nomenclature (Section 3) (e.g., for Homo sapiens [2,4]), (4) IMGT/JunctionAnalysis results [67,68], (5) Description of mutations and amino acid changes [65], (6) IMGT/Automat annotation [69,70].
The IMGT/HighV-QUEST output is provided in eleven results files in CSV format (results equivalent to those of the Excel file from IMGT/V-QUEST online) (Table 13). A twelfth file, 'scFv', is only present if 'Analysis of single chain Fragment variable (scFv)' was selected in Advanced functionalities.

Table 13 (continued). File #11, Parameters: date of the analysis, IMGT/V-QUEST programme version, IMGT/V-QUEST reference directory release, and parameters used for the analysis (species, receptor type or locus, IMGT reference directory set and Advanced parameters).
To date (July 10, 2020), more than 21 billion sequences have been analysed by IMGT/HighV-QUEST, by 3177 users from 46 countries (43% users from USA, 34% from EU, 22% from other parts of the world).
Results from the IMGT/HighV-QUEST statistical analysis and IMGT clonotypes (AA) characterization can be analysed by the downloadable package IMGT/StatClonotype [76,77], which provides pairwise evaluation and visual comparison of NGS IG and TR IMGT clonotype (AA) diversity or expression (Figure 28).
[...] nucleotides ('Nb diff nt') by comparison with that of the representative sequence; '0' indicates that the CDR3-IMGT sequence (nt) is identical to that of the IMGT clonotype (AA) representative sequence. For #41328, there is an IMGT clonotype (nt) with 4 nt differences ('c' instead of 't' at position 9, 'c' instead of 'a' at position 12, 't' instead of 'c' at position 30 and 'c' instead of 't' at position 33) compared to the CDR3-IMGT of the representative sequence. #41328 also shows an example of 'several alleles' (for V and J) (2 alleles for IGHV) assigned to an IMGT clonotype (AA). [...] [66,73]. Displays are based on CDR3-IMGT lengths. In B, the pink lines correspond to IMGT clonotypes (AA) with, below each one of them, the corresponding IMGT clonotypes (nt). (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org).
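The 'Nb diff nt' value described in the caption above, i.e., the number of nucleotide differences between the CDR3-IMGT of an IMGT clonotype (nt) and that of the representative sequence, can be sketched as a position-by-position comparison. The two sequences below are invented for illustration so that they differ at positions 9, 12, 30 and 33 as in the example of the caption; they are not the sequences of #41328. The function name `cdr3_nt_differences` is hypothetical.

```python
def cdr3_nt_differences(representative: str, other: str) -> list[tuple[int, str, str]]:
    """List nt differences between two CDR3-IMGT sequences of the same length.

    Each difference is reported as (1-based position, nt in the
    representative sequence, nt in the other sequence).
    """
    if len(representative) != len(other):
        raise ValueError("CDR3-IMGT sequences must have the same length")
    return [
        (position, rep_nt, other_nt)
        for position, (rep_nt, other_nt) in enumerate(zip(representative, other), start=1)
        if rep_nt != other_nt
    ]


# Invented 36-nt sequences, written codon by codon for readability.
REPRESENTATIVE = "gcg aga gat gga tac tat ggt tcg ttt gac tat tgg".replace(" ", "")
VARIANT = "gcg aga gac ggc tac tat ggt tcg ttt gat tac tgg".replace(" ", "")

if __name__ == "__main__":
    differences = cdr3_nt_differences(REPRESENTATIVE, VARIANT)
    print("Nb diff nt:", len(differences))
    for position, rep_nt, other_nt in differences:
        print(f"'{other_nt}' instead of '{rep_nt}' at position {position}")
```

In this toy example the four differences fall on third codon positions and are synonymous, which is the situation in which several IMGT clonotypes (nt) share the same IMGT clonotype (AA).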

IMGT/DomainGapAlign
IMGT/DomainGapAlign [58,79,80] is the IMGT ® online tool for the analysis of amino acid sequences and 2D structures of domains (e.g., V and C for IG) [1]. It is very popular in antibody humanization as it allows the comparison of the user V domain against reference sequences of V-REGION and J-REGION of genes and alleles of Homo sapiens and other vertebrate species (e.g., mouse, rat) and the delimitation and characterization of the FR-IMGT and CDR-IMGT.
IMGT/DomainGapAlign analyses amino acid domain sequences by comparison with the IMGT reference directory sets (translation of the germline V and J genes and of the C gene domains). These reference amino acid sequences can be displayed by querying IMGT/DomainDisplay online. Several amino acid sequences can be analysed simultaneously in IMGT/DomainGapAlign, provided that they belong to the same domain type. IMGT/DomainGapAlign displays the user V domain sequences aligned with the closest V and J regions, with IMGT gaps and delimitations of the strands and loops and the FR-IMGT and CDR-IMGT, according to the IMGT unique numbering [41]. If several closest genes and/or alleles are identified, the user can select the display of each corresponding alignment. The user amino acid sequence is displayed, according to the IMGT color menu, with the delimitations of the V-REGION, J-REGION, and for VH domains, (N-D)-REGION (identified by the tool by comparison with the delimitations of the closest V and J gene and allele). The characteristics of the AA changes [81] are shown in strands and loops and in FR-IMGT and CDR-IMGT. Clicking on the user sequence name in the alignment gives access to the IMGT/Collier-de-Perles tool which automatically provides the IMGT Collier de Perles of the analysed VH or VL domain (V-D-J region or V-J region, respectively) with highlighted amino acid differences (in pink circles online) with the closest germline sequence.
IMGT/DomainGapAlign analyses the user C domain sequences with similar functionalities: alignments and identification of the genes and alleles with the closest C domain, delimitation of the C-DOMAIN in the user sequence, characteristics of the AA changes in strands, turns and loops, IMGT Collier de Perles of the C-DOMAIN with highlighted amino acid differences (in pink circles online) with the closest reference sequence.
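The way amino acid changes are reported per FR-IMGT and CDR-IMGT can be sketched as follows. This is an illustration only, not the IMGT/DomainGapAlign implementation. It assumes that the user and germline sequences are given as mappings from IMGT position to one-letter amino acid, and it uses the commonly cited V-DOMAIN delimitations of the IMGT unique numbering, which should be checked against Table 10. The names `REGIONS`, `region_of` and `aa_changes_by_region`, and the 'germline, position, user' label format, are choices made for this sketch.

```python
# V-DOMAIN delimitations (IMGT positions, inclusive); check against Table 10.
REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
    ("FR4-IMGT", 118, 128),
]


def region_of(position: int) -> str:
    """Return the FR-IMGT or CDR-IMGT that contains an IMGT position."""
    for name, start, end in REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the V-DOMAIN numbering")


def aa_changes_by_region(user: dict[int, str], germline: dict[int, str]) -> dict[str, list[str]]:
    """Group AA differences between user and germline sequences by region.

    Only positions present in both mappings are compared.
    """
    changes: dict[str, list[str]] = {}
    for position in sorted(user.keys() & germline.keys()):
        if user[position] != germline[position]:
            label = f"{germline[position]}{position}>{user[position]}"
            changes.setdefault(region_of(position), []).append(label)
    return changes


if __name__ == "__main__":
    germline = {1: "Q", 28: "F", 40: "S", 57: "Y"}  # toy data
    user = {1: "E", 28: "F", 40: "N", 57: "Y"}      # toy data
    print(aa_changes_by_region(user, germline))
```

In a humanization setting, such a summary makes it easy to see whether the differences with the closest Homo sapiens germline genes fall in the FR-IMGT or in the CDR-IMGT.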

IMGT/Collier-de-Perles Tool
The IMGT/Collier-de-Perles tool [51], on the IMGT® Web site at http://www.imgt.org (IMGT tools), allows the user to draw IMGT Colliers de Perles [47-50], on one or two layers, starting from their own domain amino acid sequences. Sequences have to be gapped according to the IMGT unique numbering (using for example IMGT/DomainGapAlign [58,79,80]). The IMGT/Collier-de-Perles tool can be customized to display the CDR-IMGT according to the IMGT color menu and the amino acids according to their hydropathy or volume, or to the eleven IMGT physicochemical classes [81] (IMGT® http://www.imgt.org, IMGT Education > IMGT Aide-Mémoire > Amino acids > IMGT classes of the 20 common amino acids). The IMGT color menu for the CDR-IMGT of a V-DOMAIN indicates the type of rearrangement, V-J or V-D-J [2,3]. Thus, the IMGT color menu for CDR1-IMGT, CDR2-IMGT and CDR3-IMGT is red, orange and purple for VH (encoded by a V-D-J-REGION resulting from a V-D-J rearrangement), and blue, green and green-blue for V-KAPPA or V-LAMBDA (encoded by a V-J-REGION resulting from a V-J rearrangement). The IMGT/Collier-de-Perles tool is incorporated in IMGT/V-QUEST [61-66] (users start from IG and TR V domain nucleotide sequences) and in IMGT/DomainGapAlign [58,79,80] (users start from V, C and G amino acid sequences).
IMGT Colliers de Perles for V, C and G domains are provided in IMGT/2Dstructure-DB (for amino acid sequences in the database) and in IMGT/3Dstructure-DB (on two layers with hydrogen bonds for the V or C domains or with the pMH contact sites for the G domains, for 3D structures) [57][58][59].
'Chain details' provides detailed IMGT annotation which includes the IMGT gene and allele identification (CLASSIFICATION), region and domain delimitations (DESCRIPTION) and amino acid (AA) positions according to the IMGT unique numbering. The closest IMGT genes and alleles expressed in the AA sequences of the 3D structures are identified by aligning the AA sequences of the 3D structures with the IMGT domain reference directory. The entry '1n8z' comprises the trastuzumab Fab chains (the L-KAPPA '1n8z_A' and the VH-CH1 '1n8z_B') (Figure 30). The Fab is in complex with the Homo sapiens ERBB2 (erb-b2 receptor tyrosine kinase 2, HER2, NEU, CD340) (Ligand '1n8z_C') [300].
An 'IMGT Residue@Position' is defined by the IMGT position numbering in a domain (or if not characterized, in the chain), the AA name (3-letter and, between parentheses, 1-letter abbreviation), the IMGT domain description and the IMGT chain ID, e.g., '57 - TYR (Y) - VH - 1n8z_B'. Its characteristics are reported in an IMGT Residue@Position card (or 'R@P') which includes (i) general information (PDB file numbering, IMGT file numbering, residue full name and formula), (ii) structural information 'IMGT LocalStructure@Position' (secondary structure, Phi and Psi angles (in degrees) and accessible surface area (ASA) (in square angstrom)) and (iii) detailed contact analysis. Contact analysis of IG/antigen complexes is provided with a detailed and standardized description of the paratope/epitope in crystal structures (Figure 31).
'Renumbered IMGT flat file' allows the user to view (or download) an IMGT coordinate file renumbered according to the IMGT unique numbering and to which the IMGT specific information has been added.
This IMGT information (identical to that provided in 'Chain details') is in the 'REMARK 410' lines (blue online) added in the IMGT coordinate files. Tools associated with IMGT/3Dstructure-DB include IMGT/StructuralQuery and IMGT/DomainSuperimpose, available online.
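The four components of an 'IMGT Residue@Position' (position, amino acid, domain description and chain ID) map naturally onto a small record type. The following Python sketch is only an illustration of that data structure: the class name, field names and the exact spacing of the separators are assumptions, not the format produced by IMGT/3Dstructure-DB.

```python
from dataclasses import dataclass

# One-letter to three-letter abbreviations of the 20 common amino acids.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


@dataclass(frozen=True)
class ResidueAtPosition:
    """Hypothetical container for the four parts of a Residue@Position."""
    position: str   # kept as text because positions such as '84.4' occur
    aa: str         # one-letter abbreviation
    domain: str     # domain description, e.g. 'VH'
    chain_id: str   # chain ID, e.g. '1n8z_B'

    def label(self) -> str:
        """Build a label such as '57 - TYR (Y) - VH - 1n8z_B'."""
        three = THREE_LETTER[self.aa]
        return f"{self.position} - {three} ({self.aa}) - {self.domain} - {self.chain_id}"


print(ResidueAtPosition("57", "Y", "VH", "1n8z_B").label())
```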
Figure 32. IMGT/3Dstructure-DB paratope/epitope of trastuzumab/ERBB2 in 1n8z [57][58][59]. The IMGT paratope, or antigen-binding site, includes the part of the VH and VL domains that recognizes (binds to) the antigen (Ag) (epitope or antigenic determinant) [298,299]. In addition to the contacts between amino acids of the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT of the V-KAPPA and VH and the antigen ERBB2 (Figure 31), framework positions which are detected as having relevant contacts with the antigen are included in the paratope. This is the case, here, of the anchor positions of the CDR2-IMGT in V-KAPPA F(66_A) (Figure 31A) and in VH R(55_B) and R(66_B) (Figure 31B). These positions are classically taken into account in V domain humanization by grafting. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org).
The current IMGT/2Dstructure-DB entries include 336 AA sequences of antibodies from Kabat [118] (those for which there were no available nucleotide sequences), and AA sequences of mAb and FPIA from IMGT/mAb-DB [60] and the WHO-INN programme [86,87]. Queries can be made on an individual entry, using the Entry ID or the Molecule name. Thus a 'trastuzumab' query in 'Molecule name' retrieves 18 results: five INN from IMGT/2Dstructure-DB, and thirteen 3D structures (including '1n8z') from IMGT/3Dstructure-DB. The same query interface is used for IMGT/2Dstructure-DB and IMGT/3Dstructure-DB.
The IMGT/2Dstructure-DB cards provide standardized IMGT information on chains and domains and IMGT Colliers de Perles on one or two layers, identical to that provided for the sequence analysis in IMGT/3Dstructure-DB. However, the information on experimental structural data (hydrogen bonds in IMGT Collier de Perles on two layers, Contact analysis) is only available in the corresponding IMGT/3Dstructure-DB cards, if the antibodies have been crystallized.

IMGT/mAb-DB
IMGT/mAb-DB [60] has been developed to provide easy access to therapeutic antibody amino acid sequences (links to IMGT/2Dstructure-DB) and structures (links to IMGT/3Dstructure-DB, if 3D structures are available). IMGT/mAb-DB data include monoclonal antibodies (mAb, INN suffix -mab, being defined by the presence of at least one IG variable domain) and fusion proteins for immune applications (FPIA, for example, a receptor or membrane ligand fused to an Fc) from the WHO INN programme [86,87]. This database also includes a few composite proteins for clinical applications (CPCA) (e.g., a protein or peptide fused to an Fc only to increase its half-life; INN prefix ef- recently adopted for these CPCA) and some related proteins of the immune system (RPI) used, unmodified, for clinical applications.

CDR-IMGT Delimitation for Grafting
For many years, the main source of specific monoclonal antibodies [301] has been mouse or rat species, owing to the difficulty of obtaining human monoclonal antibodies by the hybridoma methodology. The objective of antibody humanization has been to graft, at the DNA level, the CDR of an antibody V domain, from mouse (or other species) and of a given specificity, onto the V domain framework of a human antibody, thus preserving the specificity of the original (murine or other species) antibody while decreasing its immunogenicity [302]. The Contact analysis of IG/Ag complexes in IMGT/3Dstructure-DB [57][58][59] and their analysis [303] demonstrate the preponderance of the CDR-IMGT amino acids in the paratope [298] (Figure 31; Figure 32). IMGT/DomainGapAlign [58,79,80] has become the IMGT reference tool for antibody humanization design based on CDR grafting. Indeed, it precisely defines the CDR-IMGT to be grafted and helps to select the most appropriate human FR-IMGT by providing the alignment of the amino acid sequences between the mouse (or other species) and the closest human V-DOMAIN [90][91][92][93][94][95][96][97][98][99][100] (IMGT ® http://www.imgt.org, The IMGT Biotechnology page > Antibody humanization).
Analyses performed on humanized therapeutic antibodies underline the importance of a correct delimitation of the CDR and FR. As an example, two amino acid changes were required in the first version of the humanized VH of alemtuzumab, in order to restore the specificity and affinity of the original rat antibody. The positions of these amino acid changes (S28 > F and S35 > F) are now known to be located in the CDR1-IMGT and should have been directly grafted, but at the time of this mAb humanization they were considered as belonging to the FR according to the Kabat numbering [118]. In contrast, positions 66-74 were, at the same time, considered as belonging to the CDR according to the Kabat numbering, whereas they clearly belong to the FR3-IMGT and the corresponding sequence should have been 'human' instead of being grafted from the 'rat' sequence (IMGT ® http://www.imgt.org, The IMGT Biotechnology page > Antibody humanization > Alemtuzumab).
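The alemtuzumab example comes down to asking in which FR-IMGT or CDR-IMGT a given position falls. A minimal Python sketch of that lookup is given below. The boundaries used are the standard delimitations of the IMGT unique numbering for V-DOMAIN (they are not restated in this section and are supplied here as background), the function name is hypothetical, only integer positions are handled (additional positions of long CDR-IMGT are ignored), and no upper bound is checked for the FR4-IMGT.

```python
# Delimitations of the IMGT unique numbering for V-DOMAIN (integer positions).
# The FR4-IMGT starts at position 118; its end is deliberately not encoded.
V_DOMAIN_REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
]


def region_of(position: int) -> str:
    """Return the FR-IMGT or CDR-IMGT containing an integer IMGT position."""
    if position < 1:
        raise ValueError("IMGT positions start at 1")
    for name, start, end in V_DOMAIN_REGIONS:
        if start <= position <= end:
            return name
    return "FR4-IMGT"


# Positions discussed for the humanized VH of alemtuzumab.
for pos in (28, 35, 66, 74):
    print(pos, region_of(pos))
```

With these delimitations, positions 28 and 35 fall in the CDR1-IMGT, whereas positions 66 to 74 all fall in the FR3-IMGT.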

Dromedary IgG2 and IgG3
The dromedary or Arabian camel (Camelus dromedarius) IGHV3 genes belong to two sets based on four AA differences which have been linked to two antibody formats expressed in Camelidae: the conventional IG (H2L2) and the "only-heavy-chain" IG (H2, i.e., no light chain and only two identical H-gamma chains lacking CH1) [304] (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 2. Proteins and alleles > 1. Protein displays > V-REGION > IGHV > Arabian camel (Camelus dromedarius)). The AA differences characteristic of each set are located at four FR2-IMGT positions, 42, 49, 50 and 52 (42 in the C strand, 49, 50 and 52 in the C' strand), and belong to the [GFCC'C"] sheet at the hydrophobic VH-VL interface in conventional antibodies of Camelidae as well as of any vertebrate species [1] whereas, in camelid 'only-heavy-chain' antibodies (no light chains, and therefore no VL), these positions are exposed to the environment with, through evolution, a selection of hydrophilic amino acids [305].
The first set of IGHV3 genes is expressed in conventional tetrameric IgG1 that constitute 25% of circulating antibodies. The second set is expressed in 'only-heavy-chain' IgG2 and IgG3 that constitute 75% of the circulating antibodies [304]. The respective H-gamma2 and H-gamma3 chains are both characterized by the absence of the CH1 domain owing to a splicing defect [306]. It is the absence of CH1 which is responsible for the lack of association of the light chains (see 6.1.2) (Figure 33) (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 1. Locus and Genes > 7. Gene tables; ibid., IMGT Biotechnology page > Characteristics of the camelidae (camel, llama) antibody synthesis). Only-heavy-chain antibodies are a feature of the Camelidae IG, as they have also been found in the Bactrian camel (Camelus bactrianus) of Central Asia and in the llama (Lama glama) and alpaca (Vicugna pacos) of South America. The genetic event (splicing defect) responsible for the lack of CH1 occurred in their common ancestor before the radiation between the 'camelini' and 'lamini', dated to approximately 11 million years (Ma) ago.
The V domains of Camelidae 'only-heavy-chain' antibodies have characteristics of interest for potential pharmaceutical applications (e.g., specificities with binding to protein clefts for those with extended CDR3, easy production and selection of single-domain format with novel specificities). They are designated as VHH domains when they have to be distinguished from conventional VH (the V sequence criterion is based on the four AA at positions 42, 49, 50 and 52, particularly 49 and 50, with E49 (or Q49) and R50 in VHH, and G49 and L50 (or P50) in VH). A more complete knowledge of the germline genes is required to distinguish somatic mutations from genetic polymorphisms at the other positions. Most llama VHH have normal or long CDR3-IMGT and, as expected from the Arabian camel, have E49 (or Q49) and R50 (e.g., llama anti-TNF 5m2i [8.8.16] and 5m2m [12.8.18] [307], or llama anti-HIV 5hm1 [8.7.14] [308]). However, some llama VHH, defined as having no paired VL, were unexpectedly found to have the conventional G49 and L50, and to be characterized by a short CDR3-IMGT of 8 amino acids and arginine R118 (instead of J-TRP W118) (e.g., 5m2j [8.8.8] [307]). The selection of the J-REGION R118 (instead of W118, in the IGHJ 118-121 'W-G-X-G' motif, G strand of the V domain) and that of the V-REGION R50 (instead of L50 or P50, in the IGHV, C' strand of the V domain) therefore represent two different evolutionary paths which make the G-F-C-C' layer of the VHH more hydrophilic (Figure 21). The term 'nanobody', initially used for describing a single-domain format antibody, is not equivalent to VHH, as it is a registered mark and has been used for V domains other than VHH and for constructs containing more than one V domain (VH and/or VHH) (e.g., caplacizumab, ozoralizumab) (IMGT ® http://www.imgt.org, IMGT/mAb-DB > caplacizumab; ibid. ozoralizumab).
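The sequence criterion at positions 49 and 50 can be written as a small decision rule. The Python sketch below encodes only the amino acids named in the text; the function name and the returned labels are hypothetical. As the llama examples with G49 and L50 show, the criterion is indicative rather than conclusive, which is why the labels are phrased as 'like'.

```python
def classify_by_49_50(aa49: str, aa50: str) -> str:
    """Apply the positions 49/50 sequence criterion described in the text.

    E49 (or Q49) with R50 is the pattern given for VHH; G49 with L50
    (or P50) is the pattern given for conventional VH. Anything else is
    left undetermined rather than guessed.
    """
    if aa49 in {"E", "Q"} and aa50 == "R":
        return "VHH-like"
    if aa49 == "G" and aa50 in {"L", "P"}:
        return "VH-like"
    return "undetermined"


print(classify_by_49_50("E", "R"))  # VHH-like
print(classify_by_49_50("G", "L"))  # VH-like (also seen in some llama VHH with R118)
```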

Human Heavy Chain Diseases (HCD)
The synthesis of the Camelidae 'only-heavy-chain' antibodies is remarkably reminiscent of what is observed in human heavy chain diseases (HCD). These proliferative disorders of B lymphoid cells produce truncated monoclonal immunoglobulin heavy chains which lack associated light chains. In most HCD, the absence of the heavy chain CH1 domain, by deletion or splicing defect, may be responsible for the lack of assembly of the light chain [309]. Similar observations have also been reported in mouse variants [309] (IMGT ® http://www.imgt.org, IMGT Education > Pathology of the immune system > Molecular defects in Immunoglobulin Heavy Chain Diseases (HCD); ibid., IMGT Lexique > Heavy Chain Diseases (HCD)).

Contact Analysis of TR-Mimic Antibodies and TR
The IMGT unique numbering has recently allowed the comparison of the contact analysis of an antibody and of a T cell receptor with the same ligand [311]. Both the immunoglobulin IG Fab 3M4E5, a TR-mimic antibody, and the T cell receptor TR 1G4_a58b61 target the NY-ESO-1 peptide SLLMWITQC presented by HLA-A*02:01 [312] (Figure 34). IMGT Colliers de Perles on one and two layers and contact analyses of the IG/pMH1 (3gjf, for Fab 3M4E5) and TR/pMH1 (2p5e, for 1G4_a58b61) complexes [311] are available in IMGT/3Dstructure-DB, based on the IMGT unique numbering for V-DOMAIN for IG and TR [40][41][42] and the IMGT unique numbering for G-DOMAIN for MH [43]. They make it possible to visualize the features and differences in the antigen recognition by an antibody and a TR targeting the same p/MH antigen (Figure 34). The contacts of the NY-ESO-1 peptide SLLMWITQC with the MH1 HLA-A*02:01 groove are similar in the two peptide-HLA complexes, as expected [313][314][315].

Antibody C-Domain Post-Translational Modifications, Engineering and Allotypes
The constant region of the IG heavy chain is made of several CH domains, which are analysed and described in IMGT ® using the IMGT unique numbering [42][44][45][46]. This allows a universal standardized comparison of sequences and 3D structures between C domains of any chain, any receptor and any species. Examples of post-translational modifications (glycosylations), effector properties and engineering at the C-DOMAIN level are given in the following subsections.

Figure 35. Homo sapiens IGHG1 CH2 and N-linked glycosylation site N84.4 [94]. (A) IMGT Collier de Perles of IGHG1 CH2 on one layer (on the left) and on two layers with hydrogen bonds (on the right). The N84.4 is at the DE turn. The identifier of the chain to which the domain belongs is 1hzh_H. The IMGT Colliers de Perles on two layers show, in the forefront, the GFC strands and, in the back, the ABED strands, linked by the CD transversal strand. The IMGT Colliers de Perles were generated by the IMGT/Collier-de-Perles tool [51] integrated in IMGT/3Dstructure-DB [57][58][59]. Hydrogen bonds (green lines) and the disulfide bond between C23 and C104 (orange line) were automatically added from the experimental structural data. Amino acids are shown in the one-letter abbreviation. Positions at which hydrophobic amino acids (hydropathy index with positive value: I, V, L, F, C, M, A) and tryptophan (W) are found in more than 50% of analysed sequences are shown in blue. All proline (P) are shown in yellow. Hatched circles correspond to missing positions according to the IMGT unique numbering for C domain [42]. Arrows indicate the direction of the beta strands and their designations in 3D structures. Anchors (26, 39, 45, 77, 104, 118) are shown in squares. (B) 3D structure of the IGHG1 CH2 dimer with the two carbohydrate chains. The N84.4 at the DE turn is shown on the CH2 on the left whereas the N84.4 of the CH2 on the right is hidden behind the carbohydrates. The ribbon representation of the 3D structures was obtained using PyMOL (http://www.pymol.org). The identifiers of the H-gamma1 chains to which the CH2 domains belong are 1hzh_H and 1hzh_K (IMGT ® http://www.imgt.org, IMGT/3Dstructure-DB). (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org).
The human glycans are mainly classified as 'biantennary complex' structures with a core fucose (Fuc) and are often terminated with N-acetylneuraminic acid (Neu5Ac), a sialic acid (Figure 36). The largest N-linked oligosaccharide structure found in human IgG is shown in Figure 36A (left panel). The conserved heptasaccharide core is composed of two N-acetylglucosamine (GlcNAc), three mannose (Man) and two other GlcNAc residues that are β-1,2 linked to α-6 Man and α-3 Man, forming two arms. The bisecting N-acetylglucosamine (GlcNAc, NAG) represents around 10% of human IgG glycoforms. The four most abundant glycans in mAb biopharmaceuticals are shown in Figure 36A (right panel). The Fc oligosaccharides are terminated by zero, one or two galactoses and are called G0, G1 or G2, respectively. For G1F, Gal can be on the α1,3-arm or on the α1,6-arm. Additional fucose (Fuc), galactose (Gal), N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc) residues may be present or not, depending in particular on the expression system. Mammalian cell expression systems are the preferred methods for the commercial production of monoclonal antibodies because their protein glycosylation machinery closely resembles that of humans. The currently marketed antibodies are mainly expressed in CHO (Chinese Hamster Ovary), SP2/0 (mouse myeloma cells), NS0 (Non-Secreting mouse myeloma cells) and hybridomas (Figure 36B).
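The G0, G1 and G2 designations follow directly from the number of terminal galactoses. A minimal Python sketch of that naming rule is given below; the function name is hypothetical, and the 'F' suffix for a core fucose (as in the G1F mentioned above) follows the usual shorthand for these glycoforms. The sketch does not distinguish on which arm the galactose of G1 sits, nor does it model sialic acids or the bisecting GlcNAc.

```python
def glycoform_name(terminal_galactoses: int, core_fucose: bool) -> str:
    """Name a biantennary Fc glycan by its number of terminal galactoses.

    Zero, one or two terminal galactoses give G0, G1 or G2; an 'F' is
    appended when a core fucose is present.
    """
    if terminal_galactoses not in (0, 1, 2):
        raise ValueError("a biantennary glycan carries 0, 1 or 2 terminal galactoses")
    return f"G{terminal_galactoses}" + ("F" if core_fucose else "")


print(glycoform_name(1, core_fucose=True))   # G1F
print(glycoform_name(0, core_fucose=False))  # G0
```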

Knobs-into-Holes
The knobs-into-holes methodology has been proposed for obtaining bispecific antibodies [309]. The aim is to increase interactions between the CH3 domains of two H-gamma1 chains that belong to antibodies with different specificities, in order to obtain bispecific antibodies. The two amino acids selected for changes, CH3 T22 (B strand) and Y86 (E strand), belong to the [ABED] sheet, at the interface of the two Homo sapiens IGHG1 CH3 domains (Figure 37). Interactions of these two amino acids are described in 'Contact analysis' in IMGT/3Dstructure-DB [57][58][59] (Figure 37A). The knobs-into-holes methodology consists of an amino acid change on one CH3 domain (here, T22 > Y) that creates a knob, and another amino acid change on the other CH3 domain (here, Y86 > T) that creates a hole, thus favoring increased interactions between the CH3 of the two H-gamma1 chains at both positions 22 and 86 [316] (IMGT ® http://www.imgt.org, The IMGT Biotechnology page > Bispecific antibodies > Knobs-into-holes amino acid changes).
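The two amino acid changes can be expressed as operations on a position-indexed view of the CH3 domain. The Python sketch below is purely illustrative: the function name and the dictionary representation are assumptions, and only the two positions named in the text are stored. The parser checks that the expected wild-type amino acid is present before applying a change, which guards against applying a change to the wrong numbering.

```python
import re
from typing import Dict

# Matches changes written as in the text, e.g. 'T22 > Y'.
CHANGE_PATTERN = re.compile(r"^\s*([A-Z])(\d+)\s*>\s*([A-Z])\s*$")


def apply_change(domain: Dict[int, str], change: str) -> Dict[int, str]:
    """Return a copy of `domain` with one amino acid change applied."""
    match = CHANGE_PATTERN.match(change)
    if match is None:
        raise ValueError(f"cannot parse amino acid change: {change!r}")
    old_aa, position, new_aa = match.group(1), int(match.group(2)), match.group(3)
    if domain.get(position) != old_aa:
        raise ValueError(f"expected {old_aa} at position {position}")
    updated = dict(domain)
    updated[position] = new_aa
    return updated


# Only the two CH3 positions discussed in the text (IMGT numbering).
ch3_wild_type = {22: "T", 86: "Y"}
ch3_knob = apply_change(ch3_wild_type, "T22 > Y")  # first H-gamma1 chain
ch3_hole = apply_change(ch3_wild_type, "Y86 > T")  # second H-gamma1 chain
print(ch3_knob, ch3_hole)
```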

Figure 37. Homo sapiens IGHG1 CH3 on one layer (on the left) and on two layers with hydrogen bonds (on the right). T22 (in strand B) and Y86 (in strand E) are highlighted. The IMGT Colliers de Perles on two layers show, in the forefront, the GFC strands and, in the back, the ABED strands, linked by the CD transversal strand. The [ABED] sheets are at the interface between the two CH3 domains. The IMGT Colliers de Perles were generated by the IMGT/Collier-de-Perles tool [51] integrated in IMGT/3Dstructure-DB [57][58][59]. Hydrogen bonds (green lines) and the disulfide bond between C23 and C104 (orange line) were automatically added from the experimental structural data. Amino acids are shown in the one-letter abbreviation. Positions at which hydrophobic amino acids (hydropathy index with positive value: I, V, L, F, C, M, A) and tryptophan (W) are found in more than 50% of analysed sequences are shown in blue. All proline (P) are shown in yellow. Hatched circles correspond to missing positions according to the IMGT unique numbering for C domain [42]. Arrows indicate the direction of the beta strands and their designations in 3D structures. Anchors (26, 39, 45, 77, 104, 118) are shown in squares. (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org).

Interface Ball-and-Socket-Like Joints
The comparison of the interface between the CH2 and CH3 domains from 3D structures of Homo sapiens IGHG2 Fc with the interface in 3D structures of IGHG1 Fc revealed that, in all Fc of gamma chains, the movement of the CH2 results from a pivoting around a highly conserved ball-and-socket-like joint [317]. Using the IMGT unique numbering for C-DOMAIN, the CH2 L15 side chain (last position of the A strand, next to the AB turn) (the 'ball') interacts with a pocket (the 'socket') formed by CH3 M107, H108, E109 and H115 (FG loop) (Figure 38). The interface is stabilized by two hydrogen bonds, CH2 L15 (O) with CH3 H115 (ND1) and CH2 K125 (O) with CH3 Y29 (OH), and by two salt bridges, CH2 K12 (A strand) with CH3 E40 (C strand) and CH2 K123 (G strand) with CH3 E109 (FG loop) (Figure 38). These amino acids are well conserved between the Homo sapiens gamma isotypes and the IGHG genes and alleles, except for IGHG3 H115, which shows a polymorphism associated with different G3m allotypes [83]. This ball-and-socket-like joint is a structural feature similar, but reversed, to that previously described at the VH and CH1 domain interface [318], in which the VH L12, T125 and S127 form the 'socket' whereas the CH1 F29 and P30 form the 'ball' (IMGT ® http://www.imgt.org, IMGT Repertoire > 2. Proteins and alleles > 1. Protein displays > C-DOMAIN with CHS, M and HINGE regions; ibid., IMGT Repertoire > 2. Proteins and alleles > 2. Alignments of alleles > IGHC; ibid., IMGT/3Dstructure-DB > query on Fab).

IGHG Alleles and Gm Allotypes
Allotypes are polymorphic markers of an IG subclass that correspond to amino acid changes and are detected serologically by antibody reagents [83]. In therapeutic antibodies (human, humanized or chimeric), allotypes may represent potential immunogenic epitopes [82], as demonstrated by the presence of antibodies in individuals immunized against these allotypes [83]. For the H-gamma chains, the allotypes are designated as Gm (for gamma marker), and for the H-gamma1 chains as G1m [83]. The allotypes G1m, G2m and G3m are carried by the constant region of the H-gamma1, H-gamma2 and H-gamma3 chains, encoded by the IGHG1, IGHG2 and IGHG3 genes, respectively.
Figure 38. Contact analysis and ball-and-socket interface between Homo sapiens IGHG1 CH2 and CH3 [94]. (A) Contact analysis between the CH2 and CH3 domains of the Fc gamma1 (from IMGT/3Dstructure-DB, 3ave_A). The amino acids of the CH3 BC and FG loops (left column) and those of the CH2 G strand (right column) are shown in rectangles. CH2 96 and 97 correspond to the EF turn whereas other positions are from the A strand or AB turn. Arrows indicate the two hydrogen bonds (orange online) and the two salt bridges (green online) mentioned in the text [312]. (B) The ball-and-socket joint of the IGHG1 CH2-CH3 interface [312] is shown using the IMGT numbering, with the ball (L15) and the socket (M107, H108, E109 and H115). The interface is stabilized by two hydrogen bonds involving on CH2, L15 (O) and K125 (O) that bind on CH3, H115 (ND1) and Y29 (OH), respectively, and by two salt bridges involving on CH2, K12 (A strand) and K123 (G strand) that interact on CH3 with E40 (C strand) and E109 (FG loop), respectively. The identifier of the gamma1 chain to which the CH2 and CH3 domains belong is 3ave_A (IMGT ® http://www.imgt.org, IMGT/3Dstructure-DB). (With permission from M-P. Lefranc and G. Lefranc, LIGM, Founders and Authors of IMGT ® , the international ImMunoGeneTics information system ® , http://www.imgt.org).
The H-gamma1 chains may express four G1m alleles (combinations of G1m allotypes): G1m3, G1m3,1, G1m17,1, and G1m17,1,2 (and in Negroid populations three additional G1m alleles, G1m17,1,27, G1m17,1,28 and G1m17,1,27,28) [83]. The correspondence between the G1m alleles and IGHG1 alleles is shown in Table 14.
Amino acids involved in the expression of G1m allotypes and localized on the CH1 and CH3 domains of H-gamma1 chains are shown in IMGT Colliers de Perles on two layers and in 3D structures (Figure 39). In the CH1, the lysine at position 120 (K120) in strand G corresponds to the G1m17 allotype [83]. The isoleucine I103 (strand F) is specific to the H-gamma1 chain isotype. If an arginine is expressed at position 120 (R120), the simultaneous presence of R120 and I103 corresponds to the expression of the G1m3 allotype [83]. For isotypes other than H-gamma1, R120 corresponds to the expression of the nG1m17 isoallotype (an isoallotype or nGm is detected by antibody reagents that identify this marker as an allotype in one IgG subclass and as an isotype for other subclasses). In the CH3, the aspartate D12 and leucine L14 (strand A) correspond to G1m1, whereas glutamate E12 and methionine M14 correspond to the nG1m1 isoallotype [83]. A glycine at position 110 corresponds to G1m2, whereas an alanine does not correspond to any allotype (G1m2-negative chain).
Trastuzumab has been engineered in order to obtain the less immunogenic (being the most frequent in different populations) allotype G1m17 (CH1 K120), associated with the nG1m1 (CH3 E12, M14) [319], defined, using the generic description, as IGHG1*03v, G1m3 > G1m17, nG1m1 (CH1 R120 > K, CH3 E12,M14). The G1m allotypes have been confirmed serologically [82].
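The correspondences between amino acids and G1m markers on an H-gamma1 chain (CH1 positions 103 and 120, CH3 positions 12, 14 and 110) can be summarized as a short rule set. The Python sketch below is a hypothetical illustration of those rules only: the function name and the dictionary inputs are assumptions, the chain is assumed to be H-gamma1, and the additional G1m27 and G1m28 markers are not modelled.

```python
from typing import Dict, List


def g1m_markers(ch1: Dict[int, str], ch3: Dict[int, str]) -> List[str]:
    """Infer G1m markers of an H-gamma1 chain from IMGT-numbered positions."""
    markers = []
    # CH1: K120 corresponds to G1m17; R120 together with I103 to G1m3.
    if ch1.get(120) == "K":
        markers.append("G1m17")
    elif ch1.get(120) == "R" and ch1.get(103) == "I":
        markers.append("G1m3")
    # CH3: D12 and L14 correspond to G1m1; E12 and M14 to the nG1m1 isoallotype.
    if ch3.get(12) == "D" and ch3.get(14) == "L":
        markers.append("G1m1")
    elif ch3.get(12) == "E" and ch3.get(14) == "M":
        markers.append("nG1m1")
    # CH3: G110 corresponds to G1m2; A110 corresponds to no allotype.
    if ch3.get(110) == "G":
        markers.append("G1m2")
    return markers


# Pattern described for the engineered chain: CH1 K120 with CH3 E12, M14.
print(g1m_markers({103: "I", 120: "K"}, {12: "E", 14: "M"}))
# Pattern before engineering: CH1 R120 (with I103) and CH3 E12, M14.
print(g1m_markers({103: "I", 120: "R"}, {12: "E", 14: "M"}))
```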
The H-gamma2 chains express only one allotype, G2m23, and are therefore either G2m23 or G2m.. (two dots indicate that a specimen was tested and found to be negative for G2m23) [83]. G2m23 is located on CH2. Amino acid sequence and 3D structure comparisons show that the G2m23 allotype is correlated with methionine CH2 M45.1, whereas the absence of the allotype (G2m..) is correlated with valine V45.1 [83]. The G2m23-positive H-gamma2 chains are also characterized by the presence of threonine T92 in the CH1, whereas the G2m23-negative chains and the H-gamma chains of other IgG subclasses have proline P92 in the CH1. Being located on the CH1 domain, this amino acid change is not involved in the expression of the G2m23 allotype but, owing to the strong linkage on the same chain, the CH1 T92 codon has been used for the molecular characterization of the G2m23 chains [83].
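The G2m23 assignment can be sketched in the same way. The function names below are hypothetical, and the residues are assumed to be one-letter codes at the IMGT positions CH2 45.1 and CH1 92 of an H-gamma2 chain. The CH1 check only tests the linkage described in the text; it does not define the allotype.

```python
def g2m_status(ch2_45_1: str) -> str:
    """Return 'G2m23' or 'G2m..' from the residue at CH2 position 45.1."""
    if ch2_45_1 == "M":
        return "G2m23"   # methionine M45.1: allotype present
    if ch2_45_1 == "V":
        return "G2m.."   # valine V45.1: tested and negative for G2m23
    raise ValueError(f"unexpected residue at CH2 45.1: {ch2_45_1!r}")


def ch1_92_agrees(ch1_92: str, ch2_45_1: str) -> bool:
    """Check that the linked CH1 marker (T92 or P92) matches the CH2 status."""
    expected = "T" if g2m_status(ch2_45_1) == "G2m23" else "P"
    return ch1_92 == expected
```

A chain with CH2 M45.1 and CH1 T92 is thus reported as G2m23 with a consistent CH1 marker, whereas V45.1 with P92 corresponds to G2m.. .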

IGHG Engineered Variants and Effector Properties
Amino acids in the IGHG constant regions of the IG heavy chains are frequently engineered to modify the effector properties of therapeutic monoclonal antibodies. Amino acid changes are engineered at positions involved in antibody-dependent cellular cytotoxicity (ADCC), antibody-dependent cellular phagocytosis (ADCP), complement-dependent cytotoxicity (CDC), half-life increase, half-IG exchange, and B cell inhibition by coengagement of antigen and FcγR on the same cell [320,321] (IMGT®, http://www.imgt.org, The IMGT Biotechnology page > Amino acid positions involved in ADCC, ADCP, CDC, half-life and half-IG exchange).
The IMGT engineered variant nomenclature (Table 15) has been set up for an easier comparison between engineered antibodies [311]. The IMGT engineered variant name comprises the species, the gene name, the letter 'v' with a number (e.g., Homo sapiens IGHG1v1), and then the domain(s) with the amino acid (AA) change(s), each defined by the letter of the novel AA and its position in the domain, e.g., CH2, P1.4. In Table 15, correspondence with the Eu numbering is shown between parentheses, whereas in antibody descriptions (i.e., INN proposed and recommended lists), positions between parentheses are those in the antibody chains. The IMGT engineered variants are classified by comparison with the allele *01 of the gene, which, if the effects are independent of the alleles, also serves as a reference for the description of the AA changes for the other alleles. In those cases, the same variant (v) number is used for any allele of the same gene in the same species.
Table 15 [311]. Homo sapiens IGHG variants involved in ADCC, ADCP, CDC, half-life increase, half-IG exchange, B cell inhibition, and knobs-into-holes are shown. In mAb descriptions, the Eu numbering between parentheses is replaced by the amino acid position in the antibody gamma chain. Amino acid changes and bibliographical references are quoted at The IMGT Biotechnology page (IMGT®, http://www.imgt.org, The IMGT Biotechnology page > Amino acid positions involved in ADCC, ADCP, CDC, half-life and half-IG exchange). Property modifications include: ADCC or CDC enhancement (pale green), ADCP enhancement (dark green), ADCC or CDC reduction (pale orange), B cell inhibition (orange), Half-life increase (pale blue), Half-IG exchange reduction, Hole or Knob in knobs-into-holes interaction, Favors hexamerisation, Site-specific drug attachment, No N-glycosylation site, No disulfide bridge inter H-L (yellow).
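The composition of an engineered variant name can be illustrated with a short sketch. The classes `AAChange` and `EngineeredVariant` are hypothetical, and the exact spacing and separators of the printed description are assumptions. Pairing the variant number 1 with the change CH2 P1.4 simply reuses the two examples given in the text and is not taken from Table 15.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AAChange:
    domain: str    # domain label, e.g. "CH2"
    new_aa: str    # one-letter code of the novel amino acid
    position: str  # IMGT position in the domain, kept as text (e.g. "1.4")

    def label(self) -> str:
        # Letter of the novel AA followed by its position in the domain.
        return f"{self.domain} {self.new_aa}{self.position}"


@dataclass(frozen=True)
class EngineeredVariant:
    species: str
    gene: str
    number: int                    # number following the letter 'v'
    changes: Tuple[AAChange, ...]

    def name(self) -> str:
        # Species, gene name, then 'v' with the variant number.
        return f"{self.species} {self.gene}v{self.number}"

    def description(self) -> str:
        # Variant name followed by the domain(s) with AA change(s);
        # the comma separator is an assumption of this sketch.
        return f"{self.name()} " + ", ".join(c.label() for c in self.changes)


variant = EngineeredVariant("Homo sapiens", "IGHG1", 1,
                            (AAChange("CH2", "P", "1.4"),))
```

Here `variant.name()` gives "Homo sapiens IGHG1v1" and `variant.description()` appends the change "CH2 P1.4". Keeping the position as text preserves IMGT positions such as 1.4 or 45.1, which are labels and not decimal numbers.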

Conclusions
IMGT®, by bridging genes, structures and functions, provides a unique frame for three research axes: deciphering the IG and TR loci, genes and alleles in genomes of vertebrates from fish to humans; identifying clonality and exploring high-throughput repertoires with IMGT/HighV-QUEST; and exploiting data from IMGT/mAb-DB, IMGT/2Dstructure-DB and IMGT/3Dstructure-DB towards targeted and customized therapeutic antibodies. Regarding the first axis, IMGT® genomic annotated data are classically displayed in IMGT Repertoire Web Resources (Locus description, Locus representation, Gene tables, Alignments of alleles). So far, the number of higher vertebrate species present in the IMGT Web Resources reaches forty. The curated IG and TR genes and alleles are entered in the IMGT/GENE-DB database and the corresponding IMGT® reference directories, and are used for coherent gene and sequence annotations of the IG and TR loci of newly sequenced genomes. Thus, the annotation of the IG and TR loci [78,322–327] is key to the study and comparison of the expressed adaptive immune repertoires, in normal or pathological situations. The IMGT standardized IG and TR genes and alleles in different species offer a unique opportunity for comparison of immune responses and of potential applications in veterinary and human medicine.
IMGT/V-QUEST is the reference tool for clonality sequence analysis in leukemia and lymphoma [291–294,363–366], which has been extended to veterinary species owing to the availability and IMGT® biocuration of the IG and TR loci [346,349–351], and the IMGT reference directories. Exploring high-throughput repertoires with IMGT/HighV-QUEST provides standardized NGS analysis of IG and TR repertoires in experimentally engineered (combinatorial libraries) or physiological conditions (vaccination, immunodeficiency, autoimmune diseases, cancers and infectious diseases). IMGT/HighV-QUEST is particularly well adapted for the analysis of complete V domains of the IG and TR repertoires from B and T subsets, in many experiments and from many individuals (humans or other vertebrate species). It allows the analysis of the content of scFv combinatorial phage display libraries, which are classically screened for identification of novel therapeutic antibody specificities [367–371].
Given the importance of the interactions in the antibody specificity and affinity on the one hand and in the antibody pharmacokinetics/pharmacodynamics and half-life on the other hand, the IMGT ® integrated and standardized approach provides the genetic knowledge for allowing antibody informatics to answer the needs of targeted and customized therapy in the context of personalized medicine. The INN definition for the -mab integrates the format, target(s), IMGT gene and allele nomenclature, the CDR-IMGT lengths delimitation, the post-translational modifications and the engineered AA changes [60,86,87].
The extension of the IMGT unique numbering to the IgSF [372–376] and to the MhSF [377–379] proteins other than IG or TR has opened new perspectives for the standardized description of the polymorphism of the antigens (epitopes belonging to V, C or G domains) and of the Fc receptors (FCGR of the IgSF, FCGRT of the MhSF) and for the characterization of their interactions (antibody/antigen, FcR/antibody). The F domain (for Fibronectin type III domain), the S domain (for Scavenger, of the Scavenger receptor superfamily SrSF) and the A domain (for Apple domain) have been standardized with an IMGT unique numbering. Mass spectrometry has shown promising results in the analysis of the IGHG3 polymorphism and of anti-malarial variable domains [380–382]. Using the IMGT unique numbering per domain is the bridge between the biological and computational spheres.