IMGT® and 30 Years of Immunoinformatics Insight in Antibody V and C Domain Structure and Function

At the 10th Human Genome Mapping (HGM10) Workshop, in New Haven, for the first time, immunoglobulin (IG) or antibody and T cell receptor (TR) variable (V), diversity (D), joining (J), and constant (C) genes were officially recognized as ‘genes’, as were the conventional genes. Under these HGM auspices, IMGT®, the international ImMunoGeneTics information system®, was created in June 1989 at Montpellier (University of Montpellier and CNRS). The creation of IMGT® marked the birth of immunoinformatics, a new science, at the interface between immunogenetics and bioinformatics. The accuracy and the consistency between genes and alleles, sequences, and three-dimensional (3D) structures are based on the IMGT Scientific chart rules generated from the IMGT-ONTOLOGY axioms and concepts: IMGT standardized keywords (IDENTIFICATION), IMGT gene and allele nomenclature (CLASSIFICATION), IMGT standardized labels (DESCRIPTION), IMGT unique numbering and IMGT Collier de Perles (NUMEROTATION). These concepts provide IMGT® immunoinformatics insights for antibody V and C domain structure and function, used for the standardized description in IMGT® web resources, databases and tools, immune repertoires analysis, single cell and/or high-throughput sequencing (HTS, NGS), antibody humanization, and antibody engineering in relation with effector properties.


Introduction
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in June 1989 at Montpellier by Marie-Paule Lefranc (University of Montpellier and CNRS) to characterize the genes and alleles of the antigen receptors, immunoglobulins (IG) or antibodies [1] and T cell receptors (TR) [2], and to manage the huge and complex diversity of the adaptive immune responses of the jawed vertebrates (or gnathostomata) from fishes to humans [3]. The creation of IMGT® marked the birth of immunoinformatics, a new science at the interface between immunogenetics and bioinformatics [3]. The variable (V), diversity (D), joining (J), and constant (C) genes of the antigen receptors were officially recognized as 'genes', as were the conventional genes, at the 10th Human Genome Mapping (HGM10) Workshop, in New Haven, allowing IG and TR gene and allele classification. The IMGT® databases and tools, built on the IMGT-ONTOLOGY axioms and concepts, bridge the gap between genes, sequences and three-dimensional (3D) structures [3]. The data accuracy and consistency are based on the IMGT Scientific chart rules generated from the axioms and concepts: IMGT® standardized keywords (IDENTIFICATION axiom, concepts of identification), IMGT® gene and allele nomenclature (CLASSIFICATION axiom, concepts of classification), IMGT® standardized labels (DESCRIPTION axiom, concepts of description), and IMGT unique numbering and IMGT Collier de Perles (NUMEROTATION axiom, concepts of numerotation).

The V domain strands and loops and their IMGT® positions and lengths, based on the IMGT unique numbering for V domain (V-DOMAIN of IG and TR and V-LIKE-DOMAIN) [6], are shown in Table 1.
In the IG and TR V-DOMAIN (Figure 1), the G-STRAND is the C-terminal part of the J-REGION, with J-PHE or J-TRP 118 and the canonical motif F/W-G-X-G (J-MOTIF) at positions 118-121 [6] (Table 1).

Figure 1. Reproduced with permission from IMGT®, the international ImMunoGeneTics information system®, http://www.imgt.org. (A) 3D structure ribbon representation with the IMGT strand and loop delimitations [6]. (B) IMGT Collier de Perles on two layers with hydrogen bonds. The IMGT Collier de Perles on two layers shows, in the forefront, the GFCC'C" strands (forming the sheet located at the VH/VL interface of the IG) and, in the back, the ABED strands. The IMGT Collier de Perles with hydrogen bonds (green lines online, only shown here for the GFCC'C" sheet) is generated by the IMGT/Collier-de-Perles tool integrated in IMGT/3Dstructure-DB, from experimental 3D structure data [9][10][11]. (C) IMGT Collier de Perles on two layers generated from IMGT/DomainGapAlign [10,12,13]. Pink circles (online) indicate amino acid changes compared to the closest genes and alleles from the IMGT reference directory. (D) IMGT Collier de Perles on one layer. Amino acids are shown in the one-letter abbreviation. All prolines (P) are shown online in yellow. IMGT anchors are shown in squares. Hatched circles are IMGT gaps according to the IMGT unique numbering for V domain [6,14]. Positions with bold (online red) letters indicate the four conserved positions that are common to a V domain and to a C domain: 23 (1st-CYS), 41 (CONSERVED-TRP), 89 (hydrophobic), 104 (2nd-CYS) [4][5][6][7][14], and the fifth conserved position, 118 (J-TRP or J-PHE), which is specific to a V-DOMAIN and belongs to the motif F/W-G-X-G that characterizes the J-REGION [6,14] (Table 2). The hydrophobic amino acids (hydropathy index with positive value: I, V, L, F, C, M, A) and tryptophan (W) [15] found at a given position in more than 50% of sequences are shown (online with a blue background color). Arrows indicate the direction of the beta strands and their designations in 3D structures. The IMGT color menu for the CDR-IMGT of a V-DOMAIN indicates the type of rearrangement, V-D-J (for a VH here, red, orange and purple) or V-J (for V-KAPPA or V-LAMBDA (not shown), blue, green and greenblue) [1]. The identifier of the chain to which the VH domain belongs is 1n0x_H (from the Homo sapiens b12 Fab) in IMGT/3Dstructure-DB (http://www.imgt.org). The CDR-IMGT lengths of this VH are [8.8.20] and the FR-IMGT lengths are [25.17.38.11]. The 3D ribbon representation was obtained using PyMOL (http://www.pymol.org) and 'IMGT numbering comparison' of 1n0x_H (VH) from IMGT/3Dstructure-DB (http://www.imgt.org).

In the IG and TR V-DOMAIN, the structurally conserved antiparallel beta strands are also designated as framework regions (FR-IMGT) whereas the loops are designated as complementarity determining regions (CDR-IMGT) [6]. Strands A and B correspond to the FR1-IMGT (positions 1 to 26), strands C and C' to the FR2-IMGT (positions 39 to 55), strands C", D, E, and F to the FR3-IMGT (positions 66 to 104) and strand G to the FR4-IMGT (positions 118 to 128). The BC, C'C", and FG loops correspond to the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT, respectively [6] (Table 1, Figure 1).
IMGT anchors belong to the strands (or FR-IMGT) and represent 'anchors' supporting the three BC, C'C" and FG loops (or CDR-IMGT). V domain anchor positions are positions 26 and 39, 55 and 66, and 104 and 118, shown in squares in IMGT Colliers de Perles. In a V-DOMAIN, the 2nd-CYS at position 104 (F strand) and J-PHE or J-TRP at position 118 (G strand) are anchors of the FG loop (or CDR3-IMGT) [5,6].
The loop length (number of amino acids (or codons), that is, the number of occupied positions) is a crucial and original concept of IMGT-ONTOLOGY [3]. The lengths of the CDR1-IMGT (BC), CDR2-IMGT (C'C"), and CDR3-IMGT (FG) characterize the V-DOMAIN. Thus, the lengths of the three CDR-IMGT (loops) are shown, in number of amino acids (or codons), in brackets and separated by dots. For example, [8.8.20] means that the CDR1-IMGT (BC), CDR2-IMGT (C'C"), and CDR3-IMGT (FG) have lengths of 8, 8, and 20 amino acids (or codons), respectively. The JUNCTION of an IG or TR V-DOMAIN includes the anchors 104 and 118 and is therefore two amino acids longer than the corresponding CDR3-IMGT (positions 105-117) [5,6].
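The bracket notation and the JUNCTION length can be illustrated with a minimal Python sketch. It assumes the V-DOMAIN has already been numbered (for example from a gapped alignment) and is supplied as (position label, amino acid) pairs, with "." marking an unoccupied position. The function names and the input format are illustrative assumptions, not part of any IMGT® tool; the CDR-IMGT ranges are simply the positions lying between the FR-IMGT ranges given above.

```python
# CDR-IMGT delimitations in the IMGT unique numbering for V domain,
# i.e. the positions lying between the FR-IMGT ranges given in the text.
CDR_IMGT_RANGES = {
    "CDR1-IMGT": (27, 38),
    "CDR2-IMGT": (56, 65),
    "CDR3-IMGT": (105, 117),
}


def cdr_imgt_lengths(numbered_domain):
    """Count the occupied positions of each CDR-IMGT.

    numbered_domain: iterable of (position_label, amino_acid) pairs
    (hypothetical input format). position_label is a string such as
    "27" or "111.1" (additional position); amino_acid is a one-letter
    code, or "." for an unoccupied position (gap).
    """
    lengths = {name: 0 for name in CDR_IMGT_RANGES}
    for label, amino_acid in numbered_domain:
        if amino_acid == ".":
            continue  # gaps do not count towards the loop length
        base = int(label.split(".")[0])  # "111.1" -> 111
        for name, (first, last) in CDR_IMGT_RANGES.items():
            if first <= base <= last:
                lengths[name] += 1
    return lengths


def format_cdr_lengths(lengths):
    """Return the bracket notation, e.g. '[8.8.20]'."""
    order = ("CDR1-IMGT", "CDR2-IMGT", "CDR3-IMGT")
    return "[" + ".".join(str(lengths[name]) for name in order) + "]"


def junction_length(lengths):
    """JUNCTION = CDR3-IMGT plus the two anchors 104 and 118."""
    return lengths["CDR3-IMGT"] + 2
```

Counting occupied positions rather than subtracting range boundaries is what lets the same function handle both loops with gaps and CDR3-IMGT loops longer than 13 amino acids.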

C Domain Strands, Loops, and Turns
The C domain strands, turns and loops and their IMGT positions and lengths, based on the IMGT unique numbering for C domain (C-DOMAIN of IG and TR and C-LIKE-DOMAIN) [7], are shown in Table 2.
The IMGT anchors belong to the strands and represent, for the C domains, anchors for the BC and FG loops and, by extension, for the CD strand (as C domains do not have the C'C" loop) [7]. Anchor positions are shown in squares in IMGT Colliers de Perles. C domain anchor positions are positions 26 and 39, 45 and 77 (anchors of the CD strand), and 104 and 118 [7].

C Domain and V Domain Comparison
The A-STRAND and B-STRAND of the C domain are similar to those of the V domain [7]. The longest BC-LOOP of the C domain has 10 amino acids (positions 32 and 33 are missing), instead of 12 amino acids in the V domain. The C-STRAND and the D-STRAND of the C domain are shorter by one position (46) and two positions (75, 76), respectively, compared to those of the V domain. The transversal CD-STRAND is a characteristic of the C domain (a V domain has instead two antiparallel beta strands, the C'-STRAND and C"-STRAND, linked by the C'C"-LOOP). The E-STRAND, F-STRAND and G-STRAND of the C domain are similar to those of the V domain [7] (IMGT® http://www.imgt.org, IMGT Scientific chart > Numbering > IMGT unique numbering for C-DOMAIN and C-LIKE-DOMAIN).

IMGT Gaps and Additional Positions
IMGT gaps are shown by dots in IMGT Protein displays and by hatched circles or squares in IMGT Colliers de Perles for C domain and correspond to unoccupied positions according to the IMGT unique numbering for C domain [7].
The longest BC-LOOP of the C domain has 10 amino acids (positions 32 and 33 are missing; this is a feature of the C domain, and these two positions are not shown in the IMGT Colliers de Perles and IMGT Protein displays for C domain). For BC loops shorter than 10 amino acids, gaps are created from the apex in the following order: 34, 31, 35, 30, 36, etc. The FG-LOOP of the C domain is similar to that of the V domain. Gaps for FG loops shorter than 13 amino acids, and additional positions for FG loops longer than 13 amino acids, are created following the same rules as those of the V domain.
Additional positions in the C domain define the AB-TURN, DE-TURN and EF-TURN (Table 2). For AB-TURN shorter than 3 amino acids, gaps are created (hatched in IMGT Colliers de Perles, or not shown in structural data representations) in a decreasing ordinal manner. For DE-TURN shorter than 14 amino acids, gaps are created in the following order: 85.7, 84.7, 85.6, 84.6, 85.5, etc. For EF-TURN shorter than 2 amino acids, gaps are created in the following order: 96.2, 96.1 [7].
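The gap creation orders listed above for the C domain can be captured in a small lookup, sketched here in Python. The names `STATED_GAP_ORDER`, `MAX_LENGTH` and `gap_positions` are hypothetical. Because the BC-LOOP and DE-TURN orders are given in the text only as a prefix ending in "etc.", the sketch deliberately refuses to extrapolate beyond the positions that are explicitly stated.

```python
# Gap creation orders exactly as listed in the text. The BC-LOOP and
# DE-TURN lists end with "etc." there, so only the stated prefix is
# encoded; nothing beyond it is extrapolated.
STATED_GAP_ORDER = {
    "BC-LOOP": ["34", "31", "35", "30", "36"],
    "DE-TURN": ["85.7", "84.7", "85.6", "84.6", "85.5"],
    "EF-TURN": ["96.2", "96.1"],
}

# Longest lengths (in amino acids) stated in the text for each region.
MAX_LENGTH = {"BC-LOOP": 10, "DE-TURN": 14, "EF-TURN": 2}


def gap_positions(region, length):
    """Return the positions left unoccupied for a region of `length`."""
    missing = MAX_LENGTH[region] - length
    if missing < 0:
        raise ValueError(f"{region} cannot exceed {MAX_LENGTH[region]} amino acids here")
    order = STATED_GAP_ORDER[region]
    if missing > len(order):
        raise ValueError("gap order beyond the stated prefix is not encoded")
    return order[:missing]
```

For instance, `gap_positions("BC-LOOP", 8)` returns the two positions nearest the apex, `["34", "31"]`.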

CDR-IMGT Delimitation for Grafting
The objective of antibody humanization is to graft, at the DNA level, the CDR of an antibody V domain from mouse (or another species) and of a given specificity onto a human V domain framework, thus preserving the specificity of the original (murine or other species) antibody while decreasing its immunogenicity [16]. IMGT/DomainGapAlign [10,12,13] is the reference tool for antibody humanization design based on CDR grafting: (i) it precisely defines the CDR1-IMGT, CDR2-IMGT and CDR3-IMGT to be grafted, and (ii) it helps select the most appropriate human FR-IMGT by providing the alignment of the mouse (or other species) V-DOMAIN amino acid sequence with the closest germline Homo sapiens V-REGION and J-REGION.
Analyses performed on humanized therapeutic antibodies underline the importance of a correct delimitation of the CDR and FR. As an example, two amino acid changes were required in the first version of the humanized VH of alemtuzumab in order to restore the specificity and affinity of the original rat antibody. The positions of these amino acid changes (S28>F and S35>F) are now known to be located in the CDR1-IMGT and should have been directly grafted, but at the time of this mAb humanization they were considered as belonging to the FR according to the Kabat numbering [17]. In contrast, positions 66-74 were, at the same time, considered as belonging to the CDR according to the Kabat numbering, whereas they clearly belong to the FR3-IMGT (positions 66 to 104), and the corresponding sequence should have been 'human' instead of being grafted from the 'rat' sequence.
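The delimitation argument can be checked mechanically. The following Python sketch maps an IMGT position of a V domain to its FR-IMGT or CDR-IMGT, using the position ranges stated earlier for the FR-IMGT (the CDR-IMGT ranges are the positions in between), and performs a toy amino-acid-level graft. It is not the IMGT/DomainGapAlign algorithm: the function names and the dictionary input format are assumptions made for illustration, and real grafting is designed at the DNA level.

```python
# FR-IMGT and CDR-IMGT delimitations of the V domain (IMGT unique numbering).
IMGT_V_REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
    ("FR4-IMGT", 118, 128),
]


def imgt_region(position):
    """Return the FR-IMGT or CDR-IMGT that contains an IMGT position."""
    for name, first, last in IMGT_V_REGIONS:
        if first <= position <= last:
            return name
    raise ValueError(f"position {position} is outside the V domain numbering")


def graft_cdrs(donor, acceptor):
    """Toy graft: FR-IMGT residues from acceptor, CDR-IMGT from donor.

    donor, acceptor: dicts mapping an IMGT position label (e.g. "28" or
    "111.1") to a one-letter amino acid (hypothetical input format).
    """
    grafted = {}
    for source, want_cdr in ((acceptor, False), (donor, True)):
        for label, amino_acid in source.items():
            base = int(label.split(".")[0])
            if imgt_region(base).startswith("CDR") == want_cdr:
                grafted[label] = amino_acid
    return grafted
```

With these delimitations, positions 28 and 35 fall in the CDR1-IMGT and are therefore taken from the donor, whereas positions 66 to 74 fall in the FR3-IMGT and are taken from the human acceptor.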

V-DOMAIN Contact Analysis and Paratope
The amino acids of the V-DOMAIN CDR-IMGT involved in the contacts with the antigen can be visualized in IMGT/3Dstructure-DB Contact analysis [9][10][11], which provides extensive information on the atom pair contacts. Domain pair contacts ('DomPair') provide information on the contacts between a pair of partners, for example between the VH domain of motavizumab (3ixt_H chain) and the ligand (3ixt_P chain), or between the V-KAPPA domain of motavizumab (3ixt_L chain) and the ligand (3ixt_P chain) (Figure 3) [9][10][11]. Clicking on R@P gives access to the IMGT Residue@Position cards [9][10][11].

V-DOMAIN CDR-IMGT Lengths and Canonical Structures
For V-DOMAIN comparison including sequences and structures, the CDR1-IMGT and CDR2-IMGT lengths are more informative than the "canonical structures" (IMGT ® http://www.imgt.org, IMGT Repertoire (IG and TR) > 2D and 3D structures > CDR1-IMGT (summary) and correspondence with "canonical structures": human (Homo sapiens) and mouse (Mus musculus) Immunoglobulins; ibid CDR2-IMGT). Indeed, (1) most identified (15 out of 19) canonical structures correspond to a given CDR-IMGT length, (2) only two CDR-IMGT lengths have two canonical structures (CDR1-IMGT of nine AA of IGLV, and CDR2-IMGT of eight AA of IGHV), (3) canonical structures have not been identified for every CDR-IMGT length, (4) many 'variants' are described in the literature, based only on sequences and without experimental evidence, (5) canonical structures cannot be identified for CDR3 owing to their diversity in lengths and sequences and to their flexibility, and (6) canonical structure identification is reliable only if 3D structures are known [14]. Thus, the CDR-IMGT length is the most accurate way to define the three CDR, while working on sequences, that information being completed with characteristics Residue@Position, if necessary [9][10][11].

IGHG1 Alleles and G1m Allotypes
Allotypes are polymorphic markers of an IG subclass that correspond to amino acid changes and are detected serologically by antibody reagents [18]. In therapeutic antibodies (human, humanized, or chimeric), allotypes may represent potential immunogenic residues [19], as demonstrated by the presence of antibodies in individuals immunized against these allotypes [18]. The allotypes of the human heavy gamma chains of the IgG are designated as Gm (for gamma marker). The allotypes G1m, G2m, and G3m are carried by the constant region of the gamma1, gamma2 and gamma3 chains, encoded by the IGHG1, IGHG2 and IGHG3 genes, respectively [18]. The gamma1 chains may express four G1m alleles (combinations of G1m allotypes): G1m3, G1m3,1, G1m17,1, and G1m17,1,2 (and in Negroid populations three additional G1m alleles, G1m17,1,27, G1m17,1,28, and G1m17,1,27,28) [18] (Table 3). The C region of the G1m3,1, G1m17,1, and G1m17,1,2 chains differs from that of the G1m3 chains by two, three and four amino acids, respectively [18]. The correspondence between the G1m alleles and IGHG1 alleles is shown in Table 3. Thus, IGHG1*01, IGHG1*02 and IGHG1*05 are G1m17,1, IGHG1*03 is G1m3, IGHG1*04 is G1m17,1,27 and IGHG1*08p is G1m3,1. In the IGHG1 CH1, the lysine at position 120 (K120) in strand G corresponds to the G1m17 allotype [18] (Figure 2D). The isoleucine I103 (strand F) is specific to the gamma1 chain isotype. If an arginine is expressed at position 120 (R120), the simultaneous presence of R120 and I103 corresponds to the expression of the G1m3 allotype [18]. For the gamma3 and gamma4 isotypes (which also have R120 but T at position 103), R120 only corresponds to the expression of the nG1m17 isoallotype (an isoallotype or nGm is detected by antibody reagents that identify this marker as an allotype in one IgG subclass and as an isotype for other subclasses) [18].
In the IGHG1 CH3, the aspartate D12 and leucine L14 (strand A) correspond to G1m1, whereas glutamate E12 and methionine M14 correspond to the nG1m1 isoallotype [18] (Table 3). A glycine at position 110 corresponds to G1m2, whereas an alanine does not correspond to any allotype (G1m2-negative chain) (Table 3). Therapeutic antibodies are most frequently of the IgG1 isotype and, to avoid potential immunogenicity, the constant region of the gamma1 chains is often engineered to replace the G1m3 allotype by the less immunogenic G1m17 (CH1 R120>K) (G1m17 is more extensively found in different populations) [18].

Table 3. Correspondence between the IGHG1 alleles and G1m alleles.

a In Negroid populations, the G1m17,1 allele frequently includes G1m27 and/or G1m28, leading to three additional G1m alleles, G1m17,1,27, G1m17,1,28 and G1m17,1,27,28 [18].
b Amino acids corresponding to G1m allotypes are shown in bold.
c The nG1m1 and nG1m17 isoallotypes present on the Gm1-negative and Gm17-negative gamma1 chains (and on other gamma chains) are shown in italics.
d The presence of R120 is detected by anti-nG1m17 antibodies, whereas the simultaneous presence of I103 and R120 in the gamma1 chains is detected by anti-Gm3 antibodies [18].
e The IGHG1*01, IGHG1*02 and IGHG1*05 alleles only differ at the nucleotide level (codon 85.1 in CH2 of *02 and *05 differs from *01; codon 19 in CH1 and codon 117 in CH3 of *05 differ from *01 and *02).
f IGHG1*05p, IGHG1*06p, IGHG1*07p and IGHG1*08p amino acids are expected [18] but not yet sequenced at the nucleotide level, and therefore these alleles are not shown in IMGT Repertoire, Alignments of alleles: Homo sapiens IGHG1 (http://www.imgt.org).
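The residue-to-marker rules described in this section for the gamma1 chain (CH1 positions 103 and 120; CH3 positions 12, 14 and 110) can be summarized in a short Python sketch. The function name and the dictionary input (IMGT position to one-letter amino acid) are assumptions for illustration; the sketch only encodes the rules stated in the text, does not cover G1m27 or G1m28, and is no substitute for serological typing.

```python
def g1m_markers(ch1, ch3):
    """Infer G1m allotypes and isoallotypes of a gamma1 chain.

    ch1, ch3: dicts mapping an IMGT position (int) of the CH1 or CH3
    domain to its one-letter amino acid (hypothetical input format).
    """
    markers = []

    # CH1: K120 is G1m17; R120 is the nG1m17 isoallotype, and the
    # simultaneous presence of I103 and R120 is G1m3.
    if ch1.get(120) == "K":
        markers.append("G1m17")
    elif ch1.get(120) == "R":
        markers.append("nG1m17")
        if ch1.get(103) == "I":
            markers.append("G1m3")

    # CH3: D12 and L14 are G1m1; E12 and M14 are the nG1m1 isoallotype.
    if ch3.get(12) == "D" and ch3.get(14) == "L":
        markers.append("G1m1")
    elif ch3.get(12) == "E" and ch3.get(14) == "M":
        markers.append("nG1m1")

    # CH3: G110 is G1m2; A110 corresponds to no allotype.
    if ch3.get(110) == "G":
        markers.append("G1m2")

    return markers
```

A chain with CH1 I103 and K120 and CH3 D12, L14 and A110 is thus reported as G1m17 and G1m1, which matches the G1m17,1 allele.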

Dromedary IgG2 and IgG3 Only-Heavy-Chain Antibodies
Two IgG antibody formats are expressed in the dromedary or Arabian camel (Camelus dromedarius) and in Camelidae in general: the conventional IG (with two identical heavy gamma chains associated with two identical light chains) and the 'only-heavy-chain' IG (no light chain, and only two identical heavy gamma chains lacking CH1) [20]. The Camdro (for Camelus dromedarius in the 6-letter species abbreviation) IGHV3 genes belong to two sets based on four amino acid changes which are characteristic of each set [21]. The first set of IGHV3 genes is expressed in conventional tetrameric IgG1, which constitute 25% of circulating antibodies. The second set is expressed in the only-heavy-chain antibodies, IgG2 and IgG3, which constitute 75% of the circulating antibodies [20]. The four amino acid changes are located in the FR2-IMGT at positions 42, 49, 50 and 52; position 42 is in the C strand and the three others (49, 50 and 52) are in the C' strand (Figure 1). They belong to the [GFCC'C"] sheet at the hydrophobic VH-VL interface in conventional antibodies of Camelidae as well as of any vertebrate species whereas, in camelid only-heavy-chain antibodies (no light chains, and therefore no VL), these positions are exposed to the environment with, through evolution, a selection of hydrophilic amino acids.
The respective heavy gamma2 and gamma3 chains are both characterized by the absence of the CH1 domain owing to a splicing defect [22]. It is the absence of CH1 which is responsible for the lack of association of the light chains. Only-heavy-chain antibodies are a feature of the Camelidae IG, as they have also been found in the Bactrian camel (Camelus bactrianus) of Central Asia and in the llama (Lama glama) and alpaca (Vicugna pacos) of South America. The genetic event (splicing defect) responsible for the lack of CH1 occurred in their common ancestor before the radiation between the 'camelini' and 'lamini', dating approximately 11 million years (Ma) ago.
The V domains of Camelidae only-heavy-chain antibodies have characteristics for potential pharmaceutical applications (e.g., easy production and selection of single-domain format, extended CDR3 with novel specificities and binding to protein clefts). They are designated as VHH when they have to be distinguished from conventional VH (the sequence criterion is based on the four amino acids at positions 42, 49, 50 and 52). The term 'nanobody', initially used for describing a single-domain format antibody, is not equivalent to VHH, as it has been used for V domains other than VHH and for constructs containing more than one V domain (VH and/or VHH) (e.g., caplacizumab, ozoralizumab) (IMGT® http://www.imgt.org, IMGT Repertoire > Locus and Genes > Gene tables; ibid., The IMGT Biotechnology page > Characteristics of the camelidae (camel, llama) antibody synthesis; ibid. IMGT/mAb-DB > caplacizumab; ibid. IMGT/mAb-DB > ozoralizumab).

Human Heavy Chain Diseases (HCD)
The synthesis of Camelidae only-heavy-chain antibodies is remarkably reminiscent of what is observed in human heavy chain diseases (HCD). These proliferative disorders of B lymphoid cells produce truncated monoclonal immunoglobulin heavy chains which lack associated light chains. In most HCD, the absence of the heavy chain CH1 domain by deletion or splicing defect may be responsible for the lack of assembly of the light chain [23]. Similar observations have also been reported in mouse variants [23] (IMGT® http://www.imgt.org, IMGT Education > Tutorials > Molecular defects in Immunoglobulin Heavy Chain Diseases (HCDs)).

Nurse Shark IgN
A convergence mechanism in evolution is observed in nurse shark (Ginglymostoma cirratum, 'Gincir' in the 6-letter species abbreviation) IgN antibodies (previously IgNAR, 'immunoglobulin new antigen receptor') [24], which are only-heavy-chain antibodies (homodimeric heavy nu chains without CH1, and no associated light chains). The IGHV genes expressed in the Gincir heavy nu chains belong to the IGHV2 subgroup and are characterized by the absence of the CDR2-IMGT owing to a deletion that encompasses positions 54 to 67. The Gincir IGH genes are organized in duplicated cassettes, and those that express IgN comprise Gincir IGHV2 subgroup genes and an IGHN constant gene (IMGT® http://www.imgt.org, IMGT Repertoire (IG and TR) > Protein displays: nurse shark (Ginglymostoma cirratum) IGHV).

N-Linked Glycosylation Site CH2 N84.4
An N-linked glycosylation site is present in the CH2 domain of the constant region of the human IG heavy chains of the four IgG isotypes. The N-linked glycosylation site belongs to the classical N-glycosylation motif N-X-S/T (where N is asparagine, X any amino acid except proline, S serine, T threonine) and is defined as CH2 N84.4. As shown in the IMGT Collier de Perles, this asparagine is located in the DE turn. The IMGT unique numbering has the advantage of identifying the C domain (here, CH2) and, in the domain, the amino acid and its localization (here, N84.4), which can be visualized in the IMGT Collier de Perles and correlated with the 3D structure [25][26][27] (IMGT® http://www.imgt.org, The IMGT Biotechnology page > Glycosylation (IMGT Lexique)).
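The N-X-S/T motif defined above is straightforward to scan for in an amino acid sequence. The Python sketch below is a generic illustration, with a hypothetical function name and a toy input sequence; it returns 0-based indices in the raw sequence, not IMGT positions, so relating a hit to a label such as CH2 N84.4 still requires the domain to be numbered with the IMGT unique numbering.

```python
import re

# N-X-S/T with X any amino acid except proline. The lookahead makes the
# match zero-width, so overlapping motifs (e.g. "NNSS") are all reported.
N_GLYCOSYLATION_MOTIF = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sites(sequence):
    """Return the 0-based index of each asparagine starting an N-X-S/T motif."""
    return [match.start() for match in N_GLYCOSYLATION_MOTIF.finditer(sequence.upper())]
```

A motif hit only marks a potential site; whether the asparagine is actually glycosylated cannot be decided from the sequence alone.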

Interface Ball-and-Socket-Like Joints
The 3D structure comparison, between Homo sapiens IGHG1 Fc and IGHG2 Fc, of the CH2 and CH3 domain interface revealed that in all IGHG Fc the movement of the CH2 results from a pivoting around a highly conserved ball-and-socket-like joint [28]. Using the IMGT numbering, the CH2 L15 side chain (last position of the A strand, next to the AB turn) (the ball) interacts with a pocket (the socket) formed by CH3 M107, H108, E109, and H115 (FG loop) [25]. These amino acids are well conserved between the gamma isotypes and the IGHG genes and alleles except for IGHG3 H115 that shows a polymorphism associated to different G3m allotypes [18]. This ball-and-socket-like joint is a structural feature similar but reversed to that previously described at the VH and CH1 domain interface [29], in which the VH L12, T125 and S127 form the socket whereas the CH1 F29 and P30 (BC loop) form the ball.

Knobs-Into-Holes CH3 for the Obtaining of Bispecific Antibodies
The knobs-into-holes methodology has been proposed for obtaining bispecific antibodies [30]. The aim is to increase interactions between the CH3 domains of two gamma1 chains that belong to antibodies with different specificities. Two amino acids, CH3 T22 (B strand) and Y86 (E strand), which belong to the [ABED] sheet, at the interface of the two Homo sapiens IGHG1 CH3 domains [25], were selected for amino acid changes. Interactions of these two amino acids are described in 'Contact analysis' in IMGT/3Dstructure-DB [9][10][11]. The knobs-into-holes methodology consists of an amino acid change on one CH3 domain (e.g., T22>Y) that creates a knob, and another amino acid change on the other CH3 domain (e.g., Y86>T) that creates a hole, thus favoring increased interactions between the CH3 of the two gamma1 chains at both positions 22 and 86 [30] (IMGT® http://www.imgt.org, The IMGT Biotechnology page > Knobs-into-holes). The IMGT engineered variant nomenclature (Table 4) has been set up for an easier comparison between engineered antibodies. The IMGT engineered variant name comprises the species, the gene name, the letter 'v' with a number (e.g., Homo sapiens IGHG1v1), and then the domain(s) with AA change(s) defined by the letter of the novel AA and its position in the domain (e.g., CH2, P1.4). The IMGT engineered variants are defined by comparison with the allele *01 of the gene, which serves, if the effects are independent of the alleles, as a reference for the description of the amino acid (AA) changes for the other alleles. In those cases, the same variant (v) number is used for any allele of the same gene in the same species.
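Amino acid changes written as original amino acid, IMGT position, '>' and novel amino acid (e.g., T22>Y, Y86>T) lend themselves to a simple, checkable representation. The Python sketch below parses such a change and applies it to a domain stored as a dictionary keyed by IMGT position label. The function names and the dictionary format are illustrative assumptions, not an IMGT® API.

```python
import re

# One-letter original AA, IMGT position (optionally with an additional
# position such as "84.4"), ">", one-letter novel AA.
AA_CHANGE = re.compile(r"^([A-Z])(\d+(?:\.\d+)?)>([A-Z])$")


def parse_change(text):
    """Split e.g. 'T22>Y' into ('T', '22', 'Y')."""
    match = AA_CHANGE.match(text)
    if match is None:
        raise ValueError(f"not an amino acid change: {text!r}")
    return match.group(1), match.group(2), match.group(3)


def apply_change(domain, text):
    """Return a copy of `domain` with the change applied.

    domain: dict mapping an IMGT position label to a one-letter amino
    acid (hypothetical input format). The original amino acid is checked
    first, so a change written against another allele or another
    numbering fails instead of silently altering the wrong residue.
    """
    original, position, novel = parse_change(text)
    if domain.get(position) != original:
        raise ValueError(f"expected {original} at position {position}")
    changed = dict(domain)
    changed[position] = novel
    return changed
```

Applying `T22>Y` to one copy of the CH3 domain and `Y86>T` to the other yields the knob and hole partners described above.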

Conclusions
IMGT®, created in 1989 with the official recognition of IG and TR genes, is at the origin of immunoinformatics [3]. The concepts of classification (nomenclature and IG and TR gene and allele names, CLASSIFICATION axiom) were soon followed by the concepts of identification (standardized IMGT keywords, IDENTIFICATION axiom) and the concepts of description (standardized IMGT labels, DESCRIPTION axiom), which led to the implementation of IMGT/LIGM-DB, the first IMGT sequence database, demonstrated online at the 9th International Congress of Immunology (ICI), San Francisco (USA), in July 1995. It took two more years to conceive the concepts of numerotation, IMGT unique numbering and IMGT Collier de Perles (NUMEROTATION axiom), which bridge sequences and structures of V and C domains (at the amino acid and codon levels) [3]. Interestingly, the first IMGT Collier de Perles, created manually in December 1997, identified not only conflicts between the SEQRES and ATOM lines of the PDB file but also the absence of a serine at position 93, demonstrating that sequence and structure were indeed bridged using the IMGT unique numbering (http://www.imgt.org/IMGTrepertoire/2D-3Dstruct/2D-representations/mouse/IG/E5.2Fv/ighV-D-J_E5_2Fv.html).
The IMGT® databases, tools and web resources have been built to manage immunogenetics knowledge and immunoinformatics, based on the IMGT Scientific chart rules generated from the IMGT-ONTOLOGY axioms and concepts [3]. Nowadays, IMGT® provides standardized and integrated databases, tools and web resources for IG and TR, from gene to structure and function [31][32][33][34][35][36][37][38][39][40][41][42][43]. The same concepts and insights for the V and C domains are used for all vertebrate species with jaws (gnathostomata), from fishes to humans, providing a unique resource, whatever the antigen receptor, the chain type and the taxon, for the study of the adaptive immune response [3]. IG repertoire analysis and therapeutic antibody development represent two major current fields of immunoinformatics, involving V and C domains, in fundamental, pharmaceutical and medical research. High-throughput sequencing (HTS) data obtained by NGS have made IMGT® standardization, developed originally to handle the huge diversity of the immune repertoires, more needed than ever. Since October 2010, the IMGT/HighV-QUEST web portal has been a paradigm for the characterization of the V domain diversity and expression and the identification of the IMGT clonotypes (AA) [44][45][46]. Statistical comparison of the V domain and IMGT clonotype (AA) diversity and expression between two sets can be performed using the IMGT/StatClonotype package [47,48]. NGS analysis of V domains provides immunoprofiling in normal (infectious diseases, vaccination, aging) or pathological (leukemias, lymphomas, myelomas, immunodeficiencies) conditions. A novel IMGT/HighV-QUEST functionality includes, with the same high-quality criteria, the analysis of the two V domains of single chain Fragment variable (scFv) from phage display combinatorial libraries [49][50][51].
The therapeutic monoclonal antibody engineering field represents the most promising potential in medicine. Standardized genomic and expressed sequence, structure and interaction analysis of IG is crucial for a better molecular understanding and comparison of the mAb specificity, affinity, half-life, Fc effector properties, and potential immunogenicity. IMGT/3Dstructure-DB provides a standardized description and antibody structure/contact analysis characterization, at the V and C domain level, at the chain level (with 'chimeric' and 'humanized' added as 'taxon'), and at the receptor level. Amino acid (or codon) changes, either polymorphic or resulting from engineering, are identified. The structural unit is the V or C domain, together with the regions (hinge, linker, CHS). This modular characterization per domain (and/or region) provides great flexibility and is applicable to any novel format of antibody engineering [52][53][54][55][56][57][58][59]. IMGT concepts have been integrated in the Encyclopedia of Systems Biology [60][61][62][63]. The CDR-IMGT lengths are now required for mAb INN applications and are included in the World Health Organization International Nonproprietary Name (WHO INN) definitions [64], bringing a new level of standardized information to the comparative analysis of therapeutic antibodies.

Availability and Citation: Authors who use IMGT® databases and tools are encouraged to cite this article and to quote the IMGT® Home page, http://www.imgt.org. Online access to IMGT® databases and tools is freely available for academics and under licences and contracts for companies.

IMGT® received financial support from the GIS IBiSA, BioCampus Montpellier, the Région Occitanie (Grand Plateau Technique pour la Recherche (GPTR)), the Agence Nationale de la Recherche (ANR) and the Labex MabImprove (ANR-10-LABX-53-01). IMGT® is currently supported by the Centre National de la Recherche Scientifique (CNRS), the Ministère de l'Enseignement Supérieur, de la Recherche et de l'Innovation (MESRI) and the University of Montpellier.