In Silico Methods in Antibody Design

Antibody therapies with high efficiency and low toxicity are becoming one of the major approaches in antibody therapeutics. Based on high-throughput sequencing and increasing experimental structures of antibodies/antibody-antigen complexes, computational approaches can predict antibody/antigen structures, engineering the function of antibodies and design antibody-antigen complexes with improved properties. This review summarizes recent progress in the field of in silico design of antibodies, including antibody structure modeling, antibody-antigen complex prediction, antibody stability evaluation, and allosteric effects in antibodies and functions. We listed the cases in which these methods have helped experimental studies to improve the affinities and physicochemical properties of antibodies. We emphasized how the molecular dynamics unveiled the allosteric effects during antibody-antigen recognition and antibody-effector recognition.


Introduction
Nowadays, monoclonal immunoglobulin gamma (IgG) antibodies have become a major framework in cancer therapy and therapy for many other critical diseases. IgG molecules bind to their cognate antigens and the immune complexes subsequently interact either with type I or type II Fc receptors on effector cells and on B cells, modulating both humoral immune processes and innate immune processes [1]. Structurally, IgG contains four polypeptide chains, including two light chains (LC) and two heavy chains (HC). The four chains fold into three domains (Figure 1), that is, two Fab domains, which bind antigen and one Fc domain, which bind Fc receptors for effector function [2]. The Fab domains contain variable domain and constant domain. The variable domains, especially complementarity determining regions (CDRs), are mainly responsible for specificity and affinity [3], while the constant domains modulate the isotype/effector functions [4]. The Fc domain contains CH2 and CH3 domains. The CH2 domain mainly interacts with Fc receptors (FcRs), which are on the cell surface and play pivotal roles in humoral and cellular protection. Other regions, including the hinge region and glycan, also affect antibody activities, (e.g., binding [5], pharmacokinetics [6], and effector functions [7]). Interestingly, numerous evidences have shown that the antibody constant domain also plays a role in antibody antigen recognition [8]. Antibodies with identical V regions differing in isotype or subclass manifest either differences in antigen binding [9,10] or altered specificity [11,12]. Engineering the above portion of the antibody will optimize the properties of the antibody with the desired efficacy. The available crystal structures of antibodies, antibody-antigen complexes, and Fc/Fc receptor complexes are increasing. Meanwhile, computational resources are increasing and algorithms, and molecular mechanics force fields are becoming more accurate to model molecular behaviors, especially local rearrangement. Thus, the in silico molecular modeling techniques are becoming popular to engineer antibodies [13], (e.g., Fc-based antibody domains and fragments [14], disulfide bonds [15], and T-cell receptor(TCR) mimic antibodies [16] with desired properties, such as viscosity and phase separation properties [17]). In this review, we focused on the reports using molecular modeling in antibody design and the study of antibody behaviors, especially the allosteric effects related to antibody/antigen recognition.

Structure Prediction of Variable Domain and Complementarity-Determining Regions
Complementarity-determining regions (CDRs) are part of the variable chains in IgGs, and a set of CDRs constitutes a paratope. Chothia and Lesk define the "hypervariable loops" based on the relationship between amino acid sequences and three-dimensional structures around the antigen binding region [18,19]. They found that five out of six hypervariable regions, (i.e., light chain CDR loop 1-3 (L1, L2, L3) and heavy chain CDR loop 1-2 (H1 and H2)) typically adopt a limited number of discrete backbone conformations, called "canonical structures". They further identified several residues that are responsible for the main-chain conformations of the hypervariable loops. To build the reliable antibody variable fragment (Fv) structures, people have extensively studied the conformational library of canonical structures [20][21][22][23][24]. Within the same type of canonical structure, the average backbone root-mean-square deviation (RMSD) between a target loop and a template loop is approximately 0.7 Å . It should be noted that the above canonical structures work well in human and murine antibodies but not in other organisms, such as the camelid heavy chain antibodies, which only have varied domain of heavy chain [25], and bovine antibodies, which have ultra-long VH CDR3 regions [26]. Based on the canonical structures, the structures of antibodies can be modeled by careful selection of at least eight templates, including two for the light and heavy chain frameworks selected by sequence similarity, five templates for non-H3 loops selected using the canonical structures and one for the H3 loop partially based on the ab initio methods [27,28] (e.g., Interestingly, numerous evidences have shown that the antibody constant domain also plays a role in antibody antigen recognition [8]. Antibodies with identical V regions differing in isotype or subclass manifest either differences in antigen binding [9,10] or altered specificity [11,12]. Engineering the above portion of the antibody will optimize the properties of the antibody with the desired efficacy. The available crystal structures of antibodies, antibody-antigen complexes, and Fc/Fc receptor complexes are increasing. Meanwhile, computational resources are increasing and algorithms, and molecular mechanics force fields are becoming more accurate to model molecular behaviors, especially local rearrangement. Thus, the in silico molecular modeling techniques are becoming popular to engineer antibodies [13], (e.g., Fc-based antibody domains and fragments [14], disulfide bonds [15], and T-cell receptor(TCR) mimic antibodies [16] with desired properties, such as viscosity and phase separation properties [17]). In this review, we focused on the reports using molecular modeling in antibody design and the study of antibody behaviors, especially the allosteric effects related to antibody/antigen recognition.

Structure Prediction of Variable Domain and Complementarity-Determining Regions
Complementarity-determining regions (CDRs) are part of the variable chains in IgGs, and a set of CDRs constitutes a paratope. Chothia and Lesk define the "hypervariable loops" based on the relationship between amino acid sequences and three-dimensional structures around the antigen binding region [18,19]. They found that five out of six hypervariable regions, (i.e., light chain CDR loop 1-3 (L1, L2, L3) and heavy chain CDR loop 1-2 (H1 and H2)) typically adopt a limited number of discrete backbone conformations, called "canonical structures". They further identified several residues that are responsible for the main-chain conformations of the hypervariable loops. To build the reliable antibody variable fragment (Fv) structures, people have extensively studied the conformational library of canonical structures [20][21][22][23][24]. Within the same type of canonical structure, the average backbone root-mean-square deviation (RMSD) between a target loop and a template loop is approximately 0.7 Å. It should be noted that the above canonical structures work well in human and murine antibodies but not in other organisms, such as the camelid heavy chain antibodies, which only have varied domain of heavy chain [25], and bovine antibodies, which have ultra-long VH CDR3 regions [26]. Based on the canonical structures, the structures of antibodies can be modeled by careful selection of at least eight templates, including two for the light and heavy chain frameworks selected by sequence similarity, five templates for non-H3 loops selected using the canonical structures and one for the H3 loop partially based on the ab initio methods [27,28] (e.g., PLOP [29], Modeller [30], Loopy [31], and the loop modeling in Rosetta [32]). Different from the canonical structure approach, ab initio methods depend on conformation predictions using physicochemical principles rather than using structural templates. Different from the canonical structure approach, ab initio methods depend on physicochemical principles rather than the template. Due to the simplification of the energy functions and the limited computational resources, the ab initio methods have limitation in accuracy.
Several widely used antibody structure prediction methods are developed from the industry, (e.g., Chemical Computer Group (CCG), Schrödinger Inc. (New York, NY, USA) [33] and Accelrys Inc. (San Diego, CA, USA)) or academia, (e.g., PIGS server [27]). These methods combined different degrees of automation, template selection criteria, types of energy functions, and sampling algorithms. To evaluate the reliability of the predictions compared to the experimentally determined structure of the antibody, the antibody modeling assessment (AMA) initiated an evaluation platform to compare the antibody structures from experiments and modeling [34,35]. AMA used unpublished, high-resolution Fab crystal structures as a benchmark to compare models generated by the different methods. After the evaluation, AMA-II concluded that although there has been great progress in the prediction of antibody structure, high-quality experimental structures are still the most important for modeling antibody structures with high accuracy. Moreover, a combination of homology modeling with knowledge-based and energy-based methods can generate more reliable H3 loops [36]. For example, RosettaAntibody [37,38] combined homology and ab initio modeling to build a preliminary homology model by selecting different templates for the frameworks and non-H3 CDRs, modeling the H3 loop and optimizing the heavy chain variable domain (VH)/light chain variable domain (VL) interface using ab initio methods.

Antibody-Antigen Binding Prediction, Epitope Mapping, and Affinity Maturation
Using the three-dimensional structures of the antibody-antigen complexes, it is possible to enhance the antibody-antigen binding affinities by in silico mutations on antibody residues. In the best situation, when the antibody-antigen complex structures are available, it is relatively straight-forward to perform affinity maturation in silico. Firstly, as an initial step, the protein backbone was treated as rigid, and the conformation of the side chain was determined by discrete side-chain rotamer search. Secondly, the lowest-energy of the structures was further re-evaluated by using more accurate, but computationally more expense models, (e.g., Poisson-Boltzmann (PB) or Generalized Born (GB) continuum electrostatics, unbound-state side-chain conformation search, and minimization). Based on the crystal complex structure between 11K2 and MCP-1 [39], all residues of the CDRs of 11K2 were systematically mutated to 19 other natural amino acids computationally. The interaction energy between the antigen and the antibody was then evaluated and 12 mutations showed improved in silico binding energy. The binding affinity of these mutants were further evaluated by surface plasmon resonance (SPR). Among the 12 mutants, five showed improved binding affinity and one showed a 4.6-fold improvement. A 10 times increase in affinity was achieved by Lippow et al. by redesigning an antilysozyme antibody [40]. Interestingly, Lippow et al. showed that computed electrostatics alone is better than computed total free energy to improve binding [40] in that specific case. Their results suggested that using only electrostatics interaction could be a less expensive but more accurate indicator to predict the binding affinity between antibody and antigen.
One of the significant challenges in modeling antibody antigen complexes is docking the antibody onto its epitope on the surface of the antigen. As epitopes and paratopes are typically flat, the shape complementarity between antibody and antigen is not a good determinant of correct antibody placement, which limits the application of general protein-protein docking procedures. SnugDock [41], which is based on the RosettaDock algorithm [42], applied alternating rounds of low-resolution rigid body perturbations and high-resolution side-chain and backbone minimization to generate a model of the antibody-antigen complex. The protocol relies on random perturbation of the complex and creates large numbers (∼10 5 ) of models to capture a global energy minimum. Encouragingly, the antibody-antigen complexes showed a strong energy funnel, with low energy structures corresponding to a low RMSD to the native structure, indicating that this method can recover the native conformation. Zhao et al. used RosettaDock to search the possible complexes between Aβ fibrils/oligomers and a therapeutic antibody [43], crenuzumab, which is designed to reduce the Aβ species in blood. They selected five Aβ oligomer-crenuzumab complexes and refined the docked conformation using molecular dynamics (MD) simulations. They found that two out of five remain stable, which explain the experimental observation of the antibody's recognition of amyloid. The results illustrated the usefulness of further refinements of docking results by MD simulation.
An alternative computational approach to physical modeling is the knowledge-based residue pair preference on epitope-paratope interfaces. With the increasing crystal structures of antibody-antigen complexes in the Protein Data Bank (PDB), a statistical amino acid interaction preference matrix can be used to predict the antibody-antigen recognition. Wang et al. studied the physicochemical properties of regions on and far from the antibody-antigen interfaces, such as net charge, overall antibody charge distributions, and their role in antigen interactions. They found that amino acid preference in antibody-protein antigen recognition is entropy driven. The interface residues with low side-chain entropy are selected to compensate for the high backbone entropy when interacting with protein antigens Positively charged and polar antigen residues and bridging water molecules have a higher possibility of being selected on the antibody-protein antigen interface. Tyr, Ser, and Asp but few Lys are selected on the antibody-antigen interfaces [44]. K. Tharakaraman et al. generated a similar matrix using the available crystal structure. They applied the matrix to guide the antibody-antigen docking of antibody 4E11, which has no crystal structure [45]. They designed affinity-enhancing mutations with a 450-fold improvement, leading to a potent cross-reactive neutralizing antibody to target dengue virus [45].
Although there are many successful cases for protein-protein affinity design, there are still challenges as the antibody-antigen recognition is not simply interactions between proteins; the solvent effect is also crucial. For instance, interfacial trapped water molecules, polar and charged side chains, and the balance between protein-solvent and protein-protein interactions from the unbound to bound state need to be considered during the modeling.

Antibody Aggregation, Stability, and Immunogenicity
In high concentrations of formulations for therapy or storage [46], antibodies have been known to aggregate, and the aggregation of therapeutic proteins can lead to immunogenicity [47][48][49]. Numerous experimental studies have been performed to investigate antibody stability and resistance to aggregation [50][51][52][53], especially in single-chain Fv fragments. Molecular modeling is a useful tool in addition to experimental techniques to predict aggregate-prone regions (APRs) [46], and to design aggregation-resistant antibodies by introducing mutations in those regions. The sequence composition and several structural properties, such as hydrophobicity, charge, and secondary structure propensity, were used to predict the APRs [54][55][56][57]. The experimental datasets were also able to predict the aggregation rate upon different mutations [58,59]. Wang et al. investigated the APRs on the CDR region, and they found that the APRs most frequently appeared in CDR-H2 and less frequently in CDR-H3 [60]. Moreover, they showed that aromatic residues (e.g., Tyr and Trp) are favored both on CDR and APRs, indicating that co-incidence of APRs with CDR sites can potentially cause the loss of function upon aggregation. Trout et al. quantified the exposure of hydrophobic residues averaged over snapshots from MD simulations, and they used this value as a novel indicator (i.e., spatial aggregation propensity (SAP) to predict APRs) [61,62]. They identified 14 aggregation-prone motifs in constant regions of human IgG molecules, which are not conserved among the other antibody classes [63].
MD simulations have been used to study the mechanisms of antibody aggregation and amyloidosis [64,65]. In the physiological condition, amyloid formation and deposition of immunoglobulin light-chain proteins in systemic amyloidosis (AL) cause major organ failures [66][67][68]. While the κ light-chain is dominant (λ/κ = 1:2) in healthy individuals, λ is overrepresented (λ/κ = 3:1) in AL patients [69]. To understand the structural basis of the amyloid formation and the sequence preference, Zhao et al. examined the correlation between the sequence and structural stability of dimeric variable domains of immunoglobulin light chains using molecular dynamics simulations of 24 representative dimer interfaces, followed by energy evaluation of conformational ensembles for 20 AL patients' light-chain sequences. They identified a stable interface with displaced N-terminal residues. This interface provides the structural basis for AL protein fibrils formation (Figure 2). Proline isomerization may cause the N-terminus to adopt amyloid-prone conformations. They found that λ light chains prefer misfolded dimer conformation, while κ chain structures are stabilized by a natively folded dimer. According to the available crystal structures, the structural repertoire of λ chains is different and more diverse than that κ of the chains [22]. This suggested that λ light-chain protomers have larger conformational diversity than κ chain protomers and thus are easier to be unfolded. The unfolded λ light-chain protomer further formed aggregates, while the intact κ chain protomer can be protected by the natively folded dimer.

Allosteric Effects in Antibodies
The process of antibody-antigen recognition is complicated and associated with conformational transitions of the antibody by the inherent flexibility [70][71][72]. Recent studies showed that there are allosteric effects during the recognition process of antibody-antigen recognition [73,74]. Interestingly, the constant domain also plays an essential role in antigen recognition [8,[75][76][77][78][79]. Based on over 100 crystal structures of antibody Fab domains in either unbound or bound form, a common effect was discovered, that is, distant CH1-1 loops undergo significant fluctuation upon antigen binding [80]. The removal of the intermolecular disulfide bond between light chain and heavy chain in a Fab-recognizing prion peptide showed binding energy enhancement upon the molecular dynamics simulation [81]. Zhao et al.'s work on solanuzumab and crenezumab showed that antibodies with identical variable domain, but different constant domain have significantly different affinities when binding to Aβ species [43]. Interestingly, they found that in the apo form, the constant domain of solanuzumab is more flexible than crenezumab while more rigid than crenezumab when bound to Aβ fibrils, and this flexibility change might correspond to the binding affinity difference between crenezumab and solanuzumab ( Figure 3). They further proposed that this flexibility change is potentially due to the entropy redistribution after the antibody-antigen recognition.
Antibodies 2018, 7, x FOR PEER REVIEW 6 of 14 on over 100 crystal structures of antibody Fab domains in either unbound or bound form, a common effect was discovered, that is, distant CH1-1 loops undergo significant fluctuation upon antigen binding [80]. The removal of the intermolecular disulfide bond between light chain and heavy chain in a Fab-recognizing prion peptide showed binding energy enhancement upon the molecular dynamics simulation [81]. Zhao et al.'s work on solanuzumab and crenezumab showed that antibodies with identical variable domain, but different constant domain have significantly different affinities when binding to Aβ species [43]. Interestingly, they found that in the apo form, the constant domain of solanuzumab is more flexible than crenezumab while more rigid than crenezumab when bound to Aβ fibrils, and this flexibility change might correspond to the binding affinity difference between crenezumab and solanuzumab ( Figure 3). They further proposed that this flexibility change is potentially due to the entropy redistribution after the antibody-antigen recognition.

Modulation of the Effector Functions
Besides antigen recognition, the effector functions, such as antibody-dependent cellular cytotoxicity, complement-dependent cytotoxicity, and antibody-dependent cell-mediated phagocytosis, can also be engineered based on the structures between fragment crystallizable region (Fc) and receptors and the computational techniques in a high-throughput way. The engineerable parts are the hinge region, CH2 domain, N-glycans and N-glycan-attached residues. All'acqua et al. introduced various modifications into the hinge region of mAb 12G3H11 to modulate the hinge's length, flexibility, and/or biochemical properties. They found that the upper and middle hinge are important, and mutations introduced to these regions can modulate the FcγRIIIa or C1q binding [5]. Lazar et al. applied computational optimization of the Fc region [82] using Protein Design Automation (PDA) [83] technology and Sequence Prediction Algorithm (SPA) algorithm [84]. They created various mutations to improve the affinity up to 169-fold with a FcγRIIIa:IIb ratio about nine-fold. Engineering on the N-glycan can also modulate the effector functions. The glycans at Asn-297 (N-glycan) are important to maintain the quaternary structure and the stability of the Fc [85], as

Modulation of the Effector Functions
Besides antigen recognition, the effector functions, such as antibody-dependent cellular cytotoxicity, complement-dependent cytotoxicity, and antibody-dependent cell-mediated phagocytosis, can also be engineered based on the structures between fragment crystallizable region (Fc) and receptors and the computational techniques in a high-throughput way. The engineerable parts are the hinge region, CH2 domain, N-glycans and N-glycan-attached residues. All'acqua et al. introduced various modifications into the hinge region of mAb 12G3H11 to modulate the hinge's length, flexibility, and/or biochemical properties. They found that the upper and middle hinge are important, and mutations introduced to these regions can modulate the FcγRIIIa or C1q binding [5]. Lazar et al. applied computational optimization of the Fc region [82] using Protein Design Automation (PDA) [83] technology and Sequence Prediction Algorithm (SPA) algorithm [84]. They created various mutations to improve the affinity up to 169-fold with a FcγRIIIa:IIb ratio about nine-fold. Engineering on the N-glycan can also modulate the effector functions. The glycans at Asn-297 (N-glycan) are important to maintain the quaternary structure and the stability of the Fc [85], as well as the Fc-Fc receptor recognition [86][87][88][89]. The deglycosylation of IgG1 resulted in a 40-fold loss in FcγRI binding [86]. The composition of N-glycans can modulate the binding affinity of IgG1 Fc to Fc γ receptors [90][91][92][93]. Lee et al. applied molecular dynamics to the Fc region of an antibody, and they found that the C'E loop and the CH2-CH3 orientation are dynamic and changes in N-glycan composition shift their conformational ensembles and optimize the interface with the Fc receptor for efficient binding [94]. The noncovalent interactions of multiple amino acid residues from the Fc domain with the N-glycan are necessary for optimal recognition of the protein-binding site by FcγRI [95]. Moreover, single amino acid mutations at these residues of contact from the Fc domain have considerable effects on glycan processing [95][96][97]. X-ray crystallography and NMR data indicated that the two arms of N-glycan showed either the bound state (attached to the Fc fragment) [98] or the free state (detached from the Fc fragment) [99]. Several studies showed that mutations on the residues, which bind to the N-glycan, can shift the free/bound states and then modulate the effector function [100].

In Silico Vaccine Design
In silico methods are also widely used in vaccine design. Generally, structure-based in silico vaccine design includes epitope identification, immunogen design by epitope grafting, and antibody/antigen structure prediction. The epitopes include linear (both T-cell and B-cell) and conformational epitopes (mainly B-cell). The bioinformatics tools are well-established to predict the linear epitope [101][102][103][104]. For the conformational (discontinuous) epitopes, the three-dimensional (3D) structure of proteins is often required (e.g., envelope glycoproteins [105][106][107], epitope peptides [108], and the native viral spike [109,110]). These methods use the 3D structure to determine the physico-chemical properties, such as the surface accessibility, the propensity scores of residues in spatial proximity, and the contact numbers of residues, to obtain the final score for epitope evaluation [111][112][113]. In case of no available crystal structure, in silico protein structure prediction is useful. These methods, including template-based and free modeling, are similar to the antibody structure prediction and antibody-antigen complex structure prediction in the previous section, but with a focus on the antigen part. Ideally, among the epitope candidates, the selected epitopes in a vaccine should be conserved across different stages of the pathogen and its variants with the consideration of desired immune response. The epitope can be grafted onto a heterologous protein scaffold for immunogen design [114]. There are three criteria for selecting suitable scaffolds for epitope grafting [115]. Firstly, smaller epitopes are preferred to minimize the unwanted immunogenicity. Secondly, flexibility of the epitope can be tuned to improve the wanted immunogenicity. Thirdly, the neighboring residues are important for the specificity. Besides the epitope, the non-epitope regions should also be resurfaced to enhance the immunogen properties, (e.g., solubility and stability [116,117]). There are several successful cases of in silico design of vaccines. For example, Correia et al. started the vaccine discovery by computational protein design [118]. They generated small, thermally and conformationally stable protein scaffolds by the in silico methods. Further experiments showed that these protein scaffolds accurately mimic the viral epitope structure and induce potent neutralizing antibodies.

Conclusions
High-throughput technology combined with computational technologies have dramatically advanced the development of biological therapeutics [119][120][121]. In the field of antibody therapeutics, there are various antibody data resources in terms of their contents and features including PDB, DIGIT, IEDB, and IMGT. To engineering antibodies, the 3D structures are crucial to understand the antibody-antigen recognition mechanism, to evaluate the stability and immunogenicity of the antibody, and to predict the function/efficacy change upon modification. However, the available experimental techniques, such as crystallography and NMR, can only solve small portion of the protein structure space. Computational methods can expand this space by selecting reasonable templates, predicting epitope, and modeling the CDR region with acceptable deviation. However, the combination of these computational capabilities might result in accumulated errors, especially for the antibody-antigen complex structure prediction [122]. Recent critical assessment of protein interactions(CAPRI) reported that six out of 20 test complexes cannot obtain acceptable models even with~1000 candidate models per complex [123]. Thus, in the case of protein without available structure or template, it is still challenging to perform the whole design process in silico.
Improvements in conformational sampling methods and development of scoring functions used to estimate free energies are also much-needed. Sampling the conformation of flexible CDR loops, especially CDR-H3, is particularly important in antibody design, because large conformational changes can occur when antigen binding. Moreover, the evaluation of antibody allosteric effect requires extensive sampling about the motion and dynamic of antibody constant domains. Free energy perturbation (FEP) methodology can be used to estimate relative antigen binding affinity differences to the antibody variants [124], but the set-up of these calculations is tedious and the simulations are time-consuming (about 1-2 days per structure), which limits its applicability to only small number of structures. Empirical scoring functions [125][126][127][128][129] serve as alternative methods to free energy simulation methods, offering a quicker way to estimate binding affinities where high computational throughput is needed (Ala scanning or affinity maturation) [130]. The lower accuracy and applicability of the empirical scoring functions needs to be considered in practical applications.
Allosteric effects and conformational change in antibody-antigen recognition and antibody effector functions are emerging and challenging to evaluate using experimental techniques [131][132][133]. Molecular dynamics simulations can be used to study the allosteric effects; however, they require infeasible sampling time using regular sampling techniques. The emerging Markov State Models [134] can be used to create conformational free energy profiles with multiple shorter simulations to better evaluate the allosteric effects [135]. To summarize, the combination of in silico technologies, expending databases, and greater availability of structures of antibody-antigen complexes will have a real impact in aiding antibody drug discovery.