Harnessing Fc/FcRn Affinity Data from Patents with Different Machine Learning Methods

Monoclonal antibodies are biopharmaceuticals with a very long half-life due to the binding of their Fc portion to the neonatal receptor (FcRn), a pharmacokinetic property that can be further improved through engineering of the Fc portion, as demonstrated by the approval of several new drugs. Many Fc variants with increased binding to FcRn have been found using different methods, such as structure-guided design, random mutagenesis, or a combination of both, and are described in the literature as well as in patents. Our hypothesis is that this material could be subjected to a machine learning approach in order to generate new variants with similar properties. We therefore compiled 1323 Fc variants affecting the affinity for FcRn, which were disclosed in twenty patents. These data were used to train several algorithms, with two different models, in order to predict the affinity for FcRn of new randomly generated Fc variants. To determine which algorithm was the most robust, we first assessed the correlation between measured and predicted affinity in a 10-fold cross-validation test. We then generated variants by in silico random mutagenesis and compared the prediction made by the different algorithms. As a final validation, we produced variants, not described in any patents, and compared the predicted affinity with the experimental binding affinities measured by surface plasmon resonance (SPR). The best mean absolute error (MAE) between predicted and experimental values was obtained with a support vector regressor (SVR) using six features and trained on 1251 examples. With this setting, the error on the log(KD) was less than 0.17. The obtained results show that such an approach could be used to find new variants with better half-life properties that are different from those already extensively used in therapeutic antibody development.


Introduction
The wide therapeutic success of monoclonal antibodies (mAbs) in numerous indications is mainly due to their high target specificity and their long half-life, ranging from 3 days to more than 30 days for non-engineered mAbs. Further enhancing the half-life of therapeutic antibodies allows a decrease in the periodicity of administration and increases their efficacy [1][2][3]. Antibody half-life depends on many factors, such as the target, target-mediated drug disposition [4], heavy-chain allotype [5,6], and presence of anti-drug antibodies. However, the predominant mechanism determining the half-life is the binding of the IgG Fc portion to FcRn, which protects IgG from catabolism. This binding is pH-dependent due to the presence of histidine residues in the Fc portion and glutamic acid residues in FcRn. The high-affinity complex is formed in endosomal compartments at low pH (pH 6) but not extracellularly at physiological pH (pH 7.4). In order to harness this mechanism, many companies have tested Fc mutations improving the binding to FcRn at acidic pH only, which improves the endosomal recycling efficiency and enhances the pharmacokinetics of the antibody. For example, Medimmune and Xencor have patented the M252Y/S254T/T256E and M428L/N434S mutations, respectively [1,7,8]. Finding useful mutations is not trivial, since increasing binding at acidic pH often results in a simultaneous increase in affinity at neutral pH, which mitigates the desired effect [9]. Such mutations can even worsen the pharmacokinetic properties [7,10] because of reduced antibody release from FcRn back to the plasma. In contrast, some companies voluntarily enhance the binding to FcRn at neutral pH in order to flush out antigens more rapidly [9,11].
To find the right mutants, alanine scanning combined with rational design was initially the most commonly used technique [12], leading to the identification of amino acids that are essential for the binding of Fc to FcRn. For example, mutation of the isoleucine at position 253 [12] or histidine at position 310 [13] by any other amino acid diminishes or abrogates the binding. Conversely, substitution of asparagine at position 434 by a hydrophobic amino acid (N434A, N434W, N434Y, N434F) or other types of amino acids (N434H, N434G, N434S, N434Q) [9,14] enhances the binding. More powerful approaches were then developed to find new variants, such as phage display [9], random plus directed mutagenesis [15], or combinations of in silico methods and rational design [16,17]. However, the generated mutants frequently appear as a combination of already described single mutations. Moreover, these methods still require experimental testing of many variants because of their low performance in predicting the combinatorial effect of several single mutations.
Several in silico methods have been developed to predict protein/protein binding affinity [18]. These methods are generally pre-determined equations (scoring functions) of energy terms, whose weights are optimized by machine learning on experimental datasets comprising various protein-protein complex structures. Although these methods perform well on their training dataset, they generally show low correlation on a new test set, most likely because the test set diverges too much from the learning set [19,20]. Indeed, as with all machine learning settings, the final performance is highly dependent on the quality and diversity of the learning dataset. Algorithms dedicated to the prediction of Fc/FcRn binding affinity have been developed [21,22]. However, the precision of these scoring functions is low, especially for evaluating the impact of multiple mutations, and most of them suffer from overly small learning sets. Nevertheless, a lot of data are available regarding Fc/FcRn variants that have not yet been exploited with these methods. Indeed, only a selection of variants is usually described in the scientific literature, even in supplementary data, whereas a larger number of tested variants can be retrieved from patent applications or patents. For example, researchers from Chugai Pharma tested more than 1000 variants, but the comprehensive set of mutated variants can only be found in some patent applications (e.g., WO2013046704), whereas only 7 variants are described in the corresponding article [23].
In the present work, we collected these data in order to constitute a specific Fc/FcRn dataset that could be used in machine learning algorithms. Our dataset of Fc variants was mainly collected from the patent literature. We then trained different algorithms with Fc/FcRn parameters calculated with bioinformatic tools, together with affinity data, and assessed the performance of the different algorithms in a 10-fold cross-validation setting. We also evaluated the algorithms by comparing the distribution of predicted affinities for thousands of in silico randomly generated Fc variants. Finally, to validate the robustness of the models, we produced three new variants with three, five, and seven mutations and compared the predicted affinity with the experimental binding affinities measured by SPR.

Table 1. Datasets and machine learning methods used.

Dataset                      Number of Variants   Selection Criteria
First learning set (FLS)     1099                 Affinities measured by SPR at 25 °C, pH 7.0.
Second learning set (SLS)    1323                 FLS variants + 224 variants with affinities only measured at pH 6.0.

Support vector regressor (SVR)
The objective of support vector machines (SVMs) is to find the hyperplane that best separates the two categories of instances defined in a training sample. Support vector regression (SVR) uses the same principle, adding a constraint on the maximal distance between the instances and the hyperplane.

Multi-linear regression (MLR)
Multiple linear regression optimizes a linear function of the parameters.

Multi-layer perceptron (MLP)
An MLP is a class of feedforward artificial neural network (ANN) with at least three layers of nodes (input, hidden, and output), in which the neurons of the hidden and output layers use non-linear activation functions.

Random forest regressor (RFR)
A random forest is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and averages their predictions to improve accuracy and control overfitting.
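
The four regressors above are all available in scikit-learn. As a minimal sketch, they can be instantiated and fit side by side; the hyperparameter values and the synthetic data here are illustrative placeholders, not the optimized settings or features used in this study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                            # mock structural features
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=200)  # mock log(KD) values

# Hypothetical hyperparameters, for illustration only
models = {
    "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.1),
    "MLR": LinearRegression(),
    "MLP": MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                        alpha=20.0, max_iter=2000, random_state=0),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
}
# Fit each model and score it on its own training data (R²)
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```

All four expose the same fit/predict/score interface, which is what makes a like-for-like comparison straightforward.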

Algorithms and Tested Features
The 3D structures of the 1323 Fc variants were modeled from the Fc/FcRn co-crystal structure (4N0U.pdb [13]) with PyMOL v2.5.4, and features reported to be relevant in previous studies [25][26][27] were calculated with the CCP4 software v8.0.009. In total, 147 features were initially considered (Table A1) and collected from the 1323 Fc/FcRn 3D models. Variants considered as different in the original patents can nevertheless appear as duplicates in our models, since not all amino acid positions are used for computing the features. For example, a variant carrying the S239K/T256E substitutions and another carrying L235R/T256E are considered duplicates because the influence of the S239K and L235R substitutions is ignored in our models. Including these positions in the study was nevertheless considered; however, in our dataset, mutations at these two positions do not significantly alter the affinity. Consequently, in this example, only the T256E substitution is taken into account, and the two variants appear as duplicates in our set. We thus eliminated such duplicates, which could bias the training results. As a result, the FLS contains 1048 examples and the SLS 1251 examples.
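
The duplicate-removal step can be sketched as follows: two variants whose mutations differ only at positions ignored by the feature set yield identical feature rows and collapse to one example. The column names, feature values, and affinities below are hypothetical.

```python
import pandas as pd

# Mock feature table: the first two variants differ only at positions
# (239 vs. 235) that are not used for computing features, so their
# feature rows coincide.
df = pd.DataFrame({
    "variant": ["S239K/T256E", "L235R/T256E", "N434W"],
    "feat_1":  [0.52, 0.52, 0.81],   # hypothetical computed features
    "feat_2":  [12.0, 12.0, 7.5],
    "log_KD":  [-7.1, -7.1, -8.0],   # hypothetical affinities
})

feature_cols = ["feat_1", "feat_2"]
# Keep one representative per unique feature row
deduped = df.drop_duplicates(subset=feature_cols, keep="first")
```

Applied to the real 147-feature table, this kind of deduplication is what reduces the FLS from 1099 to 1048 examples and the SLS from 1323 to 1251.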
We then tested different machine learning (ML) algorithms using the FLS and SLS learning sets. From the scikit-learn library [28], we chose four different algorithms: support vector regressor (SVR), multi-linear regression (MLR), multi-layer perceptron (MLP), and random forest regressor (RFR). These methods were well suited for the type of data we had and the type of predictions we wanted to obtain. Moreover, they are quite simple in their principles, and we wanted to see whether the parameters we had in mind were sufficient for the task. Using complex and more opaque artificial intelligence methods can mask problems such as an insufficient number of examples in the learning set or overfitting.
We first used the SelectFromModel method of scikit-learn, which evaluates the importance of each parameter based on the optimized models. The parameter with the lowest importance is removed, and the performance of the new model is computed. If the performance is not altered, the removal is confirmed, and removal of the parameter with the next lowest importance is evaluated. The iteration stops when removing the lowest-importance parameter degrades the performance compared to the initial model. Applied to our two models, this procedure consistently retained 25 to 28 features for the FLS and 10 to 12 features for the SLS. This first reduction in the number of features greatly improved the performance (evaluated by 10-fold cross-validation) of the MLR algorithm. The performance of SVR, RFR, and MLP remained unchanged (data not shown), but with a net gain in calculation speed.
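
The importance-based pruning loop described above can be sketched as follows: repeatedly drop the least important feature as long as cross-validated performance does not degrade. The estimator, tolerance, and synthetic data (only two informative features out of eight) are illustrative assumptions, not the study's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
# Only features 0 and 1 carry signal; the rest are noise
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=300)

features = list(range(X.shape[1]))

def cv_score(cols):
    """Mean 5-fold cross-validated R² using only the given feature columns."""
    est = RandomForestRegressor(n_estimators=50, random_state=0)
    return cross_val_score(est, X[:, cols], y, cv=5).mean()

score = cv_score(features)
while len(features) > 1:
    # Rank remaining features by importance on the full training data
    imp = (RandomForestRegressor(n_estimators=50, random_state=0)
           .fit(X[:, features], y).feature_importances_)
    trial = [f for f in features if f != features[int(np.argmin(imp))]]
    new_score = cv_score(trial)
    if new_score < score - 0.01:   # stop once performance is altered
        break
    features, score = trial, new_score
```

On this toy problem the loop strips away noise features while the two informative ones survive, mirroring the reduction from 147 features to a few dozen described above.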
We then removed features that were highly correlated (evaluated with the pandas.DataFrame.corr method) and kept 11 features for the FLS and 6 for the SLS (Figure 1). This second step slightly improved the performance of the MLR with the FLS and slightly decreased the performance of the other algorithms with the SLS. However, this further dimension reduction is useful to prevent overfitting. Removing additional features negatively impacted the performance of all algorithms.
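
The correlation-based reduction can be sketched with pandas.DataFrame.corr, as named in the text: for every pair of features whose absolute Pearson correlation exceeds a threshold, one member is dropped. The 0.95 threshold and the mock feature names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
a = rng.normal(size=500)
df = pd.DataFrame({
    "buried_surface": a,
    "n_contacts": a * 1.01 + 0.01 * rng.normal(size=500),  # near-duplicate feature
    "charge": rng.normal(size=500),                        # independent feature
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
reduced = df.drop(columns=to_drop)
```

Here "n_contacts" is almost a linear copy of "buried_surface" and is removed, while the uncorrelated "charge" column survives.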
We compared the results obtained for the two learning sets using the optimal number of features: FLS with 11 features and SLS with 6 features. The most important feature of the FLS model (35% relative importance) is the number of atoms interacting between the β chain of FcRn (β2-microglobulin) and the Fc.

To ensure that the models were not overfitting, despite good learning performance on the entire training datasets, we used a 10-fold cross-validation scheme. We performed this cross-validation test several times for each algorithm to ensure that scores were consistent between runs, because each run of the algorithms can produce different results. With optimized parameters (see Materials and Methods), the consistent regression scores (R²) of the M1048/11 model obtained with MLR, MLP, SVR, and RFR are on average 0.45, 0.60, 0.75, and 0.84, respectively, and 0.77, 0.80, 0.82, and 0.88, respectively, with the M1251/6 model (Figure 2). The MAE (mean absolute error) and MSE (mean squared error) scores are ranked in the same order as the regression scores, with the best scores obtained for the RFR. Although regression scores are better with the SLS model due to the larger range of KD values in the training set, MAE and MSE increased significantly for all algorithms compared to the FLS model. We also shuffled KD values in order to control the fit of our models. As expected, the correlation dropped drastically, with R² below 0 (R² without an intercept can be negative), while MAE and MSE increased dramatically for all algorithms. We also tested a model that incorporated the energy terms (60 parameters) calculated with the FoldX suite v5.0 [29] for all variants, following the same procedure of removing duplicates and correlated features, but the performance did not improve, and the best correlation obtained was 0.89 with 11 parameters with the RFR (Figure A1).
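
The two checks above, repeated 10-fold cross-validation and a shuffled-label control that should collapse R², can be sketched as follows. The data are synthetic stand-ins for the feature table and log(KD) values, and the estimator settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))                            # mock 6-feature table
y = X @ rng.normal(size=6) + 0.2 * rng.normal(size=400)  # mock log(KD)

est = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Real data: cross-validated R² should be well above zero
r2_real = cross_val_score(est, X, y, cv=cv, scoring="r2").mean()

# Control: breaking the X/y pairing should destroy the correlation
y_shuffled = rng.permutation(y)
r2_shuffled = cross_val_score(est, X, y_shuffled, cv=cv, scoring="r2").mean()
```

A large gap between the two scores is the expected signature of a model that learned real structure rather than memorizing noise.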

Randomly Generated Variants Predicted Affinity Comparison with the Four Algorithms
For evaluating the capacity of our two models and algorithms to generalize to new data, we tested both models with the four algorithms on in silico randomly generated Fc variants. We generated two sets of more than 8000 variants containing three (mut3 set) and five (mut5 set) random mutations. These mutations were introduced at positions 251, 252, 253, 254, 255, 256, 257, 285, 286, 288, 307, 308, 309, 310, 311, 314, 428, 433, 434, 435, and 436 because the calculated features of our models only include these positions. We generated one additional set of 1000 Fc variants containing six to eight mutations (mut8 set), restricted to mutations that are not too destabilizing or that have a positive effect on their own according to our dataset. The number of mutations was limited to eight because the effect of close mutations on the stability and production of the antibody is hard to predict.
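
In silico random mutagenesis restricted to the interface positions listed above can be sketched as follows: draw a fixed number of distinct positions and a random substitution at each. For simplicity, this sketch does not exclude drawing the wild-type residue or apply the stability filter used for the mut8 set, and the variant representation is a hypothetical choice.

```python
import random

# Positions covered by the models' features (from the text)
POSITIONS = [251, 252, 253, 254, 255, 256, 257, 285, 286, 288, 307, 308,
             309, 310, 311, 314, 428, 433, 434, 435, 436]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_variant(n_mutations, rng):
    """Return a variant as {position: new_residue} with n distinct sites."""
    sites = rng.sample(POSITIONS, n_mutations)   # distinct positions
    return {pos: rng.choice(AMINO_ACIDS) for pos in sites}

rng = random.Random(0)
mut3_set = [random_variant(3, rng) for _ in range(8000)]
mut5_set = [random_variant(5, rng) for _ in range(8000)]
```

Each generated variant would then be modeled in 3D and featurized exactly like the patent-derived variants before prediction.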
We first compared the distributions of the log KD values predicted by the four algorithms for the SLS (1323 variants, σ = 1.47, log KD range: [−1.03, −8.49] at pH 7.0) (Figure 2). With the FLS model (Figure 2, top), the four algorithms have the same overall distributions but fail to reproduce the distribution of log KD values of the SLS set, in contrast to the SLS models (Figure 2, bottom).
Our two models did not reproduce the same distribution of log KD values with the three sets of random mutants. With both models, all the algorithms predicted that the random variants of the mut3 and mut5 sets, as well as those of the mut8 set, would on average have lower affinity at pH 7.0 than the variants of the dataset, with a tendency to predict higher affinity for the mut8 set. The standard deviations and calculated log KD means are far higher for the SLS model than for the M1048/11 model. Interestingly, different algorithms yield different distributions of log KD, especially for the sets of random variants (Figure 2). The RFR has the lowest mean log KD predictions with the SLS and is the only algorithm that does not predict a higher mean log KD for the mut8 set. The MLP predicted the same type of KD distribution as the RFR, with a tendency to predict higher values. The MLR is the algorithm with the highest standard deviation with both models. Finally, the SVR showed a much narrower range of values, with a standard deviation decreasing with the number of mutations with the first model, in contrast to the second model.

Experimental Validation
To further validate our prediction method, we predicted the affinity of three new variants which, to our knowledge, have never been tested. We then produced them and measured their affinities. We chose variants within our sets of in silico randomly generated variants (A3 (M252W/M428K/N434W), B5 (T256Y/H285Q/N286D/V308A/N434Y), and C7 (T256E/N286H/K288E/V308P/L309D/N434Y/Y436K)) and introduced them into tocilizumab. As controls, we also generated two tocilizumab variants reported in the patent application: T8 (M252Y/N286E/T307Q/V308P/Q311A/N434Y/Y436V) and T3 (M252Y/T307D/N434Y). Our first two variants contain at least one substitution reported as a single destabilizing mutation in patents: M428K for the mut3 variant and T256Y for the mut5 variant. The variant with seven mutations comes from the mut8 set and has a high affinity predicted by all the algorithms. The affinities of the T8 and T3 variants measured in our SPR assay are close to the affinities reported in the patent application (Figures A2 and A3). Overall, with the FLS and SLS, the four algorithms predict the affinity within a good range and correlate well with the measured affinities (Table 2, Figures A2 and A3). In accordance with the 10-fold cross-validation results, the models perform poorly on the WT (tocilizumab) because it belongs to a class of antibodies with very weak binding to FcRn at neutral pH, whereas our models have better predictive power for antibodies with affinities for FcRn ranging from 1 × 10⁻⁹ to 1 × 10⁻⁶ M at neutral pH. The correlation between the six predicted and measured affinities is better with the SLS model for the RFR, SVR, and MLR algorithms, in contrast to the MLP; however, the MAE is reduced for all algorithms (Table 3). For the new variants we produced, the SLS model performs better than the FLS for all algorithms, especially for the SVR (Table 3).
Overall, with the SLS model, the SVR algorithm has the best performance, followed by the RFR, MLR, and MLP.

Table 2. Comparison of predicted versus experimental affinities at pH 7.0 for three randomly generated variants. The three variants have three, five, and seven mutations and are predicted with the two different models and with the four different algorithms. Measured affinities at pH 6.0 are also shown. Cells in green, yellow, and red correspond to very good (log err = |log(pred) − log(KD)| ≤ 0.1), correct (0.1 < log err ≤ 1), and incorrect (log err > 1) predictions, respectively. Statistical analysis is given in Table 3.

Table 3. Comparison between MAE, Pearson correlation coefficient, and maximum error between predictions at pH 7.0 and measurements for the 6 antibodies of Table 2 or only for the 3 produced variants (Mut3, Mut5, and Mut8).
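
The three statistics reported in Table 3 (MAE, Pearson correlation, and maximum error) are computed on log-scale affinities. A minimal sketch follows; the measured and predicted values below are illustrative placeholders, not the values from the study.

```python
import numpy as np

# Hypothetical log(KD) values for six antibodies
log_kd_measured  = np.array([-7.8, -8.2, -8.9, -6.5, -7.1, -8.4])
log_kd_predicted = np.array([-7.6, -8.3, -8.7, -6.9, -7.0, -8.5])

errors = np.abs(log_kd_predicted - log_kd_measured)
mae = errors.mean()                                           # mean absolute error
max_err = errors.max()                                        # worst-case deviation
r = np.corrcoef(log_kd_measured, log_kd_predicted)[0, 1]      # Pearson correlation
```

Working on log(KD) rather than raw KD is what makes these metrics comparable across affinities spanning several orders of magnitude.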

Discussion
Altogether, the present results show that it is possible to computationally predict, with reasonable precision (±1 log), the affinity for FcRn of Fc variants mutated at the interface of the Fc/FcRn complex. To do so, we carefully collected as many publicly available Fc variant/FcRn affinity data as possible by scrutinizing the scientific literature and relevant patents. Since differences exist between the protocols used to measure the affinities, we built two different datasets: the smaller one includes only values obtained using a single protocol, while the larger one includes all available values. To build the two models based on these data, a large number of features relevant to the affinity prediction of a protein complex, as well as features relevant for this particular type of complex, were included. We also minimized overfitting as much as possible by eliminating features that were too correlated with each other in each learning set. To further optimize our procedure, we tested four algorithms. The results of these tests showed that random forest has the best capacity to adapt to our learning sets compared to the MLP, MLR, and SVR algorithms (with our hyper-parameters). Indeed, regression, MAE, and MSE scores are always better with this algorithm, regardless of the model used. This study also shows that the learning set has a high impact on the importance of features and on average predictions.
Not only are the models important but also the algorithms, as they show some variability in the predicted values and their distributions. It is, however, difficult to explain the variability between algorithms since their parameters are different. For example, the larger standard deviation of the MLR algorithm is probably due to its mathematical function, which is less sensitive to threshold effects than are the MLP, SVR, and RFR. The MLP algorithm was tuned with the tanh activation (a sigmoid-shaped function) and with an alpha parameter of 20 to limit overfitting. An alpha parameter of 0.1 would yield a larger range of values, but it would have a tendency to overfit the data. Algorithms with this kind of threshold are more relevant from a biochemical point of view, since the affinity of Fc variants is usually limited to 1 × 10⁻¹⁰ M, especially for random variants. This is important to keep in mind because if two algorithms have more or less the same performance in a cross-validation scheme, it becomes difficult to decide which of them will better generalize to new data. It is also possible that an algorithm with good performance overfits the data, even with a cross-validation test, and will consequently have less capacity to generalize to new data than an algorithm with lesser performance on the same cross-validation test. For example, the RFR has the best performance in the cross-validation test, but the SVR performs better on new variants. Moreover, the MLR has the worst performance on the cross-validation test, but it predicts affinities for new variants slightly better than the MLP.

Model FLS
Our entire dataset is composed of 1323 variants. However, we built our FLS model by selecting only homogeneous data, derived from an accurate technique (SPR), in order to limit the noise that could be induced by outliers. The drawback is that the FLS model is biased towards a particular type of variant, namely variants engineered to have better affinity at pH 7.0. Indeed, despite our efforts to retrieve as many unique variants as possible from the patent database, our approach is still limited by the number and quality of the data. For example, the exact KD value of a variant described as a non-binder cannot be known, yet accounting for the impact of its mutations would certainly increase performance. In addition, companies tend to only publish good results, i.e., variants with better affinity, and not those with decreased affinity. This results in a dataset with a majority of variants with high affinities for FcRn, which decreases the performance in estimating low affinities. The quality and consistency of the data is also a prerequisite of any model. However, the accuracy of the measurements may be low, especially for variants that are discarded in the first round of selection. Moreover, there are sometimes discrepancies between studies reporting affinities. For example, in a recent study [17], mutation N434S was reported to reduce the binding affinity of Fc to FcRn, whereas in patent US20100204454 this sole mutation was reported to enhance the binding threefold. Another effect of the dataset bias is that not only KD but also the weights of the features could be over- or underestimated. The difference in the importance of the features in this model can be explained by the composition of the FLS. Indeed, most of the variants of this learning set contain a hydrophobic amino acid at position 434, but they do not systematically have mutations in the region of the Fc near the β2m, which changes the number of interactions between the two molecules.
As a result, this feature has a higher importance than the buried surface of residue 434 of the Fc. The relative importance of the features in this model is also due to the absence of variants containing mutations at the deeply buried positions (252, 253, and 310), explaining the very low importance of these positions in this model, although they are crucial for the binding of the complex.

Model SLS
It has been shown that antibodies binding to FcRn with affinities lower than 860 nM at physiological pH have reduced half-lives [30]. Having data on the same variants at both acidic and physiological pH could help to better quantify the impact of this parameter. However, affinities at physiological pH are almost always reported as "no binding" because of the low sensitivity of the methods. It has been proposed that the impact of pH between 6.0 and 7.4 is fairly linear on a log scale [24]; hence, a constant value could suffice to approximate the pH change. We built the second model, M1251/6, with KD values at pH 7.0 based on this assumption, since all new examples of this second model were only reported at pH 6.0, or with no binding measured at neutral pH, and were mainly variants with a single destabilizing mutation, introducing an interpretation bias for the pH parameter (the algorithms interpret the decrease in pH as a factor reducing the binding). We homogenized the data by increasing the KD of the examples reported only at pH 6.0 by 68-fold, since tocilizumab was reported to have a KD of 1.3 × 10⁻⁶ M at pH 6.0 and 8.8 × 10⁻⁵ M at pH 7.0. Although this is a crude approximation, the correlation increased for all algorithms. However, the MAE and MSE increased, probably because the 68-fold change in KD cannot be applied to all variants, or because these new examples had their affinities measured by less sensitive techniques such as ELISA. We also evaluated the predictions of the four algorithms with our two models after applying the same transformation to the affinities reported at pH 6.0, but the resulting precision on the described affinity was only ±1.5 log KD for the four algorithms with the second model (Table A2). In addition, if several histidine mutations are considered, the KD change between the two pH values could be more drastic.
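
The homogenization step amounts to multiplying every pH 6.0 KD by a constant factor derived from the tocilizumab values quoted above. A minimal sketch, with the function name as an illustrative choice:

```python
import math

# Tocilizumab reference values from the text: the ratio gives the
# constant ~68-fold shift used to map pH 6.0 measurements to pH 7.0.
KD_TOCILIZUMAB_PH6 = 1.3e-6   # M
KD_TOCILIZUMAB_PH7 = 8.8e-5   # M
FOLD = KD_TOCILIZUMAB_PH7 / KD_TOCILIZUMAB_PH6   # ≈ 67.7, rounded to 68

def kd_ph6_to_ph7(kd_ph6):
    """Crude pH 7.0 KD estimate from a pH 6.0 measurement (see text)."""
    return kd_ph6 * FOLD

kd7 = kd_ph6_to_ph7(KD_TOCILIZUMAB_PH6)   # recovers the pH 7.0 reference
log_shift = math.log10(FOLD)              # constant offset on the log scale
```

Because the shift is a single multiplicative constant, it is only an offset of about 1.8 on the log(KD) scale, which is why it cannot capture variants (e.g., with several histidine mutations) whose pH dependence is steeper.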
In the M1251/6 model, the buried surface area of FcRn amino acid 129 is the most discriminant feature (importance: 0.7) because, in the SLS, most variants with no hydrophobic mutation at this position have decreased affinities for FcRn. The weights of the other features calculated by MLR, MLP, and SVR are negligible, which explains why the correlation curves of the second model show very little change in predicted KD over large measured KD ranges and can cluster into two groups.
Cross-validation is the classical test to evaluate whether a model overfits. Even if the algorithms performed well with the two models, both models are biased towards variants engineered to have high affinity at neutral pH, as explained above. To evaluate the impact of this bias, we tested whether the models would reproduce the distribution of predicted KD of the learning set on the random variant sets (mut3 and mut5 sets). All the algorithms predicted ranges of values of lower affinity for the random variant sets than for the learning set of the M1251/6 model (Figure 2). Conversely, the M1048/11 model tends to stick to the range of values of the learning set, except for the SVR (Figure 2).
In contrast to the SVR and MLR, the RFR and MLP algorithms did not predict higher affinities within the set of eight "good" random mutations in which only individual mutations shown to increase the affinity were kept. However, some mutation combinations incorporated in this set might have decreased affinities.
We also compared, for each algorithm, the 20 variants of each set with the highest predicted affinity. Most of the experimental Fc variants with significantly better affinity for FcRn at neutral pH have a hydrophobic substitution at position 434, whereas histidine 310 and isoleucine 253 are not substituted. However, none of the tested algorithms shows this pattern in its top 20 ranked variants (Table A3).
We challenged our models with mutation combinations not diverging too much from the examples of the learning set. We chose two variants from the sets of three and five mutations, each containing a destabilizing mutation. To ensure that we would be able to measure an affinity for these variants, they also had to contain at least one mutation showing a large improvement in affinity (such as N434Y or N434W) to counterbalance the negative effect on affinity. Although the chosen mutants do not diverge too much from the learning sets, the results of the experimental measurements show that we are able to accurately predict their affinities.

Further Improvements
Although our experimental validations show the reliability of the method, the robustness and predictive power of the models would be significantly increased with a larger experimental validation set. In addition, our dataset comprises 1323 variants, but this number could be larger if we had taken into account intramolecular interactions or long-range effects. Indeed, some mutations that are not at the interaction surface can impact the affinity of the complex. For example, Booth et al. [16] hypothesized that M428L and A378V could stabilize the 250 pseudo-helix. They also proposed in their study to complement the positively charged N-terminal region of the FcRn β-domain with T256, T307, H285, N286, and N315. Other general descriptors to consider could be the electrostatic complementarity between regions of the complex or the rigidity of the 250 pseudo-helix. It has also been shown that the destabilization of this region of the Fc at low pH could be responsible for higher binding [31]. Although the reasons are not very well understood, Monnet et al. [15] showed that positions that are not in the interaction site (264 and 389) could favorably impact the binding. More intriguingly, they also showed that mutations far away from the interaction site (P230S, P228L, or P228R) could enhance FcRn binding, although not consistently. In the same way, Ternant et al. [5] reported the influence of four different G1m allotypes on FcRn binding, although amino acids 214, 356, and 358 are distant from the interaction site. Some of these mutations outside of the Fc/FcRn interaction site have been introduced for optimizing binding to Fcγ receptors (or already exist in natural sequences), and they could still have an impact on FcRn binding. These new parameters could thus enhance the performance of our method.
As explained at the beginning of this paper, we chose to use rather simple learning methods because we did not know whether we had enough data, because we wanted to avoid overfitting, and because we wanted to demonstrate the validity of the overall approach. The results bring positive answers to these three points, and it would now be worth trying more complex methods such as evolutionary algorithms or neural networks.
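The simple setting described in the abstract, an SVR on a handful of structure-based features evaluated by 10-fold cross-validation, can be sketched as follows. This is an illustration only: the feature matrix here is synthetic, and the kernel and hyperparameters are assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in for the patent-derived dataset:
# 1251 examples, six structure-based features, log(KD) targets.
X = rng.normal(size=(1251, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=1251)

# Hypothetical hyperparameters; the study's exact settings may differ.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold predictions: each example is predicted by a model
# that never saw it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)
print(f"10-fold CV MAE on log(KD): {mean_absolute_error(y, y_pred):.3f}")
```

The same out-of-fold predictions can also be used for the measured-versus-predicted correlation plots used to compare algorithms.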
Finally, we focused on predicting the overall affinity (KD) because there were too few data on kon and koff. However, to obtain variants with desirable properties, kon and koff should also be taken into account [24]. Indeed, it has been shown that the endosomal trafficking time of the antibody is very short (a half-life of less than 10 min). It would thus be more important for an antibody to have a very high kon at pH 6.0 than a low koff, which could prevent the antibody from being released back into the circulation. Nevertheless, variants generated with a slow off-rate exhibited an extended half-life in mice and cynomolgus monkeys [16]. In any case, integrating these data could help to improve in silico design methods.
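The point above rests on the standard relation KD = koff/kon: two variants can share the same equilibrium affinity while having very different kinetics. A minimal illustration (the rate constants below are hypothetical, not measured values from this study):

```python
import math

def log_kd(k_on: float, k_off: float) -> float:
    """log10 of the equilibrium dissociation constant KD = koff / kon.

    kon in M^-1 s^-1, koff in s^-1, so KD is in M.
    """
    return math.log10(k_off / k_on)

# Two hypothetical variants with identical KD (10 nM, log KD = -8)
# but opposite kinetic profiles: fast association vs slow dissociation.
fast_on = log_kd(k_on=1e6, k_off=1e-2)
slow_off = log_kd(k_on=1e4, k_off=1e-4)
print(fast_on, slow_off)  # both -8.0
```

With a sub-10 min endosomal transit, the fast-kon variant would load onto FcRn more completely in that window, even though a KD-only model scores the two variants identically.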

Antibody Expression and Purification
T3, T8, A3, B5, and C7 antibodies were produced by RD-Biotech (Besançon, France) following standard procedures by transient transfection of CHO cells. Antibodies were purified with protein A.

Surface Plasmon Resonance
SPR experiments were performed on a Biacore 3000 instrument at 25 °C in 50 mM phosphate buffer with 150 mM NaCl containing 0.05% P20 surfactant (GE Healthcare, Chicago, IL, USA), adjusted to pH 7 or pH 6 as required. hFcRn (Immunitrack, Copenhagen, Denmark) was immobilized in acetate buffer at pH 5 on CM5 sensor chips at a level lower than 200 RU. Increasing concentrations of antibody variants were injected for 180 s. After a dissociation phase of 400 s, the FcRn-coated sensor chip was regenerated by a pulse of 10 mM NaOH followed by PBS. The multi-cycle kinetics were evaluated by fitting a bivalent model (BiaEvaluation 4.1.1, GE Healthcare). Each variant was analyzed on freshly immobilized hFcRn.
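For readers less familiar with SPR titrations, the concentration dependence of the signal can be sketched with the simple 1:1 steady-state model Req = Rmax · C / (KD + C). Note that this is only an illustration of the measurement principle; the study itself fitted a bivalent model, and the KD and Rmax values below are hypothetical.

```python
def equilibrium_response(conc_M: float, kd_M: float, rmax_RU: float) -> float:
    """Steady-state SPR response for a simple 1:1 interaction:
    Req = Rmax * C / (KD + C). Returns response units (RU)."""
    return rmax_RU * conc_M / (kd_M + conc_M)

# Hypothetical titration: KD = 100 nM, Rmax = 150 RU.
kd, rmax = 100e-9, 150.0
for c in (10e-9, 100e-9, 1e-6):
    print(f"{c:.0e} M -> {equilibrium_response(c, kd, rmax):.1f} RU")
```

At C = KD the response is exactly half of Rmax (75 RU here), which is why injecting a series of increasing concentrations around the expected KD gives the best-constrained fit.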

Structure-Based Feature Extraction
To model the 3D structures of the Fc mutants, the 4N0U.pdb file was used as a template. Using the mutagenesis tool from PyMOL v2.5.4, the 3D structure of the complex between FcRn and each mutant from the dataset was generated and exported as a pdb file. CCP4 software v8.0.009 was used to compute the different features used in the algorithms. Features calculated for each residue by CCP4 were: BSA (buried surface area), ASA (accessible surface area), and solvation energy. General features calculated by CCP4 for the whole complex were: number of interface residues, ∆G (solvation energy gain score), p-value (hydrophobic score), BE (theoretical binding energy), and number of hydrogen bonds and salt bridges between the interfaces. In addition to the CCP4-calculated parameters, the following features were added: total number of hydrogen bonds (cutoff: 3.5 angstroms), total number of salt bridges (cutoff: 4.0 angstroms), total number of contacts between amino acid Cα atoms (cutoff: 4.0 angstroms), average hydrogen-bond distance, and number of paired hydrophilic amino acids.
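As an illustration of the distance-cutoff features listed above, counting inter-chain Cα contacts from a PDB file needs only the fixed-column ATOM records. This stdlib-only sketch stands in for the CCP4 pipeline actually used; the chain identifiers in the usage comment are assumptions to be checked against the 4N0U entry.

```python
import math

def ca_coords(pdb_path: str, chain: str) -> list:
    """Extract C-alpha coordinates (x, y, z) for one chain from a PDB file,
    using the fixed column layout of ATOM records."""
    coords = []
    with open(pdb_path) as fh:
        for line in fh:
            if (line.startswith("ATOM")
                    and line[12:16].strip() == "CA"   # atom name columns 13-16
                    and line[21] == chain):            # chain ID column 22
                coords.append((float(line[30:38]),     # x, columns 31-38
                               float(line[38:46]),     # y, columns 39-46
                               float(line[46:54])))    # z, columns 47-54
    return coords

def count_contacts(coords_a, coords_b, cutoff: float = 4.0) -> int:
    """Count Calpha pairs from two chains within `cutoff` angstroms."""
    return sum(1
               for a in coords_a
               for b in coords_b
               if math.dist(a, b) <= cutoff)

# Usage (chain IDs "A"/"B" are hypothetical; verify against 4N0U):
# fc = ca_coords("4N0U.pdb", "A")
# fcrn = ca_coords("4N0U.pdb", "B")
# print(count_contacts(fc, fcrn, cutoff=4.0))
```

The hydrogen-bond (3.5 Å) and salt-bridge (4.0 Å) counts follow the same pattern, differing only in which atom names are selected and which residue pairs are considered.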

Conclusions
Affinity prediction is one of the toughest bioinformatics challenges, and although progress has been made, there is still room for improvement. We chose to focus on one particular protein complex type for which many data were available. The results of the training show that this kind of approach is appropriate and also that the diversity of the training set is crucial to avoid bias and to correctly evaluate the importance of the different features. Despite all the limitations of our models, we were able to correctly predict the affinities of the three variants that were produced in this study. However, the obtained results do not allow us to make an educated choice between the methods. The SLS-trained algorithms appear to perform better than the FLS-trained ones, both in 10-fold cross-validation (Figure 1) and in predicting the affinities of the new variants (Tables 2 and 3). However, the MLS and MLP algorithms perform better in predicting the new variants, whereas the RFR algorithm is better in the 10-fold cross-validation. Thus, deciding between the three methods will require more validations.
The advantage of this method is that it does not require prior knowledge to generate in silico random variants and select mutants with high affinity. However, like most artificial-intelligence-based methods, it does not explain how various combinations of mutations modulate the affinity of the Fc for FcRn. Still, it provides interesting new combinations of mutations while reducing the number of variants to test.

Figure A1. Model with FoldX.

Figure A2. SPR experiments' bivalent fit for variants A3, B5, C7, and T8.