Insights into Comparative Modeling of VHH Domains

In the particular case of the Camelidae family, immunoglobulin proteins have evolved into a unique and more simplified architecture with only heavy chains. The variable domains of these chains, named VHHs, have a number of Complementary Determining Regions (CDRs) reduced by half, and can function as single domains making them good candidates for molecular tools. 3D structure prediction of these domains is a beneficial and advantageous step to advance their developability as molecular tools. Nonetheless, the conformations of CDRs loops in these domains remain difficult to predict due to their higher conformational diversity. In addition to CDRs loop diversity, our earlier study has established that Framework Regions (FRs) are also not entirely conformationally conserved which establishes a need for more rigorous analyses of these regions that could assist in template selection. In the current study, VHHs models using different template selection strategies for comparative modeling using Modeller have been extensively assessed. This study analyses the conformational changes in both CDRs and FRs using an original strategy of conformational discretization based on a structural alphabet. Conformational sampling in selected cases is precisely reported. Some interesting outcomes of the structural analyses of models also draw attention towards the distinct difficulty in 3D structure prediction of VHH domains.


Introduction
Immunoglobulins or antibodies are crucial proteins of the immune system in jawed vertebrates. Their function is to bind antigens. Amongst the large family of immunoglobulin, immunoglobulin gamma (IgG) has been extensively analyzed, as it is more abundant compared to the rest of the isoforms. IgG is composed of two heavy and two light chains, comprised of approximately 3000 residues organized into multiple regions. Each heavy chain is folded into three conserved domains and one variable domain, and each light chain, into one variable domain and one conserved domain. All the domains have the characteristic immunoglobulin fold. In the N-terminal, domains display larger sequence variability compared to the succeeding domains in each chain; hence, they are called the variable domains (V H and V L of the heavy and light chains). The rest of the sequence is organized into successive conserved domains (C L in light chain and C H1 , C H2 , C H3 , etc., in the heavy chain). The immunoglobulin fold in variable domains is formed by the arrangement of nine antiparallel β-strands (as opposed to 7 β-strands in the conserved domains) connected by loops and arranged into two β-sheets (see Figure 1A [1]). Variable domains contain antigenbinding regions predominantly formed by the loops towards the N-terminus. Belonging to the family of V-set domains in the immunoglobulin superfamily (named IgSF), these variable domains, both V H and V L , are composed of two regions: (i) the framework regions (FRs) that correspond mainly to the β-strands, and (ii) the complementarity determining regions (CDRs) composed of the exposed loops, which bind antigenic epitopes. The CDRs exhibit the largest variability in terms of sequence, length, and composition of amino acids, resulting in conformational variability. This large diversity explains the ability of the antibodies to recognize a large number of epitopes. Due to their sensitivity and specificity to bind specific antigens, antibodies have become useful molecules for biotechnological and pharmaceuticals development [2]. Such applications require antibody engineering to improve binding affinity, facilitate humanization process, increase solubility, and alter other biophysical and biochemical properties. Knowledge of the immunoglobulin 3D structure is therefore of great interest, in order to increase their developability. Unfortunately, the availability of large antibody structures is rather limited. Thus, 3D structure prediction of antibody structure is of crucial importance for the development of applications.
The heavy chain only antibodies (HCAbs) from the Camelidae family [3,4] are an interesting class of IgGs that completely devoid of light chains. In camelids, they occur in addition to classical IgGs. Due to a specific mutation during the course of evolution, they have lost the entire light chain and C H 1 domain of the heavy chain. Thus, they bind to antigens using only one variable domain (named V H H for the V H domain from HCAb, see Figure 1B [5]). These domains are 120 to 130 residues in length; retain their ability to bind antigens even when expressed independently. This, along with their unique biophysical and biochemical properties, have made them therapeutic, diagnostic, and biotechnological tools [6][7][8]. Due to their increasing applications, these domains need to be engineered to improve their physiochemical properties. Structure prediction of these domains during V H H design process can save time and resources. Although the complexity of their 3D structure prediction is slightly reduced in terms of the number of domains to be modeled and the subsequent prediction of their relative orientations, V H H has a CDR3 loop that is much more diverse compared to its counterpart in canonical antibodies [9]. The structure prediction of antigen-binding domains for classical antibodies (see Figure 1A) mainly focuses on the two variable domains (V H from the heavy chain and V L from the light chain), each domain comprising four FRs and three CDRs. Although, it is believed/accepted so far that the structure prediction of FRs is straightforward, the CDRs pose a non-trivial task for structure prediction. Another specific difficulty is to predict their relative orientations, which was attempted in recent studies [10,11]. Structure prediction of variable domains, in general, does not specifically require de novo/ab initio methods, as a large number of structures were resolved and deposited in the Protein Data Bank (PDB) [12]. Thus, a typical template-based structure prediction method should be able to generate structural models for these domains. The main difficulty in modeling IgG antibody variable domains lies rather in specific topology of each CDR and the orientations between the two V-set domains to obtain a high accuracy model.
To address this question, a consortium, called the antibody modeling assessment (AMA), exclusively dedicated to IgG structure prediction, was organized twice in the recent past. In these assessment challenges, algorithms, such as RosettaAntibody [13], antibody modeling of Discovery Studio (Accelrys, later BIOVIA from Dassault Systèmes) [14,15], antibody design of the Chemical Computing Group (CCG), and prediction of immunoglobulin structure (PIGS) [16,17] participated in the first evaluation meeting (AMA-I) [18]. The second evaluation (AMA-II) [19] included with the aforementioned four algorithms, are four others groups from Schrödinger, Macromoltek, and a collaborative group by Osaka University and Astellas Pharma. These algorithms differ in their model scorings and refinement protocols, but all of them use template-based modeling, at least to predict the FRs. In both of these assessments, the root mean square deviation (RMSD) was adopted as the gold standard to assess the structural similarity and quality of the models. Most algorithms performed well in the FR prediction with a combined average RMSD range of 0.9 Å ± 0.2 for the domains in the dataset tested. The heavy chain CDR3 remained the most difficult one, with a combined average RMSD of 2.8 Å ± 0.4. The assessments by the two AMAs represent the current state-of-the-art antibody modeling tools [18][19][20].
Both AMA assessments did not include any camelid antibody variable domains (V H H) in their evaluations. Although some of the algorithms could predict V H H structural models, in our literature survey, the generic structure prediction methods are often used. While using comparative modeling, template backbone conformations are critical in determining the protein conformations in the query model, for this reason, a template with best sequence identity to the query was selected to model. In some cases, two or more overlapping templates are used to sample conformational space more exhaustively as seen in few structure prediction studies of V H H. In our previous study [21], we observed few examples where the choice of the templates was critical and the variability in sequence identity between FRs and CDRs could be quite complex to handle, e.g., the choice of the structural template with the best sequence identity with query sequence may not always be the best option.
This study addresses the problem of the impact of template conformations on the V H H domain modeling, especially when using single and multi-templates (see Figure 2). It attempts to underline the effect of sequence identity and structural similarity in different regions of these domains, which affect model backbone conformations. Towards this goal, the most extensive dataset of solved V H H domains was assembled. Each V H H sequence of the dataset was then used blindly to perform comparative modeling using template(s) selected through pairwise sequence identity or structural similarity. Four different scenarios of modeling were assessed to model each query sequence in the dataset; (i) using the best sequence identity template (bestSeqIdTemp); (ii) using the best structural template (bestStructTemp); (iii) using three best sequence identity templates in a multiple template mode (3bestSeqIdTemp) and, finally, as a gold standard, to evaluate the maximal reachable accuracy; (iv) all potential structural templates were tested (All). As the AMA competitions, the structural similarity was assessed using RMSD of Cα residues of the best models to the reference native structures. In addition, protein blocks [22,23] were used to measure differences in protein local conformations. As expected, using sequence identity to choose the better template is, most of the time, a reasonable choice. However, for some complex cases, we also highlight that selection of the template based on sequence identity is the worse choice, and identify some V H H domains that cannot be properly modeled. Finally, this study presents interesting perspectives that will enable the development of better strategies for V H H modeling.

Selection of V H H Sequences
An initial dataset of 140 PDB entries of V H H domains were retrieved from the PDB [12] in a similar way to our previous research [18]. A non-redundant dataset of 125 domains was selected. Out of which, 25 had missing structural data for few residues and, hence, were removed. The final structural dataset consisted of 100 structures. The sequence identity threshold may seem high, but it is a reflection of the highly conserved FRs; closely related FRs can, structurally, be very different [21]. On average, the sequence identity between V H H in this structural dataset is only of 64% (see Figure A1).

V H H Models
We generated 100 models for each V H H with Modeller [24,25]. Models were ranked accordingly to their predicted quality determined by the DOPE score [26]. Structural similarity of these models to the originally resolved crystal structures was quantified using root mean square deviation (RMSD). Figure 3 represents the structural similarity of best models for the first three scenarios, which are (1) using the best sequence identity template (namely scenario bestSeqIdTemp); (2) lowest RMSD template (namely scenario bestStructTemp); and (3) multiple template (i.e., three templates that are close in terms of highest sequence identity to the query, namely scenario 3bestSeqIdTemp). The median RMSD values for the three cases in the ascending order are scenario 2-bestStructTemp (1.4 Å) < scenario 3-3bestSeqIdTemp (1.6 Å) < scenario 1-bestSeqIdTemp (1.9 Å). For the three scenarios, the majority of the best-selected models are at least less than 2.0 Å from the native structure (76% cases for bestStructTemp, 74% cases for 3bestSeqIdTemp, and 56% for bestSeqIdTemp). On the contrary, some proposed structural models are of low accuracy. Hence, 9% of bestStructTemp, 12% of 3bestSeqtIdTemp, and 14% of bestSeqIdTemp have a RMSD value higher than 3.0 Å. This result may seem surprising in comparison to the simplicity of immunoglobulin fold, but the same was also observed during multiple previous analyses [27,28]  Only 11 V H Hs share the same template for scenario 1 and 2, i.e., the best sequence identity is also the closest in terms of RMSD. Thus, to summarize, using only the sequence identity, the addition of two other templates provides an average gain of 0.2 Å (bestSeqIdTemp and 3bestSeqIdTemp), and an average at 0.5 Å to the best possible template (bestSeqIdTemp and bestStructTemp). Finally, the addition of two more templates that are closer in sequence identity increase the quality of 0.3 Å in regards to theoretical best available approach (3bestSeqIdTemp and bestStructTemp, respectively).
However, comparison of these different results shows that scenario 1 is not always the most unfavorable method (see Figure A2). Indeed, 22% of the proteins have a better structural model using scenario 1 (bestSeqIdTemp) compared to scenario 2 (bestStructTemp, see Figure A2A). No correlation or clear trend can be observed from the comparison of the difference in terms of sequence identity to the difference in terms of structural similarity (see Figure A2B). When using multi-templates (scenario 3-3bestSeqIdTemp), the number of cases where best models have higher RMSD value are more limited, but still represent 12% of the models (see Figure A2C). The improvement of RMSD values is also slightly limited, it is better distributed when scenarios 2 and 3 are compared (see Figure A2D).

Difficult Cases
The factors that mainly affect the quality of models are the sequence identity and structural similarity between the query and the template. Thus, to investigate their impacts, results are analyzed under two categories with models having an RMSD higher than 3 Å, or not. Table 1 summarizes cases that can be considered difficult with RMSD values higher than 3 Å for at least one scenario. For the 22 cases, nine are worst for scenario 1, three for scenario 3 and, more surprisingly, 10 for scenario 2. While the first cases (bestSeqIdTemp) are expected, the three examples of scenario 3 could imply multiple sequences (3bestSeqIdTemp) and could provide spurious constraints that decrease the quality of the structural models. Besides, for the worst scenario 2, the reasons seem more complex, as the closest structural template does not provide the best structural model: the small differences in CDRs are more important than expected. The best models are provided 11 times by scenario 2 and 11 times by scenario 3. Table 1. Structural models with more than 3Å RMSD. Listed below are the PDB IDs and chain IDs of the V H H query (column 1) with corresponding RMSD values of DOPE score selected models for the three scenarios (columns 2 to 4), the sequence identity (or mean sequence identity for the multi-template case) with the structural template (columns 5 to 7, SI and MSI), and the RMSD with the structural templates for scenario 1 (column 8) and scenario 2 (column 9). The scenario in which the best model has the highest RMSD is marked in red for scenario 1, green for scenario 2, and blue for scenario 3. A '*' indicates the lowest RMSD value. At first, the analyses of raw values do not provide any simple explanation. For example, V H Hs with PDB IDs 1KXV:D, 5GXB:B, and 3K81:B, all have templates with 63-69% sequence identity and the RMSD of the template with the structure is always between 1.4 and 2.0 Å, suggesting correct structural similarity. Other cases are also striking, the highest approximation value is observed for the structural model of a V H H complexed to green fluorescent protein (PDB ID 3G9A:B) leading to a large RMSD of more than 7.8 Å with scenario 1 (bestSeqIdTemp), while its structural template is "only" at 4.7 Å (See Figure A3A). The fact that this V H H is complex may not be the best explanation for the observed difference, as the most structurally similar V H H could have given a good structural approximation. This is not the case, as often outliers are difficult to capture.
The second most striking case is one V H H binding to a GPCR protein (namely PDB ID 3SN6:N, see Figure A3B); it has its best structural template that is solved as a complex with nucleoporin (PDB ID 5C3L:D, RMSD of 1.4 Å) and shares also the best sequence identity (SI of 83%). However, it only results in a 6.9 Å structural model. This poor quality result is mainly due the long CDR3 in 3SN6:N, which has 14 residues more compared to the template 5C3L:D. The CDR3 was slightly improved when two more templates were added with multi-template scenario 3 (3bestSeqIdTemp) as the structural template CDR3 was slightly longer with dive residues more than 5C3L:D, but still nine residues were without template information.
In fact, most of the alignments of these poor-quality models have more gaps in their alignment in comparison to the rest of the cases in a similar scenario (more than five gaps), 12 out of 14 alignments, 12 out of 12 (all alignments), and 7 out of 9 in scenarios 1, 2 and 3 respectively. To add further, in the case of a V H H binding to LacY (PDB ID 5GXB:B, 129 aa length) is particularly relevant. When modeled with scenario 1 template PDB ID 4TVS:A (SI 78%, see Figure A4A), this V H H has higher RMSD compared to scenario 2 template PDB ID 5E0Q (SI 69% of 126 aa, see Figure A4B), the former does not improve even after the addition of two more closely related templates (mean sequence identity (MSI) of 77%). These results are interesting as they underline cases for which there is: (i) with no significant difference in sequence length, and (ii) lower sequence identity, a template can lead to better structural models.
More cases in which the best models have RMSD < 3.0 Å in both scenario 1 and scenario 2 were examined. Using sequence length (especially in CDRs) as a criterion influencing their structural similarity, 18 of such cases where the sequence lengths between query and template sequences were less than 3 aa, and were found to be structurally dissimilar. As an example of these cases, a case of 5GXB:B modeling, scenario 2 was better than scenario 1, conveying using the best sequence identity template to build the best structural model must be used carefully with V H Hs.

Impact of Sequence Similarity between Template and Query on Selected Models
An analysis of the systematic impact of sequence identity between the template and the query sequences on structural model qualities measured by RMSD were examined in detail. As expected, bestSeqIdTemp showed a cluster between 70 and 80% sequence identity and 1-3 Å RMSD, with no direct correlation between sequence identity and RMSD (see Figure A5A). Interestingly, some templates are already at more than 4 Å from the query structure. For bestStructTemp cases, the sequence identity of the query sequence with the template in this scenario is lower (as we have a priori knowledge of structures of the query sequences), they have a better structural approximation, forming a cluster between 60 and 80% and 1-2 Å RMSD. Here, no correlation could be observed between sequence identity and RMSD (see Figure A5B). The generation of structural models is directly dependent of the structural proximity with the query, but is not the only parameter that impacts the quality of the model. It is not shocking to see that for both bestSeqIdTemp and bestStructTemp scenarios (see Figure A5C,D), the best structural models have higher RMSD with the true structures. The bestStructTemp scenario modes shows more deviation in RMSD in comparison to crystal structures than models from the bestSeqIdTemp scenario, leading to eight structural models at more than 4 Å RMSD when the RMSD is, at most, less than 2 Å with the query structure.
However, even for query and template sequences of the same residue lengths and of high sequence identity, the RMSD can vary. For example, the V H H domain from 1BZQ:N is modeled with (i) 4POY:A (94% sequence identity) for scenario 1; (ii) 2P4A:D (92% sequence identity) for scenario 2. They both share equal lengths to the query and have RMSDs of 1.9 Å and 0.4 Å, respectively. Even with this strikingly minimal difference (only 2% sequence identity, in CDR 1 and CDR3 in both cases) between the two template sequences), the induced structural change is three times greater in terms of RMSD values of models from both cases. In fact, there are more residues conserved in the alignment of the query (i.e., 1BZQ:N) with the template from scenario 1 (i.e., 4POY:A) than for the template from scenario 2 (i.e., 2P4A:D), which, in the CDR regions, are contributing to the change in RMSD (see Figure A6).
Structural similarity between best models of different scenarios (see Figure A7A,B) suggest that (a) templates in scenario 2 had a more negative impact than scenario 1 and (b) as the template used in scenario 1 is also used in scenario 3, along with two more templates, the structural similarity is not too divergent in most regions of the V H H. Indeed, there are surprising cases where the best structural template had more deviation than the best sequence identity template. Comparison of scenarios 1 and 3 (see Figure A7B) shows that no such drastic deviations were observed. Nonetheless using the best possible structural template was able to reduce the worse RMSD in scenario 1, around 8 Å to 4 Å. Indeed, multiple sequence alignments properly done might have better structural approximation.

Assessment of Structural Models in FRs and CDRs
After analyzing complete V H H best models, region-wise analyses were undertaken to understand the precise differences in them (see Figure 4 for CDRs and Figure A8 for FRs). FRs are not as simple as anticipated. Despite that, the RMSD values are limited on average, two points can be noted: (i) for every scenario and every type of FRs, some RMSD values are higher than 1 Å. It is surprising since FRs are supposed to be the most rigid part of the V H H to handle; (ii) scenario 2 is only better for FR2 (0.4 Å), while scenario 3 is better for FR1 (0.6 Å) and FR3 (0.4 Å) and even scenario 1 for the short FR4 (0.3 Å). It underlines a specific difficulty of V H H topology; high local sequence identity with a lot of β-sheet does not mean easy modeling. The RMSD in CDR regions is of much higher range, especially for CDR3. The CDR2 region has the lowest RMSD values, as it is the smallest in length. It is unexpected to have high RMSD values for CDR2 (only two cases can be observed are more than 2.0A). Modeling of CDR1 is more complex with RMSD values at best are close to 1.5 Å for scenario 2. CDR3 has the maximum diversity owing to changes in length and sequence identity. The addition of a multi-template does not greatly improve the structural approximation (gain of 0.27 Å) and is close to the theoretical limit of RMSD for scenario 2 (which is at 0.35 Å).

Conservation of CDR Loop Termini Distances in Best Models in Comparison to Crystal Structures
Prediction of loop conformations is known to depend on the conformations of the anchor residues and the distance between the anchor residues [29][30][31][32]. To understand if the distances between the loop termini in the templates influence the distances in the models, these distances between Cα atoms of the CDR loop termini were computed. Figure 5 shows the comparison of loop termini distances in query crystal structure versus their corresponding best model in each scenario. Figure 5D-I shows that the Cα distances of CDR2 and CDR3 termini are well approximated with few deviations compared to the native structures. Hence, distances for CDR2 are at less than 1 Å between crystal structures and models, with only 7 models are at more than 1 Å (with 2 are more than 4 Å). Interestingly, it is found for every scenario. For CDR3, it is the same trend, but it is accentuated for scenario 1, which is supposed to be the best approach. The CDR1 distance comparison showed larger deviations in most of the loops modeled. On average, CDR1 distance is of 13 Å, but the predicted distance is at +/− 1.5 Å, for every scenario, sampling with no preference; thus, adding a new difficulty in the V H H model prediction (see Figure 5A-C). This loop is particularly interesting as it extends between the two β-sheets of the domain and is the largest distance of the three loop termini.

Case Studies: Analyzing Local Backbone Conformations of Two V H H Domains in Different Scenarios for Two Representative Modeling Case Studies of V H H
The first is a llama V H H targeting muscarinic acetylcholine receptor M2 (PDB ID 4MQS:B, 125 residues) [33], while the second is from dromedary and binds Phage Tuc2009 Baseplate Tripod (PDB ID 5E7B:A, 131 residues with additional disulfide bridge) [34]. These two were chosen for the following reasons: (i) the structural models obtained by the three scenarios for the first one has interesting structural similarity, and (ii) structural models of the second one show greater deviations.

Case 1
The query sequence from V H H 4MQS:B was modeled with structural template 5HDO:A [35] (sequence identity or SI of 72%) in case of scenario 1, 3STB:B [36] (SI 64%) in case of scenario 2 and in case of scenario 3, combining 4TVS:a [37] and 4LAJ:H [38] along with 5HDO:A (mean sequence identity or MSI 70%). The best models from the respective scenarios 1, 2, and 3 had RMSD values of 2.3 Å, 2.5 Å, and 2.7 Å to the original structure. Figure 6 shows the superimposition of the query structure and the different structural template/best model. The β-strand structural similarity is excellent in all of the cases. Figure 6A,C,E show the superimposition of the 4MQS:B and its template in each scenario. It is obvious that CDRs 1 and 3 of 4MQS:B (olive green color) display no similarity with any of the templates. The selected models of scenarios 1, 2, and 3 are shown in Figure 6B,D,F. CDR1 and CDR3 loops that are structurally dissimilar in all the templates also remain quite dissimilar in these selected models. To add further, native structures, structural templates and selected models have been analyzed using protein blocks (see Figure A9A) [39]. The CDR1 sequence 'GFDFDNFD-DYA' adopts a 'hiehiafklpc' PB signature in the structure, which none of the models exhibit. Although all three of them exhibit more or less similar PB sequences at the start of the loop, hiafblklmmc, hjafklmmmpc, hiankjoklpc, they quickly deviate into highly divergent signatures. The CDR2 sequence 'DPSDGST' shows 'fknopacd' PB signature in the crystal structure, which, in the best models, are slightly changed to 'fkgoiacd', 'fknopacd' and 'fkomaccd'. These results are accurate and highly close in terms of PBs (exact match for scenario 2, two mismatches only for the two other scenarios).
It seems that the amino acids aspartate and proline at the start of the group and provide tight constraints in all of the modeling cases. In the case of CDR3, the sequence SAWTLFHSDEY shows PB signature 'dddehhlagcd' in the crystal structure, which is largely modified in ehiabddfkpa, 'ddfblmlcfbd', and 'ehiafblcddd' in scenarios 1, 2, and 3. These conformations are far away from the V H H crystal structure, with only a few PBs d at the extremity for scenario 2 and no common PBs for scenario 1 and 1 for scenario 3. This example underlines that although models are closely similar in terms of RMSD, their CDRs can be quite different in conformations.

Case 2
In case study 2, the V H H sequence 5E7B:A [34] is modeled using (i) 1SJX:A [40] (SI 74%) in scenario 1, (ii) 5H8D:A [41] (SI 70%) in scenario 2, and (iii) in case of scenario 3 4ZG1:F [42] and 5JQH:C [43] were additional used, along with the template from scenario 1 (mean SI 74%). The RMSD of the best model from scenarios 1, 2, and 3 are 5.5 Å, 4.0 Å, and 6.2 Å, respectively. Figure 7A,C,E show the superimposed representations of the query structure and the templates used. It is obvious that none of the template structures can provide a good model in the region of CDR3; this case is highly related to the one observed in [28]. Figure 7B,D,F show the best model in scenarios 1, 2, and 3, in each case superimposed to 5E7B:A crystal structure. As seen in the previous case study of 4MQS:B, PBs were assigned (see Figure A9B). For CDR1, the amino acids sequence 'GFTFDDSD' is associated to PB sequence hiafklpc, while in its models it is associated to hjfklpcc, hiafklpc, and hieolmpc for CDR1 from scenarios 1, 2, and 3, respectively. Hence, differences in their PB signature are seen in scenario 1, as only the first and last PBs are in common, and in scenario 3, as only the two first and two last PBs are in common; CDR1 PB signature in model from scenario 2 is exactly the same. For CDR2, the amino acids sequence 'FSDGSTY' is associated to the PB signature 'fkopacd'. The models provided PB series fkopacd, ehiacdd, and fkopacd in scenarios 1, 2, and 3, respectively. The CDR2 signatures from scenario 1 and scenario 3 are identical to the crystal structure signature. In case of CDR3 sequence 'AAATTTVASPPVRHVCNGY' shows PB signature 'dfbfbdcfklmmmmnommb', which has extended conformations in the beginning (presence of PBs d, b and f ) and helical (PB m) in the end. In the best models selected from the three scenarios, the CDR3 is assigned the PB signatures ddddddfbdcfehiafbdc (scenario 1), djbdcdfbfbdcfbgoiac (scenario 2), dddfbdfbdfblbdcdddd (scenario 3). It can be inferred that none of these model CDR3 PB signatures are close to the crystal structure PB signature. Moreover, they tend to be more extended in their conformation than helical in all cases. This example shows that most of the regions can be modeled with good accuracy, while CDR3 is more difficult to model and dictate the high global RMSD seen here.

Assessment of Backbone Conformational Sampling in Models Using Protein Blocks
In previous sections, we have underlined the interest of specific approaches by selecting a 'best' model. The generation of the large number of models can also provide interesting information. Hence, to understand how each modeling scenario is different from the other and what conformations each of them have sampled is carried out by analyzing PB signatures of both case studies in each scenario using PB maps. The two examples presented in Figures 6 and 7 were chosen.
For 4MQS, little diversity is observed for FRs, conformational sampling is limited but can be impacted differently in each scenario (see Figure A10A,C,E). Hence, FR1 is well conserved in scenarios 1 and 2 modelling, but shows some variations for scenario 3; all other FRs are equivalent in all scenarios. Conformational diversity in CDR 1 (positions 26-36) and CDR 2 (55-62) and especially in CDR3 (97-110) is significantly more pronounced. However, in case of CDR3, the strong PB signature in scenario 1 and scenario 2 is not at all preserved in scenario 3. Hence, the multi-template allows the deepest conformational exploration.
For 5E7B:A, models have sampled strand conformation for CDR3 in all the modeling scenarios (see Figure A10B,D,F). There are very few positions for this case in scenarios 1 and 3 that show conformational diversity in CDR1 and CDR2. The FRs also show less diversity in all cases except for just after CDR2 (position 63-68).
PB entropies (N eq [22]) are shown in Figure 8. PB N eq is able to analyze and capture information of conformational diversity. For 4MQS:B (see Figure 8A,B), the CDRs have N eq values higher than 2. CDR1 showed greater values of N eq in scenario 2 (more than 8) followed by scenarios 3 and 1. Whereas for CDR2 and CDR3, scenario 3 shows high and higher values of~5 and~7, respectively; both are considered high, suggesting a lot of diverse sampling. Many FRs along with CDRs, positions 26-36, 55-62, 97-110, have N eq values are >1, suggesting some limited sampling of local conformation, e.g., N eq of 2 for scenario 3 is seen four times. The differences in the conformations advocate the influence of template through the constraints they provide to the model-building algorithm.   [22] (raw values can be found in Supplementary Files S1 and S2); (C,D) change in PB entropy (∆N eq ) [44] and (E,F) differential PB signature at each position (∆PB) in different scenarios [44,45]. Each graph has all the three scenarios represented in different colors and notations c1, c2, and c3 for scenarios bestSeqIdTemp, bestStructTemp, and 3bestSeqIdTemp. The dotted rectangles approximately denote the regions CDR1, CDR2, and CDR3 indicated from left to right in each figure.
Analysis of 5E7B:A models in terms of N eq shows drastic differences in comparison to 4MQS:B models. The CDR3 region (93-112) models from all three scenarios show high N eq values, more than 6 in this region. For CDR1 region (26-33 aa region), no scenarios exhibit much conformational diversity (see Figure 8B). It is surprising as the CDR1 region is the next difficult region to model. In the CDR2 region (52-58), except models from scenario 2, which show slightly higher N eq , the other scenarios are conformationally less diverse. Even in the FR regions, conformational diversity is less observed compared to the previous case study.
A quantitative comparison of N eq values between any two scenarios is calculated and represented as ∆N eq values (see Figure 8C,D). It mainly underlines a large difference for 4MQS in CDR3 between all scenarios and in CDR1 between scenarios 2 and 3; for 5E7B, it is only distinctly different in CDR3 regions while the rest of the modelled regions in all the scenarios do not show much variations.
To apprehend more precise, qualitative differences, in terms of PBs, the ∆PB is computed [44]. In addition to ∆N eq [44], ∆PB is used to compare two different scenarios, but it gives qualitative information. Its value ranges from 0 to 2, suggesting identical conformations at 0 and completely different conformations at 2. The comparison of ∆PB values for scenario 1 vs. scenario 2, scenario 1 vs. scenario 3, and scenario 2 vs. scenario 3 are shown for case studies in Figures 8C and 7D. In Figure 8C, the three CDR regions (positions 25-33, 52-58, and 97-110) show ∆PB values for some cases around 2. This analysis underlines local protein conformations that are significantly different, e.g., the end of CDR2 regions between all scenarios. However, the FR regions, which should be more tightly constrained, also show non-zero ∆PB values in the regions of FR1 between scenarios 2 and 3, after CDR1 (42)(43)(44)(45), after CDR2 (62-65), and for positions 74-76, mostly in scenario 1 vs. scenario 3 and scenario 2 vs. scenario 3. In case of 5E7B:A, similar to 4MQS:B, the CDR3 region (93-112) shows much diversity in all comparisons as represented by the ∆PB values. The ∆PB values for scenario 2 vs scenario 3 are close to 2 in the CDR 1 (26)(27)(28)(29)(30)(31)(32)(33) and CDR 2 (52-58). These regions show local conformations entirely different between the two scenarios. Some regions other than CDR have non-zero ∆PB values in all comparisons; amongst these, one region stands slightly conspicuous, the amino acid region 74-76 for all scenarios. The antibody research community considers this region as the fourth CDR loop, in both cases 5E7B:A and 4MQS:B this appears to have non-zero ∆PB values in this region, suggesting conformational diversity is possible in this region. Thus, this region might not confer many constraints as a regular secondary structured region, which results in higher sampling in this region. The above analysis of modeling in different scenarios shows the applicability of PBs as a tool for analyzing conformational sampling during modeling and its efficiency in discriminating local conformations.

Systematic Modeling of V H H Domains (Scenario 4-All)
Some examples presented in this study suggest that the choice of the template with the higher sequence identity is not always a guarantee to obtain the most structurally accurate model possible. Thus, to investigate whether it is the case for all V H H domains, each domain was modeled using the 99 remaining domains. A summary of the RMSD distribution of best model superimposed over the original crystal structure obtained for each domain is provided in Figures 9A and 10A, as well as their corresponding sequence identities between the crystal structure and template used (see Figures 9B and 10B). The pair-wise sequence identity distributions only had two outlier values for 3K3Q:A and 4IDL:A, which were lower than 40%. 1BZQ:N, 2P4A:D, 2XA3:A, 3R0M:B, and 4POY:A have pairwise sequence identity outliers above 90%. Most of the pairwise sequence identity values are between 50 and 80%, suggesting that it is the case for homology modeling.  The RMSD value distributions of 2XT1:B, 3K74:B, 3ZKX:C, 4X7F:D, 5F7N:C, 5H8D:A, and 5IP4:A show very narrow distribution because of smaller sequence lengths (≤115) suggesting a reduced structural misalignment due to CDR lengths (see Figures 9A and 10A). Another notable observation is that when distributions were analyzed there were 30 cases where a lower RMSD value compared to any of the previous scenarios (such as best sequence identity, closest structural approximate, and multiple templates, see Figures 9B and 10B) was observed. This observation suggests that there are other template features that could lead to a lower RMSD value.

Discussion
To understand and appreciate the effects of the template backbone conformations on the models generated, a large-scale analysis of the V H H domain modeling was performed in this study. Four different scenarios were conceived, which cover most aspects of variable domain modeling. All of these scenarios were carried out with simple approaches, easily reproducible by the scientific community. Preliminary analyzes did not show significant gains with more advanced settings of Modeller.
The first scenario is the classical case, which is often used in V H H structure prediction studies and must be considered as the basic expectation. The main difficulty of this strategy is the length of CDR3 loops, and the conformation of the CDR1 loop, even if its length is, in most cases, constant. Moreover, 14% of the cases show an RMSD of the selected model higher than 3 Å.
The second scenario reflects the best theoretical values possible, where the structural similarity, overall, plays a major role, and must be considered as the best theoretical result. Even if the CDR lengths do not match, the FRs in this scenario are expected to be highly similar between the templates and the query. In most cases, the structural similar template had the sequence length almost the same as the query, but for some cases, the differences in length of the region to be modeled, and the template, were drastically different. This, in particular, allowed CDR3 and a couple of CDR1 regions to adopt non-template dependent conformations. Hence, 12% of these V H Hs have a RMSD higher than 3 Å.
In the third scenario, the additional constraints are applied by adding two other templates with the next best sequence identity values. It leads to only 9% of models with RMSD values more than 3 Å. When RMSD of the best models were compared, only 22 cases in scenario 1 had better RMSD when compared to scenario 2, and this number was slightly reduced to around 12 cases when compared to scenario 3. The rationale here is simply: (i) best sequence identity template provides better constraints, or, (ii) adding more constraints at each residue position by adding more templates will improve the approximation.
Different regions (FRs and CDRs) of models from three scenarios were assessed using RMSD. As expected, CDRs showed higher values than the FRs; however, even in FRs, RMSD analyses from different scenarios, unexpectedly, scenario 2 showed higher values of median RMSD than the other two. Similarly, the distance between the Cα of residues in CDR loop termini between original crystal structures and their corresponding models were not preserved in every case. CDR1 had the highest distance between the termini and it appeared to change without any specific trend. The deviations at CDR2 and CDR3 loop termini were highly limited as their loops connected two adjacent β-strands where it was not the case for CDR1. This is an opportunity to think of a larger sampling than expected in terms of both loops and distance. It should be noted that the different scenarios always produce relevant 3D models, i.e., having no aberrant conformations. For example, the number of residues proposed in unfavorable regions is, on average 1%, and is always over 90% in the highly favorable regions of the Ramachandran map (see Figure A11). On average, the models selected have a better number of residues in the favorable regions of the Ramachandran map than the experimental structures [46].
Two examples were selected to explain to the reader the differences in the models influenced by the templates. The case study of V H H 4MQS:B revealed that even though the models are close in terms of RMSD, they could be quite different in local conformations. In the case of V H H 5E7B:A, it surprising to note that most of the protein, except the CDR3, could be modeled very efficiently, even from different templates. Further, the last scenario is studied if a better RMSD is possible with any other template other than best sequence identity template; in 30 cases, it was observed to have a model with lower RMSD than the other three scenarios.
In our analyses, we find that FRs are simpler to understand than CDRs, but they have a direct effect on the quality of the models. Next, CDR1 has unexpected features; they are not very diverse in terms of conformations (and lengths), but the distances between loop extremities are not conserved and show variations between templates and models, irrespective of the scenario, while this is not the case for CDR2 and CDR3. Analyses of multiple template modeling show that, due to lack of constraints in a few cases, comparative modeling is not easy to perform. For these specific cases, a hybrid method of modeling using multiple CDR templates could be a good strategy. Similarly, this study also emphasizes the need for different tools to assess V H H models. Interestingly, the generation of the models still shows an excellent correlation between the value of the DOPE score and the quality of the model (compared to the experiential structure in terms of RMSD). Similarly, the analysis of the models selected using PROSA software [47] shows that the models have compatibility scores very close to those of the solved structures, even in the case of models that are difficult to build (see Table A1). In addition to RMSD for the analysis of the entire domain or its different regions, a protein block-based scoring function may be useful for analysis of local conformation.
In summary, this study had shown that, often, a multi-template is the best method to obtain, on average, a correct V H H model, and that DOPE score is a relevant measure to select this model. However, it also shows that, for some V H Hs, it is difficult to propose satisfactory models. In the same way, the use of the structurally closest V H H is not always the best choice, underlying the possibility for future improvement.

Dataset
V H H structures were retrieved from the Protein Databank (https://www.rcsb.org/, accessed on 31 September 2018) [12] using keywords 'VHH' and 'Nanobody'. Only V H H structures without missing residues and/or modified ones were selected. After a structural inspection, a final redundancy filter (of 95% of sequence identity) was applied.

Sequence Alignment
In order to (i) compute the sequence identity and (ii) to prepare files for the comparative modeling (see details below), pairwise sequence alignments were performed using Clustal Omega (v 1.2.4) [48].

Structural Similarity of Protein Structures and Structural Models
The root mean square deviation (RMSD) was used locally to quantify the structural similarity between two aligned protein structures. ProFit (http://www.bioinf.org.uk/ programs/profit/, accessed on 31 September 2018) based on the McLachlan algorithm for the superposition [49] was used in the study to calculate RMSD.

Comparative Modeling of V H H
Comparative modeling was performed using Modeller software [24,25], one of the most popular comparative modeling approaches, and also the most widely used for V H H molecular models [21]. Modeller 9 v.16 was used to model V H H query sequences. As input, an alignment file in the prescribed format of the query with the target template(s) was generated. The best structural model was chosen using the DOPE score implemented in Modeller to rank the models [26,50].
Four different strategies to address different hypotheses were conceived: • In the first scenario, sequence identity between the query and template sequences is used as a criterion to select a template that shares the best sequence identity with each query sequence (namely scenario bestSeqIdTemp). It is the most classical protocol for template selection used in homology modeling, • In the second scenario, templates are selected using structural similarity (to already solved structures of query sequences) measured using RMSD as criterion (namely sce-nario bestStructTemp). The template that had the lowest RMSD with the experimentally resolved structure of the query was chosen. This scenario is not possible in factual sense and it is a theoretical assessment, to check the maximal accuracy reachable with the closest structural template.

•
The third scenario is based on a multi-template strategy. Three templates exhibiting the highest sequence identity with the query were selected (namely scenario 3best-SeqIdTemp). In a multiple template mode, better models are expected thanks to the combination of different structures.

•
The last scenario is also a theoretical case to access the maximal reachable accuracy using all possible templates (namely scenario All). Indeed, all previous scenarios had a specific a priori. Here, all structures were used independently as potential templates. It permits us to have more insights that would have been missed in previous scenarios such as; could another V H H template, other than the best sequence identity, and the best structurally close ones provide a better structural model.

Local Conformational Analysis
Secondary structure assignment was performed using DSSP 2015 version 2.2.1 [51] with default parameters [52]. Similarly we used also protein blocks (PBs [22]); they are a structural alphabet composed of 16 local prototypes [23]. Each PB is characterized by a series of 8 ϕ, ψ dihedral angles of five consecutive residues. Each PB assignment focuses on the central residue. PBs give a reasonable approximation of all local protein 3D structures [53]. They are labelled from a to p. PBs m and d can be roughly described as prototypes for α-helix and central β-strand, respectively. PBs a to c primarily represent β-strand N-caps and PBs e and f representing β-strand C-caps; PBs g to j are specific to coils; PBs k and l to α-helix N-caps while PBs n to p to α-helix C-caps. PB assignment was carried out using PBxplore tool (available at GitHub) [39].
The equivalent number of PBs (N eq ) is a statistical measure similar to entropy. It represents the average number of PBs for a residue at a given position, which is calculated as follows [22]: where, f x is the probability of PB x. A N eq value of 1 indicates that only one type of PB is observed, while a value of 16 is equivalent to a random distribution.
To detect a change in PBs profile, a ∆PB value was calculated [44]. It corresponds to the absolute sum of the differences for each PB between the probabilities of a PB x to be present in the first and the second structures (x goes from PB a to PB p). ∆PB is calculated as follows: where, f 1 x and f 2 x are the percentages of occurrence of a PB x in the analyzed structures. A value of 0 indicates perfect PB identity, while a score of 2 indicates a total difference.

Protein Structure and Structural Model Visualization
Visualization of models and/or original structures were performed using PyMOL Version 1.7.2 [54,55].

Scripting
All scripts for analyzing V H H structures and models were done using Python 3.6 [56] with NumPy library [53] and R 3.3.3 [57].