Mutations Causing Mild or No Structural Damage in Interfaces of Multimerization of the Fibrinogen γ-Module More Likely Confer Negative Dominant Behaviors

Different pathogenic variants in the same protein, or even within the same domain of a protein, may differ in their patterns of disease inheritance, with some variants behaving as negative dominant and others as autosomal recessive mutations. Here, a structural analysis is presented that compares the molecular characteristics of the sites in the fibrinogen γ-module, a fibrinogen component critical in multimerization processes, targeted by pathogenic variants (HGMD database) and by variants found in the healthy population (gnomAD database). The main result of this study is the identification of the molecular pathogenic mechanisms that determine which pattern of disease inheritance is followed by mutations at the crossroad of the autosomal recessive and negative dominant modalities. The observations in this analysis also warn that several variants reported in the non-pathogenic gnomAD database might in fact be a hidden source of diseases with autosomal recessive inheritance, or of diseases requiring a combination with other disease-causing mutations. Disease presentation might remain mostly unrevealed simply because the very low variant frequency rarely results in biallelic pathogenic mutations or in coupling with mutations in other genes contributing to the same disease. The results presented here provide hints for a deeper search of pathogenic mechanisms and modalities of disease inheritance for mutants of proteins that participate in multimerization phenomena.


Introduction
Fibrinogen is a secretory glycoprotein complex with a molecular mass of about 340 kDa, produced in the liver and resulting from the homodimerization of a heterotrimer composed of the polypeptide chains Aα (FGA gene), Bβ (FGB gene), and γ (FGG gene), with the formal formula (Aα, Bβ, γ)2. In each heterotrimer, the Aα, Bβ, and γ chains are held together by a triple-helical coiled-coil that links the central nodule, which results from the hexameric assembly of the N-terminal extremities of all six chains, to the αC region and the β- and γ-modules. Several inter-chain disulfide bonds stabilize the triple-helical coiled-coil and covalently link the two heterotrimers at the central nodule. The hexameric fibrinogen assembly has a rod-like shape, with each of its two extremities (forming the C-terminal portions of the D regions) composed of the β- and γ-modules at the C-termini of the respective fibrinogen chains (Figure 1).
Fibrinogen is crucial in coagulation: in the final phase of clot formation, upon thrombin activation, fibrin monomers form and spontaneously aggregate into fibrils and subsequently into the clot [3]. Genetic changes, mostly point mutations, cause afibrinogenemia, hypofibrinogenemia, or dysfibrinogenemia. The extent of hypofibrinogenemia is usually related to the heterozygous, homozygous, or compound heterozygous patterns of mutations [4]. Pathological mutations have been observed for each of the three fibrinogen chains. A few mutations in the gamma gene (FGG) cause intracellular aggregation and plasma deficiency [5], a condition named hereditary hypofibrinogenemia with hepatic storage (HHHS) [6]. Unlike all the other hypofibrinogenemias, HHHS is not accompanied by overt coagulation problems but always sets the conditions for a progressive liver disease, as in the case of α-1-antitrypsin (AAT) deficiency [7].
In this work, I carried out a protein structure analysis to understand the role in disease and the patterns of disease inheritance of known γ-module missense variants. The γ-module is interesting to examine because of its structurally characterized network of interactions: it interacts with the β-module within each Aα-Bβ-γ heterotrimer of the hexameric (Aα, Bβ, γ)2 fibrinogen; it contains the self-association sites in the γ-chain region of each D domain, which participate in fibrin or fibrinogen D:D assembly and are necessary for correct end-to-end alignment of polymerizing fibrinogen or fibrin molecules [8,9]; it includes the γ-γ cross-linking sites, which promote alignment of cross-linking regions for factor XIII- or FXIIIa-mediated transglutamination [10][11][12]; and it provides the hole "a" that spontaneously receives the new N-terminus (Gly-Pro-Arg peptide, named "knob") exposed by the fibrinogen α chain upon proteolysis by thrombin during clotting [13,14].
Figure 1. Structures of the hexameric fibrinogen (Aα, Bβ, γ)2 and fragment double-D. (A) Crystal structure of fibrinogen [1]; (B) crystal structure of fragment double-D from human fibrin [2].
The stability of a protein structure is determined by the protein folding free energy (∆G), which represents the change of the thermodynamic free energy along the conformational path from the unfolded to the folded state. In particular, the difference in the folding free energy change (∆∆G) between a protein mutant and the corresponding wild type allows one to estimate the gain or loss of stability of the local structure upon amino acid mutation, and hence to infer whether and to what extent mutations can induce structural alterations. Various computational methods exist to predict the ∆∆Gs associated with protein mutations. In this work, the γ chain crystal structure with the best atomic resolution (representing the isolated γ-module) was analyzed to estimate the extent of structural damage and the biological implications of known pathogenic missense variants as reported in HGMD [15] and of supposedly non-pathogenic missense variants available in gnomAD [16]. The first aim of the analysis was to identify possible differences in the mutation-induced patterns of structural alterations between these two databases, which are characterized by distinct clinical relevance. Subsequently, representative HGMD and gnomAD variants whose ∆∆Gs deviate significantly from the average trends in these databases were assessed more explicitly by examining the crystallographic structure of fragment double-D from human fibrin. The analysis of the latter structure provides information from the context of the functional intermolecular interactions that the γ-module establishes with other fibrinogen chains. This study provides insights into the molecular implications of γ-module mutations and hints toward identifying the mechanisms that distinguish pathogenic mutations channelling into autosomal recessive patterns of disease inheritance from those channelling into negative dominant ones.
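As a compact restatement of the two quantities defined above, one common way to write them is shown below. The subscripts and the sign convention are assumptions made for illustration (the individual prediction tools use different sign conventions), so this is a notational sketch rather than the definition adopted by any specific method.

```latex
\Delta G_{\mathrm{fold}} = G_{\mathrm{folded}} - G_{\mathrm{unfolded}},
\qquad
\Delta\Delta G = \Delta G_{\mathrm{fold}}^{\mathrm{mutant}} - \Delta G_{\mathrm{fold}}^{\mathrm{wild\ type}}
```

Under this particular convention, a positive ∆∆G means that the mutant folds less favorably than the wild type, i.e., the mutation is destabilizing.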

Distribution of HGMD and GnomAD Missense Variants in the γ-Module
The HGMD and gnomAD variants falling in the fibrinogen γ-module are distributed almost complementarily (Figure 2). In particular, HGMD variants tend to populate more frequently the C-terminal P-domain (which contains the polymerization site); only some fall in the central B-domain, mostly at its external loops, with very few hitting the core of the five-stranded β-sheet. On the other hand, gnomAD variants are more concentrated in the N-terminal A-domain, in particular at its β-sheet and helix; a significant number hit the B-domain β-sheet (gnomAD variants hitting this β-sheet, including its core, are much more numerous than HGMD variants), and relatively few fall in the P-domain. Of note, the HGMD and gnomAD databases share a number of identical variants, as well as residues targeted by mutations but with a different substituting amino acid.

∆∆G Values of HGMD and GnomAD Missense Variants Localized in the γ-Module
The ∆∆G values of all γ-module missense variants reported in HGMD and gnomAD were calculated using three popular methods based on protein structure analysis, FoldX (v5.0) [17], PoPMuSiC (v3.0) [18], and CUPSAT [19], and are shown in Table 1 (HGMD variants) and Table 2 (gnomAD variants). It is worth noting that the ∆∆Gs of a substantial number of HGMD variants did not reach the threshold of significance, implying that the corresponding amino acid substitutions are predicted not to cause appreciable alterations in the protein structure (Table 1; ∆∆Gs simultaneously predicted by FoldX, PoPMuSiC, and CUPSAT as non-significant for structural changes for the same variant are greyed). On the other hand, several variants in the gnomAD database are predicted with ∆∆Gs suggesting important protein structure alterations (Table 2; the ∆∆Gs simultaneously predicted by FoldX, PoPMuSiC, and CUPSAT as structurally altering for the same gnomAD variant are enclosed in dashed boxes). Both findings are apparently counterintuitive: HGMD, which is supposed to contain pathogenic variants, presents some variants predicted to cause non-significant protein structure alterations, whereas gnomAD, which excludes variants found in individuals affected by severe pediatric disease and in their first-degree relatives and hence should contain only supposedly neutral variants, exhibits several variants predicted to alter the protein structure. Furthermore, and more puzzling still, as also displayed in Figure 2, the HGMD and gnomAD databases share twelve missense variants (marked with a square in Tables 1 and 2) and several variants targeting the same residues but with different amino acid substitutions (marked with a triangle in Tables 1 and 2).

Int. J. Mol. Sci. 2020, 21

Figure 2. Sites of HGMD and gnomAD missense variants falling in the γ-module of fibrinogen (Cα atoms of affected residues are shown as spheres in a blue-red gradient according to their distance from the surface of the module: blue, surface-exposed residues; red, residues more in the core of the module).

Table 1. HGMD amino acid variants and their ∆∆Gs (calculated with FoldX, CUPSAT, and PoPMuSiC on the crystal structure of the C-terminal fragment of the fibrinogen gamma chain monomer, PDB 3FIB), molecular interactions, protein secondary structure, and the solvent-accessible surface of the residues affected by mutations.
a Intermolecular interactions of the wild type residues with functional ligands (Bβ, γ, knob, and Ca2+ ion, determined in the crystal structure of fragment double-D from human fibrin, PDB 1FZC).
b Protein secondary structure of the wild type residues (determined on PDB 3FIB).
c Percentage of the solvent-accessible surface of side chains of the wild type residues (determined on PDB 3FIB).
d Structurally significant ∆∆Gs are in bold. ∆∆Gs of a given variant are greyed if they do not achieve structural significance in any of the three methods (FoldX, CUPSAT, and PoPMuSiC). For both FoldX and CUPSAT, mutations with destabilizing and stabilizing effects on protein structure are associated with ∆∆G > 0 and ∆∆G < 0, respectively, while for PoPMuSiC the inverse holds, i.e., ∆∆G > 0 for stabilizing mutations and ∆∆G < 0 for destabilizing ones. The ∆∆G thresholds above which mutations are assumed to produce significant structure alteration are as follows: for both FoldX and CUPSAT, |∆∆G| > 1.0 kcal/mol, while for PoPMuSiC, |∆∆G| > 0.5 kcal/mol (see Materials and Methods for references on these thresholds and the predictive accuracies of the individual methods on experimentally determined mutations).
e Mean and standard deviation of all ∆∆Gs calculated by the individual methods.
Square: the same variant is also reported in the non-pathogenic gnomAD database.
Triangle: a variant hitting the same residue but with a different substituting amino acid is reported in gnomAD.

Table 2. gnomAD amino acid variants, ∆∆Gs (calculated with FoldX, CUPSAT, and PoPMuSiC on the crystal structure of the C-terminal fragment of the fibrinogen gamma chain monomer, PDB 3FIB), molecular interactions, protein secondary structure, solvent-accessible surface of the residues affected by mutations, and gnomAD allele frequency.
a Intermolecular interactions of the wild type residues with functional ligands (Bβ, γ, knob, and Ca2+ ion, determined in the crystal structure of fragment double-D from human fibrin, PDB 1FZC).
b Protein secondary structure of the wild type residues (from PDB 3FIB).
c Percentage of the solvent-accessible surface of side chains of the wild type residues (determined on PDB 3FIB).
d Structurally significant ∆∆Gs are in bold. ∆∆Gs are enclosed in dashed boxes if, for a given variant, significant structural effects are predicted by all three methods (FoldX, CUPSAT, and PoPMuSiC). For both FoldX and CUPSAT, mutations with destabilizing and stabilizing effects on protein structure are associated with ∆∆G > 0 and ∆∆G < 0, respectively, while for PoPMuSiC the inverse holds, i.e., ∆∆G > 0 for stabilizing mutations and ∆∆G < 0 for destabilizing ones. The ∆∆G thresholds above which mutations are assumed to produce significant structure alteration are as follows: for both FoldX and CUPSAT, |∆∆G| > 1.0 kcal/mol, while for PoPMuSiC, |∆∆G| > 0.5 kcal/mol (see Materials and Methods for references on these thresholds and the predictive accuracies of the individual methods on experimentally determined mutations).
e Mean and standard deviation of all ∆∆Gs calculated by the individual methods.
Square: the same variant is also reported in the pathogenic HGMD database.
Triangle: a variant hitting the same residue but with a different substituting amino acid is reported in HGMD.
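The significance rules stated in the table footnotes can be sketched in a few lines of Python. This is only an illustration of the stated thresholds and sign conventions, not the pipeline used in the study; the class name `Ddg`, the function names, and the numeric values in the usage note are hypothetical and do not come from Tables 1 and 2.

```python
from dataclasses import dataclass

# Thresholds (kcal/mol) as stated in the table footnotes.
THRESHOLDS = {"foldx": 1.0, "cupsat": 1.0, "popmusic": 0.5}


@dataclass(frozen=True)
class Ddg:
    """Predicted ddG values (kcal/mol) for one variant; hypothetical container."""
    foldx: float
    cupsat: float
    popmusic: float


def significant_calls(d: Ddg) -> dict:
    """Per-method flag: True when |ddG| exceeds that method's threshold."""
    return {m: abs(getattr(d, m)) > t for m, t in THRESHOLDS.items()}


def consensus(d: Ddg) -> str:
    """Classify a variant by agreement among the three methods."""
    calls = significant_calls(d)
    if all(calls.values()):
        return "significant in all three"      # dashed boxes in Table 2
    if not any(calls.values()):
        return "non-significant in all three"  # greyed in Table 1
    return "mixed"


def destabilizing(d: Ddg) -> dict:
    """Sign convention from the footnotes: ddG > 0 is destabilizing for
    FoldX and CUPSAT, while ddG < 0 is destabilizing for PoPMuSiC."""
    return {
        "foldx": d.foldx > 0,
        "cupsat": d.cupsat > 0,
        "popmusic": d.popmusic < 0,
    }
```

For example, with made-up values, `consensus(Ddg(0.2, -0.3, 0.1))` falls in the "non-significant in all three" class, whereas `consensus(Ddg(3.5, 2.1, -1.8))` falls in the "significant in all three" class and is flagged as destabilizing by every method.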

HGMD Variants Predicted to Cause Non-Significant Structural Changes
The result that HGMD variants on average present ∆∆G values indicating protein structure destabilization satisfies the expectation that pathogenic variants should be more detrimental than variants found in the healthy population. However, for several γ-module HGMD variants, FoldX, CUPSAT, and PoPMuSiC predict ∆∆G values that do not reach the threshold of significance (Table 1), implying very small or absent structural changes in the protein. In particular, structurally non-significant ∆∆Gs are consistently yielded by the three methods for p.Glu239Ala, p.Gly294Glu, p.Asn334Ile, p.Asp342Asn, and p.Asp344Val (greyed ∆∆Gs in Table 1). This suggests, first, that predicted near-zero ∆∆Gs cannot be used as the sole criterion for identifying neutral mutations, since these variants are indeed pathogenic. It is then also necessary to understand whether the lack of correlation between the pathogenicity of these variants and the predicted minimal or absent structural alterations is due to inaccuracy of the ∆∆G predictions or to other factors that were not considered. Thus, the affected sites were also explicitly examined in the available crystal structures. While the ∆∆G calculations were carried out, for increased reliability, on the γ-module crystal structure with the best atomic resolution, which was available in the monomeric protein form, the regions targeted by mutations were inspected in the biologically more relevant context of the intermolecular interactions exhibited by the γ-module within the crystal structure of fragment double-D from human fibrin (Figure 3A).
A first observation is that these mutations occur at sites on the γ-module surface. This justifies the predicted near-zero ∆∆Gs, as residues exposed on the protein surface usually contribute less to protein folding than residues in the core of the protein. Another observation is that all variants hit, or fall in proximity to, sites where the γ-module is engaged in intermolecular interactions with other fibrinogen chains. Because the binding regions are finely tuned for proper association with the other proteins, their function is highly sensitive to amino acid replacements, even if these produce little or no effect on the fold of the monomeric γ-module. In particular, the residues affected by variants p.Glu239Ala (Figure 3B), p.Gly294Glu (Figure 3C), and p.Asn334Ile (Figure 3D) do not present intramolecular interactions important for the γ-module fold but are located at the D-D dimer interface, and their replacement is therefore expected to alter homodimerization. The aspartic acid affected by p.Asp342Asn is near the Ca2+ cofactor binding site, and the aspartic acid hit by p.Asp344Val directly coordinates this Ca2+ ion; thus, these two mutations have a functional impact on the Ca2+ binding region. However, they occur at a peripheral position on the γ-module surface, not in its core (Figure 3E), so they possibly cause local structural changes but not severe misfolding. Because these sites are not distant from the bound β chain (Figure 3A), the interactions with this protein might become defective. Thus, all these HGMD missense variants with consistently predicted near-zero ∆∆Gs appear to produce only small or mild, localized structural alterations of the γ-module, which are not expected to induce full destabilization and loss of function of the γ chain. However, the mutations occur at or near binding regions and likely cause defective assemblies with the other fibrinogen chains.
Taken together, these results indicate that the calculations of folding free energy changes on the unbound γ-module plausibly predicted that this group of HGMD variants can be tolerated to some extent, but this information alone is not sufficient to exclude other pathogenic mechanisms, specifically defects in protein-protein interactions (Table 1 shows the binding status for all γ-module residues affected by HGMD missense mutations, as determined in fragment double-D). A corollary is that the tolerance of the local protein structure to amino acid changes does not necessarily imply the absence of pathogenic effects. Indeed, the cases examined above suggest that the capability of a protein to structurally and functionally tolerate amino acid replacements can paradoxically become itself a subtle cause of pathogenicity. Specifically, this can happen when (a) the protein variant does not unfold sufficiently to be recognized as defective and driven to degradation by the cellular mechanisms of protein quality control, and (b) the variant also retains the capability to engage in its designed functional interactions with other protein chains, thereby flawing the arrangement and functions of quaternary structures and/or triggering anomalous protein aggregation. In such instances, small structural defects can propagate and amplify dramatically once incorporated into the higher levels of structural organization and the functions of protein multimers. Imperfect or undue recruitment of other proteins and the formation of corrupted or de novo protein assemblies underlie negative dominant mechanisms. A protein such as fibrinogen γ, which undergoes a complex, finely regulated, and irreversible polymerization, can be particularly exposed to such risk. Indeed, in previous works, we proposed mechanisms explaining how mutations affecting the homodimerization interface of fibrinogen γ might lead to HHHS [20].
We also showed how fibrinogen defects can trigger storage disease and plasma deficiency not only of fibrinogen but also of non-fibrinogen proteins such as apolipoprotein B [21]. Specifically, we presented the case of a child with hypofibrinogenemia due to the Aguadilla mutation and severe hypobetalipoproteinaemia, demonstrating that both fibrinogen and apolipoprotein B accumulated in the same endoplasmic reticulum inclusions even though the latter protein was not mutated.

Figure 3 (partial caption). [...] The hydrophilic wild type asparagine is located at the D-D interaction interface; its replacement with the hydrophobic isoleucine alters the properties of the D-D interface. (E) p.Asp342Asn and p.Asp344Val variants. Asp342 is engaged in a number of intramolecular interactions (dotted lines), which might be at least partially maintained by the fairly conservative replacement with an asparagine. The residue is also close to the Ca2+ ion cofactor binding site. Because the affected site is at the γ-module surface, the substitution might not cause severe misfolding. However, this site is not distant from the bound β chain, and therefore p.Asp342Asn might affect the interactions with this protein. The p.Asp344Val variant affects an aspartic acid residue whose carboxylic side chain coordinates the Ca2+ ion, and this interaction is lost upon the non-conservative replacement with a valine. In this case, the localization of the change on the surface might not lead to major unfolding of the γ-module, but its intermolecular interactions with other fibrinogen chains can be altered.

GnomAD Variants Predicted to Be Structurally Damaging
FoldX, CUPSAT, and PoPMuSiC consistently compute that 25 out of 89 (28%) gnomAD missense variants hitting the γ-module are associated with ∆∆Gs significant for structural alterations (Table 2, ∆∆Gs enclosed in dashed boxes). Important defects in protein structure can, although not always, be the underlying cause of disease. It was therefore intriguing to see that so many variants in the gnomAD database, which is supposed to represent the healthy population, were predicted to strongly destabilize the γ-module structure. It was also of interest to understand why mutations predicted to significantly alter the protein structure can diverge so much in their outcomes on health, behaving in some cases as neutral mutations (gnomAD variants with high ∆∆Gs) and in others as pathogenic mutations (HGMD variants). For the apparently anomalous gnomAD variants, an explicit analysis of the atomic configurations around the affected sites was made. Given the high number of gnomAD variants characterized by remarkably high ∆∆Gs, the analysis was limited to a few representative cases: p.Cys179Phe, p.Trp217Gly, p.Glu251Gly, p.Glu277Gly, and p.Ile412Thr. The affected wild-type residues are illustrated in the crystal structure of the D-D dimer (Figure 4A), and the roles and expected effects of the amino acid replacements are discussed in more detail below.
The p.Cys179Phe substitution disrupts the disulfide bond between Cys179 and Cys208, which in the wild type protein tightens α-helix 1 to β-strand 3 and contributes to the fold of the γ-module core (Figure 4B). This substitution should cause γ-module misfolding and loss of function. Indeed, indications that Cys179 is critical for γ-module function can be inferred from another variant affecting the same residue, p.Cys179Arg, which is already known to be pathogenic (Table 1). The p.Trp217Gly variant affects a tryptophan in the core of the γ-module that is engaged in multiple interactions holding together β-strand 3, β-strand 7, and β-strand 12 (Figure 4C). The replacing glycine, which lacks a side chain, cannot provide the same intramolecular stabilization as the wild type tryptophan residue and cannot preserve the γ-module fold and functions. The p.Glu251Gly variant is another case where the multiple interactions sustained by the wild type residue, a glutamic acid, cannot be replaced by glycine (Figure 4D). This non-conservative change is expected to cause major structural alteration at the γ-module region of interaction with the β-module. The p.Glu277Gly variant affects a glutamic acid that provides several intramolecular interactions shaping the loop between β-strand 7 and β-strand 8 and also tightening these two strands to β-strand 12 (Figure 4E). These interactions are also lost upon replacement with a glycine, which allows one to foresee important unfolding. Finally, the p.Ile412Thr variant hits an isoleucine characterized by several hydrophobic interactions that hold together β-strand 4, β-strand 7, β-strand 12, and α-helix 3, thus contributing critically to the fold of the γ-module core (Figure 4F). The non-conservative replacement of this isoleucine with a threonine is also expected to cause an important structural alteration in the γ-module.
Thus, the visual inspection of experimental γ-module structures supports the destabilizing ∆∆Gs of these gnomAD variants as consistently predicted by FoldX, CUPSAT, and PoPMuSiC. In all the gnomAD cases examined here, severe misfolding of the γ-module and complete loss of its functions is the likely outcome. Structural alterations of only moderate extent, confined near the affected sites so as to allow roughly normal γ-module functioning, are not plausible. In fact, non-conservative amino acid substitutions at these particular locations of critical structural importance cannot occur without important conformational changes, which, in such a small globular structure, are easily relayed and extended to other γ-module regions, severely impairing its functions. Thus, these gnomAD variants are candidate pathogenic variants. Below I explain why variants strongly altering the protein structure do not always end up in noticeable diseases.

Figure 4 (partial caption). [...] (C) The p.Trp217Gly replacement is also non-conserved; it causes the loss of several intramolecular interactions holding together various β strands in the γ-module core and is expected to cause misfolding. (D) The non-conserved p.Glu251Gly substitution causes the loss of multiple intramolecular interactions and is expected to cause severe structural changes in the γ-module. (E) The non-conserved p.Glu277Gly replacement disrupts multiple intramolecular interactions necessary to tighten various β strands and is predicted to cause important structural alteration. (F) The p.Ile412Thr variant affects an isoleucine engaged in several hydrophobic interactions contributing to the γ-module fold. The non-conserved replacement with a threonine is expected to misfold the module. In the detailed views, the interactions between the affected wild type residues and the surrounding residues contributing to the fold of the γ-module are indicated by dotted lines.
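The two contrasting situations discussed in the last two sections (consistently non-significant ∆∆Gs at an interaction site versus consistently significant ∆∆Gs) can be summarized as a small triage rule. The sketch below is an assumption-laden simplification for illustration only: the class `VariantEvidence`, the function `triage`, and the category labels are hypothetical, and the visual inspection of the crystal structures described in the text is reduced to a single boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantEvidence:
    """Hypothetical summary of the evidence available for one variant."""
    name: str
    all_methods_significant: bool  # all three methods predict significant ddG
    no_method_significant: bool    # no method predicts significant ddG
    at_interface: bool             # residue at or near an intermolecular contact


def triage(v: VariantEvidence) -> str:
    """Return a coarse structural hypothesis for a variant (labels are illustrative)."""
    if v.all_methods_significant:
        # Fold is predicted to be strongly perturbed.
        return "misfolding"
    if v.no_method_significant and v.at_interface:
        # Fold likely preserved, but assembly with partner chains may be defective.
        return "interface_defect"
    if v.no_method_significant:
        return "no_structural_evidence"
    return "inconclusive"


# Two cases discussed in the text, encoded with the flags described there.
examples = [
    VariantEvidence("p.Asn334Ile", False, True, True),   # surface residue at the D-D interface
    VariantEvidence("p.Trp217Gly", True, False, False),  # tryptophan in the module core
]
for v in examples:
    print(v.name, "->", triage(v))
```

A rule of this kind only prioritizes hypotheses; as the text stresses, a near-zero ∆∆G alone does not establish that a variant is neutral.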

Rationalization of the Missense Variants Shared by the HGMD and GnomAD Databases
GnomAD includes many missense variants in the γ-module expected to severely impair its structure and function. To dispel doubts that a significant number of them can be pathogenic, it suffices to note that gnomAD shares twelve missense variants with the pathogenic HGMD database: p.Gly191Arg, p.Tyr207Cys, p.Gly226Val, p.Tyr237His, p.Ser245Phe, p.Gln265His, p.Arg301His, p.Tyr306Cys, p.Asn334Ile, p.Ala367Thr, p.Gly377Ser, and p.Asn387Lys. Most of these variants imply non-conservative amino acid changes and critically affect the protein structure or function. The case of p.Asn334Ile is illustrated in Figure 3D; some of these variants have also been functionally characterized and described in the literature. gnomAD and HGMD also contain additional variants affecting the same amino acid residues but with different amino acid replacements (gnomAD and HGMD variants that are identical or that affect the same residue but with different substituting residues are marked in Tables 1 and 2). The fibrinogen variants included in both HGMD and gnomAD can be rationalized by assuming that their impact on the protein structure is sufficient to induce its degradation or to reduce, at least partially, its ability to bind other proteins, thus limiting negative dominant effects such as defective polymerization or nonspecific aggregation. This also requires that haploinsufficiency be absent or only modest. This last point is supported by the truncating or frameshift mutations in the fibrinogen γ chain currently available in gnomAD, all of which are reported exclusively in heterozygosity: p.Tyr15Ter, p.Pro102GlnfsTer3, p.Met120Ter, p.Arg134Ter, p.Trp234Ter, p.Leu261PhefsTer36, p.Ala305AsnfsTer15, and p.Phe321LeufsTer42. These mutations are clearly problematic for the γ chain functions, as they range from null mutations (the most N-terminal variants) to mutations that delete, totally or to a great extent, at least the γ-module sequence, which spans approximately residues 174-415 (numbering from the initiator methionine of the fibrinogen γ chain; UniProt code P02679). Among the above mutations, those that abrogate the functions of fibrinogen are candidates for causing a disease such as afibrinogenemia, which requires homozygosity or compound heterozygosity of null mutations. It is worth mentioning that p.Arg134Ter, the only one of the above truncating and frameshift gnomAD variants that is also reported in HGMD, was found in homozygosity in ten afibrinogenemic patients born from consanguineous marriages [22]. Interestingly, p.Arg134Ter is the variant with the highest allele frequency among the truncating and frameshift gnomAD variants (Figure 5A). Another interesting observation can be made by examining the allele frequencies of the missense variants in gnomAD. Here too, the most frequent gnomAD missense variant, p.Gly191Arg (allele frequency of 2.77E-03), is also reported in HGMD. The second most frequent gnomAD missense variant, p.Ser245Phe (allele frequency of 1.52E-04), is also present in HGMD. Furthermore, most of the other gnomAD missense variants that are also reported in HGMD are distributed on the higher-frequency side compared to the missense variants represented exclusively in the gnomAD database (Figure 5B). It therefore appears that some of the gnomAD variants were eventually characterized as pathogenic simply because they occur less rarely than most of the gnomAD variants. This is why potentially pathogenic variants can be found in gnomAD or, more correctly, why their pathogenic nature still awaits discovery.
This study suggests that the gnomAD database contains several potentially pathogenic variants that follow a recessive mode of disease transmission and/or exhibit negative dominant effects in a non-quantitative fashion, or at least without causing overt disease, so that additional pathogenic mutations in other fibrinogen chains or other genes and/or particular environmental factors are required to produce evident or more severe illness. Carriers of such variants with only one defective allele can present conditions ranging from normal health to mild and/or difficult-to-identify disorders, or diseases with late onset or triggered by environmental factors, trauma, etc. Fortunately, most of the γ-module variants catalogued in gnomAD have very low allele frequencies (Table 2), and those that could indeed be pathogenic rarely combine with the same or other pathogenic mutations. However, there are exceptions such as p.Arg301His, which is reported in both the HGMD and gnomAD databases (in the latter with allele frequency 8.00E-06) and is considered a hotspot mutation for dysfibrinogenemia. One of the disorders associated with this variant is thromboembolism, whose genetic risk is also increased by p.Gln534Arg in coagulation factor V (factor V Leiden variant), which is also reported in gnomAD and is quite common (allele frequency of 9.81E-01). While heterozygous carriers of either variant alone are mostly asymptomatic, patients heterozygous for both manifest thromboembolism [23,24]. Although fibrinogen γ chain p.Arg301His is a rare pathogenic mutation (the base-10 logarithm of its frequency is close to the average for the gnomAD fibrinogen γ chain missense variants, Figure 5), this variant can easily combine with the much more common factor V Leiden variant and lead quantitatively to thromboembolism.
Care must be exercised with variants found in the healthy population and believed to represent neutral mutations, because an as-yet unknown fraction of them may be disease-causing.
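The argument that rare variants seldom meet in the same individual can be made concrete with a short calculation. The following Python sketch is not part of the original analysis: it assumes Hardy-Weinberg equilibrium (random mating, no selection), which the study does not state, and uses only the three allele frequencies quoted in the text. The function names are hypothetical.

```python
# Allele frequencies quoted in the text for three gnomAD gamma-chain variants.
ALLELE_FREQ = {
    "p.Gly191Arg": 2.77e-03,
    "p.Ser245Phe": 1.52e-04,
    "p.Arg301His": 8.00e-06,
}


def genotype_frequencies(q):
    """Expected genotype frequencies for an allele of frequency q,
    ASSUMING Hardy-Weinberg equilibrium (random mating, no selection)."""
    p = 1.0 - q
    return {"heterozygous": 2.0 * p * q, "homozygous": q * q}


def compound_heterozygous(q1, q2):
    """Expected frequency of carrying two different alleles of the same
    gene, one on each chromosome, under the same assumption."""
    return 2.0 * q1 * q2


if __name__ == "__main__":
    for name, q in ALLELE_FREQ.items():
        g = genotype_frequencies(q)
        print(f"{name}: het {g['heterozygous']:.2e}, hom {g['homozygous']:.2e}")
    pair = compound_heterozygous(
        ALLELE_FREQ["p.Ser245Phe"], ALLELE_FREQ["p.Arg301His"]
    )
    print(f"p.Ser245Phe / p.Arg301His compound heterozygotes: {pair:.2e}")
```

Under this assumption, an allele with frequency 8.00E-06 is expected in heterozygosity in roughly 1.6E-05 of individuals but in homozygosity in only about 6.4E-11, which illustrates why a recessive pathogenic effect of such a variant could easily go unobserved.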

Protein Structural Analysis
The structural analysis to determine the binding regions of the γ-module was performed on the crystal structure of fragment double-D from human fibrin (PDB entry 1FZC). ∆∆G calculations were made with FoldX (v5.0), PoPMuSiC (v3.0), and CUPSAT for one amino acid replacement at a time on the crystal structure of the C-terminal fragment of the fibrinogen gamma chain, which represents the unbound γ-module at the best atomic resolution reported to date (PDB entry 3FIB). Prior to running FoldX, the PDB repair utility was applied to the crystal structure coordinate file, and the FoldX ∆∆G results were then averaged over 5 runs for each mutation. ∆∆G calculations with PoPMuSiC and CUPSAT were performed directly on the original PDB structure 3FIB. The stability change thresholds employed here were those tested for the highest prediction accuracy of the individual FoldX, PoPMuSiC, and CUPSAT methods on sets of experimentally characterized mutations: FoldX, |∆∆G| > 1.0 kcal/mol [25]; PoPMuSiC, |∆∆G| > 0.5 kcal/mol [26]; and CUPSAT, |∆∆G| > 1.0 kcal/mol [19]. Protein secondary structure elements were determined with STRIDE [27]. Molecular graphics were made with PyMOL (www.pymol.org). The gnomAD missense variants were retrieved from database v2.1.1, whereas the gnomAD truncating and frameshift variants were collected from database v3.1. HGMD variants were downloaded from the public version of the database in the first half of May 2020.
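The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the ∆∆G values in the example are invented and are not results from this study, and the sketch does not call FoldX, PoPMuSiC, or CUPSAT. It only averages already-computed FoldX runs and applies the quoted |∆∆G| cut-offs.

```python
from statistics import mean

# |ddG| thresholds quoted in the text, in kcal/mol.
THRESHOLDS = {"FoldX": 1.0, "PoPMuSiC": 0.5, "CUPSAT": 1.0}


def average_runs(ddg_runs):
    """Average the ddG values of repeated FoldX runs (5 in the text)."""
    return mean(ddg_runs)


def exceeds_threshold(method, ddg):
    """True when |ddG| is above the method-specific cut-off.
    The absolute value makes the test independent of each tool's
    sign convention for stabilizing vs. destabilizing changes."""
    return abs(ddg) > THRESHOLDS[method]


def classify(foldx_runs, popmusic_ddg, cupsat_ddg):
    """Per-method flag for a single amino acid replacement."""
    values = {
        "FoldX": average_runs(foldx_runs),
        "PoPMuSiC": popmusic_ddg,
        "CUPSAT": cupsat_ddg,
    }
    return {method: exceeds_threshold(method, v) for method, v in values.items()}


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only, not data from the study.
    flags = classify([1.2, 1.4, 1.3, 1.1, 1.5], popmusic_ddg=0.3, cupsat_ddg=-1.6)
    print(flags)
```

Keeping the three methods separate, rather than merging them into one score, mirrors the text, where each predictor is judged against its own validated threshold.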

Conclusions
Four main findings emerge from this work regarding useful criteria to discriminate recessive from negative dominant variants in the fibrinogen γ chain.
(a) A number of variants in the healthy population might indeed be pathogenic with an autosomal recessive mode of disease transmission. Variants with such characteristics are especially those that induce severe protein misfolding, leading to the degradation of the protein or at least to its inability to recruit its designated fibrinogen partners or to undergo undue aggregation, i.e., variants causing full loss of function or not resulting in negative dominant effects.
(b) Other potentially disease-causing variants found in the healthy population are those with negative dominant effects that produce mild and/or difficult-to-identify disorders and/or late-onset diseases, or disorders triggered by specific environmental factors, trauma, etc.
(c) A recurrent observation in this and previous studies is that a number of pathogenic variants hit the protein structure only "softly" but in spots important for protein-protein interactions. When proteins with defects in these critical regions are not efficiently neutralized by the cellular mechanisms of protein degradation, or at least by a loss of their capacity to bind and recruit other proteins, the corresponding variants are candidates to be checked as negative dominant mutations. This can be particularly true for proteins engaging in processes of homo- and homo-hetero multimerization.
(d) This study highlights that a significant fraction of gnomAD variants are not neutral (as shown by their co-presence in the HGMD database), and this might be only the tip of the iceberg (as suggested by ∆∆G calculations and protein structural analysis). Care must be exercised with all variants found in the general population and believed to be neutral mutations.
Finally, the considerations presented here might help to address the pathogenic mechanisms of many other proteins that, like the fibrinogen γ chain, are involved in homo- and hetero-multimerization processes.
Funding: This research received no external funding.