Prediction of Temperature Factors in Proteins: Effect of Data Pre-Processing and Experimental Conditions
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The author shows that he can improve
his predictions of B-factor values by using more detailed models.
The correlation with experimental values reached is rather high.
MD-RMSF are also predicted in a consistent way,
though in a few cases only.
Main points:
Talking about graphs and B-factors without mentioning (or even quoting)
the GNM of Bahar, Erman and col. is a bit weird.
The paper of Hinsen on the effect of crystal contacts on
B-factor predictions needs also to be mentioned.
If you want your work to be further considered (and used),
you need to make your scripts available.
Other questions/minor points:
l174: X-ray resolution between 1.6 Å and 2.6 Å
What is the rationale for excluding the highest resolution ones ?
PDB-REDO: there are two flavors of models there
(conservative refinement or a more extensive one).
Which one was considered?
What about water molecules ?
I suppose they were removed but this is not mentioned.
I also suppose that keeping buried ones would
not help much but maybe this deserves a check.
l249: this study considers all (non-H, I suppose) protein atoms.
But the graph cutoff is 7: this seems large, since
this value corresponds to a typical choice
for defining residue-residue contacts.
Did you try a smaller value (e.g. 4.5)?
l278: Details of the preprocessed data can be found in Table 1.
Not a good idea to do that. Better to put enough information in the
legend so that the Figure becomes self-explanatory.
Author Response
Comment 1: Talking about graphs and B-factors without mentioning (or even quoting) the GNM of Bahar, Erman and col. is a bit weird. The paper of Hinsen on the effect of crystal contacts on B-factor predictions needs also to be mentioned.
Response 1: Additional text with the corresponding references (Tirion, 1996; Bahar, 1997; Haliloglu, 1997; Hinsen, 2008) was added to the manuscript.
The following text was added to the manuscript (lines 55-60):
In addition to the linear models, the thermal fluctuations were evaluated using the Kirchhoff matrix, which is also known as the Laplacian matrix in spectral graph theory. The inverse of the Kirchhoff matrix, whose diagonal elements reflect the thermal motion of the atoms, proved to be suitable for estimating the B-factors of the Cα atoms.
- Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des. 1997;2(3):173-81.
- Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Phys. Rev. Lett. 1997;79:3090.
- Tirion M. Large-amplitude elastic motions in proteins from a single-parameter atomic analysis. Phys. Rev. Lett. 1996;77:1905–1908.
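The Kirchhoff-matrix idea described in the added text can be illustrated with a minimal Python sketch. This is not the author's code (the scripts referenced in this record are written in R) and not the GDV model; the function name `gnm_fluctuations`, the node coordinates, and the default 7 Å cutoff are assumptions made only for illustration. Because the Kirchhoff (Laplacian) matrix is singular, the sketch uses the pseudo-inverse and returns unscaled values, which are proportional to B-factors only up to a constant.

```python
import numpy as np

def gnm_fluctuations(coords, cutoff=7.0):
    """Diagonal of the pseudo-inverse of the Kirchhoff matrix (unscaled)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all nodes (O(N^2) memory).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Two distinct nodes are in contact if they lie within the cutoff.
    adjacency = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = -adjacency.astype(float)
    np.fill_diagonal(kirchhoff, adjacency.sum(axis=1))
    # The matrix has a zero eigenvalue, so a pseudo-inverse is required.
    return np.diag(np.linalg.pinv(kirchhoff))

# Hypothetical example: five nodes on a straight line, 3.8 Å apart.
chain = [[3.8 * i, 0.0, 0.0] for i in range(5)]
fluct = gnm_fluctuations(chain, cutoff=4.0)
```

In this toy chain the terminal nodes have the fewest contacts and therefore the largest values, which mirrors the usual observation that chain termini are more mobile.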
The following text was added to the manuscript (lines 90-92):
Hinsen (2008) investigated the effects of close crystal contacts on atomic fluctuations using egg white lysozyme. The study demonstrated that crystal packing interactions can significantly influence the magnitude of atomic fluctuations.
- Hinsen K. Structural flexibility in proteins: impact of the crystal environment. Bioinformatics. 2008;24(4):521–528.
Comment 2: If you want your work to be further considered (and used), you need to make your scripts available.
Response 2: The scripts are available on GitHub. The following sentence was added to the manuscript (Methods section - Data Set and Software):
R scripts with an example can be found at https://github.com/jure-praznikar/B-factor-prediction.
Comment 3: l174: X-ray resolution between 1.6 Å and 2.6 Å. What is the rationale for excluding the highest resolution ones?
Response 3: The distribution of PDB (or PDB-REDO) entries by resolution shows that most entries lie at a resolution of 1.8-2.0 Å. At resolutions better than 1.6 Å, the number of available proteins is limited; moreover, at high resolution, anisotropic B-factors are usually refined. The linear GDV model was trained exclusively on isotropic B-factors and cannot be used to predict anisotropic B-factors.
Comment 4: PDB-REDO: there are two flavors of models there (conservative refinement or more extensive one). Which one was considered ?
Response 4: The more extensive one (re-refined and rebuilt) was used.
The following sentence was improved:
The re-refined and rebuilt protein structures were retrieved from the PDB-REDO database.
Comment 5: What about water molecules? I suppose they were removed but this is not mentioned. I also suppose that keeping buried ones would not help much but maybe this deserves a check.
Response 5: Yes, the water molecules were removed. The effects of buried and non-buried water molecules have not yet been analyzed in detail. However, during the research, some preliminary work and calculations were done with small ligands that could have a similar effect as buried water molecules. The results were inconsistent, suggesting that this issue should be further investigated in future studies.
In general, small and more mobile molecules, such as water or loosely bound ligands, can increase the number of atomic contacts, which may lead to an underestimation of B-factors. Therefore, special care should be taken when including such heteroatoms, as they are not covalently bound to the protein and may introduce artifacts into the predictions. Another case is molecules that are covalently bound to the protein, such as glycans in glycoproteins.
Be that as it may, this study has shown that relatively large ligands significantly influence the predictions for the B-factor. For other scenarios, such as buried or non-buried water molecules, small ligands or covalently bound ligands, further studies are needed to determine if and how they affect the accuracy of B-factor prediction. This is mentioned in the conclusion, quote: “However, a limitation of the presented approach is the focus on relatively large ligands and the use of arbitrarily defined selection criteria. It should be emphasized that the simple inclusion of mobile solvent molecules would increase the number of close contacts, which could lead to an overestimation of the rigidity of the outer atoms - a problem that also applies to small ligands. Therefore, further research is needed to find out which and how many heteroatoms significantly influence the flexibility of certain protein regions.”
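The water-removal step mentioned in this response could look roughly like the Python sketch below. It is not taken from the author's R scripts; the function name `strip_waters` and the assumption that waters carry the residue name `HOH` are illustrative. The sketch relies only on the fixed-column layout of PDB coordinate records (record name in columns 1-6, residue name in columns 18-20).

```python
# Assumption: water molecules are labelled with the residue name HOH.
WATER_NAMES = {"HOH"}

def strip_waters(pdb_lines):
    """Return the PDB lines with water coordinate records removed."""
    kept = []
    for line in pdb_lines:
        record = line[0:6].strip()      # columns 1-6: record name
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if record in ("ATOM", "HETATM") and res_name in WATER_NAMES:
            continue  # skip water atoms
        kept.append(line)
    return kept

# Hypothetical two-line example: one protein atom, one water oxygen.
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM 1001  O   HOH A 201      10.000  12.000  14.000  1.00 25.00           O",
]
protein_only = strip_waters(example)
```

Keeping only buried waters, as the reviewer suggests checking, would need an additional burial criterion that is outside the scope of this sketch.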
Comment 6: l249: this study considers all (non-H, I suppose) protein atoms. But the graph cutoff is 7: this seems large, since this value corresponds to a typical choice for defining residue-residue contacts. Did you try a smaller value (e.g. 4.5)?
Response 6: This was tested in an earlier study (Pražnikar, 2023) over a range from 3.0 Å to 8.0 Å. The worst results were observed at 3.0 Å, and as the cutoff length increased, the performance of the linear model improved. Performance began to stabilize around 5.5 Å and remained stable in the range of 6.0 Å to 8.0 Å.
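To make the role of the cutoff concrete, the following Python sketch counts how many neighbours each atom has at several cutoff values. It does not reproduce the GDV linear model or the results of the cited study; the function name `mean_degree_by_cutoff` and the toy coordinates are assumptions for illustration. It only shows how the contact graph becomes denser as the cutoff grows.

```python
import numpy as np

def mean_degree_by_cutoff(coords, cutoffs):
    """Mean number of contacts per atom for each distance cutoff (in Å)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an atom is not its own neighbour
    return {float(c): float((dist <= c).sum(axis=1).mean()) for c in cutoffs}

# Hypothetical example: four atoms on a straight line, 3.5 Å apart.
atoms = [[3.5 * i, 0.0, 0.0] for i in range(4)]
degrees = mean_degree_by_cutoff(atoms, [3.0, 4.0, 8.0])
```

With a very short cutoff the graph of this toy example has no edges at all, so a graph-based descriptor would carry no information there; real all-atom structures differ in detail, but the count is likewise non-decreasing in the cutoff.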
Comment 7: l278: Details of the pre-processed data can be found in Table 1. Not a good idea to do that. Better to put enough information in the legend so that the Figure becomes self-explanatory.
Response 7: The graphical representation of the pre-processed data sets has been inserted in Figure 2.
Reviewer 2 Report
Comments and Suggestions for Authors
In this study, the authors described a linear model based on graphlet degree vectors for predicting B-factors of protein structures and validating deposited structural models.
The results were appropriate for evaluating the reliability of the prediction method, and the case study effectively demonstrated the accuracy and characteristics of the method under different conditions. Therefore, this study is expected to be of interest to readers, as it presents valuable insights for protein crystallography, structure validation, and protein flexibility research. After the very minor revisions listed below, publication of this manuscript is recommended.
Corrections:
Unnecessary italics; lines 60-73
Should be italic; line 357 “Rhodobacter sphaeroides”
Unclear sentence; line 383-384 “suggesting that certain regions of the protein structure are more rigid than observed in the aqueous environment”
suggestion; “suggesting that certain regions of the protein structure can be more rigid than reasonably expected in the aqueous environment”
Author Response
Comment 1: Unnecessary italics; lines 60-73
Response 1: Fixed.
Comment 2: Should be italic; line 357 “Rhodobacter sphaeroides”
Response 2: Fixed.
Comment 3: Unclear sentence; line 383-384 “suggesting that certain regions of the protein structure are more rigid than observed in the aqueous environment”
suggestion; “suggesting that certain regions of the protein structure can be more rigid than reasonably expected in the aqueous environment”
Response 3: Fixed.