OPUS-BFactor: Predicting Protein B-Factor with Sequence and Structure Information

Yang, Yulu; Lv, Ying; Luo, Zhenwei; Wang, Qinghua; Xu, Gang; Ma, Jianpeng

doi:10.3390/molecules30122570

by Yulu Yang ^1, Ying Lv ^2, Zhenwei Luo ^1,2, Qinghua Wang ^3, Gang Xu ^1,* and Jianpeng Ma ^1,2,*

^1 Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China

^2 Shanghai AI Laboratory, Shanghai 200030, China

^3 Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China

^* Authors to whom correspondence should be addressed.

Molecules 2025, 30(12), 2570; https://doi.org/10.3390/molecules30122570

Submission received: 17 April 2025 / Revised: 6 June 2025 / Accepted: 10 June 2025 / Published: 12 June 2025

(This article belongs to the Special Issue Computational Insights into Protein Engineering and Molecular Design)

Abstract

Protein B-factor, also known as the Debye–Waller temperature factor or atomic displacement parameter, measures the thermal fluctuation of an atom around its average position. It serves as a crucial indicator of protein flexibility and dynamics. However, accurately predicting the B-factor of Cα atoms remains challenging. In this work, we introduce OPUS-BFactor, a tool for predicting the normalized protein B-factor. OPUS-BFactor employs a transformer-based module to integrate sequence-level and pair-level features, encompassing structural attributes derived from the protein’s 3D structure and evolutionary profiles obtained from the protein language model ESM-2. Specifically, OPUS-BFactor treats pair features as a bias term, incorporating them into the attention matrix derived from the sequence-level features of each residue pair, thereby effectively merging pair features with sequence features. OPUS-BFactor operates in two modes, enabling predictions based solely on either the protein sequence or the 3D structure of the target protein. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, demonstrated that OPUS-BFactor significantly outperformed other B-factor prediction methods. Therefore, OPUS-BFactor is a valuable tool for predicting protein properties related to the B-factor, such as flexibility, thermal stability, and regional activity.

Keywords: protein B-factor; protein language model; protein flexibility

1. Introduction

Protein B-factor, also known as the Debye–Waller factor, atomic displacement parameter, or temperature factor, measures the mean squared displacement or uncertainty of atomic positions [1,2,3]. Numerous studies have shown that protein B-factor is valuable in various areas, such as predicting protein flexibility [4,5], evaluating thermal stability [6], analyzing active and disordered regions [7,8], and studying protein dynamics [9]. Since protein fluctuation provides a crucial link between structure and function [2,10], accurately predicting protein B-factor is essential for understanding the characteristics of target proteins. Recently, several studies have highlighted the limitations associated with the utilization of B-factors, underscoring the crucial significance of rescaling B-factors in protein crystal structure analyses [11,12,13].

Over the past several decades, numerous methods have been proposed for predicting protein B-factor [2,14,15,16,17,18,19]. Some studies have introduced normal mode analysis (NMA) into this field [10,20,21,22]. In NMA, the Hessian of the harmonic potential is employed to describe the atomic thermal fluctuations; therefore, the B-factors of proteins are correlated with the eigenvalues of the Hessian. Meanwhile, the Gaussian network model (GNM) and the anisotropic network model (ANM) are two elastic network models that have been widely used to study protein fluctuation dynamics [23,24,25].
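As an illustration of how an elastic network model yields B-factor estimates, the following minimal sketch implements the standard GNM recipe: build the Kirchhoff (connectivity) matrix from C_α contacts and read relative fluctuations off the diagonal of its pseudo-inverse. The function name `gnm_bfactors`, the 7 Å default cutoff, and the toy coordinates are assumptions made for this example; it is not the ProDy-based baseline evaluated later in this paper.

```python
import numpy as np

def gnm_bfactors(ca_coords, cutoff=7.0):
    """Relative C-alpha fluctuations from a Gaussian network model.

    ca_coords: (N, 3) array of C-alpha positions in Angstrom.
    cutoff: contact distance in Angstrom (7.0 is an assumed, commonly used value).
    Returns values proportional to B-factors (no absolute scale).
    """
    coords = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Off-diagonal entries are -1 for residue pairs in contact, 0 otherwise.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    # Diagonal entries hold the number of contacts of each residue.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # The pseudo-inverse discards the zero-eigenvalue (rigid-body) mode.
    inverse = np.linalg.pinv(kirchhoff)
    return np.diag(inverse).copy()

# Toy example: five residues on a straight line, 3.8 Angstrom apart.
chain = [[3.8 * i, 0.0, 0.0] for i in range(5)]
profile = gnm_bfactors(chain, cutoff=4.0)
```

In this toy chain the terminal residues have fewer contacts than the central one, so their predicted fluctuations are larger, which mirrors the common observation that chain termini tend to have high B-factors.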

In recent years, several machine learning-based models have been proposed for predicting protein B-factors [2,14,15,26,27]. Some of these models utilize support vector regression (SVR) [14,15,26], and some of them employ graph models, such as multiscale weighted colored graphs [27]. With the development of deep learning techniques [28], several new methods based on deep learning frameworks have emerged [16,18,29]. Most of these methods [16,18] adopt the bidirectional long short-term memory (BiLSTM) network [30]. Additionally, the method proposed by Sarparast et al. [29] utilizes a graph-based network to capture structural features from protein 3D structures, significantly enhancing the accuracy of B-factor prediction.

In this study, we introduce a deep learning-based model named OPUS-BFactor for predicting protein B-factors (specifically for C_α atoms). OPUS-BFactor operates in two modes. In the first mode, it uses sequence information as input, enabling predictions based solely on protein sequence. Previous sequence-based methods typically rely on one-hot encoding (representing residues as 20-dimensional binary vectors where each vector corresponds to a specific residue type) [16] or evolutionary features such as PSSM profiles (position-specific scoring matrix) and HMM profiles (hidden Markov model) [18]. Recently, numerous protein language models have been developed, significantly improving the quality of the extracted evolutionary features [31,32,33,34]. Among them, ESM-2 [35] stands out as the most widely used, with applications spanning numerous domains [36,37,38]. Consequently, in this study, OPUS-BFactor utilizes evolutionary features derived from the protein language model ESM-2. The results show that using ESM-2 features as inputs significantly improves B-factor prediction accuracy compared with those using one-hot encoding and PSSM features. In the second mode, OPUS-BFactor utilizes structural information, achieving better results than the sequence-based mode. For clarity, we refer to the results of OPUS-BFactor based on sequence information as OPUS-BFactor-seq (first mode) and the results based on structural information as OPUS-BFactor-struct (second mode).

We assessed the performance of OPUS-BFactor using three test sets: CAMEO65, CASP15, and CAMEO82. The results indicated that OPUS-BFactor-struct significantly outperformed other methods. Specifically, on the most recently released CAMEO82 test set, the average Pearson correlation coefficient (PCC) for B-factor from C_α atoms was 0.67 for OPUS-BFactor-struct and 0.58 for OPUS-BFactor-seq, compared with 0.41 for the most recent method proposed by Pandey et al. [16].

Although many methods have been proposed for protein B-factor prediction, they usually use different training and test sets, making fair comparisons difficult. Additionally, the code for many methods is not publicly available, complicating their use by other researchers. Therefore, we will make our training and test sets, as well as our code, available to all researchers. We hope that OPUS-BFactor will serve as a fair baseline method in protein B-factor prediction. Additionally, the formatted datasets may become a useful benchmark to facilitate the development of protein language models, given that the performance of sequence-based B-factor prediction models still lags behind that of structure-based models.

2. Results

2.1. Performance of Different B-Factor Prediction Methods

We evaluated the performance of OPUS-BFactor against a normal mode analysis (NMA)-based method, ProDy [39], and a deep learning-based method developed by Pandey et al. [16] across three test sets (CAMEO65, CASP15, and CAMEO82). As shown in Table 1, OPUS-BFactor consistently surpassed other methods in terms of average PCC on all three test sets. Additionally, the structure-based mode of OPUS-BFactor (OPUS-BFactor-struct) delivered better results than its sequence-based version (OPUS-BFactor-seq), indicating that structural information was crucial for accurate protein B-factor prediction.

We then combined the targets from all three test sets into a single dataset of 181 targets and conducted a head-to-head comparison of the per-target Pearson correlation coefficient (PCC) across the different methods. As shown in Figure 1, OPUS-BFactor-struct outperformed the other evaluated methods on most targets.

Furthermore, we conducted a comparative analysis of the average PCC among various methods, stratified by the lengths and subfamilies of all 181 targets. As depicted in Figure 2, OPUS-BFactor-struct and OPUS-BFactor-seq consistently outperformed other methods across various target lengths and subfamilies. However, it was noteworthy that the PCCs of all the methods exhibited a decline when evaluated on targets predominantly characterized by coil structure.

In Figure 3 and Figure 4, we present some prediction results obtained from each method. These results showed that OPUS-BFactor achieved satisfactory predictions for most of the residues in these targets.

2.2. Correlation Between Protein B-Factors and the pLDDT Values from Structure Prediction Methods

In recent years, numerous highly effective protein structure prediction methods have been proposed, significantly advancing the field of computational biology. To investigate the correlation between the protein B-factor and the predicted local distance difference test (pLDDT) values derived from these methods, we first employed the state-of-the-art ESMFold [35] model to predict the structures of the targets in our three test sets. We then calculated the average PCC between the real B-factor values and the predicted pLDDT scores. The pLDDT score indicates prediction confidence, and it relates to structural flexibility inversely to the B-factor: lower pLDDT values indicate higher flexibility, whereas higher B-factors indicate higher flexibility. Because pLDDT values are always positive, we used negative pLDDT values in the correlation analysis so that both quantities increase with flexibility. This allowed us to assess the feasibility of using pLDDT scores as a proxy for protein B-factors.
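The sign convention can be illustrated with a short sketch. The B-factor and pLDDT values below are made-up numbers chosen only for demonstration, and the variable names are hypothetical; the point is that negating pLDDT flips the sign of the Pearson correlation without changing its magnitude.

```python
import numpy as np

# Hypothetical per-residue values for one target (not real data).
b_factor = np.array([10.0, 12.0, 35.0, 40.0, 15.0])
plddt = np.array([95.0, 92.0, 60.0, 55.0, 90.0])

# Raw pLDDT is anti-correlated with B-factor in this toy example.
pcc_raw = np.corrcoef(b_factor, plddt)[0, 1]

# Negating pLDDT makes both quantities increase with flexibility.
pcc_negated = np.corrcoef(b_factor, -plddt)[0, 1]
```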

As shown in Table 2, the average PCCs between the real B-factors and pLDDT values were notably lower compared with those achieved by our sequence-based approach, OPUS-BFactor-seq. This indicated a relatively weak correlation between protein B-factors and the pLDDT values, necessitating the development of tailored approaches for predicting protein B-factors.

Meanwhile, Table 3 presents a disaggregated analysis of the average PCCs for both approaches, classified by the structural prediction accuracy of ESMFold for the entire set of 181 targets. The results demonstrated that as the structural prediction difficulty of the targets increased, there was a corresponding decrease in the accuracy of B-factor prediction when relying solely on sequence information. When using real PDB structures, OPUS-BFactor-struct exhibited only a negligible decrease in PCC between targets with TM score >0.9 and those with TM scores between 0.8 and 0.9. This demonstrated that OPUS-BFactor-struct maintained reasonably good performance for easily predicted targets. Additionally, compared with results derived from real PDB structures, those based on ESMFold exhibited inferior performance. Notably, this performance gap widened as the quality of the predicted structure decreased.

Furthermore, we used AlphaFold2 [40] to predict the structures of 44 targets in the CASP15 test set and calculated the PCC between the real B-factors and their corresponding pLDDT values. The results showed that the average PCC between the real B-factors and the pLDDT values from AlphaFold2 on the CASP15 test set was 0.23, which was even lower than the PCC achieved by ESMFold (0.24 in Table 2). Consequently, this finding also indicated a relatively weak correlation between B-factors and pLDDT values, which was consistent with earlier observations reported by Carugo et al. [41].

2.3. Evaluation of Different Evolutionary Profiles

We evaluated the performance of sequence-based B-factor prediction models (OPUS-BFactor-seq) using different evolutionary profiles on the CAMEO82 test set. As shown in Figure 5, the model using ESM-2 features as inputs significantly improved prediction accuracy compared with models using (1) one-hot encoding (represented as a 20-dimensional binary vector indicating residue identity), (2) HMM profiles (generated using hh-suite), or (3) PSSM features (derived from BLAST 2.14 alignments). This indicated that the performance of protein B-factor prediction could be enhanced by the utilization of more advanced evolutionary features.
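For reference, the one-hot baseline can be sketched as below. The residue ordering in `AMINO_ACIDS`, the function name `one_hot`, and the choice to leave non-standard residues as all-zero rows are assumptions made for this example rather than details taken from the compared models.

```python
import numpy as np

# Assumed alphabetical ordering of the 20 standard residue types.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(sequence):
    """Encode a protein sequence as an (L, 20) binary matrix."""
    encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    for position, residue in enumerate(sequence):
        column = INDEX.get(residue)
        if column is not None:  # unknown residues stay all-zero (assumption)
            encoding[position, column] = 1.0
    return encoding

features = one_hot("MNIFEX")  # "X" is not a standard residue
```

Each row carries only residue identity, which is why such an encoding provides far less context than a 1280-dimensional ESM-2 embedding per residue.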

2.4. Case Study

As a case study, we used OPUS-BFactor-seq to predict the B-factors of T4 lysozyme and the tumor suppressor p53 from their sequences alone. As shown in Figure 6A, two regions of T4 lysozyme (regions A and B) have relatively high predicted values. Previous studies show that region A (D20-G23) corresponds to the active site of T4 lysozyme [42], and that region B (K35-L39) is relatively flexible, since an insertion or duplication of short peptide fragments in this area may cause a secondary structure transition (from helix to strand) [43]. Furthermore, in Figure 6B, we observe a region with relatively high predicted B-factor values in the tumor suppressor p53 (region C), which is related to its DNA-binding site [44]. These results suggest that OPUS-BFactor could effectively predict B-factor-related properties such as flexibility, thermal stability, and functional activity.

3. Method

3.1. Framework of OPUS-BFactor

OPUS-BFactor adopts the RotaFormer module from OPUS-Rota5 [45] as its backbone architecture, with some modifications. As shown in Figure 7, the 1D and 2D features are derived from protein structural information, while the ESM-2 features are obtained from the protein language model ESM-2 [35], which relies solely on protein sequence. Specifically, the 1D protein features include two one-hot encoded features for the 3-state and 8-state secondary structures, seven physicochemical properties [46,47], 19 PSP features representing 19 rigid-body blocks within residues [47,48,49], and six backbone torsion angle features (sine and cosine values for ϕ, ψ, and ω). The 2D features describe residue–residue backbone contact information [50,51], including C_β–C_β distance distributions and orientational distributions of three dihedrals (ω, θ_ab, and θ_ba) and two angles (φ_ab and φ_ba) between residues a and b. Here, ω represents the dihedral of C_α(a)–C_β(a)–C_β(b)–C_α(b), θ_ab represents the dihedral of N(a)–C_α(a)–C_β(a)–C_β(b), and φ_ab represents the angle of C_α(a)–C_β(a)–C_β(b), where (a) and (b) denote the residue to which each atom belongs. Distances of C_β–C_β span from 2 to 20 Å, segmented into 36 bins at 0.5 Å intervals, with an additional bin for distances exceeding 20 Å. The φ angle ranges from 0 to 180°, divided into 18 bins at 10° intervals, with an extra bin for non-contact scenarios. Both ω and θ range from −180 to 180°, segmented into 36 bins at 10° intervals, with an extra bin for non-contact scenarios. The ESM-2 features include a 1280-dimensional feature for each residue, containing their evolutionary information.
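The discretization of the 2D features can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `distance_bin` and `angle_bin` are hypothetical, placing distances below 2 Å in the first bin and clipping boundary angles into the last regular bin are assumptions, and the contact flag is supplied by the caller.

```python
def distance_bin(distance):
    """Map a C-beta/C-beta distance (Angstrom) to one of 37 bins.

    Bins 0..35 cover 2-20 Angstrom in 0.5 Angstrom steps; bin 36 is the
    extra bin for distances of 20 Angstrom or more.
    """
    if distance >= 20.0:
        return 36
    distance = max(distance, 2.0)  # assumption: shorter distances use bin 0
    return int((distance - 2.0) // 0.5)

def angle_bin(angle_deg, low, high, in_contact, width=10.0):
    """Map an angle in [low, high] degrees to a bin of the given width.

    The index after the last regular bin is reserved for non-contact pairs.
    """
    n_bins = int(round((high - low) / width))
    if not in_contact:
        return n_bins
    index = int((angle_deg - low) // width)
    return min(max(index, 0), n_bins - 1)  # clip the upper boundary value

# phi uses 18 + 1 bins over [0, 180]; omega and theta use 36 + 1 over [-180, 180].
phi_index = angle_bin(45.0, 0.0, 180.0, in_contact=True)
omega_index = angle_bin(0.0, -180.0, 180.0, in_contact=True)
```

The bin counts follow from the stated ranges: (20 − 2)/0.5 = 36 distance bins, 180/10 = 18 bins for φ, and 360/10 = 36 bins for ω and θ, each with one additional bin.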

In OPUS-BFactor, the embedding module transforms the 1D features into sequence and pair features. The 2D features are processed through a 2D convolution layer and then added to the pair features, while the ESM-2 features are passed through a dense layer and added to the sequence features. Next, the RotaFormer module [45] is used to integrate the sequence and pair features. OPUS-BFactor employs 24 RotaFormers for feature extraction. After that, four BiLSTM layers [30] are used to further aggregate the sequence features and output the predicted B-factor value of C_α atoms. Following Pandey et al. [16], since the normalized B-factor has been shown to be more robust against experimental noise [1], we used the normalized B-factor in this study.

During training, we used the mean absolute error (MAE) loss between the predicted and actual normalized B-factor. We employed the Glorot uniform initializer and the Adam optimizer [52]. However, one notable limitation of OPUS-BFactor is that, owing to the sophistication of its transformer-based model, it requires substantial computational resources, with each epoch taking nearly 15 h to complete. To address this, we initiated the training process with a learning rate of 1 × 10^−3 and halved it every two epochs. Additionally, we randomly selected 90% of the training data for model training. The training was conducted for a total of six epochs. The remaining 10% of the data was used as a validation set to select the optimal model. To mitigate overfitting and enhance generalization, we employed early stopping with a patience of 4 evaluations, selecting the best model checkpoint based on peak validation performance. Validation was performed every 1500 training steps. OPUS-BFactor was developed using TensorFlow v2.4 [53] and trained on four NVIDIA Tesla V100 GPUs.
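The loss and the step-wise learning rate decay can be written out in a few lines. The sketch below is framework-free and only illustrative: the function names are hypothetical, the initial rate is taken as 1e-3, and counting epochs from zero (so the first halving takes effect at the third epoch) is an assumption about how the schedule is applied.

```python
def learning_rate(epoch, initial=1e-3, halve_every=2):
    """Learning rate for a zero-based epoch index, halved every two epochs."""
    return initial * 0.5 ** (epoch // halve_every)

def mean_absolute_error(predicted, target):
    """MAE between predicted and actual normalized B-factors."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)

# Learning rates over the six training epochs.
schedule = [learning_rate(epoch) for epoch in range(6)]

# Toy loss evaluation on two residues (made-up values).
loss = mean_absolute_error([0.5, -1.0], [0.0, 1.0])
```

Under these assumptions the six epochs use rates of 1e-3, 1e-3, 5e-4, 5e-4, 2.5e-4, and 2.5e-4.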

3.2. Datasets

In OPUS-BFactor, we used the same training dataset as trRosetta [50]. Additionally, we removed proteins where all residues shared identical B-factors. To evaluate the performance across various methods, we utilized three recently released test sets. The first, CAMEO65, was collected by Xu et al. [54] and contained 65 challenging targets released between May 2021 and October 2021 from the CAMEO website [55]. After filtering, 62 targets remained. The second test set, CASP15, included 44 targets available from the CASP website (http://predictioncenter.org (accessed on 1 January 2020)). The third, CAMEO82, was collected by Xu et al. [39] and contained 82 targets released between May 2023 and August 2023 from the CAMEO website, with 75 targets remaining after filtering. In this study, we used the normalized B-factor (B′) of each C_α atom as the label, calculated using the formula B′ = (B − μ)/σ, where μ and σ are the mean and standard deviation of the unnormalized B-factor values (B) within the target protein.
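The per-protein normalization is a standard z-score and can be sketched as below. The function name `normalize_bfactors` and the example values are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not specify which estimator was used.

```python
import numpy as np

def normalize_bfactors(raw_bfactors):
    """Z-score normalize the C-alpha B-factors of a single protein."""
    values = np.asarray(raw_bfactors, dtype=float)
    mu = values.mean()
    sigma = values.std()  # population standard deviation (assumption)
    if sigma == 0.0:
        # Proteins whose residues all share one B-factor cannot be
        # normalized; the text states such proteins were removed.
        raise ValueError("all residues share an identical B-factor")
    return (values - mu) / sigma

raw = [12.0, 15.0, 30.0, 22.0, 18.0]  # made-up B-factors in square Angstrom
normalized = normalize_bfactors(raw)
```

The zero-variance check also shows why proteins with identical B-factors for every residue have to be excluded: the formula is undefined when σ = 0.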

3.3. Performance Metrics

To evaluate the accuracy of each method, we used the average Pearson correlation coefficient (PCC) for each test set as our metric.
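Concretely, the PCC is computed per target between predicted and observed values and then averaged over the targets of a test set. The following sketch shows this under stated assumptions: the function names `pearson` and `average_pcc` are hypothetical, and the two toy targets are made up to give an easily checked result.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_pcc(targets):
    """Mean of per-target PCCs; targets is a list of (predicted, observed)."""
    return float(np.mean([pearson(pred, obs) for pred, obs in targets]))

# Two toy targets: one perfectly correlated, one perfectly anti-correlated.
toy_targets = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
]
mean_pcc = average_pcc(toy_targets)
```

Because the PCC is invariant to shifting and positive scaling, it gives the same value whether the observed B-factors are raw or normalized within the protein.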

3.4. Data and Software Availability

The code and pre-trained models of OPUS-BFactor, as well as the datasets used in the study, can be downloaded from http://github.com/OPUS-MaLab/opus_bfactor (accessed on 1 January 2025).

4. Concluding Discussion

In this study, we propose a protein B-factor prediction method called OPUS-BFactor, which operates in two modes: the first one (OPUS-BFactor-seq) uses sequence information exclusively, allowing predictions based solely on protein sequence, and the second one (OPUS-BFactor-struct) utilizes structural information, requiring the coordinates of backbone atoms in the target protein. The results (Table 1, Figure 1, Figure 2, Figure 3 and Figure 4) on three recently released test sets showed that our method significantly outperformed other B-factor prediction methods. Meanwhile, the results highlight a performance gap between sequence-based and structure-based B-factor prediction models; the latter is significantly better than the former.

It should be noted that most of the previous methods employed the LSTM architecture as their neural network backbone. In contrast, OPUS-BFactor is based on a more sophisticated transformer-based network architecture (i.e., RotaFormer module in Figure 7), which is capable of integrating features between the sequence level and the pair level more effectively. Specifically, in OPUS-BFactor, sequence-level features are integrated with pair-level features through an outer product operation. Meanwhile, pair-level features serve as a bias term to integrate with the attention matrix derived from sequence-level features. Moreover, given that most previous methods relied on traditional evolutionary features such as PSSM or HMM profiles, the superior performance of OPUS-BFactor can also be largely attributed to its utilization of more powerful evolutionary features derived from ESM-2.
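The idea of using pair features as an attention bias can be illustrated with a single-head NumPy sketch. This is not the RotaFormer implementation: the function names, the single head, the scalar bias per residue pair, and the random toy inputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def pair_biased_attention(seq, pair_bias, wq, wk, wv):
    """Single-head self-attention with an additive pair bias.

    seq: (L, d) sequence-level features.
    pair_bias: (L, L) scalar bias per residue pair, derived from pair features.
    wq, wk, wv: (d, d_head) projection matrices.
    """
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    # Attention logits from sequence features, shifted by the pair bias.
    logits = q @ k.T / np.sqrt(q.shape[-1]) + pair_bias
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
length, d_model, d_head = 5, 8, 4
seq = rng.normal(size=(length, d_model))
pair_bias = rng.normal(size=(length, length))
wq, wk, wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
output, weights = pair_biased_attention(seq, pair_bias, wq, wk, wv)
```

Because the bias is added before the softmax, a large positive value for a residue pair (for example, two residues in close spatial contact) raises the attention that one residue pays to the other regardless of their sequence-level similarity.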

We also evaluated the correlation between real B-factors and predictions from OPUS-BFactor-seq, as well as the correlation between real B-factors and the pLDDT values from ESMFold and AlphaFold2; all of these rely on sequence information exclusively. The results (Table 2 and Table 3) showed that OPUS-BFactor-seq delivered better results. Therefore, B-factor prediction methods such as OPUS-BFactor can provide information on protein flexibility that complements structure prediction methods such as ESMFold and AlphaFold2.

Additionally, the results on T4 lysozyme (Figure 6A) and the tumor suppressor p53 (Figure 6B) indicated that the regions with relatively high values of B-factors predicted by OPUS-BFactor-seq corresponded with the active/binding sites and flexible regions of the target. Therefore, OPUS-BFactor-seq may serve as a useful tool for predicting protein properties related to the B-factor, such as flexibility, thermal stability, and regional activity.

Furthermore, the results (Figure 5) showed that the performance of protein B-factor prediction may benefit from more advanced evolutionary features. Protein B-factor prediction could therefore serve as a valuable benchmark task for assessing protein language models. To facilitate this, we will make our formatted training and test sets, along with our code, available to all researchers.

Although OPUS-BFactor achieved a relatively high correlation with the B-factors from the PDB file, it should be noted that further investigation is needed to implement stricter filtering. This is because B-factors can be influenced by various factors beyond conformational flexibility, such as static disorder, crystal packing effects, and experimental noise. Furthermore, OPUS-BFactor currently cannot reliably differentiate B-factor variations between closely related protein structures. This limitation suggests an important direction for future improvement, potentially through the integration of additional structural or evolutionary information into the model. Such refinements could improve predictive accuracy and expand its applications, such as assessing the quality of predicted protein structures.

Author Contributions

Conceptualization, J.M. and G.X.; methodology, G.X.; software, G.X.; validation, Y.Y., Y.L., and G.X.; formal analysis, Y.Y., Z.L., and G.X.; investigation, Y.Y. and G.X.; resources, J.M. and G.X.; data curation, G.X.; writing—original draft preparation, G.X. and Y.Y.; writing—review and editing, J.M. and Q.W.; visualization, G.X.; supervision, J.M.; project administration, J.M.; funding acquisition, J.M. and G.X. All authors have read and agreed to the published version of the manuscript.

Funding

J.M. acknowledges support from the National Key Research and Development Program of China (No. 2024YFA1307502), the Science and Technology Innovation Plan of Shanghai Science and Technology Commission (No. 23JS1400200), and the Research Fund for International Senior Scientists (No. W2431060). G.X. acknowledges support from the National Natural Science Foundation of China (No. 32300535).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and pre-trained models of OPUS-BFactor, as well as the datasets used in the study, can be downloaded from http://github.com/OPUS-MaLab/opus_bfactor (accessed on 1 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

pLDDT: predicted local distance difference test

References

1. Sun, Z.T.; Liu, Q.; Qu, G.; Feng, Y.; Reetz, M.T. Utility of B-Factors in Protein Science: Interpreting Rigidity, Flexibility, and Internal Motion and Engineering Thermostability. Chem. Rev. 2019, 119, 1626–1665.
2. Bramer, D.; Wei, G.W. Blind prediction of protein B-factor and flexibility. J. Chem. Phys. 2018, 149, 134107.
3. Carugo, O.; Argos, P. Reliability of atomic displacement parameters in protein crystal structures. Acta Crystallogr. Sect. D Struct. Biol. 1999, 55, 473–478.
4. Vihinen, M.; Torkkila, E.; Riikonen, P. Accuracy of Protein Flexibility Predictions. Proteins 1994, 19, 141–149.
5. Karplus, P.A.; Schulz, G.E. Prediction of Chain Flexibility in Proteins—A Tool for the Selection of Peptide Antigens. Naturwissenschaften 1985, 72, 212–213.
6. Parthasarathy, S.; Murthy, M.R.N. Protein thermal stability: Insights from atomic displacement parameters (B values). Protein Eng. Des. Sel. 2000, 13, 9–13.
7. Yuan, Z.; Zhao, J.; Wang, Z.X. Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng. Des. Sel. 2003, 16, 109–114.
8. Radivojac, P.; Obradovic, Z.; Smith, D.K.; Zhu, G.; Vucetic, S.; Brown, C.J.; Lawson, J.D.; Dunker, A.K. Protein flexibility and intrinsic disorder. Protein Sci. 2004, 13, 71–80.
9. Atilgan, A.R.; Durell, S.R.; Jernigan, R.L.; Demirel, M.C.; Keskin, O.; Bahar, I. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 2001, 80, 505–515.
10. Ma, J.P. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure 2005, 13, 373–380.
11. Mlynek, G.; Djinovic-Carugo, K.; Carugo, O. B-Factor Rescaling for Protein Crystal Structure Analyses. Crystals 2024, 14, 443.
12. Carugo, O. Atomic displacement parameters in structural biology. Amino Acids 2018, 50, 775–786.
13. Carugo, O. Uses and Abuses of the Atomic Displacement Parameters in Structural Biology. In Data Mining Techniques for the Life Sciences; Carugo, O., Eisenhaber, F., Eds.; Springer: New York, NY, USA, 2022; pp. 281–298.
14. Pan, X.Y.; Shen, H.B. Prediction of Protein B-factor Profile based on Feature Selection and Kernel Learning. In Proceedings of the 2009 Chinese Conference on Pattern Recognition, Nanjing, China, 4–6 November 2009; Volume 1–2, pp. 588–592.
15. Yuan, Z.; Bailey, T.L.; Teasdale, R.D. Prediction of protein B-factor profiles. Proteins 2005, 58, 905–912.
16. Pandey, A.; Liu, E.; Graham, J.; Chen, W.; Keten, S. B-factor prediction in proteins using a sequence-based deep learning model. Patterns 2023, 4, 100805.
17. Pang, Y.-P. Use of multiple picosecond high-mass molecular dynamics simulations to predict crystallographic B-factors of folded globular proteins. Heliyon 2016, 2, e00161.
18. Wang, Q.; Xiao, X.; Miao, Z.; Zhang, X.; Jiang, B.; Liu, M. Prediction of Protein B-factor Profiles based on Bidirectional Long Short-Term Memory Network. ChemRxiv 2023.
19. Weiss, M.S. On the interrelationship between atomic displacement parameters (ADPs) and coordinates in protein structures. Acta Crystallogr. Sect. D Struct. Biol. 2007, 63, 1235–1242.
20. Kidera, A.; Go, N. Refinement of Protein Dynamic Structure—Normal Mode Refinement. Proc. Natl. Acad. Sci. USA 1990, 87, 3718–3722.
21. Kidera, A.; Go, N. Normal Mode Refinement—Crystallographic Refinement of Protein Dynamic Structure. 1. Theory and Test by Simulated Diffraction Data. J. Mol. Biol. 1992, 225, 457–475.
22. Diamond, R. On the Use of Normal-Modes in Thermal Parameter Refinement—Theory and Application to the Bovine Pancreatic Trypsin-Inhibitor. Acta Crystallogr. Sect. A Found. Crystallogr. 1990, 46, 425–435.
23. Haliloglu, T.; Bahar, I.; Erman, B. Gaussian dynamics of folded proteins. Phys. Rev. Lett. 1997, 79, 3090–3093.
24. Bahar, I.; Atilgan, A.R.; Erman, B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold. Des. 1997, 2, 173–181.
25. Tirion, M.M. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys. Rev. Lett. 1996, 77, 1905–1908.
26. Yang, J.Y.; Wang, Y.; Zhang, Y. ResQ: An Approach to Unified Estimation of B-Factor and Residue-Specific Error in Protein Structure Prediction. J. Mol. Biol. 2016, 428, 693–701.
27. Bramer, D.; Wei, G.W. Multiscale weighted colored graphs for protein flexibility and rigidity analysis. J. Chem. Phys. 2018, 148, 054103.
28. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
29. Sarparast, S.; Zaimi, A.; Ebert, M.; Goldsmith, M.-R. Advanced atom-level representations for protein flexibility prediction utilizing graph neural networks. arXiv 2024, arXiv:2408.12519.
30. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
31. Wang, C.; Fan, H.; Quan, R.; Yang, Y. ProtChatGPT: Towards Understanding Proteins with Large Language Models. arXiv 2024, arXiv:2402.09649.
32. Lv, L.; Lin, Z.; Li, H.; Liu, Y.; Cui, J.; Chen, C.Y.-C.; Yuan, L.; Tian, Y. ProLLaMA: A Protein Language Model for Multi-Task Protein Language Processing. IEEE Trans. Artif. Intell. 2024, 1–12.
33. Su, J.; Han, C.; Zhou, Y.; Shan, J.; Zhou, X.; Yuan, F. SaProt: Protein Language Modeling with Structure-aware Vocabulary. bioRxiv 2023.
34. Li, M.; Tan, P.; Ma, X.; Zhong, B.; Yu, H.; Zhou, Z.; Ouyang, W.; Zhou, B.; Hong, L.; Tan, Y. ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention. bioRxiv 2024.
35. Lin, Z.M.; Akin, H.; Rao, R.S.; Hie, B.; Zhu, Z.K.; Lu, W.T.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130.
36. Du, Z.; Ding, X.; Hsu, W.; Munir, A.; Xu, Y.; Li, Y. pLM4ACE: A protein language model based predictor for antihypertensive peptide screening. Food Chem. 2024, 431, 137162.
37. Xu, X.; Bonvin, A.M.J.J. DeepRank-GNN-esm: A graph neural network for scoring protein–protein models using protein language model. Bioinform. Adv. 2024, 4, vbad191.
38. Zeng, S.; Wang, D.; Jiang, L.; Xu, D. Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction. Genome Res. 2024, 34, 1445–1454.
39. Bakan, A.; Meireles, L.M.; Bahar, I. Protein Dynamics Inferred from Theory and Experiments. Bioinformatics 2011, 27, 1575–1577.
40. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
41. Carugo, O. pLDDT Values in AlphaFold2 Protein Models Are Unrelated to Globular Protein Local Flexibility. Crystals 2023, 13, 1560.
42. Weaver, L.H.; Matthews, B.W. Structure of Bacteriophage-T4 Lysozyme Refined at 1.7 Å Resolution. J. Mol. Biol. 1987, 193, 189–199.
43. Kaur, H.; Sasidhar, Y.U. Molecular dynamics study of an insertion/duplication mutant of bacteriophage T4 lysozyme reveals the nature of α → β transition in full protein context. Phys. Chem. Chem. Phys. 2013, 15, 7819–7830.
44. Joerger, A.C.; Allen, M.D.; Fersht, A.R. Crystal structure of a superstable mutant of human p53 core domain—Insights into the mechanism of rescuing oncogenic mutations. J. Biol. Chem. 2004, 279, 1291–1296.
45. Xu, G.; Luo, Z.; Yan, Y.; Wang, Q.; Ma, J. OPUS-Rota5: A highly accurate protein side-chain modeling method with 3D-Unet and RotaFormer. Structure 2024, 32, 1001–1010.e1002.
46. Hanson, J.; Paliwal, K.; Litfin, T.; Yang, Y.D.; Zhou, Y.Q. Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks. Bioinformatics 2019, 35, 2403–2410.
47. Xu, G.; Wang, Q.H.; Ma, J.P. OPUS-TASS: A protein backbone torsion angles and secondary structure predictor based on ensemble neural networks. Bioinformatics 2020, 36, 5021–5026.
48. Lu, M.Y.; Dousis, A.D.; Ma, J.P. OPUS-PSP: An orientation-dependent statistical all-atom potential derived from side-chain packing. J. Mol. Biol. 2008, 376, 288–301.
49. Xu, G.; Ma, T.Q.; Zang, T.W.; Sun, W.T.; Wang, Q.H.; Ma, J.P. OPUS-DOSP: A Distance- and Orientation-Dependent All-Atom Potential Derived from Side-Chain Packing. J. Mol. Biol. 2017, 429, 3113–3120.
50. Yang, J.Y.; Anishchenko, I.; Park, H.; Peng, Z.L.; Ovchinnikov, S.; Baker, D. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 2020, 117, 1496–1503.
51. Xu, G.; Wang, Q.H.; Ma, J.P. OPUS-X: An open-source toolkit for protein torsion angles, secondary structure, solvent accessibility, contact map predictions and 3D folding. Bioinformatics 2022, 38, 108–114.
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Abadi, M.; Barham, P.; Chen, J.M.; Chen, Z.F.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
Xu, G.; Wang, Q.H.; Ma, J.P. OPUS-Mut: Studying the Effect of Protein Mutation through Side-Chain Modeling. J. Chem. Theory Comput. 2023, 19, 1629–1640. [Google Scholar] [CrossRef]
Haas, J.; Barbato, A.; Behringer, D.; Studer, G.; Roth, S.; Bertoni, M.; Mostaguir, K.; Gumienny, R.; Schwede, T. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins 2018, 86, 387–398. [Google Scholar] [CrossRef] [PubMed]

Figure 1. The head-to-head comparison of the average PCC achieved by different methods. (A) Comparison between two sequence-based methods: OPUS-BFactor-seq and Pandey et al. (sequence features). (B) Comparison between two structure-based methods: OPUS-BFactor-struct and ProDy. (C) Comparison between two structure-based methods: OPUS-BFactor-struct and Pandey et al. (all features). (D) Comparison between two modes of OPUS-BFactor: OPUS-BFactor-struct and OPUS-BFactor-seq.

Figure 2. The average PCC achieved by different methods [16,39] on the targets categorized by their lengths and subfamilies. (A) The average PCC of different methods on the targets with varying lengths. (B) The average PCC of different methods on the targets belonging to distinct subfamilies. The subfamilies were primarily defined based on the secondary structure of the targets, with the categories of mainly alpha helix, beta sheet, and coil representing targets where more than 50% of the residues belonged to the respective secondary structure types. The numbers enclosed in parentheses denote the total count of targets within each of the subgroups.
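
The subfamily rule in the Figure 2 caption (more than 50% of residues in one secondary structure type) can be illustrated with a minimal sketch. The three-state labels 'H', 'E' and 'C', the function name `subfamily`, and the "mixed" fallback for targets with no majority type are assumptions made for illustration; the caption does not state how such targets were labeled or which secondary structure assignment tool was used.

```python
from collections import Counter

# Assumed three-state alphabet: H = alpha helix, E = beta strand, C = coil.
SUBFAMILY_NAMES = {
    "H": "mainly alpha helix",
    "E": "mainly beta sheet",
    "C": "mainly coil",
}

def subfamily(ss3, threshold=0.5):
    """Return the subfamily of a target from its per-residue secondary structure string.

    A target belongs to a subfamily when strictly more than `threshold`
    of its residues carry the corresponding label.
    """
    if not ss3:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss3)
    for label, name in SUBFAMILY_NAMES.items():
        if counts[label] / len(ss3) > threshold:
            return name
    return "mixed"  # hypothetical fallback; not taken from the paper

print(subfamily("HHHHHHCCEE"))  # 6 of 10 residues are helix
print(subfamily("HHHHHCCCEE"))  # exactly 50% helix, so no majority type
```

Because the threshold is strict, at most one label can exceed 50%, so the order in which the labels are checked does not affect the result.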

Figure 3. Protein B-factor prediction results of different methods [16,39] against the real normalized protein B-factor values for the target 2023-05-06_00000171_1. (A) Comparison between real values and predictions from Pandey et al. (sequence features). (B) Comparison between real values and predictions from OPUS-BFactor-seq. (C) Comparison between real values and predictions from ProDy. (D) Comparison between real values and predictions from Pandey et al. (all features). (E) Comparison between real values and predictions from OPUS-BFactor-struct.

Figure 4. Protein B-factor prediction results of different methods. Results are colored on a blue-to-red spectrum according to the B-factor of each Cα atom, rendered with PyMOL (v2.4). (A) Prediction results on 2023-05-06_00000066_1. (B) Prediction results on 2023-05-13_00000063_1. (C) Prediction results on 2023-05-06_00000171_1.

Figure 5. The average Pearson correlation coefficient (PCC) of OPUS-BFactor-seq using different evolutionary features as inputs on CAMEO82.

Figure 6. OPUS-BFactor-seq prediction results using the sequence information exclusively. (A) Results for the T4 lysozyme. (B) Results for the tumor suppressor p53. The results are colored on a blue-to-red spectrum according to the B-factor of each Cα atom, rendered with PyMOL (v2.4). The B-factors of the two residues located at both ends of the sequences were set to 0. In (A), region A includes residues between 20 and 23, and region B includes residues between 35 and 39. In (B), region C includes residues between 116 and 124.

Figure 7. Overview of the OPUS-BFactor framework. OPUS-BFactor takes in three primary inputs: 1D protein sequence features, 2D residue–residue contact features, and 1D ESM-2 protein evolutionary features. In the first mode, OPUS-BFactor-seq, the structure-based features (first two) are set to zero, allowing predictions to be based solely on the protein sequence. In the second mode, OPUS-BFactor-struct, the ESM-2 features are set to zero. For a target protein, OPUS-BFactor predicts the normalized B-factor of Cα atoms for all residues.

Table 1. The average (median) Pearson correlation coefficient (PCC) of different methods on each test set.

	CAMEO65	CASP15	CAMEO82
Sequence-based methods
Pandey et al. [16] (sequence features)	0.37 (0.30)	0.20 (0.24)	0.33 (0.30)
OPUS-BFactor-seq	0.50 (0.56)	0.34 (0.41)	0.58 (0.60)
Structure-based methods
ProDy [39]	0.31 (0.33)	0.25 (0.28)	0.43 (0.44)
Pandey et al. [16] (all features)	0.38 (0.42)	0.33 (0.35)	0.41 (0.43)
OPUS-BFactor-struct	0.61 (0.69)	0.48 (0.56)	0.67 (0.69)
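
As an illustration of how entries such as "0.61 (0.69)" in Table 1 can be obtained, the following minimal sketch computes one PCC per target and then reports the average and the median over targets. The helper names `pearson` and `summarize_pcc` and the toy values are hypothetical and are not taken from the OPUS-BFactor code or data; the sketch also assumes that the PCC is computed per target over its residues before aggregation.

```python
from math import sqrt
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / sqrt(vx * vy)

def summarize_pcc(targets):
    """targets maps a target name to (real, predicted) per-residue B-factor lists.

    Returns the per-target PCCs plus their average and median over targets.
    """
    per_target = {name: pearson(real, pred) for name, (real, pred) in targets.items()}
    values = list(per_target.values())
    return per_target, mean(values), median(values)

# Toy values for illustration only (not real B-factors).
toy = {
    "target_a": ([0.1, 0.5, 0.9, 1.4], [0.2, 0.4, 1.0, 1.3]),
    "target_b": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    "target_c": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
}
per_target, avg, med = summarize_pcc(toy)
print(f"{avg:.2f} ({med:.2f})")
```

Reporting the median alongside the average is useful here because a few targets with strongly negative PCC, like `target_b` in the toy data, pull the average down much more than the median.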

Table 2. The average PCC between the real B-factors and the predicted B-factors generated by OPUS-BFactor-seq, as well as the average PCC between the real B-factors and the pLDDT values derived from the ESMFold predicted structures on each test set.

	CAMEO65	CASP15	CAMEO82
pLDDT (ESMFold)	0.28	0.24	0.38
OPUS-BFactor-seq	0.50	0.34	0.58

Table 3. The average PCC achieved by different methods on the targets categorized based on the structural prediction accuracy of ESMFold. The numbers enclosed in parentheses denote the total count of targets within each of the subgroups.

	TM Score > 0.9 (89)	0.8 < TM Score < 0.9 (32)	Others (60)
pLDDT (ESMFold)	0.42	0.34	0.14
OPUS-BFactor-seq	0.61	0.48	0.33
OPUS-BFactor-struct (ESMFold)	0.63	0.55	0.34
OPUS-BFactor-struct (PDB)	0.67	0.64	0.48

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
