The three-dimensional structures of proteins are important biomolecular data in structure-based drug design [1]. Protein structures are usually determined by three experimental techniques: X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and electron microscopy (EM). In X-ray crystallography, a protein structure is deduced from the unique diffraction patterns of the protein crystal. The molecular structure derived from X-ray experiments has long been considered the most accurate structural model. However, as the purification and crystallization of proteins are difficult and time-consuming, the number of solved protein structures remains much lower than the number of protein sequences. Meanwhile, NMR and EM require specialized equipment and facilities, which prevents their large-scale application. To overcome these problems, researchers have developed computational methods for protein-structure prediction. Popular methods include Modeller [3], SWISS-MODEL [4], Rosetta [5], I-TASSER [7], FALCON [8], Raptor/RaptorX [9], and IntFOLD [11] (see [12] for recent comprehensive reviews of the prediction theory and methods). Prediction functions are also available in some commercial software packages such as Internal Coordinate Mechanics, Molecular Operating Environment, and Schrödinger. Owing to their different algorithms and scoring strategies, these methods can predict very different structural models for the same protein sequence. To select the best predicted model, independent means of evaluating the quality of a protein model are needed. Initially, model selection functions were included as components of some structure prediction methods, but more and more independent methods have emerged in recent years. These methods are collectively called estimation of model accuracy (EMA) methods (formerly, model quality assessment methods). As their name suggests, these methods estimate how accurately a model fits the actual native structure, which is still unknown. A global-level EMA gives the average quality of a model, whereas a local-level EMA indicates the prediction quality of a segment of residues or a group of atoms. Owing to the importance of model evaluation and ranking, the Critical Assessment of protein Structure Prediction (CASP) challenges have assessed EMA methods (the QA category) since CASP7.
Machine learning (ML) and deep learning (DL) have proven their effectiveness in natural language processing, image processing, computer vision, speech recognition, and other computing domains. These successes have attracted the attention of researchers in bioinformatics and computational biology [2]. Thus far, ML and DL have been applied in protein classification and the predictions of protein structure and function, protein–ligand binding affinity, and protein–peptide/protein–DNA binding sites. Whereas traditional EMA methods are mainly based on energy, physicochemical, or statistical considerations [19], ML-based EMA methods combine multiple types of information. Recent methods can recognize latent features, such as protein contact patterns [20] and atom density maps [21], learned from native structures. The superiority of ML-based EMA methods has been confirmed by their high rankings in CASP challenges.
ML-based EMA methods can be categorized into four major types (Figure 1): single-model, multi-model (also called consensus or clustering methods), quasi-single, and hybrid methods. Single-model methods perform inherent feature extraction, with no reliance on external predictors; their predictions are mainly based on the geometric and energetic analysis of a single protein structural model. In contrast, multi-model methods cluster and extract consensus information from a pool of protein structural models generated by multiple methods or from different templates [22]. Multi-model methods assume that the correct structure is embedded in the recurring structural patterns of the model ensemble [25]. Therefore, the performance of a multi-model method depends on the quality and size of the model pool. A large model pool (possibly including tens of methods and tens to hundreds of models [26]) yields an accurate structure, but at high computational cost. Before CASP11, multi-model methods always outperformed single-model methods. In CASP11, single-model methods surpassed multi-model methods because of advancements in energy features and ML techniques [19]. However, multi-model methods achieved spuriously high performance in CASP13 compared with single-model methods; this was due to the significant improvements of protein structure prediction methods in recent years, which led to a high-quality model pool [31]. Meanwhile, quasi-single methods score a model by referencing a set of models generated within their internal pipeline, rather than by pooling externally generated models; in this sense, they differ from multi-model methods [22]. Finally, hybrid or combined approaches [33] combine the quality scores or patterns of different EMA algorithms (both single-model and multi-model) by weighting or ML algorithms. The final scores are more accurate than any of the individual scores.
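The consensus idea behind multi-model methods can be sketched in a few lines of Python. Everything below is illustrative: the "models" are toy 1-D coordinate lists, and the similarity function is a hypothetical stand-in for the structural similarity scores (e.g., GDT_TS or S-score) that real consensus methods compute between full 3-D models.

```python
# Minimal sketch of consensus (multi-model) scoring: a model's quality
# estimate is its mean similarity to every other model in the pool, under
# the assumption that the correct structure recurs in the ensemble.
from itertools import combinations

def toy_similarity(a, b, cutoff=2.0):
    """Fraction of corresponding toy 'coordinates' that agree within cutoff."""
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= cutoff)
    return hits / len(a)

def consensus_scores(pool):
    """Score each model by its average similarity to the rest of the pool."""
    scores = {name: 0.0 for name in pool}
    for m, n in combinations(pool, 2):
        s = toy_similarity(pool[m], pool[n])
        scores[m] += s
        scores[n] += s
    k = len(pool) - 1
    return {name: total / k for name, total in scores.items()}

# Hypothetical pool: two mutually consistent models and one outlier.
pool = {
    "model_A": [0.0, 1.0, 2.0, 3.0],
    "model_B": [0.1, 1.1, 2.2, 3.1],
    "model_C": [5.0, 9.0, 0.0, 7.0],  # outlier, dissimilar to the others
}
ranked = consensus_scores(pool)
best = max(ranked, key=ranked.get)  # the model closest to the recurring pattern
```

Note how the outlier is penalized automatically, which also shows why a pool dominated by uniformly high-quality models blurs the ranking signal, as discussed above.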
This review focuses on the ML techniques currently used in EMA methods. The remainder of the review is organized as follows. In Section 2, we briefly introduce the concepts and recent progress of ML, protein-structure prediction, the CASP challenge, and the popular features and data sources for training and evaluating ML-based EMA methods. After screening by citations, representativeness, reproducibility (an available server or source code), and release time, we obtained 17 applications. Section 3 lists and compares these 17 applications in detail. Finally, we summarize the current developments and highlight the challenges and future directions of EMA research.
4. Summary and Future Perspectives
Motivated by the importance of protein structures, researchers have actively sought quality assessment methods for protein models over the past two decades. With modern advances in ML algorithms, ML methods have become the mainstream techniques for protein quality assessment, and their prediction quality has remarkably improved. After reviewing the major applications and breakthroughs of ML-based EMA methods, we made four observations:
First, most EMA methods are single-model methods. This trend is reflected in the number of single-model EMA methods entered in each CASP, which increased from five in CASP10 to 22 in CASP12 and 33 in CASP13 [19].
Second, NN and SVM are the most popular techniques. The surging popularity of DL has increased the number of CNN-based EMA methods in the past three years [21]. These methods learn from only a few low-level input features, which promises to eliminate or reduce the effort of heavy feature engineering.
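To illustrate what such low-level input looks like, the sketch below voxelizes atom coordinates into per-element occupancy grids, a simplified stand-in for the atom density maps used by CNN-based methods such as 3DCNN_MQA. The grid size, voxel width, and atom typing here are illustrative assumptions, not the published parameters of any method.

```python
# Sketch: voxelize atoms into one density channel per element, the kind of
# raw 3-D input a CNN can learn from without hand-crafted features.
def density_map(atoms, grid=4, voxel=2.0):
    """Count atoms of each element into a grid x grid x grid occupancy map."""
    channels = {}
    for elem, (x, y, z) in atoms:
        i, j, k = int(x // voxel), int(y // voxel), int(z // voxel)
        if all(0 <= v < grid for v in (i, j, k)):  # ignore out-of-box atoms
            ch = channels.setdefault(
                elem,
                [[[0.0] * grid for _ in range(grid)] for _ in range(grid)],
            )
            ch[i][j][k] += 1.0
    return channels

# Hypothetical atoms: two carbons in one voxel, one nitrogen elsewhere.
atoms = [("C", (1.0, 1.0, 1.0)), ("C", (1.5, 1.2, 0.8)), ("N", (3.0, 3.0, 3.0))]
grid3 = density_map(atoms)
```

A real pipeline would additionally smooth counts with a Gaussian and stack the channels (11 atom types in 3DCNN_MQA) into a single tensor for the network.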
Third, a systematic and quantitative performance comparison of ML-based and non-ML-based methods is precluded because the benchmarks, EMA tasks, and training/evaluation data differ between the two method types. Nevertheless, the superior performance of ML-based methods over non-ML-based methods is evidenced by two facts: the popularity of ML-based approaches among EMA methods and their excellent performance in CASP. The former trend is reflected in the increasing number of ML-based EMA methods in recent CASP challenges. In the last CASP (CASP13), the 18 top-performing EMA methods, proposed by six groups/laboratories, included 12 NN-based methods, two SVM-based methods, three linear-regression methods, and one knowledge-based potential method [20]. All of these methods except the last are related to ML. Moreover, ProQ2 was the most successful EMA method in the CASP11 challenge [30], whereas SVMQA and ProQ3 selected the best models from the model pool with excellent performance; all three are SVM-based EMA methods. In addition, the NN-based ModFOLD6 method predicted the global quality score well in CASP12 [19]. These results further underscore the strength of ML in the quality assessment of protein structures.
Fourth, the emergence of deep learning techniques has profoundly affected the performance of protein structure prediction methods. With the high-quality protein models generated by DL-based prediction servers, it has become more difficult for EMA methods to differentiate these models accurately. Notably, a pool of high-quality models might lead to spuriously good performance of consensus methods, as seen in the CASP13 assessment [31]. As most EMA methods are trained on models from previous CASPs, this also poses the question of how the next generation of EMA methods can meet the more stringent requirements of ever-improving, high-quality models.
ML-based EMA methods are certainly meritorious: on average, the best EMA methods select models that are better than those provided by the best server; however, so far, no single EMA method can always select the best model for a target [20]. This suggests that the best ML-based EMA methods are yet to come. Most ML algorithms take multiple features as input, such as energy-based features, basic physicochemical features, and statistical features. Experimental results show that different feature categories and different combinations of features can change the performance of the algorithm [84]. Therefore, the features must be carefully selected, and finding the best feature combination is a future research direction. Although the RF algorithm is available for feature screening [23], it is not widely used for this purpose. On the other hand, because CNN-based EMA methods use low-level (raw) features, they negate the need for feature screening. For example, the only input features of 3DCNN_MQA are 11 types of atom density maps.
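As a lightweight illustration of the feature-screening step (a full RF importance ranking requires an ensemble; here a simple univariate correlation filter stands in for it), the sketch below ranks candidate EMA input features by the absolute Pearson correlation between each feature and the true model quality on a toy training set. The feature names, values, and `keep` parameter are hypothetical.

```python
# Sketch of univariate feature screening for an EMA predictor: keep the
# features most correlated (positively or negatively) with true quality.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def screen_features(features, target, keep=2):
    """Return the names of the `keep` features most correlated with target."""
    ranked = sorted(features,
                    key=lambda f: abs(pearson(features[f], target)),
                    reverse=True)
    return ranked[:keep]

# Toy training set: five models with a GDT_TS-like true quality in `target`.
target = [0.2, 0.4, 0.5, 0.7, 0.9]
features = {
    "energy_term":    [0.9, 0.7, 0.6, 0.4, 0.2],  # strongly anti-correlated
    "ss_agreement":   [0.3, 0.4, 0.6, 0.7, 0.8],  # strongly correlated
    "random_feature": [0.5, 0.1, 0.9, 0.2, 0.4],  # uninformative noise
}
selected = screen_features(features, target)
```

Unlike RF importances, a univariate filter cannot detect features that are useful only in combination, which is one reason ensemble-based screening remains attractive.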
Meanwhile, the optimal use of ML in model accuracy evaluation remains underdeveloped [20]. New DL approaches appear each year, providing increasingly advanced ML tools for EMA research. For example, AngularQA [78], recently proposed for the quality assessment of protein structures, is the first EMA method built on an LSTM architecture. Innovative ML approaches provide another avenue for improving current EMA methods. For example, ProQ4 [21] has a multi-stream network architecture and adopts an innovative transfer-learning approach; these constructs improve global-score prediction and selection from the model pool.