COMTOP: Protein Residue–Residue Contact Prediction through Mixed Integer Linear Optimization

Protein contact prediction helps reconstruct the tertiary structure that largely determines a protein’s function; therefore, contact prediction from the sequence is an important problem. Recently there has been exciting progress on this problem, but many of the existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV) by optimizing the number of correctly predicted contacts, thereby achieving better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the number of predicted contacts. For example, the improvement was much larger for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers of predicted contacts (such as top-L/2 and top-L/5).
The results and analysis demonstrate that COMTOP can significantly improve the performance of the individual methods; therefore, COMTOP is more robust against different types of test sets. COMTOP also showed better/comparable predictions when compared with the state-of-the-art predictors.


Introduction
Protein contact prediction aims to predict which residues of a protein are in contact. Two non-local residues in contact are far away from each other in the protein primary structure but close to each other in the 3D structure. Protein contact prediction is helpful in determining protein structure, model ranking, selection, and evaluation [1,2] and is also important for other fields in evolutionary biology and biotechnology, such as protein [...]
[...] AlphaFold2, a novel strategy that used a different deep learning technique than the CASP13 AlphaFold to simulate protein 3D structures. However, its predictions were not very good for some targets. The protein targets set at the CASP14 conference do not fully represent all proteins, many of which pose unique structure prediction issues. Thus, the algorithm may not be universally applicable to all proteins.
COMTOP is the first consensus method for protein contact prediction using MILP to maximize the probability of the sum of residue contacts, based on the previous work [44]. This method uses seven selected residue-residue contact prediction methods: CCMpred [15], EVfold [4], DeepCov [19], NNcon [28], PconsC4 [20], plmDCA [13], and PSICOV [21]. COMTOP combines the strengths of the seven protein contact prediction methods by optimizing the number of correctly predicted pairs of residues in the training set. A consensus prediction score based on the confidence scores of the seven individual methods is introduced to assess the likelihood of a residue pair being in one of the protein contact states. Our method performed well compared with the seven individual methods when tested on 239 proteins, with prediction accuracies of about 89.04%, 94.51%, and 97.35% for top-L, top-L/2, and top-L/5 predicted contacts, respectively. When tested on CASP13, CASP14, and 57 non-redundant TM proteins, the consensus method achieved accuracies of 75.91%, 77.49%, and 73.91% for top-L/5 predictions, which was better than the seven individual methods and comparable to state-of-the-art prediction performance.

Data Description
High-quality training sets and test sets are crucial for the development and validation of prediction models. To train our proposed method, a training set and a validation set were constructed as follows. We first downloaded a list of 3298 protein chains from the PISCES website [62] with a maximum sequence identity of 20%, a maximum R-factor of 0.3, and resolutions better than 2.0 Å. We then removed the protein chains with fewer than 50 amino acids, leaving 3133 chains (Table S1) for our database construction. Next, we generated the confidence scores for these 3133 proteins with the seven locally installed methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, PSICOV); however, some methods could not predict confidence scores for some protein chains. Because our method combines all seven individual methods, we deleted those proteins from our dataset, which left 1189 proteins. The selected protein set was divided into two parts: a training set with 950 proteins (Table S2) and a test set with 239 proteins (Table S3). Finally, we ranked the confidence scores for the training set and test set and prepared the dataset for the MILP model by selecting the top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, where L is the length of a protein.
The second test set was based on the CASP13 protein domains (http://www.predictioncenter.org/download_area/CASP13/, accessed on 5 March 2021). There were 32 target domains, and certain methods were unable to predict the confidence score for two target domains, so they were excluded from the test set, leaving 30 target domains in the CASP13 dataset, which are listed in Table S4 of the Supplementary Materials. The third test dataset was based on the CASP14 target domains (https://www.predictioncenter.org/casp14/index.cgi, accessed on 5 March 2021). There were 45 target domains, and 28 of them had PDB IDs. After excluding the targets for which confidence scores could not be obtained from all seven methods, the CASP14 dataset contained 25 target domains, as shown in Table S5.
Finally, we assessed our method on a non-redundant test set of 57 α-helical TM proteins. This set was obtained by culling the α-helical TM proteins in the PDBTM database against the training and test sets of COMTOP and of the individual methods NNcon, DeepCov, and PconsC4 (EVfold, plmDCA, PSICOV, and CCMpred had no training sets) with a maximum sequence identity of 25%, a maximum R-factor of 0.3, and resolutions better than 2.0 Å. The 57 α-helical TM proteins are listed in Table S6 of the Supplementary Materials.

List of the Sets, Parameters and Variables
This section lists the sets, parameters, and variables used in this method.
(I) Indices and sets. (I,J): set of pairs of amino acid positions of a protein, i∈I, j∈J; P: set of training proteins, p∈P; M: set of the methods used, M = {1, 2, 3, 4, 5, 6, 7}, m∈M. Seven methods were used in this consensus method: m = 1 indicates the CCMpred method; m = 2 indicates the DeepCov method; m = 3 indicates the EVfold method; m = 4 indicates the NNcon method; m = 5 indicates the PconsC4 method; m = 6 indicates the plmDCA method; and m = 7 indicates the PSICOV method. subsetP(I,J)(p,(I,J)): the subset indicating the pairs of residues for each protein p.
(II) Parameters. confS(p,(i,j),m): the confidence score predicted by method m for the residue pair (i,j) of a protein p, p∈P, (i,j)∈subsetP(I,J)(p,(I,J)), m∈M. First, the scores were normalized for each protein using min-max normalization:
$$x'_{ij} = \frac{x_{ij} - \min(x)}{\max(x) - \min(x)}$$
where x_ij is the score for the residue pair (i,j) of a protein, and min(x) and max(x) are the smallest and largest scores for that protein. Then we selected the top-4.5L prediction scores from the top-5L prediction scores for each protein and created a score matrix from the top-4.5L scores of each method m∈M; however, the score matrix contained NaN values because the methods did not all predict the same residue pairs of a protein.
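The preprocessing described in (II) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout (residue pair mapped to score), and the treatment of a constant score list are all hypothetical, and the cutoff factor is left as a parameter so that small examples can be run.

```python
import math

def min_max_normalize(scores):
    """Rescale one method's raw scores for a single protein to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a constant score list is mapped to 0 to avoid dividing by zero.
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}

def top_pairs(scores, n):
    """Keep the n highest-scoring residue pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])

def score_matrix(per_method, length, factor=4.5):
    """Build a pair-by-method matrix; a pair a method did not return becomes NaN."""
    n = int(factor * length)
    kept = {m: top_pairs(min_max_normalize(s), n) for m, s in per_method.items()}
    pairs = sorted(set().union(*[k.keys() for k in kept.values()]))
    methods = sorted(kept)
    rows = [[kept[m].get(p, math.nan) for m in methods] for p in pairs]
    return pairs, methods, rows

# Toy input with two hypothetical methods "A" and "B" and made-up scores.
toy = {
    "A": {(1, 5): 2.0, (2, 6): 4.0, (3, 7): 6.0},
    "B": {(1, 5): 1.0, (2, 6): 2.0, (4, 8): 3.0},
}
pairs, methods, rows = score_matrix(toy, length=2, factor=1.0)
```

In the toy input the two methods retain different pairs, so some cells of the matrix are NaN, which mirrors the situation described in the text.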
(III) Binary variables. y(p,(i,j)): equals 1 if the sum of the scores of the correct contact predictions is higher than the sum of the scores of the incorrect ones for residue pair (i,j) of a protein p by at least ε(p,(i,j)), p∈P, (i,j)∈subsetP(I,J)(p,(I,J)); y2(p): equals 1 if the sum of the scores of the correct contact predictions over all residue pairs (i,j) of a protein p is higher than the sum of the scores of the incorrect predictions of the same protein p by at least ε2(p), p∈P, (i,j)∈subsetP(I,J)(p,(I,J)).
(IV) Positive variables. λ(m): the weight variables for the different methods, 0 ≤ λ(m) ≤ 1, m∈M; ε(p,(i,j)): a soft margin variable for the binary variable y(p,(i,j)), p∈P, (i,j)∈subsetP(I,J)(p,(I,J)) (see Section 2.2.1 (III)); and ε2(p): a soft margin variable for the binary variable y2(p), p∈P (see Section 2.2.1 (III)).

The Training Objective Function
For protein contact prediction, the training objective function of the MILP model is expressed in terms of the variables y(p,(i,j)) and ε(p,(i,j)). Here, y(p,(i,j)) is a set of binary variables that equal 1 if the sum of the scores of the correct contact predictions is higher than the sum of the scores of the incorrect ones for the residue pair (i,j) of a protein p by at least ε(p,(i,j)); the objective function maximizes the total number of such residue pairs. The term ε(p,(i,j)) is included to minimize the sum of the soft margins. The training objective function is evaluated on the individual contacts (residue pairs) of a protein. The principle behind this function is that some protein contact prediction approaches perform better than others in some native regions of a protein. The consensus approach aims to identify the correct contact predictions for proteins from the various approaches by relying on the confidence scores for each residue pair in a protein.

The Model Constraints
For the protein contact prediction, there are two basic constraints in the consensus scheme. The first constraint makes sure that the binary variable y(p,(i,j)) is equal to zero for each residue pair of a protein if the difference between the sum of the scores of correct contact predictions and the sum of the scores of incorrect predictions from the different methods is lower than ε(p,(i,j)); this constraint is expressed as:
$$\sum_{m} \lambda(m)\,\mathrm{confS}(p,(i,j),m)\,\bigl(1 - \mathrm{predSS}(p,(i,j),m)\bigr) - \sum_{m} \lambda(m)\,\mathrm{confS}(p,(i,j),m)\,\mathrm{predSS}(p,(i,j),m) + \varepsilon(p,(i,j)) < 1 - y(p,(i,j)), \quad \forall\,(p,(i,j)) \in \mathrm{subsetP}(I,J)(P,(I,J)),\ m \in M$$
The second type of constraint used in the model normalizes the weight terms λ(m) of the seven methods.
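To make the first constraint concrete, the sketch below evaluates it for a single residue pair with fixed weights; it does not solve the MILP. It assumes that predSS(p,(i,j),m) equals 1 when the prediction of method m for the pair is correct and 0 otherwise, which is how the constraint reads but is not stated explicitly in the text. The function names and all numerical values are hypothetical.

```python
def margin(weights, conf, pred_ss):
    """Weighted score of the correct predictions minus that of the incorrect
    ones for one residue pair; every argument is a dict keyed by method name."""
    correct = sum(weights[m] * conf[m] * pred_ss[m] for m in weights)
    incorrect = sum(weights[m] * conf[m] * (1 - pred_ss[m]) for m in weights)
    return correct - incorrect

def y_may_be_one(weights, conf, pred_ss, eps):
    """With y = 1 the right-hand side of the constraint is 0, so the
    constraint reduces to: incorrect - correct + eps < 0."""
    return eps - margin(weights, conf, pred_ss) < 0

# Hypothetical weights (chosen to sum to 1) and normalized confidence scores.
weights = {"PconsC4": 0.5, "DeepCov": 0.25, "PSICOV": 0.25}
conf = {"PconsC4": 0.8, "DeepCov": 0.5, "PSICOV": 0.5}
pred_ss = {"PconsC4": 1, "DeepCov": 1, "PSICOV": 0}  # assumed: 1 = correct

small_margin_ok = y_may_be_one(weights, conf, pred_ss, eps=0.1)
large_margin_ok = y_may_be_one(weights, conf, pred_ss, eps=0.5)
```

With these illustrative numbers the correct predictions outweigh the incorrect one by about 0.4, so y = 1 is admissible for a soft margin of 0.1 but not for a soft margin of 0.5.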

The Prediction Score and Prediction Label
By carefully selecting the weight variables λ(m) of each individual method through the MILP optimization-based approach, the consensus method was developed to score higher for the correct contact predictions than for the incorrect contact predictions from the different methods. The consensus method ensures that a pair of residues is predicted to be in contact if the sum of the scores of the correct predictions is higher than the sum of the scores of the incorrect predictions for that pair across the different methods. It is expressed as
$$S(p,(i,j),m) = \sum_{m} \lambda(m)\,\mathrm{confS}(p,(i,j),m), \quad \forall\,(p,(i,j)) \in \mathrm{subsetP}(I,J)(P,(I,J)),\ m \in M$$
where S(p,(i,j),m) is the consensus contact confidence score for the residue pair (i,j) of the pth protein, confS(p,(i,j),m) is the confidence score from the mth method, and λ(m) is the weighting factor for the mth method.
The consensus label matrix is the prediction result of method m for the residue pair (i,j) of a protein p (a value of 1 corresponds to a true prediction, a value of 0 corresponds to a false prediction). It is expressed by
$$\text{Label matrix} = \begin{cases} 1, & \text{prediction returned by the methods} \\ 0, & \text{prediction not returned by the methods} \end{cases}$$
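The consensus score defined above is a weighted sum, which the following minimal sketch computes for a few residue pairs and then uses to rank them. The weights, scores, and function names are hypothetical, and treating a score that a method did not return as contributing zero is an assumption made here for illustration only.

```python
import math

def consensus_score(weights, conf_by_method):
    """S = sum over methods of lambda(m) * confS(m) for one residue pair.
    Assumption: a missing or NaN score contributes nothing to the sum."""
    total = 0.0
    for method, lam in weights.items():
        c = conf_by_method.get(method, math.nan)
        if not math.isnan(c):
            total += lam * c
    return total

def rank_pairs(weights, scores):
    """Return residue pairs sorted by decreasing consensus score."""
    s = {pair: consensus_score(weights, conf) for pair, conf in scores.items()}
    return sorted(s, key=s.get, reverse=True)

# Two hypothetical methods with made-up weights and normalized scores.
weights = {"A": 0.75, "B": 0.25}
scores = {
    (1, 9): {"A": 1.0, "B": 0.0},
    (2, 8): {"A": 0.5, "B": 1.0},
    (3, 7): {"B": 1.0},          # method "A" returned no score for this pair
}
ranking = rank_pairs(weights, scores)
```

The ranked list is what a top-L/k or top-KL cutoff would then be applied to.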

The Training and Prediction Procedure
The MILP-based training system uses CPLEX (ILOG CPLEX 8.0 reference manual) to optimize the training objective function, from which the weight parameters λ(m) are obtained. Training for each fold takes around two to four weeks. For the prediction procedure, the seven individual programs run in parallel rather than serially. Once the results from the individual methods are available, the prediction from the MILP model of COMTOP can finish in one second. The running time of COMTOP therefore depends on the slowest of the seven methods, and each protein contact prediction takes around 5-25 min, depending on the size of the database used for sequence-profile/MSA analysis.

Evaluation Measures for Prediction Performance
The effectiveness of our proposed method was evaluated by five widely used metrics: the prediction accuracy, coverage, specificity, negative predictive value (NPV), and Matthews correlation coefficient (MCC).
The accuracy is defined as the ratio of correct predictions to total predictions. Accuracy can also be written in terms of true positives (TP) and false positives (FP), as shown in Equation (9). A higher value of accuracy means a better contact prediction model.
$$\mathrm{Accuracy} = \frac{N_{corr}}{N_{pred}} = \frac{TP}{TP + FP} \tag{9}$$
where N_corr is the number of correctly predicted protein contacts, N_pred is the number of total predicted contacts, TP is the number of true positive contacts, and FP is the number of false positive contacts.
Coverage, also called the true positive rate or sensitivity, is defined as the ratio of correct predictions to the number of protein contacts in the native structure, as shown in Equation (10). A higher value of coverage means a better contact prediction model.
$$\mathrm{Coverage} = \frac{N_{corr}}{N_{native}} = \frac{TP}{TP + FN} \tag{10}$$
where N_corr is the number of correctly predicted protein contacts, N_native is the number of protein contacts in the native structure, and FN is the number of false negative contacts.
Specificity, also called the true negative rate, is the percentage of residue pairs not in contact in the native structure that are correctly predicted as non-contacts, as shown in Equation (11). It denotes how good the test is at identifying negative conditions.
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{11}$$
where TN is the number of true negative contacts.
The negative predictive value (NPV) is the probability that a residue pair predicted not to be in contact is indeed not in contact in the native structure, and can be calculated using the following formula:
$$\mathrm{NPV} = \frac{TN}{TN + FN}$$
The last metric used to measure the performance of the contact prediction method was the Matthews correlation coefficient (MCC), a measure of the quality of two-class classifications, which can be calculated using the following formula:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
The proposed method was developed with the purpose of producing higher accuracy, so we consider that accuracy should have a higher weight than the other metrics.
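For reference, the five measures can be computed from the four confusion-matrix counts as in the sketch below, which uses the standard definitions of these quantities. The function name and the example counts are hypothetical, and the sketch assumes that none of the denominators of the first four measures is zero.

```python
import math

def contact_metrics(tp, fp, tn, fn):
    """Accuracy (precision), coverage (recall), specificity, NPV, and MCC
    from the counts of true/false positives and negatives."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": tp / (tp + fp),
        "coverage": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        # Convention assumed here: MCC is reported as 0 when its denominator is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Made-up counts for one protein: 10 predicted contacts, 13 native contacts.
metrics = contact_metrics(tp=8, fp=2, tn=85, fn=5)
```

Note that the accuracy used throughout this paper is the positive predictive value, so it depends only on TP and FP.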

Performance Evaluation Based on the Training Set
The overall workflow of protein contact prediction is illustrated in Figure 1. The performance of COMTOP depends on the weight values assigned to the seven individual methods, which balance the accuracy, coverage, specificity, and MCC. In the training process, a set of optimal parameters was generated from the MILP model (listed in Supplementary Table S7); these parameters are the weights of the seven individual methods to be used in the consensus prediction model. Figure 2A shows that PconsC4 received the highest weight value and DeepCov the second highest, while PSICOV and NNcon received the lowest weight values at the different sample sizes. The different weights reflect the different prediction accuracies of the methods. Similarity between the seven approaches also plays an important role in determining the weight values (Table 1). Jaccard's similarity coefficient was 0.585 between PconsC4 and DeepCov, so PconsC4 and DeepCov received the highest weight values. CCMpred, PSICOV, and EVfold are methods of a similar type (see Table 1) and received the lowest weight values. This explains why the CCMpred, plmDCA, NNcon, and PSICOV methods had the smallest weights. On the other hand, note that even though the CCMpred method has the third highest accuracy after PconsC4 and DeepCov for contact prediction, its weight in two cases was very small. This is because many of these seven contact prediction methods use PSI-Blast, HHblits, or Jackhmmer (e.g., CCMpred, EVfold, plmDCA, and PSICOV), while others use artificial neural networks (e.g., NNcon) or deep learning approaches (e.g., DeepCov, PconsC4), and the prediction results of the different methods may correlate with each other in some fashion. Figure 2B shows the overall performance on the training dataset: the average accuracies, coverages, specificities, and MCCs for COMTOP are plotted against the different sample sizes.
The prediction accuracy, coverage, and specificity were highest when the sample sizes were small, such as for top-L/5 and top-L/2 predictions, while the accuracy and coverage decreased monotonically with increasing sample size. This represents the classic trade-off phenomenon common to many prediction problems. Although MCC is also an important estimator in protein contact prediction evaluation, this value increased with increasing sample size. The prediction accuracies of the training models for the different sample sizes were 98.99%, 96.88%, 92.18%, 89.91%, 85.77%, and 79.09%, respectively.
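As an illustration of the similarity measure mentioned above, the Jaccard coefficient between two methods can be computed on the sets of residue pairs that each method predicts. The sketch below is a generic definition of the coefficient; how the predicted sets were selected for Table 1 is not specified here, so the toy sets and the function name are hypothetical.

```python
def jaccard(pairs_a, pairs_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(pairs_a), set(pairs_b)
    union = a | b
    # Convention assumed here: two empty sets have similarity 0.
    return len(a & b) / len(union) if union else 0.0

# Made-up predicted contact sets for two hypothetical methods.
method_a = {(1, 7), (2, 8), (3, 9)}
method_b = {(2, 8), (3, 9), (4, 10)}
similarity = jaccard(method_a, method_b)
```

A value close to 1 indicates that two methods predict largely the same pairs, in which case combining them adds little complementary information.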

Performance Evaluation Based on the Independent Set
The measure of COMTOP's performance for residue contact prediction depends on the best range that balances the accuracy, coverage, specificity, NPV, and MCC. We used four datasets to evaluate the performance of our method (see Section 2.1). Figures 3 and 4 and Table 2 summarize the performance of COMTOP in terms of accuracy (alternatively known as positive predictive value) compared with the seven individual methods when applied to the 239 test proteins. We evaluated the accuracy of the top-L/k (k = 5, 2, 1) and top-KL (K = 5, 3, 2, 1) predicted contacts, where L is the length of a protein. The prediction accuracy is defined as the percentage of native contacts among the top-L/k and top-KL predicted contacts. On the 239 proteins, we evaluated our method using the suggested weight values. The prediction results of our method, together with those of the seven individual methods, are shown in Table 2 and Figure 3. We observe that our method generated the highest prediction scores compared with the seven individual methods.
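The top-L/k and top-KL accuracies defined above can be computed from a ranked list of predicted pairs as in the following sketch. The function names and the toy data are hypothetical; rounding L/2 and L/5 down to an integer and dividing by the number of pairs actually available when fewer than the cutoff were predicted are assumptions of this sketch.

```python
def top_accuracy(ranked_pairs, native_contacts, n):
    """Fraction of the n highest-ranked pairs that are native contacts."""
    top = ranked_pairs[:n]
    if not top:
        return 0.0
    return sum(pair in native_contacts for pair in top) / len(top)

def accuracy_profile(ranked_pairs, native_contacts, length):
    """Accuracy at the six cutoffs used in the paper for a protein of length L."""
    cutoffs = {
        "top-5L": 5 * length, "top-3L": 3 * length, "top-2L": 2 * length,
        "top-L": length, "top-L/2": length // 2, "top-L/5": length // 5,
    }
    return {name: top_accuracy(ranked_pairs, native_contacts, n)
            for name, n in cutoffs.items()}

# Toy protein of length 10 with ten ranked predictions and four native contacts.
ranked = [(i, i + 6) for i in range(1, 11)]
native = {(1, 7), (2, 8), (3, 9), (5, 11)}
profile = accuracy_profile(ranked, native, length=10)
```

In this toy case the accuracy falls as the cutoff grows, which is the same qualitative trend reported for the real test sets.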
The COMTOP model has six sub-models (see Table S7). Figure 3A shows the overall performance of the COMTOP model in terms of its average accuracy. For comparison, results for CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV are also shown. Clearly, the COMTOP model significantly outperforms the individual methods. To see this in more detail, Figure 3B-E shows Bland-Altman plots comparing COMTOP with PconsC4 and DeepCov. In Figure 3B-E, the majority of the points are above the zero line, and about 95%, 95.8%, 95.4%, and 97.5% fall within the confidence limits, respectively. The mean (bias) values of the differences are also all positive, indicating that COMTOP outperformed DeepCov and PconsC4 on the test dataset [63].
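The quantities read off a Bland-Altman plot can be computed as in the sketch below. It assumes the conventional limits of agreement, the mean difference plus or minus 1.96 standard deviations of the differences; the limits used for the figures are not specified in the text. The per-protein accuracies and function names are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return the bias (mean difference) and the lower and upper limits of
    agreement for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # assumed convention for the limits
    return bias, bias - spread, bias + spread

def fraction_within_limits(a, b):
    """Share of paired differences that fall inside the limits of agreement."""
    _, lower, upper = bland_altman(a, b)
    diffs = [x - y for x, y in zip(a, b)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)

# Made-up per-protein accuracies (in percent) for a consensus and a single method.
consensus = [90, 80, 70, 95]
single = [80, 75, 60, 90]
bias, lower, upper = bland_altman(consensus, single)
inside = fraction_within_limits(consensus, single)
```

A positive bias means that the first method scores higher on average, which is the pattern described for COMTOP against PconsC4 and DeepCov.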
In Figure 4, the average accuracies, coverages, specificities, NPVs, and MCCs for COMTOP are plotted against the different sample sizes on the 239 non-redundant proteins. The prediction accuracies and specificities are highest when the sample sizes are small, such as top-L/5 predictions, while the accuracy and specificities decrease monotonically with the increasing sample size. The accuracies of the COMTOP model for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions were 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35%, respectively, while the corresponding NPVs were 99.22%, 95.41%, 91.66%, 82.34%, 70.10%, and 59.85%, respectively [64]. The coverage improves monotonically with the increase of the sample size. This represents the classic trade-off phenomenon common to many prediction problems. Although MCC is also an important estimator in protein contact prediction evaluation, this value is highest when the sample size is top-L and top-2L. More importantly, top-L is the best range for protein contact prediction. Concerning the number of contacts required for accurate folding, the top-L contacts have been shown to produce good results [65,66]; nevertheless, the researchers have recommended that the number of contacts required be specific to the prediction methods.

Testing on CASP13 Targets
The critical assessment of protein structure prediction (CASP) is a biennial worldwide competition for protein structure prediction, identifying what progress has been made and highlighting where future effort may be most productively focused. The competition unfolds in a double-blind fashion: The structures of the target domains are unknown to the predictors and the organizers (http://predictioncenter.org/download_area/CASP13/, accessed on 25 April 2021).

Comparison of COMTOP's Performance with the Seven Individual Methods
In this section, we tested COMTOP on 30 domains in CASP13 using the proposed parameter sets listed in Table S7. The prediction results of COMTOP, together with those of the seven individual methods, are shown in Table 3 and Figure 5A; the COMTOP model significantly outperformed the individual methods. As shown in Table 3, the improvement of COMTOP in accuracy over the seven individual methods increases with the number of predicted contacts. For example, the improvements of COMTOP in accuracy over the top individual method, DeepCov, were 27.1, 23.40, 22.0, 14.1, 9.5, and 0.81 percentage points for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, respectively. Figure 5B-E shows Bland-Altman plots comparing COMTOP with PconsC4 and DeepCov. In Figure 5B-E, the majority of the points are above the zero line, and about 93.4%, 93.4%, 93.4%, and 96.7% fall within the confidence limits, respectively. The mean (bias) values of the differences are also all positive, indicating that COMTOP outperformed DeepCov and PconsC4 on the CASP13 dataset. In Figure 6, the average accuracies, coverages, specificities, NPVs, and MCCs for COMTOP are plotted against the different numbers of contact predictions for the 30 CASP13 domains. For top-L/5 predictions, the accuracy and specificity of the COMTOP model were 75.91% and 91.24%, respectively, while the NPV and coverage were 64.66% and 61.45%. Although MCC is also an important estimator in protein contact prediction evaluation, this value was highest for the top-5L and top-2L sample sizes. Table 4 shows the accuracies achieved by COMTOP on the CASP13 domains classified as FM, TBM-easy, TBM-hard, or FM/TBM. Over these target domains, COMTOP achieved average accuracies of 75.91% and 73.90% for the top-L/5 and top-L/2 predictions, respectively.
For top-L/5 predictions, COMTOP showed prediction accuracies larger than 90% for 18 domains and accuracies of about 100% for 16 of these domains. Remarkably, our system obtained high accuracies for the TBM-easy, TBM-hard, and FM/TBM classification domains. Among all the individual methods, DeepCov performed best, with accuracies of 21.40%, 29.80%, 38.10%, 52.60%, 64.40%, and 75.10% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, respectively. The CCMpred, EVfold, and plmDCA methods had the same ranking as on the test set with 239 proteins, but DeepCov, PconsC4, PSICOV, and NNcon showed different rankings. The prediction accuracies of all the methods were ranked as follows: DeepCov > PconsC4 > CCMpred > EVfold > plmDCA > NNcon > PSICOV, whereas the ranks of all methods based on the test set with 239 proteins were as follows: PconsC4 > DeepCov > CCMpred > EVfold > plmDCA > PSICOV > NNcon. The prediction accuracy of COMTOP decreased from 97.35% on the 239 proteins to 75.91% on the CASP13 dataset for the top-L/5 predictions. Overall, the prediction accuracy of COMTOP decreased by about 11.18, 17.59, 18.76, 22.34, 20.61, and 21.45 percentage points on CASP13 for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, respectively. Among the seven individual methods, PconsC4 had the largest decrease in accuracy, about 13.29, 17.25, 29.75, 28.59, 30.51, and 31.86 percentage points on CASP13 for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, respectively. CCMpred had the second largest decrease in accuracy, about 8.91, 12.14, 19.56, 22.13, 27.88, and 31.70 percentage points on CASP13 for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 predictions, respectively. On the other hand, PSICOV performed better on the CASP13 targets than on the test dataset, but only for top-L/5 predicted contacts, where its prediction accuracy improved from 40.45% to 42.06%.
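The reported differences can be cross-checked with simple arithmetic, which also shows that they are absolute differences between accuracies rather than relative changes. The sketch below derives the CASP13 accuracies of COMTOP in two ways, using only numbers quoted in this paper: from the accuracies on the 239 proteins minus the reported decreases, and from the DeepCov accuracies plus the reported improvements. The variable names are illustrative.

```python
cutoffs = ["top-5L", "top-3L", "top-2L", "top-L", "top-L/2", "top-L/5"]

# Values quoted in the text, in percent.
comtop_239 = [59.68, 70.79, 78.86, 89.04, 94.51, 97.35]
decrease_on_casp13 = [11.18, 17.59, 18.76, 22.34, 20.61, 21.45]
deepcov_casp13 = [21.40, 29.80, 38.10, 52.60, 64.40, 75.10]
gain_over_deepcov = [27.1, 23.40, 22.0, 14.1, 9.5, 0.81]

# COMTOP on CASP13, derived in two independent ways.
from_decrease = [round(a - d, 2) for a, d in zip(comtop_239, decrease_on_casp13)]
from_gain = [round(b + g, 2) for b, g in zip(deepcov_casp13, gain_over_deepcov)]

# Largest disagreement between the two derivations over the six cutoffs.
max_gap = max(abs(x - y) for x, y in zip(from_decrease, from_gain))
```

The two derivations agree to within rounding for every cutoff, which is consistent with the stated top-L/2 and top-L/5 accuracies of 73.90% and 75.91% on CASP13.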

Comparison of COMTOP's Performance with a Few State-of-the-Art Schemes
Finally, we compared our COMTOP system with a few state-of-the-art systems that were not used for developing COMTOP. A set of five contact prediction systems was chosen: RaptorX [29], Yang_Server [67], TripletRes [68], ResTriplet [68], and DNCON3 [69]. For this assessment, 20 domains were selected from the CASP13 dataset for which the native structures were publicly available and for which all seven methods generated results. These 20 domains included both FM (free-modeling) and TBM (template-based modeling) domains. To obtain the predicted contacts for the other methods, we evaluated the contact predictions of the top-L/2 and top-L/5 groups in CASP13 over these 20 targets from the webserver at https://predictioncenter.org/casp13/rrc_results.cgi, accessed on 25 April 2021.
The performance of the COMTOP scheme, together with that of the five state-of-the-art systems, in terms of accuracy of top-L/2 and top-L/5 predictions is shown in Table 5. From the table, we can see that COMTOP performed better than the Yang_Server, TripletRes, ResTriplet, and DNCON3 schemes for the top-L/2 contacts and better than the TripletRes, ResTriplet, and DNCON3 schemes for the top-L/5 predictions. On the other hand, our model, with accuracies of 84.02% and 88.87% for top-L/2 and top-L/5 predictions, respectively, ranked behind RaptorX. In addition, Yang_Server performed slightly better than our method for top-L/5 predictions.

Testing on CASP14 Targets
In this section, we tested COMTOP on 25 CASP14 target domains using the proposed parameter sets. These 25 domains in the CASP14 dataset included both FM and TBM domains, for which the native structures were publicly available. Table 6 shows the COMTOP prediction results alongside the seven individual methods, and the COMTOP model significantly outperformed the other methods.

Performance Comparison of COMTOP against the Seven Individual Methods
For top-L/2 and top-L/5 predictions, COMTOP achieved average accuracies of 68.33% and 77.49%, respectively. Our prediction accuracies were more than 90% for 13 domains, with 11 of these domains showing 100% accuracy for top-L/5 predictions. Among all the individual methods, DeepCov performed best, with accuracies of 67.65% and 71.86% for top-L/2 and top-L/5 contacts, respectively. The overall ranking differed from that on the CASP13 dataset: the DeepCov, PconsC4, PSICOV, and plmDCA methods had the same ranking as on the CASP13 dataset, but the ranking of EVfold, NNcon, and CCMpred changed. The prediction accuracies of all the methods were ranked as follows: DeepCov > PconsC4 > NNcon > CCMpred > plmDCA > EVfold > PSICOV. As shown in Table 6, the improvement of COMTOP in accuracy over the seven individual methods increased with the number of predicted contacts. For the CASP14 dataset, COMTOP's prediction accuracy for top-L/5 contacts increased by 1.58 percentage points compared with the CASP13 dataset, whereas it decreased by 7.23, 6.28, 7, 5.52, and 5.57 percentage points for the top-5L, top-3L, top-2L, top-L, and top-L/2 predictions. In addition, COMTOP's prediction accuracy for the top-L/2 and top-L/5 contacts decreased by 26.18 and 19.86 percentage points on the CASP14 dataset relative to the 239-protein test dataset. Other methods also showed lower accuracy on CASP14 than on CASP13 and the 239-protein test dataset, indicating that CASP14 was more difficult than the others. Among the seven individual methods, the best performing method, DeepCov, decreased in accuracy for top-L/5 contacts by approximately 3.24 and 19.75 percentage points on CASP14 relative to CASP13 and the test set with 239 proteins, respectively. The second-best method, PconsC4, showed a 5.56 percentage point improvement in accuracy for top-L/5 contacts on CASP14 relative to CASP13 but a 26.3 percentage point decrease relative to the 239-protein test dataset.

Performance Comparison of COMTOP against State-of-the-Art Methods
We compared our COMTOP model against a group of seven recently developed state-of-the-art contact prediction systems: MULTICOM-AI [70], Kiharalab_Contact [71], tFold, DeepPotential, trfold, RaptorX [29], and TripletRes [68]. To obtain the prediction accuracy for the state-of-the-art systems, we took the top-L/2 and top-L/5 contact predictions of these methods from the webserver at https://predictioncenter.org/casp14/rrc_results.cgi, accessed on 25 April 2021. Table 7 compares the prediction accuracy of the COMTOP method with that of the state-of-the-art systems on the CASP14 dataset. From Table 7, COMTOP performed better than the tFold, MULTICOM-AI, RaptorX, trfold, and Kiharalab_Contact schemes for both the top-L/2 and top-L/5 predictions. On the other hand, TripletRes and DeepPotential performed better for top-L/2 and top-L/5 predictions.

Discussion
In this paper, the proposed hybrid framework, the COMTOP model, used information from seven individual methods that differ from each other in terms of both methodology and input features. The seven methods can be roughly classified into three different categories: traditional machine learning, evolutionary coupling analysis, and deep learning. These methods also rely on different input data types. As shown in our previous work [72], the prediction results of these methods show certain degrees of similarity and difference, and the differences between the prediction results of methods in different categories are larger than those between methods in the same category. COMTOP selects the weight variables for each individual method through the MILP optimization-based approach. Our consensus method scores higher for the correct contact predictions than for the incorrect contact predictions from the different methods; thus, individual methods with higher accuracy usually obtain higher weights. The seven methods can complement each other in prediction performance, so our method produces higher accuracy than the seven individual methods and shows better or comparable prediction performance when compared with other state-of-the-art methods.
These seven methods were selected among the available methods based on two criteria: methods that (1) have comparatively better prediction accuracy and (2) belong to different method categories and complement each other. These seven contact prediction methods are classified into two categories: (1) coevolution-derived information-based, and (2) machine-learning-based. Most of the coevolution-derived approaches use an MSA as input, which can be generated by approaches such as PSI-Blast, HHblits, or Jackhmmer.
Most of the machine-learning-based methods accept a wide range of features as input, including features related to the local window of the amino acids, the amino acid type (polarity and acidic properties), and the protein itself. These include mutual information of sequence profiles, sequence profiles, sequence separation length between the amino acids under consideration, secondary structure, solvent accessibility, and pairwise information between all the amino acids involved [73]. The commonly used machine learning techniques for contact prediction are hidden Markov models, support vector machines, shallow neural networks, and deep learning techniques.
COMTOP uses the confidence scores of the predicted contacts and the weights of the seven individual methods to determine protein residue-residue contacts. The contact prediction of COMTOP is based on the sum, over all methods, of the products of the confidence score and the weight term; thus, the consensus method also provides a confidence score for each pair of residues. Although COMTOP is built on the seven individual methods, it consistently outperforms each of them for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts. As shown in Table 2, the prediction accuracy of our method is about 24.89%, 23.44%, 19.01%, 10.75%, 5.10%, and 1.59% higher than that of the best individual method, PconsC4, for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively.
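The weighted-sum scoring described above can be illustrated with a minimal Python sketch. It is not the COMTOP implementation: it assumes a single scalar weight per method, whereas the trained parameters listed in Table S7 may be structured differently, and the method names, scores, and weights below are hypothetical toy values. The function names `consensus_scores` and `top_contacts` are likewise introduced here for illustration only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (residue i, residue j), assumed to satisfy i < j


def consensus_scores(method_scores: Dict[str, Dict[Pair, float]],
                     weights: Dict[str, float]) -> Dict[Pair, float]:
    """Sum weight * confidence over all methods for every residue pair.

    A pair that a method does not report contributes nothing for that method.
    """
    combined: Dict[Pair, float] = {}
    for method, scores in method_scores.items():
        w = weights[method]
        for pair, confidence in scores.items():
            combined[pair] = combined.get(pair, 0.0) + w * confidence
    return combined


def top_contacts(combined: Dict[Pair, float], n: int) -> List[Pair]:
    """Return the n pairs with the highest consensus score (ties broken by pair)."""
    return sorted(combined, key=lambda p: (-combined[p], p))[:n]


# Hypothetical toy input: two methods, three candidate residue pairs.
toy_scores = {
    "method_A": {(1, 10): 0.9, (2, 20): 0.2},
    "method_B": {(1, 10): 0.5, (3, 30): 0.8},
}
toy_weights = {"method_A": 0.5, "method_B": 1.0}  # assumed, not trained values

toy_combined = consensus_scores(toy_scores, toy_weights)
toy_top2 = top_contacts(toy_combined, 2)
```

Because the consensus value is itself a score, the same ranking can be cut at any depth (top-L/5 through top-5L) without recomputing anything.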
We also tested the performance of our method on CASP13, CASP14, and a non-redundant TM protein test set. The prediction accuracies were 75.91%, 77.49%, and 73.91%, respectively, for top-L/5 contacts. On all test sets, COMTOP significantly outperformed the seven individual methods. Furthermore, when we compared our method with several state-of-the-art methods, RaptorX showed better performance than our model on the CASP13 dataset. COMTOP outperformed the Yang_server, TripletRes, ResTriplet, and DNCON3 schemes for top-L/2 and top-L/5 contact predictions on CASP13 targets. On the CASP14 dataset, COMTOP performed better than the tFold, MULTICOM-AI, RaptorX, trfold, and Kiharalab_Contact schemes for top-L/2 and top-L/5 contacts. On the other hand, TripletRes and DeepPotential performed better for top-L/2 and top-L/5 contacts. The results and analysis demonstrate that COMTOP can significantly improve on the performance of the individual methods and that COMTOP is more robust across different types of test sets. COMTOP also showed better or comparable predictions when compared with the state-of-the-art predictors.
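For readers unfamiliar with the top-L/k notation used in these comparisons, the following Python sketch shows one common way to compute such an accuracy: rank the predicted pairs by score, keep the first L × factor of them (L being the sequence length), and report the fraction that are true contacts. This is an assumed, conventional definition rather than the evaluation script used in this work; the function name `top_k_accuracy`, the rounding rule, and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def top_k_accuracy(ranked_pairs: List[Pair], true_contacts: Set[Pair],
                   seq_len: int, factor: float) -> float:
    """Fraction of the top (seq_len * factor) ranked pairs that are true contacts.

    factor = 0.2 corresponds to top-L/5, factor = 5 to top-5L, and so on.
    Assumption: the cutoff is truncated to an integer and is at least 1.
    """
    n = max(1, int(seq_len * factor))
    selected = ranked_pairs[:n]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)


# Hypothetical toy example: a 10-residue protein with four ranked predictions.
toy_ranked = [(1, 10), (3, 30), (2, 20), (4, 40)]
toy_true = {(1, 10), (2, 20)}

acc_l10 = top_k_accuracy(toy_ranked, toy_true, seq_len=10, factor=0.1)  # top 1 pair
acc_l5 = top_k_accuracy(toy_ranked, toy_true, seq_len=10, factor=0.2)   # top 2 pairs
```

Note that when fewer pairs are predicted than the cutoff requests, this sketch divides by the number of pairs actually available; other conventions divide by the nominal cutoff instead.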

Conclusions
In this paper, we presented a novel hybrid consensus method, COMTOP, which is based on seven methods and aims to predict high-quality protein contacts that can be used for 3D structure prediction. This consensus contact prediction method is based on an MILP model that produces the parameters for protein residue-residue contact prediction. The test on the 239 targets showed that COMTOP performed well compared with the seven individual methods in terms of prediction accuracy. COMTOP achieved prediction accuracies of 75.91%, 77.49%, and 73.91% for top-L/5 contacts on the CASP13, CASP14, and 57 TM target proteins, respectively, and showed satisfactory results compared with the state-of-the-art predictors. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increases with the number of predicted contacts. For example, COMTOP improves much more for a large number of contact predictions (such as top-5L and top-3L) than for a small number of contact predictions (such as top-L/2 and top-L/5).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/membranes11070503/s1, Table S1: The non-redundant list with 3133 proteins, Table S2: The training set of COMTOP, Table S3: 239 non-redundant proteins used for testing, Table S4: 30 CASP13 target domains used for testing, Table S5: 25 CASP14 target domains list used for testing, Table S6: 57 non-redundant TM proteins used for testing, Table S7: Parameters $\lambda_i^{(m1)}$ (for the seven-method COMTOP) obtained from the overall training process are shown in columns 1 to 6, respectively.