TMP-SSurface: A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues

Transmembrane proteins (TMPs) play vital and diverse roles in many biological processes, such as molecular transportation and immune response. As with other proteins, many major interactions with other molecules happen at the surface of TMPs, which makes surface information important for function annotation and drug discovery. Because the structure of a TMP is hard to derive from experiment or prediction, a practical alternative is to predict the surface accessibility of TMP residues, measured by the relative accessible surface area (rASA), with computational methods. In this study, we presented TMP-SSurface, a novel deep learning-based predictor for both alpha-helical and beta-barrel transmembrane proteins (α-TMP and β-TMP). It combined a convolutional neural network (CNN), inception blocks, and CapsuleNet into one network framework that accepts only the one-hot code and position-specific scoring matrix (PSSM) of a protein fragment as inputs. Tested against an independent dataset, TMP-SSurface achieved a Pearson correlation coefficient (CC) of 0.584. As the first rASA predictor for TMPs that uses a deep neural network, our method provides a reference for the community, as well as a practical step toward discovering the interaction sites of TMPs from their sequence.


Introduction
Transmembrane proteins (TMPs) are one of the most important types of membrane proteins (MPs). They span the entire biological membrane and act as gateways or receptors throughout the molecular life cycle. They are involved in diverse biological processes, such as cell mechanics regulation [1], signal transduction [2], and molecule transport [3]. Special interest in TMPs also arises from their association with many types of diseases, such as autism [4], dyslipidemia [5], epilepsy [6], and various types of cancers [7][8][9]. Because TMPs play numerous roles in basic physiology and pathophysiology, they are major targets for more than one-third of known drugs on the current therapeutics market [10]. On one side of the membrane, TMPs interact with ligands, including protons, metal ions, enzymes, and drug-like compounds. On the other side, they interact with proteins, RNAs, or other molecules to trigger a series of molecular reactions and eventually control cell functions. According to the statistics, the interaction interface is always located on the surface areas of TMPs [11]. The surface accessibility of the residues in a protein can be measured by the relative accessible surface area (rASA).

Table 1. Summary of existing methods for predicting surface accessibility of transmembrane protein (TMP) residues.

| Method | Year | Samples | Algorithm | TMP Type | Seq Region | Measure |
|---|---|---|---|---|---|---|
| ProperTM [14] | 2004 | 59 | knowledge | α-TMP | TM region | Burial state |
| ASAP [18] | 2006 | 73 | SVR | all TMP | TM region | ASA |
| TMX [15] | 2007 | 43 | SVC | α-TMP | TM region | Burial state |
| MPRAP [19] | 2010 | 80 | SVR | α-TMP | full sequence | rASA |
| Yao et al. (2011) [16] | 2011 | 53 | SVM | α-TMP | TM region | Burial state |
| Yao et al. (2012) [20] | 2012 | 122 | RF | all TMP | TM region | ASA |
| TMexpoSVR [17] | 2013 | 110 | SVR | α-TMP | TM region | rASA |
| TMexpoSVC [17] | 2013 | 110 | SVC | α-TMP | TM region | Burial state |
| MemBrain-Rasa [21,22] | 2015 | 80 | SVR | α-TMP | full sequence | rASA |

Although considerable achievements have been made in the field of TMP surface accessibility prediction, several issues still deserve improvement. First of all, none of the mentioned methods can predict the rASA of the whole sequence for all kinds of TMPs. On the one hand, except for MPRAP and MemBrain-Rasa, the predictors can only be applied within the transmembrane regions of TMPs; they focus on the lipid-accessible surface and ignore the water-accessible surface. It is worth pointing out that predicting rASA on the full sequence is more challenging than predicting it for transmembrane residues only. On the other hand, most methods focus only on α-helical TMPs and ignore β-barrel TMPs, including the only two full-sequence predictors. Although β-barrel TMPs account for only a small proportion of TMPs, they are also essential to study and should not be ignored. Up to now, ASAP and Yao et al. (2012) are the only two predictors that can be applied to both α-helical and β-barrel TMPs, but they can only be used to predict the transmembrane regions of TMPs. Thus, it is meaningful to design a more powerful full-sequence predictor of rASA for all kinds of TMPs.
Besides, previous predictors relied heavily on features derived from third-party tools, such as the position-specific scoring matrix (PSSM) [23], Z-coordinate, secondary structure [24], and so on. Although these features improve predictor performance [25][26][27], their weaknesses cannot be ignored. On the one hand, using features derived from third-party tools makes the predictor slow and may lead to uncontrollable failures. For example, MemBrain-Rasa uses six types of features, four of which rely on third-party tools, and seven out of 50 proteins could not get a reliable prediction result from MemBrain-Rasa on the independent test. On the other hand, expertise in TMPs is always required to use the previous methods successfully, which may confuse non-professional users and hinder the exploration of the biological significance of the prediction process. Since most previous predictors can only be applied in the transmembrane regions of TMPs, the topology of the TMP must be known before using them, and determining that topology is difficult for non-professional researchers. Based on this consideration, we tried to describe the protein fragment with features as concise as possible to make the predictor simpler and more efficient. After a series of experiments, we selected two types of encoding schemes to represent the protein fragment: one-hot code [28][29][30] and the PSSM, where the former encodes the arrangement of residues and the latter reflects the evolutionary profile. However, reducing the number of features inevitably reduces the information available to the predictor and may cause performance to deteriorate. To decrease the dependency on sophisticated features without this loss, a deep learning-based method was introduced in this study for its ability to discover structural features from the sequence.
The proposed method was a deep learning network that combines a convolutional neural network (CNN), inception network, and CapsuleNet.
In this study, we proposed a sequence-based rASA predictor (TMP-SSurface) for the full sequence of all types of TMPs that achieved considerable performance while simplifying the input features as much as possible. Only one-hot code and PSSM were used as the input features of a newly proposed deep learning-based regression method, which combined the inception network with CapsuleNet. The experimental results showed that TMP-SSurface achieved a Pearson correlation coefficient (CC) of 0.581 on the independent validation, which was slightly better than the result of today's best predictor, while the method itself was much simpler. TMP-SSurface is freely accessible at http://icdtools.nenu.edu.cn/tmp_ssurface. The datasets used in the experiments and the predictor project can be downloaded from the web server.

Feature Analysis
We tried several features, such as topology structure, physicochemical properties, and Z-coordinate. Although these features contributed to the predictor more or less, they were not as significant as one-hot code and PSSM. Besides, additional features would make the predictor more complex. To make the predictor as simple as possible while ensuring the prediction performance, we decided to use a one-hot code and PSSM as the features to describe the basic information of the protein fragment.
To investigate the contribution of different features to the predictor, we trained three models using one-hot code, PSSM, and both of them, respectively. Since the proposed model was parameter sensitive, we carried out the complete process of hyper-parameter tuning for each model to ensure reliable prediction performance. The performance of the predictors using different features on the validation samples is shown in Table 2. The two single-feature predictors achieved similar performance, and combining both features improved performance further.

Effect of Window Size
Because the length of the sliding window determined how much information was fed into the proposed predictor, it was an important variable that directly affected prediction performance. We searched window sizes from 13 to 23 in steps of 2. As shown in Table 3, the predictor achieved the best prediction performance (CC value) on the validation samples with a window size of 19.

Hyper-Parameter Tuning
We carried out a series of experiments to identify a better configuration of hyper-parameters for the proposed predictor. The performance of the network was affected by a large number of parameters, among which the number of inception blocks and the number of dynamic routing iterations were the two major hyper-parameters. Table 4 illustrates the effect of the number of inception blocks on the number of network parameters, training time, and CC performance. As the number of inception blocks grew, the number of parameters involved in the network increased exponentially. The best CC value was achieved with three inception blocks, so three inception blocks were used. Table 5 illustrates the effect of the number of dynamic routing iterations on training time and CC performance. As the number of routing iterations increased, the time required for training the network increased rapidly. Previous studies had shown that too many routing iterations lead to a decrease in prediction performance [31]. Once the number of routing iterations reached three, the CC value started to fluctuate and decrease slowly. Thus, three routing iterations were used.

Ablation Study
We proposed a compound network that combined CNN, inception, and CapsuleNet. To demonstrate the effectiveness of the proposed model, we carried out an ablation study by removing some parts of the network. Each model in the ablation study was trained and evaluated using the same data, features, and hyper-parameters. Table 6 illustrates the performance of the different models. We found that CapsuleNet was the most effective component: on its own it achieved the best performance among the three components, and the performance decreased significantly when it was removed.
Since the performance of the full TMP-SSurface model was considerably better than that of the others, combining the three components made sense.

Comparison with Previous Predictors
As described previously, several works have been done to predict the rASA of membrane proteins. However, most of the methods predict the rASA of the transmembrane region of TMPs instead of the whole sequence. Since MPRAP and MemBrain-Rasa are the only two predictors that can be used on the entire sequence of TMPs, we compared TMP-SSurface with them. From the results presented in Table 7, we found that TMP-SSurface significantly outperformed MPRAP and was similar to MemBrain-Rasa, the most effective predictor in this field. By contrast, TMP-SSurface was much simpler. First, MemBrain-Rasa contained a template-based pre-processing step before applying a traditional machine learning method, while TMP-SSurface used a deep learning method. Second, MemBrain-Rasa used six types of features that were calculated by several third-party tools, such as R4S, Zpred, and PSIPRED. These third-party tools could cause failures: seven out of 50 proteins could not get a reliable prediction result from MemBrain-Rasa. TMP-SSurface used only one-hot code and PSSM as features, so it returned prediction results reliably. It is worth noting that the web servers of MPRAP and MemBrain-Rasa accepted only one protein sequence as input, while TMP-SSurface accepted multiple sequences. We also tested the time cost of the three web servers: TMP-SSurface was significantly faster than the others. The details of the comparison are shown in Table 7.

Short Sequence Test
Both MPRAP and MemBrain-Rasa limited the length of the input sequence: MPRAP accepted 20-10,000 residues, and MemBrain-Rasa accepted 30-5,000 residues. This limitation might sometimes be frustrating for users. Although we removed proteins with fewer than 30 residues when building the benchmark datasets, the predictor TMP-SSurface and the corresponding web server had no restriction on the length of the input sequence. Since there are no proteins longer than 5000 residues in the Protein Data Bank of Transmembrane Proteins (PDBTM, version: 2019-01-04) [32], we could only carry out an additional experiment on short sequences to show that the predictor performs well on them. A total of 122 short sequences with fewer than 30 residues were collected from PDBTM. After removing highly homologous sequences by using CD-HIT [33] with a 30% sequence identity cut-off, 89 non-redundant sequences were left. The performance of TMP-SSurface on the short sequence dataset was compared with that on the independent test dataset (50 proteins with 30-5000 residues). The data of the short sequences can be found in the Supplementary Materials: Data sets used in the experiments. From the results presented in Table 8, we found that TMP-SSurface performed well on short sequences.

TMP Type Test
Both MPRAP and MemBrain-Rasa focused only on α-helical TMPs while ignoring β-barrel TMPs. Although β-barrel TMPs account for only a small proportion of TMPs, they are also essential to study and should not be ignored. The independent testing dataset contained 45 α-helical TMPs and five β-barrel TMPs. Table 9 illustrates the prediction performance for the different types of TMPs on the independent testing dataset. The prediction performance on β-barrel TMPs was slightly lower than that on α-helical TMPs, but was still considerable.

Case Study
To further demonstrate the effectiveness of TMP-SSurface, we took 4n6h_A and 1a0s_P as case studies. 4n6h_A is an Escherichia coli α-helical transmembrane protein (subgroup: G protein-coupled receptor), which is the receptor of various ligands, such as heme, sodium ion, and δ-opioid [34]. Opioids are widely prescribed and abused medications, although their signal transduction mechanisms are not well understood. When visualizing the PDB file of 4n6h_A, we found that the δ-opioid was located in a pit on the surface of the protein. 1a0s_P is a Salmonella typhimurium β-barrel transmembrane protein (subgroup: porin), which is the transporter of calcium ion and sucrose and is involved in many signal pathways. When visualizing the PDB file of 1a0s_P, we found that the ligand-binding sites were located on the extracellular solvent surface and in the water-filled transmembrane channel (the solvent surface of the pore). Hence, accurately predicting the rASA of these proteins would help to study the characteristics of their functional or structural regions. Figure 1 visualizes the predicted results for 4n6h_A and 1a0s_P. (a) and (c) illustrate the TMP-SSurface-predicted rASA on the 3D structures of 4n6h_A and 1a0s_P, respectively. TMP-SSurface did a good job, especially for residues located in the non-transmembrane regions, where surface residues are exposed to water. In the transmembrane regions, the TMP-SSurface-predicted rASA was always lower than the DSSP-calculated [35] rASA. This might be explained by the amino acid composition of surface residues in the transmembrane regions, which was significantly different from that of the non-transmembrane regions: since the surface residues located in the transmembrane regions were exposed to lipid, most of them were hydrophobic. Still, TMP-SSurface did a good job on TM regions as well.
(b) and (d) compare the TMP-SSurface-predicted rASA with the DSSP-calculated rASA of 4n6h_A and 1a0s_P as line charts. The prediction accuracy of TMP-SSurface on the exposed residues (rASA ≥ 0.2) was better than that on the buried residues (rASA < 0.2). The surface residues located in the transmembrane regions were exposed to the lipid, a hydrophobic environment that is similar to the environment inside the protein. TMP-SSurface might therefore confuse buried residues with surface residues located in the transmembrane regions, resulting in low prediction accuracy for these residues.

Benchmark Datasets
As illustrated in Table 1, the number of samples used by previous methods is small. Since the number of TMP structures has increased rapidly in the past few years, a more comprehensive data set is required. The Protein Data Bank of Transmembrane Proteins (PDBTM) [32] is the first comprehensive and up-to-date selection of transmembrane proteins from the Protein Data Bank (PDB) [36]. We downloaded 4007 transmembrane proteins from PDBTM (version: 2019-01-04), which contained 3559 alpha proteins and 426 beta proteins. We first removed the proteins that contained unknown residues (such as "X"), as well as those with fewer than 30 residues. To reduce the influence of data redundancy and homology bias, these proteins were clustered by CD-HIT with a 30% sequence identity cut-off, and the representative sequence of each cluster was picked. This left 704 protein chains (618 alpha protein chains and 86 beta protein chains). These proteins were then divided randomly into a training set with 604 proteins, a validation set with 50 proteins, and a test set with 50 proteins. The data can be found in the Supplementary Materials: Data sets used in the experiments.
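The filtering and random split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the random seed, and the toy chain identifiers are hypothetical, and the CD-HIT clustering step is assumed to have been run externally between the two functions.

```python
import random

def filter_chains(chains, min_length=30):
    """Drop chains that contain unknown residues ('X') or have fewer than min_length residues."""
    return {
        name: seq
        for name, seq in chains.items()
        if "X" not in seq and len(seq) >= min_length
    }

def split_dataset(names, n_valid=50, n_test=50, seed=0):
    """Randomly split chain names into training, validation, and test sets."""
    shuffled = sorted(names)               # sort first so the split is reproducible
    random.Random(seed).shuffle(shuffled)  # seed value is an arbitrary assumption
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

# Hypothetical identifiers standing in for the 704 representative chains.
names = [f"chain_{i:03d}" for i in range(704)]
train, valid, test = split_dataset(names)
print(len(train), len(valid), len(test))  # 604 50 50
```

Splitting after redundancy removal, as the text describes, keeps chains with more than 30% identity from landing on both sides of the train/test boundary.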

Calculation of rASA
Accessible surface area (ASA) refers to the surface area of a residue that is exposed to water or lipid. It can be calculated from structural information by several tools, such as DSSP [35], PSAIA [37], and Naccess [38]. In this work, the ASA of each residue was calculated by DSSP, with a probe radius of 1.4 Å. A residue's relative accessible surface area (rASA) is calculated by dividing its ASA by the maximum accessible surface area (MaxASA), which is the ASA of the residue in an extended tri-peptide (Gly-X-Gly) [39]. Several MaxASA scales have been published [40,41], and we used the empirical values for MaxASA defined by Tien et al. in 2013 [39]. rASA can be calculated by the formula:

$$ \mathrm{rASA} = \frac{\mathrm{ASA}}{\mathrm{MaxASA}} \quad (1) $$
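As a small worked example of this normalization, the sketch below divides a residue's ASA by a per-residue MaxASA looked up in a table. The function name is hypothetical, the MaxASA numbers are placeholders rather than the published scale of Tien et al., and clipping the ratio to [0, 1] is an added assumption that the text does not state.

```python
def relative_asa(asa, residue, max_asa):
    """Return rASA = ASA / MaxASA for one residue, clipped to [0, 1] (clipping is an assumption)."""
    ratio = asa / max_asa[residue]
    return min(max(ratio, 0.0), 1.0)

# Placeholder MaxASA values (in Å^2) for illustration only; substitute a published scale.
EXAMPLE_MAX_ASA = {"A": 120.0, "G": 100.0}

print(relative_asa(30.0, "A", EXAMPLE_MAX_ASA))  # 0.25
```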

Encoding of Protein Fragments
For a given protein sequence, a sliding window scheme was used to slice the protein into fragments. The reason for using the sliding window is that the rASA of the residue is greatly influenced by its sequential neighbors [42]. Here, we set the window size to 19: target residue with 9 residues from upstream and 9 residues from downstream.
To accurately predict a TMP's rASA, it is crucial to extract useful information from the primary sequence as the input of prediction models. Besides, we tried to describe the protein fragments with features as concise as possible to make the predictor simpler and more efficient. After a series of experiments, we selected two types of encoding schemes to represent the protein fragment: one-hot code and PSSM.
One-hot code is a 20-dimensional vector whose elements represent the residue type. For a given residue, the element corresponding to its type is 1, and all the others are 0. It is simple to design and has proved to be a powerful feature for problems related to protein function prediction [43][44][45]. To improve the prediction performance for residues located at the ends of the protein sequence, we added one dimension after the one-hot code vector to encode a terminal flag. As shown in Figure 2, if a window position was beyond the range of the protein sequence, we encoded the flag bit as 1 with all one-hot code bits as 0. Otherwise, the flag bit was 0 and the one-hot code was set as usual. For a given residue in the protein sequence, the one-hot code features of the corresponding fragment were encoded by a 21 × 19 matrix. For a protein with L residues, we obtained L matrices.
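The sliding window and the terminal flag can be sketched as below. This is a hedged illustration: the function name and the alphabetical ordering of the 20 amino acids are assumptions, and each fragment is stored as a 19 × 21 array (window positions by features), the transpose of the 21 × 19 layout mentioned above. Sequences are assumed to contain only the 20 standard residues, as in the benchmark data.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column order is an assumption
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_windows(sequence, window=19):
    """Return an array of shape (L, window, 21): 20 one-hot bits plus a terminal flag."""
    half = window // 2
    length = len(sequence)
    features = np.zeros((length, window, 21), dtype=np.float32)
    for center in range(length):
        for offset in range(window):
            pos = center - half + offset
            if 0 <= pos < length:
                features[center, offset, AA_INDEX[sequence[pos]]] = 1.0
            else:
                # Position lies beyond the sequence: set only the terminal flag.
                features[center, offset, 20] = 1.0
    return features

toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"  # 30 residues, made up for the example
windows = one_hot_windows(toy_sequence)
print(windows.shape)  # (30, 19, 21)
```

For the first residue, the nine upstream positions fall outside the sequence, so their flag bits are 1 and their one-hot bits are all 0.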
The position-specific scoring matrix (PSSM) represents the evolutionary profile of the protein sequence. Highly conserved regions have been shown to correlate with functional regions [46][47][48]. PSSM has been widely used in many bioinformatics problems, such as membrane-ligand binding site prediction [11] and protein secondary structure prediction [49]. The PSSM of each TMP was obtained by using PSI-BLAST [50] to search the uniref50 database (version: 2019-01-16) with 3 iterations and an E-value cutoff of 0.01. For a given residue in the protein sequence, the PSSM feature of the corresponding fragment was encoded by a 20 × 19 matrix.
In summary, we described each residue by a 41 × 19 matrix, which combined the one-hot code (21 rows, including the terminal flag) and the PSSM (20 rows).
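A minimal sketch of how the two feature blocks could be assembled per residue is given below. The function names are hypothetical, and zero-padding the PSSM beyond the sequence ends is an assumption, since the text does not say how out-of-range PSSM rows are filled. Fragments are stored as window positions by features, i.e., 19 × 41.

```python
import numpy as np

def window_pssm(pssm, window=19):
    """Slice an (L, 20) PSSM into (L, window, 20) fragments, zero-padding both ends."""
    half = window // 2
    padded = np.pad(pssm, ((half, half), (0, 0)), mode="constant")
    return np.stack([padded[i:i + window] for i in range(pssm.shape[0])])

def combine_features(one_hot, pssm_windows):
    """Concatenate (L, window, 21) one-hot and (L, window, 20) PSSM fragments on the feature axis."""
    return np.concatenate([one_hot, pssm_windows], axis=-1)

# Random stand-ins for a real PSSM and one-hot encoding of a 40-residue protein.
rng = np.random.default_rng(0)
pssm = rng.integers(-5, 6, size=(40, 20)).astype(np.float32)
one_hot = np.zeros((40, 19, 21), dtype=np.float32)
features = combine_features(one_hot, window_pssm(pssm))
print(features.shape)  # (40, 19, 41)
```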

Model Design
We presented a deep learning network called TMP-SSurface, whose design is shown in Figure 3a. For a given residue in the TMP, the input features were the one-hot code (19 × 21 array) and the PSSM (19 × 20 array). First of all, one CNN layer (256 kernels of size 3 × 3 with a stride of 1) was applied to extract local low-level features. After that, the convolved features were fed into the inception layers: three inception blocks were applied side by side to extract low-to-intermediate features. Inception V1 was used as the inception block (see Figure 3b for details). A capsule layer was placed after the inception layers to extract high-level features and explore the spatial relationship among the local features extracted by the preceding layers. The primary capsule layer was a convolutional capsule layer, as described by Sabour et al. [51]. It contained 32 channels of convolutional 8D capsules, with a 9 × 9 kernel and a stride of 2. The final layer (regression capsule) had one 16D capsule to represent the probability of a residue being exposed to the surface. The weights between the primary capsules and the regression capsule were determined by the iterative dynamic routing algorithm. The squashing activation function was applied in the computation between the primary capsule layer and the regression capsule layer:

$$ v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \, \frac{s_j}{\lVert s_j \rVert} \quad (2) $$

where $v_j$ is the vector output of capsule j, and $s_j$ is its total input.
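The squashing non-linearity can be illustrated with a few lines of NumPy. This is a sketch of the function as defined by Sabour et al., not the authors' implementation; the function name and the small `eps` term that guards against division by zero are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Keep the direction of s and map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)        # short vectors shrink toward 0, long ones toward 1
    return scale * s / np.sqrt(sq_norm + eps)

s_j = np.array([3.0, 4.0])                   # toy total input with length 5
v_j = squash(s_j)
print(np.linalg.norm(v_j))                   # close to 25/26, always below 1
```

Because the output length is bounded by 1, it can be read as a probability-like score, which is what the regression capsule relies on.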

From Capsule Length to rASA
According to Sabour et al., the length of the output vector of a capsule indicates the probability that the current input belongs to the entity represented by the capsule [51]. The length of the capsule can be used to assess the prediction confidence: the longer the capsule, the more confident the predicted result [31]. In this study, the length of the vector of the positive capsule in the last layer could be used to describe the probability that the input residue is exposed to the environment. According to the statistics, we found that the rASA was correlated with the capsule length but could not be expressed by it directly. A power function was used to fit the capsule length to the rASA:

$$ \mathrm{rASA}_{pred} = \mathrm{Len}^{1.6} \quad (3) $$

where $\mathrm{rASA}_{pred}$ is the predicted rASA of the current input residue, and Len represents the corresponding capsule length. The value of the exponent was obtained by experiments.
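The mapping from capsule length to predicted rASA is a one-line computation; the sketch below applies it to a batch of capsule vectors. The function name and the toy inputs are hypothetical, and only the exponent 1.6 comes from the text.

```python
import numpy as np

def capsule_length_to_rasa(capsule, exponent=1.6):
    """Map capsule vector(s) of shape (..., 16) to predicted rASA via length ** exponent."""
    length = np.linalg.norm(capsule, axis=-1)
    return length ** exponent

# Two made-up 16D capsules with lengths 0.5 and 0.9.
capsules = np.zeros((2, 16))
capsules[0, 0] = 0.5
capsules[1, 0] = 0.9
print(capsule_length_to_rasa(capsules))
```

Since squashed capsule lengths lie in [0, 1) and the exponent is greater than 1, the predicted rASA is always at most the capsule length and stays within [0, 1).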

Performance Evaluation
To quantitatively evaluate the proposed predictor TMP-SSurface, two measurements that are widely used for rASA prediction methods were adopted in this study: mean absolute error (MAE) and Pearson correlation coefficient (CC). MAE measures the average deviation between the predicted and observed rASA values over all residues. It ranges in [0, 1]; the smaller the MAE value, the better the prediction performance. CC measures the linear correlation between the predicted and observed rASA values. It ranges in [−1, 1], where −1 represents a perfect negative correlation, 1 a perfect positive correlation, and 0 no correlation. MAE and CC can be calculated by the formulas:

$$ \mathrm{MAE} = \frac{1}{L} \sum_{i=1}^{L} \lvert x_i - y_i \rvert \quad (4) $$

$$ \mathrm{CC} = \frac{\sum_{i=1}^{L} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{L} (x_i - \bar{x})^2 \, \sum_{i=1}^{L} (y_i - \bar{y})^2}} \quad (5) $$

where L represents the number of residues, $x_i$ and $y_i$ represent the observed and predicted rASA values of the i-th residue, and $\bar{x}$ and $\bar{y}$ represent the corresponding mean values.
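Both measures are straightforward to compute; a minimal NumPy sketch follows. The function names and the toy rASA values are made up for illustration.

```python
import numpy as np

def mean_absolute_error(observed, predicted):
    """Average absolute deviation between observed and predicted rASA values."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(x - y)))

def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted rASA values."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Toy values for three residues.
observed = [0.1, 0.5, 0.9]
predicted = [0.2, 0.4, 0.7]
print(mean_absolute_error(observed, predicted))
print(pearson_cc(observed, predicted))
```

The two measures are complementary: CC is insensitive to a constant offset or scaling of the predictions, whereas MAE penalizes exactly that.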

Conclusions
In this study, we proposed a sequence-based rASA predictor for the full sequence of all types of TMPs, called TMP-SSurface. To make the predictor as simple as possible while ensuring the prediction performance, only one-hot code and PSSM were used as the input features of a deep learning-based predictor. The experimental results demonstrated the usefulness of these features, suggesting that sequence encoding and evolutionary information can illuminate the characteristics of surface structure. Besides, the deep learning-based method showed the ability to mine information about protein structure from the simplest and most basic sequence information. TMP-SSurface did not have any restriction on its input: it could predict the whole sequence of any kind of TMP of any length. The predicted rASA could be used for further research on TMPs, such as structure analysis, TMP-ligand binding prediction, and TMP function identification.