Secondary and Topological Structural Merge Prediction of Alpha-Helical Transmembrane Proteins Using a Hybrid Model Based on Hidden Markov and Long Short-Term Memory Neural Networks

Alpha-helical transmembrane proteins (αTMPs) play essential roles in drug targeting and disease treatment. Due to the challenges of determining their structures experimentally, αTMPs have far fewer known structures than soluble proteins. The topology of transmembrane proteins (TMPs) determines their spatial conformation relative to the membrane, while the secondary structure helps to identify their functional domains. The two are highly correlated along αTMP sequences, and achieving a merged prediction is instructive for further understanding the structure and function of αTMPs. In this study, we implemented a hybrid model combining Deep Learning Neural Networks (DNNs) with a Class Hidden Markov Model (CHMM), named HDNNtopss. The DNNs extract rich contextual features through stacked attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) networks and Convolutional Neural Networks (CNNs), and the CHMM captures state-associated temporal features. The hybrid model not only reasonably considers the probability of the state path but also retains the fitting and feature-extraction capability of deep learning, which enables flexible prediction and makes the resulting sequence more biologically meaningful. It outperforms current advanced merge-prediction methods with a Q4 of 0.779 and an MCC of 0.673 on the independent test dataset, which is of solid practical significance. Compared with advanced prediction methods for topological and secondary structures, it achieves the highest topology-prediction Q2 of 0.884, showing strong comprehensive performance. We also implemented a joint training method, Co-HDNNtopss, which achieved good performance and provides an important reference for similar hybrid-model training.


Introduction
Alpha-helical transmembrane proteins (αTMPs) play an important role in physiological processes and biochemical pathways, particularly in drug targeting and disease therapy [1]. The 3D structures of αTMPs are challenging to determine using nuclear magnetic resonance (NMR) and X-ray crystal diffraction [2,3] due to their structural specificity. Given the physiological significance of αTMPs and the lack of known structures [4,5], many bioinformatics approaches have explored their structure and function based on amino acid sequences.
A protein's secondary structure helps to identify functional domains [6][7][8]. The topological structure generally describes the position of transmembrane proteins (TMPs) relative to the biological membrane and the number of transmembrane segments. The secondary structure of the transmembrane region of αTMPs is the α-helix. The combined prediction of the secondary and topological structures avoids conflicting predictions of alpha-helical transmembrane structures caused by different structural predictors. Moreover, it provides a simpler and more effective method for the local primary screening of drug-target binding and helps subsequent studies focus on the appropriate region of action in advance. Although AlphaFold2 has been shown to perform well in predicting transmembrane proteins [9], the study of the combined structure can provide an important reference for related applications.
Machine Learning (ML) improved the performance of structural predictors, replacing earlier empirical, statistical, and principle-based approaches [10][11][12]. Neural Networks (NNs) were successfully applied in the secondary structure-prediction methods JPred, PSIPRED, and PredictProtein [13][14][15]. PHDhtm [7] and MEMSAT3 [16] used NNs to improve topology-prediction performance. TMHMM [17] and HMMTOP [18] applied Hidden Markov Models (HMMs) to topology-prediction problems. However, HMMs lack the ability to extract long-range correlations for complex patterns. OCTOPUS [19] and SPOCTOPUS [20] further improved topology-prediction performance by combining HMMs and NNs. Deep Learning (DL) offers the possibility of learning more implicit features from biological sequences. MUFOLD-SS and DeepTMHMM use DL to predict a protein's secondary structure and the topology of TMPs, respectively [21,22]. However, there are only a few methods for the merged prediction of the topological and secondary structures of TMPs. TMPSS [23] and MASSP [24] achieved simultaneous prediction using DL with Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM) layers, but such studies usually employ multi-task learning to predict the structures separately, creating conflicting prediction results. Furthermore, plain DL is not well suited to modeling temporal phenomena.
Hidden Neural Networks (HNNs) provide new insight by reasonably combining the Class Hidden Markov Model (CHMM) and NNs [25,26]. Similar hybrid models have been applied in many fields, such as speech recognition and time-perception recommendation systems [27][28][29], due to their excellent performance. HNNs have also been explored in computational biology [30][31][32]. The prediction performance of HNNs can be further improved by choosing more complex Deep Learning Neural Networks (DNNs) instead of a simple multilayer perceptron. In this study, we implemented topology and secondary-structure merge-prediction methods for αTMPs, which construct hybrid models by combining CHMM and LSTM. The hybrid models not only have the ability of general DNNs to learn complex patterns but also constrain the corresponding state paths using statistical transition probabilities, making the results more biologically meaningful in practice. A deep learning module consisting mainly of LSTM layers was used to capture contextual dependencies of the sequence. The CHMM module extracts temporal features and correlations of states, where the final per-state output of the deep learning module is used as the emission probability by the CHMM. The prediction of the topological structure, corresponding to the Q2 metric, refers to the classification accuracy of transmembrane (T) and non-transmembrane (N) residues. The prediction of the secondary structure, corresponding to the Q3 metric, refers to helix, strand, and coil. Merge prediction, corresponding to Q4, refers to T-helix (M), N-helix (H), N-strand (E), and N-coil (C). HDNNtopss, a merged predictor using a separate training method on independently generated datasets, achieved a Q4 of 0.779 in merge prediction and a Q2 of 0.884 in topology prediction. It has strong practical significance and all-around performance as the final predictor of this study.
At the same time, we put forward a joint training approach, Co-HDNNtopss, which combines DNNs and CHMM and attempts to make the two promote each other to reach the best convergence during training. Although its current result is not the best, potentially due to the unbalanced ratio of structural label categories, it can provide a reference for similar hybrid-model training. Our method, pre-trained models, and supporting materials can be accessed through https://github.com/NENUBioCompute/HDNNtopss (accessed on 10 March 2023).

Evaluation Criteria
Combining the evaluation criteria for predicting the secondary structure and topology of transmembrane proteins at the residue level, we evaluated the performance of the combined-prediction methods using the following metrics, where TN, TP, FN, and FP denote true negative, true positive, false negative, and false positive samples, respectively.
ACC is the fraction of correctly predicted residues under the standard state mode, corresponding to Q2, Q3, and Q4.
To evaluate the performance of the model, we also adopted the following metrics: the Matthews Correlation Coefficient (MCC), Segment Overlap Measure (SOV) [33], Recall, Precision, Specificity, and F1-score [34][35][36]. The formulae are as follows:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
Specificity = TN / (TN + FP)
F1-score = 2 × Precision × Recall / (Precision + Recall)

The correct TM indicates the number of proteins with a correctly predicted number of transmembrane segments.
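As a concrete sketch, the residue-level metrics above can be computed directly from the confusion counts (a minimal Python illustration; the example counts are invented, not from the paper's datasets):

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Residue-level metrics from confusion counts (TP, TN, FP, FN)."""
    acc = (tp + tn) / (tp + tn + fp + fn)          # Q2/Q3/Q4-style accuracy
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Recall": recall, "Precision": precision,
            "Specificity": specificity, "F1": f1, "MCC": mcc}

# Invented example: 850 of 1000 residues predicted correctly
print(binary_metrics(tp=400, tn=450, fp=50, fn=100))
```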

Window Size Exploration Experiment of HDNNtopss
The selection of the optimal network input size is key for deep learning networks. Based on the accuracy measurements above, the optimal network input size of HDNNtopss was determined, and the results are shown in Table 1. Considering the accuracy of transmembrane-fragment identification, the Q4, and the MCC of merge-structure prediction, a network input with a window size of 19 yields the best prediction results for αTMPs on HDNNtopss.
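The windowed input can be sketched as follows (an illustrative helper; the zero-padding scheme at the sequence ends is an assumption, not necessarily the paper's exact implementation):

```python
import numpy as np

def sliding_windows(features, window=19):
    """Cut per-residue feature vectors into fixed-size windows centered on
    each residue; the sequence is zero-padded at both ends so every residue
    gets a full window (window=19 is the size selected in Table 1)."""
    seq_len, dim = features.shape
    half = window // 2
    padded = np.vstack([np.zeros((half, dim)),
                        features,
                        np.zeros((half, dim))])
    return np.stack([padded[i:i + window] for i in range(seq_len)])

# A toy sequence of 5 residues, each with a 52-dimensional feature vector
windows = sliding_windows(np.random.rand(5, 52), window=19)
print(windows.shape)  # one 19x52 window per residue
```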

Prediction Performance Analysis at the Residue Level
In order to evaluate the prediction performance at the residue level, we used a confusion matrix, Receiver Operating Characteristic (ROC) curve, and Precision-Recall (PR) curve to display the prediction results of HDNNtopss on the blind-test set (see Figures 1 and 2). Figure 1A shows the confusion matrix of the merge-structure prediction, while Figure 1B,C show the confusion matrices of the secondary-structure prediction and topology prediction, respectively. The topology prediction of αTMPs is good, while some structures are confused in the secondary-structure prediction. Figure 2A,B show the ROC and PR curves for topology prediction, respectively, and also support this conclusion.

For Co-HDNNtopss, we analyzed the performance of each structure category using the confusion matrix. As seen in Figure 3, the transmembrane (T) and non-transmembrane (N) regions under the topology-prediction classification criteria of the C-plot are well separated: there is little confusion between the two categories, and only a few residues originally in the N region are predicted to be in the T region. In contrast, the B-plot, corresponding to the classification criteria for secondary-structure prediction, shows that class 'E' (strand) is poorly predicted and is confused with both 'H' (helix) and 'C' (coil); that is, residues originally labeled 'E' are predicted as 'H' or 'C', and the diagonal value for 'E' in the confusion matrix is only 0.37. This result suggests that the imbalance of the structural classes of αTMPs biases the predictions. This phenomenon was not evident in the predictions of the separate training approach, suggesting that the joint training approach amplifies the defect of sample category imbalance. The A-plot, corresponding to the classification criteria of the merge prediction, likewise shows slightly poorer results for N-helix (H) and N-strand (E), whereas T-helix (M) and N-coil (C) have a clearer sample category delineation. After analysis, we hypothesize that this is because, during joint training, the deep learning module uses a sigmoid output for each category as its probability, so categories with a smaller proportion cannot benefit from the softmax output used in separate training, which forces the categories to adjust and promote each other in time.
Furthermore, having fewer positive samples and more negative samples biases the classification model.

Overall Prediction Performance
To compare the accuracy of the combined four-class prediction of transmembrane-protein topology and secondary structure, we compared our results with TMPSS, which predicts both structures simultaneously. Since its two predictions are simultaneous rather than combined, we transformed its two-class topology prediction and three-class secondary-structure prediction into the four-class merge structures. The BioChemical Library (BCL), a tool required by MASSP, could not be downloaded when we tested it, and the web server provided by MASSP was not available [24]. Therefore, we could not compare against MASSP, another multi-task method that predicts both structures.
In Table 2, the results show that our separate training predictor performs better; HDNNtopss achieved a Q4 of 0.779 and an MCC of 0.673, and it also performed well under the other classification criteria. The time indicator reports the prediction time on the test set, excluding the generation of HHblits profiles, and HDNNtopss consumed the least time. Whether due to the structural relevance exploited by the combined prediction of the two structures or the sensitivity of the hybrid model, HDNNtopss showed the best performance. HDNNtopss therefore has practical significance, and the combined structure prediction may also serve as a reference for the local primary screening of drug-target binding. Table 2 also shows the per-category metrics of the joint training model: M and C are well predicted, while E is slightly worse, with lower recall. The results indicate that the non-transmembrane strand, with its smaller proportion, is poorly predicted due to unbalanced sample categories, and the joint training method is less suitable as the final prediction model than the separate training method. However, it can provide a reference for similar hybrid-model training, especially on data with a balanced proportion of categories.


Topological Structure Prediction Performance
To compare the expressiveness of topology prediction with current advanced methods, we transformed the four-class output into two classes: transmembrane and non-transmembrane regions. Table 3 reports the comparative analysis of the different methods in terms of topology-prediction performance. In addition to the separate training method, HDNNtopss, and the joint training method, Co-HDNNtopss, the state-of-the-art tested methods included TOPCONS [37], CCTOP [38], PolyPhobius [39], OCTOPUS [19], SPOCTOPUS [20], SCAMPI_MSA [40], DeepTMHMM [22], Phobius [41], Philius [42], and TMPSS [23]. The methods were compared on the Test50 set defined in this paper. As seen from Table 3, on the independent test set our HDNNtopss achieved the best Q2, reaching 0.884, with an MCC of 0.763. The Q2 of our method matched that of CCTOP, but the MCC was slightly better. Although our correct-TM count was slightly lower than that of DeepTMHMM, it is worth mentioning that DeepTMHMM predicted up to six transmembrane proteins as non-transmembrane. Overall, our method had the best prediction ability for the topology.
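The four-to-two class transformation described above amounts to mapping T-helix to the transmembrane class and everything else to non-transmembrane (a minimal sketch):

```python
def merge_to_topology(labels):
    """Collapse the four merge-prediction classes into the two topology
    classes: T-helix (M) -> transmembrane (T); N-helix (H), N-strand (E),
    and N-coil (C) -> non-transmembrane (N)."""
    return ["T" if c == "M" else "N" for c in labels]

print("".join(merge_to_topology("MMMHHCCE")))  # "TTTNNNNN"
```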

Secondary Structure Prediction Performance
In order to compare the expressiveness of secondary-structure prediction with current advanced methods, we transformed the four-class output into three classes: helix, coil, and strand. Table 4 reports the comparative analysis of the different methods in terms of secondary-structure prediction performance. In addition to the separate training method, HDNNtopss, and the joint training method, Co-HDNNtopss, the state-of-the-art tested methods included JPred4 [13], SSpro5 [43], Spider3 [44], PSIPRED4.0 [45], MUFOLD-SS [21], DeepCNF [46], Porter 5 [47], and TMPSS [23]. The methods were compared on the Test50 set defined in this paper. In terms of secondary-structure prediction, SSpro5 is far ahead of the other methods. Our separate training method, HDNNtopss, is slightly worse than some current advanced methods on the secondary structure, but it still achieves a good prediction level.
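The corresponding four-to-three class transformation exploits the fact that the transmembrane region of an αTMP is an α-helix, so M maps to H (a minimal sketch):

```python
def merge_to_secondary(labels):
    """Collapse the four merge classes into the three secondary-structure
    classes: the transmembrane region of an αTMP is an α-helix, so
    T-helix (M) maps to helix (H); H, E, and C are kept as-is."""
    return ["H" if c == "M" else c for c in labels]

print("".join(merge_to_secondary("MMHHCCE")))  # "HHHHCCE"
```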
In summary, our method HDNNtopss also achieves good performance under the topology and secondary-structure classification criteria and has strong comprehensive capability.

Case Studies
To further discuss the validity of HDNNtopss in predicting the topology and secondary structure of αTMPs, we performed a case study of several alpha-helical transmembrane proteins. By comparing the predicted labels with the real labels on the 3D view, it was found that, although most of the structures were accurately predicted by the predictor, there were still some issues that needed to be improved. The details are discussed below, where the results are visualized using PyMOL [48].
As shown in Figure 4, the predicted topology of 6ryo_A is shown on the left side. 6ryo is a lipoprotein signal peptidase from Staphylococcus aureus [49], and it is important for the study of antibiotic resistance. The cyan part indicates the predicted result of M, the light orange part H, the light blue part C, and the salmon part E. The right side shows the difference between the predicted structure and the real structure: the green part indicates a successful prediction, and the red part a failed prediction. Comparing the results with the real structural labels, the prediction model basically restores the structure of 6ryo_A, except for one to several incorrectly predicted residues at the junctions.
The vast majority of αTMPs are similar to 6ryo_A: their topology and secondary structure can be largely recovered by this predictor, with only a few errors at the junctions between different structures. This may be caused by the differing labeling criteria of the training set and by the dependence of the deep learning method on the data. Analysis of a large number of results revealed that a very small number of αTMPs show confusion between different structures beyond boundary errors, as occurred for 4wmz_A. 4wmz exists in the Saccharomyces cerevisiae YJM789 organism and is a kind of oxidoreductase inhibitor [50], which provides an in-depth understanding of a drug-resistance mechanism.
As shown in Figure 5 for 4wmz_A, the color labeling is consistent with Figure 4. Although the transmembrane fragments were correctly identified, a small segment of residues that should have been H was predicted to be C, and some that should have been C were predicted to be E. This indicates that, due to the unbalanced proportion of samples, some minority classes could not be accurately identified, and a small number of unreasonable structures were not adjusted by the model. However, although the protein has only one transmembrane segment, we predicted it accurately. Although our predictor still has some small problems, it has quite a good prediction ability.

Datasets
TMPs are usually large. Moreover, some parts are buried in biological membranes, and most TMPs are difficult to dissolve in appropriate detergents, inhibiting their purification and crystallization. Therefore, fewer TMPs with known structures have been included in studies compared to proteins in general. The number of known transmembrane-protein structures is increasing at a slow rate, but our predictor can provide a reference for local primary screening. We processed a data source to obtain the dataset. The Protein Data Bank of Transmembrane Proteins (PDBTM) database [51] was the data source. Approximately 5721 αTMPs were downloaded from the 13 August 2021 version of the PDBTM database. Chains containing unknown residues ("X") and chains shorter than 30 residues were removed. CD-HIT [52,53] was used to remove redundancy at a 30% sequence-similarity threshold, leaving 1242 chains. The data were randomly divided into 1142 protein chains for the training set, 50 for the test set, named Test50, and 50 for the validation set. Regarding the division of the four classes of combined structures, for the transmembrane helix structure we took the more accurate structure obtained from the resolution in PDBTM [51] as the standard. For the three classes of secondary structure in the non-transmembrane region, we took the Protein Data Bank (PDB) file analyzed with the DSSP [54] program as the standard.
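The length and unknown-residue filtering step can be sketched as follows (an illustrative helper with invented chain IDs and sequences; redundancy removal at 30% identity is then performed externally with CD-HIT):

```python
def filter_chains(chains, min_len=30):
    """Drop chains that contain unknown residues ('X') or that are shorter
    than min_len residues, mirroring the dataset-cleaning step."""
    return {cid: seq for cid, seq in chains.items()
            if len(seq) >= min_len and "X" not in seq}

# Invented toy chains for demonstration
chains = {"6ryo_A": "MIV" * 20, "bad_X": "MXV" * 20, "short": "MIVL"}
print(sorted(filter_chains(chains)))  # ['6ryo_A']
```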

Inputs Encoding
Based on the special structure and function of αTMPs, we chose two input-encoding features of amino acids for the deep learning module: the One-hot code and the HHblits [55] evolutionary profile. In our experiments, HHblits used the database uniprot20_2016_02. The profile values were scaled by the sigmoid function into the range (0, 1), making the distribution more uniform and reasonable (see Equation (10)). Each amino acid in the protein sequence was represented as a vector of 30 HHblits profile values and 1 NoSeq label (representing a gap). The One-hot encoding matrix, which indicates the index of the amino acid species to which the residue belongs, was based on a 20-dimensional One-hot code and a 1-dimensional NoSeq label. In total, this yields a 52-dimensional feature encoding per residue.
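The per-residue encoding can be sketched as follows (a minimal illustration; `encode_residue` and the random profile values are assumptions for demonstration, and the ordering of the feature blocks may differ from the paper's implementation):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_residue(aa, hhblits_profile):
    """Per-residue feature vector: 20-dim one-hot + 1 NoSeq flag, plus a
    30-dim HHblits profile squashed by a sigmoid into (0, 1) + 1 NoSeq
    flag, giving 52 dimensions in total."""
    one_hot = np.zeros(21)                       # 20 amino acids + NoSeq
    one_hot[AMINO_ACIDS.index(aa)] = 1.0
    profile = 1.0 / (1.0 + np.exp(-np.asarray(hhblits_profile)))
    return np.concatenate([one_hot, profile, [0.0]])  # trailing NoSeq flag

vec = encode_residue("M", np.random.randn(30))   # random stand-in profile
print(vec.shape)  # (52,)
```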
The true inputs of the CHMM are protein sequences encoded as integers from 0 to 19, one per amino acid. These inputs conform to the observation sequences of general HMMs. Sequences were decoded using the trained transition probabilities together with the emission probabilities obtained from the deep learning module.

Models Framework
We implemented a separate training method, HDNNtopss, and a joint training method, Co-HDNNtopss. Both use a hybrid model of a deep learning module and CHMM.
For HDNNtopss (see Figure 6), the DNN part comprises the deep learning module. Each observation in the protein sequence is encoded together with its contextual amino acids as input to the DNNs. The output of the deep learning module is the probability matrix over the states for each sequence observation. This output replaces the emission probability of the CHMM, realizing the combination of the two modules. The CHMM uses the Viterbi algorithm [56] to optimize the segmentation of the state predictions generated by the deep learning module. The four-category predicted merge structures are the final output.
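The decoding step can be sketched as a standard log-space Viterbi pass in which the DNN's per-state outputs stand in for the HMM emission terms (a simplified sketch with invented toy inputs, not the paper's exact implementation):

```python
import numpy as np

def viterbi_decode(log_emissions, log_trans, log_start):
    """Viterbi decoding where the emission term for each position/state is
    taken from the deep learning module's per-state output probabilities,
    as in the hybrid CHMM step (state set M, H, E, C)."""
    T, K = log_emissions.shape
    score = log_start + log_emissions[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans        # prev-state x next-state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                # backtrace best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 4 states (M, H, E, C), 5 residues
rng = np.random.default_rng(0)
em = np.log(rng.dirichlet(np.ones(4), size=5))   # stand-in for DNN outputs
tr = np.log(np.full((4, 4), 0.25))               # uniform transitions
path = viterbi_decode(em, tr, np.log(np.full(4, 0.25)))
print(len(path))  # one state index per residue
```

With uniform transition and start probabilities, the decoded path reduces to the per-position argmax of the emissions; in the real model, the trained transition matrix constrains the path to biologically plausible state sequences.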
For Co-HDNNtopss (see Figure 7), the combination of DNNs and CHMM shows the following details. The context s i of the observations from the CHMM is input to the DNN k , which outputs the values corresponding to the emission network for the state k. The first half shows the detailed implementation details of the DNN k emission network. In order to reduce the complexity of the model in the joint training to reduce the time consumption of training, after many experiments and attempts, the deep learning module was finally chosen; see Enlarged DNN k implementation details, in Figure 7.

Deep Learning Module
The deep learning module in HDNNtopss is mainly composed of CNNs and attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) layers. As shown in the DNN part of Figure 6, this includes feature-integration layers, CNNs, attention-enhanced BiLSTM layers, and fully connected layers. The input to the model is the HHblits profile and One-hot code generated from the sequence, using a sliding window of 19 residues centred on each residue as its feature. The input is passed to stacked CNN layers, which extract effective data features, while the BiLSTM layers capture longer-distance dependencies and global information. We also added an attention mechanism to the BiLSTM layers to help the model attend to the critical regions. After the BiLSTM, there are two fully connected layers. Departing from the original HNN design, in order to avoid training separate emission networks for multiple groups of states, we used the softmax function rather than sigmoid as the activation function of the last layer.
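The attention step over the BiLSTM outputs can be sketched in NumPy. This is an illustrative additive-style formulation with a single learned scoring vector `w`; the exact attention variant used in the paper is not specified, so treat the scoring function as an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(hidden, w):
    """Attention over BiLSTM outputs for one window.
    hidden: (T, d) hidden states; w: (d,) learned scoring vector.
    Returns the attention-weighted context vector and the weights."""
    scores = np.tanh(hidden) @ w          # one scalar score per time step
    alphas = softmax(scores)              # normalised attention weights
    context = alphas @ hidden             # weighted sum of hidden states
    return context, alphas

rng = np.random.default_rng(0)
hidden = rng.standard_normal((19, 64))    # 19-residue window, 64-dim BiLSTM output
context, alphas = attention_pool(hidden, rng.standard_normal(64))
print(context.shape, round(alphas.sum(), 6))  # (64,) 1.0
```

The weights `alphas` sum to one, so the context vector is a convex combination of the hidden states, emphasizing the critical window positions.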
The deep learning module in Co-HDNNtopss is almost the same as in HDNNtopss but simpler. The input mixes the two features and feeds them into stacked CNNs, an attention-enhanced BiLSTM with fewer units, and fully connected layers. To facilitate rewriting the loss function in the joint training, the sigmoid output probability of each state is treated as the emission probability in CHMM.

CHMM Module
CHMM is more suitable for complex protein-structure prediction problems than the standard HMM with specified labeled data [26]. The CHMM module is the same in separate training and joint training. For predicting the structure of αTMPs, assume that the protein sequence is x, a state sequence π, and a set of labels y.
Our model consisted of 12 states, including Begin and End states, and conceptually combines the previous ideas of αTMPs topology and protein secondary-structure prediction [31,57,58]. The model consists of 4 sub-models corresponding to the 4 labels to be predicted, namely, sub-models M, H, C, and E. Except for C, each sub-model contains a beginning and end state, and the state transition of the model is shown in Figure 8.
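The 12-state layout can be sketched as an allowed-transition mask. The exact transition topology is the one given in Figure 8; the grouping below (begin/middle/end states for M, H, and E, a single C state, plus global Begin and End, with segments separated by coil) is an assumption for illustration only.

```python
import numpy as np

# Hypothetical 12-state layout: M, H, E each get begin/middle/end states,
# C is a single state, plus global Begin and End (matching "12 states,
# including Begin and End" and "except for C, each sub-model contains a
# beginning and end state").
STATES = ["Begin", "Mb", "Mm", "Me", "Hb", "Hm", "He",
          "Eb", "Em", "Ee", "C", "End"]
idx = {s: i for i, s in enumerate(STATES)}

allowed = set()
for g in "MHE":
    allowed |= {(g + "b", g + "m"), (g + "m", g + "m"), (g + "m", g + "e")}
    allowed |= {(g + "e", "C"), ("C", g + "b"), ("Begin", g + "b")}
allowed |= {("Begin", "C"), ("C", "C"), ("C", "End")}

theta = np.zeros((12, 12))
for a, b in allowed:
    theta[idx[a], idx[b]] = 1.0
theta[:-1] /= theta[:-1].sum(axis=1, keepdims=True)  # uniform over allowed moves
print(theta[idx["C"]])  # C may move to Mb, Hb, Eb, C, or End
```

Masking the transition matrix this way guarantees that decoded label strings respect segment grammar (e.g., a helix must open before it closes), which is what makes the predictions structurally consistent.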

The transition probability of CHMM from π_i to π_{i+1} is denoted by θ_{π_i π_{i+1}}, and the emission probability of state π_i generating the observation x_i is denoted by e_{π_i}(x_i). The output value of the corresponding state of the DNNs replaces the emission probability between each observation and state. We obtained the appropriate transition probabilities through the supervised training of CHMM. A protein sequence x was then decoded into its corresponding structural properties using these probability parameters.

HDNNtopss Module Training
HDNNtopss is a separate training method: the deep learning module uses supervised learning, and CHMM uses Conditional Maximum Likelihood (CML) estimation.
For the deep learning module, we used labeled data for supervised training to obtain the emission value e_k(s_i; w) between observation and state, where s_i refers to the contextual encoding of each x_i, and w refers to the DNN weights. An early-termination strategy and a save-best strategy were adopted: when the validation loss did not decrease within a training period of 10 epochs, the best model weights were retained and training was stopped. The parameters were trained with the Adam optimizer [59], which dynamically adjusts the learning rate during training. Batch normalization layers effectively prevent overfitting and improve training speed.
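The save-best/early-termination loop can be sketched generically. The callables `train_epoch`, `eval_loss`, and `get_weights` are stand-ins for the actual training framework:

```python
def train_with_early_stopping(train_epoch, eval_loss, get_weights, patience=10):
    """Stop when the validation loss has not improved for `patience`
    epochs, and keep the best weights seen so far."""
    best_loss, best_weights, stale = float("inf"), None, 0
    while stale < patience:
        train_epoch()
        loss = eval_loss()
        if loss < best_loss:
            best_loss, best_weights, stale = loss, get_weights(), 0
        else:
            stale += 1
    return best_weights, best_loss

# Toy usage with stubbed callables: losses improve, then stagnate.
losses = iter([3.0, 2.0, 2.5, 2.6, 2.7])
state = {"epoch": 0}
w, l = train_with_early_stopping(
    train_epoch=lambda: state.update(epoch=state["epoch"] + 1),
    eval_loss=lambda: next(losses),
    get_weights=lambda: state["epoch"],
    patience=2)
print(w, l)  # 2 2.0  (weights from the best epoch are retained)
```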
For CHMM, the parameters to be trained are the transition probabilities. Training corresponds to two rounds of the forward and backward algorithms: one in the free-running phase (f), which considers all paths without the labels, and one in the clamped phase (c), which computes the joint probability with the labels. We used CML estimation to update the parameters: CML adjusts the model parameters so that the probability of the free-running phase approaches that of the clamped phase as closely as possible, thereby achieving the desired probability parameters. To facilitate calculations such as gradient descent, the negative log-likelihood was used:

L = -log P(x, y|Θ) + log P(x|Θ)

where P(x|Θ) is the probability of x occurring under the parametric model, and P(x, y|Θ) is the joint probability of x and y under the parametric model, i.e., the probability of x occurring when the label sequence is consistent with the state path. In each iteration, we performed gradient descent on the transition probabilities using ∂L/∂θ_ij, thus updating the transition probabilities; note that this CHMM training differs from the Baum-Welch learning method of a general HMM.
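A simplified sketch of one CML update step, assuming the expected transition counts of the two phases have already been obtained from the forward-backward passes (the count computation itself is omitted, and using the raw count difference as the gradient is an illustrative simplification):

```python
import numpy as np

def cml_update(theta, counts_clamped, counts_free, lr=0.1):
    """One CML gradient step on the transition probabilities.
    With L = log P(x|Theta) - log P(x, y|Theta), the gradient w.r.t.
    theta_ij is proportional to the free-running expected transition
    count minus the clamped (label-consistent) expected count."""
    grad = counts_free - counts_clamped
    # Descend in log-space to keep probabilities positive, then renormalise rows.
    log_theta = np.log(theta + 1e-12) - lr * grad
    theta_new = np.exp(log_theta)
    return theta_new / theta_new.sum(axis=1, keepdims=True)

theta = np.full((2, 2), 0.5)                      # toy 2-state model
clamped = np.array([[5.0, 1.0], [1.0, 5.0]])      # counts under the labels
free = np.array([[3.0, 3.0], [3.0, 3.0]])         # counts over all paths
theta = cml_update(theta, clamped, free)
print(theta)  # mass shifts toward the label-consistent transitions
```

Intuitively, transitions that the labels use more often than the unconstrained model gain probability mass, which is exactly the "make the free-running phase approach the clamped phase" behaviour described above.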

Co-HDNNtopss Module Training
Co-HDNNtopss, the idea of jointly training deep learning and CHMM, was proposed here; the flow diagram of the joint training is shown in Figure 9. After training one epoch of weights in the deep learning module, we brought the output probability value e_k(s_i; w_k) of the emission network DNN_k into CHMM for one training iteration together with the transition probability parameters. The output of this emission network undergoes gradient descent as the emission probability parameter E_k = e_k(s_i; w_k) in CHMM, and its gradient descent via ∂L/∂E_k is consistent with the update of the transition probabilities (see Equation (15)). The updated emission probability parameter E_k obtained after one iteration in CHMM, the target value T_k, and e_k(s_i; w_k) together form a new error calculation (see Equation (16)). Here, we regarded each state as a regression problem: the final activation function is sigmoid, and each state corresponds to one deep-learning emission network DNN_k with a sigmoid regression output. We then followed the commonly used MSE to establish a new loss function L_2 for each state (see Equation (17)), and the gradient descent on w_k was calculated according to the convergence of this loss function (see Equation (18)).
This process was repeated, alternating with CHMM training, until the accuracy on the validation dataset stopped rising. At this point, the trained transition probability parameters and emission probabilities were obtained. We hoped that with this approach the two modules could promote each other during training so that the expectation value decreased. However, since the deep learning module is complex and effective, its weights reach reasonable values within a few iterations, while the transition probability parameters require many iterations for the expected value to converge; the convergence speed and efficiency of the two are inconsistent. Hence, the joint training method did not perform as well as separate training, but in a subsequent improved version we will consider a more reasonable way of training the two together to further improve the prediction accuracy.
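The alternating schedule can be sketched as a loop over stubbed callables. All four callables are stand-ins for the actual training steps described above, not the paper's implementation:

```python
def joint_train(dnn_epoch, chmm_iteration, distill_step, val_accuracy, patience=5):
    """Alternating Co-HDNNtopss-style schedule (illustrative sketch):
    one DNN epoch, one CHMM iteration on the DNN's emissions, then a
    step that pulls each emission network e_k(s_i; w_k) toward the
    CHMM-updated E_k, until validation accuracy stops rising."""
    best_acc, stale = -1.0, 0
    while stale < patience:
        dnn_epoch()                   # update emission-network weights w_k
        emissions = chmm_iteration()  # one CML iteration; yields updated E_k
        distill_step(emissions)       # MSE loss pulls e_k(s_i; w_k) toward E_k
        acc = val_accuracy()
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
    return best_acc

accs = iter([0.5, 0.6, 0.55, 0.58])
best = joint_train(lambda: None, lambda: None, lambda e: None,
                   lambda: next(accs), patience=2)
print(best)  # 0.6
```

The convergence mismatch noted above would show up here as `dnn_epoch` saturating long before `chmm_iteration` does, which is why a shared stopping criterion is hard to tune.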

Decoding
The decoder used the Viterbi algorithm, which optimally partitions the 12 states through initialization, recursion, termination, and optimal-path backtracking [56]. The optimal path for each sequence was obtained using the probability parameters obtained from training. Four classes of predicted structures were finally obtained: M, H, C, and E.
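The four Viterbi stages can be sketched in log-space NumPy, with the DNN outputs standing in for the emission probabilities, as described above:

```python
import numpy as np

def viterbi(emissions, theta, init):
    """Viterbi decoding. emissions[t, k] = P(x_t | state k) (here the DNN
    output probabilities), theta = transition matrix, init = initial state
    distribution. Returns the most probable state path."""
    T, K = emissions.shape
    log_e, log_t = np.log(emissions + 1e-12), np.log(theta + 1e-12)
    delta = np.log(init + 1e-12) + log_e[0]       # initialization
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):                         # recursion
        scores = delta[:, None] + log_t
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_e[t]
    path = [int(delta.argmax())]                  # termination
    for t in range(T - 1, 0, -1):                 # optimal-path backtracking
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state example: sticky transitions, then a clear switch.
theta = np.array([[0.9, 0.1], [0.1, 0.9]])
emissions = np.array([[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]])
print(viterbi(emissions, theta, np.array([0.5, 0.5])))  # [0, 0, 1]
```

Working in log-space avoids numerical underflow on long protein sequences, where products of many small probabilities would otherwise vanish.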

Conclusions
In this study, we proposed hybrid models of CHMM and DNN to predict the topology and secondary merged structure of αTMPs from the primary sequence. The DNN mainly consists of stacked CNNs and an attention-enhanced BiLSTM capturing the dependencies between contextual residues. The statistical model CHMM extracts the correlation of labels through trained state-transition probabilities. This hybrid model not only improves the topology and secondary-structure prediction accuracy but also makes the predicted structural label strings match the actual structure according to the constraints of state-label correlation, leading to predictions that are more biologically meaningful.
Our separate training method, HDNNtopss, shows strong utility, with a Q4 reaching 0.779 and an MCC reaching 0.673 on the blind-test set. Additionally, HDNNtopss achieved the best performance on the topology-prediction task compared to other state-of-the-art methods: it reached a high Q2 of 0.884 and had a strong comprehensive performance. We also proposed Co-HDNNtopss, a new method for jointly training the DNN and CHMM; although its accuracy was not the highest, the idea of combining the two training methods so that they promote each other can provide a reference for similar hybrid-model training. Through discussion and analysis, the unbalanced ratio of structural label categories was identified.
In the future, we will choose newer datasets or reasonable sampling methods to solve the sample imbalance problem so that the joint training method can reflect a better performance capability. We will also add beta-barrel transmembrane proteins to expand the scope of our data and present the tool as a web server to make our predictor more extensive and useful.
Furthermore, we implemented HDNNtopss as a publicly available predictor for the research community. The pre-trained model and the datasets we used in this paper can be downloaded at https://github.com/NENUBioCompute/HDNNtopss.git (accessed on 10 March 2023). Finally, we sincerely hope that the predictor and the support materials we released in this study will help the researchers who need them.