Article

MTPrompt-PTM: A Multi-Task Method for Post-Translational Modification Prediction Using Prompt Tuning on a Structure-Aware Protein Language Model

1 Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
2 Chemical & Materials Engineering, University of Kentucky, Lexington, KY 40506, USA
* Authors to whom correspondence should be addressed.
Biomolecules 2025, 15(6), 843; https://doi.org/10.3390/biom15060843
Submission received: 3 May 2025 / Revised: 26 May 2025 / Accepted: 7 June 2025 / Published: 9 June 2025
(This article belongs to the Special Issue Innovative Biomolecular Structure Analysis Techniques)

Abstract

Post-translational modifications (PTMs) regulate protein function, stability, and interactions, playing essential roles in cellular signaling, localization, and disease mechanisms. Computational approaches enable scalable PTM site prediction; however, traditional models focus only on local sequence features from fragments around potential modification sites, limiting the scope of their predictions. Recently, pre-trained protein language models (PLMs) have improved PTM prediction by leveraging biological knowledge derived from extensive protein databases. However, most PLMs used for PTM site prediction are pre-trained solely on amino acid sequences, limiting their ability to capture the structural context necessary for accurate PTM site prediction. Moreover, these methods typically train separate single-task models for each PTM type, which hinders the sharing of common features and limits potential knowledge transfer across tasks. To overcome these limitations, we introduce MTPrompt-PTM, a multi-task PTM prediction framework developed by applying prompt tuning to a structure-aware protein language model (S-PLM). Instead of training several single-task models, MTPrompt-PTM trains one multi-task model to predict multiple types of PTM sites using shared feature extraction layers and task-specific classification heads. Additionally, we incorporate a knowledge distillation strategy to enhance the efficiency and generalizability of multi-task training. Experimental results demonstrate that MTPrompt-PTM outperforms state-of-the-art PTM prediction tools on 13 types of PTM sites, highlighting the advantages of multi-task learning and structural integration.

1. Introduction

Post-translational modifications (PTMs) are crucial regulators of protein function, stability, and interactions. These modifications occur after translation and play essential roles in cellular signaling, protein localization, and disease mechanisms [1,2,3]. Although over 400 distinct PTM types have been identified, most remain poorly characterized regarding their target sites and biological context [4]. Experimental techniques such as mass spectrometry (MS), Western blotting, and radiolabeling are widely used for PTM identification; however, they are expensive, time-consuming, and constrained by technical limitations [5,6,7]. Computational approaches address these challenges by providing fast, cost-effective, and scalable PTM site prediction [8,9,10]. A common practice involves training models on existing PTM datasets to identify potential PTM sites on unseen data. This approach can be broadly categorized into supervised training from scratch or fine-tuning pre-trained protein language models (PLMs).
Training models from scratch typically involves using protein sequence fragments or local structural information through machine learning or deep learning methods. For example, NetPhos 3.1 [10] developed an artificial neural network (ANN) model incorporating sequence-based motifs and structural features to predict phosphorylation sites in eukaryotic proteins. NetNGlyc 1.0 [11] built prediction models for N-linked, O-linked, and C-linked glycosylation sites by utilizing artificial neural networks that examined the sequence context and surface accessibility of potential glycosylation sites. The group-based prediction system (GPS) algorithm in GPS-MSP [12] integrates sequence features to identify specific methylation types on lysine and arginine residues in proteins. MethylSight [13] created a machine learning model that predicts lysine methylation sites in human proteins by utilizing alignment-free features that capture structural information around lysine residues. Ertelt et al. [14] combined machine learning and structure-based protein design to predict and engineer protein PTMs, offering a powerful tool for synthetic biology and therapeutic development.
In addition to traditional machine learning techniques, deep learning-based methods have been applied for PTM prediction. For example, MusiteDeep [15] uses convolutional neural networks (CNNs) to automatically learn sequence representations, overcoming the limitations of traditional feature engineering and achieving improved accuracy in phosphorylation site identification. Meanwhile, CapsNet-PTM [16] employs capsule networks to predict seven different PTM types by capturing spatial dependencies between PTM features. Additionally, Wang et al. [17] introduced a web server for the prediction and visualization of 13 PTM types by combining MusiteDeep/CNN and CapsNet deep learning networks and leveraging advanced ensemble techniques. Furthermore, GPS-SUMO 2.0 [18] utilized three advanced machine learning methods—penalized logistic regression (PLR), deep neural networks (DNNs), and Transformer models—to improve the prediction of SUMOylation sites by incorporating multiple sequence features. However, these methods focus only on local sequence features from fragments around potential modification sites, limiting the scope of their predictions.
Training from pre-trained protein language models (PLMs) has proven highly successful in predicting PTM sites. Recently, embeddings from various pre-trained PLMs, such as ESM2 [19], ProtBERT [20], and ProtT5 [20], have been used as features for the training of PTM site prediction models. For instance, LMNglyPred [21] utilizes ProtT5 embeddings to predict N-linked glycosylation sites. PTG-PLM [22] uses embeddings from multiple PLMs, including ProtBERT-BFD, ProtAlbert, ProtXLNet, ESM-1b, and TAPE, to enhance glycosylation and glycation site prediction using CNNs. LM-OGlcNAc-Site [23] applies sophisticated ensemble strategies by combining embeddings from Ankh, ESM-2, and ProtT5 to predict O-linked N-acetylglucosamine (O-GlcNAc) modification sites. In contrast to the aforementioned methods, which rely solely on PLM embeddings as features, PTMGPT2 [24] fine-tunes a decoder-based autoregressive Transformer model, ProtGPT2, using a custom prompt to guide the model in accurately predicting PTM sites. More recently, PTM-Mamba [25] introduces a novel protein language model that integrates PTM information by incorporating PTM-specific tokens through bidirectional Mamba blocks and fusing them with ESM-2 embeddings using a gating mechanism. The resulting representations can be directly applied to downstream tasks such as phosphorylation and non-histone acetylation site prediction. In contrast to training models from scratch, these methods benefit from embeddings pre-trained on extensive protein sequence databases, allowing them to capture both local sequence motifs (e.g., short patterns around modification sites) and the global sequence context (e.g., long-range dependencies in protein sequences). This makes them highly effective for PTM site prediction. However, these methods have several limitations. First, 3D structural information plays a crucial role in PTM prediction, as most PTMs occur at solvent-exposed residues rather than buried ones, and PTM sites are often influenced by non-local sequence interactions due to protein folding [26]. However, most PLMs used for PTM site prediction are trained solely on amino acid sequences, limiting their ability to capture the structural context necessary for accurate PTM site prediction. Second, different PTM types often occur close to each other on the protein sequence and can share sequence motifs and structural dependencies. However, these methods typically train models for different PTM types separately, preventing the sharing of common features among them. The advantages and disadvantages of the representative PTM prediction tools are presented in Table 1.
To address the limitations mentioned above, we propose MTPrompt-PTM, a novel multi-task PTM site prediction model that leverages the prompt tuning of a structure-aware protein language model (S-PLM) [27] for 13 types of PTM sites, including phosphorylation (S, T, Y), N-linked glycosylation (N), O-linked glycosylation (S, T), ubiquitination (K), acetylation (K), methylation (K, R), SUMOylation (K), succinylation (K), and palmitoylation (C). Our model consists of an encoder and a decoder. The encoder uses S-PLM as the backbone to encode the protein sequence. S-PLM is a pre-trained PLM incorporating structural information with sequence-based embeddings from ESM2. During the multi-task training phase, all parameters of S-PLM are frozen. However, to effectively leverage the pre-trained model’s information, we perform prompt tuning on S-PLM. Prompt tuning [28] is a parameter-efficient fine-tuning (PEFT) technique that adds trainable embeddings, called ‘prompts’, to the sequence embeddings. Unlike full fine-tuning, where all model weights are updated, prompt tuning optimizes only the additional trainable embeddings, reducing the computational overhead while maintaining generalization. In our model, we propose a novel method for the initialization of our task prompts. The decoder consists of shared layers and task-specific layers, which capture common features and task-specific features separately. Additionally, we incorporate a knowledge distillation strategy, where single-task models teach a multi-task model, helping the multi-task model to outperform its single-task counterparts by integrating knowledge across multiple tasks. Our experimental results show that MTPrompt-PTM improves the predictive performance compared to single-task models. To further validate its effectiveness, we compare MTPrompt-PTM with state-of-the-art PTM prediction tools. The results demonstrate that MTPrompt-PTM outperforms these tools across all 13 PTM types, confirming its effectiveness.

2. Materials and Methods

2.1. Dataset and Data Processing

Numerous databases and research studies provide PTM data; however, most of them offer only short peptide fragments centered around the modified residue, lacking a full protein context or complete sequence information. These truncated sequences often miss the essential global context, making it difficult to capture long-range interactions between residues, which can be critical in determining PTM occurrence. To address this limitation, we utilize full-length protein sequences in our study, enabling the model to capture the comprehensive contextual information and long-range sequence dependencies necessary for accurate PTM site prediction.
UniProt, the largest protein sequence database with PTM annotations, contains over 200 million protein sequences and provides annotations for 200 PTM types. Therefore, we constructed a new PTM dataset from UniProt, incorporating 13 PTM types: phosphorylation (S, T, Y), N-linked glycosylation (N), O-linked glycosylation (S, T), ubiquitination (K), acetylation (K), methylation (K, R), SUMOylation (K), succinylation (K), and palmitoylation (C).
To build this dataset, we first downloaded full-length protein sequences along with their PTM annotations for the 13 PTM types from UniProt. The data were then filtered by species and sequence length, retaining only metazoan proteins and excluding sequences longer than 1022 residues. Sequences above this length were excluded because they exceed the model’s input size limit, and processing excessively long sequences leads to memory overload, slower processing, and reduced effectiveness due to the quadratic complexity (O(n²)) of the attention mechanism. We chose not to truncate long sequences because truncation could discard important functional or structural information, particularly key motifs or functional sites lying outside the truncated region. Limiting the input to 1022 residues therefore strikes a balance between computational efficiency and preserving the essential sequence context. Table 2 presents the PTM types, the corresponding UniProt annotations, and the number of protein sequences.
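As a minimal illustration of this preprocessing step, the sketch below filters records by taxonomic group and sequence length; the record layout and names are assumptions for illustration rather than the authors’ actual pipeline.

```python
# Minimal sketch of the species/length filter described above; the record
# layout (uniprot_id, taxon_group, sequence) and names are illustrative
# assumptions, not the authors' actual pipeline.
MAX_LEN = 1022  # model input limit in residues

def filter_records(records, allowed_group="Metazoa"):
    """Keep metazoan proteins no longer than MAX_LEN residues."""
    return [
        (uniprot_id, taxon_group, seq)
        for uniprot_id, taxon_group, seq in records
        if taxon_group == allowed_group and len(seq) <= MAX_LEN
    ]
```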
For each protein sequence, the PTM sites are treated as positive samples, while other positions with the same amino acids, excluding the PTM sites, are treated as negative samples. Figure 1 shows the number of PTM sites in terms of positive and negative sites for each PTM type. During training, the entire protein sequence is input, but the loss is calculated only for the positive and negative sites.
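As an illustration of this site-restricted training signal, the sketch below masks the per-residue loss so that only annotated PTM sites and same-residue negatives contribute; the label convention (an ignore index of -100 for all other positions) is our own assumption.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch (not the authors' code): compute the binary loss only at
# labeled candidate sites. `logits` has shape (L,) for one sequence; `labels`
# holds 1 for annotated PTM sites, 0 for same-residue negatives, and -100 for
# all other positions, which are ignored.
def masked_site_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    mask = labels != -100
    return F.binary_cross_entropy_with_logits(
        logits[mask], labels[mask].float()
    )
```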
We then separated all the protein sequences into a training set and a testing set based on the timestamp. Protein sequences annotated in UniProt prior to 2010 were used for training, while those annotated after 2010 were reserved for testing. We trained our model on the training data and used the test data to compare our model’s performance with that of other state-of-the-art tools. Additionally, we applied the widely used clustering program CD-HIT-2D to assess the similarity between the training and testing data. The testing protein sequences with no more than 60%, 70%, and 80% similarity to the training data were generated using CD-HIT-2D. We present the performance of the testing data at different levels of sequence similarity to the training data.
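One way to generate these similarity-bounded test subsets is to run CD-HIT-2D once per identity cutoff, as in the sketch below; the file names are placeholders, and the word sizes follow the CD-HIT guidelines for each cutoff.

```python
import subprocess

# Hedged sketch: produce test subsets with bounded similarity to the training
# set by calling CD-HIT-2D once per identity cutoff. File names are
# placeholders; word sizes follow the CD-HIT guidelines for each cutoff.
for threshold, word_size in [(0.6, 4), (0.7, 5), (0.8, 5)]:
    subprocess.run(
        [
            "cd-hit-2d",
            "-i", "train.fasta",   # database 1: training sequences
            "-i2", "test.fasta",   # database 2: candidate test sequences
            "-o", f"test_le{int(threshold * 100)}.fasta",
            "-c", str(threshold),  # sequence identity cutoff
            "-n", str(word_size),  # word size matching the cutoff
        ],
        check=True,
    )
```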
Furthermore, we created another non-redundant dataset to evaluate our model. We applied CD-HIT [29] to cluster this dataset based on a 60% sequence similarity threshold. To avoid homologous redundancy, only one representative sequence from each cluster was selected. The non-redundant dataset was then split into training and testing sets in a 4:1 ratio. These datasets were used to train and evaluate our model, ensuring a robust performance assessment.
We further evaluated our model using an independent benchmark for phosphorylation and non-histone acetylation. The phosphorylation test set was obtained from the ProteinBERT benchmark [30], which is derived from PhosphoSitePlus [31], a comprehensive resource of experimentally validated post-translational modifications in human and mouse proteins. The non-histone acetylation test set was sourced from TransPTM [32], a Transformer-based model specifically designed for the prediction of non-histone acetylation sites.

2.2. Architecture of MTPrompt-PTM

This paper introduces MTPrompt-PTM, a multi-task model for post-translational modification prediction. The overall architecture of MTPrompt-PTM is illustrated in Figure 2. Our model includes an encoder and decoder. The encoder leverages S-PLM v2 as its backbone and is trained using prompt tuning with task prompts. The decoder is a hybrid architecture comprising shared feature extraction layers and task-specific classification layers.
Unlike most PTM prediction methods that use peptides as input, our model takes entire protein sequences as input. Initially, the protein sequences containing PTM sites are tokenized and embedded using the ESM2 tokenizer, producing the sequence embeddings. Task prompts, which act as additional trainable embeddings that guide the model in distinguishing between different PTM prediction tasks, are concatenated with the protein sequence embeddings. This combined matrix is then passed through the encoder. Throughout the multiple Transformer layers in S-PLM v2, the model generates updated task prompts and protein sequence embeddings. After the Transformer layers, the task prompts, along with the [CLS] and [EOS] tokens, are discarded. [CLS] (classification) and [EOS] (end of sequence) are special tokens commonly used in Transformer-based language models: the [CLS] token is typically added at the beginning of an input sequence, and its output embedding summarizes the entire input for classification tasks, while the [EOS] token marks the end of a sequence, signaling where the input terminates, which is particularly important in generative or sequential prediction tasks. Only the residue-level embeddings of the sequence are retained and passed to the decoder for further processing. Throughout training, the parameters of S-PLM remain frozen, while the task prompts are updated through gradient descent.
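To make this encoder pass concrete, the sketch below shows how task prompts could be prepended to the sequence embeddings and run through a frozen backbone. The backbone interface is reduced to a generic callable and all names are illustrative, so this is a conceptual sketch rather than the released S-PLM v2 implementation.

```python
import torch
import torch.nn as nn

# Conceptual sketch of the prompt-tuned encoder pass described above; the
# backbone interface is a simplified placeholder, not the actual S-PLM v2 API.
class PromptTunedEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, prompt_init: torch.Tensor):
        super().__init__()
        self.backbone = backbone                  # frozen S-PLM v2 backbone
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(prompt_init)  # (K, 1280), trainable

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (B, L, 1280), assumed to include [CLS] and [EOS].
        B, L, D = token_embeddings.shape
        prompts = self.prompts.unsqueeze(0).expand(B, -1, -1)
        x = torch.cat([prompts, token_embeddings], dim=1)  # (B, K + L, D)
        x = self.backbone(x)                               # frozen Transformer layers
        # Drop prompt positions, [CLS] (first token), and [EOS] (last token);
        # keep only the residue-level embeddings.
        K = self.prompts.shape[0]
        return x[:, K + 1 : K + L - 1, :]
```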
Our decoder is designed with both shared layers and task-specific layers. The shared layers consist of two CNN Inception modules and a fully connected (FC) layer. Each CNN Inception module is composed of three 1D CNN layers with different filter sizes, enabling the capture of multi-scale local information from the input sequences. The outputs of the two modules are concatenated to combine the captured local features, and a fully connected layer then further refines the combined representation. These shared layers are responsible for learning common, generalizable representations that can be applied across different PTM types. Once the shared representation is learned, task-specific classification layers handle the unique characteristics of each PTM type. These task-specific layers consist of 13 fully connected layers, one for each PTM type, which can be viewed as 13 independent classification heads; each head is trained to focus on the specific sequence patterns, structural features, or biochemical properties associated with its respective PTM. Each PTM-specific head independently predicts the probability of the presence or absence of its respective modification at each relevant sequence position. By maintaining separate classification heads for each PTM type, the model ensures that features are tailored to each modification, enhancing predictive accuracy and allowing more precise modeling of the diverse PTM signals.
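A simplified PyTorch sketch of this hybrid decoder is given below. The kernel sizes, hidden width, and the parallel arrangement of the two Inception modules are illustrative assumptions, since the exact hyperparameters are not specified here.

```python
import torch
import torch.nn as nn

# Simplified sketch of the hybrid decoder described above: two Inception-style
# 1D-CNN modules and a shared fully connected layer, followed by 13 independent
# classification heads. Kernel sizes and hidden widths are illustrative only.
class InceptionModule1D(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, kernel_sizes=(3, 7, 11)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_dim, out_dim, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, L); concatenate the three branches along the channel axis.
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

class PTMDecoder(nn.Module):
    def __init__(self, embed_dim: int = 1280, hidden: int = 128, n_tasks: int = 13):
        super().__init__()
        self.inception1 = InceptionModule1D(embed_dim, hidden)
        self.inception2 = InceptionModule1D(embed_dim, hidden)
        self.shared_fc = nn.Linear(6 * hidden, hidden)
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])

    def forward(self, residue_emb: torch.Tensor, task_id: int) -> torch.Tensor:
        # residue_emb: (B, L, embed_dim) from the prompt-tuned encoder.
        x = residue_emb.transpose(1, 2)                     # (B, embed_dim, L)
        shared = torch.cat([self.inception1(x), self.inception2(x)], dim=1)
        shared = torch.relu(self.shared_fc(shared.transpose(1, 2)))  # (B, L, hidden)
        return self.heads[task_id](shared).squeeze(-1)      # per-residue logits
```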

2.3. Prompt Tuning on MTPrompt-PTM

In our encoder, to generate residue-level embeddings with enhanced structural information, we use S-PLM v2 [33] as the backbone. S-PLM [27] is a structure-aware protein language model integrating both sequence and structural information via contrastive learning. S-PLM v2 is the upgraded version of S-PLM, using a geometric vector perceptron (GVP) model [34] to achieve more precise residue-level embeddings by capturing detailed geometric properties. The sequence encoder of S-PLM builds upon a pre-trained ESM2 model, preserving previously learned protein knowledge while effectively adapting to new tasks. In contrast to ESM2, S-PLM explicitly incorporates structural information due to its pre-training on paired protein sequences and contact maps, enabling the direct encoding of spatial relationships and residue–residue interactions into its representations. T-SNE clustering results have shown that S-PLM achieves superior kinase group clustering compared to ESM2, underscoring its potential as an effective backbone model for PTM site prediction [27].
Although S-PLM already contains rich general knowledge learned from large-scale protein sequence data, we use prompt tuning to make task-specific adjustments for the prediction of post-translational modifications. The core idea of prompt tuning is to concatenate the task prompts with the protein sequence embeddings and input them into the pre-trained language model. This allows the Transformer operations to be performed while keeping the original model’s weights frozen. As a result, the final protein sequence embeddings are adjusted by the task prompts. Given that prompt tuning is highly sensitive to the initialization of the task prompts, it is crucial to initialize the prompts effectively. Therefore, we propose a novel initialization method for different tasks. First, we collect all the protein sequences from the training set and obtain their sequence embeddings by inputting them into S-PLM v2. Next, we extract 21-residue peptides centered on the PTM site and compute the average of their embeddings. These averages are then clustered into K clusters. Finally, we average the values within each cluster to generate the final prompt matrix. The specific initialization process is outlined below.
Step 1. Extracting Protein Embeddings
We utilized S-PLM v2 to generate embeddings for every residue in the training protein sequences. By feeding the entire training set into S-PLM, we obtained residue-level embeddings with a shape of N × 1280, where N represents the total number of residues. The value 1280 corresponds to the dimensionality of the embedding vector produced by S-PLM v2 for each residue. These embeddings capture rich, context-aware biochemical and structural information for each amino acid, providing a robust foundation for downstream PTM prediction.
Step 2. Generating PTM-Centered Embeddings
Since PTMs are often influenced by the local sequence environment surrounding the modified residue, we extract a 21-residue window centered on each PTM site to capture this context. This window includes a modified residue along with its ten upstream and ten downstream neighbors, effectively preserving the immediate biochemical environment. For each PTM site, the contextual window Si is defined as
S_i = \{E_{i-10}, E_{i-9}, \ldots, E_i, \ldots, E_{i+9}, E_{i+10}\}, \quad S_i \in \mathbb{R}^{21 \times 1280}
where E_i represents the embedding of the i-th residue. These windows provide a localized, high-dimensional representation of the sequence, which is essential for accurate PTM site modeling.
Step 3. Computing Mean Embeddings
Each window is then averaged to produce a single 1280-dimensional vector that represents the local environment of the PTM site. The mean embedding Ē_i (denoted with a bar to distinguish it from the residue embedding E_i) is computed for each PTM site as
\bar{E}_i = \frac{1}{21} \sum_{j=i-10}^{i+10} E_j, \quad \bar{E}_i \in \mathbb{R}^{1280}
This mean pooling simplifies the representation while retaining the essential pattern of this PTM context.
Step 4. Clustering PTM Representations
Instead of averaging all PTM site embeddings into a single global representation, we apply K-means clustering to group them into K distinct clusters. After experimenting with different values of K, we found that the best results were achieved when K = 500. Each cluster captures a recurring motif or feature pattern shared across different proteins, preserving the diversity and subtlety of PTM-specific contexts. Formally, each cluster Ck is defined as
C_k = \{\bar{E}_i \mid i \in \text{cluster } k\}
The rationale behind selecting K clusters is to preserve the inherent diversity and fine-grained patterns captured in the embedding space. Biological modifications (PTMs) often exhibit subtle yet meaningful variations, reflecting different functional contexts, regulatory mechanisms, or kinase substrate specificity. By clustering the embeddings into multiple distinct groups, we maintain these biologically relevant variations, allowing each cluster to represent a unique pattern or functional state more accurately. Direct averaging could obscure these subtle differences and lead to the loss of critical biological insights.
Step 5. Computing Cluster Centroids
For the k-th cluster, the centroid vector μ_k (the mean of the embeddings in the cluster) is computed as
\mu_k = \frac{1}{|C_k|} \sum_{\bar{E}_i \in C_k} \bar{E}_i, \quad \mu_k \in \mathbb{R}^{1280}
These centroids serve as prototypical representations of common PTM-related contexts.
Step 6. Constructing the Final Prompt Matrix
The 500 centroids obtained from clustering are then stacked to form a matrix that serves as a task-specific embedding. This matrix acts as a learnable guide for the model, capturing diverse PTM-related patterns and functional contexts. The final prompt embedding matrix M has a shape of K × 1280 and is defined as
M = [\mu_1; \mu_2; \ldots; \mu_K] \in \mathbb{R}^{K \times 1280}
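Steps 1–6 can be condensed into a short routine. The sketch below assumes the residue-level embeddings have already been extracted with S-PLM v2 and uses scikit-learn’s KMeans; the data-structure names are placeholders, not the authors’ implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Condensed sketch of the prompt-initialization procedure (Steps 1-6), assuming
# `residue_embeddings[p]` is the (L_p, 1280) S-PLM v2 embedding matrix for
# protein p and `ptm_sites[p]` lists its annotated PTM positions; both names
# are placeholders for the actual data structures.
def build_prompt_matrix(residue_embeddings, ptm_sites, k=500, window=10):
    window_means = []
    for p, emb in residue_embeddings.items():
        for i in ptm_sites.get(p, []):
            # Windows near the sequence ends are shortened to stay in bounds.
            lo, hi = max(0, i - window), min(len(emb), i + window + 1)
            window_means.append(emb[lo:hi].mean(axis=0))    # Steps 2-3
    window_means = np.stack(window_means)                   # (N_sites, 1280)
    kmeans = KMeans(n_clusters=k, n_init=10).fit(window_means)  # Step 4
    return kmeans.cluster_centers_                           # (k, 1280), Steps 5-6
```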

2.4. Multi-Task Training of MTPrompt-PTM

To improve both the performance and generalizability of the multi-task model, we adopt a knowledge distillation strategy, which transfers knowledge from a teacher model to a student model by training the student to imitate the teacher’s outputs. Unlike traditional training with one-hot labels, the teacher’s probability distribution over classes, referred to as soft labels, provides a richer and more informative learning signal. Clark et al. [35] demonstrated that using a single-task teacher model to guide a multi-task student model is significantly more effective than employing multiple teachers for multiple tasks. This is because the student benefits from exposure to a diverse set of PTM-specific teachers, similarly to how ensemble learning enhances generalization. Inspired by this, we apply knowledge distillation in our framework by using single-task models as teachers to train the multi-task model, enabling it to leverage both expert knowledge and shared task representations for improved performance.
As shown in Figure 3, the training process consists of two main steps. In Step 1, we independently train 13 single-task models for all 13 PTM types. These single-task models act as teacher models, with architectures nearly identical to that of the multi-task model, except for the absence of task-specific layers. After training, we use the single-task models to generate predictions for all training data. These predictions, serving as soft labels, capture subtle patterns and uncertainties often missed by traditional hard labels. In Step 2, we merge the training data from all PTM types to train the multi-task student model. The soft labels from the teacher models are used to guide the student model’s training. To further enhance the training, we adopt a teacher annealing strategy [35], progressively blending the teacher’s soft labels with ground truth annotations. This combination provides a refined supervisory signal, improving the model’s generalization and accuracy across diverse PTM types. To address potential class imbalances from simply concatenating all datasets, we apply a weighted loss function that combines soft labels (from teacher models) and hard labels (ground truth annotations). The weights are determined based on the dataset size, and the loss function is defined as
L(\theta) = \sum_{t \in T} \mathrm{weight}_t \sum_{(x_t^i, y_t^i) \in D_t} \ell\left(\gamma\, y_t^i + (1 - \gamma)\, f_t(x_t^i, \theta_t),\; f_t(x_t^i, \theta)\right), \quad \gamma \in (0, 1)
Here, T denotes the set of PTM tasks, D_t is the training data for task t, and weight_t is the task weight determined by the dataset size. For each task t, we first train a single-task teacher model with parameters θ_t and then use its predictions f_t(x_t^i, θ_t) to guide the multi-task student model with parameters θ. The variable γ controls the balance between hard and soft labels during training; specifically, we set γ = 0.5, reconstructing the training labels by averaging the teacher’s soft predictions and the ground truth.
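For a single task batch, the blended objective above can be written as follows; the use of binary cross-entropy and the helper names are our assumptions, not the authors’ exact implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative implementation of the distillation objective for one task batch;
# `teacher_probs` are the frozen single-task teacher's soft labels, and `gamma`
# blends them with the ground truth (gamma = 0.5 in the paper).
def distillation_loss(student_logits: torch.Tensor,
                      hard_labels: torch.Tensor,
                      teacher_probs: torch.Tensor,
                      gamma: float = 0.5,
                      task_weight: float = 1.0) -> torch.Tensor:
    blended_targets = gamma * hard_labels.float() + (1.0 - gamma) * teacher_probs
    loss = F.binary_cross_entropy_with_logits(student_logits, blended_targets)
    return task_weight * loss
```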

3. Results

3.1. Comparison with State-of-the-Art Tools

Here, we compare MTPrompt-PTM with several state-of-the-art PTM prediction tools, including MusiteDeep [17], PTMGPT2 [24], NetPhos 3.1 [10], NetOGlyc 4.0 [36], NetNGlyc 1.0 [11], GPS-SUMO 2.0 [18], CSS-Palm 4.0 [37,38], GPS-MSP [12], and MethylSight [13]. Table 3 presents the prediction results on the test set for all 13 PTM types, including phosphorylation (S, T, Y), N-linked glycosylation (N), O-linked glycosylation (S, T), ubiquitination (K), acetylation (K), methylation (K, R), SUMOylation (K), succinylation (K), and palmitoylation (C).
As shown in Table 3, MTPrompt-PTM outperformed all other tools across all 13 PTM types. For example, in terms of the Matthews correlation coefficient (MCC), phosphorylation (S) exhibited a substantial improvement, with MTPrompt-PTM achieving a 118.9% increase in the MCC compared to MusiteDeep. This trend continued with phosphorylation (T, Y), where our model outperformed MusiteDeep by 58.0% and 26.5%, respectively. In the case of O-linked glycosylation (S, T), MTPrompt-PTM showed improvements of 24.2% and 63.6% over MusiteDeep. For N-linked glycosylation (N), MTPrompt-PTM showed a smaller improvement of 3.2%. For SUMOylation (K), MTPrompt-PTM outperformed PTMGPT2 by 46.2%, while MusiteDeep did not achieve comparable results. This difference can be attributed to the fact that MusiteDeep uses a smaller dataset than the one used to train our model. Similarly, for ubiquitination (K) and acetylation (K), our model exceeded MusiteDeep by 140.3% and 16.7%, respectively. Since MusiteDeep does not provide a model for the prediction of succinylation, we compared MTPrompt-PTM only with PTMGPT2 for this PTM type. Here, our model achieved a 104.2% improvement over PTMGPT2 in succinylation (K). For palmitoylation (C), methylation (R), and methylation (K), MTPrompt-PTM surpassed MusiteDeep by 28.5%, 50.7%, and 13.3%, respectively. These results demonstrate that our model outperforms many existing PTM prediction tools and can be considered a leading tool for PTM site prediction.
Figure 4 and Figure 5 present the ROC curves and precision–recall curves on the test set for all PTM types across different methods. Since PTMGPT2 only provides binary predictions (i.e., whether a residue is a PTM site or not) without probability scores, we could not plot its full ROC and precision–recall curves. However, we calculated its TPR, FPR, precision, and recall to mark PTMGPT2 as a single point on the curves. The AUROC and AUPRC values of all methods are consistent with the results in Table 3, further demonstrating that our model outperforms the other approaches.
To further evaluate the kinase-level specificity and robustness of MTPrompt-PTM, we conducted a comparative analysis across several kinase families. Table 4 summarizes the number of kinase sites included in both the training and testing sets, as well as the number of correctly predicted kinase sites and the corresponding accuracy for each model. Our results demonstrate that MTPrompt-PTM consistently achieves high accuracy across most kinase families. Particularly in the CMGC and AGC families, which have relatively large numbers of kinase sites in both the training and testing sets, MTPrompt-PTM significantly outperforms other models, indicating its strong generalization abilities in data-rich settings. These findings suggest that our multi-task prompt tuning framework and structure-aware backbone not only improve overall PTM prediction but also enhance kinase-specific site recognition. This highlights the potential of MTPrompt-PTM as a generalizable tool for kinase-centered PTM analysis. Notably, all models perform poorly on the “other” kinase family, with MTPrompt-PTM achieving accuracy of only 0.111 and the remaining models showing similarly low or inconsistent performance. This may be attributed to the relatively limited number of training samples available for this group (only 415 sites, compared to over 1700 for CMGC or 879 for AGC), which constrains the model’s ability to learn meaningful and generalizable features for these kinases. Additionally, the “other” category likely includes a diverse and heterogeneous set of kinases that do not share common sequence or structural motifs, further complicating the prediction task. This observation highlights a common challenge in PTM prediction: models tend to favor well-represented kinase families during training (e.g., AGC and CMGC), and their performance may degrade when applied to underrepresented or diverse groups (e.g., other).
Figure 6 illustrates the performance on test subsets with varying levels of sequence similarity to the training data, evaluated using the F1 score. The test subsets, containing protein sequences with no more than 60%, 70%, 80%, 90%, and 100% similarity to the training set, were generated using CD-HIT-2D. Across nearly all PTM types, MTPrompt-PTM consistently outperformed other methods regardless of the sequence similarity thresholds. However, for certain PTMs, we observe a noticeable decline in MTPrompt-PTM’s performance as the sequence similarity decreases. This degradation can be attributed to several factors. Some PTM types may have fewer annotated examples or less diversity in the training set. As a result, the model may be overfit to specific sequence patterns and struggle to generalize to more dissimilar sequences. In addition, certain PTMs, such as phosphorylation (S) and SUMOylation (K), are known to be highly context-dependent, often influenced by local sequence motifs or secondary structure elements. When the sequence similarity drops, these subtle cues may no longer be preserved, making it more challenging for the model to accurately identify modification sites. Another contributing factor may be the imbalance in positive and negative samples across the different similarity subsets. As the similarity threshold decreases, the number of true PTM sites that remain in the test set may decline disproportionately, leading to a more severe class imbalance and potentially skewing the model’s predictions. Despite these challenges, MTPrompt-PTM still maintains relatively strong performance across all similarity levels, underscoring its robustness compared to existing tools.
Figure 7 compares the performance of MTPrompt-PTM with that of MusiteDeep, PTMGPT2, and PTM-Mamba on independent benchmark datasets for phosphorylation and non-histone acetylation. For phosphorylation, MTPrompt-PTM consistently achieves the best overall performance, particularly excelling in its accuracy, precision, F1 score, and MCC. This suggests that our model makes more reliable and balanced predictions with fewer false positives and the better discrimination of true modification sites. PTM-Mamba shows the highest recall, indicating its sensitivity in detecting true sites, but this comes at the cost of lower precision, leading to more false positives. In non-histone acetylation, MTPrompt-PTM again outperforms other methods in terms of accuracy and precision, demonstrating its ability to correctly identify modification sites with fewer errors. While MusiteDeep and PTM-Mamba have comparable recall values, their lower precision reduces their overall predictive quality. The differences may be due to MTPrompt-PTM’s multi-task prompt tuning strategy and structure-aware embedding, which enhance its specificity and robustness across diverse PTM types. Overall, these results highlight that MTPrompt-PTM strikes a better balance between sensitivity and specificity, making it a more effective tool for PTM site prediction compared to competing methods.

3.2. Comparison with Single-Task Models on Different PTM Types

To evaluate the effectiveness of our proposed multi-task architecture, we compared it against 13 independently trained single-task models on our non-redundant dataset, with each model trained on a single PTM type. Both the multi-task and single-task models shared the same pre-trained backbone and utilized the same predefined task tokens, ensuring a fair comparison.
The results in Figure 8 indicate that phosphorylation (S, T, and Y) exhibited varying degrees of improvement in the multi-task model. Phosphorylation (S) showed only a 2% improvement in the AUPRC, likely due to its large training dataset, which may reduce its dependency on knowledge transfer from other PTM types. In contrast, phosphorylation (T) showed substantial gains, with threonine improving by 9% (AUPRC), suggesting that phosphorylation (T) benefits from shared knowledge with phosphorylation (S). O-linked glycosylation (S and T) demonstrated strong improvements across all metrics, with O-linked glycosylation (S) achieving improvements of 11.1% (AUPRC) and O-linked glycosylation (T) seeing the highest gains, with the AUPRC increasing by 11.2%, suggesting a possible interaction with phosphorylation. N-linked glycosylation (N) performed slightly worse in the multi-task setting, with an AUPRC decrease of 0.1%, indicating that asparagine may be more independent and less influenced by other PTM types. Acetylation (K), ubiquitination (K), succinylation (K), and SUMOylation (K) benefited from the multi-task model, showing improvements of 9.5%, 2.9%, 10.2%, and 2.5% (AUPRC), suggesting that PTMs occurring on lysine (K) may support each other through shared information. Overall, the results demonstrate that multi-task learning improves the performance for PTMs that share functional or structural similarities, such as phosphorylation and O-linked glycosylation. However, PTMs that are more independent, such as N-linked glycosylation (N), may not benefit as much from multi-task training. Additionally, PTMs on lysine (K) appear to influence each other, as seen in the gains for acetylation, ubiquitination, SUMOylation, and succinylation. The ROC curves are shown in Supplementary Figure S1. From the AUROC, we observe that the performance of the multi-task model is similar to that of the single-task model. This could be due to the imbalance between negative and positive samples. The improved AUPRC of the multi-task model likely arises from its ability to generalize across multiple PTM types, which helps to enhance its precision and recall for the positive class, especially in the presence of class imbalances. While the multi-task setup does not significantly impact the AUROC (as the AUROC is less sensitive to class imbalances), it has a more pronounced effect on the precision–recall performance.
The F1 and MCC scores, presented in Table 5, also reflect the improvements of the multi-task model. From this table, we can see that MTPrompt-PTM achieves better performance than the single-task models for most PTM types. Notably, for phosphorylation (T), phosphorylation (Y), O-linked glycosylation (S and T), and SUMOylation (K), the multi-task model yields higher F1 and MCC scores, indicating that leveraging shared knowledge across tasks enhances the prediction accuracy for these modification types. For example, in O-linked glycosylation (T), the F1 and MCC scores of the multi-task model reach 0.716 and 0.685, respectively, outperforming the single-task model (0.670/0.634). Although the single-task model performs slightly better in a few PTMs, such as N-linked glycosylation (N) and methylation (R), the difference is marginal, suggesting that the multi-task framework does not significantly compromise the performance even in well-characterized or distinct PTMs. Overall, the results demonstrate that the multi-task model provides more stable and generalized performance across diverse PTM types, especially those with limited training data or weaker individual signals.

3.3. Ablation Study

3.3.1. Comparison with Multi-Task Model Without Knowledge Distillation on Different PTM Types

To assess the impact of knowledge distillation, we compared MTPrompt-PTM with a baseline multi-task model trained solely on hard labels without distillation. As shown in Table 6, MTPrompt-PTM consistently achieved higher F1 and MCC scores across all PTM types, demonstrating improved predictive accuracy and robustness.
Table 7 presents the AUROC and AUPRC comparisons, where MTPrompt-PTM achieves notably higher AUPRC values in most cases. For instance, in O-linked glycosylation (S) and (T), the AUPRC increased from 0.518 and 0.761 to 0.552 and 0.784, respectively. This improvement in the AUPRC is particularly significant given the class imbalance typically present in PTM site prediction tasks.
These results confirm that incorporating soft labels from single-task models during training enables the multi-task framework to capture richer probabilistic information and subtle interdependencies across PTM types. This additional knowledge improves its generalization, especially for low-signal or data-scarce PTMs, ultimately leading to better overall performance than training on hard labels alone.

3.3.2. Comparison with Fine-Tuning the Last Two Layers of S-PLM on Different PTM Types

To further evaluate the effectiveness of our prompt tuning strategy, we compared MTPrompt-PTM with a commonly used fine-tuning approach that updates only the last two layers of the pre-trained S-PLM model. This comparison assessed whether prompt tuning could match or exceed the performance while minimizing the computational overhead. As shown in Table 8, MTPrompt-PTM achieved higher F1 and MCC scores across nearly all PTM types. Notably, large gains are observed in phosphorylation (T) and O-linked glycosylation (T), where the F1/MCC scores increase from 0.403/0.392 and 0.548/0.554 to 0.461/0.432 and 0.716/0.685, respectively. These results indicate that prompt tuning significantly enhances both the precision and robustness in PTM site prediction. Although the fine-tuned model slightly outperforms MTPrompt-PTM in ubiquitination (K), methylation (K), and methylation (R), the performance gap is minimal.
Table 9 further confirms this trend through AUROC and AUPRC comparisons. MTPrompt-PTM shows a notable advantage in the AUPRC, especially for PTMs like O-linked glycosylation (S) and O-linked glycosylation (T), with improvements from 0.525 and 0.757 to 0.552 and 0.784, respectively. These metrics are particularly important in highly imbalanced datasets, where the ability to correctly identify true positives is critical. Furthermore, in phosphorylation (Y), the AUPRC improves from 0.499 to 0.504, and, in SUMOylation (K), from 0.354 to 0.369, reinforcing the consistency of the performance gains across diverse PTM types.
This improvement may be attributed to two key factors. First, by keeping the entire ESM2 model frozen, MTPrompt-PTM retains the broad, general-purpose protein representations learned during large-scale pre-training, avoiding the risk of overfitting or forgetting. Second, the integration of task-specific prompt embeddings enables fine-grained adaptation to each PTM type, capturing subtle biochemical cues that are often lost in shallow fine-tuning. In contrast, updating only the last two layers may be insufficient to extract the deep contextual information required for accurate PTM site identification.

4. Discussion

Exposing a model to a diverse set of tasks can serve as an effective form of regularization, reducing the risk of overfitting by encouraging the learning of generalizable patterns rather than memorizing task-specific details. A key advantage of multi-task learning lies in its ability to facilitate knowledge transfer between related tasks, i.e., improvements in one task can enhance the performance in others. Building on this principle, we developed MTPrompt-PTM, the first multi-task PTM prediction model capable of predicting 13 types of post-translational modifications (PTMs): phosphorylation (S, T, Y), N-linked glycosylation (N), O-linked glycosylation (S, T), ubiquitination (K), acetylation (K), methylation (K, R), SUMOylation (K), succinylation (K), and palmitoylation (C). At inference time, users simply need to provide a protein sequence along with the PTM type(s) that they wish to predict, making MTPrompt-PTM both versatile and user-friendly.
Unlike conventional PLM-based methods, MTPrompt-PTM leverages multi-task prompt tuning on the pre-trained S-PLM model, allowing it to adapt to diverse PTM types by incorporating task-specific signals. A decoder architecture composed of shared and task-specific layers further enables the model to capture both general and PTM-specific representations during training. To enhance the performance and generalization, knowledge distillation is employed, transferring insights from multiple single-task teacher models into a unified multi-task student model. Through extensive comparisons with single-task models and several state-of-the-art PTM prediction tools, MTPrompt-PTM consistently outperforms alternative methods across all PTM types, affirming the effectiveness of multi-task learning within this domain.
The effectiveness of MTPrompt-PTM can be attributed to three key factors. First, MTPrompt-PTM leverages the S-PLM v2 backbone, which captures both local and global sequence and structural information, providing a strong foundation for PTM prediction. Second, it employs multi-task prompt tuning, a lightweight fine-tuning method that efficiently adapts the PLM to the nuances of multiple PTM types while retaining the general-purpose knowledge encoded in the protein language model. This approach enables PTM prediction without compromising the pre-trained model’s integrity. Third, MTPrompt-PTM incorporates a multi-PTM training framework with a knowledge distillation strategy, facilitating shared learning across different PTM types. This strategy enhances the performance, particularly for PTMs with limited training data.
However, several limitations exist. First, due to the computational complexity of processing long sequences, MTPrompt-PTM can only accept protein sequences of up to 1022 residues. This limitation could restrict its applicability in real-world scenarios, where longer sequences are common. Second, to better simulate real-world conditions, we separated the training and testing sets based on timestamps; however, some sequence similarity between the training and testing sets remained. As this similarity decreases, the performance also decreases. This behavior suggests a form of data leakage: residual homology between the training and testing sets inflates performance on the more similar test sequences, and the model becomes overly specialized in patterns present in the training set and struggles to generalize to novel sequences in the test set.
In the future, we aim to extend our framework to support continuous learning, enabling it to accommodate additional modifications as new data become available. Expanding the dataset and incorporating more diverse annotations will improve the generalizability. However, challenges related to ensuring that continuous learning does not interfere with the performance of previous models need to be addressed. Additionally, the imbalanced nature of PTM training data may lead to biased predictions toward overrepresented classes. Future work should explore techniques such as focal loss, class reweighting, or data augmentation to address this imbalance and improve the model’s fairness and accuracy.

5. Conclusions

MTPrompt-PTM represents a step forward in the scalable, multi-task prediction of post-translational modification sites. Instead of relying on fragmented single-task approaches, it unifies 13 PTM types into one flexible framework, enabling users to make efficient, type-specific predictions from a single model. Its consistent performance across benchmark datasets, kinase-specific analyses, and external validation scenarios underscores its potential for broad application in bioinformatics. Looking ahead, MTPrompt-PTM provides a foundation for continuous, modular learning as PTM databases expand, supporting future developments in multi-label PTM annotation at the proteome scale.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom15060843/s1, Figure S1: Performance comparison of AUROC between multi-task and single-task on 13 PTM types.

Author Contributions

Conceptualization, D.W., Y.H. and D.X.; methodology, D.W., D.X., F.H. and Y.H.; software, Y.H.; formal analysis, D.W., F.H. and D.X.; data curation, Y.H.; writing—original draft preparation, Y.H.; writing—review and editing, D.W., F.H., D.X. and Q.S.; supervision, D.X.; project administration, D.X.; funding acquisition, D.X. and Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health (grant R35GM126985 to D.X.) and the National Institutes of Health (grant R01LM014510 to Q.S.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code, data, and trained model are available at GitHub (https://github.com/hanye311/MTPrompt-PTM/) (accessed on 6 June 2025).

Acknowledgments

The authors thank the anonymous reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Humphrey, S.J.; James, D.E.; Mann, M. Protein phosphorylation: A major switch mechanism for metabolic regulation. Trends Endocrinol. Metab. 2015, 26, 676–687. [Google Scholar] [CrossRef] [PubMed]
  2. Vu, L.D.; Gevaert, K.; De Smet, I. Protein language: Post-translational modifications talking to each other. Trends Plant Sci. 2018, 23, 1068–1080. [Google Scholar] [CrossRef] [PubMed]
  3. Deribe, Y.L.; Pawson, T.; Dikic, I. Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 2010, 17, 666–672. [Google Scholar] [CrossRef] [PubMed]
  4. Khoury, G.A.; Baliban, R.C.; Floudas, C.A. Proteome-wide post-translational modification statistics: Frequency analysis and curation of the swiss-prot database. Sci. Rep. 2011, 1, 90. [Google Scholar] [CrossRef]
  5. Zhu, H.; Bilgin, M.; Snyder, M. Proteomics. Annu. Rev. Biochem. 2003, 72, 783–812. [Google Scholar] [CrossRef]
  6. Olsen, J.V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127, 635–648. [Google Scholar] [CrossRef]
  7. Renart, J.; Reiser, J.; Stark, G.R. Transfer of proteins from gels to diazobenzyloxymethyl-paper and detection with antisera: A method for studying antibody specificity and antigen structure. Proc. Natl. Acad. Sci. USA 1979, 76, 3116–3120. [Google Scholar] [CrossRef]
  8. Chen, Z.; Liu, X.; Li, F.; Li, C.; Zhang, X.; Liu, B.; Zhou, Y.; Song, J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief. Bioinform. 2020, 21, 2065–2076. [Google Scholar] [CrossRef]
  9. Esmaili, F.; Pourmirzaei, M.; Ramazi, S.; Shojaeilangari, S.; Yavari, E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Sites Prediction. arXiv 2022, arXiv:2208.04311. [Google Scholar] [CrossRef]
  10. Blom, N.; Gammeltoft, S.; Brunak, S. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 1999, 294, 1351–1362. [Google Scholar] [CrossRef]
  11. Gupta, R.; Brunak, S. Prediction of glycosylation across the human proteome and the correlation to protein function. Pac. Symp. Biocomput. 2002, 7, 310–322. [Google Scholar] [PubMed]
  12. Deng, W.; Wang, Y.; Ma, L.; Zhang, Y.; Ullah, S.; Xue, Y. Computational prediction of methylation types of covalently modified lysine and arginine residues in proteins. Brief. Bioinform. 2017, 18, 647–658. [Google Scholar] [CrossRef] [PubMed]
  13. Biggar, K.K.; Ruiz-Blanco, Y.B.; Charih, F.; Fang, Q.; Connolly, J.; Frensemier, K.; Adhikary, H.; Li, S.S.C.; Green, J.R. MethylSight: Taking a wider view of lysine methylation through computer-aided discovery to provide insight into the human methyl-lysine proteome. bioRxiv 2018. bioRxiv:274688. [Google Scholar]
  14. Ertelt, M.; Mulligan, V.K.; Maguire, J.B.; Lyskov, S.; Moretti, R.; Schiffner, T.; Meiler, J.; Schroeder, C.T. Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins. PLoS Comput. Biol. 2024, 20, e1011939. [Google Scholar] [CrossRef]
  15. Wang, D.; Zeng, S.; Xu, C.; Qiu, W.; Liang, Y.; Joshi, T.; Xu, D. MusiteDeep: A deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 2017, 33, 3909–3916. [Google Scholar] [CrossRef]
  16. Wang, D.; Liang, Y.; Xu, D. Capsule network for protein post-translational modification site prediction. Bioinformatics 2019, 35, 2386–2394. [Google Scholar] [CrossRef]
  17. Wang, D.; Liu, D.; Yuchi, J.; He, F.; Jiang, Y.; Cai, S.; Li, J.; Xu, D. MusiteDeep: A deep-learning based webserver for protein post-translational modification site prediction and visualization. Nucleic Acids Res. 2020, 48, W140–W146. [Google Scholar] [CrossRef]
  18. Gou, Y.; Liu, D.; Chen, M.; Wei, Y.; Huang, X.; Han, C.; Feng, Z.; Zhang, C.; Lu, T.; Peng, D.; et al. GPS-SUMO 2.0: An updated online service for the prediction of SUMOylation sites and SUMO-interacting motifs. Nucleic Acids Res. 2024, 52, W238–W247. [Google Scholar] [CrossRef]
  19. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
  20. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rihawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Towards cracking the language of life’s code through self-supervised deep learning and high performance computing. arXiv 2020, arXiv:2007.06225. [Google Scholar]
  21. Pakhrin, S.C.; Pokharel, P.; Bhattarai, A.; Kc, D.B. LMNglyPred: Prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model. Glycobiology 2023, 33, 411–420. [Google Scholar] [CrossRef] [PubMed]
  22. Alkuhlani, A.; Gad, W.; Roushdy, M.; Voskoglou, M.G.; Salem, A.M. PTG-PLM: Predicting Post-Translational Glycosylation and Glycation Sites Using Protein Language Models and Deep Learning. Axioms 2022, 11, 469. [Google Scholar] [CrossRef]
  23. Pokharel, S.; Pratyush, P.; Ismail, H.D.; Ma, J.; KC, D.B. Integrating Embeddings from Multiple Protein Language Models to Improve Protein O-GlcNAc Site Prediction. Int. J. Mol. Sci. 2023, 24, 16000. [Google Scholar] [CrossRef] [PubMed]
  24. Shrestha, P.; Kandel, J.; Tayara, H.; Chong, K.T. Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model. Nat. Commun. 2024, 15, 6699. [Google Scholar] [CrossRef]
  25. Peng, F.Z.; Wang, C.; Chen, T.; Schussheim, B.; Vincoff, S.; Chatterjee, P. PTM-Mamba: A PTM-aware protein language model with bidirectional gated Mamba blocks. Nat. Methods 2025, 22, 945–949. [Google Scholar] [CrossRef]
  26. Bludau, I.; Willems, S.; Zeng, W.-F.; Strauss, M.T.; Hansen, F.M.; Tanzer, M.C.; Karayel, O.; Schulman, B.A.; Mann, M. The structural context of posttranslational modifications at a proteome-wide scale. PLoS Biol. 2022, 20, e3001636. [Google Scholar] [CrossRef]
  27. Wang, D.; Abbas, U.L.; Shao, Q.; Chen, J.; Xu, D. S-PLM: Structure-aware Protein Language Model via Contrastive Learning between Sequence and Structure. Adv. Sci. 2023, 12, e2404212. [Google Scholar] [CrossRef]
  28. Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. arXiv 2021, arXiv:2104.08691. [Google Scholar]
  29. Li, W.; Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22, 1658–1659. [Google Scholar] [CrossRef]
  30. Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110. [Google Scholar] [CrossRef]
  31. Hornbeck, P.V.; Kornhauser, J.M.; Tkachev, S.; Zhang, B.; Skrzypek, E.; Murray, B.; Latham, V.; Sullivan, M. PhosphoSitePlus: A comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 2012, 40, D261–D270. [Google Scholar] [CrossRef] [PubMed]
  32. Meng, L.; Chen, X.; Cheng, K.; Chen, N.; Zheng, Z.; Wang, F.; Sun, H.; Wong, K.-C. TransPTM: A transformer-based model for non-histone acetylation site prediction. Brief. Bioinform. 2024, 25, bbae219. [Google Scholar] [CrossRef] [PubMed]
  33. Zhang, Y.; Qin, Y.; Pourmirzaei, M.; Shao, Q.; Wang, D.; Xu, D. Enhancing Structure-aware Protein Language Models with Efficient Fine-tuning for Various Protein Prediction Tasks. bioRxiv 2025. bioRxiv:2025.04.23.650337. [Google Scholar]
  34. Jing, B.; Eismann, S.; Suriana, P.; Townshend, R.J.L.; Dror, R. Learning from Protein Structure with Geometric Vector Perceptrons. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021. [Google Scholar]
  35. Clark, K.; Luong, M.-T.; Khandelwal, U.; Manning, C.D.; Le, Q.V. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 5931–5937. [Google Scholar]
  36. Steentoft, C.; Vakhrushev, S.Y.; Joshi, H.J.; Kong, Y.; Vester-Christensen, M.B.; Schjoldager, K.T.; Lavrsen, K.; Dabelsteen, S.; Pedersen, N.B.; Marcos-Silva, L.; et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 2013, 32, 1478–1488. [Google Scholar] [CrossRef] [PubMed]
  37. Zhou, F.; Xue, Y.; Yao, X.; Xu, Y. CSS-Palm: Palmitoylation site prediction with a clustering and scoring strategy (CSS). Bioinformatics 2006, 22, 894–896. [Google Scholar] [CrossRef]
  38. Ren, J.; Wen, L.; Gao, X.; Jin, C.; Xue, Y.; Yao, X. CSS-Palm 2.0: An updated software for palmitoylation sites prediction. Protein Eng. Des. Sel. 2008, 21, 639–644. [Google Scholar] [CrossRef]
Figure 1. The distribution of positive and negative PTM sites across different PTM types.
Figure 2. The architecture of MTPrompt-PTM. During multi-task training, the model takes the task name and protein sequence as input. The task prompts are initialized using our proposed method based on the task name. The protein sequence is tokenized and embedded using the ESM2 tokenizer. The task prompts are then concatenated with the sequence embeddings and fed into the encoder. The encoder backbone is S-PLM v2, and the input passes through its 33 Transformer encoder layers. Residue-level representations (excluding the [CLS], [EOS], and task tokens) are then extracted and passed to the decoder. Throughout training, the parameters of S-PLM remain frozen, while the task prompts are updated through gradient descent. The decoder features a hybrid architecture with shared and task-specific layers. The shared component consists of two CNN Inception modules, each containing three 1D convolutional layers with varying kernel sizes, followed by concatenation and a fully connected layer. The task-specific layers take the shared residue representations and perform classification. Each task-specific head corresponds to a different PTM type, receiving residue representations and predicting whether the residue carries the respective PTM.
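To make the data flow described in Figure 2 concrete, below is a minimal PyTorch sketch of prompt tuning with a frozen encoder, a shared Inception-style decoder, and task-specific heads. It is an illustration rather than the authors' implementation: the encoder stand-in, the prompt length, the channel sizes, and names such as PromptTunedMultiTaskPTM are assumptions introduced here.

```python
import torch
import torch.nn as nn

class PromptTunedMultiTaskPTM(nn.Module):
    """Illustrative sketch: learnable task prompts + frozen encoder + shared and task-specific decoder layers."""

    def __init__(self, encoder, embed_dim=1280, n_prompt_tokens=16, tasks=("Phospho_S", "Acetyl_K")):
        super().__init__()
        self.encoder = encoder                      # stand-in for the frozen S-PLM v2 (ESM2-style) backbone
        for p in self.encoder.parameters():         # freeze all backbone weights
            p.requires_grad = False

        # One learnable prompt (n_prompt_tokens x embed_dim) per PTM task.
        self.prompts = nn.ParameterDict({
            t: nn.Parameter(torch.randn(n_prompt_tokens, embed_dim) * 0.02) for t in tasks
        })

        # Shared decoder: a simplified Inception-style block with three kernel sizes.
        self.branches = nn.ModuleList([
            nn.Conv1d(embed_dim, 128, k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.shared_fc = nn.Linear(128 * 3, 256)

        # Task-specific classification heads (per-residue binary decision: PTM site or not).
        self.heads = nn.ModuleDict({t: nn.Linear(256, 2) for t in tasks})

    def forward(self, token_embeddings, task):
        # token_embeddings: (batch, seq_len, embed_dim), produced by the frozen embedding layer
        prompt = self.prompts[task].unsqueeze(0).expand(token_embeddings.size(0), -1, -1)
        x = torch.cat([prompt, token_embeddings], dim=1)    # prepend task prompt tokens
        h = self.encoder(x)                                 # frozen transformer layers
        h = h[:, prompt.size(1):, :]                        # drop prompt positions, keep residue positions
        h = h.transpose(1, 2)                               # (batch, embed_dim, seq_len) for Conv1d
        h = torch.cat([torch.relu(b(h)) for b in self.branches], dim=1)
        h = self.shared_fc(h.transpose(1, 2))               # back to (batch, seq_len, 256)
        return self.heads[task](h)                          # per-residue logits for this task
```

In this setup only the prompts, the shared decoder, and the task heads receive gradients, which mirrors the parameter-efficient training described in the caption.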
Figure 3. Training process of MTPrompt-PTM. In Step 1, we independently train 13 single-task models, each corresponding to a different PTM type, which serve as teacher models. The saturated orange ovals are teacher models. These models have architectures similar to that of the multi-task model, with the key difference being the absence of task-specific layers. The teacher models generate soft labels through predictions on the training data, capturing subtle patterns and uncertainties missed by traditional hard labels. The paler peach boxes immediately to their right are the soft-label outputs those teachers produce. In Step 2, we merge the training data from all PTM types to train the multi-task student model. The powder-blue oval is the student model that learns all tasks jointly. The soft labels from the teacher models guide the student model’s training. A teacher annealing strategy is applied, progressively blending the teacher’s soft labels with ground truth annotations to improve the model’s generalization and accuracy. The light-blue rectangle below it is the student’s prediction. To address class imbalances, we use a weighted loss function that combines both soft labels and hard labels, with weights determined by the dataset size.
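The objective described in Figure 3 can be viewed as a convex combination of a hard-label cross-entropy term and a soft-label distillation term, with the mixing weight annealed toward the ground truth as training proceeds (teacher annealing [35]). The sketch below is illustrative only; the linear annealing schedule, the temperature, and the per-task weight are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      step, total_steps, temperature=2.0, task_weight=1.0):
    """Teacher-annealing KD loss (sketch): blend soft and hard targets.

    lam grows from 0 to 1, so early training mostly follows the teacher's soft labels
    and late training relies mostly on the ground-truth hard labels.
    """
    lam = min(step / total_steps, 1.0)

    # Hard-label term: standard cross-entropy against ground-truth PTM annotations.
    hard_loss = F.cross_entropy(student_logits, hard_labels)

    # Soft-label term: KL divergence to the (temperature-smoothed) teacher distribution.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

    # task_weight lets tasks with larger or smaller datasets contribute proportionally (Figure 3).
    return task_weight * (lam * hard_loss + (1.0 - lam) * soft_loss)
```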
Figure 4. Performance comparison of AUC on 13 PTM types.
Figure 5. Performance comparison of PRAUC on 13 PTM types.
Figure 6. Performance on test subsets with different levels of sequence similarity to the training data, evaluated by F1. Test subsets containing protein sequences with no more than 60%, 70%, 80%, 90%, and 100% similarity to the training data were generated using CD-HIT-2D.
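Similarity-filtered test subsets of this kind can be produced by running CD-HIT-2D [29] with the training set as the reference database and the test set as the query. The snippet below is a hedged illustration: the file names, output naming, and cutoff-to-word-size pairing are assumptions, and the options should be checked against the CD-HIT documentation before use.

```python
import subprocess

# Word size (-n) should match the identity cutoff (-c); the pairing below follows the
# commonly cited CD-HIT guideline and is an assumption, not the authors' exact setting.
cutoff_to_word_size = {0.6: 4, 0.7: 4, 0.8: 5, 0.9: 5, 1.0: 5}

for cutoff, n in cutoff_to_word_size.items():
    # cd-hit-2d compares sequences in -i2 against the reference database in -i.
    subprocess.run(
        [
            "cd-hit-2d",
            "-i", "train.fasta",                           # reference database (training proteins)
            "-i2", "test.fasta",                           # query database (test proteins)
            "-o", f"test_le_{int(cutoff * 100)}.fasta",    # retained test sequences (hypothetical name)
            "-c", str(cutoff),                             # sequence identity threshold
            "-n", str(n),                                  # word size
        ],
        check=True,
    )
```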
Figure 7. Comparative analysis of PTM prediction tools on independent phosphorylation and acetylation data.
Figure 8. Performance comparison of PRAUC between multi-task and single-task models on 13 PTM types.
Table 1. Overview of representative PTM prediction tools.

Machine Learning-Based
  Models: NetPhos 3.1; NetNGlyc 1.0; GPS-MSP; MethylSight; Ertelt et al.
  Description: Use manually designed features with classical ML models such as ANNs or SVMs.
  Advantages: Easy to interpret and efficient for small datasets; the basis of well-established tools in early PTM prediction research.
  Disadvantages: Cannot capture long-range dependencies, rely heavily on expert-crafted features, and generalize poorly to unseen data.

Deep Learning-Based
  Models: MusiteDeep; CapsNet-PTM; GPS-SUMO 2.0
  Description: Leverage CNNs, CapsuleNets, and other DL architectures to automatically learn features from sequence data.
  Advantages: Automatically learn features from raw data and offer better performance on large-scale datasets.
  Disadvantages: Rely on local sequence windows, ignore structural information, are usually trained separately for each PTM type, and require large labeled datasets.

Protein Language Model-Based
  Models: LMNglyPred; PTG-PLM; O-GlcNAc; PTM-GPT2; PTM-Mamba
  Description: Use embeddings from large-scale pre-trained PLMs or fine-tune PLMs for PTM prediction.
  Advantages: Capture long-range sequence dependencies, benefit from massive pre-training, and support transfer learning and generalization.
  Disadvantages: Lack direct structural context and rarely leverage effective joint learning across multiple PTM types.
Table 2. UniProt annotations and number of downloaded protein sequences for different PTM types.

Phosphorylation (S) – 12,230 protein sequences
  Annotations: Phosphoserine; Diphosphoserine; O-(2-cholinephosphoryl)serine; (Microbial infection) Phosphoserine; O-(pantetheine 4′-phosphoryl)serine; (Microbial infection) O-(2-cholinephosphoryl) serine

Phosphorylation (T) – 8551 protein sequences
  Annotations: (Microbial infection) Phosphothreonine; Phosphothreonine

Phosphorylation (Y) – 3782 protein sequences
  Annotations: Phosphotyrosine

Ubiquitination (K) – 1225 protein sequences
  Annotations: (Microbial infection) Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin and interchain with MARCHF2); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in ubiquitin)

N-Linked Glycosylation (N) – 12,285 protein sequences
  Annotations: N-linked (GlcNAc…) (paucimannose) asparagine; N-linked (GlcNAc…) (keratan sulfate) asparagine; N-linked (GlcNAc…) (complex) asparagine; N-linked (GlcNAc) asparagine; N-linked (Glc…) asparagine; N-linked (GlcNAc…) (hybrid) asparagine; N-linked (GalNAc…) asparagine; N-linked (GlcNAc…) (polylactosaminoglycan) asparagine; N-linked (GlcNAc…) asparagine; N-linked (Hex) asparagine; N-linked (HexNAc…) asparagine; N-linked (GlcNAc…) (high mannose) asparagine

O-Linked Glycosylation (S) – 942 protein sequences
  Annotations: O-linked (Xyl…) (dermatan sulfate) serine; O-linked (Fuc…) serine; O-linked (Xyl…) (heparan sulfate) serine; O-linked (HexNAc…) serine; O-linked (Fuc) serine; O-linked (GalNAc…) serine; O-linked (Xyl…) serine; O-linked (Hex…) serine; O-linked (GlcA) serine; O-linked (GlcNAc) serine; O-linked (GalNAc) serine; O-linked (Man…) serine; O-linked (Xyl…) (glycosaminoglycan) serine; O-linked (Hex) serine; O-linked (GlcNAc…) serine; O-linked (Glc…) serine; O-linked (Xyl…) (chondroitin sulfate) serine; O-linked (Man) serine

O-Linked Glycosylation (T) – 694 protein sequences
  Annotations: O-linked (GlcNAc…) threonine; O-linked (Xyl…) (keratan sulfate) threonine; O-linked (Hex) threonine; O-linked (GalNAc) threonine; O-linked (GalNAc…) threonine; O-linked (GlcNAc) threonine; (Microbial infection) O-linked (Glc) threonine; O-linked (Fuc) threonine; O-linked (HexNAc) threonine; O-linked (Man6P…) threonine; O-linked (Man…) threonine; O-linked (Fuc…) threonine; O-linked (HexNAc…) threonine; O-linked (Hex…) threonine; O-linked (Man) threonine

Acetylation (K) – 6009 protein sequences
  Annotations: N6-acetyllysine; N6-acetyl-N6-methyllysine; (Microbial infection) N6-acetyllysine

Palmitoylation (C) – 1531 protein sequences
  Annotations: N-palmitoyl cysteine; S-palmitoyl cysteine

Methylation (R) – 1680 protein sequences
  Annotations: Asymmetric dimethylarginine; N5-[4-(S-L-cysteinyl)-5-methyl-1H-imidazol-2-yl]-L-ornithine (Arg-Cys) (interchain with C-151 in KEAP1); Symmetric dimethylarginine; Dimethylated arginine; Omega-N-methylated arginine; Omega-N-methylarginine

Methylation (K) – 578 protein sequences
  Annotations: N6-acetyl-N6-methyllysine; N6-methyllysine; N6,N6,N6-trimethyllysine; N6-methylated lysine; N6,N6-dimethyllysine

SUMOylation (K) – 3724 protein sequences
  Annotations: Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO1, SUMO2 and SUMO3); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in /SUMO5); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO3); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO1); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2 and SUMO3); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO1 and SUMO2); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO1P1/SUMO5); Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO2)

Succinylation (K) – 2069 protein sequences
  Annotations: N6-succinyllysine
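For readers who want to assemble a comparable dataset, sequences carrying the annotations listed in Table 2 can be retrieved programmatically from UniProt. The sketch below queries the public UniProt REST search endpoint for reviewed entries with a given "Modified residue" annotation; the ft_mod_res query field, the parameter names, and the omission of pagination are assumptions based on my understanding of the current API and should be verified against the UniProt documentation.

```python
import requests

# Hedged sketch: fetch reviewed UniProt entries carrying a given "Modified residue"
# annotation (e.g., Phosphoserine) in FASTA format. Endpoint and field names are
# assumptions; verify against the current UniProt REST documentation.
URL = "https://rest.uniprot.org/uniprotkb/search"

def fetch_ptm_fasta(annotation: str, out_path: str, size: int = 500) -> None:
    params = {
        "query": f'ft_mod_res:"{annotation}" AND reviewed:true',
        "format": "fasta",
        "size": size,   # entries per request; pagination via the Link header is omitted for brevity
    }
    resp = requests.get(URL, params=params, timeout=60)
    resp.raise_for_status()
    with open(out_path, "w") as fh:
        fh.write(resp.text)

# Example: one of the Phosphorylation (S) annotations listed in Table 2.
fetch_ptm_fasta("Phosphoserine", "phosphoserine_reviewed.fasta")
```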
Table 3. Performance comparison between different methods.

PTM Type | Method | Accuracy | F1 | MCC | Precision | Recall
Phosphorylation (S) | MTPrompt-PTM | 0.964 | 0.736 | 0.718 | 0.772 | 0.704
Phosphorylation (S) | MusiteDeep | 0.706 | 0.311 | 0.328 | 0.187 | 0.917
Phosphorylation (S) | PTMGPT2 | 0.821 | 0.281 | 0.225 | 0.198 | 0.483
Phosphorylation (S) | NetPhos3.1 | 0.289 | 0.155 | 0.09 | 0.085 | 0.905
Phosphorylation (T) | MTPrompt-PTM | 0.975 | 0.811 | 0.798 | 0.787 | 0.836
Phosphorylation (T) | MusiteDeep | 0.903 | 0.512 | 0.505 | 0.378 | 0.793
Phosphorylation (T) | PTMGPT2 | 0.88 | 0.32 | 0.272 | 0.252 | 0.439
Phosphorylation (T) | NetPhos3.1 | 0.429 | 0.153 | 0.103 | 0.085 | 0.802
Phosphorylation (Y) | MTPrompt-PTM | 0.973 | 0.873 | 0.858 | 0.877 | 0.87
Phosphorylation (Y) | MusiteDeep | 0.921 | 0.705 | 0.678 | 0.593 | 0.87
Phosphorylation (Y) | PTMGPT2 | 0.785 | 0.407 | 0.342 | 0.29 | 0.681
Phosphorylation (Y) | NetPhos3.1 | 0.613 | 0.262 | 0.154 | 0.165 | 0.634
O-Linked Glycosylation (S) | MTPrompt-PTM | 0.983 | 0.774 | 0.765 | 0.762 | 0.787
O-Linked Glycosylation (S) | MusiteDeep | 0.956 | 0.6 | 0.616 | 0.454 | 0.885
O-Linked Glycosylation (S) | PTMGPT2 | 0.929 | 0.363 | 0.351 | 0.273 | 0.541
O-Linked Glycosylation (S) | NetOGlyc4.0 | 0.718 | 0.151 | 0.163 | 0.085 | 0.672
O-Linked Glycosylation (T) | MTPrompt-PTM | 0.962 | 0.786 | 0.769 | 0.724 | 0.859
O-Linked Glycosylation (T) | MusiteDeep | 0.87 | 0.49 | 0.47 | 0.357 | 0.781
O-Linked Glycosylation (T) | PTMGPT2 | 0.892 | 0.386 | 0.329 | 0.355 | 0.422
O-Linked Glycosylation (T) | NetOGlyc4.0 | 0.801 | 0.267 | 0.196 | 0.19 | 0.453
N-Linked Glycosylation (N) | MTPrompt-PTM | 0.977 | 0.915 | 0.903 | 0.889 | 0.944
N-Linked Glycosylation (N) | MusiteDeep | 0.967 | 0.887 | 0.875 | 0.802 | 0.992
N-Linked Glycosylation (N) | PTMGPT2 | 0.944 | 0.808 | 0.782 | 0.73 | 0.905
N-Linked Glycosylation (N) | NetNGlyc1.0 | 0.294 | 0.233 | 0.033 | 0.136 | 0.825
SUMOylation (K) | MTPrompt-PTM | 0.97 | 0.838 | 0.823 | 0.878 | 0.802
SUMOylation (K) | MusiteDeep | 0.901 | 0.332 | 0.299 | 0.472 | 0.256
SUMOylation (K) | PTMGPT2 | 0.92 | 0.606 | 0.563 | 0.572 | 0.644
SUMOylation (K) | GPS-SUMO2.0 | 0.187 | 0.187 | 0.081 | 0.103 | 0.978
Ubiquitination (K) | MTPrompt-PTM | 0.956 | 0.782 | 0.764 | 0.879 | 0.704
Ubiquitination (K) | MusiteDeep | 0.681 | 0.367 | 0.318 | 0.236 | 0.831
Ubiquitination (K) | PTMGPT2 | 0.634 | 0.259 | 0.14 | 0.167 | 0.575
Succinylation (K) | MTPrompt-PTM | 0.983 | 0.933 | 0.923 | 0.944 | 0.922
Succinylation (K) | PTMGPT2 | 0.834 | 0.519 | 0.452 | 0.408 | 0.715
Acetylation (K) | MTPrompt-PTM | 0.981 | 0.899 | 0.889 | 0.924 | 0.877
Acetylation (K) | MusiteDeep | 0.948 | 0.778 | 0.762 | 0.671 | 0.926
Acetylation (K) | PTMGPT2 | 0.724 | 0.286 | 0.199 | 0.192 | 0.562
Palmitoylation (C) | MTPrompt-PTM | 0.974 | 0.926 | 0.91 | 0.943 | 0.909
Palmitoylation (C) | MusiteDeep | 0.916 | 0.759 | 0.708 | 0.774 | 0.745
Palmitoylation (C) | PTMGPT2 | 0.883 | 0.695 | 0.626 | 0.651 | 0.745
Palmitoylation (C) | CSS-Palm4.0 | 0.183 | 0.3 | -0.03 | 0.177 | 0.982
Methylation (R) | MTPrompt-PTM | 0.989 | 0.892 | 0.889 | 0.967 | 0.829
Methylation (R) | MusiteDeep | 0.915 | 0.565 | 0.59 | 0.399 | 0.967
Methylation (R) | PTMGPT2 | 0.921 | 0.54 | 0.54 | 0.403 | 0.819
Methylation (R) | GPS-MSP | 0.734 | 0.274 | 0.304 | 0.162 | 0.881
Methylation (K) | MTPrompt-PTM | 0.976 | 0.883 | 0.87 | 0.901 | 0.867
Methylation (K) | MusiteDeep | 0.952 | 0.791 | 0.768 | 0.728 | 0.867
Methylation (K) | PTMGPT2 | 0.869 | 0.529 | 0.477 | 0.427 | 0.695
Methylation (K) | GPS-MSP | 0.337 | 0.234 | 0.162 | 0.133 | 0.962
Methylation (K) | MethylSight | 0.384 | 0.19 | 0.022 | 0.11 | 0.686
Note: Numbers in bold represent the highest values achieved for each metric (Accuracy, F1, MCC, Precision, Recall) within a given PTM type.
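For reference, the metrics reported in Table 3 (and the AUROC/AUPRC values in Tables 7 and 9) can be computed from per-residue labels and prediction scores with scikit-learn. The sketch below is illustrative and is not the authors' evaluation script; the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score,
                             roc_auc_score, average_precision_score)

def evaluate_sites(y_true, y_score, threshold=0.5):
    """y_true: 0/1 PTM labels per candidate residue; y_score: predicted probabilities."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "AUROC": roc_auc_score(y_true, y_score),             # threshold-free ranking metric
        "AUPRC": average_precision_score(y_true, y_score),   # better reflects class imbalance
    }
```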
Table 4. Performance comparison of PTM prediction models across different kinase families.

Kinase Type | No. of Kinase Sites in Training Set | Method | No. of Kinase Sites in Testing Set | No. of Predicted Kinase Sites in Testing Set | Accuracy
AGC | 879 | MTPrompt-PTM | 17 | 15 | 0.882
AGC | 879 | MusiteDeep | 17 | 14 | 0.824
AGC | 879 | PTMGPT2 | 17 | 7 | 0.412
AGC | 879 | NetPhos3.1 | 17 | 12 | 0.706
CAMK | 392 | MTPrompt-PTM | 2 | 1 | 0.5
CAMK | 392 | MusiteDeep | 2 | 2 | 1
CAMK | 392 | PTMGPT2 | 2 | 1 | 0.5
CAMK | 392 | NetPhos3.1 | 2 | 2 | 1
CK1 | 48 | MTPrompt-PTM | 1 | 1 | 1
CK1 | 48 | MusiteDeep | 1 | 1 | 1
CK1 | 48 | PTMGPT2 | 1 | 0 | 0
CK1 | 48 | NetPhos3.1 | 1 | 1 | 1
CMGC | 1739 | MTPrompt-PTM | 47 | 43 | 0.915
CMGC | 1739 | MusiteDeep | 47 | 42 | 0.894
CMGC | 1739 | PTMGPT2 | 47 | 21 | 0.447
CMGC | 1739 | NetPhos3.1 | 47 | 40 | 0.851
Other | 415 | MTPrompt-PTM | 9 | 1 | 0.111
Other | 415 | MusiteDeep | 9 | 3 | 0.333
Other | 415 | PTMGPT2 | 9 | 1 | 0.111
Other | 415 | NetPhos3.1 | 9 | 2 | 0.222
STE | 174 | MTPrompt-PTM | 2 | 2 | 1
STE | 174 | MusiteDeep | 2 | 2 | 1
STE | 174 | PTMGPT2 | 2 | 2 | 1
STE | 174 | NetPhos3.1 | 2 | 2 | 1
TK | 753 | MTPrompt-PTM | 4 | 4 | 1
TK | 753 | MusiteDeep | 4 | 4 | 1
TK | 753 | PTMGPT2 | 4 | 2 | 0.5
TK | 753 | NetPhos3.1 | 4 | 2 | 0.5
Note: Numbers in bold represent the highest values achieved for accuracy within a given kinase type.
Table 5. Performance comparison of F1 and MCC between MTPrompt-PTM and separately trained single-task models.

PTM Type (Residue) | Multi-Task Model (F1/MCC) | Single-Task Model (F1/MCC)
Phosphorylation (S) | 0.428/0.384 | 0.429/0.383
Phosphorylation (T) | 0.461/0.432 | 0.439/0.406
Phosphorylation (Y) | 0.503/0.459 | 0.498/0.448
N-Linked Glycosylation (N) | 0.918/0.902 | 0.922/0.907
O-Linked Glycosylation (S) | 0.524/0.5 | 0.487/0.447
O-Linked Glycosylation (T) | 0.716/0.685 | 0.670/0.634
Palmitoylation (C) | 0.74/0.697 | 0.730/0.685
Acetylation (K) | 0.214/0.206 | 0.189/0.180
Ubiquitination (K) | 0.081/0.129 | 0.051/0.074
Succinylation (K) | 0.208/0.176 | 0.144/0.109
SUMOylation (K) | 0.352/0.342 | 0.361/0.332
Methylation (K) | 0/0.142 | 0.089/0.143
Methylation (R) | 0.431/0.414 | 0.470/0.454
Note: Numbers in bold represent the highest values achieved for F1 and MCC within a given PTM type.
Table 6. Performance comparison of F1 and MCC on MTPrompt-PTM and multi-task model without knowledge distillation.

PTM Type (Residue) | Multi-Task Model with Knowledge Distillation (F1/MCC) | Multi-Task Model without Knowledge Distillation (F1/MCC)
Phosphorylation (S) | 0.428/0.384 | 0.341/0.338
Phosphorylation (T) | 0.461/0.432 | 0.389/0.381
Phosphorylation (Y) | 0.503/0.459 | 0.448/0.424
N-Linked Glycosylation (N) | 0.918/0.902 | 0.916/0.901
O-Linked Glycosylation (S) | 0.524/0.5 | 0.45/0.446
O-Linked Glycosylation (T) | 0.716/0.685 | 0.667/0.634
Palmitoylation (C) | 0.74/0.697 | 0.719/0.678
Acetylation (K) | 0.214/0.206 | 0.160/0.164
Ubiquitination (K) | 0.081/0.129 | 0.081/0.129
Succinylation (K) | 0.208/0.176 | 0.044/0.062
SUMOylation (K) | 0.352/0.342 | 0.253/0.301
Methylation (K) | 0/0.142 | 0/0.142
Methylation (R) | 0.431/0.414 | 0.411/0.415
Note: Numbers in bold represent the highest values achieved for F1 and MCC within a given PTM type.
Table 7. Performance comparison of AUROC and AUPRC of MTPrompt-PTM and multi-task model without knowledge distillation.

PTM Type | Multi-Task Model with Knowledge Distillation (AUROC/AUPRC) | Multi-Task Model without Knowledge Distillation (AUROC/AUPRC)
Phosphorylation (S) | 0.866/0.409 | 0.866/0.411
Phosphorylation (T) | 0.878/0.436 | 0.879/0.429
Phosphorylation (Y) | 0.844/0.504 | 0.845/0.496
N-Linked Glycosylation (N) | 0.990/0.924 | 0.991/0.927
O-Linked Glycosylation (S) | 0.870/0.552 | 0.864/0.518
O-Linked Glycosylation (T) | 0.933/0.784 | 0.933/0.761
Palmitoylation (C) | 0.929/0.791 | 0.925/0.792
Acetylation (K) | 0.739/0.220 | 0.742/0.212
Ubiquitination (K) | 0.669/0.215 | 0.669/0.215
Succinylation (K) | 0.720/0.260 | 0.722/0.268
SUMOylation (K) | 0.798/0.369 | 0.797/0.359
Methylation (K) | 0.663/0.122 | 0.669/0.130
Methylation (R) | 0.893/0.438 | 0.910/0.450
Note: Numbers in bold represent the highest values achieved for AUROC and AUPRC within a given PTM type.
Table 8. Performance comparison of F1 and MCC of MTPrompt-PTM and multi-task model with fine-tuning in the last two layers of S-PLM.

PTM Type | Multi-Task Model with Prompt Tuning (F1/MCC) | Multi-Task Model with Fine-Tuning in Last Two Layers of S-PLM v2 (F1/MCC)
Phosphorylation (S) | 0.428/0.384 | 0.355/0.340
Phosphorylation (T) | 0.461/0.432 | 0.403/0.392
Phosphorylation (Y) | 0.503/0.459 | 0.450/0.427
N-Linked Glycosylation (N) | 0.918/0.902 | 0.916/0.900
O-Linked Glycosylation (S) | 0.524/0.5 | 0.49/0.481
O-Linked Glycosylation (T) | 0.716/0.685 | 0.548/0.554
Palmitoylation (C) | 0.74/0.697 | 0.695/0.651
Acetylation (K) | 0.214/0.206 | 0.214/0.206
Ubiquitination (K) | 0.081/0.129 | 0.082/0.156
Succinylation (K) | 0.208/0.176 | 0.11/0.124
SUMOylation (K) | 0.352/0.342 | 0.268/0.304
Methylation (K) | 0/0.142 | 0.048/0.152
Methylation (R) | 0.431/0.414 | 0.435/0.435
Note: Numbers in bold represent the highest values achieved for F1 and MCC within a given PTM type.
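The baseline compared in Tables 8 and 9 unfreezes only the last two of the 33 encoder layers instead of learning prompts. A generic sketch of that kind of partial unfreezing is shown below; the encoder.layers attribute and the use of layer indices assume an ESM2-style backbone and are not the authors' exact code.

```python
import torch.nn as nn

def unfreeze_last_n_layers(encoder: nn.Module, n: int = 2) -> None:
    """Freeze the whole encoder, then re-enable gradients for its last n transformer layers.

    Assumes the backbone exposes its transformer blocks as `encoder.layers`
    (as ESM2-style models do); adapt the attribute name for other backbones.
    """
    for p in encoder.parameters():
        p.requires_grad = False
    for layer in list(encoder.layers)[-n:]:
        for p in layer.parameters():
            p.requires_grad = True

# Only parameters with requires_grad=True (the unfrozen layers plus the decoder and heads)
# would then be passed to the optimizer, e.g.
# torch.optim.Adam(p for p in model.parameters() if p.requires_grad).
```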
Table 9. Performance comparison of AUROC and AUPRC of MTPrompt-PTM and multi-task model with fine-tuning in the last two layers of S-PLM.

PTM Type | Multi-Task Model with Prompt Tuning (AUROC/AUPRC) | Multi-Task Model with Fine-Tuning in Last Two Layers of S-PLM v2 (AUROC/AUPRC)
Phosphorylation (S) | 0.866/0.409 | 0.865/0.409
Phosphorylation (T) | 0.878/0.436 | 0.882/0.434
Phosphorylation (Y) | 0.844/0.504 | 0.837/0.499
N-Linked Glycosylation (N) | 0.990/0.924 | 0.991/0.930
O-Linked Glycosylation (S) | 0.870/0.552 | 0.845/0.525
O-Linked Glycosylation (T) | 0.933/0.784 | 0.922/0.757
Palmitoylation (C) | 0.929/0.791 | 0.927/0.795
Acetylation (K) | 0.739/0.220 | 0.739/0.220
Ubiquitination (K) | 0.669/0.215 | 0.672/0.207
Succinylation (K) | 0.720/0.260 | 0.717/0.259
SUMOylation (K) | 0.798/0.369 | 0.792/0.354
Methylation (K) | 0.663/0.122 | 0.704/0.289
Methylation (R) | 0.893/0.438 | 0.903/0.461
Note: Numbers in bold represent the highest values achieved for AUROC and AUPRC within a given PTM type.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
