1. Introduction
Ferroptosis, a recently discovered form of regulated cell death characterized by iron-dependent lipid peroxidation [1], is implicated in a range of pathologies, including cancer [2], neurodegenerative disorders [3], and ischemia–reperfusion injury [4]. First defined by Dixon et al. [5], this process is driven by the accumulation of lipid peroxides, which is executed or inhibited by a class of proteins known as ferroptosis-related proteins (FRPs). Key examples include the suppressor GPX4 (UniProtKB P36969) [6], the driver ACSL4 (UniProtKB O60488) [7], and various iron-regulating proteins [8]. Consequently, the accurate identification of FRPs is a critical step for advancing therapeutic strategies, such as targeting ferroptosis suppressor protein 1 (FSP1, UniProtKB Q9BRQ8) to overcome treatment resistance [9].
However, traditional experimental methods for identifying and characterizing FRPs, such as gene knockout or overexpression studies, are often time-consuming, laborious, and resource-intensive. Computational methods, particularly those based on machine learning, offer a promising alternative for large-scale and efficient FRP prediction. To our knowledge, only one computational tool, FRP-XGBoost [10], has been specifically developed for this purpose. The authors of that study employed four types of sequence-based features and evaluated six traditional machine learning classifiers. Although FRP-XGBoost achieved satisfactory accuracy in 10-fold cross-validation, its performance on the independent test set was notably lower, indicating a need for more robust predictive models. This performance gap suggests that there is significant room for improvement through the design of more advanced feature descriptors and more powerful classification algorithms.
Recent years have witnessed remarkable advancements in protein bioinformatics, largely driven by the development of deep learning models, particularly pre-trained protein language models (PLMs). Models like ESM2 [11] have revolutionized various protein prediction tasks, including the identification of disease-related proteins [12]. Trained on vast datasets of protein sequences, these powerful models excel at learning intricate patterns and long-range dependencies, leading to significant improvements in prediction accuracy over traditional computational methods. Despite their immense promise, the application of PLMs to FRP prediction remains largely unexplored. The prior work, FRP-XGBoost, relied on conventional machine learning and did not leverage the rich, contextual embeddings offered by PLMs. This represents a significant opportunity to enhance the accuracy and efficiency of FRP identification.
In this study, we introduce PLM-FRP, a novel machine learning framework designed for the accurate identification of FRPs by harnessing the power of PLMs. Our framework integrates traditional sequence-based features (dipeptide composition and DDE) with advanced contextual embeddings from the ESM2 model. These comprehensive features are then supplied to an optimized XGBoost classifier for robust prediction. The overall workflow of PLM-FRP is illustrated in Figure 1. By combining these diverse feature sets, PLM-FRP captures a holistic view of protein properties, from local sequence composition to global evolutionary and structural information. We demonstrate that our model significantly outperforms existing methods on a benchmark dataset, achieving state-of-the-art accuracy and representing a 4% improvement over the previous best-performing method [10]. This advance holds immense potential to accelerate ferroptosis research by enabling the rapid annotation of novel FRPs, thereby facilitating a deeper understanding of their signaling pathways and paving the way for new targeted therapies.
2. Materials and Methods
2.1. Benchmark Dataset
The performance of our proposed model was evaluated using the benchmark dataset curated by Lin et al. [10]. This dataset was constructed by sourcing known FRPs (drivers, suppressors, and markers) from the Ferroptosis Database (FerrDb V2) [13]. For the negative set, non-FRPs were selected from the Universal Protein Knowledgebase (UniProtKB) [14] under stringent criteria that excluded any protein with a known link to ferroptosis.
To mitigate potential bias from sequence homology, the CD-HIT tool was employed to remove redundant entries with a similarity threshold of 90%. The final, non-redundant dataset consists of 1150 positive (FRPs) and 1150 negative (non-FRPs) sequences, 2300 in total. For robust model validation, this dataset was partitioned into a training set of 1840 proteins (80%) and an independent test set of 460 proteins (20%), maintaining a balanced class distribution in both subsets.
2.2. Traditional Feature Extraction
In this study, we employed four widely used traditional feature extraction methods to represent protein sequences:
(i) Amino acid composition (AAC) [15] calculates the frequency of each amino acid in a protein sequence, resulting in a 20-dimensional feature vector.
$$\mathrm{AAC}(i) = \frac{N_i}{N}, \quad i = 1, \ldots, 20$$
where $N_i$ is the number of occurrences of amino acid i and N is the total number of amino acids in the sequence.
(ii) Composition of k-Spaced Amino Acid Pairs (CKSAAP) [16] captures the composition of amino acid pairs separated by k residues, providing information regarding short-range interactions.
$$\mathrm{CKSAAP}_k(a, b) = \frac{N^{(k)}_{ab}}{N^{(k)}_{\mathrm{total}}}$$
where $N^{(k)}_{ab}$ is the number of occurrences of pair $(a, b)$ separated by k residues and $N^{(k)}_{\mathrm{total}}$ is the total number of possible pairs in the sequence.
(iii) Dipeptide Deviation from Expected Mean (DDE) [17] measures the deviation of observed dipeptide frequencies from expected mean frequencies, reflecting preferences in local amino acid order.
$$\mathrm{DDE}(d) = \frac{D_c(d) - T_m(d)}{\sigma(d)}$$
where $D_c(d)$ is the observed frequency of dipeptide $d$, $T_m(d)$ is the expected frequency of dipeptide $d$, and $\sigma(d)$ is the standard deviation of dipeptide $d$.
(iv) Grouped Tripeptide Composition (GTPC) [18] groups amino acids based on physicochemical properties and calculates the frequency of tripeptides formed by these groups.
$$\mathrm{GTPC}(g_1, g_2, g_3) = \frac{N_{g_1 g_2 g_3}}{N_{\mathrm{tri}}}$$
where $N_{g_1 g_2 g_3}$ is the number of occurrences of tripeptides formed by groups $g_1$, $g_2$, and $g_3$, and $N_{\mathrm{tri}}$ is the total number of tripeptides in the sequence.
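As an illustration of the composition-based descriptors above, the following sketch computes AAC and DDE for a single sequence. The text does not spell out how the expected dipeptide frequency and its standard deviation are obtained; the codon-based theoretical mean and binomial variance used here follow the commonly cited DDE formulation and should be read as an assumption, as should the function names and the toy sequence.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Codons per amino acid in the standard genetic code (61 sense codons in total).
# Assumption: the expected dipeptide frequency is derived from codon counts.
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
N_CODONS = sum(CODONS.values())


def aac(seq):
    """Amino acid composition: N_i / N for each of the 20 residues."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dde(seq):
    """400-dimensional dipeptide deviation from expected mean."""
    n_pairs = len(seq) - 1  # number of overlapping dipeptides
    counts = {}
    for pos in range(n_pairs):
        pair = seq[pos:pos + 2]
        counts[pair] = counts.get(pair, 0) + 1
    features = []
    for a, b in product(AMINO_ACIDS, repeat=2):
        observed = counts.get(a + b, 0) / n_pairs
        expected = (CODONS[a] / N_CODONS) * (CODONS[b] / N_CODONS)
        variance = expected * (1 - expected) / n_pairs
        features.append((observed - expected) / sqrt(variance))
    return features


toy = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence, for illustration only
aac_vector = aac(toy)
dde_vector = dde(toy)
```

CKSAAP and GTPC follow the same counting pattern, with pairs taken at a gap of k residues or residues first mapped to physicochemical groups.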
2.3. PSSM Feature Extraction
Evolutionary information plays a vital role in deciphering protein function and structure. To leverage this evolutionary information, the PSI-BLAST tool [19] was used to generate a Position-Specific Scoring Matrix (PSSM) for each protein sequence. Following standard bioinformatics practice, PSI-BLAST was run against the comprehensive NCBI non-redundant (NR) protein database. The search parameters were set to three iterations (iter = 3) with an E-value threshold of 0.001 (E = 0.001), which are commonly used settings for generating robust PSSMs. The PSSM captures the evolutionary conservation of amino acids at each position, offering a more informative representation compared to the raw amino acid sequence. In this study, three PSSM-based features were employed to comprehensively capture evolutionary information from protein sequences and enhance classifier performance.
2.3.1. AADP-PSSM
The AADP-PSSM [20] method combines AAC and dipeptide composition (DPC) features derived from the normalized PSSMs.
AAC-PSSM: For each protein sequence, AAC-PSSM is calculated by averaging the normalized PSSM values across all positions for each of the 20 standard amino acids, resulting in a 20-dimensional feature vector.
$$\mathrm{AAC\text{-}PSSM}_i = \frac{1}{L} \sum_{j=1}^{L} P_{j,i}, \quad i = 1, \ldots, 20$$
where L is the length of the protein sequence, and $P_{j,i}$ is the normalized PSSM value at position j for amino acid type i.
DPC-PSSM: DPC-PSSM captures the frequency of dipeptide combinations within the evolutionary context. It is calculated by averaging the product of normalized PSSM values for adjacent amino acid pairs across all positions, resulting in a 400-dimensional feature vector.
$$\mathrm{DPC\text{-}PSSM}_{i,j} = \frac{1}{L-1} \sum_{k=1}^{L-1} P_{k,i}\, P_{k+1,j}, \quad i, j = 1, \ldots, 20$$
where L is the length of the protein sequence and $P_{k,i}$ and $P_{k+1,j}$ are the normalized PSSM values at positions k and k + 1 for amino acid types i and j, respectively.
The final AADP-PSSM feature vector is a concatenation of the AAC-PSSM and DPC-PSSM features, resulting in a 420-dimensional feature vector.
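The construction of the 420-dimensional AADP-PSSM vector can be sketched as below. The PSSM is represented as a list of L rows with 20 scores each; the logistic normalization is an assumption, since the text only states that the PSSM is normalized, and all function names are illustrative.

```python
from math import exp


def normalize_pssm(pssm):
    """Map raw PSSM scores into (0, 1) with a logistic function (assumed normalization)."""
    return [[1.0 / (1.0 + exp(-score)) for score in row] for row in pssm]


def aac_pssm(p):
    """Column means of the normalized PSSM: one value per amino acid type."""
    length = len(p)
    return [sum(row[i] for row in p) / length for i in range(20)]


def dpc_pssm(p):
    """Mean product of scores at adjacent positions for every ordered pair (i, j)."""
    length = len(p)
    return [
        sum(p[k][i] * p[k + 1][j] for k in range(length - 1)) / (length - 1)
        for i in range(20)
        for j in range(20)
    ]


def aadp_pssm(pssm):
    """Concatenate AAC-PSSM (20) and DPC-PSSM (400) into a 420-dimensional vector."""
    p = normalize_pssm(pssm)
    return aac_pssm(p) + dpc_pssm(p)


# Toy PSSM for a 5-residue sequence with all raw scores equal to zero.
toy_pssm = [[0] * 20 for _ in range(5)]
toy_features = aadp_pssm(toy_pssm)
```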
2.3.2. S-FPSSM
The S-FPSSM [21] (Sum-Frequency PSSM) method generates a 400-dimensional feature vector derived from the FPSSM matrix by row transformation. In this method, the FPSSM is a matrix where the negative values from the original PSSM have been filtered out (set to zero).
The S-FPSSM vector is constructed by considering each of the 20 amino acid types across the length of the protein sequence. Each element is calculated from the FPSSM and an indicator function that considers amino acid matches, giving one element $S_{i,j}$ for every pair of amino acid types i and j ($20 \times 20 = 400$ elements in total).
The value of $S_{i,j}$ is calculated as follows:
$$S_{i,j} = \sum_{k=1}^{L} fp_{k,j}\, \delta(r_k, a_i)$$
where L is the length of the protein sequence.
$fp_{k,j}$ represents the element in the kth row and jth column of the FPSSM matrix, that is, the non-negative part of the PSSM score for the jth amino acid type at sequence position k.
$\delta(r_k, a_i)$ is an indicator function defined as follows:
$$\delta(r_k, a_i) = \begin{cases} 1, & r_k = a_i \\ 0, & \text{otherwise} \end{cases}$$
$r_k$ represents the amino acid at position k in the original protein sequence.
$a_i$ represents the ith amino acid type in the standard amino acid order. The indicator therefore records whether the amino acid at position k in the protein sequence corresponds to the ith amino acid type.
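A minimal sketch of the S-FPSSM accumulation follows, assuming the usual reading in which the FPSSM scores at every position occupied by amino acid type i are summed column-wise into row i of a 20 × 20 matrix. The column order is assumed to be the one PSI-BLAST writes in its PSSM output, and the function name and toy input are hypothetical.

```python
# Column order assumed to match PSI-BLAST PSSM output.
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"


def s_fpssm(seq, pssm):
    """Return the 400-dimensional S-FPSSM vector for a sequence and its L x 20 PSSM."""
    # FPSSM: negative scores are filtered out (set to zero).
    fpssm = [[max(score, 0) for score in row] for row in pssm]
    matrix = [[0] * 20 for _ in range(20)]
    for k, residue in enumerate(seq):
        i = PSSM_ORDER.find(residue)
        if i < 0:
            continue  # skip non-standard residues
        for j in range(20):
            matrix[i][j] += fpssm[k][j]
    # Flatten row by row: element i * 20 + j corresponds to S_{i,j}.
    return [value for row in matrix for value in row]


# Toy example: a two-residue sequence "AR" with hand-written scores.
toy_scores = [[1] * 20, [-2] * 19 + [3]]
toy_vector = s_fpssm("AR", toy_scores)
```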
2.3.3. k-Separated Bigram-PSSM (KSB-PSSM)
This method extends the DPC-PSSM concept by considering amino acid pairs separated by k residues (k-spaced pairs or k-separated bigrams) within the PSSM profile [22]. It calculates the transfer probability of k-spaced amino acid pairs across all positions, resulting in a 400-dimensional feature vector.
The KSB-PSSM feature vector is a 400-dimensional vector, where each element is denoted as $T_{i,j}(k)$ and corresponds to the transfer probability of a pair of amino acids, i and j, separated by k residues. This is represented as follows:
$$\mathrm{KSB\text{-}PSSM}(k) = \left(T_{1,1}(k), T_{1,2}(k), \ldots, T_{20,20}(k)\right)$$
The calculation of a single element $T_{i,j}(k)$ is as follows:
$$T_{i,j}(k) = \sum_{t=1}^{L-k} P_{t,i}\, P_{t+k,j}$$
where L represents the protein sequence length and $P_{t,i}$ and $P_{t+k,j}$ represent the normalized PSSM values at positions t and t + k for amino acid types i and j, respectively.
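The k-separated bigram computation can be sketched in a few lines, assuming that each element is the sum over positions t of the product of the normalized PSSM values at positions t and t + k. The input is an already normalized L × 20 matrix; the function name and the toy values are illustrative only.

```python
def ksb_pssm(p, k):
    """400-dimensional k-separated bigram vector from a normalized L x 20 PSSM."""
    length = len(p)
    return [
        sum(p[t][i] * p[t + k][j] for t in range(length - k))
        for i in range(20)
        for j in range(20)
    ]


# Toy normalized PSSM: 6 positions, every value equal to 0.5.
toy_profile = [[0.5] * 20 for _ in range(6)]
toy_bigrams = ksb_pssm(toy_profile, k=2)
```

With k = 1 the sum runs over adjacent positions, which is the unaveraged counterpart of DPC-PSSM.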
By employing these three PSSM-based feature extraction methods, we aimed to capture a diverse range of evolutionary information embedded in the PSSM profiles, including overall amino acid composition, short-range dipeptide interactions, and long-range evolutionary dependencies, providing a comprehensive representation of protein sequences for FRP prediction.
2.4. PLM Feature Embedding
To capture rich contextual information and evolutionary relationships inherent in protein sequences, we utilized pre-trained PLMs such as ESM-1b, ESM2, and ProtBert. These models are all based on the Transformer architecture [23]. PLMs, trained on extensive protein sequence datasets, have shown significant success in capturing complex patterns and evolutionary dependencies essential for understanding protein function [24].
ESM-1b and ESM2 are large-scale models, each comprising 33 Transformer layers and approximately 650 million parameters [11,25]. ProtBert, another prominent PLM, is derived from Google’s BERT model and pre-trained on a massive dataset of 100 million protein sequences [26]. This pre-training allows ProtBert to learn general representations of protein sequences, effectively capturing the "language" of proteins. The depth and capacity of these models (ESM-1b, ESM2, and ProtBert) enable them to effectively learn intricate patterns in protein sequences [27], capturing both local and long-range interactions vital for accurate protein function prediction.
Figure 2 schematically illustrates the process of extracting and aggregating protein features from these pre-trained language models. As the sequence propagates through the Transformer layers, the model generates contextualized embeddings for each amino acid at each layer, capturing both the identity and the contextual information of each amino acid within the sequence [28]. This enables the model to discern the influence of neighboring amino acids on protein function. Hidden states from the last (33rd) layer of both ESM-1b and ESM2-650M models, and the output of the final layer for ProtBert, were extracted as feature representations for each protein sequence. For each protein sequence, the embeddings of all amino acids from the last layer were averaged to produce a single, fixed-length feature vector. Both ESM-1b and ESM2 models yield 1280-dimensional feature vectors (embeddings) for each amino acid, and thus the mean embedding for an entire protein sequence is also a 1280-dimensional vector. ProtBert similarly produces 1024-dimensional feature vectors for each amino acid, resulting in a 1024-dimensional mean embedding for the entire sequence.
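The averaging step can be illustrated without loading a language model. The sketch below assumes the per-residue embeddings are already available as a list of L vectors (1280 values each for ESM2, per the text); a 3-residue, 4-dimensional toy matrix stands in for them, and the function name is hypothetical.

```python
def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length sequence embedding."""
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / length for d in range(dim)]


# Toy stand-in for last-layer hidden states: 3 residues x 4 dimensions.
toy_hidden_states = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
sequence_embedding = mean_pool(toy_hidden_states)
```

The output length depends only on the embedding dimension, which is why sequences of different lengths map to vectors of the same size.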
Using these pre-trained PLMs, we extracted highly informative and contextualized representations of protein sequences, capturing intricate patterns and evolutionary dependencies essential for FRP prediction. The substantial parameter size and depth of ESM-1b, ESM2, and ProtBert enable them to directly learn complex representations from the data, outperforming traditional methods reliant on handcrafted features. This approach improves prediction accuracy by leveraging deep learning and large-scale pre-training to model the complex biological properties of protein sequences.
2.5. Feature Selection
MRMD3.0 [29], a robust dimensionality reduction tool employing an ensemble link analysis strategy, was used to identify the optimal feature subset. MRMD3.0 integrates multiple feature ranking algorithms, incorporating the PageRank algorithm, a well-established method for ranking nodes in a network based on their importance [30]. In the context of MRMD3.0, the PageRank algorithm is adapted to model features as nodes in a graph. The “importance” or “rank” of a feature node is then iteratively calculated based on its relevance to the target variable and its redundancy with other features. Features that are highly relevant to the target and less redundant with other high-ranking features receive higher PageRank scores, thereby contributing to a more effective feature ranking. The feature ranking algorithms included Information Gain, Gain Ratio, Gini Index, Chi-squared, ReliefF, minimum Redundancy Maximum Relevance (mRMR), and Mutual Information. By combining these algorithms, each with distinct strengths, MRMD3.0 generates a robust and comprehensive feature ranking.
The feature selection process in MRMD3.0 comprises two key steps. Initially, raw hybrid features are ranked based on importance scores calculated by the ensemble of ranking algorithms. Subsequently, a forward feature selection strategy, combined with 10-fold cross-validation, is used to iteratively add features to the model and assess its performance, measured by classification accuracy. The feature subset achieving the highest accuracy is selected as the optimal set [31]. This two-step strategy leverages the strengths of ensemble learning and PageRank, facilitating the identification of a robust and informative feature set for accurate FRP prediction.
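The second step, forward selection along a fixed ranking, can be sketched as follows. This is not MRMD3.0 itself: the ranking is taken as given, and `cv_accuracy` is a hypothetical callback standing in for a 10-fold cross-validated classifier evaluation.

```python
def forward_select(ranked_features, cv_accuracy):
    """Add features in ranked order and keep the prefix with the highest score."""
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = cv_accuracy(current)  # e.g. mean 10-fold accuracy on the training set
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy scorer: accuracy as a function of subset size, peaking at three features.
toy_scores = {1: 0.80, 2: 0.85, 3: 0.90, 4: 0.88}
ranking = ["f7", "f2", "f9", "f1"]  # hypothetical feature names, best-ranked first
selected, accuracy = forward_select(ranking, lambda subset: toy_scores[len(subset)])
```

Because every prefix of the ranking is evaluated, the cost grows linearly with the number of candidate features times the cost of one cross-validation run.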
Furthermore, an XGBoost classifier combined with LambdaRank, a ranking algorithm specifically designed for optimizing ranking tasks [32], was incorporated to complement MRMD3.0. While MRMD3.0 provides an initial robust feature subset, LambdaRank, integrated with XGBoost, serves as a secondary refinement step. It optimizes the feature ranking specifically for the classification task by leveraging gradient boosting to assign importance scores that directly improve the classifier’s performance. This enhanced feature selection process results in improved prediction accuracy and model robustness. This integrated approach ensures that the selected features are both informative and optimally ranked for the classification task [33].
2.6. Model Training
We employed a diverse set of machine learning algorithms, encompassing both traditional classifiers and deep learning models, to train our prediction models. This approach facilitated a thorough exploration of the strengths and limitations of each algorithm in capturing the intricate characteristics of FRPs.
2.6.1. Traditional Machine Learning Classifiers
We employed four widely adopted traditional machine learning classifiers to evaluate their effectiveness in predicting FRPs:
Support Vector Machine (SVM): A robust supervised learning algorithm that seeks the optimal hyperplane separating data into distinct classes [34]. By maximizing the margin between classes, SVM is effective in high-dimensional spaces. We employed the radial basis function (RBF) kernel, renowned for its ability to model non-linear relationships within the data.
Random Forest (RF): A popular ensemble learning method for classification and regression tasks. It extends decision trees by combining multiple trees to make predictions [35]. For classification, RF predicts the class label via majority voting among individual trees: each tree casts one vote, and the class with the most votes is returned.
Naive Bayes (NB): A probabilistic classifier based on Bayes’ theorem, which operates under the assumption of conditional independence between features given the class label [36]. Despite its simplicity, Naive Bayes classifiers have demonstrated strong performance in various classification tasks, particularly with high-dimensional data.
eXtreme Gradient Boosting (XGBoost): A scalable and efficient implementation of gradient boosting machines, celebrated for its high accuracy and efficiency in handling structured data [37,38]. XGBoost builds an ensemble of weak predictive models, typically decision trees, in a stage-wise manner and optimizes a differentiable loss function.
These classifiers were implemented using the scikit-learn library (v0.24.2) in Python 3.10 [39]. Hyperparameters were optimized through grid search and cross-validation to ensure optimal performance for each classifier.
2.6.2. Deep Learning Models
To further explore the potential of deep learning in FRP prediction, we employed two advanced architectures: 1D Convolutional Neural Networks (1D-CNNs) and Bidirectional Long Short-Term Memory networks (BiLSTMs). These architectures were chosen due to their proven capabilities in handling sequential data, which is characteristic of protein sequences, and their ability to capture complex hierarchical and temporal dependencies, respectively. While simpler models like Multilayer Perceptrons (MLPs) or Artificial Neural Networks (ANNs) can also process sequential data, they typically lack the specialized architectural inductive biases that make 1D-CNNs and BiLSTMs particularly effective at extracting meaningful features from long and intricate sequences like proteins. Therefore, we focused on these more advanced deep learning models to maximize the predictive power for FRPs.
1D-CNNs are powerful architectures adept at processing sequential data such as protein sequences [40]. Our 1D-CNN architecture (Figure 3a) comprises convolutional layers, max-pooling layers, and fully connected layers, incorporating ReLU activation functions and dropout regularization to prevent overfitting. This design facilitates the extraction of hierarchical feature representations, capturing both local and global patterns within the sequences.
BiLSTM networks, a specialized variant of Recurrent Neural Networks (RNNs), are designed to capture long-range dependencies within sequential data [41]. BiLSTM processes sequences in both forward and backward directions, enabling it to learn comprehensive representations of the input sequences. Our BiLSTM architecture (Figure 3b) consists of two bidirectional LSTM layers followed by a fully connected layer, effectively capturing temporal dependencies and contextual information critical for accurate prediction [42].
For both 1D-CNN and BiLSTM models, we extensively utilized dropout layers (with a rate of 0.5) within the network architectures and implemented Early Stopping during training. Early Stopping monitored the validation loss, halting training when no further improvement was observed, thereby ensuring that the models generalized well to unseen data.
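The Early Stopping rule described above can be sketched independently of any deep learning framework. The patience value is not given in the text, so the value of 3 below, like the function name and the loss curve, is purely illustrative.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch  # halt; weights from best_epoch are kept
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve (epochs are zero-indexed).
toy_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(toy_losses, patience=3)
```

In this toy curve training halts before the late improvement at the last epoch is seen, which illustrates the trade-off involved in choosing the patience.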
2.6.3. Hyperparameter Optimization and Cross-Validation
To ensure optimal performance for each classifier and deep learning model, we employed grid search to systematically explore a range of hyperparameter values and identify the best configuration [43]. Grid search involves exhaustively searching through a specified subset of hyperparameters, allowing us to fine-tune the models for optimal performance [44]. We utilized 10-fold cross-validation, a robust technique that partitions the training data into ten folds [45]. In each iteration, nine folds were used for training the model, and the remaining fold was used for validation, rotating the folds to guarantee that each data point was used for validation exactly once. This approach mitigates the risk of overfitting and provides a reliable estimate of the model’s generalization ability.
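The fold rotation can be sketched with the standard library alone. The generator below is a simplified stand-in for a library routine (it shuffles once and does not stratify by class); the seed and the function name are assumptions, and the sample count matches the training set size stated in Section 2.1.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for m, fold in enumerate(folds) if m != held_out for idx in fold]
        yield train, validation


# 1840 training proteins split into ten folds of 184 each.
splits = list(k_fold_indices(1840, k=10))
```

In a grid search, this loop would be repeated for every hyperparameter combination and the configuration with the best mean validation score retained.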
Table 1 details the final set of hyperparameters chosen for each model following grid search and 10-fold cross-validation. By employing a diverse array of machine learning algorithms, optimizing their hyperparameters through systematic grid search, and utilizing robust cross-validation techniques, we aimed to develop highly accurate and generalizable prediction models for FRPs.
2.7. Model Evaluation
An objective assessment of the predictive performance of our classifiers was conducted using a 10-fold cross-validation approach. The dataset was divided into ten partitions, with nine partitions used for training and one for testing. This process was iterated ten times, and the average accuracy across these iterations was used as the final estimate of model performance.
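To comprehensively evaluate the models, five commonly used metrics were employed: sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), accuracy (Acc), and area under the receiver operating characteristic (ROC) curve (AUC). The formulas for the first four metrics are as follows:
$$\mathrm{Sn} = \frac{TP}{TP + FN}$$
$$\mathrm{Sp} = \frac{TN}{TN + FP}$$
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Acc measures the proportion of correctly predicted samples over the total samples. Sn measures the proportion of actual positives correctly identified, also known as Recall. Sp measures the proportion of actual negatives correctly identified. MCC considers true and false positives and negatives to provide a balanced evaluation, while AUC indicates the model’s ability to distinguish between classes.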
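As a worked example, the threshold-based metrics can be computed directly from confusion-matrix counts. The counts below are hypothetical and are not results from this study; the function name is likewise illustrative.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for a balanced test set of 200 proteins.
metrics = binary_metrics(tp=90, tn=85, fp=15, fn=10)
```

AUC is not included because it depends on the ranking of predicted probabilities rather than on a single confusion matrix.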
3. Results and Discussion
3.1. Classifier Selection
To evaluate the efficacy of various feature extraction methods in capturing the characteristics of FRPs, we assessed the performance of each feature representation using six distinct classifiers: SVM, XGBoost, RF, NB, 1D-CNN, and BiLSTM. Figure 4 presents the ROC curves for each feature set, with the corresponding AUC values provided in the legend.
Analysis of the ROC curves and AUC values (Figure 4) revealed that SVM and XGBoost consistently delivered the highest performance across the various single-feature representations. This suggests their superior capability in discerning the complex patterns associated with FRPs. Furthermore, the DDE and ESM2 features consistently yielded high AUC values across multiple classifiers, demonstrating their strong discriminative power for distinguishing between FRPs and non-FRPs.
Based on these findings, we selected SVM and XGBoost for subsequent comparative analysis. Figure 5 provides a detailed performance comparison, based on classification accuracy, between the SVM and XGBoost classifiers across the different single-feature sets.
As illustrated in Figure 5, the XGBoost classifier consistently outperformed SVM across most individual feature representations, achieving higher accuracy in 8 out of 10 cases. This superior performance indicates XGBoost’s enhanced capacity to effectively capture complex patterns and interactions within the feature spaces derived from these methods. The gradient boosting framework of XGBoost constructs an ensemble of weak predictive models, typically decision trees, in a stage-wise fashion while optimizing a differentiable loss function. This process enhances its ability to model non-linear relationships and feature interactions, making it particularly well-suited for the high-dimensional and complex nature of FRP prediction. Moreover, XGBoost’s built-in regularization techniques help prevent overfitting, further contributing to its robust performance. These findings validate our selection of XGBoost as the primary classifier for subsequent analyses and model development.
3.2. Performance Comparison with Hybrid Features
To explore the potential of integrating complementary features for enhancing prediction accuracy, a series of ablation experiments were conducted to evaluate the performance of different feature combinations. Initially, combinations of traditional features, PSSM-based features, and embeddings from large models like ESM-1b and ESM2 were tested. While these combinations improved prediction accuracy, strategic feature selection emphasizing evolutionary information and structural properties further enhanced performance. To systematically evaluate the contribution of each feature type, ablation experiments were performed using optimal traditional features, PSSM features, and large model embeddings. Specific combinations tested included traditional features alone, PSSM features alone, large model embeddings alone, and their pairwise combinations. It is noteworthy that among the tested large model embeddings, ProtBERT features (Figure 4j) showed comparatively lower performance than ESM2 for this task, leading us to prioritize ESM2 for further feature combinations. Results indicated that while each feature type individually contributed to prediction accuracy, their combinations offered complementary perspectives on protein sequence characteristics, leading to further enhancements.
As detailed in Table 2, the combination of DDE and ESM2 features achieved the highest performance across most evaluation metrics, with an accuracy of 94.78% and an MCC of 0.896. This result highlights the powerful synergy between traditional, sequence-derived statistics and modern, deep learning-based representations. The DDE feature effectively captures local compositional biases, while the ESM2 embeddings provide rich, context-aware information about global evolutionary and structural properties. This fusion of complementary information proved to be the most effective strategy, outperforming other combinations, including those incorporating PSSM-based features.
The DDE + ESM2 combination demonstrated the highest prediction accuracy, underscoring the importance of integrating features that provide complementary insights into protein sequences. The DDE feature quantifies how far observed dipeptide frequencies deviate from their expected values, while the ESM2 embeddings capture context-dependent structural and functional information. This synergy between local compositional and global contextual properties proved to be the most effective in our experiments.
In conclusion, our ablation study confirms that integrating DDE features with ESM2 embeddings creates a highly discriminative feature set for FRP prediction. This finding underscores the value of a hybrid approach, which leverages diverse feature types to capture a multifaceted view of protein sequences, ultimately leading to more accurate and robust predictive models.
3.3. Classification Results and Model Performance
Building upon the optimal DDE + ESM2 feature combination, we employed a LambdaRank-based feature selection strategy to further refine the feature set and enhance model performance. This advanced selection process identified a highly informative subset of features, effectively reducing dimensionality, mitigating the risk of overfitting, and improving model generalizability. Subsequently, we trained our final XGBoost classifier, previously identified as the top-performing algorithm, on this optimized feature subset.
Figure 6 illustrates the relationship between the number of selected features and the corresponding classification accuracy.
The final optimized feature set, derived from the DDE and ESM2 embeddings, resulted in a 216-dimensional input vector for the XGBoost classifier. To rigorously assess the model’s robustness and generalizability, we performed 10-fold cross-validation on the training set. The model achieved an impressive average accuracy of 96.52% across the 10 folds, with a peak accuracy of 97.26% in 1 fold. This consistently high performance across different data partitions underscores the model’s strong ability to generalize to unseen data, indicating its robustness and reliability.
The performance of our final model, named PLM-FRP, was validated on the independent test set. As shown in Table 3, PLM-FRP achieved a remarkable accuracy of 96.09%, significantly surpassing the performance of existing FRP prediction methods.
These exceptional results, corroborated by the high accuracy in 10-fold cross-validation, highlight the power of combining PLM embeddings with a robust feature selection strategy to achieve state-of-the-art performance in FRP prediction. This approach not only improves prediction accuracy but also yields a more focused and interpretable model, enabling a deeper understanding of the features driving the prediction.
3.4. Model Interpretation
To elucidate the decision-making process of our final model, PLM-FRP, we employed SHapley Additive exPlanations (SHAP) [46]. The SHAP framework provides a robust method for interpreting machine learning models by quantifying the contribution of each feature to an individual prediction. By leveraging principles from cooperative game theory, SHAP fairly allocates the impact of each feature on the model’s output, thereby offering a comprehensive understanding of feature importance.
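The game-theoretic allocation behind SHAP can be made concrete with a brute-force Shapley computation on a toy value function. This is an illustration of the underlying definition only; SHAP implementations for tree models use far more efficient algorithms, and the value function below is invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value` over n_features players."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if 0 in coalition:
        score += 2.0
    if 1 in coalition:
        score += 1.0
    if 0 in coalition and 1 in coalition:
        score += 0.5  # interaction term, split equally between features 0 and 1
    return score


phi = shapley_values(toy_value, n_features=3)
```

The contributions sum to the difference between the full-coalition output and the empty-coalition output, which is the additivity property that makes per-feature attributions comparable across predictions.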
To visualize the feature space and the effectiveness of our selection strategy, we first employed t-Distributed Stochastic Neighbor Embedding (t-SNE) [47]. As shown in Figure 7, the t-SNE plots reveal a markedly improved separation between positive (FRPs) and negative (non-FRPs) samples after feature selection. This observation underscores the efficacy of our feature selection pipeline in identifying a highly discriminative feature subset that enhances the model’s ability to distinguish between the two classes. To further examine the relationships between the most predictive features, a correlation heatmap was also generated (Figure 8).
To further analyze the impact of individual features, we generated a SHAP summary plot for the top 20 features from the optimized DDE + ESM2 combination (Figure 9). This plot illustrates both the magnitude and direction of each feature’s influence on the model’s output. Features with positive SHAP values push the prediction towards the FRP class, while negative values push it towards the non-FRP class. The color of each point indicates the feature’s value, providing deeper insight into its behavior.
To gain further insights into the interactions between features, we generated a SHAP decision plot for the 216-dimensional feature set extracted from the combined DDE + ESM2 features. This plot illustrates the cumulative contribution of each feature to the model’s decision-making process, highlighting the most influential features and their interactions (
Figure 10).
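The cumulative view behind a decision plot follows from the additivity of SHAP values: the base value plus the sum of all feature contributions equals the model output for a sample. The sketch below traces that path for one sample, assuming the explanation is expressed in log-odds space (a common setting for tree-based binary classifiers). The base value and contributions are invented for illustration.

```python
import math


def decision_path(base_value, shap_values):
    """Accumulate SHAP contributions from least to most influential feature.

    Returns the feature order used and the running model output,
    starting at base_value and ending at the final prediction.
    """
    order = sorted(range(len(shap_values)), key=lambda j: abs(shap_values[j]))
    path = [base_value]
    for j in order:
        path.append(path[-1] + shap_values[j])
    return order, path


def sigmoid(x):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical explanation of a single sample (illustration only).
base = -0.5
contribs = [0.9, -0.2, 0.05, 0.6]

order, path = decision_path(base, contribs)
final_probability = sigmoid(path[-1])
print("feature order:", order)
print("final log-odds:", round(path[-1], 3))
print("final probability:", round(final_probability, 3))
```

Each step in `path` corresponds to one segment of a line in the decision plot, so large jumps mark the features that dominate the prediction for that sample.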
While SHAP analysis effectively quantifies individual feature contributions, directly mapping high-importance features from the combined DDE and ESM2 set to specific biological motifs presents inherent challenges. This difficulty arises from the abstract, high-dimensional nature of ESM2 embeddings, which capture complex, non-linear dependencies, and from the statistical derivation of DDE features. Nevertheless, the substantial contributions of these features suggest that our model leverages underlying biological signals. For instance, the importance of DDE features indicates the model’s sensitivity to local amino acid environments, while the prominent role of ESM2 embeddings points to the capture of global protein characteristics relevant to FRP prediction. Taken together, these observations suggest that PLM-FRP learns biologically meaningful patterns, even if they are not immediately interpretable as discrete biological units.
Through this comprehensive analysis, we have gained a deeper understanding of the model’s decision-making process, revealing the interplay among key features and their influence on FRP prediction. These insights not only enhance confidence in the model’s reliability but also contribute to a better understanding of the underlying biological factors associated with ferroptosis.
3.5. Web Server
To facilitate the widespread use of our model, we have developed a user-friendly web server, PLM-FRP, which is publicly accessible at
https://www.frppredict.site (accessed on 6 July 2025). The server provides a convenient platform for researchers to submit protein sequences in FASTA format and obtain rapid predictions of their potential as FRPs.
The web server is built on the Flask framework, chosen for its lightweight nature and seamless integration with our Python-based machine learning pipeline. It is hosted on a cloud platform to ensure high availability, scalability, and efficient processing of user requests. Upon submission, the server processes the input sequences using our optimized PLM-FRP model. The results are promptly displayed, including a probability score indicating the likelihood of each sequence being an FRP. To enhance transparency and provide deeper insights, the server also presents the key underlying features (DDE and ESM2 embeddings) that contributed to the prediction.
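As an illustration of the kind of input handling a FASTA submission step requires, the sketch below parses multi-record FASTA text and separates sequences that contain only the 20 standard amino acids from those that do not. This is a minimal example under our own assumptions. The function names and the rule of rejecting non-standard residues are hypothetical and do not describe the server's actual implementation.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate(records):
    """Split records into valid ones and the headers of rejected ones."""
    valid, rejected = [], []
    for header, seq in records:
        if seq and set(seq) <= VALID_RESIDUES:
            valid.append((header, seq))
        else:
            rejected.append(header)
    return valid, rejected


example = """>seq1 example
MKT
AYI
>seq2
MXB1
"""

records = parse_fasta(example)
valid, rejected = validate(records)
print("valid:", valid)
print("rejected:", rejected)
```

Validating input before feature extraction keeps malformed sequences from reaching the embedding and prediction steps, where errors would be harder to report clearly to the user.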
Furthermore, to ensure the reproducibility and extensibility of our research, the complete source code for the training and prediction models, along with the datasets, is publicly available at our GitHub repository:
https://github.com/Moeary/PLM_FRP (accessed on 2 June 2025). This repository includes all necessary scripts and detailed instructions, enabling other researchers to replicate our findings and build upon our work.
3.6. Discussion
3.6.1. Summary of Findings
The accurate identification of FRPs is crucial for advancing our understanding of ferroptosis and its role in various diseases. Our proposed method, PLM-FRP, leverages a multimodal approach by integrating traditional sequence-based features (DDE) with contextualized embeddings from the pre-trained protein language model ESM2. This hybrid feature set, refined through a robust LambdaRank-based feature selection strategy, enables PLM-FRP to capture a comprehensive range of biologically relevant information. Our results demonstrate that PLM-FRP achieves state-of-the-art performance, with a final accuracy of 96.09% on the independent test set, representing a significant improvement of approximately 4% over previous methods. This underscores the power of integrating diverse feature types, particularly the rich representations from PLMs, for complex biological prediction tasks.
3.6.2. Comparison with Deep Learning Models
Contrary to the common expectation that deep learning models such as CNNs and BiLSTMs would yield superior performance, our XGBoost-based method consistently outperformed these architectures. Several factors may explain this observation. First, deep learning models typically require extensive datasets to learn complex patterns effectively, and our benchmark dataset, though well-curated, is of limited size. In such data-scarce scenarios, ensemble methods like XGBoost are often more robust and less prone to overfitting. Second, the strong performance of PLM-FRP relies heavily on its highly discriminative, engineered feature set (DDE + ESM2). While deep learning models can learn features end-to-end, explicitly providing these pre-extracted features to XGBoost proved more effective for this specific task. Finally, even after systematic hyperparameter optimization via grid search, the CNN and BiLSTM models did not match XGBoost, whose gradient boosting framework proved more stable with our optimized multimodal features.
3.6.3. Limitations and Future Directions
Despite its promising performance, PLM-FRP has several limitations that present opportunities for future research. The primary limitation is the relatively small size of the current benchmark dataset; expanding it with more diverse, experimentally validated FRPs would likely enhance the model’s generalizability. Additionally, while our negative dataset was carefully constructed, the evolving nature of biological discovery means some proteins currently labeled as non-FRPs might later be identified as having a role in ferroptosis, introducing potential label noise. Furthermore, the interpretability of the hybrid features, particularly the abstract embeddings from ESM2, remains a challenge. Future work should focus on developing advanced interpretation techniques to map these predictive features to concrete biological properties. Addressing these limitations by curating larger datasets and improving model interpretability will be crucial for developing the next generation of predictive tools for ferroptosis research.
4. Conclusions
In this study, we have introduced PLM-FRP, a novel machine learning framework designed for the accurate and efficient prediction of ferroptosis-related proteins (FRPs). Our approach successfully integrates traditional sequence-based features (DDE) with advanced contextual embeddings from the pre-trained protein language model ESM2. By employing a sophisticated feature selection strategy, we identified an optimal hybrid feature set that captures a comprehensive spectrum of biological information, from local amino acid composition to global evolutionary and structural properties.
The resulting model, PLM-FRP, achieves state-of-the-art performance, significantly outperforming existing methods on the independent test set. To enhance model transparency, we utilized SHAP analysis to interpret the contributions of individual features, providing valuable insights into the model’s decision-making process. Furthermore, we have developed a user-friendly web server to ensure broad accessibility, enabling researchers to use the model with ease. We expect PLM-FRP to be a valuable resource for accelerating the discovery of novel FRPs, deepening our understanding of ferroptosis mechanisms, and guiding the development of future therapeutic interventions. Future work will focus on expanding the training dataset and exploring more advanced deep learning architectures to further enhance predictive accuracy and interpretability.