Article

HemPepPred: Quantitative Prediction of Peptide Hemolytic Activity Based on Machine Learning and Protein Language Model–Derived Features

Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Foods 2025, 14(23), 4143; https://doi.org/10.3390/foods14234143
Submission received: 22 October 2025 / Revised: 27 November 2025 / Accepted: 28 November 2025 / Published: 3 December 2025

Abstract

Accurate prediction of hemolytic peptides is essential for peptide safety evaluation and therapeutic design; however, existing models remain constrained by limited accuracy and interpretability. To overcome these challenges, we propose a regression framework that integrates embeddings from a protein language model with handcrafted amino acid descriptors. Specifically, sequence representations derived from the ESM2_t33 model are fused with physicochemical amino acid descriptor features, and key predictive variables are selected through a three-stage strategy involving variance filtering, F-test ranking, and mutual information analysis. The final ensemble model, composed of Random Forest, Extremely Randomized Trees, Gradient Boosting, eXtreme Gradient Boosting (XGBoost), and Ridge Regression, achieved a coefficient of determination (R2) of 0.57 and a correlation coefficient (R) of 0.76 on the test set, outperforming previous approaches. To enhance interpretability, we applied Shapley value analysis and the Calibrated_Explanation algorithm to quantify feature contributions and generate reliable sample-specific explanations. The trained model has been deployed online as HemPepPred, a tool for predicting hemolytic concentration (HC50) values, which provides a practical platform for rational peptide design and safety assessment.

1. Introduction

Food-derived bioactive peptides (BAPs) have emerged as promising ingredients of functional foods and nutraceuticals for health promotion and disease prevention [1]. Extensive research has demonstrated that these peptides exhibit diverse biological activities, including antihypertensive, antimicrobial, anticoagulant, anticancer, anti-inflammatory, and antidiabetic effects [2]. Despite their considerable potential in the food industry and clinical nutrition, the possible toxicity of BAPs remains a major obstacle to large-scale application.
Under certain conditions, some peptides can misfold and form β-sheet aggregates similar to human islet amyloid polypeptides (hIAPPs) associated with type 2 diabetes, leading to disruption of cell membrane integrity and subsequent cytotoxicity [3]. This process mirrors the oxidative damage mechanisms observed with β-amyloid in Alzheimer’s disease [4]. The occurrence of toxic peptides in a wide variety of food sources has long been recognized, posing significant challenges to food safety and industry growth. A well-known example is the 33-mer proline-rich peptide derived from gluten proteins in wheat and barley, which acts as a major immunogenic epitope in celiac disease—a hereditary autoimmune disorder affecting approximately 1% of the global population [5]. Other plant-derived foods also contain potentially harmful peptides. For instance, lectins from legumes that are not adequately degraded during digestion can cause acute lectin poisoning, characterized by nausea, diarrhea, vomiting, and agglutination of red blood cells [6]. Furthermore, the soybean toxin, a single-chain acidic protein, exhibits marked toxicity in mice, inducing tonic–clonic seizures, flaccid paralysis, and ultimately death following injection [7]. Additionally, cassava contains cyanogenic glycosides such as linamarin and lotaustralin, which release the potent toxin hydrogen cyanide during enzymatic hydrolysis, and inadequate processing (e.g., insufficient soaking or fermentation) can therefore result in acute cyanide poisoning [8]. Collectively, these examples underscore the urgent need for the identification and evaluation of peptide toxicity in food systems.
The hemolytic concentration (HC50) is a classical indicator of peptide toxicity, defined as the concentration that causes 50% lysis of normal human red blood cells under physiological conditions [9]. Conventional in vitro assays, including the MTT and lactate dehydrogenase (LDH) release tests, can directly measure cytotoxicity but are limited by low throughput and lengthy turnaround times [10]. Consequently, developing efficient and accurate computational approaches has become a key priority in both drug discovery and functional food safety evaluation [11].
Early work primarily addressed the binary classification of peptides as hemolytic or non-hemolytic, whereas more recent approaches have shifted toward quantitative modeling of hemolytic activity, particularly the prediction of hemolytic concentration (HC50). Various machine learning strategies have been developed to improve prediction accuracy. Rathore et al. (2025) proposed a hybrid model that achieved an AUC of 0.909 for HC50 prediction, with a companion regression model attaining a correlation coefficient of 0.739 [12]. Plisson et al. (2020) reported a gradient boosting classifier that reached 95–97% accuracy and enhanced reliability through outlier detection [13]. In 2024, Yang and Xu introduced HemoDL, which integrates a dual-LightGBM framework with composition-transition-distribution (CTD) and transformer-derived sequence features, outperforming conventional methods in predicting hemolytic activity [14]. Almotairi et al. (2024) designed a hybrid architecture combining Transformer and CNN networks, emphasizing that balanced train-test partitioning is essential for robust generalization [15]. Similarly, Raza and Arshad (2020) demonstrated that appropriate dataset partitioning improves predictive accuracy and AUC-ROC performance [16]. Karasev et al. (2024) highlighted the impact of experimental conditions on model precision and reported a regression R2 of 0.69 [17]. Additionally, Castillo-Mendieta et al. (2024) developed a multi-query similarity search approach [18], while Yaseen et al. (2021) introduced HemoNet, a neural network model—both methods outperformed traditional machine learning baselines [19]. Hasan et al. (2020) further advanced this field with HLPpred-Fuse, a two-tier framework that integrates multiple feature representations to enhance predictive performance [20].
Recent studies indicate a growing trend toward refined modeling of hemolytic activity; however, high-accuracy regression prediction of HC50 remains limited. As model complexity and predictive performance continue to increase, the need for robust interpretability has become more pressing. Notable approaches—including SHapley Additive exPlanations (SHAP) [21], Local Interpretable Model-agnostic Explanations (LIME) [22] and Anchor Explanations (Anchor) [23]—enable feature-importance analysis to elucidate the decision basis underlying specific model outputs and have been applied in hemolytic-peptide regression tasks. Nevertheless, these methods typically provide only point estimates of feature contributions and, therefore, fail to capture the uncertainty associated with both predictions and importance assessments. This limitation reduces their reliability in high-stakes applications, underscoring the need for more resilient interpretability frameworks capable of quantifying feature-level uncertainty for individual samples.
To address this gap, we propose a regression framework for HC50 prediction that integrates protein language model embeddings with amino acid descriptors (AAD). In contrast to conventional classification methods that merely discriminate between hemolytic and non-hemolytic peptides, the proposed framework enables fine-grained quantitative prediction of HC50, offering higher-resolution guidance for the rational design of hemolytic peptides. The predictor is constructed using an ensemble learning strategy, with global interpretability provided by SHAP [24,25,26]. Furthermore, we incorporate the Calibrated_Explanation algorithm to generate confidence intervals for sample-level feature importance, thereby enhancing the robustness and reliability of model interpretation [27]. Finally, the trained model is deployed as HemPepPred, available online: http://hem.cqudfbp.net (accessed on 22 October 2025), which is an online platform for HC50 prediction that provides an efficient and interpretable tool for rational hemolytic peptide design and food safety evaluation (Figure 1).

2. Materials and Methods

2.1. Dataset

The dataset employed in this study was derived from the Hemopi2 project by Rathore et al., originally compiled for the qualitative and quantitative analysis of hemolytic activity. It comprises 3147 experimentally validated hemolytic peptides from the DBAASP database and 560 peptides from the Hemolytik database [28]. In this work, we directly utilized the curated dataset from Rathore et al., in which peptides containing non-natural amino acids or fewer than six residues had already been excluded as part of the original preprocessing procedure. For peptides reported with multiple HC50 values or value ranges, the mean HC50 was calculated to represent the overall hemolytic activity under different experimental conditions, thereby improving the robustness of the predictive model. The descriptive statistics of the Hemopi2 dataset are provided in Table 1.
All HC50 values were standardized to a consistent unit (µM) and subsequently converted to pHC50 values using Equation (1), facilitating normalization and enhancing the model’s capacity to capture relative differences in hemolytic potency.
$y_{\log} = \log_{10}\left( y + 1 \times 10^{-8} \right)$  (1)
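For illustration, a minimal Python sketch of this transformation is given below; the DataFrame column names are illustrative and not taken from the original pipeline, and the small offset guards against taking the logarithm of zero.

```python
import numpy as np
import pandas as pd

def to_log_hc50(hc50_um: pd.Series) -> pd.Series:
    """Convert HC50 values (µM) to the log-scale target of Equation (1)."""
    return np.log10(hc50_um + 1e-8)

# Peptides reported with multiple HC50 measurements are first averaged,
# then log-transformed (column names are illustrative).
df = pd.DataFrame({"sequence": ["GIMSSLMKKLKAHIAK"], "hc50_um": [400.0]})
df["y_log"] = to_log_hc50(df["hc50_um"])   # log10(400) ≈ 2.60
```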
The calibration dataset, obtained from Karasev et al. [17] in CSV format, comprised peptides with experimentally validated HC50 values (µM) and followed the same format as the training and test datasets.

2.2. Feature Engineering

Feature engineering constituted the core of constructing an ensemble learning model [29]. This process entailed extracting informative descriptors from peptide sequences and refining them through multi-stage selection.
In this study, three categories of features were utilized to represent peptide sequences. First, we used Pfeature to extract 1167-dimensional classical AAD [30] from the Hemopi2 study’s ALLCOMP-ex SOC results, including amino acid composition (AAC), dipeptide composition (DPC), atom composition (ATC), and bond composition (BTC) features. These descriptors capture physicochemical and compositional properties known to influence peptide activity. In addition, high-dimensional embeddings from protein language models were incorporated: the ProtT5 model [31] produced 1024-dimensional embeddings encoding deep structural and functional information through global context modeling, while the ESM2_t33 model [32] generated 1280-dimensional vectors capturing evolutionary and structural patterns.
Collectively, these three feature representations complement one another by linking sequence-derived physicochemical characteristics with higher-order structural and contextual information. This integration provides a mechanistic basis for understanding sequence-activity relationships beyond the capacity of traditional descriptors alone. According to Equation (2), the three feature sets were horizontally concatenated using the np.hstack function to construct an initial composite feature matrix containing 3471 features, thereby providing a comprehensive foundation for subsequent feature selection. Because the ensemble primarily consisted of tree-based models, which are invariant to feature scaling, no normalization was applied before concatenation. The Ridge regression component was trained and validated under the same preprocessing to ensure consistency across models.
$X_{\mathrm{combined}} = \left[ X_{\mathrm{aa}},\ X_{m_1},\ X_{m_2} \right] \in \mathbb{R}^{n \times 3471}$  (2)
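The sketch below illustrates, under stated assumptions, how a 1280-dimensional ESM2_t33 embedding can be obtained with the fair-esm package and concatenated with the other descriptor blocks. Mean-pooling over residue representations is an assumption (the pooling strategy is not specified above), and the zero-filled AAD and ProtT5 arrays are placeholders for precomputed features.

```python
import numpy as np
import torch
import esm  # fair-esm package

# Load ESM2_t33 (650M) and its tokenizer.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

sequence = "GIMSSLMKKLKAHIAK"
_, _, tokens = batch_converter([("pep1", sequence)])
with torch.no_grad():
    out = model(tokens, repr_layers=[33])
reps = out["representations"][33]                   # (1, len+2, 1280) incl. BOS/EOS
esm_vec = reps[0, 1:len(sequence) + 1].mean(dim=0)  # mean-pool residues (assumption)

# Placeholders standing in for precomputed Pfeature AAD (1167-d) and ProtT5 (1024-d).
X_aad = np.zeros((1, 1167))
X_prott5 = np.zeros((1, 1024))
X_esm = esm_vec.numpy().reshape(1, -1)              # (1, 1280)

# Horizontal concatenation as in Equation (2): 1167 + 1024 + 1280 = 3471 columns.
X_combined = np.hstack([X_aad, X_prott5, X_esm])
print(X_combined.shape)                             # (1, 3471)
```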

2.3. Feature Selection

A three-stage strategy was applied for feature selection to progressively reduce dimensionality by eliminating redundant and low-information features. The specific workflow is as follows:
First, variance filtering was applied to eliminate features whose variance fell below the threshold of 0.005, as such near-constant features contribute little to discriminative power.
$\mathrm{Var}(x_j) = \frac{1}{n} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2, \qquad \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$  (3)
Next, an F-test (using f_regression in the SelectKBest method) ranked features according to their linear correlation with the target variable, retaining the top 80% ($k_f = 0.8 \times n_{\mathrm{var}}$).
$F = \frac{\mathrm{MSR}}{\mathrm{MSE}}, \qquad \mathrm{MSR} = \sum_{i=1}^{n} \left( \hat{y}_i - \bar{y} \right)^2, \qquad \mathrm{MSE} = \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - 2}$  (4)
Finally, mutual information analysis (mutual_info_regression) was used to capture nonlinear dependencies, selecting the top $k_{\mathrm{mi}} = 0.5 \times n_f$ features. When the number of selected features exceeded the target range (600–700), a secondary F-test ranking was performed to refine the final subset.
$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$  (5)
To ensure robust model training, we applied tailored preprocessing to the selected features. The feature matrix was first transformed using a QuantileTransformer to mitigate scale bias and capture nonlinear relationships, followed by standardization with RobustScaler to enhance convergence stability. This preprocessing pipeline produced a compact, well-scaled dataset, thereby improving predictive accuracy and computational efficiency in hemolytic peptide regression.
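A minimal scikit-learn sketch of the three-stage selection and the subsequent scaling is shown below. The thresholds follow the description above, while the quantile output distribution and random seeds are assumptions; the secondary F-test refinement to 600–700 features is indicated only as a comment.

```python
import numpy as np
from sklearn.feature_selection import (SelectKBest, VarianceThreshold,
                                       f_regression, mutual_info_regression)
from sklearn.preprocessing import QuantileTransformer, RobustScaler

def three_stage_selection(X, y):
    # Stage 1: drop near-constant features (variance < 0.005).
    X = VarianceThreshold(threshold=0.005).fit_transform(X)

    # Stage 2: keep the top 80% of remaining features by F-test score.
    k_f = int(0.8 * X.shape[1])
    X = SelectKBest(score_func=f_regression, k=k_f).fit_transform(X, y)

    # Stage 3: keep the top 50% by mutual information with the target.
    mi = mutual_info_regression(X, y, random_state=0)
    k_mi = int(0.5 * X.shape[1])
    X = X[:, np.argsort(mi)[-k_mi:]]
    # If more than 600-700 features remain, a secondary F-test ranking
    # trims the final subset (omitted here).
    return X

def scale_features(X_train, X_test):
    # Quantile transform to mitigate scale bias, then robust scaling;
    # the normal output distribution is an assumption.
    qt = QuantileTransformer(output_distribution="normal", random_state=0)
    rs = RobustScaler()
    X_train = rs.fit_transform(qt.fit_transform(X_train))
    X_test = rs.transform(qt.transform(X_test))
    return X_train, X_test
```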

2.4. Cross-Validation

Following standard bioinformatics practice, the dataset was randomly partitioned into a training set (80%) and a test set (20%). Model performance was assessed using five-fold cross-validation within the training set. Specifically, the data were randomly divided into five folds, with each iteration using four folds for training and one for validation, ensuring that every fold served as the validation set exactly once. The test set remained completely isolated from all training, validation, and hyperparameter tuning procedures to guarantee an unbiased final evaluation. Additionally, random seeds were fixed throughout all experiments to ensure reproducibility.
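A compact sketch of this partitioning scheme is given below; the seed value and the placeholder data are illustrative, since seeds are fixed in the experiments but not reported here.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

SEED = 42  # illustrative fixed seed
rng = np.random.default_rng(SEED)
X, y = rng.normal(size=(100, 633)), rng.normal(size=100)   # placeholder data

# 80/20 split; the test set is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=SEED)

# Five-fold cross-validation within the training set only.
cv = KFold(n_splits=5, shuffle=True, random_state=SEED)
for fold, (tr_idx, val_idx) in enumerate(cv.split(X_train)):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
    print(f"fold {fold}: train {len(tr_idx)}, validation {len(val_idx)}")
```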

2.5. Machine Learning Model Construction

An enhanced ensemble framework was employed by integrating multiple regression algorithms, including Random Forest Regressor, Extremely Randomized Trees (Extra Trees) Regressor, Gradient Boosting Regressor, eXtreme Gradient Boosting (XGBoost) Regressor, and Ridge Regression (Ridge CV). The hyperparameter ranges for each model are provided in Supplementary Table S1.
Each model’s five-fold cross-validation score served as its base weight, reflecting its reliability. To enhance ensemble discrimination, a nonlinear transformation (squaring the cross-validation score) was applied to amplify the contribution of consistently strong learners. Furthermore, empirical calibration showed that moderate scaling factors (1.3–2.0×) improved predictive stability and mitigated overfitting tendencies in tree-based models. Specifically, models achieving cross-validation scores above 0.9 and 0.8 were assigned weight multipliers of 2× and 1.5×, respectively, while the Ridge model was upweighted by 1.3× to counterbalance the overfitting tendency of tree-based regressors. Finally, all weights were normalized, and the final prediction was computed as a weighted average.
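The following sketch illustrates the weighting scheme described above. Hyperparameters are left at library defaults for brevity (the tuned ranges are given in Supplementary Table S1), so it should be read as a schematic rather than the exact implementation; the xgboost Python package is assumed to be available.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Defaults used here for brevity; tuned ranges are in Supplementary Table S1.
models = {
    "rf": RandomForestRegressor(random_state=42),
    "et": ExtraTreesRegressor(random_state=42),
    "gb": GradientBoostingRegressor(random_state=42),
    "xgb": XGBRegressor(random_state=42),
    "ridge": RidgeCV(),
}

def fit_weighted_ensemble(models, X_train, y_train):
    """Fit all base models and derive normalized ensemble weights."""
    weights = {}
    for name, model in models.items():
        score = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
        w = score ** 2                  # squaring amplifies consistently strong learners
        if score > 0.9:
            w *= 2.0                    # multiplier for very strong models
        elif score > 0.8:
            w *= 1.5
        if name == "ridge":
            w *= 1.3                    # upweight Ridge against tree overfitting
        weights[name] = w
        model.fit(X_train, y_train)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def predict_weighted(models, weights, X):
    """Weighted average of the base model predictions."""
    preds = np.column_stack([m.predict(X) for m in models.values()])
    w = np.array([weights[name] for name in models])
    return preds @ w
```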

2.6. Performance Evaluation

Model performance was evaluated using five metrics, grouped into two functional categories: (i) goodness-of-fit metrics, including the coefficient of determination (R2) and correlation coefficient (R); and (ii) error-based metrics, including mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). The mathematical formulations of these metrics are shown in Equations (6)–(9):
$R^2 = 1 - \frac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2}$  (6)
$R = \frac{\sum_i \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_i \left( y_i - \bar{y} \right)^2 \sum_i \left( \hat{y}_i - \bar{\hat{y}} \right)^2}}$  (7)
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$  (8)
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} = \sqrt{\mathrm{MSE}}$  (9)
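A small NumPy implementation of these metrics is given below for reference.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Goodness-of-fit and error metrics of Equations (6)-(9)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    mse = np.mean((y_true - y_pred) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "R": np.corrcoef(y_true, y_pred)[0, 1],   # Pearson correlation
        "MAE": np.mean(np.abs(y_true - y_pred)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }
```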

2.7. Global Model Interpretation

To elucidate the model’s decision process, we employed SHAP, a method grounded in cooperative game theory. SHAP quantifies the contribution of each feature to a prediction by evaluating its marginal impact across all possible feature combinations. Specifically, the algorithm estimates the contribution of a feature to an individual prediction by iteratively adding or removing features, assigning a Shapley value that ensures fair attribution of importance. The underlying principle of Shapley values is to comprehensively assess each feature’s influence by integrating its contributions over all possible feature subsets.
In this study, SHAP values were computed for the entire ensemble model across all test samples, enabling interpretation of the ensemble’s integrated decision boundaries and reflecting the joint effect of its constituent models. We quantified the impact of each feature on model predictions and identified the top 20 most influential features along with their corresponding Shapley values and indices. These features highlight the primary determinants of hemolytic peptide toxicity, offering valuable insights for further model refinement. Additionally, a global feature importance plot was generated using the summary_plot function, illustrating the distribution of Shapley values and providing an intuitive overview of each feature’s overall influence. These analyses enhance the interpretability of the model and serve as a reference for subsequent feature selection and optimization.
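As an illustration, the model-agnostic KernelExplainer can wrap the ensemble's prediction function. The text above does not state which SHAP explainer was used, so the sketch below is one possible realization; it reuses the predict_weighted helper from the Section 2.5 sketch, and the background-sample size and variable names are assumptions.

```python
import numpy as np
import shap

def global_shap_summary(predict_fn, X_background, X_test, feature_names):
    """Model-agnostic SHAP summary for the weighted ensemble.

    predict_fn maps a feature matrix to predictions, e.g.
    lambda X: predict_weighted(models, weights, X) from the Section 2.5 sketch.
    """
    # A small background sample keeps KernelExplainer tractable.
    background = shap.sample(X_background, 100)
    explainer = shap.KernelExplainer(predict_fn, background)
    shap_values = explainer.shap_values(X_test)

    # Rank features by mean absolute Shapley value and report the top 20.
    mean_abs = np.abs(shap_values).mean(axis=0)
    top20 = np.argsort(mean_abs)[::-1][:20]
    print([feature_names[i] for i in top20])

    # Global beeswarm summary, analogous to summary_plot in the text above.
    shap.summary_plot(shap_values, X_test, feature_names=feature_names,
                      max_display=20)
```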

2.8. Single-Sample Explanations

While SHAP was employed to interpret global feature importance across the ensemble, single-sample explanations and prediction uncertainty were further analyzed using the Calibrated Explanations framework built upon the Conformal Predictive Systems (CPS) methodology. In this framework, SHAP-derived feature attributions were calibrated to provide statistically valid confidence bounds for each individual prediction.
Using the Python (3.10) package calibrated-explanations, we implemented two distinct interpretable modes: standard regression, which calibrates the continuous HC50 predictions, and probabilistic regression, which estimates the probability that HC50 exceeds a specified threshold. The calibration dataset used in this step was derived from Karasev et al. [17], consisting of peptides with experimentally validated HC50 values.
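To make the CPS idea concrete, the schematic below shows how calibration residuals convert a single point prediction into a percentile interval and a threshold-exceedance probability. It illustrates the underlying principle rather than the calibrated-explanations package API; the synthetic calibration data and the threshold value are arbitrary stand-ins.

```python
import numpy as np

def cps_interval_and_probability(point_pred, cal_pred, cal_true,
                                 low=5, high=95, threshold=None):
    """Schematic conformal predictive system (CPS) step.

    Calibration residuals turn one point prediction into a percentile
    interval and, optionally, the probability that the true value lies
    at or below a threshold (probabilistic regression mode).
    """
    residuals = np.asarray(cal_true) - np.asarray(cal_pred)   # signed calibration errors
    cpd = point_pred + residuals                              # empirical predictive distribution
    lo, median, hi = np.percentile(cpd, [low, 50, high])
    prob_below = float(np.mean(cpd <= threshold)) if threshold is not None else None
    return (lo, median, hi), prob_below

# Illustrative call: in practice the calibration predictions/targets come
# from the Karasev et al. [17] set; the log10(HC50) threshold 2.2 is arbitrary.
rng = np.random.default_rng(0)
cal_true = rng.normal(2.5, 0.5, size=200)
cal_pred = cal_true + rng.normal(0.0, 0.3, size=200)
interval, p = cps_interval_and_probability(2.48, cal_pred, cal_true, threshold=2.2)
```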

3. Results and Discussion

3.1. Analysis of the Amino Acid Composition

Comparative analysis of amino acid composition revealed distinct distributional patterns between hemolytic and non-hemolytic peptides (Figure 2). Lysine (K) occurred at a significantly higher frequency in non-hemolytic samples, whereas Leucine (L) was notably more abundant in hemolytic peptides. The remaining amino acids exhibited relatively balanced distributions across both groups, with less pronounced differences than those observed for K and L.
These compositional differences suggest underlying physicochemical disparities between hemolytic and non-hemolytic peptides. The abundance of L in hemolytic peptides may enhance hydrophobic interactions with cell membranes, thereby facilitating membrane disruption and hemolytic activity. Conversely, the higher frequency of K in non-hemolytic peptides might reduce overall hydrophobicity or impede membrane binding, ultimately diminishing hemolytic potential. These residue-specific patterns provide a crucial physicochemical foundation for interpreting the feature importance results of the ensemble learning model.

3.2. Ablation Experiments

Ablation experiments were conducted to identify key factors influencing model performance. We systematically removed specific feature groups, including AAD, features derived from the large protein language model ProtT5, and embeddings from the ESM2_t33 model. Table 2 summarizes the outcomes and highlights the effectiveness of the feature-selection strategy, showing that AAD and ESM2_t33 embeddings are the two most influential contributors to regression performance.
Feature selection played a decisive role in improving the ensemble model across all experimental settings, regardless of whether individual feature types (AAD, ESM2_t33, or ProtT5) were excluded or combined. For instance, in the “Without AAD” setting, applying feature selection reduced the MAE from 0.3689 to 0.3471 and the RMSE from 0.4779 to 0.4528, while the R2 increased from 0.4364 to 0.4942. These results confirm that feature selection enhances both predictive accuracy and generalization capacity.
Further analysis of the different feature combinations revealed variation in the efficacy of feature selection. The “Without ESM2_t33” setting exhibited the most pronounced improvement, with MAE decreasing from 0.3636 to 0.3115, RMSE from 0.4729 to 0.4201, and R2 increasing from 0.4482 to 0.5645. Although excluding ProtT5 features resulted in smaller gains, the combination of manually extracted AAD and ESM2_t33 embeddings achieved the best overall performance across all four evaluation metrics. Accordingly, this feature set was adopted for all subsequent experiments.
Overall, these findings demonstrate that the feature-selection pipeline consistently reduces model error and enhances robustness, with particular efficacy in handling high-dimensional data.

3.3. Regression Performance

Given the strong performance of the handcrafted AAD and the ESM2_t33 embeddings, the two feature sets were selected as inputs for the final regression model. Using this combined feature representation, we employed 80% of the Rathore et al. dataset for training and the remaining 20% for testing. Model performance was evaluated by comparing predicted and true −log(HC50) values.
As shown in Figure 3, the diagonal distribution of data points suggests a strong agreement between predicted and actual values. Quantitatively, the model achieved an R2 of 0.57 and an R of 0.76, confirming reliable predictive capability for hemolytic activity. Model stability was further verified through five-fold cross-validation on the training set, yielding consistent results (average R2 ≈ 0.53, average PCC ≈ 0.73). The close alignment between the cross-validation and independent test performance demonstrates the model’s robustness and generalization capability, effectively ruling out overfitting. Moreover, the comparable distributions of predicted and observed values, visualized through histograms, further support this consistency.
Having established the base performance of our model, we performed a comparative analysis against established benchmarks. We compared its performance with seven other machine learning regressors: Random Forest (RF), XGBoost (XGB), Decision Tree (DT), Adaptive Boosting (ADB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Extra Trees (ET). As summarized in Table 3, the results revealed clear differences in error control, explanatory power, and correlation strength. ADB achieved the lowest MAE (0.254) and RMSE (0.400), indicating superior error control, but its R2 (0.374) and R (0.625) suggested limited explanatory capacity. In contrast, RF and ET demonstrated balanced performance, with R2 values around 0.53–0.56 and relatively low errors. Other algorithms, including XGB, DT, SVM, and KNN, lagged behind across all metrics.
The proposed ensemble model achieved the highest R2 (0.565) and R (0.755), indicating the strongest fit and linear correlation with experimental data, despite its moderate MAE (0.313) and RMSE (0.420). Collectively, ADB proves optimal when minimizing prediction error is the primary objective, whereas the proposed model excels in scenarios prioritizing interpretability and predictive correlation. Meanwhile, RF and ET offer a balanced compromise.
To further evaluate practical performance, we investigated computational efficiency under varying dimensional complexity. As model dimensionality increased, testing time rose correspondingly from 0.221 s to 0.690 s (Figure 4), reflecting the higher computational burden of larger feature spaces. Training time exhibited more complex behavior: although it remained relatively stable overall, slight decreases were observed within certain dimensional ranges, likely due to optimization dynamics or hardware utilization differences. In general, testing time was more sensitive to dimensionality, whereas training time was influenced by multiple interacting factors, including model complexity and system performance.
In summary, this study successfully established an effective regression model for hemolytic activity by integrating AAD and ESM2_t33 embeddings. The model demonstrated strong predictive accuracy (R2 = 0.57, R = 0.76) and consistently outperformed baseline algorithms across key metrics. Despite the computational demands of high-dimensional input features, the model achieved an effective balance between predictive accuracy, interpretability, and efficiency—making it a practical and reliable tool for quantitative hemolytic activity prediction.

3.4. Global Interpretability Analysis

To elucidate the determinants of model predictions, SHAP analysis was applied directly to the integrated ensemble model to quantify feature contributions. Figure 5 displays the top 20 most influential features and their corresponding Shapley values. In this study, the sign of the Shapley values directly corresponds to the direction of influence: negative values correspond to lower predicted hemolytic activity, whereas positive values indicate higher predicted activity.
Notably, the most impactful features included both engineered AADs (e.g., CeTD_12_VW, CeTD_13_CH) and ESM2_t33 embeddings (e.g., esm_572, esm_688, esm_897), confirming that these two types of features provide complementary information. The AADs capture interpretable physicochemical trends, while the ESM2_t33 embeddings encode contextual and structural dependencies within sequences. Their integration enables the ensemble to exploit both explicit biochemical knowledge and implicit sequence representations, enhancing predictive robustness and interpretability.
Among these, the most influential feature, CeTD_12_VW, characterizes global sequence composition, transitions, and distributions based on the 12th van der Waals volume grouping. It exhibits a clear monotonic pattern: higher values (red) correspond to positive Shapley values that increase predicted hemolytic activity, whereas lower values (blue) associate with negative Shapley values, reducing the predicted response. This relationship reflects a direct mechanistic linkage between sequence-level physicochemical properties and hemolytic potential.

3.5. Single-Sample Interpretability Analysis

To further investigate the model’s decision pathways and quantify predictive uncertainty, we employed Calibrated Explanations based on the Conformal Predictive Systems framework. Using the first test sample as an illustrative case, both factual and counterfactual explanations were generated for standard and probabilistic regression analyses. Table 4 presents the relevant information for the peptide sequence that is subsequently used in the single-sample explanation.
In the baseline interpretability analysis of the Hemopi2 dataset (Figure 6a), the model generated a calibrated prediction for the selected test peptide. The upper panel shows a predicted median HC50 of approximately 300.9 μM (red line), with the shaded area indicating the 90% confidence interval (5th–95th percentiles). This indicates that the true HC50 value is expected to fall within this range. The lower panel illustrates the contributions of key features to the prediction: negative weights (red) represent features that decrease the predicted HC50, whereas positive weights (blue) indicate features that increase it.
Among these predictors, the Composition-enhanced Transition and Distribution (CeTD) descriptors exerted the strongest influence. CeTD features integrate amino acid composition, transition probabilities, and positional distributions based on physicochemical classifications. For example, CeTD_13_SA captures the transition and distribution of amino acids grouped by solvent accessibility, describing how residues with different exposure levels are arranged and how these patterns affect peptide-membrane interactions. When CeTD_13_SA ≤ −3.45, its negative effect is maximal, reducing the predicted HC50 by nearly 200 μM. Similarly, lower values of AAC_C (the relative abundance of cysteine residues) and CTC_171 (a conjoint triad feature representing short-range residue combinations) are associated with decreased HC50, suggesting that diminished cysteine content and specific local residue arrangements may attenuate hemolytic potential.
Conversely, CeTD_25_p_CH2 and CeTD_HB2 exhibit positive contributions. CeTD_25_p_CH2 reflects transitions among residues rich in −CH2− side chains, indicating elevated hydrophobic carbon content, while CeTD_HB2 captures higher-order distributions based on hydrogen-bonding potential. When CeTD_25_p_CH2 > −0.67 and CeTD_HB2 > 0.71, both features increase the predicted HC50, implying that enhanced hydrophobicity and hydrogen-bonding networks may jointly mitigate hemolytic activity.
Uncertainty analysis for the same Hemopi2 sample (Figure 6b) incorporated predictive confidence into feature attribution. CeTD_13_SA remained the dominant negative factor, with its 90% confidence interval indicating at least a 190 μM reduction in HC50 under the specified condition. In contrast, AAC_C and CTC_171 exhibited intervals crossing zero, implying statistically indeterminate directional effects. Because feature normalization was not applied, the interval widths were directly comparable, allowing consistent evaluation of absolute effect magnitudes.
Taken together, these results demonstrate that the model delivers robust point predictions while simultaneously quantifying the confidence of feature contributions, enabling a more comprehensive understanding of the molecular determinants of peptide hemolysis. These findings emphasize that CeTD descriptors, which capture transitions and distributions among amino acid groups, particularly in terms of solvent exposure, hydrophobicity, and hydrogen-bonding potential, are major contributors to hemolytic behavior. This integration of explainable machine learning with physicochemical sequence analysis provides valuable mechanistic insight and a rational basis for subsequent peptide design and experimental validation.
For each rule, the solid line and the lighter red band in the figure represent the expected median and confidence interval of the predicted HC50 when the sample satisfies that rule. As shown in Figure 7a, when all other conditions remain constant and the median solvent accessibility exceeds −3.55, the expected HC50 is approximately 150 μM. Evidently, a decrease in the amino acid transition ratio (DDR_T) or in the distribution proportion of amino acids grouped by van der Waals volume (CeTD_VW1) results in a higher HC50, whereas an increase in these parameters leads to a lower HC50.
Further one-sided counterfactual analysis, presented in Figure 7b, provides an upper-bound explanation at the 90% confidence level for the hemolytic peptide dataset. Under otherwise unchanged conditions, when solvent accessibility exceeds −3.55, the probability that HC50 falls below roughly 160 μM reaches 90%. Moreover, when DDR_T is less than −2.59 or CeTD_VW1 is less than −2.61, the model estimates with 90% confidence that HC50 will be below approximately 110 μM.
To summarize, our results identify solvent accessibility, sequence transition ratio, and van der Waals volume distribution as key determinants in predicting hemolytic potential, linking hemolytic potential to residue accessibility, hydrophobic clustering, and hydrogen-bonding capacity [33,34,35]. These findings align with established hemolysis mechanisms, emphasizing the role of amino acid composition and spatial arrangement in peptide-membrane disruption. Furthermore, we demonstrate that counterfactual analysis offers conditional, causal insights beyond single-instance explanations, providing a solid modeling basis for elucidating peptide hemolysis mechanisms and guiding sequence optimization.

3.6. Server Implementation

Building upon our ensemble learning framework, we developed a web server (HemPepPred) for high-throughput peptide toxicity prediction, available at http://hem.cqudfbp.net (accessed on 22 October 2025). Users can submit sequence data on the main page (Figure 8a) and download the resulting outputs for further analysis (Figure 8b). The platform supports batch processing and automates the entire analytical pipeline—from sequence validation and statistical summarization to feature extraction, model prediction, and report generation.
To efficiently process large-scale datasets, the feature extraction module integrates ESM2_t33 embeddings with traditional AAD, leveraging batch parallelization to accelerate computation. Powered by PyTorch (2.3.1), the backend prediction engine employs an ensemble architecture that supports both classification and regression tasks, outputting toxicity probabilities and predicted labels. The web frontend, developed using Flask, HTML, and CSS, provides an intuitive interface for user interaction, while integration with MySQL ensures reliable data management and operational stability.
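A minimal Flask sketch of such a submission endpoint is shown below. The route name, the validation rule (natural residues only, at least six residues, as in Section 2.1), and the extract_features/ensemble_predict helpers are hypothetical placeholders for the actual server components.

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")   # assumes uppercase one-letter codes

def extract_features(sequences):
    # Placeholder for the real pipeline (AAD + ESM2_t33 embeddings, Section 2.2).
    return np.zeros((len(sequences), 633))

def ensemble_predict(X):
    # Placeholder for the weighted ensemble of Section 2.5.
    return np.zeros(len(X))

@app.route("/predict", methods=["POST"])        # hypothetical route
def predict():
    sequences = request.get_json().get("sequences", [])
    bad = [s for s in sequences if len(s) < 6 or not set(s) <= VALID_RESIDUES]
    if bad:
        return jsonify({"error": f"invalid sequences: {bad}"}), 400
    preds = ensemble_predict(extract_features(sequences))
    return jsonify({"log_hc50": [float(p) for p in preds]})

if __name__ == "__main__":
    app.run()
```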

4. Conclusions

This study introduces an ensemble-learning regression framework for the quantitative prediction of peptide hemolytic activity by integrating amino acid descriptors (AADs) with protein language model embeddings. The proposed ensemble model outperforms conventional baselines, demonstrating that engineered descriptors and language model embeddings contribute complementary information. A multi-stage feature selection strategy further enhances predictive accuracy and generalization. Interpretability analyses identified CeTD descriptors as key determinants, providing mechanistic insights into hemolysis.
While the current framework achieves promising predictive performance, it remains limited by the scale and diversity of available experimental data, as well as by its reliance on sequence-derived features. These constraints may affect the model’s generalizability to novel peptide classes or varying experimental conditions. Future extensions could involve integrating 3D structural descriptors, physicochemical simulations, or transfer learning from broader peptide datasets to improve robustness, interpretability, and applicability.
The proposed framework paves the way for more accurate and interpretable prediction of peptide toxicity, facilitating safer peptide design and practical applications in biotechnology.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods14234143/s1, Table S1: Hyperparameter selection for ensemble learning.

Author Contributions

Conceptualization, X.L. (Xiang Li) and W.Z.; methodology, X.L. (Xiang Li) and W.Z.; software, X.L. (Xiang Li) and W.Z.; validation, S.Y.; formal analysis, X.L. (Xiang Li); investigation, X.Z.; resources, X.Z.; data curation, S.Y.; writing—original draft preparation, X.L. (Xiao Liang) and W.Z.; writing—review and editing, X.L. (Xiao Liang) and W.Z.; visualization, X.L. (Xiao Liang) and W.Z.; supervision, G.L.; project administration, G.L.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32172196), Chongqing Talent Program Project (cstc2024ycjhbgzxm0113), and the Technology Innovation and Application Development Key Project of Chongqing (CSTB2024TIAD-KPX0014).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, Y.; Xia, Y.; Yu, Y.; Liang, G. QSAR in natural non-peptidic food-related compounds: Current status and future perspective. Trends Food Sci. Technol. 2023, 140, 104165. [Google Scholar] [CrossRef]
  2. Sun, X.; Li, Y.; Wang, M.; Amakye, W.K.; Ren, J.; Matsui, T.; Wang, W.; Tsopmo, A.; Udenigwe, C.C.; Giblin, L.; et al. Research Progress on Food-Derived Bioactive Peptides: An Overview of the 3rd International Symposium on Bioactive Peptides. J. Agric. Food Chem. 2024, 72, 23709–23715. [Google Scholar] [CrossRef]
  3. Bortoletto, A.S.; Graham, W.V.; Trout, G.; Bonito-Oliva, A.; Kazmi, M.A.; Gong, J.; Weyburne, E.; Houser, B.L.; Sakmar, T.P.; Parchem, R.J. Human Islet Amyloid Polypeptide (hIAPP) Protofibril-Specific Antibodies for Detection and Treatment of Type 2 Diabetes. Adv. Sci. 2022, 9, 2202342. [Google Scholar] [CrossRef]
  4. Zhang, W.; Smith, N.; Zhou, Y.; McGee, C.M.; Bartoli, M.; Fu, S.; Chen, J.; Domena, J.B.; Joji, A.; Burr, H.; et al. Carbon dots as dual inhibitors of tau and amyloid-beta aggregation for the treatment of Alzheimer’s disease. Acta Biomater. 2024, 183, 341–355. [Google Scholar] [CrossRef] [PubMed]
  5. del Amo-Maestro, L.; Mendes, S.R.; Rodríguez-Banqueri, A.; Garzon-Flores, L.; Girbal, M.; Rodríguez-Lagunas, M.J.; Guevara, T.; Franch, À.; Pérez-Cano, F.J.; Eckhard, U.; et al. Molecular and in vivo studies of a glutamate-class prolyl-endopeptidase for coeliac disease therapy. Nat. Commun. 2022, 13, 4446. [Google Scholar] [CrossRef] [PubMed]
  6. Cavada, B.S.; Osterne, V.J.S.; Oliveira, M.V.; Pinto-Junior, V.R.; Lima Silva, M.T.; Bari, A.U.; Dias Lima, L.; Lossio, C.F.; Nascimento, K.S. Reviewing Mimosoideae lectins: A group of under explored legume lectins. Int. J. Biol. Macromol. 2020, 154, 159–165. [Google Scholar] [CrossRef] [PubMed]
  7. Morais, J.K.S.; Gomes, V.M.; Oliveira, J.T.A.; Santos, I.S.; Da Cunha, M.; Oliveira, H.D.; Oliveira, H.P.; Sousa, D.O.B.; Vasconcelos, I.M. Soybean Toxin (SBTX), a Protein from Soybeans That Inhibits the Life Cycle of Plant and Human Pathogenic Fungi. J. Agric. Food Chem. 2010, 58, 10356–10363. [Google Scholar] [CrossRef]
  8. Zhong, Y.; Xu, T.; Ji, S.; Wu, X.; Zhao, T.; Li, S.; Zhang, P.; Li, K.; Lu, B. Effect of ultrasonic pretreatment on eliminating cyanogenic glycosides and hydrogen cyanide in cassava. Ultrason. Sonochem. 2021, 78, 105742. [Google Scholar] [CrossRef]
  9. Lohan, S.; Konshina, A.G.; Efremov, R.G.; Maslennikov, I.; Parang, K. Structure-Based Rational Design of Small α-Helical Peptides with Broad-Spectrum Activity against Multidrug-Resistant Pathogens. J. Med. Chem. 2023, 66, 855–874. [Google Scholar] [CrossRef]
  10. Xie, P.; Yao, L.; Guan, J.; Chung, C.-R.; Zhao, Z.; Long, F.; Sun, Z.; Lee, T.-Y.; Chiang, Y.-C. ConsAMPHemo: A computational framework for predicting hemolysis of antimicrobial peptides based on machine learning approaches. Protein Sci. 2025, 34, e70087. [Google Scholar] [CrossRef]
  11. Wu, F.; Zhou, Y.; Li, L.; Shen, X.; Chen, G.; Wang, X.; Liang, X.; Tan, M.; Huang, Z. Computational Approaches in Preclinical Studies on Drug Discovery and Development. Front. Chem. 2020, 8, 726. [Google Scholar] [CrossRef]
  12. Rathore, A.S.; Kumar, N.; Choudhury, S.; Mehta, N.K.; Raghava, G.P.S. Prediction of hemolytic peptides and their hemolytic concentration. Commun. Biol. 2025, 8, 176. [Google Scholar] [CrossRef] [PubMed]
  13. Plisson, F.; Ramírez-Sánchez, O.; Martínez-Hernández, C. Machine learning-guided discovery and design of non-hemolytic peptides. Sci. Rep. 2020, 10, 16581. [Google Scholar] [CrossRef] [PubMed]
  14. Yang, S.; Xu, P. HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information. Anal. Biochem. 2024, 690, 115523. [Google Scholar] [CrossRef] [PubMed]
  15. Almotairi, S.; Badr, E.; Abdelbaky, I.; Elhakeem, M.; Salam, M.S. Hybrid transformer-CNN model for accurate prediction of peptide hemolytic potential. Sci. Rep. 2024, 14, 14263. [Google Scholar] [CrossRef]
  16. Raza, A.; Arshad, H.S. Prediction of Hemolysis Tendency of Peptides using a Reliable Evaluation Method. arXiv 2020, arXiv:2012.06470. [Google Scholar] [CrossRef]
  17. Karasev, D.A.; Malakhov, G.S.; Sobolev, B.N. Quantitative prediction of hemolytic activity of peptides. Comput. Toxicol. 2024, 32, 100335. [Google Scholar] [CrossRef]
  18. Castillo-Mendieta, K.; Agüero-Chapin, G.; Marquez, E.; Perez-Castillo, Y.; Barigye, S.J.; Pérez-Cárdenas, M.; Peréz-Giménez, F.; Marrero-Ponce, Y. Multiquery Similarity Searching Models: An Alternative Approach for Predicting Hemolytic Activity from Peptide Sequence. Chem. Res. Toxicol. 2024, 37, 580–589. [Google Scholar] [CrossRef]
  19. Yaseen, A.; Gull, S.; Akhtar, N.; Amin, I.; Minhas, F. HemoNet: Predicting hemolytic activity of peptides with integrated feature learning. J. Bioinform. Comput. Biol. 2021, 19, 2150021. [Google Scholar] [CrossRef]
  20. Hasan, M.M.; Schaduangrat, N.; Basith, S.; Lee, G.; Shoombuatong, W.; Manavalan, B. HLPpred-Fuse: Improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics 2020, 36, 3350–3356. [Google Scholar] [CrossRef]
  21. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
  22. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  23. Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-Precision Model-Agnostic Explanations. In Proceedings of the AAAI’18: AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  24. Long, T.Z.; Shi, S.H.; Liu, S.; Lu, A.-P.; Liu, Z.-Q.; Li, M.; Hou, T.-J.; Cao, D.-S. Structural Analysis and Prediction of Hematotoxicity Using Deep Learning Approaches. J. Chem. Inf. Model. 2023, 63, 111–125. [Google Scholar] [CrossRef] [PubMed]
  25. Zou, Y.; Shi, Y.; Sun, F.; Liu, J.; Gao, Y.; Zhang, H.; Lu, X.; Gong, Y.; Xia, S. Extreme gradient boosting model to assess risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual prediction using SHapley Additive exPlanations. Comput. Methods Programs Biomed. 2022, 225, 107038. [Google Scholar] [CrossRef] [PubMed]
  26. Fan, Y.W.; Liu, W.H.; Chen, Y.T.; Hsu, Y.S.; Pathak, N.; Huang, Y.W.; Yang, J.M. Exploring kinase family inhibitors and their moiety preferences using deep SHapley additive exPlanations. BMC Bioinform. 2022, 23 (Suppl. S4), 242. [Google Scholar] [CrossRef] [PubMed]
  27. Löfström, T.; Löfström, H.; Johansson, U.; Sönströd, C.; Matela, R. Calibrated explanations for regression. Mach. Learn. 2025, 114, 100. [Google Scholar] [CrossRef]
  28. Gautam, A.; Chaudhary, K.; Singh, S.; Joshi, A.; Anand, P.; Tuknait, A.; Mathur, D.; Varshney, G.C.; Raghava, G.P.S. Hemolytik: A database of experimentally determined hemolytic and non-hemolytic peptides. Nucleic Acids Res. 2014, 42, D444–D449. [Google Scholar] [CrossRef]
  29. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
  30. Zhou, P.; Liu, Q.; Wu, T.; Miao, Q.; Shang, S.; Wang, H.; Chen, Z.; Wang, S.; Wang, H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J. Chem. Inf. Model. 2021, 61, 1718–1731. [Google Scholar] [CrossRef]
  31. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127. [Google Scholar] [CrossRef]
  32. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
  33. Yang, Y.; Dias, C.L. Peptide–Membrane Binding: Effects of the Amino Acid Sequence. J. Phys. Chem. B 2023, 127, 912–920. [Google Scholar] [CrossRef]
  34. Kabelka, I.; Vácha, R. Advances in Molecular Understanding of α-Helical Membrane-Active Peptides. Acc. Chem. Res. 2021, 54, 2196–2204. [Google Scholar] [CrossRef]
  35. Lewis, R.N.; Liu, F.; Krivanek, R.; Rybar, P.; Hianik, T.; Flach, C.R.; Mendelsohn, R.; Chen, Y.; Mant, C.T.; Hodges, R.S.; et al. Studies of the minimum hydrophobicity of alpha-helical peptides required to maintain a stable transmembrane association with phospholipid bilayer membranes. Biochemistry 2007, 46, 1042–1054. [Google Scholar] [CrossRef]
Figure 1. Workflow of Hemolytic Peptide Toxicity Prediction: (1) Dataset Preparation; (2) Feature Extraction and Selection; (3) Model Selection; (4) Performance Evaluation; (5) Server Implementation.
Figure 2. Amino Acid Composition of the Hemopi2 Dataset.
Figure 3. Scatter plots of −log(HC50). (a–e) Scatter plots of predicted versus true −log(HC50) values from the five-fold cross-validation on the training set. The solid line in each panel represents the line of perfect prediction (y = x). The corresponding performance metrics (R2, MSE, PCC) for each fold are annotated within the plots. (f) Scatter plot of predicted versus true −log(HC50) values on the independent test set.
Figure 4. Training and testing time of the ensemble model at different feature dimensions.
Figure 5. Top 20 features utilized in the regression model. The left bar plot shows the ranked features, with importance calculated by averaging Shapley values over the Sreg test dataset. The right beeswarm plot illustrates the influence of different feature values on the prediction.
Figure 6. The blue bars indicate positive feature weights that increase the predicted value, and red bars indicate negative weights that decrease it. (a) Regular plot for the Hemopi2 dataset. The top subplot displays median values and confidence intervals, while the lower subplot visualizes the feature importance. (b) Uncertainty plot for the Hemopi2 dataset. This figure shares the same top subplot as (a), while the lower subplot highlights the uncertainty associated with each feature’s weight using shaded percentiles.
Figure 7. (a) The counterfactual plot for the hemolytic peptide dataset. (Each row represents a rule, showing its median (solid line) and 5th–95th percentile confidence interval. The overall background provides the original instance’s confidence interval for reference); (b) One-sided counterfactual plot using the 90th upper percentile only to define confidence intervals, demonstrating the effect of conditional violations.
Figure 8. An overview of the end-to-end web platform with a user-friendly interface. (a) The main page of the web server. (b) The resultant output can be downloaded for further analysis.
Figure 8. An overview of the end-to-end web platform with a user-friendly interface. (a) The main page of the web server. (b) The resultant output can be downloaded for further analysis.
Foods 14 04143 g008
Table 1. Sample Statistics of the Hemopi2 Dataset.

| Dataset | Training (Positive) | Training (Negative) | Test (Positive) | Test (Negative) |
| Taken from Rathore et al. [28] | 713 | 828 | 178 | 207 |
| Taken from Karasev et al. [17] (calibration set) | 826 | | | |
Table 2. Ablation study results based on the ensemble learning model.

| Method | Feature (dim) | MAE | RMSE | R2 | R |
| Without AAD | Without feature selection (1024 + 1280 = 2304) | 0.3689 | 0.4779 | 0.4364 | 0.6772 |
| Without AAD | Feature selection | 0.3471 | 0.4528 | 0.4942 | 0.7173 |
| Without ESM2_t33 | Without feature selection (1024 + 1167 = 2191) | 0.3636 | 0.4729 | 0.4482 | 0.6753 |
| Without ESM2_t33 | Feature selection | 0.3115 | 0.4201 | 0.5645 | 0.7527 |
| Without ProtT5 | Without feature selection (1167 + 1280 = 2447) | 0.3249 | 0.4271 | 0.55 | 0.7494 |
| Without ProtT5 | Feature selection (633) | 0.3130 | 0.4200 | 0.57 | 0.7551 |
| Fusion of three features | Without feature selection (3471) | 0.3377 | 0.4448 | 0.5118 | 0.7250 |
| Fusion of three features | Feature selection | 0.3191 | 0.4224 | 0.5600 | 0.7520 |
Table 3. Performance comparison across different machine learning models (dim = 633).

| Regressor | MAE | RMSE | R2 | R |
| RF | 0.326 | 0.437 | 0.529 | 0.732 |
| XGB | 0.334 | 0.453 | 0.494 | 0.708 |
| DT | 0.330 | 0.443 | 0.185 | 0.442 |
| ADB | 0.254 | 0.400 | 0.374 | 0.625 |
| SVM | 0.459 | 0.594 | 0.131 | 0.439 |
| KNN | 0.368 | 0.50 | 0.388 | 0.623 |
| ET | 0.306 | 0.420 | 0.560 | 0.751 |
| Ours | 0.313 | 0.420 | 0.565 | 0.755 |
Table 4. A randomly selected peptide sequence prediction from the test set.

| Sequence | True_HC50 | Predicted_HC50 | True_Hemolytic | Predicted_Hemolytic | Log_True | Log_Predicted |
| GIMSSLMKKLKAHIAK | 400.0 | 300.9 | 1 | 1 | 2.60 | 2.48 |