A Novel LSTM-Based Machine Learning Model for Predicting the Activity of Food Protein-Derived Antihypertensive Peptides

Liao, Wang; Yan, Siyuan; Cao, Xinyi; Xia, Hui; Wang, Shaokang; Sun, Guiju; Cai, Kaida

doi:10.3390/molecules28134901



by Wang Liao ¹,²; Siyuan Yan ¹,²; Xinyi Cao ¹,²; Hui Xia ¹,²; Shaokang Wang ¹,²; Guiju Sun ¹,² and Kaida Cai ¹,³,⁴,*

¹ Key Laboratory of Environmental Medicine and Engineering of Ministry of Education, School of Public Health, Southeast University, Nanjing 210009, China
² Department of Nutrition and Food Hygiene, School of Public Health, Southeast University, Nanjing 210009, China
³ Department of Epidemiology & Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China
⁴ Department of Statistics and Actuarial Sciences, School of Mathematics, Southeast University, Nanjing 210009, China
* Author to whom correspondence should be addressed.

Molecules 2023, 28(13), 4901; https://doi.org/10.3390/molecules28134901

Submission received: 10 May 2023 / Revised: 14 June 2023 / Accepted: 19 June 2023 / Published: 21 June 2023

(This article belongs to the Special Issue Bioactive Peptides: Emerging Fronts in Nutrition)


Abstract

Food protein-derived antihypertensive peptides are a representative type of bioactive peptide. Several models based on partial least squares regression have been constructed to delineate the relationship between the structure and activity of these peptides. Machine learning models have been applied across many fields, which suggests that they could also be incorporated into bioactive peptide research. In this study, a deep learning model based on the long short-term memory (LSTM) algorithm was constructed to predict the IC₅₀ values of peptides for the inhibition of ACE activity. In addition to evaluation on the test dataset, the model was validated using randomly generated and synthesized peptides. The LSTM-based model constructed in this study provides an efficient and simplified method for screening antihypertensive peptides from food proteins.

Keywords:

antihypertensive peptides; structure–activity relationship; machine learning; LSTM algorithm


1. Introduction

Globally, hypertension is ranked as one of the major chronic diseases. An estimated 1.4 billion adults worldwide have hypertension, and its prevalence is still rising [1]. The renin–angiotensin system (RAS) plays a major role in the regulation of blood pressure. Angiotensin II (Ang II), a potent vasoconstrictor in the RAS, is generated from Ang I by the action of angiotensin converting enzyme (ACE) [2]. Clinically, inhibiting ACE activity to suppress the formation of Ang II is considered an efficient strategy for managing high blood pressure, and synthetic ACE inhibitors are therefore used as first-line drugs for hypertension therapy [3].

Notably, peptides that could inhibit ACE activity were identified from snake venom in 1971 and were characterized as ACE inhibitory peptides [4]. Since then, a large number of ACE inhibitory peptides have been identified from various natural protein sources, including food proteins such as milk, egg and soy proteins [5]. Compared with synthetic drugs, food protein-derived ACE inhibitory peptides are considered to have fewer side effects and lower production costs, which makes them a promising alternative to antihypertensive drugs.

As a representative category of food protein-derived bioactive peptides, ACE inhibitory peptides have been studied extensively, mainly with respect to peptide identification, mechanisms of action and clinical trials [6]. Over the past two decades in particular, considerable effort has been devoted to delineating the relationship between the structure and activity of ACE inhibitory peptides. Because it is widely accepted that the biological activity of a chemical structure can be described by its chemical features, such as its composition, electronic attributes and hydrophobicity [7], the peptide concentration required to inhibit 50% of ACE activity (the IC₅₀ value) has been used as an output that correlates with the structural features of the peptides. Based on this principle, quantitative structure–activity relationship (QSAR) modelling has been applied to predict the IC₅₀ values of ACE inhibitory peptides, and several models have been established [8,9]. However, representing the structural features of a peptide is complicated, and different peptide representation strategies may lead to variations in the accuracy of these models.

Artificial neural networks (ANNs) are mathematical models that mimic the behavior of animal neural networks and process information in a distributed, parallel manner. An ANN processes information by adjusting the connections among a large number of interconnected internal nodes. Deep learning-based ANNs have been widely applied in biomedicine; for example, deep learning-based models have been constructed to predict the activity of antioxidant peptides [10,11], anticancer peptides [12] and antibacterial peptides [13]. The long short-term memory (LSTM) network is a special type of recurrent neural network (RNN) that is capable of learning order dependence in sequence prediction problems. Compared with shallow learning models, LSTM networks can be built as deep architectures with many hidden layers, allowing them to learn more complex non-linear patterns [14]. Notably, an LSTM-based model has been constructed for the discovery of antimicrobial peptides [15], suggesting that LSTM networks can be used to predict the activity of bioactive peptides.

In the present study, we constructed an LSTM-based prediction model to provide an efficient and simplified structure–activity model for ACE inhibitory peptides and to further explore the application of LSTM networks in the field of bioactive peptides.

2. Results

2.1. An Overview of the Dataset

In total, 3429 peptide sequences with their corresponding ACE inhibitory IC₅₀ values were retrieved from the databases and used in this study. As shown in Figure 1A, the IC₅₀ values of the peptides varied widely, from less than 1 μM to above 1000 μM. Nevertheless, most of the peptides had IC₅₀ values below 100 μM, indicating potent ACE inhibitory activity. In total, 2327 peptides in the dataset were functional ACE inhibitory peptides.
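To make the distribution summary concrete, the sketch below counts IC₅₀ values (in μM) by order of magnitude and reports the share below 100 μM. The bin edges and the function names `ic50_bins` and `fraction_below` are illustrative choices of ours, not taken from Figure 1A.

```python
def ic50_bins(ic50_values_um):
    """Count IC50 values (μM) in order-of-magnitude bins; edges are illustrative."""
    edges = [1, 10, 100, 1000]  # boundaries between bins, in μM
    labels = ["<1", "1-10", "10-100", "100-1000", ">=1000"]
    counts = {label: 0 for label in labels}
    for value in ic50_values_um:
        # The number of edges at or below the value selects the bin index.
        index = sum(value >= edge for edge in edges)
        counts[labels[index]] += 1
    return counts


def fraction_below(ic50_values_um, threshold_um=100.0):
    """Share of peptides whose IC50 lies below the threshold (default 100 μM)."""
    values = list(ic50_values_um)
    return sum(v < threshold_um for v in values) / len(values)


example = [0.5, 5.0, 50.0, 500.0, 1500.0]  # hypothetical IC50 values (μM)
print(ic50_bins(example))
print(fraction_below(example))  # 3 of 5 values are below 100 μM
```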

The amino acid composition of the peptides in the benchmark dataset was also analyzed. Proline appeared most frequently, accounting for 19.2% of all amino acid residues, which is markedly higher than the frequency of any other amino acid (Figure 1B). This finding is in line with previous reports that proline is frequently present in various bioactive peptides [10,16]. In contrast, methionine was absent from the dataset, for reasons that are yet to be determined (Figure 1B).
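A composition analysis of this kind can be sketched as below: it computes each residue's share of all residues and lists standard amino acids that never occur. The function names and the example sequences are illustrative assumptions, not the authors' code.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(sequences):
    """Fraction of all residues contributed by each amino acid (one-letter codes)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


def absent_residues(sequences):
    """Standard amino acids that never occur in the given sequences."""
    seen = set("".join(sequences).upper())
    return [aa for aa in STANDARD_AMINO_ACIDS if aa not in seen]


peptides = ["VPP", "IPP", "IRW", "LKP"]  # short example sequences
freqs = residue_frequencies(peptides)
print(freqs["P"])  # 5 of 12 residues are proline
print("M" in absent_residues(peptides))  # True
```

Applied to the full benchmark dataset, the `"P"` entry would correspond to the 19.2% reported for proline, and `absent_residues` would flag methionine.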

2.2. Performance Evaluation of the Model

As training progressed, both the training loss and the test loss of the LSTM model decreased (Figure 2), indicating that the prediction accuracy of the model improved with training. However, the training and test loss curves did not overlap, which might be due to the limited amount of data in the test set.

The performance of the model was then evaluated by five-fold cross-validation. The mean accuracy, average sensitivity and average specificity of the model were 85.20%, 84.92% and 85.43%, respectively, and the RMSE was 0.18.
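For reference, the standard definitions behind these three metrics can be written as a short sketch that computes them from one fold's confusion counts and averages them over folds. The text does not state which class was treated as positive, so the counts are simply taken as inputs; the example counts are hypothetical.

```python
def fold_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from one fold's confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def average_metrics(folds):
    """Average each metric over folds; `folds` holds (tp, fn, tn, fp) tuples."""
    per_fold = [fold_metrics(*counts) for counts in folds]
    n = len(per_fold)
    return tuple(sum(m[i] for m in per_fold) / n for i in range(3))


# Hypothetical confusion counts for two folds, for illustration only.
print(average_metrics([(40, 10, 45, 5), (45, 5, 40, 10)]))
```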

In addition, for the 343 peptides in the test set, the ratio of the predicted to the reported IC₅₀ was plotted. As shown in Figure 3, the ratios of 256 peptides (74.6%) fell within the range of 0.75 to 1.25, supporting the accuracy of the model.

2.3. Model Validations

Based on the literature search, 54 peptides were identified for which both an in vitro ACE inhibitory IC₅₀ value and an in vivo blood-pressure-lowering effect had been reported. We then applied our LSTM-based model to predict the IC₅₀ values of these peptides. As shown in Table 1, the ratio of the predicted to the reported IC₅₀ fell between 0.80 and 1.20 for 38 peptides, of which 19 had ratios between 0.90 and 1.10. These results indicate the potential of our LSTM-based model to predict the IC₅₀ values of antihypertensive peptides with in vivo activity.

Finally, 20 peptides were randomly generated and synthesized, and the LSTM-based model was applied to predict their IC₅₀ values. The experimental IC₅₀ value of each peptide was determined using the HPLC-based assay. As shown in Table 2, the ratio of the predicted to the experimental IC₅₀ was between 0.75 and 2 for 15 peptides. This result suggests that the model developed in the present study can be used to predict the ACE inhibitory activity of random sequences.

3. Discussion

Food protein-derived antihypertensive peptides are one of the representative categories of bioactive peptides, and research on them has been ongoing for about five decades. The structure–activity relationship of peptides has long been a prominent research area, and QSAR modelling of food protein-derived antihypertensive peptides started about two decades ago. Initially, research concentrated on the structural features of di- and tripeptides using partial least squares regression [39]. However, these models were too limited to be used for high-throughput prediction. Meanwhile, machine learning techniques have been applied widely across multiple areas, and several machine learning models have been constructed to predict the activity of antioxidant, anticancer and antimicrobial peptides [11,13,15], which indicates the feasibility of applying machine learning algorithms in QSAR modelling of bioactive peptides.

A support vector machine-based model was previously developed to predict the antihypertensive activity of food protein-derived bioactive peptides; however, its accuracy was below 80% [40]. In a later study, the extremely randomized tree algorithm was applied and the accuracy improved to 85.0%, but that model relies on 51 feature descriptors, which makes it complicated [41]. In the present study, we instead used an LSTM deep learning model to investigate the relationship between peptide structure and bioactivity. As a special type of RNN, LSTM can retain information from earlier inputs and use it to influence the current output, which has led to applications in speech recognition, natural language processing and time series prediction [42]. When dependencies span long intervals, a standard RNN cannot retain earlier information well because of the vanishing gradient problem. LSTM was proposed to overcome this limitation by combining short-term and long-term memory through gating [43]. Importantly, the LSTM model in this study achieved a mean accuracy of 85.20% in five-fold cross-validation (Section 2.2) using only integer-encoded sequences. Any advantage of the LSTM model over other models may stem from its ability to capture the sequential nature of peptide data, which allows it to detect subtle structural patterns that influence bioactivity. However, LSTM models also have potential drawbacks, including high computational costs due to their complex architecture and a risk of overfitting if the dataset is not sufficiently diverse. Future studies should therefore explore ways to optimize LSTM model performance while controlling these factors.

Because research on antihypertensive peptides originated with ACE inhibitory peptides, the database available for deep learning training was built from the IC₅₀ values of peptides in the in vitro ACE inhibitory assay. Thus, despite the satisfactory performance of the model developed in this study, its biological significance is yet to be determined. The current peptide database should therefore be expanded with results from biologically relevant assays, such as cellular experiments and animal studies, where available, so that this information can be incorporated into machine learning models in the future. In addition, recent studies have shown that antihypertensive peptides may act on targets other than ACE to reduce blood pressure [44]. Hence, it is also recommended that databases based on other activity parameters of the peptides be constructed.

The LSTM-based model developed in this study also performed well in predicting the IC₅₀ values of randomly generated peptides. This model could therefore be applied in peptide design, which may create new opportunities for screening antihypertensive peptides. In addition, a recent study developed a machine learning-empowered model capable of performing in silico gastrointestinal digestion of food proteins [45], which could be combined with our model to create a more comprehensive activity prediction system. However, only peptides of fewer than six amino acids were randomly generated, and the ability of the model to predict the activity of longer peptides is yet to be determined.

4. Materials and Methods

4.1. Benchmark Dataset

The peptide sequences used for model training in this study were obtained from several databases, including BIOPEP-UWM [46], FeptideDB [47] and BioPepDB [48]. In addition, we manually searched the literature to identify peptides not included in these databases. All peptides were manually curated, merged and cross-checked to construct a non-redundant dataset, and only peptides with an IC₅₀ value below 2000 μM were included. Following data collection, the data were randomly divided into a training set and a validation set at a ratio of 9:1.
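The merging, filtering and 9:1 split described above can be sketched as follows. The record format `(sequence, IC50 in μM)`, the rule of keeping the first qualifying record for duplicate sequences and the fixed seed are hypothetical choices for illustration; the text does not specify how conflicting duplicate values were resolved.

```python
import random


def build_dataset(records, max_ic50_um=2000.0):
    """Merge (sequence, IC50 in μM) records into a non-redundant, filtered list.

    Duplicates are resolved by keeping the first qualifying record; this
    tie-break rule is a hypothetical choice for illustration.
    """
    merged = {}
    for seq, ic50 in records:
        seq = seq.strip().upper()
        if ic50 < max_ic50_um and seq not in merged:
            merged[seq] = ic50
    return sorted(merged.items())


def split_train_validation(dataset, train_fraction=0.9, seed=0):
    """Shuffle reproducibly and split 9:1 into training and validation sets."""
    items = list(dataset)
    random.Random(seed).shuffle(items)
    n_train = round(train_fraction * len(items))
    return items[:n_train], items[n_train:]


raw = [("vpp", 9.0), ("VPP", 5.0), ("IPP", 5.0), ("GGG", 2500.0)]
print(build_dataset(raw))  # [('IPP', 5.0), ('VPP', 9.0)]
```

With 3429 peptides, `round(0.9 * 3429)` gives 3086 training and 343 validation peptides, which matches the size of the 343-peptide test set in Section 2.2.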

4.2. Literature Searching Strategy

PubMed and Web of Science were searched for studies published up to April 2023 that reported both the IC₅₀ values of food protein-derived bioactive peptides in in vitro ACE inhibitory assays and their in vivo blood-pressure-lowering effects. The search was performed using the following string: “Bioactive peptides” AND “ACE inhibition” AND “Blood pressure reduction”. For model validation, peptides of known sequence that had previously been reported to have both an in vitro ACE inhibitory IC₅₀ value and a significant in vivo blood-pressure-lowering effect were used.

4.3. Representation of the Peptide Sequence

The 19 amino acids present in the dataset (methionine was absent) were each mapped to a distinct integer, as shown in Table 3. Each peptide sequence was then converted into a sequence of integers, which was packaged into a PyTorch dataset and loaded in batches of 32.
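The encoding and batching step can be sketched in plain Python as below. In the original pipeline these lists would be wrapped in a PyTorch `Dataset` and served by a `DataLoader` with `batch_size=32`; the sketch avoids that dependency to stay self-contained. The integer assignment is hypothetical (Table 3 gives the mapping actually used), and right-padding with 0 is an assumption, since the text does not say how peptides of different lengths were batched.

```python
# 19 residues observed in the dataset (methionine absent). The integer
# assignment below is HYPOTHETICAL; Table 3 gives the mapping actually used.
AMINO_ACIDS = "ACDEFGHIKLNPQRSTVWY"
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # 0 reserved for padding
PAD = 0


def encode(seq):
    """Map a peptide sequence to a list of integers."""
    return [AA_TO_INT[aa] for aa in seq.upper()]


def iterate_batches(sequences, targets, batch_size=32):
    """Yield (padded integer sequences, targets) mini-batches of up to batch_size.

    Right-padding with PAD to the longest sequence in each batch is an
    assumption; the text does not state how variable lengths were handled.
    """
    for start in range(0, len(sequences), batch_size):
        encoded = [encode(s) for s in sequences[start:start + batch_size]]
        width = max(len(e) for e in encoded)
        padded = [e + [PAD] * (width - len(e)) for e in encoded]
        yield padded, targets[start:start + batch_size]


print(encode("VPP"))  # [17, 12, 12]
print(next(iterate_batches(["VPP", "IW"], [1.0, 2.0])))
```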

4.4. Machine Learning Algorithms

As shown in Figure 4, the LSTM network consisted of an input layer, an output layer and a series of recurrently connected hidden layers. The hidden layers were memory blocks, each containing an input gate, an output gate, a forget gate and self-recurrent memory cells. The input, output and forget gates provide write, read and reset operations for the memory cells, respectively. Figure 4 illustrates an LSTM memory block with a single cell. At the core of each memory block is a self-connected linear unit, the constant error carousel (CEC). The self-recurrent memory cell shields its state from outside interference and holds it from one time step to the next, which is why the LSTM can mitigate the vanishing gradient problem. Assuming that the model input at time t is $X_t = (X_{t1}, \ldots, X_{tn})^\top$, where n is the number of input dimensions, the input gate selects the information from $X_t$ to be stored in the cell state $C_t$, and the forget gate selectively discards the previous cell state $C_{t-1}$. The forget gate learns to reset memory blocks once their contents are out of date, and it also prevents the cell state from growing without bound and saturating the squashing function. The output gate controls which components of the cell state are passed to the output $h_t$; that is, it controls the extent to which the cell state influences other neurons.

To show the details, the computations in each LSTM memory block can be formulated as follows. The input gate $i_t$ and the forget gate $f_t$ are given by:

$$ i_t = \sigma\left(W_i [h_{t-1}, X_t] + b_i\right) \tag{1} $$

$$ f_t = \sigma\left(W_f [h_{t-1}, X_t] + b_f\right) \tag{2} $$

where $h_{t-1}$ is the output of the previous cell, $X_t$ is the input, and $b$ and $W$ denote the bias vectors and the weight matrices, respectively. The cell state $C_t$ is then updated as:

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tanh\left(W_c [h_{t-1}, X_t] + b_c\right) \tag{3} $$

where $C_{t-1}$ is the state of the previous cell, and $b_c$ and $W_c$ denote the bias vector and weight matrix, respectively. Finally, the output gate $o_t$ and the output $h_t$ are defined as:

$$ o_t = \sigma\left(W_o [h_{t-1}, X_t] + b_o\right) \tag{4} $$

$$ h_t = o_t \odot \tanh(C_t) \tag{5} $$

where $b_o$ and $W_o$ denote the bias vector and weight matrix, respectively, and $\odot$ denotes element-wise multiplication. $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid and hyperbolic tangent functions, defined as follows:

$$ \sigma(a) = \frac{1}{1 + e^{-a}} \tag{6} $$

$$ \tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}} \tag{7} $$
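Equations (1)–(7) can be checked numerically with a minimal NumPy sketch of one memory-block update. The dictionary layout of the weights, the gate keys `"i"`, `"f"`, `"c"`, `"o"` and the random initialization are assumptions for illustration. PyTorch's `torch.nn.LSTM` uses the same gating structure, although it keeps separate input-to-hidden and hidden-to-hidden weight matrices, which is equivalent to the concatenated form used here.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, Eq. (6)."""
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One memory-block update following Eqs. (1)-(5).

    W[k] has shape (hidden, hidden + n_inputs) and b[k] has shape (hidden,)
    for gate k in {"i", "f", "c", "o"}; both act on [h_{t-1}, X_t].
    """
    z = np.concatenate([h_prev, x_t])                        # [h_{t-1}, X_t]
    i_t = sigmoid(W["i"] @ z + b["i"])                       # Eq. (1): input gate
    f_t = sigmoid(W["f"] @ z + b["f"])                       # Eq. (2): forget gate
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ z + b["c"])  # Eq. (3): cell state
    o_t = sigmoid(W["o"] @ z + b["o"])                       # Eq. (4): output gate
    h_t = o_t * np.tanh(c_t)                                 # Eq. (5): output
    return h_t, c_t


def run_sequence(xs, W, b, hidden):
    """Apply lstm_step over a sequence of input vectors from zero initial states."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h, c


rng = np.random.default_rng(0)
hidden, n_inputs = 4, 3
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_inputs)) for k in "ifco"}
b = {k: np.zeros(hidden) for k in "ifco"}
h, c = run_sequence(rng.normal(size=(5, n_inputs)), W, b, hidden)
print(h.shape, c.shape)  # (4,) (4,)
```

A quick sanity check: with all weights and biases set to zero, every gate equals 0.5 and the candidate term is zero, so Eq. (3) halves the previous cell state.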

The model was trained for 100 epochs. In each epoch, the program shuffled the entire dataset and re-encoded, packaged and split it into training and validation sets at a ratio of 9:1. The training set was used to adjust the model parameters, and the validation set was used to calculate the current error of the model. Whenever the calculated error was lower than the previous minimum error, the current model parameters and the model's outputs for the validation set were retained.
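The per-epoch reshuffling and best-model retention can be sketched in plain Python as follows. The callables `train_step` and `eval_error`, the toy one-parameter model and the seed are hypothetical stand-ins for the authors' PyTorch code, which is not given in the text.

```python
import copy
import random


def train_with_reshuffle(data, model, train_step, eval_error,
                         epochs=100, train_fraction=0.9, seed=0):
    """Reshuffle and re-split the data every epoch; keep the lowest-error model.

    `train_step(model, train)` updates the model in place and
    `eval_error(model, val)` returns a scalar error. Both are hypothetical
    callables standing in for the PyTorch optimisation and evaluation code.
    """
    rng = random.Random(seed)
    items = list(data)
    best_error, best_model, best_epoch = float("inf"), None, None
    for epoch in range(epochs):
        rng.shuffle(items)                      # new order every epoch
        n_train = round(train_fraction * len(items))
        train, val = items[:n_train], items[n_train:]
        train_step(model, train)                # adjust parameters
        error = eval_error(model, val)          # current error on this split
        if error < best_error:                  # new minimum: keep a snapshot
            best_error, best_model, best_epoch = error, copy.deepcopy(model), epoch
    return best_model, best_error, best_epoch


# Toy "model": one mean estimate moved halfway toward the training mean per epoch.
def toy_step(model, train):
    target = sum(y for _, y in train) / len(train)
    model["mean"] = 0.5 * model["mean"] + 0.5 * target


def toy_error(model, val):
    return sum((y - model["mean"]) ** 2 for _, y in val) / len(val)


best, err, epoch = train_with_reshuffle([(i, 2.0) for i in range(50)],
                                        {"mean": 0.0}, toy_step, toy_error)
print(best, err, epoch)
```

One consequence of redrawing the split every epoch is that a peptide held out in one epoch is trained on in others, so the retained minimum validation error is an optimistic estimate rather than a strictly held-out one.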

4.5. Model Evaluations

The model was evaluated in two ways. First, the accuracy of the model in predicting the IC₅₀ value of a specific peptide was assessed using the ratio of the predicted value to the reported or experimental IC₅₀ value. A prediction was defined as “accurate” when the ratio was within the range of 0.75 to 1.25, and “inaccurate” otherwise. In this way, the regression task was converted into a classification task, which was then used for five-fold cross-validation.
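The ratio-based labelling can be sketched as below; treating both bounds as inclusive is our assumption, and the example IC₅₀ values are hypothetical.

```python
def ratio_labels(predicted, reported, low=0.75, high=1.25):
    """Label each prediction accurate (True) if predicted/reported is in [low, high].

    Treating both bounds as inclusive is an assumption.
    """
    return [low <= p / r <= high for p, r in zip(predicted, reported)]


def accurate_fraction(predicted, reported, low=0.75, high=1.25):
    """Share of predictions labelled accurate under the ratio window."""
    labels = ratio_labels(predicted, reported, low, high)
    return sum(labels) / len(labels)


pred = [10.0, 12.0, 20.0]  # hypothetical predicted IC50 values (μM)
obs = [10.0, 10.0, 10.0]   # hypothetical reported IC50 values (μM)
print(ratio_labels(pred, obs))            # [True, True, False]
print(ratio_labels(pred, obs, high=2.0))  # [True, True, True]
```

The same function with `low=0.80, high=1.20` or `low=0.75, high=2.0` corresponds to the windows used in Section 2.3; for example, 256 accurate labels among 343 test peptides give a fraction of about 0.746.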

Second, to assess the overall reliability of the model, five-fold cross-validation was performed according to the literature [49,50]. The original dataset was randomly separated into five equally sized subsamples; each subsample was used once as the test data, while the remaining four subsamples were used as the training set, so that the cross-validation process was repeated five times. The accuracy of the algorithm was obtained by averaging over the five folds, and the results are presented as the mean accuracy, average sensitivity and average specificity. In addition, the root mean square error (RMSE) of the model was calculated according to the following formula:
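The fold construction can be sketched with the standard library as below: indices are shuffled once and dealt into five disjoint folds, and each fold serves once as the test set while the other four form the training set. The function name and seed are illustrative; per-fold results would then be averaged.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k disjoint folds
    whose sizes differ by at most one; each fold serves once as the test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


for train_idx, test_idx in k_fold_splits(12):
    print(len(train_idx), len(test_idx))
```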

$$ \mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2} \tag{8} $$

where $m$ is the sample size, $y_i$ is the reported value and $\hat{y}_i$ is the predicted value.
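Equation (8) translates directly into code, as in the sketch below. The text does not state whether IC₅₀ values were transformed (for example, log-scaled or normalized) before the RMSE of 0.18 was computed, so the function simply operates on whatever scale its inputs use.

```python
import math


def rmse(reported, predicted):
    """Root mean square error between reported y_i and predicted ŷ_i, Eq. (8)."""
    if len(reported) != len(predicted) or not reported:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(reported)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(reported, predicted)) / m)


print(rmse([0.0, 0.0], [3.0, 4.0]))  # sqrt((9 + 16) / 2) ≈ 3.536
```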

4.6. The In Vitro ACE Inhibitory Assay

An online tool (https://www.genscript.com/sms2/random_protein.html, accessed on 1 March 2023) was used to randomly generate peptides in order to test the model. The generated peptides with the lowest predicted IC₅₀ values were selected for synthesis. Because peptides composed of fewer than six amino acid residues are stable in the gastrointestinal tract [51], the generated peptides were at most five amino acids long. The peptides used for validation were synthesized by GenScript with a purity > 97%.

The ACE inhibitory assay was performed according to a previous study [52] with modifications. ACE, N-hippuryl-His-Leu tetrahydrate (HHL; Sigma-Aldrich, St. Louis, MO, USA) and the peptide samples were dissolved in 100 mM boric acid buffer containing 300 mM NaCl (pH 8.3). First, 10 μL of the peptide solution was preincubated with 50 μL of 6.5 mM HHL at 37 °C for 5 min. Then, 5 μL of 0.1 U/mL ACE (preincubated at 37 °C) was added to the reaction system, which was incubated at 37 °C for another 30 min. The reaction was terminated by adding 85 μL of 1 M HCl. The concentration of hippuric acid (Hip, the reaction product) was measured by HPLC with a C₁₈ column (5 µm, 250 mm × 4.6 mm). The sample (20 μL) was eluted with a gradient of solvent A (H₂O with 0.05% TFA) and solvent B (acetonitrile with 0.05% TFA) at a flow rate of 1.2 mL/min, and the absorbance at 228 nm was monitored. The Hip concentration was calculated from its standard curve. The area under each peak was calculated, where A is the peak area of the blank group (without peptide) and B is the peak area of the peptide group; the ACE inhibitory ratio was calculated as (A − B)/A. The ACE inhibitory ratio of each peptide was measured at different concentrations, and the IC₅₀ value was defined as the peptide concentration that inhibited 50% of ACE activity.
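The inhibitory ratio (A − B)/A is simple to compute, but the text does not say how the IC₅₀ was derived from the ratios measured at different concentrations. The sketch below therefore uses a hypothetical estimator, linear interpolation of inhibition against log₁₀(concentration) between the two measurements that bracket 50%; fitting a sigmoidal dose–response curve is a common alternative. All function names and numbers are illustrative.

```python
import math


def inhibition_ratio(area_blank, area_peptide):
    """ACE inhibitory ratio = (A - B) / A from the Hip peak areas."""
    return (area_blank - area_peptide) / area_blank


def ic50_log_interpolation(concentrations, inhibitions):
    """Estimate IC50 by interpolating inhibition linearly in log10(concentration).

    Hypothetical estimator: concentrations must be ascending, inhibition is
    assumed to rise with concentration, and 50% must be bracketed.
    """
    points = list(zip(concentrations, inhibitions))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if r1 <= 0.5 <= r2:
            if r2 == r1:
                return c1
            frac = (0.5 - r1) / (r2 - r1)
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the measured concentrations")


print(inhibition_ratio(100.0, 40.0))  # 0.6
print(ic50_log_interpolation([10.0, 100.0, 1000.0], [0.3, 0.7, 0.9]))  # ≈ 31.62
```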

5. Conclusions

In this study, a novel LSTM-based deep learning model was constructed to predict the activity of food protein-derived antihypertensive peptides. The model performed well in activity prediction, as validated by both the test set of the benchmark dataset and the in vitro ACE inhibitory assay of randomly generated peptides. This model could therefore be used to screen antihypertensive peptides from various food proteins. In addition, this research provides a new perspective for QSAR studies of antihypertensive peptides.

Author Contributions

Conceptualization, W.L. and K.C.; methodology, S.Y. and K.C.; investigation, S.Y. and X.C.; resources, W.L. and K.C.; data curation, S.Y. and X.C.; writing—original draft preparation, W.L.; writing—review and editing, H.X., S.W. and G.S.; supervision, K.C.; project administration, W.L.; funding acquisition, W.L. and K.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (82103834) and the High Level Personnel Project of Jiangsu Province (JSSCBS20220079). Wang Liao is the recipient of the Young Elite Scientists Sponsorship Program by CAST (2021QNRC001). Wang Liao and Kaida Cai are recipients of the Zhishan Young Scholar Award at Southeast University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Al-Makki, A.; DiPette, D.; Whelton, P.K.; Murad, M.H.; Mustafa, R.A.; Acharya, S.; Beheiry, H.M.; Champagne, B.; Connell, K.; Cooney, M.T.; et al. Hypertension Pharmacological Treatment in Adults: A World Health Organization Guideline Executive Summary. Hypertension 2022, 79, 293–301.
2. Forrester, S.J.; Booz, G.W.; Sigmund, C.D.; Coffman, T.M.; Kawai, T.; Rizzo, V.; Scalia, R.; Eguchi, S. Angiotensin II Signal Transduction: An Update on Mechanisms of Physiology and Pathophysiology. Physiol. Rev. 2018, 98, 1627–1738.
3. Wright, J.M.; Musini, V.M.; Gill, R. First-line drugs for hypertension. Cochrane Database Syst. Rev. 2018, 8, CD001841.
4. Bakhle, Y.S. How ACE inhibitors transformed the renin–angiotensin system. Br. J. Pharmacol. 2020, 177, 2657–2665.
5. Aluko, R.E. Antihypertensive Peptides from Food Proteins. Annu. Rev. Food Sci. Technol. 2015, 6, 235–262.
6. Wu, J.; Liao, W.; Udenigwe, C.C. Revisiting the mechanisms of ACE inhibitory peptides from food proteins. Trends Food Sci. Technol. 2017, 69, 214–219.
7. Li, F.-M.; Wang, X.-Q. Identifying anticancer peptides by using improved hybrid compositions. Sci. Rep. 2016, 6, 33910.
8. Jahangiri, R.; Soltani, S.; Barzegar, A. A review of QSAR studies to predict activity of ACE peptide inhibitors. Pharm. Sci. 2014, 20, 122–129.
9. Nongonierma, A.B.; FitzGerald, R.J. Learnings from quantitative structure–activity relationship (QSAR) studies with respect to food protein-derived bioactive peptides: A review. RSC Adv. 2016, 6, 75400–75413.
10. Shen, Y.; Liu, C.; Chi, K.; Gao, Q.; Bai, X.; Xu, Y.; Guo, N. Development of a machine learning-based predictor for identifying and discovering antioxidant peptides based on a new strategy. Food Control 2022, 131, 108439.
11. Olsen, T.H.; Yesiltas, B.; Marin, F.I.; Pertseva, M.; García-Moreno, P.J.; Gregersen, S.; Overgaard, M.T.; Jacobsen, C.; Lund, O.; Hansen, E.B. AnOxPePred: Using deep learning for the prediction of antioxidative properties of peptides. Sci. Rep. 2020, 10, 21471.
12. Manavalan, B.; Basith, S.; Shin, T.H.; Choi, S.; Kim, M.O.; Lee, G. MLACP: Machine-learning-based prediction of anticancer peptides. Oncotarget 2017, 8, 77121.
13. Li, C.; Sutherland, D.; Hammond, S.A.; Yang, C.; Taho, F.; Bergman, L.; Houston, S.; Warren, R.L.; Wong, T.; Hoang, L. AMPlify: Attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genom. 2022, 23, 77.
14. Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M.A. Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies 2018, 11, 1636.
15. Sharma, R.; Shrivastava, S.; Kumar Singh, S.; Kumar, A.; Saxena, S.; Kumar Singh, R. Deep-ABPpred: Identifying antibacterial peptides in protein sequences using bidirectional LSTM with word2vec. Brief. Bioinform. 2021, 22, bbab065.
16. Wang, L.; Niu, D.; Wang, X.; Khan, J.; Shen, Q.; Xue, Y. A Novel Machine Learning Strategy for the Prediction of Antihypertensive Peptides Derived from Food with High Efficiency. Foods 2021, 10, 550.
17. Tavares, T.; Sevilla, M.A.; Montero, M.J.; Carron, R.; Malcata, F.X. Acute effect of whey peptides upon blood pressure of hypertensive rats, and relationship with their angiotensin-converting enzyme inhibitory activity. Mol. Nutr. Food Res. 2012, 56, 316–324.
18. Geng, X.; Tian, G.; Zhang, W.; Zhao, Y.; Zhao, L.; Wang, H.; Ng, T.B. A Tricholoma matsutake Peptide with Angiotensin Converting Enzyme Inhibitory and Antioxidative Activities and Antihypertensive Effects in Spontaneously Hypertensive Rats. Sci. Rep. 2016, 6, 24130.
19. Nakano, D.; Ogura, K.; Miyakoshi, M.; Ishii, F.; Kawanishi, H.; Kurumazuka, D.; Kwak, C.-J.; Ikemura, K.; Takaoka, M.; Moriguchi, S.; et al. Antihypertensive Effect of Angiotensin I-Converting Enzyme Inhibitory Peptides from a Sesame Protein Hydrolysate in Spontaneously Hypertensive Rats. Biosci. Biotechnol. Biochem. 2006, 70, 1118–1126.
20. Bravo, F.I.; Mas-Capdevila, A.; López-Fernández-Sobrino, R.; Torres-Fuentes, C.; Mulero, M.; Alcaide-Hidalgo, J.M.; Muguerza, B. Identification of novel antihypertensive peptides from wine lees hydrolysate. Food Chem. 2022, 366, 130690.
21. Alcaide-Hidalgo, J.M.; Romero, M.; Duarte, J.; López-Huertas, E. Antihypertensive Effects of Virgin Olive Oil (Unfiltered) Low Molecular Weight Peptides with ACE Inhibitory Activity in Spontaneously Hypertensive Rats. Nutrients 2020, 12, 271.
22. Li, Q.; Liao, W.; Fan, H.; Wu, J. Optimization and Scale-Up Preparation of Egg White Hydrolysate with Angiotensin I Converting Enzyme Inhibitory Activity. J. Food Sci. 2018, 83, 1762–1768.
23. Suetsuna, K.; Maekawa, K.; Chen, J.-R. Antihypertensive effects of Undaria pinnatifida (wakame) peptide on blood pressure in spontaneously hypertensive rats. J. Nutr. Biochem. 2004, 15, 267–272.
24. Majumder, K.; Chakrabarti, S.; Morton, J.S.; Panahi, S.; Kaufman, S.; Davidge, S.T.; Wu, J. Egg-Derived Tri-Peptide IRW Exerts Antihypertensive Effects in Spontaneously Hypertensive Rats. PLoS ONE 2013, 8, e82829.
25. Suetsuna, K. Isolation and characterization of angiotensin I-converting enzyme inhibitor dipeptides derived from Allium sativum L. (garlic). J. Nutr. Biochem. 1998, 9, 415–419.
26. Majumder, K.; Chakrabarti, S.; Morton, J.S.; Panahi, S.; Kaufman, S.; Davidge, S.T.; Wu, J. Egg-derived ACE-inhibitory peptides IQW and LKP reduce blood pressure in spontaneously hypertensive rats. J. Funct. Foods 2015, 13, 50–60.
27. Shobako, N.; Ogawa, Y.; Ishikado, A.; Harada, K.; Kobayashi, E.; Suido, H.; Kusakari, T.; Maeda, M.; Suwa, M.; Matsumoto, M.; et al. A Novel Antihypertensive Peptide Identified in Thermolysin-Digested Rice Bran. Mol. Nutr. Food Res. 2018, 62, 1700732.
28. Balti, R.; Bougatef, A.; Sila, A.; Guillochon, D.; Dhulster, P.; Nedjar-Arroume, N. Nine novel angiotensin I-converting enzyme (ACE) inhibitory peptides from cuttlefish (Sepia officinalis) muscle protein hydrolysates and antihypertensive effect of the potent active peptide in spontaneously hypertensive rats. Food Chem. 2015, 170, 519–525.
29. Dang, Y.; Zhou, T.; Hao, L.; Cao, J.; Sun, Y.; Pan, D. In vitro and in vivo studies on the angiotensin-converting enzyme inhibitory activity peptides isolated from broccoli protein hydrolysate. J. Agric. Food Chem. 2019, 67, 6757–6764.
30. Sonklin, C.; Alashi, M.A.; Laohakunjit, N.; Kerdchoechuen, O.; Aluko, R.E. Identification of antihypertensive peptides from mung bean protein hydrolysate and their effects in spontaneously hypertensive rats. J. Funct. Foods 2020, 64, 103635.
31. Qian, Z.-J.; Jung, W.-K.; Lee, S.-H.; Byun, H.-G.; Kim, S.-K. Antihypertensive effect of an angiotensin I-converting enzyme inhibitory peptide from bullfrog (Rana catesbeiana Shaw) muscle protein in spontaneously hypertensive rats. Process Biochem. 2007, 42, 1443–1448.
32. Tokunaga, K.H.; Yoshida, C.; Suzuki, K.M.; Maruyama, H.; Futamura, Y.; Araki, Y.; Mishima, S. Antihypertensive Effect of Peptides from Royal Jelly in Spontaneously Hypertensive Rats. Biol. Pharm. Bull. 2004, 27, 189–192.
33. Priyanto, A.D.; Doerksen, R.J.; Chang, C.-I.; Sung, W.-C.; Widjanarko, S.B.; Kusnadi, J.; Lin, Y.-C.; Wang, T.-C.; Hsu, J.-L. Screening, discovery, and characterization of angiotensin-I converting enzyme inhibitory peptides derived from proteolytic hydrolysate of bitter melon seed proteins. J. Proteom. 2015, 128, 424–435.
34. Miguel, M.; Gómez-Ruiz, J.Á.; Recio, I.; Aleixandre, A. Changes in arterial blood pressure after single oral administration of milk-casein-derived peptides in spontaneously hypertensive rats. Mol. Nutr. Food Res. 2010, 54, 1422–1427.
35. Song, Y.; Yu, J.; Song, J.; Wang, S.; Cao, T.; Liu, Z.; Gao, X.; Wei, Y. The antihypertensive effect and mechanisms of bioactive peptides from Ruditapes philippinarum fermented with Bacillus natto in spontaneously hypertensive rats. J. Funct. Foods 2021, 79, 104411.
36. Kıvrık, M.; Süfer, Ö.; Bozok, F. A Research on Quality Evaluation of Eight Wild Edible Macrofungi Collected from East Mediterranean Region of Turkey. Chem. Biodivers. 2022, 19, e202100967.
37. Lin, F.; Chen, L.; Liang, R.; Zhang, Z.; Wang, J.; Cai, M.; Li, Y. Pilot-scale production of low molecular weight peptides from corn wet milling byproducts and the antihypertensive effects in vivo and in vitro. Food Chem. 2011, 124, 801–807.
38. De Freitas, M.A.G.; Amaral, N.O.; Álvares, A.C.M.; de Oliveira, S.A.; Mehdad, A.; Honda, D.E.; Bessa, A.S.M.; Ramada, M.H.S.; Naves, L.M.; Pontes, C.N.R.; et al. Blood pressure-lowering effects of a Bowman-Birk inhibitor and its derived peptides in normotensive and hypertensive rats. Sci. Rep. 2020, 10, 11680.
39. Wu, J.; Aluko, R.E.; Nakai, S. Structural Requirements of Angiotensin I-Converting Enzyme Inhibitory Peptides: Quantitative Structure–Activity Relationship Study of Di- and Tripeptides. J. Agric. Food Chem. 2006, 54, 732–738.
40. Kumar, R.; Chaudhary, K.; Singh Chauhan, J.; Nagpal, G.; Kumar, R.; Sharma, M.; Raghava, G.P. An in silico platform for predicting, screening and designing of antihypertensive peptides. Sci. Rep. 2015, 5, 12512.
41. Manavalan, B.; Basith, S.; Shin, T.H.; Wei, L.; Lee, G. mAHTPred: A sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 2019, 35, 2757–2765.
Tian, C.; Ma, J.; Zhang, C.; Zhan, P. A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies 2018, 11, 3493. [Google Scholar] [CrossRef]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Liao, W.; Wu, J. The ACE2/Ang (1–7)/MasR axis as an emerging target for antihypertensive peptides. Crit. Rev. Food Sci. Nutr. 2021, 61, 2572–2586. [Google Scholar] [CrossRef]
Kalyan, G.; Junghare, V.; Khan, M.F.; Pal, S.; Bhattacharya, S.; Guha, S.; Majumder, K.; Chakrabarty, S.; Hazra, S. Anti-hypertensive peptide predictor: A machine learning-empowered web server for prediction of food-derived peptides with potential angiotensin-converting enzyme-I inhibitory activity. J. Agric. Food Chem. 2021, 69, 14995–15004. [Google Scholar] [CrossRef] [PubMed]
Minkiewicz, P.; Iwaniak, A.; Darewicz, M. BIOPEP-UWM database of bioactive peptides: Current opportunities. Int. J. Mol. Sci. 2019, 20, 5978. [Google Scholar] [CrossRef]
Panyayai, T.; Ngamphiw, C.; Tongsima, S.; Mhuantong, W.; Limsripraphan, W.; Choowongkomon, K.; Sawatdichaikul, O. FeptideDB: A web application for new bioactive peptides from food protein. Heliyon 2019, 5, e02076. [Google Scholar] [CrossRef] [PubMed]
Li, Q.; Zhang, C.; Chen, H.; Xue, J.; Guo, X.; Liang, M.; Chen, M. BioPepDB: An integrated data platform for food-derived bioactive peptides. Int. J. Food Sci. Nutr. 2018, 69, 963–968. [Google Scholar] [CrossRef]
François, D.; Rossi, F.; Wertz, V.; Verleysen, M. Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 2007, 70, 1276–1288. [Google Scholar] [CrossRef]
Diamantidis, N.; Karlis, D.; Giakoumakis, E.A. Unsupervised stratification of cross-validation for accuracy estimation. Artif. Intell. 2000, 116, 1–16. [Google Scholar] [CrossRef]
Fan, H.; Liao, W.; Wu, J. Molecular interactions, bioavailability, and cellular mechanisms of angiotensin-converting enzyme inhibitory peptides. J. Food Biochem. 2019, 43, e12572. [Google Scholar] [CrossRef] [PubMed]
Fan, H.; Wu, J. Purification and identification of novel ACE inhibitory and ACE2 upregulating peptides from spent hen muscle proteins. Food Chem. 2021, 345, 128867. [Google Scholar] [CrossRef] [PubMed]

Figure 1. An overview of the dataset. (A) Frequency distribution of the IC₅₀ values. (B) The percentage of each amino acid residue.
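Panel B reports the percentage of each amino acid residue in the dataset. As a minimal sketch of how such a composition could be tallied, the code below pools all residues across all peptides and divides each count by the total residue count; this normalization is an assumption, since the exact procedure is not stated here. The function name `residue_percentages` is hypothetical, and the example sequences are taken from Table 1.

```python
from collections import Counter


def residue_percentages(peptides):
    """Percentage of each residue, pooled over all peptides.

    Assumption: percentages are taken over the total number of residues
    in the dataset, not averaged per peptide.
    """
    counts = Counter()
    for seq in peptides:
        counts.update(seq.upper())  # count every residue in this peptide
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


# Example with three short peptides from Table 1.
composition = residue_percentages(["LKY", "IRW", "VY"])
for aa, pct in composition.items():
    print(f"{aa}: {pct:.1f}%")
```

Here Y occurs twice among the eight pooled residues, so it is reported at 25.0% and each other residue at 12.5%.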

Figure 2. Training history of the model, showing the training loss and test loss.

Figure 3. Evaluation on the test set. (A) Reported and predicted IC₅₀ values for the test set. (B) Distribution of the ratio of predicted IC₅₀ to reported IC₅₀.

Figure 4. The network structure of the LSTM model.
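The layer sizes and input representation of the model in Figure 4 are not given in this section, so the following is only an illustrative sketch of the recurrence an LSTM layer computes (Hochreiter and Schmidhuber, cited above), written in the forget-gate form used by most current deep-learning libraries. At each residue it updates the cell state as c = f·c_prev + i·g and the hidden state as h = o·tanh(c). The one-hot input, hidden size of 8, random initial weights, and the linear head reading the final hidden state are all hypothetical choices, not the authors' configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


class LSTMCell:
    """One LSTM layer (forget-gate variant), pure Python for illustration."""

    GATES = "ifog"  # input gate, forget gate, output gate, candidate g

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input weights W, recurrent weights U and bias b for each gate.
        self.W = {k: mat(hidden_size, input_size) for k in self.GATES}
        self.U = {k: mat(hidden_size, hidden_size) for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def step(self, x, h, c):
        """One time step: returns the new hidden state h and cell state c."""
        z = {k: vec_add(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])
             for k in self.GATES}
        i = [sigmoid(v) for v in z["i"]]
        f = [sigmoid(v) for v in z["f"]]
        o = [sigmoid(v) for v in z["o"]]
        g = [math.tanh(v) for v in z["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process a sequence of input vectors and return the final h."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def one_hot(code, size=21):
    """One-hot vector for an integer residue code (hypothetical input scheme)."""
    v = [0.0] * size
    v[code] = 1.0
    return v


# Hypothetical usage: "IRW" encoded with the Table 3 codes I=2, R=6, W=9.
cell = LSTMCell(input_size=21, hidden_size=8)
h_final = cell.run([one_hot(code) for code in (2, 6, 9)])

# Untrained linear regression head on the final hidden state.
head_w = [0.1] * 8
prediction = sum(w * hj for w, hj in zip(head_w, h_final))
print(prediction)
```

Because the weights are random and untrained, the printed value has no biological meaning. The sketch only shows how a variable-length peptide is reduced to a single hidden vector from which a regression output is read.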

Table 1. Peptides with reported in vitro ACE inhibitory activity and in vivo blood-pressure-lowering activity.

Peptide Sequence	Predicted IC₅₀ (μM)	Reported IC₅₀ (μM)	Predicted IC₅₀/Reported IC₅₀	Reference Reporting the IC₅₀
KGYGGVSLPEW	0.23	0.70	0.33	[17]
LLVTLKK	0.42	0.95	0.44	[18]
LKY	0.36	0.78	0.46	[19]
PAGELHP	0.29	0.50	0.58	[20]
DAQSAPLRVY	7.60	12.20	0.62	[17]
RDGGYCC	0.56	0.84	0.67	[21]
WV	217.83	307.61	0.71	[22]
KF	20.08	28.30	0.71	[23]
LVY	1.30	1.80	0.72	[19]
IRW	0.44	0.61	0.72	[24]
FY	2.71	3.70	0.73	[23]
LEEFCC	1.36	1.85	0.73	[21]
GF	213.69	277.90	0.77	[25]
MLPAY	1.27	1.58	0.80	[19]
IQW	1.26	1.56	0.81	[26]
LRA	141.66	174.30	0.81	[27]
KIDKVVK	0.53	0.62	0.85	[18]
LKP	2.49	2.93	0.85	[26]
AFVGYVLP	12.62	14.41	0.88	[28]
LAK	42.46	48.00	0.88	[29]
NF	41.66	46.30	0.90	[25]
VY	10.24	11.30	0.91	[23]
HLNVVHGN	46.29	50.88	0.91	[30]
DKVGINYW	23.13	25.40	0.91	[17]
EKSYELP	16.54	18.02	0.92	[28]
PGSGCAGTDL	53.67	57.86	0.93	[30]
LSA	7.26	7.81	0.93	[19]
GAAELPCSADWW	10.25	10.95	0.94	[31]
KY	7.25	7.70	0.94	[23]
IVY	43.52	45.77	0.95	[32]
KW	10.28	10.80	0.95	[23]
VW	10.29	10.80	0.95	[23]
VDSDVVK	8.26	8.64	0.96	[33]
VF	42.00	43.70	0.96	[23]
LRLESF	5.21	5.39	0.97	[30]
YY	46.30	47.90	0.97	[27]
LDSPSEGRAPG	17.31	17.90	0.97	[20]
VIY	4.36	4.50	0.97	[19]
VELYP	5.23	5.22	1.00	[28]
WQVLPNAVPAK	1023.89	1010.00	1.01	[34]
TFQGGlPPHGIQVER	3.47	3.40	1.02	[29]
VISDEDGVTH	8.33	8.16	1.02	[35]
RLSGQTIEVTSEYLFRH	577.19	560.18	1.03	[36]
ILSKLK	4.28	4.02	1.07	[18]
AY	156.44	146.76	1.07	[37]
IISKIK	1.28	1.19	1.07	[18]
CTFSIPAQC	26.31	24.40	1.08	[38]
IY	2.96	2.70	1.09	[23]
LT	1.22	1.11	1.10	[22]
TVTNPARIA	16.33	14.50	1.13	[20]
LVLPGELAK	214.22	184.00	1.16	[29]
LQP	1.35	1.04	1.30	[19]
IPPAYTK	35.75	23.50	1.52	[29]
LVLPGE	20.79	13.50	1.54	[29]
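The fourth column of Table 1 is the predicted IC₅₀ divided by the reported IC₅₀, so a value of 1 means exact agreement and values below or above 1 mean under- or over-prediction. The sketch below recomputes this ratio for a few rows copied from Table 1. As an extra summary that is not reported in the table, it also counts predictions within two-fold of the reported value; the helper names and the two-fold threshold are illustrative choices. Recomputing from the rounded table values can occasionally differ from the printed ratio in the second decimal place.

```python
# (sequence, predicted IC50 in uM, reported IC50 in uM), copied from Table 1
rows = [
    ("KGYGGVSLPEW", 0.23, 0.70),
    ("IRW", 0.44, 0.61),
    ("VELYP", 5.23, 5.22),
    ("LQP", 1.35, 1.04),
    ("LVLPGE", 20.79, 13.50),
]


def ratio(predicted, reported):
    """Predicted/reported IC50; 1.0 means exact agreement."""
    return predicted / reported


def within_fold(predicted, reported, fold=2.0):
    """True if the prediction lies within `fold`-fold of the reported value."""
    r = ratio(predicted, reported)
    return 1.0 / fold <= r <= fold


for seq, pred, rep in rows:
    print(f"{seq}\t{ratio(pred, rep):.2f}\t{within_fold(pred, rep)}")

n_within = sum(within_fold(p, r) for _, p, r in rows)
print(f"{n_within}/{len(rows)} within two-fold")  # prints "4/5 within two-fold"
```

KGYGGVSLPEW (ratio 0.33) is the only one of these five rows outside the two-fold band.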

Table 2. The predicted IC₅₀ and the experimental IC₅₀ of the randomly synthesized peptides.

Peptide Sequence	Predicted IC₅₀ (μM)	Experimental IC₅₀ (μM)	Predicted IC₅₀/Experimental IC₅₀
LKPDQ	0.70	0.88	0.79
WD	0.63	0.51	1.23
GVPK	0.61	0.25	2.44
FI	0.61	0.31	1.95
PDFLI	0.60	0.33	1.83
HDHR	0.59	0.59	1.00
LKPNS	0.56	0.5	1.12
VYHEL	0.55	0.38	1.45
GPAY	0.54	0.37	1.45
LVL	0.51	0.32	1.59
LKL	0.49	0.56	0.88
FDKA	0.47	0.6	0.79
VAWKL	0.46	0.23	2.00
VHLAP	0.46	0.33	1.39
IQWCA	0.46	0.1	4.59
PLPLL	0.55	0.2	1.75
KLPAY	0.44	0.12	3.63
LKPI	0.43	0.39	1.11
FALPC	0.42	0.16	2.65
ALPD	0.72	1.55	0.46

Table 3. The integer code representing each amino acid.

Amino Acid	Integer Code	Amino Acid	Integer Code
I	2	D	11
L	3	C	12
S	4	T	13
H	5	N	14
R	6	V	15
P	7	G	16
A	8	Q	17
W	9	K	18
F	10	Y	19
		E	20
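Table 3 maps each residue to an integer, which turns a peptide into a numeric sequence a network can consume. A minimal sketch of that encoding follows. The padding value 0, the maximum length of 20 residues, and padding at the end of the sequence are hypothetical choices that are not specified here. Methionine (M) has no code in Table 3, so the sketch raises an error for it rather than guessing one. The names `AA_CODE` and `encode_peptide` are illustrative.

```python
# Integer codes copied from Table 3 (methionine, M, does not appear there).
AA_CODE = {
    "I": 2, "L": 3, "S": 4, "H": 5, "R": 6, "P": 7, "A": 8, "W": 9, "F": 10,
    "D": 11, "C": 12, "T": 13, "N": 14, "V": 15, "G": 16, "Q": 17, "K": 18,
    "Y": 19, "E": 20,
}

PAD = 0       # hypothetical padding value
MAX_LEN = 20  # hypothetical maximum peptide length


def encode_peptide(seq, max_len=MAX_LEN, pad=PAD):
    """Map a peptide to a fixed-length list of integer codes, padded at the end."""
    seq = seq.strip().upper()
    if len(seq) > max_len:
        raise ValueError(f"peptide longer than {max_len} residues: {seq}")
    try:
        codes = [AA_CODE[aa] for aa in seq]
    except KeyError as err:
        raise ValueError(f"residue {err.args[0]!r} has no code in Table 3") from err
    return codes + [pad] * (max_len - len(codes))


print(encode_peptide("IRW", max_len=6))  # prints [2, 6, 9, 0, 0, 0]
```

Padding to a fixed length lets peptides of different lengths be batched together. A recurrent model can also be run on the unpadded codes directly.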


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
