Int. J. Mol. Sci. 2019, 20(4), 995; https://doi.org/10.3390/ijms20040995

Article
Quantitative Structure-Activity Relationship Study of Antioxidant Tripeptides Based on Model Population Analysis
1 Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China
2 Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China
3 State Key Laboratory of Food Science and Technology, International Joint Laboratory on Food Safety, Jiangnan University, Wuxi 214122, China
4 Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China
* Correspondence: [email protected]; Tel.: +86-159-2090-5773
These authors contributed equally to this work.
Received: 1 January 2019 / Accepted: 18 February 2019 / Published: 25 February 2019

Abstract

Due to their beneficial effects on human health, antioxidant peptides have attracted much attention from researchers. However, the structure-activity relationships of antioxidant peptides are not yet fully understood. In this paper, quantitative structure-activity relationship (QSAR) models were built on two datasets, i.e., the ferric thiocyanate (FTC) dataset and the ferric-reducing antioxidant power (FRAP) dataset, containing 214 and 172 unique antioxidant tripeptides, respectively. Sixteen amino acid descriptors were used, and model population analysis (MPA) was then applied to improve the prediction performance of the QSAR models. By applying MPA, the cross-validated coefficient of determination (Q2) increased from 0.6170 to 0.7471 for the FTC dataset and from 0.4878 to 0.6088 for the FRAP dataset. These findings indicate that the integration of different amino acid descriptors provides additional information for model building and that MPA can efficiently extract this information for better prediction performance.
Keywords:
quantitative structure-activity relationship; QSAR; antioxidant tripeptides; model population analysis; amino acid descriptors

1. Introduction

Bioactive peptides, usually containing 2–20 amino acid residues, are typically derived from the enzymatic hydrolysis of proteins [1]. They are inactive within the sequence of the parent protein, but they can exert various physiological functions after release. Antioxidant peptides, one of the most important groups of bioactive peptides, can prevent oxidative stress and contribute notably to human health [2]. They have been isolated and purified from sources such as cereals, milk, meat, and fish [3]. The methods used to assess the antioxidant capacities of peptides include the Trolox equivalent antioxidant capacity (TEAC), the ferric ion reducing antioxidant power (FRAP), the 2,2-diphenyl-1-picrylhydrazyl radical-scavenging capacity (DPPH), the oxygen radical absorbance capacity (ORAC), the total radical trapping antioxidant parameter (TRAP), etc. [4]. However, it is not feasible to test all peptides experimentally to find valid antioxidants, given the large number of theoretically possible peptides, i.e., 400 dipeptides, 8000 tripeptides, 160,000 tetrapeptides, etc.
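The peptide counts quoted above follow from 20 standard amino acids at each position, i.e., 20^n sequences of length n. A minimal sketch of that arithmetic (the names `AMINO_ACIDS` and `peptide_count` are illustrative, not from the original study):

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def peptide_count(length: int) -> int:
    """Number of distinct linear peptides of the given length."""
    return len(AMINO_ACIDS) ** length

# Dipeptides, tripeptides, and tetrapeptides.
counts = {n: peptide_count(n) for n in (2, 3, 4)}
print(counts)  # {2: 400, 3: 8000, 4: 160000}
```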
The activities of peptides are determined by their amino acid compositions, sequences, and structures. Quantitative structure-activity relationship (QSAR) modelling, a well-recognized tool for estimating chemical activities, has been widely applied to the prediction of bioactive peptides [5]. QSAR models have been successfully built on ACE-inhibitory peptides [6], antimicrobial peptides [7], antioxidant peptides [8,9,10], antitumor peptides [11], bitter peptides [12], etc. QSAR studies of antioxidant peptides have mainly focused on di- and tripeptides, because they can be absorbed intact from the intestinal lumen into the bloodstream and then produce biological effects at the tissue level [13]. When compared to dipeptides, tripeptides were reported to exhibit higher levels of antioxidant activity [14]. In addition, tripeptides have much larger structural diversity than dipeptides, which is a useful property for developing multifunctional food additives [15].
Although plenty of QSAR models have been built on antioxidant peptides, their prediction performance needs to be further improved, and the relationship between peptide structure and antioxidant activity is still unclear. This may be due to the limitations of the model building methods. Model population analysis (MPA) provides a new strategy for model building, which uses multiple models instead of a single model to improve prediction ability and interpretability [16,17]. Previous studies showed that the performance of regression models could be improved by applying the MPA strategy [6,18].
In this study, we built QSAR models based on two antioxidant tripeptide datasets. The first dataset contains 214 artificially designed tripeptides and the second contains 172 β-Lactoglobulin derived tripeptides, representing designed and food-originated tripeptides, respectively. Sixteen amino acid descriptors were used to construct a comprehensive description of the peptides. The MPA strategy was applied to extract useful information from the data and to optimize the models. The aim of this study is not to build a new set of descriptors, but to integrate different descriptors under the framework of MPA for better QSAR model performance on antioxidant tripeptide data. The improved method for QSAR modelling will help in discovering new antioxidant tripeptides for future drugs or food additives.

2. Results

2.1. FTC Dataset

The results of the QSAR models on the FTC dataset are displayed in Table 1. Before outlier elimination, the largest Q2 value of 0.4901 was obtained with the VSW descriptor. After outlier elimination, the HSEHPCSV descriptor showed the largest Q2 value of 0.6170 among the 16 amino acid descriptors. Integrating the 16 descriptors improved the model performance (Q2 = 0.6818). Finally, the prediction performance was further improved (Q2 = 0.7471) after variable selection using the BOSS method.
In this study, an MPA-based outlier elimination procedure [19] was carried out to remove outliers one by one (Figure 1). For the integrated data, samples No. 181, 183, 182, 134, 151, 153, and 188 were removed in sequence. After outlier removal, all of the remaining samples were within the range defined by the three-sigma rule (Figure 1H, dashed line).
Figure 2 shows the variables selected by the BOSS method in 100 runs. Variables that are selected more frequently have higher importance. The top 11 variables (frequency > 75), in descending order, were as follows: C-VSW-5 = N-G-7 > C-ST-3 > M-ST-7 > N-DPPS-8 > C-HESH-2 > N-FASGAI-5 > M-G-6 > N-VSW-3 > C-VHSE-6 > C-HSEHPCSV-9, which are marked in Figure 2. All the top 11 variables originated from the best-performing amino acid descriptors, i.e., HSEHPCSV, ST-scale, HESH, G-scale, FASGAI, and DPPS (Table 1). This shows that the ultimate model combines the merits of the best-performing models constructed from single amino acid descriptors.
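The frequency ranking described above can be sketched as a simple count over repeated selection runs. The example below is only an illustration: the function names and the toy selections are made up (only the variable labels are taken from the text), and the threshold is scaled down to four runs instead of 100.

```python
from collections import Counter

def selection_frequency(runs):
    """Count in how many runs each variable was selected."""
    freq = Counter()
    for selected in runs:
        freq.update(set(selected))  # a variable counts at most once per run
    return freq

def top_variables(freq, threshold):
    """Variables selected in more than `threshold` runs, most frequent first."""
    kept = [(name, n) for name, n in freq.items() if n > threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy selections from four hypothetical runs (not the published results).
toy_runs = [
    ["C-VSW-5", "N-G-7", "C-ST-3"],
    ["C-VSW-5", "N-G-7"],
    ["C-VSW-5", "N-G-7", "M-ST-7"],
    ["C-VSW-5", "C-ST-3"],
]
freq = selection_frequency(toy_runs)
top = top_variables(freq, threshold=2)
print(top)  # [('C-VSW-5', 4), ('N-G-7', 3)]
```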

2.2. FRAP Dataset

The results of the QSAR models on the FRAP dataset are displayed in Table 2. Before logarithmic transformation of the response vector Y, the largest Q2 value of 0.1408 was obtained with the 5Z-scale descriptor. The low Q2 value indicated that the tripeptide structures and their antioxidant activities evaluated by the FRAP assay did not share a linear relationship. After logarithmic transformation, the VHSE descriptor showed the largest Q2 value of 0.4878. Integrating the 16 descriptors increased the Q2 value slightly to 0.4953. The prediction performance of the model was further improved after variable selection using the BOSS method (Q2 = 0.6088). This indicated that a linear relationship between the structures and the activities could be built after the logarithmic transformation of Y, and that the MPA strategy was efficient in improving the model.
Similarly, an MPA-based outlier elimination procedure was carried out on the FRAP dataset. No outlying sample was detected, since all of the samples fell within the range defined by the three-sigma rule (Figure 3A, dashed line). The important variables selected by BOSS are displayed in Figure 3B. The six most important variables (frequency > 75) are C-Z5-5, M-Z5-5, N-VSW-9, N-VHSE-8, N-ST-3, and C-VSW-2. Most of the important variables originated from three well-performing descriptors, i.e., VHSE, 5Z-scale, and ST-scale. However, some variables were still selected from a poorly performing descriptor, namely VSW. This suggested that descriptors with poor performance also contained useful information for model building.

3. Discussion

3.1. Comparison with the Reported Models

For the FTC dataset, our method showed higher prediction accuracy (Q2 = 0.7471) when compared to the previous report (Q2 = 0.6310) [20]. Note that 41 samples were eliminated as outliers in the previous study, while only seven outliers were eliminated in this study. Our model was therefore built on a much larger number of samples and is more representative. This shows that our method produced a model with higher prediction performance and a relatively larger applicability domain.
Similarly, for the FRAP dataset, our method showed a higher prediction accuracy (Q2 = 0.6088) when compared to the previous report (Q2 = 0.5410) [21]. It should be noted that, in the previous study, five samples with the highest activities and 14 inactive samples were removed, while in our study, only inactive samples were removed. Thus, our model showed improved prediction accuracy and an enlarged applicability domain.

3.2. Relationship between Antioxidant Activities and Peptide Structures

Previous studies showed that the N-terminal and C-terminal amino acids are important for antioxidant activity [20]. Our results agree with these findings, in that most of the important variables selected by BOSS originated from the N-terminus or C-terminus (Figure 2 and Figure 3B). In addition, studies showed that tripeptides containing Cys (C), Trp (W), and Tyr (Y) residues exhibited strong antioxidant activities [8,10]. Consistent with this, tripeptides YHY and LTC had the highest antioxidant activities in the FTC and FRAP datasets, respectively.
On the FTC dataset, a linear relationship between antioxidant activities and peptide structures was constructed. However, on the FRAP dataset, the relationship could only be built between the log-transformed activities and the structure properties. This indicates that the antioxidant activities and peptide structures in the FRAP dataset exhibit a non-linear relationship. Data transformation is crucial before model building on this kind of data. The different performance on the two datasets may be attributed to the structural diversity of the peptides. In the FTC dataset, all tripeptides contain either a His or a Tyr residue and therefore have similar structures, while the structural diversity in the FRAP dataset is much larger.

3.3. The Integration of Amino Acid Descriptors

A number of amino acid descriptors have been developed and applied in QSAR studies of bioactive peptides. Each descriptor has its merits and demerits. Our study shows that a universally optimal descriptor does not exist. Instead, the performance of every descriptor is data dependent, which means that different descriptors perform well on different datasets. This makes it difficult for researchers to select descriptors. By integrating different descriptors, each one can contribute particular information to the model and create a new possibility for further improvement of the model. The next question is then how to efficiently extract information from different descriptors and remove the redundancy in the data. Model population analysis (MPA) may provide a solution. It uses multiple models instead of a single model for prediction. Each sub-model contains a random combination of variables from different descriptors. Through statistical analysis of the sub-model outcomes, the informative variables from the descriptors are extracted and an optimized descriptor combination is obtained [22]. Finally, the optimized model performs better than any of the single-descriptor models, as shown in Table 1 and Table 2. To summarize, the aim of this study is not to build a new set of descriptors, but to provide a general framework to integrate different descriptors. The framework can take in any newly developed descriptor and can be fitted to different datasets. The more diverse the integrated descriptors are, the better the model can be expected to perform.

4. Materials and Methods

4.1. Data Collection

4.1.1. Ferric Thiocyanate (FTC) Dataset

A dataset of 214 antioxidant tripeptides that contain either a His or a Tyr residue was obtained from the published literature [20,23]. All of the tripeptides were chemically synthesized using solid-phase Fmoc chemistry and their antioxidant activities were measured by the FTC method [23]. Test samples (500 μg) in 0.5 mL of deionized water were mixed with linoleic acid emulsion (1.0 mL, 50 mM) and phosphate buffer (1.0 mL, 0.1 M) in glass test tubes (5 mL). The tubes were sealed with silicone rubber caps and then kept at 60 °C in the dark. Aliquots (50 μL) of the reaction mixtures were taken out at different intervals during incubation. The degree of oxidation was measured by sequentially adding ethanol (2.35 mL, 75%), ammonium thiocyanate (50 μL, 30%), and ferrous chloride (50 μL, 20 mM in 3.5% HCl). After the mixture had stood for 3 min, the absorbance of the solution was measured at 500 nm with a Jasco model Ubest 30 spectrophotometer (Tokyo, Japan). A control containing the same contents as the test sample, but without the peptides, was also run. The number of days taken to attain an absorbance of 0.3 was defined as the induction period. The relative activities were calculated by dividing the induction period of the test samples by that of the control (Table 3). All of the experiments were carried out in triplicate and averaged.

4.1.2. Ferric-reducing Antioxidant Power (FRAP) Dataset

A dataset of 172 antioxidant tripeptides was derived from β-Lactoglobulin, where all possible tripeptides were collected based on its amino acid sequence [21]. All of the tripeptides were chemically synthesized using solid-phase Fmoc chemistry and their antioxidant activities were evaluated using the FRAP assay [24]. Ten microliters of 100 mmol/mL tripeptide solution were incubated at 37 °C with 100 μL of FRAP reagent, containing 10 mmol/L of 2,4,6-tripyridyl-s-triazine and 20 mmol/L of FeCl3. The absorbance values were read at a wavelength of 570 nm using a microplate reader (Model 680, Bio-Rad, Hercules, CA, USA) after 10 min of reaction. Aqueous Fe2+ solutions at concentrations that ranged from 10 to 1000 μmol/L were used to produce a calibration curve. The results were expressed as micromoles of Fe2+ equivalents per mole of the sample based on the standard curve. All of the experiments were carried out in triplicate and then averaged. The activities were logarithmically transformed prior to modeling, and 14 inactive peptides (activity = 0) were removed (Table 4). The measured activities before logarithmic transformation are given in Table S1.
The two datasets are representative of artificially designed and food protein-originated tripeptides, respectively. Both datasets have been used for building QSAR models before. Thus, they are suitable for model comparison.
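The two data preparation steps described above (collecting all tripeptides along a protein sequence, then log-transforming the activities after dropping inactive peptides) can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence and activity values are hypothetical placeholders, not the β-Lactoglobulin data, and a base-10 logarithm is assumed because the text does not specify the base.

```python
import math

def unique_tripeptides(sequence):
    """Distinct tripeptides from sliding a 3-residue window along the sequence."""
    seen = []
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if tri not in seen:
            seen.append(tri)
    return seen

def log_transform(activities):
    """Drop inactive peptides (activity = 0) and log-transform the rest.

    Base 10 is an assumption; the source only says 'logarithmic'.
    """
    return {pep: math.log10(a) for pep, a in activities.items() if a > 0}

toy_sequence = "LTCLTCYH"  # hypothetical sequence, for illustration only
tripeptides = unique_tripeptides(toy_sequence)

toy_activities = {"LTC": 100.0, "TCL": 10.0, "CLT": 0.0}  # made-up values
log_activities = log_transform(toy_activities)
```

Here the repeated window "LTC" is kept only once, and "CLT" is excluded from `log_activities` because its activity is zero.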

4.2. Data Processing

The tripeptide sequences were transformed into X-matrices using each of the 16 amino acid descriptors, while the dependent variable Y-vectors represent the relative activities of the peptides. These descriptors include Z-scale, 5Z-scale, DPPS, MS-WHIM, ISA-ECI, VHSE, FASGAI, VSW, T-scale, ST-scale, E-scale, V-scale, G-scale, HESH, and HSEHPCSV, as shown in Table 5. They are the most frequently used amino acid descriptors in QSAR studies of bioactive peptides. The peptide structure is characterized by describing the amino acids within the sequence. For example, the Z-scale descriptor, containing three parameters (Z1, Z2, and Z3), generates nine variables (3 parameters × 3 amino acids) for tripeptides. To clearly label each variable, we used a unified naming rule. The amino acid at the N-terminus was designated as N, the C-terminal amino acid was designated as C, and the middle amino acid was designated as M. Thus, the nine variables generated by the Z-scale descriptor were labeled as N-Z-1, N-Z-2, N-Z-3, M-Z-1, M-Z-2, M-Z-3, C-Z-1, C-Z-2, and C-Z-3, respectively. The 16 descriptors were integrated to build an X-matrix, which contained 306 variables (V1-V306), with the following correspondence: Z-scale (V1-V9), 5Z-scale (V10-V24), DPPS (V25-V54), MS-WHIM1 (V55-V63), MS-WHIM2 (V64-V72), ISA-ECI (V73-V78), VHSE (V79-V102), FASGAI (V103-V120), VSW (V121-V147), E-scale (V148-V162), T-scale (V163-V177), ST-scale (V178-V201), V-scale (V202-V210), G-scale (V211-V234), HESH (V235-V270), and HSEHPCSV (V271-V306), respectively.
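The naming rule and the V1-V306 layout above can be reproduced with a short sketch. The per-descriptor parameter counts below are inferred from the stated column ranges (block width divided by three positions); several short labels (e.g., `MSWHIM1`, `ISAECI`) and the numbers in `toy_table` are hypothetical placeholders rather than real descriptor values.

```python
# (label, parameters per amino acid), in the order of the integrated X-matrix.
# Counts are inferred from the column ranges given in the text.
DESCRIPTORS = [
    ("Z", 3), ("Z5", 5), ("DPPS", 10), ("MSWHIM1", 3), ("MSWHIM2", 3),
    ("ISAECI", 2), ("VHSE", 8), ("FASGAI", 6), ("VSW", 9), ("E", 5),
    ("T", 5), ("ST", 8), ("V", 3), ("G", 8), ("HESH", 12), ("HSEHPCSV", 12),
]
POSITIONS = ("N", "M", "C")  # N-terminus, middle, C-terminus

def variable_names(label, n_params):
    """Names such as N-Z-1 ... C-Z-3, following the unified naming rule."""
    return [f"{pos}-{label}-{k}" for pos in POSITIONS for k in range(1, n_params + 1)]

def layout(descriptors):
    """Map each label to its (first, last) 1-based column in the X-matrix."""
    blocks, start = {}, 1
    for label, n_params in descriptors:
        width = n_params * len(POSITIONS)
        blocks[label] = (start, start + width - 1)
        start += width
    return blocks

def encode(tripeptide, table):
    """Concatenate per-residue parameter vectors in N, M, C order."""
    return [value for residue in tripeptide for value in table[residue]]

blocks = layout(DESCRIPTORS)
z_names = variable_names("Z", 3)

# Placeholder parameter values, NOT the real Z-scale values.
toy_table = {"Y": [1.0, 2.0, 3.0], "H": [4.0, 5.0, 6.0]}
row = encode("YHY", toy_table)
```

In this sketch `blocks["Z"]` is `(1, 9)` and `blocks["HSEHPCSV"]` is `(271, 306)`, matching the ranges listed in the text.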

4.3. QSAR Model Building

Partial least squares (PLS) regression [40] was used to relate the peptide structure descriptions (variables, X-matrices) to the relative activities (responses, Y-vectors). It was implemented using MATLAB software (Version R2015a, the MathWorks, Inc., Natick, MA, USA). All of the variables were auto-scaled to unit variance and all of the responses were mean-centered prior to model building. The models were validated using cross-validation and the optimal number of PLS components was chosen based on the statistic Q2, the cross-validated R2, which reflects the predictive ability of the model. R2 is the coefficient of determination, which provides an estimate of the model fit.
MPA was applied to optimize the model through outlier elimination and variable selection. It is a framework for model building that utilizes multiple models instead of a single model to construct results [16,17]. In general, it works as follows: (1) firstly, a random resampling procedure is applied to obtain sub-datasets; (2) then, sub-models are built based on the sub-datasets; and, (3) finally, a statistical analysis is used to extract useful information from the outcomes of the sub-models. In the present study, MPA was utilized for outlier detection and variable selection.
The MPA-based outlier detection method [19] was applied to remove the outlying samples from the measured data. To begin with, 1000 sub-datasets were generated by randomly selecting 80% of the samples in the sample space. Subsequently, for each sub-dataset, a PLS regression model was built and the prediction error for each sample was recorded. The mean of the prediction errors was used as the basis for outlier detection and a three-sigma rule was applied to define the boundary, as reported previously [6]. The bootstrapping soft shrinkage (BOSS) method [18] was applied to select informative variables from the pool of descriptors. It is also based on the idea of MPA. Firstly, 1000 sub-datasets were obtained using bootstrap resampling in the variable space. Afterwards, 1000 PLS models were built based on the sub-datasets and the regression coefficients were extracted. In the next step, weighted bootstrap resampling was used to regenerate the sub-datasets and to rebuild the sub-models. The resampling procedure was repeated until all of the uninformative variables were eliminated.
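The difference between R2 (fit) and Q2 (cross-validated prediction) can be illustrated with a small sketch. Two assumptions are made here and should not be read as the authors' implementation: a one-variable ordinary least squares line stands in for PLS, and leave-one-out is used as the cross-validation scheme because the text does not specify one. The data are made up.

```python
def r_squared(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for PLS here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def loo_predictions(x, y):
    """Leave-one-out: predict each sample from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds

# Made-up, nearly linear toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

a, b = fit_line(x, y)
r2 = r_squared(y, [a + b * xi for xi in x])  # fitted on all samples
q2 = r_squared(y, loo_predictions(x, y))     # cross-validated
```

For a least squares fit the cross-validated value cannot exceed the fitted one, which is why Q2 is the more conservative statistic for choosing model complexity.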
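The outlier elimination loop described above (random 80% sub-datasets, one sub-model per sub-dataset, mean prediction error per sample, three-sigma boundary, removal one by one) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: a one-variable least squares line replaces PLS, prediction errors are recorded for the samples left out of each sub-dataset, and the toy data with one planted outlier are made up.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (stand-in for a PLS sub-model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def mean_prediction_errors(x, y, n_sub=1000, ratio=0.8, seed=0):
    """Mean prediction error of each sample over the sub-models that left it out."""
    rng = random.Random(seed)
    n = len(x)
    n_train = int(round(ratio * n))
    errors = [[] for _ in range(n)]
    for _ in range(n_sub):
        train = rng.sample(range(n), n_train)  # random 80% sub-dataset
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        in_train = set(train)
        for i in range(n):
            if i not in in_train:
                errors[i].append(y[i] - (a + b * x[i]))
    return [statistics.mean(e) if e else 0.0 for e in errors]

def flag_outliers(mean_errors):
    """Positions whose mean error lies outside mean +/- 3 standard deviations."""
    center = statistics.mean(mean_errors)
    spread = statistics.pstdev(mean_errors)
    return [i for i, e in enumerate(mean_errors) if abs(e - center) > 3 * spread]

def eliminate_outliers(x, y, **kwargs):
    """Remove the most extreme flagged sample, then repeat until none is flagged."""
    keep = list(range(len(x)))
    removed = []
    while True:
        errs = mean_prediction_errors([x[i] for i in keep], [y[i] for i in keep], **kwargs)
        flagged = flag_outliers(errs)
        if not flagged:
            return removed
        center = statistics.mean(errs)
        worst = max(flagged, key=lambda j: abs(errs[j] - center))
        removed.append(keep.pop(worst))  # store the original sample index

# Made-up toy data: a noisy line with one planted outlier at index 12.
x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 + 0.1 * ((i * 7) % 5 - 2) for i, xi in enumerate(x)]
y[12] += 30.0
removed = eliminate_outliers(x, y, n_sub=200)
```

Removing one sample per pass, rather than all flagged samples at once, mirrors the sequential elimination shown in Figure 1, since a gross outlier can distort the boundary for the remaining samples.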

5. Conclusions

In this study, we have constructed QSAR models on two datasets of antioxidant tripeptides, i.e., the FTC dataset and the FRAP dataset. After the integration of 16 amino acid descriptors and the use of the MPA strategy for model building, the Q2 values increased from 0.6170 to 0.7471 and from 0.4878 to 0.6088, respectively. The results show that the MPA framework is powerful for QSAR model building on antioxidant tripeptide data. The framework can also be applied to investigate the structure-activity relationships of other types of bioactive peptides and to integrate additional molecular descriptors.

Supplementary Materials

The following are available online at https://www.mdpi.com/1422-0067/20/4/995/s1.

Author Contributions

B.D. conceived and designed the study; H.L., T.T. and X.N. collected the data; H.L., J.C., G.Y. carried out the model building; F.Z., R.C., D.C., M.Z. and L.Y. analyzed the data; B.D. and H.L. wrote the paper.

Funding

This research was funded by the National Key R&D Program of China (2018YFD0500603), the National Natural Science Foundation of China (Grant Nos. 31790411 and 31872985), and the Natural Science Foundation of Guangdong Province (Grant No. 2017A030310410). The studies meet with the approval of the university’s review board.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

BOSS    the bootstrapping soft shrinkage method
DPPH    2,2-diphenyl-1-picrylhydrazyl radical-scavenging capacity
FRAP    ferric-reducing antioxidant power
FTC     ferric thiocyanate
MPA     model population analysis
ORAC    oxygen radical absorbance capacity
PLS     partial least squares
QSAR    quantitative structure-activity relationships
TEAC    Trolox equivalent antioxidant capacity
TRAP    total radical trapping antioxidant parameter

References

  1. Chakrabarti, S.; Guha, S.; Majumder, K. Food-Derived Bioactive Peptides in Human Health: Challenges and Opportunities. Nutrients 2018, 10, 1738.
  2. Lorenzo, J.M.; Munekata, P.E.S.; Gomez, B.; Barba, F.J.; Mora, L.; Perez-Santaescolastica, C.; Toldra, F. Bioactive peptides as natural antioxidants in food products—A review. Trends Food Sci. Technol. 2018, 79, 136–147.
  3. Sila, A.; Bougatef, A. Antioxidant peptides from marine by-products: Isolation, identification and application in food systems. A review. J. Funct. Foods 2016, 21, 10–26.
  4. MacDonald-Wicks, L.K.; Wood, L.G.; Garg, M.L. Methodology for the determination of biological antioxidant capacity in vitro: A review. J. Sci. Food Agric. 2006, 86, 2046–2056.
  5. Zou, T.B.; He, T.P.; Li, H.B.; Tang, H.W.; Xia, E.Q. The Structure-Activity Relationship of the Antioxidant Peptides from Natural Proteins. Molecules 2016, 21, 72.
  6. Deng, B.; Ni, X.; Zhai, Z.; Tang, T.; Tan, C.; Yan, Y.; Deng, J.; Yin, Y. New Quantitative Structure-Activity Relationship Model for Angiotensin-Converting Enzyme Inhibitory Dipeptides Based on Integrated Descriptors. J. Agric. Food Chem. 2017, 65, 9774–9781.
  7. Vishnepolsky, B.; Gabrielian, A.; Rosenthal, A.; Hurt, D.E.; Tartakovsky, M.; Managadze, G.; Grigolava, M.; Makhatadze, G.I.; Pirtskhalava, M. Predictive Model of Linear Antimicrobial Peptides Active against Gram-Negative Bacteria. J. Chem. Inf. Model. 2018, 58, 1141–1151.
  8. Chen, N.; Chen, J.; Yao, B.; Li, Z.G. QSAR Study on Antioxidant Tripeptides and the Antioxidant Activity of the Designed Tripeptides in Free Radical Systems. Molecules 2018, 23, 1407.
  9. Liao, W.Z.; Gu, L.J.; Zheng, Y.M.; Zhu, Z.S.; Zhao, M.M.; Liang, M.; Ren, J.Y. Analysis of the quantitative structure-activity relationship of glutathione-derived peptides based on different free radical scavenging systems. MedChemComm 2016, 7, 2083–2093.
  10. Zheng, L.; Zhao, Y.; Dong, H.; Su, G.; Zhao, M. Structure-activity relationship of antioxidant dipeptides: Dominant role of Tyr, Trp, Cys and Met residues. J. Funct. Foods 2016, 21, 485–496.
  11. Radman, A.; Gredicak, M.; Kopriva, I.; Jeric, I. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample. Int. J. Mol. Sci. 2011, 12, 8415–8430.
  12. Wu, S.; Qi, W.; Su, R.; Li, T.; Lu, D.; He, Z. CoMFA and CoMSIA analysis of ACE-inhibitory, antimicrobial and bitter-tasting peptides. Eur. J. Med. Chem. 2014, 84, 100–106.
  13. Miner-Williams, W.M.; Stevens, B.R.; Moughan, P.J. Are intact peptides absorbed from the healthy gut in the adult human? Nutr. Res. Rev. 2014, 27, 308–329.
  14. Chen, H.-M.; Muramoto, K.; Yamauchi, F.; Nokihara, K. Antioxidant activity of designed peptides based on the antioxidative peptide isolated from digests of a soybean protein. J. Agric. Food Chem. 1996, 44, 2619–2623.
  15. Wang, J.H.; Liu, Y.L.; Ning, J.H.; Yu, J.; Li, X.H.; Wang, F.X. Is the structural diversity of tripeptides sufficient for developing functional food additives with satisfactory multiple bioactivities? J. Mol. Struct. 2013, 1040, 164–170.
  16. Deng, B.-C.; Yun, Y.-H.; Liang, Y.-Z. Model population analysis in chemometrics. Chemom. Intell. Lab. Syst. 2015, 149, 166–176.
  17. Deng, B.C.; Lu, H.M.; Tan, C.Q.; Deng, J.P.; Yin, Y.L. Model population analysis in model evaluation. Chemom. Intell. Lab. Syst. 2018, 172, 223–228.
  18. Deng, B.C.; Yun, Y.H.; Cao, D.S.; Yin, Y.L.; Wang, W.T.; Lu, H.M.; Luo, Q.Y.; Liang, Y.Z. A bootstrapping soft shrinkage approach for variable selection in chemical modeling. Anal. Chim. Acta 2016, 908, 63–74.
  19. Cao, D.S.; Liang, Y.Z.; Xu, Q.S.; Li, H.D.; Chen, X. A new strategy of outlier detection for QSAR/QSPR. J. Comput. Chem. 2010, 31, 592–602.
  20. Li, Y.W.; Li, B.; He, J.G.; Qian, P. Quantitative structure-activity relationship study of antioxidative peptide by using different sets of amino acids descriptors. J. Mol. Struct. 2011, 998, 53–61.
  21. Tian, M.; Fang, B.; Jiang, L.; Guo, H.Y.; Cui, J.Y.; Ren, F.Z. Structure-activity relationship of a series of antioxidant tripeptides derived from beta-Lactoglobulin using QSAR modeling. Dairy Sci. Technol. 2015, 95, 451–463.
  22. Deng, B.C.; Yun, Y.H.; Liang, Y.Z.; Yi, L.Z. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling. Analyst 2014, 139, 4836–4845.
  23. Saito, K.; Jin, D.H.; Ogawa, T.; Muramoto, K.; Hatakeyama, E.; Yasuhara, T.; Nokihara, K. Antioxidative properties of tripeptide libraries prepared by the combinatorial chemistry. J. Agric. Food Chem. 2003, 51, 3668–3674.
  24. Benzie, I.F.F.; Strain, J.J. The ferric reducing ability of plasma (FRAP) as a measure of “Antioxidant power”: The FRAP assay. Anal. Biochem. 1996, 239, 70–76.
  25. Hellberg, S.; Sjöström, M.; Skagerberg, B.; Wold, S. Peptide quantitative structure-activity relationships, a multivariate approach. J. Med. Chem. 1987, 30, 1126–1135.
  26. Sandberg, M.; Eriksson, L.; Jonsson, J.; Sjöström, M.; Wold, S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J. Med. Chem. 1998, 41, 2481–2491.
  27. Tian, F.; Yang, L.; Lv, F. In silico quantitative prediction of peptides binding affinity to human MHC molecule: an intuitive quantitative structure-activity relationship approach. Amino Acids 2009, 36, 535–554.
  28. Zaliani, A.; Gancia, E. ChemInform Abstract: MS-WHIM Scores for Amino Acids: A New 3D-Description for Peptide QSAR and QSPR Studies. J. Chem. Inf. Model. 1999, 39, 525–533.
  29. Collantes, E.R.; Rd, D.W. Amino acid side chain descriptors for quantitative structure-activity relationship studies of peptide analogues. J. Med. Chem. 1995, 38, 2705–2713.
  30. Mei, H.; Liao, Z.H.; Zhou, Y.; Li, S.Z. A new set of amino acid descriptors and its application in peptide QSARs. Biopolymers 2005, 80, 775–786.
  31. Liang, G.; Li, Z. Factor analysis scale of generalized amino acid information as the source of a new set of descriptors for elucidating the structure and activity relationships of cationic antimicrobial peptides. QSAR Comb. Sci. 2007, 26, 754–763.
  32. Tong, J.B.; Liu, S.L.; Zhou, P.; Wu, B.L.; Li, Z.L. A novel descriptor of amino acids and its application in peptide QSAR. J. Theor. Biol. 2008, 253, 90–97.
  33. Tian, F.F.; Zhou, P.; Li, Z.L. T-scale as a novel vector of topological descriptors for amino acids and its application in QSARs of peptides. J. Mol. Struct. 2007, 830, 106–115.
  34. Yang, L.; Shu, M.; Ma, K.W.; Mei, H.; Jiang, Y.J.; Li, Z.L. ST-scale as a novel amino acid descriptor and its application in QSAM of peptides and analogues. Amino Acids 2010, 38, 805–816.
  35. Venkatarajan, M.S.; Braun, W. New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical–chemical properties. Mol. Model. Annu. 2001, 7, 445–453.
  36. Lin, Z.H.; Long, H.X.; Bo, Z.; Wang, Y.Q.; Wu, Y.Z. New descriptors of amino acids and their application to peptide QSAR study. Peptides 2008, 29, 1798–1805.
  37. Wang, X.; Wang, J.; Lin, Y.; Ding, Y.; Wang, Y.Q.; Cheng, X.; Lin, Z. QSAR study on angiotensin-converting enzyme inhibitor oligopeptides based on a novel set of sequence information descriptors. J. Mol. Model. 2011, 17, 1599–1606.
  38. Shu, M.; Mei, H.; Yang, S.B.; Liao, L.M.; Li, Z.L. Structural Parameter Characterization and Bioactivity Simulation Based on Peptide Sequence. QSAR Comb. Sci. 2009, 28, 27–35.
  39. Shu, M.; Huo, D.Q.; Mei, H.; Liang, G.Z.; Zhang, M.; Li, Z.L. New Descriptors of Amino Acids and Its Applications to Peptide Quantitative Structure-activity Relationship. Chin. J. Struct. Chem. 2008, 27, 1375–1383.
  40. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2008, 58, 109–130.
Figure 1. The process of model population analysis (MPA)-based outlier elimination on the FTC dataset of integrated descriptors. The dashed line is defined as the boundary for outliers, which is mean ± 3× standard deviation of prediction errors. (A) No outlier was eliminated, (B) sample No. 181 was eliminated, (C) sample No. 183 was eliminated, (D) sample No. 182 was eliminated, (E) sample No. 134 was eliminated, (F) sample No. 151 was eliminated, (G) sample No. 153 was eliminated, and (H) sample No. 188 was eliminated and all of the outliers were removed.
Figure 2. Frequency of variables selected by the bootstrapping soft shrinkage (BOSS) method on the FTC dataset in 100 runs. The higher frequency denotes higher variable importance. The top 11 variables with frequency larger than 75 were marked in the figure.
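The frequency count described in the Figure 2 caption can be illustrated with a short sketch. It assumes that each run yields a list of selected variable indices; the input `runs` and both function names are hypothetical, and the variable selection itself (BOSS) is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def selection_frequency(runs: Iterable[Sequence[int]]) -> Dict[int, int]:
    """Count in how many runs each variable index was selected."""
    counts: Counter = Counter()
    for selected in runs:
        counts.update(set(selected))  # count a variable at most once per run
    return dict(counts)


def frequent_variables(freq: Dict[int, int], threshold: int = 75) -> List[int]:
    """Variables with frequency strictly larger than the threshold, most frequent first."""
    hits = [v for v, n in freq.items() if n > threshold]
    return sorted(hits, key=lambda v: (-freq[v], v))
```

With 100 runs, the default threshold of 75 corresponds to the cut-off used in the caption for marking variables.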
Figure 3. The result of QSAR model building on the FRAP dataset. (A) The result of MPA-based outlier detection on the FRAP dataset of integrated descriptors. No outlier was detected. (B) Frequency of variables selected by the BOSS method on the FRAP dataset in 100 runs. The higher frequency denotes higher variable importance. The six top variables with frequency larger than 75 are marked in the figure.
Table 1. Comparisons among different quantitative structure-activity relationships (QSAR) models on ferric thiocyanate (FTC) dataset a.
| Descriptors | Q2 (before outlier elimination) | R2 (before) | optPC (before) | Q2 (after outlier elimination) | R2 (after) | optPC (after) | Outliers |
|---|---|---|---|---|---|---|---|
| HSEHPCSV | 0.3861 | 0.5781 | 4 | 0.6170 | 0.7338 | 20 | 183, 182, 181, 134 |
| ST-scale | 0.4268 | 0.5733 | 12 | 0.5993 | 0.6844 | 13 | 183, 182, 181, 134 |
| HESH | 0.4091 | 0.5366 | 2 | 0.5968 | 0.7047 | 10 | 183, 181, 182, 134, 129 |
| VSW | 0.4901 | 0.5771 | 3 | 0.5925 | 0.6768 | 5 | 181, 183, 182, 134, 151 |
| G-scale | 0.4516 | 0.5527 | 6 | 0.5843 | 0.6574 | 9 | 181, 183, 182, 134, 118 |
| FASGAI | 0.4814 | 0.5457 | 5 | 0.5544 | 0.6130 | 6 | 129, 181, 128 |
| DPPS | 0.4740 | 0.5637 | 7 | 0.5379 | 0.6278 | 8 | 181, 182, 183, 134 |
| E-scale | 0.4956 | 0.5451 | 4 | 0.5144 | 0.5582 | 4 | 181, 182, 183, 112 |
| 5Z-scale | 0.3903 | 0.4626 | 12 | 0.3974 | 0.4653 | 9 | 181, 182, 183, 172 |
| VHSE | 0.4265 | 0.5432 | 12 | 0.3974 | 0.5148 | | 181, 182, 183, 172 |
| T-scale | 0.3280 | 0.4215 | 9 | 0.3728 | 0.4362 | 9 | 181, 182, 183 |
| V-scale | 0.3371 | 0.3785 | 5 | 0.3070 | 0.3458 | 6 | 181, 183, 182 |
| Z-scale | 0.2814 | 0.3398 | 4 | 0.2678 | 0.3415 | 4 | 181 |
| ISA-ECI | 0.1493 | 0.1916 | 6 | 0.1572 | 0.1836 | 6 | 183, 182, 181 |
| MS-WHIM1 | 0.0736 | 0.1488 | 3 | 0.1036 | 0.1678 | 3 | 181, 183, 182 |
| MS-WHIM2 | 0.0775 | 0.1445 | 3 | 0.0882 | 0.1617 | 3 | 181, 182, 183 |
| Integrated descriptors | 0.4811 | 0.5843 | 3 | 0.6818 | 0.7964 | 8 | 181, 183, 182, 134, 151, 153, 188 |
| BOSS | | | | 0.7471 ± 0.0032 | 0.7931 ± 0.0062 | 9.72 ± 3.2199 | |
a R2 is the coefficient of determination; Q2 is the cross-validated R2; optPC is the optimal number of principal components for the PLS regression model; the BOSS results are given as mean ± standard deviation over 100 runs; the top-ranked Q2 scores were marked in bold.
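To make the two statistics in the footnote concrete, the sketch below computes R2 from fitted values and Q2 from cross-validated predictions. It rests on several assumptions that the text here does not confirm: the common definition 1 − SS_res/SS_tot with the total sum of squares taken about the overall mean, and leave-one-out as the cross-validation scheme. The names `r_squared`, `q_squared_loo`, and `line_fit_predict` are hypothetical, and the univariate least-squares line is only a stand-in for the PLS regression model.

```python
from statistics import mean
from typing import Callable, List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(
    x: Sequence[float],
    y: Sequence[float],
    fit_predict: Callable[[List[float], List[float], float], float],
) -> float:
    """Q2 from leave-one-out predictions (assumed scheme, see the lead-in)."""
    preds = []
    for i in range(len(y)):
        train_x = [v for j, v in enumerate(x) if j != i]
        train_y = [v for j, v in enumerate(y) if j != i]
        # The held-out sample is predicted by a model that never saw it.
        preds.append(fit_predict(train_x, train_y, x[i]))
    return r_squared(y, preds)


def line_fit_predict(train_x: List[float], train_y: List[float], x_new: float) -> float:
    """Stand-in model: univariate least-squares line, not PLS."""
    x_bar, y_bar = mean(train_x), mean(train_y)
    sxx = sum((v - x_bar) ** 2 for v in train_x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(train_x, train_y))
    return y_bar + (sxy / sxx) * (x_new - x_bar)
```

Because every held-out prediction comes from a model fitted without that sample, Q2 measures predictive rather than fitting performance.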
Table 2. Comparisons among different QSAR models on FRAP dataset a.
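| Descriptors | Q2 (before logarithmic transformation) | R2 (before) | optPC (before) | Q2 (after logarithmic transformation) | R2 (after) | optPC (after) |
|---|---|---|---|---|---|---|
| VHSE | 0.0042 | 0.2655 | 3 | 0.4878 | 0.6122 | 6 |
| 5Z-scale | 0.1408 | 0.3177 | 2 | 0.4809 | 0.5568 | 3 |
| DPPS | 0.0059 | 0.2290 | 3 | 0.4147 | 0.5463 | 4 |
| ST-scale | 0.0263 | 0.3220 | 8 | 0.3968 | 0.5410 | 9 |
| FASGAI | 0.0470 | 0.2753 | 2 | 0.3735 | 0.5006 | 4 |
| E-scale | 0.0560 | 0.2521 | 1 | 0.3714 | 0.4734 | 5 |
| HESH | 0.0444 | 0.2818 | 10 | 0.3668 | 0.5290 | 3 |
| HSEHPCSV | 0.0259 | 0.2475 | 7 | 0.3624 | 0.4952 | 3 |
| G-scale | 0.1066 | 0.2334 | 5 | 0.2836 | 0.3850 | 1 |
| VSW | 0.0130 | 0.3071 | 1 | 0.2382 | 0.4361 | 2 |
| MS-WHIM2 | 0.0342 | 0.0370 | 3 | 0.1728 | 0.2594 | 3 |
| MS-WHIM1 | 0.0452 | 0.0329 | 9 | 0.1207 | 0.1941 | 4 |
| T-scale | 0.0682 | 0.0706 | 2 | 0.0750 | 0.2129 | 10 |
| V-scale | 0.0293 | 0.0748 | 4 | 0.0699 | 0.1495 | 1 |
| Z-scale | 0.0052 | 0.1445 | 1 | 0.0301 | 0.1456 | 6 |
| ISA-ECI | 0.0242 | 0.0141 | 1 | 0.0071 | 0.0411 | 1 |
| Integrated descriptors | 0.1069 | 0.4212 | 3 | 0.4953 | 0.6423 | 3 |
| BOSS | | | | 0.6088 ± 0.0041 | 0.6655 ± 0.0094 | 3.5100 ± 2.5086 |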
a R2 is the coefficient of determination; Q2 is the cross-validated R2; optPC is the optimal number of principal components for the PLS regression model; the BOSS results are given as mean ± standard deviation over 100 runs; the top-ranked Q2 scores were marked in bold.
Table 3. Sequences and antioxidant activities of tripeptides on ferric thiocyanate (FTC) dataset a.
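| No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LHA | 3.918 | 37 | PHA | 5.793 | 73 | RHA | 5.205 | 109 | DHH | 0.9045 | 145 | HHH | 0.0635 | 181 | YHY | 9.886 |
| 2 | LHD | 3.593 | 38 | PHD | 4.622 | 74 | RHD | 3.304 | 110 | EHH | 0.9045 | 146 | HHK | 0.0635 | 182 | YKY | 9.886 |
| 3 | LHE | 6.136 | 39 | PHE | 6.152 | 75 | RHE | 5.096 | 111 | HHH | 0.0000 | 147 | HHR | 0.0635 | 183 | YRY | 9.886 |
| 4 | LHF | 3.628 | 40 | PHF | 3.916 | 76 | RHF | 3.300 | 112 | KHH | 0.0000 | 148 | HHA | 0.0680 | 184 | YAY | 3.607 |
| 5 | LHG | 6.697 | 41 | PHG | 5.197 | 77 | RHG | 5.725 | 113 | AHH | 2.020 | 149 | HHI | 0.0680 | 185 | YIY | 3.607 |
| 6 | LHH | 4.836 | 42 | PHH | 6.051 | 78 | RHH | 3.296 | 114 | IHH | 2.020 | 150 | HHL | 0.0680 | 186 | YLY | 3.607 |
| 7 | LHI | 6.531 | 43 | PHI | 4.916 | 79 | RHI | 4.806 | 115 | FHH | 1.803 | 151 | HHF | 3.612 | 187 | YFY | 2.233 |
| 8 | LHK | 4.225 | 44 | PHK | 3.426 | 80 | RHK | 2.694 | 116 | WHH | 1.803 | 152 | HHW | 3.612 | 188 | YWY | 2.233 |
| 9 | LHL | 5.920 | 45 | PHL | 5.311 | 81 | RHL | 3.501 | 117 | YHH | 1.803 | 153 | HHY | 3.612 | 189 | YYY | 2.233 |
| 10 | LHM | 4.504 | 46 | PHM | 3.714 | 82 | RHM | 3.218 | 118 | GHH | 1.089 | 154 | HHG | 0.3170 | 190 | YGY | 3.366 |
| 11 | LHN | 5.148 | 47 | PHN | 6.061 | 83 | RHN | 5.713 | 119 | NHH | 1.089 | 155 | HHN | 0.3170 | 191 | YNY | 3.366 |
| 12 | LHQ | 4.136 | 48 | PHQ | 3.718 | 84 | RHQ | 3.108 | 120 | QHH | 1.089 | 156 | HHQ | 0.3170 | 192 | YQY | 3.366 |
| 13 | LHR | 5.184 | 49 | PHR | 4.751 | 85 | RHR | 4.302 | 121 | MHH | 2.015 | 157 | HHM | 0.0817 | 193 | YMY | 1.780 |
| 14 | LHS | 4.293 | 50 | PHS | 4.042 | 86 | RHS | 3.386 | 122 | SHH | 1.320 | 158 | HHS | 0.0862 | 194 | YSY | 3.447 |
| 15 | LHT | 5.584 | 51 | PHT | 6.247 | 87 | RHT | 5.987 | 123 | THH | 1.320 | 159 | HHT | 0.0862 | 195 | YTY | 3.447 |
| 16 | LHV | 3.481 | 52 | PHV | 3.335 | 88 | RHV | 3.206 | 124 | CHH | 0.9369 | 160 | HHC | 0.1277 | 196 | YCY | 3.087 |
| 17 | LHW | 6.791 | 53 | PHW | 6.535 | 89 | RHW | 5.878 | 125 | HDH | 1.477 | 161 | DYY | 3.417 | 197 | YYD | 4.116 |
| 18 | LHY | 4.203 | 54 | PHY | 4.227 | 90 | RHY | 3.378 | 126 | HEH | 1.477 | 162 | EYY | 3.417 | 198 | YYE | 4.116 |
| 19 | LWA | 1.192 | 55 | PWA | 1.396 | 91 | RWA | 1.212 | 127 | HHH | 0.0441 | 163 | HYY | 2.257 | 199 | YYH | 5.303 |
| 20 | LWD | 1.717 | 56 | PWD | 1.096 | 92 | RWD | 0.9091 | 128 | HKH | 0.0441 | 164 | KYY | 2.257 | 200 | YYK | 5.303 |
| 21 | LWE | 1.717 | 57 | PWE | 1.096 | 93 | RWE | 1.091 | 129 | HRH | 0.0441 | 165 | RYY | 2.257 | 201 | YYR | 5.303 |
| 22 | LWF | 1.414 | 58 | PWF | 0.9192 | 94 | RWF | 0.9091 | 130 | HAH | 0.9518 | 166 | AYY | 3.071 | 202 | YYA | 3.344 |
| 23 | LWG | 1.313 | 59 | PWG | 2.687 | 95 | RWG | 1.717 | 131 | HIH | 0.9518 | 167 | IYY | 3.071 | 203 | YYI | 3.344 |
| 24 | LWH | 3.212 | 60 | PWH | 1.184 | 96 | RWH | 1.091 | 132 | HLH | 0.9518 | 168 | LYY | 3.071 | 204 | YYL | 3.344 |
| 25 | LWI | 1.111 | 61 | PWI | 1.396 | 97 | RWI | 1.232 | 133 | HFH | 2.026 | 169 | FYY | 1.911 | 205 | YYF | 4.050 |
| 26 | LWK | 1.899 | 62 | PWK | 0.4066 | 98 | RWK | 0.6061 | 134 | HWH | 2.026 | 170 | WYY | 1.911 | 206 | YYW | 4.050 |
| 27 | LWL | 0.6060 | 63 | PWL | 1.096 | 99 | RWL | 3.212 | 135 | HYH | 2.026 | 171 | YYY | 1.911 | 207 | YYY | 4.050 |
| 28 | LWM | 1.394 | 64 | PWM | 0.7955 | 100 | RWM | 0.7273 | 136 | HGH | 0.8318 | 172 | GYY | 5.071 | 208 | YYG | 2.996 |
| 29 | LWN | 1.313 | 65 | PWN | 2.104 | 101 | RWN | 2.404 | 137 | HNH | 0.8318 | 173 | NYY | 5.071 | 209 | YYN | 2.996 |
| 30 | LWQ | 2.505 | 66 | PWQ | 1.202 | 102 | RWQ | 0.6061 | 138 | HQH | 0.8318 | 174 | QYY | 5.071 | 210 | YYQ | 2.996 |
| 31 | LWR | 2.909 | 67 | PWR | 2.705 | 103 | RWR | 2.384 | 139 | HMH | 0.8734 | 175 | MYY | 1.991 | 211 | YYM | 2.103 |
| 32 | LWS | 2.020 | 68 | PWS | 1.096 | 104 | RWS | 0.8081 | 140 | HSH | 0.7304 | 176 | SYY | 3.070 | 212 | YYS | 3.983 |
| 33 | LWT | 2.020 | 69 | PWT | 2.598 | 105 | RWT | 3.818 | 141 | HTH | 0.7304 | 177 | TYY | 3.070 | 213 | YYT | 3.983 |
| 34 | LWV | 1.616 | 70 | PWV | 1.008 | 106 | RWV | 0.6061 | 142 | HCH | 0.9747 | 178 | CYY | 0.4699 | 214 | YYC | 0.6369 |
| 35 | LWW | 3.515 | 71 | PWW | 2.899 | 107 | RWW | 2.707 | 143 | HHD | 0.1877 | 179 | YDY | 3.047 | | | |
| 36 | LWY | 2.222 | 72 | PWY | 1.114 | 108 | RWY | 0.8081 | 144 | HHE | 0.1877 | 180 | YEY | 3.047 | | | |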
a The 214 antioxidant tripeptides were collected from Saito et al. [23] and Li et al. [20]. Antioxidant activities were measured by the FTC method and are expressed relative to the control, which was set to 1.0.
Table 4. Sequences and activities of tripeptides on ferric ion reducing antioxidant power (FRAP) dataset a.
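| No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity | No. | Sequence | Activity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LTC | 2.83 | 30 | LPM | 1.04 | 59 | YKK | 0.25 | 88 | NGE | −0.30 | 117 | ELK | −0.66 | 146 | KIP | −1.15 |
| 2 | CQC | 2.53 | 31 | TDY | 1.01 | 60 | AQA | 0.22 | 89 | QSA | −0.33 | 118 | PEQ | −0.66 | 147 | LLD | −1.22 |
| 3 | GTW | 2.52 | 32 | QCH | 1.00 | 61 | LRV | 0.20 | 90 | DAQ | −0.34 | 119 | IDA | −0.68 | 148 | DLE | −1.22 |
| 4 | LFC | 2.07 | 33 | TWY | 0.96 | 62 | PTP | 0.18 | 91 | ENS | −0.34 | 120 | LLA | −0.70 | 149 | PEV | −1.22 |
| 5 | CLV | 2.06 | 34 | RVY | 0.95 | 63 | ALN | 0.18 | 92 | ENG | −0.37 | 121 | ALA | −0.72 | 150 | LKP | −1.40 |
| 6 | QKW | 2.03 | 35 | KWE | 0.90 | 64 | LEI | 0.16 | 93 | NSA | −0.37 | 122 | GLD | −0.72 | 151 | ALE | −1.52 |
| 7 | CME | 1.99 | 36 | CLL | 0.89 | 65 | LVR | 0.13 | 94 | EKT | −0.38 | 123 | DIS | −0.72 | 152 | TQL | −1.52 |
| 8 | YLL | 1.91 | 37 | LAM | 0.85 | 66 | HIR | 0.12 | 95 | EQS | −0.38 | 124 | PEG | −0.72 | 153 | LEE | −1.52 |
| 9 | QCL | 1.69 | 38 | YSL | 0.81 | 67 | KKI | 0.11 | 96 | AMA | −0.41 | 125 | LDI | −0.74 | 154 | LEK | −1.70 |
| 10 | LAC | 1.69 | 39 | MKG | 0.80 | 68 | SFN | 0.07 | 97 | KID | −0.41 | 126 | AEP | −0.74 | 155 | DAL | −2.00 |
| 11 | GEC | 1.64 | 40 | QTM | 0.80 | 69 | SLL | 0.06 | 98 | GAQ | −0.43 | 127 | ALI | −0.77 | 156 | EVD | −2.00 |
| 12 | EQC | 1.52 | 41 | LAL | 0.76 | 70 | PAV | 0.04 | 99 | PLR | −0.44 | 128 | LDA | −0.77 | 157 | VDD | −2.00 |
| 13 | FCM | 1.51 | 42 | QAL | 0.73 | 71 | RLS | 0.04 | 100 | ILL | −0.46 | 129 | VFK | −0.77 | 158 | DEA | −2.00 |
| 14 | CHI | 1.45 | 43 | MEN | 0.73 | 72 | AGT | 0.04 | 101 | VRT | −0.46 | 130 | ALK | −0.77 | 159 | ALT | - |
| 15 | ACQ | 1.38 | 44 | MKC | 0.72 | 73 | LLF | 0.02 | 102 | IAE | −0.49 | 131 | AQK | −0.82 | 160 | KGL | - |
| 16 | EEL | 1.33 | 45 | LSF | 0.69 | 74 | PMH | 0.00 | 103 | QSL | −0.49 | 132 | IIA | −0.82 | 161 | IQK | - |
| 17 | WEN | 1.31 | 46 | TCG | 0.67 | 75 | EEQ | −0.01 | 104 | KTK | −0.51 | 133 | LIV | −0.85 | 162 | QKV | - |
| 18 | VYV | 1.19 | 47 | SLA | 0.65 | 76 | LVL | −0.02 | 105 | ASD | −0.52 | 134 | EGD | −0.85 | 163 | GDL | - |
| 19 | MHI | 1.16 | 48 | TMK | 0.64 | 77 | QLE | −0.05 | 106 | APL | −0.52 | 135 | QKK | −0.85 | 164 | EIL | - |
| 20 | CAQ | 1.12 | 49 | LDT | 0.62 | 78 | FDK | −0.07 | 107 | AQS | −0.57 | 136 | IPA | −0.85 | 165 | KII | - |
| 21 | WYS | 1.12 | 50 | EKF | 0.54 | 79 | LLL | −0.08 | 108 | ENK | −0.57 | 137 | SDI | −0.89 | 166 | NKV | - |
| 22 | KYL | 1.08 | 51 | VLV | 0.53 | 80 | SAP | −0.08 | 109 | TPE | −0.59 | 138 | VEE | −0.89 | 167 | DTD | - |
| 23 | CGA | 1.08 | 52 | MAA | 0.44 | 81 | LLQ | −0.12 | 110 | RTP | −0.59 | 139 | DDE | −0.89 | 168 | EPE | - |
| 24 | KKY | 1.08 | 53 | PTQ | 0.44 | 82 | NPT | −0.17 | 111 | VLD | −0.62 | 140 | KVL | −0.92 | 169 | EAL | - |
| 25 | NEN | 1.08 | 54 | VAG | 0.41 | 83 | FNP | −0.20 | 112 | IRL | −0.62 | 141 | KFD | −0.92 | 170 | DKA | - |
| 26 | ECA | 1.07 | 55 | ALP | 0.37 | 84 | LNE | −0.24 | 113 | AAS | −0.64 | 142 | IVT | −0.96 | 171 | KAL | - |
| 27 | DYK | 1.06 | 56 | AVF | 0.36 | 85 | SAE | −0.26 | 114 | LQK | −0.64 | 143 | VTQ | −0.96 | 172 | LKA | - |
| 28 | KCL | 1.05 | 57 | KVA | 0.31 | 86 | KPT | −0.28 | 115 | FKI | −0.64 | 144 | AEK | −0.96 | | | |
| 29 | YVE | 1.05 | 58 | TQT | 0.26 | 87 | DIQ | −0.30 | 116 | ISL | −0.66 | 145 | TKI | −1.10 | | | |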
a The 172 antioxidant tripeptides were collected from Tian et al. [21]. Antioxidant activities were measured by the FRAP assay and were logarithmically transformed. Fourteen inactive peptides were removed before model building.
Table 5. Parameters of 16 amino acid descriptors.
| Descriptor | No. of physicochemical properties | No. of extracted variables | Scope of variables |
|---|---|---|---|
| Z-scale [25] | 29 | 3 | Electronic property, steric property and hydrophobic property |
| 5Z-scale [26] | 26 | 5 | Electronic property, steric property and hydrophobic property |
| DPPS [27] | 119 | 10 | Electronic property, steric property, hydrophobic property and hydrogen bond |
| MS-WHIM [28] | 36 | 3 | Surface charge distribution, size and charge over shape dependence |
| ISA-ECI [29] | / | 2 | Isotropic surface area and electronic charge index |
| VHSE [30] | 50 | 8 | Electronic property, steric property and hydrophobic property |
| FASGAI [31] | 335 | 6 | Hydrophobic property, alpha and turn property, bulky property, electronic property, compositional characteristics, local flexibility |
| VSW [32] | 99 | 9 | Molecular size, shape, symmetry and atom distribution |
| T-scale [33] | 67 | 5 | Topological property |
| ST-scale [34] | 827 | 8 | Molecular constitutional, topological, geometrical, hydrophobic, electronic and steric property |
| E-scale [35] | 237 | 5 | Hydrophobic property, size, preferences for amino acids to occur in α-helices, number of degenerate triplet codons and the frequency of occurrence of amino acid residues in β-strands |
| V-scale [36] | / | 3 | van der Waals volume, net charge index and hydrophobic parameter of side chains |
| G-scale [37] | 457 | 8 | Electronic property, steric property and hydrophobic property |
| HESH [38] | 171 | 12 | Electronic property, steric property, hydrophobic property and hydrogen bond |
| HSEHPCSV [39] | 95 | 12 | Hydrophobic, steric, electronic properties and hydrogen bond |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).