In Silico Prospecting for Novel Bioactive Peptides from Seafoods: A Case Study on Pacific Oyster (Crassostrea gigas)

Pacific oyster (Crassostrea gigas), an abundant bivalve consumed across the Pacific, is known to possess a wide range of bioactivities. While there has been some work on its bioactive hydrolysates, the discovery of bioactive peptides (BAPs) remains limited due to the resource-intensive nature of the existing discovery pipeline. To overcome this constraint, in silico-based prospecting is employed to accelerate BAP discovery. Major oyster proteins were digested virtually under a simulated gastrointestinal condition to generate virtual peptide products that were screened against existing databases for peptide bioactivities, toxicity, bitterness, stability in the intestine and in the blood, and novelty. Five peptide candidates were shortlisted showing antidiabetic, anti-inflammatory, antihypertensive, antimicrobial, and anticancer potential. By employing this approach, oyster BAPs were identified at a faster rate, with a wider applicability reach. With the growing market for peptide-based nutraceuticals, this provides an efficient workflow for candidate scouting and end-use investigation for targeted functional product preparation.


Introduction
Bioactive peptides (BAPs) are short peptides, usually 2-20 residues in length [1], that can exhibit a positive impact on human health through mitigating various molecular pathways implicated in chronic disease development. Dietary proteins are a widely available and accessible source of bioactive peptides. The human digestive system can, among many other methods such as food processing, fermentation, enzymatic and chemical hydrolysis [2], effectively release cryptic BAPs within food proteins through the hydrolytic action by digestive proteases. Therefore, the bioactive potential of plant and animal proteins liberated during digestion has been widely studied through in vitro, in vivo, and in silico approaches [3,4]. Many functional food products and ingredients that carry food-derived bioactive peptides are commercially available [5].
Marine organisms have gained tremendous research interest in their bioactivity and therapeutic potential [6,7]. Oysters are the most cultured shellfish worldwide [8] and are traditionally used as medicine for various purposes in Asia [9]. They are nutrient-dense, high in protein, low in fat, and rich in various micronutrients and bioactive components. Pacific oyster (Crassostrea gigas) has shown promise experimentally exhibiting various bioactivities including antimicrobial, antioxidant, anti-inflammatory, angiotensin-converting enzyme (ACE)-inhibitory, anti-cancer, and promotion of sexual reproductive hormones [8,10,11]. Therefore, C. gigas demonstrates immense opportunity as a BAP precursor in a wide range of bioactivity classes for functional food and nutraceutical application. BAPs bring key advantages including high target specificity, potency, and low toxicity compared to small molecule drugs. However, the main challenges in translating BAP research remain their questionable bioavailability, stability in digestion and circulation, and bitterness [12,13]. To circumvent these disadvantages in new BAP discovery, in silico screening and prediction methods may be applied. Many studies examining bioactive peptides have effectively incorporated in silico methods to accelerate the conventional BAP discovery process, thereby achieving higher precision, time, cost, and resource savings [14][15][16][17]. While there are many studies mining the bioactive potential and BAP sequences from C. gigas and related species, few have integrated in silico methods to cover a wider candidate pool than the physical samples and available reagents may allow [15,18]. In this present study, we set out to identify novel BAP candidates from C. gigas flesh released by human consumption using in-silico screening methods and develop a transferrable pipeline for the in silico discovery of BAPs from GI-digested food proteins.

Simulated Gastrointestinal Digestion
Enzymatic hydrolysis of all 13 selected C. gigas proteins with the human digestive enzymes pepsin, trypsin, and chymotrypsin (Table 1) yielded 1800 total protein fragments (excluding single amino acids) from the BIOPEP Enzyme Action Tool and ExPASy PeptideCutter combined. The peptide lengths ranged from 2 to 16 residues. The resulting peptide profiles from each protein are shown in Figure 1. To maximize the potential for novelty and specificity to oyster origin while maintaining bioavailability, only peptides of 5 to 10 residues in length were selected for further analyses, which narrowed the candidate pool to a total of 289 unique peptide sequences. ToxinPred and iBitter-SCM revealed 126 bitter peptides, one of which was also predicted to be toxic. Only the remaining 163 unique non-toxic, non-bitter peptides were considered for further analysis. HLP analysis showed that all but two remaining peptide candidates have high intestinal stability, defined by half-life > 1.0 s, while PLifePred analysis predicted plasma half-lives ranging from 757.01 to 1058.51 s. Most natural food-derived peptides have a half-life between 800 and 900 s based on PLifePred, according to Gülseren and Vahapoglu's [19] investigation of 3074 peptides from 12 different food sources. Therefore, a threshold half-life of 800 s or above was chosen as the benchmark for relatively high C. gigas peptide stability in the blood. From the multiple database scans, only one sequence from the peptide list, AGDDAPR, was previously reported as a bioactive peptide in the BIOPEP and EROP-Moscow databases, with ACE-inhibiting, antioxidative, pancreatic lipase-inhibiting, and alpha-amylase-inhibiting activities. After the screening phase detailed above, 151 non-toxic, non-bitter, highly stable, and novel sequences remained for bioactivity screening in silico, shown in Table 2.
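The length filter and the first funnel step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_by_length` and the example pool are hypothetical, and only AGDDAPR and the counts 289 and 126 come from the text.

```python
def select_by_length(peptides, min_len=5, max_len=10):
    """Return unique peptides with min_len..max_len residues, in first-seen order."""
    seen = set()
    kept = []
    for pep in peptides:
        seq = pep.strip().upper()
        if min_len <= len(seq) <= max_len and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


# Hypothetical digestion output for illustration only (not the study's data).
example_pool = ["AGDDAPR", "VY", "agddapr", "LLK", "GSTMKF", "A" * 12]
selected = select_by_length(example_pool)

# Funnel arithmetic reported in the text: 289 unique peptides, 126 predicted bitter.
remaining_after_bitterness = 289 - 126
```

With this toy pool, the dipeptide, the tripeptide, the 12-residue fragment, and the lowercase duplicate are all dropped, leaving two sequences.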

Prospect Oyster BAPs
From the BAP candidate pool (151 sequences), 22 were shortlisted based on their predicted bioactivities across the in silico platforms used (Table 3). Sequences were considered high-potential (prospect) BAPs when their scores exceeded the established threshold of each predictive model and when they consistently showed interactions with known binding sites associated with the target bioactivity. Oyster antihypertensive peptides (AHPs) were selected based on AHTpin scores (>1.0) and predicted binding to all three active site residues of human angiotensin-converting enzyme (ACE1) with PepSite2; a full analysis is shown in Table 4. Candidate AHPs were further shortlisted based on predicted bioavailability indices, namely <30% predicted human intestinal absorption and <20% bioavailability score using ADMETlab2.0 (Supplementary Table S2). [22]. In addition, the water molecule was also bound to E384 via hydrogen bonds. These main stabilizing residues are Q281, H353, A354, K511, H513, Y520, and Y523 [22,23].
Antidiabetic peptides (ADPs) were selected based on iDPPIV-SCM scores (> 350) with at least two binding sites for human dipeptidyl peptidase-4 (DPP-IV) when analyzed using PepSite2 (Table 5). High-confidence anti-inflammatory peptides (AIPs) predicted by both PreAIP and Antiinflam were ranked to select the top four peptides. The top antimicrobial peptide (AMP) selection cutoffs were set at CAMP-SVM = 1, CAMP-RF > 0.54, and ADAM > 2. The only positive hit in DBAASP v3's General Antimicrobial Activity prediction result was also included. The three most recent and performance-optimized anti-cancer peptide prediction servers, ACPred, iDACP, and mACPpred, were chosen from a wide range of available tools. Positive cross-hits from all three platforms were selected as the top three anti-cancer peptides.
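The selection cutoffs above can be written as simple predicates. The following Python sketch is illustrative only: the function names are hypothetical, the score values are assumed to have been copied by hand from each server's output, and the example sequences are invented placeholders rather than study results.

```python
def is_top_amp(camp_svm, camp_rf, adam):
    """AMP cutoffs from the text: CAMP-SVM = 1, CAMP-RF > 0.54, ADAM > 2."""
    return camp_svm == 1 and camp_rf > 0.54 and adam > 2


def is_adp(idppiv_scm_score, n_binding_sites):
    """ADP rule from the text: iDPPIV-SCM > 350 and at least two DPP-IV binding sites."""
    return idppiv_scm_score > 350 and n_binding_sites >= 2


def cross_hits(*positive_sets):
    """Sequences predicted positive by every server (set intersection)."""
    return set.intersection(*map(set, positive_sets))


# Placeholder positives from three anti-cancer predictors (hypothetical sequences).
acpred_hits = {"PEPTIDEA", "PEPTIDEC", "PEPTIDED"}
idacp_hits = {"PEPTIDEA", "PEPTIDED"}
macppred_hits = {"PEPTIDED", "PEPTIDEA", "PEPTIDEF"}
shared = cross_hits(acpred_hits, idacp_hits, macppred_hits)
```

Requiring agreement across all three anti-cancer servers trades recall for precision, which suits a shortlist meant for later in vitro validation.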

Discussion
The ease and efficiency of the in silico approach have fueled its increasing traction in the predictive screening of BAPs, especially when combined with or preceding traditional in vitro/in vivo experiments. Meanwhile, a growing and diversifying pool of in silico tools is being developed and improved. This presents a substantial opportunity to define an adaptable workflow for effective BAP scouting using the latest tools. Pacific oyster has shown great potential with its wide range of associated bioactivities [8]. Therefore, using C. gigas as a case study, we have outlined an applicable framework that uses existing, publicly accessible in silico tools to efficiently scope for novel BAPs from commonly consumed seafood.
In silico proteolysis is an efficient way to simulate GI digestion, since the oyster proteins are often consumed directly. Both available platforms, BIOPEP Enzyme Action Tool and ExPASy PeptideCutter, offer two variants each of pepsin and chymotrypsin: pepsin at pH = 1.3 vs pH > 2, and chymotrypsin A with low or high specificity. However, recent studies that adopted in silico simulated GI digestion used discordant enzyme choices without a clear explanation of the selections [3,25,26,27,28]. Such variation points to a potential gap in understanding how the platforms should be applied. In comparison, the human pepsin A3 information documented in the peptidase database MEROPS (A01.001) better matches the cleavage preferences and lower specificity of pepsin at pH > 2 (Table 2). In addition, the lower specificity yields shorter and more digestion-tolerant peptide products, which is preferable for a comprehensive survey of bio-accessible fragments. Similarly, the catalytic activity of human digestive chymotrypsin A in its MEROPS entry (S01.152) is a much better match to the activity of low-specificity chymotrypsin A in the hydrolysis platforms, which can produce highly digestion-tolerant peptides. These comparisons led to the best-aligned enzyme choices used for this study, indicated in Table 2.
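To make the idea of rule-based in silico proteolysis concrete, the sketch below applies cleavage rules sequentially to a sequence. It is a minimal illustration under stated assumptions, not the rule tables used by BIOPEP or PeptideCutter: the rules are simplified textbook approximations (trypsin cleaving after K or R, low-specificity chymotrypsin cleaving after F, Y, W, M, or L, both blocked when the next residue is P), pepsin is omitted because its preferences are harder to reduce to a one-line rule, and the function names and test sequence are hypothetical.

```python
def digest(sequence, cleave_after, blocked_before="P"):
    """Cut after any residue in cleave_after unless the next residue is blocked."""
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):  # never cut after the final residue
        if sequence[i] in cleave_after and sequence[i + 1] not in blocked_before:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def sequential_digest(sequence, enzyme_rules):
    """Apply each enzyme's rule in turn to the fragments left by the previous one."""
    fragments = [sequence]
    for cleave_after in enzyme_rules:
        fragments = [piece for frag in fragments for piece in digest(frag, cleave_after)]
    return fragments


# Simplified, assumed rules (see lead-in); not the servers' actual rule sets.
TRYPSIN = "KR"
CHYMOTRYPSIN_LOW = "FYWML"

# Hypothetical test sequence, not an oyster protein.
products = sequential_digest("AKFLPGRW", [TRYPSIN, CHYMOTRYPSIN_LOW])
```

Because every rule here is position-local, small differences in how a platform encodes exceptions can change the fragment set, which is one plausible reason two tools given the same enzymes may disagree.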
The peptide distribution post-digestion in Figure 1 shows muscle proteins, particularly myosin and paramyosin, yielded greater proportions of short peptides, which indicates higher digestibility with dietary relevance as muscle proteins are highly abundant in the flesh [15]. One caveat is that both digestion tools assume no protein folding which could restrict enzyme access. Under real-life gut conditions, there might be fewer cleavages made than virtually generated.
The outputs from both digestion tools, BIOPEP Enzyme Action Tool and ExPASy PeptideCutter, were pooled because only a small portion of the peptide results overlapped, and only among di- and tripeptides, despite using the same set of enzymes. We suspected that the enzymes' cleavage sites were encoded differently. As evident in Figure 1, BIOPEP generated predominantly di- and tripeptides, and very few oligopeptides. In fact, there were no overlapping results between the two platforms in the sequence length range of five to ten residues. Table 2 details the enzyme cleavage information cited on the respective server pages, which shows slight variation between the two platforms in both the references and the resulting sequences. Therefore, pooling all resulting peptides from both sources helps maximize the chance of identifying potential bioactive peptides.
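The pooling and overlap check described here amounts to a set union and intersection. The Python sketch below shows one way to compute them; the function name and the two input lists are hypothetical placeholders, not the study's digestion outputs.

```python
def pool_and_compare(biopep_peptides, peptidecutter_peptides, min_len=5, max_len=10):
    """Return the pooled set, the shared set, and the shared peptides within the length window."""
    biopep = set(biopep_peptides)
    cutter = set(peptidecutter_peptides)
    pooled = biopep | cutter
    shared = biopep & cutter
    shared_in_range = {p for p in shared if min_len <= len(p) <= max_len}
    return pooled, shared, shared_in_range


# Hypothetical outputs for illustration only.
biopep_out = ["VY", "AGK", "SR", "DDAPR"]
cutter_out = ["VY", "AGK", "GSTMKF", "AGDDAPR"]
pooled, shared, shared_in_range = pool_and_compare(biopep_out, cutter_out)
```

In this toy case the two tools agree only on short peptides, so the 5 to 10 residue window has no shared sequences and pooling adds candidates from both sides.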
A peptide length range of 5 to 10 residues was chosen to balance bioavailability and novelty. Bioactive peptides can be transported from the gut lumen through the intestinal epithelium into the bloodstream by four possible pathways: PepT1-mediated permeation, paracellular transport through tight junctions, transcytosis via vesicles, and passive transcellular diffusion [29]. Xu et al.'s review [29] also presented BAPs in this length range that can cross Caco-2 cell monolayers via the above mechanisms, with high permeability coefficients comparable to those of shorter peptides. Another reason to exclude di-, tri-, and tetrapeptides from our study is that such short sequences are ubiquitous and therefore likely to occur across many food sources. For example, the dipeptide VY possesses antihypertensive activity but is shared by sardine, seaweed, sesame, and royal jelly products among commercialized bioactive peptide products [2]. Therefore, our peptide size filter helps maximize novelty by uncovering new sequences unique to oyster.
From enzymatic digestion to preliminary screening against unwanted attributes, to final in silico bioactivity scoping, the peptide candidate pool has been narrowed down from 1800 total peptide products post-digestion, with 308 sequences in the 5 to 10 residue length range, to 151 unique non-toxic, non-bitter, stable, and novel peptide candidates, then finally to 22 top sequences with high potential bioactivities. Such a sequential funnel procedure can greatly improve the breadth and power of BAP screening. The selected activity prediction servers in this study utilize a variety of Machine Learning algorithm-based models. Top BAP candidates in the final shortlist have scored highly across multiple predictors, whose model construction and datasets have been vetted for reliability. In addition, PepSite2 analysis complements the predictions by modeling the extent of protein-peptide interaction for enzyme inhibitory mechanisms.
Amino acid and dipeptide compositions combined with physicochemical properties are also helpful to elucidate shared features and trends in new and known BAPs. Peptides with strong ACE-inhibitory activity likely contain C-terminal proline, hydrophobic, aliphatic, branched, aromatic, or positively charged residues, and N-terminal branched-chain or hydrophobic residues [30][31][32][33]. In addition, proline-containing peptides generally resist digestion [1,2]. Peptides with strong DPP-IV-inhibitory activity likely contain F, R, and Y residues [34]. Peptides with a C-terminal proline or with the DPP-IV substrate motif at the N-terminal, Xaa-Pro or Xaa-Ala, may act as DPP-IV inhibitors [35,36]. Anti-inflammatory peptides benefit from hydrophobic residues at the termini, and high hydrophobic residue content has shown stronger NO inhibition ability in vitro [37,38]. They have also shown an association with positively charged residues, though inconsistent findings have been reported [39,40]. Antimicrobial peptides (AMPs) are generally amphiphilic with hydrophobic and cationic residues, mainly lysine and arginine, to facilitate interaction with the negatively charged microbial lipid cell membranes [41][42][43][44]. A subset of AMPs can also serve as anticancer peptides with selective membranolytic and cytolytic activities. These anticancer peptides are generally 5-30 residues in length, and as with AMPs, hydrophobic, cationic (preferably from lysine), and amphiphilic properties are significant to their diverse modes of action [42,45,46], while cysteine-rich domains can form stabilized scaffolds to help maintain extracellular motifs and structures [46,47]. Our shortlisted peptides in Table 3 show general consistency with the previously reported composition and property trends detailed above. Nevertheless, no conclusive causal relationship has been drawn between sequence-based propensities and bioactivities. Additional research investigating how the sequence, structure, and properties of BAPs enable their mechanistic action is warranted.
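The composition trends discussed above can be checked per sequence with a few simple descriptors. The sketch below is a rough illustration, not a validated predictor: the hydrophobic residue set is an assumption (definitions vary between scales), the net charge is a crude count of K and R minus D and E that ignores histidine, termini, and pH, and the function name is hypothetical. AGDDAPR is the known sequence mentioned in the results.

```python
# Assumed hydrophobic residue set; other scales classify residues differently.
HYDROPHOBIC = set("AVILMFWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def sequence_features(seq):
    """Return simple composition descriptors for a peptide sequence."""
    seq = seq.upper()
    n_hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    n_positive = sum(1 for aa in seq if aa in POSITIVE)
    n_negative = sum(1 for aa in seq if aa in NEGATIVE)
    return {
        "length": len(seq),
        "c_terminal_proline": seq.endswith("P"),
        # Xaa-Pro or Xaa-Ala at the N-terminus (DPP-IV substrate-like motif).
        "n_terminal_xaa_pro_or_ala": len(seq) >= 2 and seq[1] in "PA",
        "hydrophobic_fraction": n_hydrophobic / len(seq) if seq else 0.0,
        "net_charge_estimate": n_positive - n_negative,
    }


features = sequence_features("AGDDAPR")
```

Descriptors like these only summarize tendencies; as the text notes, they do not establish a causal link between sequence and bioactivity.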
On the other hand, there remains an opportunity to strengthen the proposed framework by incorporating more in-depth 3-D structural analyses. The current study is limited to PepSite2 comparative assessment of the best docking model of protein-peptide interactions, which is applicable to only two of the five bioactivities assessed. Yuan et al.'s recent work on ACE- and DPP-IV-inhibitory peptides from C. gigas followed a similar investigation process but included a docking study based on CDOCKER interaction energy, a CHARMm-based molecular docking algorithm [48]. The CDOCKER scores indicate ligand binding affinities calculated at different poses of simulated receptor-ligand binding modes. Such molecular docking methods have shown promising accuracy and reliability in reproducing experimental observations and in elucidating the molecular mechanisms of small ligands. Although we have narrowed down and identified different BAPs from the same source and digestion process, the in silico platform selection and parameter settings greatly influence the filtering results. For example, in contrast to Yuan et al.'s study, we opted not to use PeptideRanker (http://bioware.ucd.ie/~compass/biowareweb/Serverpages/peptideranker.php/, accessed on 10 January 2022), a widely utilized general bioactivity predictor built on an N-1 Neural Network (N1-NN) model based on shared features across various functional classes of BAPs [49]. However, its three classes of BAPs only included antimicrobial peptides, peptide hormones, and peptide toxins/venom. The training dataset is made up heavily of AMPs, including sources from BIOPEP, PeptideDB, APD2, and CAMP. Therefore, among the five bioactivities of interest in this study, we expected PeptideRanker scoring to be skewed towards AMPs.
Future work following up this study may adopt and cross-check using optimized molecular docking software such as flexible CDOCKER and AutoDock [50][51][52], as well as quantitative structure-activity relationship (QSAR) analysis to complement the proposed screening strategy. For example, membrane permeability can be modeled for predicted AMPs with Gram-positive or Gram-negative bacteria, associated with probable antibacterial activity. Going forward, this framework shall be continually updated and optimized with the frequent release of new tools and findings. It may also branch out to include other production methods of BAPs such as via commercial enzymes and microbial fermentation. We also suggest future in vitro validation in two ways. From a top-down approach, enzyme digestion can be performed to mirror the in silico process in this study. Peptidomic analysis can be performed to compare the peptide product profile to that from BIOPEP Enzyme Action Tool and PeptideCutter results. Additionally, bioactivity assays can be performed on both hydrolysates and synthetic peptides of the top candidate sequences from this study to confirm predicted bioactivity and potency.

Protein Selection and Retrieval
C. gigas protein sequences and information were retrieved from the UniProt Knowledgebase (https://www.uniprot.org/uniprotkb/, accessed on 15 November 2021). Only SwissProt-reviewed entries and major proteins identified through proteomics in recent literature were included. From the SwissProt database, all nine manually annotated and reviewed protein sequences of C. gigas expressed in the oyster flesh were included. In addition to the database search, we also included the five prominent C. gigas proteins identified through SDS-PAGE and subsequent nanoLC-nanoESI-MS/MS analysis [15] to strengthen the abundant protein inclusion criteria. The information on all 13 proteins analyzed in this study is listed in Table 6.

Toxicity, Bitterness, Stability, and Allergenicity Screening
All virtually generated peptide products were subjected to multiple database searches as peptide queries in FASTA format. ToxinPred (https://webs.iiitd.edu.in/raghava/toxinpred/multiple_test.php/, accessed on 10 January 2022), a support vector machine (SVM)-based prediction method with a threshold score of 0.0, was chosen to screen for non-specific peptide cytotoxicity. Digestion peptide products predicted as toxic were excluded. iBitter-SCM (http://camt.pythonanywhere.com/iBitter-SCM/, accessed on 10 January 2022) was used as the bitterness screening tool. Any peptides scoring above the threshold of 333 were predicted to be bitter-tasting and were therefore excluded from subsequent analyses. PLifePred (https://webs.iiitd.edu.in/raghava/plifepred/, accessed on 1 February 2022) was used to assess peptide stability in blood. Peptides with predicted half-life values of 800 s or above were retained. HLP (https://webs.iiitd.edu.in/raghava/hlp/, http://crdd.osdd.net/raghava/hlp/interactive.htm/, accessed on 1 February 2022) was used to assess peptide stability in the intestinal environment. Additionally, our peptide pool was checked against reported allergenic motifs of the major oyster allergen tropomyosin Cra g1 [53][54][55], along with BIOPEP-UWM's Allergenic Proteins and their Epitopes database, listed in Supplementary Table S1. No matches were found, so no peptides were excluded from subsequent analysis on this basis.
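The exclusion criteria in this section can be combined into a single filter once the scores from each server have been collected. The Python sketch below assumes the scores were exported by hand into records; the `PeptidePrediction` class, its field names, and the example values are hypothetical. It also assumes that a ToxinPred SVM score above the 0.0 threshold means "toxic" and uses the intestinal half-life cutoff of > 1.0 s given in the results.

```python
from dataclasses import dataclass


@dataclass
class PeptidePrediction:
    """Hypothetical record of per-peptide server outputs (field names are assumptions)."""
    sequence: str
    toxinpred_score: float         # SVM score; assumed toxic if above 0.0
    ibitter_score: float           # iBitter-SCM score; bitter if above 333
    plasma_half_life_s: float      # PLifePred prediction, in seconds
    intestinal_half_life_s: float  # HLP prediction, in seconds


def passes_screen(p):
    """Keep peptides that are non-toxic, non-bitter, and stable in blood and intestine."""
    return (
        p.toxinpred_score <= 0.0
        and p.ibitter_score <= 333
        and p.plasma_half_life_s >= 800
        and p.intestinal_half_life_s > 1.0
    )


# Invented example values for illustration only.
candidates = [
    PeptidePrediction("PEPTIDEA", -0.8, 310.0, 850.0, 1.6),
    PeptidePrediction("PEPTIDEC", -0.5, 340.0, 900.0, 1.4),  # fails bitterness
    PeptidePrediction("PEPTIDED", -0.2, 300.0, 760.0, 1.2),  # fails plasma stability
]
kept = [p.sequence for p in candidates if passes_screen(p)]
```

Keeping all thresholds in one function makes it straightforward to adjust a cutoff and re-run the screen when a server or benchmark is updated.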
In addition, PepSite2 (http://pepsite2.russelllab.org/, accessed on 5 May 2022) was adopted to predict the binding site interactions between BAP candidates and human ACE (PDB entry 1O86) or dipeptidyl peptidase-4 enzyme (DPP-IV) (PDB entry 2ONC) respectively, using Mohd Salim and Gan's analysis procedure with minor modifications [17]. The top-ranked sequences and cross-hits between platforms for each bioactivity were finally selected as high-potential bioactive peptides, shortlisted in Table 3.

Conclusions
This study presents an in silico-based approach to discover novel bioactive peptides from C. gigas by investigating a wide range of protein- and peptide-level databases and tools. A shortlist of promising bioactive peptide candidates is provided for antihypertensive, antidiabetic, anti-inflammatory, antimicrobial, and anti-cancer activities. These novel BAPs are predicted to be non-toxic, non-bitter, bioavailable, and stable in the intestines and circulation. Given their predicted bioactivities, it will be of value to validate these activities and explore potency in terms of hydrolysate preparation.

Conflicts of Interest: The authors declare no conflict of interest.
Sample Availability: Samples of the compounds are not available from the authors.

Abbreviations
BAP: bioactive peptide
C. gigas: Crassostrea gigas
T2D: type 2 diabetes
DPP-IV: dipeptidyl peptidase-4
ACE: angiotensin-converting enzyme
SVM: support vector machine
RF: Random Forest
ADMET: absorption, distribution, metabolism and excretion properties, and toxicities
QSAR: quantitative structure-activity relationship
GI: gastrointestinal