Integration of Deep Learning and Sequential Metabolism to Rapidly Screen Dipeptidyl Peptidase (DPP)-IV Inhibitors from Gardenia jasminoides Ellis

Traditional Chinese medicine (TCM) possesses unique advantages in the management of blood glucose and lipids. However, there is still a significant gap in the exploration of its pharmacologically active components. Integrated strategies encompassing deep-learning prediction models and activity validation based on absorbable ingredients can greatly improve the identification rate and screening efficiency in TCM. In this study, the affinity of 11,549 compounds from the Traditional Chinese Medicine Systems Pharmacology database (TCMSP) for dipeptidyl peptidase-IV (DPP-IV) was first predicted using a deep-learning model. Based on the results, Gardenia jasminoides Ellis (GJE), a medicine-food homologous herb, was selected as a model drug. The absorbed components of GJE were subsequently identified through in vivo intestinal perfusion and oral administration. As a result, a total of 38 prototypical absorbed components of GJE were identified. These components were analyzed to determine their absorption patterns after intestinal, hepatic, and systemic metabolism. Virtual docking and DPP-IV enzyme activity experiments were further conducted to validate the inhibitory effects and potential binding sites of the constituents common to the deep-learning and sequential-metabolism results. The results showed significant DPP-IV inhibitory activity (IC50 53 ± 0.63 μg/mL) of the iridoid glycoside-enriched fraction, which is a novel finding. Genipin 1-gentiobioside was screened as a promising new DPP-IV inhibitor in GJE. These findings highlight the potential of this innovative approach for the rapid screening of active ingredients in TCM and provide insights into the molecular mechanisms underlying the anti-diabetic activity of GJE.


Introduction
Diabetes mellitus (DM) is a common metabolic disease characterized by hyperglycemia [1]. An epidemiological survey showed that the global prevalence of diabetes is increasing year by year, and it was predicted to rise to 10.2% (578 million) by 2030, with the vast majority of these cases being type II diabetes (T2D) [2]. Incretin-based therapy has recently surfaced as a viable treatment option for individuals with T2D [3]. The incretin system is composed of two hormones, namely, glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic polypeptide (GIP), which elicit the secretion of insulin from pancreatic β-cells in reaction to elevated levels of blood glucose [4]. Nonetheless, the instability of these peptides in vivo presents a significant obstacle due to their limited half-lives and vulnerability to degradation and inactivation by dipeptidyl peptidase (DPP)-IV enzymes [5]. To tackle this issue, a novel category of therapeutic agents for T2D, the DPP-IV inhibitors, is increasingly being utilized in clinical settings due to their advantageous physiological properties in reducing glucose levels. These include trogliptin, vildagliptin, and sitagliptin [6]. These agents function primarily by extending the degradation time of GLP-1 within the body, thereby promoting insulin secretion and inducing glucose concentration-dependent hypoglycemic effects [7]. Nevertheless, clinical investigations have demonstrated that the marketed, chemically synthesized DPP-IV inhibitors are associated with adverse effects, such as headache, rash, diarrhea, and abnormal liver function [8,9]. Traditional Chinese medicine (TCM) is a class of drugs based on natural plants, many of which are used as both medicine and food [10]. These drugs are gaining popularity because of several advantages: they often have fewer side effects and better patient tolerance, they are relatively less expensive, and they are accepted due to a long history of use [11]. A variety of Chinese medicines, such as Astragalus membranaceus, Mulberry leaves, and Radix scutellariae, have demonstrated significant efficacy in lowering blood glucose levels both in vivo and in vitro [12][13][14][15]. TCMs can exert hypoglycemic effects through various mechanisms, such as inhibiting the activity of the DPP-IV enzyme [16]. Therefore, screening for effective DPP-IV inhibitors with low side effects from TCMs is one of the important directions for the development of hypoglycemic drugs.
Drug discovery and development starts with target identification and ends with clinical trials [17]. Owing to the large number of assays and tests required and a high risk of failure, the whole process of developing a new drug generally takes 10-20 years and requires a capital investment ranging from USD 0.5 billion to USD 2.6 billion [18,19]. A major stage in the drug discovery process entails the identification of interactions between drugs and their respective targets, a task conventionally achieved through rigorous in vitro experiments. To mitigate the considerable expenditure of time and resources, there has been a growing emphasis on in silico methodologies [20]. Consequently, rather than embarking on an exhaustive in vitro exploration, the initial step involves virtual screening, followed by experimental validation of potential candidates. Drug-target interaction (DTI) prediction based on artificial intelligence has become a crucial tool in drug discovery, and its progress has substantially improved the effectiveness of novel drug development [21]. More recently, deep-learning-based approaches for computational DTI prediction have progressed rapidly due to their successes in other areas, enabling large-scale validation in a relatively short time [22]. In this study, a deep-learning model, DrugBAN, was employed to identify compound-DPP-IV protein interactions. This is a state-of-the-art method for the prediction of compound-protein interactions and was reported to have excellent accuracy [23].
Contrary to the predominant "one target, one drug" approach of Western medicine, TCM is a multifaceted system that operates through the modulation of multiple physiological pathways, utilizing a variety of components and targets [24]. The complex chemical profile of TCM results in certain components being incapable of producing their intended effects. Only components that successfully reach the target and maintain an appropriate blood concentration are deemed therapeutically effective [25,26]. Therefore, targeting the absorbed ingredients by in vivo metabolic methods can effectively increase the hit rate of the active compound and simplify the description of the active substance [27]. Through this approach, Luo et al. introduced an integrated strategy founded upon a sequential metabolite identification approach, network pharmacology, molecular docking, and surface plasmon resonance (SPR) analysis. This method led to the successful identification of the active constituents within Paeoniae Radix Alba [28].
Gardenia jasminoides Ellis (GJE) is the dried and mature fruit of the Rubiaceae plant Gardenia and a medicine-food homologous herb that is rich in various bioactive compounds exhibiting a diverse range of pharmacological activities. Among them, iridoid glycosides and yellow pigment are generally considered the main bioactive and characteristic ingredients. GJE is widely cultivated and inexpensive, which keeps its cost low. It is well known and frequently used not only as an excellent natural colorant, but also as an important traditional medicine for clearing away heat, cooling the blood, and eliminating stasis to activate blood circulation. It has demonstrated notable anti-inflammatory, hypoglycemic, hepatoprotective, and cholagogic efficacy and is widely used in Chinese clinical prescriptions [29]. Previous studies have reported that the 60% alcoholic extract of GJE exhibited significant hypoglycemic effects and improved insulin resistance [30,31]. Iridoid glycosides were identified as one of the major kinds of components in GJE and have demonstrated hypoglycemic effects in animal models [32]. However, the mechanism of action underlying these effects remains unclear. Furthermore, the absorption and metabolism of GJE and its primary active components also need to be investigated to evaluate their in vivo activity.
In this study, the interactions between 11,549 compounds from the Traditional Chinese Medicine Systems Pharmacology database (TCMSP) and DPP-IV were first predicted using a deep-learning model, and GJE was selected as a model drug for further study. Subsequently, sequential metabolism was employed to gain a more comprehensive understanding of the major absorbed components of GJE and their distribution in rats. This was followed by a comprehensive strategy of molecular docking and in vitro activity analyses to validate the potential active components in GJE. This approach combines deep-learning prediction, in vivo uptake distribution, and in vitro activity validation, and all experimental techniques and methods can be used as mature tools for screening and verifying related compounds. These tools do not depend on specific compounds, so they can be effectively extended to the study of more herbal ingredients. This approach holds great potential for similar studies in the future and can serve as a valuable methodological framework.

Validation of Deep-Learning Model
The activities of compounds in TCM were predicted using the DrugBAN model, with the code available at GitHub (https://github.com/peizhenbai/DrugBAN/tree/main, accessed on 1 June 2022). The DPP-IV dataset was collected from PubChem (https://pubchem.ncbi.nlm.nih.gov/, accessed on 1 June 2022) and used to train the model, which employed 80-20 splits for training and testing. Precision-recall curves plotting the true positive rate (recall) against the positive predictive value (precision) were then generated (Figure 1). The area under the precision-recall curve (auPRC), which measures the ability of the model to correctly identify active compounds, was favorable, with a value of 0.73. This indicated that the model could identify DPP-IV-inhibitory compounds in our dataset more accurately than random prediction (auPRC of 0.28). Further evaluation of the model's performance using various metrics, such as the AUROC, F1 score, sensitivity, specificity, accuracy, and threshold score, also showed superior results for the DrugBAN model (Table 1).
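To make the auPRC comparison concrete, the following minimal sketch computes the area under the precision-recall curve as a step-wise average precision and contrasts it with the random-prediction baseline, which equals the fraction of positive samples. It is a dependency-free illustration and not the code used in this study; the function names and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Predictions are ranked by descending score; each time a positive is
    encountered, the precision at that rank is accumulated. Assumes the
    scores contain no ties (an assumption of this sketch).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def random_baseline(labels):
    """Expected auPRC of a random classifier: the positive prevalence."""
    return sum(labels) / len(labels)


# Hypothetical toy data: 1 = inhibitor, 0 = non-inhibitor.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
ap = average_precision(labels, scores)
baseline = random_baseline(labels)
print(f"auPRC = {ap:.3f}, random baseline = {baseline:.3f}")
```

A model is informative to the extent that its auPRC exceeds this prevalence-based baseline.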

Deep-Learning Model Prediction and Filters
The optimized model was applied to predict the DPP-IV-inhibitory activities of 11,549 compounds from the TCMSP Database (https://old.tcmsp-e.com/index.php, accessed on 2 June 2022). The interactions were predicted with scores ranging from 0 to 1. The prediction results for the 11,549 compounds interacting with DPP-IV are shown in Supplementary Table S1. The top 30 compounds with the best inhibition capacity were selected to identify their respective sources, and the results are presented in Table 2. We investigated the major botanical sources of the top 30 compounds and screened them on the basis of the frequency of their use in herbal medicines in clinical prescriptions, their level of toxicity (https://db.ouryao.com/, accessed on 2 June 2022), and their potential as medicine-food homologous materials (http://www.foodmate.net/, accessed on 2 June 2022). Although compounds such as genipin 1-gentiobioside and others did not exhibit particularly significant affinity, it is worth noting that a considerable portion of the top 30 hit compounds were derived from GJE. Therefore, GJE was chosen as the model drug for further investigation.
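The ranking-and-source step described above can be sketched as follows: predictions are sorted by interaction score, the top N are retained, and the botanical sources of those hits are tallied. This is only an illustrative sketch; the record layout, function names, and the compound and herb names are hypothetical placeholders rather than entries from Supplementary Table S1.

```python
from collections import Counter


def top_hits(predictions, n=30):
    """Return the n records with the highest predicted interaction score.

    Each record is a (compound, herb, score) tuple; score lies in [0, 1].
    """
    return sorted(predictions, key=lambda record: record[2], reverse=True)[:n]


def herb_frequency(hits):
    """Count how many top-ranked compounds each herb contributes."""
    return Counter(herb for _, herb, _ in hits).most_common()


# Hypothetical predictions (placeholder names and scores).
predictions = [
    ("compound_1", "herb_A", 0.91),
    ("compound_2", "herb_B", 0.40),
    ("compound_3", "herb_A", 0.85),
    ("compound_4", "herb_C", 0.77),
]
hits = top_hits(predictions, n=3)
frequency = herb_frequency(hits)
print(frequency)
```

Herbs that contribute many top-ranked compounds can then be filtered further by clinical use, toxicity, and medicine-food status, as done here for GJE.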

Identification of the Absorbed Components in Gardenia jasminoides Ellis
In order to provide a more comprehensive understanding of the major constituents of GJE and their distribution in rats, plasma samples obtained from different sites by different methods were analyzed using a previously reported sequential metabolism approach [33]. Briefly, the mesenteric vein/femoral vein plasma and the abdominal aorta plasma were collected from in situ intestinal perfusion and gavage, respectively. The components in the mesenteric vein plasma samples had been metabolized by enzymes of the intestinal wall, whereas both the intestine and liver contributed to the metabolism of the components in the femoral vein plasma samples. Following metabolism by multiple organs and bacterial flora throughout the body, the abdominal aorta plasma samples represented a state of equilibrium. By comparing these samples, a clearer understanding of the absorption and metabolism sites of the GJE extract can be obtained. The extract of GJE was analyzed, and a total of 46 components were identified, including iridoid glycosides, organic acids and their derivatives, monoterpenoids, and flavonoids, as illustrated in Supplementary Table S1. Moreover, a total of 38 prototype compounds were identified from drug-treated plasma samples through a meticulous comparison of their molecular formulas, fragment ions, and retention times with those of the parent compounds, with 38 in the mesenteric blood (MB) group, 25 in the femoral venous blood (FVB) group, and 14 in the abdominal aorta (AA) group, as illustrated in Table 3.
DPP-IV is secreted by intestinal cells and enters the bloodstream to rapidly degrade GLP-1, thereby inhibiting its ability to stimulate insulin secretion [4,5]. Therefore, components absorbed in the mesenteric vein and systemic circulation were all included in the analyses. It is of significance to note that the predominant absorbed compounds in rat plasma were iridoid glycosides, which was consistent with the results predicted by the deep-learning model. Taken together, the above study could give a comprehensive map of the dynamic metabolic process of GJE, which would effectively narrow the range of potentially bioactive components of GJE.

Molecular Docking Studies
The compounds that were both predicted by the deep-learning model and absorbed into the bloodstream (genipin 1-gentiobioside, shanzhiside, geniposide, geniposidic acid, shanzhiside methyl ester, and scandoside) were subjected to molecular docking to verify their interactions with the DPP-IV binding site. To verify the credibility of the docking, the constructed parameters were used to re-dock sitagliptin. The parameters were judged to be reasonable if the root mean square deviation (RMSD) was less than or equal to 2 Å [34]. The obtained RMSD value was 1.6079 Å, indicating that the parameters were suitably set to reproduce the original binding pattern of the ligand and receptor and were suitable for predicting the conformation of the ligand. The results indicated that the GJE components had been docked successfully with DPP-IV, as listed in Supplementary Table S2. Docking scores of the above six compounds absorbed into the blood and of sitagliptin with DPP-IV are listed in Table 4. Understanding the molecular basis of ligand binding to receptors provides insights useful for rational drug design [35]. Based on LibDock scores, the docking of compounds with DPP-IV was analyzed using genipin 1-gentiobioside as an example. As a reference, sitagliptin formed three hydrogen bonds with three amino acid residues of the DPP-IV binding site (Figure 2B), and the C- and N-terminal residues established salt bridges with Glu205 and Glu206 (Figure 2A). Additionally, hydrophobic interactions were noted, which have been demonstrated to augment the inhibition of DPP-IV [36,37]. Regarding the docking mode of the six compounds, it was found that, like sitagliptin, each compound could form various hydrogen bonds and hydrophobic interactions with key amino acid residues at the DPP-IV binding site. As shown in Figure 2D, the interaction between receptor and ligand occurred through hydrogen bond, hydrophobic, and electrostatic interactions. The molecule could establish five hydrogen bonds with DPP-IV, involving four oxygen atoms and one hydrogen atom from the glycosyl side chain. Additionally, it could engage with five residues located at the binding site of DPP-IV, specifically Ser209, Glu205, Glu206, Pro550, and Gln553. The carboxymethyl and cyclopentane structures of the compound could form hydrophobic interactions with His126 and Phe357. As previously mentioned, these interactions facilitated the substrate's binding to the catalytic site of the enzyme. The docking pose of genipin 1-gentiobioside and six key residues is shown in Figure 2C. Overall, this integrated method combining UPLC-HRMS with computer analysis is effective for screening active ingredients from complex traditional Chinese medicine systems. Therefore, based on the virtual screening and docking process described above, combined with commercially available monomers, these six compounds were selected for subsequent DPP-IV inhibition experiments. The molecular structures of these six compounds and sitagliptin are shown in Figure 3.
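The re-docking check described above reduces to a heavy-atom RMSD between the crystallographic and re-docked ligand poses, compared against the 2 Å cutoff. The sketch below illustrates that calculation under stated assumptions: both poses are given in the same receptor frame (so no superposition is performed), the atoms are listed in the same order, and no symmetry correction is applied. The function names and the toy coordinates are hypothetical and are not taken from the docking software used in this study.

```python
import math


def rmsd(coords_ref, coords_docked):
    """Root mean square deviation (in Å) between two equally ordered
    lists of (x, y, z) atom coordinates, without any superposition."""
    if not coords_ref or len(coords_ref) != len(coords_docked):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xr - xd) ** 2 + (yr - yd) ** 2 + (zr - zd) ** 2
        for (xr, yr, zr), (xd, yd, zd) in zip(coords_ref, coords_docked)
    )
    return math.sqrt(squared / len(coords_ref))


def redocking_acceptable(rmsd_value, cutoff=2.0):
    """Apply the RMSD <= 2 Å acceptance criterion used in the text."""
    return rmsd_value <= cutoff


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(reference_pose, docked_pose)
print(toy_rmsd, redocking_acceptable(toy_rmsd))

# The value reported in the text passes the same criterion.
print(redocking_acceptable(1.6079))
```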

In Vitro Activity Assay
The IC50 value for the positive control, sitagliptin, was 28.91 ± 0.33 nM, which was similar to that reported in the literature [38] and demonstrated the suitability of this system for activity determination. The present study found that GJE showed a certain inhibitory activity on DPP-IV (IC50 2270 ± 230 µg/mL, Figure 4A). As shown in Figure 4C, the inhibitory activity was markedly concentration-dependent. As shown in Figure 4B, the six GJE compounds inhibited the activity of DPP-IV, and genipin 1-gentiobioside (2) exhibited the strongest anti-DPP-IV activity. The order of potency of the compounds tested was 2 > 6 > 4 > 1 > 5 > 3. This is consistent with the order of activity predicted by the DTI model. All compounds demonstrated concentration dependence. As the DPP-IV inhibitory activity demonstrated moderate potency, no activity assay at a higher concentration was conducted. The content of iridoid glycosides in the GJE extract increased from 5.31% to 31.72% after D-101 macroporous resin treatment, which indicates that this macroporous resin can be successfully used to enrich and purify iridoid glycosides in GJE, and the enriched fraction exhibited the most notable inhibitory activity (IC50 53 ± 0.63 µg/mL) (Figure 4D). A cooperative interaction between individual compounds in inhibiting DPP-IV is therefore presumed.
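As an illustration of how an IC50 can be read from concentration-dependent inhibition data, the sketch below interpolates linearly on a log10 concentration scale between the two measurements that bracket 50% inhibition. The text does not state which fitting procedure was used, so this is only one simple possibility (a four-parameter logistic fit is a common alternative); the function name and the data points are hypothetical.

```python
import math


def ic50_log_interpolation(concentrations, inhibitions):
    """Estimate the concentration giving 50% inhibition.

    concentrations: positive values in any consistent unit.
    inhibitions: percent inhibition measured at each concentration.
    Interpolates linearly in log10(concentration) between the two
    neighbouring points that bracket 50% inhibition.
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if i_low <= 50.0 <= i_high and i_high != i_low:
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_c = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


# Hypothetical dose-response points (concentration, % inhibition).
toy_concentrations = [10.0, 100.0, 1000.0]
toy_inhibitions = [20.0, 40.0, 60.0]
toy_ic50 = ic50_log_interpolation(toy_concentrations, toy_inhibitions)
print(f"estimated IC50 = {toy_ic50:.1f}")
```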
This study found that the fraction of iridoid glycosides in GJE has significant inhibitory activity on DPP-IV. Genipin 1-gentiobioside has been identified as a novel and promising DPP-IV inhibitor, potentially serving as a lead compound for type 2 DM treatment. At the same time, extracting the active parts of natural products for the treatment of diabetes is also a promising strategy. However, it is important to note that the binding ability and selectivity of natural products towards the protein target require further enhancement when compared to commercially available drugs. To address this, future research should focus on designing and synthesizing a series of derivatives based on the ligand-receptor interaction mode results, aiming to improve the compounds' affinity towards the target.

Materials
GJE was supplied by Beijing Tong Ren Tang Co., Ltd. (Beijing, China) and was identified by Prof. Jingjuan Wang (Beijing University of Chinese Medicine, Beijing, China). A voucher specimen has been deposited in the B401 Laboratory of the School of Chinese Materia Medica, Beijing University of Chinese Medicine (voucher No. 211221[...]). Methanol, MS-grade acetonitrile (purity ≥ 99.9%), and formic acid (purity ≥ 99%) were provided by Thermo Fisher Scientific (Fairlawn, NJ, USA). Absolute ethanol (purity ≥ 99.9%) was supplied by Tianjin Damao Chemical Reagent Factory (Tianjin, China). Gardenoside, genipin, scandoside, genipin 1-gentiobioside, geniposidic acid, shanzhis[...]

Data Collection and Preparation
The training dataset for this study was obtained from PubChem, comprising compounds that have been experimentally confirmed through assay. After representing them by SMILES, the collected compounds were curated to eliminate duplicates, inorganic material, and mixtures. Additionally, the protein sequences were extracted from the UniProt protein database using the UniProt ID as a reference. Accordingly, a dataset containing 1691 protein-drug samples was obtained, in which 536 were positive samples and 1155 were negative ones. A protein-drug sample is positive if the IC50 is less than 10 µM and negative if the IC50 is greater than 10 µM. The data were divided into a training set (422 positive samples, 930 negative samples), a validation set (62 positive samples, 107 negative samples), and a test set (52 positive samples, 118 negative samples) with a guaranteed ratio of positive to negative samples.
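The labeling rule and the class-balanced partitioning described above can be sketched as follows. This is an illustrative, dependency-free sketch rather than the pipeline used in this study: the function names are hypothetical, the handling of an IC50 of exactly 10 µM (treated here as negative) is an assumption because the text does not specify it, and the 80-10-10 fractions follow the split reported in the Model Optimization and Evaluation section.

```python
import random


def label_from_ic50(ic50_um, cutoff_um=10.0):
    """Return 1 (positive) if IC50 < cutoff, else 0 (negative).

    Assumption of this sketch: a value exactly at the cutoff is negative.
    """
    return 1 if ic50_um < cutoff_um else 0


def stratified_split(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (item, label) pairs into train/validation/test lists while
    keeping the positive-to-negative ratio similar in every subset."""
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for cls in (0, 1):
        group = [sample for sample in samples if sample[1] == cls]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_valid = round(fractions[1] * len(group))
        train.extend(group[:n_train])
        valid.extend(group[n_train:n_train + n_valid])
        test.extend(group[n_train + n_valid:])
    return train, valid, test


# Hypothetical dataset: 50 actives (IC50 = 2.5 µM) and 100 inactives (25 µM).
toy_samples = [(f"pos_{i}", label_from_ic50(2.5)) for i in range(50)]
toy_samples += [(f"neg_{i}", label_from_ic50(25.0)) for i in range(100)]
train, valid, test = stratified_split(toy_samples)
print(len(train), len(valid), len(test))
```

Splitting each class separately is what keeps the positive fraction comparable across the three subsets.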

Deep-Learning Model
The deep-learning model used in this work builds on that applied in DrugBAN [35] (https://github.com/peizhenbai/DrugBAN/tree/main, accessed on 2 June 2022), a deep-learning bilinear attention network (BAN) framework with adversarial domain adaptation to explicitly learn pair-wise interactions between drugs and targets. For each compound-protein pair, a graph-based molecular representation was first generated from the compound's simplified molecular-input line-entry system (SMILES) string, and a protein representation was encoded by 1D convolutional neural network (1D CNN) blocks from the protein sequence. Then, a bilinear attention network module was used to learn local interactions between the encoded drug and protein representations. Finally, the interaction score was output by a fully connected classification layer. More detailed information is available in Ref. [39].

Model Optimization and Evaluation
A binary activity value of 0 (no inhibition of DPP-IV activity) or 1 (possesses DPP-IV inhibitory activity) was assigned to each compound-protein pair in this work. To evaluate the model performance, we used the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) as the major metrics. The training was performed for 100 epochs using random 80-10-10 training-validation-testing splits of the dataset. By default, and consistent with previous work [39], the binary cross entropy was used as the loss function. Precision-recall curves were generated by comparing the prediction score to the withheld activity value for each compound-protein pair in the testing subset using scikit-learn. In addition, we also report accuracy, sensitivity, and specificity at the threshold of the best F1 score (Supplementary Table S1).
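Reporting accuracy, sensitivity, and specificity "at the threshold of the best F1 score" can be sketched as a scan over candidate thresholds. The study used scikit-learn for its curves; the dependency-free version below is only an illustration of the idea, with hypothetical function names and toy labels and scores, and it assumes that a compound is called positive when its score is greater than or equal to the threshold.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at_best_f1(labels, scores):
    """Scan every observed score as a threshold and report the metrics
    at the threshold that maximizes the F1 score."""
    best = None
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        if best is None or f1 > best["f1"]:
            best = {
                "threshold": threshold,
                "f1": f1,
                "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
                "specificity": tn / (tn + fp) if tn + fp else 0.0,
                "accuracy": (tp + tn) / len(labels),
            }
    return best


# Hypothetical test-set labels and predicted interaction scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
summary = metrics_at_best_f1(toy_labels, toy_scores)
print(summary)
```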

Model Prediction
For the final model, the full dataset (training, validation, and test data) was used for training. The model was then deployed to predict interaction scores between the compounds in HSD and the DPP-IV protein targets. Each compound was paired with DPP-IV protein targets from several source species. Mouse, rat, and human DPP4 enzymes are very similar in structure, and an inhibitor of the mouse enzyme may also inhibit the human enzyme. In the existing data, some inhibitors were tested against rat enzymes, some against mouse enzymes, and some against human enzymes; all of these data were used to train the model. Predictions were therefore made for enzymes from different animals so as not to miss potentially active compounds. Detailed information is shown in Table 5.

Preparation of Sample Solutions
The GJE was crushed into a fine powder using a grinder, and approximately 300 g of the powder was sonicated twice with 50% ethanol at a solid-liquid ratio of 1:10 at room temperature for 20 min each time. Then, the supernatant was transferred into an evaporating dish and concentrated using a water bath at 55 °C to 300 mL (1 g/mL), which was used for animal studies and macroporous resin column chromatography. For chemical analysis, the GJE solution (1 g/mL) was diluted to a concentration of 10 mg/mL crude drug. Six reference standards were individually dissolved in methanol using a 10 mL volumetric flask and stored at 4 °C. Prior to LC-MS analysis, the sample solution was filtered through a 0.22 µm pore size filter.

Enrichment of the Iridoid Glycoside Extract of GJE with Macroporous Resin
A D-101 macroporous resin column was used for the enrichment of the iridoid glycoside extract of GJE. Firstly, the effects of sample flow rate, sample concentration, elution solvent type, concentration, flow rate, and dosage on the adsorption efficiency were investigated separately. The enriched iridoid glycosides from the GJE extract were quantified by HPLC, using geniposide, genipin 1-gentiobioside, and geniposidic acid as the standard materials. The results showed that the best purification process, with a sample flow rate of 1 mL/min, a sample load of 2.0 g of raw drug/g resin, an elution solvent of 30% ethanol, an elution flow rate of 2 mL/min, and an elution volume of 30 mL, could effectively enrich and purify the iridoid glycosides of GJE.

Animals
Sprague-Dawley rats (males, 200-250 g) were procured from Spfanimals Laboratory Animal Technology Co., Ltd. (Beijing, China). The animals were maintained under controlled conditions of a 12 h light/dark cycle, a temperature range of 25-27 °C, and a relative humidity of 50-70%. Prior to the commencement of the study, the rats underwent an acclimatization period of no less than 7 days, during which they were provided ad libitum access to standard laboratory chow and water. Additionally, the animals were subjected to a 12 h water fast before the initiation of the experimental procedures. All protocols involving animal treatment in this study were ethically reviewed and approved by the Animal Ethics Committee of Beijing University of Chinese Medicine, under approval number BUCM-4-2022061502-2062.

In Vivo Metabolic Experiments
The in situ closed-loop technique is a well-established method utilized for the investigation of intestinal absorption. This approach enables the emulation of physiological conditions, allowing for the assessment of intestinal absorption over a predetermined timeframe. Notably, this model permits the precise measurement of absorption within specific anatomical segments of the rat intestine, namely, the jejunum, ileum, and colon [40]. The surgical procedures for IPVS were executed following established protocols outlined in the literature [41]. As shown in Figure 5, prior to the initiation of the perfusion surgery, five to seven rats were selected as blood donors for each experiment. Approximately 50-70 mL of blood was extracted from the abdominal aorta using a heparinized syringe and then incubated in a 37 °C water bath. This blood was subsequently prepared for transfusion into the recipient rat via the jugular vein, compensating for any blood loss through the mesenteric vein. The recipient rat was anesthetized through intraperitoneal administration of anesthetics, positioned supine on the operating table, and had its left external jugular vein exposed and cannulated with a 24-gauge i.v. catheter, facilitating the transfusion of blood from the donor reservoir. The abdominal cavity was meticulously opened along the abdominal line to expose the jejunum and the mesenteric/femoral veins. The jejunal segment ends were incised surgically, and two silicone tubes were carefully inserted and secured through a small incision. The jejunal segment was flushed with 37 °C saline until the effluent was clarified. Subsequently, the inlet tube was connected to the syringe pump. A catheter containing heparinized saline was then cannulated into the mesenteric vein (for intestinal wall metabolism) or femoral vein (for hepatic metabolism) and secured using instant glue. In cases involving the examination of intestinal wall absorption, the hepatic portal vein required ligation. The GJE solution (1 g/mL) was incubated in a water bath at 37 °C to maintain temperature. The solution was then pumped at a flow rate of 0.2 mL/min, and blood was pumped at a flow rate of 0.3 mL/min.
Within a span of two hours, mesenteric/femoral vein blood was collected into heparinized centrifuge tubes. Plasma was obtained through centrifugation of blood samples at 4000 rpm for 10 min, followed by protein precipitation using methanol. Following vortexing for 10 min, the mixture underwent centrifugation at 8000 rpm for an additional 10 min. The resulting organic layer was carefully transferred to a separate tube and subsequently dried under a stream of nitrogen at 40 °C. The resulting residue was then reconstituted in 200 µL of methanol for subsequent LC/MS analysis.

Intragastric Administration
The rats were randomly divided into eight groups (three animals each). The treatment groups received a 4 mL oral gavage of GJE solution (1 g/mL), and the corresponding control groups were administered 4 mL of saline. Prior to blood collection, the rats were anesthetized by intraperitoneal injection of chloral hydrate (400 mg/kg). Blood samples were then collected from the abdominal aorta at specific time points (0.5, 1, 1.5, and 2 h), with three rats sampled at each time point. At the end of this study, all rats were sacrificed by bilateral thoracotomy.
The MS conditions were as follows: alternate switching (−)/(+) ESI full scan mode; capillary temperature, 320 °C; auxiliary temperature, 250 °C; positive spray voltage, +3.5 kV; negative spray voltage, −3.0 kV; sheath gas (N2) flow, 35 Arb; and auxiliary gas flow rate, 10 Arb. Full MS scans were acquired in the range of m/z 100-1500, and the collision energy was set at 20, 30, and 40 eV. The MS/MS experiments were set as data-dependent scans. Data acquisition and processing were accomplished with Xcalibur software (version 4.2; Thermo Fisher Scientific).

Molecular Docking
Molecular docking was conducted using the LibDock module of Discovery Studio 2019 software (Accelrys Software Inc., San Diego, CA, USA). The molecular structures of the compounds were obtained from the ChemSpider website (www.chemspider.com, accessed on 2 June 2022), and the crystal structure of DPP-IV, with the inhibitor vildagliptin bound in the active site, was obtained from the Protein Data Bank (PDB ID: 6B1E). Water molecules and co-crystallized ligands were removed from the protein structure. Atom types, charges, and hydrogen atoms were assigned to both the protein and ligand structures. A radius of 12 Å was set for the docking process. The catalytic triad of DPP-IV consists of Ser630, Asp708, and His740 [42], and Glu205 and Glu206 also play a critical role in the activity of this enzyme [34,43]. Next, the 38 candidate ligand compounds were processed with the "prepare ligands" module to match the receptor. The ligands, in diverse conformations, were then rigidly superimposed onto the map to identify the optimal interactions, followed by energy optimization. The optimal conformation of each compound was taken as the one with the highest docking score, and the compounds were then ranked in descending order of their docking scores.
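The final ranking step described above (keep the best-scoring conformation per compound, then sort compounds in descending order) can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_by_best_pose` and the compound names and scores are hypothetical placeholders, not values or code from this study, and the actual ranking was presumably carried out inside Discovery Studio.

```python
def rank_by_best_pose(pose_scores):
    """Keep the highest docking score per compound and rank compounds.

    pose_scores: iterable of (compound_name, docking_score) tuples,
    one tuple per docked conformation (pose).
    Returns a list of (compound_name, best_score), highest score first.
    """
    best = {}
    for name, score in pose_scores:
        # Retain only the best-scoring pose seen so far for this compound.
        if name not in best or score > best[name]:
            best[name] = score
    # Sort compounds in descending order of their best docking score.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for illustration only; not results from this study.
poses = [
    ("compound_A", 101.2),
    ("compound_A", 118.7),
    ("compound_B", 95.4),
    ("compound_C", 123.0),
    ("compound_B", 99.9),
]

ranking = rank_by_best_pose(poses)
for name, score in ranking:
    print(f"{name}\t{score:.1f}")
```

Keeping one score per compound before sorting avoids a compound with many mediocre poses crowding the top of the list.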

In Vitro DPP-IV Inhibition Assay
The DPP-IV inhibition assay was employed for the in vitro evaluation of the biological activity of the compounds [44]. Based on the results of the DTI model prediction and the prototype components of GJE absorbed in rats, the potent iridoid glycoside fractions of GJE were enriched with macroporous resin, and the DPP-IV inhibitory activities of these fractions and of six iridoid glycosides, namely, geniposide (1), genipin 1-gentiobioside (2), scandoside (3), geniposidic acid (4), shanzhiside methyl ester (5), and shanzhiside (6), were tested (0.02, 0.05, 0.1, 0.2, 0.4, 0.8, and 1 mM). Sitagliptin was used as a positive control, and the assay was performed in 96-well microplates based on optimized conditions. Briefly, test compounds at various concentrations (40 µL), diluted with assay buffer (0.1 M Tris-HCl buffer, 0.1 M NaCl, and 1 mM EDTA, pH 8.0), were added to a 96-well clear-bottom microtiter plate. Subsequently, 20 µL of 1.75 mM Gly-Pro-pNA was added and thoroughly mixed, followed by a 10 min incubation at 37 °C. The reaction was initiated by adding 40 µL of 0.4 µg/mL human recombinant DPP-IV. A control group without the inhibitor was also included. After a 30 min incubation, absorbance was measured at 405 nm using a SkanIt RE absorbance reader (Thermo Scientific, San Jose, CA, USA). The DPP-IV inhibition rate, from which the IC50 values were derived, was calculated using the following equation:

$$\text{DPP-IV inhibition}\,(\%) = \left(1 - \frac{A(\text{sample}) - A(\text{sample control})}{A(\text{negative reaction}) - A(\text{negative control})}\right) \times 100\%$$

Here, A(sample) is the absorbance of the test well containing the sample, substrate, and DPP-IV; A(sample control) included both the sample and substrate (Gly-Pro-pNA); A(negative reaction) comprised DPP-IV and substrate; and A(negative control) solely contained the substrate. The term IC50 denotes the concentration of inhibitor necessary to suppress 50% of DPP-IV activity.
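As a worked illustration of the calculation above, the following minimal sketch computes the inhibition rate from the four absorbance readings, the final in-well concentration after mixing (40 µL sample + 20 µL substrate + 40 µL enzyme = 100 µL), and an approximate IC50. All function names are hypothetical, and the absorbance and dose-response values are invented for illustration. The text does not state how the IC50 values were fitted, so the log-linear interpolation used here is an assumption standing in for whatever curve-fitting procedure was actually applied.

```python
import math


def inhibition_percent(a_sample, a_sample_control, a_neg_reaction, a_neg_control):
    """Percent DPP-IV inhibition from four absorbance readings at 405 nm."""
    signal = a_sample - a_sample_control          # inhibited reaction minus its blank
    full_activity = a_neg_reaction - a_neg_control  # uninhibited reaction minus its blank
    if full_activity <= 0:
        raise ValueError("uninhibited reaction must read above its blank")
    return (1.0 - signal / full_activity) * 100.0


def final_concentration(stock, volume_added_ul, total_volume_ul):
    """Concentration in the well after dilution (same unit as `stock`)."""
    return stock * volume_added_ul / total_volume_ul


def ic50_log_interpolation(concentrations, inhibitions):
    """Assumed method: interpolate linearly in log10(concentration)
    between the two points that bracket 50% inhibition.
    Returns None if the data never cross 50%."""
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical readings for illustration only; not data from this study.
rate = inhibition_percent(0.45, 0.05, 0.85, 0.05)

# Substrate: 20 µL of 1.75 mM stock in a 100 µL well.
substrate_mM = final_concentration(1.75, 20, 100)

# Hypothetical dose-response points (mM, % inhibition).
ic50 = ic50_log_interpolation([0.1, 0.2, 0.4, 0.8], [20.0, 40.0, 60.0, 80.0])

print(f"inhibition: {rate:.1f}%  substrate: {substrate_mM:.2f} mM  IC50: {ic50:.3f} mM")
```

Subtracting the sample control corrects for any absorbance of the sample itself at 405 nm, which matters for colored plant extracts. In practice, a nonlinear dose-response fit over all concentrations would usually be preferred to two-point interpolation.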

Conclusions
This study presents a novel approach that combines deep-learning model prediction, absorption and metabolism characterization, virtual screening, and target activity screening. This approach can be effectively employed to discover potential active ingredients or lead compounds in TCM, and it offers a new avenue for exploring potent, low-toxicity DPP-IV inhibitors derived from herbal medicine. This study found that the effective fraction of iridoid glycosides in GJE has significant inhibitory activity on DPP-IV, which will guide the development of GJE as a hypoglycemic Chinese medicine in the future. In addition, it can also be used as a dietary supplement for the prevention and adjuvant treatment of diabetes. In particular, genipin 1-gentiobioside has been identified as a promising novel DPP-IV inhibitor that is effective, low-cost, and non-toxic, and its potential as a lead compound for the management of T2D warrants further investigation.
In our research, we utilized the DrugBAN model [39] to rapidly predict the activity of over 10,000 traditional Chinese medicine compounds in a dataset. This computational methodology produced results within one week. Conversely, conventional experimental techniques not only involve significant financial costs but also require several months to obtain the corresponding activity data. These findings demonstrate that combining experiments with computation and deep learning makes it possible to efficiently discover DPP-IV inhibitory compounds in TCM and rapidly elucidate their potential mechanisms of action.
Despite the promising nature of our deep-learning approach, its inherent limitations must be acknowledged. Deep-learning techniques are commonly restricted by the quality and diversity of the training data. Although our dataset comprised 1691 compound-protein samples, providing an adequate number of active compounds for training models with predictive capabilities, the structural diversity of the training set still needs to be enhanced. This augmentation will enable models to better discern the chemical substructures responsible for conferring activity, thereby facilitating the discovery of a wider array of structurally diverse inhibitors. Overall, this innovative method has great potential for the rapid screening of active ingredients in TCM and provides a new research idea and material basis for the development of new T2D drugs.

Figure 1. The true positive rate against the positive predictive value of the DrugBAN and random prediction models.

Figure 2. Molecular docking between sitagliptin, genipin 1-gentiobioside, and DPP-IV. (A) The specific residues of sitagliptin in the binding pockets/active sites of DPP-IV. (B) A 2D view of the interaction of sitagliptin with DPP-IV. (C) The specific residues of genipin 1-gentiobioside in the binding pockets/active sites of DPP-IV. (D) A 2D view of the interaction of genipin 1-gentiobioside with DPP-IV.

Figure 3. Molecular structures of six compounds with sitagliptin.

Figure 4. The DPP-IV inhibitory activity of Gardenia jasminoides Ellis extract, six iridoid glycosides, and the iridoid glycoside fraction. (A) The inhibition rate-concentration curve of GJE extracts. (B) Inhibitory activities of constituents 1-6 of GJE on DPP-IV in vitro. Final concentration of each sample: 1 mM. Data were expressed as mean ± SEM of triplicate experiments. (C) Concentration-dependent inhibitory activity of the iridoid glycosides of GJE. (D) The inhibition rate-concentration curve of the iridoid glycosides.

Figure 5. Schematic illustration depicting the process of in situ intestinal perfusion with venous sampling.

Table 1. Model performance comparison between the trained and random prediction models.

Table 2. Sources of the top 30 compounds and their predicted affinity scores with DPP-IV from different sources.

Table 3. Identification of compounds from Gardenia jasminoides Ellis and the absorbed components in different plasma samples.
Note: MB: mesenteric blood; FVB: femoral venous blood; AA: abdominal aorta. * means that the component was compared with a reference standard. + means that the component is absorbed into the blood. − means that the component is not absorbed into the blood.