Genome-Wide Differential Methylation Profiles from Two Terpene-Rich Medicinal Plant Extracts Administered in Osteoarthritis Rats

Extracts from the plants Phlomis umbrosa and Dipsacus asperoides—which are widely used in Korean and Chinese traditional medicine to treat osteoarthritis and other bone diseases—were used to treat experimental osteoarthritis (OA) rats. Genome-wide differential methylation regions (DMRs) of these medicinal-plant-treated rats were profiled as therapeutic evidence associated with traditional medicine, and they need to be investigated further using detailed molecular research to extrapolate traditional practices to modern medicine. In total, 49 protein-encoding genes whose expression is differentially regulated during disease progression and recovery have been discovered via systematic bioinformatic analysis and have been approved/proposed as druggable targets for various bone diseases by the US food and drug administration. Genes encoding proteins involved in the PI3K/AKT pathway were found to be enriched, likely as this pathway plays a crucial role during OA progression as well as during the recovery process after treatment with the aforementioned plant extracts. The four sub-networks of PI3K/AKT were highly regulated by these plant extracts. Overall, 29 genes were seen in level 2 (51–75%) DMRs and were correlated highly with OA pathogenesis. Here, we propose that these genes could serve as targets to study OA; moreover, the iridoid and triterpenoid phytochemicals obtained from these two plants may serve as potential therapeutic agents.


Introduction
In the genomic era, genome-wide methylation profiles enable precise combinatorial drug screening for the treatment of various human diseases [1]. The one-drug, one-target mechanism is associated with various adverse effects, as it targets a single node in the complex molecular networks. Although moving toward combinatorial drug therapy involves another complex layer, i.e., the selection of an effective drug ratio for specific diseases-a laborious and expensive task [2]. To date, most of the functional combinations of drugs have been identified either from clinical trials or via prescription in medical practices. Similar approaches have been used in traditional medical practices since ancient times, but they have been associated with adverse reactions as well [3].
With the help of recent advancements, we can formulate drugs that exhibit reduced toxicity and can help people gain access to low cost medicines for various diseases. However, as most traditional medicine formulations prepared using regional plants are documented in the local language, the understanding of the mechanism of action of certain regional phytochemical products is further complicated by the language barrier. Thus, to identify a molecular signature that can be universally understood, we initiated genome-wide differential methylation screening for two oriental medicinal plant extracts (i.e., Phlomis umbrosa (PU) [4] and Dipsacus asperoides (DA) [5]), which were used to treat rats with mono-iodoacetate (MIA)-induced osteoarthritis (OA).
OA is a heterogeneous disease caused by various unknown factors [6] and is characterized by chronic joint pain caused by destruction of the cartilage and synovial membrane tissue at the knee joint. Extensive efforts have been made to understand the disease pathogenicity and to identify biomarkers for OA. For example, the insight of cell heterogeneity was assessed via a single-cell transcriptome study that identified the different clusters of synoviocytes (rich in the glycoproteins required for lubrication) and chondrocytes (which produce the structural components of cartilage) from the knee cartilage [7].
One of the main challenges in treating OA is the low-grade inflammation, which causes the destruction of the smooth tissues around the joints [6]. Interestingly rheumatoid arthritics (RA) also has the same phenotype; however, the underlying mechanisms are different from those involved in OA. The repurposing of RA drugs for OA failed due to this very reason [6]. To date, none of the approved drugs can reverse the phenotypes associated with OA and other bone-related diseases. Only a few molecules that target signaling pathways, such as Wnt and PI3K/AKT, are currently undergoing clinical trials for investigating their effects on OA [6,8]. Among these, some steroidal drugs are structurally similar to di-and tri-terpenoids [6].
Particularly, iridoids are natural bioactive components that exert anti-inflammatory, hepatoprotective, neuroprotective, anti-tumor, and hypolipidemic effects [14]. For instance, loganin is an iridoid present in certain plants that exhibits a neuroprotective effect via the insulin-like growth factor-1 receptor (IGF1R) and glucagon-like peptide-1 receptor (GLP1R). It is also used in combination with LY294002 (PI3K/AKT inhibitor) to reduce extracellular matrix degradation in chondrocytes [17]. These two plant extracts were able to reduce an experimental OA phenotype in rat models, and nearly normalized the condition of the rats [4,5].
However, the molecular mechanisms and signaling pathways involved when cells are subjected to chemical perturbations are not clearly understood. Certain facts are known, including that methylation changes mostly occur during development and aging, and relatively few methylation changes are observed upon exposure to environmental stress factors. Considering data from several genome-wide methylation studies, we noted that chemical perturbations were observed in only five to ten percent of differentially methylated regions [18]. Taking these factors into consideration, we studied the differentially methylated regions (DMRs) in rats with MIA-induced OA treated with the above mentioned plant extracts. Additionally, the DMRs were correlated with molecular entities curated on DrugBank and the human genome to support this experimentalist approach.

CpG Profiles and DMRs
The model includes three groups with three biological replicates each and a control ( Figure 1). On average, 5.8 Gb of bases for each sample were sequenced, and 74.8% of the sequenced bases were mapped to the reference genome with 5.5 log2 coverage ( Figure S1A,B). The principal component analysis and dendrogram cluster of CpG methylation sites clearly Plants 2021, 10, 1132 3 of 14 revealed the variance in the biological replicates ( Figure S2A,B). In total, 1,861,526 CpG regions were covered-15,786 (94.8%) of the genes from the reference genome (Figure 2A), distributed across 15,090 (90.6%) promoters, 13,878 (92%) genic, and 11,143 (80.3%) 3-prime untranslated regions (3 -UTR) ( Figure 2B)-and genes containing the CpG regions from all three region combinations are shown in a Venn-diagram in Figure 2C.

CpG Profiles and DMRs
The model includes three groups with three biological replicates each and a control ( Figure 1). On average, 5.8 Gb of bases for each sample were sequenced, and 74.8% of the sequenced bases were mapped to the reference genome with 5.5 log2 coverage ( Figure  S1A,B). The principal component analysis and dendrogram cluster of CpG methylation sites clearly revealed the variance in the biological replicates ( Figure S2A,B). In total, 1,861,526 CpG regions were covered-15,786 (94.8%) of the genes from the reference genome (   Approximately, 9511 (57.1%) genes contained CpGs in all three annotated regions. To obtain clear numbers in DMRs, we divided the DMRs into three levels as shown in Figure 1. In total, 9765 (58.6%) of the genes were shown to contain level 1 DMRs and only 801 (4.7%) contained level 2 and level 3 DMRs, which we consider the potential gene sub-set containing key markers for drug discovery ( Figure 3A). The DMRs from all three annotated locations are presented as a Venn-diagram ( Figure 3B), which shows that 929 genes contained CpGs from all three regions. With respect to CpGs, i.e., 50,415 (2.7%) were DMRs, and among those, 10,028, 29,400, and 10,987 DMRs were observed in the promoter, genic, and 3 -untranslated regions, respectively ( Figure 3C). Approximately, 9511 (57.1%) genes contained CpGs in all three annotated regions. To obtain clear numbers in DMRs, we divided the DMRs into three levels as shown in Figure 1. In total, 9765 (58.6%) of the genes were shown to contain level 1 DMRs and only 801 (4.7%) contained level 2 and level 3 DMRs, which we consider the potential gene subset containing key markers for drug discovery ( Figure 3A). The DMRs from all three annotated locations are presented as a Venn-diagram ( Figure 3B), which shows that 929 genes contained CpGs from all three regions. With respect to CpGs, i.e., 50,415 (2.7%) were DMRs, and among those, 10,028, 29,400, and 10,987 DMRs were observed in the promoter, genic, and 3′-untranslated regions, respectively ( Figure 3C).  Approximately, 9511 (57.1%) genes contained CpGs in all three annotated regions. To obtain clear numbers in DMRs, we divided the DMRs into three levels as shown in Figure 1. In total, 9765 (58.6%) of the genes were shown to contain level 1 DMRs and only 801 (4.7%) contained level 2 and level 3 DMRs, which we consider the potential gene subset containing key markers for drug discovery ( Figure 3A). The DMRs from all three annotated locations are presented as a Venn-diagram ( Figure 3B), which shows that 929 genes contained CpGs from all three regions. With respect to CpGs, i.e., 50,415 (2.7%) were DMRs, and among those, 10,028, 29,400, and 10,987 DMRs were observed in the promoter, genic, and 3′-untranslated regions, respectively ( Figure 3C).

DMRs Common between Plant Treatments and Experimentally-Induced OA
To identify genes whose expression is regulated during MIA-induced disease onset and the subsequent treatment with medicinal plant extracts, we divided the dataset into four subsets, i.e., Set 1: OA (control vs MIA); Set 2: Phlomis umbrosa extract-treated (PUE); Plants 2021, 10, 1132 5 of 14 Set 3: Dipsacus asperoides extract-treated (DAE); and Set 4: the difference between PUE and DAE ( Figure 1). We plotted DMRs in set one to three as a Venn-diagram ( Figure 4). which a response was observed upon treatment in Figure 4 (subset 1, genes that responded to PUE; and subset 2, genes that responded to DAE).
Here, we selected the following genes: genes that exhibited level 2 expression and DMR signals in the regulatory the regions, i.e., promoter and 3′-UTR; and gene targets approved by the FDA to treat multiple diseases, for which some experimental evidence is available on DrugBank. In our analysis, CKMT2, FDXR, FGR, GRIN2C, LGALS1, PCK1, PDXP, and THTPA responded to both plant extracts. Further, OGFOD1 exhibited a DMR signal in response to DAE treatment, whereas SLC7A5, SOAT2, TAGLN2, and TUBD1 exhibited a DMR signal in response to PUE treatment. All combinations of genes are presented in Supplementary File S1.

Functional Annotation of CpGs
A systematic bioinformatic analysis was performed to simplify the target selection protocol in the subsequent experiments. Candidate molecules that have been proposed to be serve as drug targets were selected for use in animal experiments. The following public databases were used: DrugBank, to identify drugs that target the selected molecules; and the human proteome atlas database, to obtain information regarding FDA-approved and Here, we considered that genes that overlapped between two gene sets i.e., a gene set containing genes that responded to chemical perturbation with MIA, which induced cartilage degradation and OA development, and a gene set that contained genes that responded to treatment with these medicinal plant extracts could hold the key to treating OA. These gene sets were termed subset 1 and subset 2 based on the plant extract against which a response was observed upon treatment in Figure 4 (subset 1, genes that responded to PUE; and subset 2, genes that responded to DAE).
Here, we selected the following genes: genes that exhibited level 2 expression and DMR signals in the regulatory the regions, i.e., promoter and 3 -UTR; and gene targets approved by the FDA to treat multiple diseases, for which some experimental evidence is available on DrugBank. In our analysis, CKMT2, FDXR, FGR, GRIN2C, LGALS1, PCK1, PDXP, and THTPA responded to both plant extracts. Further, OGFOD1 exhibited a DMR signal in response to DAE treatment, whereas SLC7A5, SOAT2, TAGLN2, and TUBD1 exhibited a DMR signal in response to PUE treatment. All combinations of genes are presented in Supplementary File S1.

Functional Annotation of CpGs
A systematic bioinformatic analysis was performed to simplify the target selection protocol in the subsequent experiments. Candidate molecules that have been proposed to be serve as drug targets were selected for use in animal experiments. The following public databases were used: DrugBank, to identify drugs that target the selected molecules; and the human proteome atlas database, to obtain information regarding FDA-approved and proposed targets to understand the importance of the target selection. Rat and human gene orthologs were identified using the Alliance database, and OA-related disease and pathway annotations were obtained from the Rat Genome Database (RGD) as described in the methods section.
Finally, a DMR network was established using all this information and is presented in Supplementary File S1. A combination of annotations is presented in Figure 5A,B, and our results showed that 151 genes contained level 2 DMRs and were connected with other three databases. Among these, 49 genes belonged to subset 1 and subset 2. Among these, drug information was present for 33 genes on DrugBank ( Figure 6A), and 16 genes encoded proteins that are proposed to be druggable according to the Human Protein Atlas ( Figure 6B). These genes could serve as effective targets to conduct further experiments aimed at investigating the effects of these two medicinal plant extracts or specific components identified in the extracts in the context of OA treatment (Table S1). our results showed that 151 genes contained level 2 DMRs and were connected with other three databases. Among these, 49 genes belonged to subset 1 and subset 2. Among these, drug information was present for 33 genes on DrugBank ( Figure 6A), and 16 genes encoded proteins that are proposed to be druggable according to the Human Protein Atlas ( Figure 6B). These genes could serve as effective targets to conduct further experiments aimed at investigating the effects of these two medicinal plant extracts or specific components identified in the extracts in the context of OA treatment (Table S1).
Among the genes in the two subsets, a few were notable, i.e., the gene encoding insulin-like growth factor-1 receptor (IGF1R), which has been shown to be targeted by loganin (an iridiod present in DAE) during the treatment of neurotoxicity in previous studies (Table S1), and CHST11, recently identified as an osteo-chondrodysplasia marker in a large scale genomic study [19,20], which has also been proposed as a druggable candidate ( Figure 6B). These targets can be isolated from data analyzing the effect of these plant extracts in organisms.

PI3K/AKT Signaling Pathway
To understand how these two medicinal plants extracts regulated the signaling networks in OA, the level 2 DMRs were subjected to KEGG pathway enrichment analysis, resulting in 26 enriched pathways (Table 1). Among these, the PI3K/AKT pathway was highly enriched, followed by the cancer pathways. The PI3K/AKT signaling pathways comprise the following: ECM-receptor signaling, insulin signaling, focal adhesion signaling, and AMPK signaling (Figure 7). Among the genes in the two subsets, a few were notable, i.e., the gene encoding insulin-like growth factor-1 receptor (IGF1R), which has been shown to be targeted by loganin (an iridiod present in DAE) during the treatment of neurotoxicity in previous studies (Table S1), and CHST11, recently identified as an osteo-chondrodysplasia marker in a large scale genomic study [19,20], which has also been proposed as a druggable candidate ( Figure 6B). These targets can be isolated from data analyzing the effect of these plant extracts in organisms.
In total, 29 genes in PI3K/AKT pathway were level 2 and subset 1 and 2. Among those, eight genes (COL11A2, COL3A1, CHAD, LAMA4, COL1A1, COL4A3, LAMC1, and COL2A1) were involved in ECM-receptor interaction and five genes (EFNA5, KITLG, ANGPT4, VEGFA, and FGF2) encoded growth factor receptors, and five genes (EPHA2, FGFR2, IGF1R, FGFR3, and EGFR) encoded macrophage colony-stimulating factor 1 (RTK) receptor. In addition to these genes, PEPCK is responsible for the cell metabolism, and its expression was highly regulated in all three sets. These genes also contain DMRs in the regulatory modules ( Figure S4). The expression of these genes could be modulated to destroy or reform the cartilage and synovial membrane tissues around the knee joint.

Discussion
This study was performed to understand the "many drugs with many targets" principle (also known as polypharmacy) by using the PU and DA extracts to treat MIA-induced OA. The biological replicates of the samples in this study ensured the significance of the findings (observed DMRs in genes known to be associated with "osteo"-related diseases). To understand the detailed functions of genes, correlations were identified between the ontologies/functional terms downloaded from well-known databases and the genome-wide methylation profile datasets.
Our OA dataset could enable further research in this field by OA researchers. PU and DA are rich in iridoid glycosides, (Table S1) that have been studied for their effect on OA in two forms, i.e., loganin and Shahzhiside methylester [21]. The main advantage of iridoid glycosides in the treatment of OA is their hepatoprotective effect [13,14], especially considering the fact that the major factor limiting the use of natural extracts to treat the diseases is liver toxicity.
Saponins also exhibit functional activities in the context of disease signaling pathways [16]. Thus, saponins in DAE could serve as alternatives to steroid drugs, such as dexamethasone [24], 17β-estradiol [6], and corticosteroids [6], which are widely used to treat OA, thereby, reducing the adverse reactions associated with these immunosuppressive drugs (e.g., new-onset diabetes is an adverse reaction associated with glucocorticoid use [25]). Genipin, an iridoid glycoside can regulate glucose homeostasis via interactions with uncoupling protein 2 (UCP2) [26].
Another phenomena in OA is the loss of 17β-estradiol (a steroid hormone), a phenomenon associated with hip and knee pathogenesis in OA [27]. This phenotype can be managed by supplementation with saponin-rich extracts that have low liver toxicity and bioavailability [28]. The known bioactive compounds in these extracts are administered individually and in combination with other drugs. For example, when co-administered with LY294002 (PI3K/AKT inhibitor), loganin attenuates cartilage degeneration and bone sclerosis in subchondral bone.
Alternatively, OA drug discovery research also focuses on different molecular signaling patterns through the literature mining approach, which explains the signaling pathways and genes studied for OA up to 2018 [17]. The Osteoarthritis society also summarized the progress in disease management [29]. Furthermore, Tonia L Vincent explained the convergence of molecular signaling in the in-vitro model and humans and the progress of OA drug discovery up to 2020 [6]. All these studies summarized that OA therapeutics were focused on targeting PI3K/AKT/mTOR signaling pathway [8], activating the polarized macrophages [30], growth factor therapies [6], and low-grade inflammation [29].
The iridoid components were also predicted to be a good natural component that promotes nerve growth and other blood vessel growth [14]. Our result strongly correlates with their suggestions, as DMR genes were enriched in the PI3K/AKT signaling pathway and sub-networks as depicted in Section 2.4. Genes involved in the extra cellular matrix (ECM)-receptor interactions could be therapeutic targets because the extra cellular matrix (ECM) is an important layer of most of the tissues in our body, and it anchors hundreds of proteins to maintain the structural flexibility of the tissues, particularly in the joint cartilage [31].
ECM is also known as the key modulator that frequently responds to external stimuli and makes the decision regarding the cell's fate by altering the intracellular signaling pathways [32]. Through mathematical modeling, Jordan F Hastings proposed ECM as a key regulatory sub-network that could decide the cell responses, behavior, phenotype, and drug response for acute and chronic diseases [32]. ECM components can act as ligands to activate signaling networks in both healthy and diseased states. ECM components, such as laminin, and collagen are used to activate the integrin family receptors.
In our results, the collagens (COL11A2, COL3A1, COL1A1, COL4A3, and COL2A1) and laminin (LAMA4, and LAMC1) were highly regulated to ensure these phenomena. Focal adhesion kinase (FAK)-involved in cell adhesion dynamics and mobility via the ECM-activates the PI3K/AKT signaling pathways, which has the key to regulate the proliferation, progression, metabolism, and survival mechanisms. This is a potential signaling mechanism known for wound healing and tissue repair process [8]. In the case of osteoarthritic drug discovery, the ECM degradation and formation mechanism has also gained attention [33].
The expression of growth factor receptors, such as KITLG [34], ANGPT4 [35], VEGFA [36], and FGF2 [37], are regulated by the 17β-estradiol. Similarly, our results also showed DMRs in these growth factors. Therefore, we hypothesize that Akebia saponin D could directly influence these growth factors to treat OA with these plant extracts. These can be further validated in detailed molecular experiments, and the genes, such as EPHA2, FGFR2, IGF1R, FGFR3, and EGFR, can be used to trigger polarized macrophages via the macrophage colony-stimulating factor 1 (RTK) receptor since these receptors are highly regulated in our dataset.
Another mode of targeting OA involves therapeutics via altering low-grade inflammation, which can be achieved by conducting a detailed research of the iridoids for AKT signaling via the above receptors. A study reported that the IGF1R is a strong activator of AKT phosphorylation and can activate AKT to promote synthesis of collagen II [14]. To the best of our knowledge, this is the first methylome profiled dataset for these two plant extracts that are used to treat experimental OA. This study provides detailed insight regarding potential OA therapeutic agents from these two plant extracts and specific chemical components.

Ethics Statement
Fifteen male Sprague-Dawley rats (7 weeks old) were purchased from Daehan Bio Link, Inc. (Eumseong, Chungcheongbuk-do, Korea). The animal experiment procedures were performed by following the guidelines of the Institutional Animal Care and Use Committee and were approved by the Ethical Committee of Kyungpook National University (NO. KNU 2018-0091). While conducting this study, all efforts were made to maximize the scientific benefit while minimizing the suffering of the animals.  [4,5].

Sample Collection and Experimental
The experimental design and the rats in this study were also brought from a previous study, while using one additional group of DAE (MIA-injected with saline and DAE) rats. A total of four groups (n = 3 per group), including untreated with saline (NC: normal control), MIA-injected with saline (MIA), MIA + DAE treatment (DAE), and MIA + PUE treatment (PUE), of rats were used in this study. To induce osteoarthritis (OA) in rats, MIA (3 mg/50 µL saline) was directly injected into the intra-articular space of the right knee of each rat while they were subjected to inhalation anesthesia. The medicinal plant extract treatments (PUE and DAE) were dosed daily at 200 mg/kg body weight for 21 days.

Library Preparation and Sequencing
The total DNA for bisulfite-seq was collected in the rat samples. Each rat's cartilage was collected, and the DNA was extracted using a QuickGene DNA tissue kit, following the manufacturer's protocol and was immediately frozen in liquid nitrogen and preserved for further use. The DNA quantification was carried out as a quality check, and three samples in each group were selected for library construction and sequencing. Bisulfite-seq libraries for sequencing were prepared using the SureSelect Methyl-Seq Library Prep Kit (Agilent Technologies, Santa Clara, CA, USA). The constructed libraries were sequenced using a Novaseq 6000 Sequencing System (Illumina, San Diego, CA, USA), yielding more than 2 × 50 million reads with 2 × 100 base-pair (bp) read lengths for 12 samples.

Genome Wide Methylation Patterns from Bisulfite-Seq Data
The reads were trimmed using trimmomatic (v0.38) after a quality check to remove low quality reads and adapters after sequencing. The bisulfite-seq reads were mapped to the rat DNA reference genome (rn6) acquired from NCBI using Bowtie2 (v2.3.4) via Bismark (v0.20.0) [38] and sorted and de-duplicated using samtools (v1.9) via Bismark. The read process statistics of the bisulfite-seq data were detailed in Figure S1. After the alignments, the read coverages on the cytosine sites were extracted using Bismark.
The coverage files were used to identify DMRs site by site using the methylKit R package (v1.10.0) [39]. The differential methylation of hyper-or hypo-methylated regions was calculated with a 25% methylation difference between groups, and differences were considered statistically significant at p-values < 0.05. The DMRs (for CpG, CHH, and CHG sites) were then annotated for genomic features (promoter, genic, and 3 UTR). All the DMRs were divided into three levels as shown in Figure 1 (Level 1: 25-49%, Level 2: 50-74%, and Level 3: 75-100%).

Gene Set Enrichment Analysis among Union Genes in DMRs
The methylation profile heatmap of union genes in the DMRs was prepared with the pheatmap R package (v1.0.12) and hierarchically clustered with the Euclidean distance and ward.D clustering algorithm. To identify the function and pathways of the clustered gene set, both datasets were evaluated using PANTHER [40] and clusterProfiler (v3.12.0) [41] in R using the gene ontology enrichment functions and org.Rn.eg.db library (v3.8.2) to build Gene Ontology (GO) terms.
The genes in each heatmap cluster were grouped into various categories involved in similar functions (biological processes, molecular functions, and cellular components) and pathways through the GO database and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database to reveal their functional roles. The statistical enrichment of GO terms for the genes was tested using Fisher's exact test with an adjusted p-value of 0.05. Finally, the KEGG enrichment pathway and GO analysis for DMRs (Level 2) was conducted with the DAVID online functional analysis tool (https://david.ncifcrf.gov/).

Functional Database
As shown in Figure 1, functional annotations for each gene were made as per the drug discovery protocol. Four datasets are included in Supplementary File S1, i.e., the Drug-Bank [42], Alliance [43], Human Protein Atlas [44], and Rat Genome Database (RGD) [45]. Here, we include the DrugBank ID for each gene, to easily navigate the details regarding known drugs from DrugBank from the complete database xml file. Second is the Alliance database from the files DISEASE-ALLIANCE_RGD_37.tsv, and ORTHOLOGY-ALLIANCE_COMBINED_37.tsv. Third is the RGD database to obtain the detailed information of diseases and pathway annotations from the files rattus_terms_pw and rat-tus_terms_rdo. Fourth, the FDA approved, potential drug candidates were downloaded from the human protein atlas database.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the animal experiment procedures were performed by the guidelines of the Institutional Animal Care and Use Committee and which were approved by the Ethical Committee of Kyungpook National University (NO. KNU 2018-0091).

Informed Consent Statement: Not applicable.
Data Availability Statement: The complete sequences generated in this study were deposited in the SRA repository under the accession PRJNA717328. The metadata of the individual samples were given in Supplementary File S1.