Identifying Novel Osteoarthritis-Associated Genes in Human Cartilage Using a Systematic Meta-Analysis and a Multi-Source Integrated Network

Osteoarthritis, the most common joint disorder, is characterised by deterioration of the articular cartilage. Many studies have identified potential therapeutic targets, yet no effective treatment has been determined. The aim of this study was to identify and rank osteoarthritis-associated genes and micro-RNAs to prioritise those most integral to the disease. A systematic meta-analysis of differentially expressed mRNA and micro-RNAs in human osteoarthritic cartilage was conducted. Ingenuity pathway analysis identified cellular senescence as an enriched pathway, confirmed by a significant overlap (p < 0.01) with cellular senescence drivers (CellAge Database). A co-expression network was built using genes from the meta-analysis as seed nodes and combined with micro-RNA targets and SNP datasets to construct a multi-source information network. This accumulated and connected 1689 genes which were ranked based on node and edge aggregated scores. These bioinformatic analyses were confirmed at the protein level by mass spectrometry of the different zones of human osteoarthritic cartilage (superficial, middle, and deep) compared to normal controls. This analysis, and subsequent experimental confirmation, revealed five novel osteoarthritis-associated proteins (PPIB, ASS1, LHDB, TPI1, and ARPC4-TTLL3). Focusing future studies on these novel targets may lead to new therapies for osteoarthritis.


Introduction
Osteoarthritis (OA) is the most common musculoskeletal disorder and cause of chronic disability in adults [1]. The main characteristic of OA is the deterioration of the articular cartilage, of which chondrocytes are the only cell type. The primary function of chondrocytes is to maintain homeostasis of the extracellular matrix (ECM). During OA, chondrocytes show aberrant phenotypes and actively produce cartilage-degrading enzymes, such as matrix metalloproteinases (MMPs) and aggrecanases, which result in the destruction of the ECM [2]. This change in phenotype is reflected at the mRNA level with various studies having identified a number of differentially expressed genes, some of which can also be seen at the protein level [3][4][5]. Despite research having identified a number of key genes and cellular pathways associated with the pathogenesis of OA, their potential as therapeutic targets remains largely undetermined.
Micro-RNAs (miRNAs) are a class of small non-coding RNA molecules, approximately 22 nucleotides long, which bind to messenger RNAs (mRNA) and induce their degradation or inhibit protein translation. They play a major role in regulating post-transcriptional gene expression and therefore protein levels. miRNA dysregulation has been implicated in OA development [6] and they are emerging as powerful regulatory molecules and novel therapeutic agents. miRNAs often have hundreds of experimentally verified and/or predicted target genes, which can be accessed via public databases such as miRTarBase [7] and TargetScan [8] and can therefore target multiple genes involved in a specific disease process. It is thought that restoring physiological levels of miRNAs, via mimics or inhibitors, will allow for restoration of joint homeostasis and function. Although miRNA and gene expression data are incredibly useful for understanding diseases, they are primarily researched independently. Single nucleotide polymorphisms (SNPs)-a common genetic variation where single nucleotides are substituted at a specific position in the genome-are another important tool for identifying disease-causing genes [9]. The benefit of integrated biological networks is that they combine these data and allow us to gain new insights into the molecular interactions underlying diseases [10]. These networks can be used to identify potential new genes that contribute towards a biological process or disease phenotype, and aid in target prioritisation via a guilt-by-association approach [11].
The current study utilises a p-value-based meta-analysis of the literature to identify miRNAs and mRNAs that are significantly dysregulated in OA cartilage. Subsequent pathway enrichment, overlap, chondrocyte co-expression, and integrated network analysis using a variety of input data is then used to identify novel OA-associated hub genes, which are ranked and prioritised. It is anticipated that this research will highlight potential important therapeutic targets and pathways that future research should focus on.

p-Value-Based Meta-Analyses Identify 6 miRNAs and 207 mRNAs Differentially Expressed in OA Cartilage
The PubMed search for studies on mRNA and miRNA expression in OA yielded 936 and 622 papers, respectively. Of these initial papers, 86 on miRNA expression and 30 on mRNA expression met our eligibility criteria (see Literature Search and Eligibility Criteria in Methods). Studies on miRNA expression were subject to quality control based on the inclusion of the miRNA in miRbase (v22; http://www.mirbase.org, accessed on 1 May 2020), resulting in 77 papers that were suitable for the meta-analysis. From these papers, 411 miRNAs and 5166 mRNAs were extracted. The p-value-based meta-analysis identified 6 miRNAs and 207 mRNAs as being significantly dysregulated in OA cartilage compared to healthy tissue in three or more independent studies. The 20 top mRNAs are shown below (Table 1) and the full list can be found in Supplementary Table S1.  To determine the most significant cellular pathways linked to these dysregulated mRNA and miRs, we performed pathway analysis (IPA) on the 207 dysregulated mRNAs and on the target genes of the miRNAs identified from the meta-analysis. We identified four common pathways between the dysregulated mRNA genes and miRNA target genes, including senescence, p53 signalling, BEX2 signalling, and unfolded protein response ( Figure 1a). We next investigated further the most significant pathway of senescence and showed that our lists of miRNA target genes, mRNAs, and their upstream regulators overlapped with genes shown to induce or inhibit senescence in vitro (CellAge genes) [12]. The most significant overlap was between predicted upstream regulators of the differentially expressed mRNAs identified in the meta-analysis and inducers of senescence (43% overlap). Moreover, there was a 33.3% overlap between inhibitors of senescence and predicted upstream regulators of miRNA target genes ( Figure 1b). All significantly enriched canonical pathways identified by IPA in the list of significant miRNA target genes and mRNAs can be found in Supplementary Tables S3 and S4, respectively. An undirected and weighted chondrocyte co-expression network was built using 396 human chondrocyte samples (see 'Chondrocyte expression matrix construction' in methods). This analysis allows the identification of novel genes that may not have been associated with OA previously, but that are commonly co-expressed with OA-associated genes identified from the meta-analysis. The meta-analysis genes were used as seed nodes. First-order interactors were extracted alongside the OA genes and disconnected components comprising less than 10 nodes were filtered out. The resulting network contained 1463 unique nodes and 4972 edges shared across seven disconnected components (including 142 OA seed nodes). An integrated multi-source information (MI) network was then created by combining the co-expression network with other data sources, such as miRNA target genes and SNPs. In order to condense the topological information of the network to a node (gene)-specific summary, the weighted degree was calculated, also known as the strength of each node, which was taken as the MIGe for the Multi-source Information Gain (MIG) score and was combined with the normalized integrated gene-specific information MIGn (see methods for more details). This allowed us to make a total gene score based on both node and edge aggregated score. The top 20 genes ranked by this score are listed in Table 2 and a list of all 1685 genes identified from the MI network can be found in Supplementary Table S5. Gene scores of CellAge genes were also significantly higher when compared to non-CellAge gene scores in the MI network (p = 0.0015). Table 2. The top 20 OA-associated genes identified from the MI network. Genes are ranked by total gene score-based on both node and edge aggregated score.

Gene
Total Gene Score These 1685 genes from the MI network were filtered for their inclusion in mass spectrometry (MS) data of dysregulated proteins in OA cartilage vs. healthy controls. Data are available via ProteomeXchange with identifier PXD029116. This proteomic analysis found 81, 46, and 29 dysregulated proteins in the superficial, middle, and deep cartilage zones, respectively. Filtering for genes identified from the MI network revealed 32 common proteins between the MS data and the MI network (Table 3). Seven mRNAs of these thirtytwo proteins were identified in the original meta-analysis, whereas the rest were found via analysis of the MI network. Finally, five of the proteins (PPIB, ASS1, LDHB, TPI1, and ARPC4-TTLL3) identified from the meta-analysis, that were experimentally confirmed via inclusion in the proteomics data, have never been studied in OA before.

Discussion
Following a systematic literature search and data extraction, this study included a p-value-based meta-analysis of data from all eligible miRNA and mRNA expression studies in human OA cartilage versus healthy control tissue. We identified a list of OA-associated genes and miRNAs, some of which were also confirmed to be modified at the protein level. Using this MI network approach, a ranked system was established in order to prioritise genes that future research can study for potential therapeutic targeting. In addition, this study has identified novel OA-associated genes that were not found previously.
As senescence was one of the most significantly OA-associated pathways shared between both the miRNA target genes and mRNAs, overlap analysis was performed with genes included in the CellAge database [12]. These CellAge genes were compiled by a systematic literature search of genetic manipulation studies whereby direct in vitro manipulation of the gene in question resulted in induction or inhibition of cellular senescence. Results of this analysis revealed highly significant overlaps, the most significant being with predicted upstream inhibitors of the differentially expressed mRNAs and inducers of cellular senescence. Furthermore, CellAge genes were found to score higher in the MI network compared to non-CellAge genes, further highlighting the potential significance of cellular senescence in the development or progression of OA. Senescent cells accumulate later in life and at sites of age-related pathologies, where they contribute to disease onset and progression through complex cell autonomous and non-autonomous effects [13]. Previous research has shown that senescent chondrocytes not only accumulate with age but are present at higher numbers in human OA cartilage compared with age-matched healthy controls [14]. In fact, a clinical trial investigating whether the senolytic supplement fisetinA reduces OA-associated pain and cartilage breakdown is due to begin in 2022 (NCT04770064). A key characteristic that distinguishes senescent cells from other cell types is the upregulation of a combination of factors known as the 'senescence-associated secretory phenotype' (SASP) [15]. The SASP contributes to fuel a state of chronic, systemic, low-grade inflammation, known as 'inflammaging', and compromises a subset of genes whose encoded secreted proteins include proinflammatory cytokines and chemokines, growth factors, and proteases that can digest the ECM [16]. Overall, results of the overlap analyses corroborate this research, suggesting a strong association between OA and cellular senescence.
One of the strengths of this study is that it increased the sample size by combining all eligible data into one statistical test. This is particularly important as sample sizes of individual miRNA studies are often small, especially as control healthy cartilage is notoriously difficult to obtain. A co-expression network allows identification of genes that tend to show a coordinated expression pattern across a group of samples, in this case chondrocytes. However, hub-gene identification via co-expression networks has limited power for identifying targets for follow-up studies [11]. We therefore enhanced identification of important hub genes by integrated network analyses where additional data sources were integrated to make a multi-source information (MI) network, with the aim of prioritising genes on their importance to OA [10]. Finally, we confirmed our list of OAassociated genes at the protein level using mass spectrometry data. It is known that articular cartilage can be separated into distinct zones, namely, superficial, intermediate, and deep, in which chondrocytes show distinct gene expression profiles and behaviours [17]. Thus, separating these zones for proteomics analysis ensures location-specific changes in protein levels is not under-represented compared to whole-cartilage samples.
Using this complex method, we have confirmed some well-studied OA-associated genes, such as TIMP-4 (Tissue Inhibitor for Matrix Metalloproteinases 4), a potent Matrix Metalloproteinases (MMP) inhibitor known to be expressed by chondrocytes [18]. Although not as well-studied in OA as other TIMP family members, previous research has demonstrated an increase in TIMP-4 in human knee synovium and expression in primary hip and knee chondrocytes [19]. In addition, other commonly associated gene expression changes were defined by our network analysis, such as aggrecan (ACAN), collagens 1 and 12, and lubricin (PRG4) [20]. As well as confirming well-studied OA genes, results of the MI network found a few genes that have, to our knowledge, never been implicated in OA previously (PPIB, ASS1, LDHB, TPI1, and ARPC4-TTLL3). This shows that there may be many genes integral to OA pathogenesis that have yet to be identified in the tissue, let alone studied. These novel genes warrant further investigation to understand the complex disease more fully. Interestingly, PPIB has been implicated in the rare skeletal disorder 'osteogenesis imperfecta' (OI), with a recent study reporting a rare pedigree with an autosomal recessive OI caused by two novel PPIB mutations [21]. Moreover, lactate dehydrogenase (LDH) catalyses the interconversion of pyruvate and lactate, which are critical fuel metabolites of skeletal muscle. Previous research found that LDHB expression is induced by exercise in human muscle and that chronic activation of LDHB in skeletal muscle triggers an adaptive oxidative muscle transformation, leading to increased exercise capacity in muscle-specific LDHB transgenic murine models [22]. This is particularly interesting as research is increasingly finding a relationship between OA and skeletal muscle wasting [23]. This warrants further investigation into LDHB in both OA and surrounding peri-articular muscles. TPI1 is another novel gene identified in this study as having a critical role in skeletal muscle, where it is involved in oxidative pathways [24]. ASS1 encodes a protein that catalyses the penultimate step of the arginine biosynthetic pathway, mutations in which cause the life-threatening condition Citrullinemia [25], but has not, to our knowledge, been implicated in diseases of the musculoskeletal system. Finally, ARPC4-TTLL3 functions as the actin-binding component of the Arp2/3 complex [26]. The fact that three of the five novel genes identified from this study are implicated in skeletal muscle function corroborates the idea that health of the peri-articular skeletal muscles may play an integral role in OA pathogenesis.
Other genes identified by the network analysis have been previously linked to OA, but have not been studied for their potential as biomarkers/therapeutic targets. For example, Apolipoprotein D (APOD) was found to be downregulated in every zone of the OA cartilage. It was also 1 of the 207 mRNAs identified as significantly dysregulated from the meta-analysis and was the top-ranked gene of the MI network. Research has previously implicated APOD as being an important gene in OA pathogenesis. For example, APOD is strongly upregulated by retinoic acid [27], which is in turn regulated by ALDH1A2-an OA risk locus [28]. In vitro studies have shown APOD to be upregulated upon SOX9 overexpression, a master transcription factor essential for cartilage ECM formation [29]. Furthermore, a recent study into the identification of knee OA genes shared by both cartilage and synovial tissue proposed that APOD may manage OA through chondrogenesis in articular cartilage and immune regulation in the synovium [30]. The high ranking of APOD in the MI network makes it an interesting candidate for future studies into OA. In particular, research should investigate its potential as a biomarker/drug target. Fibrillin-1 (FBN1) was another gene that was discovered from analysis of the MI network whose encoded protein was also found to be upregulated in the middle and deep zones of OA cartilage according to the MS data. This is particularly interesting as FBN1 is the causative gene of the inherited connective tissue disorder Marfan syndrome [31]. Moreover, it was 1 of 300 proteins identified via lectin-affinity chromatography in a previous study investigating the proteome of human OA synovial fluid [32]. The fact that this gene encodes microfibrils that play a structural role in all connective tissues, and mutations in which are known to cause a disease of the musculoskeletal system, warrants further investigation of its role in OA.
It should be noted that different methods of RNA extraction, miRNA expression measurements, and statistical methods were not considered in this analysis. Hypothetically, the impact of these variables could be investigated systematically, for example, by performing sensitivity or meta-regression analyses. However, the current number of independent studies is too small to allow for this kind of analysis. Most of the studies used in this analysis also did not report specific p-values in relation to the miRNA dysregulation. Rather, the values were reported as "less than" a certain significance level (typically <0.05 or <0.001). In these instances, the largest possible p-value was used (i.e., if it was reported at <0.05, the p-value used in the analysis was 0.05). This conservative method may have prevented some miRNAs that were on the verge of significance from being included in the study. Although the sample size of analysis for each miRNA was increased by the meta-analysis method used, ultimately the quality of the analysis is only as strong as the original publications. As mentioned in the methods, quality control was conducted whereby some miRNAs and publications were filtered out based on certain criteria. However, errors or limitations of analysis in the original publications may remain. Moreover, research has suggested that there are reporting biases of differential gene expression in literature, including: preferential reporting of overexpressed rather than underexpressed genes as well as genes that are popular in the biomedical literature at large [33]. As such, a critical mRNA that is investigated by only one group worldwide may not make the cut in the present analysis despite its potential importance to the disease pathogenesis. This bias is evident in the results of this study. For example, miR-140 is probably the most researched and established miRNA to date in terms of its relation to OA [34,35]. As its dysregulation has been very well classified, research will often include it as a positive control. This is reflected in the results of this meta-analysis, where miR-140, miR-140-3p, and miR-140-5p were all found to be significantly dysregulated. However, miR-140 has also been shown to attenuate OA progression via the inhibition of senescence in a recent study by [36]. This provides further support for the downregulation of the miRNA observed in this meta-analysis, as well as the association of its target genes with senescence.
A possible weakness of this study is that eligible studies often did not specify the stage of OA of the tissue donor. As many of the samples came from total joint replacements, it is assumed that a lot of the samples were from late-stage patients. Studies have shown that different stages of OA development and severity have distinct gene and miRNA expression patterns [37]. As such, the results may not adequately represent miRNA dysregulation in early-stage OA.

Literature Search and Eligibility Criteria
A systematic literature search for miRNA and mRNA expression studies in human OA cartilage was performed using PubMed (http://www.pubmed.gov (accessed on 1 May 2020)), applying the search terms "(microRNA OR miRNA OR miR OR micro-RNA) AND (OA OR Osteoarthritis)" for the miRNA analysis, and "(OA OR osteoarthritis) AND (mRNA OR gene) AND (expression) AND (human) AND (knee) NOT (synovium) NOT (murine) NOT (meniscus)" for the mRNA analysis. Papers were assessed for eligibility using the title, abstract, or full text, as necessary. Only articles published in peer-reviewed journals and in English were considered. Papers were not filtered for publication date and were only considered for eligibility provided they: (1) used human knee articular cartilage tissue for analysis, (2) used control cartilage from non-OA patients, and (3) provided the number of patients and significant p-values. A summary of eligible studies can be found in Supplementary Tables S6 and S7 and an overview of the study design is depicted in Figure 2.

Data Extraction and Quality Control
For each eligible paper, the first author's name, year of publication, PubMed link, city/country of origin, source of specimen, number of OA and control samples, p-values, miRNA/mRNA names, and direction of dysregulation were extracted. For quality control, the list of the extracted miRNAs was compared to those included on miRbase (v22; http: //www.mirbase.org (accessed on 1 May 2020)). Any miRNAs that were not listed on miRbase, had insufficient annotation, or corresponded with expired/non-human entries, were excluded from further analysis.

p-Value-Based Meta-Analyses
A p-value-based method was used as it enables the combination of results when effect size estimates and/or standard errors from individual studies are not freely available. Meta-analyses were performed on p-values and directions of effects, providing the miRNA or mRNA was identified as being significantly dysregulated in ≥3 independent studies, as previously described [38]. To do so, a customised R Studio script was used to transform p-values into signed z-scores using Stouffer's method [39,40], which were then converted to positive or negative values depending on the direction of expression (R script can be found in Supplementary Table S8). Z-scores for each miRNA/mRNA were combined by calculating a weighted sum, with weights being proportional to the square root of the effective sample size of the study.

CellAge Overlap
CellAge is a database of genes that can drive the senescence process [41]. Build 2 of CellAge [12] was overlapped with differentially expressed genes identified from the OA meta-analysis. Significance was assessed using a two-tailed Fisher's exact test with Benjamini-Hochberg false discovery rate (FDR) correction.

Chondrocyte Co-Expression Matrix and Network Construction
The undirected weighted protein-coding chondrocyte expression matrix was built using 396 chondrocyte read count data obtained from recount2 [42]. The raw expression data were normalized by quantile normalization method using Bioconductor in R Studio [43]. Protein-coding genes were obtained using Ensembl biomaRt version 101 [44]. The mutual-rank method was used to obtain the top co-expressed partners for all 15,550 proteincoding genes expressed in the chondrocyte data [45]. The mutual-rank cut-off value of 15 was used to filter the top co-expressed genes, resulting in 14,521 unique nodes and 61,222 edges. Networks were built and analysed using the R package igraph version 1.2.521 [46]. The chondrocyte co-expression network was filtered for OA genes identified in the meta-analysis, alongside their first-order interactors. Since the network comprised 31 disconnected components, components with fewer than 10 nodes were filtered out.

Multi-Source Information Network Construction and Gene Rankings
Selected OA multi-source data were integrated by extending a previously proposed methodology [10]. In total, three different sources of information were utilised: validated miRNA targets (from miRTarBase [7]), the co-expression network, and OA-associated SNPs. These individual networks were used to produce an MI network that is based on the weighted sum of the pairwise weighted edge vectors (for each pair of genes), and node-specific information from different sources was used to produce the weighted sum of the nodal score. An overview of this methodology is depicted in Figure 3 and a detailed description can be found in Supplementary Text S1. Figure 3. A depiction of the multi-omic integration method used to create the MI network of genes involved in OA. The circles represent the 'nodes' and the connecting line represents the 'edge'. 'w' refers to the weight given to each source of data used to calculate the node and edge-specific scores.

Confirmation of Results with Mass Spectrometry Analysis
Results of the MI network were overlapped with label-free mass spectrometry proteomics data of human OA articular cartilage compared to healthy controls (manuscript in preparation). The mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE [47] partner repository with the dataset identifier PXD029116 and 10.6019/PXD029116.

Conclusions
OA is a progressive and debilitating disease and the most common cause of chronic disability in adults. This study identifies 33 dysregulated miRNAs in human OA cartilage which may present as good candidates for replacement or inhibition therapy, as well as 207 differentially expressed mRNAs. Results of IPA and overlap analyses suggest a strong association between OA and senescence, corroborating the idea that the accumulation of senescent cells in cartilage contributes to the ECM degradation characteristic of OA. The MI network approach used in this study to integrate multi-source data may help to uncover important and novel genes involved in OA. Experimental confirmation of our bioinformatic analyses, using mass spectrometry data, revealed 32 proteins that are significantly differentially expressed in human OA cartilage. By using a range of bioinformatic methods, this study enabled the ranking, prioritisation, and experimental confirmation of novel OA-associated genes and their encoded proteins. Ultimately, this will allow for future research to focus on genes that may be of higher importance to OA pathogenesis and assess their suitability as drug targets or disease biomarkers. This is particularly important given that pain management and total joint replacement procedures are the only current treatment options for the disease.