Multifactorial Landscape Parses to Reveal a Predictive Model for Knee Osteoarthritis

The present study attempted to investigate whether concerted contributions of significant risk variables, pro-inflammatory markers, and candidate genes translate into a predictive marker for knee osteoarthritis (KOA). The present study comprised 279 confirmed osteoarthritis patients (Kellgren and Lawrence scale ≥2) and 287 controls. Twenty SNPs within five genes (CRP, COL1A1, IL-6, VDR, and eNOS), four pro-inflammatory markers (interleukin-6 (IL-6), interleuin-1 beta (IL-1β), tumor necrosis factor alpha (TNF-α), and high sensitivity C-reactive protein (hsCRP)), along with significant risk variables were investigated. A receiver operating characteristic (ROC) curve was used to observe the predictive ability of the model for distinguishing patients with KOA. Multivariable logistic regression analysis revealed that higher body mass index (BMI), triglycerides (TG), poor sleep, IL-6, IL-1β, and hsCRP were independent predictors for KOA after adjusting for the confounding from other risk variables. Four susceptibility haplotypes for the risk of KOA, AGT, GGGGCT, AGC, and CTAAAT, were observed within CRP, IL-6, VDR, and eNOS genes, which showed their impact in recessive β(SE): 2.11 (0.76), recessive β(SE): 2.75 (0.59), dominant β(SE): 1.89 (0.52), and multiplicative modes β(SE): 1.89 (0.52), respectively. ROC curve analysis revealed the model comprising higher values of BMI, poor sleep, IL-6, and IL-1β was predictive of KOA (AUC: 0.80, 95%CI: 0.74–0.86, p < 0.001), and the strength of the predictive ability increased when susceptibility haplotypes AGC and GGGGCT were involved (AUC: 0.90, 95%CI: 0.87–0.95, p < 0.001).This study offers a predictive marker for KOA based on the risk scores of some pertinent genes and their genetic variants along with some pro-inflammatory markers and traditional risk variables.


Introduction
Osteoarthritis (OA) is a debilitating arthropathy due to the continuing degradation of all joint tissue, such as the meniscus, infrapatellar fat pad, and cartilage, along with subchondral sclerosis and synovial inflammation [1]. Several mechanisms participate in the etiopathogenesis of OA, but intra-articular inflammation is the crucial trigger, propagated by pro-inflammatory cytokines in plasma and synovial fluid [2]. Studies have shown that amongst others, interleukin-6 (IL-6), interleukin 1-beta (IL-1β), and tumor necrosis factor alpha (TNF-α) are the prime culprits that play significant roles in the progressive cascade of OA pathology [1][2][3]. After tissue injury, specific cell surface receptors augment the synthesis

Subjects
This study was conducted between August 2016 and August 2020 under the "osteoarthritis predictive marker identification initiative (OPMII)". After preliminary screening, 279 subjects with moderately severe and severe osteoarthritis verified by radiographic examination of grade ≥2 based on the Kellgren and Lawrence (K&L) scale were enrolled as cases. Matched for age and sex, 287 subjects, who scored zero on the K&L scale and had neither current nor past complaints of skeletal abnormalities or joint pain were selected as controls. Exclusion criteria for the cases included the following: taking hormone replace therapy (HRT) or using corticosteroids; having any type of lupus or suffering from gout; having cancers, rheumatoid arthritis, joint infection/necrosis, fractures, or secondary osteoarthritis; or having any inflammatory disorders or confounding pathologies (osteonecrosis of the medial condyle, psoriatic osteoarthritis, femoral neuropathy, enthesitis, and reactive osteoarthritis). Exclusion criteria for controls included the following: history of joint injury, pain, swelling, or stiffness and/or subjects on medications ( Figure 1). Only those subjects who provided prior written consent were enrolled in the study. The privacy of the subjects was preserved to prevent bias. The study protocol was approved by the Institutional Ethical Committee (IEC/350/2017) and stringently followed the Helsinki Declaration.
in the study. The privacy of the subjects was preserved to prevent bias. The study protocol was approved by the Institutional Ethical Committee (IEC/350/2017) and stringently followed the Helsinki Declaration.

Sleep Quality and Biochemical Variables
Sleep quality was examined with the Pittsburgh Sleep Quality Index (PSQI), comprising 19 items that identify seven core elements of sleep: sleep quality, how long subjects take to fall asleep, percentage of time in bed, sleep duration, sleep disturbances, medications for sleep, and daytime dysfunction. Global scores of >5 indicate poorsleep and <5 good sleep with a sensitivity of 98.7 and specificity of 84.4 for the diagnosis of sleep quality. Total cholesterol (TC), triglycerides (TG), and high-density lipoprotein (HDL) were examined with a one-step enzymatic method (Erba Mannheim, London, UK) with a minimum detection limit of 0.1 mg/L. Low-density lipoprotein (LDL) was evaluated with the Friedewald equation. An Enzyme Linked Immunosorbent Assay (ELISA) was used to determine plasma levels of IL-6, IL-1β, TNF-α, and hsCRP(ThermoFisher Scientific,Waltham, MA, USA,)on a microplate reader (Biotek Instruments Inc., Winooski, VT, USA). Diagnostic sensitivities of the kits were <1.0 pg/mL, 0.35 pg/mL, 1.75 pg/mL, and 9.38 pg/mL for IL-6, IL-1β, TNF-α, and hsCRP, respectively. The coefficients of variation for inter-and intra-assay analyses were less than 5% for all these assays.

Sleep Quality and Biochemical Variables
Sleep quality was examined with the Pittsburgh Sleep Quality Index (PSQI), comprising 19 items that identify seven core elements of sleep: sleep quality, how long subjects take to fall asleep, percentage of time in bed, sleep duration, sleep disturbances, medications for sleep, and daytime dysfunction. Global scores of ≥5 indicate poorsleep and <5 good sleep with a sensitivity of 98.7 and specificity of 84.4 for the diagnosis of sleep quality. Total cholesterol (TC), triglycerides (TG), and high-density lipoprotein (HDL) were examined with a one-step enzymatic method (Erba Mannheim, London, UK) with a minimum detection limit of 0.1 mg/L. Low-density lipoprotein (LDL) was evaluated with the Friedewald equation. An Enzyme Linked Immunosorbent Assay (ELISA) was used to determine plasma levels of IL-6, IL-1β, TNF-α, and hsCRP(ThermoFisher Scientific, Waltham, MA, USA), on a microplate reader (Biotek Instruments Inc., Winooski, VT, USA). Diagnostic sensitivities of the kits were <1.0 pg/mL, 0.35 pg/mL, 1.75 pg/mL, and 9.38 pg/mL for IL-6, IL-1β, TNF-α, and hsCRP, respectively. The coefficients of variation for inter-and intra-assay analyses were less than 5% for all these assays.

SNP Selection and Genotyping
SNPs within CRP, IL-6, VDR, eNOS, and COL1A1 genes were carefully selected based on the following criteria: (i) polymorphic with minor allele frequency >5%, (ii) confirmed functional SNP influencing mRNA expression, (iii) significant participant in inflammatory conditions evidenced by published reports, and (iv) validated at NCBI's refSNP cluster (http://www.ncbi.nlm.nih.gov/snp (accessed on 20 January 2021). Twenty SNPs (Table 1) within these genes were chosen and genotyped. DNA was extracted from the whole blood using the salting-out method. A total of 25 µL of reaction mixture was used to amplify the extracted DNA by employing the polymerase chain reaction (PCR). Amplified DNA was digested with respective high fidelity restriction endonucleases (New England BioLabs Inc., Ipswich, MA, USA). The genotypes were scored and visualized with 2-3% agarose gel electrophoresis depending upon the size of the product. All the experimental work was done without the knowledge of the case/control status to avoid any bias. Fifteen percent of randomly selected samples were repeated for each locus genotyping for replication and validity.

Statistical Power and Population Stratification
Examination of statistical strength of the sample size for inferring genetic associations was checked by using the Power for Genetic Association Analysis (PGA) Package [11]. The minimum detectable relative risk (MDRR) using SNPs and haplotype effects under various models revealed that a sample size of 566 subjects (279 cases and 287 controls) would deliver >85% power to distinguish a minimum genotype relative risk (MGRR) of 2.0 with minor alleles having at least a frequency of 0.15 (rs1800947 in the present sample) at a significance of 0.05. The haplotype relative risk module indicated an MDRR of 1.5 with substantial power (>85%). Population stratification (PS) was tested and the pairwise fixation index (Fst), an indicator of genetic dissimilarity contained in a sub-population, was analyzed by using Arlequin ver. 3.0 software [12]. Fstfor "within population differentiation" was observed to be 0.032 ± 0.017, showing no PS between groups in this study population.

Receiver Operating Characteristic Curve Analysis
To test predictive ability and discriminatory accuracy of biochemical parameters, haplotypes, and traditional risk factors, an area under the receiver operating characteristic curve (AUROC) was modelled with risk scores. Risk scores for biochemical markers and traditional risk factors (TRD) were calculated based on their respective β coefficients (unweighted scores) obtained from logistic regression analysis. Values of these unweighted scores were further standardized by multiplying the lowest absolute value of the coefficient with a number (κ) so that its value becomes 1. All β values were rounded to the closest integer value after multiplying by the value (κ) used for unweighted scores, and their summation represented their respective weighted risk scores. For calculating haplotypebased genetic risk scores, first, individual scores were calculated based on the carriage of risky alleles in the susceptibility haplotype of each gene. Genotypes of risk (R) and non-risk (N) alleles were assigned scores of 0 (NN), 1 (RN), or 2 (RR) for every SNP participating in the respective haplotype. The summation of all SNP-wise risk scores represented the final risk score of every individual. The values deduced as the area under the curve (AUC) from 0.6 to 0.7, 0.7 to 0.8, and 0.8 to 0.9 were considered weak, acceptable, and excellent, respectively.

Statistical Analysis
Differences between proportional and categorical data were analyzed by the chi-square test and continuous data were analyzed with the t-test (Student's t) or Mann-Whitney-Wilcoxon rank-sum test. Minor allele frequencies (MAFs) of the SNPs were analyzed by gene counting, and departures from the Hardy-Weinberg equilibrium were evaluated with the Fisher's exact test. Univariate and multivariable regression analysis were employed to investigate the extent of the independent contribution of variables with KOA. Haplotypes were determined with Arlequin ver. 3.0 software [12]. Two locus epistasis effects (genegene) versus risk variables (gene-environment) were analyzed using epiSNP software [13]. Multivariable regression analysis identified a susceptibility haplotype-adjusted model by taking the most prevalent haplotype as the referent. Furthermore, Wald statistics, Akaike information criterion (AIC = 2k−2ln (L)), and uncertainty measure (R 2 h) was used to distinguish the best fit model.

Detection of Independent Risk Predictors
Univariate regression analysis of risk variables (

Haplotype Analysis, Their Risk and Best Fit Mode of Their Manifestation
For haplotype risk analysis of all the genes (Table 4), the most prevalent haplotypes were taken as referents. Haplotypes having frequencies less than five were excluded from further analysis. SNPs of the CRP gene (in the order of rs2794521, rs1800947, and rs1130864) developed into 5 out of 8 possible haplotypes, which captured 85-95% of the genetic variance in osteoarthritis patients and controls. AGT was observed to have a strong association with the risk of osteoarthritis (OR 3.12, 95%CI: 1.88-5.19), which remained significant even after additional adjustment with BMI, TG, sleep, IL-6, IL-1β, and CRP levels (OR 2.73, 95%CI:1.63-4.72). Similarly, SNPs of the COL1A1 gene (in the order of rs1107946 and rs1800012) revealed TG to be a risk haplotype (OR 1.68, 95%CI:1.04-2.70) in the preliminary analysis, but it lost its significance after adjusting for the effects of risk variables. SNPs with the IL-6 gene (in the order of rs1800795, rs1800796, rs1800797, rs2069827, rs12700386, and rs10499563) generated 8 haplotypes out of the expected 64. Haplotype GGGGCT was observed to be risky (OR 2.24, 95%CI: 1.20-4.16) and suggested a susceptibility haplotype (OR 2.10, 95%CI: 1.08-3.79) after adjusting its effect with risk variables. The VDR gene showed 6 haplotypes out of a possible 8 haplotypes, which captured 87-95% of the genetic variance for osteoarthritis patients and controls. The AGC (Bat) haplotype appeared to confer risk (OR 2.10, 95%CI: 1.26-3.93) after adjusting for the effects of risk variables. SNPs within the eNOS gene (in the order of rs2070744, rs1799983, rs1800780, rs391881, rs891512, and rs1808593) developed into 7 out of the 64 possible haplotypes, which captured 78-91% of the genetic variance. Minor alleles of all the SNPs except at position 6 in the form of CTAAAT appeared to be the susceptibility haplotype for osteoarthritis risk (OR 3.12, 95%CI: 1.99-6.72) when its influence was examined after adjusting for the effects of risk variables.
The functional effects of the haplotypes on osteoarthritis risk were further tested more strictly with Wald statistics under dominant, recessive, multiplicative, and general n the text ng method has been written as suggesteduggested Many SNP-SNP interactions were deduced between osteoarthritis and non-osteoarthritis subjects, but only significant effects are shown here.

Haplotype Analysis, Their Risk and Best Fit Mode of Their Manifestation
For haplotype risk analysis of all the genes (Table 4), the most prevalent haplotypes were taken as referents. Haplotypes having frequencies less than five were excluded from further analysis. SNPs of the CRP gene (in the order of rs2794521, rs1800947, and rs1130864) developed into 5 out of 8 possible haplotypes, which captured 85-95% of the genetic variance in osteoarthritis patients and controls. AGT was observed to have a strong association with the risk of osteoarthritis (OR 3.12, 95%CI: 1.88-5.19), which remained significant even after additional adjustment with BMI, TG, sleep, IL-6, IL-1β, and CRP levels (OR 2.73, 95%CI:1.63-4.72). Similarly, SNPs of the COL1A1 gene (in the order of rs1107946 and rs1800012) revealed TG to be a risk haplotype (OR 1.68, 95%CI:1.04-2.70) in the preliminary analysis, but it lost its significance after adjusting for the effects of risk variables. SNPs with the IL-6 gene (in the order of rs1800795, rs1800796, rs1800797, rs2069827, rs12700386, and rs10499563) generated 8 haplotypes out of the expected 64. Haplotype GGGGCT was observed to be risky (OR 2.24, 95%CI: 1.20-4.16) and suggested a susceptibility haplotype (OR 2.10, 95%CI: 1.08-3.79) after adjusting its effect with risk variables. The VDR gene showed 6 haplotypes out of a possible 8 haplotypes, which captured 87-95% of the genetic variance for osteoarthritis patients and controls. The AGC (Bat) haplotype appeared to confer risk (OR 2.10, 95%CI: 1.26-3.93) after adjusting for the effects of risk variables. SNPs within the eNOS gene (in the order of rs2070744, rs1799983, rs1800780, rs391881, rs891512, and rs1808593) developed into 7 out of the 64 possible haplotypes, which captured 78-91% of the genetic variance. Minor alleles of all the SNPs except at position 6 in the form of CTAAAT appeared to be the susceptibility haplotype for osteoarthritis risk (OR 3.12, 95%CI: 1.99-6.72) when its influence was examined after adjusting for the effects of risk variables. Number of subjects having the haplotype are shown in the parenthesis. All haplotypes that had less than 5% frequencies were excluded from the analysis. P Cor : p values were corrected for multiple comparisons (Bonferroni adjustment). Bold faces show the susceptibility haplotype. a Odds ratios were adjusted with body mass index, triglycerides, poor sleep, interleukin-6, interleukin 1-beta, and high sensitivity C-reactive protein.
The functional effects of the haplotypes on osteoarthritis risk were further tested more strictly with Wald statistics under dominant, recessive, multiplicative, and general modes of inheritance, and selection of the best fit model was identified with AIC and R 2 h (Table 5). It was observed that the susceptibility haplotype AGT of the CRP gene (β ± SE; 2.11 ± 0.76, p < 0.001) and the GGGGCT haplotype of the IL-6 gene (β ± SE; 2.75 ± 0.59, p < 0.001) influenced osteoarthritis risk in recessive modes, whereas the AGC haplotype of the VDR gene (β ± SE; 2.35 ± 0.65, p < 0.001) and the haplotype CTAAAT of the eNOS gene (β ± SE; 1.89 ± 0.52, p < 0.001) manifested in dominant and multiplicative modes, respectively. Models showing values after adjustment for risk covariates: body mass index, triglycerides, poor sleep, interleukin-6, interleukin 1-beta, and high sensitivity C-reactive protein. } Estimated haplotype effect, p: asymptotic value, R 2 h: haplotype uncertainty measure, AIC:Akaike information criterion. Values in bold face show highest R 2 h values and lowest AIC. Recessive (subjects with1 copy are at the same risk as subjects withno copy), Dominant effect (subjects with 1 copy are at same risk as subjects with2 copies), Multiplicative effect (subjects with 1 copy are at an intermediate risk compared to that ofsubjects withno copies or 2 copies).

Predictive Ability of Traditional Risk Factors, Biochemical Parameters, and Haplotypes
Weighted risk scores (effect estimates) were used to measure the predictive power of traditional risk factors (TRD), biochemical parameters, and haplotypes by using AUROC (Figure 2). In the first model, estimation of TRD and biochemical parameters was calculated, and out of these estimates, variables were omitted that did not change the AUC values.

Discussion
The present study identified a predictive model based on the collective contribution of some traditional risk predictors, relevant biochemical parameters, and genetic candidates for the risk of KOA in the population of Punjab, India. The clinical literature guided our thoughts to consider that in multifactorial pathologies, not a single factor but rather interactions between many "biological culprits" shape the phenotype, otherwise contradictory inferences prevail. For instance, some of the reports suggest that higher CRP levels at baseline are predictive of cartilage loss, synovial inflammation, and progression of KOA [14,15], whereas a meta-analysis showed that neither hsCRP levels nor its gene expression is associated with incidence or progression of KOA without correction for the effect of BMI [16]. Individual SNPs fail to capture genetic variance, whereas haplotypes reflect more robust pictures of these variances and uncover concerted roles of the alleles. None of the studies so far reported CRP-haplotyperelated effects on hsCRP levels in KOA. For the first time, results of the present study highlight that a susceptibility haplotype of AGT of the CRP gene is significantly associated with higher hsCRP levels after adjusting for the effects of BMI, TG levels, and sleep (OR 2.73, 95%CI: 1.63-4.72); how-

Discussion
The present study identified a predictive model based on the collective contribution of some traditional risk predictors, relevant biochemical parameters, and genetic candidates for the risk of KOA in the population of Punjab, India. The clinical literature guided our thoughts to consider that in multifactorial pathologies, not a single factor but rather interactions between many "biological culprits" shape the phenotype, otherwise contradictory inferences prevail. For instance, some of the reports suggest that higher CRP levels at baseline are predictive of cartilage loss, synovial inflammation, and progression of KOA [14,15], whereas a meta-analysis showed that neither hsCRP levels nor its gene expression is associated with incidence or progression of KOA without correction for the effect of BMI [16]. Individual SNPs fail to capture genetic variance, whereas haplotypes reflect more robust pictures of these variances and uncover concerted roles of the alleles. None of the studies so far reported CRP-haplotyperelated effects on hsCRP levels in KOA. For the first time, results of the present study highlight that a susceptibility haplotype of AGT of the CRP gene is significantly associated with higher hsCRP levels after adjusting for the effects of BMI, TG levels, and sleep (OR 2.73, 95%CI: 1.63-4.72); however, this result lacked predictive strength so should not be used as a predictive marker for KOA. Inconsis-tency also exists for the role of the IL-6 gene vis-à-vis KOA, as functional SNP rs1800795 of the IL-6 gene has been observed to correlate strongly with knee OA in a meta-analysis [17], whereas another meta-analysis could not observe it [18]. It is highly likely that transcription control of IL-6 levels by functional SNP rs1800795 is not additive but confers KOA risk by its interaction with other SNPs such as rs1800796, rs1800797, rs2069827, rs12700386, and rs10499563 in a haplotype GGGGCT [7]. The present study affirms that susceptibility haplotype GGGGCT within the IL-6 gene influences higher plasma IL-6 levels but also confers substantial osteoarthritis risk for the knee (OR 2.10, 95%CI:1.08-3.79) after adjusting for the effects of BMI, TG, and sleep.
Vitamin D is a significant mediator of osteoblast proliferation and bone mineralization in addition to its active role in the synthesis of proteoglycans, alkaline phosphatase, and osteocalcin [19]. Therefore, low levels of vitamin D in the body are associated with decreased muscle power strength, poor knee rehabilitation, locomotor dysfunction, and cartilage loss [19]. The T allele of the VDR gene SNP rs731236 has a functional effect on mRNA expression, and lack of this allele poses a three-fold higher risk of KOA [8]. The present study exposed a susceptibility haplotype AGC (Bat) within the VDR gene, and bearers of this haplotype are at a two-fold higher risk of KOA after adjusting for the effect of risk variables (TG, BMI, and sleep). Most of the studies investigating this link did not adjust for the effects with significant confounders, especially sleep quality, which is considerably influenced by vitamin D deficiency. Poor sleep is not an obvious artefact of skeletal abnormalities, as it has strong clinical connotations. For instance, poor sleep induces an immune response of inflammation, which was clarified by a study highlighting that every one hour of sleep deficiency increases 9-12% of the circulating levels of IL-6, TNF-α, and hsCRP levels [20]. VDR interacts epistatically with eNOS to transcriptionally regulate the production of nitric oxide (NO), which is increased in chondrocytes, and its excessive production is autotoxic, which causes tissue and cartilage destruction in OA [9]. Two-way epistatic interactions with 20 SNPs analyzed in the present study highlight that all the functional SNPs of CRP, IL-6, COL1A1, VDR, and eNOS genes participate and interact with other SNPs in influencing risk variables and pro-inflammatory cytokines, which indicates that not an individual SNP but rather SNP-SNP cross talks capture the impact on risky traits.
Some possible limitations of this study may have affected the interpretations. Firstly, gender-specific analysis of knee osteoarthritis has been done owing to the relatively small sample size of men and women in the study. Therefore, additional gender-specific studies with appropriate statistical strength may clarify differences. Secondly, all the explanations are with reference to a population from Punjab; consequently, the findings may not be generalizable to other populations. Hence, such studies in higher cohorts involving other populations and ethnic groups are required to confirm the findings. Finally, despite the current study involving five genes, twenty SNPs, four pro-inflammatory markers, and some pertinent traditional risk factors, further studies including additional microRNAs as epigenetic modulators would be worth pursuing.
In conclusion, the present study observed a predictive model that highlights that traditional risk variables such as BMI and sleep are good predictors for the identification of KOA (AUC: 0.71, 95%CI: 0.64-0.78) and the predictive capacity increased when IL-6 and IL-1β were also involved in the model (AUC: 0.80, 95%CI:0.74-0.86). Likewise, further addition of two susceptibility haplotypes, AGC and GGGGCT, with this model made it an excellent predictor for distinguishing knee osteoarthritis patients from controls (AUC: 0.91, 95%CI: 0.87-0.95).