A Functional Polymorphism Downstream of Vitamin A Regulator Gene CYP26B1 Is Associated with Hand Osteoarthritis

While genetic analyses have revealed ~100 risk loci associated with osteoarthritis (OA), only eight have been linked to hand OA. Besides, these studies were performed in predominantly European and Caucasian ancestries. Here, we conducted a genome-wide association study in the Han Chinese population to identify genetic variations associated with the disease. We recruited a total of 1136 individuals (n = 420 hand OA-affected; n = 716 unaffected control subjects) of Han Chinese ancestry. We carried out genotyping using Axiom Asia Precisi on Medicine Research Array, and we employed the RegulomeDB database and RoadMap DNase I Hypersensitivity Sites annotations to further narrow down our potential candidate variants. Genetic variants identified were tested in the Geisinger’s hand OA cohort selected from the Geisinger MyCode community health initiative (MyCode®). We also performed a luciferase reporter assay to confirm the potential impact of top candidate single-nucleotide polymorphisms (SNPs) on hand OA. We identified six associated SNPs (p-value = 6.76 × 10−7–7.31 × 10−6) clustered at 2p13.2 downstream of the CYP26B1 gene. The strongest association signal identified was rs883313 (p-value = 6.76 × 10−7, odds ratio (OR) = 1.76), followed by rs12713768 (p-value = 1.36 × 10−6, OR = 1.74), near or within the enhancer region closest to the CYP26B1 gene. Our findings showed that the major risk-conferring CC haplotype of SNPs rs12713768 and rs10208040 [strong linkage disequilibrium (LD); D’ = 1, r2 = 0.651] drives 18.9% of enhancer expression activity. Our findings highlight that the SNP rs12713768 is associated with susceptibility to and severity of hand OA in the Han Chinese population and that the suggested retinoic acid signaling pathway may play an important role in its pathogenesis.


Introduction
Hand osteoarthritis (OA) is the most prevalent joint disease characterized by cartilage degeneration, bone sclerosis, and osteophytes, which manifests with loss of hand motion [1,2]. The disease affects different hand joints in a bilateral manner, frequently affecting the distal interphalangeal (DIP) and proximal interphalangeal (PIP) joints, the Heberden's and Bouchard's nodes, respectively, and the first carpometacarpal (CMC-1) joint [2,3]. The prevalence of hand OA differs by ethnicity [4][5][6][7][8]; however, most genetics studies of hand OA were conducted in European cohorts. In radiographic hand OA studies of Japanese [9] and Korean [10] populations, a higher prevalence of OA in the interphalangeal (IP) and a lower prevalence in the CMC thumb joints were reported when compared to those in Caucasian populations. Furthermore, previous studies have indicated that OA susceptibility loci identified in the European population are often not in accordance with Asian populations [11][12][13]. Hand OA is a multifactorial disease where age, sex, occupational activities, and genetics strongly contribute to the development and progression of the disease, which complicates efforts to identify pathogenic mechanisms [1,3].
Family-based and twin pair studies strongly suggest the genetic component of hand OA, with heritability estimates of~64% [14,15]. Early genome-wide linkage and association studies have identified multiple hand OA susceptibility genes, such as AGC1 [16][17][18], HFE [19,20], GDF5 [12], and A2BP1 [21]; however, neither of those studies show conclusive results, nor do they reach the genome-wide significance p-value. The first genomewide association studies (GWASs) of hand OA to report genome-wide significant results identified a single-nucleotide polymorphism (SNP) rs3204689 at chromosome 15q22 (p-value = 3.99 × 10 −10 , odds ratio (OR) = 1.51) [22]. This SNP is mapped to the ALDH1A2 gene that encodes retinaldehyde dehydrogenase 2 enzyme, which is involved in the conversion of retinoic acid during vitamin A metabolism [22]. Two other hand-OA associated genes have since been reported, MGP (p-value = 1.8 × 10 −15 ) [23] and WNT9A (p-value = 2.4 × 10 −13 ) [24]. The most recent large-scale meta-analyses of 177,517 individuals with OA (21,186 cases of hand OA) further identified 52 novel loci across 11 OA phenotypes, seven of which are associated with hand OA [25]. Most of these seven identified loci are annotated to the intron region, except for rs8112559, which is annotated upstream of the IRF2BP1 gene [25]. Their study incorporates nine populations, including East Asian; yet, all of the hand OA subjects were of European or Caucasian ancestries [25].
GWASs have since uncovered~100 OA genetic risk loci; however, only eight of them are hand OA-related [22][23][24][25][26]. Besides, these findings were primarily replicated in European and Caucasian populations [22][23][24][25][26]. Asian descent remains disproportionately underrepresented in the GWAS of hand OA. Hence, we carried out a GWAS to identify novel genetic susceptibility loci and genes associated with hand OA in 1136 Han Chinese subjects. We also carried out a reporter assay of the candidate SNPs to test their potential functional effects on hand OA.

Overview of the Study Population
A total of 1136 unrelated subjects (n = 420 hand OA-affected; n = 716 unaffected control subjects) met the recruitment criteria for this study (Figure 1). Table 1 shows a higher prevalence of female subjects (87%) affected by hand OA than male subjects (13%). Of the unaffected subjects, 76% were female and 24% were male. The mean age was 64.7 and 64.1 years for affected and unaffected subjects, respectively. No significant differences in the BMI between the affected and unaffected control subjects (24.64 vs. 24.67 kg/m 2 ). A summary of the sex and age distribution in the GWAS is presented in Supplementary Table  S5. The most common age group was 60-69 years for females and 70-79 years for males, in both the affected and unaffected control subjects.

Hand OA Genome-Wide Association Analysis
The MDS plot showed no evidence of population stratification between the GWAS and HapMap population (Supplementary Figure S1A) and between the affected and unaffected control subjects (Supplementary Figure S1B). The genomic inflation factor (λGC) of 1.0418

Hand OA Genome-Wide Association Analysis
The MDS plot showed no evidence of population stratification between the GWAS and HapMap population (Supplementary Figure S1A) and between the affected and unaffected control subjects (Supplementary Figure S1B). The genomic inflation factor (λ GC ) of 1.0418 also revealed a minimal effect of population stratification on the association results. The quantile-quantile (Q-Q) plot is deflated (Supplementary Figure S2), likely due to the moderate sample size.

CYP26B1/rs12713768 Is a Potential Causal Gene for Hand OA
Subsequent to GWAS, we performed post-GWAS analysis using FUMA to identify causative variants. SNP rs12713768 is predicted to be a potentially deleterious variant, with a C-score of 18.65 ( Figure 3B), and it has a regulatory function, with a RegulomeDB rank of 2b ( Figure 3C). We also used RegulomeDB to investigate 17 SNPs located on chromosome

CYP26B1/rs12713768 Is a Potential Causal Gene for Hand OA
Subsequent to GWAS, we performed post-GWAS analysis using FUMA to identify causative variants. SNP rs12713768 is predicted to be a potentially deleterious variant, with a C-score of 18.65 ( Figure 3B), and it has a regulatory function, with a RegulomeDB rank of 2b ( Figure 3C). We also used RegulomeDB to investigate 17 SNPs located on chromosome 2 (mapped within 200 kb to CYP26B1) in tissues most relevant to OA, including osteoblasts and chondrocytes. Fifteen of the tested SNPs returned scores of 1-6, four of which were predicted to have regulatory functions, with RegulomeDB scores ≤ 3. Two of these predicted regulatory variants were SNPs with strongest association signal identified in our GWAS (CYP26B1/rs883313, RegulomeDB score = 3a; CYP26B1/rs12713768, RegulomeDB score = 2b; Supplementary Table S4). Roadmap DHS annotations [27] further showed that SNPs rs12713768 and rs10208040 (LD; D' = 1, r 2 = 0.651) were mapped within the enhancer region. This variant is mapped to the polycomb-repressed region in chondrogenic cells and primary osteoblasts ( Figure 3D). The C alleles of SNP rs12713768 have also been previously associated with the higher expression of the CYP26B1 gene according to its eQTL analysis (GTEx Portal database; Figure 3E).
Next, we tested 24 SNPs in an independent hand OA cohort from Geisinger. Data for 20 SNPs were available from Geisinger, and none showed associations with the SNPs of the present study (Supplementary Table S6). We also tested previously reported hand OA-associated SNPs. Some signals are replicated, with p-values < 0.05, including 39 SNPs within the ALDH1A2 [22], SNP rs4764133 near MGP gene [23], and SNPs rs7294636 and rs11071366 nearest to C12orf60 and ALDH1A2, respectively [25] (Supplementary Table S7). They did not reach genome-wide significance, but the effects were in the same direction as in our cohort.
To clarify whether our top SNPs rs883313 and rs12713768 are specifically associated with hand OA, we performed GWAS of the 289 subjects affected by knee and/or hip OA (without hand OA). These subjects were previously excluded from our hand OA cohort ( Figure 1). These two SNPs, rs883313 and rs12713768, did not reach the suggestive p-value threshold of 10 −5 , having p-values of 0.3113 and 0.3946, respectively (Supplementary Table  S8). In addition, we also estimated the relative risk-conferring allele frequency of these SNPs. SNP rs883313 has a risk allele frequency of 0.766436, while rs12713768 has an allele frequency of 0.768166 compared to the allele frequency of 0.8381 and 0.8405 in the subjects with hand OA, respectively (Supplementary Table S9).
Furthermore, we analyzed the association of SNPs rs883313 and rs12713768 in the Taiwan Precision Medicine Initiative (TPMI) cohorts of hip OA (ICD-10 M16), knee OA (ICD-10 M17), and rheumatoid arthritis (RA; ICD-10 M05 and M06). The p-values of these SNPs did not reach p-values of <0.05 in all cohorts (Supplementary Table S10). Our findings suggest that these two SNPs are more likely to affect the development and severity of hand OA than those of hip OA, knee OA, and RA.
We further genotyped 33 SNPs in at least 188 hand OA-affected subjects to validate our findings. Fifteen out of the 16 SNPs on chromosome 2 were mapped within 150 kb of the gene CYP26B1, and one was within 200 kb. The genotype concordance between the two platforms was ≥98%, except for SNP rs1738263, which had a concordance rate of 96% (Supplementary Table S3). Taken together, our results suggest that rs12713768 is a potential regulatory variant for the novel hand-OA associated gene, CYP26B1.

Effect of SNP Variants rs12713768 and rs10208040 on the Predicted Enhancer Activity of CYP26B1 Gene
Two SNPs selected for validation, rs12713768 and rs10208040, were mapped to the predicted enhancer region (2:72,240,035-72,241,404, hg19). We also mapped the promoter of gene CYP26B1 at the position of 2:72,148,038-72,150,001 (hg19). We performed a reporter assay to determine if the predicted enhancer region is functional and whether SNPs rs12713768 and rs10208040 could affect the regulation of gene expression (Figure 4). Figure 4B shows that the CYP26B1 promoter has robust activity in driving reporter vector expression as compared to the pGL3-basic vector without a promoter (p-value = 0.0011). Alleles A and C of rs12713768 are in strong linkage disequilibrium (LD, D' = 1, r 2 = 0.651) with the T and C alleles of rs10208040, respectively ( Figure 5). We constructed the promoter and enhancer sequences with AT, AC, CT, and risk-conferring CC haplotype in a pGL3-basic vector ( Figure 4A). Sequences with the enhancer showed a 2.0-to 2.5-fold increase in reporter activity compared to the construct with the promoter only, supporting the functionality of the enhancer region ( Figure 4B). Moreover, the risk-conferring CC haplotype group displayed 18.9% higher enhancer activity than the AT haplotype group (p-value = 0.0429; Figure 4B). This suggests that the presence of risk-conferring CC haplotype in the enhancer region increases reporter gene activity. Our findings confirmed that the predicted enhancer sequence is functional and that the CC haplotype of SNPs rs12713768 and rs10208040 contributed to the increased activity of the enhancer.
the enhancer region ( Figure 4B). Moreover, the risk-conferring CC haplotype group displayed 18.9% higher enhancer activity than the AT haplotype group (p-value = 0.0429; Figure 4B). This suggests that the presence of risk-conferring CC haplotype in the enhancer region increases reporter gene activity. Our findings confirmed that the predicted enhancer sequence is functional and that the CC haplotype of SNPs rs12713768 and rs10208040 contributed to the increased activity of the enhancer.

Discussion
We identified novel causal variants, rs883313 and rs12713768, showing the strongest association signals with radiographic hand OA. The risk-conferring CC haplotype of SNPs rs12713768 and rs10208040 (LD, D' = 1, r 2 = 0.651), both located downstream nearest to the

Discussion
We identified novel causal variants, rs883313 and rs12713768, showing the strongest association signals with radiographic hand OA. The risk-conferring CC haplotype of SNPs rs12713768 and rs10208040 (LD, D' = 1, r 2 = 0.651), both located downstream nearest to the cytochrome P450 family gene CYP26B1, is predicted to increase the expression of CYP26B1. Our bioinformatics and functional validation further suggest that CYP26B1 is potentially a novel causal gene for hand OA.
CYP26B1 plays an important role in the vitamin A metabolism pathway by catabolizing excess retinoic acid into its inactive polar forms [28]. Retinoic acid, the most active metabolite derivative of vitamin A, is a critical signaling molecule in regulating chondrogenesis and osteogenesis during vertebrate embryonic growth and development [29,30]. Genetic deficiencies in CYP26B1 in mice display craniofacial abnormalities and limb malformation [31,32]. The CYP26B1 gene is also highly conserved in all chordates, further affirming its critical role in vertebrate development [33].
A retinoic acid-related gene (ALDH1A2) had previously been linked to the hand OA phenotype in the Icelandic population [22]. The ALDH1A2 gene encodes an enzyme involved in retinoic acid synthesis [22]. Studies by Styrkarsdottir et al. [22] and Shepherd et al. [34] revealed that the reduced bioavailability of retinoic acid might increase the risk of hand OA. Higher expression of CYP26B1 in OA hip cartilage compared to non-OA control hip cartilage was also observed in their RNA-seq data, albeit it was not significantly differentially expressed [34]. On the contrary, Davies et al. [35] proposed that excess levels of retinoic acid detected in synovial fluid of OA patients may cause detrimental effects on the cartilage. Taking our findings into consideration, we deduce that increased expression of CYP26B1 may also lower the level of retinoic acid in the tissues and lead to the same phenotypic outcome as Styrkarsdottir et al. [22] and Shepherd et al. [34]. The low bioavailability of retinoic acid is known to compromise cartilage integrity through modulation of the expression of a key chondrogenic transcription factor, SRY-Box Transcription Factor 9 (SOX9) [34]. Our study highlights the importance of retinoic acid and vitamin A metabolism pathway for hand OA phenotype in both European and Han Chinese populations, despite affecting different genes.
Joint pain is one of the hallmark symptoms of OA, and the treatment of OA is mainly aimed at pain relief [36]. Our GWAS identified several genes that have been documented for their functional relationship with OA pain (AGPAT4 [37] and ASIC2 [38][39][40]) and low bone mineral content (AGPAT4 [41]). AGPAT4 (1-acylglycerol-3-phosphate O-acyltransferase 4) encodes a catalyze enzyme involved in the conversion of lysophosphatic acid (LPA) to phosphatidic acid in the phospholipid biosynthesis [41]. McDougall et al. [37] has demonstrated that LPA contributes to the neuropathic component of OA. Furthermore, AGPAT4-knockout mice showed a decrease in bone mineral content [41]. ASIC2 (acidsensing ion channel 2) is a member of the proton-gated cation channels [42]. It is a nociceptor that is part of the amiloride-sensitive degenerin/epithelial sodium channel (DEG/ENaC) superfamily [39,42]. Previous studies demonstrated that ASIC2 expressed in bone cells may have a role in regulating the pain signal transmission associated with bone metabolism [39,40]. Cho et al. [38] observed an increase in the mRNA expression and protein levels of ASIC2 in the frozen shoulder patients as compared to the controls, implying its possible role in the pathogenesis of shoulder pain caused by frozen shoulder.
We also identified several genes expressing inflammatory chemokines that previously have been detected in the inflamed synovium samples of OA patients (CCL14 [43], CCL15 [43], and CCL16 [43]) or have been reported to be expressed in human fetal osteoblast and chondrocytes (CCL23 [44]). CCL15 was also detected to significantly elevate in the severe-stage OA group compared to the early-stage OA group [45]. The involvement of inflammatory chemokine in the pathogenesis of OA has gained more attention, as it has been known that chemokine can be secreted by chondrocytes and synovial cells, which can be detected in the synovial fluid of OA patients [45][46][47]. However, the complexity of the inflammation-induced signaling pathway warrants more studies to define the mechanisms in which chemokine may be involved in the pathogenesis of OA.
Nevertheless, the top SNPs identified in this study were not replicated in an independent cohort from Geisinger. These differences are most likely attributed to ethnic heterogeneity, as more than 95% of the Geisinger population is of white European ancestry [48]. SNP rs12713768 has a risk allele frequency of 47% and 55% in American and European populations, respectively, but it is higher, at 77%, in the East Asian population, according to the 1000 Genome Project (Phase 3; Supplementary Table S11). In addition, the Geisinger cohort used the ICD codes, rather than the strict radiographic criteria used in this study, which could also contribute to the differences in results.
We were also not able to validate the previously reported hand OA-associated variants. This could be owing to the relatively small sample size used in this study, heterogeneity of hand OA phenotypes and ethnic groups, and different case definitions (radiographically confirmed cases of hand OA in the present study as compared to ICD codes). The undetermined misclassification rate was a weakness for some of the previous studies and the Geisinger cohort used in this study. On the other hand, the lack of replication suggests the complexity of the polygenic background of hand OA, which involves multiple genes, variable phenotypic penetrance of the variants, and complex gene-environment interactions.
For the first time, our study revealed the association between CYP26B1 gene with hand OA. We identified SNPs located within a functional enhancer region mapped closest to the vitamin A metabolizing gene CYP26B1, which likely accounts for the progression of hand OA through the reduced bioavailability of retinoic acid. Future studies with larger sample sizes are required to replicate these findings and to identify markers that are weakly associated with the target traits. In addition, we also identified several genes linked to joint pain and inflammation that have not been associated with hand OA. However, further analysis is required to demonstrate their potential role as candidate genes for hand OA or to delineate their molecular mechanism in the association with hand OA. Our findings, together with previous genetic studies [22,34], highlight the genetic contribution of target SNP variants in the vitamin A metabolism pathway in association with the severity of hand OA.

Study Populations
A total of 1434 subjects were recruited from five medical centers in Taiwan (National Taiwan University Hospital, Taipei Medical University Hospital, Taipei Veterans General Hospital, Tri-Service General Hospital, and Linkou Chang Gung Memorial Hospital) between 2005 and 2009 ( Figure 1). All hand OA-affected and unaffected control subjects were of Taiwanese Han background, and range in age from 32 to 94 years (with a mean of 64.4 years; Table 1).

Inclusion Criteria
The inclusion criteria to define hand OA cases follow those of The Genetics of Generalized Osteoarthritis (GOGO) study [49]. The severity of hand OA was graded according to the Kellgren-Lawrence (KL) scaling system (grades 0-4) [50]. Radiographic hand OA was defined as the presence of three or more joint involvements of KL grade ≥ 2, at least one DIP of digits 2-5, two of the three involved joints within a joint group (DIP, PIP, or CMC), with the first IP joint of a thumb being considered in the PIP group, bilateral hand involvement, and no more than three swollen metacarpophalangeal (MCP) joints ≥2 by clinical examination, as defined in the Dictionary of Rheumatic Diseases [51].

Exclusion Criteria
Subjects were excluded if they had a diagnosis of bone and joint disease (including rheumatoid arthritis, gout arthritis, psoriatic arthropathy, hypertrophic osteoarthropathy, hypermotility syndrome, hemochromatosis, Paget's disease, spondyloarthropathy, posttraumatic OA, other secondary-form OA, or lupus) or had more than three swollen MCP joints with KL grade ≥ 2, or they were excluded if they had MCP changes compatible with hemochromatosis. Additionally, if fasting transferrin saturation (FE/TIBC) ratio outcomes were >55%, the subjects were excluded. Female subjects who had a positive urine pregnancy test at screening were excluded.

Inclusion Criteria
Non-OA controls for the study were recruited of age and ethnic match to the hand OA-affected subjects. All non-OA control subjects were age ≥ 30 years of Taiwanese Han ancestry. Non-OA controls were defined as subjects without OA in hands, hips, and knees (no more than two hand joints involvement of KL grade ≥ 2, hip-KL grade of <2, and knee-KL grade < 2, respectively). If greater than three joints with KL grade ≥ 2 were present in hand joints, then it should be neither bilateral joint involvement nor two joints in the same joint group (DIP, PIP or CMC). If the hip-KL grade is equal to 2, the review joint space width has to be ≥2.5.

Exclusion Criteria
Subjects with OA in the hand joint, or had hip and/or knee OA, or had any bone and joint disease as mentioned above, were excluded. Subjects with a knee-KL grade ≥ 2, and/or a hip-KL grade ≥ 2, or those that had any osteophytes in the hip with joint space width < 2.5, were excluded. Subjects were excluded if they had more than three swollen MCP joints with KL grade ≥ 2, MCP changes compatible with hemochromatosis, or iron overload > 55%. None of the first-degree relative subjects was enrolled in the study.
Of the 1434 subjects who participated, 289 individuals having knee and/or hip OA (without hand OA) were excluded from the analysis (Figure 1). A total of 1145 individuals were genotyped (n = 421 hand OA-affected; n = 724 unaffected control subjects). The replication cohort was identified from within the Geisinger MyCode community health initiative (MyCode ® ), a system-wide research biorepository in Geisinger, Pennsylvania, USA with more than 265,000 participants enrolled to date [48]. Hand OA-affected and control population of the Geisinger cohort was identified using International Classification of Diseases (ICD) codes (Supplementary Table S1).

Genotyping, Quality Control, and Imputation
Genomic DNA was extracted from the blood using the QIAamp DNA Blood Mini kit (QIAGEN Inc., Valencia, CA, USA). Genotyping was performed using the Axiom Asia Precision Medicine Research Array (Affymetrix, Santa Clara, CA, USA). Genotype calls were determined using the BirdSeed genotyping algorithm implemented in Affymetrix Power Tools (Affymetrix, Santa Clara, CA, USA). The overall call rates of all samples were >95%. Per-individual quality control (QC) excluded the following samples for further analysis: (i) call rate < 95%; (ii) discrepancy between recorded and genotype-determined sex; (iii) first-degree relatives identified by kinship analysis; and (iv) outliers in the multidimensional scaling plot. Genotype QC for each SNP was evaluated by the total call rate and minor allele frequency (MAF) in cases and controls combined. Autosomal SNPs that met the following criteria were excluded from further analysis: (i) nonpolymorphic, (ii) sample call rate < 95%, (iii) MAF < 1% and total call rate < 99%, and (iv) Hardy-Weinberg equilibrium p-value < 1 × 10 −6 . After QC and removing individuals who had kinship relationships (one in cases and eight in controls), 507,468 SNPs were retained in the 420 hand OA-affected and 716 unaffected control subjects (Supplementary Table S2). Genotype data passed QC were imputed with IMPUTE2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) with East Asian 1000 Genomes Project data as the references. After imputation (info score > 0.9), 4,587,241 SNPs were obtained.

Genome-Wide SNPs Cross-Platform Validation
Thirty-three SNPs that are located nearest to CYP26B1 or have p-values < 1 × 10 −5 were further validated (Supplementary Table S3). Top SNPs rs883313 and rs12713768 were validated in all controls and cases (n = 1136). The other 31 SNPs were validated in at least 188 hand OA-affected subjects in the current study. SNPs of interest were genotyped using either the Sequenom MassARRAY iPLEX platform (Sequenom, San Diego, CA, USA) at the National Center for Genome Medicine, Academia Sinica, Taiwan, or standard Sanger direct sequencing on an ABI Prism 3730XL DNA Analyser (Applied Biosystems, Foster, CA, USA).

Statistical and Bioinformatics Database Analysis
All statistical analyses were performed using PLINK1.07 (http://pngu.mgh.harvard. edu/~purcell/plink). Multidimensional scaling (MDS) showed that all individuals included in the study were clustered closely within the East Asian 1000 Genomes Project. The genomic variance inflation factor (λ GC ), calculated according to Devlin et al. [52], was 1.0418. We performed the Cochran-Armitage trend test to compare allele and genotype frequencies between cases and controls. The distribution of the p-value was examined by plotting a quantile-quantile (Q-Q) plot. Logistic regression models were performed to evaluate the odds ratio (OR) with 95% confidence intervals (CI). Manhattan and Q-Q plots were generated using the CMplot package (https://CRAN.R-project.org/package=CMplot; accessed on 12 March 2021) [53]. The locus-specific plot was generated using LocusZoom (http://locuszoom.sph.umich.edu/locuszoom/), with recombination rates taken from the East Asian component of the 1000 Genomes Project. Haplotype analysis was performed using SNPStats [54] and Haploview [55].
Variant regulatory region annotation was performed using the SNP2GENE tool from Functional Mapping and Annotation (FUMA) [56]. FUMA annotation includes combined annotation-dependent depletion (CADD) score [57], RegulomeDB score [58], chromatin state, and expression quantitative trait loci (eQTL) data. CADD ranks the pathogenicity of a variant as a C-score ranging from 1 to 99. Any variant with a C-score ≥ 10 is considered to be within the top 10% of deleteriousness substitutions in the human genome [59]. RegulomeDB online database (https://regulomedb.org/; accessed on 27 August 2021) [58] was used to explore the functionality of 17 SNPs on chromosome 2 in tissues most relevant to OA (Supplementary Table S4). It scores SNP functionality according to experimental datasets from the Encyclopedia of DNA Elements (ENCODE) project, Gene Expression Omnibus, and the published literature covering 962 datasets and over 100 tissues and cell lines [58]. SNPs are graded from ranks 1-6, with lower rank indicating the more likely the SNP is to be located within a potentially functional region [58].

Construction of Luciferase Reporter Plasmids
SNPs rs12713768 and rs10208040 (strong linkage disequilibrium (LD); D' = 1, r 2 = 0.651) were further selected for functional validation, as they were mapped to the predicted enhancer region (2:72,240,035-72,241,404, hg19) according to Roadmap DNase hypersensitivity site (DHS) annotations [27]. Five constructs were synthesized and cloned into the pGL3-basic vector by Bio Basic Inc. (Markham, ON, Canada). The pGL3-basic plasmid was digested with SacI and MluI, where the DNA fragment of chromosome 2:72,148,038-72,150,001 (hg19) was ligated to generate the pGL3-basic-P construct. The DNA fragment was excerpted from the promoter region of CYP26B1 (contains the region of ENSR00000118913 but without the region ENSE00001956510) according to Ensembl Database (http://www.ensembl.org/; accessed on 7 August 2020). Four of the pGL3-basic-P constructs were digested with BamHI and SalI. The predicted enhancer sequences with different interest variants of rs12713768 (2:72,240,527, hg19; C or A allele) and rs10208040 (2:72,241,295, hg19; C or T allele) were ligated into these constructs, generating constructs with risk haplotype, pGL3-basic-P-CC, pGL3-basic-P-CT, pGL3-basic-P-AC, and construct with alternative haplotype, pGL3-basic-P-AT. Positive clones were sequence-verified by Sanger sequencing. Renilla luciferase, the pRL-TK vector, was used as an internal control, while pGL3-basic vector was used as a negative control. Plasmid DNA was isolated using PureLink™ HiPure Plasmid Maxiprep Kit (Invitrogen, Waltham, MA, USA).

Cell Cultures and Luciferase Reporter Assay
Human embryonic kidney (HEK) 293T cells were maintained at 37 • C and 5% CO 2 in Dulbecco's modified essential medium, containing 10% FBS, 100 U/mL penicillin, and 100 µg/mL streptomycin antibiotics. Liposome-based transfection was carried out using Lipofectamine 2000 (Invitrogen, Waltham, MA, USA) according to the manufacturer's instructions. HEK293T cells were seeded into 96-well plates at a density of 10,000 cells per well for 24 h. Cells were co-transfected with the same plasmid copy number of each construct DNA and pRL-TK and lysed after 24 h. Luminescence was measured using Firefly & Renilla Luciferase Single Tube Assay Kit (Biotium, Fremont, CA, USA) on an EnSpire™ Multilabel Plate Reader (Perkin Elmer, Waltham, MA, USA). Firefly luciferase activity was normalized to the activity of Renilla luciferase. Three independent experiments, with at least four technical replicates, were performed per construct. The relative luciferase activities are presented regarding the activity of pGL3-basic-P construct being defined as 1.

Statistics for Luciferase Assays
Statistical analysis was conducted in R 4.1.1 (https://www.R-project.org/), and a p-value < 0.05 was considered significant. p-values were calculated using the unpaired two-tailed Student's t-test. Assay figures were plotted using the package ggplot2 [60].