Gene Expression Signature in Endemic Osteoarthritis by Microarray Analysis

Kashin-Beck Disease (KBD) is an endemic osteochondropathy with an unknown pathogenesis. Diagnosis of KBD is effective only in advanced cases, which eliminates the possibility of early treatment and leads to an inevitable exacerbation of symptoms. Therefore, we aim to identify an accurate blood-based gene signature for the detection of KBD. Previously published gene expression profile data on cartilage and peripheral blood mononuclear cells (PBMCs) from adults with KBD were compared to select potential target genes. Microarray analysis was conducted to evaluate the expression of the target genes in a cohort of 100 KBD patients and 100 healthy controls. A gene expression signature was identified using a training set, which was subsequently validated using an independent test set with a minimum redundancy maximum relevance (mRMR) algorithm and support vector machine (SVM) algorithm. Fifty unique genes were differentially expressed between KBD patients and healthy controls. A 20-gene signature was identified that distinguished between KBD patients and controls with 90% accuracy, 85% sensitivity, and 95% specificity. This study identified a 20-gene signature that accurately distinguishes between patients with KBD and controls using peripheral blood samples. These results promote the further development of blood-based genetic biomarkers for detection of KBD.

Therefore, we focused on the peripheral blood of patients with KBD to identify a specific gene signature that could be associated with the early diagnosis and the detection of KBD. In this study, 169 genes were selected as target genes, based on the results of previous genome-wide gene expression profile analysis in articular cartilage and peripheral blood mononuclear cells (PBMCs) from KBD patients [9,12] The expression levels of the 169 target genes in peripheral blood of 100 KBD patients and 100 healthy controls were evaluated via a microarray analysis. Based on the results, a 20-gene signature was identified and validated as a highly sensitive and specific tool for detecting KBD in the peripheral blood.

Differentially Expressed Genes
The microarray system was used to compare the gene expression levels of 169 target genes in the PBMCs from KBD patients versus controls. Fifty genes were identified as differentially expressed (18 up-regulated and 32 down-regulated) in the 100 paired of microarray data sets.

Identification of a 20-Gene Signature
The training set (n = 160) was divided into an inner training set (n = 159) and an inner test set (n = 1). The mRMR algorithm was used to select a subset of n genes which exhibited a maximum relevance to the clinical status of the KBD and also a minimum redundancy within the 50 differentially expressed genes identified by microarray analyses. Next, the SVM model was built in the inner training set by using the linear kernel based on n genes selected by mRMR. Two parameters of SVM, which determined the accuracy of the model, were optimized by the program named gridergression.py. First, the parameters "c" abbreviate from "cost": set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 0.5); second, the parameters "g" abbreviate from "gamma": set gamma in kernel function (default 1/k). Then, the SVM model was used to predict the class of the inner test set. The prediction was dependent on the expression level of the inner test set without knowledge of the true class of the test sample. We compared the predicted class of the sample with the true class label of the inner test set. If they were consistent, the prediction was correct.
Then, a new classification model was built with the new inner training set and the inner test set. This time, another sample was in the inner test set, and all of the other 159 samples were in the inner training set. With the same mRMR algorithm for gene selection, the new SVM model based on the new training set was built, and the gene set selection was not exactly the same as in the previous model. Once again, the SVM model was used to predict the class of the inner test set. If the prediction was consistent with the true class label, then the prediction was correct. The process described above was repeated by leaving each of the 160 samples out of the training set one at a time. Therefore, 160 different SVM models were built, and each model was performed to predict the class of the inner test set. Finally, the number of the correct predictions was summed and considered the LOOCV (leave-one-out cross validation) accuracy rate. The amount of genes selected by the mRMR algorithm could be set as a predefined variable. We began with one gene, and added one gene at a time to n = 50. Then, we calculated the accuracy rate of LOOCV for each n genes and the optimal gene signature was determined by LOOCV performance. Eventually, 8000 different SVM models were built to evaluate the gene signature performance. The classification accuracy was 97.5% when 20 genes were included in the signature, and the accuracy increased when more genes were added to the signature.
Our aim in this study was to identify a high-performing gene signature with compact size, which would be a rational application for future development. Therefore, we identified a 20-gene signature as the optimal size. Finally, the 20-gene signature identified by the mRMR algorithm and the SVM model in the training set was used to discriminate between patients with KBD and the normal controls in the test set (n = 40). The prediction was based on the expression level of the test set without using knowledge of the true KBD class of the sample. The twenty gene expression signature discriminated the KBD patients from the controls with 90% accuracy, 85% sensitivity, and 95% specificity.

Statistical Analysis of the KBD Degree with the 20-Gene Signature
To further elucidate the relationship between signature genes and detection of KBD, Bayes discriminant analysis (BDA) algorithm was performed with the 20-gene signature (Table 1) to determine the KBD degree. Based on the expression ratios of the 20 signature genes, BDA correctly classified the KBD degree (I, II) in 91% (91/100) of the patients. After a LOOCV, BDA correctly classified the KBD degree in 82% (82/100) of the KBD patients (Table 2). In addition, linear correlation analyses showed that none of the 20 genes was found to be differentially expressed based on age. When 100 KBD patients were divided into four age groups and an ANOVA test was performed to observe the interaction between age and gene expression, there were no significant differences in gene expression among the four age groups. a Differentially expressed genes between the KBD vs. controls were assessed using the selection criteria described in Materials and Methods. According to the fold change value, only those genes that showed significant differences (p-value < 0.05) in expression were screened as differential expression genes in KBD and were identified as signature genes are listed; b fold change, the mean and standard error of the mean (SEM) of the fold change in expression of each gene.

Quantitative RT-PCR Analyses
To validate the microarray data, six differentially expressed genes were selected for qRT-PCR analysis using PBMCs samples from an additional 10 KBD patients. The expression levels of ABI3BP, BIRC3, CSGALNACT1, SIGLEC8, SSBP1, and TTC25 in the PBMCs of KBD patients were shown to be significantly different from the normal controls ( Figure 1). Importantly, the changes were consistent with those revealed by the microarray data. in PBMCs of controls and KBD patients.Steady-state mRNA levels were quantitated by two-step SYBR Green real time RT-PCR. The comparative Ct method was used for the calculation. The lines inside the boxes denote the medians. The boxes mark the interval between the 25th and 75th percentiles. The whiskers denote the interval between the 10th and 90th percentiles. The "○"indicates data outside the 10th and 90th percentiles.

Discussion
Microarray analysis was used to identify the gene expression signatures of peripheral blood, which is a technique that has been demonstrated as an effective tool for a number of diseases, including schizophrenia and bipolar disorder [13], coronary artery disease [6], rheumatoid arthritis, juvenile arthritis, spondyloarthropathy [14,15], and osteoarthritis [7]. In this study, we identified a 20-gene signature in peripheral blood samples that accurately discriminates patients with KBD from the controls, as well as the degree of severity of KBD in patients. Our results suggest that these biomarkers may be used as a new method to diagnose early-stage KBD.
According to the national diagnostic criteria of Kashin-Beck disease (WS/T207-2010) in China, the diagnosis and determination of the severity of KBD is primarily based on the symptoms and metaphysis changes observed radiologically. However, KBD is an insidious disease with slow progression, and there are still suspected cases in endemic areas that have not manifested any pathological metaphyseal changes still have the risk of suffering from KBD. (WS/T207-2010) criteria tend to identify advanced KBD cases, but it could not identify patients at early stages. This indicates that the X-ray diagnosis method is much less sensitive for detecting the potential KBD or early stage patients. Therefore, we further emphasize the need for a non-invasive and sensitive early KBD diagnosis. Our results provided a high-performing 20-gene signature, which could be used as a biomarker to discriminate between patients with KBD and the normal controls with 90% accuracy, 85% sensitivity, and 95% specificity. In our test set (n = 40), two samples were diagnosed as KBD by application of the 20-gene signature, while they were diagnosed as normal by using (WS/T207-2010) criteria. Next, we are planning to validate the 20-gene signature in a large sample size of children with KBD as a biomarker diagnostic method to further establish a reliable and stable method of early KBD diagnosis.
The genes identified in this study belong to various functional categories, including metabolism, transcription, ion channel, signal transduction, transport proteins, apoptosis, immunity, growth factors, and receptors, among others. CACNG6 encodes a gamma subunit of a calcium channel, which functions in calcium ion transport across cell membranes. Calcium channels play an important role in the process of osteoblast differentiation into osteocytes and osteoclasts, and both calcium ions and calcium channels are crucial for apoptosis [16]. A number of hormones and cytokines that act directly on osteoblasts are regulated by calcium ions (via calcium channels) [17]. A previous study has shown the relationship between CACNG6 and KBD [18]. We found that CACNG6 was down-regulated in the peripheral blood of the KBD patients, which may reflect the differentiation disorder and apoptosis of chondrocytes observed in KBD patients. BIRC3 is a member of the BIRC family, and apoptosis regulation is one of the important functions of their encoded proteins [19,20]. Furthermore, BIRC2, BIRC3 and TRAF2 had been identified as critical genes in the activation of NF-κB signaling in TNF pathways [21]. At the same time, BIRC3 was observed to be significantly more up-regulated in KBD than control, which means BIRC3 might be one of the critical genes in NF-κB signal changing.
Growth differentiation factor 5 (GDF5), which plays an essential role in skeletal development, is a member of the bone morphogenetic (BMP) gene family and TGF-beta superfamily [22]. It also promotes the development, maintenance, and repair of synovial joint tissues, particularly for bone and cartilage [23]. It has been reported to be associated with OA in the Asian population [24]. A previous study examining the association between GDF5 gene polymorphisms and Kashin-Beck disease found that the polymorphism of haplotype TGC is associated with KBD [25]. In this study, the observed down-regulation of GDF5 indicates it plays an important role in the pathogenesis of KBD.
Among the 20 candidate genes, ABI3BP, CTSC, FZD1, ERH, CSGALNACT1, and SSBP1 were the most highly differentially expressed genes in the cartilage tissue of KBD or OA patients [1,12]. Cathepsin C (CTSC) is a lysosomal amino peptidase and a member of the papain family of cysteine proteinases. Cathepsins are synthesized in the endoplasmic reticulum as pre-proproteins consisting of a signal peptide, and can degrade extracellular matrix proteins, such as collagens [26]. Previous studies have demonstrated that enhancer of rudimentary homolog (ERH) may be a critical transcription regulator that also functions in the control of cell-cycle progression [27]. Single-stranded DNA binding protein 1, mitochondrial (SSBP1) is a housekeeping gene whose expression is regulated in response to signaling pathway that is involved in mitochondrial biogenesis in mammalian cells [28]. It is also a subunit of a single-stranded DNA (ssDNA)-binding complex involved in the maintenance of genome stability [29]. SSBP1 was up-regulated in our study, and KBD patients do exist with mitochondria dysfunction. The reactive oxygen content is increased in articular chondrocytes, and cytochrome c is released from mitochondria to cytoplasm which stimulates apoptosis [30].
Chondroitin sulfate N-acetylgalactosaminyl-transferase-1 (CSGALNACT-1) is one of the enzymes involved in chondroitin sulfate metabolism, which participates in chondroitin sulphate (CS) chain formation of aggrecan [12,31]. It also plays a key role in degeneration of the extracellular matrix of cartilage and in chondrogenesis at early developmental stages [32]. Additionally, some animal model research suggests that CSGALNACT-1 is important to endochondral ossification [33]. Down-regulation of this gene has been identified in the cartilage of KBD and OA patients, and it may also participate in the Wnt/β-catenin signaling pathway [14,34]. The same results were found in peripheral blood. These results together indicate that CSGALNACT-1gene expression is likely similar in cartilage and in PBMCs of KBD patients.
Protein products of frizzled genes are cell membrane receptors for Wnt proteins, which play multiple roles during development [35]. Furthermore, the Wnt/β-catenin signaling pathway is vital for development, growth, and homeostasis of the joints and the skeleton [36]. This study found that the gene FZD1 was down-regulated in the peripheral blood of KBD patients, while both FZD1 and FZD10 were up-regulated in the cartilage of KBD patients [12]. Although the expression was inconsistent, the alteration of FZD1 may contribute to the development of KBD through the Wnt/β-catenin signaling pathway. FZD1 and the Wnt/β-catenin signaling pathway could be the next targets in KBD studies. These genes may be worthy of further study regarding how alteration of cartilage in KBD influences changes in blood on a molecular level.
Microarrays have been used to predict and differentiate between benign and malignant tumors, as well as the size and degree of tumors, using gene expression ratios [37][38][39]. Using this technology, the identified gene signatures were able to accurately discriminate KBD patients from the controls. In this study, we reclassified the degree of severity of KBD using expression ratios of 20 candidate genes via discrimination analysis. The KBD patients could be accurately identified on a molecular level. Microarray analysis can thus provide greater information in cases in which it is difficult to classify or distinguish between the clinical types of KBD. The additional evidence that these 20 genes may be considered early diagnostic biomarkers of KBD is that the expression of candidate genes was not altered by age according to linear correlation analyses. Furthermore, our group performed another microarray experiment in PBMCs of juvenile KBD using the 20 signature genes. It was demonstrated that these 20 genes are also differentially expressed in PBMCs of children with KBD [40].
KBD and primary OA display similar clinical manifestations and pathologic changes in the articular cartilage [1]. However, KBD occurs mostly in children ages 3-12 years and becomes symptomatic in adults [41], Osteoarthritis, an age-related disease of the joints, is the most common form of arthritis, and is a leading cause of the disability in an aging western population [42,43]. Therefore, it is important to study the two types of osteoarthropathy comparatively. Previously, Duan et al. found a number of differentially expressed genes in the comparative analysis of gene expression profiles between KBD and OA in cartilage [1]. Marshall et al. found nine blood-based biomarkers, which could accurately discriminate the mild OA patients from normal controls [7], and were completely different from our 20-gene signature. Therefore, the 20-gene signature we identified might be considered a method of differential diagnosis between KBD and OA at early stages.

Ethics Statement
This study was approved by the Institutional Review Board (IRB) of Xi'an Jiaotong University (Xi'an, China). All the participants gave their informed consent by signing a document that had been carefully reviewed by the IRB of Xi'an Jiaotong University.

Patients and Study Design
All subjects were randomly chosen from Yong-shou and Lin-you counties in Shaanxi province, which are the endemic areas of KBD, with a prevalence of 20.4% and 10.5%, respectively. Patients were diagnosed according to the national diagnostic criteria of Kashin-Beck disease (WS/T207-2010) in China (Table 3). The KBD patients were classified as having a first-degree (I) KBD manifestation due to multiple and symmetrical enlargements of phalanges or other joints, limitation of motion, arthralgia, mild muscle atrophy, and thickening of other limb joints. Patients with the symptoms of first degree KBD with additional brachydactylia were diagnosed as second-degree (II) KBD. The KBD patients were compared with age and sex-matched healthy controls without symptomatic and clinical KBD according to the criteria of (WS/T207-2010). All of patients and controls with other types of osteoarthropathy and other chronic diseases, such as cardiovascular disease, diabetes and hypertension, were excluded. According to the inclusion and exclusion criteria of the sample selection described above, 100 patients (50 degree I and 50 degree II KBD patients) and 100 healthy controls were chosen for this study. These samples were divided into 100 pairs, each consisting of one KBD patients and one healthy control person, who was age and sex-matched. All subjects were of Chinese Han lineage. The microarray analyses were performed for each of the 100 pairs of the samples. Then, the 100 KBD patients and 100 healthy controls were completely randomized and divided into training and test sets. The training set consisted of 80 patients with KBD and 80 controls. In the test set, an independent cohort of 20 patients with KBD and 20 controls was used. The microarray analyses of the 100 paired samples were used to evaluate the expression level of 169 target genes. Then, the expression levels of target genes were used to establish a statistical model based on the SVM algorithm to predict KBD and healthy condition. The statistical model established in the training set was applied to the test set to validate the ability of the gene signature to identify the KBD patients.
An independent cohort of 5 patients with KBD and 5 controls (Table 4) was used to evaluate the validity of the obtained microarray data through parallel analysis of selected transcripts using a quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). Figure 2 depicts the different components of the study design, and Tables 3 and 4 summarize the clinical characteristics of the samples in the study. Table 4. Characteristics of the patients with KBD and the controls used for quantitative RT-PCR analysis.

Selection of the Target Genes
In this study, 169 genes to be used in a custom-made microarray were selected as target genes from 2 databases of previously published microarray analyses [9,12]. In those studies, genes with a >2 or <0.5-fold-change in gene expression level obtained from KBD and healthy persons and a p value <0.05 and biologic function related to cartilage or selenium were selected. Finally, the 169 genes refined by application of selection criteria above were selected as target genes (The list of 169 target genes can be found in the online supplementary materials).

Blood Collection and the PBMC Isolation
Peripheral blood (4 mL) from each subject was collected into heparinized Vacutainer ® tubes (Becton Dickenson, San Jose, CA, USA) for the gene expression analysis. Leukocyte cell numbers were determined using a Hemovet 950 (Drew Scientific, Oxford, CT, USA). The PBMCs were separated from plasma by centrifuging the blood at 1500× g for 20 min. The cell pellets were resuspended in Hanks' balanced salt solution (Gibco BRL/Invitrogen, Carlsbad, CA, USA). The cell suspensions were layered over 5 mL of Lympholyte-H (Cedarlane Labs, Hornby, BC, Canada) in a 15-mL Falcon tube and were centrifuged at 1500× g for 40 min. After rinsing twice with cold Hanks' balanced salt solution, the cells were stored in RNAlater ® (Ambion Inc., Austin, TX, USA) until RNA isolation. Figure 2. Study design. The gene expression in the peripheral blood of 100 patients with KBD and 100 controls were investigated in different parts. We first used previously published experiment data on gene expression profiles of adult articular cartilage and PBMCs with KBD to select the target genes that were significantly differentially expressed between patients with KBD and controls. One hundred pairs of microarray were then applied to evaluate the expression of the target genes to identify the differentially expressed genes based on a large population. A gene expression signature was identified using a training set (n = 160) and validated using an independent test set (n = 40).

RNA Preparation
Total RNA was isolated from the PBMCs using Trizol ® reagent (Life Technologies Inc., Carlsbad, CA, USA) following the manufacturer's recommended protocol. The quality and integrity of the extracted total RNA were determined using a high-resolution electrophoresis system (Agilent 2100 Bioanalyzer, Agilent Technologies, Palo Alto, CA, USA).

Microarray Hybridization
The isolated total RNA from all KBD patients and healthy controls were transcribed into complementary DNA (cDNA) and were then reverse-transcribed into cRNA and labeled with CyDye using the Amino Allyl Message Amp™ a RNA Amplification Kit (Ambion) according to the manufacturer's instructions. Before reverse transcription, RNase-free DNase I was used to remove residual genomic DNA from the total RNA. For each sample, 0.5 µg of labeled cRNA was purified separately and then mixed with hybridization buffer before applying to microarrays. The custom-made primer array, which contains 169 oligonucleotide probes representing the selected 169 human genes (manufactured by the National Engineering Research Center for Miniaturized Detection System in Xi'an, China), was used to perform microarray hybridization following the recommended protocol. This is a two-color microarray that uses a two-channel labeling system. The control cRNA was labeled with Cy3, and the KBD cRNA was labeled with Cy5. The fluorescent spots that failed to pass the quality control procedure were not used for further analysis.

Gene Expression Analysis
Microarray slides were scanned with Gene-Pix 4000B (Axon Instruments Inc., Foster City, CA, USA), and GenePix Pro 3.0 software (Axon Instruments Inc.) was used to analyze the 16-bit TIFF images produced by the scanner. The fluorescent signal intensity was considered gene expression data. Ratio images for all of the spots were defined by accessing the gene list file describing the microarray location of each gene. After the average signal intensity was subtracted from the median back intensity, the gene expression data were imported into Excel spreadsheets (Microsoft Corp., Redmond, WA, USA) using Unigene and GenBank descriptors. A global normalization was conducted to calculate the relative expression levels between two samples using all of the detected genes. Each gene was identified as up-regulated or down-regulated if the fold change was more than two-fold or less than 0.5-fold, respectively. The fold change value is the ratio of the signal intensity of the KBD sample compared to that of the control sample in each pair. Significant differences in gene expression between the KBD and the control samples were determined by Student's t-test. p-values were calculated by the standard combinatorial approach and then adjusted for multiple testing by the Bonferroni method. p-values less than 0.05 were considered statistically significant. All data have been deposited in NCBI GEO and are accessible through GEO series accession No. GSE59446 (http://www.ncbi.nlm.nih.gov/ geo/query/acc.cgi?acc=GSE59446).

Statistical Analysis
The minimum redundancy maximum relevance (mRMR) algorithm was used for gene selection [44]. A library for support vector machine (LIBSVM) algorithm, a linear classification, was used for establishing a statistical model and classification [45], and leave-one-out cross validation (LOOCV) was conducted for the sampling. LOOCV was applied in this analysis. The Statistical Package for the Social Science for Windows version 18.0 (SPSS 18.0, IBM, Armonk, NY, USA) was applied to identify whether a correlation exists between age and gene expression by a linear correlation and ANOVA test. The Bayes discriminant analysis (BDA) algorithm was used to evaluate the accuracy of the KBD degree (I, II) identification with the expression ratio of the gene signature by using the SPSS 18.0 program. For the statistical analyses of quantitative real-time RT-PCR (qRT-PCR), the Mann-Whitney Wilcoxon Test was performed to determine the significance of level-of-expression differences for the selected genes between KBD and healthy controls.

Quantitative Real-Time PCR Validation
Quantitative real-time PCR was used to confirm the expression levels of 6 genes, which were the 4 most significantly up-regulated and the 2 most significantly down-regulated. Total RNA was isolated from an additional 10 subjects and prepared as for oligonucleotide microarray analysis, and then qRT-PCR was performed. The RNA was then converted into complementary DNA (cDNA) using Superscript II reverse transcriptase (Invitrogen, Karlsruhe, Germany) and random primers. QRT-PCR was performed using the ABI7500 Real-Time PCR system (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. The primer sequences are listed in Table 5

Conclusions
This study describes the development of a blood-based, 20-gene signature that differentiates patients with KBD from controls with a high degree of accuracy in a large number of participants. These results provide a basis for further development of blood-based gene expression biomarkers for the detection of KBD. This first step is vital for identifying molecular markers for the early diagnosis of KBD.