Molecular Subtype Classification of Postmenopausal Osteoporosis and Immune Infiltration Microenvironment Based on Bioinformatics Analysis of Osteoclast-Regulatory Genes

Osteoporosis is common in postmenopausal women but is often asymptomatic until a fracture occurs, highlighting the importance of early screening and preventive interventions. This study aimed to develop molecular subtype risk stratification of postmenopausal osteoporosis and analyze the immune infiltration microenvironment. Microarray data for osteoporosis were downloaded and analyzed. Logistic and least absolute shrinkage and selection operator (LASSO) regression analyses were used to construct the molecular risk model. Circulating blood samples were collected from 10 enrolled participants to validate the key differentially expressed genes, and consistent clustering based on the expression profiles of candidate genes was performed to obtain molecular subtypes. Three key genes, CTNNB1, MITF, and TNFSF11, were obtained as variables and used to construct the risk model. External experimental validation showed substantial differences in the three key genes between patients with osteoporosis and the controls (p < 0.05). Three subtypes were obtained based on dimensionality reduction clustering results. Cluster 3 had significantly more patients with low bone mineral density (BMD), whereas Cluster 2 had significantly more patients with high BMD (p < 0.05). This study introduced a novel molecular risk model and subtype classification system, which is an evidence-based screening strategy that will guide the active prevention, early diagnosis, and treatment of osteoporosis in high-risk postmenopausal women.


Introduction
Osteoporosis is a systemic, aging-related, skeletal disease characterized by low bone mass and microstructural destruction [1].In clinical practice, osteoporosis is diagnosed as a bone mineral density (BMD) T-score of −2.5 or less [2].Osteoporosis leads to increased bone fragility and greater susceptibility to fractures, thereby adversely affecting orthopedic surgery because of the high incidence of implant failure [3].Epidemiological studies have shown that the prevalence of osteoporosis in postmenopausal women is as high as 32.1% [4].In addition, owing to its age-related characteristics and the aging of the global population, the number of patients with osteoporosis is expected to increase rapidly, resulting in higher complication rates, mortality, and medical burden [5,6].
Osteoporosis is common in postmenopausal women but is often asymptomatic until fractures occur.From the perspective of disease characteristics and cost-effectiveness, early screening and preventive interventions are required [7,8].There are many options for treating osteoporosis, including anti-resorptive and anabolic medications, as well as some natural compounds [9].However, knowledge of screening and prevention has not developed at the same rate as diagnosis and treatment advances in recent years.Dualenergy X-ray absorptiometry (DXA) of the hip and lumbar spine is the most widely used tool for the diagnosis of osteoporosis, and is the recommended test of BMD screening for all women aged 65 years or older by the US Preventive Services Task Force and other professional societies [10,11].However, in practice, it is difficult to screen the entire atrisk population using DXA without triage tests [10].Several epidemiological studies have developed clinical decision-making tools for osteoporosis risk assessment to screen postmenopausal women for BMD measurements [12][13][14].However, there is still room for these decision-making tools based solely on clinical information when it comes to their sensitivity and specificity [15].
Both genetic and environmental factors contribute to the development of osteoporosis, but compared with numerous previously reported clinical tools, risk assessment models of osteoporosis based on genetic information are rare [16].Since the clinical importance of the genetic determination of osteoporosis is incontestable, a study of molecular risk models based on regulation genes can be a timely and interesting contribution.Osteoclasts play an important role in maintaining the balance of bone metabolism and have been extensively studied in osteoporosis [17].Previous studies have reported a few genetic markers associated with osteoporosis [18,19].Comparisons between healthy and diseased individuals at the transcript level facilitate the identification of differentially expressed genes (DEGs), which may indicate disease causes or consequences and other factors correlated with the disease under scrutiny [20].In the present study, we aimed to further stratify the risk of osteoporosis in postmenopausal women based on DEGs.Although the immune infiltration microenvironment is closely related to the development of osteoporosis, little is known about its relationship with patients at different risk levels [21].Through this study, we also intended to speculate on the differences in the immune infiltration microenvironment in patients with different molecular subtypes.
To address these needs, in this study, we constructed a novel molecular risk model and molecular subtypes in postmenopausal women by identifying DEGs in osteoclasts.Such evidence-based screening strategies will guide the active prevention, early diagnosis, and treatment of osteoporosis for high-risk postmenopausal women.

Public Dataset Source and Processing
GSE56815, a microarray of osteoporosis data of circulating monocytes, was downloaded from the Gene Expression Omnibus (GEO) using the R package "GEOquery", comprising a total of 80 samples from 40 female patients with high BMD and 40 with low BMD.The data were pre-processed as follows: the probes corresponding to genes were retained and the no-load probes were removed; multiple probes corresponding to the same gene were then selected as average gene expression levels.The functions and corresponding gene sets related to osteoclast regulation were obtained from the Molecular Signatures Database (MSigDB).The gene expression profile data for these osteoclast-regulatory genes were selected from the GSE56815 dataset for further differential analysis.

Participants
This study was approved by the institutional review board of Honghui Hospital (no.202206029) and was performed according to the Helsinki Declaration of 1975, as revised in 2000.Written consent was obtained from all participants included in the study.Ten participants were enrolled in this study, with five each in the osteoporosis and control groups.Circulating blood was collected from all participants.These participants underwent spine surgery for degenerative lumbar disease in the hospital and were evaluated by a senior osteoporosis specialist.Details of the participants are provided in the Results section.All patients completed the Oswestry Disability Index and Karnofsky Performance Status scales at the time of enrollment to minimize the influence of the external factors of daily life on BMD.All patients were evaluated by a senior osteoporosis specialist for primary osteoporosis.

Identification of DEGs and Construction of the Molecular Risk Model
The "limma" package in R software (version 4.1.1,R Foundation for Statistical Computing, Vienna, Austria) was used to analyze the DEGs between high-and low-BMD samples in the GSE56815 dataset.Heatmaps and boxplots of DEGs were produced using R software (version 4.1.1)and the packages "pheatmap" and "ggplot".Univariate and multivariate logistic regression analyses were conducted to identify candidate genes related to BMD among the DEGs.The LASSO regression model reduces the dimensionality of the data to reduce noise or redundant genes.Candidate genes were subjected to LASSO regression analysis using the R package "glmnet".We used family = "binomial" to fit the model, and the most effective candidate genes and the optimal value of the penalization λ were determined via 10-fold cross-validation.The risk score of each patient with osteoporosis was calculated using the corresponding regression coefficients and the expression levels of the candidate genes.The calculation formula is as follows: where coe f i is the LASSO regression coefficient and expr i is the expression value of the gene.We divided the patients with osteoporosis into high-and low-risk groups based on the median value.The area under the receiver operating characteristic curve was calculated using the R package "pROC" to evaluate the prediction performance.

External Experimental Validation
Peripheral blood mononuclear cells (PBMCs) were isolated from the blood sample of each patient.Total RNA from PBMCs was extracted using TRIzol reagent (Ambion; Thermo Fisher Scientific, Waltham, MA, USA) following standard procedures.An optical density ratio at 260 nm/208 nm of 1.8-2.0satisfied the experimental requirements.RNA (2 µg) from each sample was reverse-transcribed into cDNA using 5× HiScript II Select qRT SuperMix II (VAZYME, Nanjing, China).qPCR was performed with SYBR Green Master Mix (VAZYME) using an ABI QuantStudio 6 Real-Time System (Applied Biosystems, Waltham, MA, USA), according to the standard protocols, programmed to perform 42 cycles in total.Specific primers for mRNAs were synthesized by Tsingke Biotechnology (Beijing, China).Relative mRNA transcript levels were normalized to those of β-actin.The primer sequences are as follows: CTNNB1 forward CCAAGTGGGTGGTATAGAGG, reverse AGTCCATAGTGAAGGCGAAC (156 bp); MITF forward CCAAAGAGAGGCA-GAAAAAGGA, reverse CGTGGATGGAATAAGGGAAAGTC (311 bp); TNFSF11 forward ATCTGGTTCCCATAAAGTGAG, reverse CGAAAGCAAATGTTGGCATA (141 bp); and β-actin forward CCCTGGAGAAGAGCTACGAG, reverse CGTACAGGTCTTTGCGGATG (180 bp).The expression level of each mRNA was calculated using the 2 − Ct method.

Abundance of Infiltrating Immune Cells
The CIBERSORT algorithm uses the deconvolution method to extract features from RNA-sequencing data and inversely calculates the proportions of various cellular components in bulk-seq.The R script "CIBERSORT" was used to detect the abundance of 22 immune cells in all samples.

Molecular Subtypes of Osteoporosis Samples
Consensus clustering is an algorithm that can be used to identify the members of clusters and their numbers in a dataset.We used the R package "ConsensusClusterPlus" to perform consistent clustering of samples based on the expression profiles of candidate genes to obtain three molecular subtypes.

Functional Enrichment Analysis and Gene Set Variation Analysis
The R package "clusterProfiler" was used to perform functional enrichment analysis for DEGs of each subtype, and the significantly enriched pathways and functions were screened using a threshold of p < 0.05.We downloaded the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway gene set from MSigDB and used the R package "GSVA" to calculate the pathway enrichment scores for each sample.Gene set variation analysis is a non-parametric, unsupervised method for estimating enriched variation in gene sets from samples of expression datasets.The gene set file "c2.cp.kegg.v7.5.1.symbols", containing 184 key genes, was downloaded from MSigDB.

PPI Network Construction and Topology Feature Analysis
We used the STRING database to map the PPI networks of 26 DEGs on human protein interaction networks in the database, and then, reconstructed the network using Cytoscape (version 3.9.1).The MCODE plugin was used to detect important co-regulation modules for sub-network analysis.The cytoHubba plugin was used to compute hub nodes.

Quantification and Statistical Analyses
All statistical analyses were performed using R, version 4.1.1(R Foundation for Statistical Computing, Vienna, Austria).The Wilcoxon test was used to calculate the statistical significance of two groups of variables, and the Kruskal-Wallis test was used to calculate the statistical significance of multiple groups of variables.Spearman's correlation coefficient was used to analyze the correlation between two groups of variables.Differences in clinical characteristics between the molecular subtypes were determined using the Chi-square test.

Functions of Osteoclast-Regulatory Genes
Functions related to osteoclast regulation were downloaded from MSigDB, and nine functions related to osteoclast regulation were obtained, comprising five osteoclast differentiation-related functions, three osteoclast development-related functions, and one osteoclast proliferation-related function.These functions included 100 genes related to osteoclast regulation, comprising 83, 13, and 4 genes regulating osteoclast differentiation, development, and proliferation, respectively.Figure 1A illustrates the location of osteoclastregulatory genes on the chromosome.We mapped the osteoclast-regulatory genes in the STRING database to obtain the interaction network of these genes.The majority of the genes were related to osteoclast differentiation, whereas the osteoclast proliferationrelated genes were in the minority.Moreover, there was close interaction between the three gene groups.There was an interaction between proliferation-related genes and a large number of differentiation-related genes.In particular, this analysis suggested that development-related SRC genes may play an important role in osteoclast differentiation (Figure 1B).

Expression Levels and Correlation of Osteoclast-Regulatory Genes
The gene expression profile data of 96 osteoclast-regulatory genes were selected from the GEO GSE56815 dataset based on a microarray of standardized osteoporosis data of circulating monocytes, comprising a total of 80 samples from 40 female patients with high BMD and 40 with low BMD.Sixteen DEGs were found between the high-and low-BMD groups, among which six genes (LRRK1, ANXA2, POU4F2, CTSK, TNFSF11, and IFNAR1) were substantially overexpressed in the low-BMD samples and 10 genes (JUNB, TNF, GAB2, CCR1, IREB2, TOB2, CTNNB, GPR183, MITF, and LTF) were substantially overexpressed in the high-BMD samples (p < 0.05) (Figure 2).The locations of genes related to osteoclast regulation on chromosomes.(B) Protein-protein interaction network of three types of genes related to the regulation of osteoclasts and the proportion of each type of gene: osteoclast development-related genes (black), osteoclast differentiation-related genes (orange), and osteoclast proliferation-related genes (blue).

Expression Levels and Correlation of Osteoclast-Regulatory Genes
The gene expression profile data of 96 osteoclast-regulatory genes were selected from the GEO GSE56815 dataset based on a microarray of standardized osteoporosis data of circulating monocytes, comprising a total of 80 samples from 40 female patients with high BMD and 40 with low BMD.Sixteen DEGs were found between the high-and low-BMD groups, among which six genes (LRRK1, ANXA2, POU4F2, CTSK, TNFSF11, and IFNAR1) were substantially overexpressed in the low-BMD samples and 10 genes (JUNB, TNF, GAB2, CCR1, IREB2, TOB2, CTNNB, GPR183, MITF, and LTF) were substantially overexpressed in the high-BMD samples (p < 0.05) (Figure 2).Spearman's correlation coefficient was used to analyze the correlation between the 16 differentially expressed osteoclast-regulatory genes.TOB2 expression negatively correlated with TNFSF11 expression (R = −0.37,p < 0.01), whereas TNF expression positively correlated with JUNB expression (R = 0.63; p < 0.001) (Figure 3).These four genes are associated with osteoclast differentiation.TOB2 belongs to the antiproliferative protein family and is involved in the regulation of cell cycle progression.TNF is involved in the regulation of a wide range of biological processes, including cell proliferation, differentiation, apoptosis, and lipid metabolism.JUNB is involved in the positive regulation of transcription by RNA polymerase II.TNFSF11 is a key factor in osteoclast differentiation and activation.
Biomedicines 2023, 11, x FOR PEER REVIEW 7 of 20 Spearman's correlation coefficient was used to analyze the correlation between the 16 differentially expressed osteoclast-regulatory genes.TOB2 expression negatively correlated with TNFSF11 expression (R = −0.37,p < 0.01), whereas TNF expression positively correlated with JUNB expression (R = 0.63; p < 0.001) (Figure 3).These four genes are associated with osteoclast differentiation.TOB2 belongs to the antiproliferative protein family and is involved in the regulation of cell cycle progression.TNF is involved in the regulation of a wide range of biological processes, including cell proliferation, differentiation, apoptosis, and lipid metabolism.JUNB is involved in the positive regulation of transcription by RNA polymerase II.TNFSF11 is a key factor in osteoclast differentiation and activation.

Molecular Risk Model for Osteoporosis
Univariate logistic regression was performed using BMD as the dependent variable and the expression level of each differentially expressed osteoclast-regulatory gene as the independent variables.Fifteen genes were found to be statistically significant, among which CCR1, CTNNB1, GAB2, GPR183, IREB2, JUNB, LTF, MITF, TNF, and TOB2 emerged as protective factors, whereas CTSK, ANXA2, LRRK1, POU4F2, and TNFSF11 emerged as risk factors (p < 0.05) (Figure 4A).
Least absolute shrinking and selection operator (LASSO) regression analysis was performed on these genes to determine the coefficient change of each feature according to the lambda (λ) change; the λ and log (λ) values were set to 0.14 and −1.97, respectively (Figure 4C,F).Three of the fifteen genes, CTNNB1, MITF, and TNFSF11, were selected in LASSO regression to construct the risk model (risk score = Exp (CTNNB1) × (−1.167227) + Exp (MITF) × (−5.192496) + Exp (TNFSF11) × (12.019122)).The key roles of these three genes in osteoporosis were verified using multivariate logistic regression, with CTNNB1 (OR = 3.501 × 10 −7 , p = 0.016) and MITF (OR = 2.475 × 10 −14 , p = 0.002) as protective factors and TNFSF11 (OR = 9.245 × 10 22 , p = 0.003) as a risk factor (Figure 4B).These results indicated that CTNNB1, MITF, and TNFSF11 are of great significance in the occurrence and development of osteoporosis.Indeed, patients with low BMD had significantly higher risk scores from the model and patients with high BMD had lower risk scores (p < 0.001) (Fig- ure 4D).Finally, the receiver operating characteristic curve of all samples demonstrated the excellent diagnostic performance of our risk model (area under the curve = 0.8306; Figure 4E).Least absolute shrinking and selection operator (LASSO) regression analysis was performed on these genes to determine the coefficient change of each feature according to the lambda (λ) change; the λ and log (λ) values were set to 0.14 and −1.97, respectively (Figure 4C,F).Three of the fifteen genes, CTNNB1, MITF, and TNFSF11, were selected in LASSO regression to construct the risk model (risk score = Exp (CTNNB1) × (−1.167227) + Exp (MITF) × (−5.192496) + Exp (TNFSF11) × (12.019122)).The key roles of these three genes in osteoporosis were verified using multivariate logistic regression, with CTNNB1 (OR = 3.501 × 10 −7 , p = 0.016) and MITF (OR = 2.475 × 10 −14 , p = 0.002) as protective factors and TNFSF11 (OR = 9.245 × 10 22 , p = 0.003) as a risk factor (Figure 4B).These results indicated that CTNNB1, MITF, and TNFSF11 are of great significance in the occurrence and development of osteoporosis.Indeed, patients with low BMD had significantly higher risk scores from the model and patients with high BMD had lower risk scores (p < 0.001) (Figure 4D).Finally, the receiver operating characteristic curve of all samples demonstrated the excellent diagnostic performance of our risk model (area under the curve = 0.8306; Figure 4E).

External Experimental Validation of the Key Genes in the Osteoporosis Risk Model
Ten patients were enrolled in this study for experimental validation, including five with and five without osteoporosis (see Table 1 for the inclusion and exclusion criteria).The average age at enrollment was 62.2 ± 7.7 years, and the average time after menopause was 15.8 ± 6.3 years.The average BMD T-values in the osteoporosis group were −3.1 ± 0.4 (spine) and −3.4 ± 1.0 (hip), and those in the control group were −1.3 ± 0.8 (spine) and −1.1 ± 0.8 (hip).There was no significant difference between the control and osteoporosis groups in terms of body mass index (26.7 ± 5.3 vs. 23.6 ± 3.5; p = 0.309), serum calcium level (2.3 ± 0.1 vs. 2.3 ± 0.0; p = 0.633), serum phosphate level (1.3 ± 0.1 vs. 1.2 ± 0.2; p = 0.369), Oswestry Disability Index (0.3 ± 0.2 vs. 0.5 ± 0.1; p = 0.087), and Karnofsky Performance Status (82.0 ± 11.0 vs. 72.0± 4.5; p = 0.114).The quantitative polymerase chain reaction (qPCR) of osteoclast regulation genes selected in the risk model in these patients showed consistent results with the analysis of public microarray data.TNFSF11 was significantly upregulated (p = 0.001), whereas CTNNB1 (p = 0.001) and MITF (p = 0.017) were significantly downregulated, in the osteoporosis group (Figure 5).

Associations of Osteoclast-Regulatory Genes with the Immune Microenvironment
After dividing the patients into high-and low-risk groups according to the median value of the risk score, we calculated the abundance of immune cells in the two groups.The low-risk group had substantially more CD8 + T cells and the high-risk group had substantially more natural killer (NK) cells (Figure 6A).The expression of TNFSF11 positively correlated with the abundance of M1-subtype macrophages (R = 0.25; p = 0.028).The expression of MITF was significantly positively correlated with CD8 + T cells (R = 0.28; p = 0.012) and negatively correlated with resting NK cells (R = −0.24;p = 0.032).The expression of CTNNB1 was positively correlated with memory B cells (R = 0.256; p = 0.022) and negatively correlated with naive B cells (R = −0.273;p = 0.014) (Figure 6B).

Associations of Osteoclast-Regulatory Genes with the Immune Microenvironment
After dividing the patients into high-and low-risk groups according to the median value of the risk score, we calculated the abundance of immune cells in the two groups.The low-risk group had substantially more CD8 + T cells and the high-risk group had substantially more natural killer (NK) cells (Figure 6A).The expression of TNFSF11 positively correlated with the abundance of M1-subtype macrophages (R = 0.25; p = 0.028).The expression of MITF was significantly positively correlated with CD8 + T cells (R = 0.28; p = 0.012) and negatively correlated with resting NK cells (R = −0.24;p = 0.032).The expression of CTNNB1 was positively correlated with memory B cells (R = 0.256; p = 0.022) and negatively correlated with naive B cells (R = −0.273;p = 0.014) (Figure 6B).

Molecular Subtypes Mediated by Osteoclast-Regulatory Genes
Based on the osteoclast-regulatory genes (CTNNB1, MITF, and TNFSF11) in the risk model, we used the R package "ConsensusClusterPlus" to cluster the osteoporosis patients (Figure 7A).Based on the cumulative distribution function and delta area plots, we found that K = 3 was a suitable value; therefore, we divided patients with osteoporosis into three subtypes (Figure 7B,C).Dimensionality reduction clustering showed that the patients in clusters 2 and 3 grouped into one class, with Cluster 1 mainly distributed at the bottom, demonstrating heterogeneity among subtypes (Figure 7D).Additionally, the expression patterns of osteoclast-regulatory genes in the three subtypes were characterized (Figure 7E,F).The risk gene TNFSF11 was highly expressed in Cluster 3, whereas the protective genes CTNNB1 and MITF were highly expressed in clusters 1 and 2, respectively.Finally, we characterized the clinical characteristics across molecular subtypes (Figure 7G).Cluster 3 had considerably more patients with osteoporosis with low bone density who were classified in the high-risk group, whereas Cluster 2 had considerably more patients with high bone density and those classified in the low-risk group, according to our model.The Wilcoxon test was used to determine the significance of immune cells and related genes bet high-and low-risk groups.Spearman's correlation coefficient was used to the correla ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.

Molecular Subtypes Mediated by Osteoclast-Regulatory Genes
Based on the osteoclast-regulatory genes (CTNNB1, MITF, and TNFSF11) in the model, we used the R package "ConsensusClusterPlus" to cluster the osteoporosi tients (Figure 7A).Based on the cumulative distribution function and delta area plot found that K = 3 was a suitable value; therefore, we divided patients with osteopo into three subtypes (Figure 7B,C).Dimensionality reduction clustering showed tha (Figure 7G).Cluster 3 had considerably more patients with osteoporosis with low bone density who were classified in the high-risk group, whereas Cluster 2 had considerably more patients with high bone density and those classified in the low-risk group, according to our model.

Functional Analysis of Different Molecular Subtypes
The differential analysis of pathway enrichment scores identified 27 significantly different pathways between subtypes.We found that primary immunodeficiency and some metabolism-related pathways, including cysteine and methionine metabolism and glycerophospholipid metabolism, were mainly enriched in Cluster 2, indicating that this cluster has metabolism-related characteristics.Additionally, the hedgehog signaling and steroid hormone biosynthesis pathways were mainly enriched in Cluster 3 (Figure 9).

Functional Analysis of Different Molecular Subtypes
The differential analysis of pathway enrichment scores identified 27 significantly different pathways between subtypes.We found that primary immunodeficiency and some metabolism-related pathways, including cysteine and methionine metabolism and glycerophospholipid metabolism, were mainly enriched in Cluster 2, indicating that this cluster has metabolism-related characteristics.Additionally, the hedgehog signaling and steroid hormone biosynthesis pathways were mainly enriched in Cluster 3 (Figure 9).

DEGs and Functional Analysis of Molecular Subtypes
Gene expression profiles were analyzed for differential expression between each subtype using the R package "limma".A total of 17, 11, and 20 DEGs (p < 0.05) were identified in clusters 1, 2, and 3, respectively (Figure 10A).Functional enrichment analysis of these DEGs using the R package "clusterprofiler" showed that Cluster 1 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and Toll-like receptor signaling (Figure 10B); Cluster 2 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and phagosome maturation (Figure 10C); and Cluster 3 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and Rap1 signaling (Figure 10D).

DEGs and Functional Analysis of Molecular Subtypes
Gene expression profiles were analyzed for differential expression between each subtype using the R package "limma".A total of 17, 11, and 20 DEGs (p < 0.05) were identified in clusters 1, 2, and 3, respectively (Figure 10A).Functional enrichment analysis of these DEGs using the R package "clusterprofiler" showed that Cluster 1 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and Toll-like receptor signaling (Figure 10B); Cluster 2 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and phagosome maturation (Figure 10C); and Cluster 3 was mainly enriched in osteoclast differentiation, myeloid cell differentiation, and Rap1 signaling (Figure 10D).

Potential Drug Targets Identified in the Protein-Protein Interaction (PPI) Network
The upregulated genes in each subtype (25 upregulated genes in total) were mapped using the STRING database and the PPI network was reconstructed using Cytoscape (Figure 11A).CTSK, CSF1, TNFSF11, CTNNB1, TNFRSF11A, and TNF formed a key sub-network (Figure 11B), and were screened as key genes via cytoHubba according to the node degree of the network (Figure 11C).Therefore, these six genes were selected as the hub node, among which TNF and CTNNB1 were the upregulated genes of Cluster 1, and CTSK, CSF1, TNFSF11, and TNFRSF11A were the upregulated genes of Cluster 3. The hub node was used as a query in DGIdb (https://dgidb.genome.wustl.edu/accessed on 4 September 2023), which obtained 100 gene-drug interaction pairs, including 29 gene-drug interaction pairs with an interaction score >1 (Figure 11D).

Potential Drug Targets Identified in the Protein-Protein Interaction (PPI) Network
The upregulated genes in each subtype (25 upregulated genes in total) were mapped using the STRING database and the PPI network was reconstructed using Cytoscape (Figure 11A).CTSK, CSF1, TNFSF11, CTNNB1, TNFRSF11A, and TNF formed a key subnetwork (Figure 11B), and were screened as key genes via cytoHubba according to the node degree of the network (Figure 11C).Therefore, these six genes were selected as the hub node, among which TNF and CTNNB1 were the upregulated genes of Cluster 1, and CTSK, CSF1, TNFSF11, and TNFRSF11A were the upregulated genes of Cluster 3. The hub node was used as a query in DGIdb (https://dgidb.genome.wustl.edu/accessed on 4 September 2023), which obtained 100 gene-drug interaction pairs, including 29 gene-drug interaction pairs with an interaction score >1 (Figure 11D).

Discussion
In this study, we detected three key genes, CTNNB1, MITF, and TNFSF11, using microarray data for osteoporosis.These genes were selected as variables to construct a molecular risk model for predicting osteoporosis (risk score = Exp (CTNNB1) × (−1.167227) + Exp (MITF) × (−5.192496) + Exp (TNFSF11) × (12.019122)).Additionally, a molecular subtype classification map in postmenopausal women was drawn by analyzing these key genes.The results showed that Cluster 3 had significantly more patients with low BMD, whereas Cluster 2 had significantly more patients with high BMD.Furthermore, the immune microenvironment, DEGs, functional pathways, and gene-drug interactions of different molecular subtypes were described in detail.The strategy of our model complements the existing clinical assessment tools for screening high-risk postmenopausal women with osteoporosis, while the subtype classification also provides a novel perspective for further research.
Loss of mobility after a fracture is often a trigger for fatal events in older adults [22].The high incidence and peculiar pre-fracture nature of osteoporosis emphasize the importance of screening for this disease.Dual-energy X-ray is a widely used and effective

Discussion
In this study, we detected three key genes, CTNNB1, MITF, and TNFSF11, using microarray data for osteoporosis.These genes were selected as variables to construct a molecular risk model for predicting osteoporosis (risk score = Exp (CTNNB1) × (−1.167227) + Exp (MITF) × (−5.192496) + Exp (TNFSF11) × (12.019122)).Additionally, a molecular subtype classification map in postmenopausal women was drawn by analyzing these key genes.The results showed that Cluster 3 had significantly more patients with low BMD, whereas Cluster 2 had significantly more patients with high BMD.Furthermore, the immune microenvironment, DEGs, functional pathways, and gene-drug interactions of different molecular subtypes were described in detail.The strategy of our model complements the existing clinical assessment tools for screening high-risk postmenopausal women with osteoporosis, while the subtype classification also provides a novel perspective for further research.
Loss of mobility after a fracture is often a trigger for fatal events in older adults [22].The high incidence and peculiar pre-fracture nature of osteoporosis emphasize the importance of screening for this disease.Dual-energy X-ray is a widely used and effective tool for diagnosing osteoporosis; however, it is not a practical screening tool.Although all women aged 65 years or older are recommended to undergo screening with DXA, nearly 25% of women aged 65-85 years never undergo a BMD test or have regular check-ups at the recommended frequency [10].Additionally, studies have found that fewer DXA scans are performed in high-risk patients, whereas excessive DXA scans are performed in low-risk postmenopausal women [23].A more detailed stratified strategy would be beneficial to complement age as a means of pre-screening for patients suitable for DXA.
Although rapid bone loss already occurs during the menopausal transition, there is no established risk assessment and screening strategy for younger postmenopausal women for osteoporosis [24].The heterogeneity of biological genetic information among different populations may be an important factor in the progression of osteoporosis since some women develop osteoporosis early after menopause [25,26].Risk assessment models of osteoporosis based on genetic information are rare, although several molecular models for other diseases have been developed [27][28][29].This is partly because of the poor genetic information available on osteoporosis.Although some previous bioinformatics studies have also analyzed DEGs related to osteoporosis, in this study, we intended to further stratify the risk of osteoporosis in postmenopausal women posed by DEGs (Figure 4), and our findings revealed the differences in the immune infiltration microenvironment in patients with different molecular subtypes (Figure 8) [18,19].Therefore, the molecular classification map obtained in this study can help to fill in these knowledge gaps and clinical needs.
We identified three key genes using regression analysis (Figure 4).CTNNB1 and MITF are protective factors, whereas TNFSF11 is a risk factor for osteoporosis.CTNNB1 is located at chromosome 3p22.1 and encodes important proteins for adherens junctions and the cytoskeleton system [30].MITF is located at chromosome 3p13 and encodes a transcription factor that was first identified for its critical role in promoting the survival of migrating melanoblasts [31].TNFSF11 is located at 13q14.11 and encodes a member of the tumor necrosis factor cytokine family, RANKL, which is a ligand for the receptor activator of nuclear factor-κB and osteoprotegerin, directly related to osteoclast differentiation and activation [32,33].
Our molecular risk model complements existing clinical assessment strategies for osteoporosis; however, using this molecular model alone is not recommended.We believe that a risk model system that includes both genetic and environmental factors can be a better solution for a disease determined by both aspects such as osteoporosis [34].Since genetic testing is an invasive operation, we need to first use a highly sensitive clinical assessment tool to screen perimenopausal women, and then, use this molecular model to comprehensively classify the risk of osteoporosis to determine the frequency of DXA examination for menopausal women.
Owing to the limited knowledge of the pathophysiology of osteoporosis, cases are currently classified as "primary" or "secondary" based on largely unexplained clinical associations [35].Our analysis revealed three molecular subtypes of primary osteoporosis according to differences in gene expression (Figure 7).Based on BMD levels in different groups, we speculate that the Cluster 3 subtype is more prone to a phenotype of low BMD with osteoporosis, Cluster 2 is a phenotype characterized by high BMD, and Cluster 1 showed an overall intermediate phenotype between clusters 2 and 3. Gene expression analysis supported this result; the risk gene TNFSF11 was significantly highly expressed in Cluster 3 and the protective gene MITF was highly expressed in Cluster 2. We further characterized the functional pathways of the different subtypes, which may provide clues for further in-depth studies of primary osteoporosis.
Another focus of this study was on the immune infiltration microenvironment across different groups (Figure 8).Over the past two decades, important discoveries in the etiology of osteoporosis have demonstrated the deep integration of the skeletal system with the immune system, and the regulation of gene expression in immune cells is closely related to bone aging [36,37].PDCD1 is an important immunosuppressive molecule [38], and was found to be highly expressed in Cluster 2 and the low-risk group in this study.Greisen et al. [39] confirmed that Pdcd1-knockout mice showed signs of osteoporosis.Additionally, patients in Cluster 3 had significantly more dendritic cells, which are the most functional antigen-presenting cells [40].These results provide insight into the role of the immune system in osteoporosis.Speculatively, autoimmunity may be an important factor in osteoporosis development.TNFSF11, which was upregulated in Cluster 3, is a dendritic cell survival factor involved in the regulation of T cell-dependent immune responses [41].In contrast, T-cell activation was reported to induce the expression of TNFSF11, leading to an increase in osteoclast genesis and bone resorption [42].
We also analyzed the upregulated genes in each subtype and reconstructed the PPI network.CTSK, CSF1, TNFSF11, CTNNB1, TNFRSF11A, and TNF formed a key sub-network, and their corresponding gene-drug interactions were analyzed (Figure 11).TNFRSF11A is a target of diclofenac, an anti-inflammatory drug used to treat glaucoma, and TNFSF11 is a target of lenalidomide, denosumab, and anastrozole.Denosumab has also been used to treat osteoporosis, whereas lenalidomide and anastrozole are antitumor drugs.TNF is a target of adalimumab and etanercept, both of which are antirheumatic drugs.CTSK is a target of drugs such as relacatib, which is an anti-osteoporosis drug that is not Food and Drug Administration-approved. CSF1 is a target of pexidartinib, an antitumor agent.Therefore, these drugs may also be potential candidates for the treatment of osteoporosis.Furthermore, CTSK, CSF1, TNFSF11, CTNNB1, TNFRSF11A, and TNF may be the most promising osteoporosis markers.
This study has some limitations.First, although the analyzed public dataset represents the largest sample size for osteoporosis currently available, it is still insufficient to construct a more effective molecular risk model; thus, further studies with large sample sizes are needed as the cost of sequencing decreases.Second, since environmental factors also contribute to the development of osteoporosis, and gene expression is indeed partially driven by epigenetics, our molecular risk model only considers the genetic aspects of osteoporosis and is not recommended for stand-alone use.However, the design and validation of a risk assessment system that considers both genetic and environmental factors will require a considerably more comprehensive analysis that is beyond the scope of the present study.Third, we proposed three molecular subtypes and characterized their genes, functional pathways, and immune infiltration.However, further clarification of the differences and connections between the different subtypes needs to be provided by future experiments.
In conclusion, CTNNB1, MITF, and TNFSF11 were identified as three key genes involved in the occurrence and development of osteoporosis.Our molecular risk model and subtype classification may complement existing clinical assessment tools and age-based strategies for pre-screening DXA-suitable patients.This evidence-based screening strategy will guide the active prevention, early diagnosis, and treatment of osteoporosis in high-risk postmenopausal women.Additionally, we identified the immune infiltration microenvironment in three molecular subtypes of osteoporosis, providing a novel perspective for further research.
Institutional Review Board Statement: This study was approved by the institutional review board of Honghui Hospital (no.202206029) and was performed according to the Helsinki Declaration of 1975, as revised in 2000.
Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.

Figure 1 .
Figure 1.Characterization of osteoclast-regulatory genes.(A) The locations of genes related to osteoclast regulation on chromosomes.(B) Protein-protein interaction network of three types of genes related to the regulation of osteoclasts and the proportion of each type of gene: osteoclast development-related genes (black), osteoclast differentiation-related genes (orange), and osteoclast proliferation-related genes (blue).

Figure 1 .
Figure 1.Characterization of osteoclast-regulatory genes.(A) The locations of genes related to osteoclast regulation on chromosomes.(B) Protein-protein interaction network of three types of genes related to the regulation of osteoclasts and the proportion of each type of gene: osteoclast development-related genes (black), osteoclast differentiation-related genes (orange), and osteoclast proliferation-related genes (blue).

Figure 2 .
Figure 2. Expression levels of osteoclast-regulatory genes according to bone mineral density (BMD).(A) Expression heatmap of differentially expressed osteoclast regulation-related genes in patients with high (light blue) and low (peach) BMD.(B) Boxplots comparing the expression distribution of differentially expressed osteoclast-regulatory genes between patients with high (green) and low (orange) BMD.The Wilcoxon test was used to determine the significance of differential expression.ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 2 .
Figure 2. Expression levels of osteoclast-regulatory genes according to bone mineral density (BMD).(A) Expression heatmap of differentially expressed osteoclast regulation-related genes in patients with high (light blue) and low (peach) BMD.(B) Boxplots comparing the expression distribution of differentially expressed osteoclast-regulatory genes between patients with high (green) and low (orange) BMD.The Wilcoxon test was used to determine the significance of differential expression.ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 4 .
Figure 4. Construction of a molecular risk model for osteoporosis based on osteoclast-regulatory genes.(A) Univariate logistic regression results of differentially expressed osteoclast regulation-related genes.(B) Multivariate logistic regression results of CTNNB1, MITF, and TNFSF11.(C)

Figure 4 .
Figure 4. Construction of a molecular risk model for osteoporosis based on osteoclast-regulatory genes.(A) Univariate logistic regression results of differentially expressed osteoclast regulationrelated genes.(B) Multivariate logistic regression results of CTNNB1, MITF, and TNFSF11.(C) Coefficient change of each feature according to the lambda change in the LASSO regression model.(D) Distribution of risk scores between high (blue)-and low (yellow)-bone-mass-density (BMD) groups.The Wilcoxon test was used to determine the significance of the difference in risk scores.(E) Receiver operating characteristic curve to evaluate risk model performance according to the area under the curve (AUC) value.(F) Confidence interval of the target parameter after cross-validation.

Figure 5 .
Figure 5. Quantitative real-time polymerase chain reaction verification of key differentially expressed genes.Compared with that in the control groups (red), the relative expression of CTNNB1 (A) and MITF (B) was significantly downregulated, whereas that of TNFSF11 (C) was significantly upregulated in the osteoporosis groups (blue).* p < 0.05; ** p < 0.01.

Figure 5 .
Figure 5. Quantitative real-time polymerase chain reaction verification of key differentially expressed genes.Compared with that in the control groups (red), the relative expression of CTNNB1 (A) and MITF (B) was significantly downregulated, whereas that of TNFSF11 was significantly upregulated in the osteoporosis groups (blue).* p < 0.05; ** p < 0.01.

Figure 6 .
Figure 6.Osteoclast-regulatory genes and the immune microenvironment.(A) Differences types of immune cells between high-(green) and low-risk (orange) groups.(B) Correlations bet CTNNB1, MITF, and TNFSF11 expression and the 22 types of immune cells.(C) Differences pression of HLA-related genes between high-(green) and low-risk (orange) groups.(D) Co tions between CTNNB1, MITF, TNFSF11, and HLA-related genes.(E) Differences in expressi immune-response-related genes between high-(green) and low-risk (orange) groups.(F) Co tions between CTNNB1, MITF, and TNFSF11 expression levels and immune-response-related g The Wilcoxon test was used to determine the significance of immune cells and related genes bet high-and low-risk groups.Spearman's correlation coefficient was used to the correla ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 6 .
Figure 6.Osteoclast-regulatory genes and the immune microenvironment.(A) Differences in 22 types of immune cells between high-(green) and low-risk (orange) groups.(B) Correlations between CTNNB1, MITF, and TNFSF11 expression and the 22 types of immune cells.(C) Differences in expression of HLA-related genes between high-(green) and low-risk (orange) groups.(D) Correlations between CTNNB1, MITF, TNFSF11, and HLA-related genes.(E) Differences in expression of immune-response-related genes between high-(green) and low-risk (orange) groups.(F) Correlations between CTNNB1, MITF, and TNFSF11 expression levels and immune-response-related genes.The Wilcoxon test was used to determine the significance of immune cells and related genes between high-and low-risk groups.Spearman's correlation coefficient was used to calculate the correlations.ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 8 .
Figure 8.Immune microenvironments of different molecular subtypes of osteoporosis.(A) Differences in 22 immune cells among molecular subtypes of osteoporosis.(B) Expression differences in HLA-related genes among molecular subtypes.(C) Expression differences in immune-response-related genes among molecular subtypes.The Kruskal-Wallis test was used to calculate the significance of differences in gene expression.ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.Green: Cluster 1; orange: Cluster 2; blue: Cluster 3.

Figure 8 .
Figure 8.Immune microenvironments of different molecular subtypes of osteoporosis.(A) Differences in 22 immune cells among molecular subtypes of osteoporosis.(B) Expression differences in HLArelated genes among molecular subtypes.(C) Expression differences in immune-response-related genes among molecular subtypes.The Kruskal-Wallis test was used to calculate the significance of differences in gene expression.ns: no statistically significant difference; * p < 0.05; ** p < 0.01; *** p < 0.001.Green: Cluster 1; orange: Cluster 2; blue: Cluster 3.

Figure 9 .
Figure 9. Functional analysis of different molecular subtypes.Heatmap of gene set variation analysis enrichment scores for different molecular subtypes.Green: Cluster 1; pink: Cluster 2; blue: Cluster 3.

Figure 9 .
Figure 9. Functional analysis of different molecular subtypes.Heatmap of gene set variation analysis enrichment scores for different molecular subtypes.Green: Cluster 1; pink: Cluster 2; blue: Cluster 3.

Figure 10 .
Figure 10.Differentially expressed genes and functional analysis of different molecular subtypes.(A) Venn diagram showing differentially expressed genes among different molecular subtypes of osteoporosis.Red: Cluster 1; blue: Cluster 2; yellow: Cluster 3. (B) Functions and pathways enriched by differentially expressed genes of Cluster 1. (C) Functions and pathways enriched by differentially expressed genes of Cluster 2. (D) Functions and pathways enriched by differentially expressed genes of Cluster 3.

Figure 10 .
Figure 10.Differentially expressed genes and functional analysis of different molecular subtypes.(A) Venn diagram showing differentially expressed genes among different molecular subtypes of osteoporosis.Red: Cluster 1; blue: Cluster 2; yellow: Cluster 3. (B) Functions and pathways enriched by differentially expressed genes of Cluster 1. (C) Functions and pathways enriched by differentially expressed genes of Cluster 2. (D) Functions and pathways enriched by differentially expressed genes of Cluster 3.

Figure 11 .
Figure 11.Potential drugs for osteoporosis based on hub nodes.(A) Protein-protein interaction network of upregulated genes for three molecular subtypes.(B) Key sub-networks of (A).(C) Degree of hub nodes in the network.(D) Sankey diagram showing drug-gene relationships.Red: Cluster 1; blue: Cluster 2; yellow: Cluster 3.

Figure 11 .
Figure 11.Potential drugs for osteoporosis based on hub nodes.(A) Protein-protein interaction network of upregulated genes for three molecular subtypes.(B) Key sub-networks of (A).(C) Degree of hub nodes in the network.(D) Sankey diagram showing drug-gene relationships.Red: Cluster 1; blue: Cluster 2; yellow: Cluster 3.

Table 1 .
Inclusion and exclusion criteria for participants in the external experimental validation study.