Abstract
Background: Osteoporosis and osteopenia are prevalent bone diseases characterized by reduced bone mineral density (BMD) and an increased risk of fractures, particularly in postmenopausal women. While dual-energy X-ray absorptiometry (DXA) remains the gold standard for diagnosis, it has limitations regarding accessibility, cost, and predictive capacity for fracture risk. Machine learning (ML) approaches offer an opportunity to develop automated and more accurate diagnostic models by incorporating both BMD values and clinical variables.
Methods: This study retrospectively analyzed BMD data from 142 postmenopausal women, classified into three diagnostic groups: normal, osteopenia, and osteoporosis. Several supervised ML algorithms were applied: Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), Decision Trees (DT), Naive Bayes (NB), Linear Discriminant Analysis (LDA), and Artificial Neural Networks (ANN). Feature selection techniques, including analysis of variance (ANOVA), chi-square (CHI2), minimum redundancy maximum relevance (MRMR), and the Kruskal–Wallis test, were used to enhance model performance, reduce dimensionality, and improve interpretability. Model performance was evaluated using 10-fold cross-validation in terms of accuracy, true positive rate (TPR), false negative rate (FNR), and area under the ROC curve (AUC).
Results: Among all combinations of classifier and feature selection method, SVM with ANOVA-selected features achieved the highest classification accuracy (94.30%) and a TPR of 100% for the normal class. Feature sets based on the traditional diagnostic regions (L1–L4, femoral neck, total femur) also reached high accuracy (up to 90.70%) but were generally outperformed by statistically selected features. The CHI2 and MRMR methods also yielded robust results, particularly when paired with the SVM and k-NN classifiers. These results highlight the effectiveness of combining statistical feature selection with ML to enhance diagnostic precision for osteoporosis and osteopenia.
Conclusions: Machine learning algorithms, when integrated with data-driven feature selection strategies, provide a promising framework for automated classification of osteoporosis and osteopenia based on BMD data. ANOVA emerged as the most effective feature selection method, yielding superior accuracy across all classifiers. These findings support the integration of ML-based decision support tools into clinical workflows to facilitate early diagnosis and personalized treatment planning. Future studies should explore more diverse and larger datasets, incorporating genetic, lifestyle, and hormonal factors for further model enhancement.
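For readers unfamiliar with ANOVA-based feature selection, the ranking step can be illustrated with a minimal sketch. It computes the one-way ANOVA F statistic of each feature across the three diagnostic groups and sorts the features by decreasing F. This is not the study's implementation: the function names, the feature names (`L1_L4`, `femoral_neck`, `noise`), and the nine toy measurements are hypothetical and chosen only to make the calculation easy to follow.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a single feature.

    `groups` is a list of lists: the feature's values in each class.
    F = (between-class mean square) / (within-class mean square).
    """
    k = len(groups)                                # number of classes
    n_total = sum(len(g) for g in groups)          # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def rank_features(X, y, feature_names):
    """Return (feature name, F) pairs sorted by decreasing F."""
    classes = sorted(set(y))
    scores = []
    for j, name in enumerate(feature_names):
        # Split column j of X into one list of values per class.
        groups = [[row[j] for row, label in zip(X, y) if label == c]
                  for c in classes]
        scores.append((name, anova_f(groups)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data (NOT from the study): three samples per class,
# two BMD-like features and one feature unrelated to the class label.
feature_names = ["L1_L4", "femoral_neck", "noise"]
X = [
    [1.10, 0.95, 0.5], [1.05, 0.90, 0.7], [1.15, 1.00, 0.6],   # normal
    [0.90, 0.80, 0.6], [0.85, 0.78, 0.5], [0.95, 0.82, 0.7],   # osteopenia
    [0.70, 0.60, 0.7], [0.65, 0.62, 0.5], [0.75, 0.58, 0.6],   # osteoporosis
]
y = ["normal"] * 3 + ["osteopenia"] * 3 + ["osteoporosis"] * 3

for name, f_value in rank_features(X, y, feature_names):
    print(f"{name}: F = {f_value:.2f}")
```

In a full pipeline, the top-ranked features would then be passed to a classifier such as an SVM, with the ranking recomputed inside each cross-validation fold so that the held-out fold does not influence feature selection.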