A Predictive Model for Abnormal Bone Density in Male Underground Coal Mine Workers

The dark and humid environment of underground coal mines has a detrimental effect on workers' skeletal health. Risk prediction models can help protect the skeletal health of coal miners by identifying those at risk of abnormal bone density as early as possible. A total of 3695 male underground workers who attended occupational health physical examinations at a coal mine in Hebei, China, from July to August 2018 were included in this study. Predictor variables were identified through single-factor analysis and literature review. Three prediction models, Logistic Regression, CNN and XG Boost, were developed and their prediction performance was compared. In the training set, the sensitivity of the Logistic Regression, XG Boost and CNN models was 74.687%, 82.058% and 70.620%, the specificity was 80.986%, 89.448% and 91.866%, the F1 scores were 0.618, 0.919 and 0.740, the Brier scores were 0.153, 0.040 and 0.156, and the Calibration-in-the-large values were 0.104, 0.020 and 0.076, respectively; XG Boost outperformed the other two models. Similar results were obtained for the test set and validation set. A pairwise comparison of the area under the ROC curve (AUC) of the three models showed that the XG Boost model had the best prediction performance. The XG Boost model has high application value and outperformed the CNN and Logistic regression models in prediction.


Introduction
Bone mineral density (BMD) is an important indicator of bone strength, reflects the degree of osteoporosis, and is an important predictor of fracture risk [1]. According to World Health Organization standards, BMD is classified as normal bone mass, reduced bone mass, or osteoporosis [2]. Osteoporosis (OP) is a systemic disease characterized by low bone mass and destruction of bone structure, and abnormal BMD is its main cause. OP now ranks seventh among common diseases, and more than 200 million people worldwide suffer from it [3]. The results of the October 2018 China Osteoporosis Epidemiology Survey showed that the prevalence of OP among people aged 40-49 years in China was 3.2% (2.2% for men and 4.3% for women), and that prevalence was highest among people aged 65 years or older, at 32.0%. Globally, one osteoporotic fracture occurs every 3 s, and 50% of patients with a first osteoporotic fracture will have another [4]. Abnormal bone density has become an increasingly serious public health problem because of the accelerated ageing of the population, the large number of people affected, and a general lack of awareness of bone density.
The coal industry is one of the pillar industries of the Chinese economy. A great number of people work in the coal industry, and their health status is directly related to its development. Some studies have pointed out that the special environment of underground coal mines has a significant effect on the bone metabolism of people who work underground for many years [5]. The vast majority of coal miners are male. They work for long periods in a dark, damp and relatively narrow environment deep underground and are exposed to harmful occupational factors such as shift work, so their risk factors for abnormal bone density differ from those of the general population. A study by the World Health Organization reported that population health and disease are determined by a variety of factors, including behavioral lifestyle, environmental factors, biogenetic factors and the quality of health care services, among which behavioral lifestyle is the most important, accounting for 60%. Miners commonly have unhealthy habits such as smoking, alcohol consumption and a high-salt diet [6], which together contribute to alterations in the bone metabolism of underground workers. Currently, domestic and international studies on BMD abnormalities focus on the risk factors and pathogenesis of BMD decline [7][8][9], and there are fewer studies on early prevention and risk assessment. In addition, domestic studies on BMD have mainly focused on the elderly and menopausal women [10], with fewer studies on coal miners. If coal miners at risk of abnormal bone density can be identified early and helped to change their unhealthy lifestyles, the number of cases of abnormal BMD and the incidence of OP and osteoporotic fractures can be effectively reduced.
Data mining is the process of extracting knowledge and information with potential application value from large databases, and it is an approach to information processing that has developed rapidly in recent years. Disease risk prediction is an important task in data mining. Starting from the premise that diseases have multiple causes, it selects multiple influencing factors and uses suitable statistical methods to construct models that predict the probability of a disease occurring in groups or individuals with certain characteristics [11]. Commonly used models include Logistic regression, neural networks, decision trees and support vector machines (SVM). Each of these methods has its own characteristics and has been widely and successfully used in the medical field [12]. In recent years, many scholars have applied data mining risk prediction methods in medicine and have reported good results. For example, the heart disease risk prediction model based on a convolutional neural network (CNN) by Jian Wang [13] had high prediction accuracy (89.89%) and could accurately predict the risk of developing heart disease. Chao-Wen Tan et al. [14] applied convolutional neural networks to improve the robustness and accuracy of heart sound signal classification, which is expected to be applied to machine-aided auscultation. Sethuraman [15] used a feed-forward neural network for feature selection and reduced the number of attributes to 12, which helped the prediction model achieve 89.4% training accuracy and 82.2% test accuracy. Heydari et al. [6] compared neural network, SVM, decision tree and Bayesian methods in the diagnosis of type II diabetes and found that the neural network model had the highest accuracy.
Tian-Pei Su [16] developed a diabetes risk prediction model based on the eXtreme Gradient Boosting (XG Boost) algorithm and found that XG Boost fitted better overall and was more accurate than random forest. Hong-Xia Zhang et al. [17] established a prediction model for type II diabetes based on the XG Boost algorithm, which predicted well, with an accuracy of 86.6%. However, most of the models developed so far assess disease risk in the general population and ignore special groups in the occupational population. China has a large number of coal miners, and their special occupational environment, including high temperature, noise, shift work and other occupational exposures, can cause or affect the development of chronic diseases [18][19][20][21][22]. Therefore, prediction models for BMD abnormalities in the general population are not applicable to coal miners. To improve the quality of life and health status of coal miners, there is an urgent need to develop a new predictive model for the risk of BMD abnormalities in this group.
Based on the physical examination data of underground coal mine workers, we developed three prediction models for bone density abnormality: Logistic regression, CNN, and XG Boost. Overall, our study makes two contributions.
1. Based on the physical examination data of 3695 underground coal mine workers, the risk factors for BMD abnormalities were screened to provide a basis for developing early prevention strategies for BMD abnormalities in underground coal mine workers.
2. The XG Boost model had the best predictive performance of the three models. It can be used to predict the risk of BMD abnormalities in underground coal mine workers and thereby support early prevention.

Study Subjects
A total of 3695 male on-the-job underground workers who participated in occupational health physical examination from July to August 2018 in Gequan and Dongpang mines of Hebei Jizhong Energy were selected for the study. Inclusion criteria: age greater than or equal to 18 years; more than 1 year of service. Exclusion criteria: age greater than or equal to 60 years; those with incomplete information; those with congenital metabolic diseases affecting bone metabolism. All study subjects signed an informed consent form. The study was conducted in accordance with the Declaration of Helsinki and was reviewed and approved by the Ethics Committee of North China University of Technology (approval number: 15006).

Information Collection
Face-to-face questionnaires were administered to the study subjects by uniformly trained investigators, and the information collected included (1) demographic information: age, education level, marital status, BMI, per capita monthly household income, etc.; (2) lifestyle habits: smoking, alcohol consumption, sleep, physical exercise, etc.; (3) physical and laboratory examinations: bone mineral density, blood pressure, blood glucose, lipids, etc.; (4) exposure to occupational hazards: length of service, shift work, work intensity, etc.

Laboratory Tests
Fasting venous blood was collected from the study subjects by doctors in the early morning. Blood specimens were sent to the hospital's Laboratory Department for blood biochemistry testing using a Mindray automatic biochemistry analyzer (BS-800).

Diagnostic Criteria for Abnormal Bone Density
The heel bone density of underground coal mine workers was measured using a CM-200 ultrasonic bone densitometer (FURUNO, Japan). Bone density abnormalities were classified according to WHO standards [2].

Variable Definitions
Smoking: smoked at least 1 cigarette per day continuously for more than 6 months. Current smoking: was smoking at the time of this survey. Quit smoking: used to smoke but had stopped smoking for at least 6 months at the time of this survey.
4. Drinking: according to the definition of drinking by the Chinese Center for Disease Control and Prevention [26], drinking status was classified in this study as never drinking, having stopped drinking, and now drinking. Drinking: drinking at least once a week continuously for more than 6 months. Now drinking: drinking at the time of this survey. Having stopped drinking: used to drink but had stopped drinking for at least 6 months at the time of this survey.
5. Shift situation: a working hour system that requires 24 h continuous work in the production process and is ensured by one or several teams working in shifts. This study classified the shift situation into three cases: never worked shifts, formerly worked shifts but not now, and currently working shifts.
6. Years of shift work: the total number of years of shift work performed. This study divided the years of shift work into 5 groups: 0, 0~, 10~, 20~, and more than 30 years.
7. Exercise: exercising more than three times a week, for no less than 30 min each time.
8. Body mass index: BMI = weight (kg)/height² (m²). The normal range is BMI < 24 kg/m², the overweight range is 24.0 kg/m² ≤ BMI < 28.0 kg/m², and the obese range is BMI ≥ 28.0 kg/m².
9. Dyslipidemia: according to the Chinese guidelines for the prevention and treatment of dyslipidemia in adults (revised 2016) [27], dyslipidemia was defined as serum total cholesterol ≥ 6.2 mmol/L, and/or triglycerides ≥ 2.3 mmol/L, and/or LDL cholesterol ≥ 4.1 mmol/L, and/or HDL cholesterol < 1.0 mmol/L, or a previous history of hyperlipidemia with current use of lipid-lowering drugs.
10. High-intensity operations: the physical activity of workers was investigated using the International Physical Activity Questionnaire (IPAQ) (long form) [28], and a weekly total physical activity level ≥ 3000 MET-min/w was defined as high-intensity operations.
11. Medium-intensity work: a weekly total physical activity level ≥ 600 MET-min/w.
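The BMI and dyslipidemia definitions above are simple threshold rules and can be sketched in a few lines of Python. This is only an illustration of the cut-offs stated in the text; the function and argument names are hypothetical and do not come from the study's supplementary code.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Apply the cut-offs quoted in the text (24.0 and 28.0 kg/m^2)."""
    if value < 24.0:
        return "normal"
    if value < 28.0:
        return "overweight"
    return "obese"


def has_dyslipidemia(tc: float, tg: float, ldl: float, hdl: float,
                     treated_history: bool = False) -> bool:
    """Any single criterion is sufficient; all lipid values are in mmol/L.

    treated_history is a hypothetical flag for a previous history of
    hyperlipidemia with current use of lipid-lowering drugs.
    """
    return (tc >= 6.2 or tg >= 2.3 or ldl >= 4.1 or hdl < 1.0
            or treated_history)


# Illustrative values only, not study data.
example_category = bmi_category(bmi(70.0, 1.75))
example_lipids = has_dyslipidemia(tc=5.0, tg=2.3, ldl=3.0, hdl=1.2)
```

Keeping each rule in its own small function makes it easy to check the boundaries: a BMI of exactly 24.0 falls in the overweight range, and exactly 28.0 falls in the obese range.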

Sample Size Calculation
To ensure that the model could accurately estimate the overall proportion of outcome events, the prevalence of abnormal bone density φ was taken from the literature as approximately 25% [29] and the margin of error δ was set at 0.05, which required at least 289 study subjects, as shown in Equation (1).
To control the mean error of the individual predicted values, the mean absolute prediction error (MAPE) was set to 0.05, R²_CS was set to 0.1, and the number of predictor variables P was approximately 24, which required at least 951 study subjects, as shown in Equation (2).
To ensure an expected shrinkage of 10% and to reduce model overfitting, the shrinkage factor S was set to 0.9 (that is, 10% shrinkage) and the number of predictor variables P was approximately 24, which required at least 2038 study subjects, as shown in Equation (3).
To keep the difference between the apparent R²_CS of the developed model and its optimism-adjusted value small, R²_CS in Equation (4) was 0.1 and max R²_CS was 0.65, giving S' = 0.75, which required at least 671 study subjects, as shown in Equations (4) and (5).
It was calculated that a minimum of 2038 individuals needed to be included. A total of 3695 individuals were included in the study, and the sample size met the needs of the study.
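The four sample size figures quoted above can be reproduced numerically. The sketch below assumes that Equations (1) to (5) follow the widely used minimum sample size criteria for binary-outcome prediction models proposed by Riley et al.; the formulas and constants are taken from that framework and are an assumption here, since the equations themselves are not reproduced in this text. It also assumes a shrinkage factor of 0.9 (10% shrinkage) for the third criterion, because that value yields the reported 2038.

```python
import math


def n_overall_risk(phi: float, delta: float) -> float:
    """Criterion 1: precise estimate of the overall outcome proportion."""
    return (1.96 / delta) ** 2 * phi * (1 - phi)


def n_mape(phi: float, n_predictors: int, mape: float) -> float:
    """Criterion 2: target mean absolute prediction error."""
    exponent = (-0.508 + 0.259 * math.log(phi)
                + 0.504 * math.log(n_predictors) - math.log(mape)) / 0.544
    return math.exp(exponent)


def n_shrinkage(n_predictors: int, r2_cs: float, shrinkage: float) -> float:
    """Criteria 3 and 4: sample size for a target shrinkage factor."""
    return n_predictors / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def shrinkage_for_optimism(r2_cs: float, max_r2_cs: float,
                           delta: float = 0.05) -> float:
    """Shrinkage factor S' that keeps optimism in R2_CS within delta."""
    return r2_cs / (r2_cs + delta * max_r2_cs)


P, PHI, R2_CS, MAX_R2_CS = 24, 0.25, 0.1, 0.65

n1 = math.ceil(n_overall_risk(PHI, 0.05))
n2 = math.ceil(n_mape(PHI, P, 0.05))
n3 = math.ceil(n_shrinkage(P, R2_CS, 0.9))
# S' is rounded to two decimals (0.75), matching the value quoted in the text.
s_prime = round(shrinkage_for_optimism(R2_CS, MAX_R2_CS), 2)
n4 = math.ceil(n_shrinkage(P, R2_CS, s_prime))

required_n = max(n1, n2, n3, n4)
```

Under these assumptions the largest of the four values determines the minimum sample size, which is why the shrinkage criterion drives the final requirement.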

Statistical Methods
Excel 2016 was used to establish the original database, and IBM SPSS 24.0 was used for statistical analysis. Count data were described by rate or composition ratio, and the χ² test was used for comparison between groups. Ordinal data were described by rate or composition ratio, and the Kruskal-Wallis test was used for comparison between groups. Measurement data following a normal distribution were described by mean and standard deviation, and non-normally distributed data were expressed as median and quartiles. Multivariable unconditional Logistic regression was used for multivariate analysis of influencing factors. The test level was α = 0.05.

Software and Hardware Platform
The sample data were randomly divided into a training set, a test set and a validation set in the ratio 7:2:1. The dataset partition code is shown in Supplementary Material S1.
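A random 7:2:1 partition of this kind can be sketched as follows. This is not the code from Supplementary Material S1; the function name and the seed are hypothetical, and integer arithmetic is used so that the subset sizes are deterministic.

```python
import random


def split_indices(n_samples: int, seed: int = 2018):
    """Shuffle sample indices and cut them into 70% / 20% / 10% subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_train = n_samples * 7 // 10          # integer arithmetic, no rounding drift
    n_test = n_samples * 2 // 10
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


train_idx, test_idx, val_idx = split_indices(3695)
```

With 3695 subjects this yields 2586 training, 739 test and 370 validation indices, which agrees with the training and test set sizes reported in the results.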

Logistic Regression Model
The Logistic regression model was built using the sklearn package. The Logistic model code is shown in Supplementary Material S2.

CNN Model
A convolutional neural network mainly consists of an input layer, convolutional layers, pooling layers, fully connected layers and an output layer. In this study, the CNN model was constructed using the numpy package. The input layer included 2 nodes, each of the two hidden layers contained 5 nodes, and the output layer included 1 node. The sigmoid function was used as the activation function, softmax was used for probability normalization, cross-entropy was used as the loss function, and stochastic gradient descent was used as the optimizer. The model was trained by updating the network parameters according to the computed gradients. The CNN model code is shown in Supplementary Material S3.
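The layer sizes quoted above (2, 5, 5 and 1 nodes) describe a small stack of fully connected layers. As a rough illustration, and not the code from Supplementary Material S3, the forward pass and cross-entropy loss of such a network can be sketched in plain Python. The random weights, the input values and all function names are hypothetical, and a sigmoid output is assumed for the single output node.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers) -> float:
    """Propagate one sample through all layers; return the output probability."""
    activation = x
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation[0]


def cross_entropy(y_true: int, p: float) -> float:
    """Binary cross-entropy for one sample, clipped for numerical safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


rng = random.Random(0)
layer_sizes = [2, 5, 5, 1]  # input, two hidden layers, output (as in the text)
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

prob = forward([0.3, 0.7], layers)  # illustrative input, not study data
loss = cross_entropy(1, prob)
```

Training with stochastic gradient descent would repeat this forward pass for each sample, backpropagate the loss gradient, and update the weights and biases by a small step in the negative gradient direction.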

XG Boost Model
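The XG Boost model code is shown in Supplementary Material S4. Based on the m features and the n samples, the XG Boost prediction model is given by the following equation:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \qquad (6)$$
In the formula: i indexes the physical examination data samples (i = 1, ..., n); K is the total number of trees; f_k is the k-th tree. Once the predicted values are obtained, the objective function is given by the following equation:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (7)$$
In the formula: l(y_i, ŷ_i) is the training error of the sample x_i; Ω(f_k) is the regularization term of the k-th tree, defined as
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \qquad (8)$$
In the formula: T is the number of leaf nodes; w is the vector of leaf node scores; γ and λ are parameters that balance the complexity of the model.
XG Boost is an additive model in which the objective function changes each time a tree is added to the model. With an additive strategy, the objective function at step t can be written as:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \qquad (9)$$
where C is a constant. A second-order Taylor expansion of the loss function in Equation (9) gives
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + C \qquad (10)$$
where g_i and h_i are the first and second derivatives of l with respect to ŷ_i^{(t-1)}, and the final loss function can be written as:
$$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \qquad (11)$$
where G_j and H_j are the sums of g_i and h_i over the samples assigned to leaf j.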
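To make the role of the second-order Taylor expansion concrete, the sketch below computes the first and second derivatives of the logistic loss for a handful of samples, and from them the optimal score and objective contribution of a single leaf. The closed forms used, w* = -G/(H + λ) and -G²/(2(H + λ)) + γ, are the standard results for gradient-boosted trees with the regularization term described above; the data, function names and parameter values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess_logloss(y: int, margin: float):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(g_sum: float, h_sum: float, lam: float) -> float:
    """Leaf score that minimizes the second-order approximation."""
    return -g_sum / (h_sum + lam)


def leaf_objective(g_sum: float, h_sum: float, lam: float, gamma: float) -> float:
    """Contribution of one leaf to the final objective."""
    return -0.5 * g_sum ** 2 / (h_sum + lam) + gamma


# Three illustrative samples falling into the same leaf; the current
# raw prediction is 0 for each, so the predicted probability is 0.5.
labels = [1, 0, 1]
margins = [0.0, 0.0, 0.0]

pairs = [grad_hess_logloss(y, m) for y, m in zip(labels, margins)]
G = sum(g for g, _ in pairs)   # sum of gradients in the leaf
H = sum(h for _, h in pairs)   # sum of hessians in the leaf

weight = optimal_leaf_weight(G, H, lam=1.0)
objective = leaf_objective(G, H, lam=1.0, gamma=0.0)
```

Because two of the three samples are positive, the summed gradient is negative and the optimal leaf score is positive, which pushes the predicted probability for this leaf above 0.5. A larger λ shrinks the score toward zero, which is how the regularization term limits over-fitting.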

Model Evaluation
The model prediction effect was evaluated comprehensively in terms of both discrimination and calibration, as shown in Table 1.

Table 1. Model evaluation indexes.
Sensitivity: the percentage of study participants who actually had abnormal BMD and were correctly identified as having abnormal BMD by the risk prediction model.
Specificity: the percentage of study participants who did not have abnormal BMD and were correctly identified as not having abnormal BMD by the risk prediction model.
Youden index: sensitivity + specificity - 1; the overall ability of the model to correctly identify participants with and without abnormal BMD.
F1 score: the harmonic mean of precision and recall, used to evaluate the comprehensive performance of the model.
AUC: the area under the ROC curve.
Brier score: a quantitative score of model calibration, ranging from 0 to 1, where 0.25 corresponds to predicting a probability of 0.5 for every participant; the smaller the value, the better the calibration of the model.
Log loss: the error between the true value of the response and the predicted value of the model.
Calibration-in-the-large: the intercept of the calibration curve.
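Most of the indexes in Table 1 can be computed directly from the observed outcomes and the predicted probabilities. The following sketch uses the standard definitions with a classification threshold of 0.5, which is an assumption, since the threshold used in the study is not stated here. The function name and the example data are hypothetical, and the code assumes that both outcome classes are present.

```python
import math


def evaluate(y_true, y_prob, threshold: float = 0.5) -> dict:
    """Discrimination and calibration indexes for a binary outcome."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    sensitivity = tp / (tp + fn)   # requires at least one positive case
    specificity = tn / (tn + fp)   # requires at least one negative case
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    n = len(y_true)
    eps = 1e-12                    # avoid log(0)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    log_loss = -sum(
        y * math.log(min(max(p, eps), 1 - eps))
        + (1 - y) * math.log(1 - min(max(p, eps), 1 - eps))
        for y, p in zip(y_true, y_prob)
    ) / n

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "f1": f1,
        "brier": brier,
        "log_loss": log_loss,
    }


# Illustrative data only, not study results.
observed = [1, 1, 0, 0, 0]
predicted_prob = [0.9, 0.4, 0.2, 0.6, 0.1]
metrics = evaluate(observed, predicted_prob)
```

Sensitivity, specificity, the Youden index and the F1 score depend on the chosen threshold, whereas the Brier score and log loss are computed from the predicted probabilities themselves, which is why they are used to assess calibration.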

Quality Control
The surveyors received uniform training before the survey, the questionnaires were checked three times after collection, and double data entry was used to ensure the accuracy of the information. Measuring instruments were maintained and regularly calibrated by dedicated personnel. Each day, 1~2 workers were randomly selected and measured a second time by the surveyor, and the results were compared to ensure the consistency of the measurements. Ten workers were randomly selected for bone density measurement with both the CM-200 ultrasonic bone densitometer and a QCT bone densitometer, and the QCT results were used to calibrate the CM-200 ultrasonic bone densitometer to ensure the accuracy of the bone density measurements.

General Demographic Characteristics
The prevalence of BMD abnormalities was 28.25% among the 3695 male coal mine workers included in the study. The age of the study subjects ranged from 19 to 59 years, with a mean age of 39.04 ± 8.41 years. The differences in the prevalence of BMD abnormalities among male coal mine workers were statistically significant (p < 0.05) across groups defined by age, education level, BMI, marital status, fracture, smoking status, drinking status, exercise and sleep time, and not statistically significant (p > 0.05) across income, diabetes, hypertension and dyslipidemia groups, as shown in Table 2.

Analysis of Occupational Hazardous Factors and Prevalence of Bone Density Abnormalities
The analysis of occupational hazardous factors in the 3695 workers showed that the prevalence of BMD abnormalities increased with working age and shift length. There were statistically significant differences (p < 0.05) between groups defined by working age, shift situation, shift length, high intensity work and medium intensity work, as shown in Table 3.

Logistic Regression Analysis of Risk Factors for Abnormal Bone Density
It has been reported in the literature [30] that hypertension and diabetes have an effect on BMD, so these indicators were also included in the model building. The variable assignments are shown in Table 4.
Drinking status: 1 = No drinking; 2 = Alcohol withdrawal; 3 = Drinking
X10, Exercise: 1 = No; 2 = Yes
X11, Sleep time (h): 1 = <7; 2 = 7~; 3 = ≥8
X12, Working age: 1 = <10; 2 = 10~; 3 = 20~; 4 = ≥30
X13, Shift situation: 1 = Never; 2 = Once; 3 = Now
X14, Shift length: 1 = 0; 2 = <10; 3 = 10~; 4 = 20~; 5 = ≥30
X15, High intensity work: 1 = No; 2 = Yes
X16, Medium intensity work: 1 = No; 2 = Yes
All independent variables were tested with collinearity diagnostics and found to be free of multicollinearity, as shown in Table 5. The results of the multivariable analysis showed that age, low level of education, diabetes, hypertension, fractures, smoking status, drinking status, shift situation, high intensity work and medium intensity work were all risk factors for abnormal bone mineral density, whereas BMI, physical activity and sleep time were protective factors, as shown in Table 6.

Bone Density Abnormalities Models Construction and Evaluation
Based on the results of the multivariable analysis, we included in the models the 13 independent variables that were significant: age, education, BMI, diabetes, hypertension, fracture, smoking, alcohol consumption, shift work status, heavy workload, moderate workload, exercise, and sleep duration. The sample data were partitioned into a training set (70%), a test set (20%) and a validation set (10%) to construct the Logistic, CNN and XG Boost models. In the training set of 2586 cases (70%), the sensitivity, Youden index, F1 score, AUC (95% CI), Brier score, Log loss, and Calibration-in-the-large of the XG Boost model were 82.058%, 0.715, 0.919, 0.858 (0.839~0.876), 0.040, 0.147, and 0.020, respectively, better than those of the other two models. The CNN model had the best specificity, at 91.866%. The Logistic regression model performed worst. In the test set of 739 cases (20%), the XG Boost model had a sensitivity, specificity, Youden index, F1 score, AUC (95% CI), Brier score, Log loss, and Calibration-in-the-large of 76.555%, 88.302%, 0.649, 0.753, 0.824 (0.787~0.861), 0.107, 0.358, and 0.019, respectively, better than those of the other two models. These results are shown in Table 7.
A pairwise comparison of the area under the ROC curves (AUC) of the three models in the training set showed that the XG Boost model had the best prediction performance, followed by the CNN model, and the Logistic model had the worst prediction performance, with the differences being statistically significant (p < 0.017). The test set results showed that the XG Boost model had the best prediction performance, and the differences were all statistically significant (p < 0.017). The results of the validation set showed that the XG Boost model outperformed the CNN model, and the difference was statistically significant (p < 0.017). In the test and validation sets, the differences in prediction performance between the Logistic and CNN models were not statistically significant (p > 0.017). These results are shown in Table 8 and Figure 1. The Brier score, Log loss, and Calibration-in-the-large of the XG Boost model were all better than those of the CNN and Logistic regression models. The calibration curves for the training, test and validation sets were all close to the diagonal, with no serious deviations, although the calibration curves of the Logistic regression model deviated more, as shown in Figure 2. In summary, the XG Boost model performed excellently in the training set, test set and validation set, its predicted risk was in good agreement with the actual occurrence of risk, and each of its evaluation indexes was better than that of the CNN model and the Logistic regression model. The XG Boost model was the optimal model for this study.

Discussion
Building risk prediction models is important for the early identification of diseases and for early intervention. Early detection, diagnosis and treatment can contribute to tertiary prevention strategies for a disease. Machine learning has shown advantages in disease modelling. Meng D et al. conducted a machine learning study on the incidence of hand, foot and mouth disease in all provinces of mainland China and found that the predictive ability of the XG Boost model was generally better than that of the random forest model [31]. Li Z et al. constructed a novel predictive model integrating GCN, CNN and a squeeze-and-excitation network (GCSENet) for identifying miRNA-disease associations. By applying the three models together, Li Z et al. obtained an AUC of 0.950 and an F1 score of 0.864, which satisfactorily predicted miRNA-disease associations [32]. Workers in underground coal mines are susceptible to the effects of their environment on bone metabolism [33], and the factors affecting abnormal bone density differ from those in the general population. Therefore, early identification of high-risk groups and strict control of their influencing factors can reduce the incidence of BMD abnormalities. In this study, 3695 male underground coal mine workers were investigated, and their BMD abnormalities were found to be influenced by various factors. Logistic regression, CNN and XG Boost risk prediction models were constructed, and a comparison of their prediction performance showed that the XG Boost model performed best in this study.
The results of this study showed that the rate of abnormal bone density among coal mine workers was 28.25%. Based on the importance of the predictor variables in the three models developed, the top four variables were age, BMI, shift work and sleep duration, indicating that these four factors play a very important role in the occurrence of BMD abnormalities. In this study, advanced age was found to be a risk factor for BMD abnormalities, which is consistent with previous findings [34]. Possible reasons are that estrogen production decreases with age, which in turn affects parathyroid hormone levels, bone remodelling and bone loss, and that secretion levels decrease with age, inducing increased osteoclast activity and reduced bone content. However, because of the limitations of the research conditions, retired workers over 60 years of age were not included in the study, which may have biased the results. Subsequent research can be conducted on retired workers to evaluate the effect of age on the bone mineral density of coal mine workers. The detection rate of abnormal bone density was lowest when BMI was between 24.0 and 27.9, which is consistent with previous studies [35]. BMI probably affects bone density because adipose tissue increases the body's estrogen content, which favors bone formation, and because the increased mechanical load on the skeleton at higher BMI can promote bone formation. The risk of abnormal bone density in shift workers was 1.356 times that in workers without shift work. Possible reasons are that shift work disrupts workers' normal schedule of work and rest, making them prone to circadian rhythm disorders and metabolic disorders, and that shift work is also associated with the development of conditions such as sleep disorders, hypertension, diabetes and obesity, which can indirectly affect bone density [36].
In terms of sleep duration, this study found that the risk of abnormal BMD in participants in the ≥8 h sleep group was 0.561 times that in the <7 h sleep group, which is consistent with the results of previous studies [37]. This may be because reduced sleep time, especially at night due to shift work, together with exposure to light at night, reduces melatonin secretion, which can reduce bone mass and disrupt body metabolism, which in turn affects BMD [38]. The present study found that physical activity was a protective factor against abnormal BMD in male coal mine workers, with an OR of 0.725 (95% CI: 0.595, 0.882), similar to the findings of Anupama DS et al. [39]. Hauger AV et al. found that high physical activity was positively associated with total hip bone mineral density compared to a sedentary lifestyle [40]. The present study found that the risk of abnormal BMD was lower in workers with high intensity work than in workers with moderate intensity work, which may be due to skeletal muscle contraction activating bone biomodulation mechanisms that enhance BMD to adapt to exercise load [41]. Numerous studies have concluded that smoking has a negative impact on human bone. The present study showed that the risk of abnormal BMD in workers who smoked was 1.908 (95% CI: 1.547, 2.353) times that in those who never smoked, which is consistent with the results of previous studies [42][43][44]. Possible reasons are that tobacco can affect the production and metabolism of estrogen and androgen, affect the activity of osteoblasts and osteoclasts, and inhibit the vitamin D-parathyroid hormone axis, all of which can have a negative impact on bones [45]. Adequate vitamin D increases intestinal calcium absorption, promotes bone mineralization, maintains muscle strength, improves balance, and reduces the risk of falls.
Vitamin D deficiency can lead to secondary hyperparathyroidism, which increases bone resorption and thus causes or exacerbates osteoporosis. Concurrent calcium and vitamin D supplementation may reduce the risk of osteoporotic fractures. Vitamin D insufficiency also affects the efficacy of other anti-osteoporosis drugs [46]. The current study found that the risk of BMD abnormalities in workers who consumed alcohol was 2.182 (95% CI: 1.684, 2.827) times that in non-drinkers, suggesting that alcohol consumption is a risk factor for BMD abnormalities, which is consistent with the findings of some studies [47]. However, some studies have also suggested that moderate alcohol consumption is a protective factor against BMD abnormalities and that it is heavy alcohol consumption that leads to BMD abnormalities [48]. No consensus has been reached regarding the effect of alcohol consumption on BMD, which may be related to factors such as study population selection, alcohol intake, frequency of alcohol consumption and type of alcohol consumed. Previous studies have also been inconsistent regarding the relationship between education level and BMD. Yan Ren's analysis of hip fracture risk factors in middle-aged and elderly Chinese found that elderly people with lower education levels were at high risk of fracture [49]. In contrast, Chen et al. [50] observed a positive association between education level and the risk of hip fracture in postmenopausal women in Taiwan. The current study found a high prevalence of abnormal BMD in workers with higher education levels. The relationship between education level and BMD needs to be further explored.
Logistic regression models are widely used in risk factor screening and disease prediction. They are easy to use and have clearly interpretable parameters, but their predictive power decreases when the data do not meet the model's requirements [51]. Yan X et al. applied Logistic regression to build an osteoporosis risk model and performed internal and external validation; the C-index was 0.947 for internal validation and 0.946 for external validation, and the calibration curve showed good agreement between predicted and actual probabilities [52]. In this study, there were limitations in applying the Logistic regression model to BMD prediction for underground coal mine workers: it performed poorly on the calibration indexes in all three data sets, suggesting that the consistency between its predicted and actual results was not high and that it is prone to bias when used for BMD risk prediction. The CNN model is a deep learning method based on multi-layered networks, including an input layer, convolutional layers, pooling layers, fully connected layers and an output layer. It effectively reduces the number of network parameters and greatly reduces computational complexity. It has been used in recent years to predict the risk of various diseases [53,54], but its prediction performance across different diseases has been unstable. For example, Dai G et al. used a CNN model to explore the effect of hypertension on the retinal microvascular system, and the results were not satisfactory, with a sensitivity of 60.94%, a specificity of 51.54% and an AUC of 0.6506, which may indicate that the model construction needed further improvement [55]. Jiang J et al. used a CNN model to identify the degree of left atrial enlargement and showed that the AUCs for normal, mild and moderately severe left atrial enlargement ECGs were 0.942, 0.951 and 0.998, respectively [56].
In this study, the CNN model performed better than the Logistic regression model in terms of calibration metrics, but its ability to distinguish between abnormal and non-abnormal BMD was inferior to that of the XG Boost model, so it was not the preferred choice for predicting the risk of abnormal BMD in underground coal mine workers. XG Boost is an improvement of the boosting algorithm based on GBDT. It performs a second-order Taylor expansion of the loss function, which allows for higher accuracy. The inclusion of a regularization term in the objective function makes the trained model simpler and can effectively combat over-fitting [57]. In this study, the XG Boost model not only had a high ability to distinguish between BMD abnormalities and non-abnormalities but also had the highest agreement between predicted and actual results. It fitted the BMD abnormality data of underground coal mine workers best and can be used to predict BMD abnormalities in this population.
This study has several limitations. It did not measure vitamin D levels in underground coal mine workers, did not investigate the type and location of fractures, and did not take into account the medication status of the study subjects. It only established and internally validated the risk prediction model for abnormal bone density in coal miners, and no external validation was conducted. There are many methods and sites of bone density measurement, and only one was used in this study. Moreover, this was a cross-sectional study, so only information on the prevalence of BMD abnormalities in coal miners was obtained and causal inference is limited; further cohort studies are needed.

Conclusions
In this study, data related to abnormal BMD in male underground coal mine workers were analyzed, and age ≥ 30 years, education at junior secondary school level or lower, diabetes, hypertension, fracture, smoking, drinking, shift work, BMI ≥ 28 kg/m², high intensity work and medium intensity work were found to be risk factors. Exercise and sleep time ≥ 7 h were protective factors against bone density abnormalities. The XG Boost model outperformed the CNN and Logistic regression models in prediction.