3.3. Machine Learning Analysis
(1) Predictive Performance of Machine Learning Models
To evaluate the predictive performance and interpretability of machine learning models with miners’ safety citizenship behavior as the dependent variable, this study constructed and assessed nine predictive models. These models include Linear Regression (LM), Elastic Net (ENet), Decision Tree (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Light Gradient Boosting Machine (LightGBM), and K-Nearest Neighbor (KNN).
The model performance was comprehensively evaluated using three metrics: Root Mean Square Error (RMSE), Coefficient of Determination (R²), and Mean Absolute Error (MAE), as presented in Table 5.
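The three metrics have standard definitions. The short sketch below computes them from paired observed and predicted SCB scores using only the Python standard library. The section does not state whether the values in Table 5 were obtained on a hold-out set or through cross-validation, so the function names and the toy data here are illustrative assumptions, not the study's pipeline.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted scores, for illustration only.
observed = [3.0, 4.0, 5.0, 4.0]
predicted = [2.5, 4.0, 5.5, 4.0]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Lower RMSE and MAE and higher R² indicate better predictions. When R² is computed on held-out data it can fall below zero for a model that predicts worse than the sample mean.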
The results show that the Random Forest (RF) model achieves the best overall predictive performance, with the highest R² (0.588) and the lowest RMSE (0.466) and MAE (0.333) of all models, indicating that it explains the most variance in the data and has the smallest prediction error. The Linear Regression (LM), Support Vector Machine (SVM), and LightGBM models also perform well, each with an R² above 0.56. In contrast, the Decision Tree (DT) model performs worst (R² = 0.296, RMSE = 0.643, MAE = 0.431). On the basis of these results, the Random Forest model shows a clear advantage in predicting safety citizenship behavior and is recommended as the preferred predictive algorithm.
(2) Model Interpretability
Feature importance was visualized for three of the better-performing machine learning models, Random Forest (RF), Elastic Net (ENet, a regularized linear regression that combines Lasso and Ridge penalties), and Support Vector Machine (SVM), to evaluate the relative contribution of each included variable to safety citizenship behavior. A higher feature importance value indicates a greater contribution of the variable to the prediction of safety citizenship behavior.
As shown in Figure 3, the variables are ranked in descending order of their contribution to the Random Forest model: Safety Atmosphere (SAC), Promotion Regulatory Focus (PRF), Prevention Regulatory Focus (PF), Self-Efficacy (SE), Social Support (SS), Organizational Justice (OJ), Technical Support (TS), Autonomy (A), Career Development Potential (CDP), Work–Family Conflict (WFC), Work Pressure (WP), Technical Complexity (TC), Working Environment (WE), and Cognitive Demand (CD).
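The section does not specify how importance was computed for each of the three models. One model-agnostic option, assumed here, is permutation importance: shuffle one predictor at a time and record the average drop in R². The sketch below implements that idea in plain Python; `predict` stands for any fitted model wrapper (RF, ENet, or SVM) and, like the toy data, is a hypothetical placeholder. scikit-learn provides a comparable routine, `sklearn.inspection.permutation_importance`.

```python
import random

def _r2(y_true, y_pred):
    """Coefficient of determination, used as the higher-is-better score."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is randomly shuffled.

    predict: callable mapping a list of rows to a list of predictions
             (hypothetical wrapper around a fitted RF, ENet, or SVM model).
    X: list of rows, each a list of feature values; y: list of targets.
    """
    rng = random.Random(seed)
    baseline = _r2(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - _r2(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data: the "model" uses only feature 0, so feature 1 should get zero importance.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [2 * row[0] for row in X]

def toy_predict(rows):
    return [2 * row[0] for row in rows]

print(permutation_importance(toy_predict, X, y))
```

Because the toy model ignores feature 1, shuffling that column leaves the predictions unchanged and its importance is exactly zero, whereas shuffling feature 0 lowers R². Impurity-based RF importances, ENet coefficients, and permutation scores lie on different scales, so comparing rankings across models is safer than comparing raw values.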
3.4. Response Surface Analysis
The response surface analysis was based on three nested models that progressively examined the effects of demographic variables, the main effects of job characteristics, and their nonlinear interactions on miners’ safety citizenship behavior (SCB). Model Y1 included only demographic variables as controls; Model Y2 added the linear main effects of job demands and job resources to Y1; and Model Y3 further added the quadratic and interaction terms of job demands and job resources, forming a full quadratic polynomial model on which the response surface analysis was conducted. Table 6 presents the results of the polynomial regression and response surface analyses for miners’ SCB.
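Written out, the three nested models can be sketched as follows. Here $X$ denotes job demands, $Y$ job resources, and $C_m$ the demographic controls with coefficients $c_m$; the $C_m$, $c_m$, and error-term notation are introduced only for illustration, and the labels $b_1$ to $b_5$ are assumed to follow the convention under which the marginal effect of $X$ reported in this section equals $b_1 + 2b_3X + b_4Y$.

```latex
\begin{aligned}
Y_1:&\quad SCB = b_0 + \textstyle\sum_{m} c_m C_m + e\\
Y_2:&\quad SCB = b_0 + \textstyle\sum_{m} c_m C_m + b_1 X + b_2 Y + e\\
Y_3:&\quad SCB = b_0 + \textstyle\sum_{m} c_m C_m + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2 + e
\end{aligned}
```

In polynomial regression, $X$ and $Y$ are usually placed on a common scale and centered before the squared and product terms are formed, so that the surface tests along $X = Y$ and $X = -Y$ are meaningful; the section does not state which scaling was used.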
Model comparisons indicate that adding the quadratic and interaction terms significantly improves the fit of Y3 relative to Y2, as shown in Table 6 Panel A. The effects of job demands (X) and job resources (Y) on SCB are therefore not simply linear and additive but nonlinear and interactive, so response surface analysis is needed to characterize their “match/mismatch” effects.
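The section reports that Y3 fits significantly better than Y2 but does not name the test used. A common choice for nested OLS models, assumed here, is the F test on the R² increment. The sketch below uses hypothetical R² values, predictor counts, and sample size, not the entries of Table 6.

```python
def f_change(r2_reduced, r2_full, k_reduced, k_full, n):
    """F test for the R^2 increase of a full OLS model over a nested reduced model.

    k_reduced, k_full: number of predictors, excluding the intercept.
    n: sample size. Returns (F, df1, df2).
    """
    df1 = k_full - k_reduced  # number of added terms
    df2 = n - k_full - 1      # residual degrees of freedom of the full model
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2

# Hypothetical setup: Y2 with 7 predictors (controls + X + Y),
# Y3 adding X^2, X*Y, and Y^2 (10 predictors).
f, df1, df2 = f_change(r2_reduced=0.30, r2_full=0.33, k_reduced=7, k_full=10, n=1000)
print(f"F({df1}, {df2}) = {f:.2f}")
# With SciPy installed, scipy.stats.f.sf(f, df1, df2) gives the p-value.
```

Since Y3 adds exactly three terms to Y2, the numerator degrees of freedom of this comparison are 3 regardless of how many controls are included.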
Within the full quadratic polynomial model (Y3), the linear main effects of job demands (X) and job resources (Y) run in opposite directions: job demands generally inhibit SCB, whereas job resources promote it, as shown in Table 6, Panel B. This finding indicates that, in intelligent coal mines, safety citizenship behavior is not driven solely by individual traits or demographic factors; instead, it is shaped more directly by the interplay between job stress structures and accessible support resources. Interpreted through the job demands–resources (JD-R) model, when job demands increase while resources are insufficient, individuals tend to devote their limited attention and energy to “task completion and stress management,” leaving less room for extra-role safety behaviors. Conversely, when organizations provide sufficient resources (e.g., support, tools, autonomy, and development opportunities), individuals are more likely to internalize safety goals and adopt proactive safety behaviors. Crucially, the significance of the $X^2$ and $X \times Y$ terms implies that (1) the effect of X on SCB is curvilinear, with its marginal effect varying across levels of demands; and (2) the marginal effect of X on SCB is not constant but varies systematically with the level of Y, i.e., $\partial SCB / \partial X = b_1 + 2b_3 X + b_4 Y$. It is therefore necessary to identify the “match/mismatch” structure through the core tests along the line of consistency (LOC) and the line of inconsistency (LOIC) of the response surface, as illustrated in Figure 4.
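The marginal-effect expression above can be evaluated numerically. The sketch below assumes the full quadratic surface $SCB = b_0 + b_1X + b_2Y + b_3X^2 + b_4XY + b_5Y^2$ (controls omitted) with centered X and Y, and uses hypothetical coefficient values chosen only so that the X main effect is negative; they are not the Table 6 estimates.

```python
def scb_surface(b, x, y):
    """Full quadratic polynomial: b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * x + b2 * y + b3 * x ** 2 + b4 * x * y + b5 * y ** 2

def marginal_effect_of_x(b, x, y):
    """Analytic partial derivative dSCB/dX = b1 + 2*b3*X + b4*Y."""
    _, b1, _, b3, b4, _ = b
    return b1 + 2 * b3 * x + b4 * y

# Hypothetical coefficients (b0, b1, b2, b3, b4, b5), not the study's estimates.
b = (3.5, -0.20, 0.30, 0.05, 0.10, 0.02)

# Slope of X at the (centered) mean of X, for low, average, and high resources.
for y_level in (-1.0, 0.0, 1.0):
    print(y_level, round(marginal_effect_of_x(b, 0.0, y_level), 3))
```

With these illustrative values the slope of X is -0.3, -0.2, and -0.1 at Y = -1, 0, and +1, i.e., a positive $b_4$ weakens the negative effect of demands as resources rise, which is the kind of dependence on Y that the $X \times Y$ term captures.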
The key response surface tests, shown in Table 6 Panel C, further reveal that along the line of consistency (LOC, X = Y) both the slope and the curvature are positive: when demands and resources increase together, SCB rises at an accelerating rate (a convex curve). This implies that in intelligent coal mines, as technological complexity and coordination demands grow, miners are more likely to translate safety objectives into stable proactive behaviors when resource allocation keeps pace (e.g., through reliable technical support, training, and coordination mechanisms). Along the line of inconsistency (LOIC, X = −Y), the slope is negative while the curvature is not significant, indicating that the effect of a mismatch is mainly directional: SCB declines more markedly when system complexity and cognitive demands rise but resource supply falls short (“high demands–low resources”), whereas “low demands–high resources” does not produce equivalent negative consequences. The three-dimensional response surface in Figure 5 illustrates this “cooperative configuration–behavior enhancement” trend.
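The LOC and LOIC quantities follow from substituting $Y = X$ and $Y = -X$ into the quadratic surface $SCB = b_0 + b_1X + b_2Y + b_3X^2 + b_4XY + b_5Y^2$, which gives the slopes $b_1 + b_2$ and $b_1 - b_2$ and the curvatures $b_3 + b_4 + b_5$ and $b_3 - b_4 + b_5$ (conventionally labeled $a_1$ to $a_4$). The sketch below computes each quantity with a standard error from the coefficient covariance matrix. The coefficients and the diagonal covariance matrix are hypothetical, and the section does not state which software or procedure the study used for these tests.

```python
import math

# Weights on (b1, b2, b3, b4, b5) defining each surface test quantity.
SURFACE_TESTS = {
    "a1": (1, 1, 0, 0, 0),   # LOC slope: b1 + b2
    "a2": (0, 0, 1, 1, 1),   # LOC curvature: b3 + b4 + b5
    "a3": (1, -1, 0, 0, 0),  # LOIC slope: b1 - b2
    "a4": (0, 0, 1, -1, 1),  # LOIC curvature: b3 - b4 + b5
}

def surface_tests(coefs, cov):
    """Estimate, standard error, and t value of each linear combination w'b.

    coefs: (b1, b2, b3, b4, b5) from the full quadratic model.
    cov: 5x5 covariance matrix of those coefficients (list of rows).
    """
    results = {}
    for name, w in SURFACE_TESTS.items():
        estimate = sum(wi * bi for wi, bi in zip(w, coefs))
        variance = sum(w[i] * cov[i][j] * w[j] for i in range(5) for j in range(5))
        se = math.sqrt(variance)
        results[name] = (estimate, se, estimate / se)
    return results

# Hypothetical inputs: coefficients and a diagonal covariance (SE = 0.02 each).
coefs = (-0.20, 0.30, 0.05, 0.10, 0.02)
cov = [[0.0004 if i == j else 0.0 for j in range(5)] for i in range(5)]
for name, (est, se, t) in surface_tests(coefs, cov).items():
    print(name, round(est, 3), round(se, 4), round(t, 2))
```

In real estimates the off-diagonal covariances are generally nonzero, which is why the full matrix, rather than the individual standard errors alone, enters the variance of each combination.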
In summary, miners’ safety citizenship behavior is influenced not only by the linear main effects of job demands and job resources but also by the nonlinear match between them. The response surface analysis indicates that achieving high levels of safety citizenship behavior depends on raising job demands and job resources in step, while actively avoiding a vicious cycle of “high demands–low resources” in the workplace.
3.5. Latent Profile Analysis
In this section, an exploratory latent profile analysis (LPA) was conducted, followed by univariate and multivariate analyses to examine how well the identified latent profiles predict other variables. Extending the preceding analyses, miners were classified according to their scores on job demands and job resources to identify the job characteristic profiles of the miner groups.
Latent profile analysis was performed using the scores of each dimension of job demands and job resources, with all job demands–resources variables standardized before being included in the LPA model. The model analysis results are presented in Table 7.
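A minimal sketch of the standardization step, assuming column-wise z-scores computed with the sample (n − 1) standard deviation; the section does not state which denominator was used, and the toy rows (one row per miner, one column per job demands or job resources dimension) are hypothetical.

```python
import statistics

def standardize_columns(rows):
    """Column-wise z-scores: (x - column mean) / column sample SD (n - 1 denominator)."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical scores on two dimensions measured on different scales.
print(standardize_columns([[1, 10], [2, 20], [3, 30]]))
```

Standardizing puts dimensions measured on different scales on a common metric, so that no single indicator dominates the profile solution simply because of its scale.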
The latent profile solutions were evaluated as follows. Firstly, for the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and sample-size-adjusted Bayesian Information Criterion (aBIC), smaller values indicate better fit, although these criteria may tend to favor solutions with more classes. Secondly, a significant Lo–Mendell–Rubin likelihood ratio test (LMR/aLMR; p < 0.05) indicates that the k-class model fits better than the (k − 1)-class model. The Bootstrap Likelihood Ratio Test (BLRT) was used as a further check: it compares the fit of the k-class and (k − 1)-class models by their likelihood ratio, and a significant p-value indicates that the k-class model is better [40]. In this study, the two-class model was superior to the one-class model ($p_{\mathrm{LMR}} < 0.001$, $p_{\mathrm{aLMR}} = 0.001$, $p_{\mathrm{BLRT}} < 0.001$), the three-class model was superior to the two-class model ($p_{\mathrm{LMR}} < 0.001$, $p_{\mathrm{aLMR}} = 0.001$, $p_{\mathrm{BLRT}} < 0.001$), and the four-class model was superior to the three-class model ($p_{\mathrm{LMR}} < 0.001$, $p_{\mathrm{aLMR}} = 0.001$, $p_{\mathrm{BLRT}} < 0.001$), whereas the five-class model was not superior to the four-class model according to the LMR and aLMR tests ($p_{\mathrm{LMR}} = 0.683$, $p_{\mathrm{aLMR}} = 0.685$), although the BLRT remained significant ($p_{\mathrm{BLRT}} < 0.001$). Thirdly, entropy, which ranges from 0 to 1, was used to assess classification clarity: the closer the value is to 1, the clearer the classification, and values greater than 0.8 are generally considered to indicate good classification quality [40]. In this study, the entropy values for the two- to five-class models were 0.813, 0.881, 0.859, and 0.825, respectively, all greater than 0.8. Fourthly, the proportions and absolute sizes of the classes are shown in Table 5. In the four-class model, the smallest class accounted for 13.27% of the sample (> 5%) and contained 155 participants (> 50), indicating that each class has an adequate sample size [41].
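The information criteria and the entropy index discussed above have conventional closed forms: $\mathrm{AIC} = -2\ln L + 2p$, $\mathrm{BIC} = -2\ln L + p\ln n$, $\mathrm{aBIC} = -2\ln L + p\ln\big((n+2)/24\big)$, and relative entropy $E_K = 1 - \sum_i\sum_k(-\hat p_{ik}\ln\hat p_{ik})/(n\ln K)$, where $\hat p_{ik}$ is the posterior probability that case $i$ belongs to class $k$. The sketch below assumes these standard definitions; the study's software may differ in detail, and the inputs in the usage lines are hypothetical.

```python
import math

def information_criteria(log_likelihood, n_params, n):
    """AIC, BIC, and sample-size-adjusted BIC (n* = (n + 2) / 24); smaller is better."""
    neg2ll = -2 * log_likelihood
    aic = neg2ll + 2 * n_params
    bic = neg2ll + n_params * math.log(n)
    abic = neg2ll + n_params * math.log((n + 2) / 24)
    return aic, bic, abic

def relative_entropy(posteriors):
    """Entropy index in [0, 1] from an n x K matrix of posterior class probabilities (K >= 2).

    Values closer to 1 mean cases are assigned to classes with less uncertainty.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * math.log(k))

# Hypothetical inputs for illustration.
print(information_criteria(log_likelihood=-100.0, n_params=5, n=22))
print(relative_entropy([[0.95, 0.05], [0.10, 0.90], [0.80, 0.20]]))
```

Perfectly certain assignments give an entropy of 1, and equal posterior probabilities for every class give 0; the number of free parameters `n_params` depends on how class-specific variances are specified in the LPA model.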
Considering all indicators together: firstly, the information criteria (AIC, BIC, aBIC) reached their lowest values for the three-class model. However, the LMR and BLRT tests indicated that the four-class model improved significantly on the three-class model (p < 0.001), whereas the LMR and aLMR tests showed no significant improvement of the five-class model over the four-class model (p > 0.05). Secondly, the four-class model had a relatively high entropy value (0.859), indicating good classification clarity. Finally, no class in the four-class model was too small (smallest class 13.27%), and its profile characteristics (low–low, high–low, low–high, high–high) were more clearly distinguishable and more theoretically interpretable, as shown in Figure 6. Based on the significant improvement in the model comparison tests, good classification accuracy, and theoretical interpretability, this study selected the four-class model as the optimal solution.
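Part of this multi-criteria judgment can be expressed as a simple rule: keep adding classes while the LMR test remains significant and entropy stays above 0.8. The sketch below is a hypothetical simplification, not the study's procedure; it ignores the information criteria (which favored three classes), the BLRT, class sizes, and theoretical interpretability, all of which the study also weighed. The reported "< 0.001" values are entered as 0.001 for illustration.

```python
def select_n_classes(lmr_p, entropy, alpha=0.05, entropy_cut=0.8):
    """Largest k reached before the first non-significant LMR test,
    skipping any k whose entropy is at or below entropy_cut.

    lmr_p[k]: LMR p-value for k classes versus k - 1 classes (k >= 2).
    entropy[k]: relative entropy of the k-class solution.
    """
    chosen = 1
    for k in sorted(lmr_p):
        if lmr_p[k] >= alpha:
            break  # adding a class no longer improves fit significantly
        if entropy[k] > entropy_cut:
            chosen = k
    return chosen

# LMR p-values and entropy values as reported in this section.
lmr_p = {2: 0.001, 3: 0.001, 4: 0.001, 5: 0.683}
entropy = {2: 0.813, 3: 0.881, 4: 0.859, 5: 0.825}
print(select_n_classes(lmr_p, entropy))
```

With the values reported in this section the rule returns 4, agreeing with the selected solution; the smallest-class check (13.27% > 5%) would be applied separately.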
Each profile was named according to its score levels and characteristic distribution, as shown in Table 8. Profile 1, with low scores on both job demands and job resources, was named Low Job Demands–Low Job Resources (225 participants, 19.26%); Profile 2, with high job demands scores and low job resources scores, was named High Job Demands–Low Job Resources (348 participants, 29.79%); Profile 3, with low job demands scores and high job resources scores, was named Low Job Demands–High Job Resources (440 participants, 37.67%); and Profile 4, with high scores on both job demands and job resources, was named High Job Demands–High Job Resources (155 participants, 13.27%).
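The reported profile sizes can be checked arithmetically: they sum to 1168 participants, reproduce the stated percentages, and imply the degrees of freedom F(3, 1164) that appear in the one-way ANOVA results of this section. The sketch below uses only the counts given in the text.

```python
# Profile sizes as reported in the text (names use ASCII hyphens).
profile_sizes = {
    "Low Job Demands-Low Job Resources": 225,
    "High Job Demands-Low Job Resources": 348,
    "Low Job Demands-High Job Resources": 440,
    "High Job Demands-High Job Resources": 155,
}

n_total = sum(profile_sizes.values())
shares = {name: round(100 * size / n_total, 2) for name, size in profile_sizes.items()}

# One-way ANOVA across the four profiles: df_between = k - 1, df_within = N - k.
df_between = len(profile_sizes) - 1
df_within = n_total - len(profile_sizes)

print(n_total, shares, (df_between, df_within))
```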
Differences among the profiles in prevention regulatory focus, promotion regulatory focus, miners’ safety citizenship behavior, safety atmosphere, and self-efficacy were tested using one-way analysis of variance (ANOVA) in SPSS 29.0, with post hoc comparisons conducted using the Least Significant Difference (LSD) method; the results are presented in Table 9. Multivariate regression analyses were then conducted in SPSS, with marital status, age, years of service, educational level, and job type as control variables, latent profile membership (dummy-coded) as the independent variable, and the variables that differed significantly in the one-way ANOVA (prevention regulatory focus, promotion regulatory focus, miners’ safety citizenship behavior, safety atmosphere, and self-efficacy) as outcome variables, in order to examine how these outcomes differ among miners in different job demands–resources profiles.
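Entering a four-category profile variable into a regression requires K − 1 indicator (dummy) variables, with one profile serving as the reference group. The section does not state which profile was the reference; the sketch below assumes Profile 1 (low demands–low resources) for illustration, so each coefficient would read as a difference from that profile, holding the controls constant. The function name and the short label list are hypothetical.

```python
def dummy_code(labels, reference):
    """Return (categories, rows) of 0/1 indicators for every non-reference category."""
    categories = sorted(set(labels) - {reference})
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

# Profile numbers as in Table 8; Profile 1 is the assumed reference group.
profiles = [1, 2, 3, 4, 2]
categories, indicators = dummy_code(profiles, reference=1)
print(categories, indicators)
```

Omitting one category avoids perfect collinearity with the intercept; changing the reference group changes the individual coefficients but not the overall model fit.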
The main effect of profile membership was significant for all five variables (p < 0.05): prevention regulatory focus (F(3, 1164) = 83.506, p < 0.001), promotion regulatory focus (F(3, 1164) = 82.292, p < 0.001), miners’ safety citizenship behavior (F(3, 1164) = 89.013, p < 0.001), safety atmosphere (F(3, 1164) = 32.957, p < 0.001), and self-efficacy (F(3, 1164) = 4.790, p = 0.003). The four job demands–resources profiles of miners therefore differed significantly on all of these variables.
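The one-way ANOVA F statistic and the LSD pairwise statistic reported by SPSS can be reproduced from raw group scores. The sketch below shows the standard computations in plain Python on hypothetical scores; the group data are not the study's. Fisher's LSD uses the pooled ANOVA error term and does not adjust for multiple comparisons.

```python
import math
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    f = (ss_between / df_between) / ms_within
    return f, df_between, df_within, ms_within

def lsd_t(group_a, group_b, ms_within):
    """Fisher's LSD t statistic for two groups (df = df_within of the ANOVA)."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(ms_within * (1 / len(group_a) + 1 / len(group_b)))
    return diff / se

# Hypothetical scores for four profiles.
groups = [[3.1, 3.4, 2.9], [2.5, 2.7, 2.6], [3.8, 3.6, 3.9], [4.0, 4.2, 3.9]]
f, df_b, df_w, msw = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f:.3f}")
print(f"LSD t (profile 4 vs 2) = {lsd_t(groups[3], groups[1], msw):.3f}")
```

With the study's N = 1168 and four profiles, the same formulas give the df (3, 1164) shown above; p-values follow from the F and t distributions (e.g., `scipy.stats.f.sf` and `scipy.stats.t.sf` if SciPy is available).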
Multivariate linear regression (controlling for demographic variables) further revealed that the high demands–low resources profile was associated with significantly lower prevention regulatory focus, promotion regulatory focus, miners’ safety citizenship behavior, and safety atmosphere, whereas the high demands–high resources profile was associated with significantly higher levels of these variables and also with higher self-efficacy. The effects of the low demands–high resources profile were mostly not significant. Overall, the LPA results characterize the combined patterns of job characteristics, clarify how key psychological and behavioral outcomes differ across miner profiles, and provide empirical evidence for differentiated management and interventions in the safety management of frontline miners in intelligent mining areas.