1. Introduction
With the increasing depth of coal mining, severe deformation and damage to roadways are becoming more common, leading to frequent safety accidents such as roadway collapse, roof fall, and sidewall spalling [1,2,3]. The stability of the surrounding rock in roadways is crucial not only for personnel safety but also for maintaining production efficiency. In deep mining environments, stress-induced deformation is dynamic and complex, which makes monitoring and control more difficult. Accurate, continuous monitoring and effective prediction of surrounding rock deformation have therefore become central to smart mining research [4,5]. Against this backdrop, there is an urgent need for more adaptive, real-time, and accurate methods to monitor and predict the deformation and damage of the rock mass surrounding roadways, especially in dynamic and complex mining environments.
Traditional methods for monitoring surrounding rock deformation and damage, such as laser scanning, ground-penetrating radar, and conventional measuring instruments, often face numerous challenges, including high costs, complex operation, and difficulty in handling large datasets [6,7,8]. While these methods have value, they are insufficient in complex and variable mining environments, especially when real-time data acquisition is required [9,10,11]. For example, laser scanning is very effective in static environments, but it has limitations in underground mining operations, where insufficient lighting, dust, and the movement of machinery all degrade the accuracy and real-time performance of the system [11,12,13]. Photogrammetry, while useful, has similar adaptability problems in environments with uniform texture, which make feature recognition and stereo matching difficult [14,15,16]. Zhu, D. [17] analyzed the deformation mechanism of deep soft rock roadways through on-site investigation, mechanical modeling, and numerical simulation, and proposed a combined support technology of “roof corner anchor cable + rib anchor cable + concrete inverted arch + floor anchor cable”, which was successfully applied in roadway III4104 and confirmed by monitoring. By analyzing the spatial distribution and rotation of the principal stress axis under three-dimensional stress conditions, Zuo, J. [18] studied the complex failure mechanism of roadways, established mechanical and numerical models, and revealed how stress adjustment leads to failure. The research results were verified through on-site observations. Liu, G. [19] combined indoor experiments with three-dimensional discontinuous deformation analysis (3D DDA) and used a customized high-speed binocular vision experimental system to explore the movement characteristics of falling rocks. By analyzing the influences of the shape of the falling rocks, the fall height, the slope angle, and the release method, the study determined that the slope angle and the fall height are the main factors affecting the speed of the falling rocks. The experimental results were highly consistent with the simulation results, verifying the accuracy of the method. Despite challenges such as poor lighting and adverse mining conditions, these remote sensing techniques provide a valuable complement to traditional point measurement methods by delivering comprehensive time-lapse data for better understanding of surrounding rock deformation and failure. However, laser scanning still has limitations, particularly in its application within complex environments and in terms of real-time performance.
Since the beginning of the 21st century, computer vision technology has gradually become an important tool for surrounding rock monitoring and prediction. Du, Y.X. [20] investigated the key technologies of visual inspection systems and their applications in tunneling sites and identified the critical technologies that need to be developed in the future. Han, Z. [21] established a full-plane strain model of a rectangular roadway considering the deflection of the principal stress axis, derived analytical solutions for the stress and plastic zone distributions, and analyzed the effect of stress space rotation on roadway stability. The results show that stress deflection leads to asymmetric failure and rapid expansion. Shen, Y. [22] studied the asymmetric deformation and failure of fully mechanized caving roadways in close-range thick coal seams through numerical simulation and field tests and proposed an asymmetric support optimization scheme. The results show that the non-uniform stress generated by the residual coal pillar can lead to butterfly failure, while the proposed asymmetric support scheme can effectively maintain the stability of the roadway. Chai, J. [23] established a force and deformation model based on optical fiber sensing for monitoring the loose areas of the roadway floor. The study introduced a compression coefficient (ξ) to quantify rock compression and demonstrated that distributed optical fiber technology can effectively track the development, displacement, and stress of the loose areas in real time during mining operations. Liang, M.F. [24] developed a cubic three-dimensional stress sensor based on fiber Bragg grating technology. Experiments demonstrated that the sensor achieves a triaxial sensitivity of 25.51–24.86 pm/MPa within the 0–50 MPa range, with a measurement error below 4%. The sensor exhibits good linearity and repeatability, providing a high-precision tool for safety monitoring in underground engineering such as coal mines and tunnels. Zhang, T. [25] conducted large-scale physical model tests to explore the dynamic instability of the surrounding rock of deeply buried roadways under plateau stress and dynamic disturbances. The results show that the increase in disturbance intensity and the change in disturbance position significantly affect crack propagation and failure, and stress concentration occurs at the arch shoulder and point b. Cheng, J. [26] addressed the issue of insufficient binocular vision positioning accuracy in coal mining roadheaders by establishing a spatial circular feature projection error model and a structural parameter analysis model. The study systematically analyzed the effects of imaging errors and sensor parameters on measurement accuracy. Experimental results validated the reliability and precision of the pose measurement system, providing technical support for improving the efficiency of intelligent coal mining operations. Mahdevari, S. [27,28] proposed a Particle Swarm Optimization-based Adaptive Neuro-Fuzzy Inference System (PSO-ANFIS) model to predict the maximum roof displacement in the roadways of the Tabas coal mine. The results demonstrated that this model outperformed other methods in terms of accuracy and generalization ability, aiding in the identification of unstable zones and the formulation of support strategies. Additionally, the developed Improved Support Vector Regression (ISVR) model accurately predicted the stability of tailgate roadways in mechanized longwall mining, showing high agreement with measured data (R² = 0.91) and outperforming Artificial Neural Networks and multivariate linear regression models.
In summary, the application of binocular vision methods in roadway deformation and failure monitoring is relatively mature; however, most existing studies are based on experiments simulating static roadway environments under idealized conditions [29,30,31]. Meanwhile, traditional stereo matching techniques face challenges when dealing with homogeneous surface textures commonly found in underground roadways. In the field of roadway deformation prediction, current research mainly focuses on predicting the deformation amount at the same location in the surrounding rock [32,33]. To address these issues, the general methodological framework of this study is structured into three main components: (1) experimental modeling of roadway deformation and failure using a physical similarity model; (2) binocular vision–based monitoring and data acquisition with an improved stereo matching algorithm; and (3) deformation prediction through the Gradient-Enhanced Random Forest (GERF) model trained on the visual monitoring data. This integrated framework enables a closed-loop process from deformation observation to model-based prediction and validation [34]. Although the proposed method aims to improve the accuracy of deformation monitoring and prediction, the fundamental research problem addressed in this study is to reveal the intrinsic coupling mechanism between the deformation-failure evolution of roadway surrounding rock under complex dynamic stress conditions and its visual characteristics [35,36]. Understanding this coupling mechanism provides a theoretical foundation for developing intelligent, data-driven roadway monitoring and prediction systems.
4. Prediction Experiment
4.1. Dataset and Model Parameter Settings
Deformation and failure data pertaining to the surrounding rock, acquired through the application of a binocular vision algorithm, serve as the primary source for training the model. A dataset is constructed using the deformation amounts of the roadway roof and three landmark points located 15.0 cm and 30.0 cm above it, with the roadway roof deformation amount 1 as the X feature. In the simulation experiment, the binocular vision measurement method was used to collect 60 sets of data at each of the two key layers, A and B, resulting in a total of 120 datasets. These 120 datasets were randomly divided, with 80% allocated as the training dataset Y_h1 and the remaining 20% as the testing dataset Y_h2.
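A random 80/20 partition of 120 samples can be sketched as follows. This is only an illustration: the helper name `split_indices`, the fixed seed, and the use of index lists are assumptions, since the splitting procedure is not described beyond being random.

```python
import random

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

# 60 measurement sets on each of the two key layers, 120 in total
train_idx, test_idx = split_indices(120)
print(len(train_idx), len(test_idx))  # 96 24
```

Splitting indices rather than the data itself keeps the X and Y values of each sample aligned after shuffling.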
When using the RF regression model, the model parameters must be tuned to achieve optimal results. This study takes the number of decision trees (n_estimators) and the maximum depth of the decision trees (max_depth) as examples and applies the control-variable method, in which one parameter is varied while the others are held fixed. The optimal value of each parameter is selected according to the model’s performance on the validation set, with a focus on minimizing the Mean Squared Error (MSE).
First, the remaining parameters are fixed at constant starting values, and the model is evaluated across varying tree counts on the datasets Y_h1 and Y_h2.
Figure 16 illustrates the variation curve of the mean squared error (MSE) for the model as the number of trees changes, as applied to the two datasets. As the number of trees increases, the MSE exhibits a downward trend, and ultimately, the error levels off at approximately 0.0064. At this point, the number of trees is approximately 180, and the model reaches its optimal performance. Hence, the n_estimators parameter for the model using the Y_h1 dataset is adjusted to 180. The model reaches convergence more rapidly on the Y_h2 dataset. When the number of decision trees reaches 120, the mean squared error begins to stabilize. Therefore, the n_estimators parameter for the model on the Y_h2 dataset is set to 120.
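One way to read a plateau off such a curve is to take the smallest tree count whose MSE lies within a small tolerance of the lowest MSE observed. The sketch below illustrates that rule under stated assumptions: the helper name `first_plateau`, the tolerance `tol`, and the sample numbers are hypothetical and are not the data behind Figure 16, and the study may have selected the values by inspection instead.

```python
def first_plateau(param_values, mse_values, tol=1e-4):
    """Return the smallest parameter value whose MSE is within `tol`
    of the lowest MSE observed in the sweep (None for an empty sweep)."""
    if not mse_values:
        return None
    best = min(mse_values)
    for value, mse in zip(param_values, mse_values):
        if mse - best <= tol:
            return value

# Illustrative numbers only
trees = [20, 60, 100, 140, 180, 220]
mse = [0.0120, 0.0090, 0.0075, 0.0068, 0.0064, 0.0064]
print(first_plateau(trees, mse))  # 180
```

Preferring the smallest value on the plateau keeps the forest as small as possible for the same error level.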
Figure 17 shows the MSE variation curve for the model on the two datasets at different tree depths. As the tree depth increases, the MSE decreases and eventually converges to around 0.005. At this point, the tree depth is approximately 4, and the model reaches its optimal performance. Therefore, the max_depth parameter for the model on the Y_h1 dataset is set to 4. The model on the Y_h2 dataset converges more quickly and achieves its minimum error at a depth of 3. Therefore, the max_depth parameter for the model on the Y_h2 dataset is set to 3.
After analyzing the different parameters of the model, the final parameter settings are determined as shown in Table 3. The parameters min_samples_split and min_samples_leaf are also determined using this method.
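The control-variable procedure described in this subsection can be sketched generically: each parameter is swept over its candidate values while the others stay fixed, and the value with the lowest validation MSE is kept. In the sketch below, `tune_one_at_a_time`, the candidate grids, and the scoring function `toy_mse` are hypothetical; `toy_mse` is a stand-in for fitting the RF regressor and returning its validation MSE.

```python
def tune_one_at_a_time(score, defaults, grids):
    """Control-variable search: vary one parameter over its grid while
    the others stay fixed, keep the value with the lowest score, move on."""
    params = dict(defaults)
    for name, candidates in grids.items():
        params[name] = min(candidates,
                           key=lambda v: score({**params, name: v}))
    return params

# Stand-in for "fit the RF regressor and return validation MSE".
# This toy error surface is hypothetical and only makes the sketch runnable.
def toy_mse(p):
    return ((p["n_estimators"] - 180) ** 2 * 1e-7
            + (p["max_depth"] - 4) ** 2 * 1e-3 + 0.005)

defaults = {"n_estimators": 100, "max_depth": 2}
grids = {"n_estimators": [60, 120, 180, 240], "max_depth": [2, 3, 4, 5]}
best = tune_one_at_a_time(toy_mse, defaults, grids)
print(best)  # {'n_estimators': 180, 'max_depth': 4}
```

This search is cheaper than a full grid search but does not account for interactions between parameters, so the result can depend on the order in which parameters are tuned.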
The performance of the GERF largely depends on data quality and model optimization. Effective feature engineering is essential to select highly relevant features such as depth information or texture features while minimizing the influence of noise. Additionally, model parameters, including the number of trees, tree depth, and splitting criteria, significantly affect the model’s generalization ability and must be carefully tuned to avoid overfitting. When the output of binocular vision such as 3D point clouds is used as input features, any inherent errors may propagate into the prediction results. For instance, depth deviations can lead to classification errors. Therefore, it is essential to jointly optimize the robustness of the binocular vision system (e.g., through illumination-invariant design) and the feature fusion strategy of the GERF. This coordinated approach ensures high precision and low latency throughout the data pipeline, thereby enhancing the overall reliability of the system.
4.2. Evaluation Metrics
The model is trained using the roof deformation and failure data X and the Y feature values at two different depths. Three assessment metrics are used: Root Mean Squared Error (RMSE), the coefficient of determination (R²), and Cross-Validated Mean Squared Error (CV_MSE).
RMSE quantifies the average discrepancy between the values predicted by the model and the true values. It is the square root of the average of the squared errors. A lower RMSE signifies superior predictive accuracy of the model. The formula for computing RMSE is as follows:
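In standard notation, with $y_i$ the measured value, $\hat{y}_i$ the predicted value, and $n$ the number of samples (these symbols are assumed here, as they are not defined elsewhere in this section), the usual definition matching the description above is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}
```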
The coefficient of determination, R², is a statistical metric for assessing the fit of a regression model. It typically lies between 0 and 1, where a value closer to 1 indicates that the model represents the real data more accurately. The equation for computing R² is as follows:
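In its standard form, with $y_i$ the measured values, $\hat{y}_i$ the predicted values, $\bar{y}$ the mean of the measured values, and $n$ the number of samples (symbols assumed for illustration), the coefficient of determination is usually written as:

```latex
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```

The numerator is the residual sum of squares and the denominator is the total sum of squares, so a model that predicts no better than the mean gives a value of 0.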
The calculation formula of CV_MSE is:
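A common k-fold form is given below as a sketch. The number of folds $k$ is not stated in this section and is an assumption, as are the symbols: $F_j$ is the set of samples in fold $j$, $n_j$ is its size, and $\hat{y}_i^{(-j)}$ is the prediction for sample $i$ from the model trained without fold $j$.

```latex
\mathrm{CV\_MSE} = \frac{1}{k}\sum_{j=1}^{k}\frac{1}{n_j}\sum_{i \in F_j}\left(y_i - \hat{y}_i^{(-j)}\right)^{2}
```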
5. Results and Analysis
5.1. Monitoring Results and Analysis
The disparity map produced by the stereo matching algorithm can then be used to derive the three-dimensional coordinates of the marker points. By comparing the three-dimensional coordinates of the marker points at different times, their displacement can be calculated, which indicates the deformation of the surrounding rock in the roadway. To validate the accuracy of the stereo vision algorithm for measuring surrounding rock deformation and failure, the results are compared with measurements made with a total station. In the loading experiment, the deep deformation of the roadway surrounding rock is relatively large and occurs mainly in the vertical direction, making it easier to obtain accurate results for comparison and analysis. Therefore, the movement of the marker points on the two key layers above the model roadway, referred to as Layer A and Layer B, is selected to verify the accuracy of the stereo vision algorithm, as shown in Figure 18.
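The step from disparity to displacement can be sketched with the standard pinhole model for a rectified stereo pair, in which depth is Z = f·B/d. The function names and the calibration values below (focal length, baseline, principal point) are hypothetical and are not those of the experimental rig.

```python
import math

def triangulate(u, v, disparity, f, baseline, cx, cy):
    """Rectified stereo: pixel (u, v) with disparity d (pixels) ->
    camera-frame point (X, Y, Z). f is the focal length in pixels;
    the baseline sets the unit of the output coordinates."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity
    return ((u - cx) * z / f, (v - cy) * z / f, z)

def displacement(p0, p1):
    """Euclidean distance between two 3D positions of the same marker."""
    return math.dist(p0, p1)

# Hypothetical calibration and pixel measurements, for illustration only
f, baseline, cx, cy = 1200.0, 120.0, 640.0, 360.0   # px, mm, px, px
before = triangulate(700.0, 300.0, 96.0, f, baseline, cx, cy)
after = triangulate(700.0, 304.0, 96.0, f, baseline, cx, cy)
print(before)                       # (75.0, -75.0, 1500.0)
print(displacement(before, after))  # 5.0
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a depth error that grows roughly with the square of the distance, which is one reason matching accuracy matters for distant markers.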
The improved stereo vision algorithm has been employed to process the collected images of roadway surrounding rock deformation and failure, and the cumulative movement of the marker points on the surrounding rock surface is calculated. Meanwhile, the total displacement of the marker points is monitored and calculated using a total station. Finally, the deformation statistics for Layer A and Layer B of the surrounding rock are obtained for both methods.
As shown in Figure 19, the deformation data derived from the stereo vision algorithm closely match the measurements taken by the total station, and both agree with the behavior observed in the experiment. The surrounding rock deformation and failure is greatest above the roadway roof, while the deformation of the surrounding rock farther from the roadway is smaller. Taking the total station results as the benchmark, the mean measurement errors of the stereo vision algorithm for the deformation of the surrounding rock in Layer A and Layer B are 1.22 mm and 0.92 mm, respectively. In summary, the stereo vision-based method can accurately measure the deformation of the roadway surrounding rock, and compared with the total station method it offers higher efficiency and a higher degree of automation.
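A mean measurement error of this kind can be computed as the average absolute difference between paired readings, as sketched below. Whether the reported values are absolute or signed means is not stated, so the absolute form is an assumption, and the readings shown are hypothetical rather than the experimental data.

```python
def mean_abs_error(measured, reference):
    """Mean absolute difference between paired readings (same units)."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = (abs(m - r) for m, r in zip(measured, reference))
    return sum(diffs) / len(measured)

# Hypothetical cumulative displacements in mm
stereo_mm = [10.5, 21.0, 33.5, 47.0]
total_station_mm = [10.0, 22.0, 32.0, 46.0]
print(mean_abs_error(stereo_mm, total_station_mm))  # 1.0
```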
5.2. Prediction Results and Analysis
After model training, the evaluation metric results of the RF algorithm on the Y_h1 and Y_h2 datasets are shown in Table 4.
The training results show that the GERF regression technique achieves high predictive precision and robustness on the Y_h1 and Y_h2 datasets. A smaller CV_MSE indicates better generalization ability on a dataset. On the Y_h1 dataset, the model’s RMSE is only 0.0164, with an R² of 0.8856 and a CV_MSE of 0.1281, which indicates that the model fits the Y_h1 test data very well. On the Y_h2 dataset, the RMSE is only 0.0113, with an R² of 0.8356 and a CV_MSE of 0.1063. The RMSE and CV_MSE are lower than on the Y_h1 dataset, although the R² is slightly lower, which demonstrates that the RF regression model has strong fitting capability for this type of data. The algorithm extracts the relevant features from the training dataset effectively, which leads to good results on the test set.
During the advancement of the physical similarity model working face, a total station was employed to monitor the displacement and deformation of the roadway roof, floor, and sidewalls in real time. The deformation measurements obtained from the total station were compared with the predictions generated by the GERF model to verify the accuracy of the predictive results. The model trained on the training dataset was used to perform predictive analysis on the test datasets Y_h1 and Y_h2. The fitting curves between the predicted values and the actual values are illustrated in Figure 20.
Figure 20a shows the model’s performance on the Y_h1 test set. Y_h1A, Y_h1B, and Y_h1C represent the test data, which correspond to the actual displacement data of the three rows of markers at the roadway cover thickness of 4.5 m in the experiment. Y_h1A_Pre, Y_h1B_Pre, and Y_h1C_Pre denote the model’s predictions for the respective test data. The figure shows that the predicted values fit the true values very closely, with a consistent trend in the curves. This indicates that the model has effectively extracted the features from the dataset and can make accurate predictions on the test set data.
Figure 20b shows the model’s performance on the Y_h2 test set. Y_h2A, Y_h2B, and Y_h2C represent the test set data, while Y_h2A_Pre, Y_h2B_Pre, and Y_h2C_Pre are the corresponding predicted values. The model also performs very well on the Y_h2 test set: the predicted value curves closely match the real value curves, with a fitting rate as high as 88.7%. The close agreement between the predicted and observed curves supports the chosen parameter settings and the robustness and precision of the model’s predictions.
The validation of the trained RF regression model shows that it captures the quantitative relationship between roof deformation and the displacement of the deep surrounding rock. This enables the deformation and failure parameters of the deep surrounding rock to be forecast from roof deformation observations alone.
The same dataset was used to predict the stability of roadway surrounding rock with three comparison models: the ensemble learning methods AdaBoost and XGBoost, and the Vision Transformer (ViT) [52,53].
AdaBoost improves accuracy by focusing on hard-to-classify samples but is sensitive to noise and depends on weak learners. XGBoost enhances gradient boosting with regularization and parallelization, offering high performance but requiring complex tuning. ViT applies self-attention to image patches for strong global feature extraction, excelling with large datasets but demanding high computational resources.
As shown in Table 5, Table 6 and Table 7, the GERF model exhibits slightly better fitting performance on datasets Y_h1 and Y_h2 than the AdaBoost, XGBoost, and ViT models. Compared to the AdaBoost model, it achieves up to a 25% reduction in root mean square error (RMSE), a maximum increase of 5% in the R² value, and up to a 12% reduction in cross-validation mean square error (CV_MSE). Compared to the XGBoost model, the GERF model achieves up to a 33% reduction in RMSE, a maximum increase of 3% in the R² value, and up to a 15% reduction in CV_MSE. Compared to the ViT model, it achieves up to an 18% reduction in RMSE, a maximum increase of 3.7% in the R² value, and up to an 8.7% reduction in CV_MSE. These results further validate that the GERF model has a strong fitting capability for this type of data.
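Percentage comparisons of this kind are usually relative changes with respect to the baseline model, as sketched below. The exact definition used for the reported percentages is not stated, so this form is an assumption, and the inputs are hypothetical round numbers rather than values from Tables 5 to 7.

```python
def relative_reduction(baseline, candidate):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - candidate) / baseline

def relative_increase(baseline, candidate):
    """Fractional increase of a score such as R2 relative to the baseline."""
    return (candidate - baseline) / baseline

# Hypothetical values chosen for round results
print(f"{relative_reduction(0.020, 0.015):.0%}")  # 25%
print(f"{relative_increase(0.80, 0.84):.0%}")     # 5%
```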
The GERF, AdaBoost, XGBoost, and ViT models were also evaluated using Precision, Specificity, Accuracy, Recall, and F-measure as performance metrics. The evaluation results for these four models are shown in the table.
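For reference, these five metrics have standard definitions in terms of confusion-matrix counts. The sketch below assumes a two-class labeling (for example, stable versus unstable surrounding rock), which this section does not spell out; the counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a two-class problem."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)             # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "specificity": specificity,
            "accuracy": accuracy, "recall": recall, "f_measure": f_measure}

# Hypothetical counts for illustration
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in m.items()})
```

The F-measure here is the harmonic mean of precision and recall (F1); a weighted variant would need an additional parameter.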
As shown in Table 8, all models demonstrate good predictability during the training phase. Among them, the GERF model achieves the highest performance metrics. The global and local evaluation metric averages of the AdaBoost and XGBoost models are similar, indicating that their predictive performance for roadway surrounding rock stability is comparable. The ViT model performs better than both the AdaBoost and XGBoost models, but its overall evaluation metrics are slightly lower than those of the GERF model. The model ranking scores are shown in Figure 21. The average values of the five evaluation metrics were scored, with the best-performing model among the four receiving 4 points, the next best 3 points, and so on, down to 1 point for the lowest-performing model. The results show that the GERF model achieved the most satisfactory predictive performance, earning the highest score of 17 points. Compared to the AdaBoost, XGBoost, and ViT models, the GERF model’s overall performance improved by 7.82%, 8.68%, and 3.87%, respectively. This indicates that the predictive performance of the GERF model is superior to that of the AdaBoost, XGBoost, and ViT models.
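The ranking scheme described above can be sketched as follows. The function name `rank_scores`, the tie handling, and the metric values are assumptions for illustration; the numbers are not those of Table 8, and only two metrics are shown to keep the example short.

```python
def rank_scores(metric_table):
    """metric_table: {metric: {model: value}}, higher value = better.
    For each metric the best of n models gets n points and the worst 1;
    points are summed per model. Ties fall back to sort order here."""
    totals = {}
    for values in metric_table.values():
        ordered = sorted(values, key=values.get)       # worst -> best
        for points, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + points
    return totals

# Hypothetical metric averages
table = {
    "precision": {"GERF": 0.91, "AdaBoost": 0.84, "XGBoost": 0.85, "ViT": 0.88},
    "recall":    {"GERF": 0.90, "AdaBoost": 0.85, "XGBoost": 0.83, "ViT": 0.87},
}
print(rank_scores(table))
# {'AdaBoost': 3, 'XGBoost': 3, 'ViT': 6, 'GERF': 8}
```

With five metrics and four models, the maximum possible score under this scheme is 20 points.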
5.3. Evaluation and Prospects
The results of this study show that the improved binocular vision system combined with the gradient-enhanced random forest (GERF) model can achieve accurate real-time monitoring of roadway surrounding rock deformation. Compared with traditional mine monitoring methods (such as laser scanning, total station and fiber optic sensors), the method in this paper effectively overcomes the limitations of high equipment cost, complex calibration and poor adaptability to low-light underground environments, and provides an economical, efficient, non-contact and real-time large-scale high-resolution spatial deformation data acquisition scheme. Similar to the fiber optic sensing technology developed by Chai et al. [23], this method can also continuously monitor deformation but has greater advantages in spatial resolution and coverage.
In terms of prediction, compared with stress and deformation prediction models such as PSO-ANFIS and ISVR proposed by Mahdevari et al. [27,28], the GERF-based model shows stronger generalization ability and stability. Traditional models mostly focus on single-point displacement prediction, while this study realizes spatiotemporal prediction of surface and deep deformation of surrounding rock by integrating multi-dimensional visualization data. Furthermore, the physical similarity model used is closer to the actual deformation situation on site, which helps bridge the gap between laboratory simulation and field observations such as those of Zhu et al. [17].
The proposed method for monitoring and predicting roadway surrounding rock deformation based on an improved binocular vision system and a gradient-enhanced random forest (GERF) model has significant advantages. First, the method is low-cost and provides non-contact, real-time monitoring of large-scale, high-resolution spatial deformation data, significantly improving monitoring efficiency and accuracy. Second, combining a physical similarity model with machine learning enhances the accuracy and generalization ability of spatiotemporal deformation prediction, bridging the gap between laboratory simulations and complex field conditions. However, the method also has certain shortcomings and limitations. The binocular vision system is highly sensitive to lighting conditions and surface texture; low underground lighting and homogeneous surfaces may affect stereo matching accuracy, thus impacting monitoring effectiveness. Furthermore, the GERF model relies on a large amount of high-quality training data, making data acquisition and processing challenging and time-consuming. Interference factors in the mining environment, such as dust and vibration, may introduce measurement errors, affecting system stability and data reliability. In practical applications, the model’s generalization ability faces challenges, and prediction performance may decline under different geological conditions. The system’s reliance on high-performance computing and algorithm support also brings potential technical risks. The storage and real-time processing of high-frequency data also place high demands on equipment performance. Overall, while the method presented in this study has certain limitations, its comprehensive performance and innovation provide important technical support and a direction for the intelligent monitoring and prediction of surrounding rock deformation in mines.
This study demonstrates the significant potential of an intelligent monitoring and prediction method based on improved binocular vision and gradient-enhanced random forest (GERF) models in the field of surrounding rock deformation in mine roadways. In the future, this method is expected to further promote the development of intelligent and automated mining, achieving higher-precision, real-time monitoring and early warning of the underground environment. Scientifically, this research combines physical similarity modeling and machine learning to expand the theoretical framework of deformation prediction, providing new insights into the analysis of surrounding rock behavior in deep and complex geological environments. In terms of application, with the improvement of sensing technology and computing power, this method can be extended to more types of underground engineering and mining conditions, integrating multi-source sensor data to improve the robustness and adaptability of the system. Furthermore, combined with intelligent devices such as drones and robots, it can achieve broader autonomous monitoring and remote control in the future, contributing to the digital transformation of mine safety production and risk management.
6. Conclusions
This study developed a comprehensive deformation monitoring and prediction framework for deep soft-rock roadways by integrating binocular vision technology with machine learning. Through algorithmic innovation, physical modeling, and data-driven prediction, the research bridges the gap between laboratory-scale observation and real-time underground monitoring. The proposed system not only enhances the precision and robustness of deformation measurement but also expands the predictive capability of surrounding rock stability analysis. The main scientific and engineering conclusions are summarized as follows:
1. An improved Semi-Global Block Matching (SGBM) algorithm with an adaptive BlockSize mechanism was proposed, which dynamically adjusts the matching window based on image gradient and contrast. This enhancement significantly improves stereo-matching accuracy under the complex lighting and texture conditions typical of underground environments. A physical similarity-based experimental model using binocular vision was established to quantitatively analyze the deformation and failure characteristics of roadway surrounding rock. Validation against total station measurements demonstrated high reliability, with mean errors of 1.22 mm and 0.92 mm at different monitoring layers.
2. To predict deformation evolution, a Gradient-Enhanced Random Forest (GERF) model was developed and trained on datasets acquired through binocular vision monitoring. The model achieved an R² of 0.8856 for deep-level deformation prediction, and its overall performance exceeded that of the AdaBoost, XGBoost, and ViT models by 7.82%, 8.68%, and 3.87%, respectively. The GERF model effectively integrates multi-dimensional visual data, enabling accurate spatiotemporal prediction of both surface and internal deformation in roadway surrounding rock. This provides a solid foundation for intelligent, data-driven monitoring and early warning in deep mining operations.
3. The proposed binocular vision monitoring system, integrated with the Gradient-Enhanced Random Forest (GERF) model, exhibits strong potential for practical application in underground mining operations. Its compact, non-contact design allows for seamless installation along roadway structures without interrupting production, while real-time monitoring enables continuous deformation tracking and early warning of instability. The system can be connected to existing mine safety management platforms for automated data exchange and visualization, reducing maintenance requirements and long-term costs compared with traditional sensors. In the future, integration with IoT frameworks and intelligent control systems will enable a closed-loop process of perception, prediction, and response, enhancing the safety and automation of deep mining operations. Continued research will focus on improving environmental adaptability, incorporating deep learning methods, and expanding multimodal datasets to strengthen system robustness and generalization.