Article

Input Variable Effects on TBM Penetration Rate: Parametric and Machine Learning Models

Department of Civil Engineering, Pamukkale University, Denizli 20160, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1301; https://doi.org/10.3390/app16031301
Submission received: 20 December 2025 / Revised: 20 January 2026 / Accepted: 26 January 2026 / Published: 27 January 2026
(This article belongs to the Special Issue Rock Mechanics in Geotechnical and Tunnel Engineering)

Abstract

In this study, linear and nonlinear parametric models (M1–M6) were jointly evaluated alongside machine learning (ML)-based approaches to achieve reliable and interpretable prediction of the penetration rate (ROP) of tunnel boring machines (TBMs). The analyses incorporate key geomechanical and structural variables, including the brittleness index (BI), uniaxial compressive strength (UCS), mean spacing of weakness planes (DPW), the angle between the tunnel axis and weakness planes (α), and Brazilian tensile strength (BTS). The coefficients of the parametric models were optimized using the Differential Evolution (DE) algorithm. Variable effects were systematically examined through Jacobian-based elasticity analysis under both original and standardized data scenarios. The results indicate that the M6 model, which explicitly incorporates interaction terms, delivers superior predictive accuracy and a more balanced, physically meaningful representation of variable contributions compared to widely used parametric formulations reported in the literature. While the dominant influence of BI and UCS on ROP is consistently preserved across all models, the indirect contributions of variables such as DPW and BTS are more clearly revealed in M6 owing to its interaction-based structure. Model performance improves systematically with increasing complexity, with the coefficient of determination (R2) rising from 0.62 for M1 to 0.69 for M6. Relative to the linear model, M6 achieves a 9.07% reduction in RMSE and a 10.48% increase in R2, while providing additional improvements of 2.47% in RMSE and 2.37% in R2 compared with the closest competing model. ML-based variable importance analyses are largely consistent with the parametric findings, highlighting BI and α in tree-based models, and UCS and α in SVM and GAM frameworks. Notably, the GAM exhibits the highest predictive performance under both data scenarios. 
Overall, the integrated use of parametric and ML approaches establishes a robust hybrid modeling framework that enables highly accurate and engineering-interpretable prediction of TBM penetration rate.

1. Introduction

In contemporary transportation, energy, and infrastructure projects, tunnels are extensively employed to ensure continuity of intercity connections, support water supply and wastewater management systems, and enable the safe transmission of energy. As these structures are often constructed under complex geotechnical conditions, they require a high level of technical expertise, meticulous planning, and advanced safety measures throughout the design, construction, and operational phases. Tunnel boring machines (TBMs) represent one of the most tangible field-level manifestations of this multidisciplinary framework. Accurate prediction of TBM performance relies on the integrated knowledge of multiple disciplines, including civil engineering, geology, mechanical engineering, data science, and project management. Numerous parameters—such as the rate of penetration (ROP), advance rate (AR), and ground–support interactions—directly influence the efficiency and safety of TBM operations. Reliable estimation of these parameters is critical not only for occupational safety but also for effective cost and time management. Consequently, the success of a TBM project necessitates strong teamwork and effective communication among geologists, geotechnical engineers, mechanical engineers, data analysts, and field personnel, while clearly highlighting the importance of interdisciplinary approaches and team coordination in engineering education and practice.
Studies on the prediction of TBM penetration rate initially focused on simple empirical models based on rock hardness [1]. Subsequently, investigations into penetration behavior in sedimentary rock environments [2] and statistical regression models [3,4] contributed significantly to understanding the fundamental relationships between rock properties and ROP. Advanced regression analyses yielded high correlation coefficients (r = 0.82), leading to the development of empirical equations applicable to practical engineering problems [3]. However, as these early approaches largely relied on linear assumptions, they proved insufficient in fully capturing complex geotechnical interactions and machine–ground behavior.
To achieve a more realistic assessment of TBM performance, heuristic optimization algorithms have been increasingly adopted in subsequent studies. A pioneering contribution in this area is the Particle Swarm Optimization (PSO)-based model proposed by Yagiz and Karahan [5], which aimed to minimize the discrepancy between measured and predicted ROP values. Further studies expanded methodological diversity by incorporating heuristic algorithms such as Harmony Search (HS), Differential Evolution (DE), and the Grey Wolf Optimizer (GWO) [6]. Similarly, Khoshzaher et al. [7] demonstrated that TBM penetration rate is influenced by both machine-related and geological factors, showing that the Firefly algorithm outperformed PSO in terms of optimization efficiency.
In complex geotechnical and mechanical systems such as TBMs, it is extremely challenging to represent all governing relationships comprehensively within a single analytical expression. This limitation constrains the predictive capability of parametric models and typically restricts their performance to R2 values of approximately 0.70. To overcome this shortcoming, recent years have witnessed a growing interest in artificial neural networks (ANNs) [8,9,10,11], support vector machines (SVMs) [12,13,14,15,16,17], fuzzy logic approaches [18,19,20,21], and hybrid soft computing techniques [22,23,24,25], owing to their flexibility in modeling nonlinear relationships and managing uncertainty, thereby enabling higher prediction accuracy in TBM performance estimation.
Recent studies have demonstrated that the performance of various machine learning and deep learning approaches strongly depends on site-specific conditions and dataset characteristics. For instance, Mahdevari et al. [26] employed a support vector regression (SVR) model to predict TBM torque and total thrust with high accuracy, highlighting the critical role of parameter optimization (C, ε, γ). Ayawah et al. [27] compared different machine learning models and emphasized that no single model consistently performs well across all field conditions, as prediction accuracy depends on both the number of input variables and site characteristics. Xu et al. [28] reported that ensemble methods outperform statistical and deep learning approaches in terms of prediction accuracy when applied to limited datasets.
Li et al. [29] utilized a dual-input 2D convolutional neural network (2D-CNN) to predict TBM torque, total thrust, and rock conditions, demonstrating that a “line model” emphasizing steady-state phase data yielded the best performance. Mahmoodzadeh et al. [30] applied a long short-term memory (LSTM) model to a dataset comprising 1125 records and achieved the highest correlation coefficient and lowest mean squared error among the evaluated algorithms. Shahrour and Zhang [31] recommended reducing input dimensionality, completing missing data, using a limited number of hidden layers, adopting recurrent neural networks for time-series data, and employing hybrid optimization techniques to mitigate the risk of convergence to local minima in soft computing applications.
Recent research underscores the critical importance of model selection, parameter optimization, and data preprocessing strategies in relation to dataset size and site conditions. While deep learning and ensemble methods generally provide superior performance—particularly in capturing complex relationships and handling limited datasets—they also suffer from several limitations, including data scarcity, site-specific constraints, high computational cost, and challenges in result interpretability.
Nevertheless, these methods are capable of predicting penetration rates with high accuracy under multivariate and complex geotechnical conditions, effectively capturing nonlinear relationships and intricate interactions among parameters. Despite achieving high predictive performance (R2 ≈ 0.90–0.95), the interpretability of model outputs and the ability to quantify the relative influence of input variables remain limited, posing a significant drawback for practical engineering applications.
To address this limitation, the present study develops a Jacobian-based framework to quantify the influence of model inputs on outputs by integrating parametric and machine learning approaches, and applies this framework to the prediction of TBM penetration rate (ROP). Within this context, six parametric models are considered: three conventional models (linear, exponential, and nonlinear) and three hybrid formulations. The parameters of all models are optimized using the Differential Evolution (DE) algorithm [32,33], and the resulting coefficients are employed to analyze the effects of input variables on the output through the Jacobian matrix.
The analyses are conducted under two distinct scenarios. In Scenario I, the model inputs are used in their original form, whereas in Scenario II, the inputs are standardized using the z-score method. For both scenarios, the relative importance of input variables is quantified and the results are presented comparatively. In addition, variable importance measures are computed for Random Forest (RF) [34,35], Bagged Trees (BT) [36,37], Support Vector Machine (SVM) [38,39], and Generalized Additive Models (GAM) [40,41], and the outcomes are compared with those obtained from parametric models.
For reliable modeling of TBM penetration rate, high predictive accuracy alone is insufficient; transparency and interpretability of model decision mechanisms are equally critical. Accordingly, Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE) analyses are employed to investigate the influence of input variables on ROP within the selected machine learning models. As black-box models do not permit analytical differentiation, numerical differentiation techniques are applied. These analyses provide both global and local insights into variable effects, enhance model interpretability, and enable a reliable and comprehensive assessment of factor influence in TBM penetration rate prediction.
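As a concrete illustration of the interpretability analyses described above, a one-dimensional partial dependence curve for a black-box predictor can be computed by brute force: sweep one input over a grid while holding the remaining columns at their observed values, and average the model's predictions at each grid point. The sketch below is a minimal, model-agnostic version; the function and variable names are illustrative and not taken from the study's implementation.

```python
import numpy as np

def partial_dependence(model_predict, X, j, grid_size=20):
    """One-dimensional PDP: average model response as feature j is swept
    over its observed range while all other features keep their values."""
    grid = np.linspace(X[:, j].min(), X[:, j].max(), grid_size)
    pd_values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v                        # force feature j to the grid value
        pd_values.append(model_predict(Xv).mean())
    return grid, np.array(pd_values)
```

For a purely linear model the resulting curve is a straight line whose slope equals the feature's coefficient, which makes this a convenient sanity check before applying the method to nonlinear black-box models.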
The main contributions of this study are summarized as follows:
(1) Development of a Jacobian-based framework to quantify the influence of input variables in both parametric and machine learning models for TBM performance prediction.
(2) Comparative analysis of variable importance across multiple modeling approaches under standardized and non-standardized input scenarios.
(3) Enhancement of model interpretability through PDP and ALE analyses, enabling more transparent and reliable assessment of factor influence in TBM penetration rate prediction.
The proposed framework is expected to improve the transparency of TBM performance prediction and support more informed decision-making in tunneling projects.

2. Materials and Methods

2.1. Data Used in the Study

The dataset employed in this study is based on field and laboratory measurements obtained from a tunnel project located in the borough of Queens, New York City (USA). This dataset, which has been widely used in the literature for evaluating TBM performance [3,4,5], contains the key variables required for predicting the penetration rate (ROP) of tunnel boring machines. The Queens Water Tunnel has an approximate length of 7.5 km and was excavated using a high-power TBM. The geological conditions encountered during construction comprise complex metamorphic rock formations of Manhattan Schist, including shear zones, faults, and other localized zones of weakness [26,42,43].
The measured parameters include both geotechnical and geometric variables, such as uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), brittleness index (BI), mean spacing of weakness planes (DPW), and the angle between the tunnel axis and the weakness planes (α), as detailed in [44]. When considered collectively with other influencing factors, these parameters enable a holistic representation of sustainable advance capacity, accounting for cutting forces applied to the rock mass and cutter wear.
Variations in these parameters along the tunnel axis are illustrated in Figure 1, while the basic statistical characteristics of the dataset are summarized in Table 1. As clearly observed from Figure 1, the relationships between the input variables presented in the first five rows and the observed ROP values shown in the final column exhibit a highly complex structure. This complexity indicates that penetration rate cannot be adequately explained by a single simple relationship. Accordingly, multiple formulations with different mathematical structures were developed in this study, and the resulting models—after parameter optimization—were comparatively evaluated using various error performance metrics.
In this context, prior to initiating the modeling process, descriptive statistical analyses and z-score normalization were performed to guide the selection and treatment of model input variables. However, extreme values were intentionally retained and not removed from the dataset. Although outliers may cause a slight reduction in model performance, they often represent extreme yet realistic field conditions in natural processes and are therefore valuable from an engineering practice perspective.
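The z-score standardization referred to here (and later used as Scenario II) rescales each input column to zero mean and unit variance. A minimal sketch follows; the column layout of `X` is assumed to match the input variables of Table 1 and is purely illustrative.

```python
import numpy as np

def z_score(X):
    """Standardize each column to zero mean and unit variance.
    Returns the standardized matrix plus the column means and
    standard deviations needed to transform new observations."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=0)
    return (X - mu) / sigma, mu, sigma
```

Keeping `mu` and `sigma` alongside the standardized matrix matters in practice: any prediction on new field data must reuse the training-set statistics rather than restandardizing from scratch.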

2.2. Selection of Model Inputs

In the development of the prediction models, five primary input parameters influencing TBM performance were considered: brittleness index (BI), uniaxial compressive strength (UCS), mean spacing of weakness planes (DPW), the angle between the tunnel axis and the weakness planes (α), and Brazilian tensile strength (BTS). The model output was defined as the penetration rate of the tunnel boring machine (rate of penetration, ROP; m/h). The relationships between these input variables and ROP, as well as their mutual correlations, are presented in Figure 2 through a correlation matrix.
As shown in Figure 2, the strongest positive relationship with the penetration rate (ROP) is observed for the brittleness index (BI) (r = 0.5805). Uniaxial compressive strength (UCS) exhibits a moderate positive correlation with ROP (r = 0.2595), whereas the influence of the joint orientation angle (α) remains relatively weaker (r = 0.2196). In contrast, a pronounced negative relationship is identified between the mean spacing of weakness planes (DPW) and ROP (r = −0.4654), indicating that this parameter adversely affects the penetration rate. Similarly, a moderate negative correlation is observed between the BI–DPW interaction term and ROP (r = −0.2819). Brazilian tensile strength (BTS) shows a very weak positive correlation with ROP (r = 0.0931). Although the BTS variable has been neglected in many previous studies, preliminary analyses conducted within the scope of this work indicate that its inclusion provides a marginal yet measurable improvement in model performance; therefore, this parameter was retained in the model formulations. Overall, BI emerges as the most dominant factor influencing ROP, while UCS and α also contribute meaningfully to penetration rate prediction.

2.3. Models Used in the Study

In this study, six parametric models (M1–M6) with increasing levels of complexity were evaluated to predict the penetration rate (ROP) of tunnel boring machines. During model development, the selection of independent variables and their order of inclusion were guided by the correlation matrix presented in Figure 2, and a stepwise structure was adopted, progressing from the simplest mathematical formulations to more complex models.
First, a linear regression model was employed within the framework of classical multiple linear regression, assuming linear relationships between the dependent variable and the input parameters for ROP prediction [45,46]. This approach aims to quantify the direct and proportional effects of model inputs on ROP.
To move beyond linear assumptions, an exponential regression model was developed in which a linear combination of the independent variables is expressed through an exponential function, thereby allowing nonlinear effects to be incorporated into the model structure [45].
In addition, the multiplicative nonlinear model represents ROP as a product of the independent variables raised to different power terms. This formulation enables parameter estimation using nonlinear regression techniques and provides a more flexible representation of the relative influence of each variable on ROP [46,47,48,49,50,51,52].
Hybrid nonlinear models, offering a more advanced structure, combine linear terms with exponential functions, logarithmic transformations, and interaction effects among variables. In previous studies, the parameters of such models have been determined using heuristic optimization techniques, including Genetic Algorithms (GA), Particle Swarm Optimization (PSO), the Firefly Algorithm (FA), and the Symbiotic Organisms Search (SOS) method. By integrating the strengths of empirical modeling and metaheuristic optimization, the hybrid approach provides enhanced flexibility and representational capacity for ROP prediction [5,6,7].
All mathematical formulations developed to better capture the complex and nonlinear relationships observed between ROP and the independent variables in Figure 1 are summarized in Table 2. In this context, Equation (1) describes the effects of the variables within a linear framework, whereas Equation (2) transforms these relationships into a nonlinear form through exponential mapping. Equation (3) expresses variable effects using power-law terms, while Equations (4) and (5) represent hybrid model structures that incorporate both linear and nonlinear components. The most comprehensive formulation, Equation (6), further enhances model explanatory capability by including interaction terms among variables, such as BI × DPW. While the parameters of the first three models can be estimated using conventional regression techniques, heuristic optimization algorithms are required to determine the parameters of the more complex model formulations.

2.4. Determination of Model Parameters

2.4.1. Parameter Estimation Strategies

The estimation methods employed to determine model parameters vary directly with the mathematical structure and level of complexity of the models. For models with linear structures, parameter estimation is typically performed using the Ordinary Least Squares (OLS) method [50], while regularized regression techniques such as Ridge and Lasso are adopted in cases involving multicollinearity or overfitting. For exponential or logarithmically transformed models, appropriate mathematical transformations can be applied to linearize the problem, allowing classical linear regression methods to be used for parameter estimation.
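For the linear formulation, the OLS estimate can be obtained directly from a least-squares solve after augmenting the design matrix with an intercept column. The sketch below is a minimal illustration; the variable names are generic and not tied to the study's code.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary Least Squares for a linear model y ~ b0 + X @ b.
    Returns the coefficient vector [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```

For an exponential model of the form y = exp(b0 + Σ bᵢxᵢ), the same routine applies after taking the logarithm of y, which is the linearization strategy mentioned above.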
In contrast, parameter estimation becomes considerably more challenging for nonlinear models, particularly those involving a large number of parameters. Although nonlinear least squares approaches and gradient-based optimization methods are commonly employed in such cases, strong interactions among variables and complex error surfaces often cause these methods to become trapped in local minima, limiting their ability to identify the global optimum [43,44,45,46,47,48,49].
To overcome these challenges, metaheuristic optimization algorithms have been widely adopted in the literature. By exploring the solution space more comprehensively through stochastic and guided search mechanisms, these algorithms aim to mitigate the local minimum problem and obtain solutions close to the global optimum. Genetic Algorithms (GA) [50,51], Particle Swarm Optimization (PSO) [52,53], and the Firefly Algorithm (FA) [54,55] are among the most frequently applied techniques [56,57,58] for TBM performance modeling.
In this study, the Differential Evolution (DE) algorithm [32,33] was employed to estimate the coefficients of all parametric models. DE is a population-based evolutionary optimization method that is particularly well suited for continuous and nonlinear problems, owing to its strong convergence capability and robustness. The algorithm iteratively applies mutation, crossover, and selection operations to an initially randomized population of candidate solutions in order to identify the optimal parameter set that minimizes the objective function. In this study, the objective function was defined as the minimization of the mean squared error (MSE) between the predicted and observed ROP values.
In the present implementation, the mutation factor (F) and crossover probability (CR) were not fixed but randomly selected at each iteration within ranges recommended in the literature (F ∈ [0.5, 0.8], CR ∈ [0.7, 0.9]). This strategy aims to prevent premature convergence, preserve population diversity, and enhance exploration of different regions of the search space. The selected intervals are consistent with commonly accepted DE parameter ranges that provide a balanced trade-off between exploration and exploitation. Among the key advantages of DE are its derivative-free nature, relatively simple parameter tuning, and high potential to achieve global optima even in the presence of complex error landscapes [6,59,60]. Owing to these properties, DE is regarded as a reliable and efficient parameter estimation tool for problems characterized by multivariate and strongly nonlinear relationships, such as TBM penetration rate prediction.
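The DE variant described above (a DE/rand/1/bin scheme with F and CR redrawn each iteration from the stated intervals) can be sketched as follows. This is a simplified illustration under the paper's stated settings, not the authors' exact implementation; the default population size of 70 mirrors the ten-times-parameters rule used later for the seven-parameter M6 model.

```python
import numpy as np

def differential_evolution(objective, bounds, pop_size=70, iterations=200, seed=0):
    """Minimal DE/rand/1/bin sketch. F and CR are redrawn every iteration
    within the ranges given in the text (F in [0.5, 0.8], CR in [0.7, 0.9])."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    cost = np.array([objective(p) for p in pop])
    for _ in range(iterations):
        F = rng.uniform(0.5, 0.8)    # mutation factor, redrawn each iteration
        CR = rng.uniform(0.7, 0.9)   # crossover probability, redrawn each iteration
        for i in range(pop_size):
            idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)      # mutation
            cross = rng.random(dim) < CR                   # binomial crossover mask
            cross[rng.integers(dim)] = True                # at least one gene crosses
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial < cost[i]:                          # greedy selection
                pop[i], cost[i] = trial, f_trial
    best = int(np.argmin(cost))
    return pop[best], cost[best]
```

In the study's setting, `objective` would be the MSE between observed ROP values and the predictions of a given parametric model for a candidate coefficient vector.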

2.4.2. Model Performance Evaluation

To quantitatively assess the agreement between predicted values and observed data, several statistical performance indicators were employed to evaluate model performance. In this context, the most commonly used metrics—the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE)—were considered [61,62,63]. The R2 coefficient reflects the explanatory power of the model, whereas RMSE captures sensitivity to large errors. MAE represents the average magnitude of prediction deviations, while MAPE expresses prediction errors in percentage terms, thereby enhancing interpretability [62].
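The four metrics can be computed directly from the residuals. The small helper below assumes the observed values are nonzero so that MAPE is well defined; the function name is illustrative.

```python
import numpy as np

def evaluation_metrics(y_obs, y_pred):
    """Compute R2, RMSE, MAE, and MAPE (in percent) for a set of predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((y_obs - y_obs.mean())**2)
    return {
        "R2": 1.0 - ss_res / ss_tot,                      # explanatory power
        "RMSE": np.sqrt(np.mean(resid**2)),               # sensitive to large errors
        "MAE": np.mean(np.abs(resid)),                    # average deviation magnitude
        "MAPE": 100.0 * np.mean(np.abs(resid / y_obs)),   # percentage error
    }
```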
To evaluate the generalization capability of machine learning-based models, datasets are typically divided into training and testing subsets [62,63]. In cases where the dataset size is limited, the robustness of the models can be enhanced by incorporating noise into the data [64].
For the evaluation of machine learning models, an initial data split of 80% for training and 20% for testing was adopted. However, recognizing that a single random split may lead to variability in performance metrics for moderately sized datasets, k-fold cross-validation was applied to improve model reliability. Specifically, k = 5 and k = 10 folds were considered. This approach enables repeated training and testing across different subsets of the dataset, thereby minimizing the effects of sampling variability on R2 and RMSE. Consequently, the reported performance metrics more accurately reflect the overall behavior of the models rather than being dependent on a single data partition.
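The combination of an 80/20 hold-out split with 5-fold cross-validation can be sketched with scikit-learn. The data below are synthetic placeholders with the same shape as the study's dataset (151 records, five inputs), not the Queens tunnel measurements.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.uniform(size=(151, 5))   # placeholder inputs (BI, UCS, DPW, alpha, BTS)
y = X @ np.array([2.0, 1.0, -1.5, 0.5, 0.2]) + rng.normal(0.0, 0.1, 151)

# Initial 80/20 hold-out split, as described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_r2 = r2_score(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))

# 5-fold cross-validation to reduce dependence on a single partition.
scores = []
for tr_idx, te_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[tr_idx], y[tr_idx])
    scores.append(r2_score(y[te_idx], model.predict(X[te_idx])))
print(f"hold-out R2 = {holdout_r2:.3f}, mean CV R2 = {np.mean(scores):.3f}")
```

Reporting the mean (and spread) of the fold-wise scores rather than a single split's score is what makes the metrics reflect overall model behavior for a dataset of this size.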
The aforementioned performance measures were applied to both classical regression models and metaheuristic optimization–assisted nonlinear models, and the resulting outcomes were comparatively evaluated against machine learning–based approaches. This comprehensive assessment enabled a holistic analysis of model interpretability, error performance, and generalization capability.
To further examine the effectiveness of the Differential Evolution (DE) algorithm in parameter optimization, the parameters of the sixth parametric model were also optimized using widely adopted heuristic algorithms, including Harmony Search (HS), Particle Swarm Optimization (PSO), Symbiotic Organisms Search (SOS), and Grey Wolf Optimization (GWO).
For all algorithms, the population size was set to ten times the number of parameters to be optimized (i.e., 70). The Symbiotic Organisms Search (SOS) algorithm does not require algorithm-specific control parameters, whereas for the Harmony Search (HS) algorithm, the harmony memory considering rate (HMCR) and pitch adjustment rate (PAR) were set to 0.95 and 0.55, respectively. For PSO and GWO, the standard algorithmic parameter settings were employed.

2.5. Jacobian-Based Elasticity Analysis

In this study, a Jacobian-based elasticity approach was employed to evaluate the sensitivity of the TBM penetration rate prediction model with respect to its input variables [65,66,67]. For a single-output model:
$$ y = f(x_1, x_2, \ldots, x_n), \qquad y \in \mathbb{R}, $$
with
$$ \mathbf{x} = [x_1, x_2, \ldots, x_n]^{T} \in \mathbb{R}^{n} $$
representing the vector of input variables.
For each observation in the dataset, local derivatives of the model with respect to the input variables are computed, providing a detailed depiction of the model's sensitivity structure. By aggregating the derivatives across all observations, a Jacobian matrix is constructed with dimensions $m \times n$, where $m$ is the number of observations and $n$ is the number of input variables:
$$ \mathbf{J} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{m \times n} $$
Here, $y_i$ represents the model's predicted value for the $i$-th observation. Each row corresponds to the local sensitivity vector at the respective observation point [65,66].
To assess the relative influence of each variable, the elasticity measure is used. Elasticity is a dimensionless quantity that expresses the proportional effect of a proportional change in an input variable on the output. Following the classical econometric definition, it is given by:
$$ E_i = \frac{\partial y}{\partial x_i} \cdot \frac{x_i}{y} $$
This expression provides a general definition of elasticity for any model, indicating the proportional impact on $y$ of a 1% change in $x_i$. Computing the elasticity for all observations yields an elasticity matrix $\mathbf{E} \in \mathbb{R}^{m \times n}$ [66], with each element calculated as:
$$ E_{ij} = \frac{\partial y_i}{\partial x_j} \cdot \frac{x_{ij}}{y_i} $$
In matrix form, the elasticity matrix can be expressed as:
$$ \mathbf{E} = \mathbf{J} \circ (\mathbf{X} / \mathbf{Y}) $$
where $\mathbf{X}$ is the observation matrix ($m \times n$), $\mathbf{Y}$ is the predicted output vector ($m \times 1$), and $\circ$ denotes element-wise (Hadamard) multiplication. The notation $\mathbf{X} / \mathbf{Y}$ indicates that each $x_{ij}$ value is divided by the corresponding $y_i$ value. For clarity, the $\mathbf{X}$ matrix comprises the column vectors of the model variables BI, UCS, α, DPW, and BTS, while $\mathbf{Y}$ consists of the ROP column vector.
This approach enables quantitative analysis of the model’s sensitivity to each input variable, both at the individual observation level and across the model as a whole. Moreover, for nonlinear and “black-box” models such as SVM, GAM, or other machine learning approaches, this method facilitates local-level interpretability, providing a clearer insight into the decision-making mechanism of the model.
For parametric models, derivatives are obtained analytically, whereas for black-box machine learning models, the Jacobian can be computed numerically. This allows the approach to be applied consistently across all model types. Parametric model Jacobians and elasticities are detailed in Appendix A.1, and the interpretability and sensitivity analyses of black-box models are presented in Appendix A.2.
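For black-box models, the Jacobian and the elasticity matrix E = J ∘ (X / Y) can be approximated numerically with central differences, as described above. The following is a minimal sketch with illustrative names; it treats the model purely as a prediction function mapping an (m, n) input matrix to an (m,) output vector.

```python
import numpy as np

def numerical_jacobian(f, X, eps=1e-6):
    """Central-difference Jacobian of a vectorized model f: (m, n) -> (m,)."""
    m, n = X.shape
    J = np.zeros((m, n))
    for j in range(n):
        dX = np.zeros_like(X)
        dX[:, j] = eps
        J[:, j] = (f(X + dX) - f(X - dX)) / (2.0 * eps)   # row-wise dy_i/dx_j
    return J

def elasticity_matrix(f, X):
    """E = J * (X / Y): proportional effect of a 1% input change on the output."""
    Y = f(X)
    return numerical_jacobian(f, X) * (X / Y[:, None])
```

A useful check: for a purely multiplicative model y = x₁^a · x₂^b, the elasticity with respect to each input is the constant exponent (a or b) at every observation, which is exactly the behavior the power-law model M3 encodes.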

3. Model Application and Results

3.1. Implementation of Parametric Models (M1–M6)

In this section, the implementation process of the parametric models M1–M6 developed for predicting the TBM penetration rate (ROP) is presented. The study jointly evaluates both classical regression approaches (linear, exponential, and multiplicative nonlinear models) and hybrid nonlinear models supported by heuristic optimization techniques.
The parameters of the models defined in Equations (1)–(3) were estimated using conventional estimation methods. Accordingly, the Ordinary Least Squares (OLS) method was employed for the linear model [36,37], while parameter estimation for the exponential and multiplicative nonlinear models was performed using the Nonlinear Least Squares (NLS) approach [38,39,40,41].
In contrast, for the hybrid models expressed in Equations (4)–(6), parameter optimization was carried out using the Differential Evolution (DE) algorithm, which is capable of better adapting to nonlinear and complex data relationships [26,27].
The optimal model parameters obtained are presented in Table 3, while the performance evaluations of the models based on various error metrics are comparatively summarized in Table 4.
An examination of the results presented in Table 4 indicates a consistent increase in the coefficient of determination (R2) from M1 to M6, accompanied by a pronounced decrease in RMSE and MAE values. This trend can be attributed to the evolution of the model structures from linear and relatively simple formulations toward more complex nonlinear forms, allowing the relationships among variables to be represented more effectively.
While the independent variables in the initial models were expressed through linear or limited logarithmic and exponential transformations, the inclusion of exponential, logarithmic, and interaction terms in the subsequent models led to a substantial improvement in prediction accuracy. The M6 model, which was implemented for the first time in this study and explicitly incorporates interaction effects among variables, achieved the highest goodness-of-fit values and the lowest error metrics in predicting TBM performance.
It was observed that the Brazilian tensile strength (BTS) variable, which has often been neglected in previous studies, did not provide a significant contribution in Equations (1)–(3). In contrast, in the interaction-based Equations (4)–(6), BTS markedly improved model performance. To demonstrate the effectiveness of the Differential Evolution (DE) algorithm used for optimizing the parameters of the M1–M6 models, the parameters of the best-performing M6 equation were also estimated using commonly applied heuristic algorithms, including Harmony Search (HS), Particle Swarm Optimization (PSO), Symbiotic Organisms Search (SOS), and Grey Wolf Optimization (GWO). The comparative results are summarized in Table 5.
As shown in Table 5, the DE and SOS algorithms yielded the best performance across all error evaluation criteria, whereas the other algorithms became trapped in local optima. The convergence behavior of the algorithms for the M6 model is illustrated in Figure 3a. As can be seen from Figure 3a, no significant improvement in the results was observed after 100 iterations for any of the algorithms. However, as also evident from Table 5, DE and SOS successfully converged to the global optimum, while the remaining algorithms stagnated at local optima, showing no noticeable improvement from the 100th to the 1000th iteration due to population similarity. Similar performance trends for the DE and SOS algorithms were also observed for the other models besides M6.
Figure 3 illustrates the convergence behavior of the M6 model, along with the variation in the model predictions and measured values along the tunnel length, and the corresponding scatter plot.
As shown in Table 5 and Figure 3, the M6 model stands out as the most successful among the investigated parametric models, achieving the highest coefficient of determination (R2 = 0.69) and the lowest RMSE value (0.20). Moreover, the fact that its test performance exceeds that of the parametric models commonly reported as benchmarks in the literature [4,5,6] demonstrates that the proposed model offers a robust and reliable alternative for predicting TBM penetration rate. In engineering applications, however, high predictive accuracy alone is not sufficient; understanding the relative contributions of the input variables to this performance is equally important. Accordingly, this study not only focuses on improving model accuracy but also quantifies the relative influence of the model inputs. This dual emphasis constitutes one of the most significant contributions of the present study to TBM performance modeling and is discussed in detail in the following section.
The proposed M6 model provides a parametric framework for estimating TBM penetration rate by explicitly incorporating key model variables, including BI, UCS, α, DPW, and BTS. The primary reason for its superior performance compared to other parametric approaches lies in the inclusion of the interaction term between BI and DPW. This interaction effectively captures the nonlinear mechanical behavior observed during penetration by jointly representing the cutter–rock contact mechanism, energy transfer, and stress redistribution processes. Although the calibrated model coefficients are site-specific, the inclusion of fundamental mechanical relationships—such as the BI × DPW interaction—allows the model structure to be adapted to different geological conditions, TBM types, and operational settings through recalibration or re-optimization using new field data. Consequently, the M6 model can be confidently employed as a field-calibrated predictive tool while also offering sufficient flexibility to serve as a transferable parametric modeling framework. This demonstrates that the model’s superior performance is not only physically grounded but also potentially generalizable.
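Based on the Jacobian components listed in Appendix A.1, the M6 structure can be written (up to the calibrated constant) as ROP = p1 + p2·BI + p3·UCS + p4·ln α + p5·DPW + p6·(BI × DPW) + p7·BTS. The sketch below evaluates this form with illustrative, not field-calibrated, coefficients and cross-checks the closed-form BI elasticity from Appendix A.1 against a finite difference:

```python
import math

def rop_m6(p, bi, ucs, alpha, dpw, bts):
    """M6 structure implied by the Appendix A.1 Jacobians (coefficients illustrative)."""
    p1, p2, p3, p4, p5, p6, p7 = p
    return p1 + p2 * bi + p3 * ucs + p4 * math.log(alpha) + p5 * dpw + p6 * bi * dpw + p7 * bts

# Illustrative coefficients and one synthetic observation (not from the study's dataset).
p = (1.2, 0.04, -0.003, 0.05, -0.6, 0.01, -0.02)
bi, ucs, alpha, dpw, bts = 35.0, 120.0, 45.0, 0.8, 8.0
rop = rop_m6(p, bi, ucs, alpha, dpw, bts)

# Closed-form BI elasticity from Appendix A.1: E_BI = (p2 + p6*DPW)*BI/ROP.
e_bi = (p[1] + p[5] * dpw) * bi / rop

# Cross-check against a central finite difference in BI.
h = 1e-6 * bi
d_fd = (rop_m6(p, bi + h, ucs, alpha, dpw, bts) - rop_m6(p, bi - h, ucs, alpha, dpw, bts)) / (2 * h)
e_bi_fd = d_fd * bi / rop
```

Note how the BI × DPW interaction makes the BI sensitivity depend on the local discontinuity spacing, which is precisely the coupling the linear models M1–M5 cannot express.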

3.2. Feature Importance Analysis for Parametric Models

As shown in Table 1, the statistical characteristics and orders of magnitude of the variables constituting the model inputs differ substantially. For instance, uniaxial compressive strength (UCS) varies on the order of approximately 150 MPa, whereas the average spacing between discontinuities (DPW) is on the order of about 1 m. Such scale disparities may lead to misleading interpretations of the relative influence of variables, particularly in derivative-based sensitivity analyses.
For this reason, a Jacobian-based elasticity analysis was conducted under two different scenarios:
  • Scenario I: Model variables in their original scales;
  • Scenario II: Model variables standardized using Z-score normalization.
In both scenarios, the sensitivity of the model output with respect to the input variables was quantified using an elasticity formulation, and the resulting relative weights are presented comparatively in Figure 4. This approach enables a scale-independent assessment and facilitates a more physically meaningful interpretation of the effects of the input variables.
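As an illustration of the elasticity formulation, the sketch below computes E_ij = (∂ŷ/∂x_j)·x_ij/ŷ_i by central differences for a hypothetical predictor and synthetic observations, together with the Z-score standardization used in Scenario II; none of the numerical values correspond to the study's dataset.

```python
import numpy as np

# Synthetic inputs (columns: BI, UCS, alpha, DPW, BTS) -- illustrative values only.
X = np.array([[35.0, 120.0, 45.0, 0.8, 8.0],
              [28.0,  90.0, 60.0, 1.1, 6.5],
              [40.0, 150.0, 30.0, 0.6, 9.0]])

def model(X):
    # Hypothetical smooth predictor standing in for a calibrated M1-M6 model.
    return 1.5 + 0.03 * X[:, 0] - 0.002 * X[:, 1] + 0.004 * X[:, 2] - 0.4 * X[:, 3] - 0.01 * X[:, 4]

def elasticities(model, X, rel_step=1e-6):
    """E_ij = (dy/dx_j) * x_ij / y_i via central differences (Scenario I, original scales)."""
    y = model(X)
    E = np.zeros_like(X)
    for j in range(X.shape[1]):
        h = rel_step * np.maximum(np.abs(X[:, j]), 1.0)
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += h
        Xm[:, j] -= h
        E[:, j] = (model(Xp) - model(Xm)) / (2 * h) * X[:, j] / y
    return E

E_I = elasticities(model, X)

# Scenario II: the same analysis would be repeated on Z-score standardized inputs.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
```

Because the derivative is multiplied by x_ij/ŷ_i, the elasticity is dimensionless, which is what makes the Scenario I and Scenario II weights comparable at all.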
An examination of the correlation matrix presented in Figure 2 indicates that the strongest positive relationship with the penetration rate (ROP) is observed for the rock brittleness index (BI) (r = 0.5805), followed by uniaxial compressive strength (UCS) (r = 0.2595). The sensitivity results shown in Figure 4 are consistent with these findings, demonstrating that BI and UCS exert the most dominant influence on ROP across all models, regardless of whether original or standardized data are used. This consistency confirms that the developed parametric models capture the variable–output relationships in a physically coherent and meaningful manner.
On the other hand, the relative importance of the discontinuity orientation (α) varies depending on both the data scaling and the model structure. While α exhibits a comparable weight across all models in the original-scale analyses, its influence becomes largely confined to the M1–M3 models when standardized data are used, diminishing markedly in the remaining models. As also indicated in Table 2, this behavior stems from the wide range of variation exhibited by the α variable (min: 2°, max: 89°, mean: 44.715°, standard deviation: 23.279°). Such a broad distribution artificially inflates the magnitude of derivatives computed on the original scale, whereas standardization effectively mitigates this effect.
A pronounced negative relationship is observed between DPW and ROP (r = –0.4654), indicating that an increase in discontinuity spacing leads to a reduction in penetration rate and that DPW adversely affects TBM performance. PDP- and ALE-based analyses reveal that DPW exerts a moderate yet consistent influence, whereas the Jacobian (elasticity-based) analysis assigns relatively higher importance to DPW in certain models. This finding suggests that even small variations in DPW can induce meaningful local effects on ROP.
The Brazilian tensile strength (BTS) variable, by contrast, exhibits relatively low importance across all sensitivity analysis methods and models. This result indicates that the influence of BTS on ROP is secondary compared to other mechanical and structural parameters, in agreement with the outcomes of the correlation analysis.
The comparative results presented in Table 6 clearly demonstrate that PDP and ALE methods capture the global average effects of the input variables, whereas the Jacobian (elasticity-based) approach reflects sensitivity based on local derivative information. Nevertheless, the strong consistency among the three methods in terms of the ranking of key variables confirms that the developed parametric models are reliable from both statistical and engineering perspectives.

3.3. Noise-Based Robustness and Variable Sensitivity Analysis for Parametric Models

To evaluate the behavior of the parametric models (M1–M6) developed in this study under measurement uncertainties and to assess the stability of variable effects, a noise-based robustness and sensitivity analysis was conducted. This analysis does not rely on re-estimation of model coefficients; rather, it aims to examine the responses of the final calibrated parametric models under perturbed input data. Accordingly, the applied methodology constitutes an input uncertainty propagation analysis, focusing on how measurement uncertainties propagate through the model to affect the outputs, rather than a re-fitting or re-training procedure.
Within the scope of the analysis, the coefficients of each parametric model were optimized once and subsequently held constant throughout the evaluation process. This ensures that the assessment exclusively reflects the impact of measurement errors in the input variables on the model outputs, while eliminating any effects arising from changes in model structure or parameter values.
To represent measurement uncertainties, multiplicative Gaussian noise was added to the model inputs. The noise injection process is defined as follows:
X_n = X ⊙ (1 + ε)
Here, X denotes the base data matrix, ⊙ represents element-wise multiplication, and ε ∼ N(0, σ_n²) denotes a random error term with zero mean and variance corresponding to the selected noise level. Accordingly, the perturbed data for each observation and variable are obtained as follows:
x_ij^(n) = x_ij · (1 + ε_ij)
In this study, noise levels were set to 0.5%, 1%, and 5%. The primary reason for adopting a multiplicative noise approach is that measurement errors in geomechanical and rock mechanics parameters are predominantly scale-dependent (proportional) in nature. In particular, measurement uncertainties associated with parameters such as UCS, BI, and BTS typically manifest as a certain percentage of the measured value; this characteristic limits the physical consistency of additive error models. The multiplicative approach preserves the engineering meaning of the model inputs by preventing the occurrence of negative or physically meaningless values.
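The noise-injection step described above can be sketched as follows; the input matrix is synthetic and only stands in for the field data.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_multiplicative_noise(X, sigma, rng):
    """X_n = X * (1 + eps), eps ~ N(0, sigma^2), applied element-wise."""
    return X * (1.0 + rng.normal(0.0, sigma, size=X.shape))

# Synthetic input matrix (columns: BI, UCS, alpha, DPW, BTS) -- illustrative values.
X = np.array([[35.0, 120.0, 45.0, 0.8, 8.0],
              [28.0,  90.0, 60.0, 1.1, 6.5]])

# The three noise levels used in the study: 0.5%, 1%, and 5%.
noisy = {lvl: inject_multiplicative_noise(X, lvl, rng) for lvl in (0.005, 0.01, 0.05)}
```

Because the perturbation is proportional to each entry, large quantities such as UCS receive proportionally large absolute errors while DPW receives small ones, and no entry can change sign for realistic noise levels.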
Following the noise injection process, the models were not retrained. All parametric models retained the optimized coefficients obtained from the noise-free dataset. This design choice enabled the analysis to isolate the effect of measurement uncertainty in the input variables on model outputs, without confounding influences arising from changes in model parameter stability.
To evaluate the sensitivity of model outputs to input variables, a Jacobian-based elasticity analysis was applied for each noise-perturbed scenario. For each observation, the elasticity value was computed as follows:
E_ij^(n) = (∂f/∂x_j)(x_i^(n)) · x_ij^(n) / ŷ_i^(n)
Here, ŷ_i^(n) denotes the model prediction obtained using the noise-perturbed inputs. The absolute means of the computed elasticity values were then calculated and normalized by dividing them by the total elasticity. This normalization procedure ensured that variables with different scales became directly comparable.
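The normalization step can be sketched as follows, with purely illustrative elasticity values in place of those computed from the perturbed data:

```python
import numpy as np

# E: per-observation elasticities (rows = observations; columns = BI, UCS, alpha, DPW, BTS).
# The values are illustrative, not computed from the study's dataset.
E = np.array([[0.9, 0.4, 0.1, -0.5, 0.05],
              [1.1, 0.3, 0.2, -0.4, 0.10]])

# Mean absolute elasticity per variable, normalized to relative weights summing to 1.
mean_abs = np.abs(E).mean(axis=0)
weights = mean_abs / mean_abs.sum()
```

Taking absolute values before averaging prevents positive and negative local effects (e.g., the negative DPW elasticities) from cancelling, so the weights measure influence magnitude rather than direction.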
The results indicate that, as the noise level increases, the model prediction performance (e.g., RMSE) decreases as expected; however, the relative importance ranking of the variables is largely preserved. This finding demonstrates that, although parametric models experience a reduction in predictive precision under measurement uncertainty, they are still able to maintain the fundamental physical relationships among variables and the structural consistency of the model.
In this context, the applied noise-based analysis demonstrates that the proposed parametric models exhibit a robust structure not only in terms of statistical accuracy, but also with respect to interpretability, physical consistency, and engineering reliability.
Figure 5 illustrates how the relative sensitivities of input variables in the developed parametric models (M1–M6) vary under different noise levels. The panels represent cases with multiplicative Gaussian noise levels of (a) 0.5%, (b) 1%, and (c) 5%, respectively. All sensitivity values were computed based on the Jacobian-based elasticity analysis and subsequently normalized.
An examination of Figure 5 reveals that, as the noise level increases, the fluctuations observed in model outputs and the associated prediction uncertainty also increase. Nevertheless, the relative importance ranking of the variables is largely preserved across all noise levels. In particular, the brittleness index (BI) and uniaxial compressive strength (UCS) consistently emerge as the dominant variables for all models and noise scenarios.
While the distribution of variable sensitivities exhibits a highly stable pattern under low (0.5%) and moderate (1%) noise levels, limited changes in relative weights are observed in some models at the 5% noise level. These variations are not sufficient to alter the underlying physical trends; notably, the overall hierarchy among the BI–UCS–DPW variables remains intact. This behavior indicates that the models are capable of maintaining structural consistency even under measurement uncertainty.
The relative contributions of the α and BTS variables display minor fluctuations as noise increases and do not become dominant at any noise level. This outcome suggests that the influence of these parameters on ROP remains secondary, and their sensitivity to measurement noise is relatively limited.
To facilitate a clearer interpretation of the trends presented in Figure 5, the corresponding numerical values are provided in Table 7. An examination of Table 7 indicates that increases in the noise level lead to only limited fluctuations in elasticity values, while the relative importance ranking of the variables is largely preserved across all models. BI and UCS consistently emerge as the most dominant variables in all scenarios, whereas ALPHA and DPW exhibit moderate influence, and BTS shows a comparatively limited effect.
Although an increase in noise level (from 0.5% to 5%) results in a slight reduction in predictive sensitivity, it does not induce any structural disruption in model behavior. This observation demonstrates that the models are capable of preserving the fundamental physical relationships among variables as well as the relative hierarchy of their effects. Accordingly, the numerical results presented in Table 7 provide strong evidence that the parametric models yield robust and interpretable outcomes not only in terms of accuracy but also under measurement uncertainty.
In summary, when the visual trends observed in the figures are jointly interpreted with the quantitative evidence provided in Table 7, it becomes more evident which parameters exert strong influence and which require more cautious interpretation within the proposed modeling framework.

3.4. Feature Importance Analysis for Machine Learning Models

Feature importance analysis is a fundamental explainability approach that reveals which input variables a machine learning model relies on, and to what extent, when predicting the target variable. This analysis is critically important not only for assessing predictive performance but also for understanding through which variables this performance is achieved. In particular, for tree-based methods (such as Random Forest and Gradient Boosting), feature importance stands out as a powerful tool for interpreting the model’s decision-making mechanism [36,37,38].
The primary objective of feature importance analysis is to determine the relative influence of variables in the dataset and thereby enhance the transparency of the model’s internal structure. In this way, the inputs that contribute most significantly to prediction accuracy can be clearly identified, substantially reducing the “black-box” nature of the model [44].
In addition, feature importance analysis offers significant advantages in terms of dimensionality reduction and model simplification. By removing variables with low contribution or high collinearity, more parsimonious and interpretable models can be obtained, leading to reduced training time and computational cost while improving generalization performance [44]. Moreover, this approach enables the comparison of variable priorities across different model types applied to the same dataset, allowing a clearer assessment of each variable’s sensitivity within the modeling framework.
Within the scope of this study, the effects of input variables used in modeling the TBM penetration rate (ROP) were examined in detail using four different machine learning methods (Random Forest, Bagged Trees, SVM, and GAM). Model performances under both original and standardized data scenarios were comparatively evaluated on a method-by-method basis and are presented in Table 8.
An examination of Table 8 reveals that, under both scenarios, the GAM exhibits a clearly superior performance compared to the other machine learning models, achieving the highest correlation coefficient (R) and coefficient of determination (R2), along with the lowest error metrics (RMSE and MAE). In particular, the GAM attains R2 = 0.906 and RMSE = 0.110 in Scenario I, indicating its strong capability to accurately capture the nonlinear relationships governing TBM penetration rate.
From a broader perspective, the Random Forest, Bagged Trees, and SVM models display only marginal performance variations with respect to data scaling (original versus standardized data). This behavior suggests that these models are relatively robust to scaling choices. In contrast, the use of standardized data in the GAM (Scenario II) results in a noticeable degradation in performance, reflected by a decrease in R2 and increases in RMSE and MAE. Nevertheless, despite this reduction, the GAM continues to outperform the other methods in both scenarios. This outcome highlights the high representational capacity of GAM and suggests that its performance may be sensitive to the original scale and distribution of input variables.
Following the evaluation of model performance across scenarios and methods, a feature importance analysis was conducted to identify which input variables contribute most significantly to predictive accuracy. The results of this analysis are presented in Table 9 and Figure 6. This analysis not only quantifies the overall predictive success of the models but also explicitly reveals, as further illustrated in Figure 7, the extent to which individual variables contribute to that success.
Table 9 presents the relative importance of input variables obtained for the Random Forest (RF), Bagged Trees (BT), SVM, and GAM methods under Scenario I (original variables) and Scenario II (standardized variables). The findings indicate that the models produce largely consistent results across both scenarios and exhibit similar learning dynamics in terms of variable prioritization.
As evidenced by Table 9 and Figure 6, the BI and ALPHA variables attain the highest importance scores across all models, emerging as the dominant inputs in predicting TBM penetration rate. In Scenario II, particularly for the SVM and GAM, the distribution of variable importance becomes more balanced, with a noticeable increase in the relative contributions of UCS and DPW. In contrast, the tree-based models (RF and BT) exhibit only limited inter-scenario variation, with variable rankings largely preserved. This behavior indicates that tree-based approaches are less sensitive to variable scaling, reinforcing their robustness with respect to input data representation.
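The exact importance measure differs by model class (impurity-based for the tree ensembles, term-wise contributions for GAM). As one model-agnostic illustration of how such scores can be computed comparably across model types, a permutation-importance sketch on a hypothetical fitted predictor is shown below; the predictor and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_importance(predict, X, y, n_repeats=20, rng=rng):
    """Mean increase in MSE when each input column is shuffled (model-agnostic)."""
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the column-target link
            imp[j] += np.mean((predict(Xp) - y) ** 2) - base
    return imp / n_repeats

# Hypothetical fitted predictor: strong dependence on column 0, weak on column 1,
# none on column 2 (stands in for any trained RF/BT/SVM/GAM).
predict = lambda X: 2.0 * X[:, 0] + 0.1 * X[:, 1]
X = rng.normal(size=(200, 3))
y = predict(X)

imp = permutation_importance(predict, X, y)
```

An unused input yields zero importance by construction, which is why permutation scores are a convenient common currency when comparing variable rankings across the four model families.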
When Figure 4, Figure 6 and Figure 7 are jointly examined, the observed differences between machine learning (ML) models and parametric models clearly indicate that the way variable contributions are reflected in the model outputs is strongly dependent on the model structure. In particular, although the ALPHA parameter exhibits a limited influence on the results of parametric models, it assumes a dominant role across all machine learning approaches. This contrast highlights a pronounced divergence between the two modeling paradigms in terms of their learning mechanisms and their capacity to capture nonlinear relationships.
Figure 7 demonstrates that ML methods exhibit a notable similarity in their data-driven learning strategies. Among these, GAM distinctly stands out due to its ability to capture local variations, achieving superior performance across all evaluation metrics. Furthermore, Figure 8 clearly shows that while GAM and SVM display a more fluctuating learning process, Random Forest (RF) and Bagged Trees (BT) maintain a relatively stable performance throughout the training stages.
Similarly, the BTS variable—often treated as secondary in previous parametric studies and some ML approaches—provides a performance-enhancing contribution in both nonlinear parametric models and ML methods. This finding indicates that BTS plays a complementary yet meaningful role in explaining TBM performance.
Overall, the findings demonstrate that variable importance analyses yield consistent and reliable results across different scaling scenarios and model classes. Moreover, ML approaches exhibit a greater potential to represent the complex interactions influencing TBM penetration rate more effectively and transparently compared to parametric models.

3.5. Variable Contributions in Parametric and Machine Learning Approaches

To investigate variable contributions and robustness to noise across different model classes in predicting TBM penetration rate, a comprehensive analysis was conducted using both parametric and machine learning (ML) methods. Table 9 presents the variable importance results for Random Forest (RF), Bagged Trees (BT), SVM, and Generalized Additive Model (GAM) under Scenario I (original data) and Scenario II (Z-score standardized data).
The results indicate that ML models produce largely consistent outcomes across both scenarios, exhibiting similar learning dynamics. Across all models, BI and ALPHA emerge as dominant inputs in predicting TBM penetration rate. However, when standardized data are used, particularly in SVM and GAM, variable importance distributions become more balanced, with UCS and DPW showing notable increases in their relative contributions. In contrast, tree-based models (RF and BT) preserve their variable rankings across scenarios, demonstrating limited sensitivity to variable scaling.
The comparison between parametric and ML models reveals that the way variables are incorporated into the model is closely linked to the underlying model structure. While ALPHA exhibits a relatively limited effect in parametric models, it assumes a dominant role in ML models. This difference underscores the ability of ML approaches to learn nonlinear interactions and complex relationships more effectively than parametric formulations. Similarly, while BTS plays a secondary role in parametric models, it significantly enhances model performance in ML methods—particularly in GAM and SVM—thereby assuming a complementary role in explaining TBM performance.
Noise robustness analyses (Table 7) further provide valuable insights. The Jacobian-based normalized elasticity values obtained under different multiplicative Gaussian noise levels (0.5%, 1%, and 5%) indicate that the models largely preserve their fundamental variable priorities. BI and ALPHA remain dominant across all models and noise levels, while other variables exhibit only limited fluctuations. This finding confirms that the models retain a considerable degree of robustness to measurement uncertainty when predicting TBM penetration rate.
In summary, variable importance analyses deliver consistent and reliable insights across scaling scenarios and model classes. However, structural differences among models play a critical role in interpreting variable contributions. Parametric models are particularly effective in representing fundamental relationships within a predictable mathematical framework, but their capacity to capture nonlinear and complex interactions is limited. In contrast, ML models—especially GAM and SVM—are more capable of capturing such complexity, modeling interactions among variables that influence TBM penetration rate more effectively. This demonstrates that the two approaches are complementary, with ML methods offering superior potential when complex field data and noise robustness are of concern.

4. Discussion

The variable importance analyses conducted using both parametric and machine learning (ML) models provide a multidimensional perspective on the factors influencing TBM penetration rate (ROP). Because input variables in parametric models exhibit markedly different scales (e.g., UCS ≈ 150 MPa, DPW ≈ 1 m), direct comparison of variable effects is challenging. Therefore, Jacobian-based elasticity analysis was performed under two scenarios: original-scale variables (Scenario I) and standardized variables (Scenario II). The resulting relative variable weights are comparatively presented in Figure 4, allowing assessment of scale effects on the results.
All coefficients of the six parametric models (M1–M6) were optimized using the Differential Evolution (DE) algorithm. Consequently, variable importance evaluations were based not on fixed or literature-derived coefficients, but on model-specific parameter sets optimized for predictive performance. This approach enhanced the reliability of parametric analyses by ensuring consistency between variable interpretation and model accuracy.
Parametric analyses reveal that the strongest positive relationship with ROP is associated with the brittleness index (BI, r = 0.5805), followed by uniaxial compressive strength (UCS, r = 0.2595). The joint angle (α) exhibits a relatively limited effect (r = 0.2196), while DPW shows a pronounced negative relationship with ROP (r = −0.4654). The dominant influence of BI and UCS is consistently preserved across all parametric models and both scenarios, in agreement with correlation matrix findings. Scenario-related differences reflect redistribution effects caused by scaling rather than changes in physical influence.
Among the parametric formulations, the proposed M6 model provides a more balanced and physically meaningful distribution of variable contributions. Elasticity analysis based on DE-optimized M6 coefficients not only confirms the dominance of BI and UCS, but also highlights indirect and interaction-based contributions from DPW and BTS. This indicates that the interaction and logarithmic terms embedded in M6 more realistically represent TBM–rock interactions. Accordingly, M6 achieves higher predictive accuracy than existing parametric correlations reported in the literature.
Variable importance analyses for ML models (RF, BT, SVM, GAM) also yield largely consistent results across both scenarios (Figure 6, Table 9). The key distinction between parametric and ML models lies in how variable contributions are embedded within the model structure. ALPHA, which has limited influence in parametric models, attains high importance in ML models—demonstrating ML methods’ superior ability to capture nonlinear and data-driven interactions. Similarly, BTS, while secondary in parametric models, contributes significantly in ML approaches—particularly in GAM, reaching importance levels of 14–15%.
Noise analyses presented in Table 7 (0.5%→5%) further demonstrate model robustness. Although increasing noise introduces minor fluctuations in predictions, relative variable rankings remain preserved. BI and UCS consistently remain dominant, ALPHA and DPW maintain moderate influence, and BTS continues to play a secondary role. This confirms that the models can reliably represent both linear and nonlinear relationships even under measurement uncertainty.
Performance comparisons confirm that Model M6 outperforms all other models in terms of both error metrics (RMSE, MAE) and explanatory power. Compared to Model 1, M6 achieves a 9.07% reduction in RMSE and a 10.48% increase in R2. Similar improvements across other models indicate that, despite its increased complexity, M6 does not exhibit overfitting and delivers stable, balanced predictions—making it suitable as a reference model for subsequent analyses.
Overall, the combined use of parametric and ML models enables a comprehensive assessment of variable effects on TBM penetration rate, integrating both physical interpretability and nonlinear interaction modeling. While parametric models—particularly the DE-optimized M6—provide a transparent and physically consistent representation of processes, ML models successfully capture complex patterns embedded in the data. Together, these approaches form a complementary and robust modeling framework for TBM performance prediction.

5. Conclusions

In this study, six parametric models (M1–M6) were developed to predict the TBM penetration rate (ROP), with all coefficients optimized using the Differential Evolution (DE) algorithm. Model performance was evaluated using optimized coefficients, and variable importance was systematically examined through Jacobian-based elasticity analyses under both original (Scenario I) and standardized (Scenario II) data. This integrated framework enabled a consistent assessment of predictive accuracy, robustness, and scale sensitivity.
The comparative analyses demonstrate that the proposed M6 model outperforms widely used parametric formulations reported in the literature. Owing to the inclusion of interaction and logarithmic terms, M6 provides a more realistic representation of TBM–rock interactions and achieves superior predictive performance, yielding the lowest error metrics (RMSE, MAE) and the highest explanatory power (R2) among the evaluated parametric models.
Elasticity analyses consistently identify brittleness index (BI) and uniaxial compressive strength (UCS) as the dominant factors governing ROP, while mean spacing of weakness planes (DPW) exhibits a clear negative influence. The similarity of elasticity trends across both scenarios indicates that data scaling does not alter the underlying physical effects but rather redistributes relative contributions. The interaction terms incorporated in M6 further enhance the model’s ability to capture complex inter-variable relationships that cannot be adequately represented by linear formulations.
Machine learning (ML) models, including RF, BT, SVM, and GAM, yield results that are broadly consistent with the parametric findings. In particular, the increased prominence of the joint orientation angle (α) in ML models highlights their capability to capture nonlinear effects and complex dependencies. Moreover, although Brazilian tensile strength (BTS) plays a secondary role in classical parametric models, its inclusion contributes measurably to prediction accuracy, especially in GAM and nonlinear parametric formulations.
Noise analyses conducted under varying noise levels (0.5–5%) confirm the robustness of the proposed framework, as relative variable importance rankings remain stable. BI and UCS consistently retain their dominant influence, while α and DPW exhibit moderate effects and BTS maintains a limited yet meaningful contribution, supporting the physical interpretability and engineering relevance of the results.
Overall, the results demonstrate that the proposed M6 parametric model, optimized using Differential Evolution and supported by Jacobian-based elasticity analysis, achieves the most balanced performance among all evaluated approaches. While machine learning models such as GAM and SVM effectively capture nonlinear interactions, the M6 model offers superior interpretability, physical consistency, and competitive predictive accuracy. The hybrid evaluation framework adopted in this study therefore provides a robust and practically applicable methodology for reliable TBM penetration rate prediction under complex geological conditions.
Future studies may extend the current framework by comparing Jacobian-based elasticity results with model-agnostic explainability techniques such as SHAP or permutation importance. Such a comparison would strengthen the interpretability of the proposed approach by highlighting both local and global variable contributions.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16031301/s1, Table S1: Supplementary Materials.

Author Contributions

Conceptualization, H.K. and D.A.; Methodology, H.K.; Software, H.K.; Validation, H.K. and D.A.; Formal analysis, H.K. and D.A.; Investigation, H.K. and D.A.; Resources, H.K. and D.A.; Data curation, H.K. and D.A.; Writing—original draft preparation, H.K. and D.A.; Writing—review & editing, H.K. and D.A.; Visualization, H.K.; Supervision, H.K.; Project administration, H.K.; Funding acquisition, D.A. All authors have read and agreed to the published version of the manuscript.

Funding

No external funding was received for this study.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the Supplementary Materials.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Symbol — Description
k — Observation index (1, 2, …, m); m = total number of observations
x(k) — Input vector of the k-th observation [BI, UCS, α, DPW, BTS]
ROP(k) — Predicted TBM penetration rate for the k-th observation
J(k) — Numerical Jacobian matrix for the k-th observation (local derivatives with respect to input variables)
E_i(k) — Elasticity of the i-th input variable for the k-th observation
i — Index of input variables (1 = BI, 2 = UCS, 3 = α, 4 = DPW, 5 = BTS)

Appendix A

Appendix A.1. Jacobian Components and Elasticities of Parametric Models

Model 1 (Linear):
  ∂ROP/∂BI = p2;   E_BI = p2·BI/ROP
  ∂ROP/∂UCS = p3;   E_UCS = p3·UCS/ROP
  ∂ROP/∂α = p4;   E_α = p4·α/ROP
  ∂ROP/∂DPW = p5;   E_DPW = p5·DPW/ROP
  ∂ROP/∂BTS = p6;   E_BTS = p6·BTS/ROP

Model 2 (Exponential):
  ∂ROP/∂BI = p2·ROP;   E_BI = p2·BI
  ∂ROP/∂UCS = p3·ROP;   E_UCS = p3·UCS
  ∂ROP/∂α = p4·ROP;   E_α = p4·α
  ∂ROP/∂DPW = p5·ROP;   E_DPW = p5·DPW
  ∂ROP/∂BTS = p6·ROP;   E_BTS = p6·BTS

Model 3 (Power):
  ∂ROP/∂BI = p2·ROP/BI;   E_BI = p2
  ∂ROP/∂UCS = p3·ROP/UCS;   E_UCS = p3
  ∂ROP/∂α = p4·ROP/α;   E_α = p4
  ∂ROP/∂DPW = p5·ROP/DPW;   E_DPW = p5
  ∂ROP/∂BTS = p6·ROP/BTS;   E_BTS = p6

Model 4 (Mixed Log–Power):
  ∂ROP/∂BI = p2;   E_BI = p2·BI/ROP
  ∂ROP/∂UCS = p3;   E_UCS = p3·UCS/ROP
  ∂ROP/∂α = p4/α;   E_α = p4/ROP
  ∂ROP/∂DPW = p5·p6·DPW^(p6−1);   E_DPW = p5·p6·DPW^p6/ROP
  ∂ROP/∂BTS = p7;   E_BTS = p7·BTS/ROP

Model 5 (Generalized Power):
  ∂ROP/∂BI = p2;   E_BI = p2·BI/ROP
  ∂ROP/∂UCS = p3;   E_UCS = p3·UCS/ROP
  ∂ROP/∂α = p4·p5·α^(p5−1);   E_α = p4·p5·α^p5/ROP
  ∂ROP/∂DPW = p6·p7·DPW^(p7−1);   E_DPW = p6·p7·DPW^p7/ROP
  ∂ROP/∂BTS = p8;   E_BTS = p8·BTS/ROP

Model 6 (Interaction Term):
  ∂ROP/∂BI = p2 + p6·DPW;   E_BI = (p2 + p6·DPW)·BI/ROP
  ∂ROP/∂UCS = p3;   E_UCS = p3·UCS/ROP
  ∂ROP/∂α = p4/α;   E_α = p4/ROP
  ∂ROP/∂DPW = p5 + p6·BI;   E_DPW = (p5 + p6·BI)·DPW/ROP
  ∂ROP/∂BTS = p7;   E_BTS = p7·BTS/ROP
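The closed-form Model 6 expressions can be evaluated directly. The sketch below computes the M6 elasticities at the sample means of Table 1, using the fitted coefficients of Table 3; the natural logarithm is assumed for the log α term (under that assumption the prediction at the means, about 2.08 m/h, is close to the observed mean ROP of 2.044).

```python
import math

# M6 coefficients from Table 3 and variable means from Table 1
p1, p2, p3, p4, p5, p6, p7 = 0.7297, 0.0384, -0.0040, 0.1875, 0.0948, -0.0090, 0.0138
BI, UCS, ALPHA, DPW, BTS = 34.600, 150.054, 44.715, 1.021, 9.550

# M6: ROP = p1 + p2*BI + p3*UCS + p4*log(alpha) + p5*DPW + p6*BI*DPW + p7*BTS
# (natural logarithm assumed for the log-alpha term)
ROP = (p1 + p2 * BI + p3 * UCS + p4 * math.log(ALPHA)
       + p5 * DPW + p6 * BI * DPW + p7 * BTS)

# Closed-form Model 6 elasticities from the table above
E_BI = (p2 + p6 * DPW) * BI / ROP
E_UCS = p3 * UCS / ROP
E_ALPHA = p4 / ROP
E_DPW = (p5 + p6 * BI) * DPW / ROP
E_BTS = p7 * BTS / ROP
```

At the means, BI carries the largest positive elasticity while UCS and DPW act negatively, consistent with the sensitivity pattern reported in Table 6.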

Appendix A.2. Interpretability and Sensitivity Analyses of Black-Box Models

1. Input vector for each observation:
   x^(k) = [BI^(k), UCS^(k), α^(k), DPW^(k), BTS^(k)]

2. Numerical Jacobian for a single observation:
   J^(k) = [∂ROP^(k)/∂BI, ∂ROP^(k)/∂UCS, ∂ROP^(k)/∂α, ∂ROP^(k)/∂DPW, ∂ROP^(k)/∂BTS]

3. Jacobian matrix for all observations (m × 5; the k-th row is J^(k)):
   J = [∂ROP^(1)/∂BI … ∂ROP^(1)/∂BTS; ∂ROP^(2)/∂BI … ∂ROP^(2)/∂BTS; … ; ∂ROP^(m)/∂BI … ∂ROP^(m)/∂BTS]

4. Numerical derivative (forward finite difference):
   ∂ROP^(k)/∂x_i ≈ [ROP^(k)(x_i + h) − ROP^(k)(x_i)] / h,  with h = 0.001·x_i

5. Elasticity:
   E_i^(k) = (∂ROP^(k)/∂x_i) · x_i^(k) / ROP^(k)
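The procedure in steps 1–5 can be sketched as a short routine: a forward-difference Jacobian with step h = 0.001·x_i (step 4) followed by the elasticity scaling (step 5). The toy power-law predictor in the usage check is purely illustrative; its exact elasticities are 2 and 1, which makes the numerics easy to verify. Inputs are assumed nonzero, as they are for all five variables here.

```python
import numpy as np

def jacobian_and_elasticity(predict, X, rel_step=0.001):
    """Forward-difference Jacobian (step h = rel_step * x_i) and
    elasticities E_i = (dROP/dx_i) * x_i / ROP for a black-box predictor.

    predict maps an (m, n) input matrix to an (m,) output vector."""
    y = predict(X)
    m, n = X.shape
    J = np.zeros((m, n))
    for i in range(n):
        h = rel_step * X[:, i]          # relative step, per step 4
        Xh = X.copy()
        Xh[:, i] = X[:, i] + h
        J[:, i] = (predict(Xh) - y) / h
    E = J * X / y[:, None]              # elasticity scaling, per step 5
    return J, E

# Illustrative check on f = x1^2 * x2, whose elasticities are exactly 2 and 1
X_demo = np.array([[3.0, 5.0]])
J_demo, E_demo = jacobian_and_elasticity(lambda A: A[:, 0] ** 2 * A[:, 1], X_demo)
```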

Figure 1. Variation in model variables and ROP values along the tunnel length.
Figure 2. The correlation relationship between ROP and model parameters.
Figure 3. (a) Convergence of the M6 model; (b) comparison of measured and predicted results along the tunnel length; (c) scatter plot (The red dashed line represents the 1:1 reference line, the black solid line indicates the regression line, and the blue circles denote the model predictions).
Figure 4. Relative effects of variables in the M1–M6 models constructed using original data (a) and Z-score standardization (b).
Figure 5. Variation in variable sensitivity in parametric models under different noise levels of (a) 0.5%, (b) 1%, and (c) 5%.
Figure 6. Relative effects of variables in soft computing methods developed using original data and Z-score standardization.
Figure 7. Comparison of partial dependence patterns across models.
Figure 8. Variation in MSE with training steps in machine learning models.
Table 1. Descriptive statistics of the variables used in the study.

 | BI | UCS | Alpha | DPW | BTS | ROP
Min | 24.867 | 118.276 | 2.000 | 0.050 | 6.724 | 1.268
Max | 57.964 | 199.655 | 89.000 | 2.000 | 11.435 | 3.072
Mean | 34.600 | 150.054 | 44.715 | 1.021 | 9.550 | 2.044
Std | 8.457 | 22.189 | 23.279 | 0.645 | 0.870 | 0.360
Table 2. Proposed equations for predicting TBM performance.

Model | Equation | Eqn
M1 | ROP = p1 + p2·BI + p3·UCS + p4·α + p5·DPW + p6·BTS | (1)
M2 | ROP = exp(p1 + p2·BI + p3·UCS + p4·α + p5·DPW + p6·BTS) | (2)
M3 | ROP = p1·BI^p2·UCS^p3·α^p4·DPW^p5·BTS^p6 | (3)
M4 | ROP = p1 + p2·BI + p3·UCS + p4·log α + p5·DPW^p6 + p7·BTS | (4)
M5 | ROP = p1 + p2·BI + p3·UCS + p4·α^p5 + p6·DPW^p7 + p8·BTS | (5)
M6 | ROP = p1 + p2·BI + p3·UCS + p4·log α + p5·DPW + p6·BI·DPW + p7·BTS | (6)
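The coefficients p1…p8 of these equations were optimized with Differential Evolution. The sketch below shows a minimal DE/rand/1/bin loop (after Storn and Price) fitting the linear M1 form to noise-free synthetic data with known coefficients; the population size, F, CR, generation count, and bounds are illustrative choices, not the settings used in the study.

```python
import numpy as np

def de_fit(loss, bounds, pop=30, gens=200, F=0.7, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin minimizer over box constraints."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    d = len(bounds)
    P = rng.uniform(lo, hi, (pop, d))            # initial population
    cost = np.array([loss(x) for x in P])
    for _ in range(gens):
        for k in range(pop):
            idx = rng.choice([j for j in range(pop) if j != k], 3, replace=False)
            a, b, c = P[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)     # rand/1 mutation
            cross = rng.random(d) < CR                    # binomial crossover
            cross[rng.integers(d)] = True                 # force one mutant gene
            trial = np.where(cross, mutant, P[k])
            c_trial = loss(trial)
            if c_trial <= cost[k]:                        # greedy selection
                P[k], cost[k] = trial, c_trial
    best = int(np.argmin(cost))
    return P[best], cost[best]

# Synthetic M1-style problem with known coefficients
rng = np.random.default_rng(2)
X = rng.uniform(1.0, 5.0, (100, 5))
true_p = np.array([1.4, 0.03, -0.004, 0.005, -0.22, 0.007])
y = true_p[0] + X @ true_p[1:]

def mse(p):
    return np.mean((y - (p[0] + X @ p[1:])) ** 2)

p_hat, best_mse = de_fit(mse, bounds=[(-2, 2)] * 6)
```

Because the objective is a smooth quadratic in the coefficients, the loop recovers the generating parameters to good accuracy; harder nonlinear forms such as M4–M6 are where DE's global search pays off.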
Table 3. Optimal parameters of the penetration rate prediction models.

Model | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8
M1 | 1.4138 | 0.0308 | −0.0035 | 0.0054 | −0.2161 | 0.0069 | — | —
M2 | 0.4078 | 0.0146 | −0.0019 | 0.0026 | −0.1098 | 0.0076 | — | —
M3 | 0.6491 | 0.5483 | −0.2934 | 0.0987 | −0.0703 | 0.1337 | — | —
M4 | 1.0570 | 0.0291 | −0.0037 | 0.1908 | −0.3390 | 0.6150 | 0.0186 | —
M5 | 2.5870 | 0.0286 | −0.0038 | −1.7221 | −0.2017 | −0.3441 | 0.6108 | 0.0219
M6 | 0.7297 | 0.0384 | −0.0040 | 0.1875 | 0.0948 | −0.0090 | 0.0138 | —
Table 4. Comparative performance of penetration rate prediction models.

Model | MSE | RMSE | MAE | R2
M1 | 0.04838 | 0.21996 | 0.18524 | 0.62351
M2 | 0.04720 | 0.21725 | 0.18208 | 0.63275
M3 | 0.04373 | 0.20912 | 0.17557 | 0.65971
M4 | 0.04238 | 0.20586 | 0.17644 | 0.67023
M5 | 0.04204 | 0.20505 | 0.17584 | 0.67282
M6 | 0.03999 | 0.19999 | 0.16902 | 0.68878
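The four criteria in Table 4 can be reproduced with a few lines. The helper below defines the standard metrics and recovers the relative M1-to-M6 improvements quoted in the text (roughly a 9% RMSE reduction and a 10.5% R2 gain) directly from the tabulated values; small rounding differences against the quoted 9.07% and 10.48% come from the truncation of Table 4 itself.

```python
import math

def mse(y, yhat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(mse(y, yhat))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def pct_change(old, new):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old

# Relative M1 -> M6 improvements recovered from the Table 4 values
rmse_reduction = -pct_change(0.21996, 0.19999)  # about 9.1% lower RMSE
r2_gain = pct_change(0.62351, 0.68878)          # about 10.5% higher R2
```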
Table 5. Comparison of the performance of DE and other heuristic algorithms for the M6 model.

Algorithm | MSE | RMSE | MAE | R2 | NSE
DE | 0.03999 | 0.19999 | 0.16902 | 0.68878 | 0.68878
HS | 0.04712 | 0.21706 | 0.18089 | 0.64428 | 0.63336
PSO | 0.06682 | 0.25849 | 0.21317 | 0.49263 | 0.48005
SOS | 0.03999 | 0.19999 | 0.16902 | 0.68878 | 0.68878
GWO | 0.04301 | 0.20739 | 0.17823 | 0.66532 | 0.66532
Table 6. Comparative sensitivity results obtained using different sensitivity analysis methods.

Method | Model | BI | UCS | ALFA | DPW | BTS
PDP | M1 | 0.4569 | 0.1273 | 0.2122 | 0.1890 | 0.0146
PDP | M2 | 0.4612 | 0.1307 | 0.1944 | 0.1832 | 0.0305
PDP | M3 | 0.3720 | 0.1142 | 0.2549 | 0.2063 | 0.0525
PDP | M4 | 0.3641 | 0.1182 | 0.2677 | 0.2048 | 0.0452
PDP | M5 | 0.3800 | 0.1195 | 0.2830 | 0.1832 | 0.0342
PDP | M6 | 0.3885 | 0.1321 | 0.2852 | 0.1682 | 0.0260
ALE | M1 | 0.4733 | 0.1357 | 0.2110 | 0.1675 | 0.0125
ALE | M2 | 0.4716 | 0.1450 | 0.1918 | 0.1656 | 0.0260
ALE | M3 | 0.4094 | 0.1339 | 0.2398 | 0.1710 | 0.0459
ALE | M4 | 0.4077 | 0.1363 | 0.2449 | 0.1692 | 0.0419
ALE | M5 | 0.4138 | 0.1340 | 0.2528 | 0.1686 | 0.0308
ALE | M6 | 0.4339 | 0.1431 | 0.2451 | 0.1552 | 0.0226
Jacobian (elasticity-based) | M1 | 0.3804 | 0.1293 | 0.2232 | 0.2568 | 0.0103
Jacobian (elasticity-based) | M2 | 0.3807 | 0.1403 | 0.2079 | 0.2493 | 0.0219
Jacobian (elasticity-based) | M3 | 0.3947 | 0.1444 | 0.2275 | 0.1911 | 0.0423
Jacobian (elasticity-based) | M4 | 0.3790 | 0.1512 | 0.2282 | 0.2016 | 0.0400
Jacobian (elasticity-based) | M5 | 0.3636 | 0.1400 | 0.2233 | 0.2453 | 0.0278
Jacobian (elasticity-based) | M6 | 0.3593 | 0.1512 | 0.2176 | 0.2513 | 0.0206
Table 7. Normalized Jacobian-based elasticity (%) under multiplicative Gaussian noise (fixed model coefficients).

Noise Level | Model | BI | UCS | ALPHA | DPW | BTS
0.005 | M1 | 49.63 | 24.83 | 11.33 | 11.07 | 3.14
0.005 | M2 | 46.36 | 26.10 | 10.64 | 10.27 | 6.63
0.005 | M3 | 47.92 | 25.63 | 8.63 | 6.14 | 11.67
0.005 | M4 | 46.53 | 26.31 | 9.01 | 9.75 | 8.40
0.005 | M5 | 47.43 | 23.30 | 9.61 | 8.89 | 10.77
0.005 | M6 | 46.24 | 28.48 | 8.84 | 10.24 | 6.20
0.010 | M1 | 49.59 | 24.84 | 11.36 | 11.07 | 3.14
0.010 | M2 | 46.41 | 26.07 | 10.64 | 10.26 | 6.63
0.010 | M3 | 47.91 | 25.64 | 8.62 | 6.14 | 11.69
0.010 | M4 | 46.53 | 26.34 | 9.01 | 9.73 | 8.40
0.010 | M5 | 47.42 | 23.29 | 9.61 | 8.91 | 10.77
0.010 | M6 | 46.27 | 28.48 | 8.84 | 10.22 | 6.19
0.050 | M1 | 49.64 | 24.70 | 11.44 | 11.10 | 3.13
0.050 | M2 | 46.26 | 26.13 | 10.62 | 10.34 | 6.65
0.050 | M3 | 48.02 | 25.50 | 8.70 | 6.15 | 11.63
0.050 | M4 | 46.58 | 26.20 | 9.05 | 9.73 | 8.44
0.050 | M5 | 47.28 | 23.34 | 9.64 | 8.96 | 10.78
0.050 | M6 | 46.31 | 28.42 | 8.87 | 10.19 | 6.21
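The noise experiment behind Table 7 can be mimicked as follows: inputs are perturbed multiplicatively, x → x·(1 + ε) with ε ~ N(0, σ), model coefficients are held fixed, and the mean absolute elasticities are renormalized to percentages. The M1-style linear model and uniform input ranges below are illustrative assumptions, so the numbers will not match the table, but the stability of the shares under small σ is the point being demonstrated.

```python
import numpy as np

def normalized_elasticity(predict, X, rel_step=0.001):
    """Mean absolute forward-difference elasticity per variable, normalized to 100%."""
    y = predict(X)
    E = np.zeros_like(X)
    for i in range(X.shape[1]):
        h = rel_step * X[:, i]
        Xh = X.copy()
        Xh[:, i] += h
        E[:, i] = (predict(Xh) - y) / h * X[:, i] / y
    mean_abs = np.abs(E).mean(axis=0)
    return 100.0 * mean_abs / mean_abs.sum()

def perturb(X, sigma, seed=0):
    """Multiplicative Gaussian noise: x -> x * (1 + N(0, sigma))."""
    rng = np.random.default_rng(seed)
    return X * (1.0 + sigma * rng.standard_normal(X.shape))

# Illustrative linear model with fixed coefficients (M1-style sketch, Table 3 values)
p = np.array([1.4138, 0.0308, -0.0035, 0.0054, -0.2161, 0.0069])
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(25, 58, 300), rng.uniform(118, 200, 300),
                     rng.uniform(2, 89, 300), rng.uniform(0.05, 2, 300),
                     rng.uniform(6.7, 11.4, 300)])
predict = lambda X_: p[0] + X_ @ p[1:]
share_clean = normalized_elasticity(predict, X)
share_noisy = normalized_elasticity(predict, perturb(X, 0.005))
```

As in Table 7, a 0.5% multiplicative perturbation barely moves the normalized shares, confirming that the elasticity ranking is robust to measurement-scale noise.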
Table 8. Performance comparison of machine learning methods used for ROP prediction.

Scenario | Model | R | R2 | RMSE | MAE
I | Random Forest | 0.878 | 0.751 | 0.179 | 0.146
I | Bagged Trees | 0.879 | 0.743 | 0.182 | 0.148
I | SVM | 0.894 | 0.764 | 0.174 | 0.123
I | GAM | 0.952 | 0.906 | 0.110 | 0.058
II | Random Forest | 0.885 | 0.756 | 0.177 | 0.146
II | Bagged Trees | 0.882 | 0.742 | 0.182 | 0.148
II | SVM | 0.888 | 0.750 | 0.179 | 0.125
II | GAM | 0.932 | 0.868 | 0.130 | 0.070
Table 9. Variable importance for machine learning models.

Scenario | Variable | RF (%) | BT (%) | SVM (%) | GAM (%)
I | BI | 31.90 | 37.69 | 21.19 | 23.27
I | UCS | 9.25 | 9.91 | 27.73 | 15.62
I | ALPHA | 28.83 | 35.53 | 26.80 | 34.48
I | DPW | 19.81 | 8.99 | 8.63 | 12.85
I | BTS | 10.21 | 7.88 | 15.63 | 13.78
II | BI | 29.56 | 37.74 | 18.54 | 26.75
II | UCS | 9.10 | 12.84 | 16.10 | 19.99
II | ALPHA | 30.10 | 30.05 | 31.18 | 27.58
II | DPW | 23.73 | 13.75 | 18.23 | 12.02
II | BTS | 7.52 | 5.63 | 15.95 | 13.66

Share and Cite

MDPI and ACS Style

Karahan, H.; Alkaya, D. Input Variable Effects on TBM Penetration Rate: Parametric and Machine Learning Models. Appl. Sci. 2026, 16, 1301. https://doi.org/10.3390/app16031301
