Cross-Material Benchmarking of Machine Learning Models for Cutting Force Prediction in CNC Turning

Alsoufi, Mohammad S.; Bawazeer, Saleh A.

doi:10.3390/machines14040426


by Mohammad S. Alsoufi * and Saleh A. Bawazeer

Department of Mechanical Engineering, College of Engineering and Architecture, Umm Al-Qura University, Makkah 21955, Saudi Arabia

* Author to whom correspondence should be addressed.

Machines 2026, 14(4), 426; https://doi.org/10.3390/machines14040426

Submission received: 5 March 2026 / Revised: 2 April 2026 / Accepted: 9 April 2026 / Published: 11 April 2026

(This article belongs to the Section Advanced Manufacturing)


Abstract

Accurate prediction of cutting force is essential for process optimization and intelligent control in CNC turning, yet cross-material performance comparisons of machine learning models remain limited. This study develops and applies a structured diagnostic benchmarking framework to evaluate ten supervised regression models for cutting force prediction across five engineering alloys: Aluminum Alloy 6061, Brass C26000, Bronze C51000, Stainless Steel 304 (annealed), and Carbon Steel 1020 (annealed). The input space included material category together with machining descriptors (diameter, feed rate, and axial distance from the chuck). Model performance was evaluated using the coefficient of determination (R²), root mean square error (RMSE), cross-validated stability metrics, and pairwise dominance probability matrices derived from R² and CV(RMSE). Gradient Boosting achieved the highest overall accuracy and robustness, with a mean R² = 0.962 and RMSE = 18.03 N, followed by a feedforward neural network (R² = 0.953, RMSE = 19.96 N), while Support Vector Regression showed substantially lower performance (R² < 0.65; RMSE > 54 N). Residual diagnostics indicated that ensemble and neural models produced compact, near-homoscedastic error distributions, whereas linear and single-tree models exhibited systematic bias and heteroscedasticity. Principal Component Analysis revealed that the first two components captured 78.7% of the total variance, separating geometric and spatial effects from feed-driven variability. The proposed evaluation framework provides a unified methodology for accuracy assessment, residual diagnostics, probabilistic dominance analysis, and multivariate interpretation in machining force prediction. These results offer practical guidance for selecting robust learning models in intelligent CNC systems and data-driven manufacturing environments.

Keywords:

cutting force prediction; machine learning; regression models; machining of metallic materials; data-driven modeling; construction alloys

1. Introduction

Dimensional accuracy and deformation stability remain critical challenges in high-precision machining, particularly for metallic alloys subjected to combined mechanical and thermal loads during cutting operations [1]. Despite significant advances in computer numerical control (CNC) systems and machining automation, unintended deformation during turning remains a persistent challenge affecting part quality and dimensional accuracy [2]. This behavior arises from complex, strongly coupled interactions among cutting forces, heat generation, tool–workpiece contact conditions, and the intrinsic material response [3]. As a result, deformation-related effects are increasingly recognized as a limiting factor in achieving reliable dimensional control, yet they are still frequently treated as secondary outcomes rather than being explicitly addressed within predictive machining frameworks.

Traditional machining research has extensively investigated surface roughness, tool wear, chip morphology, and cutting force evolution [4,5]. Empirical correlations and analytical formulations have been proposed to relate these responses to process parameters such as feed rate, cutting speed, and depth of cut [6,7]. While these approaches have yielded valuable insights, their applicability is often restricted by simplifying assumptions and calibration to specific material–process combinations [8,9]. More advanced finite element simulations have improved understanding of localized stress states, strain localization, and thermal gradients during cutting [10,11,12]. However, such models are typically computationally intensive, sensitive to constitutive assumptions, and difficult to generalize across multiple materials or machining configurations.

In response to these limitations, data-driven modeling approaches, particularly machine learning (ML), have gained significant attention over the past decade for predicting machining responses, including cutting forces, tool wear, and surface integrity [13,14,15,16]. Recent studies have further explored ML-based modeling of machining performance under varying cutting conditions and tribological environments. For example, Abdallah et al. [17] applied machine-learning techniques to predict machining responses under different cutting regimes, while Kumar et al. [18] investigated ML-based prediction of tribological behavior in machining processes. Similarly, Ben Saoud and Korkmaz [5] discussed data-driven modeling strategies for machinability assessment and process optimization. These studies highlight the growing role of machine-learning approaches in capturing complex nonlinear relationships between machining parameters and process responses. Nevertheless, most existing ML-based machining studies prioritize predictive accuracy for a single algorithm or dataset, often without systematic comparison across model families or detailed evaluation of residual behavior and predictive stability [19,20,21,22]. Consequently, reported performance gains may be algorithm-specific or dataset-dependent, limiting their broader interpretability and practical transferability.

In many current ML formulations, deformation or cutting force is treated as a scalar output optimized solely for prediction, with limited examination of how error distributions, bias, or variance evolve across materials and machining regions [23]. This perspective obscures important differences in how learning algorithms respond to variations in material properties such as hardness, thermal conductivity, and elastic modulus, which are known to influence force generation and deformation behavior [24,25]. From an engineering standpoint, this lack of comparative insight complicates model selection and reduces confidence in applying ML predictors across heterogeneous material systems.

This challenge is particularly pronounced in multi-material turning environments, where alloys with widely differing thermomechanical characteristics are often machined under nominally similar operating conditions. A structured and statistically consistent benchmarking methodology is therefore required to evaluate ML models on a common experimental basis, quantify their robustness, and diagnose systematic prediction errors, rather than relying exclusively on aggregate accuracy indicators [26,27].

Furthermore, existing literature rarely integrates global performance metrics with residual-pattern diagnostics and multivariate structure analysis, despite their importance for understanding model behavior beyond mean error values [19,20,21,22]. Without integrated diagnostic analysis, high-performing models may still exhibit localized bias, heteroscedasticity, or instability that limits reliability in practical machining environments.

Accordingly, this study presents a comprehensive benchmarking and diagnostic evaluation of machine learning models for cutting force prediction in CNC turning under controlled experimental conditions. Ten supervised regression algorithms, spanning linear, tree-based, ensemble, instance-based, and neural network architectures, are evaluated across five engineering alloys with distinct mechanical and thermal properties. Model performance is assessed using R², RMSE, cross-validated stability metrics, and pairwise dominance probability matrices. In addition, residual–prediction analysis is employed to identify model-specific bias and heteroscedasticity, while Principal Component Analysis (PCA) is used to explore the multivariate structure of machining descriptors and material clustering.

Beyond model comparison, this work formalizes a unified evaluation strategy that integrates accuracy assessment, residual diagnostics, probabilistic dominance analysis, and multivariate interpretation into a reproducible benchmarking protocol. This structured framework enables robust cross-material model selection and provides distribution-aware insight into algorithm stability under controlled turning conditions. The investigation is intentionally designed as a controlled benchmarking study in which cutting speed, depth of cut, and tool geometry were held constant across all experiments. This configuration isolates material variability and spatial machining descriptors, enabling consistent comparison of predictive behavior across alloys. The objective is not to derive universal cutting-force laws but to establish a statistically rigorous and practically deployable framework for model selection in data-driven machining systems.

2. Materials and Methods

2.1. Materials and Experimental Setup

To investigate the wide range of thermomechanical behaviors in precision CNC turning, five engineering-grade metallic alloys were chosen: Aluminum Alloy 6061, Brass C26000, Bronze C51000, Stainless Steel 304 Annealed, and Carbon Steel 1020 Annealed. These metallic alloys were sourced from SpecLine Arabia Company Ltd. and Al Mokahal Co. (Al-Jubail, Saudi Arabia), with nominal thermophysical and mechanical properties obtained from the respective manufacturers’ datasheets. These materials encompass a diverse range of Brinell hardness, thermal conductivity, and elastic modulus, enabling a comprehensive evaluation of deformation properties across ductile, conductive, and machining-resistant types.

The suggested modeling framework builds upon our earlier research on CNC-machined alloys, where we systematically measured cutting force, axial deformation, and intrinsic material properties to evaluate machinability and deformation patterns [28,29]. These foundational studies underpin the current regression-based approach, which uses a statistically validated power-law model to predict deformation across different materials.

All machining tests were conducted under consistent dry turning conditions on a high-precision CNC lathe using tungsten carbide cutting inserts with ISO designation CNMG120408-PM [30], mounted on a standard ISO-designated tool holder (PCLNR 2525M12) [31]. The inserts were coated with titanium aluminum nitride (TiAlN) to enhance wear resistance and thermal stability during machining. The insert designation follows ISO 1832 [30] for indexable inserts, and the tool-holder designation follows ISO 5608 [31], ensuring consistent cutting conditions across all machining trials.

The machining experiments were conducted under controlled cutting conditions to isolate the influence of material properties and spatial machining descriptors. The cutting speed and depth of cut were kept constant throughout all experiments, while feed rate and axial distance from the chuck were systematically varied to generate the dataset used for machine-learning modeling. The investigated parameter ranges were feed rates of 0.025–0.25 mm/rev, turning diameters of 17.5–33.25 mm, and axial machining distances of 5.85–40.65 mm from the chuck. These ranges were selected to represent typical finishing and semi-finishing turning conditions used in machining metallic alloys.

The variable referred to as turning diameter corresponds to the diameter of the workpiece at the cutting location, i.e., the instantaneous turning diameter at the tool–workpiece contact point during machining. The parameter distance from the chuck denotes the axial distance between the chuck face and the cutting location along the workpiece axis. A schematic illustration of the CNC turning configuration and the geometric definition of these spatial descriptors is provided in our earlier publication [32], where the experimental setup is presented in detail.

Axial deformation (δ) was tracked in real-time as a function of applied force (F), which was measured using a three-component dynamometer (TeLC) mounted inline between the CNC turret and the tool shank, enabling simultaneous measurement of the main cutting, feed (axial), and radial force components. For each material, repeated machining trials were conducted under identical cutting speed, depth of cut, and tool geometry conditions, while feed rate and axial position were systematically varied. Furthermore, each test was repeated multiple times, and the average outcomes were analyzed to minimize measurement errors and experimental variability.

Following the experimental setup described above, the measured force–deformation data were compiled into a structured dataset of 125 observations distributed across the five alloys. The dataset therefore includes intrinsic material properties (hardness, thermal conductivity, and Young’s modulus) and spatial machining descriptors. This structured dataset served as the input for the subsequent machine learning benchmarking and diagnostic analysis.

2.2. Machine Learning Models Employed

To accurately predict cutting force across various machining conditions and material types, ten supervised regression algorithms were used. These models cover a wide range of learning methods, including linear regressors, ensemble techniques, kernel-based learners, neural networks, and instance-based approaches. This diverse set of models offers a comprehensive evaluation of both simple and complex relationships between force and features in CNC turning.

The models include Partial Least Squares (PLS), Stochastic Gradient Descent (SGD), Linear Regression (LR), Decision Tree (DT), k-Nearest Neighbors (kNN), Support Vector Machine (SVM), Neural Network (NN), Random Forest (RF), AdaBoost (AB), and Gradient Boosting (GB). Table 1 summarizes their abbreviations, model types, and core attributes. These models were selected not only for their widespread use in regression tasks but also for their complementary strengths in capturing linearity, nonlinearity, robustness to noise, and handling of high-dimensional interactions. This summary of machine learning models for cutting force prediction (Table 1) is based on established literature covering ML applications in manufacturing and data-driven modeling [33,34,35].

Each algorithm has unique strengths. Linear models (PLS, SGD, LR) are valued for their interpretability and efficiency but may struggle to model nonlinear patterns [36]. Tree-based models (DT, RF) and ensemble methods (AB, GB) tend to be more robust and generalize better, though they can be prone to overfitting or need extensive tuning [37]. The NN captures complex, multivariate interactions, but it demands careful architecture design and larger datasets to prevent overfitting. Support Vector Machines (SVM), effective in high-dimensional spaces, can be sensitive to noise and variability in material responses. The intuitive and straightforward kNN algorithm is lightweight in terms of parameters, but it becomes less practical for larger datasets [38]. This diverse set of models provides a comprehensive benchmarking platform for assessing predictive accuracy, robustness, and generalizability across five metallic alloys and various machining conditions.

2.3. Feature Selection and Preprocessing

The input features for the predictive model included three intrinsic thermomechanical properties: Brinell hardness, thermal conductivity, and Young’s modulus. These were selected because of their direct impact on deformation mechanics during machining and their wide range across the alloys studied. To maintain statistical consistency and ensure fair comparison between algorithms, all features were standardized with z-score normalization before training the models.

The five materials examined in this study provide a deliberately varied mix of properties. Aluminum Alloy 6061 combines relatively low hardness (95 HB), the highest thermal conductivity in the group (167 W/m·K), and the lowest modulus of elasticity (69 GPa), making it ductile and effective at dissipating heat. Brass C26000 has the lowest hardness (70 HB), high thermal conductivity (120 W/m·K), and moderate stiffness (100 GPa), balancing plasticity and rigidity. Bronze C51000 exhibits somewhat higher hardness (110 HB), moderate thermal conductivity (62 W/m·K), and higher stiffness (117 GPa), making it well suited to wear-resistant applications.

Stainless Steel 304 Annealed is a difficult-to-machine material characterized by high hardness (201 HB), low thermal conductivity (16.2 W/m·K), and high stiffness (193 GPa), which makes it susceptible to thermal gradients and elastic recovery. In contrast, Carbon Steel 1020 Annealed has an intermediate hardness (126 HB), higher thermal conductivity (51 W/m·K), and the highest stiffness in the group (210 GPa), positioning it at the upper range of rigidity.
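The z-score standardization mentioned above can be illustrated with the nominal property values quoted for the five alloys. The sketch below is only an illustration: the helper name `zscore_columns` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

# Nominal values quoted in the text:
# (Brinell hardness HB, thermal conductivity W/m·K, Young's modulus GPa)
ALLOYS = {
    "Aluminum Alloy 6061": (95.0, 167.0, 69.0),
    "Brass C26000": (70.0, 120.0, 100.0),
    "Bronze C51000": (110.0, 62.0, 117.0),
    "Stainless Steel 304": (201.0, 16.2, 193.0),
    "Carbon Steel 1020": (126.0, 51.0, 210.0),
}


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance.

    Assumption: population standard deviation (divide by n).
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - m) / s for value, (m, s) in zip(row, stats))
        for row in rows
    ]


scaled = zscore_columns(list(ALLOYS.values()))
for name, row in zip(ALLOYS, scaled):
    print(name, [round(v, 3) for v in row])
```

After scaling, hardness, conductivity, and modulus share a common dimensionless scale, so distance-based and gradient-based learners are not dominated by the feature with the largest numeric range.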

These thermomechanical differences create a diverse feature space that tests the ability of learning models to generalize across various deformation regimes. Correlation analysis confirmed the independence of the features, ensuring that each descriptor provides unique information to the regression task without introducing multicollinearity or redundancy.

Although the material category is included as a categorical feature in the regression models, it is not treated as an abstract label. Instead, material identity serves as a compact proxy for the combined thermomechanical properties, such as hardness, thermal conductivity, and elastic modulus, that govern cutting force response during machining. The inclusion of explicit property descriptors alongside material categories ensures that model learning remains physically grounded, while allowing comparison across alloys that differ simultaneously in multiple correlated properties. Feature importance results should therefore be interpreted as reflecting aggregated material behavior rather than nominal material labels.

2.4. Model Performance Metrics and Validation Framework

Model predictive performance was evaluated using complementary statistical metrics commonly adopted in supervised regression analysis. Let F_{i,\mathrm{exp}} denote the experimentally measured cutting force for sample i, and F_{i,\mathrm{model}} the corresponding prediction generated by a given learning algorithm. The dataset consists of n observations.

The coefficient of determination (R²) quantifies the proportion of variance in the observed response explained by the regression model. It was computed as:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (F_{i,\mathrm{exp}} - F_{i,\mathrm{model}})^{2}}{\sum_{i=1}^{n} (F_{i,\mathrm{exp}} - \bar{F}_{\mathrm{exp}})^{2}}

(1)

where \bar{F}_{\mathrm{exp}} represents the mean experimental cutting force.

Predictive accuracy in absolute terms was assessed using the Root Mean Square Error (RMSE), defined as:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (F_{i,\mathrm{exp}} - F_{i,\mathrm{model}})^{2}}

(2)

which preserves the physical unit of the response variable (N). To enable normalized comparison across models and materials with differing mean force magnitudes, a scale-independent coefficient of variation of RMSE was computed as:

\mathrm{CV(RMSE)} = \frac{\mathrm{RMSE}}{\bar{F}_{\mathrm{exp}}} \times 100\%

(3)

This dimensionless metric facilitates relative comparison of predictive dispersion.
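The three metrics defined above translate directly into code. The following minimal sketch implements the formulas for R², RMSE, and CV(RMSE) as written; the force values are hypothetical and serve only to exercise the functions.

```python
import math


def r_squared(f_exp, f_model):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_exp = sum(f_exp) / len(f_exp)
    ss_res = sum((e - m) ** 2 for e, m in zip(f_exp, f_model))
    ss_tot = sum((e - mean_exp) ** 2 for e in f_exp)
    return 1.0 - ss_res / ss_tot


def rmse(f_exp, f_model):
    """Root mean square error, in the unit of the response (N)."""
    n = len(f_exp)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(f_exp, f_model)) / n)


def cv_rmse(f_exp, f_model):
    """RMSE normalized by the mean measured force, in percent."""
    mean_exp = sum(f_exp) / len(f_exp)
    return rmse(f_exp, f_model) / mean_exp * 100.0


# Hypothetical measured and predicted cutting forces (N), not study data.
f_exp = [50.0, 100.0, 150.0, 200.0]
f_model = [55.0, 95.0, 155.0, 195.0]

print(r_squared(f_exp, f_model))  # 0.992
print(rmse(f_exp, f_model))       # 5.0
print(cv_rmse(f_exp, f_model))    # 4.0
```

Because CV(RMSE) divides by the mean measured force, the same 5 N error would give a larger percentage for an alloy with lower average force, which is why it complements the absolute RMSE.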

Model robustness was evaluated through k-fold cross-validation, with k = 5. For each algorithm, performance metrics were calculated across validation folds, and both mean values and standard deviations were reported. The variability of R² and RMSE across folds was used as an indicator of predictive stability and generalization consistency.
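A 5-fold procedure of this kind can be sketched as follows. Everything specific in the example is an assumption made for illustration: the data are synthetic, the shuffling seed is arbitrary, and a one-feature least-squares line stands in for the ten regressors actually benchmarked.

```python
import math
import random
from statistics import mean, stdev


def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares line (placeholder model for this sketch)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=5):
    """Return the held-out RMSE of each fold."""
    scores = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2
                  for i in test_idx]
        scores.append(math.sqrt(mean(sq_err)))
    return scores


# Synthetic data: 125 hypothetical axial distances (mm) and forces (N).
rng = random.Random(1)
xs = [5.85 + 34.8 * i / 124 for i in range(125)]
ys = [20.0 + 4.0 * x + rng.gauss(0.0, 10.0) for x in xs]

fold_rmse = cross_validate(xs, ys)
print("mean RMSE:", mean(fold_rmse), "std across folds:", stdev(fold_rmse))
```

The standard deviation across folds is the stability indicator: a model with a good mean score but a large fold-to-fold spread generalizes less consistently.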

To compare algorithms beyond average metric values, pairwise dominance probability matrices were constructed. For each pair of models (A, B), dominance probability was defined as the proportion of cross-validation folds in which model A achieved superior performance relative to model B, judged by higher R² and lower CV(RMSE). A dominance probability greater than 0.5 indicates that A outperformed B in the majority of folds, providing a probabilistic ranking framework that accounts for performance variability rather than relying solely on mean metric values.
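The dominance calculation can be sketched as a fold-by-fold comparison. The example below makes several assumptions that the text does not settle: it handles one metric at a time (R² here, with a flag for lower-is-better metrics such as CV(RMSE)), ties count as a win for neither model, and the fold-wise scores are hypothetical values rather than results from the study.

```python
def dominance_matrix(fold_scores, higher_is_better=True):
    """Fraction of folds in which model a beats model b, for every pair.

    fold_scores maps a model name to its per-fold metric values.
    Ties are counted as a win for neither model (an assumption).
    """
    names = list(fold_scores)
    matrix = {}
    for a in names:
        for b in names:
            if a == b:
                continue
            pairs = list(zip(fold_scores[a], fold_scores[b]))
            if higher_is_better:
                wins = sum(1 for x, y in pairs if x > y)
            else:
                wins = sum(1 for x, y in pairs if x < y)
            matrix[(a, b)] = wins / len(pairs)
    return matrix


# Hypothetical fold-wise R² values for three models (not study data).
r2_by_fold = {
    "GB": [0.96, 0.97, 0.95, 0.96, 0.97],
    "NN": [0.95, 0.96, 0.96, 0.94, 0.95],
    "SVM": [0.60, 0.64, 0.58, 0.62, 0.61],
}

dom = dominance_matrix(r2_by_fold)
print(dom[("GB", "NN")])   # 0.8
print(dom[("NN", "GB")])   # 0.2
print(dom[("GB", "SVM")])  # 1.0
```

With only five folds the probabilities move in steps of 0.2, so values close to 0.5 should be read as weak evidence of a difference between two models.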

Residuals were defined as:

ε_{i} = F_{i,\mathrm{exp}} - F_{i,\mathrm{model}}

(4)

Residual prediction scatter analysis was conducted to assess heteroscedasticity, systematic bias, and variance inflation patterns. Compact residual distributions without structured trends were interpreted as indicators of model adequacy and stable generalization across materials.
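One simple way to put numbers on a residual–prediction scatter is sketched below. It is not the procedure used in the study, which relied on visual scatter analysis. As an illustrative assumption, it summarizes bias as the mean residual and uses the Pearson correlation between predicted force and absolute residual as a crude indicator of heteroscedasticity. The force values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def residual_summary(f_exp, f_model):
    """Return residuals, mean residual (bias), and spread trend.

    spread_trend is the correlation between the prediction and |residual|;
    values near zero suggest roughly constant error spread.
    """
    residuals = [e - m for e, m in zip(f_exp, f_model)]
    bias = sum(residuals) / len(residuals)
    spread_trend = pearson(f_model, [abs(r) for r in residuals])
    return residuals, bias, spread_trend


# Hypothetical predictions whose error grows with force level (N).
f_model = [20.0, 60.0, 100.0, 200.0, 300.0, 400.0]
f_exp = [21.0, 58.0, 104.0, 192.0, 316.0, 368.0]

residuals, bias, spread_trend = residual_summary(f_exp, f_model)
print("bias:", bias)
print("spread trend:", round(spread_trend, 3))
```

In this constructed example the absolute residual grows with the predicted force, so the spread trend is strongly positive, which is the pattern described above as heteroscedasticity.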

3. Exploratory Data Analysis

Prior to the development of machine-learning models, it is important to examine the statistical structure of the dataset through exploratory data analysis. Such analysis helps identify correlations between machining parameters and responses, as well as potential patterns or distributions present in the data. From a machine-learning perspective, datasets exhibiting consistent statistical behavior or identifiable relationships among variables are generally more amenable to predictive modeling, as learning algorithms can more effectively capture the underlying patterns governing the process. Therefore, the following analysis aims to explore the relationships among machining descriptors and measured responses, providing a preliminary assessment of the dataset before applying the regression models described in the subsequent sections.

In the statistical analysis presented in this study, the symbols μ and σ denote the sample mean and sample standard deviation of the considered variables. For a variable x with n observations x_{i}, the mean value μ is computed as

μ = \frac{1}{n} \sum_{i=1}^{n} x_{i},

(5)

while the standard deviation σ is calculated as

σ = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - μ)^{2}}.

(6)

These statistics provide a concise description of the central tendency and variability of the machining parameters and response variables analyzed in the subsequent sections.

3.1. Force Distribution

To examine the variability and distribution of mechanical loads during CNC turning, a detailed statistical analysis was conducted on the cutting force data. Figure 1 presents two histograms: one fitted with a two-parameter Gamma distribution (Figure 1a) and the other with a Gaussian (Normal) distribution (Figure 1b). This comparison provides valuable insights into the non-symmetric and heteroskedastic nature of force variations under different machining conditions.

The raw cutting force data exhibit a wide dynamic range, from as low as 5 N to peaks exceeding 470 N, with a pronounced right-skewed distribution. More than 75% of the measurements fall between 5 N and 105 N, indicating that the operation primarily involves low to moderate mechanical loads. This tendency is likely due to machining ductile and thermally conductive materials, such as Aluminum 6061 and Brass C26000, which naturally require less cutting force because of efficient chip formation and lower tool-workpiece friction.

Figure 1a shows the Gamma distribution fitted to the data, with a shape parameter α = 1.70002 and a scale parameter β = 67.17555. The theoretical mean, μ = αβ, is about 114.2 N, and the mode, (α − 1)β, is approximately 47.0 N, emphasizing the dataset’s strong asymmetry. The tail extending toward higher force values indicates rare but critical events, such as tool contact with tough materials like Stainless Steel 304 and Carbon Steel 1020, or brief process issues like tool wear or chip buildup. The mode being much lower than the mean shows positive skewness. The Gamma model aligns well with the observed frequency distribution within the 10–120 N force range, confirming its suitability for representing complex, non-normal, and heteroskedastic physical behaviors.

In contrast, Figure 1b presents the same data fitted with a normal distribution, with a mean (μ) of 114.2 N and a standard deviation (σ) of 92.80 N. By definition, the mean and mode are both at 114.2 N, assuming perfect symmetry around this point. While the Gaussian model captures the overall average, it significantly underestimates the high-density peak near 40 N and fails to account for the heavy tails beyond 200 N. This highlights the limitations of the Normal model in accurately reflecting the empirical skewness and outlier behavior often observed in machining processes involving nonlinear material–tool interactions.

The difference between the mean and mode in the Gamma fit, which equals the scale parameter β (~67.2 N), indicates the skewness of the cutting force distribution. In contrast, the Normal model assumes symmetry, causing the mean, median, and mode to align at the same point. This oversimplification diminishes the Gaussian model’s ability to accurately represent the true variability and extreme randomness inherent in the machining process.

Ultimately, the dual-distribution analysis shows that force data from machining are inherently non-Gaussian. This requires statistical models capable of capturing asymmetry and tail behavior. The Gamma distribution aligns more closely with the observed data and supports more accurate downstream tasks, such as regression-based deformation prediction and tool wear analysis. Its proficiency in modeling skewed and long-tailed distributions offers a more precise depiction of CNC turning forces, which are affected by complex, nonlinear interactions and material variability.
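The comparison between the two fits can be reproduced from the reported parameters alone. The sketch below uses the standard closed-form expressions for a two-parameter Gamma distribution (mean αβ, mode (α − 1)β for α > 1) and evaluates both densities at 40 N, near the observed histogram peak. The function names are illustrative, and no raw force data are involved.

```python
import math

# Parameters reported for the fits in Figure 1.
ALPHA = 1.70002      # Gamma shape
BETA = 67.17555      # Gamma scale (N)
MU = 114.2           # Normal mean (N)
SIGMA = 92.80        # Normal standard deviation (N)


def gamma_pdf(x, shape, scale):
    """Density of the two-parameter Gamma distribution for x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def normal_pdf(x, mu, sigma):
    """Density of the Normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


gamma_mean = ALPHA * BETA          # theoretical mean
gamma_mode = (ALPHA - 1.0) * BETA  # theoretical mode (valid for shape > 1)

print("Gamma mean (N):", round(gamma_mean, 2))
print("Gamma mode (N):", round(gamma_mode, 2))
print("Gamma density at 40 N:", gamma_pdf(40.0, ALPHA, BETA))
print("Normal density at 40 N:", normal_pdf(40.0, MU, SIGMA))
```

The Gamma density at 40 N is roughly twice the Normal density there, which is the numerical counterpart of the statement that the Gaussian fit underestimates the low-force peak.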

3.2. Correlation Structure

In turning operations, the magnitude of cutting force is known to depend on several interacting factors, including chip load, tool–workpiece contact conditions, and the structural stiffness of the workpiece–fixture system. For example, increasing feed rate increases chip thickness and shear deformation energy, which typically leads to higher cutting forces. Similarly, the effective stiffness of the workpiece may decrease as the cutting location moves farther from the chuck due to cantilever effects, potentially influencing the force response during machining. These relationships are well documented in machining literature and industrial practice. The correlation analysis presented in this section therefore aims to examine whether the experimental dataset reflects these established tendencies, providing a preliminary validation of the physical plausibility of the data prior to the machine-learning analysis.

A detailed analysis of correlations between variables offers valuable insights into how machining parameters interact and influence force behavior. Figure 2 shows a Pearson correlation heatmap for four main variables: cutting force, turning diameter at the cutting location, feed rate, and distance from the chuck. The heatmap uses colors ranging from dark red (indicating a strong positive correlation) to dark blue (indicating a strong negative correlation), with numerical correlation coefficients displayed for clarity. The identity diagonal of the Pearson correlation heatmap reflects the perfect self-correlation (r = 1) of each machining descriptor, confirming internal consistency of the matrix.
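A Pearson correlation matrix of this kind can be computed with a few lines of code. The sketch below is a generic illustration: the six rows of data and the column names are hypothetical placeholders, not the 125 experimental observations, so the printed coefficients will not match those in Figure 2.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b])
            for a in names for b in names}


# Hypothetical observations used only to exercise the functions.
data = {
    "force_N": [32.0, 45.0, 80.0, 150.0, 210.0, 95.0],
    "diameter_mm": [33.0, 31.5, 28.0, 22.0, 18.0, 27.0],
    "feed_mm_rev": [0.05, 0.10, 0.15, 0.20, 0.25, 0.10],
    "distance_mm": [6.0, 12.0, 20.0, 33.0, 40.0, 25.0],
}

corr = correlation_matrix(data)
for a in data:
    print(a, [round(corr[(a, b)], 3) for b in data])
```

The matrix is symmetric with ones on the diagonal, which is the property the heatmap's identity diagonal reflects.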

A moderate positive correlation between cutting force and axial distance from the chuck (r = 0.606) is observed. This trend is consistent with established turning mechanics: increasing distance from the chuck reduces structural stiffness because the workpiece is cantilevered further, and reduced rigidity leads to less efficient dissipation of vibrational energy, which can raise the cutting force needed for proper chip formation and surface quality. This finding highlights the importance of accounting for zone-specific stiffness degradation as a latent factor in predictive models. As elsewhere in this section, the correlation is reported as an exploratory check that the dataset reflects physically plausible relationships before machine-learning models are applied.

The correlation between cutting force and feed rate (r = 0.242) is statistically weaker but remains consistent with machining mechanics. As feed increases, the chip load per revolution also increases, causing cutting resistance to rise until the tool edge breaks down or chatter begins. Although weak, this correlation suggests that feed should not be overlooked or treated as a minor variable, particularly in multivariate regression or machine learning models that are sensitive to second-order effects.

In contrast, a negative correlation exists between cutting force and turning diameter (r = −0.320): smaller diameters are associated with higher forces. Because smaller diameters in this dataset occur farther from the chuck (diameter–distance r = −0.628), this trend likely reflects the reduced workpiece stiffness at those locations rather than a direct geometric effect of diameter alone. This coupling is significant in multi-pass or taper turning, where radial depth and diameter vary continuously.

The strongest negative correlation is between diameter and distance (r = −0.628), which follows from the setup’s geometry: larger diameters occur mainly near the chuck, whereas smaller diameters occur at greater axial distances. This taper pattern is typical in staged roughing followed by fine finishing, emphasizing the physical connection between radial and axial tool paths.

A moderate negative correlation is observed between diameter and feed (r = −0.342), likely reflecting operational practices: finer feeds are often used with larger diameters to improve surface finish or reduce tool deflection. This suggests possible adaptive control strategies during turning.

Finally, the correlation between feed and distance is essentially zero (r = 0.000). This indicates operational independence: feed rates were not systematically changed based on the tool’s position along the workpiece. This finding supports model design decisions that treat feed rate as an external variable, separate from spatial parameters.

The correlation map highlights several interaction layers: geometric (diameter–distance), mechanical (force–distance), and operational (feed–diameter). These relationships justify the use of multivariate models that can include interaction terms, capture hidden structures such as stiffness gradients, and handle potential nonlinearities in the response. These insights will guide the next steps in feature engineering and predictive modeling.

3.3. Bivariate Feature Trends with Confidence Ellipses

To identify the underlying structure and interdependencies among key process descriptors, a bivariate feature analysis was conducted using pairwise scatter plots with 95% confidence ellipses, as presented in Figure 3. This visualization framework facilitates simultaneous interpretation of linear correlations, variance patterns, and experimental stratification across three machining variables: turning diameter (μ = 28.148 mm, σ = 3.922 mm), feed rate (μ = 0.120 mm/rev, σ = 0.070 mm/rev), and axial distance from the chuck (μ = 23.25 mm, σ = 12.353 mm).

The diameter–feed rate plot exhibits a narrow, negatively sloped ellipse, indicative of a strong inverse correlation (Pearson’s r = −0.91). This implies that as diameters increase, feed rates were systematically reduced, likely a deliberate control strategy to stabilize tool engagement and maintain dimensional accuracy. The highly elongated shape of the ellipse (eigenvalue ratio > 5:1) confirms low within-group variability, validating this as a statistically significant and experimentally embedded design constraint.
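The geometry of a 95% confidence ellipse follows from the eigen-decomposition of the 2×2 sample covariance matrix. The sketch below assumes bivariate normality, so the semi-axes scale with the 95% quantile of the chi-square distribution with two degrees of freedom (−2 ln 0.05 ≈ 5.99). It also assumes the two variables have already been put on a comparable scale, and it uses a small hypothetical sample rather than the study data.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_95_2DOF = -2.0 * math.log(0.05)


def confidence_ellipse(xs, ys):
    """Center, semi-axes, orientation and eigenvalue ratio of the ellipse."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + radius
    lam_minor = max(half_trace - radius, 0.0)

    return {
        "center": (mx, my),
        "semi_major": math.sqrt(CHI2_95_2DOF * lam_major),
        "semi_minor": math.sqrt(CHI2_95_2DOF * lam_minor),
        # Angle of the major axis from the x-axis, in radians.
        "angle_rad": 0.5 * math.atan2(2.0 * sxy, sxx - syy),
        "eigen_ratio": lam_major / lam_minor if lam_minor > 0 else math.inf,
    }


# Hypothetical, already-scaled sample with a strong inverse relationship.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.1, 0.9, 0.0, -1.1, -1.9]

ellipse = confidence_ellipse(xs, ys)
print(ellipse)
```

A negative orientation angle corresponds to a negatively sloped ellipse, and a large eigenvalue ratio corresponds to the narrow, elongated shape described for strongly correlated pairs. For a nearly uncorrelated pair on a common scale the ratio approaches 1 and the ellipse becomes close to circular.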

In contrast, the diameter–distance scatter plot forms a moderately inclined ellipse (r = +0.86), suggesting a strong positive correlation. This pattern reflects the stepped or tapered geometry of the machined components, where turning diameters naturally increase with axial distance from the chuck. Such coupling between radial and axial dimensions confirms that geometric design factors are not orthogonal and should be accounted for in multivariate regression to avoid omitted-variable bias or confounding effects.

The feed–distance pairing displays a nearly circular confidence ellipse (r = −0.03), confirming that no meaningful linear association exists between these variables. This statistical independence reinforces the orthogonality of the experimental design, as feed levels were uniformly distributed across axial zones. The absence of correlation minimizes multicollinearity in regression models, allowing for the isolated evaluation of the feed’s influence on cutting forces and deformation.
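As a minimal sketch of how such confidence ellipses can be derived, the following Python fragment computes Pearson's r and the axes of the 95% ellipse from the sample covariance matrix. It assumes bivariate normality, for which the chi-square quantile with two degrees of freedom has the closed form −2 ln(1 − level). The function name and the sample values are illustrative assumptions and are not taken from the study's measurements.

```python
import math

def confidence_ellipse(xs, ys, level=0.95):
    """Return (r, semi_major, semi_minor, angle_deg) for a bivariate-normal ellipse."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    r = sxy / math.sqrt(sxx * syy)
    # Closed-form eigenvalues of the 2x2 covariance matrix
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1 = half_trace + disc
    lam2 = max(half_trace - disc, 0.0)  # guard against tiny negative round-off
    # Chi-square quantile for 2 degrees of freedom
    scale = -2.0 * math.log(1.0 - level)
    # Orientation of the major axis relative to the x-axis
    angle = 0.5 * math.degrees(math.atan2(2.0 * sxy, sxx - syy))
    return r, math.sqrt(scale * lam1), math.sqrt(scale * lam2), angle

# Hypothetical, made-up values for illustration only
diameter_demo = [24.0, 26.0, 28.0, 30.0, 32.0, 34.0]
feed_demo = [0.20, 0.18, 0.12, 0.11, 0.06, 0.05]
r_demo, a_demo, b_demo, angle_demo = confidence_ellipse(diameter_demo, feed_demo)
```

For standardized variables, the eigenvalue ratio of the ellipse equals (1 + |r|)/(1 − |r|), so a nearly circular ellipse corresponds to r close to zero and a highly elongated ellipse to |r| close to one.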

Diagonal histograms further clarify these trends. The diameter histogram exhibits a bimodal structure, indicating two distinct groups of workpiece geometries or turning stages, likely corresponding to rough and finishing passes. The feed and distance histograms exhibit multimodal patterns, driven by discrete level assignments in the design matrix. None of the variables follow a perfect Gaussian distribution, reinforcing the need for non-parametric or distribution-aware modeling techniques, such as robust regression or transformation-based PCA.

From a modeling perspective, these bivariate insights are essential for ensuring accurate interpretation and prediction. The strong negative correlation between diameter and feed warrants the inclusion of interaction terms in predictive algorithms, while the independence of feed and distance supports simplified feature selection. Moreover, the observed multimodality and skewness justify moving beyond traditional linear modeling assumptions.

In summary, the bivariate analysis confirms that the dataset is both statistically balanced and physically grounded, capturing both the engineered design intent and natural machining variability. The combined evidence from correlation coefficients, confidence ellipses, and distribution histograms offers a statistically rigorous foundation for advanced multivariate modeling and regression analysis in the subsequent sections.

While the preceding bivariate analysis provides insight into pairwise dependencies and experimental stratification patterns, machining processes inherently involve simultaneous interactions among multiple geometric and process descriptors. Pairwise correlations, although informative, cannot fully capture higher-dimensional structure or reveal latent coupling mechanisms across variables. To systematically investigate the dominant modes of variability and uncover hidden multivariate relationships within the dataset, a dimensionality-reduction approach is required. Accordingly, Principal Component Analysis (PCA) is employed to examine the global structure of the machining feature space, quantify variance distribution, and identify material-driven clustering behavior across combined geometric and process regimes.

3.4. Principal Component Analysis of Material–Property Relationships and Feature Impact

To better understand the underlying structure and hidden relationships within the machining dataset, a Principal Component Analysis (PCA) was performed on the standardized numerical variables: cutting force, diameter, feed, and axial distance. Figure 4 shows two related perspectives from this analysis: the PCA Score Plot (Figure 4a), which displays data points across the first two principal components, and the PCA Loading Plot (Figure 4b), which illustrates the strength and direction of each original variable’s correlation with these components.

The PCA Score Plot (Figure 4a) shows apparent material-based clustering along the PC1–PC2 plane. The first two components account for 78.72% of the total variance, with 53.47% attributed to PC1 and 25.25% to PC2. This indicates that reducing dimensions to two retains most of the structural information. Materials are organized across the four quadrants, each representing a particular geometric and process combination. This concentration of variance within two components suggests that the machining feature space is governed by a limited number of dominant structural modes.

Quadrant 1 (PC1+, PC2+) exhibits high feed rates, accompanied by moderate to high cutting forces and distance values. It mainly includes Carbon Steel 1020 and Stainless Steel 304, which are known for their high mechanical resistance and require power-intensive turning. Quadrant 2 (PC1−, PC2+) is characterized by aggressive feed conditions but involves smaller diameters, representing intense machining phases on ductile materials such as brass and bronze. Quadrant 3 (PC1−, PC2−) features lower feed and force levels, typically involving materials such as Brass C26000 and Aluminum 6061, during finishing passes or in minimal engagement zones. Quadrant 4 (PC1+, PC2−) involves configurations with moderate to high force and distance values but limited feed, indicating controlled finishing operations where diameter is a key factor.

This quadrant-based interpretation aligns with the earlier confidence ellipse analysis in Section 3.3, which demonstrated that variables such as feed and distance vary independently. The clear elliptical clusters in the score plot further support the multivariate consistency of material–process regimes, particularly under controlled design stratification. Notably, the non-Gaussian, multimodal characteristics previously observed in the histograms are reflected in the nonlinear dispersions across PC space, supporting PCA as an effective method for revealing structure.

The PCA Loading Plot (Figure 4b) provides essential insights into how the original variables affect the PC axes. Cutting force, diameter, and distance have strong loadings on PC1, with values of +0.525, −0.553, and +0.584, respectively, indicating that PC1 mainly reflects geometric and force-related variations. Conversely, feed loads strongly on PC2, with a loading of +0.891, indicating that PC2 represents a distinct aspect of process variability. The angular separation between feed and the other variables suggests low collinearity and high orthogonality, making the components easy to interpret. This orthogonal structure confirms that feed-driven variability is statistically separable from geometric–spatial effects within the dataset.
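The score and loading quantities discussed above can be illustrated with a short sketch of PCA on standardized variables. The fragment below uses NumPy's singular value decomposition; the synthetic data are a hypothetical stand-in generated for illustration and do not reproduce the study's dataset or its reported loadings.

```python
import numpy as np

def pca_standardized(X):
    """PCA on z-scored columns; returns (explained_variance_ratio, loadings, scores)."""
    X = np.asarray(X, dtype=float)
    # Standardize each column to zero mean and unit sample variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are unit-norm principal directions, ordered by singular value
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2 / (Z.shape[0] - 1)
    ratio = eigvals / eigvals.sum()
    scores = Z @ Vt.T  # coordinates of each observation in PC space
    return ratio, Vt.T, scores

# Hypothetical stand-in data (NOT the study's measurements)
rng = np.random.default_rng(0)
n = 200
distance = rng.uniform(2.0, 45.0, n)
diameter = 35.0 - 0.25 * distance + rng.normal(0.0, 1.0, n)
feed = rng.choice([0.05, 0.10, 0.20], n)
force = 50.0 + 3.0 * distance + 400.0 * feed + rng.normal(0.0, 10.0, n)

# Column order: force, diameter, feed, distance
X = np.column_stack([force, diameter, feed, distance])
ratio, loadings, scores = pca_standardized(X)
two_component_share = ratio[:2].sum()  # analogue of the cumulative PC1 + PC2 variance
```

Each column of `loadings` is a unit-norm vector, so the entries of a column can be read as the relative weight of each original variable on that component, which is how the loading plot is interpreted in the text.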

The vector directions and magnitudes in the loading plot highlight the concept of counteracting effects: diameter and distance are opposed along PC1, indicating an engineered taper or staged operation setup. The feed vector’s distinct radial extension indicates its high independence, making it the most influential parameter along PC2. Meanwhile, the cutting force, positioned between PC1 and PC2, highlights its role as an integrated variable influenced by both geometry and feed conditions.

The PCA score and loading plots confirm a distinct separation and complex interactions among machining variables. These findings also support the earlier observation that the dataset is not purely Gaussian, justifying the application of distribution-aware analytical methods. Importantly, the multivariate stratification observed in PCA space establishes a structured foundation for the regression analysis presented in the following sections, particularly in selecting modeling approaches capable of capturing nonlinear geometric–process interactions.

4. Model Training and Performance Metrics

4.1. Model Training and Hyperparameter Tuning

Before interpreting the predictive performance of the machine-learning models, it is important to describe the training and validation procedure used in this study. Model performance was evaluated using five-fold cross-validation: the dataset was partitioned into five subsets, and each subset served once as the validation set while the remaining four were used for training. A proportion of the available observations was used for model training, while the remaining data were reserved for testing to assess predictive accuracy on previously unseen samples. The training dataset was used to fit the regression models and determine the optimal model parameters, while the testing dataset was used exclusively for performance evaluation. This separation ensures that the reported metrics reflect the ability of the models to generalize beyond the training data.
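The fold structure described above can be sketched as follows. This is a minimal illustration of five-fold partitioning in plain Python; the function name, the shuffling seed, and the sample count are assumptions made for the example and do not describe the study's actual implementation.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of (train_indices, validation_indices) pairs for k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are not ordered
    # Deal indices into k folds of nearly equal size
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        # Training set is the union of all folds except the validation fold
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, val))
    return splits

# Example with a hypothetical sample count
demo_splits = k_fold_indices(23, k=5)
fold_sizes = [len(val) for _, val in demo_splits]
```

Every observation appears in exactly one validation fold, so each sample contributes once to the validation metrics and four times to model fitting.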

Furthermore, model training was performed using a structured hyperparameter-selection procedure to ensure that the reported differences among machine-learning models were not caused by arbitrary or suboptimal parameter settings. For algorithms with tunable hyperparameters, parameter values were selected through cross-validated search within predefined ranges, and the final configuration for each model was chosen based on validation performance. This procedure was applied especially to models known to be sensitive to hyperparameter selection, including Support Vector Machine, k-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Stochastic Gradient Descent, Partial Least Squares, and Neural Network models.

For each algorithm, hyperparameter tuning was carried out under the same validation framework used for comparative performance assessment, so that the reported accuracy metrics reflect calibrated implementations rather than default settings. The final hyperparameter values adopted for all models are summarized in Table 2. Reporting these settings improves the reproducibility of the study and supports the fairness of the cross-model benchmarking analysis presented in the following subsections.
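A cross-validated hyperparameter search of this kind can be sketched with a deliberately simple learner. The fragment below tunes the neighbor count of a one-feature k-nearest-neighbors regressor by mean validation RMSE over five folds. The grid values, the synthetic feed–force data, and all function names are hypothetical; the ranges and final settings used in the study are those reported in Table 2.

```python
import random

def knn_predict(train_x, train_y, x, k):
    """Mean target of the k nearest training points (single feature for brevity)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def cv_rmse(xs, ys, k_neighbors, n_folds=5, seed=0):
    """Mean validation RMSE over n_folds; the fixed seed keeps folds identical across settings."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    fold_rmse = []
    for val in folds:
        val_set = set(val)
        tr = [i for i in idx if i not in val_set]
        tx, ty = [xs[i] for i in tr], [ys[i] for i in tr]
        sq = [(knn_predict(tx, ty, xs[i], k_neighbors) - ys[i]) ** 2 for i in val]
        fold_rmse.append((sum(sq) / len(sq)) ** 0.5)
    return sum(fold_rmse) / n_folds

def select_k(xs, ys, grid=(1, 3, 5, 9)):
    """Evaluate every candidate in the grid and return the one with the lowest CV RMSE."""
    results = {k: cv_rmse(xs, ys, k) for k in grid}
    best = min(results, key=results.get)
    return best, results

# Hypothetical synthetic data: force rising with feed plus noise
gen = random.Random(1)
feed = [gen.uniform(0.05, 0.25) for _ in range(40)]
force = [50.0 + 400.0 * f + gen.gauss(0.0, 10.0) for f in feed]
best_k, cv_table = select_k(feed, force)
```

Using the same fold assignment for every candidate setting means that differences in the validation score reflect the hyperparameter rather than a different data split.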

4.2. Feature Importance Analysis

Understanding how material and process descriptors each affect cutting force prediction helps relate model interpretability to the physical characteristics of the machining process. Figure 5 compares feature importance through two metrics: root mean square error (RMSE)-based error sensitivity (Figure 5a) and R²-based variance explanation (Figure 5b). Both are presented with ranked contributions and standard deviation bars, with the Mean Weighted Average Relevance Line (MWARL) helping to distinguish features with a strong influence from those with less impact.

In the RMSE-based analysis (Figure 5a), material type proves to be the most influential predictor, with an average importance of 68.27 ± 5.54, significantly above the MWARL threshold of 50.64. This highlights the significance of intrinsic material properties, such as hardness, thermal conductivity, and stiffness, in determining the behavior of cutting forces. Following closely are feed rate (66.26 ± 6.01) and distance from chuck (65.13 ± 4.52), which are the second- and third-most impactful variables. These findings highlight that material-specific responses are crucial, and process dynamics, particularly feed-induced plasticity and changing contact conditions across machining zones, play a significant role in shaping the force profile.

These top three features collectively account for nearly 79% of the total importance, indicating a highly skewed contribution profile. Conversely, diameter (36.04 ± 5.74) and zone (17.49 ± 1.07) are much less significant compared to MWARL, indicating that they offer limited additional value in predicting force behavior. This implies that, after considering continuous spatial features such as distance and material identity, other geometric descriptors contribute little additional predictive power.
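The arithmetic behind these shares can be checked with a few lines of Python using the RMSE-based importance values quoted above. The sketch assumes that the MWARL threshold is the arithmetic mean of the five importance values; this assumption reproduces the reported threshold of 50.64 but is an inference from the numbers rather than a stated definition.

```python
# RMSE-based mean importance values as quoted in the text (Figure 5a)
importance = {
    "material": 68.27,
    "feed": 66.26,
    "distance": 65.13,
    "diameter": 36.04,
    "zone": 17.49,
}

total = sum(importance.values())

# Assumption: MWARL taken as the arithmetic mean of the importances
mwarl = total / len(importance)

# Share of total importance carried by the three highest-ranked features
top3 = sorted(importance.values(), reverse=True)[:3]
top3_share = sum(top3) / total

# Features whose importance exceeds the threshold
above_threshold = [name for name, value in importance.items() if value > mwarl]

print(f"MWARL = {mwarl:.2f}, top-3 share = {top3_share:.1%}")
```

Under this assumption the threshold separates material, feed, and distance from diameter and zone, matching the grouping described in the text.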

The R²-based analysis (Figure 5b) supports these trends by providing a variance-focused view. Material type remains the most influential (0.581 ± 0.090), with feed (0.549 ± 0.096) and distance (0.530 ± 0.070) following closely. Together, these three variables account for 88.5% of the total variance, underscoring their consistent and significant contributions across various models. The consistently low relevance of zone (0.044 ± 0.005) emphasizes that discrete spatial labels are redundant when continuous geometric information, such as distance, already captures spatial variation.

The convergence of RMSE and R² metrics reinforces the importance of these key features and provides a physically consistent interpretation: the most informative inputs of the model align with recognized physical factors that affect force variation, such as material resistance to deformation, plastic deformation rate (through feed), and tool–workpiece interaction length. This consistency between statistical results and physical understanding boosts confidence in the model’s ability to generalize and its transparency. Furthermore, the clear separation at the MWARL threshold provides a natural cutoff point for simplifying the model. Low-ranked features can be removed without significantly affecting predictive accuracy, enabling real-time use in sensor-limited settings. Overall, these results support a streamlined model structure that favors a minimal yet physically interpretable feature set for force prediction in CNC turning.

From a broader perspective, this feature-importance analysis aligns with the growing interest in interpretable machine-learning models in manufacturing research. While advanced Explainable Artificial Intelligence (XAI) techniques, such as SHAP or Grad-CAM [39,40], are often used to interpret complex black-box models, the present study maintains interpretability by relying on physically meaningful machining descriptors. This approach allows the resulting model insights to be directly related to the underlying mechanics of the turning process.

Figure 6 illustrates the feature importance scores from the ensemble model, highlighting a highly skewed distribution of contributions among the descriptors. The three most essential features, distance (26.5%), feed (23.2%), and material identity, notably Stainless Steel 304 Annealed (21.1%), together make up over 70% of the model’s predictive power. This emphasizes their key roles in influencing force response variability. The results suggest that the model depends on a small set of highly informative features, consistent with a Pareto-like pattern where a few features account for most of the variance reduction.

The significance of distance is confirmed by earlier results from Figure 5a,b, now with greater detail. As machining progresses through different zones (F1–F5), localized effects such as thermal buildup, loss of stiffness, and tool wear become increasingly pronounced, resulting in a gradual increase in force over time. Using distance as a continuous variable efficiently captures this spatial dependence, revealing dynamic effects that discrete zone labels may miss. This indicates that distance acts as a proxy for tool–workpiece interaction deterioration, particularly during longer cuts.

Feed, the second-most influential variable, accounts for 23.2% of the model’s importance. It directly influences chip thickness and material removal rate, impacting both thermal load and mechanical strain on the cutting edge. Its high significance indicates the nonlinear response of force to even small changes in feed rate, a common characteristic in semi-ductile metals.

Among material types, Stainless Steel 304 Annealed stands out as the most force-sensitive alloy, contributing 21.1%. This is due to its high work-hardening coefficient and low thermal conductivity, which increase cutting resistance. Conversely, materials like Aluminum 6061 and Brass C26000, recognized for their ease of machining, show significantly lower influence scores (2–5%), suggesting more consistent performance across various force conditions.

Interestingly, zone 1 alone accounts for 4.5%, while zones 2–5 each contribute less than 3.1%, with zone 4 (0.1%) being almost negligible. This sharp decline emphasizes the limited standalone informativeness of individual zone labels, particularly when distance is already included in the features. It also indicates that early cutting conditions (zone 1) exhibit greater variability or transitional behavior, which the model detects more effectively than the more stable mid-span areas.

From a modeling perspective, these insights indicate that a reduced set of informative descriptors, such as distance, feed, and aggregated material response, can capture most of the variance in cutting force behavior. For example, in systems with limited sensors, removing less important categorical variables, such as zone labels, can maintain model accuracy while reducing computational demands. Additionally, the distribution details provide a strong foundation for feature pruning in transfer learning, especially when new alloys or setups lack comprehensive data on variables.

4.3. Overall Predictive Performance of Machine Learning Models

To thoroughly assess the predictive ability of the machine learning models used for cutting force estimation, two key performance metrics were employed: Root Mean Square Error (RMSE), which penalizes large residuals and highlights accuracy, and the coefficient of determination (R²), which shows how much variance the model explains. Together, these metrics provide insight into the models’ precision and their ability to generalize. Figure 7 presents the performance of ten models, with ensemble-mean benchmarks (RMSE = 38.54, R² = 0.81) serving as reference points.
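For reference, the two metrics can be written compactly as below. This is a minimal sketch using the standard definitions of RMSE and R²; the force values are made up for illustration and are not drawn from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted cutting forces in newtons
observed_demo = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted_demo = [110.0, 140.0, 205.0, 245.0, 310.0]

demo_rmse = rmse(observed_demo, predicted_demo)    # sqrt(70), about 8.37 N
demo_r2 = r2_score(observed_demo, predicted_demo)  # 1 - 350/25000 = 0.986
```

Because RMSE is expressed in the units of the response and R² is dimensionless, the two metrics are complementary: the first indicates the typical size of the error in newtons, and the second indicates the fraction of force variance reproduced by the model.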

Among all the models, GB distinctly leads, with the lowest RMSE of 18.03 and the highest R² of 0.962. This highlights the effectiveness of additive tree-based approaches in modeling complex, nonlinear relationships and variable interactions, particularly in datasets characterized by structural gradients and moderate feature collinearity. Neural Networks (NNs) follow closely behind GB in both error reduction (RMSE = 19.96) and explanatory strength (R² = 0.953), demonstrating their adaptability in capturing intricate regression patterns. However, they require more precise parameter tuning and data scaling.

Random Forests (RF) and AdaBoost (AB) exhibit stable yet moderate performance (RMSE approximately 33–34; R² around 0.86–0.87). This suggests that ensemble averaging enhances robustness against local variations, though it does not offer the sequential correction benefit seen in boosting. These models are especially adept at handling high-dimensional, partially correlated inputs, aligning with the structure of the machining dataset.

Models based on simpler inductive biases, such as LR, PLS, and DT, exhibit noticeable limitations. Their RMSE values are consistently higher than the ensemble average (45–52), and their R² scores range from 0.68 to 0.74, indicating that they struggle to capture the nonlinearities and interactions typical of machining processes. Support Vector Machines (SVM) perform the worst overall, with an RMSE of 54.76 and an R² of 0.6479, indicating limited ability to capture the underlying relationships in the dataset. Their prediction error is substantially higher than the mean RMSE (38.54), while their explanatory power falls well below the average R² level (0.81) observed across models. This suggests that SVM struggles to adequately represent the nonlinear interactions and coupled effects typical of machining processes.

A key insight is the consistent correlation between RMSE and R² rankings. Models that perform well in minimizing errors also tend to account for more variance, showing metric consistency and a stable hierarchy of performance across different evaluation methods. The RMSE standard deviation (σ = 23.17) highlights notable performance differences among models, indicating high sensitivity to algorithmic complexity and structural flexibility. The RMSE range, from 18.03 to 91.15, shows that choosing the right model is crucial and can significantly impact predictive accuracy.

These results highlight the statistical advantage of gradient-boosted ensemble models for cutting force prediction within the studied experimental regime. In comparison, conventional regressors and fixed-margin models do not adapt as well, emphasizing the need for nonlinear, interaction-aware, and ensemble learners in practical manufacturing settings.

5. Residual Distribution and Prediction Accuracy Analysis

5.1. Model-Specific Residual Error Patterns

Residual plots are essential diagnostic tools for confirming assumptions in regression models, especially concerning linearity, homoscedasticity, and error independence. In this research, residuals, which measure the difference between predicted and actual cutting force values, are plotted against the predicted results for each machine learning model, as shown in Figure 8. This visualization provides insights into model-specific error patterns, biases, and the extent to which the models accurately capture relationships across the entire prediction range.
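One simple numerical companion to a residual plot is to compare the residual spread across bins of the predicted value. The sketch below assumes the sign convention residual = predicted − observed, under which a negative residual means underestimation, and uses made-up values constructed so that the spread widens with predicted force. The function name and the bin count are illustrative choices.

```python
import statistics

def residual_spread_by_bin(y_true, y_pred, n_bins=3):
    """Return (mean residual, residual std) for equal-count bins of the predicted value."""
    # Assumed convention: residual = predicted - observed
    pairs = sorted((p, p - t) for t, p in zip(y_true, y_pred))
    size = len(pairs) // n_bins
    summary = []
    for b in range(n_bins):
        # The last bin absorbs any remainder
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        res = [r for _, r in chunk]
        summary.append((statistics.mean(res), statistics.pstdev(res)))
    return summary

# Hypothetical values in newtons, built so residual spread grows with predicted force
y_pred = [10, 20, 30, 40, 50, 60]
y_true = [9, 21, 25, 45, 30, 80]
spread = residual_spread_by_bin(y_true, y_pred, n_bins=3)
```

A roughly constant standard deviation across bins is consistent with homoscedastic errors, a standard deviation that grows from bin to bin corresponds to the conical or fan-shaped patterns described for the linear models, and a bin mean that drifts away from zero indicates systematic bias.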

The PLS model (Figure 8a) displays an hourglass-shaped dispersion, with residual variance kept within ±10 N near the midrange predictions but expanding broadly toward the upper end, exceeding ±40 N. This pattern indicates increasing heteroscedasticity and highlights the model’s limited capacity to capture nonlinear deformation–force interactions, which stems from its linear latent-variable structure. The SGD model (Figure 8b) exhibits a similar widening of residuals, albeit more symmetrical, suggesting sensitivity to force magnitude and a degree of rigidity in the model when applying gradient descent in high-variance regions.

The AB model (Figure 8c) exhibits a fairly concentrated residual distribution, with values mostly staying within ±20 N and centered around zero. However, it displays slight skewness at lower cutting forces (<25 N). This asymmetry indicates the effects of instance re-weighting inherent to boosting algorithms, especially when low-force samples are underrepresented during iterative fitting. The NN model (Figure 8d) exhibits better residual consistency, with residuals primarily within ±15 N throughout the prediction range. The nearly flat residual band highlights the NN’s ability to learn complex nonlinear mappings with stable generalization, even in areas with sparse data or higher complexity.

The LR model (Figure 8e) shows structured residuals that increase proportionally with force predictions, creating a clear conical pattern. This breaks the homoscedasticity assumption and reveals the model’s failure to capture nonlinear or interaction effects between thermomechanical descriptors and cutting force. The DT model (Figure 8f) exhibits distinct clusters of residuals, particularly near transition points between force levels, with vertical jumps exceeding ±50 N in some cases. This “stepwise” residual pattern is a common sign of overfitting in single-tree models, where sharp splits and lack of smooth interpolation cause sudden prediction errors.

The kNN model (Figure 8g) exhibits some residual stratification, with groups of points clustering around common force values, resulting in a staircase-like pattern. Although the average bias remains low, the stepped residuals suggest possible issues with the model’s ability to predict accurately in less populated regions of the feature space. In contrast, the RF model (Figure 8h) significantly improves residual compactness, with most errors falling within ±20 N and no clear bias trends. Its ensemble approach reduces variance and avoids localized overfitting, leading to a more consistent residual pattern across the prediction spectrum.

Support Vector Regression (SVM) (Figure 8i) underperforms compared to other methods, exhibiting residuals that follow a pronounced downward trend rather than a random distribution around zero. As the predicted force increases, the residuals systematically become more negative, indicating a growing underestimation of the true cutting force. This structured pattern reveals a strong model bias and suggests that the SVM formulation fails to adequately capture the nonlinear scaling behavior of the process. Instead of producing balanced and homoscedastic residuals, the model generates progressively larger deviations at higher force levels, reflecting a limited capacity to represent the underlying force dynamics. Conversely, GB (Figure 8j) demonstrates one of the best residual profiles, with errors closely centered around zero and confined within ±15 N. The step-by-step improvement of weak learners allows GB to effectively reduce localized biases and maintain smooth residuals even at high-force levels.

In summary, examining residual behavior across different models provides a detailed view of their predictive robustness that goes beyond traditional error metrics. Ensemble methods, such as RF and GB, combined with the neural network model, produce residual distributions that are statistically consistent, unbiased, homoscedastic, and compact, indicating that they effectively capture the underlying physical relationships. Conversely, models with linear formulations or static kernels exhibit higher residual variance and structural inconsistencies, which limit their usefulness for modeling complex, force-dependent machining phenomena. Therefore, residual analysis serves as a valuable complement to RMSE and R² metrics, revealing hidden prediction instabilities, aiding in model selection, and assessing the potential for real-world application in cutting processes.

5.2. Predicted vs. Observed Cutting Force Behavior

Figure 9 provides a clear visualization of the prediction accuracy of each machine learning model by comparing predicted cutting force values with the actual ground truth. The parity line (y = x) serves as an ideal benchmark, indicating where predicted and actual values align perfectly. The extent of scatter, curvature, and consistent deviations from this line collectively illustrate each model’s ability to represent the underlying mechanics of force generation accurately.
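Deviation from the parity line can also be summarized numerically by regressing predicted values on observed values. The following sketch fits an ordinary least-squares line; a slope of one with zero intercept corresponds to the parity line, while a slope below one indicates a compressed response that underestimates high forces. The data are hypothetical and were constructed to show such compression.

```python
def parity_fit(observed, predicted):
    """Least-squares slope and intercept of predicted regressed on observed."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_o
    return slope, intercept

# Hypothetical forces in newtons: predictions follow 0.8 * observed + 30
observed_forces = [50.0, 100.0, 200.0, 300.0, 350.0]
predicted_forces = [70.0, 110.0, 190.0, 270.0, 310.0]
slope, intercept = parity_fit(observed_forces, predicted_forces)

# Observed force at which the fitted line crosses the parity line (valid when slope != 1)
crossover = intercept / (1.0 - slope)
```

In this constructed example the fitted line crosses the parity line at 150 N, so forces below that level are overestimated and forces above it are underestimated, which is the kind of compression described qualitatively for the linear-type models.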

The PLS model (Figure 9a) shows good alignment near the central force area, but divergence appears at both low and high-force regions. The upper right quadrant tends to underpredict high cutting forces, and the lower-left corner shows wider error bands, highlighting limitations in capturing nonlinear deformation behaviors.

SGD (Figure 9b) exhibits a similar primary trend but shows a more symmetrical spread across the prediction space. However, the points fan out from the parity line, indicating uneven performance and challenges in predicting force behavior in sparse regions, particularly at force levels beyond approximately 200 N.

AdaBoost (Figure 9c) demonstrates a tight clustering of predictions throughout the entire force range, especially between 100–300 N. There is only a slight downward deviation for forces below 100 N. This indicates good generalization across most of the machining domain, although some residual skewness can still be observed at low-force levels.

The Neural Network (Figure 9d) shows the most linear and symmetric distribution of predictions. Throughout the entire cutting force range (0–350 N), the data points align closely with the parity line, exhibiting minimal scatter, which indicates the model’s strong ability to manage nonlinear interactions between input features and the machining response. The lack of banding, curvature, or outlier clusters suggests excellent model generalization.

In contrast, Linear Regression (Figure 9e) shows a noticeably curved prediction pattern. Lower predicted forces suggest a moderate alignment, but the curve consistently bows downward in the high-force range (above 250 N), resulting in a notable underestimation. This deviation is a typical result when linear models are used to fit nonlinear systems.

The Decision Tree (Figure 9f) displays a fragmented prediction pattern, characterized by force values in stepped vertical bands. This discreteness indicates overfitting and poor interpolation, especially around transition zones near 150 N and 275 N. The prediction lacks smooth transitions, reducing its reliability.

In the kNN model (Figure 9g), predictions are organized into horizontally aligned blocks, highlighting its instance-based approach. Although clusters near mid-range forces are reasonably well-aligned, outliers occur at both ends due to local sparsity. The banded response pattern indicates limited flexibility in regions with complex features.

The Random Forest model (Figure 9h) produces a dense cluster of predictions that closely follow the parity line. The points are evenly spread across forces from low to high (up to 350 N), demonstrating effective ensemble averaging and good generalization. The absence of sharp outliers or heavy banding emphasizes its robustness.

In contrast, the SVM model (Figure 9i) exhibits a dispersed and biased prediction pattern that deviates noticeably from the ideal parity line. Rather than following the expected linear relationship, many predictions cluster below the ideal line at higher force levels, indicating systematic underestimation as the cutting force increases. This behavior suggests that the model struggles to properly scale across the full force range, particularly beyond approximately 200 N. The resulting distribution reflects limited regression flexibility and difficulty capturing the nonlinear relationships governing the machining process.

Finally, Gradient Boosting (Figure 9j) demonstrates a nearly perfect prediction profile. The model’s outputs stay closely aligned with the parity line across the entire force range, from the lowest to the highest values, with almost no directional bias. Even beyond 300 N force, predictions stay consistent, indicating excellent model tuning and layered error refinement.

Overall, these parity plots demonstrate that nonlinear ensemble models and neural networks exhibit superior predictive performance, as indicated by their linear, centered distributions with minimal scatter. In contrast, the nonlinearity caused by deformation in the machining process makes it difficult for several regression models to maintain consistent scaling behavior. Models with a largely linear response, such as PLS, SGD, LR, and SVM, show a compressed response pattern, increasingly underestimating cutting forces at higher loads. Other nonlinear methods, including AB, DT, kNN, and RF, follow the overall trend more closely but exhibit growing variance as cutting force increases. Only the NN and GB models maintain a stable and well-centered prediction distribution across the full force range. The following section reviews the specific performance rankings and error metrics to provide quantitative support for these visual observations.

6. Comparative Evaluation of Machine Learning Models

6.1. Pairwise Model Ranking Based on R² and CV(RMSE)

To evaluate whether one model outperforms another in a statistically meaningful way, beyond raw metrics alone, pairwise probability matrices were created using R² and CV(RMSE) comparisons among the ten predictive models. These matrices, shown in Table 3 and Table 4, indicate the probability that the model in each row surpasses the one in the column, providing a probabilistic understanding of their comparative performance.
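One way such a matrix could be assembled is sketched below, assuming that each model has a set of paired resampled scores (for example, one per fold or repetition) and that the dominance probability is the fraction of pairs in which the row model scores better, with ties counted as one half. This estimator, the model labels, and the scores are all hypothetical and serve only to illustrate the structure of the tables.

```python
def dominance_matrix(scores, higher_is_better=True):
    """Map (row, column) to the fraction of paired scores where row beats column."""
    names = list(scores)
    matrix = {}
    for a in names:
        for b in names:
            if a == b:
                continue
            wins = 0.0
            for sa, sb in zip(scores[a], scores[b]):
                if sa == sb:
                    wins += 0.5  # a tie is split evenly
                elif (sa > sb) == higher_is_better:
                    wins += 1.0
            matrix[(a, b)] = wins / len(scores[a])
    return matrix

# Hypothetical per-resample R² values for three placeholder models
r2_scores = {
    "model_a": [0.96, 0.97, 0.95, 0.96],
    "model_b": [0.95, 0.98, 0.94, 0.95],
    "model_c": [0.64, 0.66, 0.63, 0.65],
}
r2_dominance = dominance_matrix(r2_scores, higher_is_better=True)

# For an error metric such as RMSE, lower values win
rmse_scores = {"model_a": [18.0, 19.0], "model_b": [20.0, 18.5]}
rmse_dominance = dominance_matrix(rmse_scores, higher_is_better=False)
```

With this construction the entries for a pair of models are complementary, so the probability that the row model beats the column model and the reverse probability sum to one.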

According to Table 3, the GB model consistently outperforms other models, with a 100% chance of surpassing PLS, SGD, and SVM, and very high probabilities against the remaining models (95.6% vs. AB, 99.9% vs. LR, 95.2% vs. DT, 98.3% vs. kNN, and 94.9% vs. RF). The only model that remains competitive is the NN, where the probability drops to 76.8%, indicating a closer statistical comparison between these two high-performing models. This consistent performance highlights GB’s strong ability to adapt across different data types.

The NN also shows a significant statistical advantage, outperforming most other models with probabilities exceeding 94% (e.g., 99.9% vs. SGD, 99.5% vs. LR, 95.6% vs. DT, and 98.6% vs. kNN). However, NN does not dominate GB (23.2%), and its advantage over RF (94.6%) is less decisive than over SGD or LR, indicating that ensemble learners remain competitive in capturing nonlinear deformation–force relationships.

In contrast, SVM ranks the lowest, showing extremely low dominance probabilities across nearly all comparisons. For example, its probability of outperforming PLS, SGD, and AB is only 2.5%, 2.0%, and 3.3%, respectively, and it achieves just 0.2% against NN and 0% against GB. This supports its earlier observed underperformance, highlighting its failure to match its kernel space with the nonlinear deformation–force manifolds in the dataset.

Linear Regression (LR), SGD, and PLS appear as mid-tier models. They show moderate dominance probabilities against weaker learners: PLS reaches 79.6% vs. SGD, 80.4% vs. DT, and 87.6% vs. kNN, while SGD achieves 78.5% vs. DT and 86.0% vs. kNN. However, they fail to compete effectively against NN and GB, where probabilities drop close to zero. Notably, PLS performs competitively against LR (84.0%) but rarely outperforms RF (9.9%) or NN (0.1%), indicating limitations imposed by its latent linear structure.

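The CV(RMSE)-based matrix in Table 4 confirms these rankings from a predictive stability perspective. Because a lower CV(RMSE) is better, the probabilities quoted below for a given model are those found in its column of Table 4, i.e., each entry is the probability that the column model outperforms the row model. GB maintains near-universal dominance, outperforming PLS, SGD, LR, and SVM with probabilities of 100%, while also strongly surpassing DT (99.6%), kNN (99.9%), RF (99.4%), and AB (99.3%). Only NN remains somewhat competitive, with GB outperforming it with a probability of 79.1%.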
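For reference, the three metrics used throughout this comparison can be sketched as follows. CV(RMSE) is taken here as the RMSE divided by the mean measured force, following the definition given in the Abbreviations list; whether the study reports it as a fraction or a percentage is not stated, so the fraction form is an assumption. The function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE and CV(RMSE) for measured vs. predicted cutting force."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    sse = np.sum(residuals ** 2)                   # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    rmse = np.sqrt(sse / y_true.size)
    return {
        "r2": 1.0 - sse / sst,
        "rmse": rmse,
        # Assumed definition: RMSE normalized by the mean measured response
        "cv_rmse": rmse / y_true.mean(),
    }

# Hypothetical forces in newtons, for illustration only
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print(m)
```

Because CV(RMSE) is normalized by the mean response, it allows error levels to be compared between materials whose typical cutting forces differ.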

Neural Networks similarly demonstrate strong predictive stability. NN dominates PLS, SGD, AB, LR, and DT with probabilities exceeding 99%, while also outperforming kNN (100%), RF (98.6%), and SVM (99.7%). These results suggest that NN predictions maintain low cross-validation variance and stable generalization across different folds.

AdaBoost (AB), although competitive in raw RMSE metrics, shows decreased robustness in cross-validation. Its probability of outperforming NN and GB falls to 0.4% and 0.7%, respectively. This inconsistency likely stems from AB’s dependence on reweighted instance distributions, which can become unstable during fold reshuffling and exposure to rare events.

Meanwhile, RF demonstrates improved stability compared with simpler learners, outperforming PLS, SGD, LR, and DT with probabilities of 92% or higher, though it remains weaker than NN and GB, which it outperforms with probabilities of only 1.4% and 0.6%, respectively. This pattern confirms the effectiveness of ensemble averaging in buffering variance, although RF still lags behind gradient-boosted optimization.

SVM remains the least stable model in the CV(RMSE) matrix, with dominance probabilities close to zero in most comparisons; only against DT (46.6%) and kNN (44.2%) does it approach parity. For instance, Table 4 gives only 3.4% vs. PLS, 2.8% vs. SGD, 4.4% vs. AB, and 0.3% vs. NN, confirming its poor predictive stability across folds.

These probabilistic matrices reveal a clear statistical hierarchy. GB and NN consistently lead in terms of explained variance and cross-validated stability. RF and AB occupy a second tier, demonstrating moderate robustness but weaker dominance against the top models. Linear models (PLS, SGD, LR) and the single-tree and instance-based methods (DT, kNN) exhibit inconsistent performance, occasionally outperforming weaker learners but failing to compete with advanced nonlinear methods. SVM consistently underperforms across both matrices, indicating its limited effectiveness for modeling the complex deformation–force relationships inherent in machining processes.
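One simple way to condense a dominance matrix into a single ordering is to average each row. The sketch below does this for the R²-based entries of Table 3 and also checks that each pair of entries is complementary. The row-mean summary is an illustrative choice made here, not a procedure described by the authors.

```python
import numpy as np

models = ["PLS", "SGD", "AB", "NN", "LR", "DT", "kNN", "RF", "SVM", "GB"]
nan = np.nan
# Table 3: P[i, j] = probability that row model i surpasses column model j (R²)
P = np.array([
    [nan,   0.796, 0.100, 0.001, 0.840, 0.804, 0.876, 0.099, 0.975, 0.000],
    [0.204, nan,   0.098, 0.001, 0.822, 0.785, 0.860, 0.084, 0.980, 0.000],
    [0.900, 0.902, nan,   0.035, 0.910, 0.943, 0.989, 0.427, 0.967, 0.044],
    [0.999, 0.999, 0.965, nan,   0.995, 0.956, 0.986, 0.946, 0.998, 0.232],
    [0.160, 0.178, 0.090, 0.005, nan,   0.731, 0.799, 0.061, 0.970, 0.001],
    [0.196, 0.215, 0.057, 0.044, 0.269, nan,   0.427, 0.064, 0.438, 0.048],
    [0.124, 0.140, 0.011, 0.014, 0.201, 0.573, nan,   0.010, 0.470, 0.017],
    [0.901, 0.916, 0.573, 0.054, 0.939, 0.936, 0.990, nan,   0.984, 0.051],
    [0.025, 0.020, 0.033, 0.002, 0.030, 0.562, 0.530, 0.016, nan,   0.000],
    [1.000, 1.000, 0.956, 0.768, 0.999, 0.952, 0.983, 0.949, 1.000, nan],
])

# Consistency check: P[i, j] + P[j, i] should equal 1 for every model pair
off_diag = ~np.eye(len(models), dtype=bool)
max_gap = np.max(np.abs((P + P.T)[off_diag] - 1.0))

# Mean probability of surpassing the other nine models (diagonal ignored)
mean_dom = np.nanmean(P, axis=1)
ranking = [models[k] for k in np.argsort(-mean_dom)]

print("largest complementarity gap:", max_gap)
for name in ranking:
    print(f"{name:4s} {mean_dom[models.index(name)]:.3f}")
```

For the Table 3 entries this ordering places GB and NN first, RF and AB next, and SVM last, in line with the tiers described above.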

6.2. Material-Wise Prediction Accuracy Across Models

In addition to model-to-model comparisons, a detailed analysis was conducted at the material level to assess how prediction accuracy depends on specific deformation behaviors. Table 5 presents the R² values for each machine learning model across the five metallic materials examined. Meanwhile, Table 6 and Table 7 highlight the best and worst model–material pairings, providing a dual perspective on the robustness of the models and their sensitivity to different materials.

The GB model surpasses all others for nearly every material, achieving the highest R² scores for four of the five alloys. Notably, GB reaches an R² of 0.99108 for Stainless Steel 304 Annealed, showing near-perfect predictability despite this material’s work-hardening and low thermal conductivity, which typically complicate modeling. For Brass C26000 and Bronze C51000, GB records R² values of 0.98754 and 0.98732, demonstrating its capacity to model both conductive and plastically deformable behaviors. Even with the more ductile Aluminum Alloy 6061 and the less predictable Carbon Steel 1020 Annealed, GB maintains excellent accuracy, as evidenced by R² scores of 0.98608 and 0.97823, respectively, emphasizing its robustness across different material types.

Neural Networks (NN) demonstrate material-agnostic performance, with R² values exceeding 0.95 for all alloys, reaching a peak of 0.98725 for Stainless Steel 304. These high scores highlight NN’s ability to capture nonlinear relationships, especially where traditional parametric models often fail. Notably, NN outperforms other models for Carbon Steel 1020 (R² = 0.98081), indicating its strong adaptability to materials with intermediate hardness and mixed-mode deformation.

In contrast, SVM consistently underperforms, showing the lowest R² scores for three out of five materials. A notable example is Bronze C51000, where SVM yields an R² of 0.84303, failing to model its complex strain behavior accurately. Similarly, Brass C26000 and Aluminum Alloy 6061 have SVM R² values of 0.82378 and 0.82539, respectively, indicating issues with kernel alignment and limited data generalization. This pattern aligns with the earlier residual analysis, which revealed irregular and biased error patterns for SVM.

The performance of LR and DT varies. Both models perform moderately well on Brass C26000 (with R² of 0.95184 for LR and 0.89121 for DT), but their predictive accuracy drops significantly for Aluminum Alloy 6061, recording 0.85813 and 0.73558, respectively. These decreases indicate a struggle to model second-order or interaction effects inherent in this alloy’s thermal–mechanical behavior.

Among intermediate performers, RF consistently provides reliable results, achieving R² values above 0.89 across all materials, with a peak of 0.96724 for Brass C26000. This indicates that RF’s bagging approach helps stabilize predictions, even with noisy data. AdaBoost (AB) shows a similar but slightly less consistent pattern, performing well on Stainless Steel 304 (R² = 0.95006) and Brass (R² = 0.94826), but its performance drops on Aluminum (R² = 0.87693), possibly because its instance reweighting amplifies local deviations.

According to the material-centric perspective in Table 7, GB is identified as the leading model for four out of five alloys, attaining the highest R² for Aluminum Alloy 6061, Brass C26000, Bronze C51000, and Stainless Steel 304. The sole exception is Carbon Steel 1020, where the NN delivers slightly better accuracy (R² = 0.98081 compared to 0.97823), likely because of its neural architecture’s enhanced interpolation of mid-range feature distributions. In contrast, based on the R² values in Table 5, SVM ranks lowest among the models for three of the five materials (Brass C26000, Bronze C51000, and Carbon Steel 1020 Annealed), while DT is the weakest model for Aluminum Alloy 6061 and Stainless Steel 304. SVM records its lowest R² overall for Carbon Steel 1020 Annealed (R² = 0.77571), highlighting its sensitivity to high variance in difficult-to-machine materials.

In summary, this material-specific accuracy evaluation confirms previous results: GB and NN provide the most dependable, generalizable prediction methods across various thermomechanical environments. RF and AB are effective secondary alternatives, with slight differences depending on the material. Simpler linear and tree-based models tend to perform poorly with alloys that have complex deformation behaviors. SVM, as configured here, is not suitable for accurately predicting machining-induced forces in metallic systems.
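The best and worst model for each material can be recovered directly from the R² values in Table 5. The following sketch shows one way to do so; the variable names are illustrative.

```python
import numpy as np

materials = [
    "Aluminum Alloy 6061", "Brass C26000", "Bronze C51000",
    "Stainless Steel 304 Annealed", "Carbon Steel 1020 Annealed",
]
models = ["PLS", "SGD", "AB", "NN", "LR", "DT", "kNN", "RF", "SVM", "GB"]
# R² values from Table 5 (rows: models, columns: materials)
r2 = np.array([
    [0.90161, 0.95239, 0.94009, 0.95202, 0.91428],
    [0.91293, 0.95210, 0.94086, 0.93450, 0.89531],
    [0.87693, 0.94826, 0.91995, 0.95006, 0.91393],
    [0.96347, 0.95280, 0.97466, 0.98725, 0.98081],
    [0.85813, 0.95184, 0.94008, 0.93476, 0.88441],
    [0.73558, 0.89121, 0.84509, 0.78524, 0.81274],
    [0.93737, 0.95271, 0.91812, 0.92862, 0.87770],
    [0.89207, 0.96724, 0.96293, 0.96431, 0.93635],
    [0.82539, 0.82378, 0.84303, 0.92351, 0.77571],
    [0.98608, 0.98754, 0.98732, 0.99108, 0.97823],
])

# Column-wise arg-max / arg-min gives the best / worst model per material
best = {mat: models[int(np.argmax(r2[:, j]))] for j, mat in enumerate(materials)}
worst = {mat: models[int(np.argmin(r2[:, j]))] for j, mat in enumerate(materials)}

for mat in materials:
    print(f"{mat}: best = {best[mat]}, worst = {worst[mat]}")
```

Applied to Table 5, this gives GB as the best model for every alloy except Carbon Steel 1020 Annealed (NN), and SVM or DT as the weakest model depending on the material.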

To further consolidate the material-wise insights discussed earlier, a ranked summary of predictive performance was developed by averaging the R² and RMSE values across all ten machine learning models for each material. This dual-metric method provides a thorough assessment of how consistently and accurately each alloy’s cutting force behavior was modeled, minimizing model-specific biases.

In the ranking analysis presented in Table 8, model performance across the different materials is evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). Higher R² values and lower RMSE values indicate better predictive accuracy, and the rankings are assigned accordingly. For instance, Brass C26000 achieved the highest average R² value (0.9415), indicating that the machine learning models were able to capture a slightly larger proportion of the variance in the cutting-force response for this material. In contrast, Stainless Steel 304 Annealed exhibited the lowest RMSE (29.24 N), reflecting the smallest absolute prediction error among all materials. This discrepancy between the two metrics suggests that while the models explained the variability in Brass C26000 marginally better, the predictions for Stainless Steel 304 were more tightly concentrated around the measured values. Such behavior is often observed when a material exhibits more stable and less dispersed machining responses, resulting in smaller absolute prediction errors even when the variance explained by the model is slightly lower.

The low RMSE obtained for Stainless Steel 304 is especially noteworthy considering the alloy’s low thermal conductivity and high work-hardening capacity, traits that usually complicate modeling accuracy. Its steady deformation behavior appears to have enabled the models to generalize effectively across cross-validation folds, with RMSE calculated using normalized residuals for improved comparability.
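The reason the two metrics can rank materials differently follows from the standard definitions. When both are computed on the same n samples of cutting force F, with predictions \hat{F}_i and sample mean \bar{F}, R² equals one minus the squared RMSE divided by the variance of the measured force. The notation s_F² for that variance is introduced here for illustration, and the identity is exact only for unnormalized residuals, so it should be read as a qualitative guide to Table 8 rather than a numerical check.

```latex
R^{2} = 1 - \frac{\sum_{i=1}^{n} (F_i - \hat{F}_i)^{2}}{\sum_{i=1}^{n} (F_i - \bar{F})^{2}}
      = 1 - \frac{\mathrm{RMSE}^{2}}{s_F^{2}},
\qquad
\mathrm{RMSE}^{2} = \frac{1}{n} \sum_{i=1}^{n} (F_i - \hat{F}_i)^{2},
\quad
s_F^{2} = \frac{1}{n} \sum_{i=1}^{n} (F_i - \bar{F})^{2}.
```

A material whose measured forces are less dispersed (smaller s_F²) can therefore show a lower RMSE and, at the same time, a slightly lower R² than a material with a wider force range.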

Brass C26000 and Bronze C51000 ranked second and third, respectively, both exhibiting a favorable balance of plasticity and thermal conductivity. This helped facilitate low-bias model learning and prevented overfitting. Conversely, Carbon Steel 1020 Annealed and Aluminum Alloy 6061 were less predictable, with lower mean R² values and higher RMSE. These findings likely stem from the more complex mechanical behavior and nonlinear strain hardening of Carbon Steel and from the sensitivity of Aluminum Alloy 6061 to variations in feed rate and ductility, which introduce nonlinearities that pose challenges for generalized regression models.

This performance analysis confirms that the predictability of machining is highest for materials exhibiting moderate and stable thermomechanical behavior, consistent with the assumptions underlying regression-based learning models. In contrast, materials with pronounced nonlinear responses or high variability, such as those with complex strain hardening or feed sensitivity, may require more specialized modeling methods and can reduce the accuracy of general machine learning predictions across different material systems [41,42].

Although the present benchmarking framework demonstrates strong predictive consistency under controlled turning conditions, several limitations should be acknowledged. All experiments were conducted under fixed depth-of-cut in dry machining environments, which restricts direct extrapolation to high-speed, lubricated, or dynamically adaptive production settings. Furthermore, while the dataset size is sufficient for comparative regression analysis, deeper neural architectures or physics-informed learning strategies may benefit from larger multi-regime datasets. Future work may extend the present framework to variable-speed machining and multi-objective prediction scenarios, including simultaneous modeling of cutting force and surface integrity indicators.

7. Conclusions

This study presented a comprehensive benchmarking and diagnostic evaluation of ten supervised machine learning models for predicting cutting force in CNC turning across five engineering-grade metallic alloys. Under controlled machining conditions, clear performance differences were observed among model families in terms of accuracy, stability, and residual behavior.

Ensemble-based and nonlinear models consistently outperformed linear and kernel-based approaches. Gradient Boosting achieved the best overall performance, with R² values exceeding 0.97 for all materials and reaching 0.991 for Stainless Steel 304 Annealed, alongside a mean RMSE of 18.03 N across the full dataset. Neural Networks followed closely, particularly for Carbon Steel 1020 Annealed (R² = 0.981), indicating strong adaptability to intermediate material responses. In contrast, Support Vector Regression showed unstable behavior and low predictive capability, with R² values as low as 0.776 for Carbon Steel 1020 Annealed and RMSE values exceeding 80 N for Aluminum Alloy 6061.

Residual-prediction analysis provided insight beyond aggregate metrics. Linear regression and single decision-tree models exhibited pronounced heteroscedasticity and systematic bias, whereas ensemble and neural models produced compact, symmetric residual distributions largely confined within ±15–20 N. Pairwise dominance probability matrices confirmed the statistical hierarchy of model performance, with Gradient Boosting exhibiting dominance probabilities exceeding 94% over all competing models except the Neural Network (76.8% and 79.1%) in both R² and CV(RMSE) comparisons.

Material-wise analysis showed that Stainless Steel 304 Annealed and Brass C26000 were the most predictable alloys, with mean R² values above 0.93 across models, while Aluminum Alloy 6061 and Carbon Steel 1020 Annealed exhibited greater variability and lower average predictability. Principal Component Analysis showed that the first two components explained 78.7% of the total variance, dominated by geometric and spatial descriptors (PC1: 53.5%) and feed rate (PC2: 25.2%).

The results show a clear pattern across all tested materials. Ensemble-based models, particularly Gradient Boosting, consistently delivered the most reliable predictions, combining high accuracy with stable residual behavior. Neural Networks performed well overall and adapted effectively to intermediate material responses, although their performance varied slightly across cases. In contrast, Support Vector Regression was less stable and more sensitive to variability in the data, which led to weaker predictive performance.

Differences between materials were also evident. Stainless Steel 304 Annealed and Brass C26000 produced more consistent and predictable responses, whereas Aluminum Alloy 6061 and Carbon Steel 1020 Annealed showed greater variability, which made accurate prediction more challenging. This behavior is reflected in the statistical structure of the dataset, where Principal Component Analysis indicated that feed rate and geometric descriptors dominate the variation, with the first two components accounting for 78.7% of the total variance.

Taken together, these results suggest that predictive performance depends not only on the choice of model, but also on how stable the underlying material response is. In practical machining environments, this is important since models that are accurate but unstable can introduce unwanted fluctuations in control systems. From this perspective, models that combine good accuracy with consistent residual behavior are more suitable for real-world applications. The dominance-based evaluation approach used in this study therefore provides a practical way to identify robust models for multi-material CNC turning scenarios.

Author Contributions

Conceptualization, M.S.A.; methodology, M.S.A.; software, M.S.A. and S.A.B.; validation, M.S.A. and S.A.B.; formal analysis, M.S.A. and S.A.B.; investigation, S.A.B.; resources, M.S.A.; data curation, S.A.B.; writing—original draft preparation, M.S.A.; writing—review and editing, S.A.B.; visualization, M.S.A.; supervision, M.S.A.; project administration, M.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge Umm Al-Qura University, Makkah, Saudi Arabia, for providing the facilities and resources that supported the successful completion of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviation	Full Term
CNC	Computer Numerical Control
ML	Machine Learning
RMSE	Root Mean Square Error
R²	Coefficient of Determination
CV(RMSE)	Coefficient of Variation of Root Mean Square Error
GB	Gradient Boosting
RF	Random Forest
AB	AdaBoost
NN	Neural Network
kNN	k-Nearest Neighbors
DT	Decision Tree
LR	Linear Regression
PLS	Partial Least Squares
SGD	Stochastic Gradient Descent
SVM	Support Vector Machine
HB	Brinell Hardness
W/m·K	Watts per meter-Kelvin (Thermal Conductivity)
GPa	Gigapascal (Young’s Modulus)
δ	Axial Deformation
F	Cutting Force
MWARL	Mean Weighted Average Relevance Line
PC1	Principal Component 1
PC2	Principal Component 2
PCA	Principal Component Analysis
μ	Mean
σ	Standard Deviation
α	Regression Coefficient
β	Regression Coefficient

References

1. Alsoufi, M.S.; Bawazeer, S.A.; Alhazmi, M.W.; Hijji, H.H.; Alhazmi, H.; Alqurashi, H.F. Dimensional Accuracy and Measurement Variability in CNC-Turned Parts Using Digital Vernier Calipers and Coordinate Measuring Machines Across Five Materials. Materials 2025, 18, 2728.
2. Qiao, X.; Bai, X.F.; Zhang, Y.; Meng, D.R. The Influence of Residual Stresses on Dimension Stability during Thermal Mechanical Processes for High-Precision Components. Mater. Sci. Forum 2020, 982, 151–156.
3. Tang, Z.; Huang, C.; Shi, Z.; Li, B.; Liu, H.; Niu, J.; Chen, Z.H.; Jiang, G. A new characterization method for stress, hardness, microstructure and shear streamline using the stored energy field in deformation zone of workpiece. Int. J. Mach. Tools Manuf. 2022, 178, 103891.
4. Alsoufi, M.S.; Bawazeer, S.A. Probabilistic Analysis of Surface Integrity in CNC Turning: Influence of Thermal Conductivity and Hardness on Roughness and Waviness Distributions. Machines 2025, 13, 385.
5. Ben Saoud, F.; Korkmaz, M.E. A Review on Machinability of Shape Memory Alloys Through Traditional and Non-Traditional Machining Processes: A Review. Manuf. Technol. Appl. 2022, 3, 14–32.
6. Fatirah, R.S.; Norkhairusshima, M.K.; Suhaily, M.; Najiah, D.A.; Natasha, A.R. Optimization of Machining Parameters During Milling of Carbon Fibre Reinforced Plastics (CFRP). In Proceedings of the 3rd Malaysian International Tribology Conference; Springer: Singapore, 2022.
7. Rathod, N.J.; Chopra, M.K.; Vidhate, U.S.; Gurule, N.B.; Saindane, U.V. Investigation on the turning process parameters for tool life and production time using Taguchi analysis. Mater. Today Proc. 2021, 47, 5830–5835.
8. Hastaoglou-Martinidis, V. Identifying key interactions between process variables of different material categories using mutual information-based network inference method. Procedia Comput. Sci. 2022, 200, 1550–1564.
9. Valdés-Alonzo, G.; Binetruy, C.; Eck, B.; García-González, A.; Leygue, A. Phase distribution and properties identification of heterogeneous materials: A data-driven approach. Comput. Methods Appl. Mech. Eng. 2021, 390, 114354.
10. Espeseth, V.K.; Morin, D.; Børvik, T.; Hopperstad, O.S. A gradient-based non-local GTN model: Explicit finite element simulation of ductile damage and fracture. Eng. Fract. Mech. 2023, 289, 109442.
11. Samaniego, E.; Ulloa, J.; Rodriguez, P.; Samaniego, C. Variational Modelling of Strain Localization in Solids: A Computational Mechanics Point of View. Arch. Comput. Methods Eng. 2021, 28, 1183–1203.
12. Negi, A.; Kumar, S. Localizing gradient damage model with smoothed stress based anisotropic nonlocal interactions. Eng. Fract. Mech. 2019, 214, 21–39.
13. Shen, J. Prediction of Machining Quality and Tool Wear in Micro-Turning Machine Using Machine Learning Models. In Advances in Micro and Nano Manufacturing and Surface Engineering; Springer: Singapore, 2022.
14. Aggogeri, F.; Pellegrini, N.; Tagliani, F.L. Recent Advances on Machine Learning Applications in Machining Processes. Appl. Sci. 2021, 11, 8764.
15. Shurrab, S.; Almshnanah, A.; Duwairi, R. Tool Wear Prediction in Computer Numerical Control Milling Operations via Machine Learning. In Proceedings of the 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021.
16. Jin, F.; Bao, Y.; Jin, X. Tool wear prediction in edge trimming of carbon fiber reinforced polymer using machine learning with instantaneous parameters. J. Manuf. Process. 2022, 82, 277–295.
17. Abdallah, A.; Korkmaz, M.E.; Yaşar, N.; Gupta, M.K. Machine Learning-Based Modeling of Machining Performance under Different Cutting Conditions. Int. J. Adv. Manuf. Technol. 2024, 133, 2171–2188.
18. Kumar, S.; Singh, R.; Sharma, V.S.; Gupta, M.K. Machine Learning Prediction of Tribological Performance in Machining Processes. Tribol. Int. 2023, 190, 109207.
19. Ni, Y.; Li, Y.; Liu, C.; Liu, X. A mechanism informed neural network for predicting machining deformation of annular parts. Adv. Eng. Inform. 2022, 53, 101661.
20. Akbari, P.; Zamani, M.; Mostafaei, A. Machine learning prediction of mechanical properties in metal additive manufacturing. Addit. Manuf. 2024, 91, 104320.
21. Nieto-Fuentes, J.C.; Rittel, D.; Osovski, S. On a dislocation-based constitutive model and dynamic thermomechanical considerations. Int. J. Plast. 2018, 108, 55–69.
22. Kuvyrkin, G.N.; Savelieva, I.Y. Thermomechanical model of nonlocal deformation of a solid. Mech. Solids 2016, 51, 256–262.
23. Zhao, Z.; Li, Y.; Liu, C.; Chen, Z.; Chen, J.; Wang, L. A subsequent-machining-deformation prediction method based on the latent field estimation using deformation force. J. Manuf. Syst. 2022, 63, 224–237.
24. Xu, X. Research on Cutting Force based on Thermal-mechanical Coupling Precision Modeling. Sci. J. Technol. 2022, 4, 25–31.
25. Erturk, A.S.; Malakizadi, A.; Larsson, R. A thermomechanically motivated approach for identification of flow stress properties in metal cutting. Int. J. Adv. Manuf. Technol. 2020, 111, 1055–1068.
26. Soofi, Y.; Gu, Y.J.; Liu, J. An adaptive Physics-based feature engineering approach for Machine Learning-assisted alloy discovery. Comput. Mater. Sci. 2023, 226, 112248.
27. Linton, N.; Aidhy, D.S. A machine learning framework for elastic constants predictions in multi-principal element alloys. APL Mach. Learn. 2023, 1, 16109.
28. Alsoufi, M.S.; Bawazeer, S.A.; Alhazmi, M.W.; Alhazmi, H.; Hijji, H.H. Zone-Wise Uncertainty Propagation and Dimensional Stability Assessment in CNC-Turned Components Using Manual and Automated Metrology Systems. Machines 2025, 13, 585.
29. Alsoufi, M.S. A Dual-Mean Statistical and Multivariate Framework for Machinability Evaluation in CNC Turning: Gradient and Stiffness Analysis Across Five Materials. Materials 2025, 18, 2952.
30. ISO 1832:2017; Indexable Inserts for Cutting Tools – Designation. International Organization for Standardization: Geneva, Switzerland, 2017.
31. ISO 5609:2012; Tool Holders for Turning and Copying with Indexable Inserts – Designation. International Organization for Standardization: Geneva, Switzerland, 2012.
32. Alsoufi, M.S.; Bawazeer, S.A. Mechanistic Prediction of Machining-Induced Deformation in Metallic Alloys Using Property-Based Regression and Principal Component Analysis. Machines 2026, 14, 37.
33. Ansari, S.; Nassif, A.B. A comprehensive study of regression analysis and the existing techniques. In 2022 Advances in Science and Engineering Technology International Conferences (ASET); IEEE: New York, NY, USA, 2022; pp. 1–10.
34. Sekeroglu, B.; Ever, Y.K.; Dimililer, K.; Al-Turjman, F. Comparative Evaluation and Comprehensive Analysis of Machine Learning Models for Regression Problems. Data Intell. 2022, 4, 620–652.
35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
36. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
37. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; ISBN 978-0387310732.
38. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; ISBN 978-0262035613.
39. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 4766–4777.
40. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
41. Bhattacharya, S.; Kalita, K.; Čep, R.; Chakraborty, S. A Comparative Analysis on Prediction Performance of Regression Models during Machining of Composite Materials. Materials 2021, 14, 6689.
42. Malley, S.A.; Reina, C.; Nacy, S.M.; Gilles, J.; Koohbor, B.; Youssef, G. Predictability of mechanical behavior of additively manufactured particulate composites using machine learning and data-driven approaches. Comput. Ind. 2022, 142, 103739.

Figure 1. Cutting force distribution across all machining trials fitted with (a) Gamma distribution and (b) Normal distribution. The gamma fit captures skewness in force response more accurately than the symmetric normal distribution. Red bars represent the frequency histogram of the measured data, while the black curve indicates the fitted probability distribution.

Figure 2. Pearson correlation heatmap for machining variables including cutting force, diameter, feed rate, and axial distance. Color intensity represents correlation magnitude, with values from –1 (strong negative) to +1 (strong positive correlation).

Figure 3. Pairwise scatter plots among diameter, feed rate, and chuck distance with 95% confidence ellipses (solid red ellipses). Diagonal panels show marginal histograms, while ellipse shapes and orientations indicate correlation strength and direction. The grey shaded area represents the area under the distribution and dashed lines indicate mean values.

Figure 4. Principal Component Analysis visualization of machining dataset: (a) PCA Score Plot showing material clustering and feature-driven stratification, and (b) PCA Loading Plot showing vector contributions of cutting force, diameter, feed, and distance to PC1 (53.47%) and PC2 (25.25%). The dashed circle represents the unit correlation circle, while arrows indicate the direction and magnitude of variable loadings on the principal components.

Figure 5. Relationship Between Feature Importance and Model Performance: (a) RMSE vs. Feature Importance, (b) R² vs. Feature Importance. Bars represent mean feature importance, while error bars denote ±1 standard deviation (STD) across model evaluations. The dashed red line indicates the mean weighted average reference level (MWARL).

Figure 6. Unified Feature Importance Scores with Percentage Contribution to Cutting Force Prediction.

Figure 7. Comparative Performance of Machine Learning Models: (a) Root Mean Square Error (RMSE) and (b) Coefficient of Determination (R²).

Figure 8. Residuals vs. Cutting Force for All Models: (a) PLS, (b) SGD, (c) AB, (d) NN, (e) LR, (f) DT, (g) kNN, (h) RF, (i) SVM, (j) GB.

Figure 9. Predicted vs. Cutting Force for All Models: (a) PLS, (b) SGD, (c) AB, (d) NN, (e) LR, (f) DT, (g) kNN, (h) RF, (i) SVM, (j) GB.

Table 1. Overview of Machine Learning Models for Cutting Force Prediction: Type, Strengths, and Limitations.

Abbreviation	Full Model Name	Type	Strengths	Limitations
PLS	Partial Least Squares	Linear	Handles multicollinearity; interpretable	Limited in capturing nonlinearity
SGD	Stochastic Gradient Descent	Linear	Efficient with large datasets	Sensitive to learning rate and scaling
AB	AdaBoost Regression	Ensemble	Reduces bias, handles nonlinearities	Prone to overfitting with noisy data
NN	Neural Network	Nonlinear	Captures complex nonlinear patterns	Requires large data; may overfit
LR	Linear Regression	Linear	Fast, interpretable	Poor with nonlinear data
DT	Decision Tree	Tree-based	Nonlinear, easy to interpret	High variance, prone to overfitting
kNN	k-Nearest Neighbors	Instance-based	Simple, non-parametric	Computationally expensive for large data
RF	Random Forest	Ensemble	High accuracy, reduces overfitting	Less interpretable
SVM	Support Vector Machine	Nonlinear	Effective in high-dimensional space	Poor performance with noise & large datasets
GB	Gradient Boosting	Ensemble	Excellent accuracy, robust to overfitting	Slower to train, sensitive to hyperparameters

Table 2. Hyperparameters of the machine learning models used in this study.

Model	Key Hyperparameters
PLS	components = 5; max_iter = 500; tol = 1 × 10⁻⁶; scale = True
SGD	loss = squared_error; penalty = l2; α = 1 × 10⁻⁵; learning_rate = constant; η = 0.01; max_iter = 1000; tol = 0.001
AB	estimators = 100; learning_rate = 1.0; loss = exponential; base estimator = DecisionTreeRegressor
NN	hidden_layer_sizes = (100,100,100); activation = relu; solver = adam; learning_rate_init = 0.001; α = 0.0002; max_iter = 400
LR	α = 0.0001; L1_ratio = 0.5; max_iter = 1000; fit_intercept = True; selection = cyclic
DT	criterion = squared_error; min_samples_split = 2; min_samples_leaf = 1; splitter = best
kNN	neighbors = 5; metric = Euclidean; weights = distance; algorithm = auto
RF	estimators = 30; criterion = squared_error; bootstrap = True; max_features = 1.0; min_samples_split = 2; min_samples_leaf = 1
SVM	kernel = linear; C = 1.1; regression loss = 0.2; max_iter = 400; tol = 0.001
GB	estimators = 200; learning_rate = 0.1; loss = squared_error; max_depth = 3; subsample = 0.9; criterion = friedman_mse
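
Table 2 does not name the software used to train the models. As a minimal sketch, assuming a scikit-learn workflow (an assumption, not stated in the source), the GB row above maps onto the arguments of `GradientBoostingRegressor` as shown below. The names `GB_PARAMS` and `build_gb` are hypothetical.

```python
# GB settings from Table 2, written with scikit-learn argument names
# (the choice of scikit-learn is an assumption made for illustration).
GB_PARAMS = {
    "n_estimators": 200,         # "estimators = 200"
    "learning_rate": 0.1,
    "loss": "squared_error",
    "max_depth": 3,
    "subsample": 0.9,
    "criterion": "friedman_mse",
}

def build_gb(params=GB_PARAMS):
    """Create an unfitted gradient boosting regressor with the Table 2 settings."""
    # Imported lazily so the dictionary can be inspected without scikit-learn installed
    from sklearn.ensemble import GradientBoostingRegressor
    return GradientBoostingRegressor(**params)
```

Calling `build_gb().fit(X, y)` on a feature matrix `X` (material category, diameter, feed rate, axial distance) and a force vector `y` would then train the model; the encoding of the material category is not specified in this section.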

Table 3. Pairwise Dominance Probability Matrix Based on R² for Model-to-Model Comparison.

Model	PLS	SGD	AB	NN	LR	DT	kNN	RF	SVM	GB
PLS	-	0.796	0.100	0.001	0.840	0.804	0.876	0.099	0.975	0.000
SGD	0.204	-	0.098	0.001	0.822	0.785	0.860	0.084	0.980	0.000
AB	0.900	0.902	-	0.035	0.910	0.943	0.989	0.427	0.967	0.044
NN	0.999	0.999	0.965	-	0.995	0.956	0.986	0.946	0.998	0.232
LR	0.160	0.178	0.090	0.005	-	0.731	0.799	0.061	0.970	0.001
DT	0.196	0.215	0.057	0.044	0.269	-	0.427	0.064	0.438	0.048
kNN	0.124	0.140	0.011	0.014	0.201	0.573	-	0.010	0.470	0.017
RF	0.901	0.916	0.573	0.054	0.939	0.936	0.990	-	0.984	0.051
SVM	0.025	0.020	0.033	0.002	0.030	0.562	0.530	0.016	-	0.000
GB	1.000	1.000	0.956	0.768	0.999	0.952	0.983	0.949	1.000	-

Table 4. Pairwise Dominance Probability Matrix Based on CV(RMSE) for Model-to-Model Comparison.

Model	PLS	SGD	AB	NN	LR	DT	kNN	RF	SVM	GB
PLS	-	0.177	0.920	0.999	0.156	0.152	0.101	0.920	0.034	1.000
SGD	0.823	-	0.920	0.999	0.184	0.185	0.121	0.930	0.028	1.000
AB	0.080	0.080	-	0.996	0.070	0.014	0.003	0.573	0.044	0.993
NN	0.001	0.001	0.004	-	0.004	0.002	0.000	0.014	0.003	0.791
LR	0.844	0.816	0.930	0.996	-	0.273	0.194	0.954	0.051	1.000
DT	0.848	0.815	0.986	0.998	0.727	-	0.518	0.981	0.466	0.996
kNN	0.899	0.879	0.997	1.000	0.806	0.482	-	0.998	0.442	0.999
RF	0.080	0.070	0.427	0.986	0.046	0.019	0.002	-	0.026	0.994
SVM	0.966	0.972	0.956	0.997	0.949	0.534	0.558	0.974	-	1.000
GB	0.000	0.000	0.007	0.209	0.000	0.004	0.001	0.006	0.000	-
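
Entries of Table 4 that lie close to 0.5 mark model pairs that CV(RMSE) does not clearly separate. The sketch below lists those pairs. It assumes each entry is the probability that the row model has the higher CV(RMSE), so the row model is the weaker one when the value is large; the 0.1 margin around 0.5 and the helper name `near_ties` are illustrative choices, not thresholds taken from the article.

```python
# Table 4 values (row model vs. column model); None marks the diagonal.
NAMES = ["PLS", "SGD", "AB", "NN", "LR", "DT", "kNN", "RF", "SVM", "GB"]
P_CV = [
    [None, 0.177, 0.920, 0.999, 0.156, 0.152, 0.101, 0.920, 0.034, 1.000],
    [0.823, None, 0.920, 0.999, 0.184, 0.185, 0.121, 0.930, 0.028, 1.000],
    [0.080, 0.080, None, 0.996, 0.070, 0.014, 0.003, 0.573, 0.044, 0.993],
    [0.001, 0.001, 0.004, None, 0.004, 0.002, 0.000, 0.014, 0.003, 0.791],
    [0.844, 0.816, 0.930, 0.996, None, 0.273, 0.194, 0.954, 0.051, 1.000],
    [0.848, 0.815, 0.986, 0.998, 0.727, None, 0.518, 0.981, 0.466, 0.996],
    [0.899, 0.879, 0.997, 1.000, 0.806, 0.482, None, 0.998, 0.442, 0.999],
    [0.080, 0.070, 0.427, 0.986, 0.046, 0.019, 0.002, None, 0.026, 0.994],
    [0.966, 0.972, 0.956, 0.997, 0.949, 0.534, 0.558, 0.974, None, 1.000],
    [0.000, 0.000, 0.007, 0.209, 0.000, 0.004, 0.001, 0.006, 0.000, None],
]


def near_ties(p, names, margin=0.1):
    """Return (row, column, probability) for pairs within `margin` of 0.5.

    Only the upper triangle is scanned, since mirrored entries are
    complementary and would report the same pair twice.
    """
    ties = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(p[i][j] - 0.5) < margin:
                ties.append((names[i], names[j], p[i][j]))
    return ties


for row, col, prob in near_ties(P_CV, NAMES):
    print(f"{row} vs {col}: {prob:.3f}")
```

With this margin, the weakly separated pairs are AB vs RF and the three pairings among DT, kNN, and SVM; every other comparison in Table 4 is more decisive.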

Table 5. Model Accuracy by Material: R² Values Across All Models and Alloys.

Model	Aluminum Alloy 6061	Brass C26000	Bronze C51000	Stainless Steel 304 Annealed	Carbon Steel 1020 Annealed
PLS	0.90161	0.95239	0.94009	0.95202	0.91428
SGD	0.91293	0.95210	0.94086	0.93450	0.89531
AB	0.87693	0.94826	0.91995	0.95006	0.91393
NN	0.96347	0.95280	0.97466	0.98725	0.98081
LR	0.85813	0.95184	0.94008	0.93476	0.88441
DT	0.73558	0.89121	0.84509	0.78524	0.81274
kNN	0.93737	0.95271	0.91812	0.92862	0.87770
RF	0.89207	0.96724	0.96293	0.96431	0.93635
SVM	0.82539	0.82378	0.84303	0.92351	0.77571
GB	0.98608	0.98754	0.98732	0.99108	0.97823

Table 6. Summary of Best and Worst Predicted Materials for Each Machine Learning Model.

Model	Best Material	Highest R²	Worst Material	Lowest R²
PLS	Brass C26000	0.95239	Aluminum Alloy 6061	0.90161
SGD	Brass C26000	0.95210	Carbon Steel 1020 Annealed	0.89531
AB	Brass C26000	0.94826	Aluminum Alloy 6061	0.87693
NN	Stainless Steel 304 Annealed	0.98725	Brass C26000	0.95280
LR	Brass C26000	0.95184	Aluminum Alloy 6061	0.85813
DT	Brass C26000	0.89121	Aluminum Alloy 6061	0.73558
kNN	Brass C26000	0.95271	Carbon Steel 1020 Annealed	0.87770
RF	Brass C26000	0.96724	Aluminum Alloy 6061	0.89207
SVM	Stainless Steel 304 Annealed	0.92351	Carbon Steel 1020 Annealed	0.77571
GB	Stainless Steel 304 Annealed	0.99108	Carbon Steel 1020 Annealed	0.97823

Table 7. Summary of Best and Worst Performing Models for Each Metallic Alloy.

Material	Best Model	Highest R²	Worst Model	Lowest R²
Aluminum Alloy 6061	GB	0.98608	DT	0.73558
Brass C26000	GB	0.98754	SVM	0.82378
Bronze C51000	GB	0.98732	SVM	0.84303
Stainless Steel 304 Annealed	GB	0.99108	DT	0.78524
Carbon Steel 1020 Annealed	NN	0.98081	SVM	0.77571
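
The per-material summary in Table 7 follows directly from the R² values in Table 5. A minimal sketch of that lookup is given below; the data are copied from Table 5, while the function name `best_and_worst` and the dictionary layout are illustrative.

```python
# R² values from Table 5, one list per model in the column order of MATERIALS.
MATERIALS = [
    "Aluminum Alloy 6061",
    "Brass C26000",
    "Bronze C51000",
    "Stainless Steel 304 Annealed",
    "Carbon Steel 1020 Annealed",
]
R2 = {
    "PLS": [0.90161, 0.95239, 0.94009, 0.95202, 0.91428],
    "SGD": [0.91293, 0.95210, 0.94086, 0.93450, 0.89531],
    "AB":  [0.87693, 0.94826, 0.91995, 0.95006, 0.91393],
    "NN":  [0.96347, 0.95280, 0.97466, 0.98725, 0.98081],
    "LR":  [0.85813, 0.95184, 0.94008, 0.93476, 0.88441],
    "DT":  [0.73558, 0.89121, 0.84509, 0.78524, 0.81274],
    "kNN": [0.93737, 0.95271, 0.91812, 0.92862, 0.87770],
    "RF":  [0.89207, 0.96724, 0.96293, 0.96431, 0.93635],
    "SVM": [0.82539, 0.82378, 0.84303, 0.92351, 0.77571],
    "GB":  [0.98608, 0.98754, 0.98732, 0.99108, 0.97823],
}


def best_and_worst(r2, materials):
    """Map each material to (best model, its R², worst model, its R²)."""
    summary = {}
    for k, material in enumerate(materials):
        column = {model: values[k] for model, values in r2.items()}
        best = max(column, key=column.get)
        worst = min(column, key=column.get)
        summary[material] = (best, column[best], worst, column[worst])
    return summary


summary = best_and_worst(R2, MATERIALS)
for material, (best, hi, worst, lo) in summary.items():
    print(f"{material}: best {best} ({hi:.5f}), worst {worst} ({lo:.5f})")
```

The same dictionary can be scanned row-wise instead of column-wise to obtain a per-model view of the kind summarized in Table 6.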

Table 8. Ranked Material Accuracy by Mean R² and RMSE Across All Models.

Material	Mean R²	Rank (R²)	Mean RMSE (N)	Rank (RMSE)
Stainless Steel 304 Annealed	0.9361	2nd	29.24	1st
Brass C26000	0.9415	1st	31.60	2nd
Bronze C51000	0.9351	3rd	34.87	3rd
Carbon Steel 1020 Annealed	0.9081	4th	36.12	4th
Aluminum Alloy 6061	0.9011	5th	38.66	5th

Alsoufi, M.S.; Bawazeer, S.A. Cross-Material Benchmarking of Machine Learning Models for Cutting Force Prediction in CNC Turning. Machines 2026, 14, 426. https://doi.org/10.3390/machines14040426

6.1. Pairwise Model Ranking Based on R² and CV(RMSE)