1. Introduction
Milling operations are crucial to machining processes in manufacturing, and their integration with Industry 4.0 and 5.0 is vital for driving the digital transformation. A key method for lowering power consumption involves optimizing cutting parameters.
In the specialized literature, numerous studies have analyzed energy consumption during milling operations, taking into account the influence of cutting parameters [1,2,3,4,5]. These papers focus on the impact of process parameters and tool wear on power and energy at the machine, spindle, and net cutting levels. Key findings include that traditional empirical models accurately predict total and spindle specific energy but are less effective for net cutting specific energy. Tool wear significantly influences net cutting specific energy, followed by feed per tooth and cutting speed.
The energy usage of milling operations can be predicted through a versatile model that integrates power characteristics with parameters derived from numerical control codes. The Minimum Quantity Lubrication method results in 33% lower power consumption compared to conventional wet milling. The energy consumption model proposed by Bayat and Abootorabi [6] accounts for workpiece adjustment, which is a significant cost factor in machining economics.
The feed rate is identified as the most significant factor affecting energy consumption in machining processes. Optimizing parameters such as radial depth of cut, feed rate, and spindle speed can lead to reduced power consumption during machining. A specific combination of a radial depth of 0.3 mm and a spindle speed of 12,000 rpm can achieve a minimum power usage of 82.38 kW [7].
The energy usage in numerically controlled (NC) machining processes can be assessed by examining the relationship between NC codes and the power attributes of energy-consuming components within machine tools. A robust methodology for estimating energy consumption calculates the aggregate energy demand of components based on the NC program [8]. This approach has been validated through comparisons between predicted energy consumption and empirical measurements obtained from an NC milling machine and an NC lathe.
The energy consumption of machining processes, including lathe operations, is substantially determined by cutting parameters such as spindle speed, depth of cut, and feed rate. Optimizing energy efficiency in machining systems is imperative to mitigate the elevated energy requirements of the industrial sector and to reduce associated production costs [9].
The study by Moses and Ashok [10] indicates that a special setup in lathe machines can significantly reduce power consumption while producing quality products. This new approach allows for simultaneous turning and finishing of components, leading to energy savings and improved productivity. Experiments showed that the special setup process enhances quality and reduces power consumption compared to existing processes.
The research by Nugrahanto et al. [11] indicates that machine learning techniques can significantly improve energy efficiency during the CNC five-axis milling process by analyzing the relationship between machine parameters and energy consumption. The implementation of algorithms like Decision Tree and Random Forest resulted in lower Root Mean Square Error values, suggesting effective energy consumption reduction strategies. Experiments verified that the machine learning approach is practical for real-world use, demonstrating its ability to significantly reduce energy consumption in CNC machining operations.
A regression-based power consumption model has been proposed [12] to predict the electric power consumed by manufacturing devices, including milling machines, based on the correlation between Material Removal Rate and Specific Energy Consumption. The research investigated face milling experiments involving ten distinct materials to quantify the power consumption of the machine tool throughout the milling process. The developed model robustly elucidates the impact of workpiece material properties on the overall power consumption of the machine tool.
Elsheikh et al. [13] designed an experimental framework employing the Taguchi L16 orthogonal array to assess the influence of the primary machining parameters in MQL metal cutting (cutting speed, depth of cut, and feed rate) on cutting force, surface roughness, and tool wear. The findings revealed that cutting speed exerted the most significant effect on all evaluated process outcomes. Compared to Al₂O₃/oil nanofluid, CuO/oil nanofluid produced a better surface finish and less tool wear when used as a cutting fluid. Furthermore, an advanced random vector functional link (RVFL) model was created to forecast the responses of the machining process. The RVFL–PO model’s predictions were compared to those of the standalone RVFL and hybrid RVFL–PSO (particle swarm optimization) models, and they were verified against experimental data.
Accurately anticipating cutting pressures in hard turning processes can lead to improved process control, reduced tool wear, and increased productivity. The goal of the study in [14] was to employ machine learning models to predict machining force components during hard turning of AISI 52100 bearing steel. Eight models were chosen, and their predictive performance was assessed using experimental data obtained while turning AISI 52100 bearing steel with a CBN cutting tool. Fivefold cross-validation was used in training to produce more reliable estimates of model performance while lowering the risk of overfitting the data. The findings revealed that Gaussian process regression and decision tree regression outperformed the other models, with averaged root-mean-square errors of 14.44 and 12.72, respectively. These findings can be useful in optimizing cutting parameters for hard turning processes to select cutting forces, reduce tool wear, and minimize the heat generated during the machining process.
In another study [15], three regression-based machine learning techniques (polynomial regression, support vector regression, and Gaussian process regression) were employed to forecast machining force, cutting power, and cutting pressure during the turning of AISI 1045 steel. Key machining parameters, including cutting speed, depth of cut, and feed rate, served as input variables for constructing these predictive models. Given the significant influence of cooling and lubrication strategies on machining outcomes, model development focused on two distinct cutting conditions: low-lubrication and high-pressure coolant environments. The predictive accuracy of these models was evaluated through statistical error analysis techniques. Additionally, the performance of these regression-based approaches was benchmarked against a widely adopted machine learning method, artificial neural networks (ANN). To further optimize machining parameters, a metaheuristic strategy leveraging a neural network algorithm was applied, enabling efficient multi-objective optimization tailored to both cutting environments.
The study in [16] introduces the LSTM–TIMESTEP–ATT model, combining Long Short-Term Memory networks with a multi-time-step attention mechanism to enhance daily runoff forecasting for the Liujiang River Basin. Using data from 2001–2010, the model outperforms standard LSTM, multiple linear regression, and SVR models, achieving the lowest errors (MAPE: 18.8%, MAE: 223.3 m³/s, RMSE: 596.5 m³/s) and the highest number of days with relative error within 20% (233 days). The attention mechanism improves the model’s ability to capture critical time steps, supporting effective water resource management and flood forecasting. These studies [17,18,19] underscore the potential of data-driven, intelligent approaches to achieve greener, more efficient machining, aligning with modern manufacturing’s sustainability goals. Both papers [20,21] highlight the importance of balancing energy inputs with machining benefits to achieve sustainable manufacturing.
2. Materials and Methods
This study addresses the growing demand for energy efficiency and sustainability in manufacturing by developing predictive models for milling processes. Understanding and managing the power dynamics of milling operations, particularly the role of the main spindle and its braking system, has become crucial as industrial processes face increased demand to reduce energy consumption and environmental impact. This research aims to provide actionable insights for improving machine performance and reducing costs. The development of advanced machine learning models, such as Random Forest, Support Vector Regression, and Multi-Layer Perceptron, is driven by the potential to predict active power consumption accurately, thereby supporting smarter process planning and maintenance strategies in industrial settings.
Experimental Setup
On a conventional milling machine, the braking systems used to control the main spindle’s motion vary based on the machine’s design. A mechanical brake functions by applying physical force through components like shoes or pads against a drum or disc connected to the spindle, dissipating kinetic energy as heat via friction. An electromagnetic brake employs a magnetic field, generated by an electric current, to decelerate or stop the spindle, offering rapid response and reduced mechanical wear. A hydraulic brake utilizes pressurized fluid to actuate pistons, delivering precise and powerful braking force to the spindle, ideal for applications requiring fine control and high torque.
The experimental tests were conducted in the laboratories of the National University of Science and Technology Politehnica Bucharest. The tests were carried out on an FV 32 milling machine, a medium-sized machine tool manufactured by ICM Oradea, Oradea, Romania. The table dimensions are 325 mm × 1525 mm, and the milling machine has a 5 kW motor, voltage of 220–240 V, frequency of 60 Hz, no-load speed of 33,500 rpm and a chuck with a diameter of 6 mm. It is equipped with a single milling head, with a main shaft. The construction is relatively simple and rigid, intended for mass production. The data acquisition board used was a National Instruments NI USB-6001 [22] with the following relevant features: 8 analog input channels (14-bit, 20 kS/s), useful for measuring signals from sensors (vibration, pressure, or temperature sensors) mounted on the milling machine; 13 digital channels, which can be used to monitor the status of machine components or to control actuators; and 2 analog output channels for precise control (adjusting feed rate).
Figure 1 illustrates the process of collecting experimental data for power consumption and force measurements on the FV 32 milling machine.
The output parameters were measured using Fluke 1738 Three-Phase Power Quality Loggers (provided by Fluke Corporation, Everett, WA, USA) [23]. Some data regarding the idle operation of the milling machine are presented in Table 1.
In the first stage, for braking the main shaft of the machine tool, a shaft-type specimen was used, clamped in the machine tool spindle, the radial force being applied with the help of a semicircular element plated with brake pad, clamped in a dynamometer, mounted in the tool holder of the machine-tool. It was found that high-intensity braking can be achieved, without generating vibrations, thereby ensuring the stability of the technological system.
The results of this experiment demonstrated that the brake device configuration could achieve high-intensity braking without inducing unwanted vibrations, thus maintaining the stability of the technological system during operation. However, a limitation was identified: while the setup successfully halted the shaft’s rotation, it did not provide a reliable means to accurately measure the tangential friction force acting on the shaft and the speed variation range was relatively small. This drawback suggests that, despite its effectiveness in braking, this stressing mode may require further refinement or an alternative approach to fully characterize the frictional dynamics involved. A variable braking torque system was also proposed to adjust resistance based on speed, aiming to align idle power with industry standards (<0.3 kW). Experimental configurations designed to optimize the reduction of frictional resistance in mechanical systems operating at low rotational speeds, ranging from 37.5 to 475 rpm were investigated. In the case of high idle power consumption, the causes must be analyzed and measures taken to reduce this consumption. If the bearings and gears are not operating within normal parameters, power consumption will increase.
The new concept is reliable and simple, represented by a loading mechanism attached to the machine tool’s guideways. The axial displacement creates a friction force which determines a torque sensed and measured by an inductive transducer. The application of axial force facilitates the loading of technological systems on the front surface of the disc type specimen, mounted in the main shaft of the machine-tool. In order to apply axial force, a stressing device (Figure 2) was designed which mainly consists of the base plate 1, the body 2, the axial bearing 3, the elastic element 4, the fork 5, the ring 6, the disk 7 and the plates 8. Through the base plate 1, the device is mounted on the table of the milling machine, and the attachment of a conical tail to this plate makes it possible to mount the device also on a lathe (in the spindle of the movable chuck). Through the axial movement of the device, the clutch friction material plates 8 come into contact with the front surface of the test piece 10. Due to the friction, this results in a torsional moment that tends to rotate the subassembly formed by the disk, the bolt 9, the ring and the forks. Between the disk and the body, the axial bearing is mounted to take over the axial forces that occur during stress. The rotation of the subassembly stresses the elastic element 4, its deformation being sensed by the inductive transducer 12 represented by the INSIZE 2134-10 Digital Indicator (supplied by INSIZE Co., Suzhou, China).
The determinations were made for different workloads on the main shaft, obtained by varying the speed and the braking force. The electrical and power parameters of the machine-tool drive motor (voltage, U; current intensity, I; active power, P; apparent power, Pa; power factor, cos φ; and efficiency, η) were also measured using the Fluke 1738 Three-Phase Power Quality Logger. This method guaranteed a varied dataset of force-load interactions, precisely reflecting real-world cutting conditions while preserving a controlled and repeatable testing environment. The use of the device was not influenced by wear, because wear was compensated by the loading mode (axial displacement). When wear prevents the abrasive pads from fulfilling their braking function, they will be replaced. The degree of wear of the pads will be determined experimentally, and an admissible limit will be established. For the tests performed and for the acquisition of the experimental data set, the device was mounted on two classic milling machine tools, which helps reduce the risk of overfitting to a single machine.
Table 2 showcases a representative sample of the measurements, emphasizing critical data points from the entire experimental dataset.
The input data for the FV 32 milling machine, encompassing both idle and loaded conditions, provide a comprehensive basis for analyzing the energy performance and operational characteristics of the machine, particularly in the context of its novel main spindle braking device. Utile Power (Pu), measured in kilowatts (kW), refers to the effective mechanical power used for milling, distinct from Active Power (P), which includes total electrical energy consumed by the machine. For loaded conditions, the spindle speed spans four discrete values of 37.5, 118, 235, and 475 rpm, while the cutting force ranges from 16.5258 to 164.7168 daN, corresponding to active power (P) values between 0.68 and 7.5 kW and power factor (cos φ) values from 0.180688 to 0.936958. In idle mode, the same spindle speeds are considered, with idle power ranging from 0.372 to 1.016 kW and power factor values estimated between 0.145 and 0.372, reflecting typical low values for milling machines. The braking device, designed to enhance spindle control, introduces additional mechanical resistance, significantly impacting idle power consumption compared to standard milling machines, which typically exhibit idle power below 0.3 kW.
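The relationships between the measured electrical quantities can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' processing code: it assumes a balanced three-phase supply with U taken as the line-to-line voltage (so Pa = √3·U·I), the usual definitions cos φ = P/Pa and η = Pu/P, and hypothetical function names. The numbers in the usage lines are illustrative and are not taken from the experimental dataset.

```python
import math


def apparent_power_kva(voltage_v: float, current_a: float) -> float:
    """Apparent power in kVA, assuming a balanced three-phase load
    and a line-to-line voltage (assumption, not stated in the source)."""
    return math.sqrt(3) * voltage_v * current_a / 1000.0


def power_factor(active_kw: float, apparent_kva: float) -> float:
    """Power factor cos(phi) as the ratio of active to apparent power."""
    return active_kw / apparent_kva


def efficiency(utile_kw: float, active_kw: float) -> float:
    """Efficiency as the ratio of Utile Power (Pu) to Active Power (P)."""
    return utile_kw / active_kw


# Illustrative values only (not measurements from the study).
s_kva = apparent_power_kva(voltage_v=400.0, current_a=10.0)
print(f"Pa = {s_kva:.3f} kVA")
print(f"cos phi = {power_factor(5.0, s_kva):.3f}")
print(f"eta = {efficiency(4.0, 5.0):.2f}")
```

Keeping the three ratios as separate functions makes it easy to check a logged row for internal consistency, for example whether the recorded power factor matches P/Pa.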
In idle conditions, the power consumption exhibits a non-linear trend with respect to spindle speed. At 37.5 rpm, the idle power is 0.372 kW, increasing to 0.496 kW at 118 rpm, then slightly decreasing to 0.432 kW at 235 rpm, before rising sharply to 1.016 kW at 475 rpm. This dip at 235 rpm suggests a unique dynamic in the braking mechanism, possibly due to reduced frictional losses or optimized brake engagement at this speed, contrasting with the more consistent exponential increase observed in other machining systems. These losses are exacerbated by the braking device, which contributes to idle power values exceeding those of typical milling machines, highlighting an area for potential optimization in brake design to reduce energy inefficiency during non-productive phases.
Under loaded conditions, active power demonstrates a strong dependence on both spindle speed and cutting force. At the highest speed of 475 rpm, power consumption increases linearly from 1.7 kW at the lowest force of 16.5258 daN to 7.5 kW at the maximum force of 164.7168 daN. The linear relationship between force and power underscores the dominance of cutting forces in determining energy demand, with the braking device’s influence being minimal in loaded conditions, contributing less than 5% to total power (Figure 3).
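The linear trend reported at 475 rpm can be written as a straight line through the two quoted endpoints. The sketch below assumes the relationship is exactly linear between those endpoints, which the text states only qualitatively; the function name is hypothetical and intermediate values are interpolations, not measurements.

```python
# Endpoints quoted in the text for 475 rpm (force in daN, active power in kW).
F_MIN, P_MIN = 16.5258, 1.7
F_MAX, P_MAX = 164.7168, 7.5

# Slope of the two-point line, in kW per daN.
SLOPE_KW_PER_DAN = (P_MAX - P_MIN) / (F_MAX - F_MIN)


def active_power_475rpm(force_dan: float) -> float:
    """Interpolated active power (kW) at 475 rpm, assuming strict linearity."""
    return P_MIN + SLOPE_KW_PER_DAN * (force_dan - F_MIN)


print(f"slope = {SLOPE_KW_PER_DAN:.4f} kW/daN")
print(f"P(90 daN) = {active_power_475rpm(90.0):.2f} kW (interpolated)")
```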
The nonlinear dependence of the consumed power on the speed, including a decrease at 235 rpm, is caused by the resonance phenomena or the gearbox engagement features. The power factor under load is significantly higher than in idle mode, ranging from 0.180688 at low speed and force (37.5 rpm, 16.5258 daN) to a peak of 0.936958 at 118 rpm and 128.3632 daN. At 475 rpm, cos φ varies from 0.421605 to 0.812827, exceeding 0.85 at higher forces (above 86 daN), suggesting efficient energy transfer under significant mechanical stress, comparable to high-load performance in lathe operations.
The braking device’s innovative design supports high Utile Power under load. At 475 rpm, Pu scales from 0.041 to 0.410 kW as force increases. At low speeds, such as 37.5 rpm, idle power (0.372 kW) is significantly lower than loaded power (0.68–1.87 kW), indicating that the device, while increasing non-productive energy demand, effectively converts electrical power into mechanical work during cutting. The high power factor at elevated forces validates the device’s design for heavy milling tasks, ensuring stable energy transfer. These results indicate that operating at intermediate speeds (118–235 rpm) with moderate to high forces maximizes energy efficiency. Future enhancements to the braking system will prioritize reducing idle power consumption to meet industry standards, enhancing the machine’s sustainability and operational performance.
3. Machine Learning Algorithms
The application of machine learning techniques to predict outcomes based on input parameters plays a critical role in optimizing industrial processes. These techniques enable the modeling of complex, non-linear relationships between input features (Speed, Force) and target variables such as Intensity, Utile Power, Apparent Power, Active Power, and Power Factor, providing insights that traditional analytical methods may overlook. Accurate models allow precise forecasting of Utile Power, enabling optimization of energy usage and machine performance in real time. By predicting Utile Power alongside derived metrics (surface roughness from related studies), machine learning supports multi-objective optimization, balancing power efficiency with quality outcomes. This is vital for reducing costs and improving product quality in manufacturing. These methods can incorporate additional variables (noise factors like wear) or adapt to larger datasets, making them scalable for industrial applications beyond the current dataset, such as real-time monitoring systems.
The performance metrics calculated for the machine learning models applied to the milling dataset include the following. The Coefficient of Determination (R²) measures the proportion of variance in Utile Power accounted for by the model. Values close to 1 indicate a robust model fit, demonstrating the model’s capacity to effectively capture underlying data patterns. Mean Squared Error (MSE) represents the average of the squared differences between predicted and observed Utile Power values, with lower values denoting higher model accuracy and greater sensitivity to larger prediction errors. Root Mean Squared Error (RMSE), calculated as the square root of MSE, expresses prediction error in kilowatts (kW), offering a more interpretable metric for assessing model performance and error magnitude.
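The three metrics defined above can be computed directly from their definitions. The following is a minimal pure-Python sketch; the function name `regression_metrics` and the toy values are illustrative assumptions and do not come from the study's dataset.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R2, MSE and RMSE for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {"r2": 1.0 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}


# Toy example (values in kW, purely illustrative).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

Because MSE is expressed in kW² and RMSE in kW, reporting both (as done in the results tables) lets a reader judge the error on the same scale as Utile Power itself.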
Feature engineering was enhanced by applying Principal Component Analysis (PCA) to reduce multicollinearity among electrical features (Active Power, Apparent Power, Power Factor, Intensity) to two components capturing 95% variance, Recursive Feature Elimination (RFE) to select the most predictive features (Speed, Force, Active Power, Speed × Force), and standardization using StandardScaler.
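One way the preprocessing described above could be assembled with scikit-learn is sketched below. Several details are assumptions, because the text does not specify them: the data are synthetic placeholders generated in the script, PCA is fitted on the standardized electrical features with two components, and RFE uses a linear regression as its ranking estimator and is asked for four features. The features actually selected on synthetic data need not match those reported in the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the experimental measurements).
rng = np.random.default_rng(0)
n = 60
speed = rng.choice([37.5, 118.0, 235.0, 475.0], size=n)
force = rng.uniform(16.5, 164.7, size=n)
active = 0.5 + 1e-4 * speed * force + rng.normal(0, 0.05, n)
power_factor = np.clip(0.2 + 0.004 * force + rng.normal(0, 0.02, n), 0.05, 0.99)
apparent = active / power_factor
intensity = apparent * 1000.0 / (np.sqrt(3) * 400.0)
utile = 5e-6 * speed * force + rng.normal(0, 0.005, n)

# 1) Standardize the electrical features, then compress them with PCA.
electrical = np.column_stack([active, apparent, power_factor, intensity])
electrical_scaled = StandardScaler().fit_transform(electrical)
pca = PCA(n_components=2)
components = pca.fit_transform(electrical_scaled)
explained = float(pca.explained_variance_ratio_.sum())
print(f"variance captured by 2 components: {explained:.3f}")

# 2) Rank candidate features (including the Speed x Force interaction) with RFE.
names = ["Speed", "Force", "Active Power", "Speed x Force",
         "Apparent Power", "Power Factor", "Intensity"]
candidates = np.column_stack([speed, force, active, speed * force,
                              apparent, power_factor, intensity])
candidates_scaled = StandardScaler().fit_transform(candidates)
rfe = RFE(LinearRegression(), n_features_to_select=4)
rfe.fit(candidates_scaled, utile)
selected = [name for name, keep in zip(names, rfe.support_) if keep]
print("selected features:", selected)
```

In a full workflow the scaler, PCA and RFE would be fitted on the training split only and then applied to the test split, to avoid leaking test information into the preprocessing.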
Machine learning models were developed in Python 3.8.10 using libraries like scikit-learn, TensorFlow, and Keras, which provide robust tools for building and training algorithms. These models are coded with structured workflows, including data preprocessing, model fitting, and evaluation [24,25].
Linear Regression establishes a straightforward approach to modeling by assuming a direct linear correlation between input variables (Speed, Force) and the target variable, Utile Power. It employs the least squares method to fit a linear equation, minimizing the discrepancy between predicted and actual values. Polynomial Regression extends the linear framework by accommodating non-linear relationships [26]. This method transforms input features into higher-order polynomial terms, enabling the model to fit a polynomial curve to the data, thus capturing more intricate patterns with enhanced flexibility.
Support Vector Regression (SVR) leverages a kernel-based strategy to project data into a higher-dimensional space, identifying an optimal hyperplane for predicting Utile Power within a defined tolerance margin [27,28]. A grid search was conducted to explore the parameter space, with R² as the primary metric. The optimal configuration was C = 10, epsilon = 0.1, and gamma = ‘scale’. The relatively high C value allowed the model to fit the data closely while maintaining generalization, and gamma = ‘scale’ adapted the kernel to the feature scale of the dataset, enhancing performance on non-linear relationships. By utilizing a Radial Basis Function (RBF) kernel, SVR effectively addresses non-linear relationships in the data.
K-Nearest Neighbors (KNN) operates as a non-parametric technique, predicting Utile Power by computing the average of the k closest data points in the feature space. Relying solely on proximity, KNN generates predictions without requiring explicit model training, making it intuitive and adaptable [28].
Multilayer Perceptron (MLP) with Two Layers represents a neural network architecture comprising two hidden layers with 50 and 30 neurons, respectively. Through backpropagation, it learns complex non-linear relationships by adjusting weights and biases, enabling robust modeling of feature interactions. Multilayer Perceptron (MLP) with Three Layers enhances the two-layer MLP by incorporating an additional hidden layer (50, 30, and 20 neurons). This deeper architecture provides greater capacity to capture nuanced patterns, improving predictive performance for complex datasets [29]. A random search was employed due to the computational cost of training neural networks, sampling 20 combinations of hyperparameters. The best configuration for the two-layer MLP was hidden_layer_sizes = (50, 30), max_iter = 500, learning_rate_init = 0.001, and activation = ‘relu’. For the three-layer MLP, the optimal setup was hidden_layer_sizes = (50, 30, 20), max_iter = 500, learning_rate_init = 0.001, and activation = ‘relu’. The ‘relu’ activation function was selected for its ability to handle non-linearities effectively, and the moderate number of neurons prevented overfitting on the relatively small dataset.
Random Forest (RF) is an ensemble learning approach that integrates bootstrap aggregating (bagging) with randomized feature selection [25]. By constructing multiple decision trees and aggregating their predictions, RF enhances accuracy and mitigates overfitting, offering superior performance compared to individual decision trees. The architecture is configured with n_estimators = 100, meaning that 100 individual decision trees are grown. Each decision tree is trained on a randomly selected subset of the training dataset, sampled with replacement. At each node split within a tree, a random subset of features is evaluated, which reduces inter-tree correlation and improves model generalization. No explicit maximum depth is specified, permitting trees to expand until all leaves are pure or contain fewer samples than the defined minimum. A grid search was performed over the combinations of these hyperparameters, evaluating performance using the mean squared error (MSE) as the scoring metric. The best configuration was found to be n_estimators = 100, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, and max_features = ‘sqrt’. The choice of n_estimators = 100 balanced computational efficiency and predictive accuracy, while max_features = ‘sqrt’ ensured diversity in tree splits, reducing overfitting [29]. A fixed random seed ensures reproducibility of tree construction and sampling. The selected values of min_samples_split and min_samples_leaf coincide with the library defaults, which are tuned for general-purpose performance. A continuous prediction is produced for Utile Power based on the averaged tree predictions. The Random Forest model excels in handling non-linear relationships, staying robust to outliers, and reducing variance through averaging. Computational cost increases with n_estimators, and the model may overfit if trees are too deep, though this is mitigated by random feature selection.
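The configurations reported above can be collected into scikit-learn estimators as in the sketch below. The hyperparameter values are those quoted in the text; everything else is an assumption made for illustration: the data are synthetic, only Speed and Force are used as inputs, each model is wrapped in a StandardScaler pipeline, the seed value 42 is chosen here for the Random Forest and MLPs, and the grid and random searches themselves are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the experimental measurements).
rng = np.random.default_rng(42)
n = 120
speed = rng.choice([37.5, 118.0, 235.0, 475.0], size=n)
force = rng.uniform(16.5, 164.7, size=n)
X = np.column_stack([speed, force])
y = 5e-6 * speed * force + rng.normal(0, 0.005, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Estimators with the hyperparameters reported in the text.
models = {
    "SVR": SVR(kernel="rbf", C=10, epsilon=0.1, gamma="scale"),
    "MLP-2": MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=500,
                          learning_rate_init=0.001, activation="relu",
                          random_state=42),
    "MLP-3": MLPRegressor(hidden_layer_sizes=(50, 30, 20), max_iter=500,
                          learning_rate_init=0.001, activation="relu",
                          random_state=42),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=None,
                                min_samples_split=2, min_samples_leaf=1,
                                max_features="sqrt", random_state=42),
}

scores = {}
for name, model in models.items():
    # Scaling is fitted on the training split only, inside the pipeline.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # R2 on the held-out split

for name, r2 in scores.items():
    print(f"{name}: R2 = {r2:.4f}")
```

Scores obtained on this synthetic data say nothing about the values reported in the study; the sketch only shows how the quoted settings map onto estimator arguments.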
The Bagging (Bootstrap Aggregating) model is an ensemble technique that improves stability and accuracy by training multiple instances of the same base model on different bootstrap samples of the dataset and combining their predictions [29]. Bagging uses RandomForestRegressor with its own default settings (100 trees, random feature selection) as the base estimator, creating a meta-ensemble configured with n_estimators = 10, meaning that 10 instances of the Random Forest are trained. The base Random Forest already incorporates random feature selection, and Bagging adds randomness by varying the training subsets, further decorrelating the models. Each Random Forest instance predicts Utile Power, and the final continuous prediction is the average across the 10 instances. A fixed random seed ensures reproducibility for the bagging process and the underlying Random Forests. Default settings for max_samples and max_features are applied, meaning that the full feature set and a bootstrap sample size equal to the training set are used unless otherwise specified. In this study, Bagging performed marginally below the standalone Random Forest (R² = 0.9752 versus 0.9778), despite the additional ensemble layer. In principle, the model enhances stability and generalization by reducing variance further than a single Random Forest through averaging multiple instances. The computational cost is higher than that of a single Random Forest due to the training of 10 separate models, though mitigated by the lower n_estimators = 10 compared to the standalone RF’s 100 trees. The performance gain may be marginal if the base Random Forest is already well-tuned. Random Forest inherently includes bagging and adds random feature selection, making it a more specialized ensemble.
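A meta-ensemble of this kind can be sketched with scikit-learn's BaggingRegressor wrapping a RandomForestRegressor. The data below are synthetic placeholders and the seed value 42 is an assumption; the base estimator is passed positionally so that the snippet does not depend on whether the installed scikit-learn version names that argument `estimator` or `base_estimator`.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

# Synthetic stand-in data (NOT the experimental measurements).
rng = np.random.default_rng(42)
n = 100
speed = rng.choice([37.5, 118.0, 235.0, 475.0], size=n)
force = rng.uniform(16.5, 164.7, size=n)
X = np.column_stack([speed, force])
y = 5e-6 * speed * force + rng.normal(0, 0.005, n)

# 10 bagged instances of a default Random Forest (100 trees each).
bagging = BaggingRegressor(
    RandomForestRegressor(random_state=42),  # base learner, default settings
    n_estimators=10,                         # number of forest instances
    random_state=42,                         # seed value is an assumption
)
bagging.fit(X, y)

# The prediction is the mean over the fitted forests, each of which
# already averages its own trees.
pred = bagging.predict(X[:5])
print(pred)
print("forests:", len(bagging.estimators_),
      "| trees per forest:", len(bagging.estimators_[0].estimators_))
```

With these settings the meta-ensemble holds 10 × 100 trees in total, which is where the extra computational cost relative to a single forest comes from.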
Bagging with Random Forest as the base estimator (10 instances) adds another layer of aggregation but, with fewer trees per instance, it may not capture as much diversity as the standalone RF unless the base estimator’s hyperparameters were optimized. The results show that Random Forest (R² = 0.9778) slightly edges out Bagging (R² = 0.9752), although the difference is minor. Both models are trained on the same scaled dataset and evaluated on the test set. Their high R² values indicate that they effectively model the non-linear relationships in the spindle load data. For Random Forest, adjusting max_depth or min_samples_split could prevent overfitting. For Bagging, increasing n_estimators or tuning the base Random Forest’s parameters (n_estimators = 50 per instance) might enhance performance. Analyzing feature importance from Random Forest could guide feature selection or engineering. Both models benefit from scaling, but robustness to unscaled data could be tested for real-world deployment. The Random Forest model, with 100 trees and random feature selection, provides a robust baseline, while the Bagging model, with 10 Random Forest instances, adds a layer of stability through further averaging. Both are well-suited for the dataset, with R² values above 0.97.
Another model used is the PIRF–MLP model, a custom hybrid ensemble model designed to leverage the strengths of both a Random Forest and a Multi-Layer Perceptron neural network. This approach combines the robustness of tree-based methods with the flexibility of neural networks to enhance predictive accuracy for the target variable Utile Power. The model aggregates predictions from the RF and MLP components. The PIRF–MLP model uses a feature set (X) including Speed, Force, Intensity, Apparent Power, Active Power, and Power Factor, with Utile Power (y) as the continuous target variable in kW.
The PIRF–MLP model combines two base learners: a Random Forest (RF) and a Multi-Layer Perceptron (MLP). The RF component uses 50 decision trees, each trained on a bootstrap sample of the training data with random feature subsets to reduce overfitting. The MLP component has two hidden layers with 50 and 25 neurons, respectively, enabling it to learn complex, non-linear relationships. Training is limited to a maximum of 500 iterations (epochs). The parameter random_state = 42 ensures consistent initialization of weights and reproducible results. For optimisation, the default stochastic gradient-based optimiser with backpropagation is used, adjusted internally based on loss convergence. The PIRF–MLP model defines its estimators using a list of tuples: [(‘rf’, rf_base), (‘mlp’, mlp_base)], where rf_base is the Random Forest and mlp_base is the MLP. The VotingRegressor combines predictions from multiple models by averaging them, improving overall stability and balancing the robustness of Random Forest with the adaptability of MLP. For the MLP component, the same tuning process as for the standalone two-layer MLP was applied, selecting hidden_layer_sizes = (50, 25), max_iter = 500, and activation = ‘relu’. In addition, an adaptive weighting scheme was implemented for the PIRF–MLP model, dynamically adjusting the contributions of RF and MLP based on speed (low: ≤118 rpm, high: >118 rpm) and force (low: <50 daN, high: >100 daN) ranges. This improved performance in condition-specific scenarios (18% MSE reduction in high-speed conditions) and yielded the best overall performance (MSE = 0.0238 kW², R² = 0.9866). This configuration balanced the robustness of RF with the non-linear learning capacity of MLP, achieving superior accuracy.
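The averaging step and a regime-dependent weighting can be sketched as follows. The estimator settings (50 trees, hidden layers of 50 and 25 neurons, 500 iterations, random_state = 42) and the speed and force thresholds are those quoted in the text. The weight values inside `rf_weight` are hypothetical placeholders, since the text does not report them; the data are synthetic, only Speed and Force are used as inputs, and wrapping the MLP in a scaling pipeline is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the experimental measurements).
rng = np.random.default_rng(42)
n = 120
speed = rng.choice([37.5, 118.0, 235.0, 475.0], size=n)
force = rng.uniform(16.5, 164.7, size=n)
X = np.column_stack([speed, force])
y = 5e-6 * speed * force + rng.normal(0, 0.005, n)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf_base = RandomForestRegressor(n_estimators=50, random_state=42)
mlp_base = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(50, 25), max_iter=500,
                 activation="relu", random_state=42),
)

# Plain averaging of the two base learners.
voting = VotingRegressor([("rf", rf_base), ("mlp", mlp_base)])
voting.fit(X_train, y_train)
avg_pred = voting.predict(X_test)


def rf_weight(speed_rpm: float, force_dan: float) -> float:
    """Hypothetical RF weight per operating regime (placeholder values)."""
    weight = 0.5
    if speed_rpm > 118:      # high-speed regime (threshold from the text)
        weight += 0.1
    if force_dan > 100:      # high-force regime (threshold from the text)
        weight += 0.1
    elif force_dan < 50:     # low-force regime (threshold from the text)
        weight -= 0.1
    return weight


# Regime-dependent blend of the two fitted base learners.
rf_pred = voting.named_estimators_["rf"].predict(X_test)
mlp_pred = voting.named_estimators_["mlp"].predict(X_test)
w = np.array([rf_weight(s, f) for s, f in X_test])
adaptive_pred = w * rf_pred + (1.0 - w) * mlp_pred
print(adaptive_pred[:5])
```

In practice the regime weights would be chosen on a validation split, for example by minimizing MSE within each speed and force range, rather than fixed by hand as here.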
The PIRF–MLP hybrid model mitigates Random Forest's bias toward tree-based decisions and MLP's sensitivity to initialization or local minima, improving prediction stability and accuracy. It performs well on the dataset but may face scalability issues with larger datasets due to MLP's computational cost. The default settings include 50 trees for Random Forest and 500 iterations for MLP. The performance metrics for all the methods analyzed are presented in Table 3.
Linear Regression, with an MSE of 0.1824 kW², RMSE of 0.4271 kW, and R² of 0.9162, provides a baseline performance but is outperformed by more complex models due to its inability to capture non-linear relationships inherent in the data. Polynomial Regression (MSE = 0.1508 kW², RMSE = 0.3883 kW, R² = 0.9311) improves upon this by modeling non-linear trends, though its performance remains moderate. SVR (MSE = 0.1309 kW², RMSE = 0.3618 kW, R² = 0.9403) offers a robust alternative, effectively handling high-dimensional data, yet it is surpassed by ensemble and neural network approaches.
Ensemble methods, including Random Forest (MSE = 0.0491 kW², RMSE = 0.2215 kW, R² = 0.9778) and Bagging (MSE = 0.0549 kW², RMSE = 0.2343 kW, R² = 0.9752), exhibit superior predictive accuracy, leveraging the diversity of decision trees and neighborhood-based learning to reduce variance and improve generalization. Among neural network models, MLP with 2 layers (MSE = 0.0658 kW², RMSE = 0.2565 kW, R² = 0.9702) and with 3 layers (MSE = 0.0633 kW², RMSE = 0.2516 kW, R² = 0.9715) demonstrate competitive performance, with the additional layer slightly enhancing the capacity of the model to capture complex patterns. The hybrid PIRF–MLP model emerges as the most effective, achieving the lowest RMSE of 0.1543 kW and the highest R² of 0.9866. This approach combines the probabilistic uncertainty estimation of Random Forest with the non-linear refinement of MLP, offering a significant improvement over standalone methods. The results (Table 4) underscore the efficacy of integrating ensemble and neural network techniques for enhanced prediction in mechanical engineering applications, particularly where Utile Power prediction is critical for optimizing milling machine efficiency.
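For readers who wish to reproduce the metrics quoted above, the following sketch shows how MSE, RMSE, and R² are commonly computed, and checks that each reported RMSE is consistent with the square root of the corresponding reported MSE. The function names and the small `actual`/`predicted` lists are illustrative assumptions; only the values in `reported` are taken from the text.

```python
import math


def mse(actual, predicted):
    """Mean squared error, in squared units of the target (kW^2 here)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target (kW here)."""
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# (MSE in kW^2, RMSE in kW) as reported in the text.
reported = {
    "Linear Regression": (0.1824, 0.4271),
    "Polynomial Regression": (0.1508, 0.3883),
    "SVR": (0.1309, 0.3618),
    "Random Forest": (0.0491, 0.2215),
    "Bagging": (0.0549, 0.2343),
    "MLP (2 layers)": (0.0658, 0.2565),
    "MLP (3 layers)": (0.0633, 0.2516),
    "PIRF-MLP": (0.0238, 0.1543),
}

for name, (mse_kw2, rmse_kw) in reported.items():
    print(f"{name}: sqrt(MSE) = {math.sqrt(mse_kw2):.4f} kW, "
          f"reported RMSE = {rmse_kw:.4f} kW")

# Made-up values, used only to exercise the three functions.
actual = [0.40, 1.10, 2.30, 3.00]
predicted = [0.42, 1.05, 2.25, 3.10]
print(mse(actual, predicted), rmse(actual, predicted),
      r_squared(actual, predicted))
```

Small differences in the fourth decimal place are expected, because the reported MSE values are themselves rounded.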
The 5-fold cross-validation provided a more robust estimate of model performance than the 80/20 split, with PIRF–MLP achieving a mean R² of 0.9862, slightly lower than the original 0.9866, indicating a conservative estimate of generalizability and effective capture of the dataset's patterns. Cross-validation reduced the risk of overfitting observed in the single train-test split, ensuring that the reported performance metrics are more representative of real-world scenarios. The PIRF–MLP model shows a 7.80% improvement in prediction accuracy compared to Linear Regression, the classical baseline.
Grid search with cross-validation could further optimize n_estimators, hidden_layer_sizes, or max_iter. The effectiveness of the hybrid depends on the complexity of the data. The PIRF–MLP model was generated by integrating a 50-tree Random Forest and a two-layer MLP (50, 25 neurons) into a VotingRegressor, trained on scaled data with a focus on averaging predictions. Higher values (closer to 1) of R²_mean indicate a better model fit; this value is largest for the PIRF–MLP model. Lower values of MSE_mean and RMSE_mean indicate better prediction accuracy. RMSE is expressed in kW, making it easier to interpret than MSE (kW²). The low standard deviation of R² (0.0024) suggests that PIRF–MLP maintains consistent performance across different data subsets, enhancing its reliability for industrial applications. The results of the cross-validation process are detailed in Table 5.
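The way a k-fold procedure yields a mean and a standard deviation of R² can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, a one-feature least-squares line stands in for the PIRF–MLP model, and all function names are hypothetical.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=42):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    mean_actual = statistics.mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=5):
    """Return one R^2 score per fold; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], predictions))
    return scores


# Synthetic stand-in data (NOT the measurements from the study).
rng = random.Random(0)
xs = [rng.uniform(16.5, 164.7) for _ in range(100)]
ys = [0.02 * x + rng.gauss(0.0, 0.1) for x in xs]

scores = cross_validate(xs, ys, k=5)
# Population standard deviation, matching the default of numpy.std.
print(f"R2_mean = {statistics.mean(scores):.4f}, "
      f"R2_std = {statistics.pstdev(scores):.4f}")
```

A small standard deviation across folds indicates that the score does not depend strongly on which samples are held out, which is the sense in which the text describes the model as consistent.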
Although the size of the dataset may seem limited, this was dictated by the practical constraints of the experimental setup, which involved high-precision measurements of tangential friction forces and electrical parameters (spindle speed, force, active power, power factor) in a controlled environment. The number of observations was selected to cover a representative range of spindle speeds (37.5, 118, 235 and 475 rpm) and cutting forces (from 16.5258 to 164.7168 daN), reflecting real milling conditions, while maintaining experimental repeatability and control. This dataset size, although compact, was sufficient to capture the non-linear relationships between the input parameters and Utile Power, as demonstrated by the high performance of the PIRF–MLP model. The use of 5-fold cross-validation ensured a robust assessment of the models' performance, minimizing the risk of overfitting and compensating for the limited data size by efficiently exploiting the available information. In addition, the dataset was supplemented with idling observations, providing additional context for analyzing the machine behavior under various conditions. The number of observations was also limited by the available resources, including the complexity of the brake setup, which required precise calibration of the torque sensor for each tested condition.
With regard to energy consumption dynamics, the impact of the Speed–Force interaction on Utile Power in the mechatronic milling system is non-linear: higher speeds amplify the effect of force, and the interaction contributes 34% of the Utile Power variance. Optimal efficiency occurs at intermediate Speed–Force combinations (118–235 rpm, 40–86 daN), yielding 0.3985–2.2789 kW with high power factors.
Using the Python script, several types of charts were generated to make the predictions of the selected model easier to interpret (Figure 4). The Actual vs. Predicted Utile Power scatter plot indicates that the PIRF–MLP model predictions are highly accurate, with minimal deviation, reflecting a strong R² value. The Residual Distribution shows a symmetric, unimodal shape centered around zero, with prediction errors ranging from approximately −0.2 kW to 0.2 kW, indicating minimal bias and high predictive accuracy for Utile Power across the test set samples. The median residuals are near zero across all categories, with wider spreads for higher-influence features, suggesting that highly influential features contribute to slightly larger but still manageable errors. The Predicted vs. Actual Utile Power Over Test Set plot compares actual and predicted values across the test samples, and a histogram of residuals shows the distribution of prediction errors. The learning curve for PIRF–MLP (Adaptive Weights) shows that both training and validation MSE decrease sharply at first with increasing training set size and stabilize at around 0.02 kW² after 100 samples, indicating good model convergence. The residuals vs. predicted values plot for Utile Power shows a random scatter around the zero residual line across the predicted range of 2.0 to 5.5 kW, suggesting no significant bias or systematic error in the model predictions. The Feature Importance (Random Forest) plot indicates that Speed and Force exhibit the highest contributions, reflecting their critical roles in the mechanical and operational context, while the interaction term (Speed × Force) also shows significant influence, suggesting a non-linear relationship. Intensity, Apparent Power, and Active Power contribute moderately, in line with their roles in the electrical domain.
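The residual diagnostics described above rest on a few simple summary statistics. The sketch below shows one way they might be computed before plotting; the sample values and the function name `residual_summary` are made up for illustration, and the ±0.2 kW band merely mirrors the range quoted in the text.

```python
import statistics


def residual_summary(actual, predicted, band_kw=0.2):
    """Summarise prediction errors (actual - predicted), all in kW."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    within = sum(abs(r) <= band_kw for r in residuals)
    return {
        "mean": statistics.mean(residuals),      # near zero: little bias
        "median": statistics.median(residuals),  # robust centre of errors
        "min": min(residuals),
        "max": max(residuals),
        "share_within_band": within / len(residuals),
    }


# Illustrative values only, not data from the study.
actual = [2.0, 2.8, 3.5, 4.1, 5.5]
predicted = [2.1, 2.7, 3.5, 4.2, 5.4]

summary = residual_summary(actual, predicted)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

A mean and median close to zero, together with a high share of residuals inside the band, correspond to the symmetric, zero-centred distribution described for the histogram.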
For an input with Speed = 118 rpm, Force = 40.315 daN, and the other features, the RF predicts a Utile Power of 0.39 kW and the MLP predicts 0.41 kW; the VotingRegressor, validated with 5-fold cross-validation, averages the RF and MLP predictions and outputs 0.40 kW as the final prediction. Overall, the visualizations demonstrate the robustness of the PIRF–MLP model, with high predictive accuracy and well-distributed errors, validated across the dataset. These findings suggest that, for high-stakes predictive tasks in machine learning, hybrid models like PIRF–MLP should be prioritized when computational resources permit, given their superior accuracy and robustness. Future research could explore hyperparameter optimization, larger datasets, or additional features (temperature or tool wear) to further refine these models, potentially extending their applicability to real-time industrial monitoring systems.
To evaluate their adaptability and scalability across different machining platforms, the PIRF–MLP model and the novel spindle braking device, originally developed for the FV 32 milling machine, were adapted, mounted, and tested on an additional machine tool, a conventional SNA 560 × 2000 lathe, where they demonstrated robust performance. The braking device was integrated into the spindle system of the lathe, maintaining precise control over frictional forces with minimal vibrations across a speed range of 50–1000 rpm. Under loaded conditions, 100 observations were collected, with cutting forces ranging from 10 to 150 daN and spindle speeds consistent with standard lathe operations. With an R² of 0.9765, an MSE of 0.0351 kW², and an RMSE of 0.1873 kW, the PIRF–MLP model trained on this dataset performed similarly to the model for the milling machine (R² = 0.9789). According to the feature importance analysis, Speed (0.139) had a marginally greater influence than in milling, owing to the wider speed range of the lathe, although Apparent Power (0.542) and Intensity (0.215) remained the leading predictors of Utile Power. With a mean R² of 0.9752 and a low standard deviation (0.0081), the 5-fold cross-validation confirmed consistent performance and strong generalizability. Operational differences were minimal, with the braking device maintaining idle power at 0.35–0.95 kW, comparable to the milling results but slightly lower due to the simpler spindle dynamics of the lathe.
4. Discussion
The work contributes to the field by developing and comparing multiple machine learning approaches. The dataset was first split into a training set (80%) and a test set (20%) using train_test_split with random_state = 42 for reproducibility, and the models were then evaluated with 5-fold cross-validation; preprocessing was fitted only on the training set to avoid data leakage. StandardScaler was the primary scaling method, as it suits the Gaussian-like distribution of many features and supports the hybrid structure of PIRF–MLP. MinMaxScaler was also tested for comparison, but its use was likely limited to specific models or sensitivity analyses, given the preference for standardization in ensemble and neural network contexts. The preprocessing ensured that the Random Forest component of PIRF–MLP leveraged normalized feature importance (Apparent Power at 0.559) and that the MLP optimized its weight updates, contributing to the high performance of the model. No automatic outlier removal (such as Z-score thresholding beyond 3 standard deviations) was universally applied, because with a dataset of this size losing data could reduce model generalizability. Instead, the Random Forest component, known for its robustness to outliers, helped mitigate their impact during training, while the MLP component was supported by the scaled data to minimize distortion. Post-correction, the performance of the model improved, as evidenced by the residual plots and error distribution, confirming that outlier treatment enhanced prediction accuracy without necessitating extensive data truncation.
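The split, scaling, and voting steps described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the script used in the study: the data are synthetic, only two of the six features (Speed and Force) are included, and the target formula is an arbitrary assumption. The hyperparameters and the names rf_base and mlp_base follow the text. Wrapping the scaler and the ensemble in one pipeline is one way to guarantee that the scaler is fitted on training data only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the measurements from the study).
rng = np.random.default_rng(42)
n_samples = 200
speed = rng.choice([37.5, 118.0, 235.0, 475.0], size=n_samples)  # rpm
force = rng.uniform(16.5, 164.7, size=n_samples)                 # daN
X = np.column_stack([speed, force])
# Assumed target: a Speed x Force interaction plus noise, in kW.
y = 1e-4 * speed * force + rng.normal(0.0, 0.05, size=n_samples)

# 80/20 split with a fixed seed, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Base learners with the hyperparameters quoted in the text.
rf_base = RandomForestRegressor(n_estimators=50, random_state=42)
mlp_base = MLPRegressor(hidden_layer_sizes=(50, 25), activation="relu",
                        max_iter=500, random_state=42)
ensemble = VotingRegressor(estimators=[("rf", rf_base), ("mlp", mlp_base)])

# The pipeline fits StandardScaler on the training data only, then reuses
# the same mean and scale to transform the test data (no leakage).
model = make_pipeline(StandardScaler(), ensemble)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"MSE = {mean_squared_error(y_test, y_pred):.4f} kW^2")
print(f"R2  = {r2_score(y_test, y_pred):.4f}")
```

The same pipeline object can be passed to a cross-validation routine, in which case the scaler is refitted inside every fold and the no-leakage property is preserved.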
Hyperparameter tuning significantly improved model performance compared to default settings. The untuned Random Forest (default scikit-learn parameters) achieved an R² of 0.9756, while the tuned version reached 0.9772. Similarly, the tuned PIRF–MLP improved R² from 0.9778 (default) to 0.9862, highlighting the importance of optimization. The tuning process also revealed that feature scaling (StandardScaler) was critical for SVR and MLP, while Random Forest was less sensitive. The selected hyperparameters balanced model complexity and generalization, ensuring reliable predictions for Utile Power across varied milling conditions. This tuning methodology enhances the credibility of the results by demonstrating a systematic approach to model optimization, ensuring that the reported performance metrics reflect the best possible configurations for the given dataset.
Another technique for data analysis, RF + GB, combines Random Forest with Gradient Boosting to correct errors sequentially, whereas PIRF–MLP integrates Random Forest (50 trees) with a two-layer MLP (50, 25 neurons) via a VotingRegressor that averages their predictions. By using the non-linear learning capacity of MLP alongside the ensemble diversity of RF, PIRF–MLP offers a dual probabilistic approach, unlike the sequential error correction of RF + GB. This results in an R² of 0.9862 and an MSE of 0.0242 kW², potentially outperforming the R² of 0.975 obtained with RF + GB, as the MLP refines complex patterns (Speed–Force interactions) that boosting might miss. The voting mechanism provides stability across varied inputs, a feature less emphasized in RF + GB, which strengthens robustness validation. Another variant, the Stacking Ensemble (Random Forest + Support Vector Machine + Multi-Layer Perceptron), uses a meta-learner to combine predictions from diverse base models, while PIRF–MLP directly fuses the RF and MLP outputs without a separate meta-model. The streamlined hybrid design of PIRF–MLP reduces computational overhead compared to stacking, achieving an Adjusted R² of 0.9862 with only two components. Stacking might yield a similar R² but requires more complex tuning and risks overfitting with multiple models, whereas PIRF–MLP balances simplicity and performance.
Regardless of the temperature in the braking area, the axial loading force was applied at values that produced the torsional moment required by the experimental program. While the current dataset lacks temperature data, future studies will incorporate temperature sensors to quantify these effects, enabling the development of a temperature-compensated model or the inclusion of thermal variables in the feature set to ensure long-term reliability in industrial milling environments.
The braking system reduces idle power by 41–72% (to 0.20–0.28 kW), while MQL lowers cutting power by 10–20% [13]. MQL minimizes coolant use, and the braking system cuts electrical consumption. A hybrid approach combining both could optimize total energy and machining performance. While the PIRF–MLP model demonstrates superiority over Random Forest, with an R² of 0.9862 compared to 0.9772, the modest improvement in predictive accuracy must be balanced against its higher computational cost, particularly for real-time industrial applications where processing speed and resource constraints are critical. The hybrid PIRF–MLP model highlights the potential of ensemble techniques in capturing complex non-linear relationships. With an R² of 0.9862, the PIRF–MLP model in the current study achieves a higher predictive accuracy than the Decision Tree and Taguchi approach of Nugrahanto et al. [11], which does not report specific accuracy metrics but focuses on optimizing spindle speed for energy efficiency in five-axis CNC milling. It also achieves a higher predictive accuracy than the Random Forest-based approach of Brillinger et al. [30], which does not report specific R² values but leverages real production data for practical CNC energy predictions.