1. Introduction
The transition to intelligent manufacturing demands higher machining accuracy. As the core component of CNC machine tools, the motorized spindle is a primary source of thermal error, and thermal error accounts for 40–70% of the total machining error [1,2,3]. During prolonged operation, the motorized spindle undergoes a significant temperature rise, which degrades machining accuracy and increases maintenance and operating costs. Consequently, accurately predicting and compensating for the thermal error of motorized spindles has become a critical challenge in improving machining precision.
Thermal error compensation is an effective way to mitigate thermal errors and improve the machining accuracy of machine tools [4,5,6]. Its success depends on reducing redundancy and collinearity in the temperature data and on building a thermal error prediction model that is both accurate and robust. Although prediction models have advanced, accurately predicting thermal errors remains challenging because of the complex nonlinear relationships between multiple temperature points and spindle thermal behavior. Existing models often struggle to capture these complexities or require substantial computational resources, making it difficult to balance predictive performance and computational efficiency.
Researchers have conducted extensive studies on thermal error prediction and compensation for motorized spindles [7,8,9,10], with physical modeling and data-driven methods dominating the field. Physical modeling methods rely on an in-depth understanding of heat source distribution, heat transfer mechanisms, and environmental conditions; they explain how thermal errors arise and are suitable for relatively simple operating conditions [11]. However, their applicability is significantly limited under dynamic and complex working conditions. In contrast, data-driven methods build predictive models by learning patterns from experimental data without explicitly establishing heat transfer equations, and they offer strong nonlinear modeling capabilities [12,13,14]. In recent years, the rapid advancement of deep learning has led to its widespread adoption in industry [15,16,17]. Neural network-based thermal error prediction models have gained attention and achieve high predictive accuracy owing to their ability to model temporal and spatial features. However, existing data-driven methods still face several challenges: temperature points are often selected by manual experience or simple statistical methods, which leaves redundant input features, and hyperparameter tuning is inefficient, which increases the computational cost of the models [18,19,20,21]. Furthermore, the generalization ability and stability of the models under multiple operating conditions need further improvement.
This study makes three primary contributions to spindle thermal error prediction modeling. First, for feature selection, it proposes PSO-HDBSCAN, which combines particle swarm optimization (PSO) with adaptive density clustering (HDBSCAN). This combination is employed because HDBSCAN can identify clusters of varying density and isolate noise in temperature field data, while PSO efficiently optimizes its key parameters so that the selected temperature clusters exhibit maximum correlation with the thermal error. The method identifies the temperature points that most strongly affect thermal error, reducing temperature data redundancy and improving the quality of input features. Second, for model architecture and optimization, this study develops the RBMO-X-DLTK hybrid convolutional neural network model, in which the RBMO-X algorithm, an enhanced optimizer designed for robust global search and stable convergence in high-dimensional spaces, tunes the hyperparameters of the complex DLTK network. The DLTK network integrates multiple advanced modules (e.g., 1D deformable convolution, LSTM, Transformer, Fourier KAN) for comprehensive feature extraction. The proposed RBMO-X optimizer is novel in its hybrid design, which integrates deep learning training stabilizers (the SGDR scheduler and gradient clipping) directly into the metaheuristic loop (see Section 4.1); this enables stable tuning and contributes to the model’s superior performance, including an RMSE reduction of more than 30% compared with benchmark models. Third, as the integrated outcome, the core innovation of this paper is a validated pipeline that couples an optimized temperature-sensing strategy with a specially designed and optimized hybrid neural network, systematically addressing both efficient feature selection and robust prediction and thereby achieving superior accuracy and robustness in spindle thermal error prediction.
The remainder of the paper is organized as follows:
Section 2 reviews related work.
Section 3 proposes a temperature selection method based on PSO-optimized HDBSCAN.
Section 4 presents the RBMO-X optimized DLTK hybrid neural network.
Section 5 details the experimental setup, analyzes the performance of temperature point selection and optimization, and compares the results with other models.
Section 6 validates the model through real-world compensation experiments.
Section 7 discusses the experimental results in detail.
Section 8 concludes this study and suggests potential future work.
2. Related Work
Efficient selection of temperature points highly correlated with thermal errors is critical for enhancing the performance of prediction models. In recent years, researchers have extensively studied temperature point selection, the application and optimization of deep learning, and adaptability under dynamic conditions. Early temperature point selection methods focused on improving clustering algorithms, such as K-Harmonic Means (KHM) [22], the Correlation Coefficient Variation Determination Factor (CCVDF) [23], and Synthetic Temperature Information (STI) [24]. While these methods improved selection accuracy to some extent, they often suffer from sensitivity to initial conditions, weak adaptability to dynamic conditions, or high computational complexity, which limits their real-time application.
The introduction of optimization algorithms, such as the Binary Bat Algorithm (BBA) [25] and the Improved Binary Grasshopper Optimization Algorithm (IBGOA) [26], marked a breakthrough by directly optimizing feature subsets for prediction accuracy. However, these methods often struggle with computational efficiency in high-dimensional spaces or tend to converge to local optima, leading to unstable selection results.
While the aforementioned studies have advanced the field, they reveal several common challenges. Methods based on static statistical analysis or conventional clustering often lack adaptability to dynamic thermal conditions and may select redundant or pseudo-correlated points. Optimization-based methods improve accuracy but remain computationally costly in high-dimensional spaces and prone to local optima. Furthermore, the selection process is often decoupled from the final prediction model and lacks a unified optimization objective.
The development of deep learning has brought groundbreaking advancements to thermal error modeling. For example, Zhao et al. [27] developed a thermal error prediction method for the ball screw feed system of CNC machine tools that predicts the heat generation rate, temperature distribution, and thermal error of the ball screw feed drive system. Their Adaptive Real-Time Model (ARTM) predicts the thermal error of the ball screw feed drive system but is limited in addressing the complexity of spindle thermal error. To enhance model performance, researchers have begun integrating deep learning with optimization algorithms. For instance, Li et al. [28] optimized BP neural network parameters using Improved Particle Swarm Optimization (IPSO), which improved prediction accuracy but still suffered from high computational complexity and a tendency to fall into local optima. To overcome these shortcomings, Li et al. [29] used the Beetle Antennae Search (BAS) algorithm to optimize a BP neural network-based thermal error prediction model for motorized spindles; experiments confirmed that the BAS-BP model achieved higher prediction accuracy than BP and GA-BP models at different speeds. However, these methods primarily focus on single-model optimization and neglect systematic study of input feature selection and adaptability to dynamic conditions. To address this, Li et al. [30] proposed an Extreme Learning Machine optimized by the Marine Predators Algorithm (MPA-ELM) to predict thermal displacement in motorized spindles. They compared the accuracy of ELM, MPA-ELM, and GA-ELM (Genetic Algorithm-optimized ELM), and the experimental data demonstrated that MPA-ELM achieved superior prediction accuracy. However, input feature selection remains inefficient.
To enhance the adaptability and robustness of deep learning models under dynamic conditions, Gao et al. [31] optimized the LSTM network using PSO, significantly improving the predictive performance of thermal error models. Compared with traditional RBF and BP models, the PSO-LSTM model demonstrated superior accuracy and robustness, particularly in environments with strong nonlinear dynamic characteristics. However, as model complexity increases, this method becomes heavily reliant on hyperparameter selection, limiting its scalability under diverse dynamic conditions. To further address the limitations of temperature point selection under dynamic conditions, Du et al. [32] developed a thermal error prediction model based on temperature-sensitive point recognition using an attention mechanism. This model adaptively weights the importance of different temperature points, avoiding the manual temperature point selection of traditional methods and significantly improving prediction flexibility. However, owing to the computational cost of the attention mechanism, this method is less suited to scenarios with strict real-time requirements, and the interpretability of the attention weights requires further investigation. Wu et al. [33] combined CNNs with thermal images and thermocouple data to model and predict the temperature field under dynamic conditions, demonstrating high accuracy and robustness. However, this method relies on high-quality thermal image input, making it susceptible to image noise and hardware accuracy, which limits its application in low-cost environments. To address the spatiotemporal characteristics of thermal error, Guo et al. [34] proposed a Spatiotemporal Correlation Hybrid Model (ST-CLSTM) based on CNN and LSTM. However, this method risks performance degradation on long time series and is heavily reliant on hyperparameter tuning.
In the direction of model optimization, Gao et al. [35] used the Pelican Optimization Algorithm (POA) to optimize a CNN-LSTM model, enhancing its performance under multiple working conditions. POA effectively improved the model’s adaptability and robustness under complex conditions. However, owing to the algorithm’s random search characteristics, its convergence is slow, reducing its efficiency in large-scale data scenarios. Additionally, Li et al. [36] proposed GWO-LSSVM thermal error modeling, which exhibited higher accuracy and modeling efficiency than traditional models. However, this method primarily targets static conditions and is sensitive to input feature fluctuations under dynamic conditions, limiting its performance in practical applications. To address these issues, Fu et al. [37] proposed thermal error modeling based on a CNN optimized by the Mayfly Algorithm, achieving accurate spindle thermal error prediction under various conditions, although balancing selection efficiency and prediction performance on high-dimensional feature data requires further optimization. For modeling in complex dynamic environments, Yang et al. [38] proposed a CNN-GRU model combined with a Subtractive Averaging Based Optimizer (SABO), which demonstrated superior performance across multiple error metrics (e.g., MAE, RMSE). However, as model complexity increased, training time grew significantly, and the high preprocessing requirements for input data raised the deployment cost. Meanwhile, Dai et al. [39] proposed a CS-Elman model optimized by the Cuckoo Search algorithm, which further improved prediction stability. However, the CS-Elman model has high computational complexity, and the limited local search capability of the algorithm constrains further improvement.
Despite their success, existing deep learning models for thermal error prediction exhibit notable shortcomings. Models that heavily rely on manual hyperparameter tuning incur significant computational overhead. While complex hybrid architectures capture spatiotemporal features, their performance is sensitive to input fluctuations and often requires extensive data preprocessing. Moreover, many studies focus on either improving the model architecture or the input features, neglecting the synergistic potential of co-optimizing both within an end-to-end framework.
Most recently, research in 2025 has begun to emphasize the need for integrated solutions that concurrently address efficient feature selection and robust, adaptive model design to tackle the complex spatiotemporal characteristics of thermal error under dynamic conditions [40]. This trend toward co-design highlights the importance of a systematic pipeline but also indicates that achieving an optimal balance between accuracy, efficiency, and generalization remains an open challenge.
In summary, significant progress has been made in motorized spindle thermal error modeling in areas such as temperature point selection, deep learning model optimization, and adaptability to dynamic conditions. However, a critical synthesis of the literature, informed by the limitations discussed above, reveals three interconnected research gaps that limit the development of a robust and efficient prediction system:
- (1) Existing methods exhibit a trade-off between selection efficiency and accuracy when handling high-dimensional complex feature data, failing to fully capture the nonlinear relationships between temperature points and thermal errors. Additionally, some optimization algorithms are prone to local optima in large-scale data scenarios, leading to unstable selection results.
- (2) While deep learning methods (e.g., LSTM, CNN-LSTM, ST-CLSTM) have demonstrated excellent performance in thermal error modeling, there is still significant room for improvement in module collaboration and hyperparameter optimization. The design and tuning of complex models often require substantial computational resources, increasing the difficulty of practical applications.
- (3) Existing models achieve high predictive accuracy under specific conditions but lack adaptability under multi-condition scenarios. In particular, their predictive performance is highly susceptible to input feature fluctuations in dynamic thermal environments.
This study is designed to bridge these gaps cohesively. The integration of metaheuristic optimization algorithms with clustering techniques, such as PSO with K-Means or GA with DBSCAN, has been explored in other fields for feature selection and pattern recognition. However, within motorized spindle thermal error modeling, the tailored integration of PSO with HDBSCAN for selecting optimal temperature-sensitive points remains underexplored. This gap matters because the PSO-HDBSCAN combination is particularly suited to the challenges above: HDBSCAN’s ability to identify clusters of varying density and reject noise directly tackles redundant and pseudo-correlated temperature points under dynamic conditions, while PSO’s parameter optimization ensures that the clustering outcome is explicitly guided by thermal error correlation, improving selection efficiency and accuracy. Addressing these issues requires an efficient and robust temperature point selection method together with a deep learning modeling and optimization strategy that balances accuracy and efficiency, which is the focus of this study. The proposed PSO-HDBSCAN method achieves efficient temperature point selection, and the proposed RBMO-X-DLTK model uses RBMO-X to optimize the hyperparameters of DLTK, further improving the model’s predictive accuracy and computational efficiency. Together, these components form a novel, integrated pipeline that directly addresses the gaps in efficient feature selection, optimized model architecture, and systemic integration, offering a robust and efficient solution for spindle thermal error prediction.
3. PSO-HDBSCAN Temperature Point Selection Method
To address the temperature point selection problem, we propose a novel PSO-HDBSCAN method. It leverages Particle Swarm Optimization (PSO) to optimize the key parameters of the HDBSCAN clustering algorithm, ensuring the selected temperature points are highly correlated with thermal error while minimizing data redundancy.
Particle Swarm Optimization (PSO) is a swarm-intelligence-based global optimizer for complex multi-dimensional problems [41,42,43]. PSO is chosen for its efficiency in continuous parameter optimization and its stable convergence properties, which suit this task well. The standard PSO procedure adapted for our parameter tuning is outlined in Algorithm 1.
| Algorithm 1 PSO (, , ) |
| 1: Initialize_Swarm: |
| 2: for each particle i to do |
| 3: |
| 4: ← random_velocity () |
| 5: |
| 6: |
| 7: end for |
| 8: |
| 9: while termination criterion not met do |
| 10: for each particle to do |
| 11: Update_Velocity_Position: |
| 12: |
| 13: |
| 14: |
| 15: |
| 16: if then |
| 17: |
| 18: |
| 19: |
| 20: end if |
| 21: end for |
| 22: |
| 23: |
| 24: end while |
| 25: return |
Algorithm 1 details the standard Particle Swarm Optimization (PSO) procedure. It operates on a swarm of candidate solutions (particles), where each particle i has a position x_i (a potential solution in the search space) and a velocity v_i. The algorithm maintains two memory traces for each particle, its personal best position to date (p_i) and the corresponding best fitness (f(p_i)), while the swarm shares a global best position (g) found by any particle. The core iteration (Lines 10–25) updates each particle’s velocity by combining its previous momentum (weighted by the inertia coefficient ω), a cognitive component toward p_i, and a social component toward g (Line 13). The new position is then computed (Line 14), and after its fitness is evaluated (Line 16), p_i and g are updated if improvements are detected (Lines 18–20, 23–24). This process repeats, driving the swarm toward promising regions of the search space until a termination criterion is satisfied.
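For readers who want to experiment, the loop in Algorithm 1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it maximizes the objective (matching the correlation objective used later in Algorithm 3), applies the update v_i ← ω v_i + c1 r1 (p_i − x_i) + c2 r2 (g − x_i) followed by x_i ← x_i + v_i, clips positions to the search bounds, and defaults to the ω, c1, and c2 values reported in Section 5.1. The function name `pso` and the velocity initialization are our own choices.

```python
import numpy as np


def pso(fitness, lower, upper, n_particles=20, max_iter=100,
        w=0.729, c1=1.494, c2=1.494, seed=0):
    """Minimal PSO that maximizes `fitness` inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim, span = lower.size, upper - lower
    # Random initial positions; small random initial velocities (an assumption).
    x = lower + rng.random((n_particles, dim)) * span
    v = (rng.random((n_particles, dim)) - 0.5) * span * 0.1
    p = x.copy()                                    # personal best positions p_i
    p_fit = np.array([fitness(xi) for xi in x])     # personal best fitness f(p_i)
    g_idx = int(np.argmax(p_fit))
    g, g_fit = p[g_idx].copy(), p_fit[g_idx]        # global best g
    for _ in range(max_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + cognitive pull toward p_i + social pull toward g.
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)            # keep particles inside the bounds
        fit = np.array([fitness(xi) for xi in x])
        improved = fit > p_fit                      # update personal bests
        p[improved], p_fit[improved] = x[improved], fit[improved]
        if p_fit.max() > g_fit:                     # update the global best
            g_idx = int(np.argmax(p_fit))
            g, g_fit = p[g_idx].copy(), p_fit[g_idx]
    return g, float(g_fit)
```

For example, `pso(lambda z: -np.sum((z - 1.0) ** 2), [-5, -5], [5, 5])` should return a position close to (1, 1).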
HDBSCAN, an improved version of the density-based clustering algorithm DBSCAN [44], automatically determines the number of clusters and identifies noise. This makes it suitable for adaptively selecting temperature-sensitive points from our dataset. Its procedure is detailed in Algorithm 2.
| Algorithm 2 HDBSCAN (, , hierarchy) |
| 1: |
| 2: for each point in data points do |
| 3: distance to k-th nearest neighbor of |
| 4: end for |
| 5: |
| 7: for each pair of points do |
| 8: |
| 9: end for |
| 10: Construct graph where edge weight |
| 11: |
| 12: |
| 13: hierarchy |
| 14: |
| 15: clusters, noise |
| 16: return clusters, noise |
Algorithm 2 details the HDBSCAN procedure for clustering temperature points. It first computes the core distance of each point, i.e., the distance to its k-th nearest neighbor, to measure local density (Lines 1–4). Using these core distances, the mutual reachability distance between every pair of points is calculated to form a density-adjusted graph, from which a minimum spanning tree (MST) is built (Lines 7–11). The MST is then converted into a hierarchical cluster tree (hierarchy) (Lines 12–13). Finally, the most stable clusters are extracted from this hierarchy, and the remaining points are labeled as noise (Lines 14–16). This process automatically groups points of similar density and separates outliers, reducing redundancy and selecting representative temperature-sensitive points for the thermal error model.
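The density-related steps of Algorithm 2 (core distances, mutual reachability distances, and the minimum spanning tree) can be sketched with NumPy and SciPy. This is a minimal sketch that assumes Euclidean distance and does not count a point as its own neighbor (HDBSCAN implementations differ on that convention); the hierarchy construction and stable-cluster extraction (Lines 12–16) are left to a full library, and the function name `mutual_reachability_mst` is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def mutual_reachability_mst(X, k):
    """Core distances, mutual reachability matrix, and its MST for points X."""
    d = cdist(X, X)                               # pairwise Euclidean distances
    # Core distance: distance to the k-th nearest neighbor
    # (column 0 of each sorted row is the point itself, at distance 0).
    core = np.sort(d, axis=1)[:, k]
    # d_mreach(a, b) = max(core_k(a), core_k(b), d(a, b))
    mreach = np.maximum(d, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)                 # zero entries mean "no edge" (no self-loops)
    mst = minimum_spanning_tree(mreach)           # sparse matrix with n - 1 edges
    return core, mreach, mst
```

Cutting the heaviest MST edges separates dense groups: on two well-separated blobs, the single heaviest edge of the tree is the one bridging them, which is what the hierarchy in Lines 12–13 records.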
The PSO-HDBSCAN integration dynamically optimizes key HDBSCAN parameters (e.g., minimum samples and minimum cluster size) to maximize the correlation between the identified temperature clusters and the thermal error. This integrated process, illustrated in Figure 1, is formalized in Algorithm 3.
| Algorithm 3 PSO-HDBSCAN (, , |
| 1: Initialize: |
| 2: |
| 3: Define parameter bounds |
| 4: for do |
| 5: |
| 6: |
| 7: |
| 8: |
| 9: end for |
| 10: |
| 11: |
| 12: while do |
| 13: for do |
| 14: |
| 15: |
| 16: |
| 17: |
| 18: if then |
| 19: if then |
| 20: |
| 21: |
| 22: |
| 23: |
| 24: |
| 25: end for |
| 26: |
| 27: end while |
| 28: |
| 29: |
| 30: return , |
Algorithm 3 describes the integrated PSO-HDBSCAN optimization. The swarm is initialized with random HDBSCAN parameters within the defined bounds (lines 4–11). In each iteration, every particle’s position (a candidate parameter set) is used to cluster the temperature data with HDBSCAN, and the fitness is computed as the maximum Pearson correlation between any cluster and the thermal error (lines 15–17). Personal and global best positions are updated when a higher fitness is found (lines 19–21). The swarm then updates velocities and positions with the standard PSO equations, keeping the parameters within the search bounds (lines 23–28). After the maximum number of iterations, the algorithm returns the optimal parameter set and the corresponding temperature cluster (lines 28–30). This pipeline systematically tunes HDBSCAN to select the most representative temperature-sensitive points.
The PSO-HDBSCAN framework reduces the dependence on manual parameter tuning by optimizing the HDBSCAN parameters automatically. The optimization is guided by a fitness function that directly maximizes the Pearson correlation between temperature clusters and thermal error, making the selection objective and data-driven. Reasonable search bounds for the parameters are defined in Section 5.1.
5. Experimental Results and Comparative Analysis
In the previous sections, the PSO-HDBSCAN temperature point selection method and the hybrid convolutional neural network RBMO-X-DLTK were proposed for spindle thermal error prediction. These methods aim to improve prediction accuracy and reduce data redundancy by filtering and optimizing complex temperature time-series data. To validate their effectiveness, this section presents a series of experiments that evaluate the model’s prediction accuracy, generalization ability, and computational efficiency.
5.1. Experimental Design
First, the clustering performance of the PSO-HDBSCAN algorithm is evaluated. In this process, the Particle Swarm Optimization (PSO) algorithm serves as the optimizer with a fixed, well-established configuration. Its task is to find the optimal set of parameters for the HDBSCAN clustering algorithm within predefined search bounds, maximizing the Pearson correlation between temperature clusters and the thermal error. The configuration of the PSO optimizer and the search bounds for the HDBSCAN parameters are detailed in Table 2 and Table 3.
The parameter set for the PSO optimizer (ω = 0.729, c1 = c2 = 1.494) is the standard configuration derived from the constriction factor model, which is widely adopted because it promotes convergence while balancing exploration and exploitation [41]. The swarm size of 20 is a common choice for medium-scale optimization problems and balances swarm diversity against computational cost. A maximum of 100 iterations was confirmed to be sufficient for convergence in our preliminary tests, where the fitness (i.e., Pearson correlation) stabilized well before this limit. The performance of the overall PSO-HDBSCAN method is primarily sensitive to the resulting HDBSCAN parameters (which we optimize) rather than to the specific PSO configuration within reasonable defaults, so using these standard PSO parameters keeps the optimization process itself from being a source of instability or variance in the feature selection results.
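The PSO constants above can be traced to the constriction-factor model of Clerc and Kennedy, in which χ = 2 / |2 − φ − √(φ² − 4φ)| with φ = c1′ + c2′ > 4. With the common choice c1′ = c2′ = 2.05, χ ≈ 0.7298 serves as the inertia weight and χ · 2.05 as each acceleration coefficient; rounding χ to 0.729 gives 0.729 × 2.05 ≈ 1.494, the value used here. The short check below is illustrative, and the function name is our own.

```python
import math


def constriction_coefficient(c1: float = 2.05, c2: float = 2.05) -> float:
    """Clerc-Kennedy constriction coefficient chi; requires phi = c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("phi = c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


chi = constriction_coefficient()   # about 0.7298, used as the inertia weight
c_scaled = 0.729 * 2.05            # about 1.494, used for c1 = c2
print(f"chi = {chi:.4f}, c1 = c2 = {c_scaled:.3f}")
```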
The search ranges for HDBSCAN parameters are set to cover typical operational values while avoiding pathological settings. The Minimum Samples and Minimum Cluster Size range from 2 to 20, which is appropriate for identifying meaningful clusters from our temperature sensor data without forming trivial or overly broad groups. The Density Propagation and Cluster Separation parameters use ranges (0.5–2.0 and 0.0–0.5, respectively) recommended in the HDBSCAN literature [44] to allow flexible adaptation to the varying density characteristics of the thermal field data.
Next, the RBMO-X-optimized DLTK model is built for thermal error prediction based on the selected optimal temperature points. The hyperparameter search space for the DLTK network is specified in Table 4.
The bounds for the DLTK network hyperparameters are designed to encompass commonly effective ranges for deep learning models while remaining narrow enough for efficient optimization. For instance, the learning rate range (1 × 10⁻⁵ to 1 × 10⁻²) covers typical values from fine-tuning to aggressive learning. The ranges for architectural hyperparameters (e.g., LSTM dimensions: 32–128; Transformer heads: 2–8) are chosen based on the complexity of the thermal error prediction task and common model sizes in related sequence modeling studies [49,50]. Including common loss functions (RMSE, MSE, MAE) allows the optimizer to select the most suitable objective for error regression.
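One way a metaheuristic such as RBMO-X can search a mixed space like this is to move in a continuous unit hypercube and decode each position into concrete hyperparameters. The sketch below covers only the four hyperparameters quoted in this paragraph; the decoding scheme (log-uniform learning rate, rounding for integers, bucketing for the loss choice) and the names `DLTKHyperparams` and `decode_hyperparams` are hypothetical, since the full Table 4 and the RBMO-X encoding are not reproduced here.

```python
from dataclasses import dataclass

LOSSES = ("RMSE", "MSE", "MAE")


@dataclass
class DLTKHyperparams:  # hypothetical container for a subset of the search space
    learning_rate: float
    lstm_dim: int
    n_heads: int
    loss: str


def decode_hyperparams(u):
    """Map a position u in [0, 1]^4 to hyperparameters inside the stated bounds."""
    u = [min(max(float(x), 0.0), 1.0) for x in u]              # clip to the unit cube
    learning_rate = 10.0 ** (-5.0 + 3.0 * u[0])                # log-uniform, 1e-5 .. 1e-2
    lstm_dim = int(round(32 + u[1] * (128 - 32)))              # 32 .. 128
    n_heads = int(round(2 + u[2] * (8 - 2)))                   # 2 .. 8
    loss = LOSSES[min(int(u[3] * len(LOSSES)), len(LOSSES) - 1)]  # one of three losses
    return DLTKHyperparams(learning_rate, lstm_dim, n_heads, loss)


print(decode_hyperparams([0.5, 0.5, 0.5, 0.5]))
```

Log-scaling the learning rate gives each decade from 10⁻⁵ to 10⁻² equal search volume. In a real network, the number of attention heads must also divide the embedding width (as `torch.nn.MultiheadAttention` requires), a constraint this sketch does not enforce.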
Subsequently, the optimized RBMO-X-DLTK model will be compared with other thermal error prediction models using various evaluation metrics.
All experiments and model comparisons were conducted on a consistent hardware setup (Intel i9-13700K, NVIDIA RTX 4090) using Python 3.12.1 with PyTorch 2.2.0, ensuring fair evaluation.
The design of the experimental setup and data acquisition process follows the ISO 230-3 standard [54], which provides guidelines for evaluating the thermal characteristics of machine tools. The experimental dataset consists of multiple sets of temperature measurements and corresponding spindle thermal error data collected from a CNC machine tool. It includes 16 temperature sensors, seven of which are distributed along the central axis of the motorized spindle. The sensor layout is shown in Figure 5, and the numbering and positioning of the sensors are as follows:
T0 represents the temperature inside the front bearing, T1 the temperature at the end face of the front bearing housing, T2 and T3 the temperatures on both sides of the front bearing housing, T4 the temperature on the outer surface of the front bearing, T12 the temperature inside the rear bearing, and T13 the temperature outside the rear bearing. Seven further sensors are located at the water inlet and outlet of the front bearing, the water inlet and outlet of the motor, and the front, middle, and rear sections of the cooling jacket. The remaining two sensors measure the temperature of the worktable (T14) and the surrounding environment (T15).
For clarity and quick reference, the complete specifications of all temperature sensors are summarized in Table 5.
Thermal error was measured with a laser displacement sensor that monitored the Z-direction displacement of the spindle end face in real time at a sampling frequency of 1 Hz. Temperature and displacement data were collected synchronously through the machine tool system, forming a timestamp-aligned dataset.
We collected 4682 sets of synchronized spindle thermal error and temperature data at one-minute intervals, capturing minute-by-minute error changes under different conditions. To ensure data consistency and applicability, all data were normalized and randomly divided into training, validation, and test sets in an 8:1:1 ratio. Each temperature point reflects the temperature variation at a different location. The PSO-HDBSCAN algorithm is used to select the temperature points most highly correlated with spindle thermal error, and these points serve as the input features of the RBMO-X-DLTK model for subsequent training and prediction. The variation of the collected temperature data over measurement time is shown in Figure 6, and that of the spindle thermal error data in Figure 7.
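The preprocessing described above can be sketched with NumPy. The text only says that the data were normalized, so min-max scaling to [0, 1] is an assumption here, as are the function names and the fixed random seed; the split follows the stated 8:1:1 ratio, and normalization is applied before splitting, in the order described.

```python
import numpy as np


def minmax_scale(a):
    """Scale each column of `a` to [0, 1]; constant columns map to 0."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / np.where(hi > lo, hi - lo, 1.0)


def normalize_and_split(X, y, ratios=(0.8, 0.1, 0.1), seed=42):
    """Normalize features and targets, then randomly split them into train/val/test."""
    Xn = minmax_scale(np.asarray(X, float))
    yn = minmax_scale(np.asarray(y, float))
    idx = np.random.default_rng(seed).permutation(len(Xn))
    n_train = int(ratios[0] * len(Xn))
    n_val = int(ratios[1] * len(Xn))
    # The test set takes whatever remains after the train and validation slices.
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (Xn[train], yn[train]), (Xn[val], yn[val]), (Xn[test], yn[test])
```

With 4682 samples, this sketch yields 3745 training, 468 validation, and 469 test samples. Fitting the scaling statistics on the training subset only is a stricter alternative that keeps test-set information out of preprocessing.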
5.2. Analysis of Temperature Point Selection Effectiveness
To verify the clustering performance of the PSO-HDBSCAN algorithm, PSO was first used to optimize the HDBSCAN parameters under the experimental setup described above. The Pearson correlation coefficient was calculated for each candidate parameter set, and the optimal parameters were determined by its magnitude.
Figure 8 shows the trend of the Pearson correlation coefficient over 100 iterations. The highest Pearson correlation, 0.99697, occurs at the 16th iteration; the corresponding optimized HDBSCAN parameters are listed in Table 6.
Using these optimized parameters, HDBSCAN automatically identifies clusters of varying density, and the clustering results are then evaluated with the Davies–Bouldin (DB) index, the Silhouette score, and the BWP index. The evaluation results are shown in Table 7.
The experimental results show that, compared with the unoptimized HDBSCAN algorithm, PSO-HDBSCAN decreases the DB index by 13.13%, increases the Silhouette index by 32.39%, and increases the BWP index by 49.16%, indicating substantially improved clustering performance.
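The three validity indices can be computed as sketched below. Davies–Bouldin and Silhouette use scikit-learn's `davies_bouldin_score` and `silhouette_score`. BWP is not part of scikit-learn, so `bwp_index` is a hypothetical helper that follows the usual between-within proportion definition: (b − w) / (b + w) averaged over samples, where w is a sample's mean distance to the rest of its own cluster and b is its smallest mean distance to another cluster. Dropping HDBSCAN noise points (label −1) before scoring is our assumption, and at least two clusters are required.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import davies_bouldin_score, silhouette_score


def bwp_index(X, labels):
    """Mean between-within proportion; noise points (label -1) are ignored."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    mask = labels != -1
    X, labels = X[mask], labels[mask]
    d = cdist(X, X)
    clusters = np.unique(labels)
    scores = []
    for j in range(len(X)):
        own = labels == labels[j]
        # w: mean distance to the other members of j's own cluster (d[j, j] = 0)
        w = d[j, own].sum() / max(own.sum() - 1, 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(d[j, labels == c].mean() for c in clusters if c != labels[j])
        scores.append((b - w) / (b + w))
    return float(np.mean(scores))
```

Lower DB values and higher Silhouette and BWP values indicate more compact, better-separated clusters, which is the direction of the changes reported above.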
To validate the effectiveness of PSO-HDBSCAN in selecting temperature points, the temperature points T0–T15 in the dataset were first clustered, and the Pearson, Spearman, and Kendall correlation coefficients were used to identify the temperature clusters most correlated with thermal error. The top three clusters according to these three metrics are listed in Table 8.
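Ranking clusters by the three correlation coefficients can be sketched with SciPy's `pearsonr`, `spearmanr`, and `kendalltau`. The sketch assumes that each cluster is represented by the mean temperature of its member sensors and that clusters are ranked by the average of the three coefficients; the text specifies neither choice, and `rank_clusters` is a hypothetical name.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def rank_clusters(temps, error, labels, top=3):
    """Rank clusters by how strongly their mean temperature correlates with the error.

    temps: (n_samples, n_sensors); error: (n_samples,); labels: cluster id per sensor.
    Returns (cluster, pearson, spearman, kendall) tuples, best first.
    """
    labels = np.asarray(labels)
    rows = []
    for c in sorted(set(labels.tolist()) - {-1}):     # skip noise sensors
        signal = temps[:, labels == c].mean(axis=1)   # cluster-mean temperature (assumed)
        rows.append((c,
                     float(pearsonr(signal, error)[0]),
                     float(spearmanr(signal, error)[0]),
                     float(kendalltau(signal, error)[0])))
    # Average of the three coefficients is one possible way to combine them.
    rows.sort(key=lambda r: -np.mean(r[1:]))
    return rows[:top]
```

Pearson measures linear association, whereas Spearman and Kendall depend only on rank order, so agreement among all three suggests that a cluster's relationship with the thermal error is not driven by a few extreme samples.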
The PSO-HDBSCAN algorithm was designed to converge to a stable solution, and in our experiments, multiple runs with different initializations consistently identified the same optimal cluster (Cluster 1). The correlation coefficients observed for this cluster varied negligibly across runs, confirming the high repeatability of the selection process. Based on the correlation coefficients in Table 8, the cluster with the highest overall correlation is selected as the optimal choice, and its temperature points are taken as the best thermal-sensitive points: the internal temperature of the front bearing (T0), the temperature at the end face of the front bearing housing (T1), the temperatures on both sides of the front bearing housing (T2 and T3), the temperature on the outer surface of the front bearing housing (T4), the temperature inside the rear bearing (T12), and the surrounding environment temperature (T15).
The inclusion of the ambient temperature (T15) in the optimal cluster is not merely a statistical correlation; it is also physically well founded. As a critical boundary condition, the ambient temperature directly influences the initial thermal state and the heat dissipation efficiency of the entire machine tool structure. Changes in the ambient environment alter the thermal gradient between the spindle assembly and its surroundings, thereby shifting the system's thermal equilibrium point and its dynamic response to internal heat sources. Consequently, T15 provides essential contextual information for predicting the spindle's net thermal displacement under varying workshop conditions. Its selection by the PSO-HDBSCAN algorithm underscores that a robust thermal error model must account for this external thermal load to achieve high accuracy and generalizability across operating environments.
Next, the best thermal-sensitive points are used as input features, and the resulting model is compared with models that use seven randomly selected temperature points or all temperature points as inputs, with RMSE, MSE, and MAE as performance metrics and 100 training epochs in each case. For fairness, the unoptimized DLTK model is used throughout, so that the comparison is not influenced by hyperparameter tuning. The experimental results are shown in Table 9.
As shown in Table 9, using the best sensitive points selected by the PSO-HDBSCAN clustering algorithm as input features clearly outperforms using seven randomly selected temperature points or all temperature points. Compared with the seven randomly selected points, RMSE decreased by 31.89%, MSE by 53.85%, and MAE by 29.39%; compared with all temperature points, RMSE decreased by 24.35%, MSE by 42.47%, and MAE by 22.6%. These results confirm the validity of the temperature points selected by PSO-HDBSCAN and provide a more reliable basis for subsequent modeling and optimization.
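Because MSE is the square of RMSE, an RMSE reduction of r% implies an MSE reduction of 1 − (1 − r/100)² when both are computed on the same predictions. The short sketch below (helper names are illustrative) computes relative reductions and checks the reported figures: the implied MSE reductions of about 53.61% and 42.77% are close to the reported 53.85% and 42.47%, with the small differences presumably reflecting rounding of the tabulated values. The same relative-reduction formula reproduces the 27.3% MAE margin over TCN-LSTM reported in the comparative experiment.

```python
def relative_reduction(baseline, proposed):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - proposed) / baseline


def implied_mse_reduction(rmse_reduction_pct):
    """MSE reduction implied by an RMSE reduction, since MSE = RMSE ** 2."""
    ratio = 1.0 - rmse_reduction_pct / 100.0
    return 100.0 * (1.0 - ratio ** 2)


# RMSE reductions reported above (vs. seven random points, vs. all points)
for r in (31.89, 24.35):
    print(f"RMSE -{r}% -> implied MSE -{implied_mse_reduction(r):.2f}%")

# MAE margin over TCN-LSTM (0.176 um -> 0.128 um)
print(f"{relative_reduction(0.176, 0.128):.1f}%")
```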
5.3. Analysis of RBMO-X Optimization Effectiveness
To evaluate the hyperparameters optimized by the RBMO-X algorithm, hyperparameter tuning was performed within the ranges defined in the experimental setup. The optimization objective was the sum of the root mean square errors of the predicted spindle thermal errors; its evolution over the iterations is shown in Figure 9.
As Figure 9 shows, the sum of the root mean square errors reaches its minimum at the 15th iteration, indicating the lowest prediction loss. The parameters from the 15th iteration are therefore selected as the optimal parameters, as listed in Table 10.
Using the best sensitive temperature points as input, the RBMO-X-optimized DLTK model was used to predict spindle thermal errors. Its prediction performance after 100 training epochs was evaluated using RMSE, MSE, MAE, the R² Score, and the EV Score; the results are listed in Table 11. The training and validation loss curves are shown in Figure 10, and the thermal errors predicted by the RBMO-X-DLTK model are compared with the actual values in Figure 11.
Figure 10 shows the training and validation losses over the epochs. Both losses decrease rapidly and then stabilize, indicating that the model converges on both the training and validation sets. Furthermore, the two curves remain close throughout training, which suggests that the model shows no obvious overfitting or underfitting and generalizes well.
In Figure 11, the model's predictive performance is visualized by plotting the actual errors against the predicted errors and their residuals. The difference between the actual error (blue) and the predicted error (orange) is small, and the two follow a consistent trend for most samples, indicating good prediction accuracy. The prediction accuracy of 98.83% obtained in the experiment further demonstrates the model's strong capability in predicting thermal errors.
The performance comparison between the optimized and unoptimized models is shown in Table 12.
These metrics show that the model optimized by the RBMO-X algorithm performs better on all key evaluation indicators. In particular, the lower RMSE, which is sensitive to large deviations, and the lower MAE, which reflects the average error, indicate that the optimized model is both more accurate and more robust. This demonstrates the effectiveness of RBMO-X in optimizing the DLTK hyperparameters and adapting the model to time-series forecasting tasks such as spindle thermal error prediction.
5.4. Comparative Experiment
To validate the superiority of the RBMO-X-DLTK model, we compare it with seven established and recent benchmark models: the BP neural network [55]; LSTM [56]; the widely used CNN-LSTM and MA-CNN models; the CNN-LSTM-Transformer model proposed by Al-Ali et al. [57] for solar power prediction; the TCN-LSTM hybrid model proposed by Li et al. [58] for predicting the health status and remaining useful life of lithium-ion batteries; and the SHO-LSTM model proposed by Chen et al. [59] for spindle thermal error modeling.
All models were trained and tested under identical conditions (same hardware, data split, input features, and 100 training epochs) to ensure a fair comparison. Their performance is evaluated using RMSE, MAE, R², and other metrics, with the results summarized in Table 13 and visually compared in Figure 12 and Figure 13.
As shown in Table 13, the RBMO-X-DLTK model achieves the lowest prediction error, with an RMSE of 0.181 μm and an MSE of 0.032 μm². This is a substantial improvement over all benchmarks: RMSE is reduced by 57.6% relative to the BP model and by 42.0% relative to the LSTM model. The proposed model also retains a clear advantage over recent advanced hybrids, with an RMSE 18.1% lower than that of TCN-LSTM and 34.2% lower than that of CNN-LSTM-Transformer. Similarly, it attains the lowest MAE, 0.128 μm, which is 61.1% and 46.4% lower than those of the BP and LSTM models, respectively, and 27.3% lower than that of the best-performing baseline, TCN-LSTM (MAE = 0.176 μm).
In terms of R² and EV Score, as shown in Figure 13, the proposed model also performs exceptionally well, reaching 0.9978 for both and indicating its superior ability to explain data variance and capture the trend of thermal error changes. In comparison, the BP model has an R² of 0.9866 and an EV Score of 0.9869, and the LSTM model achieves 0.9934 and 0.9936, respectively, reflecting the weaker explanatory power and generalization ability of these traditional models. Although the MA-CNN (R² = 0.9941, EV Score = 0.9945) and SHO-LSTM (R² = 0.9942, EV Score = 0.9942) models show some improvement in explaining thermal error, they still fall short of the proposed model. The CNN-LSTM-Transformer and TCN-LSTM models, which combine LSTM with Transformer and TCN components, respectively, improve slightly further, reaching R² and EV Score values of 0.9967 and 0.9977, respectively, and demonstrate good explanatory power, but still fall short of the proposed model. The same ranking is reflected in the Variance Accounted For (VAF) metric, on which RBMO-X-DLTK achieves the highest score (see Table 13).
These comparative results were obtained under a tightly controlled, deterministic experimental design (fixed data split and random seeds). Under these conditions, the large and consistent margins of RBMO-X-DLTK over every baseline on every metric (e.g., an RMSE reduction of more than 18% relative to the strongest baseline) strongly suggest that the improvement stems from the proposed methodology rather than from random variation. Together, these results support the practical efficacy of the framework.
The experimental evidence therefore indicates that the RBMO-X-DLTK framework achieves state-of-the-art performance for spindle thermal error prediction on the tested platform, clearly outperforming both traditional and recent deep learning models.
Beyond prediction accuracy, computational efficiency is critical for industrial deployment. Table 14 benchmarks the training time, inference latency, and model complexity of all compared models under identical experimental conditions.
The results reveal two key insights for deployment:
- (1)
Inference Efficiency: The inference latency of all deep learning models, including ours, is on the order of 0.1 ms per sample. This is negligible compared with the thermal time constants of the spindle (minutes) and fully compatible with the real-time control cycle of CNC systems.
- (2)
Favorable Training Cost-Complexity Trade-off: While simpler models (BP, LSTM) have the shortest training times, the proposed RBMO-X-DLTK trains significantly faster (116.9 s) and has lower complexity (1.31 M parameters) than other advanced hybrid architectures (e.g., CNN-LSTM, CNN-LSTM-Transformer). Its superior predictive accuracy (Table 13) is therefore achieved with a competitive, and often more efficient, computational profile, striking a practical balance between performance and overhead for real-world deployment.
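The section does not describe how the per-sample latencies in Table 14 were measured. One common protocol, sketched here under that assumption, runs a few untimed warm-up calls and reports the median over several timed passes; `predict` is a hypothetical stand-in for a trained model's forward pass, and the repeat counts are arbitrary.

```python
import statistics
import time


def per_sample_latency_ms(predict, samples, warmup=10, repeats=5):
    """Median per-sample inference latency in milliseconds.

    predict: any callable mapping one input sample to a prediction
             (hypothetical stand-in for a model's forward pass)
    samples: list of input samples to time
    """
    for s in samples[:warmup]:  # warm-up calls are not timed
        predict(s)
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for s in samples:
            predict(s)
        elapsed = time.perf_counter() - start
        runs.append(1000.0 * elapsed / len(samples))  # ms per sample
    return statistics.median(runs)


# Example with a trivial stand-in "model" over 7 temperature inputs
print(per_sample_latency_ms(lambda s: sum(s), [[0.1] * 7 for _ in range(200)]))
```

Comparing the returned value with the CNC control-cycle period gives a direct check of the real-time constraint.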
Comparison with Lightweight Models and Real-Time Deployment Considerations. Our comparative study already includes two classic lightweight models, the BP neural network and the LSTM network, which are widely adopted in embedded and real-time systems because of their structural simplicity and low computational demand. Although these models exhibit low inference latency (Table 14), their prediction errors are significantly higher than those of RBMO-X-DLTK (Table 13), revealing a clear trade-off between model complexity and accuracy. The proposed RBMO-X-DLTK model achieves state-of-the-art accuracy while maintaining an inference latency of 0.114 ms, orders of magnitude shorter than the time scale of the spindle's thermal dynamics and thus fully compatible with real-time control cycles. For high-precision thermal error compensation, the substantial accuracy gain afforded by RBMO-X-DLTK therefore justifies its moderately increased complexity, since the real-time constraint is clearly met.
5.5. Ablation Study
To further validate the effectiveness of each module in the proposed RBMO-X-DLTK model, a series of ablation experiments were designed, systematically removing key modules to observe their impact on overall performance. The ablation models include:
- (1)
LSTM-Transformer-KAN: Removes the 1D-DC module, using only the LSTM, Transformer, and Fourier KAN modules.
- (2)
1D-DC-Transformer-KAN: Removes the LSTM module, retaining only the 1D-DC, Transformer, and Fourier KAN modules.
- (3)
1D-DC-LSTM-KAN: Removes the Transformer module, retaining only the 1D-DC, LSTM, and Fourier KAN modules.
- (4)
1D-DC-LSTM-Transformer: Removes the Fourier KAN module, retaining only the 1D-DC, LSTM, and Transformer modules.
These ablation models are used to assess the contribution of each module to overall model performance. The experimental setup is consistent with the main experiment: the same dataset split, training, and testing conditions are applied to all ablation experiments. The experimental results are shown in Table 15 and Figure 14.
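The four configurations above follow a leave-one-out pattern: each variant drops exactly one of the four modules. A small sketch (module labels taken from the variant names, with `KAN` standing for the Fourier KAN module; the helper name is illustrative) generates the variant names systematically, which helps keep ablation runs consistent if modules are added or renamed.

```python
# Module order as it appears in the variant names
MODULES = ["1D-DC", "LSTM", "Transformer", "KAN"]


def leave_one_out_variants(modules):
    """Return {removed module: variant name} for every single-module ablation."""
    return {
        removed: "-".join(m for m in modules if m != removed)
        for removed in modules
    }


for removed, name in leave_one_out_variants(MODULES).items():
    print(f"without {removed:12s} -> {name}")
```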
As shown in Table 15 and Figure 14, the complete DLTK model outperforms all ablated configurations on every metric, achieving an RMSE of 0.181 and an MAE of 0.128, clearly better than any ablation variant. The ablation study further shows that the model's predictive performance relies on the synergy of all modules: removing any one of them degrades performance.
As shown in Table 9 and Table 15, the model without the 1D-DC module (LSTM-Transformer-KAN) yields a higher RMSE (0.248) and MAE (0.196) than the variants that retain both 1D-DC and Fourier KAN (RMSE of 0.245–0.247, MAE of 0.181), with the gap most pronounced in MAE. This suggests that 1D-DC plays an important role in feature extraction and in capturing local variations. Moreover, compared with the CNN-LSTM combination, which relies on standard convolution, the inclusion of 1D-DC adapts better to the nonlinear characteristics of the machine tool temperature signals, enhancing feature extraction capability and model robustness. These results validate the importance of this module for prediction accuracy.
The model without the Fourier KAN module performs worse on all error metrics than the models that include it. Specifically, the 1D-DC-LSTM-Transformer model, which lacks Fourier KAN, has an RMSE of 0.271 and an MAE of 0.213, clearly higher than the 1D-DC-Transformer-KAN and 1D-DC-LSTM-KAN models that include it (RMSE of 0.245–0.247, MAE of 0.181). Removing Fourier KAN thus causes the largest increase in RMSE and MAE among all ablations, which suggests that its extraction and analysis of frequency-domain features contribute substantially to thermal error prediction accuracy. The ablation results therefore confirm the importance of the Fourier KAN module for the model's accuracy and robustness.
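This section does not restate the Fourier KAN equations. In a commonly used Fourier-KAN formulation, assumed here rather than taken from the paper, every input-output edge carries a learnable univariate function expressed as a truncated Fourier series, $\phi_{j,i}(x_i)=\sum_{k=1}^{G}\left(a_{j,i,k}\cos(kx_i)+b_{j,i,k}\sin(kx_i)\right)$, and output $j$ sums these edge functions plus a bias. The pure-Python forward pass below illustrates that computation; the coefficient layout and grid size $G$ are hypothetical, and in a trained module the coefficients would be learned by gradient descent.

```python
import math


def fourier_kan_forward(x, a, b, bias):
    """Forward pass of one Fourier-KAN layer (generic formulation, assumed).

    x:    list of d_in input features
    a, b: cosine/sine coefficients indexed [j][i][k-1]
          (output j, input i, frequency k = 1..G)
    bias: list of d_out biases
    Each edge i -> j applies phi(x_i) = sum_k a*cos(k*x_i) + b*sin(k*x_i).
    """
    out = []
    for j in range(len(bias)):
        total = bias[j]
        for i, xi in enumerate(x):
            for k, (ajik, bjik) in enumerate(zip(a[j][i], b[j][i]), start=1):
                total += ajik * math.cos(k * xi) + bjik * math.sin(k * xi)
        out.append(total)
    return out


# One output, two inputs, G = 2 frequencies (toy coefficients)
a = [[[1.0, 0.0], [0.0, 0.0]]]
b = [[[0.0, 0.0], [1.0, 0.0]]]
print(fourier_kan_forward([0.0, math.pi / 2], a, b, [0.5]))
```

With these toy coefficients the first edge contributes cos(0) = 1 and the second sin(π/2) = 1, so the output is 0.5 + 1 + 1 = 2.5.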
Removing either the LSTM module or the Transformer encoder also degrades performance, but the impact is smaller than that of removing the Fourier KAN module. This indicates that the LSTM and Transformer modules contribute to modeling temporal dependencies and extracting global features, respectively, and that their combination further improves prediction accuracy.
Overall, the complete DLTK model performs best on key metrics such as RMSE and MAE, and removing any of the 1D-DC, LSTM, Transformer, or Fourier KAN modules degrades its performance. The experimental results further demonstrate that the collaboration of the 1D deformable convolution, LSTM, Transformer, and Fourier KAN modules is crucial for feature extraction and prediction accuracy: the 1D deformable convolution enhances local feature extraction, the LSTM captures long-term dependencies in the time series, the Transformer explores global features through the self-attention mechanism, and the Fourier KAN module improves prediction accuracy and stability through frequency-domain analysis.
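To make the idea of deformable sampling concrete, the following single-channel sketch shows a generic 1D deformable convolution: each kernel tap reads the input at its regular position plus a fractional offset, using linear interpolation with zero padding. It is not the paper's implementation; in a real layer the offsets are predicted by an auxiliary convolution and learned jointly with the kernel, whereas here they are passed in directly, and all names are illustrative.

```python
import math


def sample_linear(x, pos):
    """Linearly interpolate sequence x at fractional index pos (zero outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0  # zero padding

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(x, weights, offsets):
    """Single-channel 1D deformable convolution with 'same' output length.

    x:       input sequence of length T
    weights: kernel of odd length K
    offsets: offsets[t][k], fractional shift of tap k at output position t
             (in a real layer these come from an auxiliary convolution)
    """
    half = len(weights) // 2
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            # Regular tap position (t + k - half) plus its learned shift
            acc += w * sample_linear(x, t + (k - half) + offsets[t][k])
        y.append(acc)
    return y


zero = [[0.0] * 3 for _ in range(4)]
print(deform_conv1d([1, 2, 3, 4], [1.0, 1.0, 1.0], zero))  # ordinary convolution
```

With all offsets set to zero the function reduces to an ordinary zero-padded convolution (cross-correlation), which is a convenient sanity check; nonzero offsets let each tap shift toward the most informative neighbouring samples.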
5.6. Evaluation Metrics
The root mean square error (RMSE) and mean absolute error (MAE) are employed as primary metrics because they provide interpretable, scale-dependent measures of the average prediction error in micrometers (μm), which directly relates to the magnitude of thermal displacement critical for machining accuracy [60,61]. RMSE penalizes larger errors more heavily, making it sensitive to undesirable large deviations, while MAE offers a robust measure of average error magnitude. The coefficient of determination (R²), the explained variance score (EV Score), and the variance accounted for (VAF) complement these by quantifying the proportion of variance in the thermal error that is captured by the model, thus assessing its overall fitting capability. While application-specific error thresholds (e.g., a maximum allowable thermal error) are ultimately crucial for acceptance, these standard metrics provide a comprehensive, granular, and widely comparable assessment of model performance across its entire operating range, forming the necessary foundation for evaluating whether a model can meet such thresholds.
Accordingly, in this study, the evaluation framework comprises two categories of metrics: the first measures clustering effectiveness for temperature point selection, and the second evaluates the regression performance of the thermal error prediction models. The specific metrics are defined as follows:
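Pearson correlation is used to assess the linear correlation between two variables, with values closer to ±1 indicating a stronger correlation. The Kendall and Spearman coefficients are rank-based and evaluate monotonic, possibly nonlinear, correlation between two variables; both take values in [−1, 1]. The three coefficients are defined in Equations (7)–(9).
$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{7}$$
$$\tau=\frac{n_c-n_d}{n(n-1)/2} \tag{8}$$
$$\rho=1-\frac{6\sum_{i=1}^{n}d_i^2}{n(n^2-1)} \tag{9}$$
where $x_i$ and $y_i$ represent the observed values of the two variables, while $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, respectively. $r$ is the Pearson correlation coefficient, which ranges from −1 to 1. $\tau$ is the Kendall index, where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and $n$ is the total number of paired observations. $\rho$ is the Spearman index, and $d_i$ represents the rank difference of the $i$-th pair of observations.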
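The Pearson, Kendall, and Spearman coefficients can be implemented directly from their textbook definitions. The sketch below follows them literally (function names are illustrative): Spearman uses the rank-difference form, which assumes no tied values, and Kendall counts concordant and discordant pairs, treating ties as neither. For production use, `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `scipy.stats.kendalltau` handle ties and significance testing.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman rho via the rank-difference formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall(x, y):
    """Kendall tau = (concordant - discordant) / number of pairs."""
    nc = nd = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            nc += 1
        elif s < 0:
            nd += 1
    n = len(x)
    return (nc - nd) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [v * v for v in x]  # monotonic but nonlinear
print(pearson(x, y), spearman(x, y), kendall(x, y))
```

On a monotonic but nonlinear relation such as y = x², the rank-based coefficients equal 1 while Pearson's r stays below 1, which is why the rank coefficients complement Pearson.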
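The Davies-Bouldin (DB) index, the Silhouette score, and BWP are used to evaluate clustering performance. The lower the DB value, the better the clustering result. The Silhouette score ranges from −1 to 1, with higher values indicating better clustering. A higher BWP value indicates better separation between clusters and greater cohesion within clusters. The formulas are given in Equations (10)–(14).
$$DB=\frac{1}{k}\sum_{i=1}^{k}\max_{j\neq i}\frac{\sigma_i+\sigma_j}{d(c_i,c_j)} \tag{10}$$
$$S(i)=\frac{b(i)-a(i)}{\max\{a(i),\,b(i)\}} \tag{11}$$
$$BSS=\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\lVert c_i-c_j\rVert^2 \tag{12}$$
$$WSS=\sum_{i=1}^{k}\sum_{x\in C_i}\lVert x-c_i\rVert^2 \tag{13}$$
where $k$ represents the number of clusters, $\sigma_i$ and $\sigma_j$ indicate the intra-cluster average distance of clusters $i$ and $j$, and $d(c_i,c_j)$ is the distance between the centroids of clusters $i$ and $j$. $S(i)$ is the Silhouette score of sample $i$ (the overall score is its average over all samples), where $a(i)$ is the average distance from sample $i$ to the other samples in its cluster, and $b(i)$ is the average distance from sample $i$ to the samples of the nearest other cluster. $BSS$ is the sum of squares between clusters (the squared distance between cluster centroids), where $c_i$ and $c_j$ are the centroids of clusters $i$ and $j$, and $WSS$ is the sum of squares within clusters (the squared distance between points and the centroid within each cluster), with $x$ representing the sample points in cluster $C_i$. The BWP index (Equation (14)) is computed from $BSS$ and $WSS$.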
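The sketch below implements the DB index, the Silhouette score, and the between- and within-cluster sums of squares described above in pure Python. It assumes Euclidean distance and takes each cluster's intra-cluster average distance as the mean distance of its members to the centroid, consistent with the usual DB definition; BWP is omitted because its exact formula is not reproduced here. `sklearn.metrics.davies_bouldin_score` and `sklearn.metrics.silhouette_score` can serve as cross-checks.

```python
import math


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def group(X, labels):
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters


def davies_bouldin(X, labels):
    cl = group(X, labels)
    cents = {k: centroid(pts) for k, pts in cl.items()}
    # sigma_k: mean distance of cluster members to their centroid
    sigma = {k: sum(math.dist(p, cents[k]) for p in pts) / len(pts) for k, pts in cl.items()}
    total = 0.0
    for i in cl:
        total += max((sigma[i] + sigma[j]) / math.dist(cents[i], cents[j]) for j in cl if j != i)
    return total / len(cl)


def silhouette(X, labels):
    cl = group(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = cl[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(x, p) for p in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(math.dist(x, p) for p in cl[o]) / len(cl[o]) for o in cl if o != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def bss_wss(X, labels):
    cl = group(X, labels)
    cents = [centroid(pts) for pts in cl.values()]
    bss = sum(math.dist(ci, cj) ** 2 for i, ci in enumerate(cents) for cj in cents[i + 1:])
    wss = sum(math.dist(p, centroid(pts)) ** 2 for pts in cl.values() for p in pts)
    return bss, wss


X = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print(davies_bouldin(X, labels), silhouette(X, labels), bss_wss(X, labels))
```

If HDBSCAN labels some points as noise (label −1), those points would typically be excluded before scoring; whether the paper does so is not stated.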
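Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Mean Absolute Error (MAE), the R² Score (coefficient of determination), the Explained Variance Score (EV Score), and the Variance Accounted For (VAF) are commonly used to assess the performance of regression models. The smaller the RMSE, MSE, and MAE, and the closer the R² Score and EV Score are to 1 and VAF is to 100%, the higher the model's prediction accuracy. The formulas are given in Equations (15)–(20).
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2} \tag{15}$$
$$MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \tag{16}$$
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert \tag{17}$$
$$R^2=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{18}$$
$$EV=1-\frac{\mathrm{Var}(y-\hat{y})}{\mathrm{Var}(y)} \tag{19}$$
$$VAF=\left(1-\frac{\mathrm{Var}(y-\hat{y})}{\mathrm{Var}(y)}\right)\times 100\% \tag{20}$$
where $y_i$ represents the true values, $\hat{y}_i$ represents the predicted values, $n$ is the number of samples, $\bar{y}$ is the mean of the true values, and $\mathrm{Var}(\cdot)$ denotes the variance of its argument.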
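The six regression metrics listed above translate directly into code. The sketch below (function name illustrative) computes them from paired true and predicted values. Note that EV and VAF differ only by a factor of 100, and that, unlike R², they ignore a constant prediction bias, because subtracting a constant from the residuals leaves their variance unchanged; the toy example shows this case.

```python
import math


def regression_metrics(y, yhat):
    """RMSE, MSE, MAE, R^2, EV and VAF for paired true/predicted values."""
    n = len(y)
    res = [a - b for a, b in zip(y, yhat)]
    mse = sum(r * r for r in res) / n
    mae = sum(abs(r) for r in res) / n
    ybar = sum(y) / n
    r2 = 1 - sum(r * r for r in res) / sum((a - ybar) ** 2 for a in y)

    def var(v):  # population variance (the ratio below is unaffected by ddof)
        m = sum(v) / len(v)
        return sum((e - m) ** 2 for e in v) / len(v)

    ev = 1 - var(res) / var(y)
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae,
            "R2": r2, "EV": ev, "VAF": 100 * ev}


# Predictions offset by a constant +0.5: R2 drops, EV/VAF stay perfect
print(regression_metrics([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5]))
```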
7. Discussion
The experimental results show that the RBMO-X-DLTK model improves both prediction error and goodness-of-fit indicators relative to the other models. We attribute a large part of this improvement to the PSO-HDBSCAN algorithm used for temperature point selection, which substantially reduces redundancy among input features and selects the temperature points most correlated with thermal error. Compared with seven randomly selected temperature points, the selected points reduced RMSE by 31.89%, MSE by 53.85%, and MAE by 29.39%; compared with all temperature points, the reductions were 24.35%, 42.47%, and 22.6%, respectively. However, PSO-HDBSCAN still faces challenges in real-time applications; for example, its iteration efficiency may suffer in high-dimensional feature scenarios. Moreover, the optimized clustering parameters may need further adjustment under dynamic working conditions to adapt to more complex thermal environments. Future work could introduce an adaptive parameter adjustment mechanism to further enhance the algorithm's generalization ability across operating conditions.
Beyond these algorithmic considerations, the framework’s robustness to imperfections in the input data itself—such as sensor noise or intermittent missing values—is another critical aspect for practical deployment. It is noteworthy that the PSO-HDBSCAN feature selection stage offers inherent resilience against sporadic, uncorrelated noise by design, as the HDBSCAN algorithm filters out points identified as outliers. This addresses a common class of data quality issues. For sustained systematic noise or extended data loss, which could impact the cluster structure, the integration of complementary signal processing or imputation techniques would be necessary. Enhancing robustness under these challenging data conditions is a clear direction for future work aimed at industrial hardening.
Interpretability and Physical Insights. Our framework directly addresses model interpretability through its PSO-HDBSCAN feature selection stage, which acts as a data-driven sensor importance analyzer. By consistently identifying the optimal temperature cluster (e.g., front/rear bearings and ambient temperature, as detailed in Section 5.2), it provides clear, physically grounded insight into the primary thermal influences on spindle error. This offers actionable guidance beyond mere black-box prediction, effectively bridging data-driven modeling with practical thermal management understanding.
Compared with DLTK, the RBMO-X-DLTK model reduces the overall error by about 13%, an improvement mainly attributable to the RBMO-X algorithm's hyperparameter optimization. By combining SGDR and gradient clipping, RBMO-X improves the model's convergence speed and stability. SGDR periodically restarts a cosine-annealed learning rate, which helps training escape poor local optima and allows the model to better fit high-dimensional, complex feature data. Gradient clipping bounds the L2 norm of the gradient, mitigating the risk of gradient explosion and keeping the DLTK model stable during multi-level feature extraction. In addition, RBMO-X's collaborative global and local search mechanism enables the hyperparameter optimization to quickly find a configuration suited to complex working conditions. Nevertheless, RBMO-X incurs some computational overhead in high-dimensional, complex feature scenarios. In the future, distributed optimization techniques could be combined with RBMO-X to improve its efficiency on large-scale data, and its response speed for parameter adjustment under real-time dynamic conditions also leaves room for improvement.
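As a minimal illustration of the two training aids named above (not RBMO-X itself), the sketch computes the SGDR learning rate, $\eta_t=\eta_{\min}+\tfrac12(\eta_{\max}-\eta_{\min})\left(1+\cos(\pi T_{cur}/T_i)\right)$, with the cycle length $T_i$ multiplied by `t_mult` at each warm restart, and rescales a gradient vector whose L2 norm exceeds a threshold. The parameter values are placeholders, not the paper's settings; in PyTorch the same behaviour is available through `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` and `torch.nn.utils.clip_grad_norm_`.

```python
import math


def sgdr_lr(step, lr_max, lr_min=0.0, t0=10, t_mult=2):
    """Learning rate at `step` under SGDR (cosine annealing with warm restarts).

    t0 (> 0) is the length of the first cycle; each later cycle is t_mult times longer.
    """
    t_i, t_cur = t0, step
    while t_cur >= t_i:  # skip completed cycles to find the current one
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so that their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print([round(sgdr_lr(s, 0.1), 4) for s in (0, 5, 9, 10, 20)])
print(clip_by_global_norm([3.0, 4.0], 1.0))
```

The learning rate decays within each cycle and jumps back to `lr_max` at step 10, where the second, twice-as-long cycle begins; clipping preserves the gradient's direction while bounding its magnitude.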
To substantiate the choice of PSO for optimizing the HDBSCAN parameters, PSO was directly compared with two other prevalent metaheuristic optimizers, the Grey Wolf Optimizer (GWO) [63] and the Firefly Algorithm (FA) [64], under an identical experimental setup: the same HDBSCAN parameter search bounds (Table 3), population size (20), maximum number of iterations (100), and objective function (maximizing the Pearson correlation between temperature clusters and thermal error). Standard parameter settings from the literature were used for GWO and FA to ensure a fair comparison. The performance metrics summarized in Table 16 demonstrate the advantage of PSO for this continuous parameter optimization problem: PSO not only achieved the highest Pearson correlation (0.99697) but also converged to this solution in the fewest iterations (16). This empirical result supports the theoretical rationale given in Section 3, confirming that PSO offers superior efficiency and convergence reliability in tuning HDBSCAN and thereby enhances the robustness of the feature selection stage.
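For reference, a minimal global-best PSO over box-bounded continuous parameters is sketched below, using the population size (20) and iteration budget (100) quoted above. The inertia and acceleration coefficients (w = 0.7, c1 = c2 = 1.5) are common textbook choices assumed here, not values from the paper. To maximize a Pearson correlation as in this comparison, one would minimize its negative; integer HDBSCAN parameters such as a minimum cluster size would need rounding inside the objective, which is likewise an assumption.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # global best updated asynchronously
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, val = pso(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, [(-10, 10), (-10, 10)])
print(best, val)
```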
The discrepancy between the high prediction accuracy (98.83%) and the realized compensation rate (89%) reflects the transition from a data-driven predictive model to a physical compensation system. This gap can be attributed to several intertwined practical and dynamic limitations inherent in real-time machining environments:
- (1)
Dynamic Response Lag of the Physical System: The RBMO-X-DLTK model predicts thermal error based on temperature history. However, the actual compensation system—comprising the CNC controller, servo drives, and mechanical axes—has inherent response delays (e.g., servo response time, communication latency). This finite response speed means the compensation command is always applied to a system state that is slightly ahead of the state on which the prediction was based, especially during periods of rapid thermal change.
- (2)
Nonlinearities and Hysteresis in the Actuation Chain: The model assumes a perfect mapping from predicted error to compensated axis movement. In reality, the ball screw, guideways, and bearings exhibit nonlinear friction, backlash, and hysteresis. These effects introduce small, uncompensated errors between the command position and the actual tool tip position, which are not captured by the purely thermal-error-focused prediction model.
- (3)
Unmodeled External Disturbances: The experimental validation involved actual cutting operations. Dynamic cutting forces, vibrations, and coolant effects introduce additional, non-thermal mechanical deformations that perturb the spindle position. Our model, trained primarily on thermally induced errors under idle or warm-up conditions, does not account for these coupled thermo-mechanical effects, leading to a predictable performance gap during real machining.
- (4)
Synchronization and Sampling Limitations: While temperature and error data were synchronized during model training, the real-time compensation loop operates under strict sampling and execution cycles. Any jitter or minor asynchrony between temperature acquisition, model inference, and compensation command issuance can result in sub-optimal correction.
These factors collectively explain why a portion of the predicted error cannot be perfectly canceled in practice. The achieved 89% compensation rate, therefore, reflects the practically attainable performance of the integrated “sensor-model-controller” system, demonstrating significant improvement over the uncompensated case while highlighting areas for future work in co-modeling and adaptive control.
Given the complexity of the DLTK architecture, proactive measures were taken to ensure generalization and mitigate overfitting. First, architectural regularization was implemented through Dropout layers (with the rate optimized by RBMO-X; see Table 10). Second, and most critically, the model's generalization capability is empirically validated by our rigorous evaluation protocol:
- (1)
The 8:1:1 train/validation/test split reserves a held-out test set for an unbiased assessment;
- (2)
The close alignment between the training and validation loss curves (Figure 10) indicates no significant overfitting during training;
- (3)
Most importantly, the model maintained superior performance on the completely held-out test set (Table 13), demonstrating its ability to generalize to unseen data.
Finally, its successful application in the real-world machining validation (Section 6), a distinctly different environment, provides strong external evidence of its robustness beyond the original dataset.
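The text specifies an 8:1:1 ratio but not whether samples were shuffled before splitting. For time-ordered thermal data, a contiguous chronological split, sketched below as an assumption (the helper name is illustrative), prevents future samples from leaking into training; integer arithmetic avoids floating-point rounding surprises in the split sizes, and any remainder goes to the test set.

```python
def split_811(n_samples):
    """Contiguous 8:1:1 split of time-ordered sample indices (no shuffling)."""
    n_train = n_samples * 8 // 10
    n_val = n_samples // 10
    idx = list(range(n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_811(1005)
print(len(train), len(val), len(test))
```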
Beyond the internal technical validation and comparison, it is instructive to position the overall performance of our framework within current research on spindle thermal error prediction. Table 17 compares the final prediction accuracy of our integrated RBMO-X-DLTK pipeline with that of several recent, peer-reviewed studies that address the same core problem.
As shown in Table 17, our RBMO-X-DLTK model achieves competitive and often superior prediction accuracy (RMSE = 0.181 μm, MAE = 0.128 μm) compared with other recent dedicated studies. We attribute this advantage to our integrated two-stage design, in which PSO-HDBSCAN explicitly addresses input feature redundancy and supplies the DLTK network with highly relevant inputs. Although these comparisons remain qualitative because experimental conditions differ, they position our co-optimization framework as a robust and effective approach.
In summary, the integrated PSO-HDBSCAN and RBMO-X-DLTK framework provides an accurate and efficient solution for spindle thermal error prediction, effectively addressing feature redundancy and model optimization. Future work will focus on enhancing algorithmic and hardware efficiency for real-time compensation.