Article

Development of a Transfer Learning Technique for Rapid Adaptation of Thermal Compensation Models to Long-Term Machine Thermal Behavior Changes

Department of Mechanical Engineering, National Chung Cheng University, Chiayi 621, Taiwan
*
Author to whom correspondence should be addressed.
Machines 2026, 14(3), 309; https://doi.org/10.3390/machines14030309
Submission received: 6 January 2026 / Revised: 14 February 2026 / Accepted: 25 February 2026 / Published: 9 March 2026
(This article belongs to the Section Machines Testing and Maintenance)

Abstract

Structural aging and environmental changes associated with long-term operation can substantially modify the thermal behavior of machine tools, diminishing the accuracy of existing thermal compensation models. Traditional neural network approaches typically necessitate time-consuming and inefficient retraining from scratch for practical adaptation. To address this limitation, this study proposes a parameter-based transfer learning technique to enhance model adaptability under evolving machine tool operating conditions. The method establishes a composite fine-tuning architecture by adding hidden layers and selectively freezing neural network parameters, enabling the rapid adaptation of the pretrained model to new thermal characteristics using limited data. A full-factorial experimental design identified the optimal configuration, comprising (i) structural expansion via an LSTM layer inserted after the hidden layers; (ii) a strategy freezing parameters in all layers; and (iii) training under the selected optimal condition (C9), which reflects machine tool characteristics and environmental temperature variations. The baseline model achieved an RMSE of 3.88 µm. Traditional retraining using the complete dataset and retraining only on C9 yielded RMSE values of 3.21 and 3.84 µm, respectively. In contrast, the optimized transfer learning model trained on C9 achieved an RMSE of 3.47 µm. Experimental results demonstrate that the proposed strategy converges with limited data, reducing the number of datasets from 18 to nine and significantly shortening training time from 18 h 20 min to 30 s. This approach offers an effective solution for sustainable model maintenance and expedited industrial deployment.

1. Introduction

In horizontal machining centers, the spindle is horizontally configured, with a double-wall design used for the inner surface of the column to enhance rigidity. The symmetrical structural design of the column significantly reduces thermal deformation in the X-direction. The utilization of a fourth-axis (B-axis) rotary table enhances machining flexibility, enabling multiple sides of complex workpieces to be machined in a single setup. Moreover, the fixture design can be simplified, which shortens the manufacturing process and allows for the production of more complex parts with fewer steps. This method is commonly used in the machining of precision parts in the aerospace and automotive industries. However, despite the aforementioned advantages of horizontal machining centers, the interaction among multiple heat sources during prolonged part machining alters the characteristics of the machine tool, gradually reducing workpiece accuracy. This remains the primary limitation of horizontal machining centers.
Among the various sources of errors in precision machining, thermal error has long been recognized as the most significant, accounting for approximately 40–70% of all machining errors [1,2]. In various precision machining systems, methods for reducing thermal errors are generally categorized into three types: thermal avoidance, thermal suppression, and thermal compensation [1]. Thermal avoidance methods involve reducing thermal deformation by selecting materials with lower coefficients of thermal expansion. Thus, these methods typically require expensive advanced materials, such as carbon fiber–reinforced polymers instead of metals. Thermal suppression methods focus on controlling the heat transferred to the spindle system to prevent uneven temperature distributions. Examples include installing spindle cooling jackets and optimizing the machine structure to minimize thermal deformation. Therefore, the choice of thermal suppression method must typically be finalized during the design stage. Thermal compensation methods primarily correct errors by adjusting the positions of the tool and workpiece through a controller. Compared with thermal avoidance and thermal suppression, thermal compensation incurs lower production costs and offers easier implementation.
In the thermal compensation process, the initial step is to develop a model that characterizes the relationship between temperature and thermal deformation. Next, the thermal error values predicted by the thermal compensation model are input into the motion axes as compensation values, and thermal compensation is achieved by adjusting the coordinate origin. Because thermal errors exhibit nonlinear characteristics that vary over time and are influenced by processing conditions, current industrial practices typically involve using regression analysis [3,4,5] or neural networks (i.e., machine learning techniques) [6,7] to construct thermal compensation models. By designing various machining scenarios and collecting large amounts of experimental training data, the temperature-sensitive points influencing thermal errors can be identified. Furthermore, experimental data reflecting real-world machining conditions can be utilized to enhance the accuracy and generalizability of thermal compensation models. In practice, new machine tools are typically subjected to precise calibration and verification procedures before they leave the factory, in accordance with standards such as ISO 10791-7 [8]. Therefore, developing a thermal compensation model with high accuracy and strong generalizability is relatively easy. However, prolonged machine operation may cause structural aging of the machine, altering its original characteristics [9]. Theoretically, this gradual shift in the machine’s thermal behavior due to long-term operation and structural aging can be characterized as a form of concept drift, where the mapping between temperature inputs and thermal deformation outputs evolves over time. Consequently, thermal compensation models calculated for new machines often become invalid when applied directly to aged equipment [10]. 
In practical machining environments, reduced cooling efficiency, spindle slider wear, column structure deformation, and changes in screw clearance can all reduce the accuracy of the original thermal compensation model. Therefore, thermal compensation models developed for new machines are difficult to apply directly to older machines. Without remodeling, the compensation is highly likely to become less accurate or even erroneous, which represents a key challenge that traditional thermal compensation methods struggle to overcome.
Developing thermal compensation models using traditional machine learning methods, such as regression analysis [11], decision trees [12], and support vector machines [13], typically requires collecting new training data and retraining from scratch, which has the following drawbacks: (1) because these models apply simple transformation functions to project input data into one or two simplified spaces, they struggle to capture complex features; (2) they fail to account for temporal characteristics, preventing the use of time-series information and reducing prediction accuracy; (3) because previously trained models cannot be leveraged for new predictions, accuracy is poor when training data are insufficient, and each training session must start from scratch, resulting in inefficient training.
In practical applications, recollecting the necessary training data and rebuilding the thermal compensation model is costly and resource-intensive. However, through transfer learning, a previously developed thermal compensation model for the same machine can be retrained to learn common features across different datasets and refined accordingly. This approach reduces the amount of training data that must be recollected, shortens the time required to update the thermal compensation model, enhances the accuracy of the original model, and lowers the overall cost of model development. Research on transfer learning for thermal compensation models has mainly focused on comparing models built using different algorithms and evaluating the effectiveness of transfer learning in enhancing their accuracy and generalization. For example, Horejs et al. [14] developed a multiple linear regression model and subsequently introduced a thermal transfer function based on heat transfer principles to identify and learn time-varying thermal sources, thereby broadening the model’s applicability. Based on test results, the Y-direction thermal deformation was reduced by 85%. Yau et al. [15] improved the thermal compensation model of a machine tool by combining a long short-term memory (LSTM) algorithm with thermal transfer functions. Through thermal compensation, the spindle thermal error was reduced from 20 µm to 5 µm. Li et al. [16] used easily collectible shutdown and low-speed experimental data, extracted their features, and employed transfer learning for model training. They compared the prediction performance of the transfer learning model with that of models based on three other algorithms: multiple regression analysis (MRA), backpropagation (BP), and a convolutional neural network.
The residual error (the difference between the actual thermal deformation and the thermal compensation model’s prediction) for the other three algorithms was around 18 µm, while that for the transfer learning model was 10 µm. Zhou et al. [17] applied transfer learning to transfer the thermal compensation model for the X-axis of a CNC lathe to its Z-axis. Experimental results showed that the residual error of the Z-axis thermal compensation model was between only −3.2 µm and 1.4 µm. Ma et al. [18] employed a weakly supervised adversarial network to generate training data for thermal compensation models under various spindle speeds. The generated data were then integrated into a multi-scale convolutional neural network using transfer learning. This method improved the prediction accuracy of the thermal compensation model across different spindle speeds when evaluated on both synthetic and real experimental data. Zheng et al. [19] developed a cross-speed thermal error transfer model that combines direct standardization (DS) with a self-attention mechanism to align fused temperature features. In a transfer task from 2000 rpm (source domain) to 4000 rpm (target domain), the SA-DS-EasyTL approach markedly improved prediction accuracy, achieving a mean square error of 2.712 µm², which is substantially lower than the 11.890 µm² obtained by the baseline DS-EasyTL model. Mao et al. [20] proposed a subspace metric-based dynamic domain adaptation method that aligns the angles and scales of thermal features within a specified subspace to improve characterization of feature correlations. A model updating strategy using buffered weighted incremental time windows is incorporated to address time-varying operating conditions, and the method was validated across seven spindle thermal error transfer tasks, demonstrating higher prediction accuracy and stability than state-of-the-art methods. Zheng et al. [21] presented a systematic literature review of current approaches for thermal error model transfer.
The preceding literature review indicates that early machine-tool thermal compensation models were primarily based on traditional neural networks or regression-based algorithms. More recently, Horejs et al. [14], Yau et al. [15], Li et al. [16], Zhou et al. [17], Ma et al. [18], Zheng et al. [19], Mao et al. [20], and Zheng et al. [21] adopted transfer learning techniques to exploit the knowledge structures embedded in existing models. This approach allows thermal compensation models to quickly adapt to varying machining conditions or different stages of machine operation, thereby improving their prediction accuracy. The findings from the aforementioned studies confirm that transfer learning can effectively improve the accuracy and generalizability of thermal compensation models. However, past research has mainly focused on comparing the transfer performance of different algorithms or applying transfer learning to train models across different operating conditions. Regarding situations where prolonged machine operation results in structural aging, such as in the spindle, cooling system, or column, systematic research remains inadequate. Hence, methodologies for rapidly updating existing models via transfer learning to adapt them to changes in machine characteristics must be developed.
Accordingly, this study introduces a transfer learning approach for thermal compensation models of horizontal machining centers, enabling rapid model adaptation to variations in thermal behavior induced by prolonged machine usage. The temperature and thermal deformation behavior of the same horizontal machining center were examined under two distinct conditions: (1) the baseline mechanical state at the time of delivery (2021) and (2) the mechanical state after extended operation (2024). Thus, the analysis of mechanical aging and operational history focuses on temporal changes in the mechanical state of a single machine rather than comparisons across different machines. Under the long-term operation condition (2024), the machine was operated in a factory environment with ambient temperatures ranging from 18 °C to 37 °C and an annual average ambient temperature of ~24.1 °C to 25.3 °C. During this period, the machine operated for ~10 days per month and ~8 h per day. After identifying the differences in thermal characteristics caused by machine operation over different durations, transfer learning was utilized to efficiently adapt existing thermal compensation models to changes in machine behavior, even with limited training data. Furthermore, a full-factorial experimental design was employed to systematically plan the parameter combinations for transfer learning. This design includes a structural expansion strategy that transfers thermal behavior by adding hidden layers to the pretrained model’s neural network, a parameter-unfreezing strategy that preserves the machine’s original thermal rise and deformation characteristics, and a strategy for configuring the model training conditions. Existing literature mainly focuses on comparing the transfer performance of different algorithms, with relatively limited attention paid to how various transfer factors influence model performance under different machine conditions.
Therefore, a full-factorial analysis was performed to evaluate the main effects and interactions of each factor, with the goal of identifying an optimal transfer strategy and establishing an efficient transfer method for thermal compensation models that enhances model accuracy and facilitates quick adaptation to machine degradation characteristics.

2. Research Methods

The objective of this study was to develop a transfer learning algorithm for horizontal machining centers, aimed at correcting thermal compensation models to reduce thermal errors. The main research methods were as follows:
  • Experimental design and measurement data on temperature and thermal deformation: We systematically varied the spindle speed and ambient temperature to design experimental conditions for measuring the thermal rise and deformation behavior of the same horizontal machining center under two mechanical states—the baseline mechanical state at the time of delivery (2021) and the mechanical state after extended operation (2024).
  • Development of pretrained model: Experimental data collected from the new machine (baseline mechanical state at the time of delivery) were used to build a pre-trained thermal error prediction model, which served as the source model for subsequent transfer learning.
  • Development of a transfer learning algorithm for thermal compensation in horizontal machining centers: The machine characteristics were transferred from the baseline state at the time of delivery to the altered state induced by extended machining operations.
  • Determination of optimal factor selection strategies via full-factorial analysis.
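The parameter-freezing idea behind the transfer step above can be sketched numerically. The following minimal example uses synthetic data and a two-layer numpy network as a stand-in for the actual Keras model: the pretrained feature layer is frozen, and only the output layer is fine-tuned on limited data from the "aged" machine state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained model: a frozen feature layer W1 and a
# trainable output layer W2 (both synthetic, for illustration only).
W1 = rng.normal(size=(4, 8))          # hidden weights learned from 2021 data (frozen)
W2 = rng.normal(size=(8, 1)) * 0.1    # output weights to be fine-tuned

def predict(X, w2):
    return np.tanh(X @ W1) @ w2

# Limited data from the aged machine (2024): same features, shifted response.
X_new = rng.normal(size=(64, 4))
W2_aged = W2 + 0.5 * rng.normal(size=(8, 1))
y_new = predict(X_new, W2_aged)

def rmse(w2):
    return float(np.sqrt(np.mean((predict(X_new, w2) - y_new) ** 2)))

rmse_before = rmse(W2)

# Fine-tuning: gradient steps update only W2; W1 stays frozen, preserving
# the temperature-to-deformation features learned on the original machine.
lr = 0.05
for _ in range(300):
    H = np.tanh(X_new @ W1)                    # frozen feature extractor
    grad = H.T @ (H @ W2 - y_new) / len(X_new) # gradient w.r.t. trainable layer only
    W2 -= lr * grad

rmse_after = rmse(W2)
```

Freezing the pretrained layer is what lets the model converge on a small dataset: only the unfrozen parameters must absorb the shift in thermal behavior.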

2.1. Experimental Design and Measurement Data on Temperature and Thermal Deformation

2.1.1. Measurement Equipment and Methods for Thermal Rise and Deformation Experiments on Horizontal Machining Centers

An experimental setup was established to measure the temperature rise and thermal deformation of horizontal machining centers (Figure 1). Temperature rise and thermal deformation were recorded while the spindle speed and ambient temperature were systematically varied. A temperature data acquisition card (NI9213) and a temperature sensor (OMEGA, E-type thermocouple), together with the LabVIEW data acquisition software program, were used to measure temperature changes across four primary thermal regions: (i) Environmental and Auxiliary Systems, including the ambient environment (A), spindle coolant inlet and outlet (Cool), oil-air lubrication system (Oil Air), and electrical control cabinet (E); (ii) Structural Components, including the base surface (B), column surface (C), worktable surface (T), and saddle surface (H); (iii) Feed Systems, including the Y-axis and Z-axis ball screws, bearing housings, and nuts (SC), and the X-axis motor surface (X); and (iv) the spindle area (S). Temperature data were collected at multiple time points for two mechanical states: (1) 44 measurement points for the baseline mechanical state at the time of delivery (2021) and (2) 55 measurement points for the mechanical state after extended operation (2024). The exact locations of the temperature measurement points are illustrated in Figure 2 and Figure 3. Regarding thermal deformation, we used five capacitive displacement sensors (LION, CPL-230; Lion Precision, St. Paul, MN, USA) and a voltage data acquisition card (NI9239; National Instruments, Austin, TX, USA), combined with LabVIEW, to measure the spindle’s thermal deformation at Location (D) in the X1, X2, Y1, Y2, and Z directions, in accordance with the ISO 230-3 [22] standard.
Figure 2 includes nine measurement points in the spindle area (S), three points on the table surface (T), four ambient temperature points (A), one point in the electrical control cabinet (E), and 10 points on the base surface (B). The base surface (B) points labeled in black text on a green background indicate temperature measurement locations added in 2024.
Figure 3 includes 18 measurement points on the column surface (C), five points on the Y-axis and Z-axis ball screws, bearing housings, and nuts (SC), one point on the saddle surface (H), one point in the oil-air lubrication system (Oil-Air), one point on the X-axis motor surface (X), and two points at the spindle coolant inlet/outlet (Cool). Points labeled in black text on a green background indicate temperature measurement locations added in 2024.
In terms of measurement equipment specifications, the temperature data acquisition card has a sampling rate of 75 samples/s, with a per-channel sampling interval of 740 µs and a measurement accuracy of ±0.25 °C. The temperature sensor used was an OMEGA E-type thermocouple (model 5TC-TT-E-36-72; Omega Engineering, Norwalk, CT, USA), which has a nominal accuracy of ±1 °C. However, after calibration using a constant-temperature water bath and a digital thermometer (FLUKE 1552A EX; Fluke Corporation, Everett, WA, USA, accuracy ±0.05 °C), the measurement accuracy improved to ±0.1 °C. The voltage data acquisition card has a sampling rate of 50 kS/s and an accuracy of ±100 ppm. The capacitive displacement sensor has a measurement accuracy of 0.004% F.S.
The experimental equipment was calibrated before measurements in both 2021 and 2024. All sensors were calibrated rigorously. For temperature calibration, the experimental setup is depicted in Figure 4a. A constant-temperature water bath was used to control water temperature between 5 °C and 50 °C. A high-precision thermometer (FLUKE 1552A) served as the reference standard, and thermocouple (E-type) measurements were acquired using a data acquisition card (NI9213). Measurements were collected at 12 distinct temperature points within the 5–50 °C range, and linear regression was applied to derive the calibration equation [23]. In addition, measurement uncertainty was evaluated using Equation (1) to determine whether sensor bias could be neglected.
$$U_{RSS} = \sqrt{B_{Average}^{2} + R_{Average}^{2}} \quad (\text{at } 95\%) \tag{1}$$
For the baseline mechanical state (2021), repeated measurements were performed under operating condition H3’ with four repetitions. The computed measurement uncertainty across all temperature points (44 locations) fell within a $U_{RSS}$ range of 0.12–0.49 °C. For the long-term operation state (2024), repeated measurements were conducted under operating condition H4 with four repetitions. The calculated measurement uncertainty across all temperature points (55 locations) fell within a $U_{RSS}$ range of 0.11–0.59 °C.
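As a quick illustration, the root-sum-square combination in Equation (1) can be computed with a one-line helper (the function name is ours, not from the paper):

```python
import math

def u_rss(b_average, r_average):
    """Equation (1): root-sum-square combination of the bias term
    B_Average and the repeatability term R_Average (~95 % level)."""
    return math.sqrt(b_average ** 2 + r_average ** 2)

# A bias of 0.3 degC and a repeatability of 0.4 degC combine to 0.5 degC.
uncertainty = u_rss(0.3, 0.4)
```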
Thermal deformation calibration was performed in accordance with ISO 230-3 [22]. The calibration experiment illustrated in Figure 4b used a displacement stage to move both the laser interferometer reflector and the capacitive displacement sensor head to multiple positions. Laser interferometer measurements were used as the reference standard, and displacement data were acquired through a data acquisition card (NI-9239). These data were then used with Equation (1) to compute thermal deformation uncertainty.
In 2021, the measured thermal deformation uncertainty at the spindle tool center point in the X1, X2, Y1, Y2, and Z directions (Figure 1) across five sensors ranged from 1.2 to 2.0 µm. In 2024, the uncertainty at the same points was 1.28–1.91 µm. Consequently, deviations in both temperature and thermal deformation measurements were considered negligible.

2.1.2. Experimental Condition Design

The experimental conditions for the horizontal machining center were categorized into four types: (1) characteristic conditions (H1’–H4’ and H1–H7), (2) training conditions (H5’–H14’ and H8–H18), (3) verification conditions (H15’ and H19), and (4) testing conditions (H16’ and H20). Conditions H1’–H16’ represent the thermal rise and deformation measurements in the baseline mechanical state at the time of delivery (2021), as shown in Figure 5, while H1–H20 represent the mechanical state after extended operation (2024). Figure 6 presents the differences in the experimental conditions.
The experimental conditions corresponding to the baseline mechanical state at delivery in 2021 are marked with an apostrophe (e.g., H1′–H16′), whereas those representing the mechanical state after extended operation in 2024 are denoted without an apostrophe (e.g., H1–H20).
The test under the characteristic operating conditions in the baseline mechanical state at the time of machine delivery (2021) involved (1) triggering an emergency stop while increasing the ambient temperature (H1’); (2) releasing the emergency stop while increasing the ambient temperature (H2’); (3) releasing the emergency stop and starting the spindle (H3’); and (4) releasing the emergency stop, increasing the ambient temperature, and starting the spindle (H4’). After calibration and acceptance, the machine’s structural rigidity, geometric accuracy, spindle preload, and thermal behavior were observed to remain within the thermal characteristics defined by the original design.
To better understand the machine’s behavior in the mechanical state after extended operation (2024), different characteristic conditions were designed by varying the following parameters: (1) the presence or absence of an emergency stop release (H1 and H2), (2) the ambient temperature (H3), and (3) spindle operation (H4). Conditions H1–H4 were intended to evaluate the individual effects of each factor. Additionally, H5 considered the combined effects of ambient temperature rise and emergency stop release, H6 encompassed the effects of spindle operation and emergency stop release, and H7 accounted for the combined influence of all three factors.
Regarding the training conditions, the spindle operating characteristics were evaluated across five duty cycles and under two ambient temperature conditions: a fixed ambient temperature of 20 °C and variable ambient temperatures of 20–28 °C. The five duty cycles were as follows: (1) 5 s of operation and 5 s of stoppage (H5’, H9’, H8, and H13); (2) 10 s of operation and 5 s of stoppage (H6’, H10’, H9, and H14); (3) 30 s of operation and 5 s of stoppage (H7’, H11’, H10, and H15); (4) 60 s of operation and 5 s of stoppage (H8’, H12’, H11, and H16); (5) 300 s of operation and 5 s of stoppage (H13’ and H17). The operation and stoppage durations in these duty cycles were primarily based on the spindle’s running and idle times observed during actual workpiece machining. Additionally, to consider the effects of extreme conditions, H14’ and H18 were designed such that the ambient temperature would increase when the spindle started running and would decrease when the spindle stopped.
To validate the experimental conditions, simulated processing conditions were designed with reference to actual industrial processing conditions at Tongtai. The designed condition consisted of spindle operation for 5 min, followed by a 1 min stop, with six speed changes forming one cycle. The machine was run for 6 h and then rested for 2 h, while the ambient temperature was varied between 20 and 28 °C. The verification conditions were designated as H15’ and H19.
The designed testing conditions were based on actual machining conditions. In this regard, we utilized the real-world machining conditions involved in motorcycle camshaft production. The spindle was operated for 10 s and then stopped for 5 s, with 69 speed changes forming one cycle. The machine was operated for 6 h and then rested for 2 h, while the ambient temperature was varied between 20 and 28 °C. The testing conditions were designated as H16’ and H20.
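The duty-cycle patterns above can be expressed programmatically. This hypothetical helper (not part of the authors' setup) builds a per-second spindle on/off sequence from a run/stop specification:

```python
def duty_cycle_states(run_s, stop_s, total_s):
    """Per-second spindle states (1 = running, 0 = stopped) for a duty
    cycle such as 10 s of operation followed by 5 s of stoppage."""
    period = run_s + stop_s
    return [1 if t % period < run_s else 0 for t in range(total_s)]

# One minute of the 10 s on / 5 s off pattern used in the testing conditions.
states = duty_cycle_states(run_s=10, stop_s=5, total_s=60)
```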

2.1.3. Comparison of Characteristic Differences Between New and Old Machines

Maurya et al. [24] investigated the influence of cutting fluids on machine thermal behavior and measurement errors. In this study, when comparing characteristic differences, we used measurement data collected under the same experimental conditions, namely H7’ and H10, at a fixed ambient temperature (spindle running for 30 s before stopping for 5 s, repeated continuously for 6 h before stopping). Under conditions involving a constant spindle speed and ambient temperature, the heat generated by spindle operation is primarily dissipated through the coolant. Therefore, the temperature rise at the coolant outlet can be used to monitor the spindle’s thermal behavior. Figure 7 displays the experimental results. The coolant outlet temperature rise (CoolOut_ΔT ≈ 2.5 °C) and the internal spindle temperature rise (Spindle_ΔT ≈ 4 °C) in the mechanical state after extended operation (2024) significantly exceeded those in the baseline mechanical state at the time of delivery (2021) (CoolOut’_ΔT ≈ 1.5 °C and Spindle’_ΔT ≈ 1 °C, respectively). Furthermore, measurements of the tool tip’s thermal deformation in the Z-direction after prolonged machine operation show that the deformation in the baseline mechanical state during steady conditions was approximately 5–7 µm, whereas that in the mechanical state after extended operation (2024) increased to approximately 22–26 µm. This indicates that under identical experimental conditions, the mechanical state after extended operation (2024) produced significantly greater thermal effects.

2.2. Development of Pretrained Model

After collecting the experimental data, we observed the heat conduction paths within the machine and the physical phenomena related to the thermal deformation in the castings to determine the temperature measurement points in the spindle, column, and base areas. Because a large number of candidate temperature measurement points were identified, incorporating all of them would increase the complexity of the pretrained model and compromise its performance. Therefore, we used a feature selection method based on mutual information (MI) [25] to evaluate the nonlinear correlation between each temperature measurement point and the thermal deformation of the tool tip. This algorithm quantifies the nonlinear correlation between temperature variations and thermal deformation. Based on the results, the point exhibiting the highest correlation with the tool tip’s thermal deformation in each measurement region was selected as a temperature-sensitive point and used as an input feature for the thermal compensation model.
Using the time-series forecasting function provided by AutoKeras [26,27], we conducted automated hyperparameter tuning to analyze the training conditions (H1’–H14’) and the validation condition (H15’) for the baseline mechanical state at the time of delivery (2021). After obtaining the optimal model parameters, the activation functions in the hidden layers were manually adjusted within the same model architecture, and the model was retrained. Finally, based on the root-mean-squared error (RMSE) values from the feature data, training conditions (H1’–H14’), validation condition (H15’), and testing condition (H16’), the model with the lowest average RMSE was selected as the pretrained model for transfer learning.
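The final selection step can be sketched as follows. The helper names and toy data are illustrative, assuming only what the text states: each candidate model is scored by its RMSE averaged over the data splits, and the lowest average wins.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between measured and predicted deformation."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def select_pretrained(candidates):
    """Return the candidate with the lowest RMSE averaged over its splits.
    `candidates` maps a model name to a list of (y_true, y_pred) pairs,
    one per split (e.g. training, validation, testing)."""
    averages = {name: float(np.mean([rmse(t, p) for t, p in splits]))
                for name, splits in candidates.items()}
    best = min(averages, key=averages.get)
    return best, averages
```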
The overall framework for constructing the pre-trained model can be divided into three main steps: (1) selection of temperature-sensitive points based on physical phenomena and correlation analysis; (2) identification of the optimal model architecture using AutoKeras (v1.0.20); and (3) definition of evaluation metrics for the pretrained model.

2.2.1. Selection of Temperature-Sensitive Points Based on Physical Phenomena and Correlation Analysis

Step 1: The thermal deformation of the spindle tool tip in the Z-direction is influenced by the following factors: (1) spindle expansion—thermal deformation caused by the internal bore structure triggers expansion in the positive Z-direction; (2) column expansion—prolonged operation causes a temperature difference between the front and rear of the column, resulting in the tool tip tilting forward or backward in the Z-direction; and (3) lateral expansion of the base—this affects the expansion of both the displacement gauge bracket and the spindle tool tip on either side in the Z-direction. These phenomena are illustrated in Figure 8. The temperature measurement points for each thermal expansion region were as follows:
(1) spindle: [S25, S26, S27, and (S25 + S26 + S27)/3];
(2) column: [(C16 − C15), (C14 − C13), (C18 − C17), and ((C16 + C14 + C18)/3 − (C15 + C13 + C17)/3)]; and
(3) base: [B1, B2, B3, B4, B5, B6, (B1 + B2 + B3 + B4)/4, (B5 + B6)/2, and (B1 + B2 + B3 + B4 + B5 + B6)/6].
Step 2: Based on the MI equation (Equation (2)), the temperature and deformation measurements were assigned to Variables X and Y, respectively. X represents the temperature data from different measurement points within the same region, while Y represents the deformation data along the Z-axis. Additionally, p(x) and p(y) denote the marginal probability distributions of the temperature-rise data and deformation data, respectively, while p(x, y) denotes the joint probability distribution of the temperature rise and deformation data. After completing the calculations, the MI values of the temperature measurement points within the same region were ranked in descending order, and the point with the highest value was selected as the temperature-sensitive point for the thermal compensation model.
$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y)\,\ln\!\left(\frac{p(x,y)}{p(x)\,p(y)}\right).$
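As an illustration of Step 2, the MI between a candidate temperature point and the Z-axis deformation can be estimated from binned probability distributions. The sketch below is a minimal histogram-based estimator; the function name, bin count, and NumPy usage are our own choices, not specified by the paper:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X;Y) in nats, per Equation (2)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y)
    nz = pxy > 0                         # skip empty bins to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

Ranking the candidate points of each region by this score and keeping the maximum reproduces the selection rule described above.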

2.2.2. Identification of Optimal Model Architecture Using AutoKeras

The hyperparameter optimization process is closely linked to the prediction accuracy of the thermal compensation model. Effective neural network training requires collaboration between researchers with domain-specific expertise and those experienced in machine learning techniques. During parameter tuning, a gradient-descent strategy is typically used to solve the optimization problem, as shown in Equation (3), where $X^{(\mathrm{train})}$ represents the training data, $\lambda$ denotes the selected range and definition of hyperparameters, $A_{\lambda}$ refers to the algorithm that controls the model architecture and training process, and $F$ is the computed evaluation metric.
$F = A_{\lambda}\!\left(X^{(\mathrm{train})}\right).$
Compared with manual parameter tuning, the automated random-search strategy for hyperparameter optimization explores a wider hyperparameter space and therefore typically produces more accurate models. In this study, thermal transfer was considered a time-dependent dynamic behavior; therefore, the “Timeseries Forecasting” function in AutoKeras was employed to optimize the hyperparameters. This function utilizes default neural network architecture settings, which include the following: (1) hidden layer types: simple recurrent neural network, unidirectional and bidirectional gated recurrent units, and unidirectional and bidirectional long short-term memory networks; (2) number of hidden layers: 1–3; (3) dropout rates: 0, 0.25, and 0.5; (4) learning rates: 0.1–0.000001; (5) activation functions: “Tanh” for the hidden layers and “Linear” for the output layer.
Users can define the hyperparameter search range, from which 50 independent hyperparameter sets are randomly sampled. The system then trains and evaluates a model for each of the randomly selected hyperparameter sets. In this study, the configuration with the lowest evaluation value on the validation set (H15’) was selected as the optimal hyperparameter configuration for the model. The user-defined hyperparameter search settings were as follows: (1) optimizer: Adam; (2) batch size: 16; (3) number of validation repetitions: 3; (4) number of training epochs: 20.
The temperature-rise data recorded at the temperature-sensitive points described in Section 2.2.1 were used as input to the thermal compensation model, while the thermal deformation data were used as the model output (H1’–H14’). Subsequently, the temperature-rise data under the verification conditions described in Section 2.1.2 were input into the thermal compensation model for calculation. After obtaining the training and validation loss scores calculated by the software’s loss function based on a single training and validation run, the model with the lowest scores was selected. Next, a specialized search algorithm was used to automatically select an appropriate neural network architecture, along with the corresponding hyperparameters, whereafter the model was retrained and validated. This process was repeated 100 times, and the model with the lowest training and validation losses among all runs was selected as the optimal model based on automatic hyperparameter tuning.
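AutoKeras performs this sampling internally; the loop below is only a framework-agnostic sketch of the random-search logic described above, with the search space transcribed from the listed defaults and a caller-supplied `evaluate` callable (hypothetical) standing in for a full train-and-validate run:

```python
import random

# Search space transcribed from the AutoKeras defaults listed above.
SEARCH_SPACE = {
    "layer_type": ["SimpleRNN", "GRU", "BiGRU", "LSTM", "BiLSTM"],
    "num_layers": [1, 2, 3],
    "dropout": [0.0, 0.25, 0.5],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
}

def random_search(evaluate, n_trials=50, seed=0):
    """Sample n_trials hyperparameter sets at random and return the one
    with the lowest validation score reported by `evaluate`."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(levels) for name, levels in SEARCH_SPACE.items()}
        score = evaluate(cfg)  # stand-in for one train/validate run (H15')
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Repeating this procedure and keeping the configuration with the lowest training and validation losses mirrors the 100-iteration selection loop described above.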

2.2.3. Evaluation Metrics for Pretrained Model

After completing the automated random search for hyperparameters and obtaining the preliminary model architecture (Table 1), the activation functions in the hidden layers were manually adjusted, and the thermal compensation model was retrained. During model evaluation, the temperature-rise data from the temperature-sensitive points in the baseline mechanical state at the time of delivery (2021) (H1’–H16’) were used as inputs to calculate the predicted thermal deformation. The model’s performance was evaluated based on the RMSE between the predicted and experimentally measured thermal deformation, as shown in Equation (4):
$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{n}}.$
Finally, to evaluate the overall performance of the pretrained model under different conditions, we adopted the following approach: First, the average RMSE for the features and training conditions (H1’–H14’) was calculated. Then, the RMSEs under the validation condition (H15’) and the test condition (H16’) were computed. These three values were summed and divided by 3 to obtain a comprehensive average, as shown in Equation (5). The model with the lowest comprehensive average was selected as the pretrained model for the transfer learning algorithm.
$\mathrm{RMSE}_{\mathrm{avg}} = \left(\mathrm{RMSE}_{\mathrm{train\_avg}} + \mathrm{RMSE}_{\mathrm{val}} + \mathrm{RMSE}_{\mathrm{test}}\right)/3.$
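Equations (4) and (5) translate directly into code. The helper names below are our own; the paper defines only the formulas:

```python
import math

def rmse(pred, meas):
    """Equation (4): root-mean-square error between predicted and
    measured thermal deformation."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))

def rmse_avg(train_rmses, val_rmse, test_rmse):
    """Equation (5): comprehensive average of the mean training RMSE,
    the validation RMSE, and the test RMSE."""
    train_avg = sum(train_rmses) / len(train_rmses)
    return (train_avg + val_rmse + test_rmse) / 3
```

The candidate model with the lowest `rmse_avg` is kept as the pretrained model for transfer learning.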

2.3. Development of Transfer Learning Algorithm for Thermal Compensation in Horizontal Machining Centers

The transfer learning algorithm for thermal compensation modeling in horizontal machining centers was developed through a four-step experimental process: (1) the originally optimized thermal compensation model was utilized as the pretrained model for transfer learning; (2) based on the structural expansion and parameter-unfreezing strategy for the pretrained model, a composite fine-tuning neural network architecture was constructed [28]; (3) the neural network model architecture was trained and validated; (4) finally, the architecture was tested.

2.3.1. Originally Optimized Thermal Compensation Model as Pretrained Model for Transfer Learning

Based on the temperature-rise data (H1–H20) from the temperature-sensitive points in the mechanical state after long-term operation (2024) as input, the pretrained model described in Section 2.2 was used to calculate the predicted thermal deformation. The overall performance of the pretrained model was evaluated under different conditions using Equation (5) to determine the effectiveness of model transfer.

2.3.2. Creation of Composite Fine-Tuning Neural Network Architecture

To develop an efficient and high-accuracy transfer learning technique for thermal compensation modeling, a composite fine-tuning neural network architecture was devised. With reference to parameter-based transfer learning methods [28,29,30], two model improvement strategies were employed. The first approach was a structural extension strategy, which involves adding extra hidden layers to the pretrained model and transferring parameters that represent thermal behavior from the pretrained model to the new model; this enhances the model’s adaptability to changes in machine characteristics during long-term operation. The second approach was a parameter-unfreezing strategy, which involves selectively freezing or unfreezing portions of the pretrained parameters to preserve the original machine characteristics reflected in the temperature rise and thermal deformation data. This strategy served as the foundation for constructing the composite neural network architecture. It balances the adaptation to changes in machine characteristics after long-term operation with the preservation of the original machine characteristics, enabling rapid adjustment of the original thermal compensation model to accommodate changes in thermal behavior over time.
During the implementation of the composite architecture, the structural expansion strategy was applied first. Four possible positions for inserting hidden layers were defined within the neural network layers possessing trainable weights and biases (e.g., “bidirectional,” “bidirectional_1,” and “regression_head_1,” as listed in Table 1): (I) before “bidirectional,” (II) between “bidirectional” and “bidirectional_1,” (III) between “bidirectional_1” and “regression_head_1,” and (IV) after “regression_head_1.” Each of these four positions can optionally include an inserted hidden layer. If a hidden layer is added, it can be either a dense layer or a bidirectional LSTM layer, both of which are used in the pretrained model. Equation (6) defines the total number of possible combinations based on these four positions, each with three insertion states:
$3^4 - 1.$
Each of the four positions can assume one of three states: no insertion, dense insertion, or long short-term memory (LSTM) insertion. The value −1 denotes the configuration in which no insertion occurs at any position, rendering structural expansion meaningless. Excluding this case, the total number of valid structural expansion combinations is 80. This structure encompasses a range of architectures, from single-layer insertions to multi-layer combinations. This design corresponds to Factor A in the subsequent full-factorial analysis, representing the levels of the structural expansion strategy.
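The 80 valid structural expansion combinations can be enumerated mechanically. The sketch below mirrors the four positions and three insertion states described above (the position and state labels are illustrative):

```python
from itertools import product

# Insertion points (I)-(IV) and the three states each can take.
POSITIONS = ("I", "II", "III", "IV")
STATES = ("none", "dense", "lstm")

def expansion_configs():
    """All 3^4 position assignments minus the all-'none' case, i.e. the
    80 valid structural expansion combinations (Factor A candidates)."""
    return [dict(zip(POSITIONS, states))
            for states in product(STATES, repeat=len(POSITIONS))
            if any(s != "none" for s in states)]
```

Each returned dictionary describes one candidate architecture, from single-layer insertions to multi-layer combinations.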
Next, under the parameter-unfreezing strategy, each neural network layer with trainable weights and biases (“bidirectional,” “bidirectional_1,” and “regression_head_1”; Table 1) offers two options: frozen or unfrozen. Therefore, the parameter-unfreezing strategy can yield $2^3 = 8$ possible combinations. This is defined as Factor B in the subsequent full-factorial analysis, representing the levels of the parameter-unfreezing strategy.
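The eight freeze/unfreeze assignments of Factor B can likewise be enumerated. In a Keras implementation each boolean would typically be applied through the layer's `trainable` attribute; that mapping is our assumption, not stated in the paper:

```python
from itertools import product

# Trainable layers of the pretrained model (Table 1).
TRAINABLE_LAYERS = ("bidirectional", "bidirectional_1", "regression_head_1")

def freeze_configs():
    """All 2^3 = 8 frozen/unfrozen assignments for Factor B; True means
    the layer's pretrained weights stay frozen during fine-tuning."""
    return [dict(zip(TRAINABLE_LAYERS, states))
            for states in product((True, False), repeat=len(TRAINABLE_LAYERS))]
```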
By integrating the structural expansion strategy (Factor A) with the parameter-unfreezing strategy (Factor B), a composite fine-tuning neural network architecture was established. This architecture was further combined with the strategy for combining model training conditions (Factor C) to perform systematic training, validation, and testing. The outcome was a transfer learning–based thermal compensation model applicable to horizontal machining centers, which achieves enhanced generalizability under varying thermal behaviors across different machines.

2.3.3. Training and Validation of Composite Fine-Tuning Neural Network Architecture

To train the composite neural network model, the temperature-rise data from the temperature-sensitive points were used as input, while the corresponding thermal deformation data were used as the output. The training data were obtained under the characteristic and training conditions in the mechanical state after extended operation (2024), as described in Section 2.1.2. For model validation, the verification condition H19 mentioned in the same section was used, and its corresponding variation set of spindle operation and ambient temperature is illustrated in Figure 5c. The training conditions for the transfer learning model architecture were systematically designed and correspond to Factor C described in Section 2.4.1. The mean squared error (MSE) calculated from the loss function was used as the evaluation metric for assessing the performance of the transfer learning model. To maintain consistent convergence behavior and update dynamics with the pretrained model, the same settings were adopted during training: the Adam optimizer (learning rate = 0.001, β1 = 0.9, β2 = 0.999, and ε = 1 × 10−8), a batch size of 16, and 20 training epochs to achieve model convergence.
The training procedure (Figure 9) is conducted as follows. First, the trainable weights and biases of the hidden layers are extracted from the pretrained model (Figure 9, left). Next, the structural expansion and parameter-freezing configurations defined by Factors A and B are applied to fine-tune the composite neural network architecture. This fine-tuning is designed to transfer the thermal behavior characteristics of the original machine using a limited number of parameters and datasets.
For each fine-tuning strategy, temperature rise and thermal deformation data for Factor C serve as the model’s input and output, respectively (Figure 9, right). The training and validation losses are computed, and the two models with the lowest index values are selected as the optimal configurations. Their corresponding weights and training iterations are stored. Upon step completion, the composite neural network architecture is evaluated according to the testing procedure outlined in Section 2.3.4.
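The effect of the freeze mask during fine-tuning can be sketched with a simplified update step. The paper trains with Adam (learning rate 0.001); the toy below deliberately uses plain SGD and a dictionary-of-arrays parameter layout purely to show that frozen layers retain their pretrained values while unfrozen layers adapt:

```python
import numpy as np

def update_step(params, grads, frozen, lr=0.001):
    """One simplified gradient step honoring the freeze mask: frozen
    layers keep their pretrained weights, unfrozen layers move along the
    negative gradient. (The paper uses Adam; plain SGD here for brevity.)"""
    return {name: (p if frozen[name] else p - lr * grads[name])
            for name, p in params.items()}
```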

2.3.4. Testing of Composite Fine-Tuning Neural Network Architecture

After completing model training and validation based on the integrated structural expansion strategy (Factor A), the parameter-unfreezing strategy (Factor B), and the strategy for combining model training conditions (Factor C), the temperature-rise data collected from the temperature-sensitive measurement points under different experimental conditions were fed into the trained model to estimate the thermal deformation. The predictions were then quantitatively compared with the corresponding measured deformation values to evaluate the model’s accuracy and prediction performance. The model’s performance on the test data was evaluated based on RMSE as the primary metric, calculated using Equation (4). The model’s accuracy and generalizability were comprehensively evaluated by calculating the average RMSEs under the characteristic and training conditions (H1–H18), along with the RMSEs under the validation condition (H19) and test condition (H20), using Equation (5). A smaller R M S E a v g indicates better overall model performance.
We adopted a full-factorial design for statistical analysis [31,32], focusing on the main effects of the structural expansion strategy, the parameter-unfreezing strategy, and the strategy for combining model training conditions, as well as their interaction effects. We evaluated both two-factor interactions (structural expansion combined with parameter unfreezing, parameter unfreezing combined with training condition combination, and structural expansion combined with training condition combination) and the three-factor interactions (structural expansion, parameter unfreezing, and training condition combination together). The analysis provided comparative results across different factor levels. Finally, under an equal number of experimental scenarios, the required experimental conditions and training duration for model development were compared to identify a transfer learning strategy that effectively reduces the necessity for extensive physical trials.

2.4. Determination of Optimal Factor Selection Strategies via Full-Factorial Analysis

The selection of the optimized transfer learning–based thermal compensation model via full-factorial analysis was divided into two main steps: (1) definition of factors for the full-factorial analysis and (2) formulation of the full-factorial analysis equation.

2.4.1. Definition of Factors for Full-Factorial Analysis

We adopted a full-factorial approach to perform a three-way analysis of variance (ANOVA) [33] on the composite neural network model. The experimental design was formulated with reference to parameter transfer and fine-tuning configurations commonly employed in transfer learning, ensuring that the model optimization strategies aligned with established practices in deep neural network adaptation [30]. Considering the key parameters influencing model training, three factors were defined: (1) Factor A: structural expansion strategy; (2) Factor B: parameter-unfreezing strategy; (3) Factor C: strategy for combining model training conditions. Table 2, Table 3 and Table 4 list the levels of these factors.
Subsequently, the composite neural network models generated under different factor combinations were evaluated according to the procedures described in Section 2.3.3 and Section 2.3.4. The prediction error, expressed in terms of RMSE, was adopted as the response variable for the three-way ANOVA. Thus, the performance differences among the composite neural network architectures considered in this study could be quantitatively assessed. To investigate the main effects of each factor and the interaction effects among different factors on model performance, distinct level settings were defined for each factor.
(1) Factor A–structural expansion strategy: This factor specifies the architectural expansion of the composite neural network through the addition of hidden layers. Candidate expansion configurations were generated and filtered based on the procedure outlined in Section 2.3.2; the selected structures are summarized in Table 2.
Composite neural network architectures with different hidden-layer insertion strategies were selected with reference to the 80 initial insertion configurations introduced in Section 2.3.2. Each composite architecture was independently trained five times to ensure consistency and suppress the influence of stochastic variations during training. The evaluation was performed based on Equation (5), and the average performance across the five trials was considered for comparison. This approach ensured the stability and reproducibility of the composite neural network model under varying experimental conditions. To further select candidate architectures with sufficient generalizability, we calculated the average RMSE across the nine combinations of transfer-learning training conditions (C1–C9) corresponding to Factor C. This average RMSE was then used as the generalization criterion for evaluating each architecture. If the average RMSE of an architecture across C1–C9 was lower than that obtained from directly testing the pretrained model, the architecture was regarded as a superior candidate relative to the baseline. Subsequently, all candidate architectures were ranked in ascending order based on their average RMSE, from which four representative and stable architectures were selected as the experimental levels for Factor A. Table 2 lists all the insertion configurations (A1–A4).
(2) Factor B–parameter-unfreezing strategy: This factor defines whether the parameters of the pretrained model remain frozen or unfrozen during fine-tuning, as shown in Table 3.
In designing Factor B, which concerns the freezing and unfreezing configurations of the parameters in the pretrained model, we focused on three trainable neural network layers: “bidirectional,” “bidirectional_1,” and “regression_head_1” (Table 1). Since each layer can be individually assigned either a frozen or unfrozen state, a total of 2 3 = 8 possible level combinations are available for this factor.
(3) Factor C–strategy for combining model training conditions: This factor defines the combinations of transfer-learning conditions used for model development. Nine combinations (C1–C9) were designed and implemented, the specific configurations of which are listed in Table 4.
To evaluate the influence of transfer learning, Factor C was defined as a systematically constructed source of training data. The combinations of transfer-learning conditions included machine operating characteristics, duty-cycle variations, and environmental temperature settings, all of which are closely associated with machine thermal behavior. Based on these considerations, nine representative combination strategies were formulated, the design objectives of which are summarized in Table 4.
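The Factor A shortlisting rule described above (keep architectures whose C1–C9 average RMSE beats the pretrained baseline, rank ascending, take the best four) can be sketched as follows; the function and argument names are ours:

```python
def shortlist_architectures(avg_rmse_by_arch, baseline_rmse, k=4):
    """Keep architectures whose mean RMSE over C1-C9 is below the
    pretrained baseline, ranked ascending; return the k best as the
    Factor A levels."""
    better = {a: r for a, r in avg_rmse_by_arch.items() if r < baseline_rmse}
    return sorted(better, key=better.get)[:k]
```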

2.4.2. Formulation of Full-Factorial Analysis Equation

As mentioned earlier, a full-factorial method was used to perform a three-factor ANOVA on the composite neural network model. Additional details are provided in Section 5.4 (on pages 206–211) of Reference [34]. The experimental design employed during the construction of the composite neural network model was based on transfer learning techniques involving parameter transfer and fine-tuning [35]. Accordingly, three model-related factors were incorporated into the full-factorial experimental matrix: (1) the structural expansion strategy, which involves extending the original pretrained architecture by inserting additional hidden layers to enable thermal behavior migration (Factor A), as shown in Table 2; (2) the parameter-unfreezing strategy, which controls the retention or adaptation of machine-specific thermal characteristics by freezing or unfreezing selected layers during fine-tuning (Factor B), as shown in Table 3; and (3) the strategy for combining model training conditions, which defines training scenarios based on machine operation, duty cycles, and environmental temperature conditions (Factor C), as shown in Table 4. Considering that four, eight, and nine levels were defined for Factors A, B, and C, respectively, the total number of full-factorial experiments was 4 × 8 × 9 = 288 .
Next, for each experimental factor combination $(A_i, B_j, C_k)$, the composite neural network model was trained five times, and the corresponding $\mathrm{RMSE}_{ijkl}$ of each run was recorded as the performance metric. Based on the experimental data, a linear model was then constructed using ordinary least squares, with the regression model incorporating the three factors (the combinations of transfer-learning conditions, the composite neural network architectures, and the layer-wise unfreezing of the original pretrained model), as shown in Equation (7). An ANOVA was performed to determine whether statistically significant differences existed among the group means. As a fundamental statistical tool, ANOVA has been widely utilized across diverse research domains, including engineering [36], chemistry [37], and medicine [38], to evaluate variability sources and significance among multiple experimental groups. Through the ANOVA, the total variance was decomposed into individual components, yielding the sum of squares, degrees of freedom, mean squares, F-statistic, and corresponding significance level (p-value) for each main factor and interaction effect. We set the significance threshold at α = 0.05, meaning that the effects of the factors were considered statistically significant if they reached a 95% confidence level. After finalizing the composite neural network model, the ANOVA was conducted with the Type II sum of squares method using the ANOVA functionality of the statsmodels package for the Python (https://www.python.org/) programming language [39]. The results of this analysis served as the basis for the subsequent model adjustment, architecture design, and transfer learning strategy optimization steps.
$\mathrm{RMSE}_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$
In a full factorial design, the sums of squares, mean squares, and corresponding F-statistics for the main and interaction effects (A, B, C, AB, AC, BC, and ABC) are computed to evaluate the statistical significance of each factor in subsequent analyses [34].
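A minimal sketch of this analysis with statsmodels, assuming the replicated runs have been collected into a long-format pandas DataFrame with hypothetical column names `A`, `B`, `Cond`, and `RMSE` (the paper does not specify its data layout):

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def three_way_anova(df):
    """Fit the OLS model of Equation (7) with all main effects and
    interactions, then return the Type II ANOVA table."""
    model = smf.ols("RMSE ~ C(A) * C(B) * C(Cond)", data=df).fit()
    return anova_lm(model, typ=2)
```

Rows of the returned table correspond to the three main effects, the two- and three-factor interactions, and the residual term used for the F-tests.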

3. Results and Discussion

This section explores the optimization of the thermal compensation model for horizontal machining centers and the application of transfer learning techniques. Our goal was to improve the prediction accuracy and adaptability of the original thermal compensation model when applied to machine tools with altered characteristics, as well as to provide strategic recommendations for model transfer. First, an optimized thermal compensation model was established for the baseline mechanical state of the machine tool at the time of delivery (2021), which served as the pretrained model for subsequent transfer learning. To accommodate changes in thermal behavior resulting from prolonged machine tool usage, a parameter-based transfer learning method was employed. The model was trained on all experimental factor combinations using a full-factorial design, and the applicability of the transfer learning model architecture was assessed through a three-factor ANOVA based on the model’s test results. This section also presents recommendations for rapidly updating the thermal compensation model. The proposed method effectively adapts the original model to machine tools with altered characteristics, enhances predictive performance for the mechanical state after extended operation (2024), and significantly reduces the time required for remodeling—achieving both high efficiency and accuracy.

3.1. Optimized Thermal Compensation Model at Delivery as Pretrained Model

Based on the methodology described in Section 2.2, we established and validated a thermal compensation model for a horizontal machining center, designating the resulting optimized model as the pretrained model for subsequent transfer learning. The selected temperature-sensitive points were as follows: (1) point affected by spindle expansion: (S25 + S26 + S27)/3; (2) point affected by base expansion: (B1 + B2 + B3 + B4)/4; and (3) point affected by column temperature difference: ((C16 + C14 + C18)/3 − (C15 + C13 + C17)/3).
After selecting these temperature-sensitive points, the model’s hyperparameters were determined using the automatic tuning method described in Section 2.2.2. The activation function of the thermal compensation model was then manually adjusted to derive the optimized model. The input comprised temperature-rise data (H1’–H14’) from the temperature-sensitive points in the baseline mechanical state at delivery (2021), and the output was the corresponding thermal deformation. The neural network model was trained by iteratively optimizing the weights and biases of the neurons. Upon completion of training, the RMSE was calculated for all operating conditions (H1’–H16’). The comprehensive average RMSE was computed using Equation (5), and the model with the lowest average RMSE was selected as the pretrained model for transfer learning. The results are shown in Table 5.

3.2. Experimental Framework and Optimization Strategy for Full-Factorial Design

As part of the optimization strategy adopted for the full-factorial analysis, we first evaluated the performance of the pretrained model by directly applying it to measurement data obtained from the machine in its mechanical state after extended operation (2024). The model was tested under nine combinations of experimental conditions (C1–C9), and the mean test result (without transfer learning) was defined as the baseline model performance for benchmarking during the subsequent optimization procedures. Thereafter, based on the transfer learning approach utilizing hidden-layer insertion, a total of 80 composite neural network architectures were constructed (as detailed in Section 2.3.2). Each architecture was trained five times under the nine combinations of training conditions to account for randomness, before being evaluated using measurement data from the machine’s post-operation state. The average RMSE values were then calculated, and the architectures were ranked in ascending order of mean RMSE. The architectures exhibiting lower mean RMSEs than the baseline pretrained model were subsequently retained. As shown in Table 6, four network architectures outperformed the baseline and were therefore selected as the structural expansion strategies under Factor A.
In the three-factor ANOVA conducted using the full-factorial approach, Factor B was defined as the parameter-unfreezing strategy, designed to investigate the main effects and interaction effects between parameter fine-tuning and hidden-layer structure expansion. Additionally, Factor C, representing the strategy for combining operating conditions, was introduced to assess whether the adjusted network architecture could quickly adapt to changes in thermal behavior caused by long-term machine operation, considering both machine operating characteristics and ambient temperature variations. Using the variance analysis formulation in [34], the RMSE was used as a response indicator to quantify the impact of the different factors and their interactions. The statistical results are summarized in Table 7. Based on the main-effect analysis, Factor A (structural expansion strategy) exhibited the largest sum of squares (SS = 15,031.82) and the highest F-value (F = 4473.78), significantly higher than those of Factors B and C. This indicates that, compared with parameter unfreezing or changes in training conditions, the location and configuration of the newly added hidden layers have a greater impact on the model’s prediction error. Based on the interaction analysis, the p-values for all two-factor interactions were less than 0.05, indicating a statistically significant interaction between any two factors. Notably, the three-factor interaction was also significant (F = 2.77, p = $1.29 \times 10^{-25}$), which suggests that simultaneous changes in the three factors produce a nonlinear combined effect on the model’s performance rather than a simple additive effect. The residual MSE was 1.12, lower than the MSEs of all main and interaction terms, indicating low variability among the test observations. This confirms the stability of the factorial experimental results.
Thus, the full-factorial analysis effectively revealed the optimal transfer learning strategy for thermal compensation, enabling the model to quickly adapt to changes in thermal characteristics caused by long-term machine operation.

3.3. Visualization of ANOVA Results: Main and Interaction Effects

To interpret the main and interaction effects identified through the three-way ANOVA, the influence of each factor on the model’s prediction error (RMSE) was analyzed in three steps.
Step 1–Main-effect analysis: Figure 10 depicts the main-effect plots constructed using bar charts. Each bar represents the mean RMSE computed from all data points in the full-factorial design for the corresponding factor level, where a lower RMSE indicates superior performance. The horizontal line at the center of each bar denotes the standard deviation of the RMSE for that level combination, with a shorter line reflecting higher result consistency for that level. Figure 10a–c illustrate the main effects of Factor A (structural expansion strategy), Factor B (parameter-unfreezing strategy), and Factor C (strategy for combining model training conditions), respectively. From a model performance perspective, levels A3, B1, and C9 yield the optimal performance among all levels of Factors A, B, and C, respectively.
Step 1-1: In Figure 10a, the average RMSEs of the structural expansion strategies (Factor A) are arranged in ascending order: A3 < A2 < A1 < A4. Table 8 describes each configuration in detail.
Step 1-2: As shown in Figure 10b, the average RMSEs for the parameter unfreezing strategy (Factor B) were arranged in ascending order, which yielded the following sequence: B1 < B8 < B6 < B5 < B2 < B7 < B3 < B4. Further details are provided in Table 9.
According to the definition of the optimism of the training error rate in Section 7.4 of Hastie et al. (2009) [40], the gap between the expected in-sample error ($\mathrm{Err_{in}}$) and the training error ($\overline{\mathrm{err}}$) is determined by the sum of the covariances between the predicted values ($\hat{y}_i$) and the observed values ($y_i$), as expressed in Equation 7.21:
$\omega \equiv E_{y}(\mathrm{op}) = E_{y}\!\left(\mathrm{Err_{in}} - \overline{\mathrm{err}}\right) = \frac{2}{N}\sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i)$
In addition, a linear-fit assumption is introduced. For a linear model with d degrees of freedom, this relationship can be rewritten as:
$\sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i) = d\,\sigma_{\varepsilon}^{2}$
After substituting this relationship into Equation 7.21, a refined expression is obtained for the expected optimism of a model with fixed parameters:
$\omega = 2 \cdot \frac{d}{N}\,\sigma_{\varepsilon}^{2}$
where $d$ (degrees of freedom) corresponds to the 415 trainable parameters, $N$ (sample size) denotes the dataset size, and $\sigma_{\varepsilon}^{2}$ represents the irreducible noise variance in the data.
When the 2024 dataset is limited (small $N$), keeping all 415 parameters trainable (large $d$) causes the expected optimism $\omega$ to increase disproportionately. This behavior promotes overfitting, as the model learns transient noise rather than the underlying signal. Therefore, a freezing strategy is applied to reduce the number of trainable parameters ($d$), which encourages the model to emphasize stable and generalizable patterns inherited from the historical data.
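Plugging the paper's parameter count into the last expression makes the argument concrete; the sample sizes used below are illustrative, not taken from the paper:

```python
def expected_optimism(d, n, sigma2):
    """Expected optimism ω = 2·(d/N)·σ²_ε for a linear fit with d
    effective parameters, N samples, and noise variance σ²_ε."""
    return 2.0 * d / n * sigma2

# With d fixed at 415, halving N doubles ω; freezing layers shrinks d,
# and ω falls proportionally at any fixed N.
```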
Step 1-3: As shown in Figure 10c, the average RMSEs for the strategies for combining model training conditions (Factor C) were sorted in ascending order. The resulting sequence was C9 < C7 < C1 < C3 < C4 < C8 < C5 < C6 < C2. Table 10 provides a detailed description of each combination.
  • The optimal combination of main-effect factors was determined by selecting the levels of Factors A, B, and C that yielded the lowest overall RMSE values. Accordingly, the optimal main-effect settings were found to be as follows: structural expansion strategy = A3 (inserting an LSTM layer after the hidden layers, followed by a dense layer); parameter-unfreezing strategy = B1 (no layers unfrozen, i.e., all pretrained parameters remain frozen); and operating condition combination strategy = C9.
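The role of the B1 freezing strategy can be illustrated with a small numpy sketch (the data, layer sizes, and weights here are hypothetical, not the paper's network): the pretrained feature extractor is held fixed, so gradient updates reach only the newly added head, preserving the historical thermal representation while the new parameters absorb the changed behavior.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Pretrained" hidden layer: weights learned on historical data, frozen under B1.
W_frozen = rng.normal(size=(3, 6))

def hidden(x):
    # Frozen feature extractor: never updated during fine-tuning.
    return np.tanh(x @ W_frozen)

# Small synthetic adaptation set: 3 temperature inputs -> thermal deformation target.
X = rng.normal(size=(40, 3))
y = hidden(X) @ rng.normal(size=6) + 0.05 * rng.normal(size=40)

w_new = np.zeros(6)                  # newly inserted trainable head
W_before = W_frozen.copy()

H = hidden(X)                        # gradients stop here: W_frozen stays untouched
lr = 0.2
for _ in range(2000):
    resid = H @ w_new - y
    w_new -= lr * (H.T @ resid) / len(y)

rmse = np.sqrt(np.mean((H @ w_new - y) ** 2))
print(rmse)                          # approaches the 0.05 synthetic noise floor
```

Only `w_new` changes during training; the frozen weights can be verified bit-identical afterwards, which is the mechanism that prevents the knowledge loss discussed above.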
Step 2–Two-factor interaction analysis: Figure 11, Figure 12 and Figure 13 depict the two-factor interaction effects through interaction heatmaps. In these plots, the horizontal and vertical axes correspond to the levels of the two experimental factors under consideration, while the value in each cell represents the average RMSE for that specific factor combination. Differences in error magnitude across combinations are further visualized using a color scale. Specifically, Figure 11 shows the interaction between Factors A and B, Figure 12 illustrates the interaction between Factors A and C, and Figure 13 presents the interaction between Factors B and C.
Step 2-1: Figure 11 presents the interaction between Factor A and B using a heatmap-style interaction plot. The vertical and horizontal axes correspond to the levels of Factor A and Factor B, respectively.
  • As observed in Figure 11, under configurations in which layers were unfrozen, A4 (dense layer inserted after the hidden layer, LSTM inserted after the output layer) yielded the highest average RMSE (approximately 11–12). This indicates that introducing recurrent/time-dependent units at the output stage destabilizes the model during transfer learning. When an LSTM-related structure is appended after the output layer and subsequently unfrozen, historical temporal information is reintroduced into the prediction process, which increases temporal sensitivity and disturbs the pretrained output-layer weights. This amplifies error accumulation, significantly raising the RMSE.
  • Likewise, A1 (dense layer inserted after input layer) exhibited consistently high RMSEs when either the hidden layer, the output layer, or both were unfrozen. Inserting a dense layer at the input stage forces the model to relearn the feature relationship between temperature and deformation, undermining the pre-learned thermal behavior representation. Subsequently, upon unfreezing, the altered feature pathway increases the gradient sensitivity and weakens the established thermal correlations, thereby compromising predictive performance.
  • Overall, these findings demonstrate that the interaction between the insertion location and the unfreezing strategy has a decisive impact on transfer learning stability and that unsuitable structural–parameter configurations substantially increase prediction errors during model adaptation.
Step 2-2: Figure 12 displays the interaction between Factors A and C, where the horizontal axis represents the degree of structural expansion, and the vertical axis represents the combination of operating conditions. Based on the color distribution in the heatmap, no obvious linear trend between these two factors is observed.
  • However, under all operating conditions, A4 (inserting a dense layer after the hidden layer and an LSTM after the output layer) exhibited the highest RMSE (all exceeding 10, approximately 10.7–10.9), with limited variation across conditions. Thus, the effectiveness of this structure is consistently poor, which suggests that the resulting network architecture is not suitable for thermal parameter prediction in various processing scenarios.
  • Among all combinations of conditions, C2, with the least amount of training data, also displayed a relatively high RMSE after training. This indicates that an insufficient sample size leads to poor feature representation, thus limiting the model’s fitting ability. However, once the error exceeds a certain threshold, different operating conditions become indistinguishable, meaning that the model cannot learn thermal features effectively and becomes insensitive to changes in operating conditions.
  • C3 combined typical testing for operating conditions with variations in spindle speed at low/medium/high temperatures under constant and varying ambient temperatures, serving as a benchmark for evaluating the model’s generalizability under different thermal behaviors. This configuration supports cross-condition adaptability and provides a foundational representation for models subsequently trained under C4–C6.
  • When retaining only low-speed data under constant and varying temperature conditions, the RMSE of C4 was similar to that of C3. This shows that the influence of spindle speed can be adequately captured under low-speed conditions and that low-speed experiments can serve as a transfer basis for thermal modeling.
  • C7 retained only varying ambient temperature data under low-speed operation, further reducing the RMSE. Under low-speed conditions, spindle heating is minimal and stable, whereby ambient temperature becomes the primary driving thermal characteristic. This preserves the pre-learned thermal behavior distribution and facilitates fine-tuning, enabling the model to capture temperature-deformation correlations more accurately and thereby reducing error. Conversely, due to the lack of temperature variations in C8, limited temperature information was available during training, because the modeling was based solely on isothermal data. Therefore, the model often overreacted to small temperature fluctuations during prediction, which increased its RMSE.
  • Under C9, the thermal deformation trend became more pronounced when both spindle speed variations and large thermal disturbances caused by ambient temperature fluctuations were considered simultaneously. The clearer temperature gradient improved the separability of the features, allowing the model to learn stable thermal mappings more effectively and thereby reducing the overall RMSE again.
  • These results indicate that despite larger training datasets, C3 and C4 did not outperform C7 and C9 in terms of transfer learning, which suggests that additional data do not necessarily provide more knowledge during the fine-tuning phase. Instead, such data may modify (or even compromise) the existing thermal behavior information encoded in the pretrained model. Data at fixed ambient temperatures exhibit single-source, weakly nonlinear dynamics, while variable-temperature data correspond to multi-source, hysteresis-driven nonlinear thermal behavior [41].
  • During fine-tuning, gradient-update conflicts and dynamic behavior mismatches can occur when different heat distributions correspond to different heat transfer mechanisms in heterogeneous thermal environments, which impairs the learned feature weights. If an increase in the amount of data is accompanied by inconsistencies in the distribution, the model’s predictive accuracy may decrease [42].
Step 2-3: Figure 13 illustrates the interaction between Factors B and C, where the horizontal and vertical axes represent the levels of Factor B and Factor C, respectively.
  • Under the fully frozen configuration (B1, no layers unfrozen), the pretrained model retained the original thermal behavior representation learned at the machine’s delivery state. Since no weight updates were performed, the model also retained the original weights and biases corresponding to the temperature features and thermal deformation trends, thus exhibiting the smallest RMSE under different operating conditions.
  • However, when only the hidden layer, only the output layer, or both the hidden and output layers were unfrozen, the average RMSE increased significantly. This is because during fine-tuning, the weights and biases of the initially learned relationship between temperature and thermal deformation were retrained, which disrupted the established thermal behavior features. Therefore, the model struggled to maintain stable predictions under changing conditions. Particularly under C2, which had a smaller sample size, the model’s sensitivity to weight perturbations increased, and the nonlinear mapping between the temperature signal and the deformation output was amplified, which increased the prediction error.
  • Interestingly, when the input layer was unfrozen (whether individually or together with the hidden or output layers), the RMSE remained lower than when only the intermediate or output layers were unfrozen. Updating the input-layer parameters allows the model to recalibrate the temperature feature distribution before the resulting feature activations propagate through the subsequent hidden and output layers, thereby better matching the new thermal domain. Consequently, compared with fine-tuning only the intermediate layers, input-layer adaptation improves the model’s generalizability under dynamic operating conditions (C3–C9), including variations in spindle speed and ambient temperature.
  • Overall, fine-tuning the hidden or output layers may erase pretrained thermal deformation knowledge, whereas modifying the input layer preserves the stability of the core thermal representation while enabling adaptation to new conditions.
Step 3—Three-factor interaction analysis: Figure 14 visualizes the interaction among the three factors. Factor B and RMSE are presented in horizontal and vertical axes, respectively, while different colors correspond to the four structural expansion strategies (Factor A). The plotted values reflect averages aggregated across combinations of operating conditions (Factor C). Vertical projection lines indicate the mean RMSE values of Factor C projected onto the XY plane defined by Factors A and B, facilitating observation of the three-factor interaction. Detailed interpretations of the results from each of these analyses are presented below.
As shown in Figure 14, under different parameter-unfreezing settings, the RMSE variations of the different insertion strategies exhibited a nonlinear and non-parallel distribution, indicating a complex three-factor interaction.
  • Among all configurations, A4 (inserting a dense layer after the hidden layer and an LSTM layer after the output layer) displayed the highest RMSE (~11–12) across all unfreezing strategies; this confirms that placing a recurrent/time-varying structure at the output layer is detrimental during transfer learning. When an LSTM layer is appended after the output layer and then unfrozen, historical time information is fed back into the current prediction, which increases temporal sensitivity and disrupts the stability of the pretrained output weights. This feedback loop exacerbates gradient oscillations, amplifies error propagation, and ultimately leads to a significant decline in performance during adaptation.
  • Similarly, A1 (inserting a dense layer after the input layer) also produced a high RMSE when the hidden or output layer was unfrozen. Inserting a dense layer after the input layer forces the model to relearn the thermal feature mappings, thereby disrupting the pre-learned temperature–deformation representation. As more layers are unfrozen, the modified input path further destabilizes feature alignment, leading to increased prediction errors and less stable transfer.
  • Conversely, when the time-varying network structure was placed within the inner hidden layers (A2 and A3), the RMSE fluctuations under different unfreezing strategies were significantly reduced. This indicates that time-varying features embedded in deeper layers can be partially absorbed and smoothed, thus preserving the core thermal behavior relationships and improving robustness during transfer learning.
  • The results of the two-factor analysis (Figure 11, Figure 12 and Figure 13) corroborate this conclusion. Changes in external operating conditions, such as fluctuations in ambient temperature and spindle speed settings, lead to differences in feature distribution. This exposes the model to the uncertainties in feature differences and weight adjustments during transfer. Therefore, the model’s prediction error (RMSE) is inferred to be influenced by both the neural network architecture design (adding and unfreezing layers) and the training strategy based on external operating conditions. Section 3.4 outlines the validation of the model based on the results presented in this section.

3.4. Model Validation Based on ANOVA Results

In this study, several transfer learning network configurations were designed to improve the predictive power and training efficiency of thermal deformation models through a three-factor ANOVA and subsequent analyses of main effects and interaction effects. We then compared these architectures with traditional hyperparameter-optimized neural network models to evaluate the improvements achieved in terms of training time and prediction accuracy.
For the main-effect analysis, the optimal levels of each factor (Figure 10) were used to configure the recommended models, as shown in Table 11 (Recommended Combination 1).
For the two-factor interaction analysis, the optimal configuration, i.e., the combination of levels that produced the lowest mean RMSE, was determined from the statistical results of each interaction, as shown in Figure 11, Figure 12 and Figure 13. This lowest-error combination was designated as the fixed baseline. The third factor, not included in the fixed baseline, was then ranked over its levels, and the level producing the minimum RMSE was selected to complete the recommended setting for model inference and validation. For example, under the A × B interaction (Figure 11), the lowest-RMSE cell was A1 with B1, and C9 exhibited the lowest RMSE given that pair; this configuration is therefore recommended for the A × B interaction (Recommended Combination 2 in Table 11). The same approach was followed for the A × C and B × C interactions.
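The selection procedure just described (fix the two-factor cell with the lowest mean RMSE, then rank the remaining factor within that cell) can be sketched as follows, again on placeholder RMSE values rather than the experimental data:

```python
import numpy as np

rng = np.random.default_rng(3)
rmse = rng.uniform(3.0, 12.0, size=(4, 8, 9))   # placeholder A x B x C results

# Step 1: fix the (A, B) cell whose mean RMSE over all C levels is lowest.
ab_means = rmse.mean(axis=2)
a_star, b_star = np.unravel_index(np.argmin(ab_means), ab_means.shape)

# Step 2: within that fixed baseline, pick the C level with the minimum RMSE.
c_star = int(np.argmin(rmse[a_star, b_star, :]))

print(f"recommended combination: A{a_star + 1}, B{b_star + 1}, C{c_star + 1}")
```

The same two-step selection, applied with each of the three factor pairs as the baseline, yields the recommended combinations listed in Table 11.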
Finally, based on the three-factor interaction results, the configuration with the lowest RMSE across all factor combinations was selected as the recommended three-factor model (Recommended Combination 3 in Table 11). Limited prior work has examined this specific phenomenon, and comparative results are not currently available in the existing literature. Accordingly, although the findings are promising, the generalizability of this recommendation is limited.
To evaluate model stability, each recommended configuration was trained five times. The mean RMSE and training time were recorded and compared with the baseline model trained using hyperparameter optimization. Results in Table 11 indicate that the proposed transfer learning strategy converged successfully with a reduced dataset (nine datasets rather than 18), significantly reducing training time from over four hours to ~30 s. Despite the limited data, the optimized model achieved an RMSE of 3.47 µm, which did not differ significantly from the models retrained in the traditional manner on the complete dataset (3.21 µm) or on C9 alone (3.84 µm). These findings reveal that the strategy markedly lowers computational costs while preserving high prediction accuracy, providing a robust and reproducible framework for long-term model maintenance in smart manufacturing environments.
If this methodology is applied to other types of machinery that require model updating after long-term use, the approach developed in this study can serve as a reference. After the required data are collected and a pretrained model is established, the transfer model can be constructed by applying the strategies discussed in this section, including (i) selecting appropriate transfer layers, with Dense or LSTM configurations considered for the hidden layers and LSTM additions to the output layer avoided; (ii) freezing the parameters of the original pretrained model; and (iii) incorporating environmental temperature fluctuations into the design of operating conditions.

4. Conclusions

A parameter-based transfer learning approach is presented to update machine-tool thermal compensation models when long-term operation induces variations in thermal characteristics. The main insight is that, in intelligent manufacturing environments where model degradation is unavoidable, model updating should take precedence over model retraining. Conventional thermal compensation studies often assume long-term model validity under a fixed mechanical state, but real industrial conditions challenge this assumption due to thermal behavior drift driven by structural aging and environmental variability. The transfer learning framework demonstrates that existing models do not need to be treated as obsolete; instead, they can function as knowledge carriers of the original machine-state relationships and remain useful for continued deployment.
The proposed method combines a structural expansion strategy with a parameter-freezing strategy to form a composite fine-tuned neural network architecture. This design allows the model to accommodate thermal behavior variations arising from long-term operation and structural aging while retaining the originally learned temperature-thermal deformation relationships. Results from a three-factor ANOVA indicate that the optimal model configuration includes three main components: (i) insertion of an LSTM temporal unit after the hidden layers, (ii) full freezing of all pretrained layer parameters, and (iii) use of representative training conditions that jointly account for machine operating characteristics and ambient temperature variations.
The factor-wise analysis reveals that, under the structural expansion strategy, positioning the LSTM layer after the hidden layers captures time-varying thermal drift while preserving output-layer stability. In contrast, introducing temporal feedback at the output layer yields unstable predictions and amplifies oscillation errors. For the parameter-freezing strategy, complete freezing of pretrained network parameters preserves the core temperature-thermal deformation relationship established at the machine’s delivery state. This state allows newly added layers to focus exclusively on learning aging-induced variations. This approach mitigates overfitting and prevents knowledge loss that is typically associated with full model retraining.
With respect to training data configuration, the results indicate that increasing the number of training conditions does not necessarily improve transfer learning performance. Improved outcomes are achieved through the selection of representative thermal operating conditions informed by physical behavior, which reduces experimental effort and data collection demands without compromising model accuracy.
Based on the analysis of the main and interaction effects, the optimal transfer learning configuration employs an LSTM inserted after the hidden layers, fully frozen pretrained layers, and training under representative thermal condition C9. Validation demonstrated a mean RMSE of 3.88 µm for the pretrained model without transfer learning and 3.84 µm for conventional retraining with hyperparameter optimization on condition C9 alone. The proposed transfer learning approach reduced the RMSE to 3.47 µm, improving prediction accuracy by ~10% and significantly reducing training time from over four hours to ~30 s.
For industrial applications, constructing a high-precision thermal compensation model traditionally demands a comprehensive dataset encompassing 18 operating conditions. While a full model can improve prediction accuracy by 20%, the proposed transfer learning method provides a rapidly deployable alternative by significantly reducing data collection (from 18 operating conditions to nine) and training time from over four hours to ~30 s. In practice, this improved compensation model accuracy extends equipment service life and mitigates expensive machine replacement or reinvestment due to precision degradation.
This study frames transfer learning for machine-tool thermal compensation models as a continuous, evolving process rather than a single modeling task. A concrete, reproducible framework is presented to support long-term model maintenance and updating in intelligent manufacturing environments. Future research may extend this framework to cross-machine transfer learning, enabling a model trained on a single machine to be adapted for others within the same series. This extension could further reduce initial data collection costs and enhance the scalability of deployment across real production lines.

Author Contributions

Conceptualization, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; methodology, C.-C.C., Z.-W.L.C. and W.-H.H.; software, C.-C.C., Z.-W.L.C., C.-J.C. and W.-H.H.; validation, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; formal analysis, C.-C.C., Z.-W.L.C. and W.-H.H.; investigation, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; resources, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; data curation, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; writing—original draft preparation, C.-C.C., Z.-W.L.C. and W.-H.H.; writing—review and editing, C.-C.C., Z.-W.L.C., T.-C.K., C.-J.C. and W.-H.H.; visualization, C.-C.C., Z.-W.L.C. and W.-H.H.; supervision, T.-C.K. and W.-H.H.; project administration, W.-H.H.; funding acquisition, W.-H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the National Science and Technology Council (Project No: NSTC 113-2622-E-194-006) and Tongtai Machine & Tool Co., Ltd. through a university-industry collaboration project. The authors gratefully acknowledge this support.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that this study received funding from Tongtai Machine & Tool Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:
LSTM: Long short-term memory
MI: Mutual information
RMSE: Root-mean-squared error
MSE: Mean-squared error
ANOVA: Analysis of variance

References

  1. Li, Y.; Zhao, Q.; Lan, S.; Ni, J.; Wu, W.; Lu, B. A review on spindle thermal error compensation in machine tools. Int. J. Mach. Tools Manuf. 2015, 95, 20–38. [Google Scholar] [CrossRef]
  2. Ramesh, R.; Mannan, M.A.; Poo, A.N. Error compensation in machine tools—A review Part II: Thermal errors. Int. J. Mach. Tools Manuf. 2015, 93, 26–36. [Google Scholar]
  3. Wei, X.; Ye, H.; Miao, E.; Pan, Q. Thermal error modeling and compensation based on Gaussian process regression for CNC machine tools. Precis. Eng. 2022, 77, 65–76. [Google Scholar] [CrossRef]
  4. Liu, H.; Miao, E.M.; Wei, X.Y.; Zhuang, X.D. Robust modeling method for thermal error of CNC machine tools based on ridge regression algorithm. Int. J. Mach. Tools Manuf. 2017, 113, 35–48. [Google Scholar] [CrossRef]
  5. Guo, Q.; Yang, J. Application of projection pursuit regression to thermal error modeling of a CNC machine tool. Int. J. Adv. Manuf. Technol. 2011, 55, 623–629. [Google Scholar]
  6. Yang, H.; Ni, J. Adaptive model estimation of machine-tool thermal errors based on recursive dynamic modeling strategy. Int. J. Mach. Tools Manuf. 2005, 45, 1–11. [Google Scholar] [CrossRef]
  7. Zhang, Y.; Yang, J.; Jiang, H. Machine tool thermal error modeling and prediction by grey neural network. Int. J. Adv. Manuf. Technol. 2012, 59, 1065–1072. [Google Scholar] [CrossRef]
  8. ISO 10791-7:2014; Test Conditions for Machining Centres—Part 7: Accuracy of a Finished Test Piece. International Organization for Standardization: Geneva, Switzerland, 2014.
  9. Bringmann, B.; Knapp, W. Machine tool calibration: Geometric test uncertainty depends on machine tool performance. Precis. Eng. 2009, 33, 524–529. [Google Scholar] [CrossRef]
  10. Hinder, F.; Vaquet, V.; Hammer, B. One or two things we know about concept drift—A survey on monitoring in evolving environments. Part A: Detecting concept drift. Front. Artif. Intell. 2024, 7, 1330257. [Google Scholar] [CrossRef]
  11. Chen, T.C.; Chang, C.J.; Hung, J.P.; Lee, R.M.; Wang, C.C. Real-time compensation for thermal errors of the milling machine. Appl. Sci. 2016, 6, 101. [Google Scholar] [CrossRef]
  12. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees, 1st ed.; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
  13. Ramesh, R.; Mannan, M.A.; Poo, A.N.; Keerthi, S.S. Thermal error measurement and modelling in machine tools. Part II. Hybrid Bayesian network—Support vector machine model. Int. J. Mach. Tools Manuf. 2003, 43, 405–419. [Google Scholar] [CrossRef]
  14. Horejs, O.; Mares, M.; Kohut, P.; Barta, P.; Hornych, J. Compensation of machine tool thermal errors based on transfer functions. MM Sci. J. 2010, 3, 162–165. [Google Scholar] [CrossRef]
  15. Yau, H.T.; Kuo, P.H.; Chen, S.C.; Lai, P.Y. Transfer-learning-based long short-term memory model for machine tool spindle thermal displacement compensation. IEEE Sens. J. 2024, 24, 132–143. [Google Scholar]
  16. Li, P.; Lou, P.; Yan, J.; Liu, N. The thermal error modeling with deep transfer learning. J. Phys. Conf. Ser. 2020, 1576, 012003. [Google Scholar] [CrossRef]
  17. Zhou, D.; Zeng, F.; Jia, S. The thermal error modeling approach of feeding axes for horizontal CNC lathes based on transfer learning. Preprint 2023. [Google Scholar] [CrossRef]
  18. Ma, S.; Leng, J.; Chen, Z.; Li, B.; Li, X.; Zhang, D.; Li, W.; Liu, Q. A novel weakly supervised adversarial network for thermal error modeling of electric spindles with scarce samples. Expert Syst. Appl. 2024, 238, 122065. [Google Scholar]
  19. Zheng, Y.; Fu, G.; Mu, S.; Lu, C.; Wang, X.; Wang, T. Thermal Error Transfer Prediction Modeling of Machine Tool Spindle with Self-Attention Mechanism-Based Feature Fusion. Machines 2024, 12, 728. [Google Scholar] [CrossRef]
  20. Mao, H.; Liu, Z.; Qiu, C.; Liu, H.; Sun, J.; Tan, J. Subspace metric-based transfer learning for spindle thermal error prediction under time-varying conditions. IEEE Trans. Instrum. Meas. 2024, 73, 2514311. [Google Scholar]
  21. Zheng, Y.; Fu, G.; Mu, S.; Zhu, S.; Lin, K.; Yang, L. A review on transfer learning in spindle thermal error compensation of spindle. Adv. Manuf. 2024, 1, 12. [Google Scholar] [CrossRef]
  22. ISO-230-3:2007; Test Code for Machine Tools—Part 3: Determination of Thermal Effects. International Organization for Standardization: Geneva, Switzerland, 2007.
  23. IEC 60584-1:2013; Thermocouples—Part 1: EMF Specifications and Tolerances. International Electrotechnical Commission: Geneva, Switzerland, 2013.
  24. Maurya, S.N.; Li, K.Y.; Luo, W.J.; Kao, S.Y. Effect of coolant temperature on the thermal compensation of a machine tool. Machines 2022, 10, 1201. [Google Scholar] [CrossRef]
  25. Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550. [Google Scholar] [CrossRef]
  26. Alsharef, A.; Garg, S.; Kumar, K.; Iwendi, C. Time series data modeling using advanced machine learning and AutoML. Sustainability 2022, 14, 15292. [Google Scholar] [CrossRef]
  27. Alsharef, A.; Aggarwal, K.; Sonia; Kumar, M.; Mishra, A. Review of ML and AutoML solutions to forecast time-series data. Arch. Comput. Methods Eng. 2022, 29, 5297–5311. [Google Scholar] [CrossRef]
  28. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef]
  29. Pan, S.J. Transfer learning. Learning 2020, 21, 1–2. [Google Scholar]
  30. Xie, S.M.; Ma, T.; Liang, P. Composed fine-tuning: Freezing pre-trained denoising autoencoders for improved generalization. In Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Conference, 18–24 July 2021; pp. 11424–11435. [Google Scholar]
  31. Aleksandar, J.; Gaurav, C.; Francesco, G. Optimization through classical design of experiments (DOE): An investigation on the performance of different factorial designs for multi-objective optimization of complex systems. J. Build. Eng. 2025, 102, 111931. [Google Scholar] [CrossRef]
  32. Inglis, A.; Andrew, P.; Catherine, B.H. Visualizing variable importance and variable interaction effects in machine learning models. J. Comput. Graph. Stat. 2022, 31, 766–778. [Google Scholar] [CrossRef]
  33. Sutrisno, U.; Wulandari, Y.; Arifin, S.; Manurung, M.M.; Faisal, M. Trends, Contributions and Prospects: Bibliometric Analysis of ANOVA Research in 2022-2023. Indones. J. Appl. Math. Stat. 2024, 1, 27–38. [Google Scholar] [CrossRef]
  34. Montgomery, D.C. Design and Analysis of Experiments, 9th ed.; John Wiley & Sons: New York, NY, USA, 2017. [Google Scholar]
  35. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328. [Google Scholar]
  36. Centofanti, F.; Colosimo, B.M.; Grasso, M.L.; Menafoglio, A.; Palumbo, B.; Vantini, S. Robust functional ANOVA with application to additive manufacturing. J. R. Stat. Soc. C Appl. Stat. 2023, 72, 1210–1234. [Google Scholar] [CrossRef]
  37. Iravani, M.; Simjoo, M.; Chahardowli, M.; Rezvani Moghaddam, A. Experimental insights into the stability of graphene oxide nanosheet and polymer hybrid coupled by ANOVA statistical analysis. Sci. Rep. 2024, 14, 18448. [Google Scholar] [CrossRef] [PubMed]
  38. Gorkem, S.; Ataman, M.G.; Kızıloğlu, İ. Analyzing main and interaction effects of length of stay determinants in emergency departments. Int. J. Health Policy Manag. 2019, 9, 198. [Google Scholar]
  39. Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (SciPy 2010), Austin, TX, USA, 28 June–3 July 2010; pp. 92–96. [Google Scholar]
  40. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009. [Google Scholar]
  41. Mayr, J.; Jedrzejewski, J.; Uhlmann, E.; Alkan Donmez, M.; Knapp, W.; Härtig, F.; Wendt, K.; Moriwaki, T.; Shore, P.; Schmitt, R.; et al. Thermal issues in machine tools. CIRP Ann. 2012, 61, 771–791. [Google Scholar] [CrossRef]
  42. Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar] [CrossRef]
Figure 1. Temperature measurement layout and thermal deformation measurement method.
Figure 2. Thermal measurement layout covering spindle, table, base, and ambient points.
Figure 3. Temperature measurement layout, including column, screw bearings, spindle, motor, lubrication, and coolant points.
Figure 4. The calibration setups: (a) thermocouple temperature calibration and (b) thermal deformation calibration using a capacitive displacement sensor.
Figure 5. Baseline operating conditions (2021): (a) characteristic conditions; (b) training set; (c) validation and testing sets.
Figure 6. Operating conditions after long-term machining (2024): characteristic conditions, training set, validation set, and testing set.
Figure 7. Comparison of machine characteristic changes between H10 (2024) and H7’ (2021) based on temperature distribution and Z-axis thermal deformation.
Figure 8. Thermal deformation characteristics along the Z-axis of the machine tool.
Figure 9. Schematic diagram of the neural network structure.
Figure 10. Main-effect analysis for (a) Factor A, (b) Factor B, and (c) Factor C.
Figure 11. Interaction effects of Factors A and B.
Figure 12. Interaction effects of Factors A and C.
Figure 13. Interaction effects of Factors B and C.
Figure 14. Interaction effects of Factors A, B, and C.
Table 1. Architecture of pretrained neural network model.
Layer No. | Layer Name | Layer Type | Output Shape | Parameters
1 | input_1 | Input | (None, 1, 3) | 0
2 | bidirectional | Bidirectional | (None, 1, 6) | 168
3 | bidirectional_1 | Bidirectional | (None, 6) | 240
4 | dropout | Dropout | (None, 6) | 0
5 | regression_head_1 | Dense | (None, 1) | 7
Total | | | | 415
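For readers checking the "Parameters" column, the counts follow the standard LSTM weight formula (four gates, each with input weights, recurrent weights, and a bias), doubled by the bidirectional wrapper. A minimal pure-Python sketch, assuming three input features and three units per LSTM direction (both inferred from the output shapes above), reproduces the totals:

```python
def lstm_params(units: int, input_dim: int) -> int:
    """Weights for one LSTM direction: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units: int, input_dim: int) -> int:
    """Dense layer: weight matrix plus bias."""
    return units * input_dim + units

# Bidirectional wrappers double the single-direction count.
bi_1 = 2 * lstm_params(units=3, input_dim=3)   # first hidden layer
bi_2 = 2 * lstm_params(units=3, input_dim=6)   # second hidden layer
head = dense_params(units=1, input_dim=6)      # regression head

print(bi_1, bi_2, head, bi_1 + bi_2 + head)    # 168 240 7 415
```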
Table 2. Factor A levels: selection of neural network layer types.
Level | Experimental Condition
A1 | Insert a dense layer at Position (IV), as described in Section 2.3.2.
A2 | Insert an LSTM layer at Position (II), as described in Section 2.3.2.
A3 | Insert an LSTM layer at Position (II) and a dense layer at Position (IV), as described in Section 2.3.2.
A4 | Insert a dense layer at Position (II) and an LSTM layer at Position (IV), as described in Section 2.3.2.
Table 3. Factor B levels: parameter-related factors influencing fine-tuning.
Table 3. Factor B levels: parameter-related factors influencing fine-tuning.
LevelDescription
B1Do not unfreeze any hidden layers.
B2Unfreeze the layer named “bidirectional” in Table 1.
B3Unfreeze the layer named “bidirectional_1” in Table 1.
B4Unfreeze the layer named “regression_head_1” in Table 1.
B5Unfreeze the layers named “bidirectional” and “bidirectional_1” in Table 1.
B6Unfreeze the layers named “bidirectional” and “regression_head_1” in Table 1.
B7Unfreeze the layers named “bidirectional_1” and “regression_head_1” in Table 1.
B8Unfreeze the layers named “bidirectional,” “bidirectional_1,” and “regression_head_1” in Table 1.
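In frameworks such as Keras, these levels amount to setting each named layer's `trainable` flag before compiling; frozen layers keep their pretrained weights. As a rough sketch (not the authors' code), the number of pretrained parameters each level of Factor B exposes to fine-tuning can be tallied from the per-layer counts in Table 1:

```python
# Per-layer parameter counts taken from Table 1.
LAYER_PARAMS = {"bidirectional": 168, "bidirectional_1": 240,
                "regression_head_1": 7}

# Which pretrained layers each level of Factor B unfreezes.
UNFROZEN = {
    "B1": [],
    "B2": ["bidirectional"],
    "B3": ["bidirectional_1"],
    "B4": ["regression_head_1"],
    "B5": ["bidirectional", "bidirectional_1"],
    "B6": ["bidirectional", "regression_head_1"],
    "B7": ["bidirectional_1", "regression_head_1"],
    "B8": ["bidirectional", "bidirectional_1", "regression_head_1"],
}

def trainable_pretrained_params(level: str) -> int:
    """Pretrained parameters updated during fine-tuning at this level."""
    return sum(LAYER_PARAMS[name] for name in UNFROZEN[level])

print(trainable_pretrained_params("B1"))  # 0: only inserted layers train
print(trainable_pretrained_params("B8"))  # 415: full fine-tuning
```

Under B1 (the level the main-effect analysis later recommends), only the newly inserted layers carry gradients, which is consistent with the very short training times reported in Table 11.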
Table 4. Factor C levels: selection strategy for operating conditions for transfer-learning training.
Table 4. Factor C levels: selection strategy for operating conditions for transfer-learning training.
LevelExperimental ConditionsPurpose of CombinationData Collection Duration (Working Days, 8 h/Day)
C1Characteristic operating conditions H1–H7 described in Section 2.2To establish a baseline comparison model primarily trained via characteristic experiments7 days
C2Characteristic operating conditions H5–H7 described in Section 2.2To evaluate the influence of compound characteristic effects on model performance while maintaining a characteristic-experiment-dominant dataset3 days
C3The better-performing setting between C1 and C2, combined with duty-cycle-related operating conditions (H8, H10, H12, H13, H15, and H17)To investigate the effect of duty-cycle duration on model behavior under both constant-temperature and varying-temperature environments13 days
C4The better-performing setting between C1 and C2, augmented with short-duty-cycle conditions under constant and varying ambient temperatures (H8 and H13)To examine the effect of the shortest duty-cycle duration on model accuracy9 days
C5The better-performing setting between C1 and C2, augmented with medium-duty-cycle conditions under constant and varying ambient temperatures (H10 and H15)To evaluate the effect of intermediate duty-cycle durations on model accuracy9 days
C6The better-performing setting between C1 and C2, augmented with long-duty-cycle conditions under constant and varying ambient temperatures (H12 and H17)To assess the effect of the longest duty-cycle duration on model accuracy9 days
C7The best-performing configuration among C3–C6, combined with conditions that isolate the varying-temperature effectTo compare how varying ambient temperature affects the model’s fitting capability8 days
C8The best-performing configuration among C3–C6, combined with conditions that isolate the constant-temperature effectTo compare how constant ambient temperature influences the model’s fitting capability8 days
C9The better-performing configuration between C7 and C8, combined with the abrupt-change operating condition (H18)To examine the model’s transfer-learning capability under rapid and simultaneous variations in machine operation and ambient temperature9 Day
Table 5. Summary of test conditions (H1’–H16’) and corresponding prediction errors (RMSE).
Test Condition | Prediction Error RMSE (µm) | Test Condition | Prediction Error RMSE (µm)
H1’ | 0.51 | H9’ | 1.97
H2’ | 1.16 | H10’ | 1.20
H3’ | 2.30 | H11’ | 1.35
H4’ | 2.39 | H12’ | 1.13
H5’ | 0.93 | H13’ | 1.84
H6’ | 1.03 | H14’ | 5.71
H7’ | 1.21 | H15’ | 4.01
H8’ | 1.24 | H16’ | 1.17
Average | | | 1.82
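The reported average is the arithmetic mean over all sixteen test conditions, which can be verified directly from the table values:

```python
# Per-condition prediction errors (RMSE, in micrometers) from Table 5.
rmse = {
    "H1'": 0.51, "H2'": 1.16, "H3'": 2.30, "H4'": 2.39,
    "H5'": 0.93, "H6'": 1.03, "H7'": 1.21, "H8'": 1.24,
    "H9'": 1.97, "H10'": 1.20, "H11'": 1.35, "H12'": 1.13,
    "H13'": 1.84, "H14'": 5.71, "H15'": 4.01, "H16'": 1.17,
}
average = sum(rmse.values()) / len(rmse)
print(round(average, 2))  # 1.82
```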
Table 6. Comparison of insertion strategies based on average RMSE across nine combinations of operating conditions in full-factorial design.
No. | Insertion Strategy | Average RMSE (µm) Across C1–C9
1 | Insert a dense layer at Position (IV), as described in Section 2.3.2. | 3.28
2 | Insert an LSTM layer at Position (II), as described in Section 2.3.2. | 3.77
3 | Insert a dense layer at Position (II) and an LSTM layer at Position (IV), as described in Section 2.3.2. | 3.86
4 | Insert an LSTM layer at Position (II) and a dense layer at Position (IV), as described in Section 2.3.2. | 3.86
5 | Direct testing (pretrained model) | 3.88
Table 7. ANOVA results for individual factors and their combinations.
Table 7. ANOVA results for individual factors and their combinations.
Sum of SquaresDegree of
Freedom
Mean SquareF-Statisticp-Value
Factor A15,031.8235010.614473.780
Factor B1884.697269.24240.42.88 × 10−260
Factor C407.66850.9645.51.37 × 10−67
Factor A × Factor B2673.6321127.32113.681 × 10−322
Factor A × Factor C647.432426.9824.098.77 × 10−93
Factor B × Factor C318.67565.695.081.01 × 10−29
Factor A × Factor B × Factor C520.991683.12.771.29 × 10−25
Error2257.9120161.12
Table 8. Ranked levels of Factor A based on main-effect analysis in full-factorial design.
Factor A Level | Average RMSE (µm)
A3: LSTM inserted after hidden layer, dense layer inserted after output layer | 4.70
A2: LSTM inserted after hidden layer | 4.75
A1: Dense layer inserted after input layer | 5.49
A4: Dense layer inserted after hidden layer, LSTM inserted after output layer | 10.84
Table 9. Ranked levels of Factor B based on main-effect analysis in full-factorial design.
Factor B Level | Average RMSE (µm)
B1: Do not unfreeze any hidden layers. | 4.15
B8: Unfreeze the layers named “bidirectional,” “bidirectional_1,” and “regression_head_1” in Table 1. | 6.52
B6: Unfreeze the layers named “bidirectional” and “regression_head_1” in Table 1. | 6.52
B5: Unfreeze the layers named “bidirectional” and “bidirectional_1” in Table 1. | 6.53
B2: Unfreeze the layer named “bidirectional” in Table 1. | 6.54
B7: Unfreeze the layers named “bidirectional_1” and “regression_head_1” in Table 1. | 7.07
B3: Unfreeze the layer named “bidirectional_1” in Table 1. | 7.08
B4: Unfreeze the layer named “regression_head_1” in Table 1. | 7.14
Table 10. Ranked levels of Factor C based on main-effect analysis in full-factorial design.
Factor C Level | Average RMSE (µm)
C9: H1, H2, H3, H4, H5, H6, H7, H15, H18 | 6.065
C7: H1, H2, H3, H4, H5, H6, H7, H15 | 6.068
C1: H1, H2, H3, H4, H5, H6, H7 | 6.19
C3: H1, H2, H3, H4, H5, H6, H7, H8, H10, H12, H13, H15, H17 | 6.22
C4: H1, H2, H3, H4, H5, H6, H7, H8, H13 | 6.27
C8: H1, H2, H3, H4, H5, H6, H7, H10 | 6.46
C5: H1, H2, H3, H4, H5, H6, H7, H10, H15 | 6.58
C6: H1, H2, H3, H4, H5, H6, H7, H12, H17 | 6.66
C2: H5, H6, H7 | 7.48
Table 11. Recommended neural network architectures and test results from main-effect and interaction-effect analyses.
Description | Training Time | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average RMSE (µm)
Main-effect analysis (Recommended Combination 1): A3 (LSTM after hidden layer, dense layer after output layer), B1 (no layers unfrozen), C9 | 0.5 min | 3.55 | 3.69 | 3.53 | 3.44 | 3.60 | 3.56
Interaction-effect analysis, Factors A and B (Recommended Combination 2): A1 (dense layer after input layer), B1 (no layers unfrozen), C9 | 0.5 min | 3.70 | 3.85 | 3.91 | 3.90 | 3.93 | 3.86
Interaction-effect analysis, Factors A and C (Recommended Combination 1): A3 (LSTM after hidden layer, dense layer after output layer), B1 (no layers unfrozen), C9 | 0.5 min | 3.55 | 3.69 | 3.53 | 3.44 | 3.60 | 3.56
Interaction-effect analysis, Factors B and C (Recommended Combination 3): A2 (LSTM after hidden layer), B1 (no layers unfrozen), C9 | 0.5 min | 3.44 | 3.43 | 3.43 | 3.42 | 3.66 | 3.47
Interaction-effect analysis, Factors A, B, and C (Recommended Combination 3): A2 (LSTM after hidden layer), B1 (no layers unfrozen), C9 | 0.5 min | 3.44 | 3.43 | 3.43 | 3.42 | 3.66 | 3.47
Pretrained model (direct testing) | 0 min | 3.88 | — | — | — | — | 3.88
Pretrained model network architecture from C9 | 1 min | 3.59 | 3.57 | 4.12 | 3.55 | 3.90 | 3.74
Model retrained using data from C9 with optimized parameters | 4 h 10 min | 4.21 | 3.50 | 3.73 | 4.09 | 3.69 | 3.84
Model retrained using data from all of H1–H18 as the training set and H19 as the validation set, with optimized parameters | 18 h 20 min | 3.25 | 3.19 | 3.32 | 3.29 | 3.00 | 3.21
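A back-of-the-envelope comparison of the figures in Table 11 (and the abstract) illustrates the trade-off the proposed method makes between training cost and accuracy:

```python
# Figures reported in Table 11 and the abstract.
baseline_rmse = 3.88            # pretrained model, direct testing
transfer_rmse = 3.47            # transfer learning trained on C9 only
retrain_all_rmse = 3.21         # full retraining on H1-H18

retrain_minutes = 18 * 60 + 20  # 18 h 20 min for full retraining
transfer_minutes = 0.5          # 30 s for the transfer-learning model

# Transfer learning recovers most of the accuracy gap at a tiny
# fraction of the training cost; the remaining gap to full
# retraining is transfer_rmse - retrain_all_rmse (about 0.26 um).
speedup = retrain_minutes / transfer_minutes
rmse_gain_pct = (baseline_rmse - transfer_rmse) / baseline_rmse * 100
print(f"{speedup:.0f}x faster, {rmse_gain_pct:.1f}% lower RMSE than baseline")
```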
Chuang, C.-C.; Lin Chi, Z.-W.; Kuo, T.-C.; Chang, C.-J.; Hsieh, W.-H. Development of a Transfer Learning Technique for Rapid Adaptation of Thermal Compensation Models to Long-Term Machine Thermal Behavior Changes. Machines 2026, 14, 309. https://doi.org/10.3390/machines14030309
