Development of a Transfer Learning Technique for Rapid Adaptation of Thermal Compensation Models to Long-Term Machine Thermal Behavior Changes
Abstract
1. Introduction
2. Research Methods
- Experimental design and measurement data on temperature and thermal deformation: We systematically varied the spindle speed and ambient temperature to design experimental conditions for measuring the thermal rise and deformation of the same horizontal machining center in two mechanical states: the baseline state at the time of delivery (2021) and the state after extended operation (2024).
- Development of pretrained model: Experimental data collected from the new machine (baseline mechanical state at the time of delivery) were used to build a pre-trained thermal error prediction model, which served as the source model for subsequent transfer learning.
- Development of a transfer learning algorithm for thermal compensation in horizontal machining centers: The machine characteristics were transferred from the baseline state at the time of delivery to the altered state induced by extended machining operations.
- Determination of optimal factor selection strategies via full-factorial analysis.
2.1. Experimental Design and Measurement Data on Temperature and Thermal Deformation
2.1.1. Measurement Equipment and Methods for Thermal Rise and Deformation Experiments on Horizontal Machining Centers
2.1.2. Experimental Condition Design
2.1.3. Comparison of Characteristic Differences Between New and Old Machines
2.2. Development of Pretrained Model
2.2.1. Selection of Temperature-Sensitive Points Based on Physical Phenomena and Correlation Analysis
- (1) spindle: [S25, S26, S27, and ];
- (2) column: [, , , and ]; and
- (3) base: [B1, B2, B3, B4, B5, B6, , , and ].
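Section 2.2.1 selects temperature-sensitive points by combining physical reasoning with correlation analysis. As an illustration only (the readings below are synthetic, and the helper names are ours, not the authors' code), ranking temperature channels by the absolute Pearson correlation with the measured deformation might look like:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_sensors(temps, deformation, top_k=3):
    """Rank temperature channels by |r| with the measured deformation."""
    scores = {name: abs(pearson_r(series, deformation))
              for name, series in temps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Synthetic illustration: S25 tracks the deformation closely; "S99" is a
# hypothetical noise-like channel, not a sensor from the study.
deformation = [0.0, 1.1, 2.0, 3.2, 3.9, 5.1]
temps = {
    "S25": [20.0, 21.1, 22.0, 23.1, 24.0, 25.2],   # strongly correlated
    "S99": [20.0, 20.3, 20.1, 20.4, 20.0, 20.2],   # weakly correlated
}
print(rank_sensors(temps, deformation, top_k=1))    # ['S25']
```

In practice, mutual information (MI, ref. [92] in this paper's list) can replace Pearson correlation when the temperature–deformation relationship is strongly nonlinear.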
2.2.2. Identification of Optimal Model Architecture Using AutoKeras
2.2.3. Evaluation Metrics for Pretrained Model
2.3. Development of Transfer Learning Algorithm for Thermal Compensation in Horizontal Machining Centers
2.3.1. Originally Optimized Thermal Compensation Model as Pretrained Model for Transfer Learning
2.3.2. Creation of Composite Fine-Tuning Neural Network Architecture
2.3.3. Training and Validation of Composite Fine-Tuning Neural Network Architecture
2.3.4. Testing of Composite Fine-Tuning Neural Network Architecture
2.4. Determination of Optimal Factor Selection Strategies via Full-Factorial Analysis
2.4.1. Definition of Factors for Full-Factorial Analysis
2.4.2. Formulation of Full-Factorial Analysis Equation
3. Results and Discussion
3.1. Optimized Thermal Compensation Model at Delivery as Pretrained Model
3.2. Experimental Framework and Optimization Strategy for Full-Factorial Design
3.3. Visualization of ANOVA Results: Main and Interaction Effects
- The optimal combination of main-effect factors was determined by selecting the levels of Factors A, B, and C that yielded the lowest overall RMSE values. Accordingly, the optimal main-effect settings were as follows: structural expansion strategy = A3 (LSTM layer inserted after the hidden layer, dense layer inserted after the output layer); parameter-unfreezing strategy = B1 (no layers unfrozen); and operating condition combination strategy = C9.
- As observed in Figure 11, among the unfrozen configurations, A4 (dense layer inserted after the hidden layer, LSTM inserted after the output layer) yielded the highest average RMSE (approximately 11–12). This indicates that introducing recurrent, time-dependent units at the output stage destabilizes the model during transfer learning. When an LSTM structure appended after the output layer is subsequently unfrozen, historical temporal information is reintroduced into the prediction process, which increases temporal sensitivity and disturbs the pretrained output-layer weights. This amplifies error accumulation, significantly raising the RMSE.
- Likewise, A1 (dense layer inserted after input layer) exhibited consistently high RMSEs when either the hidden layer, the output layer, or both were unfrozen. Inserting a dense layer at the input stage forces the model to relearn the feature relationship between temperature and deformation, undermining the pre-learned thermal behavior representation. Subsequently, upon unfreezing, the altered feature pathway increases the gradient sensitivity and weakens the established thermal correlations, thereby compromising predictive performance.
- Overall, these findings demonstrate that the interaction between the insertion location and the unfreezing strategy has a decisive impact on transfer learning stability and that unsuitable structural–parameter configurations substantially increase prediction errors during model adaptation.
- However, under all operating conditions, A4 (inserting a dense layer after the hidden layer and an LSTM after the output layer) exhibited the highest RMSE (all exceeding 10, approximately 10.7–10.9), with limited variation across conditions. Thus, the effectiveness of this structure is consistently poor, which suggests that the resulting network architecture is not suitable for thermal parameter prediction in various processing scenarios.
- Among all condition combinations, C2, which had the least training data, also displayed a relatively high RMSE after training. This indicates that an insufficient sample size leads to poor feature representation, limiting the model’s fitting ability. Moreover, once the error exceeds a certain threshold, different operating conditions become indistinguishable: the model can no longer learn thermal features effectively and becomes insensitive to changes in operating conditions.
- C3 combined typical testing for operating conditions with variations in spindle speed at low/medium/high temperatures under constant and varying ambient temperatures, serving as a benchmark for evaluating the model’s generalizability under different thermal behaviors. This configuration supports cross-condition adaptability and provides a foundational representation for models subsequently trained under C4–C6.
- When retaining only low-speed data under constant and varying temperature conditions, the RMSE of C4 was similar to that of C3. This shows that the influence of spindle speed can be adequately captured under low-speed conditions and that low-speed experiments can serve as a transfer basis for thermal modeling.
- C7 retained only varying ambient temperature data under low-speed operation, further reducing the RMSE. Under low-speed conditions, spindle heating is minimal and stable, so ambient temperature becomes the primary driving thermal characteristic. This preserves the pre-learned thermal behavior distribution and facilitates fine-tuning, enabling the model to capture temperature–deformation correlations more accurately and thereby reducing error. Conversely, because C8 lacked temperature variations, limited temperature information was available during training, as the modeling was based solely on isothermal data. Therefore, the model often overreacted to small temperature fluctuations during prediction, which increased its RMSE.
- Under C9, the thermal deformation trend became more pronounced when both spindle speed variations and large thermal disturbances caused by ambient temperature fluctuations were considered simultaneously. The clearer temperature gradient improved the separability of the features, allowing the model to learn stable thermal mappings more effectively and thereby reducing the overall RMSE again.
- These results indicate that despite larger training datasets, C3 and C4 did not outperform C7 and C9 in terms of transfer learning, which suggests that additional data do not necessarily provide more knowledge during the fine-tuning phase. Instead, such data may modify (or even compromise) the existing thermal behavior information encoded in the pretrained model. Data at fixed ambient temperatures exhibit single-source, weakly nonlinear dynamics, while variable-temperature data correspond to multi-source, hysteresis-driven nonlinear thermal behavior [41].
- During fine-tuning, gradient-update conflicts and dynamic behavior mismatches can occur when different heat distributions correspond to different heat transfer mechanisms in heterogeneous thermal environments, which impairs the learned feature weights. If an increase in the amount of data is accompanied by inconsistencies in the distribution, the model’s predictive accuracy may decrease [42].
- In the unfreezing configuration (B1), the pretrained model retained the original thermal behavior representation learned through machine learning at delivery. Since no weight updates were performed, the model also retained the original weights and biases corresponding to the temperature features and thermal deformation trends, thus exhibiting the smallest RMSE under different operating conditions.
- However, when only the hidden layer, only the output layer, or both the hidden and output layers were unfrozen, the average RMSE increased significantly. This is because during fine-tuning, the weights and biases of the initially learned relationship between temperature and thermal deformation were retrained, which disrupted the established thermal behavior features. Therefore, the model struggled to maintain stable predictions under changing conditions. Particularly under C2, which had a smaller sample size, the model’s sensitivity to weight perturbations increased, and the nonlinear mapping between the temperature signal and the deformation output was amplified, which increased the prediction error.
- Interestingly, when the input layer was unfrozen (whether individually or together with the hidden or output layers), the RMSE remained lower than when only the intermediate or output layers were unfrozen. Updating the input-layer parameters allows the model to recalibrate the temperature feature distribution before the resulting feature activations propagate through the subsequent hidden and output layers, thereby better matching the new thermal domain. Consequently, compared with fine-tuning only the intermediate layers, input-layer adaptation improves the model’s generalizability under dynamic operating conditions (C3–C9), including variations in spindle speed and ambient temperature.
- Overall, fine-tuning the hidden or output layers may erase pretrained thermal deformation knowledge, whereas modifying the input layer preserves the stability of the core thermal representation while enabling adaptation to new conditions.
- Among all configurations, A4 (inserting a dense layer after the hidden layer and an LSTM layer after the output layer) displayed the highest RMSE (~11–12) across all unfreezing strategies; this confirms that placing a recurrent, time-varying structure at the output stage is detrimental during transfer learning. When an LSTM layer appended after the output layer is unfrozen, historical time information is fed back into the current prediction, which increases temporal sensitivity and disrupts the stability of the pretrained output weights. This feedback loop exacerbates gradient oscillations, amplifies error propagation, and ultimately leads to a significant decline in performance during adaptation.
- Similarly, A1 (inserting a dense layer after the input layer) also produced a high RMSE when the hidden or output layer was unfrozen. Inserting a dense layer after the input layer forces the model to relearn thermal feature maps, thereby disrupting the pre-learned temperature–deformation representation. As more layers are unfrozen, the modified input path further destabilizes feature alignment, leading to larger prediction errors and less stable transfer.
- Conversely, when the time-varying network structure was placed within the inner hidden layers (A2 and A3), the RMSE fluctuations under different unfreezing strategies were significantly reduced. This indicates that time-varying features embedded in deeper layers can be partially absorbed and smoothed, thus preserving the core thermal behavior relationships and improving robustness during transfer learning.
- The results of the two-factor analysis (Figure 11, Figure 12 and Figure 13) corroborate this conclusion. Changes in external operating conditions, such as fluctuations in ambient temperature and spindle speed settings, lead to differences in feature distribution. This exposes the model to the uncertainties in feature differences and weight adjustments during transfer. Therefore, the model’s prediction error (RMSE) is inferred to be influenced by both the neural network architecture design (adding and unfreezing layers) and the training strategy based on external operating conditions. Section 3.4 outlines the validation of the model based on the results presented in this section.
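The main-effect selection used above (for each factor, choose the level whose RMSE averaged over all levels of the other factors is lowest) can be sketched as follows. The level labels and RMSE values here are synthetic stand-ins, not the study's measurements; the real design spans 4 × 8 × 9 = 288 combinations.

```python
from itertools import product
from statistics import mean

# Synthetic stand-in: a few levels per factor and an additive RMSE penalty.
levels = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
penalty = {"A1": 1.0, "A2": 0.4, "A3": 0.0,
           "B1": 0.0, "B2": 0.8,
           "C1": 0.5, "C2": 0.0}
rmse = {combo: 3.0 + sum(penalty[lvl] for lvl in combo)
        for combo in product(*levels.values())}

def best_level(idx, factor_levels):
    """Level with the lowest RMSE averaged over the other factors (marginal mean)."""
    means = {lvl: mean(v for k, v in rmse.items() if k[idx] == lvl)
             for lvl in factor_levels}
    return min(means, key=means.get)

optimal = tuple(best_level(i, lv) for i, lv in enumerate(levels.values()))
print(optimal)  # ('A3', 'B1', 'C2') — lowest marginal mean per factor
```

Note that this marginal-mean rule is exactly what an ANOVA main-effect plot visualizes; when interactions are significant, as in this study, the recommended combinations from interaction plots must be validated separately (Section 3.4).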
3.4. Model Validation Based on ANOVA Results
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| LSTM | Long short-term memory |
| MI | Mutual information |
| RMSE | Root-mean-squared error |
| MSE | Mean-squared error |
| ANOVA | Analysis of variance |
References
- Li, Y.; Zhao, Q.; Lan, S.; Ni, J.; Wu, W.; Lu, B. A review on spindle thermal error compensation in machine tools. Int. J. Mach. Tools Manuf. 2015, 95, 20–38. [Google Scholar] [CrossRef]
- Ramesh, R.; Mannan, M.A.; Poo, A.N. Error compensation in machine tools—A review Part II: Thermal errors. Int. J. Mach. Tools Manuf. 2015, 93, 26–36. [Google Scholar]
- Wei, X.; Ye, H.; Miao, E.; Pan, Q. Thermal error modeling and compensation based on Gaussian process regression for CNC machine tools. Precis. Eng. 2022, 77, 65–76. [Google Scholar] [CrossRef]
- Liu, H.; Miao, E.M.; Wei, X.Y.; Zhuang, X.D. Robust modeling method for thermal error of CNC machine tools based on ridge regression algorithm. Int. J. Mach. Tools Manuf. 2017, 113, 35–48. [Google Scholar] [CrossRef]
- Guo, Q.; Yang, J. Application of projection pursuit regression to thermal error modeling of a CNC machine tool. Int. J. Adv. Manuf. Technol. 2011, 55, 623–629. [Google Scholar]
- Yang, H.; Ni, J. Adaptive model estimation of machine-tool thermal errors based on recursive dynamic modeling strategy. Int. J. Mach. Tools Manuf. 2005, 45, 1–11. [Google Scholar] [CrossRef]
- Zhang, Y.; Yang, J.; Jiang, H. Machine tool thermal error modeling and prediction by grey neural network. Int. J. Adv. Manuf. Technol. 2012, 59, 1065–1072. [Google Scholar] [CrossRef]
- ISO 10791-7:2014; Test Conditions for Machining Centres—Part 7: Accuracy of a Finished Test Piece. International Organization for Standardization: Geneva, Switzerland, 2014.
- Bringmann, B.; Knapp, W. Machine tool calibration: Geometric test uncertainty depends on machine tool performance. Precis. Eng. 2009, 33, 524–529. [Google Scholar] [CrossRef]
- Hinder, F.; Vaquet, V.; Hammer, B. One or two things we know about concept drift—A survey on monitoring in evolving environments. Part A: Detecting concept drift. Front. Artif. Intell. 2024, 7, 1330257. [Google Scholar] [CrossRef]
- Chen, T.C.; Chang, C.J.; Hung, J.P.; Lee, R.M.; Wang, C.C. Real-time compensation for thermal errors of the milling machine. Appl. Sci. 2016, 6, 101. [Google Scholar] [CrossRef]
- Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees, 1st ed.; CRC Press: Boca Raton, FL, USA, 1984. [Google Scholar]
- Ramesh, R.; Mannan, M.A.; Poo, A.N.; Keerthi, S.S. Thermal error measurement and modelling in machine tools. Part II. Hybrid Bayesian network—Support vector machine model. Int. J. Mach. Tools Manuf. 2003, 43, 405–419. [Google Scholar] [CrossRef]
- Horejs, O.; Mares, M.; Kohut, P.; Barta, P.; Hornych, J. Compensation of machine tool thermal errors based on transfer functions. MM Sci. J. 2010, 3, 162–165. [Google Scholar] [CrossRef]
- Yau, H.T.; Kuo, P.H.; Chen, S.C.; Lai, P.Y. Transfer-learning-based long short-term memory model for machine tool spindle thermal displacement compensation. IEEE Sens. J. 2024, 24, 132–143. [Google Scholar]
- Li, P.; Lou, P.; Yan, J.; Liu, N. The thermal error modeling with deep transfer learning. J. Phys. Conf. Ser. 2020, 1576, 012003. [Google Scholar] [CrossRef]
- Zhou, D.; Zeng, F.; Jia, S. The thermal error modeling approach of feeding axes for horizontal CNC lathes based on transfer learning. Preprint 2023. [Google Scholar] [CrossRef]
- Ma, S.; Leng, J.; Chen, Z.; Li, B.; Li, X.; Zhang, D.; Li, W.; Liu, Q. A novel weakly supervised adversarial network for thermal error modeling of electric spindles with scarce samples. Expert Syst. Appl. 2024, 238, 122065. [Google Scholar]
- Zheng, Y.; Fu, G.; Mu, S.; Lu, C.; Wang, X.; Wang, T. Thermal Error Transfer Prediction Modeling of Machine Tool Spindle with Self-Attention Mechanism-Based Feature Fusion. Machines 2024, 12, 728. [Google Scholar] [CrossRef]
- Mao, H.; Liu, Z.; Qiu, C.; Liu, H.; Sun, J.; Tan, J. Subspace metric-based transfer learning for spindle thermal error prediction under time-varying conditions. IEEE Trans. Instrum. Meas. 2024, 73, 2514311. [Google Scholar]
- Zheng, Y.; Fu, G.; Mu, S.; Zhu, S.; Lin, K.; Yang, L. A review on transfer learning in spindle thermal error compensation of spindle. Adv. Manuf. 2024, 1, 12. [Google Scholar] [CrossRef]
- ISO-230-3:2007; Test Code for Machine Tools—Part 3: Determination of Thermal Effects. International Organization for Standardization: Geneva, Switzerland, 2007.
- IEC 60584-1:2013; Thermocouples—Part 1: EMF Specifications and Tolerances. International Electrotechnical Commission: Geneva, Switzerland, 2013.
- Maurya, S.N.; Li, K.Y.; Luo, W.J.; Kao, S.Y. Effect of coolant temperature on the thermal compensation of a machine tool. Machines 2022, 10, 1201. [Google Scholar] [CrossRef]
- Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550. [Google Scholar] [CrossRef]
- Alsharef, A.; Garg, S.; Kumar, K.; Iwendi, C. Time series data modeling using advanced machine learning and AutoML. Sustainability 2022, 14, 15292. [Google Scholar] [CrossRef]
- Alsharef, A.; Aggarwal, K.; Sonia; Kumar, M.; Mishra, A. Review of ML and AutoML solutions to forecast time-series data. Arch. Comput. Methods Eng. 2022, 29, 5297–5311. [Google Scholar] [CrossRef]
- Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef]
- Pan, S.J. Transfer learning. Learning 2020, 21, 1–2. [Google Scholar]
- Xie, S.M.; Ma, T.; Liang, P. Composed fine-tuning: Freezing pre-trained denoising autoencoders for improved generalization. In Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Conference, 18–24 July 2021; pp. 11424–11435. [Google Scholar]
- Aleksandar, J.; Gaurav, C.; Francesco, G. Optimization through classical design of experiments (DOE): An investigation on the performance of different factorial designs for multi-objective optimization of complex systems. J. Build. Eng. 2025, 102, 111931. [Google Scholar] [CrossRef]
- Inglis, A.; Andrew, P.; Catherine, B.H. Visualizing variable importance and variable interaction effects in machine learning models. J. Comput. Graph. Stat. 2022, 31, 766–778. [Google Scholar] [CrossRef]
- Sutrisno, U.; Wulandari, Y.; Arifin, S.; Manurung, M.M.; Faisal, M. Trends, Contributions and Prospects: Bibliometric Analysis of ANOVA Research in 2022-2023. Indones. J. Appl. Math. Stat. 2024, 1, 27–38. [Google Scholar] [CrossRef]
- Montgomery, D.C. Design and Analysis of Experiments, 9th ed.; John Wiley & Sons: New York, NY, USA, 2017. [Google Scholar]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328. [Google Scholar]
- Centofanti, F.; Colosimo, B.M.; Grasso, M.L.; Menafoglio, A.; Palumbo, B.; Vantini, S. Robust functional ANOVA with application to additive manufacturing. J. R. Stat. Soc. C Appl. Stat. 2023, 72, 1210–1234. [Google Scholar] [CrossRef]
- Iravani, M.; Simjoo, M.; Chahardowli, M.; Rezvani Moghaddam, A. Experimental insights into the stability of graphene oxide nanosheet and polymer hybrid coupled by ANOVA statistical analysis. Sci. Rep. 2024, 14, 18448. [Google Scholar] [CrossRef] [PubMed]
- Gorkem, S.; Ataman, M.G.; Kızıloğlu, İ. Analyzing main and interaction effects of length of stay determinants in emergency departments. Int. J. Health Policy Manag. 2019, 9, 198. [Google Scholar]
- Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (SciPy 2010), Austin, TX, USA, 28 June–3 July 2010; pp. 92–96. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009. [Google Scholar]
- Mayr, J.; Jedrzejewski, J.; Uhlmann, E.; Alkan Donmez, M.; Knapp, W.; Härtig, F.; Wendt, K.; Moriwaki, T.; Shore, P.; Schmitt, R.; et al. Thermal issues in machine tools. CIRP Ann. 2012, 61, 771–791. [Google Scholar] [CrossRef]
- Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474. [Google Scholar] [CrossRef]
| Layer No. | Layer Name | Layer Type | Output Shape | Parameters |
|---|---|---|---|---|
| 1 | input_1 | Input | (None, 1, 3) | 0 |
| 2 | bidirectional | Bidirectional | (None, 1, 6) | 168 |
| 3 | bidirectional_1 | Bidirectional | (None, 6) | 240 |
| 4 | dropout | Dropout | (None, 6) | 0 |
| 5 | regression_head_1 | Dense | (None, 1) | 7 |
| Total | – | – | – | 415 |
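The parameter counts in the table can be reproduced from the standard LSTM formula — 4u(i + u + 1) weights and biases per direction for u units and input dimension i, doubled by the bidirectional wrapper, plus i + 1 for the dense regression head. A minimal check:

```python
def lstm_params(units, input_dim):
    # 4 gates, each with (input_dim + units) weights plus one bias, per unit
    return 4 * units * (input_dim + units + 1)

def bidirectional_params(units, input_dim):
    # Forward and backward LSTMs have identical shapes
    return 2 * lstm_params(units, input_dim)

def dense_params(units, input_dim):
    return units * input_dim + units  # weights + biases

# Architecture from the table: Input (1, 3) -> BiLSTM(3) -> BiLSTM(3) -> Dense(1)
p1 = bidirectional_params(3, 3)   # 168 (3 units/direction, 3 input features)
p2 = bidirectional_params(3, 6)   # 240 (input dim 6 = 2 x 3 from layer 2)
p3 = dense_params(1, 6)           # 7
print(p1, p2, p3, p1 + p2 + p3)   # 168 240 7 415
```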
| Level | Experimental Condition |
|---|---|
| A1 | Insert a dense layer at Position (IV), as described in Section 2.3.2. |
| A2 | Insert an LSTM layer at Position (II), as described in Section 2.3.2. |
| A3 | Insert an LSTM layer at Position (II) and a dense layer at Position (IV), as described in Section 2.3.2. |
| A4 | Insert a dense layer at Position (II) and an LSTM layer at Position (IV), as described in Section 2.3.2. |
| Level | Description |
|---|---|
| B1 | Do not unfreeze any hidden layers. |
| B2 | Unfreeze the layer named “bidirectional” in Table 1. |
| B3 | Unfreeze the layer named “bidirectional_1” in Table 1. |
| B4 | Unfreeze the layer named “regression_head_1” in Table 1. |
| B5 | Unfreeze the layers named “bidirectional” and “bidirectional_1” in Table 1. |
| B6 | Unfreeze the layers named “bidirectional” and “regression_head_1” in Table 1. |
| B7 | Unfreeze the layers named “bidirectional_1” and “regression_head_1” in Table 1. |
| B8 | Unfreeze the layers named “bidirectional,” “bidirectional_1,” and “regression_head_1” in Table 1. |
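Levels B1–B8 enumerate every subset of the three trainable layers in Table 1, from the empty set (B1) to all three (B8). A short sketch generating the same eight strategies in the table's order (our illustration, not the authors' code):

```python
from itertools import combinations

layers = ["bidirectional", "bidirectional_1", "regression_head_1"]

# All 2^3 = 8 unfreezing strategies, smallest subsets first, which matches
# the table: B1 unfreezes nothing, B2-B4 one layer, B5-B7 two, B8 all three.
strategies = [set(c) for r in range(len(layers) + 1)
              for c in combinations(layers, r)]

for label, unfrozen in zip((f"B{i}" for i in range(1, 9)), strategies):
    print(label, sorted(unfrozen) or "(none)")
```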
| Level | Experimental Conditions | Purpose of Combination | Data Collection Duration (Working Days, 8 h/Day) |
|---|---|---|---|
| C1 | Characteristic operating conditions H1–H7 described in Section 2.2 | To establish a baseline comparison model primarily trained via characteristic experiments | 7 days |
| C2 | Characteristic operating conditions H5–H7 described in Section 2.2 | To evaluate the influence of compound characteristic effects on model performance while maintaining a characteristic-experiment-dominant dataset | 3 days |
| C3 | The better-performing setting between C1 and C2, combined with duty-cycle-related operating conditions (H8, H10, H12, H13, H15, and H17) | To investigate the effect of duty-cycle duration on model behavior under both constant-temperature and varying-temperature environments | 13 days |
| C4 | The better-performing setting between C1 and C2, augmented with short-duty-cycle conditions under constant and varying ambient temperatures (H8 and H13) | To examine the effect of the shortest duty-cycle duration on model accuracy | 9 days |
| C5 | The better-performing setting between C1 and C2, augmented with medium-duty-cycle conditions under constant and varying ambient temperatures (H10 and H15) | To evaluate the effect of intermediate duty-cycle durations on model accuracy | 9 days |
| C6 | The better-performing setting between C1 and C2, augmented with long-duty-cycle conditions under constant and varying ambient temperatures (H12 and H17) | To assess the effect of the longest duty-cycle duration on model accuracy | 9 days |
| C7 | The best-performing configuration among C3–C6, combined with conditions that isolate the varying-temperature effect | To compare how varying ambient temperature affects the model’s fitting capability | 8 days |
| C8 | The best-performing configuration among C3–C6, combined with conditions that isolate the constant-temperature effect | To compare how constant ambient temperature influences the model’s fitting capability | 8 days |
| C9 | The better-performing configuration between C7 and C8, combined with the abrupt-change operating condition (H18) | To examine the model’s transfer-learning capability under rapid and simultaneous variations in machine operation and ambient temperature | 9 days |
| Test Condition | Prediction Error RMSE (µm) | Test Condition | Prediction Error RMSE (µm) |
|---|---|---|---|
| H1’ | 0.51 | H9’ | 1.97 |
| H2’ | 1.16 | H10’ | 1.20 |
| H3’ | 2.30 | H11’ | 1.35 |
| H4’ | 2.39 | H12’ | 1.13 |
| H5’ | 0.93 | H13’ | 1.84 |
| H6’ | 1.03 | H14’ | 5.71 |
| H7’ | 1.21 | H15’ | 4.01 |
| H8’ | 1.24 | H16’ | 1.17 |
| Average | 1.82 | – | – |
| No. | Insertion Strategy | Average RMSE (µm) Across C1–C9 |
|---|---|---|
| 1 | Insert a dense layer at Position (IV), as described in Section 2.3.2. | 3.28 |
| 2 | Insert an LSTM layer at Position (II), as described in Section 2.3.2. | 3.77 |
| 3 | Insert a dense layer at Position (II) and an LSTM layer at Position (IV), as described in Section 2.3.2. | 3.86 |
| 4 | Insert an LSTM layer at Position (II) and a dense layer at Position (IV), as described in Section 2.3.2. | 3.86 |
| 5 | Direct testing (pretrained model) | 3.88 |
| Source | Sum of Squares | Degrees of Freedom | Mean Square | F-Statistic | p-Value |
|---|---|---|---|---|---|
| Factor A | 15,031.82 | 3 | 5010.61 | 4473.78 | 0 |
| Factor B | 1884.69 | 7 | 269.24 | 240.4 | 2.88 × 10^−260 |
| Factor C | 407.66 | 8 | 50.96 | 45.5 | 1.37 × 10^−67 |
| Factor A × Factor B | 2673.63 | 21 | 127.32 | 113.68 | 1 × 10^−322 |
| Factor A × Factor C | 647.43 | 24 | 26.98 | 24.09 | 8.77 × 10^−93 |
| Factor B × Factor C | 318.67 | 56 | 5.69 | 5.08 | 1.01 × 10^−29 |
| Factor A × Factor B × Factor C | 520.99 | 168 | 3.1 | 2.77 | 1.29 × 10^−25 |
| Error | 2257.91 | 2016 | 1.12 | – | – |
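As a consistency check, the mean squares and F-statistics above follow from MS = SS/df and F = MS/MS_error; recomputing the three main-effect rows reproduces the tabulated values to rounding:

```python
rows = {  # source: (sum of squares, degrees of freedom), from the ANOVA table
    "Factor A": (15031.82, 3),
    "Factor B": (1884.69, 7),
    "Factor C": (407.66, 8),
}
ss_err, df_err = 2257.91, 2016
ms_err = ss_err / df_err              # error mean square, ~1.12

results = {}
for name, (ss, df) in rows.items():
    ms = ss / df                      # mean square = SS / df
    results[name] = (round(ms, 2), round(ms / ms_err, 2))
    print(name, results[name])
# Factor A -> (5010.61, 4473.78), matching the table
```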
| Factor A Level | Average RMSE (µm) |
|---|---|
| A3: LSTM inserted after hidden layer, dense layer inserted after output layer | 4.70 |
| A2: LSTM inserted after hidden layer | 4.75 |
| A1: Dense layer inserted after input layer | 5.49 |
| A4: Dense layer inserted after hidden layer, LSTM inserted after output layer | 10.84 |
| Factor B Level | Average RMSE (µm) |
|---|---|
| B1: Do not unfreeze any hidden layers. | 4.15 |
| B8: Unfreeze the layers named “bidirectional,” “bidirectional_1,” and “regression_head_1” in Table 1. | 6.52 |
| B6: Unfreeze the layers named “bidirectional” and “regression_head_1” in Table 1. | 6.52 |
| B5: Unfreeze the layers named “bidirectional” and “bidirectional_1” in Table 1. | 6.53 |
| B2: Unfreeze the layer named “bidirectional” in Table 1. | 6.54 |
| B7: Unfreeze the layers named “bidirectional_1” and “regression_head_1” in Table 1. | 7.07 |
| B3: Unfreeze the layer named “bidirectional_1” in Table 1. | 7.08 |
| B4: Unfreeze the layer named “regression_head_1” in Table 1. | 7.14 |
| Factor C Level | Average RMSE (µm) |
|---|---|
| C9: H1, H2, H3, H4, H5, H6, H7, H15, H18 | 6.065 |
| C7: H1, H2, H3, H4, H5, H6, H7, H15 | 6.068 |
| C1: H1, H2, H3, H4, H5, H6, H7 | 6.19 |
| C3: H1, H2, H3, H4, H5, H6, H7, H8, H10, H12, H13, H15, H17 | 6.22 |
| C4: H1, H2, H3, H4, H5, H6, H7, H8, H13 | 6.27 |
| C8: H1, H2, H3, H4, H5, H6, H7, H10 | 6.46 |
| C5: H1, H2, H3, H4, H5, H6, H7, H10, H15 | 6.58 |
| C6: H1, H2, H3, H4, H5, H6, H7, H12, H17 | 6.66 |
| C2: H5, H6, H7 | 7.48 |
| Description | Training Time | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average RMSE (µm) |
|---|---|---|---|---|---|---|---|
| Main-effect analysis (Recommended Combination 1) Factor A: Insert an LSTM layer after the hidden layer and insert a dense layer after the output layer (A3) Factor B: No layers unfrozen (B1) Factor C: Operating condition combination 9 (C9) | 0.5 min | 3.55 | 3.69 | 3.53 | 3.44 | 3.6 | 3.56 |
| Interaction-effect analysis: Factors A and B (Recommended Combination 2) Factor A: Insert a dense layer after the input layer (A1) Factor B: No layers unfrozen (B1) Factor C: Operating condition combination 9 (C9) | 0.5 min | 3.7 | 3.85 | 3.91 | 3.9 | 3.93 | 3.86 |
| Interaction-effect analysis: Factors A and C (Recommended Combination 1) Factor A: Insert an LSTM layer after the hidden layer and insert a dense layer after the output layer (A3) Factor B: No layers unfrozen (B1) Factor C: Operating condition combination 9 (C9) | 0.5 min | 3.55 | 3.69 | 3.53 | 3.44 | 3.6 | 3.56 |
| Interaction-effect analysis: Factors B and C (Recommended Combination 3) Factor A: Insert an LSTM layer after the hidden layer (A2) Factor B: No layers unfrozen (B1) Factor C: Operating condition combination 9 (C9) | 0.5 min | 3.44 | 3.43 | 3.43 | 3.42 | 3.66 | 3.47 |
| Interaction-effect analysis: Factors A, B, and C (Recommended Combination 3) Factor A: Insert an LSTM layer after the hidden layer (A2) Factor B: No layers unfrozen (B1) Factor C: Operating condition combination 9 (C9) | 0.5 min | 3.44 | 3.43 | 3.43 | 3.42 | 3.66 | 3.47 |
| Pretrained model (direct testing) | 0 min | 3.88 | – | – | – | – | 3.88 |
| Pretrained model network architecture from C9 | 1 min | 3.59 | 3.57 | 4.12 | 3.55 | 3.90 | 3.74 |
| Test results of model retrained using data from C9 with optimized parameters | 4 h 10 min | 4.21 | 3.50 | 3.73 | 4.09 | 3.69 | 3.84 |
| Test results of model retrained using data from all H1–H18 as the training set and H19 as the validation set with optimized parameters. | 18 h 20 min | 3.25 | 3.19 | 3.32 | 3.29 | 3.00 | 3.21 |
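The final column is the plain mean of the five runs; a quick arithmetic check on three rows (values transcribed from the table, row labels abbreviated by us):

```python
runs = {
    "Recommended Combination 1": [3.55, 3.69, 3.53, 3.44, 3.60],
    "Retrained on C9 data":      [4.21, 3.50, 3.73, 4.09, 3.69],
    "Retrained on H1-H18":       [3.25, 3.19, 3.32, 3.29, 3.00],
}
reported = {"Recommended Combination 1": 3.56,
            "Retrained on C9 data": 3.84,
            "Retrained on H1-H18": 3.21}

for name, vals in runs.items():
    avg = sum(vals) / len(vals)
    # Reported averages match to two-decimal rounding
    assert abs(avg - reported[name]) < 0.01, name
    print(f"{name}: {avg:.2f}")
```

The comparison worth noting is in the table itself: the recommended transfer-learned combinations reach RMSEs close to a full retraining on H1–H18 while needing about 0.5 min of training instead of 18 h 20 min.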
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Chuang, C.-C.; Lin Chi, Z.-W.; Kuo, T.-C.; Chang, C.-J.; Hsieh, W.-H. Development of a Transfer Learning Technique for Rapid Adaptation of Thermal Compensation Models to Long-Term Machine Thermal Behavior Changes. Machines 2026, 14, 309. https://doi.org/10.3390/machines14030309