1. Introduction
In recent years, deep learning techniques have demonstrated significant potential in chemometrics due to their powerful feature extraction and pattern recognition capabilities [1,2]. Particularly in Near-Infrared Spectroscopy (NIRS) analysis, these methods have overcome the limitations of traditional approaches such as Partial Least Squares regression (PLS) and manual feature engineering by enabling end-to-end feature learning, thereby providing innovative solutions for the rapid detection and quantitative analysis of material components [3]. However, spectral data discrepancies across instruments in practical applications restrict model generalizability, prompting researchers to develop model transfer methods to enhance cross-instrument prediction performance. The heterogeneous instrumentation, environmental variability, and sample diversity prevalent in industrial scenarios pose substantial challenges to conventional modeling frameworks. Transitioning from laboratory-grade instruments to field-deployed portable devices introduces mismatches in spectral data caused by differences in optical resolution, detector sensitivity fluctuations, and environmental factors such as temperature and humidity. These factors lead to multi-scale distribution shifts and domain-specific feature space misalignment, resulting in significant performance degradation of pre-trained models in new operational contexts [4,5,6,7,8]. Repeated model recalibration to address these variations requires substantial labeled data and incurs high temporal and economic costs, severely limiting the scalable application of analytical models. To address these challenges, researchers have proposed instrument-specific calibration transfer techniques such as Direct Standardization (DS) and PLS-based calibration transfer. While these methods reduce inter-instrument discrepancies by correcting spectral data or model parameters, they remain limited in handling complex distribution shifts due to insufficient adaptability to nonlinear data patterns and neglect of multi-scale features [9,10,11].
The emergence of transfer learning techniques offers a promising solution to this challenge [12,13]. Unlike conventional modeling approaches that heavily rely on target-domain data, transfer learning mitigates model degradation caused by data distribution discrepancies and feature space heterogeneity through knowledge transfer mechanisms, enabling effective adaptation from source to target domains [14,15]. Recently, transfer learning has gained significant attention in cross-instrument model transfer for NIRS applications. For instance, Mishra et al. [16] proposed a deep calibration transfer method based on transfer learning that operates without standard samples, demonstrating applicability for model transfer between FT-NIR and handheld NIR instruments. Zhang et al. [17] introduced a partial transfer component regression framework for calibrating NIRS data in soil analysis, achieving accurate cross-domain predictions through an automated calibration module. Liang et al. [18] further developed an improved segmented direct standardization approach, where transfer samples selected based on specific attributes can be directly applied to calibration models for other attributes, yielding satisfactory results.
Existing studies have preliminarily validated the feasibility of transfer learning in NIRS applications. Li et al. developed a three-stage transfer framework based on feature mapping alignment, achieving higher classification accuracy and scalability in multi-variety, multi-manufacturer NIRS classification experiments compared to established methods such as SVM, BP, AE, and ELM [19]. Zhang’s team further proposed a spectral-density correlation transfer model that compensates for moisture content differences through an automated calibration module, achieving optimal density prediction for larch datasets under varying moisture conditions [20]. Deng et al. introduced three transfer learning strategies applied to deep learning models for toxin spectral data, effectively addressing the poor adaptability of single-source models [21]. While these studies demonstrate the potential of transfer learning in NIRS, most focus on specific transfer scenarios or single distribution alignment strategies. They inadequately address the multi-scale characteristics of spectral data and the compound distribution shifts caused by instrumental differences, leaving critical challenges unresolved in practical implementations.
Current research faces two critical challenges: First, the multi-scale characteristics of spectral data require models to dynamically adapt to varying wavelength resolutions. Second, instrument-induced marginal and conditional distribution shifts must be jointly optimized to achieve cross-domain alignment. To address these issues, researchers have explored diverse strategies. In feature extraction, multi-scale convolutional neural networks have been proposed to capture spectral characteristics across varying wavelength resolutions [22]. For distribution alignment, domain adaptation techniques such as Maximum Mean Discrepancy (MMD) and adversarial training have been applied to reduce feature distribution discrepancies between source and target domains [23]. However, these methods often separately optimize feature extraction or distribution alignment, failing to achieve synergistic optimization between these components. Therefore, designing a transfer learning framework that dynamically aligns feature distributions while enhancing cross-domain prediction accuracy has become a critical research focus.
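For reference, the standard empirical estimate of MMD used in such alignment methods is sketched below; the notation is a generic reconstruction and is not taken from the cited works.

% Empirical squared MMD between source features x_i^s and target features x_j^t,
% measured in a reproducing kernel Hilbert space H with feature map phi:
\mathrm{MMD}^{2}(X_s, X_t) =
\left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi\!\left(x_i^{s}\right)
      - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi\!\left(x_j^{t}\right)
\right\|_{\mathcal{H}}^{2}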
Based on this background, this study proposes a novel transfer learning framework, BDSER-InceptionNet, which innovatively integrates a network architecture with a Balanced Distribution Adaptation (BDA) algorithm to simultaneously optimize feature representation and distribution alignment for addressing cross-instrument model transfer challenges in NIRS. First, a foundational feature extraction module is constructed using depthwise separable convolution [24], decoupling spatial feature learning from channel-wise information aggregation. This design ensures a lightweight architecture while enhancing sensitivity to localized spectral characteristics. Second, an RX-Inception multi-scale structure is developed by combining the depthwise separable convolution advantages of Xception with the residual connection properties of ResNet [25]. Finally, an SE channel attention mechanism [26] is incorporated to dynamically recalibrate channel weights, strengthening the representation of critical spectral bands. These improvements effectively resolve issues of feature redundancy and multi-scale information loss inherent in traditional convolutional neural networks for spectral analysis.
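To make these two building blocks concrete, the following is a minimal PyTorch sketch of a 1-D depthwise separable convolution followed by an SE channel-attention step. The kernel size, channel counts, and reduction ratio are illustrative assumptions rather than the exact BDSER-InceptionNet configuration, and the multi-scale RX-Inception branches are omitted for brevity.

import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=7):
        super().__init__()
        # Depthwise: one filter per input channel (localized spectral pattern learning)
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution (channel-wise information aggregation)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class SEBlock1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # Squeeze: global average over the wavelength axis; excite: per-channel weights
        w = self.fc(x.mean(dim=-1))
        return x * w.unsqueeze(-1)

# Example: a batch of 16 spectra with 1 channel and 700 wavelength points (shapes illustrative)
x = torch.randn(16, 1, 700)
feat = DepthwiseSeparableConv1d(1, 32)(x)   # -> (16, 32, 700)
feat = SEBlock1d(32)(feat)                  # channel-recalibrated features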
More importantly, this study introduces the “Balanced Distribution Adaptation (BDA) theory” into the field of NIRS quantitative analysis for the first time. In cross-instrument NIRS applications, both marginal and conditional distributions of spectral data may shift due to instrumental discrepancies. In contrast to conventional domain adaptation methods that focus solely on marginal distribution alignment, the BDA algorithm jointly optimizes marginal distribution discrepancies and conditional distribution mismatches by establishing a “dual-constraint dynamic adaptation framework”. This approach comprehensively reduces domain divergence while achieving an optimal balance between inter-domain alignment and intra-domain prediction performance, thereby enhancing cross-domain predictive capabilities.
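As a point of reference, the BDA objective in its commonly cited form is sketched below. This is a hedged reconstruction following the general BDA literature rather than the exact formulation used in this work; the class-conditional terms (index c, usually formed with pseudo-labels) must be adapted for the regression setting considered here. The balance factor μ weighs marginal against conditional alignment, A is the learned transformation, X the combined source and target data, H the centering matrix, and λ a regularization weight.

\begin{aligned}
\min_{A}\;& (1-\mu)\,\mathrm{MMD}^{2}\!\left(A^{\top}X_{s},\,A^{\top}X_{t}\right)
 + \mu\sum_{c=1}^{C}\mathrm{MMD}^{2}\!\left(A^{\top}X_{s}^{(c)},\,A^{\top}X_{t}^{(c)}\right)
 + \lambda\,\lVert A\rVert_{F}^{2}\\
\text{s.t. }\;& A^{\top} X H X^{\top} A = I
\end{aligned}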
To further evaluate the experimental effectiveness, this study designs six distinct model transfer strategies to compare and analyze the impact of different transfer approaches on transfer performance. The corn and pharmaceutical datasets are selected for experiments, as they represent canonical application scenarios of NIRS in agricultural and industrial domains, respectively, offering broad sector coverage and practical challenges. The corn dataset involves multi-component quantitative analysis, while the pharmaceutical dataset encompasses complex chemical compositions and stringent quality control requirements, both posing high demands on model accuracy and robustness. These datasets, widely adopted as established benchmarks in model transfer and calibration transfer research, enable rigorous validation of the proposed method’s generalization capability and practical applicability.
The experimental results demonstrate that the BDSER-InceptionNet model achieves high-precision prediction of pharmaceutical and corn composition without requiring complex preprocessing. Through the “BDA-optimized transfer learning strategy”, the model significantly outperforms baseline methods such as PLS, SVR, and traditional CNNs in cross-instrument prediction tasks, highlighting its advantages in handling multi-scale spectral data and cross-domain distribution alignment. This outcome not only validates the academic merit of the proposed approach but also establishes a foundation for its practical application in multi-instrument NIR spectroscopy analysis.
5. Results and Discussion
5.1. Results of the Pharmaceutical Dataset
Using A1 as the host instrument and A2 as the slave instrument, the initial tablet dataset comprised 155 calibration samples, 40 validation samples, and 460 test samples. To optimize model performance, these three subsets were integrated and re-divided into training and testing sets at an 8:2 ratio using the K-S algorithm. After removing 19 outlier samples, a final set of 509 training samples and 127 testing samples was obtained. A model was established using the training set from A1 and then tested on both A1 and A2. Finally, models were built using PLS, SVR, CNN, and BDSER-InceptionNet for comparison. The results are shown in Table 2.
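For clarity, the following is a minimal sketch of Kennard-Stone (K-S) selection for such an 8:2 split, assuming Euclidean distances computed on the raw spectra; the function name and details are illustrative, not the exact implementation used in this study.

import numpy as np

def kennard_stone_split(X, train_ratio=0.8):
    """Max-min (Kennard-Stone) selection of a training subset from spectra X (n_samples, n_wavelengths)."""
    n = X.shape[0]
    n_train = int(round(train_ratio * n))
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Seed the training set with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Pick the candidate farthest from its nearest already-selected sample
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        nxt = remaining[int(np.argmax(d_min))]
        selected.append(nxt)
        remaining.remove(nxt)
    return np.array(selected), np.array(remaining)  # training indices, testing indices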
The data in Table 2 demonstrate that convolutional networks outperform classical chemometric methods in feature extraction. The improved BDSER-InceptionNet model proposed in this paper further enhances predictive performance by incorporating multi-scale fusion, residual structures, and SE attention mechanisms, achieving the best results with an RMSE as low as 2.0243 and an R2 as high as 0.9810. Previous studies [36,37] have aimed to address model adaptability in Near-Infrared (NIR) spectroscopy applications, but they typically rely on traditional chemometric approaches such as Multiplicative Scatter Correction (MSC) or linear regression-based model updates, focusing primarily on spectral standardization or optimization of linear model parameters. In contrast, our approach leverages deep learning combined with advanced domain adaptation theory (Balanced Distribution Adaptation, BDA) to offer a more comprehensive and automated solution. This highlights the unique advantages of our method in handling multi-scale distribution shifts and cross-domain feature space mismatches caused by instrumental heterogeneity, environmental interference, and sample diversity.
Following the six transfer methods designed in Section 2.3.2, a comparative analysis of the BDSER-InceptionNet model transfer performance was conducted. The A2 training set was used for fine-tuning, and performance was evaluated on the A2 test set. The evaluation metrics remained R2, RMSE, and MAE. The results are shown in Table 3.
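As an illustration of the layer-wise fine-tuning strategies (for example, updating only one fully connected layer), the following is a minimal PyTorch sketch under stated assumptions: the module names (features, fc1, fc2), network shapes, placeholder data, and hyperparameters are illustrative and not the exact training setup; in practice the weights would be loaded from the model pre-trained on the master instrument.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinyRegressor(nn.Module):
    """Stand-in for the pre-trained network: a feature extractor plus two FC layers."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv1d(1, 8, 7, padding=3), nn.ReLU(),
                                      nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(self.features(x))))

model = TinyRegressor()  # in practice: weights pre-trained on the master instrument

# Transfer strategy "fine-tune the first FC layer only": freeze everything else
for p in model.parameters():
    p.requires_grad = False
for p in model.fc1.parameters():
    p.requires_grad = True

# Placeholder secondary-instrument data (shapes are illustrative)
X_slave = torch.randn(100, 1, 650)
y_slave = torch.randn(100, 1)
loader = DataLoader(TensorDataset(X_slave, y_slave), batch_size=16, shuffle=True)

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
loss_fn = nn.MSELoss()
for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()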
Compared with the model before transfer, the different transfer strategies improve the prediction results to varying degrees. The best performance was achieved after incorporating BDA, with an R2 of 0.9860, and RMSE and MAE as low as 1.7489 and 1.4055, respectively, confirming the effectiveness of introducing BDA. The corresponding prediction results are shown in Figure 7, which more intuitively displays the effects before and after transfer. In the figure, the yellow points represent the predicted values after model transfer, green points represent the values before model transfer, and the blue line indicates the true values. The closer the prediction points are to the blue line, the better the model’s prediction performance.
Observations from Table 3 and Figure 7 reveal that transferring the entire fully connected block yields superior overall performance. When employing direct prediction or fine-tuning only the second fully connected layer, most predicted points lie close to the true values; however, certain samples exhibit significant deviations, leading to R2 and RMSE values that are inferior to those of the other transfer strategies. Comparing Transfer 1 (direct testing) with Transfer 2 (full model parameter updates), it is evident that updating the entire model on the new dataset outperforms direct testing. Similarly, Transfer 3 (fine-tuning the first fully connected layer) demonstrates better performance than Transfer 4 (fine-tuning the second fully connected layer). Transfer 5 and Transfer 6 serve as complementary ablation experiments: while Transfer 5 achieves commendable results, Transfer 6 significantly surpasses it in terms of RMSE and MAE improvements. These findings underscore the efficacy of incorporating BDA, which substantially enhances overall performance, thereby validating its practical utility.
5.2. Results of the Corn Dataset
In the corn dataset, the data were collected from 80 samples measured by three near-infrared spectrometers: m5, mp5, and mp6. Because the sample size is relatively small and no significant outliers were identified, no preprocessing was performed. As with the pharmaceutical dataset, the three datasets were integrated and re-divided into training and testing sets at an 8:2 ratio using the K-S algorithm, resulting in 64 training samples and 16 testing samples. The m5 instrument was designated as the primary instrument, while mp5 and mp6 served as secondary instruments. The model established on m5 was used to test the datasets of mp5 and mp6. Finally, models based on PLS, SVR, CNN, and BDSER-InceptionNet were developed and compared to predict the four corn components.
In the prediction experiments for corn moisture, oil, protein, and starch content, the model trained on m5 was directly tested on mp5 and mp6. R2 and RMSE were selected as evaluation metrics, and the results are shown in Table 4.
5.3. Model Transfer
Using the six designed transfer methods for BDSER-InceptionNet model transfer, the K-S algorithm was applied to divide the mp5 and mp6 datasets into training and testing sets in an 8:2 ratio. The pre-trained model obtained on m5 was then fine-tuned on the training sets of mp5 and mp6, and testing was conducted on the corresponding testing sets, with R2 and RMSE again chosen as evaluation metrics. Tests were carried out for four components: moisture, oil, protein, and starch. The results are presented in Table 5.
In the experiment of corn moisture content prediction, method 6 achieved the best performance. When transferring the model from the source to the mp5 and mp6 targets, it obtained an R2 of 0.8589 and an RMSE of 0.1245 on mp5, and the highest R2 of 0.8226 and lowest RMSE of 0.1393 on mp6. These results show a significant improvement in prediction accuracy over method 5, thus proving the effectiveness of BDA. Meanwhile, transferring only the first fully connected layer achieved the second-highest R2.
In the corn oil content prediction experiment, when transferring the model from the source to the mp5 and mp6 targets, method 6 achieved an R2 of 0.8482 and an RMSE of 0.0636 on mp5. However, on mp6, it was outperformed by method 2, which updated all model parameters. Nonetheless, method 6 still yielded satisfactory results. In contrast, updating only the last fully connected layer consistently achieved the worst outcomes in both model transfers to mp5 and mp6.
In corn protein content prediction, when transferring the model from the source to the mp5 target, method 6 achieved the highest R2 of 0.9688 and the lowest RMSE of 0.0858. When transferring to the mp6 target, method 6, despite not matching the direct model prediction, still outperformed the other transfer strategies with an R2 of 0.9572 and an RMSE of 0.0985, highlighting the effectiveness of BDA in enhancing model transfer.
In the corn starch content prediction experiment, when the model was transferred from the host m5 to mp5 and mp6, method 6 achieved an R2 of 0.9141 and an RMSE of 0.02123 on mp5, along with an R2 of 0.9554 and an RMSE of 0.1478 on mp6. Thus, method 6 performed best in the transfer to mp5 and second best in the transfer to mp6.
To gain a more intuitive understanding of the model transfer effects, the prediction results of the samples were visualized by plotting the model fitting curves before and after the model transfer, as shown in Figure 8. The yellow points indicate the predicted values after the model transfer, the green points indicate the predicted values before the model transfer, and the blue points represent the actual values. The closer the predicted points are to the blue points, the better the prediction performance of the model.
5.4. Comparison Experiment
Comparing BDSER-InceptionNet only with PLS, SVR, and CNN is not sufficient to demonstrate the advantages of transfer learning. Widely used model transfer algorithms include Direct Standardization (DS) [38], Piecewise Direct Standardization (PDS) [39], and Slope and Bias Correction (SBC) [40]. To strictly validate the advantages of our method, we therefore included these traditional chemometric methods in the comparison. We built a model on the master instrument using PLS and applied the traditional transfer methods. For both the pharmaceutical and corn datasets, we conducted a comparative analysis of component transfer from master to slave instruments. The R2 results are given in Table 6 and the RMSE results in Table 7.
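As background on the simplest of these baselines, the following is a minimal sketch of Direct Standardization under its usual formulation: a transformation matrix maps slave-instrument spectra into the master-instrument space via least squares on paired transfer samples, after which the master-instrument model is applied unchanged. Variable names are illustrative.

import numpy as np

def direct_standardization(X_master_std, X_slave_std):
    """Estimate the DS transformation F such that X_slave_std @ F approximates X_master_std (least squares)."""
    F, *_ = np.linalg.lstsq(X_slave_std, X_master_std, rcond=None)
    return F

# Usage (illustrative): correct new slave spectra, then predict with the master-instrument model
# F = direct_standardization(X_master_transfer, X_slave_transfer)
# y_pred = master_pls_model.predict(X_slave_new @ F)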
Through multiple experiments, it has been observed that applying the traditional transfer method, Direct Standardization (DS), as a secondary calibration step can enhance model transfer performance. Experiments conducted on the two publicly available datasets, one for corn and one for pharmaceuticals, demonstrate that the proposed BDA algorithm is both efficient and stable, whereas traditional model transfer approaches exhibit significant performance fluctuations. In terms of the R2 metric, the DS algorithm performs well on the corn dataset but leads to over-correction and poor results on the pharmaceutical dataset. The PDS algorithm achieves an R2 of 0.9556 on the pharmaceutical dataset, outperforming DS and SBC; however, it yields negative R2 values on the corn dataset. In contrast, the proposed Transfer 6 method demonstrates superior performance in model transfer. By incorporating a transfer loss during the fine-tuning of the pre-trained model, Transfer 6 consistently improves prediction accuracy and stability, offering an effective and robust solution for cross-instrument model adaptation.
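To illustrate what such a transfer loss can look like, the sketch below combines the target-domain regression loss with an MMD-style alignment penalty between source and target features, in the spirit of the BDA-based Transfer 6; the Gaussian kernel bandwidth and weighting factor are illustrative assumptions, and the conditional (label-dependent) term of BDA is omitted for brevity.

import torch

def gaussian_mmd(f_src, f_tgt, sigma=1.0):
    """Biased MMD^2 estimate with a Gaussian kernel between two batches of features."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(f_src, f_src).mean() + k(f_tgt, f_tgt).mean() - 2 * k(f_src, f_tgt).mean()

def transfer_loss(pred_tgt, y_tgt, feat_src, feat_tgt, mu=0.5):
    # Target-domain regression loss plus a weighted marginal-alignment penalty
    return torch.nn.functional.mse_loss(pred_tgt, y_tgt) + mu * gaussian_mmd(feat_src, feat_tgt)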
5.5. Discussion
In this study, we observe significant differences in the performance of various model transfer methods across different parameters. Method 6, incorporating the BDA strategy, generally enhances cross-domain adaptation and prediction accuracy by aligning feature distributions. It shows the best performance in multiple component predictions of the pharmaceutical and corn datasets. Nevertheless, in specific cases like the corn oil content prediction, method 2 (fine-tuning all layers) achieves better results on a particular target instrument. This might be because method 2’s comprehensive parameter updating better adapts the model to the target domain’s features. In contrast, methods 3, 4, and 5, which only fine-tune certain layers, underperform in some component predictions, suggesting they may not fully utilize the target-domain data characteristics. These differences likely stem from variations in spectral features of different parameters and the model’s sensitivity to them. For some parameters with spectral features highly susceptible to instrument differences, more complex transfer methods are needed for distribution alignment. For others with relatively stable features, simple methods suffice.
This extended comparative experiment clearly establishes the methodological advantages of the proposed transfer learning approach, demonstrating superior performance not only over traditional calibration transfer methods (DS, PDS, SBC) but also over the previously evaluated models applied without transfer (PLS, SVR, CNN, and the non-transferred BDSER-InceptionNet). By consistently achieving higher R2 values and lower RMSE values across all predicted components, the BDA-based BDSER-InceptionNet provides a robust and effective framework for calibration transfer in spectral data analysis. These results validate the effectiveness of transfer learning in addressing the challenges posed by instrumental variations, representing a significant improvement over existing approaches and laying the foundation for more reliable and accurate predictions in practical spectroscopic applications.
In summary, the choice of model transfer method should consider the target-domain data characteristics and the differences between domains. While the BDA strategy usually improves transfer effectiveness, its performance is also influenced by data quantity, noise, and parameter settings. Future research will focus on optimizing the BDA strategy and exploring advanced model architectures to further enhance model generalization and prediction accuracy.
6. Conclusions
This study introduces a model transfer algorithm based on deep transfer learning, leveraging the BDSER-InceptionNet model and the Balanced Distribution Adaptation (BDA) strategy to effectively address the problem of insufficient model generalization caused by instrument differences in Near-Infrared (NIR) spectroscopy analysis. The BDSER-InceptionNet model integrates Depthwise Separable Convolution (DSC), multi-scale feature extraction (RX-Inception), and an attention mechanism (SE module), significantly enhancing feature extraction efficiency and model robustness. The BDA algorithm further optimizes the model’s cross-instrument adaptability by aligning the marginal and conditional distributions of the source and target domains. The superiority of this approach was validated through experiments on two publicly available datasets: tablet and corn. In the tablet dataset, the transfer strategy incorporating BDA achieved an R2 of 0.9860 in predictions on the secondary instrument, with Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as low as 1.7489 and 1.4055, respectively, outperforming traditional methods such as Partial Least Squares (PLS), Support Vector Regression (SVR), and conventional Convolutional Neural Networks (CNNs). In the corn dataset, predictions of moisture, oil, protein, and starch content demonstrated that the BDA-optimized model exhibited greater stability and prediction accuracy in multi-instrument scenarios. Compared to traditional methods, the model’s reduced reliance on preprocessing and its efficient end-to-end analysis capabilities underscore its practical value. In comparative experiments, the proposed algorithm also outperformed conventional chemometric transfer methods, with results highlighting significant performance advantages. Notably, the method excelled in generalization across instruments, effectively mitigating spectral differences between devices and thereby enhancing transfer effectiveness and prediction accuracy. Looking ahead, this approach holds promise for broader application in NIR spectroscopy scenarios, offering reliable support for cross-instrument analysis and contributing to advancements in non-destructive testing technologies.