3.1. Comparison of Environmental Variable Distributions and Characteristics by Facility
The statistical values and standard deviations of each variable used in the development of the pre-trained model are presented in Table 5, and the overall distributions and temporal variation patterns are shown in Figure 7. During the experimental period, the measured indoor temperature reached a daytime maximum of 34.2 °C and dropped to a nighttime minimum of 27.0 °C, with an average of 30.1 °C. The average RH was 71%, ranging from a maximum of 82% to a minimum of 58%, showing a gradual decline throughout the observation period. This pattern was interpreted as a result of interactions with external climatic conditions. For NH3, concentrations ranged from a minimum of 5.22 ppm to a maximum of 55 ppm, with an average of 14.42 ppm, generally remaining within the recommended threshold of 25 ppm. The average CO2 concentration was 1011.5 ppm, with fluctuations between 675.6 ppm and 1722.8 ppm.
The training data for the transfer learning model are summarised in Table 6 and illustrated in Figure 8. In this dataset, the indoor temperature ranged from 24.6 °C to 31.1 °C, with an average of 28.2 °C. The RH averaged 57.1%, with a maximum of 84.1% and a minimum of 37.1%, indicating slightly lower values than in the source domain. NH3 concentrations ranged from 18 to 71 ppm, with an average of 38.1 ppm. CO2 concentrations showed significant variability, from 1450 ppm to 5677 ppm, with an average of 3384.1 ppm. These results can be attributed to factors such as higher stocking density and seasonal outdoor climatic conditions.
Fluctuations in CO2 and NH3 concentrations are interpreted as a consequence of ventilation system operation, in which ventilation rates are adjusted in response to changes in indoor temperature. In both facilities, a common trend was observed: as temperature increased, ventilation rates increased, resulting in decreased NH3 and CO2 concentrations. The data thus showed a clear inverse relationship between indoor temperature (and hence ventilation rate) and gas concentrations, with NH3 and CO2 exhibiting similar regulatory patterns.
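The direction of these relationships can be checked with a simple correlation analysis. The following is a minimal sketch rather than the analysis performed in this study: the helper `pearson` and all sample readings are hypothetical and were chosen only to illustrate the expected signs (temperature and ventilation rising together, gas concentration falling as ventilation rises).

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings (illustrative only, NOT measured data).
temperature_c = [27.0, 28.5, 30.0, 31.5, 33.0, 34.0]
ventilation_pct = [30.0, 40.0, 55.0, 70.0, 85.0, 95.0]
nh3_ppm = [24.0, 21.0, 17.0, 13.0, 9.0, 7.0]

r_temp_vent = pearson(temperature_c, ventilation_pct)
r_vent_nh3 = pearson(ventilation_pct, nh3_ppm)
print(f"temperature vs ventilation: {r_temp_vent:+.2f}")
print(f"ventilation vs NH3:         {r_vent_nh3:+.2f}")
```

With real logger data, the same function could be applied per facility to compare the strength of the temperature, ventilation, and gas relationships across the source and target domains.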
Although environmental conditions such as temperature, RH, and gas concentrations differed between the datasets used for pre-training and transfer learning, the underlying operational mechanisms, including ventilation strategies, rearing environment variables, and measurement parameters, were consistently maintained. These shared dynamics support the feasibility of applying a transfer-learning-based predictive model to both domains. Therefore, rather than focusing solely on the differences in variable distributions, this study justifies the use of transfer learning on the basis of the similarity in interaction patterns between environmental variables in both settings.
3.2. Influence of Input Variables on the Predictive Model
Figure 9 presents the results of evaluating and visualising the importance of key variables influencing NH3 concentration using data collected from each facility. The analysis revealed that the two most influential factors were the average body weight of the pigs and the CO2 concentration, which ranked first and second in feature importance, respectively.
The importance of average body weight can be interpreted as reflecting the production stage of the herd and the associated increase in nitrogen input and excretion. As pigs grow, feed intake and excretion rates generally increase, leading to a greater manure nitrogen load and, consequently, a higher potential for NH3 generation at manure and floor surfaces. Therefore, body weight primarily represents a source-term driver linked to the emission potential rather than short-term ventilation dynamics. Since feed intake and manure nitrogen content were not directly measured, this interpretation is presented as a proxy-based explanation grounded in established husbandry relationships.
In contrast, CO2 concentration primarily captures ventilation and dilution conditions within the building. While CO2 is generated by animal respiration, its indoor concentration is strongly modulated by the effectiveness of air exchange; thus, CO2 can serve as an operational proxy for the balance between pollutant accumulation and removal. Under typical management, ventilation is adjusted in response to thermal loads, which affects both CO2 and NH3 concentrations. This explains the similar temporal patterns observed for CO2 and NH3 (Figure 7c,d and Figure 8d) and supports the consistently high importance of CO2 in both the source and target domains. In the target domain, CO2 may become relatively more informative because differences in housing scale and ventilation operation can make short-term NH3 variability more strongly governed by ventilation-driven dilution, thereby strengthening the CO2–NH3 relationship learned by the model.
Notably, despite differences between facilities, average body weight and CO2 represent two fundamental and broadly transferable drivers, which likely explains their dominance across both domains. Other predictors influence NH3 predictions through complementary pathways. Ventilation rate directly affects indoor NH3 via dilution and removal and also mediates the influence of thermal conditions. Temperature may influence NH3 volatilisation and mass transfer from emitting surfaces and can indirectly affect concentrations through ventilation control strategies. Relative humidity may reflect moisture-related conditions of manure and floor surfaces, which can modify emission and transfer processes; however, these effects can be non-linear and site-dependent due to interactions with management practices and ventilation.
Although this analysis enabled the identification of the relative importance of key variables, it was limited in its ability to quantitatively interpret non-linear interactions and complex interdependencies between the variables. For a more sophisticated interpretation, we therefore additionally applied a model interpretation technique, SHAP, as described in Section 4.2.
3.3. Performance Evaluation of the Pre-Trained Model
Model validation was performed by splitting the collected dataset into 80% for training and 20% for testing. Predictive performance was evaluated using three metrics: R2, RMSE, and MAPE. The pre-trained model demonstrated overall high prediction accuracy, achieving R2 = 0.96, RMSE = 1.22, and MAPE = 4.90 (Table 7).
Figure 10 presents a time-series comparison between the predicted NH3 concentrations and the measured values. The two series agreed closely, with the predicted values reliably following the periodicity and patterns of the empirical data. These results indicate that the model possesses sufficient reliability to serve as a foundation for transfer learning.
Notably, the model achieved this accuracy despite being trained on a limited amount of data collected under livestock farming conditions, which supports its potential to maintain effective performance during subsequent transfer to the target domain. Accordingly, this pre-trained model was employed as the base model for transfer learning experiments using data from an actual pig farm located in Suncheon, to assess whether predictive accuracy could be sustained under new environmental conditions.
3.4. Comparison of Predictive Performance: Standalone Model Versus Transfer-Learning-Based Model
To evaluate the predictive performance of NH3 concentration in the target domain, a comparative experiment was designed based on two factors: data collection interval (10, 20, 30, and 60 min) and training strategy (standalone versus transfer learning). In particular, the analysis focused on how the transfer learning approach, which leverages generalised representations from a pre-trained model, affects prediction accuracy under varying data collection conditions. The experimental results are presented in Table 8, Table 9, and Table 10 and in Figure 11. Overall, the transfer-learning-based model (Case B) outperformed the standalone model (Case A) under all conditions. The standalone model, trained solely on target domain data, exhibited a general tendency for predictive performance to improve with shorter data collection intervals. For instance, at the 10 min interval, it achieved relatively good performance with R2 = 0.79, RMSE = 3.45, and MAPE = 6.12%. However, when the interval was extended to 30 min, performance decreased to R2 = 0.67, RMSE = 4.03, and MAPE = 7.79%. This drop is interpreted as a result of the model’s inability to capture short-term fluctuations in NH3 concentration at longer sampling intervals. Interestingly, at the 60 min interval, the model showed partial recovery, likely because long-term patterns and correlations between input variables became more prominent at coarser time resolutions. Nevertheless, the performance of the standalone model remained sensitive to data collection frequency and dataset structure, revealing inherent limitations in its generalisability.
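The loss of short-term information at longer intervals can be illustrated with a small resampling sketch. This section does not state how the 10, 20, 30, and 60 min datasets were derived, so block averaging of a 10 min series is purely an assumption made for illustration; the function `downsample_mean` and the sample trace are hypothetical.

```python
from typing import List, Sequence


def downsample_mean(values: Sequence[float], base_min: int, target_min: int) -> List[float]:
    """Average consecutive readings so that each output covers `target_min` minutes.

    Assumes evenly spaced readings every `base_min` minutes.
    An incomplete trailing block is dropped.
    """
    if target_min % base_min != 0:
        raise ValueError("target interval must be a multiple of the base interval")
    step = target_min // base_min
    return [
        sum(values[i:i + step]) / step
        for i in range(0, len(values) - step + 1, step)
    ]


# Hypothetical one-hour NH3 trace (ppm) sampled every 10 min, with a brief spike.
nh3_10min = [30.0, 31.0, 45.0, 32.0, 30.0, 30.0]

for interval in (10, 20, 30, 60):
    series = downsample_mean(nh3_10min, base_min=10, target_min=interval)
    print(f"{interval:>2} min: peak = {max(series):.1f} ppm over {len(series)} samples")
```

In this toy trace the 45 ppm spike is progressively flattened as the interval grows, and the number of available samples shrinks at the same time. Both effects are consistent with the reduced ability of a model trained on coarser data to follow short-term NH3 fluctuations.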
In contrast, the transfer learning model (Case B) leveraged representations from the pre-trained source domain model and was fine-tuned using a subset of data from the target domain. Across all data collection intervals, Case B consistently demonstrated higher predictive accuracy than Case A. Notably, under conditions where the standalone model (Case A) showed diminished performance—such as at the 30 min interval—Case B achieved superior results, with R2 = 0.85, RMSE = 3.31, and MAPE = 5.24%. These findings suggest that transfer learning is an effective strategy for maintaining prediction stability and accuracy even in environments with low data resolution or limited sample sizes. Furthermore, by reusing high-level feature representations learned by the pre-trained model, the transfer learning approach maintained excellent performance with shorter training times and lower computational demand, even when the input data were scarce.
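One common way to fine-tune a boosted-tree model is to keep the trees learned on the source domain and continue boosting on target-domain residuals. The sketch below illustrates that idea with hand-written decision stumps in pure Python. It is an assumption-laden toy, not the XGBoost configuration used in this study: the fine-tuning mechanism, the learning rate, the number of rounds, and all data values are hypothetical.

```python
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, left-leaf value, right-leaf value).
Stump = Tuple[int, float, float, float]
LEARNING_RATE = 0.3  # shrinkage applied to every stump (illustrative value)


def predict(model: Sequence[Stump], base: float, x: Sequence[float]) -> float:
    """Base prediction plus the shrunken contribution of every stump."""
    return base + LEARNING_RATE * sum(
        left if x[feat] <= thr else right for feat, thr, left, right in model
    )


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Pick the single split that minimises squared error on the residuals."""
    mean_all = sum(residuals) / len(residuals)
    best_sse = sum((r - mean_all) ** 2 for r in residuals)
    best: Stump = (0, float("inf"), mean_all, mean_all)  # fallback: no split
    for feat in range(len(X[0])):
        for thr in sorted({row[feat] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[feat] <= thr]
            right = [r for row, r in zip(X, residuals) if row[feat] > thr]
            if not right:  # threshold at the maximum value separates nothing
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, (feat, thr, lm, rm)
    return best


def boost(X: Sequence[Sequence[float]], y: Sequence[float],
          model: Sequence[Stump], base: float, rounds: int) -> List[Stump]:
    """Return `model` extended by `rounds` stumps fitted to the current residuals."""
    model = list(model)
    for _ in range(rounds):
        residuals = [yi - predict(model, base, xi) for xi, yi in zip(X, y)]
        model.append(fit_stump(X, residuals))
    return model


def mse(X: Sequence[Sequence[float]], y: Sequence[float],
        model: Sequence[Stump], base: float) -> float:
    """Mean squared error of the ensemble on (X, y)."""
    return sum((yi - predict(model, base, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)


# Hypothetical data: columns are [CO2 (ppm), body weight (kg)]; targets are NH3 (ppm).
X_source = [[700, 30], [900, 30], [1100, 40], [1300, 40], [1500, 50], [1700, 50]]
y_source = [6.0, 9.0, 13.0, 16.0, 20.0, 24.0]
X_target = [[1500, 60], [2500, 60], [3500, 70], [4500, 70], [5500, 80]]
y_target = [20.0, 30.0, 38.0, 50.0, 65.0]

base = sum(y_source) / len(y_source)
pretrained = boost(X_source, y_source, [], base, rounds=30)          # source domain
fine_tuned = boost(X_target, y_target, pretrained, base, rounds=30)  # continue on target

print("target MSE, pre-trained only: ", round(mse(X_target, y_target, pretrained, base), 2))
print("target MSE, after fine-tuning:", round(mse(X_target, y_target, fine_tuned, base), 2))
```

The fine-tuned ensemble keeps the 30 source-domain stumps unchanged and adds 30 more that correct their errors on the target data. Here the error is measured on the same target samples used for fine-tuning, purely to show the mechanism; a fair comparison requires evaluation on held-out target data.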
To minimise overfitting during fine-tuning, model complexity was controlled using XGBoost regularisation and tree-growth constraints, and model adaptation was carried out using a chronological split to prevent information leakage. Specifically, the model was fine-tuned using the initial segment of the target-domain time series, and final performance was evaluated on an independent evaluation set that was not used for training. Both models exhibited improved predictive performance with shorter data collection intervals; however, the sensitivity to interval changes was notably lower in Case B. In other words, the transfer-learning-based model was less affected by variations in data-sampling frequency and exhibited more consistent accuracy. This consistency indicates that the model internalised the generalised environmental features from the source domain, enabling it to learn effectively from limited data in the target domain.
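The chronological split described above can be sketched as follows. The function name `chronological_split`, the 80% fine-tuning fraction, and the sample records are hypothetical; the fraction actually used for the target-domain data is not stated in this section.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(records: Sequence[T], train_fraction: float) -> Tuple[List[T], List[T]]:
    """Split time-ordered records into an initial segment and a later segment.

    No shuffling is performed, so every evaluation record is later in time
    than every training record, which prevents leakage of future information.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


# Hypothetical (minutes since start, NH3 ppm) pairs, already sorted by time.
records = [(t * 10, 30.0 + t) for t in range(10)]
train, evaluation = chronological_split(records, train_fraction=0.8)

print("fine-tuning segment ends at  ", train[-1][0], "min")
print("evaluation segment starts at ", evaluation[0][0], "min")
```

A random split would instead interleave training and evaluation samples in time, so that neighbouring, highly autocorrelated readings could appear on both sides and inflate the apparent accuracy.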
The above experimental results indicate that the standalone model, which is based on the specific characteristics of an individual domain, responds sensitively to variations in data collection conditions and environmental variables. Consequently, such models may face limitations in maintaining stable performance when applied in real-world settings. In contrast, the transfer-learning-based model demonstrated stable predictive accuracy across a range of data collection conditions and showed strong potential for effective application in actual livestock environments, particularly where data collection is limited or environmental conditions differ substantially. These findings suggest that the transfer learning approach proposed in this study holds significant value for enhancing the generalisability and practical applicability of predictive models in livestock environmental contexts.
3.5. SHAP-Based Feature Importance Results
To quantitatively assess the effectiveness of transfer learning and analyse the impact of the pre-trained model on NH3 concentration prediction in the target domain, an additional analysis using SHAP values was conducted. Two XGBoost-based predictive models were compared: a standalone model trained solely on the target domain data, and a transfer learning model fine-tuned on the same dataset using a model pre-trained on the source domain. SHAP values were computed using the SHAP Python package (version 0.49.1).
A SHAP dot plot was used for comparison, displaying the distribution of SHAP values for each input variable across varying input levels. This allowed intuitive interpretation of the influence of each independent variable on NH3 concentration. Comparing the two models revealed how transfer learning altered the relative contributions and inference patterns of input variables, enabling improved interpretability and insight into how the model responded to environmental variables.
Figure 12 presents the SHAP results from the standalone model, highlighting the feature importance of each variable in predicting NH3 concentration. For most data collection intervals (10, 20, and 60 min), CO2 concentration emerged as the most influential variable, with SHAP values increasing in accordance with input level. In contrast, for the 30 min interval, RH recorded the highest SHAP values and became the dominant variable in the model.
Figure 13 shows the SHAP results from the transfer learning model. Across all intervals, CO2 remained the most influential predictor, exhibiting the largest spread of SHAP values compared with the other input variables, which indicates its dominant global contribution to NH3 prediction under the target-farm conditions. Moreover, relative to the standalone model, the transfer learning model showed a broader range of CO2-attributed SHAP values, suggesting that CO2 captured a wider set of ventilation- and dilution-related operating states after fine-tuning. For instance, at the 30 min interval, the SHAP value range for CO2 was [−6.8, 7.5] in the standalone model but [−10.8, 7.0] in the transfer learning model. In addition, the SHAP dot plots indicate that higher CO2 levels generally corresponded to more positive SHAP contributions, consistent with elevated NH3 under reduced ventilation conditions, whereas lower CO2 levels tended to contribute negatively. Overall, these patterns support the interpretation that fine-tuning with pre-learned knowledge reduced over-reliance on any single variable and enabled better reflection of latent patterns within the target domain.
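The change in the CO2 SHAP range at the 30 min interval can be made explicit with a short calculation. The two ranges below are the values reported above; the helper `spread` and the dictionary layout are hypothetical and serve only as a worked example.

```python
from typing import Dict, Tuple

# CO2 SHAP value ranges at the 30 min interval, as reported in the text.
co2_shap_range: Dict[str, Tuple[float, float]] = {
    "standalone": (-6.8, 7.5),
    "transfer learning": (-10.8, 7.0),
}


def spread(bounds: Tuple[float, float]) -> float:
    """Width of a [min, max] interval, rounded to one decimal place."""
    low, high = bounds
    return round(high - low, 1)


spreads = {name: spread(bounds) for name, bounds in co2_shap_range.items()}
for name, width in spreads.items():
    low, high = co2_shap_range[name]
    print(f"{name}: range = [{low}, {high}], spread = {width}")
```

The spread increases from 14.3 to 17.8, and the widening occurs on the negative side of the range (from −6.8 to −10.8), while the upper bound decreases slightly (from 7.5 to 7.0).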
Moreover, while the top-ranked variables remained largely consistent, differences between their SHAP values were reduced after applying transfer learning, leading to fewer fluctuations in the relative importance rankings.