Article

Engineering-Oriented Explainable Machine Learning and Digital Twin Framework for Sustainable Dairy Production and Environmental Impact Optimisation

1
Department of Computer Science, Loughborough University, Loughborough LE11 3TU, UK
2
Cattle Information Service, Scope House, Hortonwood 33, Telford TF1 7EX, UK
*
Author to whom correspondence should be addressed.
Algorithms 2025, 18(10), 670; https://doi.org/10.3390/a18100670
Submission received: 23 September 2025 / Revised: 12 October 2025 / Accepted: 20 October 2025 / Published: 21 October 2025
(This article belongs to the Special Issue AI-Driven Engineering Optimization)

Abstract

Enhancing productivity while reducing environmental impact presents a major engineering challenge in sustainable dairy farming. This study proposes an engineering-oriented explainable machine learning and digital twin framework for multi-objective optimisation of milk yield and nitrogen-related emissions. Using the CowNflow dataset, which integrates individual-level nitrogen balance, feeding, and production data collected under controlled experimental conditions, the framework combines data analytics, feature selection, predictive modelling, and SHAP-based explainability to support decision-making in dairy production. The stacking ensemble model achieved the best predictive performance ($R^2$ = 0.85 for milk yield and $R^2$ = 0.794 for milk urea), providing reliable surrogates for downstream optimisation. Predicted milk urea values were further transformed using empirical equations to estimate urinary urea nitrogen (UUN) and ammonia (NH3) emissions, offering an indirect yet practical approach to assess environmental sustainability. Furthermore, the predictive models are integrated into a digital twin platform that provides a dynamic, real-time simulation environment for scenario testing, continuous optimisation, and data-driven decision support, effectively bridging data analytics with sustainable dairy system management. This research demonstrates how explainable AI, machine learning, and digital twin engineering can jointly drive sustainable dairy production, offering actionable insights for improving productivity while minimising environmental impact.

1. Introduction

The dairy industry plays an important role in the global agricultural landscape, with milk composition serving as a critical determinant of product quality and economic viability. Accurate prediction of milk yield and analysis of milk composition, such as protein, fat, and milk urea, have become increasingly essential in the dairy industry. Milk yield is not only related to the economic viability of a dairy farm but is also an indicator of the overall health of the herd [1]. Similarly, monitoring milk urea is vital for assessing nutritional adequacy and ensuring optimal metabolic function in cattle [2]. Moreover, milk urea concentration (MU) is closely linked to urea and nitrogen (N) in cattle urine and to the release of ammonia (NH3), a significant but difficult-to-measure emission associated with livestock farming. This highlights a critical need for advanced predictive models that go beyond traditional practices, bringing precision to dairy farming management.
In the past, livestock management relied strongly on the collective knowledge of people and their experiences to make effective decisions. With the development of technology, dairy farmers are finding more effective management strategies, such as using intelligent management systems and sensors to record the characteristics of their herd and improve the efficiency of dairy production [3]. This has led to an increasing availability of farm-related data, enabling data-driven management of farming through techniques such as machine learning (ML).
ML can be useful in identifying patterns in data and making predictions relevant to dairy production, which enables dairy farmers to deploy optimal management strategies and reduce their production costs [4]. Data-driven machine learning techniques have been utilised to develop various solutions in the dairy industry, such as farm management [5], animal behaviour monitoring [6], disease detection [7,8], and milk yield and composition prediction [9,10].
Several studies have shown the potential of machine learning applications for predicting milk yield and composition. Linear regression (LR) [11], Multiple Linear Regression (MLR) [12,13], Random Forest (RF) [9], Support Vector Machines (SVMs) [11], and Artificial Neural Networks (ANNs) [14,15] have been widely used to deal with different scenarios in predicting milk yield and milk composition, including components like lactoferrin. However, there is limited systematic research on optimising machine learning frameworks for multi-objective prediction of milk yield and milk composition that considers both productivity and environmental impact. In this paper, we address this gap by developing an interpretable, data-driven optimisation framework to gain actionable insights into productivity and emissions based on feeding and various biological factors, including the inference of urinary urea nitrogen (UUN) excretion and ammonia emissions from MU.
We propose an engineering-oriented explainable machine learning framework to assist dairy farmers in predicting milk yield and milk urea, enabling them to infer ammonia emissions and UUN from milk urea. In addition to single ML models, we develop a stacking ensemble model to determine the optimal way to combine predictions from heterogeneous ML techniques, which enhances prediction capabilities. This study uses the CowNflow dataset [16] on Holstein cattle from 1983 to 2019. The dataset contains biological attributes such as cow age, body weight, lactation month, and feeding information, including the consumption of diet components. The main contributions of this paper are as follows:
  • Hybrid Method for Multi-objective Optimisation of Milk Productivity and Environmental Impact: A hybrid framework is developed that integrates ML-based prediction of milk yield and milk urea concentration with statistical estimation of urinary urea nitrogen excretion and ammonia emissions. This approach enables a combined assessment of productivity and environmental sustainability in dairy systems.
  • Novel Stacking Ensemble Learning: A stacking ensemble model is introduced to enhance predictive accuracy, outperforming five individual algorithms (linear regression, Random Forest, SVR, AdaBoost, and ANN). The diversity and complementarity of the base learners contribute to improved overall performance and model stability.
  • Explainable AI Using SHapley Additive exPlanations (SHAP) Analysis: Model interpretability is achieved through SHAP, which quantifies the contribution of each feature to model outputs. This provides transparent and actionable insights into the influence of dietary, physiological, and management factors, supporting data-driven decision-making for dairy producers and researchers.
  • Integration with a Digital Twin Platform: The predictive ML models are embedded within a digital twin environment, enabling real-time simulation, scenario testing, and continuous optimisation of dairy system management. This integration supports adaptive decision-making, facilitates model validation, and creates a feedback loop for improving data collection and future model performance.
This paper is structured as follows. Section 2 reviews the related work and recent advances in interpretable ML for dairy production and sustainability. Section 3 presents the proposed ML framework and describes the intermediate steps, while Section 4 reports the experimental results, covering model training, evaluation, and explainability analysis. Section 5 integrates the predictive models with empirical statistical equations to estimate urinary urea nitrogen and ammonia emissions. Section 6 introduces the digital twin framework, describing its integration with predictive models and discussing model generalisation. Finally, Section 7 concludes this study and outlines future research directions.

2. Related Work

Traditional approaches to predicting milk yield and milk composition have historically relied on statistical models [17,18,19,20]. Although statistical models have shown great promise in producing accurate predictions, ML-based models have recently emerged as a promising paradigm with the capacity to handle large amounts of input data to learn complex non-linear relationships [21].
Numerous studies have explored ML models for predicting milk yield and milk composition in dairy cows. Multivariate linear regression is a simple yet effective method for predicting the 305-day milk yield of dairy cows [22]. The same study also proposed a Static Neural Network (SANN) and a Non-linear Autoregressive Model with exogenous input (NARX). NARX and SANN were shown to perform better than MLR in predicting peak milk yield. NARX is capable of adapting and altering its projected forecast [22], especially with the introduction of seasonal data [23]. However, the impact of using weather parameters on the accuracy was not substantial [12]. Sharma et al. [13] compared an MLR and an ANN in predicting 305-day milk yield in Karan Fries dairy cattle and showed that the performance of the ANN model is slightly superior to that of the regression model. Similar studies carried out in Sahiwal cattle [24] and Karan Fries cattle [25] also showed that ANNs exhibit higher performance. Apart from basic factors such as cow age and lactation, body weight at calving and the days in milk on the test day are regarded as important variables for ANNs [13]. Compared with predicting milk yield, determining milk composition relies more on testing. Milk infrared spectroscopy is a commonly used technique for detecting the composition of milk [26,27].
Milk urea concentration, also expressed as milk urea nitrogen (MUN), is an important indicator of dietary protein utilisation and ammonia emissions from cattle manure [28], especially under restricted grazing. Milk urea nitrogen and milk urea concentration can be converted into each other using the formula milk urea nitrogen (mg/dL) = milk urea concentration (mg/dL) × 46.6%, as MUN specifically quantifies the amount of nitrogen in the urea molecule [29]. Urea is the main nitrogen compound in urine and has the potential for ammonia volatilisation relative to other nitrogen compounds [30]. Thus, urinary urea nitrogen (UUN) excretion is an indicator of ammonia emissions from dairy cattle manure [31]. In addition, there are several studies that have shown that there exists a positive relationship between MUN and UUN [32,33]. This correlation indicates that it may be feasible to utilise MUN for predicting ammonia emissions.
In addition to predictive modelling, ML frameworks have gained significant attention in the dairy research community, for example in precision livestock farming [34]. Such frameworks provide a systematic structure for data pre-processing, feature engineering, model selection, and evaluation, streamlining the process of model development and deployment [35]. They enable practitioners to expedite the implementation of predictive models while ensuring reproducibility and scalability. Ji et al. [10] used 5 years of behavioural, health, and productivity data to develop a machine learning framework to predict overall cow production performance in a robotic dairy farm. The framework can be used to predict cattle’s daily milk yield, milk composition (fat and protein content), and milking frequency.
Recent advances have further demonstrated the growing role of interpretable and sustainability-oriented ML frameworks in dairy farming. Grzesiak et al. [36] reviewed modern ML techniques applied in dairy systems, showing progress in regression, tree-based, and neural models for productivity prediction. Liu et al. [37] proposed a precision ML framework integrating dietary, environmental, and management factors to predict lactation performance in large herds. In addition to productivity, Foschi et al. [38] combined ML with life-cycle assessment methods to improve carbon footprint prediction in milk production, illustrating the potential of data-driven models for sustainability analysis.
These studies highlight the continued expansion of ML research in dairy production and sustainability. However, there remains limited systematic research on optimising ML frameworks for multi-objective prediction that considers both productivity and environmental impact. In our study, we address this gap by developing an interpretable, data-driven optimisation framework that integrates predictive modelling and optimisation to enhance milk yield and milk urea concentration. The prediction of MU allows further estimation of urinary urea excretion and subsequent ammonia emissions, providing a practical and economical method for monitoring nitrogen dynamics and environmental impacts on dairy farms.

3. Research Method

Figure 1 shows the proposed ML framework for milk yield and milk urea prediction, as well as UUN excretion and ammonia emissions estimation. Data analytics and pre-processing are used to explore the properties of the data. During pre-processing, missing data and outliers are addressed. After feature correlation analysis, four different feature selection methods (yellow part in the dotted box in Figure 1) are applied to estimate the feature importance for the prediction problem. The ranking of the importance of different features for milk prediction is obtained by averaging the rank of each feature across the methods. After feature selection, we split the dataset into training data and testing data. The training data is used to build the model, and the testing data is used to evaluate the model’s performance on unseen data. Finally, the performance of the models is compared and evaluated. SHAP analysis is performed to present the models’ explainability.
MUN and the predicted milk urea concentration (MU) can be converted into each other using the formula MUN (mg/dL) = MU (mg/dL) × 46.6%, as MUN specifically quantifies the nitrogen portion of the urea molecule [29]. In the ammonia emissions estimation stage, we utilised previously established statistical equations to estimate ammonia emissions from dairy cattle manure based on MUN (function f) and UUN excretion (function g). UUN excretion can be inferred from MUN by the function h. These functions are shown in Section 5. This provides a practical method for indirectly estimating ammonia emissions and UUN excretion.
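The conversion between MU and MUN described above can be sketched in a few lines. This is only an illustration of the 46.6% factor quoted in the text; the function names `mu_to_mun` and `mun_to_mu` are hypothetical, and the functions f, g, and h from Section 5 are not reproduced here.

```python
# Mass fraction of nitrogen in urea, as quoted in the text (46.6%).
N_FRACTION_IN_UREA = 0.466


def mu_to_mun(mu_mg_dl: float) -> float:
    """Convert milk urea concentration (mg/dL) to milk urea nitrogen (mg/dL)."""
    return mu_mg_dl * N_FRACTION_IN_UREA


def mun_to_mu(mun_mg_dl: float) -> float:
    """Convert milk urea nitrogen (mg/dL) back to milk urea concentration (mg/dL)."""
    return mun_mg_dl / N_FRACTION_IN_UREA


# Illustrative value only: a milk urea concentration of 30 mg/dL.
example_mun = mu_to_mun(30.0)
```

For an illustrative MU of 30 mg/dL, the corresponding MUN is 30 × 0.466 = 13.98 mg/dL.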
This study provides a transferable ML workflow comprising standardised pre-processing, feature curation, model training, and interpretation. Developed and verified on a well-characterised dataset, the same procedures can be applied to varied settings. In large and heterogeneous datasets, our framework can serve as a transferable reference and facilitate quicker implementation.
All the reported analyses were carried out in Python (version 3.6.13) using the Scikit-learn library (version 1.1.3). Seaborn (version 3.10.6) was used to generate the visualisations, and SHAP (version 0.41.0) was also used.

3.1. Data Description

The models developed in this paper utilise the CowNflow dataset [16], which is published by the National Institute for Agriculture, Food and the Environment (INRAE) ([dataset] https://entrepot.recherche.data.gouv.fr/dataverse/inrae, accessed on 1 October 2023) in France. The dataset was collected at the experimental dairy farms of the INRAE. It reports individual biological measurements from dairy cattle, such as dry matter (DM) intake, milk yield, faecal nitrogen concentration, and feed attributes like the crude protein concentration of each feed and diet composition. Cows were fed in individual troughs, had free access to water, and were milked twice a day. Features relevant to diet parameters and biological characteristics are considered in this study. Table 1 shows the biological and diet-related features in the dataset.
The CowNflow dataset compiles data collected across 28 nitrogen balance experiments on Holstein dairy cows between 1983 and 2019. The experiments were conducted under standardised conditions and lasted for 10 weeks on average. Each experiment comprises 2–5 steps depending on the number of treatments in terms of the main forage offered in the diet. There are six types of diets considered in the dataset, namely Maize, Fresh Herbage, Maize Fresh Herbage, Maize Lucerne, Dehydrated Herbage, and Maize Hay. There are 3–6 cows in each experiment. Overall, the dataset contains 414 records for individual cows on a wide range of diets with different biological characteristics [16].

3.2. Data Analytics and Pre-Processing

In this section, data cleaning steps are performed, such as missing data clean-up, dealing with outliers, and data standardisation, to ensure data quality for further processes. Visualisation is adopted to develop an understanding of the underlying distribution for different features and identify patterns as well as trends in the dataset. It is also valuable for selecting appropriate machine learning models and making informed decisions such as identifying outliers.
Missing data: Missing data can arise from several circumstances, such as data collection mistakes or an attribute that does not apply to all cases. Dealing with missing data is important, as the models cannot be built using instances with missing values. Missing data can be addressed by (1) dropping a feature from all instances when it is missing for more than 30% of observations; (2) dropping instances with missing data; or (3) imputing the missing values using the median or mean, or by training a regression model to predict them. Dropping a feature or an instance discards information and can affect model performance, so imputation is preferred when only a few values are missing. In our study, the features ‘NDF intake’ and ‘ADF intake’, which have more than 30% missing values, are dropped from the dataset. In addition, each of the features ‘OM intake’, ‘Ash intake’, and ‘milk yield’ has one missing value; these are replaced by the mean of the respective feature.
Outliers: An outlier is defined as an observation that lies at an abnormal distance from the rest of the group [39]. Outliers can be caused by measurement errors, data entry mistakes, or natural variability. Training a model with outliers can bias the model. Data visualisation is performed to identify outliers. The interquartile range (IQR) is a commonly used tool to detect outliers in numerical features. Figure 2 displays IQR box-and-whisker plots for a representative subset of numerical features in the dataset. The IQR represents the spread of a feature. To calculate it, the rank-ordered values of the feature are divided into quartiles, denoted by Q1 (25th percentile), Q2 (median, 50th percentile), and Q3 (75th percentile); the IQR is the range of the middle 50% of the values (Q3 − Q1). The whiskers extend 1.5 × IQR beyond Q1 and Q3, and values located outside the whiskers are considered outliers. For example, Figure 2 shows that values higher than 750 for the feature ‘N intake’ are regarded as outliers.
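The missing-data rules above (drop features with more than 30% missing values, mean-impute the rest) might look as follows in pandas. This is a minimal sketch on a made-up toy table; the helper name `handle_missing` and the toy values are assumptions, not the authors' code or data.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, max_missing_frac: float = 0.30) -> pd.DataFrame:
    """Drop features missing in more than `max_missing_frac` of rows, then mean-impute."""
    missing_frac = df.isna().mean()                      # fraction of NaN per column
    keep = missing_frac[missing_frac <= max_missing_frac].index
    out = df[keep].copy()
    numeric = out.select_dtypes(include="number").columns
    out[numeric] = out[numeric].fillna(out[numeric].mean())  # column-wise mean imputation
    return out


# Toy data (hypothetical values, not taken from CowNflow).
toy = pd.DataFrame({
    "NDF intake": [np.nan, np.nan, 5.0, np.nan],  # 75% missing, so the feature is dropped
    "OM intake": [18.0, np.nan, 20.0, 22.0],      # one gap, filled with the column mean
    "milk yield": [30.0, 28.0, 26.0, 32.0],
})
cleaned = handle_missing(toy)
```

In a train/test setting, the imputation mean would normally be computed on the training split only, so that no information from the test data leaks into the model.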
The instances containing outliers are usually removed from the dataset. To prevent loss of data available for training, only the instances that have two or more values identified as outliers in features are removed from the dataset. In addition, the instances in which the feature of lactation has a value higher than 20 are also removed, as this does not commonly happen in dairy farming management and easily biases the model performance.
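The outlier rule described above (flag values beyond 1.5 × IQR, remove only instances with two or more flagged values) can be sketched as below. The function names and the toy data are hypothetical; the sketch assumes all columns are numeric.

```python
import pandas as pd


def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR] per column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


def drop_multi_outlier_rows(df: pd.DataFrame, min_outliers: int = 2) -> pd.DataFrame:
    """Remove rows that contain at least `min_outliers` flagged values."""
    n_flagged = iqr_outlier_mask(df).sum(axis=1)
    return df[n_flagged < min_outliers]


# Toy data (hypothetical): row 8 is extreme in two columns, row 9 in only one.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 100, 5],
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 500, 14],
    "c": [1, 2, 3, 4, 5, 6, 7, 8, 5, 90],
})
filtered = drop_multi_outlier_rows(toy)
```

Row 8 is removed because it is flagged in both `a` and `b`, while row 9 is kept because only its `c` value is flagged, which mirrors the goal of limiting data loss.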
Standardisation: The values of all features are adjusted using Z-score standardisation (Equation (1)) to reduce the impact of differences in the spread of features on the performance of the ML models.
$$z = \frac{x - \mu}{\sigma} \tag{1}$$
where z is the standard score, x is the observed value, μ is the mean of the feature, and σ is the standard deviation of the feature.
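As a small worked example of the Z-score formula, the sketch below standardises one made-up feature column with NumPy. It uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses; whether the authors used that class is an assumption.

```python
import numpy as np


def z_score(x):
    """Standardise values column-wise: subtract the mean, divide by the std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)  # population std (ddof=0)


# Hypothetical feature values with mean 5 and standard deviation 2.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = z_score(values)
```

Here the first value maps to (2 − 5)/2 = −1.5 and the last to (9 − 5)/2 = 2.0.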
In addition, the records in which ‘physiological status’ has the value of ‘dry’ were excluded from this study, as these do not contribute towards predicting the milk yield and milk urea. After pre-processing, 403 records were used for milk yield prediction and 180 records for milk urea prediction.

3.3. Feature Selection and Importance Analysis

Feature selection identifies informative features for the problem of milk yield and milk urea prediction. It aims to remove less relevant attributes and improve model learning effectiveness [40].
Most ML techniques require numerical input features. The categorical attribute ‘Diet type’ is therefore converted to numerical values using the CatBoost Encoder [41].
The correlation heatmap illustrates the correlation coefficients between all variables in the dataset. It helps identify patterns and relationships among variables, showing which pairs are strongly correlated (absolute values near 1) and which are weakly correlated or uncorrelated (values near 0). In our experiment, the correlation heatmaps are generated using Seaborn. To focus on the strength rather than the direction of relationships, the correlation coefficients were converted to absolute values, as the sign only indicates the direction of the correlation. The resulting correlation matrices with Pearson coefficients for milk yield and milk urea are shown in Figure 3a,b. In the heatmap, the intensity of the colour indicates the strength of the correlation, with darker shades representing stronger correlations. Variables such as ‘DM intake’, ‘N intake’, and ‘LactationM (Lactation Month)’ exhibit strong correlations with milk yield, while variables like ‘N intake’ show significant correlations with milk urea. This visualisation is useful for identifying key factors influencing milk yield and milk urea, as well as potential multicollinearity in the dataset: two variables are considered highly collinear if their correlation coefficient is close to ±1, as is the case for ‘N intake’ vs. ‘CP intake’ and ‘DM intake’ vs. ‘OM intake’.
Building on the linear correlation evaluation using Pearson coefficients, four additional methods are applied to rank the features: two filter methods, the linear regression F-test and the mutual information (MI) test; a wrapper method, Recursive Feature Elimination (RFE); and an embedded method, ridge regression with built-in cross-validation (RidgeCV). The F-test provides an F-score for each feature, computed as a ratio of variances: the variance in the target explained by a univariate linear regression on the feature relative to the unexplained variance. A low F-score implies that the feature is less important for prediction, and vice versa. MI measures the reduction in the uncertainty of a variable when another random variable is known [42]; the higher the mutual information value, the more relevant the feature is to the target. However, it does not account for multicollinearity or feature redundancy. RFE selects features by recursively dropping the least important feature until only one feature is left. The SVM algorithm was used as the regressor, with a linear kernel. RidgeCV performs linear regression with L2 regularisation to penalise large coefficients, reducing overfitting and the effect of multicollinearity. It handles multicollinearity by shrinking the coefficients of correlated features, but it may not capture non-linear relationships.
Table 2 and Table 3 show the feature ranking results used to generate predictive models for milk yield and milk urea. For milk yield, as shown in Table 2, ‘Ash intake’ has the lowest average rank across the four feature selection methods, and three of the four methods identify it as the least important feature. For milk urea in Table 3, features with a ranking score below 9 were selected, resulting in a total of 10 features. However, ‘DM intake’ and ‘OM intake’, as well as ‘N intake’ and ‘CP intake’, exhibited an extremely high linear correlation (Pearson’s r ≈ 1; see Figure 3), suggesting redundant information. Although the variables in each pair are not perfectly linearly related, deviations are small and do not provide substantial additional information. Keeping all four variables would increase redundancy and model complexity without a meaningful performance gain, even for non-linear learners that could theoretically capture minor interactions. To ensure model parsimony and stability while maintaining predictive integrity, ‘DM intake’ and ‘N intake’ were therefore retained as representative variables, while ‘OM intake’ and ‘CP intake’ were excluded. This choice preserves the relevant variance related to feeding efficiency and nitrogen utilisation, avoids overfitting from redundant predictors, and improves interpretability. As a result, 9 features are selected for milk yield prediction, while 8 features are used to generate the milk urea model.
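The absolute Pearson correlation matrix and the search for highly collinear pairs can be sketched as follows. The data are synthetic, the 0.95 threshold is a hypothetical choice rather than a value from the study, and `collinear_pairs` is an illustrative helper.

```python
import numpy as np
import pandas as pd


def collinear_pairs(df: pd.DataFrame, threshold: float = 0.95):
    """Return (feature_a, feature_b, |r|) for pairs whose absolute Pearson r >= threshold."""
    corr = df.corr(method="pearson").abs()   # strength only, sign discarded
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):    # upper triangle, no self-pairs
            r = float(corr.iloc[i, j])
            if r >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Synthetic data: 'OM intake' is constructed to be almost proportional to 'DM intake'.
rng = np.random.default_rng(0)
dm = rng.normal(20.0, 3.0, size=200)
toy = pd.DataFrame({
    "DM intake": dm,
    "OM intake": 0.92 * dm + rng.normal(0.0, 0.1, size=200),
    "Age": rng.normal(4.0, 1.0, size=200),
})
pairs = collinear_pairs(toy)
```

The same absolute matrix, `toy.corr().abs()`, is what would be passed to a Seaborn heatmap to obtain a plot of correlation strength.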
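One way to combine the four ranking methods with scikit-learn is sketched below. It is an illustration on synthetic data, not the authors' pipeline: ranking RidgeCV features by absolute coefficient size and using default hyperparameters are assumptions, and `rank_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE, f_regression, mutual_info_regression
from sklearn.linear_model import RidgeCV
from sklearn.svm import SVR


def rank_features(X: pd.DataFrame, y: np.ndarray) -> pd.DataFrame:
    """Rank features (1 = most important) with four methods and average the ranks."""
    f_scores, _ = f_regression(X, y)                           # filter: F-test
    mi_scores = mutual_info_regression(X, y, random_state=0)   # filter: mutual information
    rfe = RFE(SVR(kernel="linear"), n_features_to_select=1).fit(X, y)  # wrapper
    ridge = RidgeCV().fit(X, y)                                # embedded

    def to_rank(scores):
        # Higher score means more important, so rank in descending order.
        return pd.Series(scores, index=X.columns).rank(ascending=False)

    ranks = pd.DataFrame({
        "F-test": to_rank(f_scores),
        "MI": to_rank(mi_scores),
        "RFE": pd.Series(rfe.ranking_, index=X.columns),       # already 1 = kept longest
        "RidgeCV": to_rank(np.abs(ridge.coef_)),               # assumes standardised inputs
    })
    ranks["average"] = ranks.mean(axis=1)
    return ranks.sort_values("average")


# Synthetic standardised features: x0 dominates the target, x2 and x3 are noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(120, 4)), columns=["x0", "x1", "x2", "x3"])
y = 5.0 * X["x0"].to_numpy() + 1.0 * X["x1"].to_numpy() + rng.normal(scale=0.1, size=120)
ranking = rank_features(X, y)
```

Averaging ranks rather than raw scores avoids having to put F-scores, MI values, and coefficients on a common scale.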

4. Machine Learning Predictive Model Performance and Explainability

4.1. Model Training

After data pre-processing and feature selection, the predictive models for milk yield and milk urea are developed. The performance of all models is evaluated using a hold-out [43] validation framework, with 75% of the data used for training and 25% for testing. Each experiment is run five times, and the average of the resulting values is reported as the final performance.
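The repeated hold-out protocol can be sketched as follows. The text does not state how the five runs differ, so using a different random split per run is an assumption here; the data are synthetic and `repeated_holdout_r2` is a hypothetical helper.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def repeated_holdout_r2(model, X, y, n_repeats: int = 5, test_size: float = 0.25):
    """Average test R^2 over several 75/25 hold-out splits (one seed per repeat)."""
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        fitted = clone(model).fit(X_tr, y_tr)   # fresh copy for every repeat
        scores.append(r2_score(y_te, fitted.predict(X_te)))
    return float(np.mean(scores)), scores


# Synthetic regression problem standing in for the real features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
mean_r2, all_r2 = repeated_holdout_r2(LinearRegression(), X, y)
```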
Five different supervised ML algorithms were considered to develop predictive models for both milk yield and milk urea, namely linear regression (LR), Support Vector Regression (SVR), Random Forest (RF) regression, Adaptive Boosting (AdaBoost), and an Artificial Neural Network (ANN). All models were fine-tuned through grid search with five-fold cross-validation to achieve the best performance.
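Grid search with five-fold cross-validation, as mentioned above, could be set up along these lines for the Random Forest model. The candidate grid is purely hypothetical (the searched ranges are not reported in the text), and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical candidate values; the actual search ranges are not given in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, 8]}

# Synthetic data standing in for the training split.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 4))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 1] + rng.normal(scale=0.2, size=150)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,            # five-fold cross-validation on the training data only
    scoring="r2",
)
search.fit(X_train, y_train)
best_params = search.best_params_
```

Running the search on the training split only keeps the 25% hold-out set untouched for the final evaluation.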
For the milk yield prediction task, the linear regression model served as a baseline for establishing linear relationships between features and the target variable. The SVR used a radial basis function kernel (kernel = ‘rbf’) with regularisation parameters tuned around the default (C = 1.0, gamma = ‘scale’). The RF model ensembles multiple decision trees, and tests with different tree numbers determined that n_estimators = 100 and max_depth = 5 achieved the best balance between bias and variance. The AdaBoost regressor used DecisionTreeRegressor (max_depth = 5) as its base estimator with n_estimators = 100 and learning_rate = 0.1. The ANN was implemented using scikit-learn’s MLPRegressor with a single hidden layer of 100 ReLU-activated neurons, trained using the Adam optimiser with max_iter = 500.
For milk urea prediction, similar algorithms and tuning procedures were applied. The RF achieved optimal performance with n_estimators = 100 and max_depth = 8. The AdaBoost model used DecisionTreeRegressor(max_depth = 8) as its base learner, with n_estimators = 100 and learning_rate = 0.1. The SVR again used an RBF kernel with C = 1.0 and gamma = ‘scale’, while the ANN maintained the same architecture (100 ReLU neurons, Adam optimiser, and max_iter = 500).
A stacking ensemble method was further proposed after building the baseline models. The structure of this model is shown in Figure 4. The stacking model combines the predictions of multiple base models to create a stronger overall model by leveraging their complementary strengths. For milk yield, the three top-performing base models (SVR, AdaBoost, and ANN) were used as level-0 regressors, and a ridge regression model (alpha = 1.0) was employed as the level-1 meta-regressor with five-fold cross-validation (cv = 5). For milk urea, the stacking model integrated RF, AdaBoost, and ANN as base learners, again using ridge regression as the meta-learner with identical cross-validation settings. This ensemble configuration reduced generalisation errors and provided the best overall predictive accuracy for both tasks.
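The milk yield stacking configuration described above maps naturally onto scikit-learn's `StackingRegressor`. The sketch below uses the hyperparameters stated in the text, but the random seeds, the synthetic data, and the helper name `build_milk_yield_stack` are assumptions; it is not the authors' code.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_milk_yield_stack(random_state: int = 0) -> StackingRegressor:
    """Level-0: SVR, AdaBoost, ANN. Level-1: ridge regression, five-fold CV."""
    base_learners = [
        ("svr", SVR(kernel="rbf", C=1.0, gamma="scale")),
        # The base tree is passed positionally because its keyword name
        # differs between scikit-learn versions.
        ("ada", AdaBoostRegressor(DecisionTreeRegressor(max_depth=5),
                                  n_estimators=100, learning_rate=0.1,
                                  random_state=random_state)),
        ("ann", MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                             solver="adam", max_iter=500,
                             random_state=random_state)),
    ]
    return StackingRegressor(estimators=base_learners,
                             final_estimator=Ridge(alpha=1.0), cv=5)


# Synthetic standardised features with a mildly non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 5))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=160)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

stack = build_milk_yield_stack().fit(X_tr, y_tr)
stack_r2 = r2_score(y_te, stack.predict(X_te))
```

With `cv=5`, the ridge meta-learner is trained on out-of-fold predictions of the base learners, which limits the risk of the meta-learner simply rewarding base models that overfit the training data.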

4.2. Model Evaluation

In this section, the performance of different models developed in this study is evaluated and compared. The metrics used to measure the performance of the models include $R^2$, Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). The $R^2$ value is a summary statistic that represents the proportion of total variance in the outcome variable that is captured by the model [44]. It is at most 1, and values close to 1 indicate that the model captures nearly all of the variation in the target. MAE is the average of the absolute differences between the actual and predicted values over all instances, while MSE is the average of the squared differences. RMSE is the square root of MSE and measures the standard deviation of the residuals [45]. The mathematical formulas for the error metrics used in this study are given below:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \quad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \quad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, $\bar{y}$ the mean of the actual values, and n the total number of instances. Smaller MAE, MSE, and RMSE values indicate better performance.
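The four metrics can be computed directly from their definitions, as in the NumPy sketch below. The numbers are a made-up toy example, and `regression_metrics` is an illustrative helper rather than part of the study's code.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute R^2, MAE, MSE, and RMSE from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    mse = np.mean(residuals ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
    }


# Toy example: residuals are [1, 0, -1, 0].
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

For this toy example the residual sum of squares is 2 and the total sum of squares is 20, so $R^2$ = 0.9, MAE = 0.5, MSE = 0.5, and RMSE ≈ 0.707.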
Table 4 and Table 5 summarise the performance of five baseline models as well as the stacking model on the datasets obtained before and after feature selection for milk yield and milk urea prediction. The performance of the stacking model after feature selection for milk yield and milk urea is shown in Figure 5.
For milk yield, Table 4 shows that the stacking model has the best performance, with an $R^2$ value of 0.85 after feature selection. The stacking framework combines ANN, AdaBoost, and SVR into a meta-model that captures complex non-linear relationships and achieves higher predictive accuracy. In addition, feature selection further enhances the model’s performance: when ash intake, CP intake, and OM intake are removed, the overall performance of the trained models improves, with the average $R^2$ value increasing from 0.807 to 0.811 and all three error measures (MAE, MSE, and RMSE) decreasing.
For milk urea prediction, the results are summarised in Table 5. Among the five baseline models, ANN shows the best performance with an $R^2$ value of 0.792, an MAE of 3.681, and an RMSE of 5.101. For the stacking model, ANN, AdaBoost, and RF regression were chosen as base regressors to construct the ensemble. This combination slightly improved performance, achieving an $R^2$ value of 0.794 with comparable error metrics: a slightly higher MAE of 3.817 and a slightly lower RMSE of 5.076. These results suggest that stacking offers a small gain in predictive accuracy for this task.
Although the improvement in predictive performance after feature selection appears moderate, the overall trend confirms its positive contribution. The excluded features were mostly redundant or highly collinear (e.g., OM intake versus DM intake and CP intake versus N intake), offering limited additional information. As a result, feature selection primarily enhanced model robustness, stability, and interpretability rather than yielding large numerical gains. The consistent improvement across all models indicates that the procedure effectively reduced redundancy and improved generalisation, demonstrating that the refined feature subset provides a more concise and informative representation of the data.

4.3. Model Interpretation Using SHAP Analysis

SHAP analysis is used to analyse ML models to extract knowledge that could be directly utilised by farmers and other stakeholders. It is based on the concept of Shapley values from cooperative game theory, where each feature’s contribution to the prediction is considered in the context of all possible combinations of features [46]. SHAP reveals how individual features influence the predictions of a model, aiding in understanding and debugging complex machine learning models. It has been widely applied across various fields [47,48].
In our task, swarm plots of SHAP values are used to visualise the importance of features for milk yield and milk urea prediction. Each SHAP plot indicates the magnitude and direction (positive or negative) of each feature’s contribution to the model’s prediction. The dots on the plot represent instances, and the colour bar indicates the feature value from low (blue) to high (red). The input features on the vertical axis are ranked from top to bottom by their mean absolute SHAP values. Features with higher mean absolute SHAP values have a greater impact on the model’s predictions and are therefore the most important.
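To make the idea of averaging a feature's marginal contribution over all feature combinations concrete, the sketch below computes exact Shapley values for a single instance by enumerating coalitions. It is not the SHAP library, which relies on faster approximations and a background dataset; here, absent features are simply replaced by a single baseline value, and `toy_model` with its coefficients is a hypothetical stand-in rather than a model from the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical yield model: inputs are lactation month and DM intake."""
    lactation_month, dm_intake = z
    return (30.0 - 1.5 * lactation_month + 0.8 * dm_intake
            + 0.05 * lactation_month * dm_intake)

x = [8.0, 22.0]          # instance to explain (illustrative)
baseline = [4.0, 20.0]   # reference instance (illustrative)
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the efficiency property of Shapley values. Ranking features by the mean of `abs(phi[i])` over many instances gives the ordering used on the vertical axis of the plots.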
Figure 6 illustrates the SHAP values of individual features for milk yield prediction across SVR, AdaBoost, ANN, and the ensemble stacking model. Across all models, ‘LactationM’ (lactation month) shows the strongest contribution to the predicted milk yield, whereas ‘Age’ has minimal influence. The stacking model (Figure 6d) combines the outputs of the other three models and may therefore produce more stable predictions. The strongly negative SHAP values for ‘LactationM’ indicate that milk yield decreases as lactation progresses, which is consistent with the typical lactation curve observed in dairy cows. As lactation month increases, milk production naturally declines due to factors such as depletion of nutritional reserves and hormonal changes. ‘DM intake’ and ‘Diet type’ also show considerable impact, suggesting that both the amount and type of diet are critical for milk yield. ‘DMI/100 kgBW’ (dry matter intake normalised by body weight) is another important factor, indicating that proportional intake is relevant for predictions. Conversely, features such as ‘GestationM’ (gestation month) and ‘Body weight’ contribute relatively little to milk yield.
Figure 7 presents the SHAP values for milk urea prediction across baseline models Random Forest, AdaBoost, ANN, and the stacking model. ‘N intake’ consistently ranks as the most positively influential feature across all models, indicating its crucial role in determining milk urea levels. Following ‘N intake’, features such as ‘DMI/100 kgBW’, ‘DM intake’, and ‘Body weight’ show a notable negative influence, highlighting the importance of nutrient balance and animal body size in reducing milk urea concentration. ‘Ash intake’ and ‘Milk Yield’ have relatively low SHAP values, indicating that they are weakly associated with milk urea levels.
The strong positive relationship between N intake and milk urea concentration reflects a well-established biological mechanism of protein metabolism and nitrogen excretion. When dietary nitrogen intake exceeds the animal’s metabolic requirements, surplus nitrogen is converted into urea in the liver and subsequently excreted through milk and urine. This relationship explains the strong SHAP importance of N intake in the predictive models, showing that the model reflects the biological process.
Consistent with the SHAP results, managing dietary nitrogen intake is the most direct lever to control milk urea and ammonia emissions [49]. In practice, feeding should meet but not exceed metabolisable protein requirements, and dietary crude protein should be reduced when milk urea trends high. Because lactation month also strongly influences predictions, cows can be grouped and fed by stage of lactation, with precision adjustments applied where needed [50]. Routine monitoring of milk urea levels, combined with iterative dietary adjustments, a stable feed supply, and minimised sorting, provides rapid feedback control over the key drivers identified by the model. This approach appears feasible for implementation on commercial dairy farms.
As illustrated in Figure 6 and Figure 7, the SHAP-based feature importance and the corresponding positive or negative impacts across the baseline and ensemble stacking models are largely consistent. Furthermore, the feature contributions align well with the feature rankings, such as the milk yield rankings presented in Table 2.
However, ML models can produce different SHAP values due to their distinct structures and decision-making processes. This can be observed in milk urea prediction, where the predictor may lack strongly influential contributors or be affected by the overriding influence of certain closely related features. Each model (e.g., neural network, AdaBoost, and SVR) could learn unique relationships between features and predictions, leading to varying feature contributions. Combining models into an ensemble can improve predictive performance by leveraging their strengths. Ensembles often produce more stable predictions, resulting in more consistent SHAP values that are less sensitive to small data changes. For better interpretability, it may be useful to evaluate SHAP values from both individual and ensemble models alongside domain knowledge to understand feature contributions.

5. Ammonia Emissions Estimation from Milk Urea

In the previous sections, we introduced the prediction of milk urea (MU) concentration based on biological characteristics and feeding features. Milk urea nitrogen (MUN) is the nitrogen component of milk urea. MUN can be derived from MU because urea contains approximately 46.6% nitrogen by weight (molecular weight of urea: 60.06 g/mol, of which 28.02 g/mol is nitrogen) [29]. Consequently, MUN can be calculated as a proportion of milk urea concentration (mg/dL) using Formula (2):
$$\text{Milk Urea Nitrogen (mg/dL)} = \text{Milk Urea Concentration (mg/dL)} \times 46.6\% \qquad (2)$$
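As a quick check of the conversion, the nitrogen fraction follows directly from the molecular weights quoted above. The sketch below is illustrative only: the function name `milk_urea_to_mun` and the sample milk urea value are assumptions, while the constants are those stated in the text.

```python
UREA_MOLAR_MASS = 60.06    # g/mol, as stated in the text
NITROGEN_IN_UREA = 28.02   # g of nitrogen per mole of urea, as stated in the text

# Mass fraction of nitrogen in urea, approximately 0.4665
N_FRACTION = NITROGEN_IN_UREA / UREA_MOLAR_MASS

def milk_urea_to_mun(milk_urea_mg_dl, n_fraction=0.466):
    """Convert milk urea (mg/dL) to milk urea nitrogen (mg/dL) via Formula (2)."""
    return milk_urea_mg_dl * n_fraction

# Illustrative input: a milk urea concentration of 25 mg/dL
mun = milk_urea_to_mun(25.0)
```

With the rounded factor of 46.6%, a milk urea concentration of 25 mg/dL corresponds to an MUN of about 11.65 mg/dL.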
Furthermore, a positive linear relationship has been established for estimating urinary urea nitrogen (UUN) excretion and ammonia (NH3) from MUN [51], providing a practical approach for assessing ammonia emissions and UUN. A summary of the empirically derived statistical equations is provided in Table 6, where the f-equation links MUN to ammonia, the h-equation links MUN to UUN, and the g-equation links UUN to NH3.
In Table 6, the empirical statistical equations $f_1$ and $h_1$, which have the highest $R^2$ values, are selected to establish the relationships between MUN and both ammonia emissions and UUN. We convert the milk urea predicted by the ML model to MUN using Formula (2). These MUN values are then substituted into Equations $f_1$ and $h_1$ to calculate the corresponding ammonia and UUN values.
Figure 8 presents the estimated ammonia and UUN values based on MUN derived from the ML-predicted milk urea. The red dots represent the estimated ammonia (a) and UUN (b) values derived from MUN. Coloured lines depict the empirical statistical equations, $f_1$ in (a) and the $h$ group in (b), with shaded areas around each line indicating the uncertainty ranges for these equations. In Figure 8a, the MUN range of 5–15 mg/dL (green shaded area) is empirically validated for Equation $f_1$, where the estimated ammonia values from ML-predicted milk urea align well with the statistics-based model. Outside this range, ML predictions indicate a slightly varying linear association between MUN and ammonia. Similarly, Figure 8b shows that the estimated UUN values generally align with the statistical equations, with most variations falling within the uncertainty range (e.g., orange), reflecting real-world variability in data obtained from different conditions.
The empirical equations used to convert MUN to NH3 and UUN were primarily validated within the MUN range of 5–15 mg/dL, and within this interval, the ML predictions agree closely with the empirical results, indicating that the models capture biologically consistent relationships. Slight deviations observed outside this range, particularly below 5 or above 15 mg/dL, are most likely related to data sparsity. In these cases, the models are effectively extrapolating, and the predictions should be interpreted as indicative rather than definitive. Although the ML framework is capable of learning complex non-linear relationships, the limited sample coverage at extreme MUN values prevents robust verification of such patterns. In practice, these out-of-range values rarely occur under normal feeding conditions, so their effect on practical application is negligible. Predictions outside the 5–15 mg/dL range are therefore provided for reference and should be treated with caution until larger datasets become available.
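The caution about extrapolation can be built into the conversion step itself. The following sketch applies a generic linear equation to MUN values and marks those outside the validated 5–15 mg/dL range. The slope and intercept are placeholders, not the coefficients of the equations in Table 6, and the function names are hypothetical.

```python
VALID_MUN_RANGE = (5.0, 15.0)  # mg/dL, validated range stated in the text

def estimate_from_mun(mun, slope, intercept):
    """Apply a linear empirical equation of the form y = slope * MUN + intercept."""
    return slope * mun + intercept

def annotate(mun_values, slope, intercept):
    """Return estimates with a flag marking values outside the validated range."""
    lo, hi = VALID_MUN_RANGE
    rows = []
    for mun in mun_values:
        rows.append({
            "mun": mun,
            "estimate": estimate_from_mun(mun, slope, intercept),
            "extrapolated": not (lo <= mun <= hi),
        })
    return rows

# Placeholder coefficients for illustration; substitute the published values
rows = annotate([4.2, 9.8, 16.1], slope=1.0, intercept=0.0)
```

Carrying the `extrapolated` flag alongside each estimate lets downstream reports present out-of-range values as indicative rather than definitive.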
In summary, combining machine learning with established empirical statistical equations offers a reliable and practical approach to address real-world challenges when large-scale data measurement is difficult, such as estimating ammonia emissions. Indirectly estimating such data from predictable variables shows great promise. Predicting MUN not only helps assess nitrogen metabolism in dairy cows but also serves as a critical factor for estimating UUN and ammonia emissions.

6. Digital Twin Integration and Model Generalisation for Effective Dairy System Management

6.1. Integration of Predictive Models into the Digital Twin

As shown in Figure 9, the developed predictive framework is integrated within a digital twin platform designed to enable real-time simulation, optimisation, and decision support in dairy production systems. The digital twin serves as a virtual replica of the physical dairy environment, dynamically synchronising with on-farm data streams such as feeding behaviour, milk yield, milk composition, environmental conditions, and animal health indicators.
Within this environment, the predictive machine learning models for milk yield, milk urea, and ammonia emissions act as surrogate models that simulate system responses under varying dietary and management scenarios. This allows users to conduct “what-if” analyses, for example, evaluating how changes in feed composition or protein content influence both productivity and nitrogen emissions. By continuously updating with real-time or near-real-time data, the digital twin facilitates adaptive management and continuous learning, ensuring that recommendations remain context-specific and data-driven. Furthermore, the interactive nature of the platform supports stakeholder engagement by translating complex model outputs into intuitive visualisations and actionable insights for farmers, advisors, and policymakers.
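A "what-if" analysis of this kind amounts to evaluating the surrogate models on a baseline scenario and on a modified copy, then comparing the outputs. The sketch below assumes each surrogate is a callable that takes a dictionary of inputs; the lambda functions, their coefficients, and the input names are hypothetical stand-ins for the trained models, not the platform's actual interface.

```python
def what_if(surrogates, baseline, changes):
    """Compare surrogate outputs for a baseline and a modified scenario."""
    scenario = {**baseline, **changes}  # apply the proposed changes
    report = {}
    for name, model in surrogates.items():
        before = model(baseline)
        after = model(scenario)
        report[name] = {
            "baseline": before,
            "scenario": after,
            "delta": after - before,
        }
    return report

# Hypothetical stand-ins for the trained surrogate models
surrogates = {
    "milk_yield": lambda s: 10.0 + 0.9 * s["dm_intake"],
    "milk_urea": lambda s: 5.0 + 0.04 * s["n_intake"],
}

baseline = {"dm_intake": 20.0, "n_intake": 500.0}
# Scenario: reduce nitrogen intake while keeping dry matter intake fixed
report = what_if(surrogates, baseline, {"n_intake": 450.0})
```

Keeping the surrogates behind a common callable interface means a retrained model can replace an older one without changing the scenario logic.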

6.2. Discussion on Model Generalisation and Future Work

The CowNflow dataset is relatively small but provides a well-structured and comprehensive source of individual-level nitrogen balance, feeding, and production data collected under controlled experimental conditions. Its detailed dietary and physiological measurements offer valuable insight into metabolic and feeding responses in dairy cows.
Despite the limited data, this study demonstrates that an interpretable hybrid machine learning framework can effectively integrate diverse biological and dietary features to achieve robust predictive performance ($R^2 = 0.85$ for milk yield and $R^2 = 0.794$ for milk urea). These results support the feasibility of data-driven modelling in dairy nitrogen management, even under constrained data availability. This work therefore serves as a proof of concept and a reusable methodological foundation that can be expanded and continuously improved as more datasets become available.
While this study demonstrates strong predictive performance using the CowNflow dataset, the next step is to expand the framework with larger, sensor-based, and longitudinal datasets. Data from robotic and precision dairy farming systems, where automated sensors continuously record feeding behaviour, milk composition, and environmental conditions, will allow model retraining and improvement. Integrating such real-time data will enhance model robustness, reduce uncertainty, and better capture temporal variations in cow performance. Advanced approaches such as transfer learning and hybrid mechanistic (process-based or empirical models)–machine learning models can further strengthen predictive accuracy and adaptability across diverse farm environments, thereby improving generalisability.
Furthermore, integrating the predictive models within a digital twin platform can provide a dynamic environment for continuous optimisation, scenario testing, and data-driven decision support. Continuous validation within the digital twin platform will support model updating, inform data collection strategies, and strengthen the feedback loop between model development and practical dairy system management.

7. Conclusions

This study developed an interpretable hybrid machine learning framework for the multi-objective prediction of dairy cow productivity, milk urea concentration, and ammonia nitrogen emissions. The framework achieved high predictive accuracy ($R^2 = 0.85$ for milk yield and $R^2 = 0.794$ for milk urea), and interpretability analysis identified the key biological and nutritional factors influencing these outcomes. By integrating data analytics, feature selection, ensemble learning, and SHAP-based explainability, the approach provides a practical pathway to connect productivity with environmental performance in dairy production systems.
Despite the limited dataset size, the framework successfully establishes a proof of concept for data-driven optimisation of dairy farming practices. The results highlight the potential of combining biological insight with machine learning to generate actionable and transparent predictions. As more comprehensive datasets become available from robotic and precision dairy systems that include precision feeding and real-time monitoring data, the model can be further enhanced through continuous learning, improved feature diversity, and temporal modelling to increase generalisability and robustness.
The proposed framework also illustrates how explainable machine learning models can be embedded within engineering digital twins to co-optimise productivity and environmental outcomes. This integration enables real-time simulation, scenario testing, and adaptive decision support, bridging data analytics with practical on-farm applications. This work highlights the broader value of explainable artificial intelligence in advancing sustainable, intelligent dairy production and provides a methodological foundation for future developments in agricultural technology and environmental engineering.

Author Contributions

Conceptualisation, R.X. and B.L.; Methodology, R.X., B.L. and S.D.; Software, R.X.; Validation, R.X. and S.D.; Formal analysis, R.X.; Investigation, R.X. and S.D.; Resources, M.W. and J.M.; Data curation, R.X.; Writing—original draft preparation, R.X.; Writing—review and editing, B.L., S.D., M.W. and J.M.; Visualisation, R.X.; Supervision, B.L.; Project administration, B.L.; Funding acquisition, B.L., M.W. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This project is sponsored by the Cattle Information Service (CIS) and the National Bovine Data Centre (NBDC) in the UK. We also acknowledge support from UK Research and Innovation (UKRI) and the Engineering and Physical Sciences Research Council (EPSRC) grant (EP/Y00597X/1).

Data Availability Statement

The dataset used in this study is CowNflow, which is publicly available at the INRAE Dataverse Repository (https://entrepot.recherche.data.gouv.fr/dataverse/inrae, accessed on 19 October 2025).

Conflicts of Interest

Authors Michael Whittaker and Janette Mathie were employed by the company The Cattle Information Service (CIS). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Ritchie, H.; Roser, M. Meat and Dairy Production. Our World in Data. 2017. Available online: https://ourworldindata.org/meat-production (accessed on 2 October 2023).
  2. Roy, B.; Brahma, B.; Ghosh, S.; Pankaj, P.; Mandal, G. Evaluation of milk urea concentration as useful indicator for dairy herd management: A review. Asian J. Anim. Vet. Adv. 2011, 6, 1–19.
  3. Wolfert, S.; Ge, L.; Verdouw, C.; Bogaardt, M.J. Big data in smart farming—A review. Agric. Syst. 2017, 153, 69–80.
  4. Cockburn, M. Review: Application and prospective discussion of machine learning for the management of dairy farms. Animals 2020, 10, 1690.
  5. Lopez-Suarez, M.; Armengol, E.; Calsamiglia, S.; Castillejos, L. Using decision trees to extract patterns for dairy culling management. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Rhodes, Greece, 25–27 May 2018; Springer: Cham, Switzerland, 2018; pp. 231–239.
  6. Williams, M.; Mac Parthaláin, N.; Brewer, P.; James, W.; Rose, M. A novel behavioral model of the pasture-based dairy cow from GPS data using data mining and machine learning techniques. J. Dairy Sci. 2016, 99, 2063–2075.
  7. Miekley, B.; Traulsen, I.; Krieter, J. Mastitis detection in dairy cows: The application of support vector machines. J. Agric. Sci. 2013, 151, 889–897.
  8. Ebrahimi, M.; Mohammadi-Dehcheshmeh, M.; Ebrahimie, E.; Petrovski, K.R. Comprehensive analysis of machine learning models for prediction of sub-clinical mastitis: Deep Learning and Gradient-Boosted Trees outperform other models. Comput. Biol. Med. 2019, 114, 103456.
  9. Dallago, G.M.; de Figueiredo, D.M.; de Resende Andrade, P.C.; dos Santos, R.A.; Lacroix, R.; Santschi, D.E.; Lefebvre, D.M. Predicting first test day milk yield of dairy heifers. Comput. Electron. Agric. 2019, 166, 105032.
  10. Ji, B.; Banhazi, T.; Phillips, C.J.; Wang, C.; Li, B. A machine learning framework to predict the next month’s daily milk yield, milk composition and milking frequency for cows in a robotic dairy farm. Biosyst. Eng. 2022, 216, 186–197.
  11. Saha, A.; Bhattacharyya, S. Artificial insemination for milk production in India: A statistical insight. Indian J. Anim. Sci. 2020, 90, 1186–1190.
  12. Zhang, F.; Upton, J.; Shalloo, L.; Shine, P.; Murphy, M.D. Effect of introducing weather parameters on the accuracy of milk production forecast models. Inf. Process. Agric. 2020, 7, 120–138.
  13. Sharma, A.K.; Sharma, R.; Kasana, H. Prediction of first lactation 305-day milk yield in Karan Fries dairy cattle using ANN modeling. Appl. Soft Comput. 2007, 7, 1112–1120.
  14. Soyeurt, H.; Grelet, C.; McParland, S.; Calmels, M.; Coffey, M.; Tedde, A.; Delhez, P.; Dehareng, F.; Gengler, N. A comparison of 4 different machine learning algorithms to predict lactoferrin content in bovine milk from mid-infrared spectra. J. Dairy Sci. 2020, 103, 11585–11596.
  15. Radwan, H.; El Qaliouby, H.; Elfadl, E.A. Classification and prediction of milk yield level for Holstein Friesian cattle using parametric and non-parametric statistical classification models. J. Adv. Vet. Anim. Res. 2020, 7, 429.
  16. Ferreira, M.; Delagarde, R.; Edouard, N. CowNflow: A dataset on nitrogen flows and balances in dairy cows fed maize forage or herbage-based diets. Data Brief 2021, 38, 107393.
  17. Græsbøll, K.; Kirkeby, C.; Nielsen, S.S.; Halasa, T.; Toft, N.; Christiansen, L.E. A robust statistical model to predict the future value of the milk production of dairy cows using herd recording data. Front. Vet. Sci. 2017, 4, 13.
  18. Bouallegue, M.; M’Hamdi, N. Mathematical modeling of lactation curves: A review of parametric models. Lact. Farm Anim.-Biol. Physiol. Basis Nutr. Requir. Model. 2020, 1, 1–20.
  19. Græsbøll, K.; Kirkeby, C.; Nielsen, S.S.; Halasa, T.; Toft, N.; Christiansen, L.E. Models to estimate lactation curves of milk yield and somatic cell count in dairy cows at the herd level for the use in simulations and predictive models. Front. Vet. Sci. 2016, 3, 115.
  20. Gołębiewski, M.; Brzozowski, P.; Gołębiewski, Ł. Analysis of lactation curves, milk constituents, somatic cell count and urea in milk of cows by the mathematical model of Wood. Acta Vet. Brno 2011, 80, 73–80.
  21. Grzesiak, W.; Błaszczyk, P.; Lacroix, R. Methods of predicting milk yield in dairy cows—Predictive capabilities of Wood’s lactation curve and artificial neural networks (ANNs). Comput. Electron. Agric. 2006, 54, 69–83.
  22. Murphy, M.D.; O’Mahony, M.J.; Shalloo, L.; French, P.; Upton, J. Comparison of modelling techniques for milk-production forecasting. J. Dairy Sci. 2014, 97, 3352–3363.
  23. Grzesiak, W.; Zaborski, D.; Szatkowska, I.; Królaczyk, K. Lactation milk yield prediction in primiparous cows on a farm using the seasonal auto-regressive integrated moving average model, nonlinear autoregressive exogenous artificial neural networks and Wood’s model. Anim. Biosci. 2021, 34, 770.
  24. Dongre, V.; Gandhi, R.; Singh, A.; Ruhil, A. Comparative efficiency of artificial neural networks and multiple linear regression analysis for prediction of first lactation 305-day milk yield in Sahiwal cattle. Livest. Sci. 2012, 147, 192–197.
  25. Njubi, D.; Wakhungu, J.; Badamana, M. Use of test-day records to predict first lactation 305-day milk yield using artificial neural network in Kenyan Holstein–Friesian dairy cows. Trop. Anim. Health Prod. 2010, 42, 639–644.
  26. Giannuzzi, D.; Mota, L.F.M.; Pegolo, S.; Tagliapietra, F.; Schiavon, S.; Gallo, L.; Marsan, P.A.; Trevisi, E.; Cecchinato, A. Prediction of detailed blood metabolic profile using milk infrared spectra and machine learning methods in dairy cattle. J. Dairy Sci. 2023, 106, 3321–3344.
  27. Giannuzzi, D.; Mota, L.F.M.; Pegolo, S.; Gallo, L.; Schiavon, S.; Tagliapietra, F.; Katz, G.; Fainboym, D.; Minuti, A.; Trevisi, E.; et al. In-line near-infrared analysis of milk coupled with machine learning methods for the daily prediction of blood metabolic profile in dairy cattle. Sci. Rep. 2022, 12, 8058.
  28. Van Duinkerken, G.; Smits, M.; André, G.; Šebek, L.; Dijkstra, J. Milk urea concentration as an indicator of ammonia emission from dairy cow barn under restricted grazing. J. Dairy Sci. 2011, 94, 321–335.
  29. Chen, Y.; Atashi, H.; Vanderick, S.; Mota, R.; Soyeurt, H.; Hammami, H.; Gengler, N. Genetic analysis of milk urea concentration and its genetic relationship with selected traits of interest in dairy cows. J. Dairy Sci. 2021, 104, 12741–12755.
  30. Bussink, D.; Oenema, O. Ammonia volatilization from dairy farming systems in temperate areas: A review. Nutr. Cycl. Agroecosyst. 1998, 51, 19–33.
  31. Fischer, K.; Burchill, W.; Lanigan, G.; Kaupenjohann, M.; Chambers, B.; Richards, K.G.; Forrestal, P.J. Ammonia emissions from cattle dung, urine and urine with dicyandiamide in a temperate grassland. Soil Use Manag. 2016, 32, 83–91.
  32. Burgos, S.; Fadel, J.; DePeters, E. Prediction of ammonia emission from dairy cattle manure based on milk urea nitrogen: Relation of milk urea nitrogen to urine urea nitrogen excretion. J. Dairy Sci. 2007, 90, 5499–5508.
  33. Burgos, S.; Embertson, N.; Zhao, Y.; Mitloehner, F.; DePeters, E.; Fadel, J. Prediction of ammonia emission from dairy cattle manure based on milk urea nitrogen: Relation of milk urea nitrogen to ammonia emissions. J. Dairy Sci. 2010, 93, 2377–2386.
  34. Sharma, A.; Jain, A.; Gupta, P.; Chowdary, V. Machine learning applications for precision agriculture: A comprehensive review. IEEE Access 2020, 9, 4843–4873.
  35. Kreuzberger, D.; Kühl, N.; Hirschl, S. Machine learning operations (mlops): Overview, definition, and architecture. IEEE Access 2023, 11, 31866–31879.
  36. Grzesiak, W.; Kowalski, Z.; Błaszczyk, P. The Use of Selected Machine Learning Methods in Dairy Cattle Farming. Animals 2025, 15, 2033.
  37. Liu, K.; Zhao, Q.; Banhazi, T.; Li, B. A Machine Learning Framework for Precision Prediction of Lactation Performance in Large Dairy Herds. Comput. Electron. Agric. 2025, 227, 109234.
  38. Foschi, E.; Brondi, C.; Mazzoni, S.; Manzardo, A. Development of a Machine Learning Tool for the Enhancement of Carbon Footprint Prediction for Cattle Milk Production. Int. J. Life Cycle Assess. 2025, 30, 1124–1138.
  39. Walfish, S. A review of statistical outlier methods. Pharm. Technol. 2006, 30, 82.
  40. Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. (CSUR) 2017, 50, 1–45.
  41. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 6639–6649.
  42. Sarkar, S.; Pandey, B. A study on the statistical significance of mutual information between morphology of a galaxy and its large-scale environment. Mon. Not. R. Astron. Soc. 2020, 497, 4077–4090.
  43. Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. arXiv 2018, arXiv:1811.12808.
  44. Salciccioli, J.D.; Crutain, Y.; Komorowski, M.; Marshall, D.C. Sensitivity analysis and model validation. In Secondary Analysis of Electronic Health Records; Springer: Cham, Switzerland, 2016; pp. 263–271.
  45. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
  46. Shapley, L.S. A value for n-person games. In Classics in Game Theory; Princeton University Press: Princeton, NJ, USA, 1953; Volume 2.
  47. Zhang, J.; Long, Z.; Ren, Z.; Xu, W.; Sun, Z.; Zhao, H.; Zhang, G.; Gao, W. Application of machine learning in ultrasonic pretreatment of sewage sludge: Prediction and optimization. Environ. Res. 2024, 263, 120108.
  48. Liu, K.; Liao, C. Examining the importance of neighborhood natural, and built environment factors in predicting older adults’ mental well-being: An XGBoost-SHAP approach. Environ. Res. 2024, 262, 119929.
  49. Zhao, X.; Zang, C.; Zhao, S.; Zheng, N. Assessing milk urea nitrogen as an indicator of protein nutrition and nitrogen utilization efficiency: A meta-analysis. J. Dairy Sci. 2025, 108, 4851–4862.
  50. Barrientos-Blanco, J.A.; White, H.; Shaver, R.D.; Cabrera, V.E. Considerations for nutritional grouping in dairy farms. J. Dairy Sci. 2022, 105, 7173–7189.
  51. Beatson, P.; Meier, S.; Cullen, N.; Eding, H. Genetic variation in milk urea nitrogen concentration of dairy cattle and its implications for reducing urinary nitrogen excretion. Animal 2019, 13, 2164–2171.
  52. Powell, J.; Wattiaux, M.; Broderick, G. Evaluation of milk urea nitrogen as a management tool to reduce ammonia emissions from dairy farms. J. Dairy Sci. 2011, 94, 4690–4694.
  53. Spek, J.; Dijkstra, J.; van Duinkerken, G.; Hendriks, W.H.; Bannink, A. Prediction of urinary nitrogen and urinary urea nitrogen excretion by lactating dairy cattle in northwestern Europe and North America: A meta-analysis. J. Dairy Sci. 2013, 96, 4310–4322.
  54. Li, B.; Thompson, M.; Partridge, T.; Xing, R.; Cutler, J.; Alhnaity, B.; Meng, Q. AI-powered digital twin for sustainable agriculture and greenhouse gas reduction. In Proceedings of the 2024 IEEE 21st International Conference on Smart Communities: Improving Quality of Life Using AI, Robotics and IoT (HONET), Doha, Qatar, 3–5 December 2024; pp. 55–60.
Figure 1. An interpretable machine learning framework for multi-objective prediction of dairy cow productivity, milk urea, and ammonia nitrogen emissions.
Figure 2. Examples of outlier detection and statistics in dairy farming using IQR box-and-whisker plots.
Figure 3. Heatmaps of feature correlations (Pearson coefficients), illustrating their importance and impact on milk yield and milk urea.
Figure 4. The structure of the predictive model through stacking ensemble learning.
Figure 5. Stacking model predictions versus observed values across the test samples: (a) milk yield and (b) milk urea. Points and lines show observed (blue) and predicted (orange) series. The close tracking between the two series indicates good predictive accuracy, with most deviations occurring at the extremes.
Figure 6. SHAP analysis for feature influence in milk yield predictive models. (a) SVR, (b) AdaBoost, (c) ANN, and (d) Stacking model. Across models, lactation month shows the strongest negative contribution to yield, consistent with the lactation curve, while age contributes minimally. Points denote instances; SHAP values indicate effect size and sign; and colour encodes feature value from low (blue) to high (red).
Figure 7. SHAP analysis for feature influence in milk urea predictive models. (a) Random Forest, (b) AdaBoost, (c) ANN, and (d) Stacking model. Nitrogen intake is the dominant positive driver of milk urea, followed by dietary factors with smaller effects. Points denote instances; SHAP values indicate effect size and sign; and colour encodes feature value from low (blue) to high (red).
Figure 8. Consistency between ML-derived estimates and empirical equations for NH3 and UUN. (a) Red dots: NH3 estimated from ML-predicted MUN; coloured lines: published MUN→NH3 equations; shaded areas: their uncertainty bands. (b) Red dots: UUN estimated from ML-predicted MUN; coloured lines: published MUN→UUN equations; shaded areas: uncertainty bands. ML estimates generally fall within reported uncertainty ranges, indicating broad consistency.
Figure 9. Integration of ML models within a digital twin for efficient dairy system management and environmental impact reduction [54].
Table 1. Description of biological, feeding, and milk composition attributes in CowNflow dataset.

| Idx | Attribute | Num | Data Description | Range of Values (Min–Max) or Categories | Data Type |
|---|---|---|---|---|---|
| 1 | Cow age | 414 | Cow age in months | 31–123 | numerical |
| 2 | Body weight | 414 | Body weight (kg) | 430–907 | numerical |
| 3 | Physiological status | 414 | Two categories: Dry, Lactating | Dry, Lactating | categorical |
| 4 | Lactation month | 414 | Number of months of lactation | 0.5–5.5 | numerical |
| 5 | Gestation month | 414 | Number of months of gestation | 0–8.25 | numerical |
| 6 | Diet type | 414 | Six categories, about feeding diet type | Maize, Fresh Herbage, Maize Fresh Herbage, Maize Lucerne, Dehydrated Herbage, Maize Hay | categorical |
| 7 | DM intake | 414 | Dry matter intake (kg/day) | 8.0–29.6 | numerical |
| 8 | DM digestibility | 414 | Dry matter digestibility (g/g) | 0.611–0.857 | numerical |
| 9 | DMI/100 kgBW | 414 | Dry matter intake per 100 kg body weight | 1.10–4.04 | numerical |
| 10 | OM intake | 413 | Organic matter intake (kg/day) | 7.0–28.0 | numerical |
| 11 | Ash intake | 413 | Ash intake (kg/day) | 0.59–2.30 | numerical |
| 12 | N intake | 414 | Nitrogen intake (g/day) | 194–864 | numerical |
| 13 | CP intake | 414 | Crude protein intake (g/day) | 1211–5397 | numerical |
| 14 | NDF intake | 294 | Neutral detergent fibre (kg/day) | 3.89–11.87 | numerical |
| 15 | ADF intake | 265 | Acid detergent fibre (kg/day) | 2.40–6.72 | numerical |
| 16 | Milk fat concentration | 402 | Milk fat concentration (g/kg) | 22.5–75.6 | numerical |
| 17 | Milk true protein concentration | 402 | Milk true protein concentration (g/kg) | 25.1–74.3 | numerical |
| 18 | Milk yield | 402 | Milk yield (kg/day) | 5.5–47.0 | numerical |
| 19 | Milk urea concentration | 180 | Milk urea concentration (mg/decilitre) | 2.5–48.1 | numerical |
Table 2. Feature ranking with different feature selection methods for milk yield.

| Features | F-Test | MI | RFE | RidgeCV | Average Rank |
|---|---|---|---|---|---|
| OM intake (kg/day) | 1 | 1 | 2 | 2 | 1.50 |
| Lactation month | 3 | 3 | 1 | 1 | 2.00 |
| DM intake (kg/day) | 2 | 2 | 4 | 3 | 2.75 |
| Diet type | 4 | 7 | 11 | 4 | 6.50 |
| DMI/100 kgBW | 5 | 8 | 5 | 8 | 6.50 |
| DM digestibility (g/g) | 9 | 11 | 3 | 5 | 7.00 |
| CP intake (g/day) | 6 | 5 | 9 | 10 | 7.50 |
| N intake (g/day) | 7 | 6 | 8 | 9 | 7.50 |
| Gestation month | 8 | 9 | 6 | 7 | 7.50 |
| Cow age (year) | 11 | 4 | 7 | 11 | 8.25 |
| Body weight (kg) | 10 | 10 | 12 | 6 | 9.50 |
| Ash intake | 12 | 12 | 10 | 12 | 11.50 |
Ranking score: 1–12, where 1 denotes the most relevant feature and 12 the least relevant.
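
The Average Rank column in Table 2 is consistent with a plain arithmetic mean of the four per-method ranks (F-test, MI, RFE, RidgeCV). The sketch below reproduces that aggregation for a few rows; it assumes an unweighted mean, which the table values agree with but which is not stated explicitly here, and the names `RANKS`, `average_rank`, and `rank_features` are illustrative only.

```python
from statistics import mean

# Per-method ranks copied from Table 2, in the order F-test, MI, RFE, RidgeCV.
# Only a subset of rows is reproduced; the names are illustrative.
RANKS = {
    "Diet type": (4, 7, 11, 4),
    "OM intake": (1, 1, 2, 2),
    "Ash intake": (12, 12, 10, 12),
    "Lactation month": (3, 3, 1, 1),
    "DM intake": (2, 2, 4, 3),
}


def average_rank(ranks):
    """Unweighted mean of the per-method ranks (assumed aggregation rule)."""
    return mean(ranks)


def rank_features(table):
    """Return (feature, average rank) pairs, most relevant feature first."""
    scored = [(name, average_rank(ranks)) for name, ranks in table.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for name, score in rank_features(RANKS):
        print(f"{name:16s} {score:.2f}")
```

A lower average rank means stronger agreement across the four selection methods that the feature is informative, which is why OM intake, lactation month, and DM intake lead the table.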
Table 3. Feature ranking with different feature selection methods for milk urea.

| Features | F-Test | MI | RFE | RidgeCV | Average Rank |
|---|---|---|---|---|---|
| CP intake (kg/day) | 1 | 2 | 1 | 1 | 1.25 |
| N intake (kg/day) | 2 | 1 | 3 | 2 | 2.00 |
| DM digestibility (g/g) | 5 | 3 | 2 | 13 | 5.75 |
| DM intake (kg/day) | 11 | 8 | 4 | 3 | 6.50 |
| Ash intake (kg/day) | 3 | 5 | 9 | 11 | 7.00 |
| DMI/100 kgBW | 8 | 14 | 5 | 7 | 8.50 |
| Milk fat concentration | 4 | 13 | 13 | 6 | 9.00 |
| Body weight (kg) | 9 | 7 | 10 | 10 | 9.00 |
| Milk Yield | 10 | 10 | 8 | 8 | 9.00 |
| OM intake (kg/day) | 12 | 9 | 11 | 4 | 9.00 |
| Cow age (year) | 6 | 4 | 12 | 15 | 9.25 |
| Milk protein concentration | 7 | 15 | 13 | 9 | 11.00 |
| Diet Type | 13 | 11 | 15 | 5 | 11.00 |
| Lactation Month | 15 | 15 | 7 | 14 | 12.75 |
| Gestation Month | 14 | 12 | 14 | 12 | 13.00 |
Ranking score: 1–15, where 1 denotes the most relevant feature and 15 the least relevant.
Table 4. Comparison of model performance on the original and selected features for milk yield (kg/day).
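
| Model | $R^2$ (no feature selection) | MAE (no feature selection) | MSE (no feature selection) | RMSE (no feature selection) | $R^2$ (with feature selection) | MAE (with feature selection) | MSE (with feature selection) | RMSE (with feature selection) |
|---|---|---|---|---|---|---|---|---|
| Linear Regression | 0.776 | 3.031 | 15.615 | 3.952 | 0.776 | 3.020 | 15.578 | 3.947 |
| RF regression | 0.805 | 2.920 | 13.572 | 3.684 | 0.804 | 2.902 | 13.653 | 3.695 |
| SVR | 0.813 | 2.733 | 13.057 | 3.613 | 0.812 | 2.772 | 13.094 | 3.619 |
| AdaBoost | 0.813 | 2.860 | 13.027 | 3.601 | 0.820 | 2.838 | 12.543 | 3.542 |
| ANN | 0.827 | 2.670 | 12.052 | 3.471 | 0.841 | 2.602 | 11.088 | 3.330 |
| Stacking | 0.843 | 2.568 | 10.944 | 3.308 | 0.850 | 2.537 | 10.473 | 3.236 |
| AVERAGE | 0.813 | 2.797 | 13.045 | 3.605 | 0.817 | 2.779 | 12.738 | 3.562 |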
Table 5. Comparison of model performance on the original and selected features for milk urea concentration (mg/dL).
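
| Model | $R^2$ (no feature selection) | MAE (no feature selection) | MSE (no feature selection) | RMSE (no feature selection) | $R^2$ (with feature selection) | MAE (with feature selection) | MSE (with feature selection) | RMSE (with feature selection) |
|---|---|---|---|---|---|---|---|---|
| Linear Regression | 0.706 | 4.562 | 36.697 | 6.058 | 0.716 | 4.274 | 35.443 | 5.953 |
| SVR | 0.719 | 4.470 | 35.093 | 5.924 | 0.747 | 4.343 | 31.529 | 5.615 |
| RF regression | 0.738 | 4.335 | 32.659 | 5.715 | 0.748 | 4.320 | 31.442 | 5.607 |
| AdaBoost | 0.742 | 4.414 | 32.214 | 5.676 | 0.760 | 4.252 | 29.928 | 5.471 |
| ANN | 0.784 | 3.889 | 27.022 | 5.198 | 0.792 | 3.681 | 26.016 | 5.101 |
| Stacking | 0.791 | 3.858 | 26.044 | 5.103 | 0.794 | 3.817 | 25.76 | 5.076 |
| AVERAGE | 0.747 | 4.255 | 31.622 | 5.612 | 0.760 | 4.115 | 30.021 | 5.471 |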
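
For readers who want to recompute the four error metrics reported in Tables 4 and 5, the following is a minimal sketch using their standard definitions. It assumes those definitions match the ones used for the tables; the function name `regression_metrics` and the sample values are hypothetical and not drawn from the CowNflow dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, MSE and RMSE for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; undefined when the observations are constant.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


if __name__ == "__main__":
    # Hypothetical milk-yield values in kg/day, for illustration only.
    observed = [20.0, 25.0, 30.0, 35.0]
    predicted = [22.0, 24.0, 29.0, 36.0]
    print(regression_metrics(observed, predicted))
```

Because RMSE is the square root of MSE, the two columns can be cross-checked directly: for the Stacking model with feature selection in Table 4, the square root of 10.473 is approximately 3.236, matching the reported RMSE.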
Table 6. Empirically derived statistical equations relating MUN (mg/deciliter), UUN (g/day), and ammonia (g/day).

| Relationship | Eq. | Mathematical Equation | $R^2$ | Reference |
|---|---|---|---|---|
| MUN to NH3 ($f$) | $f_1$ | $\mathrm{NH_3} = 25.0\,(\pm 6.72) + 5.03\,(\pm 0.373) \times \mathrm{MUN}$ | 0.85 | Burgos et al. (2010) |
| MUN to UUN ($h$) | $h_1$ | $\mathrm{UUN} = -37.33\,(\pm 11.62) + 16.01\,(\pm 0.48) \times \mathrm{MUN}$ | 0.99 | Burgos et al. (2007) |
| | $h_2$ | $\mathrm{UUN} = -49.95\,(\pm 21.18) + 18.67\,(\pm 2.58) \times \mathrm{MUN} - 0.17\,(\pm 0.07) \times \mathrm{MUN}^2$ | 0.97 | Burgos et al. (2007) |
| | $h_3$ | $\mathrm{UUN} = -23.1\,(\pm 9.13) + 14.4\,(\pm 0.55) \times \mathrm{MUN}$ | 0.96 | Burgos et al. (2007) |
| | $h_4$ | $\mathrm{UUN} = -34.2 + 16.23 \times \mathrm{MUN}$ | 0.79 | Powell et al. (2011) |
| | $h_5$ | $\mathrm{UUN} = -31.4 + 14.15 \times \mathrm{MUN}$ | 0.93 | Spek et al. (2013) |
| UUN to NH3 ($g$) | $g_1$ | $\mathrm{NH_3} = 28.8\,(\pm 5.20) + 0.377\,(\pm 0.024) \times \mathrm{UUN}$ | 0.88 | Burgos et al. (2010) |
Ref: Burgos et al. (2010): [33]; Burgos et al. (2007): [32]; Powell et al. (2011): [52]; Spek et al. (2013): [53].
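
The equations in Table 6 can be chained to turn a MUN value into UUN and NH3 estimates. The sketch below encodes the point estimates only; it does not propagate the reported standard errors, and it assumes the input is already expressed as MUN in mg/dL (any conversion from milk urea concentration is outside its scope). The function and dictionary names are illustrative.

```python
# Point-estimate coefficients copied from Table 6; the reported standard
# errors are not propagated. MUN is in mg/dL, UUN and NH3 are in g/day.

def nh3_from_mun(mun):
    """f1, Burgos et al. (2010): NH3 = 25.0 + 5.03 * MUN."""
    return 25.0 + 5.03 * mun


# MUN -> UUN equations h1..h5, keyed by the labels used in Table 6.
UUN_EQUATIONS = {
    "h1": lambda mun: -37.33 + 16.01 * mun,                    # Burgos et al. (2007)
    "h2": lambda mun: -49.95 + 18.67 * mun - 0.17 * mun ** 2,  # Burgos et al. (2007)
    "h3": lambda mun: -23.1 + 14.4 * mun,                      # Burgos et al. (2007)
    "h4": lambda mun: -34.2 + 16.23 * mun,                     # Powell et al. (2011)
    "h5": lambda mun: -31.4 + 14.15 * mun,                     # Spek et al. (2013)
}


def nh3_from_uun(uun):
    """g1, Burgos et al. (2010): NH3 = 28.8 + 0.377 * UUN."""
    return 28.8 + 0.377 * uun


def estimate_emissions(mun, uun_eq="h1"):
    """Return UUN plus NH3 via both routes: f1 directly, and g1 applied to UUN."""
    uun = UUN_EQUATIONS[uun_eq](mun)
    return {
        "uun": uun,
        "nh3_direct": nh3_from_mun(mun),
        "nh3_via_uun": nh3_from_uun(uun),
    }


if __name__ == "__main__":
    # MUN = 10 mg/dL is an arbitrary illustrative input.
    for label in UUN_EQUATIONS:
        print(label, estimate_emissions(10.0, uun_eq=label))
```

Comparing `nh3_direct` with `nh3_via_uun` gives a quick sense of how far the direct and two-step routes diverge for a given MUN value.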

MDPI and ACS Style

Xing, R.; Li, B.; Dora, S.; Whittaker, M.; Mathie, J. Engineering-Oriented Explainable Machine Learning and Digital Twin Framework for Sustainable Dairy Production and Environmental Impact Optimisation. Algorithms 2025, 18, 670. https://doi.org/10.3390/a18100670
