4.1. Key Predictive Indicators of Metritis in Dairy Cows
Identifying cows at increased risk of health issues offers clear benefits for farmers, and timely interventions can help prevent or mitigate the adverse effects of disease. Such early action contributes to improved health management at both the individual and herd levels within a precision livestock farming system [8]. In this study, a range of classification models was selected, spanning from simpler approaches (e.g., PLS-DA) to more complex architectures (e.g., NN and Ensemble models). Each model was chosen based on its suitability for imbalanced tabular data and its demonstrated effectiveness in prior livestock health prediction research.
Among all models evaluated, the NN achieved the most balanced and highest overall classification performance. It reached an MCC of 0.79, an overall accuracy exceeding 96%, high sensitivity (83.6%), and specificity (97.5%). These findings support our initial hypothesis that deep learning methods can significantly enhance the early detection of metritis using routinely collected on-farm data. Both the RF and Ensemble models also demonstrated excellent performance, particularly in correctly identifying healthy animals, as reflected by their high specificity and PPV. However, their slightly lower sensitivity suggests a potential risk of underdiagnosing metritis when used independently. The SVM model achieved the highest sensitivity (90.9%) and NPV (98.9%), highlighting its potential for deployment in screening or early warning systems where minimising false negatives is critical. In contrast, the PLS-DA model, though commonly used in multivariate biomarker research, showed the weakest performance in this context and produced a high number of false positives (PPV = 21.9%).
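As an illustration of how the metrics reported above relate to one another, the following minimal sketch derives sensitivity, specificity, PPV, NPV, accuracy, and MCC from the four cells of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from this study's data.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""

    def ratio(num, den):
        # A metric is undefined (NaN) when its denominator is zero.
        return num / den if den else float("nan")

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # diseased cows correctly flagged
        "specificity": ratio(tn, tn + fp),   # healthy cows correctly cleared
        "ppv": ratio(tp, tp + fp),           # flagged cows that are truly diseased
        "npv": ratio(tn, tn + fn),           # cleared cows that are truly healthy
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only (10 diseased, 90 healthy records).
m = classification_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.96 while MCC is about 0.78, which shows how accuracy alone can look more favourable than a balanced metric when healthy records dominate.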
Fadul-Pacheco et al. [46] demonstrated that an RF algorithm achieved strong performance in predicting clinical mastitis, reporting a sensitivity of 85% and specificity of 62%. In comparison, our study yielded a lower sensitivity (68.42%) but substantially higher specificity (98.74%) for metritis detection [46]. Similarly, Steensels et al. [47] proposed a decision tree-based method to identify 35 post-calving cows affected by ketosis and/or metritis using sensor-derived variables, including performance metrics, rumination time, cow activity levels, and body weight [47]. In another study, Wei Xu et al. [48] identified RF (error rate: 12.4–22.6%) and SVM (error rate: 12.4–20.9%) as the most effective models for predicting metabolic status based on routine on-farm cow data [48].
Feature importance analysis using RF and permutation-based methods identified body weight, the milk fat-to-protein ratio, and milk fat percentage as the three most informative predictors of metritis. These variables likely reflect disruptions in energy balance and lipid metabolism, which are consistent with the early inflammatory response observed in uterine disorders [49]. Milk yield, lactation number, rumination time, and milk lactose content were also identified as significant contributors to model predictions. These findings suggest that metritis impacts both metabolic performance and behavioural parameters in dairy cows. Overall, the results support the utility of routinely collected sensor-based data in developing real-time, automated decision-support tools for the early detection of disease in dairy herds.
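The idea behind permutation-based importance can be sketched in a few lines: shuffle one feature at a time and record how much a performance score drops. The example below is a simplified, library-free illustration, not the pipeline used in this study. The rule-based `toy_model`, the two feature columns, the data values, and the use of accuracy as the score are all assumptions made for demonstration.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained classifier: flags a cow (1) when the
# first feature, a made-up body-weight change in kg, falls below -10.
def toy_model(row):
    return int(row[0] < -10)


# Made-up rows: [body_weight_change_kg, unrelated_noise]
X = [[-20, 1], [-15, 5], [-12, 2], [0, 4], [3, 3], [5, 6], [-1, 0], [2, 7]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the toy model ignores the second column, shuffling it leaves the score unchanged and its importance is exactly zero, whereas shuffling the first column degrades accuracy.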
Retrospective evaluation of pre-diagnostic records revealed distinct physiological and behavioural changes in cows that developed metritis, evident as early as 6–10 days before clinical diagnosis. These included gradual declines in milk yield, rumination time, and body weight. These findings suggest that the applied ML models can identify at-risk cows up to one week before clinical symptoms manifest. Among the models tested, the NN and SVM models performed particularly well in detecting subtle preclinical changes in variables such as rumination time, body weight, and milk lactose concentration. This early detection window provides a critical opportunity for timely intervention, allowing farmers and veterinarians to initiate preventive or therapeutic measures before disease progression. This could enhance animal welfare and reduce reliance on antibiotic treatments.
These results align with recent literature supporting the integration of cow-level data and ML for disease prediction [17, 45]. While prior studies have primarily focused on treatment outcomes or metritis cure, our analysis emphasises early disease detection, independent of post-diagnosis management or therapeutic success [2].
In contrast to earlier findings suggesting that parity or calving disorders are weak predictors of metritis cure [2], our model found that lactation number, a proxy for parity, contributes to predicting disease onset. This discrepancy may arise from differing research goals: detection of disease occurrence versus prediction of cure probability. Moreover, we found that features reflecting metabolic state and behavioural response prior to diagnosis are critical for accurate prediction.
Importantly, our findings support the broader goal of precision and selective veterinary care [50]. They align with global efforts to reduce antibiotic use by enabling the earlier identification of at-risk animals [51]. Although this study focused on detection rather than treatment outcome, the methodology may be extended in future research to develop predictive models for disease resolution.
Feature importance analysis using the RF model, supported by permutation-based interpretation in SVM and NN models, revealed several core indicators that differentiated healthy from diseased cows. Body weight was the most influential variable, contributing over 21% to the classification model. Cows that developed metritis exhibited noticeable deviations in daily body weight, likely reflecting underlying metabolic stress or immune activation. These findings are consistent with evidence suggesting that systemic inflammation promotes catabolism, leading to weight loss and altered energy balance during early lactation [52, 53]. The milk fat-to-protein ratio was the second most important feature (12.3%), as elevated ratios are commonly linked to negative energy balance, a known risk factor for uterine infections. An increased ratio reflects enhanced adipose mobilisation relative to protein synthesis, which may impair immune function and delay postpartum uterine recovery [54, 55]. Similarly, milk fat percentage (11.0%) and milk protein (6.1%) were identified as key discriminators, further reflecting shifts in lipid metabolism and nutrient prioritisation under stress conditions. These findings are consistent with previous studies linking metabolic imbalance to increased susceptibility to uterine and metabolic disorders in the transition period [56]. Milk yield (kg) (8.9%) was lower in metritis cows compared to healthy controls during the pre-diagnosis period, suggesting that reduced production may serve as an early signal of subclinical inflammation. This may reflect early inflammatory effects on feed intake, mammary metabolism, and endocrine signalling [57]. Likewise, rumination time (min/day) contributed 8.3% to the model. Decreased rumination behaviour was consistently observed in cows that developed metritis, reflecting systemic discomfort or appetite suppression, and is frequently observed in animals experiencing early inflammatory stress [58, 59]. Interestingly, the lactation number (8.5%) showed moderate importance. Cows in higher parities tended to have a greater risk of metritis, possibly due to cumulative physiological stress and slower postpartum uterine involution [60]. Finally, milk lactose (%) (6.4%) was also reduced in cows with metritis. Lactose is often negatively affected during inflammation and serves as an indirect marker of mammary and systemic health. Understanding these relationships can help in the early detection and prevention of metritis in dairy cows [61].
Importantly, the integration of ML model predictions into daily herd management could provide actionable benefits for dairy producers. For example, cows flagged as high-risk could be prioritised for clinical examination, enabling earlier diagnosis and treatment. In automated systems, model predictions could appear as real-time alerts within herd management dashboards, guiding timely veterinary decisions [62]. Additionally, nutritional interventions could be tailored based on predicted risk, such as administering immune-supportive feed additives during the transition period to reduce inflammation and metabolic stress [63]. These insights also support responsible antibiotic use, as only cows with high predicted risk would be considered for treatment, aligning with One Health principles [64]. However, for successful adoption, further development of user-friendly interfaces and validation in diverse herd environments is needed [65].
Feature selection was further informed by correlation analysis, which revealed several moderate to strong associations among milk composition traits and behavioural parameters. For example, milk fat and milk fat-to-protein ratio showed a significant positive correlation, while rumination time correlated negatively with somatic cell count. These relationships reflect biologically plausible patterns in early-lactation physiology and justify the observed model contributions. Importantly, collinear traits were reviewed to enhance model stability, as reflected in the final feature importances.
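A correlation screen of this kind can be sketched as follows: compute the Pearson correlation for every pair of candidate features and list the pairs whose absolute correlation exceeds a chosen cut-off, so that they can be reviewed for collinearity. The column names, the values, and the 0.8 threshold are illustrative assumptions and do not come from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up daily values for six records (illustration only).
data = {
    "milk_fat_pct": [3.8, 4.1, 4.5, 5.0, 4.2, 3.9],
    "fat_protein_ratio": [1.15, 1.25, 1.40, 1.55, 1.30, 1.20],
    "rumination_min": [520, 480, 505, 470, 510, 495],
}

flagged = collinear_pairs(data)
print(flagged)
```

In this toy data only the milk fat and fat-to-protein ratio pair is flagged. Whether a flagged feature is dropped, combined, or kept remains a modelling decision that the screen itself does not make.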
Notably, all top-ranked features are continuously and non-invasively monitored by modern herd management systems. This supports their integration into real-time risk prediction models and provides a strong rationale for developing precision veterinary tools based on sensor-derived data. Such tools could enable early detection and targeted intervention for postpartum uterine diseases, including metritis.
4.2. Strengths, Limitations, and Implications for Future Research
One of the key strengths of this study is its use of routinely recorded farm data, such as milk yield, composition, rumination behaviour, and body weight. This ensures that the proposed models can be practically implemented in real-world farm settings without the need for additional invasive or costly diagnostics. The inclusion of multiple ML models also allowed for a comprehensive comparison and demonstrated the superiority of NN in early disease detection.
Another important strength is the use of balanced evaluation metrics, particularly the MCC, which remains robust in the face of class imbalance. Additionally, the integration of feature importance analysis enhances the interpretability of the models and highlights biologically meaningful predictors of metritis.
However, this study also has limitations. First, the relatively small number of diseased cows (n = 11) may limit the generalizability of the findings. The data originate from a single herd, which may reduce external validity across different management systems or genetic lines. Moreover, while the models demonstrated strong internal performance, they have not yet been validated on an independent external dataset.
A key limitation of this study is the relatively small sample size and class imbalance between metritis-positive and healthy cows. These constraints limited the reliability of model-based feature ranking methods such as permutation importance and random forest-based scoring, which were explored but not included in the manuscript.
Although we selected features based on physiological relevance and literature, the absence of a robust empirical ranking means that model interpretability remains limited. Future work with larger, more balanced datasets is necessary to enable stable variable selection and generalizable predictive insights.
Although this study confirmed the presence of early physiological signals preceding clinical diagnosis of metritis, we did not evaluate model performance on a per-day basis leading up to the diagnosis. A time-series evaluation of model accuracy metrics (e.g., AUC, MCC) for each day before diagnosis could help determine the earliest time point at which each model can reliably predict disease onset. Such an analysis would provide even more precise insights into the temporal resolution of disease detection and is proposed as a direction for future research.
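One possible shape for such a per-day analysis is sketched below: predictions are grouped by the number of days before diagnosis, and MCC is computed separately for each group. This is a hypothetical outline, not an analysis performed in this study. It assumes that each record carries a day offset, a true label, and a predicted label, and that healthy controls have been aligned to a matched reference day; the records shown are invented.

```python
from collections import defaultdict
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0


def mcc_by_day(records):
    """Group (days_before_diagnosis, true, predicted) tuples and score each day."""
    by_day = defaultdict(lambda: ([], []))
    for day, true_label, predicted_label in records:
        by_day[day][0].append(true_label)
        by_day[day][1].append(predicted_label)
    # Report the earliest day (largest offset) first.
    return {day: mcc(*by_day[day]) for day in sorted(by_day, reverse=True)}


# Invented records: (days before diagnosis, true label, predicted label).
records = [
    (7, 1, 0), (7, 1, 1), (7, 0, 0), (7, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 0, 0),
]

per_day = mcc_by_day(records)
print(per_day)
```

Plotting such per-day scores for each model would indicate the earliest offset at which performance rises above a level considered useful in practice.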
Finally, this study focused on disease detection, not treatment response or cure probability. While many predictive features likely overlap between detection and cure, future studies should aim to evaluate treatment outcomes and include additional physiological and immunological markers.
Although the ML models demonstrated strong internal performance, this study did not include external validation using data from other farms or populations. This limitation is primarily due to the controlled nature of the dataset, which was collected under uniform management and environmental conditions. As model generalizability is essential for practical adoption, future research should focus on validating these algorithms across diverse herds, geographical regions, and production systems to ensure robustness and broader applicability.