4.1. Physiological Significance and Machine Learning Insights from Threshold-Based Heat Stress Detection
Data obtained from precision livestock farming (PLF) devices can facilitate the modelling of heat stress effects, including their influence on rumen function, and support the implementation of suitable mitigation strategies [62]. This study introduces a novel integrative framework that combines biologically defined threshold conditions (C1–C6) with supervised ML to detect early physiological and behavioural alterations associated with heat stress in dairy cows. The results confirm that heat stress has measurable effects on reticulorumen pH, rumination time, milk temperature, and metabolic indicators such as milk FPR and lactose. We evaluated the capacity of five supervised ML algorithms (PLS-DA, RF, SVM, NN, and an Ensemble model) to classify physiological and metabolic disturbances in dairy cows under various heat stress conditions. Using a combination of environmental (THI), physiological (milk temperature, rumination), and milk composition parameters (milk lactose, FPR), we explored how the models perform in detecting ruminal acidosis, elevated milk FPR, and suppressed rumination, which are indicative of systemic stress and reduced welfare. Model performance varied with the targeted physiological condition, the features included, and the degree of class balance. Among the algorithms tested, PLS-DA, RF, and the Ensemble model consistently outperformed SVM and NN across multiple scenarios. Becker et al. [12] built models for classifying stress in dairy cows using three algorithms (random forest, logistic regression, and Gaussian Naive Bayes) and four types of thermal comfort interventions, achieving accuracies of 81.1% to 89.3% [12]. Rodrigues et al. [63] developed an infrared thermography (IRT) feature extraction approach for classifying heat stress in dairy cows [63]. Their thermal signature approach builds an IRT data descriptor vector from the frequency of each predefined temperature range in the temperature matrix of the thermographic image. Images were taken from five body areas, and the thermal signatures were combined with environmental data to build ANN models. The best model, derived from the thermal signature of the ocular region, achieved an accuracy of 90.1%, surpassing the best-performing model previously reported for the same dataset (86.8%) [63]. Because both were evaluated on the same data, the thermal signature method itself was likely the main source of the improvement [63].
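The thermal signature descriptor described for Rodrigues et al. [63] can be illustrated with a short sketch. The Python below counts how many pixels of a thermographic temperature matrix fall into each predefined temperature range and returns those counts as relative frequencies. The bin edges, the normalisation to proportions, the handling of out-of-range pixels, and the function name `thermal_signature` are illustrative assumptions, not details taken from [63].

```python
from typing import List, Sequence


def thermal_signature(temps: Sequence[Sequence[float]],
                      edges: Sequence[float]) -> List[float]:
    """Relative frequency of pixels in each predefined temperature range.

    `edges` defines len(edges) - 1 half-open ranges [edges[i], edges[i + 1]).
    Pixels outside [edges[0], edges[-1]) are ignored (an assumption).
    """
    counts = [0] * (len(edges) - 1)
    for row in temps:
        for t in row:
            for i in range(len(edges) - 1):
                if edges[i] <= t < edges[i + 1]:
                    counts[i] += 1
                    break
    total = sum(counts)
    # Normalising to proportions is an assumption; raw counts would also work
    return [c / total if total else 0.0 for c in counts]


# Hypothetical 2x2 temperature matrix (°C) and hypothetical range edges
image = [[30.0, 31.5],
         [34.2, 36.9]]
print(thermal_signature(image, [30, 32, 34, 36, 38]))
```

Because each normalised vector sums to 1 regardless of image size, descriptors from differently sized body regions remain directly comparable as ANN inputs; whether [63] normalised in this way is not stated in this section.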
Under C1 (THI > 73 and milk temperature > 38.6 °C), the physiological hypothesis was supported by a statistically significant reduction in reticulorumen pH (p = 0.013), albeit with a small effect size (Cohen's d = −0.21). Among the ML models, PLS-DA achieved the highest classification performance (accuracy = 66.7%, AUC = 0.65), whereas RF and SVM performed modestly. This suggests that linear models such as PLS-DA may better capture subtle shifts in physiological state under moderate heat stress when feature variance is low. However, the low sensitivity of all models (≤0.50) reflects the difficulty of detecting early-stage acidosis, particularly in the presence of class imbalance and limited variance in the input features. Wagner et al. [64] induced SARA, a condition that may cause behavioural alterations, in 14 cows (Bos taurus), while a further 14 cows served as controls without SARA. They tested K Nearest Neighbours for Regression (KNNR), Decision Tree for Regression (DTR), Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), and an algorithm assuming that daily activity is similar. The best SARA detection algorithm, KNNR, detected 83% of true positives but also produced 66% false positives, limiting its practical usefulness. Applying ML to larger datasets of individual animals rather than to group-level datasets may yield further gains [64]. Touil et al. [65] showed that SARA can be predicted from milk Fourier-transform infrared (FTIR) spectroscopy data using ML models. By analysing milk samples from individual cows on 12 commercial farms with several ML algorithms (random forest, gradient boosting, and partial least squares), they attained SARA prediction accuracies of up to 69% in external validation [65]. Real-time monitoring of pH and temperature in the reticular contents of dairy cows has been proposed as an effective tool for evaluating the risk of subclinical acidosis [66]. Reticulorumen pH is more likely to be lower in cows that are more susceptible to heat stress [67]. SARA typically arises when ruminal pH remains between 5.2 and 6.0 for an extended period [68,69]. Numerous studies have emphasised the association between heat stress vulnerability and reduced reticulorumen pH. For example, Zhao et al. [70] demonstrated that rumen acetate concentration and pH were substantially reduced in heat-stressed cows, while ruminal lactate concentration increased. Cows experiencing heat stress tend to reduce their feed intake, which in turn leads to less rumination. Reduced rumination lowers saliva production, and saliva is a key source of buffering agents for the rumen [71]. Moreover, heat stress redirects blood flow toward the extremities to enhance heat loss, reducing the blood supply to the gastrointestinal tract. As a result, absorption of digestion by-products such as volatile fatty acids (VFAs) becomes less efficient, leading to a rise in total rumen VFA concentration and a corresponding decrease in pH. The increased respiratory rate during heat stress also contributes to rumen acidosis. Panting increases CO₂ exhalation, and because blood pH buffering relies on a specific bicarbonate (HCO₃⁻) to CO₂ ratio, the reduction in blood CO₂ prompts the kidneys to excrete bicarbonate. This response limits the amount of HCO₃⁻ available for buffering in the rumen, exacerbating the drop in pH. Furthermore, panting cows tend to drool more, which reduces the amount of saliva reaching the rumen. Together, the lower bicarbonate content of saliva and the smaller volume of saliva entering the rumen make heat-stressed cows more vulnerable to both subclinical and acute rumen acidosis [72]. Maintaining a healthy rumen is therefore essential for any nutritional intervention intended to counteract the effects of heat stress [72]. By ensuring a healthy reticulorumen environment, whether through dietary adjustments or other interventions, farmers can help their cows cope with heat stress and prevent potentially severe health issues. Effective heat stress management is thus essential for both production efficiency and cow welfare.
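The C1 grouping and the effect size reported for it can be outlined in code. This sketch flags records meeting C1 (THI > 73 and milk temperature > 38.6 °C) and computes Cohen's d for reticulorumen pH between flagged and non-flagged records using the conventional pooled-standard-deviation formula. The text does not state which variant of d was used, so that choice, the function names, and the six records are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def meets_c1(thi: float, milk_temp_c: float) -> bool:
    # C1 as defined in the text: THI > 73 and milk temperature > 38.6 °C
    return thi > 73 and milk_temp_c > 38.6


def cohens_d(group: Sequence[float], reference: Sequence[float]) -> float:
    """Cohen's d with the pooled sample standard deviation (conventional variant)."""
    n1, n2 = len(group), len(reference)
    pooled_var = ((n1 - 1) * variance(group)
                  + (n2 - 1) * variance(reference)) / (n1 + n2 - 2)
    return (mean(group) - mean(reference)) / math.sqrt(pooled_var)


# Hypothetical records: (THI, milk temperature °C, reticulorumen pH)
records = [(74.0, 38.8, 6.0), (75.5, 38.9, 6.2), (74.2, 38.7, 6.4),
           (70.0, 38.5, 6.2), (68.0, 38.4, 6.4), (74.5, 38.3, 6.6)]
c1_ph = [ph for thi, temp, ph in records if meets_c1(thi, temp)]
other_ph = [ph for thi, temp, ph in records if not meets_c1(thi, temp)]
print(f"Cohen's d for pH (C1 vs. other): {cohens_d(c1_ph, other_ph):.2f}")
```

A negative d, as reported for C1 (d = −0.21), means that the flagged group has a lower mean pH than the reference group; |d| around 0.2 is conventionally read as a small effect.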
Milk FPR is a relevant marker of energy balance and subclinical inflammation. Under C2 (THI > 69 and milk lactose < 4.5%), cows had significantly higher milk FPR values (p < 0.001, d = 0.26), but the ML models struggled to identify cows with high FPR because of poor sensitivity (0.00 for both SVM and RF). Notably, only the NN model reached an MCC > 0.30, possibly owing to its ability to model non-linear interactions between heat load and milk biosynthesis. According to Satoła et al. [73], using the fat-to-protein ratio as the sole variable in models predicting subclinical ketosis proved inadequate, as the models achieved only moderate sensitivity (0.58 to 0.69) and specificity (0.66 to 0.71) [73]. Heat stress modifies metabolites in the mammary glands of lactating Holstein cows, affecting glycolysis, lactose, ketone, tricarboxylic acid cycle, amino acid, and nucleotide metabolism and thereby limiting the availability of essential components for milk production [74]. Heat stress also exacerbates oxidative stress and alters the metabolic and molecular activity of mammary secretory tissue, reducing its capacity to synthesise milk components and changing milk composition [1]. In our study, outdoor temperature was moderately positively correlated with milk fat (r = 0.34, p < 0.001) and milk temperature (r = 0.37, p < 0.001), and weakly correlated with milk FPR (r = 0.28, p < 0.001), indicating a potential effect of heat stress on milk composition. Jo et al. [75] reported similar findings: milk fat was elevated during summer, possibly because prolonged heat stress promotes the breakdown of long-chain fatty acids, which in turn raises milk fat [75]. The elevated fat percentage in the milk of heat-stressed cows may also be attributed to lower milk yield, which concentrates the fat, together with potentially increased non-protein nitrogen levels in the milk [75]. These heat-stress-induced alterations in milk composition have implications for milk quality and production. Farmers must monitor and control heat stress in their cows to preserve optimal milk composition and overall herd health. Further research is needed to elucidate the mechanisms behind these alterations and to develop strategies that mitigate the impact of heat stress on milk production.
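Why a classifier can reach acceptable accuracy yet zero sensitivity under C2 is easiest to see from the confusion-matrix formulas. The sketch below computes accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) from binary labels. The labels are hypothetical (2 high-FPR cows among 10), and the "always negative" predictor is an illustrative stand-in for a model that never flags the minority class, not one of the study's models.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC for binary labels (1 = condition present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical imbalanced labels: 2 high-FPR cows among 10
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10  # a classifier that never flags the minority class
print(classification_metrics(y_true, always_negative))
```

With this imbalance, the always-negative predictor scores 80% accuracy but a sensitivity of 0 and an MCC of 0, which is why MCC is more informative than accuracy for comparing models on imbalanced threshold conditions.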
In contrast, when the milk FPR threshold was adjusted to >1.5 in C5, all models achieved high predictive performance. The RF, Ensemble, and NN models yielded AUCs ≥ 0.999 and MCC values > 0.93, highlighting the importance of optimising threshold selection in biomarker-based models. However, milk lactose concentration did not differ substantially between the C2 and C5 groups (p > 0.24), indicating that lactose is not a sensitive stand-alone marker of stress-related metabolic imbalance, at least in the early phases. Thus, although individual biomarkers such as lactose may be insufficient on their own to identify stress-related metabolic disturbances, their diagnostic value can be substantially improved by combining them with other markers in a predictive model. This emphasises the importance of a multi-marker approach for building effective tools for the early detection of metabolic disorders in dairy cows. Further research is needed to identify additional biomarkers that can complement lactose and improve the predictive performance of these models.
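The AUC values reported for C5 can be read as ranking probabilities. The sketch below labels cows as positive when milk FPR exceeds 1.5 (only the FPR threshold of C5 is reproduced here) and computes AUC as the probability that a randomly chosen positive receives a higher model score than a randomly chosen negative, counting ties as one half; this is equivalent to the normalised Mann–Whitney U statistic. The milk records and scores are hypothetical, and this is not necessarily how AUC was computed in the study.

```python
from typing import Sequence


def fat_protein_ratio(fat_pct: float, protein_pct: float) -> float:
    return fat_pct / protein_pct


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical cows: (milk fat %, milk protein %) and model risk scores
milk = [(5.1, 3.1), (4.9, 3.0), (3.8, 3.2), (3.6, 3.3)]
labels = [1 if fat_protein_ratio(f, p) > 1.5 else 0 for f, p in milk]
scores = [0.92, 0.71, 0.40, 0.15]
print(labels, roc_auc(labels, scores))
```

An AUC ≥ 0.999, as reported for RF, Ensemble, and NN under C5, therefore means that almost every positive cow was ranked above almost every negative one.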
In C3 (THI > 70 and rumination < 420 min/day), affected cows exhibited significantly higher milk temperatures (p < 0.00001, d = 0.64). This finding supports a link between behavioural suppression and thermoregulatory strain, in which reduced feed intake leads to lower buffering capacity and greater internal heat retention. The ML models, especially the Ensemble and RF classifiers, achieved high accuracy (≥0.87) and AUC values (>0.93), demonstrating their potential to detect behavioural manifestations of heat stress with high reliability. Dairy cows respond to heat stress through physiological adaptations, including peripheral vasodilation, which increases blood flow to the skin, raising skin temperature and enhancing heat dissipation. When body temperature rises because heat dissipation is inadequate, cows increase sweating to improve cooling efficiency. To maintain a stable body temperature, dairy cows regulate blood circulation and facilitate heat exchange between the body core and the skin [76]. Gonzalez-Rivas et al. [77] reported a strong positive association between rumen temperature and THI. Consistent with our results, Liang et al. [78] found that climatic conditions had a more pronounced effect on rumen temperature in summer (40.4 °C) than in spring and autumn (40.1 °C) or winter (40.0 °C). Because of the heat generated by fermentation, rumen temperature in dairy cows is typically higher than that of other body regions [75]. Under heat stress, rumen temperature rises further, and feed intake and rumen microbial composition change to reduce heat production [75]. West et al. [79] found that milk temperature rose steadily with outdoor temperature, reaching 39 °C when THI reached 72, the point at which feed intake and milk production decreased and other signs of heat stress appeared. This suggests that milk temperature can be a useful tool for evaluating heat stress [25,79]. Monitoring milk temperature can help farmers identify and address heat stress in their herd before it leads to more serious health issues, and allows producers to implement mitigation strategies that safeguard cow well-being.
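THI thresholds recur throughout this section (THI > 70 in C3, and THI = 72 in West et al. [79]). The sketch below computes THI from air temperature and relative humidity using one widely used formulation, commonly attributed to NRC (1971), and applies the C3 rule (THI > 70 and rumination < 420 min/day). The THI formula actually used in this study is not stated in this section, so the formula, the function names, and the example inputs are assumptions.

```python
def thi_nrc(temp_c: float, rh_pct: float) -> float:
    """THI from air temperature (°C) and relative humidity (%).

    Uses the formulation commonly attributed to NRC (1971); this is an
    assumption, since the study's own THI formula is not given here.
    """
    return (1.8 * temp_c + 32) - (0.55 - 0.0055 * rh_pct) * (1.8 * temp_c - 26)


def meets_c3(thi: float, rumination_min_per_day: float) -> bool:
    # C3 as defined in the text: THI > 70 and rumination < 420 min/day
    return thi > 70 and rumination_min_per_day < 420


# Hypothetical day: 25 °C, 60% RH, 390 min/day of rumination
thi = thi_nrc(25.0, 60.0)
print(round(thi, 2), meets_c3(thi, 390))
```

At 25 °C and 60% relative humidity this formulation gives a THI of about 72.8, so a cow ruminating 390 min/day on such a day would meet C3, whereas the same rumination time at 20 °C and 50% humidity (THI about 65) would not.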
Despite no significant group difference in milk lactose concentration (p = 0.24), classification performance under C4 (THI > 72 and reticulorumen pH < 6.0) was excellent. The RF, NN, and Ensemble models reached perfect or near-perfect performance across all metrics (AUC = 0.99–1.00; MCC > 0.92). This suggests that multi-modal integration of THI, milk temperature, and milk composition can overcome the limitations of single-variable markers. However, such exceptionally high classification scores may also reflect overfitting, which must be addressed through external validation on larger, independent datasets. Heat stress not only diminishes milk quantity but also negatively influences milk quality, reducing lactose content [2]. Farrell et al. [80] observed that lactose levels in grazing cattle decline during summer compared with winter. Garner et al. [81] demonstrated that cows subjected to prolonged heat exposure produce milk with up to 49% less lactose. During heat stress, the lactose content of milk decreases because lactose synthesis in the mammary gland is disrupted. Heat stress induces physiological changes, including elevated cortisol levels and oxidative stress, which can downregulate the synthesis of α-lactalbumin (α-La), the regulatory subunit of the lactose synthase complex and a key requirement for lactose production. Because α-La modifies the substrate specificity of β-1,4-galactosyltransferase, its reduction impairs formation of the lactose synthase complex in the Golgi apparatus of mammary epithelial cells, leading to decreased lactose synthesis [2,82]. Heat stress also alters glucose metabolism by increasing energy demands for thermoregulation, potentially reducing glucose availability for lactose formation. Collectively, these factors contribute to lower milk lactose concentrations under heat stress.
C6 combined environmental (THI > 74) and internal (milk temperature > 38.7 °C) heat stress thresholds to predict reduced rumination. Affected cows showed significantly shorter rumination times (p = 0.014, d = −0.34), indicating compromised feeding behaviour under acute heat stress. The ML models, particularly RF, NN, and Ensemble, achieved perfect classification accuracy, which again calls for cautious interpretation. Nonetheless, the strong signal separation supports the feasibility of developing real-time monitoring tools that detect compound stressors, especially when integrated biosensor platforms are used. Ungar et al. [83] used discriminant analysis, logistic regression, and NN to classify jaw-movement data, reporting accuracies of 67% to 82%, 87%, and 25% to 90%, respectively [83]. Giovanetti et al. [84] successfully employed stepwise discriminant analysis (SDA), canonical discriminant analysis (CDA), and discriminant analysis (DA) to automatically identify specific behaviours of grazing dairy sheep, including biting activity, from triaxial accelerometer data [84]. Abdanan Mehdizadeh et al. [85] reported accuracies of 89% to 95% for predicting grazing, ruminating, and resting behaviours, with an overall accuracy of 93% [85]. Chelotti et al. [86] developed a pattern recognition technique to classify jaw movements in grazing cattle from acoustic data, achieving a detection rate of 90% in noisy environments [86]. Ayadi et al. [87] proposed a monitoring technique based on convolutional neural network (CNN) deep learning models that classifies behaviour into two categories, ruminating and other, across all cow postures recorded by the monitoring camera. Their approach is straightforward and user-friendly and captures long-term dynamics by condensing a video into a single 2D image. It identified rumination behaviour with average accuracy, recall, and precision of 95%, 98%, and 98%, respectively [87]. Moreover, Antanaitis et al. [33] found that heat stress reduced rumination time by up to 70% in cows in the highest THI range (73 to 78) and increased body temperature by 2%. Heat stress also decreased the partial pressure of carbon dioxide (pCO₂) by 12.6% and increased the partial pressure of oxygen (pO₂) by 32%, while plasma sodium and potassium decreased by 1.36% and 6%, respectively, and chloride increased by 3% [33]. Recent research has shown that heat stress negatively affects reticulorumen pH, temperature, and the rumination index, and that an elevated THI (≥72) increases the risk of ruminal acidosis and alters the physical activity of cows [88]. These findings indicate that Smaxtec data can be used to assess and analyse the impact of heat stress on dairy cows. Such real-time data enable prompt detection of heat stress effects while reducing measurement uncertainty and the stress associated with animal handling.
The RF and Ensemble models consistently achieved high accuracy and AUC in classifying cows into threshold-defined states, demonstrating the effectiveness of ML models for this task. Importantly, SHAP-based interpretability showed that the models were not black boxes but biologically traceable, identifying milk temperature, rumination, and pH as significant contributors. This supports their practical relevance for real-time farm monitoring systems designed to identify at-risk animals before clinical signs appear.
Taken together, these findings support the development of sensor-integrated, threshold-based monitoring systems that move beyond static indicators toward personalised animal-level diagnostics. Further studies should validate the generalisability of the C1–C6 framework across seasons, housing systems, and breeds, and assess its effectiveness in real-time interventions on commercial farms. Prior research has validated the accuracy and reliability of intra-ruminal sensors, further supporting their use in PLF for heat stress management [89]. By utilising continuous physiological monitoring, dairy farmers can implement data-driven decision-making, optimising cow welfare and milk production efficiency.
Beyond individual herd-level applications, the proposed threshold-based classification framework also has potential implications for bioclimatic zoning and climate adaptation strategies. By integrating environmental variables (THI, ambient temperature, humidity) with internal physiological and behavioural indicators, the methodology can be extended to spatial modelling systems that predict regional heat stress risk. Such models could be incorporated into bioclimatic zoning tools to identify high-risk areas for dairy production under current and projected climate conditions. This approach aligns with recent studies emphasising the need to combine multiple data streams for spatially explicit risk assessment and management [90,91]. In this context, our framework not only contributes to on-farm early warning systems but also offers a scalable methodology to support climate-resilient livestock planning, enabling producers and policymakers to anticipate shifts in thermal load patterns and adapt housing, nutrition, and management strategies accordingly.
4.2. Study Strengths, Methodological Limitations, and Directions for Future Research
While this study advances the integration of biologically informed threshold conditions with ML for early detection of heat- and metabolism-related stress in dairy cows, several limitations warrant consideration. First, all data were sourced from a single commercial dairy farm with Holstein-Friesian cows managed under specific housing and feeding systems during the early postpartum period (0–60 days in milk). As such, the generalisability of our threshold conditions (C1–C6) and model performance to other breeds, management systems, climatic regions, or lactation stages remains to be evaluated. Replication across multiple farms and seasons is necessary before implementing these thresholds in broader precision farming platforms.
Second, although the thresholds were grounded in established physiological knowledge and exploratory data analysis, they have not yet been validated against gold-standard measures of animal health, such as blood biomarkers, rumen fluid sampling, or clinical diagnoses. For example, while reticulorumen pH below 6.0 is a recognised indicator of sub-acute ruminal acidosis in the literature, our study relied on sensor-derived estimates without corroborating laboratory validation. Similarly, deviations in milk lactose and FPR may reflect inflammation or metabolic imbalance, but they are also influenced by many other factors, including stage of lactation and dietary changes.
Third, some of the algorithmic results—particularly the near-perfect accuracy and AUC values achieved by RF and Ensemble models—suggest potential overfitting. Although we performed stratified train–test splits and bootstrap resampling, external validation on independent datasets is essential to confirm model robustness and avoid results artificially inflated by data idiosyncrasies.
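The stratified train–test split and bootstrap resampling mentioned above can be sketched with the Python standard library. The split keeps each class's proportion in the test set, and the bootstrap resamples test-set predictions with replacement to give a percentile interval for accuracy. The test fraction, number of resamples, seed, percentile method, and function names are illustrative assumptions; the study's exact settings are not given here.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split record indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def bootstrap_accuracy_ci(y_true: Sequence[int], y_pred: Sequence[int],
                          n_boot: int = 1000, alpha: float = 0.05,
                          seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for accuracy on a fixed test set."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(sum(y_true[i] == y_pred[i] for i in sample) / n)
    stats.sort()
    k = int(alpha / 2 * n_boot)
    return stats[k], stats[n_boot - 1 - k]


# Hypothetical labels: 8 negatives, 4 positives
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels, test_frac=0.25)
print(train_idx, test_idx)
print(bootstrap_accuracy_ci([1, 0, 1, 0, 1, 1], [1, 0, 0, 0, 1, 1]))
```

Both procedures draw only from one farm's data, so they quantify sampling variability within that dataset rather than transferability to other herds.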
Fourth, while explaining model decisions using SHAP values enhances interpretability, the resulting feature contributions may be overly specific to our dataset. Breeds, feed formulations, sensor models, or farm management practices not represented here could alter these associations. Future work should evaluate whether the same features hold predictive weight in other contexts or require adaptation.
Finally, although our dataset was rich (200 cows monitored for 60 days, yielding approximately 36,000 records), non-random missing data or sensor failures could introduce biases in both model training and the interpretation of results. Missing values were imputed using the median of each variable, calculated from the training set only to avoid data leakage, and outliers were handled by winsorisation at the 1st and 99th percentiles [92]. These preprocessing steps minimised the influence of missing or extreme values while preserving the integrity of the dataset. Nevertheless, future studies may consider alternative imputation strategies or sensitivity analyses to quantify the impact of missingness on threshold detection and model performance.
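The preprocessing described above (median imputation fitted on the training set only, followed by winsorisation at the 1st and 99th percentiles) can be sketched as follows. The sketch assumes that the percentile limits are also learned from the training data only and uses the "inclusive" interpolation of Python's `statistics.quantiles`; both choices, the function names, and the example values are assumptions rather than the study's documented settings.

```python
from statistics import median, quantiles
from typing import List, Optional, Sequence, Tuple

Params = Tuple[float, float, float]  # (median, 1st percentile, 99th percentile)


def fit_preprocessor(train_col: Sequence[Optional[float]]) -> Params:
    """Learn the median and winsorisation limits from observed training values only."""
    observed = [v for v in train_col if v is not None]
    cuts = quantiles(observed, n=100, method="inclusive")  # 99 cut points
    return median(observed), cuts[0], cuts[98]


def transform(col: Sequence[Optional[float]], params: Params) -> List[float]:
    """Impute missing values with the training median, then clip to [P1, P99]."""
    med, lo, hi = params
    out = []
    for v in col:
        v = med if v is None else v
        out.append(min(max(v, lo), hi))
    return out


# Hypothetical milk-temperature column (°C) with gaps and one sensor spike
train_temps = [38.4, 38.6, None, 38.5, 38.7, 38.9, 45.0, 38.6]
test_temps = [None, 37.0, 38.8]
params = fit_preprocessor(train_temps)
print(transform(test_temps, params))
```

Fitting on the training split and only applying the learned parameters to the test split is what prevents test-set information from leaking into the imputed values.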
In conclusion, while the proposed thresholds and models show great promise for early detection of heat and metabolic stress in dairy herds, further research is required to validate these tools across diverse production environments, health statuses, and lactation stages to ensure their robustness and practical applicability in real-world decision-support systems.