Article

Towards Decision Support in Precision Sheep Farming: A Data-Driven Approach Using Multimodal Sensor Data

by Maria P. Nikolopoulou 1,2, Athanasios I. Gelasakis 3, Konstantinos Demestichas 4, Aphrodite I. Kalogianni 3, Iliana Papada 3, Paraskevas Athanasios Lamprou 3, Antonios Chalkos 3, Efstratios Manavis 3 and Thomas Bartzanas 1,*

1 Laboratory of Farm Structures, Department of Natural Resources Management & Agricultural Engineering, Agricultural University of Athens, Iera Odos 45 Street, 11855 Athens, Greece
2 R&D Department, Telefarm SA., 11744 Athens, Greece
3 Laboratory of Anatomy and Physiology of Farm Animals, Department of Animal Science, School of Animal Biosciences, Agricultural University of Athens, Iera Odos 45 Street, 11855 Athens, Greece
4 Laboratory of Computer Science, Department of Agricultural Economics, Agricultural University of Athens, Iera Odos 45 Street, 11855 Athens, Greece
* Author to whom correspondence should be addressed.
Ruminants 2026, 6(1), 3; https://doi.org/10.3390/ruminants6010003
Submission received: 30 November 2025 / Revised: 26 December 2025 / Accepted: 31 December 2025 / Published: 4 January 2026

Simple Summary

Sheep farmers must continuously monitor their animals to ensure welfare and sustain optimal productivity. However, many critical problems, such as stress or the onset of underlying health disorders, are difficult to detect through routine, labor-intensive observation alone. In this study, we integrated data from multiple sensor systems, including activity and GPS collars, thermal imaging, and barn environmental monitors, alongside regular welfare assessments. Our aim was to evaluate how accurately these combined data streams could predict key welfare indicators in dairy sheep. Using advanced machine learning models, we found that animal behavior and environmental conditions were strong predictors of traits such as medial canthus eye temperature, daily locomotion, and milk yield. The models also showed potential in identifying early signs of impaired welfare, including altered respiratory rates. Overall, our results suggest that the fusion of diverse sensor signals can facilitate the early detection of welfare issues and support more informed decision-making for improved management in intensive sheep farming systems.

Abstract

Precision livestock farming (PLF), by integrating multimodal sensor data, provides opportunities to enhance welfare monitoring and management in small ruminants. This study evaluated whether environmental, physiological, and behavioral measurements can serve as key predictors of welfare and productivity in dairy sheep. The measurements included the temperature–humidity index (THI), carbon dioxide (CO2) and ammonia (NH3) concentrations measured at the barn level, body condition score (BCS), rectal and ocular temperatures, GPS-derived locomotion metrics, accelerometry data, and fixed animal traits. Data were collected from 90 ewes: all animals underwent the same repeated welfare assessments, while 30 of them were additionally equipped with GPS–accelerometer sensor collars; environmental conditions were continuously recorded for the entire flock, generating 773 complete multimodal records. All predictive models were developed using data from all 90 ewes; collar-derived behavioral variables were included only for individuals equipped with GPS–accelerometer collars. Nine regression methods (linear regression (LR), partial least squares regression (PLSR), elastic net (EN), mixed-effects models, random forest (RF), extreme gradient boosting (XGBoost), support vector regression (SVR), neural networks (multilayer perceptron, MLP), and an ensemble of RF–XGBoost–EN) were evaluated using a combination of nested cross-validation (CV) and leave-one-animal-out CV (LOAOCV) to ensure robustness and generalization at the individual animal level. Nonlinear models, particularly RF, XGBoost, SVR, and the ensemble, consistently delivered superior performance across traits. The highest predictive capacity (R2 ≈ 0.60–0.70) was achieved for behavioral indicators (e.g., daily distance moved) and thermal indicators (e.g., medial canthus temperature), while moderate predictive capacity was observed for respiratory rate (R2 ≈ 0.40–0.50) and milk yield (R2 ≈ 0.35–0.45), reflecting their multifactorial nature. Feature importance analyses underscored the relevance of THI, CO2 and NH3 concentrations, and BCS across results. Overall, these findings demonstrate that multimodal sensor fusion can effectively support the prediction of welfare and productivity indicators in intensively reared dairy sheep and emphasize the need for larger and more diverse datasets to further enhance model generalizability and transferability.

1. Introduction

As global food demand continues to rise, farmers are required to increase productivity while maintaining high welfare standards for their animals [1,2,3]. Precision livestock farming (PLF) has emerged as a revolutionary approach for animal production, leveraging technological advancements to enhance health, welfare, and productivity while supporting sustainable management practices. The key idea of PLF is the integration of sensor-derived data streams, enabling continuous and objective monitoring of both animals and their environments. Timely decision-making in commercial livestock systems is crucial to maximize animal welfare and production [1,4]. Access to detailed phenotypic information is necessary to facilitate such decisions; however, obtaining these data remains challenging in commercial settings where sheep are handled infrequently [4].
Changes in animal behavior can serve as early indicators of underlying disease or compromised welfare [1]. However, conventional welfare assessments in sheep rely on labor-intensive and subjective measures, such as body condition scoring, respiratory rate (RR) monitoring, and behavioral observation. Non-invasive sensors like accelerometers offer an alternative by enabling continuous monitoring of animal activity, providing insight into an individual’s physiological state and external responses [5]. Changes in animal behavior can be used for the detection of health disorders or physiological responses that are accompanied by behavioral signs [4,5]. Animal-borne sensor devices enable the continuous recording of activity data that can be analyzed to categorize behavioral patterns [6], while accelerometers, in particular, have been used to detect fundamental behaviors like grazing, standing, lying, walking, and ruminating [4,5]. Global positioning system (GPS) adds spatial context, thereby facilitating the assessment of grazing patterns and movement across heterogeneous landscapes [7]. Locomotion in precision livestock monitoring includes several aspects of mobility, such as axis-specific activity recorded by accelerometers and horizontal displacement recorded by GPS [8].
Small ruminant farms often operate under harsher environmental conditions and face practical and infrastructural limitations, which make continuous monitoring of the production environment particularly important. Engineering advances and the decreasing costs of new electronic technologies have enabled the development of sensor-based solutions that collect data automatically and in real time, allowing the early detection of problems related to production loss, poor health, or threats to wellbeing. These sensing systems can monitor key environmental variables—such as temperature, humidity, air quality, and illumination—offering objective information that supports timely decision-making at group or individual level. Because small ruminant farms typically have a high number of animals, low individual value, and reduced staff-to-animal ratios, automated environmental monitoring helps compensate for limited labor availability and enhances the interpretation of behavioral and physiological indicators. Furthermore, environmental measurements form an essential component of PLF systems, enabling continuous, sensor-based assessment under commercial conditions where manual observations are often constrained by labor, cost, and farm location [9].
Despite notable advances, much of the research on sheep behavior monitoring has been conducted in experimental settings with small sample sizes and limited monitoring durations [4]. Single-sensor systems or multiple sensors from a single source are frequently used in studies [10]; however, hybrid and multimodal approaches hold greater promise for generating physiological and behavioral insights [7]. Recent progress in wireless sensor networks, Internet of Things (IoT) technologies, and machine learning (ML) has further driven interest in real-time PLF applications [6,10]. Continuous monitoring enables the early detection of abnormal behaviors, enhances the efficiency of farm management, and contributes to improved animal welfare [7].
Heat stress represents a significant challenge to both welfare and productivity in sheep, particularly within Mediterranean and subtropical production systems [11]. Heat stress disrupts physiological, biochemical, and behavioral processes [11,12], leading to reduced productivity, immune suppression, and increased susceptibility to infectious diseases [11]. Rectal temperature, RR, and rumen temperature are examples of traditional heat stress indicators [12]; however, these methods are invasive, labor-intensive, and may themselves induce additional stress in animals. In contrast, sensor systems that continuously monitor physiological status offer a promising alternative for non-invasive and real-time assessment of heat stress [13].
Infrared thermography (IRT) provides a non-invasive method to assess thermal responses in animals [12,14]. It has been applied to evaluate stress-related temperature variations in specific anatomical regions, such as the eye, muzzle, and flank [12]. However, the accuracy of IRT measurements is susceptible to several environmental and methodological factors, such as the image capture angle, distance from the animal, and ambient conditions, which can complicate data interpretation [12]. These challenges underscore the need for reliable analytical approaches, with machine learning (ML) algorithms presenting a promising avenue for adjusting and modeling nonlinear relationships between thermal and core body temperature responses [12].
There is growing interest in integrating multimodal data—behavioral, physiological, and environmental—to improve prediction capacity, enhance robustness, and strengthen interpretability [7,9,10,15,16,17]. Non-invasive remote sensing techniques, including thermal imaging and computer vision, have been used to identify physiological indicators such as respiration and heart rate without requiring physical contact [15]. Furthermore, edge computing and near-real-time data processing play an important role in supporting timely decision-making and early interventions in farm animal management [10].
Despite these advances, existing data-driven and sensor-based modeling studies in sheep have often focused on single sensing streams or a limited number of welfare indicators [18,19], frequently evaluated using standard train–test splits. Consequently, the combined value of simultaneously integrating behavioral, physiological, and environmental data streams, as well as the robustness of model generalization across individual animals, remains insufficiently explored in dairy sheep systems. Moreover, while computational and statistical approaches are increasingly reported, their application within intensive dairy sheep housing conditions using multimodal data integration and rigorous validation strategies is still limited.
In this exploratory study, we investigated whether multimodal sensor data—covering behavioral measurements, physiological indicators, and environmental conditions—can be used to predict continuous welfare- and productivity-related traits in dairy sheep. Through the development of machine learning regression models, this work provides an initial assessment of how these diverse data streams contribute to explaining variation in milk yield, thermal responses, respiratory dynamics, and daily locomotion under Mediterranean housing conditions, while explicitly evaluating model performance across multiple validation strategies to assess robustness and animal-level generalization.

2. Materials and Methods

2.1. Experimental Setup and Data Collection

The research was conducted at a commercial dairy sheep farm located at Paiania, Greece, for one lactation period (between February and July 2025). A total of 90 purebred milking ewes (75 Chios and 15 Lesvos) were enrolled in the study, 45 days post-partum, and were systematically monitored for one lactation period. Among them, 30 ewes (15 Chios and all 15 Lesvos ewes) were randomly selected to be equipped with GPS collars (by Digitanimal, Digitanimal S.L., Calle de la Ribera del Loira 46, 28042 Madrid, Spain), while the remaining 60 Chios ewes were assessed exclusively through physical observation. An environmental sensing system was installed within the housing facilities, which continuously recorded ambient conditions for the entire flock. All animals were reared under the same management conditions.
The collars continuously recorded surface temperature, accelerometer data, and GPS position. Accelerometer data were collected in short bursts at a sampling frequency of 10 Hz for 18 s every 11 min. Summary statistics were subsequently computed by the device and time-stamped using the corresponding GPS fix; GPS position data were recorded at the same 11 min interval, whereas surface temperature was logged at the manufacturer’s default interval. Daily locomotion metrics were derived from these signals and included both daily distance traveled (from GPS coordinates) and accelerometer-based movement intensity (z-axis activity), providing complementary information on horizontal displacement and vertical motion. Environmental sensors measured ammonia (NH3) (DOL 53; measurement range 0–100 ppm), carbon dioxide (CO2) (DOL 119; measurement range 400–10,000 ppm), illuminance (lux) (DOL 16; measurement range 0–1000 lux), ambient temperature (°C), and relative humidity (%) (DOL 114; temperature −40 to +60 °C and relative humidity 0–100%), all manufactured by dol-sensors A/S, Agro Food Park 15, 8200 Aarhus N, Denmark. Methane (CH4) concentrations were measured using a Guardian NG sensor (with measurement range 0–1%) manufactured by Edinburgh Sensors, Livingston, Scotland, UK. All environmental sensors were factory-calibrated by the manufacturers prior to installation and operated using the manufacturers’ default logging intervals throughout the study period; no additional field recalibration was performed. Environmental sensors were installed at two heights within the housing facility: gas concentration sensors (CH4, NH3, CO2) were positioned at approximately 1 m above ground level to reflect the inhalation zone of the animals, whereas temperature, relative humidity, and illuminance sensors were mounted at approximately 2 m to capture broader ambient conditions within the housing environment. 
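As an illustration of how a daily distance figure could be derived from the 11 min GPS fixes, the following is a minimal Python sketch, not the collar manufacturer's or the authors' procedure. The haversine great-circle formula, the Earth radius constant, and the names `haversine_m` and `daily_distance_m` are assumptions introduced here for illustration only.

```python
import math

# From the text: 10 Hz sampling for 18 s gives this many samples per burst.
SAMPLES_PER_BURST = 10 * 18

# Mean Earth radius in metres (assumption of this sketch).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def daily_distance_m(fixes):
    """Sum of step lengths over time-ordered (lat, lon) fixes of one ewe-day.

    Fewer than two fixes yields 0.0, which is why zero-distance days
    caused by sensor malfunction need to be screened separately.
    """
    return sum(haversine_m(a[0], a[1], b[0], b[1])
               for a, b in zip(fixes, fixes[1:]))
```

With one fix every 11 min, short movements between fixes are not observed, so a distance computed this way is best read as a lower bound on the true path length.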
In addition, the temperature–humidity index (THI) was calculated using the measurements of the dry bulb temperature and relative humidity using the formula:
$$\mathrm{THI} = (1.8 \times T + 32) - (0.55 - 0.0055 \times RH) \times (1.8 \times T - 26.8),$$
where T is the dry bulb temperature (°C) and RH the relative humidity (%) [20].
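A short worked example may help when checking THI values by hand. The Python sketch below reads the formula as THI = (1.8 × T + 32) − (0.55 − 0.0055 × RH) × (1.8 × T − 26.8); the function name `temperature_humidity_index` is hypothetical and not taken from the study's code.

```python
def temperature_humidity_index(t_celsius: float, rh_percent: float) -> float:
    """THI from dry bulb temperature (deg C) and relative humidity (%).

    Implements THI = (1.8*T + 32) - (0.55 - 0.0055*RH) * (1.8*T - 26.8).
    """
    t_fahrenheit = 1.8 * t_celsius + 32.0
    humidity_term = 0.55 - 0.0055 * rh_percent
    return t_fahrenheit - humidity_term * (1.8 * t_celsius - 26.8)


# Worked example: T = 30 deg C, RH = 50 %
#   1.8*30 + 32 = 86;  0.55 - 0.275 = 0.275;  1.8*30 - 26.8 = 27.2
#   THI = 86 - 0.275 * 27.2 = 78.52
thi_example = temperature_humidity_index(30.0, 50.0)
```

At 100% relative humidity the humidity term vanishes and THI reduces to the temperature in degrees Fahrenheit.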
Physiological and welfare-related parameters were recorded in all animals at regular intervals, including weekly assessments of the body condition score (BCS; a five-point scale with 0.25 increments, where 1 = emaciated and 5 = obese). Body condition scoring was performed at the lumbar spine and short ribs region following standard palpation guidelines [21]. Rectal temperature was measured with a digital thermometer (UKAL, Château-Thierry, France; measuring range 32.0–43.9 °C), and medial canthus eye temperature (°C) was measured with a thermal camera (FLIR E54 24°, FLIR Systems, Wilsonville, OR, USA). Thermal recordings were obtained as short video sequences captured by the same trained operator following a predefined protocol for settings and procedure, targeting the medial canthus of the eye from a distance of approximately 0.5 m (Figure 1). Image acquisition was standardized across animals, with the camera positioned approximately perpendicular to the target region to minimize angular effects. Emissivity was set to 0.95, the manufacturer-recommended value for biological tissues, and ambient environmental conditions measured at the time of acquisition were used as camera inputs. The camera had a thermal sensitivity (noise-equivalent temperature difference, NETD) of <40 mK at 30 °C, an accuracy of ±2 °C or ±2% of the reading, and a resolution of 320 × 240 pixels. Respiratory rate (breaths/minute) was measured by counting flank movements for 60 s. Respiratory rate, rectal temperature, and infrared thermography measurements were collected prior to milking under routine farm management practices, minimizing additional handling and reflecting normal on-farm practice. Respiratory rate, rectal temperature, medial canthus thermal recordings, and daily milk yield measurements were all collected during the same time period of each sampling day to ensure consistency across observations.
From the 30 ewes fitted with monitoring collars, blood samples were collected monthly in clot activator tubes for the determination of serum cortisol concentrations. Samples were transported at 4 °C to the laboratory, where they were centrifuged at 3000× g for 10 min [22]; the serum was separated and used for the measurement of cortisol levels (ng/mL) using a commercial ELISA kit (Sheep Cortisol ELISA Kit, ELK8817, ELK Biotechnology Co., Ltd., Denver, CO, USA) according to the manufacturer’s instructions (Figure 2). Blood sampling was performed by the same trained veterinarian after milking and under routine farm handling conditions, in accordance with national animal welfare regulations and approved protocols, minimizing animal stress and discomfort. Monthly serum cortisol concentrations were included as a physiological indicator reflecting endocrine responses to routine farm management conditions. In addition, individual milk recordings were performed monthly for all milking ewes, and daily milk yield (DMY) was calculated according to ICAR recommendations. All ewes were mechanically milked twice daily at 12 h intervals (06:00 and 18:00), following the standard routine of the farm.
Measurements for all physiological and behavioral variables were collected at approximately 06:00–10:00 a.m. for each sampling day to avoid differences caused by normal changes in the animals’ physiology throughout the day.
To ensure consistency and completeness, all data were cleaned and formatted appropriately. Additionally, they were aligned by animal ID. Time variables were converted to standardized date formats to enable temporal alignment across data sources. Records with extreme or biologically implausible values were excluded based on biologically informed criteria derived from the literature (e.g., implausible rectal or ocular temperatures, respiratory rates outside physiological limits, or GPS-derived daily distances equal to zero due to sensor malfunction). Missing values were predominantly structural and arose from differences in measurement frequency across variables, with some outcomes and predictors collected weekly or monthly and daily sensor-derived variables used as lagged predictors (day-1, day-2, etc.) preceding each observation. True missing data due to sensor malfunction were minimal and limited to a short interruption (seven consecutive days) from a single collar. This structure allowed the models to account for delayed physiological and behavioral responses of animals to environmental conditions and management conditions. Missing numeric values were imputed using median imputation. Categorical variables, like breed, were converted to dummy variables to allow integration into the models. To avoid information leakage, median imputation, normalization, and categorical encoding were implemented within the training data of each resampling fold only, using the “recipes” framework. For each weekly or monthly outcome measurement, daily sensor-derived variables from the preceding days (day-1 to day-6) were used as predictors, ensuring temporal consistency and preventing the inclusion of information collected after the outcome assessment.
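The lag construction and the fold-wise median imputation described above can be sketched as follows. This is a simplified Python illustration, not the authors' R "recipes" pipeline; the function names (`lagged_features`, `fit_median`, `impute`) and the dictionary-based data layout are assumptions made for the example.

```python
from datetime import date, timedelta
from statistics import median


def lagged_features(daily, outcome_day, max_lag=6):
    """Collect day-1 ... day-max_lag values of one daily sensor variable.

    daily: dict mapping date -> value. The outcome day itself is never
    used, so no information recorded after the assessment can leak in.
    Days without a record are returned as None.
    """
    return {f"lag_{k}": daily.get(outcome_day - timedelta(days=k))
            for k in range(1, max_lag + 1)}


def fit_median(train_values):
    """Estimate the imputation value from TRAINING rows only."""
    return median(v for v in train_values if v is not None)


def impute(values, fill):
    """Replace missing entries with a previously fitted fill value."""
    return [fill if v is None else v for v in values]


# Example: THI records with a gap on 3 March, outcome assessed on 5 March.
thi = {date(2025, 3, 1): 70.0, date(2025, 3, 2): 72.0, date(2025, 3, 4): 75.0}
features = lagged_features(thi, date(2025, 3, 5))

# The median is fitted on the training fold and reused for the test fold.
fill = fit_median([1.0, None, 3.0, 10.0])
test_fold = impute([None, 2.0], fill)
```

Fitting the fill value on the training fold and only applying it to the held-out fold is what keeps the imputation step from leaking test information into model training.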

2.2. Machine Learning and Model Evaluation

To predict key welfare and production indicators, a suite of supervised regression and machine learning algorithms was applied using behavioral, physiological, and environmental variables. The target variables (daily milk yield, medial canthus eye temperature, RR, and daily distance) represent major welfare factors associated with productivity, stress, and physical condition. Predictors included body condition score (BCS), environmental measures (THI, temperature from collar, CO2, NH3, etc.), and behavioral metrics derived from GPS (daily distance, average speed, and positional means), complemented by fixed animal characteristics (breed, age) and, when available, cortisol concentrations as a physiological indicator related to stress responses.
Before modeling, in order to evaluate potential multicollinearity, all predictor variables were examined for intercorrelation through correlation heat maps. A moderate correlation magnitude (absolute Pearson correlation of approximately 0.5) was used as a reference threshold to identify moderate-to-strong associations and to aid interpretation of predictor relationships. Importantly, this correlation analysis was used for exploratory assessment only, and no automated correlation-based feature elimination was applied prior to model development.
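The exploratory screening step can be illustrated with a small Python sketch that computes pairwise Pearson correlations and lists pairs at or above the |r| ≈ 0.5 reference level. It only flags pairs for inspection and removes nothing, in line with the text. The function names and the toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated(columns, threshold=0.5):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(columns.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy values (not study data): THI and temperature move together.
toy = {
    "thi": [60.0, 65.0, 70.0, 75.0],
    "temp": [15.0, 18.0, 21.0, 24.0],
    "nh3": [5.0, 3.0, 6.0, 4.0],
}
pairs = flag_correlated(toy)
```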
Modeling was conducted using several statistical and machine learning methods: linear regression (LR), PLSR, RF, XGBoost, MLP, EN, SVR, ensemble learning, and mixed-effects models. Each model was trained and validated on identical datasets across six temporal windows (day-1 to day-6), allowing for a robust comparison of predictive stability and performance.
Initial exploratory regression visualizations were created and used to demonstrate the connections between important behavioral and environmental predictors and productive performance.
Prior to model fitting, all numeric predictors were centered and scaled using z-score normalization to ensure comparability across variables with different units and ranges. Normalization was implemented within the preprocessing workflows and estimated using training data only within each resampling fold. Hyperparameter tuning was conducted using nested cross-validation. For random forest models, the number of variables randomly sampled at each split and the number of trees were tuned. For extreme gradient boosting models, tree depth, learning rate, and the number of boosting iterations were optimized. Support vector regression models were tuned over kernel type, cost (regularization parameter), and kernel width, while multilayer perceptron models were implemented as feed-forward neural networks with tuning focused on network size and regularization parameters. Elastic net models were tuned over mixing and penalty parameters.
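The fold-wise z-score normalization can be sketched in a few lines of Python. The point of the example is that the mean and standard deviation are estimated on the training fold and then reused unchanged on the held-out fold. The helper names are hypothetical, and the use of the sample standard deviation is an assumption of this sketch.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Estimate centering and scaling parameters from training data only."""
    return mean(train_values), stdev(train_values)


def apply_scaler(values, mu, sd):
    """Apply previously fitted parameters to any fold (train or test)."""
    return [(v - mu) / sd for v in values]


# Parameters come from the training fold ...
mu, sd = fit_scaler([1.0, 2.0, 3.0])
# ... and are reused, not re-estimated, on the held-out fold.
scaled_test = apply_scaler([4.0], mu, sd)
```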
A comprehensive regression framework was implemented for continuous outcomes. Linear models and partial least squares regression (PLSR) were utilized for interpretability and for addressing residual multicollinearity, while elastic net (EN) regularization aimed to balance bias and variance. Random forest and extreme gradient boosting (XGBoost) captured complex nonlinear interactions, and support vector regression (SVR) and feed-forward neural networks (multilayer perceptron, MLP) modeled highly nonlinear response patterns. A linear mixed-effects model (LMM) incorporated animal identity as a random effect to account for repeated measures. In addition, a simple ensemble technique combined predictions from RF, XGBoost, and EN to improve robustness and predictive accuracy.
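The text does not state how the ensemble combined its members. One common "simple" choice is an unweighted mean of the member predictions, sketched below in Python as an assumption rather than as the authors' method. The function name and the toy numbers are illustrative only.

```python
def average_ensemble(*prediction_sets):
    """Unweighted mean of several models' predictions, row by row.

    Each argument is one model's predictions for the same rows in the
    same order (e.g. RF, XGBoost, EN). Equal weighting is an assumption.
    """
    lengths = {len(p) for p in prediction_sets}
    if len(lengths) != 1:
        raise ValueError("all prediction sets must have the same length")
    return [sum(row) / len(row) for row in zip(*prediction_sets)]


# Toy predictions from three hypothetical models for two ewes.
combined = average_ensemble([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
```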
Model training and evaluation followed a nested cross-validation (CV) protocol to ensure model reliability and generalizability. In this scheme, the dataset was divided into five outer folds used for unbiased model evaluation and inner folds used for hyperparameter tuning, minimizing the risk of overfitting during model selection. For the main analyses, the data were split into training (70%) and testing (30%) sets. Additionally, a grouped, leave-one-animal-out cross-validation (LOAOCV) strategy was applied to test how well models could generalize across individual animals. In LOAOCV, all records from one animal are excluded from model training and then used as the test set, allowing performance to be evaluated on entirely unseen individuals. This approach reflects real-world deployment scenarios in which predictions must extend to new animals not previously observed by the model.
Animal identity was therefore explicitly respected as a grouping factor during resampling for all machine learning models through the LOAOCV procedure, preventing information leakage across repeated measurements from the same ewe.
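The grouping logic of LOAOCV can be made concrete with a short Python sketch. It only generates the row indices for each split and leaves model fitting aside. The generator name `leave_one_animal_out` and the ewe identifiers are hypothetical.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out_id, train_idx, test_idx), one split per animal.

    animal_ids holds the ewe identifier of every record, in row order.
    All records of the held-out ewe go to the test set, so repeated
    measurements of one animal never appear on both sides of a split.
    """
    for held_out in sorted(set(animal_ids)):
        test_idx = [i for i, a in enumerate(animal_ids) if a == held_out]
        train_idx = [i for i, a in enumerate(animal_ids) if a != held_out]
        yield held_out, train_idx, test_idx


# Five records from three ewes give three splits.
ids = ["ewe1", "ewe1", "ewe2", "ewe3", "ewe2"]
splits = list(leave_one_animal_out(ids))
```

In contrast to a random row-wise split, no ewe contributes to both training and testing here, which is the property that makes the resulting scores an estimate of performance on unseen animals.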
Model performance was quantified using the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE) [23], Pearson’s correlation coefficient (r), and the concordance correlation coefficient (CCC). All models were developed in R using the tidymodels framework and specialized libraries for tree-based methods, neural networks, and mixed-effects modeling. The inclusion of lagged environmental and behavioral predictors (e.g., day-1, day-2 values) enabled the models to capture both immediate and delayed effects of farm management and environmental conditions on animal welfare and productivity.
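The five metrics can be computed from observed and predicted values as in the Python sketch below. The definitions used are assumptions, since the text does not spell them out: R2 is taken as 1 − SS_res/SS_tot (some toolkits instead report the squared correlation), and CCC follows Lin's formulation with population (1/n) variances. The function name and the toy values are illustrative.

```python
from math import sqrt


def regression_metrics(obs, pred):
    """R2, RMSE, MAE, Pearson r and Lin's CCC for paired observations."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    var_o = ss_tot / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "r": cov / sqrt(var_o * var_p),
        # CCC penalizes both scatter and systematic shift from the 1:1 line.
        "ccc": 2.0 * cov / (var_o + var_p + (mo - mp) ** 2),
    }


# Toy case: predictions are perfectly correlated but biased by +0.5.
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this toy case r equals 1 while CCC is about 0.91 and R2 is 0.8, which illustrates why agreement measures are reported alongside correlation: a constant bias leaves r untouched but lowers CCC.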

2.3. Ethical Approval

All experimental procedures, including blood sampling and monitoring of animals, were examined and approved by the Animal Research Ethic Committee of the Agricultural University of Athens. In accordance with Article 23, Paragraph 1 of Law 4521/2018, the committee assessed the submitted study protocol and related documents and granted approval (protocol number 96/26.09.2025). All applicable guidelines and regulations concerning the ethical treatment of animals were followed during the conduct of this study.

3. Results

Basic descriptive statistics of the modeling dataset, including the number of observations per breed and mean values of key environmental variables and milk yield, are presented in Table 1 to contextualize the subsequent analyses.

3.1. Descriptive Visualization of Key Relationships

The relationship between the temperature–humidity index (THI) and daily milk yield (DMY) is illustrated in Figure 3, which shows individual observations for Chios (X) and Lesvos (M) ewes together with a smoothed regression line and its confidence interval. A similar visualization is provided in Figure 4 for daily distance traveled and DMY, again separating the two breeds and displaying both the distribution of measurements and the fitted trend line. These plots offer a descriptive overview of the data structure prior to the application of the modeling framework.

3.2. Model-Specific Results

3.2.1. Linear Regression (LR)

Linear models provided modest predictive performance across welfare indicators, with R2 ranging from 0.18 to 0.46 for most of the outcome variables. The model captured broad trends between productivity and environmental conditions but was limited in explaining nonlinear relationships. For daily milk yield, linear regression reached an R2 of approximately 0.42 under the best-performing day 2 and day 4 configurations, while for temperature of the medial canthus of the eye, performance peaked at R2 = 0.48–0.51. The model underperformed for behavioral metrics such as daily distance moved (R2 ≈ 0.28–0.34), reflecting its inability to model complex interactions between environmental stressors and mobility, and RR peaked at R2 ≈ 0.36. Under CV and LOAOCV, the same linear model showed the following performance: milk yield R2 ≈ 0.18–0.35, temperature of the medial canthus of the eye R2 ≈ 0.41–0.59, daily distance R2 ≈ 0.39, and RR R2 ≈ 0.19. Despite the drop in accuracy for some traits, overfitting remained low, with similar patterns between training and validation sets, suggesting stable but limited generalization.

3.2.2. Partial Least Squares Regression (PLSR)

Partial least squares regression improved model stability by addressing multicollinearity among predictors. Across welfare indicators, R2 values varied between 0.43 and 0.59, with the highest performance achieved for daily milk yield and temperature of the medial canthus of the eye. The method effectively reduced the noise from correlated predictors such as THI, temperature, and CO2, which are inherently interdependent in intensive housing systems. Under both CV and LOAOCV, PLSR maintained a consistent ranking with only a slight decrease in R2 values (typically by 0.05–0.10), indicating low overfitting and reliable generalization across animals. Although PLSR reached a higher R2 than linear regression, the marginal gains showed that its linear component structure is not sufficient to fully represent nonlinear dynamics.

3.2.3. Random Forest (RF)

The RF model delivered strong predictive performance across all dependent variables, with R2 values from 0.38 to 0.69. For DMY, RF consistently achieved R2 values between 0.37 and 0.41, surpassing the performance of other single models. R2 reached 0.65–0.69 for the temperature of the medial canthus of the eye and approximately 0.62 for daily distance moved. Under CV and LOAOCV, RF sustained its high performance, with R2 values varying from 0.38 to 0.66 and from 0.38 to 0.58, respectively. The algorithm also produced the most consistent CCC values across all target variables, demonstrating robust agreement between observed and predicted values. The model’s ensemble architecture decreased noise sensitivity and enhanced generalization. Nonetheless, its higher training accuracy compared with the validation results indicated moderate overfitting. These findings emphasize RF’s capability to model nonlinear dependencies and complex feature interactions. Feature importance analysis revealed that BCS, THI, and daily distance moved were the dominant predictors of productivity and welfare, indicating intertwined effects of body condition and thermal stress. Similar results have been reported in studies applying RF to animal monitoring data; by constructing multiple decision trees on random subsets of covariates, RF improves predictive accuracy and reduces overfitting.

3.2.4. Extreme Gradient Boosting (XGBoost)

Extreme gradient boosting showed comparable (and in some cases slightly superior) performance to RF, with R2 values ranging from 0.38 to 0.66. Its gradient boosting mechanism produced encouraging results for a large share of the predictor combinations. XGBoost achieved up to R2 = 0.66 for the temperature of the medial canthus of the eye, and R2 values between 0.38 and 0.59 for daily distance moved and DMY. Its CCC values were also high, illustrating strong agreement between the observed and predicted values. Under CV and LOAOCV, its performance remained consistent, with R2 values of 0.33–0.72 and 0.33–0.73, respectively, indicating stable learning behavior despite the individual-level variability in the dataset.

3.2.5. Neural Network (MLP)

The MLP model demonstrated moderate to strong performance, with R2 values between 0.30 and 0.66 for most indicators. The neural network effectively identified nonlinear patterns, although its performance was influenced by the sample size and data variability. For instance, the temperature of the medial canthus of the eye reached R2 ≈ 0.66, followed by daily activity with a maximum of R2 ≈ 0.54, RR with R2 ≈ 0.39, and average daily milk yield. The model’s predictive accuracy under CV and LOAOCV displayed only moderate variations (R2 ≈ 0.30–0.71 and 0.26–0.71, respectively), supporting the model’s generalization ability across individuals.

3.2.6. Elastic Net (EN)

Elastic net regression produced consistent yet moderate predictive performance (R2 ≈ 0.20–0.57). By combining L1 (lasso) and L2 (ridge) regularization, the method extends ordinary linear regression by simultaneously performing variable selection (through L1 shrinkage of some coefficients to zero) and stabilizing correlated predictors (through L2 penalization) [24]. Evaluation through CV and LOAOCV yielded stable R2 values (≈0.20–0.62), confirming the reliability of the model across folds and individuals, despite its lower predictive ability compared with nonlinear methods. Elastic net regression successfully reduced the effect of multicollinearity among environmental predictors (THI, CO2, NH3) and behavioral parameters. Nonetheless, its predictive accuracy remained lower than that of ensemble and nonlinear methods, since elastic net is not able to capture higher-order interactions.
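To make the combined L1/L2 penalty concrete, the following minimal sketch evaluates the penalty term under one common parametrization (an overall strength `alpha` and a mixing weight `l1_ratio`, as used for example by scikit-learn). This parametrization, the function name, and the coefficient values are assumptions for illustration; the study does not report its exact penalty settings.

```python
def elastic_net_penalty(coefficients, alpha, l1_ratio):
    """Return alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2).

    l1_ratio = 1 gives a pure lasso penalty, l1_ratio = 0 a pure ridge penalty.
    """
    if alpha < 0 or not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("alpha must be >= 0 and l1_ratio must lie in [0, 1]")
    l1_norm = sum(abs(b) for b in coefficients)
    l2_norm_sq = sum(b * b for b in coefficients)
    return alpha * (l1_ratio * l1_norm + 0.5 * (1.0 - l1_ratio) * l2_norm_sq)


# Hypothetical coefficient vector; the zero entry mimics a predictor that the
# L1 part of the penalty has removed from the model.
example_penalty = elastic_net_penalty([1.0, -2.0, 0.0], alpha=0.5, l1_ratio=0.5)
```

The fitted coefficients minimize the squared prediction error plus this term, so larger `alpha` values shrink coefficients more strongly, while `l1_ratio` shifts the balance between variable selection and stabilization of correlated predictors.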

3.2.7. Ensemble Modeling

By combining predictions from a linear model and two tree-based learners, the ensemble model achieved balanced performance, with R2 values of 0.38–0.66 for most indicators. Additionally, ensemble averaging enhanced stability and decreased variance across days. CV produced R2 ≈ 0.32–0.70, while LOAOCV yielded R2 values between 0.30 and 0.69, displaying resilience to both random resampling and animal-level heterogeneity.
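The averaging step can be sketched as follows. The example assumes equal weights for the three base learners, which the study does not specify; the model labels and prediction values are hypothetical.

```python
def average_ensemble(predictions_by_model, weights=None):
    """Combine per-model prediction lists into one weighted-average prediction list."""
    names = list(predictions_by_model)
    lengths = {len(predictions_by_model[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same, non-empty set of observations")
    if weights is None:
        # Assumption: equal weighting of the base learners.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_obs = lengths.pop()
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names) / total_weight
        for i in range(n_obs)
    ]


# Hypothetical predictions for two observations from one linear and two tree-based models.
base_predictions = {
    "linear": [1.0, 2.0],
    "random_forest": [2.0, 2.0],
    "xgboost": [3.0, 5.0],
}
ensemble_prediction = average_ensemble(base_predictions)
```

Averaging reduces the variance contributed by any single learner, which is one plausible reason for the stability described above.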

3.2.8. Mixed-Effects Model

The mixed-effects model achieved strong in-sample performance (R2 ≈ 0.62) but substantially lower accuracy on the test set (R2 ≈ 0.38, ΔR2 ≈ 0.24), indicating moderate overfitting. This discrepancy suggests structural overfitting to animal-specific random intercepts: the model effectively captured variance within individuals but generalized with lower reliability across animals. The excessive flexibility of the model relative to the available sample size (n = 773) also contributed to variance overfitting. After applying parameter penalization and CV, performance stabilized at R2 ≈ 0.31–0.64, while LOAOCV resulted in R2 ≈ 0.31–0.62, confirming limited yet consistent generalization across individuals.
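The train–test gap quoted above follows directly from the definition of the coefficient of determination. The sketch below assumes the standard form R2 = 1 − SS_res/SS_tot; the function names are illustrative and are not taken from the study's code.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equally long, non-empty sequences")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


def r2_gap(train_r2, test_r2):
    """Train-test difference in R2; larger positive values suggest overfitting."""
    return train_r2 - test_r2


# Values reported for the mixed-effects model: train R2 of about 0.62, test R2 of about 0.38.
mixed_effects_gap = r2_gap(0.62, 0.38)
```

Note that R2 becomes negative whenever a model predicts worse than the mean of the observations, which is why negative test values are treated as a sign of severe overfitting.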

3.2.9. Support Vector Regression (SVR)

Support vector regression produced stable and robust results across welfare indicators, achieving R2 values from 0.27 to 0.66 depending on kernel configuration. Its strong performance for medial canthus eye temperature (R2 ≈ 0.66) and daily milk yield (R2 ≈ 0.38) illustrated effective nonlinear mapping between physiological and environmental predictors and welfare outcomes. The train–test R2 gap (ΔR2 ≈ 0.06) across most targets indicated minimal overfitting and strong generalization capacity. After CV, performance remained consistent (R2 ≈ 0.23–0.64), and after LOAOCV, predictive accuracy was mostly maintained (R2 ≈ 0.21–0.64), showing that SVR generalized well across individuals.

3.3. Comparative Model Performance

Random forest and XGBoost consistently achieved the highest R2 values across all target variables, with ensemble learning closely following. The neural network performed competitively but with slightly higher variance, whereas PLSR and EN produced moderate yet interpretable solutions. Mixed-effects models presented high fit but poor predictive transferability, highlighting that hierarchical models are preferable for inferential, not predictive purposes under small-to-medium sample constraints. For both DMY and medial canthus eye temperature, models that incorporated combined environmental and behavioral predictors (e.g., THI, daily distance moved, BCS, blood cortisol, rectal temperature) performed better than those using only physiological parameters, confirming the multifactorial nature of welfare outcomes.
Across models, predictive performance was generally stable across temporal aggregation windows, with marginal improvements observed when short-term lagged predictors (day-1 to day-3) were included, while longer temporal windows provided limited additional gains.

3.4. Interpretation Across Welfare Indicators

Given the multimodal nature of the dataset and the variable availability across measurement systems, the contribution of animals and predictors differed among welfare indicators. Daily milk yield, RR, BCS, and thermographic measurements were available for all 90 ewes, whereas GPS-derived behavioral metrics and collar-based temperature were available only for the 30 collared animals. Monthly serum cortisol concentrations were exclusively available for this collared subset and were therefore included only in models fitted on this reduced dataset. Predictors not available for all animals were not imputed across individuals; instead, models were trained using the largest compatible subset of observations for each outcome–predictor combination.
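The subset selection described above amounts to complete-case filtering per outcome–predictor combination, without imputation. A minimal sketch follows; the record layout and field names (`ewe_id`, `thi`, `bcs`, `distance_m`, `dmy`) and all values are hypothetical and do not reflect the study's actual data schema.

```python
def compatible_subset(records, predictors, outcome):
    """Keep only records in which the outcome and every requested predictor are present."""
    required = list(predictors) + [outcome]
    return [row for row in records if all(row.get(key) is not None for key in required)]


# Hypothetical daily records: only ewe "e1" wears a collar and has a GPS distance.
example_rows = [
    {"ewe_id": "e1", "thi": 70.0, "bcs": 3.0, "distance_m": 1200.0, "dmy": 1.4},
    {"ewe_id": "e2", "thi": 70.0, "bcs": 2.5, "distance_m": None, "dmy": 1.1},
    {"ewe_id": "e3", "thi": 70.0, "bcs": 3.5, "dmy": 1.9},
]

all_ewes = compatible_subset(example_rows, ["thi", "bcs"], "dmy")
collared_only = compatible_subset(example_rows, ["thi", "bcs", "distance_m"], "dmy")
```

Adding a collar-derived predictor shrinks the usable dataset to the collared animals, which is why sample sizes differ between models and target variables.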

3.4.1. Daily Milk Yield (DMY)

Daily milk yield in the dataset ranged from 0.0 to 3.9 L/day, with a mean of 1.40 L/day and a standard deviation of 0.80 L/day (n = 327). These values align with typical production levels for Chios and Lesvos dairy sheep, where average yields generally decrease across the lactation period, fluctuating between 1.2 and 2.0 L/day in commercial Mediterranean dairy systems. The reported range therefore appears biologically plausible, reflecting both low-yield and high-yield individuals within the flock. Daily milk yield was predicted most accurately by RF (R2 ≈ 0.50) and XGBoost (R2 ≈ 0.42). Although these R2 values might seem moderate (Table 2), they represent meaningful predictive accuracy given the biological and environmental variability in the data. Milk yield is affected by a variety of physiological, behavioral, and microclimatic factors, many of which are stochastic or not detected by sensor measurements. In such multifactorial systems, explaining 40–50% of the total variance is considered robust performance, suggesting that the models captured the dominant environmental and behavioral drivers of productivity. Due to their repeated-measures architecture, mixed models tended to overfit, whereas tree-based models generalized better in heterogeneous farm conditions (Table 3).
Table 2 shows the best-performing predictor combinations for each algorithm under standard evaluation conditions. For each model, the independent variables set that reached the highest test set R2 was selected from the full library of tested feature combinations. Rows 1–3 address the optimal configuration for LR, rows 4–6 for PLSR, rows 7–9 for RF, and so on, with each triplet of rows also reporting firstly the standard train–test evaluation, followed by the nested CV and LOAOCV results. In cases where the same predictor subset also reached the top performance for another model too, the results are listed only once. This brief presentation allows direct comparison of each algorithm’s maximum predictive capability for DMY, without burdening the reader with numerous predictor combinations examined during model development.
Table 3 summarizes model generalization behavior by comparing training and test results for each algorithm. Overfitting was evaluated using practical criteria based on the divergence between training and test performance. A small rise in error (RMSE/MAE) from train to test (≤10–20%) together with a minor decline in R2 or CCC (≤0.05–0.10) was classified as low overfitting. A moderate increase in error (≈20–50%) or a moderate drop in R2 or CCC (≈0.10–0.30) indicated moderate overfitting. Large discrepancies, such as an increase in error well above 50%, or R2/CCC dropping to near zero or even negative values, were interpreted as severe overfitting. Predictive performance was also evaluated separately using only the test set R2 and CCC, as these scale-independent metrics provide a more meaningful representation of biological prediction accuracy than RMSE or MAE, which depend on the absolute range of milk yield (0–3.9 L/day). Performance categories were assigned as follows: R2 and/or CCC below 0.30 corresponds to weak performance, values from 0.30 to 0.60 correspond to moderate performance, and values above 0.60 correspond to good predictive ability. These empirically adopted criteria allowed characterization of both overfitting severity and predictive strength for each modeling approach [25].
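These criteria can be expressed as a small rule set. Because the text gives the thresholds as ranges, the sketch below assumes the upper bound of each range as the cut-off (20% and 50% for the relative error increase, 0.10 and 0.30 for the R2 or CCC drop) and treats a non-positive test R2 as severe; these choices, the function names, and the error values in the example are illustrative assumptions rather than the authors' implementation.

```python
def classify_overfitting(train_r2, test_r2, train_error, test_error):
    """Label overfitting as 'low', 'moderate', or 'severe' from train and test metrics."""
    if train_error <= 0:
        raise ValueError("train_error must be positive")
    r2_drop = train_r2 - test_r2
    error_rise = (test_error - train_error) / train_error  # relative RMSE or MAE increase
    if test_r2 <= 0.0:
        return "severe"  # assumption: a collapsed test R2 counts as severe
    if error_rise <= 0.20 and r2_drop <= 0.10:
        return "low"
    if error_rise <= 0.50 and r2_drop <= 0.30:
        return "moderate"
    return "severe"


def classify_performance(test_metric):
    """Label a test-set R2 or CCC as 'weak', 'moderate', or 'good'."""
    if test_metric < 0.30:
        return "weak"
    if test_metric <= 0.60:
        return "moderate"
    return "good"


# R2 values as reported for the mixed-effects model; the RMSE values are hypothetical.
mixed_label = classify_overfitting(train_r2=0.62, test_r2=0.38, train_error=0.50, test_error=0.60)
```

Under these assumed cut-offs, the reported R2 drop of about 0.24 alone places the mixed-effects model in the moderate category, in line with the interpretation given in Section 3.2.8.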

3.4.2. Medial Canthus of the Eye

The medial canthus eye temperature demonstrated relatively low variability in the dataset, ranging from 35.0 to 41.9 °C, with a mean of 38.0 °C and a standard deviation of 0.87 °C. This range aligns with the expected physiological surface temperature in sheep, typically 38.3–39.9 °C under non-pathological conditions. The small spread and low coefficient of variation indicate that infrared thermography of the medial canthus captures a stable physiological trait, moderately affected by environmental or behavioral changes [26]. This was one of the most predictable targets, with R2 ≈ 0.70 and 0.65 for RF and XGBoost, respectively. Environmental and physiological variables, particularly THI and skin temperature measured by collar, also made an important contribution. Elevated ambient temperature and CO2 levels were consistently associated with increased medial canthus temperature, consistent with a stress response. Table 4 presents the best-performing predictor combinations for each modeling algorithm in estimating medial canthus eye temperature (°C). From the full set of tested feature subsets, the predictor subset yielding the highest test set R2 under the standard train–test split was selected as the representative model for each algorithm. In cases where the same predictor subset also reached the top performance for another model, the results are listed only once, because repeating identical entries would not enhance interpretability. Where differences existed, the triplet of evaluations (standard split, nested CV, and LOAOCV) appears in consecutive rows. Given the relatively narrow biological range of medial canthus temperature, even modest improvements in accuracy are physiologically meaningful, and this table summarizes each algorithm’s strongest performance under three complementary evaluation frameworks.
Table 5 summarizes the generalization behavior of all models for medial canthus eye temperature by comparing their training and test metrics. Overfitting was evaluated using the same practical magnitude-based thresholds applied uniformly across all target variables in this study. Specifically, an increase of ≤10–20% in RMSE or MAE from train to test, combined with a decline in R2 or CCC ≤ 0.05–0.10, was classified as low overfitting; increases of ≈ 20–50% in error or drops of ≈0.10–0.30 in R2/CCC indicated moderate overfitting; and larger discrepancies—such as major error inflation or substantial loss of R2/CCC—were interpreted as severe overfitting.

3.4.3. Respiratory Rate (RR)

In this dataset, RR ranged from 24 to 138 breaths/min, with a mean of 43.7 breaths/min and a standard deviation of 16.5 breaths/min (n = 758). Respiratory rate showed moderate-to-low predictability, with R2 ≈ 0.46 under RF and XGBoost. Model accuracy improved when the THI, NH3, temperature variables, and animal activity were integrated, underscoring their combined impact on respiration under heat load. Elastic net and linear methods performed poorly (R2 < 0.25), showcasing the need for nonlinear modeling to capture stress-induced respiratory fluctuations. These results align with findings in livestock physiology, where heat stress markers present a nonlinear increase beyond threshold temperatures. This is illustrated in Figure 5, where a boxplot comparison of the observed and predicted RR shows that the measured data display wider variability and several physiologically plausible outliers, while the model predictions remain more tightly clustered, reflecting the natural smoothing imposed by the regression algorithms. The boxes depict the interquartile range, whiskers represent values within the typical 20–90 breaths/min interval, and black dots indicate outliers.
Table 6 provides a summary of the highest-performing predictor subsets for estimating the RR for each algorithm, following the same selection strategy applied to the two previous target variables. For every modeling approach, we chose the predictor configuration that achieved the highest test set R2 under the standard train–test split, while also reporting the corresponding nested CV and LOAOCV results. As before, when a predictor set produced the top R2 for multiple algorithms, it is listed only once to avoid unnecessary repetition. This structure allows direct comparison of the best achievable performance for each method in predicting RR while maintaining clarity. Table 7 presents the generalization behavior of the RR models by comparing the train and test results for each algorithm. Overfitting categories were assigned using the same criteria applied to the previous targets: small train–test divergence indicates low overfitting; larger discrepancies indicate moderate or severe overfitting. Predictive performance was also evaluated consistently using only test set R2 and CCC, classifying models as weak, moderate, or good depending on the magnitude of these scale-independent metrics. By using the same interpretive thresholds across all target variables, the performance of RR models can be compared directly to the results obtained for eye temperature and daily milk yield.

3.4.4. Daily Distance

Daily animal movement distances in our dataset ranged from 0 to 5251 m/day, with a mean movement distance of 1435 m/day (SD = 1011 m). This substantial variability reflects the different housing and management conditions across recording days, since the distance a ewe travels depends both on the available space inside the barn and on the extent to which animals are moved for outdoor grazing. Daily distance moved was effectively predicted, with RF achieving R2 ≈ 0.62 and XGBoost achieving R2 ≈ 0.59, indicating that behavioral activity can be modeled with high accuracy using environmental and body condition features. Ensemble approaches produced comparable R2 values of roughly 0.60. Predictors such as NH3 and CO2 concentrations were negatively correlated with distance, consistent with behavioral thermoregulation (reduced movement under stress).
The predictive performance for daily distance moved is summarized in Table 8 and Table 9, where each algorithm is evaluated using the best-performing feature subset identified during model development. Both tables follow the same structure used for the previous target variables, reporting standard evaluation, alongside their corresponding nested CV and LOAOCV results to ensure a transparent comparison of generalization behavior.

4. Discussion

This study demonstrated that the integration of behavioral, physiological, and environmental sensor data enables the prediction of important welfare and productivity traits in dairy sheep under Mediterranean conditions. Nonlinear algorithms, particularly RF, XGBoost, and SVR, consistently outperformed linear baselines, confirming that welfare-related traits are shaped by complex, nonlinear interactions between environmental stressors and animal responses. Although the LR model was less accurate, it showed low overfitting, with comparable trends across training and validation sets, indicating stable yet limited generalization. Similar limitations were noted in earlier precision livestock research, where linear approaches captured only part of the variance of animal behavior under fluctuating microclimatic conditions [27]. Similarly, the slight improvements obtained with PLSR indicated that its linear component structure cannot fully capture nonlinear relationships. These findings align with reports that PLSR performs well in moderate-dimensional datasets but often saturates when nonlinearities prevail [28]. PLSR is considered especially useful for constructing prediction equations when there are many explanatory variables but comparatively few samples, as it was designed to deal with multiple regression when the data have a small sample size, missing values, or multicollinearity [29]. The RF algorithm performs implicit feature selection and builds decorrelated decision trees, making it an effective method, especially in datasets with numerous features [30,31]. For gradient boosting methods such as XGBoost, comparably strong performance in livestock thermal and behavioral prediction has been reported in recent data-driven studies on animal welfare [32].
The performance of a multilayer perceptron may surpass that of tree-based algorithms when data volume is large; however, overfitting remains a risk in medium-sized agricultural datasets. To mitigate this issue, regularization and early stopping were applied, and they proved effective for most predictor combinations. The consistent behavior of the elastic net model across days indicated its suitability for scenarios where interpretability and transparency have higher priority than maximizing predictive power [14]. For the ensemble model, the findings align with previous evidence that combining learners improves generalization in sensor-based livestock prediction tasks and mitigates the overfitting of single learners [33]. Even though its predictive performance was lower than that of ensemble and tree-based approaches, the mixed-effects framework is still theoretically appropriate for repeated-measures animal data, as it explicitly addresses hierarchical dependencies (e.g., multiple days per ewe) and inter-individual variability. Therefore, while its use is statistically justified, its predictive reliability was constrained by the dataset’s dimensionality and sampling structure. Because the mixed-effects fits showed signs of overfitting and limited generalization, they were considered exploratory and are discussed as methodological limitations and directions for future work. Finally, the SVR model was effective for nonlinear, moderately sized datasets: despite its limited interpretability, its kernel-based formulation facilitates accurate and stable predictions from noisy inputs [34]. The stronger performance of tree-based and ensemble models in this study can be attributed to their ability to capture nonlinear relationships, higher-order interactions among predictors, and complex dependencies arising from repeated measurements within animals.
Unlike linear models, these approaches are robust to multicollinearity and can flexibly model threshold effects commonly observed in physiological and behavioral responses, which likely contributed to their improved generalization under the present data structure. Overall, the predictive accuracy reached biologically meaningful levels. From an on-farm decision support perspective, prediction performance should be interpreted primarily in terms of its ability to support monitoring, early warning, and management prioritization rather than precise point estimation. In applied precision livestock and agricultural decision support systems, moderate predictive performance can be operationally meaningful under commercial conditions characterized by high biological variability and measurement noise [35]. Accordingly, RMSE values should be interpreted relative to the natural variability of each trait rather than as absolute prediction errors.
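One simple way to read RMSE against a trait's natural variability is to divide it by the standard deviation of the observed values. The sketch below assumes the population standard deviation of the same evaluation set, in which case the ratio is tied to R2 through R2 = 1 − (RMSE/SD)^2; the function names and numbers are illustrative and not drawn from the study.

```python
from math import sqrt
from statistics import pstdev


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equally long, non-empty sequences")
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_rmse(observed, predicted):
    """RMSE expressed as a fraction of the population SD of the observations."""
    spread = pstdev(observed)
    if spread == 0:
        raise ValueError("relative RMSE is undefined for constant observations")
    return rmse(observed, predicted) / spread


# Hypothetical observations and predictions on an arbitrary scale.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
ratio = relative_rmse(obs, pred)
implied_r2 = 1.0 - ratio ** 2
```

A ratio well below 1 means the model error is small compared with the spread of the trait, whereas a ratio near 1 means the model does little better than predicting the mean.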
In addition to the overall predictive accuracy, the choice of a temporal aggregation window has important practical implications for on-farm decision support. Shorter aggregation windows (e.g., day-1 to day-2) favor responsiveness and are better suited for the early detection of acute changes in animal status, such as rapid increases in thermal load or abrupt alterations in activity patterns. In contrast, longer aggregation windows (e.g., day-4 to day-6) improve robustness by smoothing short-term fluctuations and reducing the influence of transient variability, which may be advantageous for monitoring more stable welfare trends. The relatively consistent performance observed across temporal windows in this study suggests a trade-off between responsiveness and stability, indicating that shorter windows may be more appropriate for real-time monitoring, whereas longer windows may support retrospective assessment or strategic management decisions on farms.
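A lagged aggregation window can be sketched as the mean of a predictor over a range of preceding days. The example assumes that a window such as "day-1 to day-3" denotes the mean over the three days before the target day and that each series is an ordered, gap-free daily sequence for one animal or barn; the study does not detail its exact aggregation, and the THI values shown are hypothetical.

```python
def lagged_window_mean(daily_values, first_lag, last_lag):
    """Mean of the values observed first_lag..last_lag days before each day.

    Days without a complete history receive None instead of a partial mean.
    """
    if not 1 <= first_lag <= last_lag:
        raise ValueError("lags must satisfy 1 <= first_lag <= last_lag")
    features = []
    for day in range(len(daily_values)):
        start = day - last_lag       # oldest day included in the window
        stop = day - first_lag + 1   # slice end (exclusive), excludes the target day
        if start < 0:
            features.append(None)
        else:
            window = daily_values[start:stop]
            features.append(sum(window) / len(window))
    return features


# Hypothetical daily THI series for five consecutive days.
thi_series = [70.0, 72.0, 74.0, 76.0, 78.0]
short_window = lagged_window_mean(thi_series, first_lag=1, last_lag=2)
longer_window = lagged_window_mean(thi_series, first_lag=1, last_lag=3)
```

Shorter windows lose fewer days to incomplete history and follow recent changes closely, whereas longer windows smooth more strongly, mirroring the trade-off between responsiveness and stability described above.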
Thermal (temperature of the medial canthus of the eye) and behavioral (daily distance moved) indicators showed the highest accuracy, with R2 values of approximately 0.6–0.7, suggesting strong physiological linkage to environmental conditions. In contrast, RR and DMY exhibited greater variability, consistent with non-continuous measurements and the multiplicity of factors determining animal productivity, where important but often unmeasured factors (e.g., genetics, social dynamics) act as stochastic noise. In particular, RR showed the lowest overall predictability among the traits, reflecting its strong dependence on transient heat load, short-term activity bursts, and handling events, factors that introduce substantial biological variability and require high-frequency sensing for accurate modeling. Additionally, although normal resting RR in sheep typically ranges from 16 to 34 breaths/min, the broader and higher values noted here likely indicate periods of activity, environmental heat load, or handling-related stress [36]. Respiratory rate represents a direct physiological proxy of thermoregulatory response, whereas environmental variables such as the THI and gas concentrations act as external factors that indirectly influence respiratory patterns. Accordingly, model associations likely reflect indirect correlations mediated through heat load and activity rather than direct causal effects of environmental variables on respiratory rate. Nonetheless, explaining approximately 40–50% of the variance in milk yield demonstrates robust performance in this context. For DMY, similar levels of explained variance have been widely reported in animal behavior and welfare modeling, where high inter-individual variability and measurement noise inherently limit predictive ceilings [37,38,39].
From a biological perspective, the observed associations between milk yield, environmental conditions, and behavioral activity are consistent with known thermophysiological and metabolic responses in dairy sheep. Elevated thermal load increases maintenance energy demands and alters thermoregulation, while changes in activity reflect adaptive behavioral responses to environmental stress. Together, these processes influence the partitioning of energy between maintenance and production, providing a plausible physiological basis for the observed relationships, which should be interpreted as associative rather than causal in this preliminary analysis [40]. For the medial canthus temperature trait, machine learning models effectively captured these nonlinear physiological thresholds, consistent with findings reported in precision thermography studies [41]. For daily distance moved, the observed patterns agree with automated behavior tracking studies demonstrating inverse correlations between locomotion and microclimatic stress [42]. Together with these model-based findings, the exploratory visual relationships among traits further support the biological interpretation of the results. These visual analyses provide preliminary evidence that both activity and heat stress substantially affect the productivity of Lesvos and Chios ewes. Notably, the observed patterns align with physiological expectations: higher heat load is associated with decreased milk production, while increased movement is typically linked to better performance. In contrast, slowly changing traits such as body weight or body condition are inherently more difficult to predict from short-term sensor data, because they reflect cumulative nutritional balance and longer-term physiological adaptation.
This creates a temporal mismatch between rapidly varying environmental/behavioral predictors and gradually evolving outcomes, limiting the ability of short aggregation windows to capture meaningful variation in body weight-related traits.
The mixed-effects model effectively captured within-animal variability but offered limited cross-animal generalization, indicating moderate variance overfitting. In contrast, ensemble and tree-based models generalized well across individuals, as verified by nested CV and leave-one-animal-out validation.
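Leave-one-animal-out validation holds out every record of one ewe in turn, so that no animal contributes to both the training and the test fold. The following minimal sketch generates such index splits; the ewe identifiers are hypothetical, and the study's own implementation is not described at this level of detail.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out_animal, train_indices, test_indices) for each distinct animal."""
    for animal in sorted(set(animal_ids)):
        test_idx = [i for i, a in enumerate(animal_ids) if a == animal]
        train_idx = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train_idx, test_idx


# Hypothetical animal identifier for each daily record in a small dataset.
record_animals = ["ewe_1", "ewe_1", "ewe_2", "ewe_3", "ewe_3"]
folds = list(leave_one_animal_out(record_animals))
```

Because repeated measurements of the same ewe are correlated, this grouping gives a more conservative estimate of cross-animal generalization than a random split of daily records.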
Feature importance patterns showed that the temperature–humidity index, CO2, and body condition score were significant predictors across traits, emphasizing the combined effects of thermal load and energy balance on welfare results. All environmental variables, including NH3 and CO2, were measured at the barn (housing) level and shared across animals; therefore, their contribution should be interpreted as contextual indicators of environmental exposure rather than as animal-specific physiological measurements. These outcomes are consistent with earlier precision livestock studies indicating that environmental and behavioral indicators often outperform invasive physiological measures for continuous welfare monitoring [29,43]. While feature importance analyses were used to identify dominant predictors, the present study did not aim to provide causal or mechanistic explanations at the individual prediction level. Advanced explainability methods (e.g., SHAP-based decomposition) were therefore considered beyond the scope of this proof of concept and represent an important direction for future work.
Given the strong influence of environmental and behavioral predictors, additional physiological variables were considered to determine whether they could further strengthen model performance. In a preliminary analysis, the inclusion of cortisol (with the specific predictor combinations tested here) did not appear to meaningfully improve the predictive performance of the algorithms for the examined target traits. This finding indicates that cortisol did not contribute additional explanatory power beyond the existing multimodal feature set.
Monthly serum cortisol concentrations were used as an indicator of longer-term physiological stress load under typical management conditions rather than as a marker of acute stress. Given the context-sensitive nature of cortisol and its responsiveness to handling and management-related events—particularly when sampled after milking—associations involving cortisol in this study should be interpreted as correlational and reflective of cumulative exposure to environmental and management-related factors rather than diagnostic of stress status. This context may partly explain the limited additional predictive value of cortisol beyond behavioral and environmental indicators in the present multimodal framework.
Despite the encouraging performance, several limitations must be acknowledged. The dataset size (n = 773) and the sampling frequency limited the ability to fit more complex models. Environmental variables were measured at housing level, likely averaging out individual differences, and daily aggregation may have further obscured short-term stress responses. Nevertheless, the use of nested CV and LOAOCV helped limit overfitting and provided realistic assessments of model generalization. In addition, the single-farm design may limit the generalizability and transferability of the proposed models to other dairy sheep farming systems. Breed composition may influence behavioral patterns, physiological responses, and adaptability to environmental stressors, while housing characteristics and farm-specific management practices may alter the relationships between predictors and welfare outcomes. Although breed was included as a predictor, breed-specific models were not developed due to sample size imbalance and limited statistical power, particularly for the Lesvos breed. Consequently, model performance observed in this study may not directly translate to systems with different breeds, housing conditions, or management strategies, highlighting the need for external validation across multiple farms and farming systems.
Also, the present study should be regarded as a proof of concept rather than a fully validated decision support system. While the results demonstrate the feasibility of predicting welfare- and productivity-related indicators using multimodal sensor data and machine learning approaches, additional steps are required before practical deployment. These include validation on larger and multi-farm datasets, evaluation across different housing systems and management practices, assessment of real-time data processing constraints, and the translation of model outputs into actionable decision thresholds for farmers. Furthermore, this study provides one of the first demonstrations of integrating behavioral, physiological, and environmental sensor data to model welfare- and productivity-related indicators in dairy sheep, specifically in Chios and Lesvos breeds, for which multimodal approaches remain scarce. Conceptually, it supports the value of data fusion for welfare assessment in small ruminant systems, while methodologically, it highlights the role of nonlinear models and temporal aggregation under rigorous validation frameworks. Together, these findings contribute to precision livestock farming research by clarifying both the potential and the current limitations of sensor-driven, data-based welfare monitoring in dairy sheep. Future research should emphasize larger, multi-farm datasets and temporal models with the capability of capturing high-frequency behavioral dynamics. Combining explainable AI with multimodal sensing will further enhance proactive welfare management in precision sheep farming.

5. Conclusions

This study highlights that integrating environmental, physiological, and behavioral sensor data can provide a strong foundation for predicting welfare- and productivity-related indicators in dairy sheep under Mediterranean conditions. Nonlinear regression approaches, especially RF, XGBoost, SVR, and an ensemble of complementary learners, consistently performed better than linear and mixed-effects models, highlighting the significance of capturing nonlinear interactions between thermal stress, body condition, and animal behavior. Thermal (medial canthus eye temperature) and behavioral (daily distance moved) traits achieved the highest predictive accuracy, while DMY and RR showed only moderate predictability, reflecting their inherently multifactorial nature. The temperature–humidity index, CO2, NH3, and BCS emerged as significant contributors across traits in feature importance analyses. However, the limitations of the dataset of the present study (one farm, two breeds, a single lactation period, a relatively short temporal coverage, and reliance on housing-level environmental measures) do not permit fully capturing inter-individual heterogeneity and longer-term adaptive responses. Accordingly, the present work should be considered a proof of concept demonstrating the feasibility of multimodal prediction. Future research should prioritize validation using larger, multi-farm datasets, extended monitoring periods, and higher-frequency behavioral sensing, to assess the robustness and generalizability of the proposed modeling framework across diverse production contexts.

Author Contributions

Conceptualization, M.P.N. and T.B.; methodology, M.P.N., A.I.G., K.D. and T.B.; validation, M.P.N., A.I.G., K.D. and T.B.; formal analysis, M.P.N.; investigation, M.P.N., A.I.K., I.P., P.A.L., A.C. and E.M.; writing – original draft preparation, M.P.N.; data curation, M.P.N.; writing – review and editing, T.B., A.I.G., A.I.K. and K.D.; supervision, T.B.; funding acquisition, T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was implemented within the framework of the National Recovery and Resilience Plan “Greece 2.0”, funded by the European Union (NextGenerationEU), Project ID: ΥΠ1ΤA-0558858.

Institutional Review Board Statement

All experimental procedures, including blood sampling and monitoring of animals, were examined and approved by the Research Ethics and Deontology Committee (Ε.H.Δ.Ε.) of the Agricultural University of Athens. In accordance with Article 23, Paragraph 1 of Law 4521/2018, the committee assessed the submitted study protocol and related documents and granted approval. All applicable guidelines and regulations concerning the ethical treatment of animals were followed throughout the conduct of this study. The approval number is 97/26.09.2025, issued on 26 September 2025.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available at this time due to contractual restrictions and intellectual property agreements associated with an ongoing industrial research project conducted in collaboration with Telefarm S.A. and funded under the National Recovery and Resilience Plan “Greece 2.0” (European Union—NextGenerationEU, Project ID: ΥΠ1ΤA-0558858). Data will be made accessible upon completion of the project and release from the associated restrictions.

Acknowledgments

The authors would like to thank Telefarm S.A. for their invaluable support in preparing and conducting the experiments of this work. The resources, administrative assistance, and access to relevant materials provided by Telefarm S.A. were essential in enabling the authors to thoroughly analyze and compile the information presented in this manuscript. This support is gratefully acknowledged.

Conflicts of Interest

The author affiliated with Telefarm S.A. is Maria P. Nikolopoulou, who declares that this affiliation did not influence the design, execution, analysis, or reporting of the study. The authors declare no conflicts of interest.

References

  1. Kaler, J.; Mitsch, J.; Vázquez-Diosdado, J.A.; Bollard, N.; Dottorini, T.; Ellis, K.A. Automated Detection of Lameness in Sheep Using Machine Learning Approaches: Novel Insights into Behavioural Differences among Lame and Non-Lame Sheep. R. Soc. Open Sci. 2020, 7, 190824. [Google Scholar] [CrossRef] [PubMed]
  2. Ezenwa, V.O.; Archie, E.A.; Craft, M.E.; Hawley, D.M.; Martin, L.B.; Moore, J.; White, L. Host Behaviour–Parasite Feedback: An Essential Link between Animal Behaviour and Disease Ecology. Proc. R. Soc. B Biol. Sci. 2016, 283, 20153078. [Google Scholar] [CrossRef]
  3. Vermeulen, K.; Aerts, J.-M.; Dekock, J.; Bleyaert, P.; Berckmans, D.; Steppe, K. Automated Leaf Temperature Monitoring of Glasshouse Tomato Plants by Using a Leaf Energy Balance Model. Comput. Electron. Agric. 2012, 87, 19–31. [Google Scholar] [CrossRef]
  4. Price, E.; Langford, J.; Fawcett, T.W.; Wilson, A.J.; Croft, D.P. Classifying the Posture and Activity of Ewes and Lambs Using Accelerometers and Machine Learning on a Commercial Flock. Appl. Anim. Behav. Sci. 2022, 251, 105630. [Google Scholar] [CrossRef]
  5. Ikurior, S.J.; Marquetoux, N.; Leu, S.T.; Corner-Thomas, R.A.; Scott, I.; Pomroy, W.E. What Are Sheep Doing? Tri-Axial Accelerometer Sensor Data Identify the Diel Activity Pattern of Ewe Lambs on Pasture. Sensors 2021, 21, 6816. [Google Scholar] [CrossRef]
  6. Cabezas, J.; Yubero, R.; Visitación, B.; Navarro-García, J.; Algar, M.J.; Cano, E.L.; Ortega, F. Analysis of Accelerometer and GPS Data for Cattle Behaviour Identification and Anomalous Events Detection. Entropy 2022, 24, 336. [Google Scholar] [CrossRef] [PubMed]
  7. Jin, Z.; Shu, H.; Hu, T.; Jiang, C.; Yan, R.; Qi, J.; Wang, W.; Guo, L. Behavior Classification and Spatiotemporal Analysis of Grazing Sheep Using Deep Learning. Comput. Electron. Agric. 2024, 220, 108894. [Google Scholar] [CrossRef]
  8. Lachica, M.; Barroso, F.G.; Prieto, C. Seasonal Variation of Locomotion and Energy Expenditure in Goats under Range Grazing Conditions. J. Range Manag. 1997, 50, 234. [Google Scholar] [CrossRef]
  9. Caja, G.; Castro-Costa, A.; Salama, A.A.K.; Oliver, J.; Baratta, M.; Ferrer, C.; Knight, C.H. Sensing Solutions for Improving the Performance, Health and Wellbeing of Small Ruminants. J. Dairy Res. 2020, 87, 34–46. [Google Scholar] [CrossRef]
  10. Fogarty, E.S.; Swain, D.L.; Cronin, G.M.; Moraes, L.E.; Bailey, D.W.; Trotter, M. Developing a Simulated Online Model That Integrates GNSS, Accelerometer and Weather Data to Detect Parturition Events in Grazing Sheep: A Machine Learning Approach. Animals 2021, 11, 303. [Google Scholar] [CrossRef]
  11. Berihulay, H.; Abied, A.; He, X.; Jiang, L.; Ma, Y. Adaptation Mechanisms of Small Ruminants to Environmental Heat Stress. Animals 2019, 9, 75. [Google Scholar] [CrossRef]
  12. Joy, A.; Taheri, S.; Dunshea, F.R.; Leury, B.J.; DiGiacomo, K.; Osei-Amponsah, R.; Brodie, G.; Chauhan, S.S. Non-Invasive Measure of Heat Stress in Sheep Using Machine Learning Techniques and Infrared Thermography. Small Rumin. Res. 2022, 207, 106592. [Google Scholar] [CrossRef]
  13. Davison, C.; Michie, C.; Hamilton, A.; Tachtatzis, C.; Andonovic, I.; Gilroy, M. Detecting Heat Stress in Dairy Cattle Using Neck-Mounted Activity Collars. Agriculture 2020, 10, 210. [Google Scholar] [CrossRef]
  14. Choudhury, M.; Saikia, T.; Banik, S.; Patil, G.; Pegu, S.R.; Rajkhowa, S.; Sen, A.; Das, P.J. Infrared Imaging a New Non-Invasive Machine Learning Technology for Animal Husbandry. Imaging Sci. J. 2020, 68, 240–249. [Google Scholar] [CrossRef]
  15. Fuentes, S.; Gonzalez Viejo, C.; Chauhan, S.S.; Joy, A.; Tongson, E.; Dunshea, F.R. Non-Invasive Sheep Biometrics Obtained by Computer Vision Algorithms and Machine Learning Modeling Using Integrated Visible/Infrared Thermal Cameras. Sensors 2020, 20, 6334. [Google Scholar] [CrossRef] [PubMed]
  16. Silva, S.R.; Sacarrão-Birrento, L.; Almeida, M.; Ribeiro, D.M.; Guedes, C.; González Montaña, J.R.; Pereira, A.F.; Zaralis, K.; Geraldo, A.; Tzamaloukas, O.; et al. Extensive Sheep and Goat Production: The Role of Novel Technologies towards Sustainability and Animal Welfare. Animals 2022, 12, 885. [Google Scholar] [CrossRef]
  17. Emsen, E.; Kutluca Korkmaz, M.; Odevci, B.B. Artificial Intelligence-Assisted Selection Strategies in Sheep: Linking Reproductive Traits with Behavioral Indicators. Animals 2025, 15, 2110. [Google Scholar] [CrossRef]
  18. Cabrera, V.; Delbuggio, A.; Cardoso, H.; Fraga, D.; Gómez, A.; Pedemonte, M.; Ungerfeld, R.; Oreggioni, J. Harnessing Technology for Livestock Research: An Online Sheep Behavior Monitoring System. IEEE Trans. AgriFood Electron. 2024, 2, 306–313. [Google Scholar] [CrossRef]
  19. Z, T.B.; Shastry, C. Ewe Health Monitoring Using IoT Simulator. In Proceedings of the 2022 IEEE International Conference on Data Science and Information System (ICDSIS), Hassan, India, 29 July 2022; IEEE: Hassan, India, 2022; pp. 1–8. [Google Scholar]
  20. Dikmen, S.; Hansen, P.J. Is the Temperature-Humidity Index the Best Indicator of Heat Stress in Lactating Dairy Cows in a Subtropical Environment? J. Dairy Sci. 2009, 92, 109–116. [Google Scholar] [CrossRef] [PubMed]
  21. Vall, E.; Blanchard, M.; Sib, O.; Cormary, B.; González-García, E. Standardized Body Condition Scoring System for Tropical Farm Animals (Large Ruminants, Small Ruminants, and Equines). Trop. Anim. Health Prod. 2025, 57, 106. [Google Scholar] [CrossRef]
  22. Kalogianni, A.I.; Bouzalas, I.; Bossis, I.; Gelasakis, A.I. Seroepidemiology of Maedi-Visna in Intensively Reared Dairy Sheep: A Two-Year Prospective Study. Animals 2023, 13, 2273. [Google Scholar] [CrossRef]
  23. Jadon, A.; Patil, A.; Jadon, S. A Comprehensive Survey of Regression Based Loss Functions for Time Series Forecasting. arXiv 2022, arXiv:2211.02989. [Google Scholar] [CrossRef]
  24. Owen, A.B. A Robust Hybrid of Lasso and Ridge Regression. In Contemporary Mathematics; Verducci, J.S., Shen, X., Lafferty, J., Eds.; American Mathematical Society: Providence, RI, USA, 2007; Volume 443, pp. 59–71. ISBN 978-0-8218-4195-2. [Google Scholar]
  25. Montesinos López, O.A.; Montesinos López, A.; Crossa, J. Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer International Publishing: Cham, Switzerland, 2022; ISBN 978-3-030-89009-4. [Google Scholar]
  26. Arfuso, F.; Acri, G.; Piccione, G.; Sansotta, C.; Fazio, F.; Giudice, E.; Giannetto, C. Eye Surface Infrared Thermography Usefulness as a Noninvasive Method of Measuring Stress Response in Sheep during Shearing: Correlations with Serum Cortisol and Rectal Temperature Values. Physiol. Behav. 2022, 250, 113781. [Google Scholar] [CrossRef]
  27. Misiura, M.M.; Filipe, J.A.N.; Kyriazakis, I. Mathematical and Statistical Approaches to the Challenge of Forecasting Animal Performance for the Purposes of Precision Livestock Feeding. In Smart Livestock Nutrition; Kyriazakis, I., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 141–167. ISBN 978-3-031-22584-0. [Google Scholar]
  28. Agiomavriti, A.-A.; Bartzanas, T.; Chorianopoulos, N.; Gelasakis, A.I. Spectroscopy-Based Methods for Water Quality Assessment: A Comprehensive Review and Potential Applications in Livestock Farming. Water 2025, 17, 2488. [Google Scholar] [CrossRef]
  29. Agiomavriti, A.-A.; Nikolopoulou, M.P.; Bartzanas, T.; Chorianopoulos, N.; Demestichas, K.; Gelasakis, A.I. Spectroscopy-Based Methods and Supervised Machine Learning Applications for Milk Chemical Analysis in Dairy Ruminants. Chemosensors 2024, 12, 263. [Google Scholar] [CrossRef]
  30. Frizzarin, M.; Gormley, I.C.; Berry, D.P.; Murphy, T.B.; Casa, A.; Lynch, A.; McParland, S. Predicting Cow Milk Quality Traits from Routinely Available Milk Spectra Using Statistical Machine Learning Methods. J. Dairy Sci. 2021, 104, 7438–7447. [Google Scholar] [CrossRef] [PubMed]
  31. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 1984. [Google Scholar] [CrossRef]
  32. Ma, R.; Chen, R.; Liang, B.; Li, X. A XGBoost-Based Prediction Method for Meat Sheep Transport Stress Using Wearable Photoelectric Sensors and Infrared Thermometry. Sensors 2024, 24, 7826. [Google Scholar] [CrossRef]
  33. Kumar, S.; Kaur, P.; Gosain, A. A Comprehensive Survey on Ensemble Methods. In Proceedings of the 2022 IEEE 7th International conference for Convergence in Technology (I2CT), Mumbai, India, 7 April 2022; IEEE: Mumbai, India, 2022; pp. 1–7. [Google Scholar]
  34. Lima, A.R.; Cannon, A.J.; Hsieh, W.W. Nonlinear Regression in Environmental Sciences by Support Vector Machines Combined with Evolutionary Strategy. Comput. Geosci. 2013, 50, 136–144. [Google Scholar] [CrossRef]
  35. Colaço, A.F.; Richetti, J.; Bramley, R.G.V.; Lawes, R.A. How Will the Next-Generation of Sensor-Based Decision Systems Look in the Context of Intelligent Agriculture? A Case-Study. Field Crops Res. 2021, 270, 108205. [Google Scholar] [CrossRef]
  36. Table: Resting Respiratory Rates. Available online: https://www.msdvetmanual.com/multimedia/table/resting-respiratory-rates (accessed on 17 November 2025).
  37. Wang, A.; Brito, L.F.; Zhang, H.; Shi, R.; Zhu, L.; Liu, D.; Guo, G.; Wang, Y. Exploring Milk Loss and Variability during Environmental Perturbations across Lactation Stages as Resilience Indicators in Holstein Cattle. Front. Genet. 2022, 13, 1031557. [Google Scholar] [CrossRef]
  38. Becker, C.A.; Collier, R.J.; Stone, A.E. Invited Review: Physiological and Behavioral Effects of Heat Stress in Dairy Cows. J. Dairy Sci. 2020, 103, 6751–6770. [Google Scholar] [CrossRef]
  39. Neves, S.F.; Silva, M.C.F.; Miranda, J.M.; Stilwell, G.; Cortez, P.P. Predictive Models of Dairy Cow Thermal State: A Review from a Technological Perspective. Vet. Sci. 2022, 9, 416. [Google Scholar] [CrossRef]
  40. Ploumi, K.; Belibasaki, S.; Triantaphyllidis, G. Some Factors Affecting Daily Milk Yield and Composition in a Flock of Chios Ewes. Small Rumin. Res. 1998, 28, 89–92. [Google Scholar] [CrossRef]
  41. Bakker, M.L.; Milano, G.D.; Fernández, J.; Alvarado, P.I.; Nadin, L.B. Lack of Agreement among Analysers of Infrared Thermal Images in the Temperature of Eye Regions in Sheep. J. Therm. Biol. 2024, 126, 104021. [Google Scholar] [CrossRef] [PubMed]
  42. Antanaitis, R.; Džermeikaitė, K.; Bespalovaitė, A.; Ribelytė, I.; Rutkauskas, A.; Japertas, S.; Baumgartner, W. Assessment of Ruminating, Eating, and Locomotion Behavior during Heat Stress in Dairy Cattle by Using Advanced Technological Monitoring. Animals 2023, 13, 2825. [Google Scholar] [CrossRef] [PubMed]
  43. Essien, D.; Neethirajan, S. Multimodal AI Systems for Enhanced Laying Hen Welfare Assessment and Productivity Optimization. Smart Agric. Technol. 2025, 12, 101564. [Google Scholar] [CrossRef]
Figure 1. FLIR camera measurement.
Figure 2. ELISA kit lab preparation for measurement of cortisol levels in serum blood samples.
Figure 3. Relationship between temperature–humidity index (THI) and daily milk yield (lt) in Lesvos (M) and Chios (X) ewes.
Figure 4. Relationship between daily milk yield (lt) and daily distance traveled (m) in Lesvos (M) and Chios (X) ewes.
Figure 5. Boxplot comparison of observed and predicted RR values across all animals. Observed measurements exhibit greater variability and several physiologically plausible outliers, whereas predicted values show tighter dispersion, reflecting the smoothing behavior of the regression model. Boxes represent the interquartile range, whiskers indicate variability within the 20–90 breaths/min range, and black dots represent outliers.
Table 1. Descriptive statistics of the modeling dataset by breed.

| Variable | Chios | Lesvos |
|---|---|---|
| Number of observations | 518 | 255 |
| DMY ¹ (kg) | 0.00–1.87 (0.66 ± 0.36) ² | 0.10–2.10 (0.73 ± 0.48) ² |
| Daily distance traveled (m) | 0–2668 (1004 ± 576) ² | 41–5251 (2082 ± 1169) ² |
| THI ¹ | 50.99–75.54 ² | 50.99–75.54 ² |
| NH3 ¹ (ppm) | 4.67–30.17 ² | 4.67–30.17 ² |
| CO2 ¹ (ppm) | 474.54–666.65 ² | 474.54–666.65 ² |
| CH4 ¹ (%) | 0.038–0.065 ² | 0.038–0.065 ² |

¹ DMY: daily milk yield; THI: temperature–humidity index; NH3: ammonia; CO2: carbon dioxide; CH4: methane. ² Values are reported as min–max (mean ± SD) where applicable.
Table 2. Model performance for predicting daily milk yield (DMY) in liters using standard train–test evaluation, nested cross-validation (CV), and leave-one-animal-out cross-validation (LOAOCV). Metrics reported include R2, RMSE, MAE, and CCC for each algorithm across all predictor sets. RMSE and MAE are in lt/day.

| Predictors | LR R2 | LR RMSE | LR MAE | LR CCC | PLSR R2 | PLSR RMSE | PLSR MAE | PLSR CCC | RF R2 | RF RMSE | RF MAE | RF CCC | XGBoost R2 | XGBoost RMSE | XGBoost MAE | XGBoost CCC | MLP R2 | MLP RMSE | MLP MAE | MLP CCC | EN R2 | EN RMSE | EN MAE | EN CCC | Ensemble R2 | Ensemble RMSE | Ensemble MAE | Ensemble CCC | Mixed-Effects R2 | Mixed-Effects RMSE | Mixed-Effects MAE | Mixed-Effects CCC | SVR R2 | SVR RMSE | SVR MAE | SVR CCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.47 | 0.55 | 0.44 | 0.68 | 0.22 | 0.74 | 0.58 | 0.45 | 0.38 | 0.60 | 0.46 | 0.60 | 0.33 | 0.63 | 0.51 | 0.41 | 0.34 | 0.62 | 0.46 | 0.62 | 0.30 | 0.64 | 0.48 | 0.47 | 0.37 | 0.60 | 0.47 | 0.51 | 0.55 | 0.50 | 0.40 | 0.74 | 0.32 | 0.62 | 0.48 | 0.49 |
| 2 | 0.32 | 0.66 | 0.52 | 0.49 | 0.31 | 0.66 | 0.52 | 0.48 | 0.38 | 0.62 | 0.48 | 0.53 | 0.38 | 0.63 | 0.48 | 0.52 | 0.35 | 0.64 | 0.49 | 0.54 | 0.32 | 0.66 | 0.51 | 0.47 | 0.35 | 0.64 | 0.50 | 0.54 | 0.55 | 0.51 | 0.39 | 0.72 | 0.33 | 0.66 | 0.51 | 0.45 |
| 3 | 0.31 | 0.67 | 0.52 | 0.48 | 0.30 | 0.68 | 0.52 | 0.46 | 0.36 | 0.64 | 0.50 | 0.54 | 0.36 | 0.64 | 0.50 | 0.53 | 0.34 | 0.65 | 0.50 | 0.53 | 0.32 | 0.67 | 0.52 | 0.46 | 0.33 | 0.66 | 0.51 | 0.52 | 0.52 | 0.57 | 0.46 | 0.65 | 0.27 | 0.69 | 0.53 | 0.41 |
| 4 | 0.34 | 0.61 | 0.46 | 0.51 | 0.33 | 0.63 | 0.49 | 0.49 | 0.38 | 0.60 | 0.46 | 0.59 | 0.33 | 0.63 | 0.51 | 0.41 | 0.34 | 0.62 | 0.46 | 0.54 | 0.34 | 0.62 | 0.47 | 0.52 | 0.38 | 0.60 | 0.46 | 0.53 | 0.34 | 0.61 | 0.50 | 0.56 | 0.34 | 0.62 | 0.47 | 0.51 |
| 5 | 0.35 | 0.64 | 0.50 | 0.52 | 0.35 | 0.64 | 0.50 | 0.52 | 0.39 | 0.62 | 0.48 | 0.55 | 0.40 | 0.62 | 0.48 | 0.56 | 0.39 | 0.62 | 0.48 | 0.56 | 0.35 | 0.64 | 0.50 | 0.51 | 0.36 | 0.63 | 0.49 | 0.55 | 0.44 | 0.54 | 0.41 | 0.61 | 0.35 | 0.64 | 0.49 | 0.51 |
| 6 | 0.35 | 0.65 | 0.51 | 0.52 | 0.35 | 0.65 | 0.51 | 0.52 | 0.37 | 0.64 | 0.49 | 0.54 | 0.37 | 0.64 | 0.50 | 0.54 | 0.34 | 0.66 | 0.51 | 0.53 | 0.34 | 0.66 | 0.51 | 0.50 | 0.35 | 0.65 | 0.50 | 0.54 | 0.36 | 0.60 | 0.47 | 0.53 | 0.33 | 0.66 | 0.50 | 0.48 |
| 7 | 0.39 | 0.59 | 0.46 | 0.60 | 0.21 | 0.73 | 0.56 | 0.39 | 0.41 | 0.58 | 0.44 | 0.60 | 0.36 | 0.63 | 0.51 | 0.40 | 0.34 | 0.62 | 0.47 | 0.54 | 0.31 | 0.63 | 0.48 | 0.48 | 0.39 | 0.60 | 0.46 | 0.52 | 0.44 | 0.56 | 0.44 | 0.65 | 0.35 | 0.61 | 0.47 | 0.53 |
| 8 | 0.30 | 0.67 | 0.52 | 0.47 | 0.30 | 0.66 | 0.52 | 0.47 | 0.39 | 0.62 | 0.48 | 0.55 | 0.40 | 0.62 | 0.48 | 0.56 | 0.37 | 0.64 | 0.49 | 0.54 | 0.29 | 0.67 | 0.52 | 0.45 | 0.35 | 0.64 | 0.50 | 0.53 | 0.42 | 0.60 | 0.47 | 0.58 | 0.30 | 0.66 | 0.52 | 0.45 |
| 9 | 0.29 | 0.68 | 0.53 | 0.46 | 0.30 | 0.67 | 0.53 | 0.47 | 0.36 | 0.64 | 0.50 | 0.54 | 0.36 | 0.64 | 0.50 | 0.54 | 0.35 | 0.65 | 0.50 | 0.52 | 0.30 | 0.67 | 0.53 | 0.45 | 0.33 | 0.66 | 0.51 | 0.52 | 0.39 | 0.62 | 0.49 | 0.56 | 0.29 | 0.67 | 0.52 | 0.44 |
| 10 | 0.40 | 0.59 | 0.45 | 0.62 | 0.21 | 0.70 | 0.54 | 0.40 | 0.40 | 0.59 | 0.45 | 0.59 | 0.38 | 0.62 | 0.50 | 0.42 | 0.38 | 0.60 | 0.45 | 0.57 | 0.33 | 0.62 | 0.48 | 0.48 | 0.39 | 0.60 | 0.47 | 0.51 | 0.46 | 0.55 | 0.42 | 0.66 | 0.34 | 0.61 | 0.48 | 0.52 |
| 11 | 0.33 | 0.65 | 0.51 | 0.50 | 0.33 | 0.65 | 0.52 | 0.50 | 0.39 | 0.62 | 0.48 | 0.55 | 0.38 | 0.63 | 0.48 | 0.54 | 0.38 | 0.62 | 0.48 | 0.55 | 0.33 | 0.65 | 0.52 | 0.48 | 0.35 | 0.64 | 0.50 | 0.54 | 0.48 | 0.56 | 0.43 | 0.64 | 0.32 | 0.65 | 0.51 | 0.48 |
| 12 | 0.33 | 0.66 | 0.66 | 0.50 | 0.32 | 0.66 | 0.52 | 0.50 | 0.36 | 0.64 | 0.50 | 0.54 | 0.36 | 0.65 | 0.51 | 0.52 | 0.35 | 0.65 | 0.50 | 0.53 | 0.34 | 0.65 | 0.52 | 0.48 | 0.34 | 0.66 | 0.51 | 0.53 | 0.47 | 0.58 | 0.46 | 0.62 | 0.30 | 0.67 | 0.53 | 0.45 |
| 13 | 0.41 | 0.57 | 0.43 | 0.63 | 0.26 | 0.69 | 0.53 | 0.47 | 0.38 | 0.60 | 0.45 | 0.58 | 0.38 | 0.60 | 0.46 | 0.54 | 0.35 | 0.62 | 0.48 | 0.56 | 0.34 | 0.62 | 0.48 | 0.53 | 0.38 | 0.60 | 0.46 | 0.55 | 0.51 | 0.51 | 0.39 | 0.71 | 0.37 | 0.60 | 0.46 | 0.55 |
| 14 | 0.36 | 0.64 | 0.49 | 0.54 | 0.37 | 0.63 | 0.49 | 0.54 | 0.38 | 0.63 | 0.48 | 0.53 | 0.37 | 0.64 | 0.50 | 0.52 | 0.39 | 0.62 | 0.48 | 0.56 | 0.36 | 0.64 | 0.49 | 0.52 | 0.36 | 0.64 | 0.49 | 0.55 | 0.54 | 0.52 | 0.39 | 0.70 | 0.37 | 0.63 | 0.48 | 0.54 |
| 15 | 0.35 | 0.64 | 0.50 | 0.53 | 0.36 | 0.64 | 0.50 | 0.54 | 0.37 | 0.64 | 0.49 | 0.55 | 0.36 | 0.64 | 0.50 | 0.52 | 0.34 | 0.65 | 0.50 | 0.53 | 0.36 | 0.64 | 0.50 | 0.51 | 0.34 | 0.65 | 0.50 | 0.54 | 0.52 | 0.56 | 0.44 | 0.65 | 0.33 | 0.66 | 0.50 | 0.50 |
| 16 | 0.40 | 0.59 | 0.46 | 0.61 | 0.21 | 0.74 | 0.57 | 0.40 | 0.41 | 0.58 | 0.45 | 0.60 | 0.36 | 0.63 | 0.51 | 0.40 | 0.38 | 0.60 | 0.46 | 0.57 | 0.33 | 0.62 | 0.48 | 0.49 | 0.40 | 0.59 | 0.47 | 0.52 | 0.50 | 0.54 | 0.43 | 0.68 | 0.35 | 0.61 | 0.47 | 0.54 |
| 17 | 0.30 | 0.67 | 0.52 | 0.46 | 0.30 | 0.67 | 0.52 | 0.46 | 0.39 | 0.62 | 0.48 | 0.55 | 0.39 | 0.62 | 0.48 | 0.56 | 0.37 | 0.63 | 0.49 | 0.55 | 0.30 | 0.67 | 0.52 | 0.45 | 0.35 | 0.64 | 0.50 | 0.53 | 0.42 | 0.60 | 0.46 | 0.58 | 0.30 | 0.67 | 0.52 | 0.41 |
| 18 | 0.28 | 0.68 | 0.53 | 0.45 | 0.28 | 0.68 | 0.53 | 0.45 | 0.36 | 0.65 | 0.50 | 0.53 | 0.35 | 0.65 | 0.50 | 0.53 | 0.32 | 0.67 | 0.51 | 0.50 | 0.28 | 0.68 | 0.54 | 0.43 | 0.34 | 0.66 | 0.51 | 0.52 | 0.37 | 0.63 | 0.49 | 0.54 | 0.27 | 0.68 | 0.54 | 0.41 |

(1–3) Predictor set based on day 3 environmental and behavioral variables: body condition score (BCS), daily distance moved (DDM), temperature–humidity index (THI), NH3, animal’s skin temperature from collar (ST), CO2, and fixed factors (breed (B), age (A)); (1) standard evaluation (SE); (2) nested cross-validation (CV); (3) leave-one-animal-out cross-validation (LOAOCV). (4–6) Predictor set based on day 5 environmental indicators: BCS, THI, illuminance (lux), CO2, B, A; (4) SE; (5) CV; (6) LOAOCV. (7–9) Predictor set based on day 1: BCS, DDM, THI, NH3, ST, CO2, B, A, blood cortisol concentration; (7) SE; (8) CV; (9) LOAOCV. (10–12) Predictor set based on day 2: BCS, DDM, THI, NH3, ST, CO2, B, A, cortisol; (10) SE; (11) CV; (12) LOAOCV. (13–15) Predictor set based on day 4: BCS, DDM, THI, NH3, ST, CO2, B, A, cortisol; (13) SE; (14) CV; (15) LOAOCV. (16–18) Predictor set based on day 1: BCS, DDM, THI, NH3, ST, CO2, B, A; (16) SE; (17) CV; (18) LOAOCV. LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean squared error, MAE: mean absolute error, CCC: concordance correlation coefficient. Bold values highlight comparatively better-performing results (highest R2/CCC or lowest RMSE/MAE).
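The four metrics reported in Table 2 can be computed as in the minimal sketch below. It assumes the standard textbook definitions, with CCC taken as Lin's concordance correlation coefficient using population (1/n) variances; the exact implementation used in the study is not stated here. The `observed` and `predicted` values are hypothetical milk yields in lt/day, not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

# Hypothetical example values (lt/day), for illustration only.
observed = [0.5, 0.7, 0.9, 1.1]
predicted = [0.6, 0.7, 0.8, 1.0]

print(r2(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted), ccc(observed, predicted))
```

Unlike R2, CCC also penalizes systematic bias between observed and predicted means, which is why the two metrics can rank models differently in the same row.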
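Table 3. Training and test errors (RMSE, MAE) and accuracy metrics (R2, CCC) for all models predicting daily milk yield, along with their overfitting level and predictive performance classification. RMSE and MAE are in lt/day.

| Model | Predictors | RMSE Train | RMSE Test | MAE Train | MAE Test | R2 Train | R2 Test | CCC Train | CCC Test | Overfitting | Performance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | 0.51 | 0.55 | 0.38 | 0.44 | 0.60 | 0.47 | 0.75 | 0.68 | M | M |
| LR | 4 | 0.55 | 0.61 | 0.43 | 0.46 | 0.42 | 0.34 | 0.59 | 0.51 | L | M |
| LR | 7 | 0.56 | 0.59 | 0.45 | 0.46 | 0.51 | 0.39 | 0.67 | 0.60 | M | M |
| LR | 10 | 0.52 | 0.59 | 0.42 | 0.45 | 0.58 | 0.40 | 0.74 | 0.62 | M | M |
| LR | 13 | 0.50 | 0.57 | 0.39 | 0.43 | 0.61 | 0.41 | 0.76 | 0.63 | M | M |
| LR | 16 | 0.58 | 0.59 | 0.45 | 0.46 | 0.48 | 0.40 | 0.64 | 0.61 | L | M |
| PLSR | 1 | 0.68 | 0.68 | 0.53 | 0.52 | 0.31 | 0.23 | 0.47 | 0.44 | L | W |
| PLSR | 4 | 0.68 | 0.63 | 0.52 | 0.49 | 0.34 | 0.33 | 0.46 | 0.49 | L | M |
| PLSR | 7 | 0.76 | 0.72 | 0.58 | 0.54 | 0.20 | 0.17 | 0.34 | 0.35 | L | W |
| PLSR | 10 | 0.74 | 0.70 | 0.57 | 0.54 | 0.25 | 0.21 | 0.40 | 0.40 | L | W |
| PLSR | 13 | 0.70 | 0.69 | 0.54 | 0.53 | 0.34 | 0.26 | 0.49 | 0.47 | L | W |
| PLSR | 16 | 0.77 | 0.72 | 0.58 | 0.54 | 0.18 | 0.18 | 0.33 | 0.37 | L | W |
| RF | 1 | 0.56 | 0.60 | 0.43 | 0.47 | 0.54 | 0.38 | 0.67 | 0.58 | M | M |
| RF | 4 | 0.57 | 0.60 | 0.44 | 0.46 | 0.51 | 0.38 | 0.66 | 0.59 | M | M |
| RF | 7 | 0.52 | 0.58 | 0.40 | 0.44 | 0.60 | 0.41 | 0.71 | 0.60 | M | M |
| RF | 10 | 0.54 | 0.59 | 0.42 | 0.45 | 0.56 | 0.39 | 0.68 | 0.58 | M | M |
| RF | 13 | 0.54 | 0.60 | 0.42 | 0.46 | 0.56 | 0.37 | 0.69 | 0.58 | M | M |
| RF | 16 | 0.54 | 0.58 | 0.41 | 0.45 | 0.57 | 0.41 | 0.69 | 0.60 | M | M |
| XGBoost | 1 | 0.61 | 0.60 | 0.47 | 0.47 | 0.46 | 0.37 | 0.56 | 0.54 | L | M |
| XGBoost | 4 | 0.62 | 0.60 | 0.48 | 0.47 | 0.44 | 0.37 | 0.55 | 0.53 | L | M |
| XGBoost | 7 | 0.52 | 0.58 | 0.40 | 0.44 | 0.60 | 0.42 | 0.70 | 0.59 | M | M |
| XGBoost | 10 | 0.61 | 0.59 | 0.48 | 0.45 | 0.47 | 0.41 | 0.56 | 0.54 | L | M |
| XGBoost | 13 | 0.61 | 0.60 | 0.47 | 0.46 | 0.46 | 0.38 | 0.56 | 0.54 | L | M |
| XGBoost | 16 | 0.52 | 0.59 | 0.40 | 0.46 | 0.60 | 0.40 | 0.70 | 0.58 | M | M |
| MLP | 1 | 0.63 | 0.62 | 0.48 | 0.46 | 0.40 | 0.34 | 0.56 | 0.53 | L | M |
| MLP | 4 | 0.62 | 0.62 | 0.48 | 0.46 | 0.41 | 0.34 | 0.57 | 0.54 | L | M |
| MLP | 7 | 0.59 | 0.61 | 0.46 | 0.47 | 0.47 | 0.35 | 0.62 | 0.55 | M | M |
| MLP | 10 | 0.60 | 0.60 | 0.47 | 0.45 | 0.45 | 0.38 | 0.60 | 0.57 | L | M |
| MLP | 13 | 0.60 | 0.62 | 0.46 | 0.48 | 0.46 | 0.35 | 0.62 | 0.56 | M | M |
| MLP | 16 | 0.60 | 0.60 | 0.47 | 0.46 | 0.45 | 0.38 | 0.60 | 0.57 | L | M |
| EN | 1 | 0.66 | 0.64 | 0.52 | 0.48 | 0.35 | 0.30 | 0.48 | 0.47 | L | M |
| EN | 4 | 0.63 | 0.62 | 0.50 | 0.47 | 0.39 | 0.33 | 0.54 | 0.52 | L | M |
| EN | 7 | 0.66 | 0.63 | 0.52 | 0.48 | 0.34 | 0.31 | 0.48 | 0.48 | L | M |
| EN | 10 | 0.65 | 0.62 | 0.52 | 0.48 | 0.37 | 0.33 | 0.49 | 0.48 | L | M |
| EN | 13 | 0.62 | 0.62 | 0.49 | 0.48 | 0.42 | 0.34 | 0.52 | 0.52 | L | M |
| EN | 16 | 0.67 | 0.62 | 0.52 | 0.48 | 0.32 | 0.33 | 0.46 | 0.49 | L | M |
| Ensemble | 1 | 0.60 | 0.60 | 0.46 | 0.46 | 0.48 | 0.38 | 0.59 | 0.54 | L | M |
| Ensemble | 4 | 0.60 | 0.60 | 0.46 | 0.46 | 0.47 | 0.38 | 0.59 | 0.56 | L | M |
| Ensemble | 7 | 0.55 | 0.58 | 0.43 | 0.44 | 0.56 | 0.41 | 0.65 | 0.57 | M | M |
| Ensemble | 10 | 0.59 | 0.59 | 0.46 | 0.45 | 0.50 | 0.40 | 0.59 | 0.54 | L | M |
| Ensemble | 13 | 0.59 | 0.60 | 0.45 | 0.46 | 0.49 | 0.38 | 0.61 | 0.55 | M | M |
| Ensemble | 16 | 0.56 | 0.58 | 0.43 | 0.45 | 0.55 | 0.41 | 0.64 | 0.57 | M | M |
| Mixed Effects | 1 | 0.41 | 0.50 | 0.31 | 0.40 | 0.74 | 0.55 | 0.84 | 0.74 | M | M |
| Mixed Effects | 4 | 0.32 | 0.61 | 0.25 | 0.46 | 0.83 | 0.34 | 0.88 | 0.56 | S | M |
| Mixed Effects | 7 | 0.52 | 0.56 | 0.42 | 0.44 | 0.58 | 0.44 | 0.72 | 0.65 | M | M |
| Mixed Effects | 10 | 0.48 | 0.55 | 0.38 | 0.42 | 0.65 | 0.46 | 0.78 | 0.66 | M | M |
| Mixed Effects | 13 | 0.44 | 0.52 | 0.34 | 0.39 | 0.70 | 0.49 | 0.82 | 0.69 | M | M |
| Mixed Effects | 16 | 0.50 | 0.54 | 0.39 | 0.43 | 0.63 | 0.50 | 0.75 | 0.68 | M | M |
| SVR | 1 | 0.62 | 0.62 | 0.45 | 0.48 | 0.42 | 0.32 | 0.56 | 0.49 | L | M |
| SVR | 4 | 0.62 | 0.62 | 0.47 | 0.47 | 0.42 | 0.34 | 0.56 | 0.51 | L | M |
| SVR | 7 | 0.61 | 0.61 | 0.45 | 0.47 | 0.45 | 0.35 | 0.58 | 0.53 | L | M |
| SVR | 10 | 0.59 | 0.61 | 0.43 | 0.47 | 0.48 | 0.35 | 0.61 | 0.52 | M | M |
| SVR | 13 | 0.62 | 0.61 | 0.48 | 0.46 | 0.42 | 0.34 | 0.56 | 0.53 | L | M |
| SVR | 16 | 0.60 | 0.61 | 0.44 | 0.35 | 0.45 | 0.35 | 0.60 | 0.54 | L | M |

L: low, M: moderate, W: weak, S: severe, LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean squared error, MAE: mean absolute error, CCC: concordance correlation coefficient. (1) Predictor set based on day 3 environmental and behavioral variables: body condition score (BCS), daily distance moved (DDM), temperature–humidity index (THI), NH3, animal’s skin temperature from collar (ST), CO2, and fixed factors (breed (B), age (A)); (4) predictor set based on day 5 environmental indicators: BCS, THI, illuminance (lux), CO2, B, A; (7) predictor set based on day 1: BCS, DDM, THI, NH3, ST, CO2, B, A, blood cortisol concentration; (10) predictor set based on day 2: BCS, DDM, THI, NH3, ST, CO2, B, A, cortisol; (13) predictor set based on day 4: BCS, DDM, THI, NH3, ST, CO2, B, A, cortisol; (16) predictor set based on day 1: BCS, DDM, THI, NH3, ST, CO2, B, A. Bold values indicate the best-performing configurations, where applicable.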
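One simple way to read the overfitting column is through the gap between training and test R2. The sketch below computes that gap for three rows copied from Table 3. It is an illustration only: the rule used to assign the L/M/S labels is not stated in this section, so the code ranks the gaps and does not attempt to reproduce the labels. The function name `r2_gap` is hypothetical.

```python
# (model, predictor set) -> (train R2, test R2), copied from Table 3.
r2_pairs = {
    ("LR", 4): (0.42, 0.34),
    ("RF", 7): (0.60, 0.41),
    ("Mixed Effects", 4): (0.83, 0.34),
}

def r2_gap(train_r2, test_r2):
    """Train-test gap in R2; larger values suggest stronger overfitting."""
    return train_r2 - test_r2

# Round to two decimals to match the precision of the table.
gaps = {key: round(r2_gap(*pair), 2) for key, pair in r2_pairs.items()}

# Rank configurations from the largest to the smallest gap.
ranked = sorted(gaps, key=gaps.get, reverse=True)

for key in ranked:
    print(key, gaps[key])
```

For these three rows the ordering of the gaps agrees with the labels in the table (S for Mixed Effects set 4, M for RF set 7, L for LR set 4).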
Table 4. Best predictor subsets and model accuracy for medial canthus eye temperature, shown for standard evaluation, CV, and LOAOCV. RMSE and MAE are in °C.

| Predictors | LR R2 | LR RMSE | LR MAE | LR CCC | PLSR R2 | PLSR RMSE | PLSR MAE | PLSR CCC | RF R2 | RF RMSE | RF MAE | RF CCC | XGBoost R2 | XGBoost RMSE | XGBoost MAE | XGBoost CCC | MLP R2 | MLP RMSE | MLP MAE | MLP CCC | EN R2 | EN RMSE | EN MAE | EN CCC | Ensemble R2 | Ensemble RMSE | Ensemble MAE | Ensemble CCC | Mixed-Effects R2 | Mixed-Effects RMSE | Mixed-Effects MAE | Mixed-Effects CCC | SVR R2 | SVR RMSE | SVR MAE | SVR CCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.60 | 0.61 | 0.40 | 0.73 | 0.57 | 0.59 | 0.40 | 0.72 | 0.69 | 0.50 | 0.34 | 0.81 | 0.62 | 0.56 | 0.38 | 0.78 | 0.57 | 0.60 | 0.42 | 0.74 | 0.57 | 0.60 | 0.40 | 0.72 | 0.66 | 0.53 | 0.36 | 0.78 | 0.62 | 0.60 | 0.39 | 0.74 | 0.62 | 0.56 | 0.38 | 0.76 |
| 2 | 0.62 | 0.54 | 0.39 | 0.76 | 0.62 | 0.54 | 0.39 | 0.76 | 0.73 | 0.45 | 0.33 | 0.83 | 0.72 | 0.46 | 0.34 | 0.82 | 0.69 | 0.48 | 0.35 | 0.81 | 0.62 | 0.54 | 0.39 | 0.76 | 0.70 | 0.47 | 0.34 | 0.82 | 0.63 | 0.55 | 0.39 | 0.77 | 0.62 | 0.54 | 0.39 | 0.75 |
| 3 | 0.62 | 0.54 | 0.39 | 0.76 | 0.62 | 0.54 | 0.39 | 0.76 | 0.73 | 0.45 | 0.32 | 0.84 | 0.73 | 0.46 | 0.33 | 0.83 | 0.67 | 0.50 | 0.36 | 0.80 | 0.62 | 0.54 | 0.39 | 0.76 | 0.70 | 0.48 | 0.34 | 0.82 | 0.62 | 0.56 | 0.39 | 0.76 | 0.61 | 0.54 | 0.39 | 0.75 |
| 4 | 0.44 | 0.77 | 0.54 | 0.63 | 0.28 | 1.03 | 0.67 | 0.51 | 0.67 | 0.52 | 0.36 | 0.79 | 0.66 | 0.53 | 0.38 | 0.80 | 0.52 | 0.63 | 0.44 | 0.69 | 0.39 | 0.70 | 0.49 | 0.56 | 0.65 | 0.54 | 0.38 | 0.76 | 0.45 | 0.76 | 0.54 | 0.64 | 0.57 | 0.59 | 0.40 | 0.73 |
| 5 | 0.46 | 0.64 | 0.46 | 0.63 | 0.46 | 0.64 | 0.46 | 0.63 | 0.71 | 0.46 | 0.33 | 0.82 | 0.70 | 0.48 | 0.35 | 0.81 | 0.67 | 0.50 | 0.36 | 0.79 | 0.45 | 0.64 | 0.46 | 0.61 | 0.68 | 0.49 | 0.36 | 0.79 | 0.57 | 0.65 | 0.46 | 0.71 | 0.51 | 0.61 | 0.45 | 0.64 |
| 6 | 0.47 | 0.64 | 0.46 | 0.63 | 0.47 | 0.64 | 0.46 | 0.63 | 0.72 | 0.46 | 0.34 | 0.83 | 0.68 | 0.49 | 0.36 | 0.81 | 0.64 | 0.53 | 0.38 | 0.78 | 0.47 | 0.64 | 0.46 | 0.62 | 0.68 | 0.50 | 0.36 | 0.79 | 0.55 | 0.66 | 0.47 | 0.70 | 0.52 | 0.61 | 0.44 | 0.63 |
| 7 | 0.47 | 0.69 | 0.46 | 0.64 | 0.41 | 0.70 | 0.49 | 0.58 | 0.69 | 0.51 | 0.34 | 0.80 | 0.60 | 0.58 | 0.39 | 0.76 | 0.66 | 0.53 | 0.36 | 0.79 | 0.40 | 0.63 | 0.70 | 0.48 | 0.59 | 0.55 | 0.37 | 0.76 | 0.49 | 0.68 | 0.46 | 0.65 | 0.66 | 0.53 | 0.36 | 0.78 |
| 8 | 0.46 | 0.64 | 0.47 | 0.62 | 0.46 | 0.64 | 0.46 | 0.62 | 0.72 | 0.46 | 0.33 | 0.83 | 0.72 | 0.46 | 0.33 | 0.83 | 0.69 | 0.48 | 0.35 | 0.82 | 0.46 | 0.64 | 0.47 | 0.62 | 0.69 | 0.49 | 0.35 | 0.80 | 0.53 | 0.61 | 0.44 | 0.68 | 0.63 | 0.52 | 0.38 | 0.77 |
| 9 | 0.47 | 0.63 | 0.46 | 0.64 | 0.47 | 0.63 | 0.46 | 0.64 | 0.73 | 0.45 | 0.33 | 0.84 | 0.73 | 0.45 | 0.33 | 0.84 | 0.70 | 0.48 | 0.34 | 0.82 | 0.47 | 0.63 | 0.46 | 0.63 | 0.68 | 0.49 | 0.35 | 0.80 | 0.52 | 0.61 | 0.44 | 0.69 | 0.61 | 0.54 | 0.39 | 0.76 |
| 10 | 0.49 | 0.72 | 0.54 | 0.66 | 0.47 | 0.66 | 0.50 | 0.66 | 0.67 | 0.52 | 0.35 | 0.80 | 0.65 | 0.54 | 0.38 | 0.79 | 0.47 | 0.67 | 0.50 | 0.68 | 0.52 | 0.62 | 0.46 | 0.68 | 0.65 | 0.54 | 0.38 | 0.78 | 0.51 | 0.71 | 0.54 | 0.68 | 0.65 | 0.53 | 0.36 | 0.79 |
| 11 | 0.59 | 0.56 | 0.42 | 0.74 | 0.59 | 0.56 | 0.42 | 0.74 | 0.71 | 0.47 | 0.34 | 0.82 | 0.70 | 0.47 | 0.34 | 0.82 | 0.66 | 0.51 | 0.38 | 0.80 | 0.59 | 0.56 | 0.42 | 0.74 | 0.68 | 0.49 | 0.36 | 0.81 | 0.62 | 0.60 | 0.46 | 0.76 | 0.64 | 0.54 | 0.39 | 0.74 |
| 12 | 0.59 | 0.56 | 0.41 | 0.74 | 0.59 | 0.56 | 0.41 | 0.74 | 0.72 | 0.46 | 0.33 | 0.83 | 0.70 | 0.48 | 0.36 | 0.81 | 0.65 | 0.52 | 0.38 | 0.79 | 0.59 | 0.55 | 0.41 | 0.74 | 0.69 | 0.48 | 0.36 | 0.81 | 0.58 | 0.62 | 0.47 | 0.73 | 0.64 | 0.54 | 0.39 | 0.73 |

(1–3) Predictor set based on day 5 indicators: body condition score (BCS), temperature–humidity index (THI), illuminance (lux), CO2, breed (B), age (A); (1) standard evaluation (SE); (2) nested cross-validation (CV); (3) leave-one-animal-out cross-validation (LOAOCV). (4–6) Predictor set based on day 4 variables: BCS, daily distance moved (DDM), THI, NH3, animal’s skin temperature from collar (ST), CO2, B, A; (4) SE; (5) CV; (6) LOAOCV. (7–9) Predictor set based on day 6 variables: BCS, THI, lux, CO2, B, A; (7) SE; (8) CV; (9) LOAOCV. (10–12) Predictor set based on day 1 variables: BCS, DDM, THI, NH3, ST, CO2, B, A; (10) SE; (11) CV; (12) LOAOCV. LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, CCC: concordance correlation coefficient. Bold values highlight comparatively better-performing results.
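The leave-one-animal-out scheme (LOAOCV) referred to in the table notes holds out all records of one animal per fold, so that no animal contributes to both training and testing. A minimal sketch is given below. The animal identifiers, THI values, and eye temperatures are hypothetical toy data, and the single-predictor least-squares line is only a stand-in for the regressors compared in the study.

```python
from collections import defaultdict

def leave_one_animal_out(animal_ids):
    """Yield (held_out_animal, train_idx, test_idx), one fold per animal."""
    by_animal = defaultdict(list)
    for idx, animal in enumerate(animal_ids):
        by_animal[animal].append(idx)
    for animal, test_idx in by_animal.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(animal_ids)) if i not in held_out]
        yield animal, train_idx, test_idx

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (placeholder model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b

# Hypothetical toy data: two records per ewe.
animal_ids = ["ewe_A", "ewe_A", "ewe_B", "ewe_B", "ewe_C", "ewe_C"]
thi = [55.0, 60.0, 58.0, 66.0, 62.0, 72.0]
eye_temp = [37.1, 37.4, 37.2, 37.9, 37.5, 38.3]

abs_errors = []
for animal, train_idx, test_idx in leave_one_animal_out(animal_ids):
    a, b = fit_line([thi[i] for i in train_idx],
                    [eye_temp[i] for i in train_idx])
    for i in test_idx:
        abs_errors.append(abs(eye_temp[i] - (a + b * thi[i])))

# MAE pooled over all held-out records.
loao_mae = sum(abs_errors) / len(abs_errors)
print(loao_mae)
```

Because repeated measurements on the same animal are correlated, grouping the folds by animal gives a more conservative estimate of how a model would perform on unseen animals than a random record-level split.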
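Table 5. Train–test comparison, overfitting assessment, and predictive performance for all models estimating medial canthus eye temperature. RMSE and MAE are in °C.

| Model | Predictors | RMSE Train | RMSE Test | MAE Train | MAE Test | R2 Train | R2 Test | CCC Train | CCC Test | Overfitting | Performance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | 0.53 | 0.61 | 0.39 | 0.40 | 0.63 | 0.60 | 0.77 | 0.73 | L | M |
| LR | 4 | 0.58 | 0.77 | 0.42 | 0.54 | 0.61 | 0.77 | 0.76 | 0.63 | M | G |
| LR | 7 | 0.58 | 0.69 | 0.42 | 0.46 | 0.55 | 0.47 | 0.71 | 0.64 | L | M |
| LR | 10 | 0.55 | 0.72 | 0.42 | 0.54 | 0.65 | 0.49 | 0.79 | 0.66 | M | M |
| PLSR | 1 | 0.51 | 0.59 | 0.38 | 0.40 | 0.64 | 0.57 | 0.78 | 0.72 | L | M |
| PLSR | 4 | 0.94 | 1.03 | 0.60 | 0.67 | 0.34 | 0.28 | 0.56 | 0.51 | L | W |
| PLSR | 7 | 0.62 | 0.70 | 0.47 | 0.49 | 0.48 | 0.41 | 0.64 | 0.58 | L | M |
| PLSR | 10 | 0.55 | 0.67 | 0.42 | 0.50 | 0.60 | 0.47 | 0.75 | 0.66 | M | M |
| RF | 1 | 0.38 | 0.50 | 0.28 | 0.34 | 0.80 | 0.69 | 0.88 | 0.81 | M | G |
| RF | 4 | 0.32 | 0.52 | 0.23 | 0.35 | 0.86 | 0.67 | 0.92 | 0.79 | S | G |
| RF | 7 | 0.36 | 0.50 | 0.28 | 0.34 | 0.82 | 0.69 | 0.89 | 0.81 | M | G |
| RF | 10 | 0.31 | 0.52 | 0.23 | 0.35 | 0.87 | 0.68 | 0.92 | 0.80 | S | G |
| XGBoost | 1 | 0.26 | 0.56 | 0.18 | 0.38 | 0.91 | 0.62 | 0.95 | 0.78 | S | G |
| XGBoost | 4 | 0.17 | 0.53 | 0.07 | 0.38 | 0.96 | 0.66 | 0.98 | 0.80 | S | G |
| XGBoost | 7 | 0.26 | 0.58 | 0.18 | 0.39 | 0.91 | 0.60 | 0.95 | 0.76 | S | G |
| XGBoost | 10 | 0.17 | 0.54 | 0.07 | 0.38 | 0.96 | 0.65 | 0.98 | 0.79 | S | G |
| MLP | 1 | 0.45 | 0.60 | 0.34 | 0.42 | 0.72 | 0.57 | 0.84 | 0.74 | M | M |
| MLP | 4 | 0.49 | 0.63 | 0.36 | 0.44 | 0.67 | 0.52 | 0.80 | 0.69 | M | M |
| MLP | 7 | 0.43 | 0.53 | 0.31 | 0.36 | 0.75 | 0.66 | 0.85 | 0.79 | M | G |
| MLP | 10 | 0.51 | 0.67 | 0.40 | 0.50 | 0.66 | 0.47 | 0.80 | 0.68 | M | M |
| EN | 1 | 0.51 | 0.60 | 0.38 | 0.40 | 0.64 | 0.57 | 0.78 | 0.72 | L | M |
| EN | 4 | 0.60 | 0.70 | 0.44 | 0.49 | 0.50 | 0.39 | 0.65 | 0.56 | M | M |
| EN | 7 | 0.60 | 0.70 | 0.44 | 0.48 | 0.51 | 0.40 | 0.67 | 0.58 | M | M |
| EN | 10 | 0.52 | 0.62 | 0.39 | 0.46 | 0.63 | 0.52 | 0.77 | 0.68 | M | M |
| Ensemble | 1 | 0.35 | 0.53 | 0.26 | 0.36 | 0.84 | 0.53 | 0.90 | 0.78 | S | M |
| Ensemble | 4 | 0.31 | 0.54 | 0.23 | 0.38 | 0.89 | 0.65 | 0.92 | 0.76 | S | G |
| Ensemble | 7 | 0.36 | 0.55 | 0.27 | 0.37 | 0.84 | 0.64 | 0.89 | 0.76 | S | G |
| Ensemble | 10 | 0.29 | 0.54 | 0.22 | 0.38 | 0.89 | 0.65 | 0.93 | 0.78 | S | G |
| Mixed Effects | 1 | 0.48 | 0.60 | 0.35 | 0.39 | 0.69 | 0.62 | 0.81 | 0.74 | M | M |
| Mixed Effects | 4 | 0.54 | 0.76 | 0.38 | 0.54 | 0.66 | 0.45 | 0.79 | 0.64 | M | M |
| Mixed Effects | 7 | 0.55 | 0.68 | 0.40 | 0.46 | 0.60 | 0.49 | 0.74 | 0.65 | M | M |
| Mixed Effects | 10 | 0.50 | 0.71 | 0.38 | 0.54 | 0.72 | 0.51 | 0.83 | 0.68 | M | M |
| SVR | 1 | 0.41 | 0.56 | 0.29 | 0.38 | 0.76 | 0.62 | 0.86 | 0.76 | M | M |
| SVR | 4 | 0.42 | 0.59 | 0.29 | 0.40 | 0.75 | 0.57 | 0.85 | 0.73 | M | M |
| SVR | 7 | 0.41 | 0.53 | 0.28 | 0.36 | 0.77 | 0.66 | 0.86 | 0.78 | M | G |
| SVR | 10 | 0.32 | 0.53 | 0.23 | 0.36 | 0.86 | 0.65 | 0.92 | 0.79 | S | G |

L: low, M: moderate, S: severe, W: weak, G: good, LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean squared error, MAE: mean absolute error, CCC: concordance correlation coefficient. (1) Predictor set based on day 5 indicators: body condition score (BCS), temperature–humidity index (THI), illuminance (lux), CO2, breed (B), age (A); (4) predictor set based on day 4 variables: BCS, daily distance moved (DDM), THI, NH3, animal’s skin temperature from collar (ST), CO2, B, A; (7) predictor set based on day 6 variables: BCS, THI, lux, CO2, B, A; (10) predictor set based on day 1 variables: BCS, DDM, THI, NH3, ST, CO2, B, A. Bold values indicate the best-performing configurations, where applicable.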
Table 6. Best-performing predictor combinations for respiratory rate (breaths/min) across all algorithms under standard evaluation, cross-validation, and LOAOCV.
Table 6. Best-performing predictor combinations for respiratory rate (breaths/min) across all algorithms under standard evaluation, cross-validation, and LOAOCV.
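| Predictors | LR R2 | LR RMSE ** | LR MAE ** | LR CCC | PLSR R2 | PLSR RMSE ** | PLSR MAE ** | PLSR CCC | RF R2 | RF RMSE ** | RF MAE ** | RF CCC | XGBoost R2 | XGBoost RMSE ** | XGBoost MAE ** | XGBoost CCC | MLP R2 | MLP RMSE ** | MLP MAE ** | MLP CCC | EN R2 | EN RMSE ** | EN MAE ** | EN CCC | Ensemble R2 | Ensemble RMSE ** | Ensemble MAE ** | Ensemble CCC | Mixed-Effects R2 | Mixed-Effects RMSE ** | Mixed-Effects MAE ** | Mixed-Effects CCC | SVR R2 | SVR RMSE ** | SVR MAE ** | SVR CCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.36 | 15.8 | 11.0 | 0.46 | 0.13 | 16.6 | 11.8 | 0.32 | 0.39 | 13.5 | 9.22 | 0.52 | 0.34 | 15.3 | 10.2 | 0.44 | 0.26 | 15.0 | 10.3 | 0.47 | 0.22 | 15.3 | 10.8 | 0.35 | 0.36 | 14.1 | 9.37 | 0.46 | 0.39 | 15.4 | 10.8 | 0.49 | 0.35 | 14.0 | 9.31 | 0.51 |
| 2 | 0.21 | 14.7 | 10.5 | 0.34 | 0.21 | 14.7 | 10.5 | 0.34 | 0.37 | 13.1 | 9.14 | 0.53 | 0.36 | 13.2 | 9.07 | 0.55 | 0.29 | 14.0 | 9.80 | 0.50 | 0.21 | 14.7 | 10.5 | 0.33 | 0.33 | 13.5 | 9.47 | 0.50 | 0.31 | 15.5 | 11.1 | 0.47 | 0.24 | 14.9 | 10.3 | 0.36 |
| 3 | 0.19 | 14.5 | 10.5 | 0.33 | 0.19 | 14.5 | 10.5 | 0.33 | 0.36 | 13.0 | 9.20 | 0.53 | 0.34 | 13.3 | 9.21 | 0.53 | 0.24 | 14.2 | 10.2 | 0.44 | 0.19 | 14.5 | 10.5 | 0.32 | 0.31 | 13.5 | 9.52 | 0.47 | 0.27 | 15.8 | 11.7 | 0.42 | 0.24 | 14.7 | 10.2 | 0.36 |
| 4 | 0.24 | 15.9 | 11.2 | 0.37 | 0.23 | 15.2 | 10.9 | 0.37 | 0.44 | 12.9 | 8.91 | 0.59 | 0.43 | 14.6 | 9.64 | 0.50 | 0.25 | 15.0 | 10.8 | 0.45 | 0.22 | 15.2 | 10.9 | 0.36 | 0.42 | 13.6 | 9.09 | 0.51 | 0.28 | 15.4 | 10.8 | 0.42 | 0.30 | 14.7 | 10.0 | 0.40 |
| 5 | 0.21 | 14.7 | 10.5 | 0.34 | 0.21 | 14.7 | 10.5 | 0.34 | 0.38 | 12.9 | 9.02 | 0.54 | 0.36 | 13.2 | 9.10 | 0.55 | 0.30 | 13.8 | 9.85 | 0.49 | 0.21 | 14.8 | 10.6 | 0.32 | 0.35 | 13.2 | 9.35 | 0.52 | 0.26 | 15.0 | 10.6 | 0.39 | 0.24 | 14.6 | 10.3 | 0.44 |
| 6 | 0.19 | 14.6 | 10.5 | 0.33 | 0.19 | 14.6 | 10.5 | 0.33 | 0.37 | 12.9 | 9.10 | 0.53 | 0.35 | 13.1 | 9.29 | 0.54 | 0.25 | 14.1 | 10.0 | 0.43 | 0.19 | 14.6 | 10.5 | 0.33 | 0.33 | 13.3 | 9.47 | 0.50 | 0.20 | 15.1 | 10.8 | 0.35 | 0.20 | 14.9 | 10.4 | 0.37 |
| 7 | 0.19 | 16.4 | 11.7 | 0.30 | 0.18 | 15.6 | 11.2 | 0.30 | 0.46 | 12.7 | 8.72 | 0.59 | 0.45 | 14.4 | 9.50 | 0.50 | 0.39 | 13.5 | 9.56 | 0.52 | 0.19 | 15.6 | 11.1 | 0.29 | 0.43 | 13.6 | 9.11 | 0.50 | 0.23 | 16.0 | 11.3 | 0.34 | 0.37 | 14.1 | 9.57 | 0.45 |
| 8 | 0.18 | 15.1 | 10.8 | 0.29 | 0.18 | 15.1 | 10.8 | 0.29 | 0.39 | 12.8 | 8.93 | 0.56 | 0.37 | 13.1 | 9.04 | 0.56 | 0.30 | 13.9 | 9.93 | 0.48 | 0.18 | 15.1 | 10.8 | 0.28 | 0.34 | 13.2 | 9.36 | 0.51 | 0.21 | 15.5 | 11.1 | 0.32 | 0.27 | 14.8 | 10.3 | 0.40 |
| 9 | 0.16 | 14.8 | 10.8 | 0.28 | 0.16 | 14.8 | 10.8 | 0.28 | 0.39 | 12.7 | 8.97 | 0.55 | 0.37 | 13.0 | 9.15 | 0.56 | 0.30 | 13.8 | 9.80 | 0.49 | 0.16 | 14.8 | 10.7 | 0.28 | 0.33 | 13.3 | 9.50 | 0.50 | 0.15 | 15.5 | 11.3 | 0.28 | 0.26 | 14.1 | 9.98 | 0.41 |
| 10 | 0.26 | 15.3 | 11.0 | 0.38 | 0.14 | 16.6 | 11.6 | 0.32 | 0.45 | 12.9 | 8.79 | 0.59 | 0.43 | 14.5 | 9.67 | 0.50 | 0.38 | 13.6 | 9.71 | 0.55 | 0.22 | 15.2 | 10.9 | 0.33 | 0.43 | 13.5 | 8.96 | 0.51 | 0.31 | 14.9 | 10.7 | 0.43 | 0.38 | 14.0 | 9.48 | 0.47 |
| 11 | 0.20 | 14.9 | 10.7 | 0.32 | 0.20 | 14.8 | 10.7 | 0.32 | 0.38 | 12.9 | 9.00 | 0.55 | 0.36 | 13.2 | 9.16 | 0.55 | 0.30 | 14.1 | 9.73 | 0.51 | 0.20 | 14.9 | 10.7 | 0.31 | 0.34 | 13.3 | 9.37 | 0.51 | 0.27 | 14.6 | 10.6 | 0.42 | 0.28 | 14.2 | 10.0 | 0.48 |
| 12 | 0.18 | 14.6 | 10.7 | 0.32 | 0.18 | 14.6 | 10.7 | 0.32 | 0.38 | 12.7 | 9.00 | 0.55 | 0.34 | 13.3 | 9.44 | 0.52 | 0.26 | 14.3 | 10.1 | 0.44 | 0.18 | 14.7 | 10.7 | 0.30 | 0.33 | 13.3 | 9.47 | 0.50 | 0.22 | 14.9 | 10.9 | 0.38 | 0.26 | 14.1 | 10.0 | 0.45 |
| 13 | 0.21 | 17.5 | 12.3 | 0.33 | 0.21 | 15.3 | 10.9 | 0.34 | 0.42 | 13.3 | 9.14 | 0.54 | 0.27 | 15.3 | 15.3 | 0.50 | 0.30 | 14.6 | 10.2 | 0.50 | 0.18 | 15.7 | 11.1 | 0.26 | 0.35 | 14.0 | 9.64 | 0.47 | 0.26 | 17.0 | 11.9 | 0.38 | 0.39 | 13.6 | 9.24 | 0.53 |
| 14 | 0.21 | 14.8 | 10.4 | 0.34 | 0.19 | 15.0 | 10.7 | 0.30 | 0.37 | 13.0 | 9.11 | 0.54 | 0.38 | 12.9 | 12.9 | 0.55 | 0.28 | 14.3 | 10.3 | 0.48 | 0.21 | 14.8 | 10.5 | 0.34 | 0.35 | 13.2 | 9.31 | 0.50 | 0.22 | 16.6 | 12.2 | 0.36 | 0.28 | 14.5 | 10.3 | 0.40 |
| 15 | 0.19 | 14.5 | 10.4 | 0.33 | 0.17 | 14.8 | 10.7 | 0.30 | 0.34 | 13.2 | 9.45 | 0.52 | 0.36 | 13.0 | 13.0 | 0.53 | 0.17 | 15.2 | 10.9 | 0.33 | 0.19 | 14.5 | 10.4 | 0.33 | 0.34 | 13.2 | 9.37 | 0.48 | 0.16 | 16.6 | 12.4 | 0.31 | 0.25 | 14.3 | 10.3 | 0.43 |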
(1–3) Predictor set based on day 2 indicators: body condition score (BCS), daily distance moved (DDM), temperature–humidity index (THI), illuminance (lux), CO2, NH3, animal’s skin temperature from collar (ST), breed (B), age (A); (1) standard evaluation (SE); (2) nested cross-validation (CV); (3) leave-one-animal-out cross-validation (LOAOCV). (4–6) Predictor set based on day 5 variables: BCS, lux, THI, NH3, CO2, B, A; (4) SE; (5) CV; (6) LOAOCV. (7–9) Predictor set based on day 6 variables: BCS, THI, lux, CO2, B, A; (7) SE; (8) CV; (9) LOAOCV. (10–12) Predictor set based on day 6 variables: BCS, THI, NH3, CO2, B, A; (10) SE; (11) CV; (12) LOAOCV. (13–15) Predictor set based on day 2 variables: BCS, DDM, z-axis movement (standing or lying), ST, daily avg speed, lux, THI, CO2, B, A; (13) SE; (14) CV; (15) LOAOCV. LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, CCC: concordance correlation coefficient. ** breaths/min. Bold values highlight comparatively better-performing results.
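As an illustration of the four metrics reported in these tables (R2, RMSE, MAE, CCC), the following minimal sketch computes them from paired observed and predicted values using their standard definitions, with Lin's formula for the concordance correlation coefficient. It is not the authors' code: the function name `regression_metrics` and the example values are assumptions made purely for illustration, and population (divide-by-n) variances are assumed for CCC.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and Lin's CCC for paired observed/predicted values.

    Hypothetical helper for illustration; assumes y_true is not constant.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)

    # Population variances and covariance (divide by n), as assumed for CCC.
    var_t = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "CCC": 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2),
    }


# Made-up respiratory rates (breaths/min), not data from the study.
observed = [42.0, 55.0, 61.0, 38.0, 70.0]
predicted = [45.0, 50.0, 58.0, 44.0, 62.0]
print(regression_metrics(observed, predicted))
```

Unlike R2, CCC also penalizes a systematic offset between predictions and observations, which is one reason the two metrics can rank models differently in the tables.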
Table 7. Training–test comparison, overfitting classification, and predictive performance for respiratory rate models.
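| Model | Predictors | RMSE train (breaths/min) | RMSE test (breaths/min) | MAE train (breaths/min) | MAE test (breaths/min) | R2 train | R2 test | CCC train | CCC test | Overfitting | Performance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | 15.6 | 15.8 | 11.4 | 11.0 | 0.28 | 0.36 | 0.43 | 0.46 | L | M |
| LR | 4 | 14.8 | 15.9 | 10.5 | 11.2 | 0.23 | 0.24 | 0.38 | 0.37 | L | M |
| LR | 7 | 15.2 | 16.4 | 11.1 | 11.7 | 0.18 | 0.19 | 0.30 | 0.30 | L | W |
| LR | 10 | 14.7 | 15.3 | 10.6 | 11.0 | 0.23 | 0.26 | 0.38 | 0.38 | L | M |
| LR | 13 | 16.2 | 17.5 | 12.2 | 12.3 | 0.21 | 0.21 | 0.35 | 0.33 | L | W |
| PLSR | 1 | 15.5 | 16.6 | 11.2 | 11.8 | 0.15 | 0.13 | 0.35 | 0.32 | L | W |
| PLSR | 4 | 14.4 | 10.1 | 15.2 | 10.9 | 0.21 | 0.23 | 0.37 | 0.37 | L | M |
| PLSR | 7 | 14.6 | 15.6 | 10.5 | 11.2 | 0.18 | 0.18 | 0.31 | 0.30 | L | W |
| PLSR | 10 | 15.5 | 16.6 | 11.1 | 11.6 | 0.15 | 0.14 | 0.34 | 0.32 | L | W |
| PLSR | 13 | 14.4 | 56.1 | 10.3 | 14.4 | 0.21 | 0.01 | 0.34 | 0.05 | S | W |
| RF | 1 | 9.70 | 13.6 | 6.87 | 9.29 | 0.68 | 0.38 | 0.74 | 0.52 | M | M |
| RF | 4 | 11.1 | 13.1 | 7.78 | 8.95 | 0.54 | 0.43 | 0.66 | 0.57 | L | M |
| RF | 7 | 11.1 | 12.8 | 7.71 | 8.74 | 0.54 | 0.46 | 0.66 | 0.60 | L | M |
| RF | 10 | 11.1 | 12.9 | 7.75 | 8.82 | 0.54 | 0.44 | 0.66 | 0.59 | L | M |
| RF | 13 | 7.72 | 13.3 | 5.57 | 9.17 | 0.82 | 0.41 | 0.84 | 0.54 | S | M |
| XGBoost | 1 | 11.6 | 15.3 | 7.93 | 10.2 | 0.69 | 0.34 | 0.65 | 0.44 | M | M |
| XGBoost | 4 | 12.6 | 14.6 | 8.53 | 9.64 | 0.56 | 0.43 | 0.58 | 0.50 | M | M |
| XGBoost | 7 | 12.8 | 14.4 | 8.60 | 9.50 | 0.54 | 0.45 | 0.56 | 0.50 | L | M |
| XGBoost | 10 | 12.9 | 14.5 | 8.59 | 9.67 | 0.53 | 0.43 | 0.56 | 0.50 | L | M |
| XGBoost | 13 | 4.09 | 15.3 | 1.65 | 10.4 | 0.94 | 0.27 | 0.97 | 0.50 | S | M |
| MLP | 1 | 11.8 | 15.0 | 8.45 | 10.3 | 0.47 | 0.26 | 0.63 | 0.47 | M | M |
| MLP | 4 | 11.8 | 15.0 | 8.49 | 10.8 | 0.46 | 0.25 | 0.62 | 0.45 | M | M |
| MLP | 7 | 13.2 | 13.5 | 9.47 | 9.56 | 0.34 | 0.39 | 0.49 | 0.52 | L | M |
| MLP | 10 | 11.8 | 13.6 | 8.24 | 9.71 | 0.47 | 0.38 | 0.63 | 0.55 | L | M |
| MLP | 13 | 11.9 | 14.6 | 8.53 | 10.3 | 0.47 | 0.29 | 0.61 | 0.50 | M | M |
| EN | 1 | 14.2 | 15.3 | 10.2 | 10.8 | 0.23 | 0.22 | 0.36 | 0.35 | L | W |
| EN | 4 | 14.3 | 15.2 | 10.1 | 10.9 | 0.22 | 0.22 | 0.35 | 0.36 | L | W |
| EN | 7 | 14.6 | 15.6 | 10.4 | 11.1 | 0.19 | 0.19 | 0.30 | 0.29 | L | W |
| EN | 10 | 14.4 | 15.2 | 10.4 | 10.9 | 0.20 | 0.22 | 0.32 | 0.33 | L | W |
| EN | 13 | 14.7 | 15.8 | 10.5 | 11.2 | 0.18 | 0.17 | 0.26 | 0.26 | L | W |
| Ensemble | 1 | 11.1 | 14.1 | 7.56 | 9.37 | 0.61 | 0.36 | 0.63 | 0.46 | M | M |
| Ensemble | 4 | 12.0 | 13.6 | 8.13 | 9.09 | 0.51 | 0.42 | 0.57 | 0.51 | L | M |
| Ensemble | 7 | 12.1 | 13.6 | 8.14 | 9.11 | 0.50 | 0.43 | 0.56 | 0.50 | L | M |
| Ensemble | 10 | 12.1 | 13.5 | 8.19 | 8.96 | 0.50 | 0.43 | 0.56 | 0.51 | L | M |
| Ensemble | 13 | 7.84 | 14.0 | 5.73 | 9.68 | 0.87 | 0.35 | 0.83 | 0.47 | S | M |
| Mixed Effects | 1 | 14.6 | 15.4 | 10.6 | 10.8 | 0.37 | 0.39 | 0.51 | 0.49 | L | M |
| Mixed Effects | 4 | 13.7 | 15.4 | 9.70 | 10.8 | 0.34 | 0.28 | 0.48 | 0.42 | L | M |
| Mixed Effects | 7 | 14.3 | 16.1 | 0.3 | 11.3 | 0.28 | 0.23 | 0.39 | 0.34 | L | W |
| Mixed Effects | 10 | 13.5 | 14.9 | 9.68 | 10.7 | 0.36 | 0.31 | 0.49 | 0.43 | L | M |
| Mixed Effects | 13 | 15.2 | 17.0 | 11.4 | 11.9 | 0.31 | 0.26 | 0.44 | 0.38 | L | M |
| SVR | 1 | 11.3 | 14.0 | 7.18 | 9.31 | 0.54 | 0.35 | 0.65 | 0.51 | M | M |
| SVR | 4 | 13.3 | 14.7 | 8.62 | 10.0 | 0.36 | 0.30 | 0.46 | 0.40 | L | M |
| SVR | 7 | 12.9 | 14.1 | 8.36 | 9.57 | 0.40 | 0.37 | 0.50 | 0.45 | L | M |
| SVR | 10 | 12.8 | 14.0 | 8.39 | 9.48 | 0.40 | 0.38 | 0.50 | 0.47 | L | M |
| SVR | 13 | 11.0 | 13.6 | 7.16 | 9.26 | 0.57 | 0.39 | 0.66 | 0.54 | M | M |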
LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, CCC: concordance correlation coefficient. (1) Predictor set based on day 2 indicators: body condition score (BCS), daily distance moved (DDM), temperature–humidity index (THI), illuminance (lux), CO2, NH3, animal’s skin temperature from collar (ST), breed (B), age (A); (4) predictor set based on day 5 variables: BCS, lux, THI, NH3, CO2, B, A; (7) predictor set based on day 6 variables: BCS, THI, lux, CO2, B, A; (10) predictor set based on day 6 variables: BCS, THI, NH3, CO2, B, A; (13) predictor set based on day 2 variables: BCS, DDM, z-axis movement (standing or lying), ST, daily avg speed, lux, THI, CO2, B, A. Bold values indicate the best-performing configurations, where applicable.
Table 8. Best-performing predictor subsets for estimating daily distance moved (m/day). For each algorithm, the feature combination yielding the highest test set R2 under the standard evaluation is reported, along with its corresponding nested CV and LOAOCV results.
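| Predictors | LR R2 | LR RMSE (m) | LR MAE (m) | LR CCC | PLSR R2 | PLSR RMSE (m) | PLSR MAE (m) | PLSR CCC | RF R2 | RF RMSE (m) | RF MAE (m) | RF CCC | XGBoost R2 | XGBoost RMSE (m) | XGBoost MAE (m) | XGBoost CCC | MLP R2 | MLP RMSE (m) | MLP MAE (m) | MLP CCC | EN R2 | EN RMSE (m) | EN MAE (m) | EN CCC | Ensemble R2 | Ensemble RMSE (m) | Ensemble MAE (m) | Ensemble CCC | Mixed-Effects R2 | Mixed-Effects RMSE (m) | Mixed-Effects MAE (m) | Mixed-Effects CCC | SVR R2 | SVR RMSE (m) | SVR MAE (m) | SVR CCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.40 | 704 | 582 | 0.58 | 0.16 | 923 | 717 | 0.37 | 0.60 | 598 | 469 | 0.75 | 0.58 | 616 | 477 | 0.74 | 0.48 | 686 | 550 | 0.68 | 0.37 | 753 | 627 | 0.55 | 0.57 | 624 | 500 | 0.71 | 0.56 | 603 | 498 | 0.72 | 0.55 | 632 | 501 | 0.72 |
| 2 | 0.45 | 712 | 573 | 0.61 | 0.44 | 713 | 573 | 0.61 | 0.65 | 568 | 436 | 0.78 | 0.62 | 592 | 447 | 0.77 | 0.48 | 687 | 543 | 0.66 | 0.45 | 712 | 573 | 0.61 | 0.61 | 598 | 468 | 0.75 | 0.59 | 588 | 477 | 0.74 | 0.55 | 650 | 493 | 0.72 |
| 3 | 0.39 | 759 | 614 | 0.50 | 0.39 | 759 | 614 | 0.50 | 0.56 | 645 | 501 | 0.66 | 0.49 | 698 | 540 | 0.63 | 0.36 | 779 | 618 | 0.51 | 0.39 | 758 | 614 | 0.50 | 0.51 | 671 | 527 | 0.62 | 0.46 | 711 | 572 | 0.55 | 0.40 | 739 | 570 | 0.54 |
| 4 | 0.29 | 748 | 615 | 0.48 | 0.37 | 749 | 617 | 0.54 | 0.61 | 593 | 470 | 0.75 | 0.58 | 614 | 487 | 0.73 | 0.40 | 736 | 590 | 0.61 | 0.39 | 738 | 602 | 0.57 | 0.58 | 613 | 493 | 0.71 | 0.40 | 688 | 560 | 0.59 | 0.50 | 668 | 529 | 0.67 |
| 5 | 0.44 | 710 | 575 | 0.61 | 0.44 | 710 | 575 | 0.61 | 0.65 | 565 | 436 | 0.78 | 0.63 | 583 | 445 | 0.78 | 0.53 | 649 | 511 | 0.71 | 0.44 | 710 | 575 | 0.61 | 0.61 | 595 | 466 | 0.75 | 0.45 | 673 | 535 | 0.61 | 0.50 | 680 | 521 | 0.67 |
| 6 | 0.39 | 763 | 621 | 0.50 | 0.39 | 761 | 619 | 0.50 | 0.55 | 652 | 509 | 0.65 | 0.50 | 687 | 537 | 0.64 | 0.42 | 738 | 595 | 0.56 | 0.39 | 763 | 621 | 0.50 | 0.51 | 672 | 529 | 0.62 | 0.34 | 762 | 622 | 0.45 | 0.38 | 743 | 575 | 0.53 |
| 7 | 0.28 | 729 | 585 | 0.47 | 0.28 | 803 | 688 | 0.41 | 0.62 | 586 | 458 | 0.75 | 0.58 | 616 | 489 | 0.73 | 0.40 | 730 | 601 | 0.58 | 0.35 | 764 | 631 | 0.54 | 0.58 | 616 | 492 | 0.71 | 0.39 | 667 | 539 | 0.58 | 0.54 | 643 | 513 | 0.71 |
| 8 | 0.40 | 735 | 590 | 0.57 | 0.40 | 734 | 590 | 0.57 | 0.65 | 565 | 433 | 0.78 | 0.63 | 582 | 450 | 0.77 | 0.48 | 698 | 547 | 0.67 | 0.40 | 735 | 590 | 0.57 | 0.61 | 594 | 467 | 0.75 | 0.38 | 619 | 512 | 0.54 | 0.49 | 692 | 526 | 0.67 |
| 9 | 0.34 | 784 | 634 | 0.46 | 0.34 | 783 | 634 | 0.46 | 0.56 | 643 | 501 | 0.65 | 0.51 | 685 | 528 | 0.63 | 0.35 | 768 | 619 | 0.48 | 0.34 | 783 | 634 | 0.46 | 0.51 | 671 | 528 | 0.62 | 0.23 | 717 | 559 | 0.35 | 0.43 | 714 | 552 | 0.58 |
| 10 | 0.30 | 761 | 637 | 0.49 | 0.31 | 785 | 656 | 0.48 | 0.61 | 592 | 463 | 0.75 | 0.59 | 605 | 481 | 0.74 | 0.54 | 643 | 513 | 0.70 | 0.34 | 772 | 634 | 0.54 | 0.59 | 614 | 486 | 0.71 | 0.42 | 687 | 568 | 0.61 | 0.56 | 626 | 493 | 0.72 |
| 11 | 0.39 | 739 | 591 | 0.56 | 0.40 | 738 | 591 | 0.56 | 0.65 | 563 | 430 | 0.79 | 0.63 | 577 | 442 | 0.78 | 0.47 | 693 | 543 | 0.66 | 0.39 | 739 | 591 | 0.56 | 0.62 | 591 | 462 | 0.75 | 0.45 | 675 | 540 | 0.62 | 0.53 | 664 | 507 | 0.71 |
| 12 | 0.34 | 785 | 633 | 0.45 | 0.34 | 785 | 633 | 0.45 | 0.55 | 648 | 502 | 0.65 | 0.50 | 696 | 536 | 0.64 | 0.41 | 746 | 600 | 0.56 | 0.34 | 784 | 633 | 0.45 | 0.51 | 673 | 528 | 0.62 | 0.31 | 773 | 627 | 0.42 | 0.43 | 718 | 555 | 0.59 |
| 13 | 0.38 | 714 | 591 | 0.56 | 0.35 | 766 | 628 | 0.52 | 0.60 | 596 | 470 | 0.75 | 0.58 | 613 | 487 | 0.74 | 0.51 | 662 | 516 | 0.69 | 0.43 | 712 | 582 | 0.61 | 0.58 | 613 | 497 | 0.72 | 0.52 | 623 | 517 | 0.69 | 0.52 | 656 | 518 | 0.69 |
| 14 | 0.49 | 685 | 559 | 0.65 | 0.49 | 686 | 559 | 0.65 | 0.66 | 560 | 431 | 0.79 | 0.64 | 577 | 440 | 0.79 | 0.50 | 679 | 539 | 0.68 | 0.49 | 685 | 559 | 0.65 | 0.62 | 585 | 459 | 0.76 | 0.55 | 617 | 503 | 0.70 | 0.53 | 679 | 519 | 0.67 |
| 15 | 0.43 | 738 | 601 | 0.54 | 0.43 | 738 | 600 | 0.54 | 0.54 | 661 | 514 | 0.64 | 0.50 | 698 | 538 | 0.63 | 0.45 | 729 | 590 | 0.59 | 0.43 | 737 | 601 | 0.54 | 0.52 | 669 | 527 | 0.63 | 0.42 | 725 | 592 | 0.52 | 0.44 | 726 | 564 | 0.53 |
| 16 | 0.33 | 708 | 586 | 0.53 | 0.32 | 780 | 665 | 0.46 | 0.61 | 589 | 471 | 0.75 | 0.59 | 605 | 485 | 0.74 | 0.48 | 681 | 545 | 0.66 | 0.41 | 728 | 598 | 0.58 | 0.60 | 604 | 492 | 0.72 | 0.46 | 632 | 523 | 0.66 | 0.52 | 661 | 517 | 0.69 |
| 17 | 0.45 | 707 | 571 | 0.62 | 0.45 | 705 | 570 | 0.62 | 0.65 | 570 | 446 | 0.78 | 0.62 | 586 | 452 | 0.77 | 0.52 | 666 | 522 | 0.69 | 0.45 | 707 | 571 | 0.62 | 0.61 | 595 | 471 | 0.75 | 0.41 | 602 | 490 | 0.58 | 0.50 | 682 | 514 | 0.69 |
| 18 | 0.39 | 759 | 617 | 0.50 | 0.39 | 758 | 616 | 0.50 | 0.55 | 650 | 509 | 0.65 | 0.50 | 690 | 535 | 0.63 | 0.39 | 760 | 601 | 0.55 | 0.39 | 759 | 617 | 0.50 | 0.52 | 667 | 528 | 0.63 | 0.30 | 686 | 544 | 0.42 | 0.44 | 709 | 542 | 0.59 |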
(1–3) Predictor set based on day 2 indicators: body condition score (BCS), temperature–humidity index (THI), illuminance (lux), CO2, NH3, breed (B), age (A); (1) standard evaluation (SE); (2) nested cross-validation (CV); (3) leave-one-animal-out cross-validation (LOAOCV). (4–6) Predictor set based on day 5 variables: BCS, lux, THI, NH3, CO2, B, A; (4) SE; (5) CV; (6) LOAOCV. (7–9) Predictor set based on day 1 variables: BCS, THI, lux, blood cortisol concentration, CO2, NH3, B, A; (7) SE; (8) CV; (9) LOAOCV. (10–12) Predictor set based on day 1 variables: BCS, THI, NH3, CO2, lux, B, A; (10) SE; (11) CV; (12) LOAOCV. (13–15) Predictor set based on day 3 variables: BCS, lux, THI, CO2, NH3, B, A; (13) SE; (14) CV; (15) LOAOCV. (16–18) Predictor set based on day 4 variables: BCS, THI, lux, blood cortisol concentration, CO2, NH3, B, A; (16) SE; (17) CV; (18) LOAOCV. LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, CCC: concordance correlation coefficient. Bold values highlight comparatively better-performing results.
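The leave-one-animal-out scheme (LOAOCV) named in the footnote can be sketched as a grouped split in which every record from one animal is held out together, so that no animal contributes to both training and testing within a fold. The sketch below is a minimal illustration under that reading, not the authors' pipeline: the function name `leave_one_animal_out` and the animal identifiers are hypothetical.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out_animal, train_idx, test_idx), one fold per animal.

    animal_ids[i] is the identifier of the animal that produced record i.
    Hypothetical helper for illustration only.
    """
    for animal in sorted(set(animal_ids)):
        # All records of the held-out animal form the test fold.
        test_idx = [i for i, a in enumerate(animal_ids) if a == animal]
        # Records of every other animal form the training fold.
        train_idx = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train_idx, test_idx


# Made-up example: five animal-day records from three animals.
animal_ids = ["ewe_A", "ewe_A", "ewe_B", "ewe_B", "ewe_C"]
folds = list(leave_one_animal_out(animal_ids))
for animal, train_idx, test_idx in folds:
    print(animal, "train:", train_idx, "test:", test_idx)
```

Because repeated measurements from the same animal are correlated, this kind of split estimates performance on unseen animals, which is consistent with the LOAOCV rows in the tables generally showing lower R2 than the nested CV rows.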
Table 9. Train–test comparison for daily distance moved (m/day) showing overfitting levels and predictive performance for each model. Overfitting categories are derived from train-to-test divergence in RMSE/MAE and reductions in R2 or CCC, using the same thresholds applied to the other prediction targets.
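| Model | Predictors | RMSE train (m/day) | RMSE test (m/day) | MAE train (m/day) | MAE test (m/day) | R2 train | R2 test | CCC train | CCC test | Overfitting | Performance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 1 | 630 | 704 | 505 | 582 | 0.52 | 0.40 | 0.68 | 0.58 | M | M |
| LR | 4 | 700 | 748 | 568 | 615 | 0.40 | 0.29 | 0.58 | 0.48 | M | M |
| LR | 7 | 573 | 729 | 445 | 585 | 0.37 | 0.28 | 0.54 | 0.47 | M | M |
| LR | 10 | 710 | 761 | 562 | 637 | 0.39 | 0.30 | 0.56 | 0.49 | L | M |
| LR | 13 | 658 | 714 | 537 | 591 | 0.48 | 0.38 | 0.64 | 0.56 | L | M |
| LR | 16 | 548 | 708 | 419 | 586 | 0.42 | 0.33 | 0.59 | 0.53 | M | M |
| PLSR | 1 | 824 | 923 | 606 | 717 | 0.30 | 0.16 | 0.51 | 0.37 | M | W |
| PLSR | 4 | 699 | 749 | 570 | 617 | 0.46 | 0.37 | 0.61 | 0.54 | L | M |
| PLSR | 7 | 779 | 803 | 635 | 688 | 0.34 | 0.28 | 0.44 | 0.41 | L | M |
| PLSR | 10 | 734 | 785 | 576 | 656 | 0.40 | 0.31 | 0.55 | 0.48 | L | M |
| PLSR | 13 | 713 | 766 | 569 | 628 | 0.43 | 0.35 | 0.60 | 0.52 | L | M |
| PLSR | 16 | 748 | 780 | 607 | 665 | 0.39 | 0.32 | 0.51 | 0.46 | L | M |
| RF | 1 | 465 | 598 | 351 | 474 | 0.76 | 0.60 | 0.86 | 0.75 | M | G |
| RF | 4 | 458 | 594 | 342 | 471 | 0.77 | 0.60 | 0.86 | 0.75 | M | G |
| RF | 7 | 456 | 587 | 347 | 466 | 0.77 | 0.62 | 0.86 | 0.75 | M | G |
| RF | 10 | 466 | 588 | 352 | 466 | 0.76 | 0.61 | 0.85 | 0.75 | M | G |
| RF | 13 | 464 | 595 | 650 | 474 | 0.76 | 0.60 | 0.86 | 0.75 | S | G |
| RF | 16 | 439 | 589 | 335 | 470 | 0.79 | 0.61 | 0.87 | 0.75 | M | G |
| XGBoost | 1 | 425 | 616 | 314 | 477 | 0.80 | 0.58 | 0.88 | 0.74 | M | G |
| XGBoost | 4 | 423 | 614 | 316 | 487 | 0.80 | 0.58 | 0.88 | 0.73 | M | G |
| XGBoost | 7 | 395 | 616 | 293 | 489 | 0.83 | 0.58 | 0.90 | 0.73 | M | G |
| XGBoost | 10 | 420 | 605 | 313 | 481 | 0.80 | 0.59 | 0.88 | 0.74 | M | G |
| XGBoost | 13 | 414 | 613 | 306 | 487 | 0.81 | 0.58 | 0.89 | 0.74 | M | G |
| XGBoost | 16 | 408 | 605 | 309 | 485 | 0.82 | 0.59 | 0.89 | 0.74 | M | G |
| MLP | 1 | 523 | 701 | 404 | 561 | 0.69 | 0.46 | 0.82 | 0.66 | M | M |
| MLP | 4 | 602 | 736 | 473 | 590 | 0.59 | 0.40 | 0.74 | 0.61 | M | M |
| MLP | 7 | 631 | 730 | 499 | 601 | 0.55 | 0.40 | 0.71 | 0.58 | M | M |
| MLP | 10 | 565 | 643 | 428 | 513 | 0.64 | 0.54 | 0.78 | 0.70 | L | G |
| MLP | 13 | 560 | 662 | 441 | 516 | 0.65 | 0.51 | 0.79 | 0.69 | M | M |
| MLP | 16 | 570 | 681 | 442 | 545 | 0.64 | 0.48 | 0.78 | 0.66 | M | M |
| EN | 1 | 680 | 753 | 543 | 627 | 0.48 | 0.37 | 0.65 | 0.55 | M | M |
| EN | 4 | 691 | 738 | 559 | 602 | 0.47 | 0.39 | 0.63 | 0.57 | L | M |
| EN | 7 | 708 | 764 | 563 | 631 | 0.44 | 0.35 | 0.60 | 0.54 | L | M |
| EN | 10 | 712 | 772 | 561 | 634 | 0.43 | 0.34 | 0.60 | 0.54 | L | M |
| EN | 13 | 663 | 712 | 543 | 582 | 0.51 | 0.43 | 0.67 | 0.61 | L | M |
| EN | 16 | 686 | 728 | 555 | 598 | 0.47 | 0.41 | 0.64 | 0.58 | L | M |
| Ensemble | 1 | 487 | 624 | 382 | 500 | 0.75 | 0.57 | 0.83 | 0.71 | M | G |
| Ensemble | 4 | 485 | 613 | 384 | 493 | 0.75 | 0.58 | 0.83 | 0.71 | M | G |
| Ensemble | 7 | 483 | 616 | 379 | 492 | 0.76 | 0.58 | 0.83 | 0.71 | M | G |
| Ensemble | 10 | 494 | 614 | 386 | 486 | 0.74 | 0.59 | 0.82 | 0.71 | M | G |
| Ensemble | 13 | 480 | 613 | 380 | 497 | 0.75 | 0.58 | 0.84 | 0.72 | M | G |
| Ensemble | 16 | 475 | 604 | 377 | 492 | 0.76 | 0.60 | 0.84 | 0.72 | M | G |
| Mixed Effects | 1 | 531 | 603 | 429 | 498 | 0.66 | 0.56 | 0.79 | 0.72 | L | G |
| Mixed Effects | 4 | 618 | 688 | 496 | 560 | 0.54 | 0.40 | 0.68 | 0.59 | M | M |
| Mixed Effects | 7 | 448 | 667 | 351 | 539 | 0.63 | 0.39 | 0.73 | 0.58 | M | M |
| Mixed Effects | 10 | 623 | 687 | 498 | 568 | 0.53 | 0.42 | 0.68 | 0.61 | L | M |
| Mixed Effects | 13 | 560 | 623 | 454 | 517 | 0.62 | 0.52 | 0.76 | 0.69 | L | M |
| Mixed Effects | 16 | 390 | 632 | 300 | 523 | 0.72 | 0.46 | 0.81 | 0.66 | S | M |
| SVR | 1 | 478 | 632 | 341 | 501 | 0.74 | 0.55 | 0.85 | 0.72 | M | G |
| SVR | 4 | 511 | 668 | 371 | 529 | 0.71 | 0.50 | 0.83 | 0.67 | M | M |
| SVR | 7 | 438 | 643 | 298 | 513 | 0.78 | 0.54 | 0.88 | 0.71 | M | G |
| SVR | 10 | 456 | 626 | 319 | 493 | 0.77 | 0.56 | 0.87 | 0.72 | M | G |
| SVR | 13 | 550 | 656 | 416 | 518 | 0.67 | 0.52 | 0.79 | 0.69 | M | M |
| SVR | 16 | 481 | 661 | 346 | 517 | 0.74 | 0.52 | 0.85 | 0.69 | M | M |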
LR: linear regression, PLSR: partial least squares regression, RF: random forest, XGBoost: extreme gradient boosting, MLP: multilayer perceptron, EN: elastic net, SVR: support vector regression, R2: coefficient of determination, RMSE: root mean square error, MAE: mean absolute error, CCC: concordance correlation coefficient. (1) Predictor set based on day 2 indicators: body condition score (BCS), temperature–humidity index (THI), illuminance (lux), CO2, NH3, breed (B), age (A); (4) predictor set based on day 5 variables: BCS, lux, THI, NH3, CO2, B, A; (7) predictor set based on day 1 variables: BCS, THI, lux, blood cortisol concentration, CO2, NH3, B, A; (10) predictor set based on day 1 variables: BCS, THI, NH3, CO2, lux, B, A; (13) predictor set based on day 3 variables: BCS, lux, THI, CO2, NH3, B, A; (16) predictor set based on day 4 variables: BCS, THI, lux, blood cortisol concentration, CO2, NH3, B, A. Bold values indicate the best-performing configurations, where applicable.
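To make the train-to-test divergence behind the overfitting column concrete, the sketch below computes two simple gap measures from a pair of train and test results: the relative increase in RMSE and the absolute drop in R2. This is only an illustration. The function name `train_test_gap` is hypothetical, the choice of these two measures is an assumption, and the thresholds that map such gaps to the L/M/S categories are not restated in this part of the article, so the sketch deliberately stops short of assigning a category. The input values are the RF results for predictor set 1 in Table 9.

```python
def train_test_gap(train, test):
    """Return the relative RMSE increase and absolute R2 drop from train to test.

    Hypothetical helper for illustration; it does not reproduce the
    authors' overfitting thresholds.
    """
    return {
        # Fractional growth of the error when moving to unseen data.
        "rmse_increase": (test["RMSE"] - train["RMSE"]) / train["RMSE"],
        # Loss of explained variance when moving to unseen data.
        "r2_drop": train["R2"] - test["R2"],
    }


# RF, predictor set 1, daily distance moved (values taken from Table 9).
rf_set1 = train_test_gap(
    train={"RMSE": 465, "R2": 0.76},
    test={"RMSE": 598, "R2": 0.60},
)
print(rf_set1)
```

For this configuration the test RMSE is roughly 29% higher than the training RMSE and R2 falls by about 0.16, a gap that Table 9 labels as moderate (M) overfitting.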