2.1. Sensors and Data Acquisition
The Nedap CowControl® system (Nedap Livestock Management, Groenlo, The Netherlands) was employed, utilizing SmartTag intelligent collars (codes FERP4, 3rd generation). Each SmartTag incorporates a triaxial accelerometer that transmits data at 434 MHz with a power of <1 mW e.r.p., features an IP67 protection rating, and has an operational temperature range from −10 °C to 50 °C. The battery-powered (3 V DC) RFID transmitters continuously record feeding, rumination, resting, locomotion, and inactivity, storing data locally for 24 h before transmission to the CowControl antenna (outdoor range ≥ 100 m). Data were processed by specialized algorithms on the cloud-based platform to detect reproductive, health, and behavioral patterns, generating automated alerts and analyses for both individual animals and the herd [14]. These data were used to derive day/night activity indices and 24- to 36 h time series for circadian analysis, as detailed in Section 2.3, Section 2.4 and Section 2.5 (Figure 1 and Table 1).
Although the cohort comprised 10 lactating Holstein cows, we recorded hourly time series for 12 months, generating 87,600 temporal measurements (10 cows × 8760 h) that constitute repeated observations nested within individual animals, not independent replicates. The experimental unit was the cow (n = 10), and all statistical inference recognized the hierarchical structure and temporal dependence of the data. Performance metrics were computed per cow and then summarized across cows using robust statistics that preserve within-cow temporal dependence. Hourly measurements were therefore not treated as independent replicates, which avoids temporal pseudoreplication; the measurement system is shown in Figure 2. This design follows the established distinction between experimental units and subsamples [18,19].
This was an observational study with a predictive objective: to forecast stress-state transitions one hour ahead, in alignment with the precision livestock farming (PLF) paradigm, which emphasizes continuous, sensor-based monitoring to enable timely management decisions [17,20,21]. To model the hourly temporal sequence of behavioral data, we employed a Long Short-Term Memory (LSTM) network, as its gated memory architecture is well suited to capturing both the short-term fluctuations and the long-range dependencies inherent in circadian rhythms, a capability that has been demonstrated in prior applications using cattle-sensor data [22,23].
Our sample size of ten cows is consistent with other robust sensor-based studies in precision livestock farming, which often rely on small but longitudinally monitored cohorts (e.g., six cows in Vázquez-Diosdado et al., 2015 [24]; seven cows in Hernández et al. [10]). Given our predictive focus, traditional power analysis based on group differences was redefined in terms of predictive performance, using Monte Carlo simulations (n = 1000) with the observed class distribution (normal 72%, mild 21%, high 7%) and leave-one-cow-out evaluation to account for within-cow correlation. Under these conditions, the available sample (3650 cow-days; 2430 train/1220 test) provides adequate power (≥0.80–0.85) to detect an absolute improvement of ΔF1-macro ≥ 0.05 over strong baselines (logistic regression and Random Forest) at α = 0.05. Furthermore, within-animal stability of circadian patterns across weeks was statistically confirmed through repeated-measures ANOVA (F = 23.4, p < 0.001). Importantly, our results are framed as predictive, not causal, outcomes [24], and any extrapolation beyond this specific herd and environmental context should be approached with caution.
2.2. Environmental Conditions
Air temperature and relative humidity were recorded with a combined temperature/relative-humidity probe (Thies Clima, model NHTFB; Adolf Thies GmbH, Göttingen, Germany) operating in parallel with the smart collars. The station was housed in a multi-plate radiation shield, installed 2 m above ground and ≥10 m from obstructions; the logger stored 1 min samples that were aggregated to hourly values. From these data we computed the Temperature–Humidity Index (THI), a standard metric for heat stress in cattle [4], using THI = (1.8 × T + 32) − [(0.55 − 0.0055 × RH) × (1.8 × T − 26)], where T is air temperature (°C) and RH is relative humidity (%). Thresholds were defined as normal THI ≤ 68 (thermal comfort), mild stress 68 < THI ≤ 72 (onset of stress), and high stress THI > 72 (severe stress). These categories showed a strong association with behavioral changes (Pearson r = 0.79, p < 0.001). Baseline models were fitted using the mean/maximum daily THI, consecutive hours above the threshold, and the area above the threshold to predict Vet+ and stress labels. The results quantify the predictive value of exposure duration while keeping THI out of the main LSTM inputs.
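The THI formula, the three categories, and the exposure summaries described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names (`thi`, `thi_category`, `exposure_summary`) are hypothetical, and the choice of 68 as the default exposure threshold is an assumption taken from the "normal" cut-off.

```python
def thi(temp_c: float, rh_pct: float) -> float:
    """Temperature-Humidity Index from air temperature (°C) and relative humidity (%)."""
    return (1.8 * temp_c + 32.0) - (0.55 - 0.0055 * rh_pct) * (1.8 * temp_c - 26.0)


def thi_category(value: float) -> str:
    """Map a THI value to the categories used in the text (<=68, 68-72, >72)."""
    if value <= 68.0:
        return "normal"
    if value <= 72.0:
        return "mild"
    return "high"


def exposure_summary(hourly_thi, threshold=68.0):
    """Daily exposure descriptors: mean, max, longest run above threshold, area above it.

    The default threshold is an assumption for illustration.
    """
    longest = run = 0
    area = 0.0
    for value in hourly_thi:
        if value > threshold:
            run += 1
            longest = max(longest, run)
            area += value - threshold  # THI units x hours (1 h sampling)
        else:
            run = 0
    return {
        "mean": sum(hourly_thi) / len(hourly_thi),
        "max": max(hourly_thi),
        "longest_run_h": longest,
        "area_above": area,
    }
```

For example, `thi(25.0, 50.0)` evaluates to 71.775, which falls in the mild category.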
The study site at 2950 m elevation in Tulcán, Ecuador (0°49′ N) experiences a bi-seasonal Andean climate characteristic of equatorial highlands: a rainy season (October–May) with 100–150 mm monthly precipitation and a dry season (June–September) with 20–50 mm monthly precipitation. Day length remains relatively constant year-round (11.5–12.5 h) due to equatorial location, minimizing photoperiod-induced circadian variation compared to temperate regions.
Veterinary Clinical Indicators and Stress Classification Framework
The binary indicator Vet± denotes the presence (Vet+ = 1) or absence (Vet− = 0) of clinical findings identified during standardized twice-daily behavioral evaluations (06:00 and 18:00 h). Vet+ assessments capture observable manifestations of stress, including shivering/tremors, respiratory distress (rate > 60 breaths/min), abnormal posture (hunched/withdrawn), depressed activity or anorexia, cold stress hypersalivation, and locomotor abnormalities (stiffness, lameness on cold surfaces). Each Vet+ record is time-stamped and animal-specific, serving as an independent validation criterion and contributing to triangulated stress labeling. To prevent data leakage, Vet± is explicitly excluded from the LSTM feature set. Related variables are defined as follows:
VAS (Veterinary Anomaly Score). A continuous cow-standardized index derived from non-clinical signals (e.g., behavioral rhythms), independent of Vet_t.
MY (milk yield). Daily milk production per cow (kg or L/day).
THI± load. Accumulated thermal load relative to the comfort zone, expressed in hour·unit deviations. Cold load is defined as THI < THI_cold; heat load as THI > THI_heat (when reported).
Vet × THI interaction. Captures the interaction between Vet± (or its lagged form) and THI± load, centered and scaled for modeling purposes.
Stress classification (model output variable). The LSTM model predicts, for each cow-day observation unit, one of three ordinal stress categories:
Normal: No Vet+ detection, circadian deviation d_t < T1, and absence of clinically significant thermal challenge (THI within comfort zone).
Mild: Presence of any one of the following: transient or mild Vet+ sign, threshold exceedance T1 ≤ d_t < T2, or short-term cold exposure (THI < cold-sensitivity threshold).
High: Persistent or severe Vet+ findings and/or critical circadian disruption d_t ≥ T2, and/or prolonged cold stress (THI below threshold sustained ≥ X h).
2.6. Detection of the Dominant Circadian Frequency
The main spectral peak was identified using Equation (4):
$$k^{*} = \arg\max_{k}\,|S[k]| \qquad (4)$$
where k* is the frequency index of the component with the greatest magnitude |S[k]|. The FFT only evaluates discrete frequencies (called bins) separated by Δf = 1/N h⁻¹. With N = 36, Δf ≈ 0.0278 h⁻¹; therefore, the sampled frequencies are 0, 0.0278, 0.0556, … h⁻¹. The theoretical circadian frequency is f_c ≈ 1/24 h⁻¹ ≈ 0.0417 h⁻¹, which falls between two bins: k = 1 (0.0278 h⁻¹) and k = 2 (0.0556 h⁻¹). To capture it correctly, a narrow bandpass filter centered on f_c is applied, which includes those adjacent bins. Furthermore, harmonics of the 24 h cycle sometimes appear (e.g., 12 h = 2f_c, 8 h = 3f_c). They are only included in the filtering when they show clear peaks in |S[k]| and improve the reconstruction of the day–night pattern; otherwise, they are omitted to avoid introducing noise. With N = 36, the evaluated harmonics lie at ≈12 h (k = 3) and ≈8 h (k ≈ 4.5, between bins 4 and 5). A frequency bin k* was deemed a clear peak only if both conditions held: (i) prominence ≥ 2 × MAD (median absolute deviation) in a local ±2-bin neighborhood, and (ii) SNR ≥ 6 dB, where SNR = 20 × log₁₀(|S[k*]|/local median). Additionally, temporal stability had to be satisfied (occurrence in ≥60% of overlapping windows for the same cow).
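The bin spacing and the two peak criteria can be checked with a small sketch. It uses a naive DFT from the standard library, which is adequate for N = 36. Several details are assumptions made for illustration: the window is mean-removed before the transform, prominence is taken as the peak magnitude minus the local median, and the local median and MAD are computed over the ±2 neighboring bins excluding the candidate bin and the DC bin. The function names are hypothetical.

```python
import cmath
import math
import statistics


def bin_frequencies(n_samples, dt_h=1.0):
    """Frequencies (cycles per hour) of DFT bins 0..n_samples//2."""
    return [k / (n_samples * dt_h) for k in range(n_samples // 2 + 1)]


def dft_magnitudes(x):
    """|S[k]| for k = 0..N//2 of the mean-removed window (naive DFT)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    return [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred)))
        for k in range(n // 2 + 1)
    ]


def is_clear_peak(mags, k, half_width=2, mad_factor=2.0, min_snr_db=6.0):
    """Apply the prominence (>= 2 x MAD) and SNR (>= 6 dB) criteria to bin k."""
    lo = max(1, k - half_width)              # skip the DC bin (assumption)
    hi = min(len(mags) - 1, k + half_width)
    neighbours = [mags[i] for i in range(lo, hi + 1) if i != k]
    local_median = statistics.median(neighbours)
    mad = statistics.median([abs(v - local_median) for v in neighbours])
    if mags[k] <= 0.0:
        return False
    prominent = (mags[k] - local_median) >= mad_factor * mad
    if local_median <= 0.0:                  # noise floor of zero: SNR is unbounded
        return prominent
    snr_db = 20.0 * math.log10(mags[k] / local_median)
    return prominent and snr_db >= min_snr_db


# A pure 24 h cosine in a 36 h window leaks into bins 1 and 2,
# whereas a 12 h cosine sits exactly on bin 3.
window_24h = [math.cos(2 * math.pi * t / 24) for t in range(36)]
window_12h = [math.cos(2 * math.pi * t / 12) for t in range(36)]
mags_12h = dft_magnitudes(window_12h)
k_star = max(range(1, len(mags_12h)), key=lambda k: mags_12h[k])
```

The temporal-stability condition (presence in at least 60% of overlapping windows) would be applied on top of `is_clear_peak` by counting its outcomes across windows for the same cow.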
2.8. Change Detection and Stress Labeling
To detect circadian cycle changes, we reconstructed the 24 h circadian component using the bandpass + IFFT procedure (Equation (5)) and then split it into two 12 h subseries, A_t (day, 06:00–18:00) and B_t (night, 18:00–06:00), to quantify amplitude, phase, coherence, and the root mean square (RMS) A–B distance. Equation (6) quantifies the day–night contrast by computing the A–B distance on these circadian reconstructions:
$$d_t = D_{\mathrm{RMS}}(A_t, B_t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_t[i] - B_t[i]\right)^{2}} \qquad (6)$$
where N is the number of samples per subseries (with 1 h sampling and 12 h offset, N = 12), and A_t, B_t are the reconstructed and z-normalized signals per cow. This RMS-scaled Euclidean distance d_t between consecutive 12 h subseries is dimensionless. Statistical thresholds were estimated per cow from the distribution of d_t during event-free periods: Threshold 1 = x̄ + s (normal → mild) and Threshold 2 = x̄ + 2s (mild → high), where x̄ and s represent the mean and standard deviation of {d_t} under basal conditions. Following [12], we use d_t strictly as a circadian change detector to form candidate hours/days; d_t is not used as a model feature.
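A minimal sketch of the day–night distance and the per-cow thresholds follows. It assumes the 24 h reconstruction is already available as 24 hourly values starting at 06:00, that z-normalization uses the population standard deviation, and that the baseline standard deviation s is the sample standard deviation; none of these choices is stated in the text, and the function names are hypothetical.

```python
import math
import statistics


def z_normalize(series):
    """Centre and scale a series (population SD; an assumption)."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(v - mean) / sd for v in series]


def day_night_rms(day, night):
    """RMS distance between two equally long subseries (12 samples at 1 h sampling)."""
    if len(day) != len(night) or not day:
        raise ValueError("day and night subseries must be non-empty and equally long")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(day, night)) / len(day))


def baseline_thresholds(event_free_distances):
    """Return (T1, T2) = (mean + s, mean + 2s) from event-free distances of one cow."""
    mean = statistics.fmean(event_free_distances)
    sd = statistics.stdev(event_free_distances)  # sample SD (assumption)
    return mean + sd, mean + 2.0 * sd


# Toy example: an ideal 24 h cosine, hour 0 corresponding to 06:00.
cycle = z_normalize([math.cos(2 * math.pi * h / 24) for h in range(24)])
day, night = cycle[:12], cycle[12:]
d_t = day_night_rms(day, night)
```

For this idealized cosine the day and night halves are mirror images, so the distance evaluates to 2.0; real reconstructions would be compared against each cow's own `baseline_thresholds`.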
A window was labeled as mild/high stress when d_t exceeded the corresponding threshold and, additionally, at least one of the following conditions was met within a temporal tolerance (e.g., ±1 h): (a) a veterinary annotation of anomalous behavior; (b) a THI flag, with the THI used as a contextual rule within the operational labeling (see Appendix C), so that analyses based on the THI cannot be considered independent validation; or (c) a circadian deviation from the expected pattern (phase shift and/or amplitude drop). When no additional condition was met, the window remained classified as normal (see Table 2).
Table 2. Operational criteria for stress levels.
| Parameter | Normal (Eustress) | Mild (Adaptive Stress) | High (Distress) | Reference |
|---|---|---|---|---|
| Circadian Amplitude | ≥0.60 | 0.45–0.60 | <0.45 | Wagner et al., 2021 [12]; Refinetti et al., 2007 [26] |
| Physiological Definition | Homeostatic equilibrium with optimal adaptive responses | HPA axis activation with intact compensatory mechanisms | HPA axis overload with compromised adaptive capacity | Szabo et al., 2012 [27]; Moberg, 2000 [28] |
| Euclidean Distance Threshold | d_t ≤ x̄ + s | x̄ + s < d_t ≤ x̄ + 2s | d_t > x̄ + 2s | This study |
| THI Range | ≤68 | 68–72 | >72 | Bouraoui et al., 2002 [29]; Bernabucci et al., 2010 [30] |
| Circadian Coherence | ≥80% | 65–80% | <65% | Piccione et al., 2013 [31] |
| Rumination Time | 420–480 min/day (7–8 h) | 315–357 min/day (15–25% ↓) | <252 min/day (>40% ↓) | Beauchemin, 2018 [32]; Schirmann et al., 2011 [33] |
| Feeding Pattern | 3–5 h/day, broadly distributed | Temporal irregularities, maintained intake | Highly irregular or prolonged fasting | DeVries et al., 2003 [34]; Munksgaard et al., 2005 [35] |
| Rest Periods | Stable 22:00–05:00 | Increased nocturnal activity (22:00–02:00) | Hyperactivity during typical rest times | Tucker et al., 2003 [36]; Ito et al., 2009 [37] |
| Expected Cortisol Range | 1–2 ng/mL (baseline) | 2–4 ng/mL (elevated–adaptive) | >4 ng/mL (pathological) | Mormède et al., 2007 [38]; Ralph & Tilbrook, 2016 [39] |
| Heart Rate Variation | Individual baseline range | 10–15% above baseline | >20% above baseline | von Borell et al., 2007 [40]; Hagen et al., 2005 [41] |
| Veterinary Validation | Normal behavior for breed/age/lactation stage | Subtle behavioral changes: productive functions maintained | Clinical signs of distress; compromised welfare | Welfare Quality, 2009 [42] |
| Intervention Required | Routine management | Enhanced monitoring; preventive measures | Immediate intervention; stress mitigation protocols | FAWC, 2009 [43] |
| Clinical Significance | Optimal welfare state | Early warning; intervention window | Critical state; welfare compromise risk | Fraser, 2008 [44] |
Unit of analysis and prediction level. We classified stress at the cow-day level, generating approximately 3650 observations (10 cows × 365 days). The analysis was performed using STFT with 36 h windows and a 1 h offset to calculate the circadian reconstruction and D_RMS(A_t, B_t); from each daily step (24 h), per-cow summaries (amplitude, phase, and coherence) were obtained and used for labeling and training. The model produces individual predictions; to contextualize them at the herd level, we aggregated these predictions as the daily proportion of cows classified as stressed, recognizing that on any given day one or more cows may exhibit stress.
Statistical considerations for hierarchical data: All performance metrics account for the nested structure of hourly observations within cows. We used cluster-bootstrap resampling (resampling by cow) to compute confidence intervals that properly reflect uncertainty at the cow level rather than artificially inflating precision by treating temporal measurements as independent.
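The cow-level resampling described above can be sketched as follows. The example assumes that each cow's metric (for instance, a per-cow F1 score) has already been computed from that cow's own time series, so resampling cows with replacement carries all of a cow's hourly observations together. The percentile interval, the median as the across-cow summary, and the name `cluster_bootstrap_ci` are assumptions made for illustration.

```python
import random
import statistics


def cluster_bootstrap_ci(per_cow_metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the across-cow median, resampling whole cows.

    per_cow_metric: dict mapping cow id -> metric computed within that cow.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    cows = list(per_cow_metric)
    replicates = []
    for _ in range(n_boot):
        # Draw as many cows as in the cohort, with replacement.
        sample = [per_cow_metric[rng.choice(cows)] for _ in cows]
        replicates.append(statistics.median(sample))
    replicates.sort()
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    point = statistics.median(list(per_cow_metric.values()))
    return point, lower, upper


# Hypothetical per-cow scores, for illustration only.
scores = {f"cow{i}": 0.70 + 0.02 * i for i in range(10)}
point, lower, upper = cluster_bootstrap_ci(scores)
```

Because only ten clusters are resampled, the interval width reflects between-cow variability rather than the number of hourly rows.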
Operational stress labeling and validation. The final stress labels used for model training and evaluation were obtained by triangulating circadian distance thresholds (d_t), veterinary observations, and THI criteria. Veterinary observations and THI served as external validation sources rather than model inputs. The agreement between validation methods is summarized in Table 2.
To operationalize the labeling process, we applied a conservative decision rule based on the Euclidean distance d_t:
If d_t ≤ T1 ⇒ normal.
If T1 < d_t ≤ T2 and (Vet_t OR THI_t OR circadian deviation) ⇒ mild.
If d_t > T2 and (Vet_t OR THI_t OR circadian deviation) ⇒ high.
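The decision rule above can be written as a short function. This is a sketch: the name `label_window` and the boolean flag arguments are hypothetical, the flags are assumed to have been evaluated already within the ±1 h tolerance, and a window that exceeds a threshold without any corroborating flag stays normal, as stated in the labeling description.

```python
def label_window(d_t, t1, t2, vet_flag=False, thi_flag=False, circadian_flag=False):
    """Return 'normal', 'mild' or 'high' for one window of one cow.

    d_t: day-night distance; t1, t2: per-cow thresholds (t1 < t2).
    Flags: corroborating evidence found within the temporal tolerance.
    """
    corroborated = vet_flag or thi_flag or circadian_flag
    if d_t <= t1 or not corroborated:
        return "normal"
    return "mild" if d_t <= t2 else "high"
```

Dropping a source of evidence amounts to forcing the corresponding flag to `False`, for example `label_window(d_t, t1, t2, thi_flag=thi, circadian_flag=circ)` when veterinary annotations are left out.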
Flags do not propagate beyond the ±1 h window; aggregation of contiguous labeled hours into episodes is used only for descriptive purposes in Results. Three pre-specified ablation experiments were designed to isolate the contribution of the clinical context (veterinary observations, “Vet,” 2×/day) and the thermal context (Temperature–Humidity Index, THI) to the operational labeling: (A) No Vet: the clinical signal was omitted from the labeling, leaving circadian + THI information. (B) No THI: the THI was omitted from the labeling, leaving circadian + Vet information. (C) No Vet and No THI: only circadian thresholds were used. In all cases, the feature extraction pipeline (including FFT/STFT) and the validation scheme remained invariant. The corresponding results are presented in Section 3.10 (Ablation Results).
The THI-based flag (normal ≤ 68, mild 68–72, high > 72), computed hourly from on-farm temperature and relative humidity (T/RH) data using the standard formula, served exclusively as an external validation signal, not as model input. It was used within a ±1 h temporal window alongside veterinary annotations and circadian deviation metrics to triangulate stress labels, ensuring biological relevance without influencing algorithmic classification. This approach aligns with the contextual validation framework established in Table 2, where THI serves as a supporting criterion rather than an independent classification rule.
Milk yield (MY). Milk production was recorded at each milking using certified milk meters and aggregated to kilograms per cow per day. To ensure temporal alignment with clinical assessments, MY was used in lagged form (MY_{t−1}). If same-day MY was included, it was truncated to measurements preceding the clinical examination (e.g., only the first milking if the exam occurred in the evening).
Feed intake. Individual feed intake was measured using electronic feeders equipped with RFID and load cells, capturing start/end time and mass variation per visit. Outlier cleaning was applied, and data were aggregated into hourly and daily time series. When only group-level data were available, intake per cow was estimated as [offer − refusal] (kg DM) divided by the number of cows in the pen, adjusted for the dry matter percentage (%DM) of the total mixed ration (TMR).
Leakage control (Vet, VAS, and milk). When Vet_t is the target variable, all potentially contemporaneous predictors were treated causally to prevent information leakage:
(i) VAS_t was computed without Vet_t and using only data preceding the clinical examination on day t.
(ii) MY followed the lag/truncation protocol described above.
(iii) Vet± was used exclusively in lagged form (Vet±_{t−1}) when included as a predictor.
These rules were implemented alongside forward chaining and LOCO (leave-one-cow-out) cross-validation, with scalers and thresholds fitted strictly within fold-specific training data.
From the feed intake time series, the feeding pattern was characterized in two steps: (i) circadian rhythmicity (≈24 h) using continuous wavelet transform (Morlet) to detect periodicities within the 23.5–24.5 h range; and (ii) diurnal traits using a 14-day hurdle-type Generalized Additive Model (GAM) to extract number and timing of the major peak, peak width/height, trough, nocturnal proportion, and probabilities of intake initiation (day-to-day consistency), following Bus et al. (2023) [25].
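The leakage-control rules and the LOCO scheme described above can be sketched in a few lines. The helpers below are hypothetical and only illustrate the mechanics: lagging a predictor by one step, splitting by cow, and fitting a scaler on the training cows only before applying it to the held-out cow. Forward chaining within each cow is not shown.

```python
import statistics


def lag_one(series, fill=None):
    """Shift a per-cow series by one step so that position t holds the value from t-1."""
    return [fill] + list(series[:-1])


def loco_folds(cow_ids):
    """Yield (held_out_cow, train_indices, test_indices), one fold per cow."""
    for held_out in sorted(set(cow_ids)):
        train = [i for i, cow in enumerate(cow_ids) if cow != held_out]
        test = [i for i, cow in enumerate(cow_ids) if cow == held_out]
        yield held_out, train, test


def fit_scaler(values):
    """Mean and SD estimated from training data only."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mean, (sd if sd > 0 else 1.0)


def apply_scaler(values, scaler):
    mean, sd = scaler
    return [(v - mean) / sd for v in values]


# Toy data: one feature value per row, with the owning cow for each row.
cow_ids = ["a", "a", "b", "b", "c"]
feature = [1.0, 2.0, 3.0, 4.0, 10.0]
scaled_test = {}
for cow, train_idx, test_idx in loco_folds(cow_ids):
    scaler = fit_scaler([feature[i] for i in train_idx])   # training cows only
    scaled_test[cow] = apply_scaler([feature[i] for i in test_idx], scaler)
```

Fitting the scaler inside the loop is what keeps the held-out cow's values from influencing the normalization applied to it.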
The agreement between our labels (normal/mild/high) and the reference standard (veterinary annotation/THI validation/circadian deviation) was assessed using Cohen’s kappa (with linear and quadratic weighting for ordinal categories) and 95% bootstrap confidence intervals. Additionally, a confusion matrix and performance metrics are reported.
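For reference, weighted Cohen's kappa for three ordinal categories can be computed directly from the confusion matrix. The sketch below uses the standard disagreement-weight formulation (linear: |i − j|/(k − 1); quadratic: its square); the function name and the integer coding 0 = normal, 1 = mild, 2 = high are assumptions for illustration.

```python
def weighted_kappa(y_ref, y_lab, n_classes=3, weights="linear"):
    """Weighted Cohen's kappa for ordinal labels coded 0..n_classes-1."""
    if len(y_ref) != len(y_lab) or not y_ref:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(y_ref)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y_ref, y_lab):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    power = 1 if weights == "linear" else 2
    disagree_obs = disagree_exp = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (abs(i - j) / (n_classes - 1)) ** power
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j]
    if disagree_exp == 0.0:
        return float("nan")  # undefined when all labels fall in a single class
    return 1.0 - disagree_obs / disagree_exp


reference = [0, 0, 1, 1, 2, 2]
labels = [0, 1, 1, 1, 2, 1]
kappa_linear = weighted_kappa(reference, labels, weights="linear")
kappa_quadratic = weighted_kappa(reference, labels, weights="quadratic")
```

In this toy example the linear and quadratic values are 4/7 and 2/3, respectively. Confidence intervals would come from recomputing the statistic on resampled data.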
Stress classification employed a triangulated approach combining behavioral, environmental, and veterinary indicators to ensure robust labeling without invasive physiological measures. This methodology aligns with animal welfare-centered research principles and the practical limitations of on-farm implementation. Episodes denote contiguous days aggregated for visualization only (no temporal carry-over of labels).
Primary Validation: Analysis of Circadian Deviation and Veterinary Validation Protocol
Individual thresholds (T1 = x̄ + s, T2 = x̄ + 2s) were set to flag deviations exceeding approximately 84% and 97.7% of the baseline distribution, respectively, following the empirical rule for approximately normal distributions [12]. These thresholds correspond to the upper tails beyond 1 and 2 standard deviations above the mean, capturing rare events that indicate circadian disruption.
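Under the assumption of a normally distributed baseline, the share of event-free values expected to fall below each threshold follows from the standard normal cumulative distribution function. A short check using only the standard library:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


share_below_t1 = normal_cdf(1.0)  # mean + 1 SD
share_below_t2 = normal_cdf(2.0)  # mean + 2 SD
```

The two values are about 0.841 and 0.977, so roughly 15.9% and 2.3% of baseline observations would exceed T1 and T2 by chance alone if the baseline were exactly normal.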
Standardized behavioral assessments were conducted twice daily (06:00, 18:00) using modified welfare evaluation protocols. Veterinary observations included the following:
Deviations in respiratory rate (>60 breaths/min indicating stress).
Postural changes and social withdrawal.
Irregularities in feeding behavior.
Locomotion patterns and gait assessment.
Stress labels were cross-validated with production indicators:
Milk production reduction > 10% within 24–48 h after the stress event.
Feed intake decrease > 15% compared to individual baseline values.
Rumination time reduction consistent with stress classifications (normal: 420–480 min/day, mild: 315–357 min/day, high: <252 min/day).
To map clinical observations to the levels in Table 2, each finding reported by the veterinarian was coded as normal (0), mild (1), or high (2) based on intensity and persistence, following Wagner’s (2021) [12] clinical taxonomy and field protocols. In summary: mild, transient signs (present in only one of the two daily rounds) or isolated signs were classified as mild; marked, sustained signs (present in both rounds of the day or with evident functional repercussions) were classified as high; absence of signs or physiological variations within the expected range were classified as normal. When multiple findings coexist, the highest level is assigned. The agreement between the validation methods is reported in the Results (Table A6).
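The coding of veterinary findings can be illustrated with a small sketch. The inputs are simplified to three booleans per finding (sign present at the morning round, sign present at the evening round, evident functional repercussion); this representation and the function names are assumptions, not the field protocol itself.

```python
def finding_level(morning_sign: bool, evening_sign: bool, functional_impact: bool = False) -> int:
    """Code one finding as 0 (normal), 1 (mild) or 2 (high)."""
    if functional_impact or (morning_sign and evening_sign):
        return 2  # sustained across both rounds, or with functional repercussions
    if morning_sign or evening_sign:
        return 1  # transient: present in only one of the two daily rounds
    return 0


def daily_vet_level(findings) -> int:
    """Combine several findings for one cow-day by taking the highest level."""
    return max((finding_level(*f) for f in findings), default=0)
```

For example, `daily_vet_level([(True, False, False), (True, True, False)])` returns 2, because the highest level among coexisting findings is assigned.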