Author Contributions
Conceptualization, S.A.U., A.R.P. and M.E.B.; methodology, M.E.B., O.U. and V.T.; software, O.U., M.E.B. and V.T.; validation, M.E.B., O.U. and V.T.; formal analysis, O.U., V.T. and M.E.B.; investigation, A.R.P. and S.A.U.; resources, S.A.U.; data curation, A.R.P. and M.E.B.; writing—original draft preparation, O.U., M.E.B. and V.T.; writing—review and editing, O.U., V.T., A.R.P., S.A.U. and M.E.B.; visualization, O.U., M.E.B. and V.T.; supervision, S.A.U., M.E.B. and V.T.; project administration, M.E.B., V.T. and S.A.U.; funding acquisition, S.A.U. As Project Director, S.A.U. oversaw all aspects of this project. All authors have read and agreed to the published version of the manuscript.
Acknowledgments
The authors thank the field and technical staff at the USDA-ARS Jornada Experimental Range and the Chihuahuan Desert Rangeland Research Center, New Mexico State University, for their support with cattle handling, behavioral observation, and sensor deployment. During the preparation of this manuscript, the authors used Gemini 2.5 Pro (Google DeepMind, Mountain View, CA, USA) only for language editing, formatting assistance, and improving clarity of presentation. The authors reviewed and edited all AI-assisted outputs and take full responsibility for the content of the published article.
Figure 1.
(a) Motion Index distributions by behavior class; the black horizontal line inside each box indicates the median. WA and GR are well separated, while RE and RU produce nearly identical distributions concentrated near zero. (b) Violin plot comparing RE and RU distributions. Both central tendency measures are nearly identical between classes, highlighting the extensive overlap that prevents discrimination from MI alone.
Figure 2.
Behavior transition probability matrix computed from within-session consecutive observation pairs. Each row represents the behavior at time t − 1, and each column represents the behavior at time t; cell values give the row-to-column transition probability as percentages. Diagonal values indicate strong behavioral persistence (88.9 to 94.2%); off-diagonal values reveal distinct transition patterns between RU and RE despite their similar MI distributions.
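A matrix of the kind shown in Figure 2 could be computed roughly as follows. This is a minimal sketch, not the authors' code: the function name `transition_matrix` and the input layout (one list of behavior codes per session) are assumptions, and the toy sessions are illustrative, not study data. Pairs are counted only inside a session, and each row is normalized to sum to 100%.

```python
from collections import Counter

BEHAVIORS = ["WA", "GR", "RU", "RE"]  # behavior codes from Table 1


def transition_matrix(sessions, labels=BEHAVIORS):
    """Row-normalized transition percentages from within-session pairs.

    `sessions` is a list of label sequences, one per session (assumed layout).
    Rows are the behavior at time t - 1, columns the behavior at time t.
    """
    counts = Counter()
    for seq in sessions:
        # zip pairs each observation with its successor inside one session,
        # so no pair ever spans a session boundary.
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1

    matrix = {}
    for prev in labels:
        row_total = sum(counts[(prev, curr)] for curr in labels)
        matrix[prev] = {
            curr: (100.0 * counts[(prev, curr)] / row_total) if row_total else 0.0
            for curr in labels
        }
    return matrix


# Toy sessions for illustration only (not study data).
toy_sessions = [
    ["GR", "GR", "GR", "WA", "WA"],
    ["RE", "RE", "RU", "RU", "RU", "RE"],
]
tm = transition_matrix(toy_sessions)
for prev in BEHAVIORS:
    print(prev, [round(tm[prev][curr], 1) for curr in BEHAVIORS])
```

Because counting is done per session, the last observation of one session is never paired with the first observation of the next.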
Figure 3.
Kernel density estimates of Normalized Activity Rate (NormAct) by behavior class. RE and RU produce heavily overlapping distributions near zero, confirming the insufficiency of the instantaneous activity rate for discriminating these classes.
Figure 4.
Normalized confusion matrices for (a) 3-class and (b) 4-class classification. Cell values show proportions (with raw counts in parentheses). In the 4-class setting, 92.2% of RU instances are correctly classified despite MI overlap with RE.
Figure 5.
CatBoost feature importance for (a) 3-class and (b) 4-class models. In the 4-class setting, Prev_Activity_Enc becomes the dominant feature, reflecting the model’s increased reliance on behavioral sequence information to discriminate RU from RE.
Figure 6.
Leave-One-Breed-Out cross-validation accuracy by holdout breed. The dashed line indicates the mean accuracy (93.8%) across all four breed holdouts. All breeds exceed 91% accuracy, demonstrating robust cross-breed generalization.
Table 1.
Class distribution in the cattle behavior dataset.
| Behavior | Code | Count | Proportion (%) |
|---|---|---|---|
| Walking | WA | 2286 | 24.8 |
| Grazing | GR | 3928 | 42.6 |
| Ruminating | RU | 976 | 10.6 |
| Resting | RE | 2032 | 22.0 |
| Total | | 9222 | 100.0 |
Table 2.
Motion Index statistics by behavioral class, reproduced from Perea et al. [5]. Post hoc groups are based on Bonferroni-adjusted pairwise comparisons; behaviors sharing the same letter are not significantly different.
| Behavior | Mean MI | SD | Post-Hoc Group |
|---|---|---|---|
| Walking (WA) | 24.9 | 3.05 | a |
| Grazing (GR) | 13.2 | 4.23 | b |
| Ruminating (RU) | 1.3 | 2.29 | c |
| Resting (RE) | 2.3 | 3.21 | c |
Table 3.
Class distribution of the 7657 model-ready observations, after session-aware feature construction.
| Behavior | Code | Count | Proportion (%) |
|---|---|---|---|
| Walking | WA | 1952 | 25.5 |
| Grazing | GR | 3440 | 44.9 |
| Ruminating | RU | 725 | 9.5 |
| Resting | RE | 1540 | 20.1 |
| Total | | 7657 | 100.0 |
Table 4.
Effect of the session-gap threshold on the number of detected sessions, the size of the model-ready set, and four-class performance (CatBoost, 5-fold cross-validated macro-F1).
| Gap Threshold (min) | Sessions | Model-Ready Obs. | 4-Class Macro-F1 |
|---|---|---|---|
| 5 | 559 | 7160 | 0.946 |
| 10 | 375 | 7547 | 0.946 |
| 15 | 324 | 7657 | 0.943 |
| 20 | 294 | 7731 | 0.943 |
| 30 | 265 | 7788 | 0.942 |
| 60 | 214 | 7923 | 0.941 |
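
The session-aware construction behind Tables 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tuple layout `(minute, norm_act, label)`, the helper names `split_sessions` and `build_features`, the rule that a gap strictly greater than the threshold starts a new session, the choice to include the current observation in the rolling means, and the `history=4` cut-off are all hypothetical. The feature names follow Table 9, with `Prev_Activity` holding the raw label that Figure 5 shows in encoded form.

```python
from statistics import mean

GAP_MIN = 15  # session-gap threshold in minutes (one of the values in Table 4)


def split_sessions(records, gap_min=GAP_MIN):
    """Split one animal's time-ordered records into sessions.

    `records` is a list of (minute, norm_act, label) tuples (assumed layout).
    A new session starts whenever the gap to the previous record exceeds
    `gap_min` minutes.
    """
    sessions, current = [], []
    for rec in records:
        if current and rec[0] - current[-1][0] > gap_min:
            sessions.append(current)
            current = []
        current.append(rec)
    if current:
        sessions.append(current)
    return sessions


def build_features(session, history=4):
    """Lag, rolling-mean, and previous-label features within one session.

    Rows lacking `history` earlier observations in the same session are
    skipped (assumed rule), so no feature reaches across a session boundary.
    """
    rows = []
    for i in range(history, len(session)):
        acts = [r[1] for r in session[: i + 1]]
        rows.append({
            "NormAct": acts[-1],
            "Lag1": acts[-2],
            "Lag2": acts[-3],
            "Roll_3": mean(acts[-3:]),   # window includes the current value
            "Roll_5": mean(acts[-5:]),
            "Prev_Activity": session[i - 1][2],
            "label": session[i][2],
        })
    return rows


# Toy records for illustration only (not study data).
toy = [(0, 0.1, "RE"), (5, 0.0, "RE"), (10, 0.2, "RU"), (15, 0.1, "RU"),
       (20, 0.3, "RU"), (25, 0.9, "GR"),
       (60, 1.5, "WA"), (65, 1.4, "WA")]
sessions = split_sessions(toy)
features = [row for s in sessions for row in build_features(s)]
print(len(sessions), "sessions,", len(features), "model-ready rows")
```

Under these assumptions, a larger threshold merges neighboring sessions, so fewer rows are lost at session starts. That is the direction of the trend in Table 4, where the session count falls and the model-ready count rises as the threshold grows.
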
Table 5.
Extended model comparison using the temporal feature set. All seven classifiers, including the five used by Perea et al. [5] with MI alone, were trained on the same temporal features. Macro-F1 is reported for 5-fold CV and hold-out test set. CatBoost is listed first as the primary model; remaining rows are sorted by 4-Cl. test macro-F1. Cl. = Class. The best result per column is shown in bold.
| Model | 3-Cl. CV F1 | 3-Cl. Test F1 | 4-Cl. CV F1 | 4-Cl. Test F1 |
|---|---|---|---|---|
| CatBoost [25] | | | | |
| MLP [28] | | | | |
| XGBoost [23] | | | | |
| SVM (RBF) [29] | | | | |
| Random Forest [30] | | | | |
| LightGBM [24] | | | | |
| Logistic Reg. [31] | | | | |
| Perea et al. [5] MI-only baseline (best of five models): | | | | |
| SVM/MLP/XG/RF | n/a | 0.927 a | n/a | 0.647 |
Table 6.
Tuned CatBoost performance on 3-class and 4-class classification. Hold-out test metrics and 10-fold stratified cross-validation results with 95% confidence intervals.
| Problem | Test Acc. | Test Macro-F1 | CV Acc. [95% CI] | CV Macro-F1 [95% CI] |
|---|---|---|---|---|
| 3-Class | 0.95 | 0.95 | 0.95 [0.95, 0.96] | 0.95 [0.95, 0.96] |
| 4-Class | 0.94 | 0.94 | 0.95 [0.94, 0.95] | 0.94 [0.94, 0.95] |
Table 7.
Direct comparison between the MI-only baseline (Perea et al. [5], best-performing model) and our temporal feature approach (tuned CatBoost). Both approaches were evaluated on the same dataset of 9222 labeled observations from 24 cattle.
| Metric | Perea et al. [5] | This Work |
|---|---|---|
| 3-Class Macro-F1 | 0.927 | 0.95 [0.95, 0.96] |
| 4-Class Macro-F1 | 0.647 | 0.94 [0.94, 0.95] |
| Feature Set | MI only (1 feature) | NormAct + 5 temporal (6 features) |
Table 8.
Per-class metrics for the 4-class tuned CatBoost model. Precision and recall are point estimates from the hold-out test set; F1-score 95% confidence intervals are from 100-iteration bootstrap resampling.
| Class | Support | Precision | Recall | F1-Score [95% CI] |
|---|---|---|---|---|
| GR (Grazing) | 1032 | 0.94 | 0.95 | 0.95 [0.94, 0.96] |
| RE (Resting) | 462 | 0.93 | 0.92 | 0.92 [0.91, 0.93] |
| RU (Ruminating) | 218 | 0.98 | 0.92 | 0.95 [0.93, 0.96] |
| WA (Walking) | 586 | 0.95 | 0.95 | 0.95 [0.94, 0.96] |
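
The bootstrap intervals in Table 8 could be obtained along the following lines. The caption states only that 100 bootstrap iterations were used; the percentile method, the nearest-rank cut-offs, the fixed seed, and the function names `f1_for_class` and `bootstrap_f1_ci` are assumptions made for this sketch, and the labels below are toy data.

```python
import random


def f1_for_class(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, cls, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a per-class F1 (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample test-set indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_for_class([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx], cls))
    scores.sort()
    low = scores[int((alpha / 2) * n_boot)]
    high = scores[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy labels for illustration only (not study data).
y_true = ["RU"] * 20 + ["RE"] * 30
y_pred = ["RU"] * 17 + ["RE"] * 3 + ["RE"] * 27 + ["RU"] * 3
point = f1_for_class(y_true, y_pred, "RU")
low, high = bootstrap_f1_ci(y_true, y_pred, "RU")
print(f"RU F1 = {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only 100 resamples the interval end points are themselves noisy, so a different seed can shift them slightly.
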
Table 9.
Relative feature importance (%) for the 4-class problem across three classifiers (gain-based importance, computed on the same 70/30 split). The previous-behavior feature ranks first and NormAct second for all three.
| Model | Prev_Act. | NormAct | Lag1 | Lag2 | Roll_3 | Roll_5 |
|---|---|---|---|---|---|---|
| CatBoost | 43.0 | 31.2 | 8.1 | 5.2 | 4.8 | 7.6 |
| XGBoost | 82.4 | 8.9 | 1.3 | 1.0 | 4.5 | 1.9 |
| Random Forest | 41.0 | 24.5 | 8.8 | 2.5 | 11.6 | 11.5 |
Table 10.
Feature ablation for 4-class classification (stratified 5-fold cross-validation, CatBoost). The previous-behavior feature uses the true previous label. Per-class values are F1-scores.
| Feature Set | Macro-F1 | GR | RE | RU | WA |
|---|---|---|---|---|---|
| NormAct only | 0.63 | 0.90 | 0.71 | 0.01 | 0.90 |
| NormAct + lags and rolling means | 0.71 | 0.92 | 0.69 | 0.29 | 0.93 |
| Full model (with previous label) | 0.94 | 0.95 | 0.92 | 0.95 | 0.95 |
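
As a quick arithmetic check on Table 10, macro-F1 is the unweighted mean of the per-class F1-scores, so a single weak class pulls it down sharply. The snippet below recomputes the Macro-F1 column from the rounded per-class values in the table; small differences from the reported figures are expected because the inputs are already rounded. The helper name `macro_f1` is illustrative.

```python
# (reported macro-F1, per-class F1 in column order GR, RE, RU, WA), from Table 10.
ABLATION = {
    "NormAct only": (0.63, [0.90, 0.71, 0.01, 0.90]),
    "NormAct + lags and rolling means": (0.71, [0.92, 0.69, 0.29, 0.93]),
    "Full model (with previous label)": (0.94, [0.95, 0.92, 0.95, 0.95]),
}


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1; class support plays no role."""
    return sum(per_class_f1) / len(per_class_f1)


recomputed = {name: macro_f1(f1s) for name, (_, f1s) in ABLATION.items()}
for name, (reported, _) in ABLATION.items():
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.4f}")
```

The recomputed values (about 0.63, 0.71, and 0.94) agree with the table. In the NormAct-only row the RU score of 0.01 accounts for most of the gap to the full model.
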
Table 11.
Deployable inference under leave-animals-out cross-validation (4-class). The teacher-forced row uses the true previous label and is the upper bound; every other row uses no ground-truth label at inference. K is the NormAct-only warm-up length, in observations. Per-class values are F1-scores; RU rec. is ruminating recall.
| Inference Regime | Macro-F1 | GR | RE | RU | WA | RU Rec. |
|---|---|---|---|---|---|---|
| Teacher-forced (true previous label) | 0.94 | 0.94 | 0.92 | 0.95 | 0.94 | 0.94 |
| Temporal-only (no previous label) | 0.69 | 0.91 | 0.68 | 0.26 | 0.92 | 0.19 |
| Closed-loop, no warm-up | 0.58 | 0.84 | 0.69 | 0.00 | 0.80 | 0.00 |
| Closed-loop, warm-up | 0.67 | 0.89 | 0.67 | 0.22 | 0.90 | 0.15 |
| Closed-loop, warm-up | 0.68 | 0.90 | 0.70 | 0.24 | 0.90 | 0.16 |
| Closed-loop, warm-up | 0.68 | 0.90 | 0.69 | 0.24 | 0.91 | 0.16 |
| Sequence decoding (Viterbi) | 0.71 | 0.91 | 0.58 | 0.43 | 0.92 | 0.47 |
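
The sequence-decoding row refers to Viterbi decoding. The following is a generic sketch of that technique, not a reconstruction of the authors' implementation: this section does not state how the per-step class probabilities or the transition probabilities were obtained, so the inputs here (a label-free classifier's per-step scores, a "sticky" transition matrix with 0.9 on the diagonal, and a uniform prior) are toy assumptions.

```python
import math

STATES = ["WA", "GR", "RU", "RE"]


def viterbi(step_probs, trans, prior):
    """Most likely label sequence given per-step class scores.

    step_probs: list of dicts, state -> probability at each time step
                (assumed to come from a classifier that uses no labels).
    trans:      dict of dicts, trans[prev][curr] = transition probability.
    prior:      dict, state -> probability of starting a session in it.
    """
    def log(x):
        return math.log(max(x, 1e-12))  # floor avoids log(0)

    score = {s: log(prior[s]) + log(step_probs[0][s]) for s in STATES}
    back = []
    for probs in step_probs[1:]:
        new_score, pointers = {}, {}
        for curr in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log(trans[p][curr]))
            new_score[curr] = (score[best_prev] + log(trans[best_prev][curr])
                               + log(probs[curr]))
            pointers[curr] = best_prev
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy inputs for illustration only (not values from Figure 2).
trans = {p: {c: (0.9 if p == c else 0.1 / 3) for c in STATES} for p in STATES}
prior = {s: 0.25 for s in STATES}
step_probs = [
    {"WA": 0.0, "GR": 0.0, "RU": 0.60, "RE": 0.40},
    {"WA": 0.0, "GR": 0.0, "RU": 0.45, "RE": 0.55},
    {"WA": 0.0, "GR": 0.0, "RU": 0.60, "RE": 0.40},
]
greedy = [max(STATES, key=lambda s: p[s]) for p in step_probs]
decoded = viterbi(step_probs, trans, prior)
print(greedy)   # ['RU', 'RE', 'RU']
print(decoded)  # ['RU', 'RU', 'RU']
```

In this toy case the per-step argmax flips to RE at the middle step. The decoder keeps RU throughout because a brief RU to RE to RU excursion is far less probable under persistent transitions than staying in RU.
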
Table 12.
Leave-One-Breed-Out cross-validation results for 4-class classification. Each breed was held out as the test set while the remaining three breeds served as training data.
| Holdout Breed | n (Test) | Accuracy (%) | Macro-F1 |
|---|---|---|---|
| Raramuri Criollo (RC) | 2189 | 91.50 | 0.92 |
| Angus–Hereford (AH) | 1986 | 91.44 | 0.92 |
| Brahman (BH) | 1688 | 98.05 | 0.98 |
| Brangus (BR) | 1794 | 94.37 | 0.94 |
| Mean ± SD | | 93.84 ± 3.12 | 0.94 ± 0.03 |
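
The leave-one-breed-out protocol and the summary row of Table 12 can be illustrated as follows. The splitter `leave_one_group_out` is a hypothetical helper written for this sketch, and the `breeds` list is toy data. The accuracies and test-set sizes are copied from Table 12, and the use of the sample standard deviation (n − 1 denominator) is an assumption that happens to reproduce the reported value.

```python
from statistics import mean, stdev


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Toy breed labels, one per observation (illustration only).
breeds = ["RC", "RC", "AH", "BH", "BH", "BR"]
folds = list(leave_one_group_out(breeds))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)

# Values copied from Table 12.
ACCURACY = {"RC": 91.50, "AH": 91.44, "BH": 98.05, "BR": 94.37}
N_TEST = {"RC": 2189, "AH": 1986, "BH": 1688, "BR": 1794}

acc_mean = mean(ACCURACY.values())
acc_sd = stdev(ACCURACY.values())  # sample SD, n - 1 denominator
total_test = sum(N_TEST.values())
print(f"{acc_mean:.2f} ± {acc_sd:.2f}")  # 93.84 ± 3.12
print(total_test)                         # 7657
```

The four test-set sizes sum to 7657, the number of model-ready observations in Table 3, so every observation is held out exactly once across the four folds.
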
Table 13.
Comparison with selected cattle behavior classification studies. #Cl. = number of behavioral classes; n = number of animals. The performance column reports macro-F1 where available and overall accuracy otherwise, as indicated by the Metric column. Since prior studies report heterogeneous evaluation metrics, direct numerical comparison should be interpreted cautiously.
| Study | Sensor | n | #Cl. | Perf. (%) | Metric | Model |
|---|---|---|---|---|---|---|
| Martiskainen et al. [7] | Tri-axial accel. | 30 | 8 | 78 a | Prec. | SVM |
| González et al. [9] | Accel. + GPS | 58 | 5 | 86 to 91 | Acc. | Decision tree |
| Sprinkle et al. [12] | Tri-axial accel. | 48 | 3 | 12 to 30 b | Err. | Random forest |
| Li et al. [13] | 6-axis IMU | 10 | 6 | 94 c | F1 | XGBoost |
| Perea et al. [5] | LoRaWAN MI | 24 | 3 | 92.7 | F1 | RF/XGBoost |
| Perea et al. [5] | LoRaWAN MI | 24 | 4 | 64.7 d | F1 | RF/XGBoost |
| This work | LoRaWAN MI | 24 | 3 | 95 | F1 | CatBoost |
| This work | LoRaWAN MI | 24 | 4 | 94 | F1 | CatBoost |