Article

Detection of Estrus in Dairy Cows Based on CE-YOLO

1 College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
2 Joint Laboratory of SilkRoad Multilingual Cognitive Computing International Collaboration, Urumqi 830046, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(6), 1269; https://doi.org/10.3390/electronics15061269
Submission received: 30 January 2026 / Revised: 4 March 2026 / Accepted: 11 March 2026 / Published: 18 March 2026
(This article belongs to the Special Issue Advances in Imaging Technologies for Precision Agriculture)

Abstract

Accurate estrus detection is essential for dairy farm productivity, yet traditional manual and wearable methods remain limited by high labor costs, delayed responses, and animal stress. To address these challenges, we propose CE-YOLO, a lightweight YOLOv11n-based vision model tailored for edge deployment, which detects mounting behavior by integrating a Channel-Aware Downsampling (CA-Down) module to preserve small-scale features, a SimSPPF module for efficient contextual fusion, and a DySample module for dynamic spatial alignment. Experiments on a curated estrus behavior dataset demonstrate that CE-YOLO achieves a precision of 94.9% and an mAP50 of 98.2%, outperforming the baseline by 3.9% and 4.6%, respectively. These results validate the model as an efficient, non-intrusive solution for real-time estrus monitoring, strongly supporting the advancement of smart livestock management.

1. Introduction

Reproductive efficiency in dairy cows directly affects farm productivity, operating costs, and the stability of milk supply, making it a critical issue in modern large-scale dairy farming. Studies have shown that the estrus period is relatively short and that cows exhibit considerable individual differences in behavioral and physiological responses. Failure to perform timely mating or artificial insemination during the optimal estrus window—which typically lasts only 12 to 18 h—can prolong the calving interval and disrupt the lactation cycle, thereby reducing overall milk yield and economic benefits [1]. Therefore, accurate and timely estrus detection is essential for improving reproductive efficiency and maintaining milk production performance.
In traditional farm management, estrus detection mainly relies on manual observation of behaviors such as standing to be mounted or increased activity. However, this method is highly dependent on observer experience and is difficult to implement effectively in large herds, at night, or under poor lighting conditions, often resulting in missed or incorrect detections, with failure rates in large herds frequently reported between 20% and 30% [2]. With the development of sensor and wearable technologies, contact-based estrus monitoring methods based on vaginal temperature, activity level, and posture changes have been proposed [3,4,5]. Although these approaches enable continuous monitoring, they are associated with high costs, complex maintenance, and potential discomfort or stress to cows, limiting their large-scale application [6].
Recent advances in computer vision and deep learning have provided alternative solutions for estrus detection. Early studies employed traditional image processing techniques, such as background modeling and optical flow, to identify cow mounting behavior [7,8,9], but these methods are sensitive to illumination changes, occlusion, and complex backgrounds. Subsequently, deep learning-based approaches, especially convolutional neural networks, have significantly improved detection accuracy through automatic feature learning [10,11]. Furthermore, recent advancements in moving object segmentation emphasize the necessity of learning temporal distribution and spatial correlation to handle dynamic behaviors in video sequences [12]. Inspired by these universal object segmentation principles, our task also requires capturing specific dynamic behavior patterns across consecutive frames while maintaining strict real-time performance for practical edge applications. Among them, YOLO-based object detection models have been widely adopted in real-time monitoring scenarios due to their end-to-end architecture and high inference efficiency [13]. Although previous studies have applied YOLO models to cow mounting or estrus detection with promising results [14,15], lightweight variants still exhibit performance degradation in dense and partially occluded farm environments. As shown in our comparative experiments, YOLOv11n achieves an mAP50 of 0.913 and a recall of 0.802 on the real farm dataset, indicating missed detections under complex conditions. These results suggest that existing lightweight models struggle to maintain robust detection performance when targets are small, distant, or partially occluded.
With the development of smart farming and edge computing, estrus monitoring systems are required to achieve both high detection accuracy and real-time performance while maintaining low deployment costs on resource-constrained platforms such as edge devices and smart cameras [16]. Consequently, balancing detection accuracy and computational efficiency in complex farm environments has become a key research focus in visual estrus monitoring.
The main contributions of this study are summarized as follows:
(1)
A vision-based cow estrus behavior recognition framework, CE-YOLO, is proposed for efficient edge device deployment, balancing detection accuracy with computational speed.
(2)
A CA-Down module is designed to effectively mitigate the loss of small-scale estrus behavior features during repeated downsampling operations.
(3)
The network architecture is optimized by integrating the SimSPPF and DySample modules, which together enhance multi-scale feature representation and dynamic spatial alignment, addressing farm-environment challenges such as severe occlusion, dense herds, and complex backgrounds.
(4)
Extensive experiments are conducted on real-world farm datasets, validating the effectiveness, real-time performance, and practical applicability of the proposed method compared to state-of-the-art baselines.

2. Materials and Methods

2.1. Dataset

The dataset was derived from a public repository [17], where ground-truth estrus events were initially validated by veterinarians. To ensure reproducibility and avoid ambiguity, YOLO format annotations strictly target mounting behavior (actively mounting or standing to be mounted), deliberately excluding secondary signs like restlessness. Due to the low frequency of this event, each image contains exactly one bounding box.
We expanded the original 400 images via random rotation, multi-scale scaling, and contrast enhancement into a curated set of 2000 images. The dataset was partitioned into training and test subsets (8:2) using a video-wise splitting strategy to prevent temporal data leakage. Notably, to evaluate real-world generalization, the dataset incorporates diverse conditions and challenging scenarios (Figure 1), such as heavy occlusion in dense herds and low-light ambiguity.

2.2. Improved Method

In this study, we propose CE-YOLO, a dedicated estrus behavior detection model built upon the lightweight YOLOv11n framework [18]. To address the specific challenges of dairy farm environments, such as small target features and complex background interference, we have reconstructed the network architecture by integrating channel-aware attention and dynamic upsampling mechanisms.
The overall architecture of CE-YOLO is illustrated in Figure 2. The network pipeline consists of three main components: (1) the Backbone, responsible for feature extraction, where the proposed CA-Down modules are embedded to preserve fine-grained details during downsampling; (2) the Neck, utilizing the SimSPPF and DySample modules to enhance multi-scale feature fusion and semantic alignment; and (3) the Head, performing the final object classification and bounding box regression. By jointly optimizing these components, CE-YOLO achieves a balance between detection accuracy and inference speed.

2.2.1. Channel Aware Downsampling Module

Conventional downsampling operations often lead to the degradation of small-object representations as spatial resolution decreases with increasing network depth [19]. To address this limitation, a dual-path attention-guided downsampling module (CA-Down) is proposed by integrating feature splitting strategies with channel-wise attention mechanisms.
Specifically, the input feature map is first divided into two parallel branches along the channel dimension. Each branch is projected through a 1 × 1 convolution to reduce redundancy while preserving discriminative information:
$S_1 = \mathrm{Conv}_{1\times 1}(S), \qquad S_2 = \mathrm{Conv}_{1\times 1}(S).$
To simultaneously capture global contextual cues and fine-grained local details, average pooling and max pooling are applied to the two branches, respectively:
$S_3 = \mathrm{AvgPool2d}(S_1), \qquad S_4 = \mathrm{MaxPool2d}(S_2).$
Theoretically, the average pooling branch aggregates global spatial information to preserve background context, while the max pooling branch extracts highly salient local features, which is particularly crucial for identifying the distinct physical overlaps occurring during a mounting event. By combining these two paths, the module effectively captures a comprehensive feature representation.
The pooled features are concatenated and fed into a squeeze-and-excitation (SE) block to adaptively recalibrate channel responses:
$S_5 = \mathrm{SE}(\mathrm{Concat}(S_3, S_4)).$
Within the SE module, global average pooling is employed to generate channel-wise descriptors z c , followed by two fully connected layers and nonlinear activations to estimate channel importance weights based on the aggregate vector z:
$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} Y_c(i, j), \qquad m_c = \sigma\left(W_2\,\delta(W_1 z)\right),$
where δ and σ denote the ReLU and Sigmoid activation functions, respectively. The final refined feature map is obtained by channel-wise reweighting:
$\tilde{Y}_c = m_c \cdot Y_c.$
By combining dual-path pooling strategies with channel attention, the proposed module effectively enhances feature preservation during downsampling, particularly for small and weak targets.
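As a concrete illustration of the channel recalibration described above, the following minimal pure-Python sketch implements the squeeze (global average pooling), excitation (two fully connected layers with ReLU and Sigmoid), and reweighting steps. The tiny weight matrices `w1` and `w2` are illustrative stand-ins for learned parameters, not the module's actual implementation.

```python
import math

def se_reweight(feature, w1, w2):
    """Squeeze-and-Excitation channel recalibration.

    feature: list of C channel maps, each an H x W list of lists.
    w1: first FC layer weights, shape (C_mid, C).
    w2: second FC layer weights, shape (C, C_mid).
    Returns the channel-reweighted feature map m_c * Y_c.
    """
    C = len(feature)
    H, W = len(feature[0]), len(feature[0][0])
    # Squeeze: global average pooling yields one descriptor z_c per channel.
    z = [sum(v for row in ch for v in row) / (H * W) for ch in feature]
    # Excitation: FC -> ReLU (delta) -> FC -> Sigmoid (sigma).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    m = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    # Reweight every spatial position of channel c by its importance m_c.
    return [[[m[c] * v for v in row] for row in feature[c]] for c in range(C)]
```

With identity weights, each channel is simply scaled by the sigmoid of its own global average, which makes the recalibration easy to verify by hand.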

2.2.2. SimSPPF Spatial Pyramid Pooling Module

During the feature fusion stage, the conventional SPPF module widely used in YOLO-based detectors is replaced with the SimSPPF module to improve computational efficiency while preserving multi-scale contextual information.
Different from the original SPPF structure, SimSPPF introduces an initial SimConv layer to reduce channel dimensionality. Specifically, the input feature map is first compressed to half of its original channels through SimConv, which effectively lowers the subsequent computational cost:
$S_1 = \mathrm{SimConv}(S).$
On the compressed feature representation, parallel max-pooling operations with different kernel sizes are applied to capture multi-scale spatial context. The pooled features are concatenated with the original compressed feature and then fused by a convolutional layer:
$Y = f_{\mathrm{conv}}\left([\,S_1,\; P_5(S_1),\; P_9(S_1),\; P_{13}(S_1)\,]\right),$
where $S_1$ denotes the channel-reduced feature map and $P_k$ represents a max-pooling operation with a kernel size of $k \times k$.
Moreover, to further simplify the nonlinear transformation and enhance training stability, the SiLU activation function employed in the standard SPPF module is replaced by the ReLU activation function in SimSPPF:
$\mathrm{SiLU}(s) = \frac{s}{1 + e^{-s}}, \qquad \mathrm{ReLU}(s) = \max(0, s).$
By combining channel compression, multi-scale pooling, and lightweight activation functions, the SimSPPF module achieves efficient feature fusion with reduced computational overhead.
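The multi-scale pooling in the fusion equation can be sketched in pure Python for a single-channel feature map. This illustrative version omits the SimConv channel compression and the final fusion convolution, and builds the 5×5, 9×9, and 13×13 contexts directly rather than by chaining 5×5 pools as the fast SPPF variants do.

```python
def maxpool2d_same(x, k):
    """Stride-1 max pooling with kernel k and padding k//2,
    so the output has the same spatial size as the input."""
    h, w = len(x), len(x[0])
    r = k // 2
    return [[max(x[i2][j2]
                 for i2 in range(max(0, i - r), min(h, i + r + 1))
                 for j2 in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]

def simsppf_pool(x):
    """Multi-scale context: concatenate x with its 5x5, 9x9, and 13x13
    pooled versions along the channel axis, mirroring
    Y = f_conv([S1, P5(S1), P9(S1), P13(S1)]) before the fusion conv."""
    return [x, maxpool2d_same(x, 5), maxpool2d_same(x, 9), maxpool2d_same(x, 13)]
```

On a small grid the 5×5 window already covers the whole map, so every pooled value collapses to the global maximum, which makes the behavior easy to check.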

2.2.3. DySample Dynamic Upsampling Module

During the feature fusion stage, conventional upsampling methods rely on fixed interpolation rules, which may lead to spatial misalignment and loss of fine details. To alleviate this issue, a DySample dynamic upsampling module is introduced to improve the quality of upsampled feature representations.
DySample formulates the upsampling process as a content-aware dynamic resampling operation. Given an input feature map:
$S \in \mathbb{R}^{C \times H \times W}.$
The module first predicts spatial sampling offsets through a lightweight linear mapping:
$\Delta p = F(S),$
where Δp denotes the two-dimensional sampling offsets at each spatial location. Based on the regular sampling grid p, the dynamic sampling positions are constructed by incorporating the predicted offsets, and feature resampling is performed via bilinear interpolation:
$S_1 = \mathrm{GridSample}(S,\; p + \Delta p).$
The resulting output feature map, where $s$ denotes the upsampling factor, satisfies
$S_1 \in \mathbb{R}^{C \times sH \times sW}.$
By adaptively adjusting sampling positions according to feature content, DySample enhances spatial alignment and preserves structural details during upsampling, while introducing only limited computational overhead.
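The offset-guided resampling step amounts to bilinear interpolation at shifted grid positions. The sketch below, in plain Python for a single-channel map, takes the per-location offsets as given rather than predicting them with the linear mapping $F$, and clamps sample coordinates to the map borders.

```python
import math

def bilinear_sample(x, py, px):
    """Bilinearly interpolate grid x (H x W) at continuous position (py, px)."""
    h, w = len(x), len(x[0])
    y0 = max(0, min(h - 1, int(math.floor(py))))
    x0 = max(0, min(w - 1, int(math.floor(px))))
    y1, x1 = min(h - 1, y0 + 1), min(w - 1, x0 + 1)
    dy, dx = py - y0, px - x0
    top = x[y0][x0] * (1 - dx) + x[y0][x1] * dx
    bot = x[y1][x0] * (1 - dx) + x[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def dysample_resample(x, offsets):
    """Resample x at positions p + Δp, as in S1 = GridSample(S, p + Δp).
    offsets[i][j] is a (dy, dx) pair predicted per spatial location."""
    return [[bilinear_sample(x, i + offsets[i][j][0], j + offsets[i][j][1])
             for j in range(len(x[0]))] for i in range(len(x))]
```

With all offsets zero the grid is the identity and the input is returned unchanged; a half-pixel offset lands between four neighbors and returns their bilinear blend.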

2.3. Experimental Environment

All experiments were performed on a workstation server equipped with an NVIDIA GeForce RTX 4090 GPU, running the Ubuntu 22.04 operating system. The deep learning models were implemented using the PyTorch 2.4.1 framework with Python 3.12.0, and all experiments were developed and managed in the PyCharm 2024.2 integrated development environment.
For model training, the stochastic gradient descent (SGD) optimizer was adopted. The initial learning rate was set to 0.01, with a momentum coefficient of 0.937 and a weight decay of 0.0005. The network was trained for 300 epochs using a batch size of 16 to ensure stable convergence and fair comparison across different models.
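For reference, the hyperparameters above can be collected into a single override dictionary in the style used by YOLO training scripts. The key names (`lr0`, `momentum`, `weight_decay`, etc.) follow the common Ultralytics convention and are assumptions here; the values are those stated in the text.

```python
# Training settings from Section 2.3, expressed as an Ultralytics-style
# override dict. Key names are assumed; values come from the paper.
TRAIN_CFG = {
    "optimizer": "SGD",
    "lr0": 0.01,             # initial learning rate
    "momentum": 0.937,       # SGD momentum coefficient
    "weight_decay": 0.0005,  # L2 regularization
    "epochs": 300,
    "batch": 16,
}
```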
Furthermore, building upon the YOLOv11 framework, CE-YOLO adopts an anchor-free architecture, which eliminates the need for manual anchor box tuning and enhances robustness to varying target scales. During training, the model utilized Complete Intersection over Union loss for bounding box regression and Binary Cross Entropy loss for classification.
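Since both the CIoU regression loss and the mAP thresholds in the next section are built on box overlap, a minimal IoU computation helps make these quantities concrete. This sketch covers plain IoU only; CIoU additionally penalizes center distance and aspect-ratio mismatch on top of this term.

```python
def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2) corner coordinates.
    CIoU extends this with center-distance and aspect-ratio penalties."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```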

2.4. Evaluation Metrics

To comprehensively evaluate the performance of different detection models in terms of both accuracy and efficiency, multiple quantitative metrics were employed, including Precision (P), Recall (R), F1-score, mAP50, mAP50–95, and frames per second (FPS) [20]. In these metrics, TP, FP, and FN represent the number of true positives, false positives, and false negatives, respectively.
Precision (P) measures the proportion of correctly identified estrous cows or mounting behaviors among all predicted positive samples, reflecting the model’s ability to reduce false detections:
$P = \frac{TP}{TP + FP}.$
Recall (R) indicates the proportion of correctly detected positive samples relative to the total number of ground-truth positives, describing the model’s capability to avoid missed detections:
$R = \frac{TP}{TP + FN}.$
Average Precision (AP) is used to assess the detection performance of a single class by integrating the precision-recall curve. Specifically, mAP50 denotes the mean average precision at an IoU threshold of 0.5, while mAP50–95 represents the averaged AP over IoU thresholds ranging from 0.5 to 0.95, providing a more comprehensive evaluation under varying localization strictness:
$AP = \int_0^1 P(r)\,\mathrm{d}r, \qquad mAP = \frac{1}{C} \sum_{t=1}^{C} AP_t.$
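The metrics above can be computed directly from matched detections. The following self-contained sketch implements P, R, and the precision-recall integration for AP on a single class (mAP is then the mean of per-class APs); the matching of detections to ground truth at a chosen IoU threshold is assumed to have been done already.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(scores, labels, num_gt):
    """AP as the area under the precision-recall curve for one class.

    scores: detection confidences; labels: 1 if the detection matched a
    ground-truth box at the chosen IoU threshold, else 0; num_gt: number
    of ground-truth boxes for this class.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:  # sweep the confidence threshold from high to low
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap
```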

3. Results

3.1. Ablation Study

To systematically assess the contribution of each module, we conducted ablation experiments on the YOLOv11n baseline, with quantitative results detailed in Table 1. The baseline model initially achieved an mAP50 of 93.6%. Upon introducing individual components, the DySample module yielded the most significant gain, boosting mAP50 by 2.7% (to 96.3%), which validates its effectiveness in feature alignment under complex backgrounds. Meanwhile, the CA-Down module enhanced the model’s discriminative ability, increasing Precision by 1.0% (from 91.0% to 92.0%) by effectively preserving small-target details during downsampling. Although the SimSPPF module was primarily introduced to optimize computational flow—increasing FPS from 141 to 149 in single-module tests—its combination with other modules ensured the final model remained lightweight. Ultimately, integrating all three modules into the CE-YOLO framework resulted in peak performance, achieving an mAP50 of 98.2% and a Precision of 94.9%. These findings demonstrate that the proposed improvements mutually reinforce each other to effectively handle complex estrus scenarios.

3.2. Comparative Study

Table 2 presents a comprehensive comparison between CE-YOLO and state-of-the-art detectors, including YOLOv5n, YOLOv8n, and YOLOv11n, under identical experimental conditions. In terms of detection accuracy, CE-YOLO outperforms the widely used YOLOv8n by 3.3% in mAP50 and 2.5% in Precision, while the improvement over the earlier YOLOv5n is even larger at 6.6% in mAP50. Crucially, our model achieves these gains while maintaining a high inference speed of 139 FPS, making it well suited to real-time edge deployment. The radar chart in Figure 3 confirms this balance visually: the performance polygon of CE-YOLO (green line) fully encompasses those of the competing models, indicating consistently strong Precision, Recall, F1-score, and mAP-related metrics across diverse test scenarios.

3.3. Visual Comparison of Detection Performance

Figure 4 illustrates the detection outcomes of different models across three representative challenging scenarios: dense herds, occlusion during mounting, and low-light conditions. As observed in the visual comparison, baseline models frequently exhibit missed detections (False Negatives) or incorrect bounding boxes when cows are heavily occluded or blend into the background. In contrast, CE-YOLO consistently identifies both the estrous cow and the mounting behavior with high confidence. This robust performance verifies that the integrated DySample module effectively maintains feature alignment and structural integrity, even when target features are partially obscured.
To further investigate the underlying reasons for the performance improvements, we utilized the Grad-CAM technique to generate attention heatmaps, as shown in Figure 5. The visualization reveals distinct differences in feature focus: baseline models often have dispersed attention, erroneously focusing on irrelevant background textures such as fences, ground debris, or non-estrous cows. Conversely, CE-YOLO exhibits a highly concentrated attention mechanism that accurately targets the key interaction regions characteristic of mounting behavior. This precise focus demonstrates that the proposed CA-Down module successfully suppresses environmental noise and enhances the model’s discriminative capacity for specific estrus features.
Despite its robust overall performance, CE-YOLO occasionally encounters failure cases. Analysis of missed detections (False Negatives) reveals that the model struggles primarily in extreme scenarios: (1) when the mounting behavior occurs at the extreme far edge of the camera’s field of view, resulting in target sizes too small for effective feature extraction; and (2) during severe weather conditions or when thick mud completely obscures the defining anatomical features of the cows. Addressing these extreme edge cases through multi-camera cross-validation remains a focus for future optimization.

4. Discussion

This study presents an enhanced YOLOv11n-based framework for estrus detection in dairy cows. In this field, high sensitivity (recall) is critical, as missed breeding windows directly result in prolonged calving intervals and significant economic losses. By achieving a recall of 92.6%, the proposed CE-YOLO model reliably minimizes these missed detections while maintaining high inference speed. Beyond outperforming traditional computer vision methods in accuracy and robustness, the system offers clear economic advantages for large-scale operations.
For a typical 5000-head dairy farm, continuous manual monitoring is a significant recurring labor expense; a single full-time observer costs roughly USD 50,000 per year. Smart collar systems require herd-wide deployment and may exceed USD 200,000 for a large-scale installation. In contrast, a vision-based solution that leverages existing surveillance infrastructure mainly involves camera installation: assuming 40-50 cameras at approximately USD 250-300 per unit, the total investment is on the order of USD 10,000-15,000. These figures are provided as illustrative estimates.
Furthermore, the model’s lightweight architecture allows for seamless integration into edge devices and smart cameras, enabling real-time, on-site estrus detection and behavioral assessment. Despite its strong deployability, the current study is limited by a relatively small dataset collected from a single farm environment. Future work will focus on expanding the dataset to encompass diverse lighting conditions and different herds to validate cross-farm robustness. We will also explore the system’s integration with Internet of Things (IoT) platforms for comprehensive remote herd management. Ultimately, this research provides a scalable and practical foundation for advancing edge-based visual detection in modern livestock husbandry.

5. Conclusions

In this paper, we presented a lightweight and accurate estrus behavior detection model, CE-YOLO, designed to address the challenges of occlusion, small targets, and complex backgrounds in dairy farming. By reconstructing the network architecture with the CA-Down, SimSPPF, and DySample modules, the proposed model achieves a precision of 94.9% and an mAP50 of 98.2%, outperforming state-of-the-art lightweight detectors. Importantly, the model maintains real-time performance exceeding 130 FPS while preserving high detection accuracy, effectively addressing the research gap identified in the Introduction. Compared with the baseline model, CE-YOLO improves recall and mAP50 while maintaining lightweight characteristics, demonstrating a balanced optimization between robustness and efficiency.
These findings indicate that CE-YOLO serves as a robust technological foundation for non-contact, automated estrus monitoring, significantly contributing to the advancement of precision livestock farming.

Author Contributions

Conceptualization, J.Z.; methodology, J.Z.; software, J.Z.; validation, J.Z. and H.Z.; formal analysis, J.Z.; investigation, J.Z.; resources, J.Z.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z. and H.Z.; visualization, J.Z.; supervision, L.L.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the second batch of Tianchi Talents (Leading Talents) project in Xinjiang Uygur Autonomous Region (No. 20231223).

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://github.com/XingshiXu/ZhengWang_YOLO-TransT (accessed on 20 June 2025)].

Acknowledgments

The authors would like to thank the support from the School of Computer Science and Technology, Xinjiang University.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLO: You Only Look Once
CE-YOLO: Cow Estrus YOLO
CA-Down: Channel-Aware Downsampling
SimSPPF: Simplified Spatial Pyramid Pooling Fast
DySample: Dynamic Upsampling
mAP: Mean Average Precision
FPS: Frames Per Second
IoU: Intersection over Union
SE: Squeeze and Excitation

References

  1. Zhang, L.; Wang, J.; An, L.; En, H. How a major dairy-producing country accelerates toward a strong dairy industry. Econ. Ref. News 2024, 10, 007. [Google Scholar] [CrossRef]
  2. Lu, X.; Cui, K.; Qi, H. Research progress on key technologies for intelligent dairy cow farming. Heilongjiang Anim. Husb. Vet. Med. 2025, 9, 23–31. [Google Scholar]
  3. Li, H. Design of a Dairy Cow Step-Counting and Posture Detection System Based on LoRaWAN and Multi-Sensor. Master’s Thesis, Inner Mongolia University, Hohhot, China, 2018. [Google Scholar]
  4. Wang, J.; Zhang, Y.; Bell, M.; Liu, G. Potential of an activity index combining acceleration and location for automated estrus detection in dairy cows. Inf. Process. Agric. 2022, 9, 288–299. [Google Scholar] [CrossRef]
  5. Chang, Q.; Feng, Y.; Bao, J.; Zhao, G.; Guo, P.; Sun, W.; Han, W.; Yuan, P.; Bao, H.; Ren, R.; et al. Research on integrated vaginal implant sensors and wireless detection system for dairy cows. J. Inner Mongolia Univ. 2025; in press. [Google Scholar]
  6. Nong, J. Impact of Automated Estrus Monitoring Technology on the Breeding Efficiency of Large-Scale Dairy Farms in China. Ph.D. Thesis, Chinese Academy of Agricultural Sciences, Beijing, China, 2023. [Google Scholar]
  7. Wang, Z. Research on Key Technologies for Estrus Recognition in Dairy Cows Based on Temperature Distribution Features. Ph.D. Thesis, Inner Mongolia Agricultural University, Hohhot, China, 2022. [Google Scholar]
  8. Guo, Y.; Zhang, Z.; He, D.; Niu, J.; Tan, Y. Detection of cow mounting behavior using region geometry and optical flow characteristics. Comput. Electron. Agric. 2019, 163, 104828. [Google Scholar] [CrossRef]
  9. Lodkaew, T.; Pasupa, K.; Loo, C.K. CowXNet: An automated cow estrus detection system. Expert Syst. Appl. 2023, 211, 118550. [Google Scholar] [CrossRef]
  10. Wang, R.; Gao, Z.; Li, Q.; Zhao, C.; Gao, R.; Zhang, H.; Li, S.; Feng, L. Detection method of cow estrus behavior in natural scenes based on improved YOLOv5. Agriculture 2022, 12, 1339. [Google Scholar] [CrossRef]
  11. Wang, Z.; Xu, X.; Hua, Z.; Shang, Y.; Duan, Y.; Song, H. Lightweight recognition of dairy cow estrus behavior based on YOLOv5n combined with channel pruning algorithm. Trans. Chin. Soc. Agric. Eng. 2022, 38, 202–210. [Google Scholar]
  12. Dong, G.; Zhao, C.; Pan, X.; Basu, A. Learning temporal distribution and spatial correlation toward universal moving object segmentation. IEEE Trans. Image Process. 2024, 33, 2447–2461. [Google Scholar] [CrossRef] [PubMed]
  13. Vijayakumar, A.; Vairavasundaram, S. YOLO-based object detection models: A review and its applications. Multimed. Tools Appl. 2024, 83, 83535–83574. [Google Scholar] [CrossRef]
  14. Wang, Z.; Hua, Z.; Wen, Y.; Zhang, S.; Xu, X.; Song, H. E-YOLO: Recognition of estrus cow based on improved YOLOv8n model. Expert Syst. Appl. 2024, 238, 122212. [Google Scholar] [CrossRef]
  15. Li, C.; Ma, J.; Cao, S.; Guo, L. RFR-YOLO-based recognition method for dairy cow behavior in farming environments. Agriculture 2025, 15, 1952. [Google Scholar] [CrossRef]
  16. Zheng, Y.; Zhang, J.; Yuan, B. Application of reproductive efficiency improvement technologies in large-scale dairy farms. Heilongjiang Anim. Reprod. 2025, 33, 54–59. [Google Scholar]
  17. Wang, Z.; Deng, H.; Zhang, S.; Xu, X.; Wen, Y.; Song, H. Detection and tracking of oestrus dairy cows based on improved YOLOv8n and TransT models. Biosyst. Eng. 2025, 252, 61–76. [Google Scholar] [CrossRef]
  18. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  19. Zhang, M.; Ye, S.; Zhao, S.; Wang, W.; Xie, C. Pear object detection in complex orchard environment based on improved YOLO11. Symmetry 2025, 17, 255. [Google Scholar] [CrossRef]
  20. Dang, L.; Li, Z.; Li, S.; Qiao, B.; Zhou, L. Effective plug-and-play lightweight modules for YOLO series models. J. Supercomput. 2025, 81, 493. [Google Scholar] [CrossRef]
Figure 1. (A) High Congregation of Cows with Similar Coat Patterns Leading to Homogeneous Background. (B) Mounting Behavior of Cows Obscured by Other Cows. (C) Small Target Size Caused by Cows Being Far from the Camera. (D) Reduced Camera Clarity at Night Due to Insufficient Illumination. Note: Timestamps are camera-generated; from left to right, characters denote Year, Month, Day, and Weekday (applies to all figures).
Figure 2. CE-YOLO Model Architecture Diagram.
Figure 3. Radar chart of detection performance for different models.
Figure 4. Comparison of detection results of different models in the same scene.
Figure 5. Comparison of Heatmaps for Detection Results of Different Models. Note: Warm colors (yellow/red) indicate high model attention, cool colors (blue/green) low attention.
Table 1. Ablation Experiments.
| Baseline | CA-Down | SimSPPF | DySample | P | R | mAP50 | F1 | FPS |
|---|---|---|---|---|---|---|---|---|
| √ | | | | 91.0% | 87.0% | 93.6% | 0.890 | 141 |
| √ | √ | | | 92.0% | 86.5% | 94.2% | 0.891 | 138 |
| √ | | √ | | 93.7% | 85.1% | 92.5% | 0.892 | 149 |
| √ | | | √ | 92.1% | 86.3% | 96.3% | 0.891 | 134 |
| √ | √ | √ | | 93.5% | 88.1% | 93.5% | 0.908 | 143 |
| √ | √ | | √ | 94.2% | 88.2% | 95.6% | 0.911 | 140 |
| √ | | √ | √ | 93.2% | 88.3% | 94.6% | 0.907 | 136 |
| √ | √ | √ | √ | 94.9% | 92.6% | 98.2% | 0.937 | 139 |
Note: Checkmark (√) denotes the inclusion of the corresponding module.
Table 2. Comparative Experiments.
| Model | P | P-Oestrus | P-Mounting | R | mAP50 | mAP50–95 | F1 | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv5n | 84.3% | 78.6% | 90.0% | 85.7% | 91.6% | 60.5% | 0.850 | 2.19 | 5.9 |
| YOLOv8n | 92.4% | 89.4% | 95.4% | 88.7% | 94.9% | 62.7% | 0.905 | 2.69 | 6.9 |
| YOLOv11n | 91.0% | 90.5% | 91.7% | 87.0% | 93.6% | 65.5% | 0.890 | 2.59 | 6.4 |
| Ours | 94.9% | 91.7% | 98.1% | 92.6% | 98.2% | 66.8% | 0.937 | 2.59 | 6.5 |

Share and Cite

Zhao, J.; Zhang, H.; Liu, L. Detection of Estrus in Dairy Cows Based on CE-YOLO. Electronics 2026, 15, 1269. https://doi.org/10.3390/electronics15061269

