Article

YOLOv11n-CGSD: Lightweight Detection of Dairy Cow Body Temperature from Infrared Thermography Images in Complex Barn Environments

1 College of Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
2 Daqing Normal University, Daqing 163712, China
3 M3-BIORES Research Group, Division of Animal and Human Health Engineering, Faculty of Bioscience Engineering, Katholieke Universiteit Leuven (KU LEUVEN), Kasteelpark Arenberg 30, Heverlee, 3001 Leuven, Belgium
* Authors to whom correspondence should be addressed.
Agriculture 2026, 16(2), 229; https://doi.org/10.3390/agriculture16020229
Submission received: 27 November 2025 / Revised: 5 January 2026 / Accepted: 14 January 2026 / Published: 15 January 2026
(This article belongs to the Section Farm Animal Production)

Simple Summary

This study proposes a lightweight automatic body temperature detection method for dairy cows that integrates deep learning with infrared thermography (IRT). The method effectively alleviates the decline in detection efficiency caused by complex background interference, low resolution, poor contrast, weak structural representation of regions of interest, and insufficient texture information in IRT images. Compared with RGB-based detection approaches, it is robust to illumination variations while maintaining real-time performance and ease of deployment, making it suitable for application in dairy cow barn production environments and enabling non-contact automatic acquisition of dairy cow body temperature.

Abstract

Dairy cow body temperature is a key physiological indicator that reflects metabolic level, immune status, and environmental stress responses, and it has been widely used for early disease recognition. Infrared thermography (IRT), as a non-contact imaging technique capable of remotely acquiring the surface radiation temperature distribution of animals, is regarded as a powerful alternative to traditional temperature measurement methods. Under practical cowshed conditions, IRT images of dairy cows are easily affected by complex background interference and generally suffer from low resolution, poor contrast, indistinct boundaries, weak structural perception, and insufficient texture information, which lead to significant degradation in target detection and temperature extraction performance. To address these issues, a lightweight detection model named YOLOv11n-CGSD is proposed for dairy cow IRT images, aiming to improve the accuracy and robustness of region of interest (ROI) detection and body temperature extraction under complex background conditions. At the architectural level, a C3Ghost lightweight module based on the Ghost concept is first constructed to reduce redundant feature extraction while lowering computational cost and enhancing the network capability for preserving fine-grained features during feature propagation. Subsequently, a space-to-depth convolution module is introduced to perform spatial rearrangement of feature maps and achieve channel compression via non-strided convolution, thereby improving the sensitivity of the model to local temperature variations and structural details. Finally, a dynamic sampling mechanism is embedded in the neck of the network, where the upsampling and scale alignment processes are adaptively driven by feature content, enhancing the model response to boundary temperature changes and weak-texture regions. 
Experimental results indicate that the YOLOv11n-CGSD model can effectively shift attention from irrelevant background regions to ROI contour boundaries and increase attention coverage within the ROI. Under complex IRT conditions, the model achieves P, R, and mAP50 values of 89.11%, 86.80%, and 91.94%, which represent improvements of 3.11%, 5.14%, and 4.08%, respectively, compared with the baseline model. Using Tmax as the temperature extraction parameter, the maximum error (Max. Error) and mean error (MAE. Error) in the lower udder (LU) region are reduced by 33.3% and 25.7%, respectively, while in the around-the-anus (AA) region, the Max. Error and MAE. Error are reduced by 87.5% and 95.0%, respectively. These findings demonstrate that, under complex backgrounds and low-quality IRT imaging conditions, the proposed model achieves lightweight and high-performance detection for both the LU and AA regions and provides a methodological reference and technical support for non-contact body temperature measurement of dairy cows in practical cowshed production environments.

1. Introduction

Rectal temperature (Tr) in dairy cows is a key physiological indicator reflecting metabolic activity, immune status, and responses to environmental stress, and it is widely used for early disease detection, reproductive monitoring, and assessment of stress conditions [1,2]. Studies have shown that when infection, inflammation, metabolic disorders, or heat and cold stress occur (e.g., mastitis, respiratory disease), Tr often undergoes measurable changes [3,4]; at the same time, it is closely associated with oestrous behaviour, pregnancy progression, and energy metabolism [5,6]. In healthy dairy cows, the mean Tr is 37.17 ± 0.07 °C, while the udder skin surface temperature is 37.16 ± 0.06 °C; these values are nearly identical, indicating that body surface temperature can be regarded as a reliable surrogate for Tr [7].
Traditional measurement of body temperature in dairy cows mainly relies on rectal thermometers [8], intra-body implanted sensors [9] and wearable or contact-based skin surface sensors [10]. Although rectal thermometers can ensure high measurement accuracy, they are extremely labour-intensive and have poor timeliness [11,12]. Intra-body implanted sensors, such as vaginal temperature probes and rumen boluses, can provide continuous temperature monitoring [13]; however, they are greatly affected by factors such as drinking, feeding, and rumen fermentation and they are relatively expensive and complicated to maintain, making them unsuitable for routine monitoring [14]. Wearable or contact-based skin sensors, such as tympanic or ear canal thermometers [15,16], are constrained by battery life, skin contamination, and wearing stability [17], and it is difficult for them to operate reliably over long periods in open barn environments [18]. Moreover, all of the above methods are contact-based temperature measurements, which are prone to inducing stress responses in animals and are therefore not conducive to welfare-oriented farming [19,20]. Consequently, there is an urgent need to develop non-contact, stress-free, and herd-level body temperature monitoring technology.
Infrared thermography (IRT), as a non-contact tool for remotely acquiring images of the surface temperature distribution of animals [21], can cover multiple individuals within a short period of time [22]. Compared with traditional contact-based temperature measurement, IRT does not require restraining or handling animals [23], avoids the need for instrument disinfection and reduces the risk of cross-infection, and can be embedded into automatic monitoring systems within routine management procedures [24]. Therefore, it is considered a powerful alternative to conventional temperature measurement techniques [25].
IRT-based temperature measurement requires localisation of the target anatomical region of interest (ROI) [26], such as the udder surface [27], teat [28], or periocular region [29], followed by extraction of representative temperature parameters within that ROI [30]. Although manual delineation of ROIs can achieve high precision, it is extremely time-consuming and heavily dependent on the operator’s experience, making it unsuitable for continuous monitoring in large herds [31,32]. Algorithm-based automatic ROI annotation has been shown to exhibit a high level of agreement with manual annotation, with correlation coefficients exceeding 0.98 and both sensitivity and specificity above 93% [33], indicating that automatic ROI detection can essentially replace manual delineation in terms of accuracy and provides a reliable foundation for automatic body temperature detection.
At present, among intelligent algorithms used for animal body temperature and health monitoring, deep learning methods have become the dominant technical approach [34,35]. Shu et al. [36] developed an improved UNet network to perform semantic segmentation of multiple regions, including the eyes, nose and ears, in infrared images of dairy cows, achieving a mean intersection over union (IoU) of 80.76%, which provides pixel-level support for subsequent temperature quantification within predefined regions. Guo et al. [37] established a three-level YOLOv3-Tiny-based platform for individual identification and body temperature monitoring, which automatically detects the orbital region of dairy cows and extracts the maximum periocular temperature; the detection accuracy for orbital thermal images exceeds 99%, with an average prediction score of about 96%. For poultry surface temperature extraction, Yan et al. [38] used YOLOv8s to automatically extract head ROIs in laying hens, achieving an extraction accuracy of 99.38%. On this basis, the automatically extracted facial thermograms were further used as inputs to an improved Res2Net model for core temperature prediction, yielding a mean absolute error of 0.153 °C, which indicates that automatic ROI extraction combined with deep regression can enable accurate temperature estimation. Shen et al. [39] developed a YOLOv3-like model with Darknet53 as the backbone to extract head and leg ROIs in white-feather broilers, reporting a precision of 96.77%. A mean relative error of 0.29% was further reported for temperature extraction and prediction, suggesting that ROI localisation quality is directly associated with the error in temperature quantification. For key-site temperature extraction in dairy cows, Wang et al. [40] employed YOLOv5 to localise and extract udder ROIs, achieving a mean accuracy of 96.1%. Zhang et al. [41] proposed EFMYOLOv3 combined with greyscale histogram and bilateral filtering enhancement to extract ocular and udder ROIs, improving the mean accuracy to 96.8%; the enhancement strategy was intended to suppress background interference and increase foreground–background contrast, thereby improving localisation stability.
However, there remains room for improvement in temperature extraction accuracy. Although segmentation- and detection-based pipelines achieve favourable ROI localisation under relatively controlled conditions, they are often vulnerable to degraded thermogram quality and complex scenes. This is mainly because, when infrared thermography images suffer from low resolution, poor contrast, blurred boundaries, and background thermal interference or occlusion, ROI detection accuracy often decreases markedly, which in turn compromises the accuracy of body temperature parameter extraction [42]. At the same time, studies on body temperature extraction in dairy cows have generally been conducted on single animals, with virtually no interfering targets present in the IRT images [43,44]. Therefore, this study targets thermally crowded barn scenarios and conducts a lightweight redesign of the YOLO feature hierarchy, such that under a strict computational budget, weak thermal-texture representation, information loss during early downsampling, and boundary misalignment in multi-scale fusion can be jointly mitigated, thereby improving the overall reliability of ROI localisation and temperature extraction.
To address the above issues, the main contributions of this study are as follows:
  • A task-driven lightweight redesign of YOLOv11n for thermally crowded barn IRT ROI localisation. C3Ghost [45] reduces channel redundancy while retaining weak thermal cues, SPD-Conv [46] mitigates early downsampling information loss to preserve thermal gradients and boundaries, and DySample [47] performs content-adaptive resampling to alleviate boundary drift during multi-scale fusion.
  • A dual-level evaluation linking ROI localisation to temperature-extraction reliability. Besides detection metrics, Tmax and Tavg temperature errors and efficiency indicators (Params, GFLOPs, FPS) are reported, together with ablations that isolate and validate the effects of C3Ghost, SPD-Conv, and DySample.
The overall workflow framework of this study is shown in Figure 1.

2. Materials and Analysis

2.1. Data Collection

Data collection for this study was conducted in two phases at two dairy farms: from October 2023 to October 2024 at the 8511 Dairy Farm in Mishan City, Heilongjiang Province, China. All measurements in this study were obtained using dedicated veterinary and environmental monitoring instruments. Surface temperature images were acquired with a handheld IRT camera. Tr was determined using a veterinary mercury-in-glass thermometer. Ambient conditions, including air temperature and relative humidity in the barn, were monitored using a portable temperature–humidity (T-RH) meter. The specific details of the equipment are shown in Table 1.
The data collection procedure is shown in Figure 2.
First, during the feeding period, each dairy cow was secured at the neck rail to restrict free movement and keep its body posture relatively stable for an extended period. After restraint, approximately 10 min was allowed for the cow to fully acclimatise to the restrained state, thereby minimising temperature deviations caused by stress responses (Figure 3).
During the body temperature measurement stage, the tail end of the mercury-in-glass thermometer was tied to a metal clip, and the thermometer was inserted approximately 3–5 cm into the rectum of the cow; the clip was fixed to the hair at the tail base to prevent displacement or breakage. At the same time, the T-RH meter was placed beside the cow, and both instruments were left in place for approximately 5 min until the ambient T-RH and Tr readings had stabilized. The ambient values were then used to calibrate the IRT device parameters, and the stabilized Tr was read simultaneously with acquisition of the corresponding IRT images (Figure 4).
Throughout the data collection process, a designated individual was responsible for on-site recording. For each dairy cow, the record included its ear tag number, the corresponding IRT image number, the ambient T-RH at the time of measurement, and the Tr value.

2.2. Data Annotation

This study correlated surface temperature data with Tr data according to ear tag numbers and employed Spearman correlation analysis to assess the relationship between Tr and various surface temperature metrics. Our preliminary research analysis indicates that surface temperatures in the LU and AA exhibit the strongest correlation with Tr [48]. Consequently, LU and AA are selected as key regions of interest (ROIs) for body temperature extraction in subsequent modelling.
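The rank-based correlation used here can be sketched in a few lines. The example below is a minimal pure-Python illustration of Spearman's coefficient (the Pearson correlation of average ranks); the function names and the paired values are hypothetical placeholders, not measurements or code from this study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Synthetic placeholder values paired by (hypothetical) ear tag; NOT study data.
tr = [38.4, 38.6, 38.9, 39.3, 38.5]        # rectal temperature, °C
lu_tmax = [36.9, 37.2, 37.1, 37.8, 37.0]   # surface Tmax in one ROI, °C
rho = spearman(tr, lu_tmax)
```

In practice the same coefficient would be computed once per candidate region and temperature metric, and the regions ranked by the resulting values.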
Prior to annotation, raw IRT images were screened for quality to exclude samples with severe occlusion, motion blur, or insufficient resolution due to excessive imaging distance. Only images featuring clear individual contours, identifiable ROI areas, and complete temperature distributions were retained for annotation.
During annotation, as it proved challenging to accurately identify anatomical locations solely from IRT images, all annotations strictly referenced corresponding anatomical positions in the RGB images. ROI boundaries were adjusted within IRT images to align as closely as possible with actual tissue contours, avoiding straddling different anatomical structures or incorporating excessive background areas. This approach enhances the reliability of subsequent temperature statistics and model training outcomes (Figure 5). To standardise annotation practices, pilot annotations were conducted and reviewed by experts prior to formal annotation, resulting in the establishment of operational guidelines.

2.3. Data Set Structure

As this study employed manual photography, it was difficult to keep the camera strictly horizontal to the ground during image capture, thereby making target recognition more difficult (Figure 6a,b). Concurrently, the IRT images exhibit low contrast and insufficient detail representation, further complicating target recognition. For these reasons, this study employed data augmentation to expand the dataset, thereby enhancing the model’s feature learning capabilities and improving its robustness. Specific operations included random rotation (Figure 6c), horizontal or vertical flipping (Figure 6d), slight geometric distortion (Figure 6e,f), contrast adjustment (Figure 6g), mosaic processing (Figure 6h), and the addition of image noise (Figure 6i). In total, 180 dairy cows were included in this study. Among them, 144 cows were used to construct the detection dataset for model training and evaluation, while the remaining 36 cows were used for the temperature extraction experiment. Images from the same cow were not shared across splits. A total of 3575 IRT images were used in this experiment and were divided into training, validation, and test sets in a ratio of 6:2:2. The LU category comprised 1654 images, while the AA category contained 1921 images.
Randomised contrast reduction aims to enhance the recognition capability of the model for images with inherently low overall contrast, such as those from IRT; random rotation simulates inevitable deviations in shooting angle inherent in handheld photography; mild distortion mimics ROI deformation caused by actual postural variations in cattle; while mosaic and salt-and-pepper noise weaken locally irrelevant information, guiding the model to focus more on key features that aid ROI differentiation, thereby reducing reliance on ineffective features.
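The constraint that images from the same cow are not shared across splits can be illustrated with a cow-level split. The sketch below is one possible way to do this, assuming each image is keyed by an ear-tag identifier; the function name, file names, and tag format are hypothetical, and the 6:2:2 ratio is applied to cows here, so the image-level ratio matches only approximately when cows contribute different numbers of images.

```python
import random
from collections import defaultdict


def split_by_cow(image_to_cow, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole cows to train/val/test so no cow appears in two splits."""
    cows = sorted(set(image_to_cow.values()))
    random.Random(seed).shuffle(cows)          # reproducible shuffle
    n_train = round(ratios[0] * len(cows))
    n_val = round(ratios[1] * len(cows))
    split_of_cow = {}
    for idx, cow in enumerate(cows):
        if idx < n_train:
            split_of_cow[cow] = "train"
        elif idx < n_train + n_val:
            split_of_cow[cow] = "val"
        else:
            split_of_cow[cow] = "test"
    splits = defaultdict(list)
    for image, cow in sorted(image_to_cow.items()):
        splits[split_of_cow[cow]].append(image)
    return dict(splits)


# Hypothetical file names and ear-tag IDs, for illustration only.
image_to_cow = {
    f"cow{c:03d}_img{i}.png": f"tag{c:03d}" for c in range(10) for i in range(3)
}
splits = split_by_cow(image_to_cow)
```

Augmented copies of an image would then inherit the split of their source image, which keeps augmented variants of a test cow out of the training set.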

2.4. Construction of a Lightweight Model Based on YOLOv11n-CGSD

Compared with prior lightweight YOLO adaptations for infrared or small-object detection, this study proposes a task-driven lightweight redesign of the YOLO feature hierarchy for thermally confounded multi-cow barn scenes. Under a strict computational budget, the redesign explicitly targets three failure modes that dominate IRT ROI localisation: weak thermal-texture representation, information loss during early downsampling, and boundary drift in multi-scale fusion. As illustrated in Figure 7, YOLOv11n is enhanced by three corresponding modifications, detailed as follows:
  • C3Ghost-based lightweight reconstruction. The C3K2 module is replaced with C3Ghost to reduce channel redundancy while retaining low-contrast thermal textures and blurred boundary cues.
  • SPD-Conv downsampling for information preservation. Strided downsampling is replaced with SPD-Conv to reduce resolution with minimal pixel discard, thereby preserving local thermal gradients and structural cues for small ROIs such as LU and AA.
  • DySample upsampling for boundary-aligned fusion. DySample is introduced in the neck to perform content-adaptive upsampling and scale alignment, improving boundary stability during multi-scale feature fusion.
Together, these changes operationalise the proposed need–design mapping and form the core of YOLOv11n-CGSD.

2.4.1. C3Ghost-Based Lightweight Feature Representation Module

Traditional backbone and neck architectures often rely on increasing the depth of convolutional layers and the number of channels to enhance their ability to distinguish weak textures. However, this significantly increases the number of parameters and FLOPs, leading to excessive consumption of computational resources. Simultaneously, convolutional neural networks contain a considerable number of highly similar feature maps. Directly relying on standard convolutions to compute all feature maps individually represents a computationally inefficient strategy. The Ghost module specifically addresses this redundancy by generating a complete feature set using a small number of feature maps combined with several inexpensive linear transformations. This approach substantially reduces computational overhead while largely preserving feature representation capabilities. Consequently, this study introduces the Ghost module to construct a lightweight C3Ghost, replacing the original C3K2 module. The core relationship of the Ghost module is shown in Equation (1).
$$n = m \cdot s \tag{1}$$
where m denotes the number of primary channels directly computed by standard convolution, s denotes the number of features derived from each primary channel, and n denotes the total number of output channels in the layer. In the context of this study, m corresponds to the newly computed basic IRT temperature distribution patterns at a given layer, s corresponds to multiple local variations derived from each basic pattern, such as gradients in different directions and local blurring and smoothing, and n represents the complete feature set delivered to the subsequent neck and detection head. On this basis, the linear derivative operators in the Ghost module generate multiple local variations for each base feature, which is equivalent to explicitly modeling various small perturbations and structural variants in the feature space. For low-resolution, low-contrast dairy cow IRT images, this mechanism enriches the limited basic temperature-distribution patterns with additional variations related to ROI boundaries and fine temperature textures, thereby enhancing the sensitivity to weak-texture features in key regions such as the LU and AA. This enables the network to keep the number of output channels n unchanged while reducing the number of base channels m, so that the same number of output feature maps is supported with fewer convolutional kernels. The structure of the Ghost module is illustrated in Figure 8.
Simultaneously, the Ghost module effectively reduces the computational complexity required for each layer by decreasing the number of standard convolutions. Let c denote the number of input channels, n the number of output channels, and $k \times k$ the size of the standard convolution kernel. Within the Ghost module, the inexpensive linear operator employs a depthwise convolution with kernel size $d \times d$, yielding an output feature map of spatial dimensions $h \times w$. The ratio of computational effort between standard convolution and the Ghost module may be expressed as Equation (2).
$$r_s = \frac{n \cdot h \cdot w \cdot c \cdot k^2}{\frac{n}{s} \cdot h \cdot w \cdot c \cdot k^2 + (s - 1) \cdot \frac{n}{s} \cdot h \cdot w \cdot d^2} \approx \frac{s \cdot c}{s + c - 1} \tag{2}$$
where $r_s$ denotes the computational ratio of standard convolution relative to the Ghost module for the same output scale. In the C3Ghost design presented herein, c and n correspond to the input and output channel counts of each layer on the cow IRT image, with $d \approx k$. Furthermore, for deep layers, $c \gg s$ holds. Consequently, $r_s \approx s$. This demonstrates that, whilst maintaining a constant output channel count n, the introduction of the Ghost module enables structural compression. Features that would otherwise require n standard convolutional kernels can instead be represented by computing only $m = n/s$ primary feature maps and deriving the remainder through inexpensive linear operations, which reduces the number of standard kernels by a factor of approximately s. Under an unchanged overall computational budget, this provides a theoretical basis for subsequent joint adjustments to channel width and network depth.
In terms of parameter scale, the Ghost module also exhibits significantly fewer parameters compared to standard convolutions (Equation (3)).
$$r_c = \frac{n \cdot c \cdot k^2}{\frac{n}{s} \cdot c \cdot k^2 + (s - 1) \cdot \frac{n}{s} \cdot d^2} \approx \frac{s \cdot c}{s + c - 1} \approx s \tag{3}$$
where $r_c$ describes the compression ratio of the parameter size for the same layer when using the Ghost module, relative to standard convolution, while keeping the output channel count n constant. Given the same relationships among c, s, d, and k, Equation (3) similarly shows that the parameter budget can theoretically be compressed to $1/s$ of the original convolution kernel parameter count. This enables the prioritised allocation of saved parameters to mid-high level feature maps more sensitive to ROI boundaries under constrained total parameter limits. It reserves capacity for fine-grained detection of head-like structures, thereby enhancing modelling capabilities for minute temperature textures and blurred contours in critical IRT regions such as the LU and AA without increasing, or even while reducing, overall model size.
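As a numerical check of Equations (2) and (3), the following sketch evaluates both ratios directly from their definitions and compares them with the closed-form approximation $s \cdot c / (s + c - 1)$. The layer sizes are hypothetical values chosen only for illustration and are not taken from the YOLOv11n-CGSD configuration.

```python
def ghost_flop_ratio(n, c, k, s, d, h, w):
    """Equation (2): FLOPs of a standard conv divided by FLOPs of a Ghost module."""
    standard = n * h * w * c * k * k
    primary = (n / s) * h * w * c * k * k        # m = n / s primary feature maps
    cheap = (s - 1) * (n / s) * h * w * d * d    # depthwise "ghost" feature maps
    return standard / (primary + cheap)


def ghost_param_ratio(n, c, k, s, d):
    """Equation (3): parameters of a standard conv divided by those of a Ghost module."""
    standard = n * c * k * k
    ghost = (n / s) * c * k * k + (s - 1) * (n / s) * d * d
    return standard / ghost


# Hypothetical layer sizes (assumption): 64 -> 128 channels, 3x3 kernels, s = 2.
n, c, k, s, d, h, w = 128, 64, 3, 2, 3, 40, 30
r_s = ghost_flop_ratio(n, c, k, s, d, h, w)
r_c = ghost_param_ratio(n, c, k, s, d)
approx = s * c / (s + c - 1)
print(round(r_s, 4), round(r_c, 4), round(approx, 4))
```

With d = k the two ratios coincide with the closed form exactly, and because c is much larger than s the value sits just below s, which is the sense in which both compute and parameters are reduced by roughly a factor of s.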
In terms of specific structural implementation, this study employs the Ghost bottleneck architecture proposed in GhostNet to replace the C3K2 module, utilising it as the fundamental building block of the entire network. Each bottleneck comprises a conv module, two Ghost modules connected in series, and a concat module, with its detailed structure illustrated in Figure 9.

2.4.2. SPD-Conv-Based Small-Scale Structure-Sensitive Downsampling

In IRT-based cow ROI detection tasks, LU and AA manifest as temperature patches with blurred edges in IRT images. Should stride convolution or pooling be employed for downsampling during initial feature extraction, each receptive field in early feature maps typically retains only one sampling point, with the remaining pixels discarded. Consequently, small-scale high-temperature regions are smoothed out before being fully modelled by higher-level semantic features. This phenomenon is particularly pronounced in low-resolution images and small object detection, where conventional stride convolution and pooling lead to significant loss of fine-grained information and insufficient feature learning in such scenarios. To mitigate this structural limitation and preserve the finite local temperature information within dairy cow IRT images, this study introduces SPD-Conv as a replacement for stride sampling at subsampling positions within the backbone and neck components of YOLOv11n, establishing it as a novel subsampling unit.
The core idea of SPD-Conv is to first perform spatial rearrangement of the feature map and then use non-strided convolution to achieve channel compression, so that downsampling is completed without discarding any pixel information. An intermediate feature map is denoted as in Equation (4):
$$X \in \mathbb{R}^{S \times S \times C_1} \tag{4}$$
where S denotes the height and width of the current feature map, which in this study are obtained by proportionally reducing the original image resolution of 320 × 240 according to the downsampling ratio of each layer, and $C_1$ denotes the number of channels. First, a space-to-depth transformation is applied to X to split it into $s^2$ sub-feature maps, which are then concatenated along the channel dimension to obtain the intermediate feature map as shown in Equation (5).
$$X_{SPD} \in \mathbb{R}^{\frac{S}{s} \times \frac{S}{s} \times (s^2 C_1)} \tag{5}$$
where s denotes the scale factor, indicating that the spatial size is reduced proportionally from $S \times S$ to $\frac{S}{s} \times \frac{S}{s}$, while all pixels that were originally distributed at different spatial locations are reorganized into the channel dimension, expanding the number of channels to $s^2 C_1$ without discarding any pixels. Figure 10 illustrates the network structure and computational flow of SPD-Conv, where the input feature map is split into four non-overlapping sub-maps and then concatenated along the channel dimension.
SPD-Conv then applies a stride-1 convolution to perform channel compression and feature transformation, producing an output of size $\frac{S}{s} \times \frac{S}{s} \times C_2$, which has the same spatial dimensions as the feature map obtained by a conventional stride-2 convolution. However, in this case the downsampling decisions are generated by learnable convolutional kernels based on complete pixel information, rather than by simple regular sampling, max or average pooling. Compared with the original stride-2 convolutions in YOLOv11n, this strategy achieves the same downsampling ratio while retaining all temperature pixels from the early feature maps in the channel dimension, enabling higher-level layers to make fuller use of this information and improving the sensitivity to small-scale high-temperature spots and narrow boundary regions.
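The space-to-depth rearrangement followed by a non-strided convolution can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the stride-1 convolution is simplified to a 1×1 channel-mixing step, the sub-map ordering is one arbitrary choice, and the weights are placeholders rather than learned values.

```python
import numpy as np


def space_to_depth(x, s=2):
    """Rearrange an (H, W, C) map into (H/s, W/s, s*s*C) without dropping pixels."""
    h, w, c = x.shape
    assert h % s == 0 and w % s == 0, "spatial size must be divisible by s"
    # Sub-map (i, j) takes every s-th pixel starting at row i, column j.
    subs = [x[i::s, j::s, :] for i in range(s) for j in range(s)]
    return np.concatenate(subs, axis=-1)


def pointwise_conv(x, weight):
    """Stride-1 1x1 convolution: mixes channels at each position (weight: C_in x C_out)."""
    return x @ weight


x = np.arange(4 * 4 * 1, dtype=np.float32).reshape(4, 4, 1)
x_spd = space_to_depth(x, s=2)                     # shape (2, 2, 4)
weight = np.full((4, 1), 0.25, dtype=np.float32)   # placeholder weights, not learned
y = pointwise_conv(x_spd, weight)                  # shape (2, 2, 1)
```

With the uniform placeholder weights the output equals 2×2 average pooling, which shows that fixed pooling is one special case of what the learnable kernel can represent once all pixels are kept in the channel dimension.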

2.4.3. DySample-Based Upsampling Module for Scale Alignment

The original feature pyramid network and the path aggregation network structures in YOLOv11n generally adopt nearest-neighbour or bilinear interpolation for upsampling. Such fixed interpolation operators rely solely on geometric position and are insensitive to feature content. In low-resolution feature maps, the LU and AA regions occupy only a few pixels. After fixed-interpolation upsampling, the magnitude of high-temperature responses is easily reduced and boundary information becomes excessively smoothed, making it difficult to achieve precise semantic boundary alignment across different scales, which in turn leads to localisation offsets and decreased confidence for small targets. Therefore, in order to alleviate small-target feature blurring and scale misalignment during feature fusion, DySample dynamic upsampling is introduced in the neck stage. The overall upsampling structure of DySample is illustrated in Figure 11.
During the neck stage, DySample models the upsampling process as learnable sampling over a continuous feature field. Let the low-resolution feature map from the preceding level be denoted as $X \in \mathbb{R}^{C \times H_1 \times W_1}$; the goal is to restore it to a higher spatial resolution $(H_2, W_2)$. To this end, DySample first adaptively generates a sampling coordinate set $S \in \mathbb{R}^{2 \times H_2 \times W_2}$, in which the two channels represent the normalized horizontal and vertical sampling coordinates for each output position, respectively. The feature map X and the coordinates S are then fed into the grid_sample function, which performs interpolated sampling over the continuous feature field at the specified locations to obtain the upsampled feature map:
$$X' = \text{grid\_sample}(X, S), \quad X' \in \mathbb{R}^{C \times H_2 \times W_2} \tag{6}$$
In this study, $X$ denotes the low-resolution IRT features from the backbone, $X'$ denotes the fused features aligned with the high-resolution branch, and S is automatically learned by the network according to the local temperature distribution.
Internally, DySample still performs resampling of the continuous feature field using bilinear interpolation. For an arbitrary output location (i, j), the resampled feature value is given by:
$$X'_c(i, j) = \sum_{u, v} w_{i,j,u,v} \, X_c(u, v) \tag{7}$$
In this context, c denotes the channel index, (u, v) denotes the sampling locations in the input feature map, and $w_{i,j,u,v}$ denotes the bilinear interpolation weights determined by the sampling coordinates S. Unlike fixed interpolation, DySample learns S to indirectly control the weights $w_{i,j,u,v}$, thereby making the upsampling process content-adaptive to local temperature distributions and boundary structures. The sampling set S is obtained by adding the regular sampling grid G and the offset field O:
$$S = G + O \tag{8}$$
where G denotes the regular sampling grid and O denotes the offset field obtained under the joint modulation of the static scope factor and dynamic scope factor (Figure 12).
After scale constraint and spatially adaptive recalibration by the static and dynamic scope factors, the offset field can be expressed as:
$$O = \alpha_{static} \, (\alpha_{dynamic} \, O_1) \tag{9}$$
where $\alpha_{static}$ is the static scope factor, which serves as a global scaling coefficient for the offset field, uniformly constraining the magnitude of offsets at all positions so that sampling points move only within local neighborhoods. In contrast, $\alpha_{dynamic}$ is the dynamic scope factor, which adapts to local features to modulate the allowable offset range at different locations, permitting larger offsets in regions with complex boundaries or temperature gradients while tightening the offsets in background areas, thereby enhancing the response to critical structures.
In the dairy cow IRT image detection task considered in this study, DySample learns sampling coordinates to transform upsampling from geometry-only fixed interpolation into content-adaptive dynamic sampling, while introducing almost no additional parameters or FLOPs. This allows better preservation of high-temperature responses and clear contours in the LU and AA small-target regions, and achieves more precise semantic and boundary alignment across multi-scale branches, thereby providing more reliable feature support for subsequent small-object detection and temperature estimation.

2.5. Evaluation Indicators

To validate the effectiveness of the proposed YOLOv11n-CGSD model in terms of both detection performance and lightweight capabilities, Precision (P), Recall (R), mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP50), and mean average precision averaged over IoU thresholds from 0.50 to 0.95 with a step size of 0.05 (mAP50-95) are adopted to quantitatively analyse the ROI localization capability for LU and AA. In the detection task, P and R are defined in Equations (10) and (11).
$$P = \frac{TP}{TP + FP} \times 100\% \tag{10}$$
$$R = \frac{TP}{TP + FN} \times 100\% \tag{11}$$
where TP denotes the number of correctly detected positive samples, FP denotes the number of false positives, and FN denotes the number of false negatives. In a detection task with K classes, the average precision $AP_{50}^{(k)}$ of the k-th class at an IoU threshold of 0.5 can be expressed as Equation (12):
$$AP_{50}^{(k)} = \int_{0}^{1} P_k(R)\, dR \approx \sum_{n} P_k(R_n)\, \Delta R_n \tag{12}$$
The mAP50, obtained by averaging $AP_{50}^{(k)}$ over the K classes, provides a comprehensive measure of the overall detection accuracy across all key body-surface regions of dairy cows and serves as one of the core metrics for evaluating whether the improved architecture enhances detection performance in IRT scenarios.
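The metric definitions above can be made concrete with a short sketch. It uses the plain summation form of the average precision without the precision-envelope interpolation that common evaluation toolkits apply, so it is an illustration of the formulas rather than the evaluation code used in this study. All detections and counts are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision and recall in percent from TP, FP and FN counts."""
    p = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    r = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(hits, n_gt):
    """AP as the sum of P(R_n) * delta R_n over ranked detections.

    hits : booleans for detections sorted by descending confidence;
           True means the detection matched a ground-truth box at IoU >= 0.5.
    n_gt : number of ground-truth boxes of this class.
    """
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    for hit in hits:
        tp += hit
        fp += not hit
        recall = tp / n_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

# Hypothetical ranked detections for the two ROI classes.
ap_lu = average_precision([True, True, False, True], n_gt=4)
ap_aa = average_precision([True, False, True], n_gt=2)
map50 = (ap_lu + ap_aa) / 2                   # mean over K = 2 classes
```

Repeating the same computation at IoU thresholds from 0.50 to 0.95 in steps of 0.05 and averaging the results gives mAP50-95.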
From the perspective of model complexity, the number of parameters (Params) is defined as the sum of the learnable parameters in all layers of the network. Taking a convolutional layer as an example, if the kernel size in the l-th layer is $k_l \times k_l$ and the numbers of input and output channels are $C_{in}^{(l)}$ and $C_{out}^{(l)}$, respectively, then the Params can be expressed as follows:
$$\mathrm{Params} = \sum_{l} k_l^{2}\, C_{in}^{(l)}\, C_{out}^{(l)}$$
where b denotes the storage required for each floating-point parameter. GFLOPs is used to denote the number of floating-point operations required by a model during a single forward inference process. By comparing changes in Params and GFLOPs before and after the structural modifications, a quantitative analysis of model complexity reduction can be performed.
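As a worked illustration of the parameter count, the sketch below sums the convolution weights of a small hypothetical layer stack and converts the total into storage, assuming b = 4 bytes per float32 parameter. The Ghost-style variant, which takes half of the output channels from a standard convolution and the other half from a 3 × 3 depthwise operation, is a simplified assumption about how such a layer saves parameters and does not reproduce the C3Ghost module of this study.

```python
def conv_params(k, c_in, c_out):
    """Weights of one standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def ghost_conv_params(k, c_in, c_out, cheap_k=3):
    """Ghost-style layer (simplified assumption): half of the outputs come
    from a standard convolution, the rest from a depthwise 'cheap' operation."""
    primary = c_out // 2
    return conv_params(k, c_in, primary) + cheap_k * cheap_k * primary

# Hypothetical layers as (kernel size, input channels, output channels).
layers = [(3, 3, 16), (3, 16, 32), (1, 32, 64)]
total = sum(conv_params(k, ci, co) for k, ci, co in layers)

bytes_per_param = 4                           # b for float32 (assumption)
size_bytes = total * bytes_per_param

standard = conv_params(3, 16, 32)
ghost = ghost_conv_params(3, 16, 32)
```

For the 3 × 3 layer with 16 input and 32 output channels, the standard convolution holds 4608 weights and the Ghost-style variant 2448, which shows the kind of reduction that replacing ordinary convolutions can bring.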
FPS was measured under a fixed inference configuration, where the hardware and software environment, input resolution, batch size, preprocessing and post-processing procedure, and the warm-up and averaging strategy were kept identical for all models. The reported FPS value was obtained by averaging the throughput after warm-up over a fixed number of images. FPS reflects the real-time inference capability and is interpreted jointly with Params and GFLOPs to characterise the accuracy–efficiency trade-off.
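A minimal sketch of such a measurement protocol is given below. `measure_fps` and `dummy_infer` are hypothetical names, and the dummy function merely stands in for the full preprocessing, inference, and post-processing pipeline.

```python
import time

def measure_fps(infer, images, warmup=10):
    """Average throughput (images per second) after a warm-up phase."""
    # Warm-up runs are executed but not timed.
    for img in images[:warmup]:
        infer(img)
    timed = images[warmup:]
    start = time.perf_counter()
    for img in timed:
        infer(img)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed

def dummy_infer(img):
    """Hypothetical stand-in for one forward pass of a detector."""
    return sum(img)

images = [[float(i)] * 1000 for i in range(60)]
fps = measure_fps(dummy_infer, images)
```

When inference runs asynchronously on a GPU, the device should be synchronised before each timer reading; otherwise the timer captures only the kernel launches and the throughput is overestimated.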

3. Results

3.1. Training Results

All experiments were conducted on the AutoDL cloud platform with Ubuntu 20.04, Python 3.8, PyTorch 2.0.0, and CUDA 11.8. The deep-learning backend was compiled with cuDNN 8.7.0. Model training was performed on an NVIDIA GeForce RTX 2080 Ti, and the host CPU was an Intel(R) Xeon(R) Platinum 8352Y @ 2.20 GHz (2 sockets, 32 cores per socket, 128 logical CPUs). In this study, YOLOv11n-CGSD was trained for 300 epochs, and the evolution curves of P, R, mAP50, box loss, cls loss, and dfl loss were analysed (Figure 13).
As shown in Figure 13a, the three performance curves increase rapidly during the first 50 epochs and then grow more slowly, converging at approximately 200 epochs; P and R stabilise above 0.80, while mAP50 remains above 0.90. No obvious oscillation or degradation is observed in the later stage, and no sign of overfitting appears, indicating that the network achieves satisfactory detection accuracy and generalisation performance on the validation set. In Figure 13b, box loss, cls loss, and dfl loss all decrease from relatively high initial values and gradually level off as the number of epochs increases, indicating that the optimisation of the object localisation, category classification, and distribution fitting sub-tasks has stably converged. These results demonstrate that the proposed YOLOv11n-CGSD has sufficiently learned discriminative features of cow targets, providing a reliable basis for subsequent comparative experiments.

3.2. Performance Comparison of Different Models

In this study, RTMDet-s and several YOLO-series detection models were implemented and trained on the same dataset split to benchmark the baseline model and the proposed YOLOv11n-CGSD (Figure 14 and Figure 15). For a fair comparison, all models were trained under a consistent training protocol (same input resolution, epochs, batch size, data augmentation, and evaluation settings). RTMDet-s was built using the official MMDetection implementation, while the YOLO baselines were trained using their official codebases.
The results indicate that RTMDet-s yields substantially lower performance on this IRT-based cow ROI detection task compared with the YOLO-series baselines. On the one hand, its relatively small number of parameters and simple network structure lead to insufficient capability for fine-grained feature extraction from low-resolution IRT images; on the other hand, its feature representation capacity is also limited under complex background interference. In contrast, the YOLO-series models exhibit generally stronger feature-learning capabilities, but only YOLOv8n, YOLOv10n, and YOLOv11n achieve relatively high values on the three metrics P, R, and mAP50.
However, as shown in Figure 15, the prediction confidence of YOLOv8n and YOLOv10n is still suboptimal, mainly because these models struggle to effectively distinguish between valid and invalid features and tend to encode them together. As a result, although the predicted bounding boxes are roughly located within the predefined ROI regions, the localisation and classification accuracy remain insufficient. YOLOv11n enhances, to some extent, the ability to suppress and differentiate invalid features, but the overall improvement is still limited. After introducing the improvement strategies proposed in this study, the adaptability of the model to complex backgrounds and low-resolution IRT images is further enhanced, and the overall detection performance reaches an optimal level. Specifically, YOLOv11n-CGSD achieves P, R, and mAP50 values of 89.11%, 86.80%, and 91.94%, respectively, corresponding to improvements of 3.11% in P, 5.14% in R, and 4.08% in mAP50 over YOLOv11n. These results indicate fewer missed ROIs, a lower false-positive rate, and better overall localisation quality. The concurrent improvements across multiple metrics further suggest that the performance gain is consistent and interpretable, reflecting the effective contribution of the proposed modules rather than an accidental fluctuation. Accordingly, the proposed strategies are suggested to effectively suppress invalid information and to extract key target features from IRT images characterised by low resolution, sparse valid features, and complex background interference, thereby improving detection performance.

3.3. Ablation Experiments

To verify the effectiveness of each module in improving performance, multiple ablation experiments were designed and conducted (Table 2).
After introducing C3Ghost as the new feature extraction module, both Params and GFLOPs are reduced, while the core detection metrics P, R, mAP50 and mAP50-95 show a slight decline compared with the baseline. This suggests that channel compression can effectively reduce model complexity, but may also limit representational capacity, leading to a modest accuracy–efficiency trade-off when C3Ghost is used in isolation. With the integration of the DySample module, the responses to small targets and local regions become clearer and spatial alignment becomes more accurate, leading to a notable performance improvement under nearly unchanged parameter counts, which is also reflected by the increase in the localisation-sensitive mAP50-95. The SPD-Conv module likewise exhibits a certain capability for complexity reduction and improves boundary-related localisation by mitigating early downsampling information loss, as indicated by mAP50-95. Regardless of how DySample is combined with C3Ghost or SPD-Conv, the accuracy of capturing and representing effective features can be further improved under a lightweight setting: on the one hand, C3Ghost reduces redundant feature channels; on the other hand, SPD-Conv and DySample perform more thorough rearrangement and resampling within a more compact feature space, thereby fully exploiting the discriminative information of the limited features. These results demonstrate the rationality of the targeted model improvements adopted in this study.

3.4. Visualisation Based on EigenCAM

In this study, EigenCAM [49] was employed to generate heatmaps for the IRT images, and the visualisation results of YOLOv11n and its variants with different module combinations were compared to analyse the differences in regional feature attention among these configurations (Figure 16). In the heatmaps, the colour transition from cool to warm corresponds to increasing activation intensity: high-response regions are shown in red, medium-response regions in yellow, and low-response regions in green, revealing the key spatial locations on which the network relies during target recognition.
Based on the EigenCAM visualisation results, it can be observed that under complex background conditions, the original YOLOv11n model extracts a large number of redundant features, with its attention mainly concentrated on the centre and lower part of the LU region, while strong responses also appear on the legs of other cows. As a result, it is difficult for the model to effectively distinguish structures that are irrelevant to the ROI but share similar temperature distributions. After introducing the C3Ghost module, the response intensity at the boundaries between the LU and surrounding regions begins to increase. With the further addition of the SPD-Conv module, the attention at the junction between the LU and the trunk is markedly enhanced, while the responses on the legs of other cows are noticeably reduced, and the range of high responses along the edges is further expanded, reflecting the role of space-to-depth convolution in strengthening local structures and fine-grained texture representations. When the DySample module is additionally incorporated, the responses to small targets during the upsampling stages exhibit better shape preservation and positional alignment across scales; the heatmaps become more concentrated along the udder contour with smoother and more regular boundaries, and irrelevant activations on the legs of background cows and other body parts are substantially suppressed, indicating a clear improvement in the ability of the model to focus on the LU structure.
For the AA region, the YOLOv11n model only produces a small area of high response near the centre of the AA and is relatively insensitive to its boundaries. After introducing C3Ghost, the high-response area over the AA and its edges is slightly expanded, suggesting that reducing redundant channels helps preserve and propagate valid features more effectively. With SPD-Conv added, the number of effective structural pixels covered by the heatmap increases; however, the overall attention still does not fully span the entire AA region, indicating that relying solely on space-to-depth convolution remains insufficient for comprehensive structural sensitivity. After finally integrating the DySample module, the high responses within the AA region are almost entirely concentrated inside the ROI, with continuous boundary activations that closely follow the anatomical contour, while spurious activations in the background and non-AA areas are greatly reduced. This highlights the role of feature-guided dynamic resampling in enhancing structural alignment and spatial discrimination.
Taken together, the visualisation results for LU and AA demonstrate that the proposed combination of C3Ghost, SPD-Conv, and DySample effectively suppresses redundant features while controlling the parameter scale, and enhances the model’s boundary sensitivity and robustness to interference on low-resolution, low-contrast, and weak-structure IRT images.

3.5. Temperature Extraction Experiment

In this study, the YOLOv11n and YOLOv11n-CGSD models were compared by extracting temperatures from the LU and AA regions of 36 cows, with the extracted temperature indices including Tmax and Tavg. Each cow was measured four times on four consecutive days at the same time of day. Ambient temperature and relative humidity were recorded using a handheld thermo-hygrometer, and data were collected only when deviations from the day-1 baseline were within 1 °C for temperature and within 3% for humidity. Tmax is defined as the highest temperature among all pixels within the detected bounding box, and Tavg is defined as the average temperature of all pixels in the bounding box. The maximum error (Max. Error) and mean absolute error (MAE. Error) of the two models under the same temperature parameter were adopted as evaluation metrics. Max. Error is defined as the maximum absolute difference between the measured value and the reference value, and MAE. Error is the arithmetic mean of the absolute differences over all measurements (Table 3).
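The two temperature indices and the two error metrics can be sketched as follows. The temperature map, the box coordinates (half-open pixel ranges), and the reference values are hypothetical and serve only to illustrate the definitions.

```python
def roi_temperatures(temp_map, box):
    """Return (Tmax, Tavg) over all pixels inside box = (x1, y1, x2, y2).

    temp_map is a list of rows of per-pixel temperatures in degrees Celsius;
    x2 and y2 are exclusive.
    """
    x1, y1, x2, y2 = box
    pixels = [t for row in temp_map[y1:y2] for t in row[x1:x2]]
    return max(pixels), sum(pixels) / len(pixels)

def error_summary(extracted, reference):
    """Return (Max. Error, MAE. Error) between extracted and reference values."""
    abs_err = [abs(e - r) for e, r in zip(extracted, reference)]
    return max(abs_err), sum(abs_err) / len(abs_err)

# Hypothetical 4 x 4 radiometric patch with a warm ROI in the centre.
temp_map = [
    [24.0, 24.5, 25.0, 24.0],
    [24.5, 36.0, 37.5, 25.0],
    [24.0, 36.5, 38.0, 24.5],
    [23.5, 24.0, 24.5, 24.0],
]
tight_max, tight_avg = roi_temperatures(temp_map, (1, 1, 3, 3))
loose_max, loose_avg = roi_temperatures(temp_map, (0, 0, 4, 4))

max_err, mae = error_summary([38.0, 37.6, 38.4], [38.1, 37.9, 38.4])
```

In this toy patch, loosening the box leaves Tmax at 38.0 °C but pulls Tavg from 37.0 °C down to about 27.5 °C, which illustrates how strongly Tavg depends on the pixels admitted by the bounding box.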
For the parameter Tavg, although both the Max. Error and MAE. Error decrease after model improvement, the temperature-extraction accuracy remains suboptimal, mainly because Tavg inevitably includes irrelevant pixels and is sensitive to occlusions such as dirt. A similar tendency was documented by Metzner et al. in 2014 [50] through a comparison of geometric analysis tools on caudal udder thermograms, showing that the Tavg was more variable, whereas Tmax achieved the lowest coefficient of variation. Therefore, Tavg is not considered a reliable reference parameter for dairy cow body-surface temperature in the presence of occlusions and irrelevant pixels.
By contrast, Tmax exhibits better stability in temperature extraction and is more suitable as a reference index for body-surface temperature in dairy cow disease detection. Furthermore, YOLOv11n-CGSD yields a lower mean error in Tmax extraction compared with the baseline model, indicating a smaller overall bias under limited sample conditions. The reduction rate of each error metric is calculated as follows:
$$T_{reduction} = \frac{T_b - T_i}{T_b} \times 100\%$$
where Treduction denotes the percentage reduction in temperature error after model improvement, Tb denotes the temperature-extraction error of the baseline model, and Ti denotes the temperature-extraction error of the improved model.
Accordingly, Treduction for Max. Error and the MAE. Error in the LU region is 33.3% and 25.7%, respectively, while Treduction in the AA region is 87.5% and 95.0%, respectively, for Tmax.
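The reduction rate can be checked against the LU values for Tmax reported in this section, where the MAE. Error falls from 0.183 for the baseline to 0.136 for the improved model. The helper name `reduction_rate` is hypothetical.

```python
def reduction_rate(t_b, t_i):
    """Percentage reduction of an error metric from baseline t_b to improved t_i."""
    return (t_b - t_i) / t_b * 100.0

lu_mae_reduction = reduction_rate(0.183, 0.136)
print(round(lu_mae_reduction, 1))
```

The result rounds to 25.7%, matching the value stated for the LU region.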
To ensure that the observed improvements are not attributable to sampling variability, statistical validation was conducted on paired per-cow temperature-extraction errors. Paired bootstrapping with 2000 resamples was used to estimate 95% confidence intervals (CIs) for the paired differences between YOLOv11n-CGSD and YOLOv11n, and a one-sided paired Wilcoxon signed-rank test was performed to assess whether YOLOv11n-CGSD consistently reduced errors.
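The paired bootstrap can be sketched with the standard library alone. The per-cow errors below are hypothetical, the bootstrapped statistic is assumed to be the mean paired difference, and the interval is a simple percentile interval; none of these details is confirmed by the text.

```python
import random

def paired_bootstrap_ci(err_new, err_base, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference (new - base)."""
    diffs = [a - b for a, b in zip(err_new, err_base)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample cows with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return sum(diffs) / n, means[lo_idx], means[hi_idx]

# Hypothetical per-cow absolute errors in degrees Celsius.
err_base = [0.30, 0.25, 0.40, 0.35, 0.28, 0.33, 0.45, 0.31]
err_new = [0.10, 0.12, 0.15, 0.20, 0.11, 0.14, 0.18, 0.13]

mean_diff, ci_low, ci_high = paired_bootstrap_ci(err_new, err_base)
```

An interval that lies entirely below zero indicates a consistent error reduction. The one-sided signed-rank test could be run on the same pairs with, for example, `scipy.stats.wilcoxon(err_new, err_base, alternative="less")`, assuming SciPy is available.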
In Table 4, the paired difference is presented as YOLOv11n-CGSD minus YOLOv11n; negative values indicate reduced error. Statistically robust error reductions are observed for AA in both Tmax and Tavg and for LU in Tavg, as the 95% CIs of the paired differences exclude zero and the Wilcoxon tests indicate significance. For LU in Tmax, the MAE. Error decreases from 0.183 to 0.136, indicating a modest improvement; however, this reduction does not reach statistical significance under the Wilcoxon test (p = 0.135), likely because the baseline error is already low and leaves limited room for further reduction under the current sample size.
These results collectively demonstrate that the proposed method achieves high effectiveness and reliability for IRT-based body-surface temperature extraction under complex background conditions.

4. Discussion

Automatic ROI detection for dairy cow surface temperature extraction generally follows two routes: one directly performs ROI localisation and temperature extraction on IRT images, while the other detects ROIs on RGB images and then maps the results to the corresponding IRT images. Compared with RGB-based methods, IRT-based approaches are expected to offer advantages in terms of robustness to illumination variability. Under conditions such as insufficient barn lighting at night, severe local shadows, or strong direct sunlight and overexposure during the day, RGB images are prone to underexposure or highlight saturation, leading to the loss of texture and boundary information. In contrast, IRT images are derived from radiative temperature distributions and are largely insensitive to visible light conditions, thereby potentially reducing the impact of illumination variations on detection stability.
Compared with previously developed RGB-based models [48], the IRT-based model in this study shows heterogeneous performance between LU and AA. It should be clarified that the dataset used in the present IRT-based study is not fully consistent with that used in our previous RGB-based study [48]; the two data collections only partially overlap. Therefore, a strict quantitative comparison under an identical protocol is not possible in this manuscript. Nevertheless, the overall performance trends provide useful contextual evidence for the following discussion. For the LU region, the improvement in temperature extraction accuracy achieved by the IRT model is smaller than that of the RGB model. This is because, in RGB images, the udder can be more easily distinguished from other cows and farm structures in terms of colour and shape, whereas in IRT images the udder temperature is often similar to that of surrounding animals, which increases the difficulty of boundary segmentation. In contrast, for the AA region, the IRT model outperforms the RGB model. The Tmax of AA is usually located near the centre of the ROI and appears in IRT images as a distinct local hot spot; therefore, accurate boundary delineation is sufficient to obtain stable extraction results. By comparison, the AA region in RGB images contains limited usable texture and colour information, and its central prominence is weaker.
From the perspective of structural information, anatomical contours, shape boundaries, and the relative spatial relationships between different ROIs are more explicit in RGB images, which facilitates learning of “body structure–spatial layout” priors. Under IRT conditions, these geometric cues are attenuated, and the model must instead rely on subtle variations in temperature distribution to recover precise ROI boundaries.
In this study, the primary objective is to improve the accuracy and stability of LU and AA temperature extraction from IRT images, rather than to develop a model for predicting rectal temperature. Tr is used as a reference measure of core temperature to contextualise the extracted surface temperatures, while recognising that thermoregulation dynamics and environmental confounders may introduce systematic differences between surface and core measurements. Therefore, diagnostic thresholds and Tr prediction are beyond the scope of this work and should be addressed in future studies with dedicated physiological validation.
Based on the above comparison, several limitations remain in the IRT-based setting of this study:
  • The current model relies solely on single-modality IRT images; under conditions of multiple-animal interference, dirt contamination, and large areas of thermally connected regions, its discrimination performance for LU remains weaker than that of RGB-based models.
  • ROI boundaries are highly dependent on fine temperature gradients; once affected by emissivity changes, local contamination, or minor calibration errors, localisation bias can be amplified, and this uncertainty has not yet been systematically quantified.
  • The dataset was collected from a single farm, so inter-farm variability in barn layout, camera setup, management practices, and background heat sources is not fully represented; accordingly, robustness to unseen farms cannot yet be firmly established.
  • Seasonal conditions, posture, occlusion diversity, and thermal-environment heterogeneity were not comprehensively covered; changes in ambient temperature, ventilation, and heterogeneous heat backgrounds may shift radiometric patterns and reduce generalisation in field deployment.
  • Only Tmax was compared with Tavg in this study; alternative robust statistics, including percentile-based temperature, trimmed mean, and hotspot-based aggregation, were not evaluated, although they may further reduce noise sensitivity.
  • The reported FPS reflects GPU-based inference on an RTX 2080 Ti and may not directly translate to edge devices or CPU-only hardware; nevertheless, the lightweight design suggests deployment potential, and deployment-oriented benchmarking on typical farm edge platforms will be conducted in future work.
Future work will expand data collection across farms and seasons and include cross-device calibration. When multi-farm data are available, cross-farm validation will be conducted, using schemes such as training on one farm and testing on another or leave-one-farm-out, to better quantify domain robustness and improve generalisability. Robust temperature statistics will also be examined to strengthen the stability of ROI-based temperature extraction.
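A leave-one-farm-out scheme of the kind mentioned above can be sketched as follows. The farm identifiers and image names are hypothetical, since the present dataset does not yet support such a split.

```python
def leave_one_farm_out(samples):
    """Yield (held_out_farm, train, test) splits.

    samples is a list of (farm_id, image_id) pairs; each farm is held out
    once, so no farm contributes images to both sides of a split.
    """
    farms = sorted({farm for farm, _ in samples})
    for held_out in farms:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Hypothetical multi-farm image index.
samples = [
    ("farm_A", "img_001"), ("farm_A", "img_002"),
    ("farm_B", "img_003"), ("farm_C", "img_004"),
]
folds = list(leave_one_farm_out(samples))
```

Reporting the metrics for each held-out farm separately, rather than only their mean, would show which site characteristics degrade detection most.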

5. Conclusions

To address the issues of complex dairy cow barn backgrounds and IRT images of low resolution, poor contrast, indistinct ROI boundaries, and weak texture information, a lightweight detection model, YOLOv11n-CGSD, was developed to improve ROI detection performance and the accuracy of body temperature extraction. By incorporating the C3Ghost, SPD-Conv, and DySample modules into the network, the model maintains a lightweight structure while redirecting attention from irrelevant regions to the contour boundaries of the ROI, thereby significantly enhancing detection performance. Experimental results show that under complex IRT conditions, the P, R, and mAP50 of the model reach 89.11%, 86.80%, and 91.94%, representing increases of 3.11%, 5.14%, and 4.08%, respectively, compared with the baseline YOLOv11n model. When Tmax is used as the temperature parameter, the body temperature extraction results indicate that the Max. Error is reduced by 33.3–87.5%, and the MAE. Error is reduced by 25.7–95.0%. This study provides methodological reference and technical support for non-contact body temperature extraction of dairy cows under practical cowshed production conditions.

Author Contributions

Conceptualisation, Z.K., H.S. (Hang Song), H.X., H.S. (Hang Shi), J.H., and T.N.; Methodology, Z.K., H.S. (Hang Song), M.W., D.B., C.Y., H.S. (Hang Shi), J.H., and T.N.; Software, Z.K., H.S. (Hang Song), H.X., M.W., D.B., C.Y., and H.S. (Hang Shi); Validation, Z.K. and H.S. (Hang Shi); Formal analysis, Z.K.; Data curation, Z.K., H.S. (Hang Song), H.X., M.W., D.B., and C.Y.; Writing – original draft preparation, Z.K., H.S. (Hang Song), H.X., M.W., D.B., C.Y., and H.S. (Hang Shi); Writing – review and editing, J.H. and T.N.; Visualization, Z.K.; Supervision, T.N.; Project administration, J.H.; Funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Heilongjiang Provincial Natural Science Foundation for Young Scholars, grant number QC2025C032; Major Project of Heilongjiang Provincial Key Research and Development Program, grant number 2023ZX01A06; Collaborative Innovation Achievement Project of “Double First-Class” Disciplines in Heilongjiang Province, grant number LJGXCG2023-045; Heilongjiang Province Excellent Young Teachers Basic Research Support Program, grant number YQJH2025161; China University Industry–University–Research Innovation Fund, grant number 2023RY059; Research Start-up Fund for Introduced Talents of Heilongjiang Bayi Agricultural University, grant number XYB202504; Open Fund Project of the State Key Laboratory of Green Pesticides (Guizhou University), grant number GPLKF202511; and Daqing City Guidance Project, grant number zd-2025-034.

Institutional Review Board Statement

The animal study protocol was approved by the Animal Welfare and Ethics Committee of Heilongjiang Bayi Agricultural University (protocol code DWKJXY2024057).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

We would like to thank the staff of the 8511 Dairy Farm in Mishan City, Heilongjiang, China, and the ELVO dairy barn in Belgium for their assistance with data collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Oliveira, C.P.; de Sousa, F.C.; da Silva, A.L.; Schultz, É.B.; Valderrama Londoño, R.I.; de Souza, P.A.R. Heat Stress in Dairy Cows: Impacts, Identification, and Mitigation Strategies: A Review. Animals 2025, 15, 249.
  2. Giannone, C.; Bovo, M.; Ceccarelli, M.; Torreggiani, D.; Tassinari, P. Review of the Heat Stress-Induced Responses in Dairy Cattle. Animals 2023, 13, 3451.
  3. Macmillan, K.; Colazo, M.G.; Cook, N.J. Evaluation of Infrared Thermography Compared to Rectal Temperature to Identify Illness in Early Postpartum Dairy Cows. Res. Vet. Sci. 2019, 125, 315–322.
  4. McManus, R.; Boden, L.A.; Weir, W.; Viora, L.; Barker, R.; Kim, Y.; McBride, P.; Yang, S. Thermography for Disease Detection in Livestock: A Scoping Review. Front. Vet. Sci. 2022, 9, 965622.
  5. Sathiyabarathi, M.; Jeyakumar, S.; Manimaran, A.; Pushpadass, H.A.; Sivaram, M.; Ramesha, K.P.; Das, D.N.; Kataktalware, M.A.; Jayaprakash, G.; Patbandha, T.K. Investigation of Body and Udder Skin Surface Temperature Differentials as an Early Indicator of Mastitis in Holstein Friesian Crossbred Cows Using Digital Infrared Thermography Technique. Vet. World 2016, 9, 1386–1391.
  6. Sathiyabarathi, M.; Jeyakumar, S.; Manimaran, A.; Pushpadass, H.A.; Kumaresan, A.; Lathwal, S.S.; Sivaram, M.; Das, D.N.; Ramesha, K.P.; Jayaprakash, G. Infrared Thermography to Monitor Body and Udder Skin Surface Temperature Differences in Relation to Subclinical and Clinical Mastitis Condition in Karan Fries (Bos taurus × Bos indicus) Crossbred Cows. Indian J. Anim. Sci. 2018, 88, 694–699.
  7. Yang, C.; Gu, X.; Cao, Z.; Zhang, X.; Hao, Y.; Liu, Y.; Shen, L.; Zhang, Y. Study on Possibility of Left and Right Quarter Skin Temperature Difference as a Detecting Indicator for Subclinical Mastitis in Dairy Cows. Acta Vet. Zootech. Sin. 2015, 46, 1663–1670.
  8. Isola, J.V.V.; Menegazzi, G.; Busanello, M.; dos Santos, S.B.; Agner, H.S.S.; Sarubbi, J. Differences in Body Temperature between Black-and-White and Red-and-White Holstein Cows Reared on a Hot Climate Using Infrared Thermography. J. Therm. Biol. 2020, 94, 102775.
  9. Hajnal, É.; Kovács, L.; Vakulya, G. Dairy Cattle Rumen Bolus Developments with Special Regard to the Applicable Artificial Intelligence (AI) Methods. Sensors 2022, 22, 6812.
  10. Hillman, P.E.; Gebremedhin, K.G.; Willard, S.T.; Lee, C.N.; Lee, J.A. Continuous Measurements of Vaginal Temperature of Female Cattle Using a Data Logger Encased in a Plastic Anchor. Appl. Eng. Agric. 2009, 25, 291–296.
  11. Passawat, T.; Adisorn, Y.; Theera, R. Effect of Heat Stress on Subsequent Estrous Cycles Induced by PGF2α in Cross-Bred Holstein Dairy Cows. Animals 2024, 14, 2009.
  12. Salles, M.S.V.; da Silva, S.C.; Salles, F.A.; Roma, L.C.; El Faro, L.; Mac Lean, P.A.B.; de Oliveira, C.E.L.; Martello, L.S. Mapping the Body Surface Temperature of Cattle by Infrared Thermography. J. Therm. Biol. 2016, 62, 63–69.
  13. Kim, S.-J.; Jin, X.-C.; Bharanidharan, R.; Kim, N.-Y. Distinguishing True from False Estrus in Hanwoo Cows Using Neck-Mounted IMU Sensors: Quantifying Behavioral Differences to Reduce False Positives. Agriculture 2025, 15, 2307.
  14. Ipema, A.H.; Goense, D.; Hogewerf, P.H.; Houwers, H.W.J.; van Roest, H. Pilot Study to Monitor Body Temperature of Dairy Cows with a Rumen Bolus. Comput. Electron. Agric. 2008, 64, 49–52.
  15. Lima, R.S.; Danielski, G.C.; Pires, A.C.S. Mastitis Detection and Prediction of Milk Composition Using Gas Sensor and Electrical Conductivity. Food Bioprocess Technol. 2018, 11, 551–560.
  16. Chung, H.; Vu, H.; Kim, Y.; Choi, C.Y. Subcutaneous Temperature Monitoring through Ear Tag for Heat Stress Detection in Dairy Cows. Biosyst. Eng. 2023, 235, 202–214.
  17. Yu, Z.; Han, Y.; Cha, L.; Chen, S.; Wang, Z.; Zhang, Y. Design of an Intelligent Wearable Device for Real-Time Cattle Health Monitoring. Front. Robot. AI 2024, 11, 1441960.
  18. Fogarty, E.S.; Swain, D.L.; Cronin, G.; Trotter, M. Autonomous On-Animal Sensors in Sheep Research: A Systematic Review. Comput. Electron. Agric. 2018, 150, 245–256.
  19. Ciliberti, M.G.; Albenzio, M.; Sevi, A. From Sensors to Sustainability: Integrating Welfare, Management, and Climate Resilience in Small Ruminant Farm Systems. Animals 2025, 15, 3240.
  20. Zhenlong, W.; Sam, W.; Dong, L.; Tomas, N. How AI Improves Sustainable Chicken Farming: A Literature Review of Welfare, Economic, and Environmental Dimensions. Agriculture 2025, 19, 2028.
  21. Geqi, Y.; Zhengxiang, S.; Hao, L. Critical Temperature-Humidity Index Thresholds Based on Surface Temperature for Lactating Dairy Cows in a Temperate Climate. Agriculture 2021, 11, 970.
  22. Zhang, C.; Wu, X.; Xiao, D.; Zhang, X.; Lei, X.; Lin, S. An Automatic Ear Temperature Monitoring Method for Group-Housed Pigs Adopting Infrared Thermography. Animals 2025, 15, 2279.
  23. Stanek, P.; Żółkiewski, P.; Januś, E. A Review on Mastitis in Dairy Cows Research: Current Status and Future Perspectives. Agriculture 2024, 14, 1292.
  24. Alejandro, M.; Romero, G.; Sabater, J.M.; Diaz, J.R. Infrared Thermography as a Tool to Determine Teat Tissue Changes Caused by Machine Milking in Murciano-Granadina Goats. Livest. Sci. 2014, 160, 178–185.
  25. Zhenqiang, C.; Jialiang, C.; Hongbo, Y.; Man, C. Application and Research Progress of Infrared Thermography in Temperature Measurement of Livestock and Poultry Animals: A Review. Comput. Electron. Agric. 2023, 205, 107586.
  26. Bell, D.J.; Macrae, A.I.; Mitchell, M.A.; Mason, C.S.; Jennings, A.; Haskell, M.J. Comparison of Thermal Imaging and Rectal Temperature in the Diagnosis of Pyrexia in Pre-Weaned Calves Using on Farm Conditions. Res. Vet. Sci. 2020, 131, 259–265.
  27. Marrero, M.G.; Rijos-Fernandez, C.; Velez-Robles, Y.; Ortiz-Colon, G.; Sanchez-Rodriguez, H.; Jimenez-Caban, E.; Curbelo-Rodriguez, J. Short-Milking-Tube Infrared Temperature as a Subclinical Mastitis Detection Tool in Tropical Dairy Farms. Appl. Anim. Sci. 2020, 36, 329–334.
  28. Chu, M.; Liu, X.; Zeng, X.; Wang, Y.; Liu, G. Research Advances in the Automatic Detection Technology for Mastitis of Dairy Cows. Trans. Chin. Soc. Agric. Eng. 2023, 39, 1–12.
  29. Satheesan, L.; Kittur, P.; Alhussien, M.; Lal, G.; Kamboj, A.; Dang, A. Reliability of Udder Infrared Thermography as a Non-Invasive Technology for Early Detection of Sub-Clinical Mastitis in Sahiwal (Bos indicus) Cows Under Semi-Intensive Production System. J. Therm. Biol. 2024, 121, 103838.
  30. Guo, Y.; Yang, S.; Chi, Y.; Wu, C.; Xu, H.; Shen, J.; Zheng, Y. Recognizing Mastitis Using Temperature Distribution from Thermal Infrared Images in Cow Udder Regions. Trans. Chin. Soc. Agric. Eng. 2022, 38, 250–259.
  31. Metzner, M.; Sauter-Louis, C.; Seemueller, A.; Petzl, W.; Zerbe, H. Infrared Thermography of the Udder After Experimentally Induced Escherichia coli Mastitis in Cows. Vet. J. 2015, 204, 360–362.
  32. D’Alterio, G.; Casella, S.; Gatto, M.; Gianesella, M.; Piccione, G.; Morgante, M. Circadian Rhythm of Foot Temperature Assessed Using Infrared Thermography in Sheep. Czech J. Anim. Sci. 2011, 56, 293–300.
  33. Watz, S.; Petzl, W.; Zerbe, H.; Rieger, A.; Glas, A.; Schröter, W.; Landgraf, T.; Metzner, M. Technical Note: Automatic Evaluation of Infrared Thermal Images by Computerized Active Shape Modeling of Bovine Udders Challenged with Escherichia coli. J. Dairy Sci. 2019, 102, 4541–4545.
  34. Wang, C.S.; Du, P.F.; Wu, H.R.; Li, J.X.; Zhao, C.J.; Zhu, H.J. A Cucumber Leaf Disease Severity Classification Method Based on the Fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373.
  35. Zhao, K.; Duan, Y.; Chen, J.; Li, Q.; Hong, X.; Zhang, R.; Wang, M. Detection of Respiratory Rate of Dairy Cows Based on Infrared Thermography and Deep Learning. Agriculture 2023, 13, 1939.
  36. Shu, H.; Wang, K.; Guo, L.; Bindelle, J.; Wang, W. Automated Collection of Facial Temperatures in Dairy Cows via Improved UNet. Comput. Electron. Agric. 2024, 220, 108614. [Google Scholar] [CrossRef]
  37. Guo, S.-S.; Lee, K.-H.; Chang, L.; Tseng, C.-D.; Sie, S.-J.; Lin, G.-Z.; Chen, J.-Y.; Yeh, Y.-H.; Huang, Y.-J.; Lee, T.-F. Development of an Automated Body Temperature Detection Platform for Face Recognition in Cattle with YOLO V3-Tiny Deep Learning and Infrared Thermal Imaging. Appl. Sci. 2022, 12, 4036. [Google Scholar] [CrossRef]
  38. Yan, Y.; Sheng, Z.; Gu, Y.; Heng, Y.; Zhou, H. Non-contact Core Body Temperature Detection Method for Caged Laying Hens. Trans. Chin. Soc. Agric. Mach. 2024, 55, 312–321. [Google Scholar]
  39. Shen, M.; Lu, P.; Liu, L.; Sun, Y.; Xu, Y. Body Temperature Detection Method of Ross Broiler Based on Infrared Thermography. Trans. Chin. Soc. Agric. Mach. 2019, 50, 222–229. [Google Scholar]
  40. Wang, Y.; Kang, X.; He, Z.; Feng, Y.; Liu, G. Accurate Detection of Dairy Cow Mastitis with Deep Learning Technology: A New and Comprehensive Detection Method Based on Infrared Thermal Images. Animal 2022, 12, 100646. [Google Scholar] [CrossRef]
  41. Zhang, X.D.; Kang, X.; Feng, N.N.; Liu, G. Automatic Recognition of Dairy Cow Mastitis from Thermal Images by a Deep Learning Detector. Comput. Electron. Agric. 2020, 178, 105754. [Google Scholar] [CrossRef]
  42. Pezeshki, A.; Stordeur, P.; Wallemacq, H.; Schynts, F.; Stevens, M.; Boutet, P.; Peelman, L.J.; De Spiegeleer, B.; Duchateau, L.; Bureau, F.; et al. Variation of Inflammatory Dynamics and Mediators in Primiparous Cows after Intramammary Challenge with Escherichia coli. Vet. Res. 2011, 42, 15. [Google Scholar] [CrossRef] [PubMed]
  43. Chu, M.; Si, Y.; Li, Z.; Li, Q.; Liu, G. Multi-feature Image Layers Fusion for Accurate Detection of Dairy cow Mastitis Using Deep Learning. Comput. Electron. Agric. 2025, 239, 110937. [Google Scholar] [CrossRef]
  44. Velasco-Bolaños, J.; Ceballes-Serrano, C.C.; Velasquez-Mejia, D.; Riano-Rojas, J.C.; Giraldo, C.E.; Carmona, J.U.; Ceballos-Marquez, A. Application of Udder Surface Temperature by Infrared Thermography for Diagnosis of Subclinical Mastitis in Holstein Cows Located in Tropical Highlands. J. Dairy Sci. 2021, 104, 10310–10323. [Google Scholar] [CrossRef] [PubMed]
  45. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. arXiv 2020, arXiv:1911.11907. [Google Scholar] [CrossRef]
  46. Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. arXiv 2022, arXiv:2208.03641. [Google Scholar] [CrossRef]
  47. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to Upsample by Learning to Sample. arXiv 2023, arXiv:2308.15085. [Google Scholar] [CrossRef]
  48. Song, H.; Kang, Z.; Xue, H.; Hu, J.; Norton, T. EDC-YOLO-World-DB: A Model for Dairy Cow ROI Detection and Temperature Extraction Under Complex Conditions. Animals 2025, 15, 3361. [Google Scholar] [CrossRef]
  49. Francesco, C.; Alessandro, C. Real-Time Search and Rescue with Drones: A Deep Learning Approach for Small-Object Detection Based on YOLO. Drones 2025, 9, 514. [Google Scholar]
  50. Metzner, M.; Sauter-Louis, C.; Seemueller, A.; Petzl, W.; Klee, W. Infrared Thermography of the Udder Surface of Dairy Cattle: Characteristics, Methods, and Correlation with Rectal Temperature. Vet. J. 2014, 199, 57–62. [Google Scholar] [CrossRef]
Figure 1. Overall workflow and model architecture of the proposed YOLOv11n-CGSD framework for dairy cow IRT images. (1) multi-source data acquisition; (2) annotation of IRT images; (3) design of the C3Ghost module; (4) design of the space-to-depth (SPD-Conv) module; (5) design of the dynamic sampling (DySample) module; (6) temperature extraction from the detected ROIs. The improved model is developed by integrating modules (3)–(5) into the YOLOv11n model.
Figure 2. Data collection process.
Figure 3. Cows restrained in neck rails.
Figure 4. Illustration of on-site rectal temperature (Tr) and body temperature acquisition. (a) Tr measurement with metal clip secured at the base of the tail hair, where ① indicates the position of the clip and ② indicates the mercury thermometer inserted into the anus of the cow; (b) IRT data collection alongside ambient temperature–humidity (T-RH) measurement.
Figure 5. ROI annotation on IRT images using RGB images as anatomical reference. (a) Lower udder (LU); (b) around the anus (AA).
Figure 6. Examples of data augmentation applied to the ROI. (a) Original lower udder (LU) image captured with the camera not parallel to the ground; (b) original LU image captured with the camera parallel to the ground; (c) random rotation; (d) vertical flip; (e,f) slight geometric distortion; (g) contrast adjustment; (h) mosaic; (i) added image noise.
Figure 7. Comparison of the original and the improved network architectures. (a) Model architecture of YOLOv11n. (b) Overall architecture of YOLOv11n-CGSD. The left panel displays the improved overall network, whilst the right panel presents the key network structures involved in the enhancement, including C3Ghost, space-to-depth convolution (SPD-Conv), and dynamic sampling (DySample).
Figure 8. Network architecture of the Ghost module. ϕ denotes low-cost computing.
Figure 9. Architecture of the C3Ghost module.
Figure 10. Schematic diagram of the space-to-depth convolution (SPD-Conv) network architecture.
Figure 11. Schematic diagram of the dynamic sampling (DySample) network architecture.
Figure 12. Sampling point generator of the dynamic sampling (DySample) module: (a) static scope factor; (b) dynamic scope factor.
Figure 13. Training curves of YOLOv11n-CGSD over 300 epochs. (a) model detection performance metrics; (b) model real-time performance metrics.
Figure 14. Detection performance comparison of different models.
Figure 15. Comparison of confidence distributions across different models. (a) Lower udder (LU) region; (b) around the anus (AA) region.
Figure 16. Heatmap distribution across different models.
Table 1. Summary of specifications for equipment required for data acquisition.
| Instrument | Manufacturer | Function | Parameter | Value |
|---|---|---|---|---|
| FLIR E6 | Teledyne FLIR LLC, Wilsonville, OR, USA | Body temperature | RGB pixels | 640 × 480 |
| | | | Infrared resolution | 320 × 240 |
| | | | Emissivity | 0.98 |
| | | | Field of view | 45° × 34° |
| | | | Spatial resolution | 3.7 mrad |
| | | | Thermal sensitivity NETD ¹ | 50 mK |
| | | | Temperature measurement range | −20 to 250 °C |
| Fotric 287 | FOTRIC USA Inc., Santa Clara, CA, USA | Body temperature | RGB pixels | 1024 × 768 |
| | | | Infrared resolution | 512 × 384 |
| | | | Emissivity | 0.98 |
| | | | Field of view | 20° × 15° |
| | | | Spatial resolution | 0.68 mrad |
| | | | Thermal sensitivity NETD | 30 mK |
| | | | Temperature measurement range | −40 to 150 °C |
| Thermometer | EWHA Co., Ltd., Liaocheng, Shandong, China | Rectal temperature (Tr) | Measurement range | 35–43 °C |
| | | | Stated measurement error | ±0.1 °C |
| | | | Scale division | 0.1 °C |
| | | | Sensing medium | Mercury column |
| Ambient conditions | Shandong Renke Measurement & Control Technology Co., Ltd., Jinan, Shandong, China | Temperature–humidity (T-RH) | Temperature measuring range | −40 to +80 °C |
| | | | Relative humidity range | 0–100% RH |
| | | | Temperature resolution | 0.1 °C |
| | | | Relative humidity resolution | 0.1% RH |

¹ NETD: noise equivalent temperature difference.
Table 2. Ablation study results for different module combinations.
| Model | P (%) | R (%) | mAP50 (%) | mAP50-95 (%) | Params | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|
| YOLOv11n | 86.00 | 81.66 | 87.86 | 51.50 | 2.62M | 90 | 6.6 |
| YOLOv11n-C3Ghost | 85.64 | 81.72 | 86.35 | 49.12 | 2.36M | 87 | 6.9 |
| YOLOv11n-SPD-Conv | 86.77 | 80.80 | 88.55 | 52.35 | 2.32M | 115 | 5.7 |
| YOLOv11n-DySample | 89.54 | 84.19 | 89.25 | 53.44 | 2.60M | 85 | 6.5 |
| YOLOv11n-C3Ghost + SPD-Conv | 86.54 | 81.41 | 88.20 | 51.87 | 2.09M | 112 | 6.1 |
| YOLOv11n-SPD-Conv + DySample | 88.75 | 84.96 | 91.23 | 56.69 | 2.37M | 107 | 5.7 |
| YOLOv11n-C3Ghost + DySample | 88.86 | 85.52 | 89.80 | 54.10 | 2.37M | 81 | 6.9 |
| YOLOv11n-CGSD | 89.11 | 86.80 | 91.94 | 60.18 | 2.10M | 109 | 6.1 |
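The net effect of the full model relative to the baseline can be read off the first and last rows of Table 2. The short Python sketch below is illustrative only: the numbers are copied from the table, and the helper name `relative_change` is hypothetical rather than part of the authors' code.

```python
# Values copied from Table 2 (baseline YOLOv11n vs. YOLOv11n-CGSD).
baseline = {"P": 86.00, "R": 81.66, "mAP50": 87.86, "mAP50-95": 51.50,
            "params_M": 2.62, "FPS": 90, "GFLOPs": 6.6}
cgsd = {"P": 89.11, "R": 86.80, "mAP50": 91.94, "mAP50-95": 60.18,
        "params_M": 2.10, "FPS": 109, "GFLOPs": 6.1}


def relative_change(old: float, new: float) -> float:
    """Hypothetical helper: percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Accuracy metrics are already percentages, so report absolute
# differences in percentage points.
for key in ("P", "R", "mAP50", "mAP50-95"):
    print(f"{key}: {cgsd[key] - baseline[key]:+.2f} percentage points")

# Cost and speed metrics are reported as relative changes.
for key in ("params_M", "FPS", "GFLOPs"):
    print(f"{key}: {relative_change(baseline[key], cgsd[key]):+.1f}%")
```

Under these tabulated values, the parameter count drops by roughly a fifth while FPS rises by about the same proportion, alongside gains of 3 to 9 percentage points in the accuracy metrics.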
Table 3. Comparison of temperature extraction results before and after model improvement.
| Metrics | ROI | YOLOv11n Tmax | YOLOv11n Tavg | YOLOv11n-CGSD Tmax | YOLOv11n-CGSD Tavg |
|---|---|---|---|---|---|
| Max. error (°C) | LU | 0.3 | 3.4 | 0.2 | 1.7 |
| | AA | 2.4 | 2.8 | 0.3 | 2.4 |
| MAE (°C) | LU | 0.183 | 2.46 | 0.136 | 1.05 |
| | AA | 0.94 | 1.49 | 0.047 | 1.21 |
Table 4. Statistical validation of temperature-extraction errors.
| Metrics | ROI | ΔMAE (95% CI) | RMSE (YOLOv11n) | RMSE (YOLOv11n-CGSD) | ΔRMSE (95% CI) | Wilcoxon p |
|---|---|---|---|---|---|---|
| Tmax | LU | −0.047 [−0.092, −0.006] | 0.207 | 0.154 | −0.053 [−0.091, −0.015] | 0.135 |
| | AA | −0.898 [−1.183, −0.603] | 1.283 | 0.091 | −1.185 [−1.422, −0.905] | 4.97 × 10⁻⁷ |
| Tavg | LU | −1.406 [−1.814, −0.986] | 2.725 | 1.266 | −1.456 [−1.785, −1.118] | 1.65 × 10⁻⁷ |
| | AA | −0.281 [−0.308, −0.239] | 1.706 | 1.473 | −0.235 [−0.274, −0.178] | 1.07 × 10⁻⁷ |
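As a quick sanity check, the ΔMAE point estimates in Table 4 should agree with the differences between the MAE columns of Table 3 up to rounding. The sketch below makes that comparison explicit using values transcribed from both tables; the 0.01 °C tolerance is an assumption chosen to absorb rounding in the published figures, and the variable names are illustrative.

```python
# MAE values (°C) transcribed from Table 3: (YOLOv11n, YOLOv11n-CGSD).
mae = {
    ("Tmax", "LU"): (0.183, 0.136),
    ("Tmax", "AA"): (0.94, 0.047),
    ("Tavg", "LU"): (2.46, 1.05),
    ("Tavg", "AA"): (1.49, 1.21),
}

# ΔMAE point estimates (°C) transcribed from Table 4.
delta_mae_reported = {
    ("Tmax", "LU"): -0.047,
    ("Tmax", "AA"): -0.898,
    ("Tavg", "LU"): -1.406,
    ("Tavg", "AA"): -0.281,
}

TOLERANCE = 0.01  # assumed allowance for rounding in the published tables

results = {}
for key, (before, after) in mae.items():
    delta = after - before  # negative means the improved model has lower error
    gap = abs(delta - delta_mae_reported[key])
    results[key] = gap <= TOLERANCE
    print(f"{key[0]} {key[1]}: computed {delta:+.3f}, reported "
          f"{delta_mae_reported[key]:+.3f}, consistent={results[key]}")
```

With this tolerance, all four metric and ROI combinations come out consistent, which suggests the two tables were derived from the same paired error samples.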
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Kang, Z.; Song, H.; Xue, H.; Wu, M.; Bao, D.; Yan, C.; Shi, H.; Hu, J.; Norton, T. YOLOv11n-CGSD: Lightweight Detection of Dairy Cow Body Temperature from Infrared Thermography Images in Complex Barn Environments. Agriculture 2026, 16, 229. https://doi.org/10.3390/agriculture16020229