Article

YOLO-AWK: A Model for Injurious Bird Detection in Complex Farmland Environments

by Xiang Yang 1,2, Yongliang Cheng 1,2,*, Minggang Dong 3 and Xiaolan Xie 1,2

1 College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
2 Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, China
3 College of Physics and Electronic Information Engineering, Guilin University of Technology, Guilin 541006, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1210; https://doi.org/10.3390/sym17081210
Submission received: 3 June 2025 / Revised: 18 July 2025 / Accepted: 21 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)

Abstract

Injurious birds pose a significant threat to food production and the agricultural economy. To address the challenges posed by their small size, irregular shape, and frequent occlusion in complex farmland environments, this paper proposes YOLO-AWK, an improved bird detection model based on YOLOv11n. Firstly, to improve the model's ability to recognize bird targets against complex backgrounds, we introduce the intra-scale feature interaction (AIFI) module to replace the original SPPF module. Secondly, to more accurately localize and identify bird targets of different shapes and sizes, we use WIoUv3 as a new loss function. Thirdly, to suppress noise interference and improve the extraction of incomplete bird features, we introduce the Kolmogorov–Arnold network (KAN) module. Finally, to improve the model's detection accuracy for small bird targets, we add a small target detection head. The experimental results show that the detection performance of YOLO-AWK on the farmland bird dataset is significantly improved: the final precision, recall, mAP@0.5, and mAP@0.5:0.95 reach 93.9%, 91.2%, 95.8%, and 75.3%, respectively, outperforming the original model by 2.7, 2.3, 1.6, and 3.0 percentage points. These results demonstrate that the proposed method offers a reliable and efficient technical solution for monitoring injurious birds in farmland.

1. Introduction

Food production is essential for human survival and forms the foundation of stable socioeconomic development [1]. With the global population increasing and climate change intensifying, ensuring food security has become a shared concern worldwide. In this context, crop damage by injurious birds has become one of the major challenges to global food production and the agricultural economy. According to food production reports, injurious birds cause an annual grain yield reduction of 13.2% in some rice-producing regions of Africa [2], inflicting incalculable losses on the agricultural economy. In research conducted in Kenya, researchers found that nearly 60% of crop seeds were lost to birds consuming them [3], bringing substantial economic losses to agriculture. Therefore, to achieve sustainable agricultural development, detection and monitoring of injurious birds on farmland is necessary.
In China’s extensive agricultural regions, commonly observed injurious bird species include sparrows, pheasants, turtledoves, and magpies [4,5]. These birds feed on food crop seeds, seedlings, petals, fruits, etc., leading to a decline in food production and quality, and in severe cases, even resulting in food crop failure. Moreover, they are extremely adaptable and highly vigilant, able to recognize and evade simple repellent devices. Currently, the monitoring of injurious birds in agricultural fields still relies mainly on manual inspections. Field personnel must regularly conduct visual inspections to identify bird species, count populations, and estimate activity ranges. This method is not only inefficient, but also labor-intensive and susceptible to weather, light, and other natural conditions. Especially in large-scale, complex agricultural fields, bird monitoring faces greater challenges. On one hand, the diverse vegetation types, such as crops, weeds, and shrubs, often resemble birds in color and texture, leading to visual confusion and increasing identification difficulty. On the other hand, frequently changing lighting conditions, such as variations in sunlight angles or cloud cover, can result in uneven brightness in images, reducing target contrast. Additionally, noise interference is common in farmland environments, resulting in low-resolution images of birds captured by cameras and other sensors. These complex environmental factors in farmland collectively limit the efficiency of manual inspections and impose higher identification requirements on sensor systems.
With the rapid development of artificial intelligence, and especially the wide application of deep learning in computer vision, bird recognition based on target detection offers a new approach to monitoring injurious birds in farmland. Current mainstream target detection methods can be roughly divided into two categories: two-stage and single-stage detection algorithms [6]. Two-stage detection algorithms adopt a "localize before identify" strategy: they first generate candidate regions that may contain targets and then classify and precisely locate these regions. These methods, represented by Faster R-CNN [7] and Mask R-CNN [8], achieve high detection accuracy, but their complex structure and slow detection speed limit their application to injurious bird detection, which requires high real-time performance. In contrast, single-stage detection algorithms are end-to-end: they simplify target detection into a single forward pass and predict the category and location of targets directly from the image. Representatives include SSD [9] and the YOLO family [10]. These algorithms have simpler structures and high computational efficiency and are suitable for tasks such as real-time detection of injurious birds; however, they still suffer from reduced accuracy and missed detections when dealing with small bird targets or complex farmland scenes.
In recent years, researchers have proposed a variety of strategies for the problems encountered in bird detection. To address motion blur and defocus blur in bird detection, Qiu et al. proposed a lightweight bird detection model based on YOLOv4-tiny [11]. The model demonstrates lower false detection and missed detection rates than common algorithms, such as Faster R-CNN [7], SSD [9], YOLOv3 [12], and YOLOv4 [13].
To cope with the difficulties of bird classification and fine-grained feature extraction, Yi et al. optimized the structure of the YOLOv5 model [14]. They introduced the Res2Net-CBAM module [15] into the backbone network to enhance feature extraction and integrated the CBAM attention mechanism [16] into the neck network to improve the model's ability to perceive features in key regions. The improved model achieves a detection accuracy of 86.6%, which is 1.2% higher than the original YOLOv5.
For the problem of bird monitoring in complex background environments, such as airport runways, Liang et al. proposed the SMB-YOLOv5 model [17]. The method introduces the SSPCAB attention mechanism [18], which effectively enhances the model’s ability to pay attention to the key regions of the image, thus improving the detection of smaller-sized birds. At the same time, the combination of the MBB module enhances the feature expression ability and attenuates the influence of background interference on the detection results. Experimental results show that the model achieves an accuracy of 77.1% on the test set, which is a 2.6% improvement over the original YOLOv5.
To address the small size, dense distribution, and susceptibility to environmental interference of bird targets, Jiang et al. [19] integrated the ODConv module [20] into the backbone network of YOLOv7, effectively improving the model's ability to recognize small bird targets. Meanwhile, the introduction of the Alpha-GIoU loss function optimizes bounding box regression, further enhancing the robustness and localization accuracy of the model in complex environments.
To address the issue of reduced detection accuracy in complex environments, such as those with speckle noise and geometric distortion, Bayraktar et al. [21] introduced an enhanced channel attention (ECA) module into the neck section of YOLOv11 to enhance the model’s ability to extract key features. This method effectively compensates for the shortcomings of the original attention mechanism in handling the complexity of SAR images. Experimental results show that compared to the original YOLOv11, the model with ECA achieves an average detection accuracy improvement of 1.7% and a mAP@0.5 improvement of 2.3% across various resolutions.
Although the above studies have improved the accuracy and adaptability of bird detection, the base detection algorithms they rely on are becoming outdated. Moreover, while previous studies have achieved good results under complex conditions, such as speckle noise and geometric distortion, the farmland environment contains numerous interfering factors, and birds often exhibit irregular shapes, large scale variations, and strong camouflage, which still pose significant challenges for detection. Therefore, designing a target detection algorithm with higher accuracy, better generalization ability, and an effective response to complex farmland scenarios remains of great research significance and application value.
In this study, we take YOLOv11n as the base model and propose the YOLO-AWK model for the practical problems faced by bird detection in farmland environments, and the main improvements include the following four aspects:
  • To address the complex backgrounds of farmland scenes, where vegetation and other interfering elements are easily confused with birds, we introduce an intra-scale feature interaction (AIFI) module to replace the SPPF module of the YOLOv11n model, improving the model's ability to recognize bird targets against complex backgrounds.
  • To address the problem that bird targets have irregular shapes and large-scale differences, and the CIoU used in the YOLOv11n model has limited ability to locate and identify a variety of bird targets, we use the WIoUv3 as a new loss function, so that the model can more accurately locate and identify bird targets of different shapes and sizes.
  • To address the heavy noise interference in farmland scenes and the limited feature extraction ability of the C3K2 module in YOLOv11n, we fuse the KAN module with C3K2 to construct the new C3K2_KAN module, which suppresses noise interference and improves the extraction of incomplete bird features.
  • To address the fact that bird targets in farmland are generally small and that the three detection heads of YOLOv11n make insufficient use of shallow features, which limits small-target detection performance, we add a small target detection head to the model to improve its detection accuracy for small bird targets.
The structure of the article is organized as follows: Section 2 describes the dataset used, the YOLOv11n model, and the proposed improvements; Section 3 presents the experimental environment and training parameters, as well as the experimental results and analyses; and Section 4 concludes the work and discusses future research directions.

2. Materials and Methods

2.1. Data Collection

To effectively evaluate the performance of the proposed YOLO-AWK model, we constructed an image dataset containing seven common farmland injurious birds: mynas, pheasants, sparrows, turtledoves, crows, magpies, and egrets. The images come from the public data platform Roboflow (https://universe.roboflow.com, accessed on 20 April 2025). All image files in the dataset are stored in JPG format, and all birds in the images have been labeled. We collected a total of 3851 images from the Roboflow platform. It is worth noting that nearly half of the images have relatively cluttered backgrounds, often containing crops, weeds, and shrubs, which are distracting elements similar in color and texture to the birds and pose a significant challenge to accurate target identification. Furthermore, the numbers of images differ significantly across bird species. The specific number of images for each species is shown in Table 1. Figure 1 shows some of the original image samples.
To scale up the data and simulate the complex scenes of farmland birds under different time, angle, and weather conditions, we applied various enhancement processes to the raw bird image data. First, to address the significant differences in the number of images across bird species, we applied image flipping to the categories with fewer samples, thereby balancing the sample sizes across species. Subsequently, to simulate the low-resolution images generated by sensors under noise interference, we applied Gaussian blur to all images. Additionally, to reflect the brightness and color differences caused by changing lighting throughout the day, we adjusted the brightness and saturation of all images. Finally, based on the principle of image symmetry, we performed image merging operations. After these enhancement operations, the dataset was expanded to 8932 images in total. All images were uniformly preprocessed to 640 × 640 pixels to meet the model input requirements. The dataset was then split into training, validation, and test sets in a 70%:15%:15% ratio. Figure 2 shows some representative samples of the enhanced images.
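For readers who wish to reproduce a comparable preprocessing pipeline, the following is a minimal sketch of the augmentation steps described above, written with OpenCV; the kernel size, gain factors, and file name are illustrative assumptions rather than the exact settings used in this study.

```python
import cv2
import numpy as np

def augment_bird_image(img: np.ndarray) -> dict:
    """Illustrative augmentations mirroring the steps described above."""
    out = {}
    # Horizontal flip: used to balance under-represented bird classes.
    out["flipped"] = cv2.flip(img, 1)
    # Gaussian blur: simulates low-resolution, noisy sensor captures.
    out["blurred"] = cv2.GaussianBlur(img, (5, 5), 1.5)
    # Brightness/saturation jitter: simulates changing daylight conditions.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= 1.2   # saturation gain (assumed value)
    hsv[..., 2] *= 0.8   # brightness reduction (assumed value)
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    out["jittered"] = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    # Resize everything to the 640 x 640 model input resolution.
    return {k: cv2.resize(v, (640, 640)) for k, v in out.items()}

if __name__ == "__main__":
    image = cv2.imread("sparrow_0001.jpg")  # hypothetical file name
    augmented = augment_bird_image(image)
```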

2.2. YOLOv11 Model

YOLOv11 is the next-generation target detection model released on 30 September 2024, by the Ultralytics team [22]. Ultralytics has introduced five model versions with differentiated performance for the computational requirements in different application scenarios, namely, n, s, m, l, and x. Among them, the YOLOv11n model, with the advantage of its lightweight architecture, achieves faster computational efficiency while maintaining higher detection accuracy, which is well-suited for real-time detection tasks of farmland injurious birds.
The YOLOv11n model as a whole consists of four main components: Inputs, Backbone, Neck, and Head, and its structure is schematically shown in Figure 3. The Inputs component preprocesses and normalizes the original image and adaptively scales the image to a resolution of 640 × 640. This processing helps the model converge faster and enhances its adaptability to images of different scales. In the Backbone component, the model employs multiple C3K2 modules, a Conv module, and SPPF layers for multi-scale feature extraction. Among them, the C3K2 module achieves efficient channel fusion by splitting feature channels and applying 3 × 3 convolutions, thereby enhancing feature fusion efficiency while maintaining computational efficiency and enhancing expressive power. The Conv module combines separable convolutions and grouped convolutions, effectively reducing the number of parameters and computational overhead while retaining low-level features, such as edges and textures. The SPPF layer enhances the ability to fuse contextual information through multi-scale pooling, improving the model’s perception of targets at different scales. The C2PSA module further introduces a pyramid squeeze attention mechanism to strengthen key area feature expression and suppress background interference. In the Neck component, the model uses upsampling (Upsample) and channel concatenation (Concat) to integrate semantic information and detail information from different levels, constructing multi-scale feature maps. This process enables the model to simultaneously focus on large-scale and small-scale targets, improving the utilization of fine-grained features, thereby better adapting to the detection requirements of different types and sizes of birds in farmland environments. In the Head component, the model classifies and locates the fused features through the multi-scale Detect module, outputting the category, bounding box coordinates, and confidence score of each target. The Detect module adopts an end-to-end structure, which has high detection accuracy and processing efficiency, and can achieve parallel prediction of multiple targets.
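For orientation, the stock YOLOv11n detector can be exercised through the Ultralytics Python API roughly as shown below; this is a usage sketch, not the pipeline used in this study, and the image path is a hypothetical placeholder.

```python
from ultralytics import YOLO

# Load the lightweight n-variant and run a single inference pass.
model = YOLO("yolo11n.pt")
results = model.predict("field_sparrows.jpg", imgsz=640, conf=0.25)  # placeholder image
for box in results[0].boxes:
    # Class id, confidence score, and bounding box corners for each detection.
    print(box.cls, box.conf, box.xyxy)
```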

2.3. Improvements to YOLOv11n

Due to the small size, diverse morphology, and varying scales of birds, as well as the complex background and interference factors in the farmland environment, the YOLOv11n model often suffers from low detection accuracy, false detections, and missed detections in practical applications. This study improves the SPPF module, the loss function, and the C3K2 module of the YOLOv11n model and adds a small target detection head to enhance the model's detection accuracy and its ability to recognize and localize birds in the farmland environment.

2.3.1. Improvements to the SPPF Module

In complex farmland environments, interfering elements, such as crops, weeds, and shrubs, often resemble birds in color and texture, leading to target confusion. The internal scale features of the SPPF module in YOLOv11n are processed independently and lack a direct interaction mechanism, which may make it difficult to accurately distinguish birds from interfering elements. To address these issues, we introduced the intra-scale feature interaction (AIFI) module [23], which decouples intra-scale feature interaction from cross-scale fusion, enabling the model to process multi-scale features more efficiently. At the same time, it applies attention only to high-level features (such as $S_5$), significantly improving computational efficiency while maintaining high detection accuracy. This module replaces the original SPPF module in the YOLOv11n model to strengthen the information interaction mechanism of the internal-scale feature layer, enhance the model's ability to identify and classify bird target features, and reduce the risk of double-counting and feature confusion, thereby effectively reducing missed and false detections. The structure of AIFI is shown in Figure 4.
The AIFI module focuses on the interaction of feature information at the high-level semantic feature layer ($S_5$), capturing long-range dependencies between different regions of the image through multi-head attention. The model first flattens the feature map of the $S_5$ layer into a one-dimensional sequence. The AIFI module then embeds spatial location information into the features through positional encoding to preserve the structural information of the image. Next, the features are analyzed in depth using a multi-head attention mechanism to enhance the model's ability to recognize bird boundaries and semantic features. Each attention head focuses on different regions of the image, extracting feature information from multiple perspectives to achieve more comprehensive feature perception. Afterwards, a multilayer perceptron and layer normalization are applied to the layer features. Eventually, the module reconverts the processed sequence features into a 2D structure, which is seamlessly integrated with the subsequent network layers. The specific computational flow is as follows:
1.
For the input high-level feature $S_5 \in \mathbb{R}^{C \times H \times W}$, the feature map is first flattened into sequence form; after the spatial information is preserved by positional encoding, a linear projection of the feature $X$ yields Equation (1):

$$Q = W_q X, \quad K = W_k X, \quad V = W_v X, \tag{1}$$

where $Q$ denotes the query, $K$ the key, and $V$ the value, and $W_q$, $W_k$, and $W_v$ denote the corresponding feature mapping matrices. These projections are fed into the multi-head attention layer to obtain the feature interaction information.
2.
A multi-head attention mechanism is used to establish global dependencies between features. The calculation process of the multi-head self-attention mechanism is shown in Equation (2):
$$\mathrm{Att}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V. \tag{2}$$

In this calculation, the dot product of the feature vectors $Q$ and $K^{T}$ is first computed and normalized by the scaling factor $\sqrt{d_k}$ to obtain the attention scores. The $\mathrm{softmax}$ activation function then transforms these scores into a distribution of attention weights, which is applied to the value vector $V$ as a weighted summation. Finally, the information from the multiple attention heads is integrated to generate the output feature tensor $\mathrm{Att}(Q, K, V)$.
3.
The serialized attention output is restored to a two-dimensional spatial structure by a tensor reshaping ($\mathrm{Reshape}$) operation:

$$F_5 = \mathrm{Reshape}\big(\mathrm{Att}(Q, K, V)\big), \quad \mathrm{Reshape}(X) = X_{i_1, i_2, \ldots, i_m}, \tag{3}$$

where $X$ denotes the reconstructed feature tensor and $i$ is the index of the corresponding element. The resulting tensor $F_5$ captures the global semantic information of the input features.
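To make the computational flow above concrete, the following PyTorch sketch implements an intra-scale interaction block in the spirit of AIFI: flattening $S_5$, adding positional encoding, applying multi-head self-attention and a feed-forward layer, and reshaping back to 2D. The head count, hidden dimension, and sinusoidal encoding are illustrative assumptions, not the exact configuration used in YOLO-AWK.

```python
import torch
import torch.nn as nn

class AIFIBlock(nn.Module):
    """Sketch of an intra-scale feature interaction block (AIFI-style)."""

    def __init__(self, channels: int, num_heads: int = 8, ffn_dim: int = 1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(channels, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, channels)
        )
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    @staticmethod
    def pos_encoding(h: int, w: int, c: int) -> torch.Tensor:
        # Sinusoidal positional encoding over the flattened H*W grid (assumed form).
        pos = torch.arange(h * w, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, c, 2, dtype=torch.float32)
                        * (-torch.log(torch.tensor(10000.0)) / c))
        pe = torch.zeros(h * w, c)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, s5: torch.Tensor) -> torch.Tensor:
        b, c, h, w = s5.shape
        x = s5.flatten(2).permute(0, 2, 1)               # flatten to (B, H*W, C)
        x = x + self.pos_encoding(h, w, c).to(x.device)  # add spatial position info
        attn_out, _ = self.attn(x, x, x)                 # Eq. (2): softmax(QK^T/sqrt(d_k))V
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ffn(x))
        return x.permute(0, 2, 1).reshape(b, c, h, w)    # Eq. (3): reshape back to 2D

# Example: a 20 x 20 S5 map with 256 channels
f5 = AIFIBlock(256)(torch.randn(1, 256, 20, 20))
```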

2.3.2. Loss Function Optimization

In the target detection task, IoU is an important metric for measuring the degree of overlap between the prediction frame and the real frame and is widely used to evaluate model performance. The CIoU [24] loss function used in the YOLOv11 model considers not only the overlap area between the prediction frame and the real frame but also the Euclidean distance between their center points and their aspect ratios, thereby ensuring higher accuracy while alleviating the sensitivity problem of bounding box regression. However, birds in farmland environments usually exhibit small size, irregular shape, varying scale, and blurred appearance, and CIoU overemphasizes geometric factors, such as distance and aspect ratio, which may weaken the model's ability to locate and discriminate complex bird targets and reduce overall detection performance. To address this issue, we introduce WIoUv3 [25], which incorporates dynamic weight allocation and an outlier adjustment mechanism, as the loss function to enhance the algorithm's ability to localize and discriminate bird targets. The initial version of Wise-IoU, WIoUv1, constructs a distance attention term based on the distance metric, which is formulated as follows:
$$R_{WIoU} = \exp\!\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}}\right), \tag{4}$$

$$L_{IoU} = 1 - IoU, \tag{5}$$

$$L_{WIoUv1} = R_{WIoU} \cdot L_{IoU}, \quad R_{WIoU} \in [1, e), \quad L_{IoU} \in [0, 1], \tag{6}$$

where $W_g$ and $H_g$ are the width and height of the smallest enclosing frame, respectively, $x_{gt}$ and $y_{gt}$ are the coordinate values of the real frame, respectively, and $^{*}$ denotes detachment from the computational graph to prevent hindering the convergence of the model gradient. In Equation (6), $R_{WIoU}$ amplifies $L_{IoU}$ for normal-quality anchor frames, while $L_{IoU}$ reduces $R_{WIoU}$ for high-quality anchor frames, lowering the weight of the center-point distance in the loss function when the anchor frame overlaps the target frame to a high degree.
To accelerate model convergence and improve the ability to focus on regular samples, WIoU introduces the concept of outlier degree (see Equation (7)) in subsequent versions. On this basis, WIoUv3 further constructs non-monotonic focusing coefficients and introduces them into the weight allocation strategy of the earlier version, so that the model can adaptively adjust the gradient weights to cope with different types of samples more efficiently. The calculation process of WIoUv3 is shown in Equation (8):
$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \in [0, +\infty), \tag{7}$$

$$L_{WIoUv3} = r \, L_{WIoUv1}, \quad r = \frac{\beta}{\delta \, \alpha^{\beta - \delta}}. \tag{8}$$

In the above equations, $\beta$ denotes the outlier degree, where $\overline{L_{IoU}}$ is the running mean of $L_{IoU}$, while $\alpha$ and $\delta$ are two focusing coefficients whose values are adjusted to the data to achieve the best optimization.
The improved YOLOv11n model in this paper uses the WIoUv3 algorithm to replace the original CIoU. In WIoUv3, the outlier degree dynamically measures the difference between the predicted frame and the real frame, and its value is adjusted as the IoU changes. By adjusting the non-monotonic focusing coefficient, the model is made to focus more on ordinary anchor frames and to reduce the penalty for special samples, such as those with large scale variations or a small area share, thereby increasing the accuracy of identifying such samples.
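The following sketch illustrates how Equations (4) to (8) combine into a WIoUv3-style weighted loss; the values of $\alpha$, $\delta$, and the running mean of $L_{IoU}$ are illustrative assumptions, not the tuned settings of this study.

```python
import torch

def wiou_v3_loss(iou, pred_xy, gt_xy, wg, hg, iou_mean, alpha=1.9, delta=3.0):
    """Sketch of the WIoUv3 weighting described in Eqs. (4)-(8)."""
    l_iou = 1.0 - iou                                              # Eq. (5)
    dist2 = (pred_xy[:, 0] - gt_xy[:, 0]) ** 2 + (pred_xy[:, 1] - gt_xy[:, 1]) ** 2
    # Eq. (4): distance attention; the denominator is detached (the "*")
    # so that it does not hinder gradient convergence.
    r_wiou = torch.exp(dist2 / (wg ** 2 + hg ** 2).detach())
    l_wiou_v1 = r_wiou * l_iou                                     # Eq. (6)
    beta = l_iou.detach() / iou_mean                               # Eq. (7): outlier degree
    r = beta / (delta * alpha ** (beta - delta))                   # Eq. (8): focusing coefficient
    return r * l_wiou_v1

# Example with dummy predictions (all values illustrative).
loss = wiou_v3_loss(
    iou=torch.tensor([0.7, 0.4]),
    pred_xy=torch.tensor([[10.0, 12.0], [40.0, 42.0]]),
    gt_xy=torch.tensor([[11.0, 12.0], [45.0, 40.0]]),
    wg=torch.tensor([30.0, 60.0]),
    hg=torch.tensor([20.0, 50.0]),
    iou_mean=torch.tensor(0.35),
)
```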

2.3.3. Improvements to the C3K2 Module

Noise interference in farmland environments often reduces the resolution of images captured by sensors and leaves bird target features incomplete, while the original C3K2 module in YOLOv11n has limited ability to extract such incomplete features. To address this issue, we introduce the Kolmogorov–Arnold network (KAN) module [26] and fuse it with the C3K2 module to construct the new C3K2_KAN module, which suppresses noise interference and improves the extraction of incomplete bird features.
KAN’s most important feature is that it abandons the reliance on fixed activation functions in traditional networks. KAN places learnable activation functions at the edge of the network and completely removes the linear weight matrix. Each of its weight parameters is essentially a trainable univariate function, which enhances the model’s ability to express nonlinear relationships. KAN is based on the Kolmogorov–Arnold representation theorem, whose representation is shown in Equation (9):
$$f(x) = f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \varphi_q\!\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right), \tag{9}$$

where $x = [x_1, \ldots, x_n]$ is the input vector, while $\varphi_q$ and $\varphi_{q,p}$ are appropriately chosen one-dimensional continuous functions. The core of the Kolmogorov–Arnold theorem is to transform a multivariate function into a combination and superposition of multiple one-dimensional continuous functions, thus effectively reducing the complexity of the function.
The activation function of KAN is shown in Equation (10), which consists of a spline function $\mathrm{spline}(x)$ and a basis function $b(x)$. The spline function is represented by a B-spline, as shown in Equation (11). The basis function is $\mathrm{silu}(x)$, a smooth nonlinear activation function, as shown in Equation (12):

$$\phi(x) = w_1 \cdot \mathrm{spline}(x) + w_2 \cdot b(x), \tag{10}$$

$$\mathrm{spline}(x) = \sum_{i=1}^{n} C_i B_i(x), \tag{11}$$

$$b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}. \tag{12}$$

In Equation (10), $w_1$ and $w_2$ are the path weights, $\mathrm{spline}(x)$ is the nonlinear fitting function, and $b(x)$ is the basis function. In Equation (11), $\mathrm{spline}(x)$ is parameterized as a linear combination of B-splines, $B_i(x)$ is the B-spline basis function, $C_i$ are the coefficients, and $n$ denotes the number of input values.
KAN uses B-splines to represent the spline function, which have local support, i.e., each data point only affects the shape of the function in its neighborhood, avoiding global perturbations. This property enables the model to locally adjust image features and effectively suppress local noise interference without destroying the overall structure. At the same time, the B-spline function has good smoothing properties, ensuring a smooth transition of pixel values between neighboring regions and thus further reducing noise fluctuations in the image. With these properties, the C3K2_KAN module can not only effectively remove noise but also extract key bird features even when they are incomplete. The specific B-spline functions are shown in Equations (13) and (14):
$$B_{i,0}(x) = \begin{cases} 1, & \text{if } x_i \le x < x_{i+1} \\ 0, & \text{otherwise} \end{cases}, \tag{13}$$

$$B_{i,p}(x) = \frac{x - x_i}{x_{i+p} - x_i} B_{i,p-1}(x) + \frac{x_{i+p+1} - x}{x_{i+p+1} - x_{i+1}} B_{i+1,p-1}(x). \tag{14}$$

In the above equations, $p$ denotes the order of the B-spline, and $B_{i,p}(x)$ is the B-spline basis function of order $p$.
In this study, we constructed the improved KAN_Bottleneck module by introducing the learnable activation function in KAN, which consists of a basis function and a spline function, into the C3K2 module and integrating it into the Conv unit in the Bottleneck structure. The resulting new module, C3K2_KAN, has the structure shown in Figure 5.
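A minimal sketch of a KAN-style learnable edge activation, built directly from Equations (10) to (14), is shown below; the knot grid, spline order, and initialization are illustrative assumptions rather than the configuration used in C3K2_KAN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bspline_basis(x, knots, p):
    """Cox-de Boor recursion from Eqs. (13)-(14).
    x: (...,) inputs; knots: 1D tensor of knot positions; p: spline order."""
    x = x.unsqueeze(-1)
    # Order-0 bases, Eq. (13): indicator of the knot interval containing x.
    b = ((x >= knots[:-1]) & (x < knots[1:])).float()
    for k in range(1, p + 1):
        left = (x - knots[: -(k + 1)]) / (knots[k:-1] - knots[: -(k + 1)])
        right = (knots[k + 1:] - x) / (knots[k + 1:] - knots[1:-k])
        b = left * b[..., :-1] + right * b[..., 1:]   # Eq. (14)
    return b

class KANActivation(nn.Module):
    """KAN-style learnable activation, Eq. (10): phi(x) = w1*spline(x) + w2*silu(x)."""

    def __init__(self, num_knots=12, order=3, grid=(-3.0, 3.0)):
        super().__init__()
        self.order = order
        self.register_buffer("knots", torch.linspace(grid[0], grid[1], num_knots))
        self.coeffs = nn.Parameter(torch.randn(num_knots - order - 1) * 0.1)  # C_i in Eq. (11)
        self.w1 = nn.Parameter(torch.ones(()))
        self.w2 = nn.Parameter(torch.ones(()))

    def forward(self, x):
        basis = bspline_basis(x, self.knots, self.order)
        spline = (basis * self.coeffs).sum(-1)         # Eq. (11)
        return self.w1 * spline + self.w2 * F.silu(x)  # Eqs. (10) and (12)

phi = KANActivation()
y = phi(torch.linspace(-2.5, 2.5, 8))
```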

2.3.4. Add Small Target Detection Head

In the field of object detection, small targets typically refer to objects that occupy an extremely small proportion of the pixel area in an image. Such targets are difficult to represent in deep feature maps because of their minimal pixel coverage. The original YOLOv11 network structure has only three detection sizes: 80 × 80, 40 × 40, and 20 × 20. These detection heads extract features from shallow to deep layers, with shallow features having high resolution and rich geometric information, while deep features have a large receptive field and rich semantic information. Therefore, the 80 × 80 shallow-layer detection head excels at extracting features of medium and small targets, while the 20 × 20 deep-layer detection head excels at extracting features of large targets. In agricultural monitoring scenarios, images captured by cameras often contain bird targets that are extremely small. Although the 80 × 80 detection head in the original YOLOv11n model can effectively detect medium and small targets, its ability to detect very small bird targets remains inadequate, mainly because shallow feature information is used inefficiently, causing the loss of small target details during detection. Therefore, we added a 160 × 160 small target detection head to the original three detection heads, which significantly improves the detection of small targets, such as birds, by enhancing the extraction of local detail features. The improved structure of the YOLOv11n model is shown in Figure 6.
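The following snippet simply tabulates the detection-grid resolutions implied by these strides for a 640 × 640 input, showing why the added 160 × 160 head preserves the fine spatial detail needed for very small birds.

```python
# Detection-grid resolutions for a 640 x 640 input. The added head at
# stride 4 yields a 160 x 160 grid, so each cell covers only a 4 x 4 pixel
# patch, which is what retains detail for very small bird targets.
INPUT_SIZE = 640
for name, stride in [("160 x 160 (added)", 4), ("80 x 80", 8),
                     ("40 x 40", 16), ("20 x 20", 32)]:
    grid = INPUT_SIZE // stride
    print(f"{name}: stride {stride:>2} -> {grid} x {grid} grid, "
          f"{stride} x {stride} px per cell")
```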

3. Results

3.1. Experimental Environment and Training Parameters

In the hardware part of the experiment, the central processor is Intel Pentium Gold G5400 (Intel Corporation, Santa Clara, CA, USA), and the graphics processor is NVIDIA GeForce RTX 4060 (NVIDIA Corporation, Santa Clara, CA, USA). The software environment runs on the Windows 11 64-bit operating system, the programming language is Python 3.12, the deep learning framework is PyTorch 2.1.0, and CUDA 12.1 is used to achieve efficient parallel computing.
The specific parameters of the model training settings are as follows:
The number of training epochs is set to 150, the batch size to 16, the number of data-loading workers to 2, the initial learning rate to 0.01, and the weight decay coefficient to 0.0005. The input data are uniformly preprocessed to a resolution of 640 × 640 pixels. All experiments were conducted under the same hyperparameters without loading any pretrained weights.
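A training run with these hyperparameters could be launched through the Ultralytics API roughly as follows; this is a sketch only, and the dataset YAML name is a hypothetical placeholder for the 70%/15%/15% split described in Section 2.1.

```python
from ultralytics import YOLO

# Build the model from its config (no pretrained weights, as in this study).
model = YOLO("yolo11n.yaml")
model.train(
    data="birds.yaml",       # hypothetical dataset config for the bird splits
    epochs=150,
    batch=16,
    workers=2,
    imgsz=640,
    lr0=0.01,
    weight_decay=0.0005,
    pretrained=False,
)
```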

3.2. Evaluation Metrics

When evaluating the performance of YOLO models, commonly used metrics include precision (P), recall (R), mAP@0.5, and mAP@0.5:0.95, which together reflect detection effectiveness [27]. To more accurately assess the applicability of the model in real-world deployment, we also introduce the number of parameters, GFLOPs, and frames per second (FPS) as additional metrics. The parameter count measures the complexity of the model's structure: the fewer parameters, the leaner the model. GFLOPs measure the computational cost of the model, where a lower value usually indicates more efficient inference. FPS measures the real-time performance of the detection model, where a higher value means the model can process input images quickly enough for real-time response.
Precision measures the proportion of samples judged by the model to be birds that are actually birds; the higher the metric, the fewer false alarms and the higher the detection accuracy. Recall measures the proportion of actually present birds that the model successfully identifies; the higher the metric, the fewer missed detections and the more comprehensive the model's recognition ability. Mean average precision (mAP) combines precision and recall and measures the overall detection performance of the model across all bird categories. mAP@0.5 denotes the average precision calculated when the intersection over union (IoU) between predicted and real frames is not less than 0.5. The formulas for the above metrics are shown below:
$$P = \frac{TP}{TP + FP},$$

$$R = \frac{TP}{TP + FN},$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{1} P(R)\, dR.$$
In the above equations, true positives ($TP$) denote the number of bird targets successfully identified by the model, false positives ($FP$) refer to the number of non-bird targets incorrectly identified by the model as birds, false negatives ($FN$) refer to the number of birds actually present that the model fails to identify, and $N$ denotes the number of bird categories.
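For illustration, the snippet below computes precision and recall from toy $TP$/$FP$/$FN$ counts and evaluates the per-class average precision integral using standard all-point interpolation; all numbers are illustrative and are not results from this study.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the precision-recall curve for one class,
    i.e. the integral in the mAP formula above (all-point interpolation)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # enforce a non-increasing envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy example: P = TP/(TP+FP), R = TP/(TP+FN).
tp, fp, fn = 182, 12, 18
print("precision:", tp / (tp + fp))
print("recall   :", tp / (tp + fn))
print("AP (toy) :", average_precision(np.array([0.2, 0.6, 0.91]),
                                      np.array([0.99, 0.95, 0.94])))
```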

3.3. Loss Function Comparison Experiment

To verify the optimization effect of the bounding box regression loss function replacement strategy in this paper, we designed a series of bounding box regression loss function comparison experiments. We made side-by-side comparisons of the improved WIoUv3 [25] with the CIoU [24] used in the initial model, as well as the mainstream loss functions, such as DIoU [28], EIoU [29], GIoU [30], and PIoU [31]. The detailed comparison metrics of the loss functions are shown in Table 2.
From Table 2, it can be seen that the model with the WIoUv3 loss function (YOLOv11n + WIoUv3) obtained the best values on the key evaluation metrics, such as precision, mAP@0.5, and mAP@0.5:0.95, compared with the commonly used regression loss functions CIoU, DIoU, EIoU, GIoU, and PIoU. Compared with the original model (YOLOv11n + CIoU), the improved model (YOLOv11n + WIoUv3) improved precision substantially, by 1.8 percentage points, and mAP@0.5 and mAP@0.5:0.95 by 0.4 and 0.1 percentage points, respectively, although recall showed a slight decrease of 0.6 percentage points. These experimental results indicate that introducing WIoUv3 as the bounding box regression loss function can significantly improve the overall performance of the YOLOv11n model in the task of injurious bird detection in farmland.
Figure 7 visualizes the trend of model accuracy with training rounds for different loss functions during training. From the figure, it can be seen that each loss function converged rapidly within the first 60 rounds and stabilized thereafter. In the late stage of training, the fluctuation of the accuracy curve of each model was small, which indicates that the performance of these loss functions in the training process was relatively stable, with good convergence and robustness.

3.4. Model Comparison Experiment

To better validate the effectiveness and feasibility of the improved YOLOv11n model, YOLO-AWK, comparative experiments were conducted against several mainstream target detection models, including Faster R-CNN [7], YOLOv5n [32], YOLOv8n [33], YOLOv9t [34], YOLOv10n [35], and YOLOv11n [36]; the experimental results are shown in Table 3.
The YOLO-AWK model we proposed achieved optimal performance in key metrics, such as P, R, mAP@0.5, and mAP@0.5:0.95, with values of 93.9%, 91.2%, 95.8%, and 75.3%, respectively. The high precision indicates that the model had a low false positive rate for birds, enabling it to accurately distinguish between bird species and non-bird targets. The high recall rate suggests that the model had few missed detections of birds, demonstrating strong bird recognition capabilities. The improvements in mAP@0.5 and mAP@0.5:0.95 further validate the model’s overall performance in terms of localization accuracy and classification accuracy for bird targets. However, the improved YOLO-AWK model had a GFLOPs of 10.5, an FPS of 169.5, and 4.02 million parameters, indicating that there is still room for further optimization in terms of inference speed, real-time detection performance, and structural complexity compared to some YOLO series models. Overall, YOLO-AWK demonstrated strong practical value in high-precision agricultural bird detection tasks.
Figure 8 shows the results of the normalized confusion matrix comparison between the improved model (YOLO-AWK) and the initial model (YOLOv11n). As can be seen from the figure, YOLO-AWK improved the detection accuracy in almost all categories, especially the classification accuracy of birds was significantly improved. Specifically, the occurrence of birds being misclassified was significantly reduced, while the occurrence of farmland backgrounds being misidentified as bird targets was effectively controlled. The incidence of missed detection and misdetection was significantly reduced, indicating that YOLO-AWK perceived and classified bird targets more accurately than the initial model.

3.5. Ablation Experiment

To verify the effect of each module on bird detection, each module was added to the original YOLOv11n model, four sets of ablation experiments were conducted, and the resulting metrics are shown in Table 4.
As shown in Table 4, the original YOLOv11n model already achieved high overall accuracy when detecting birds in farmland. We introduced the AIFI module, the WIoUv3 loss function, and the KAN module, and added a small-object detection head, which together achieved significant improvements in overall performance. Experiment 1 introduced the AIFI module into the original YOLOv11n model to replace the original SPPF module; although precision decreased, recall, mAP@0.5, and mAP@0.5:0.95 improved further. Experiment 2 replaced the loss function with WIoUv3 on the basis of Experiment 1; although recall decreased, precision, mAP@0.5, and mAP@0.5:0.95 improved further, indicating that the WIoUv3 loss function improved the detection performance of the model. Experiment 3 introduced the KAN module into C3K2 on the basis of Experiment 2, using learnable activation functions to strengthen key features and suppress noise interference. Finally, Experiment 4 added the new small target detection head on the basis of Experiment 3, and all detection metrics improved. The precision, recall, mAP@0.5, and mAP@0.5:0.95 of the improved YOLOv11n model increased to 93.9%, 91.2%, 95.8%, and 75.3%, respectively, which are 2.7, 2.3, 1.6, and 3.0 percentage points higher than those of the original YOLOv11n model.

3.6. Visual Contrast Experiment

To evaluate the detection effect of the YOLO-AWK model more intuitively, we selected YOLOv8n, YOLOv10n, and YOLOv11n, which have better performance in the YOLO series, as the comparison models, and conducted visualization experiments on the validation set. Figure 9 shows some representative bird image detection results, and the numbers in the upper left corner of the box in the figure indicate the confidence level of the model prediction, while we labeled the missed bird individuals with red boxes and labeled the false positives with yellow boxes. Under the complex conditions of background interference, low brightness, and image blurring, all four models were able to recognize bird targets in the images, but the confidence level of the YOLO-AWK model was higher. In the bird images merged using the symmetry principle, none of the four models showed misdetection, but YOLO-AWK still had a slight advantage in the confidence level. In scenes with dense bird populations, YOLOv8n, YOLOv10n, and YOLOv11n all exhibited varying degrees of false negatives and false positives (misclassifying non-bird objects as birds and category recognition errors). In contrast, YOLO-AWK accurately identified all bird targets in the images without any false negatives or false positives. In summary, the YOLO-AWK model showed strong bird detection ability in a variety of complex farmland environments and was able to maintain high accuracy and stability in scenes with low brightness, background interference, and dense targets, demonstrating strong potential for real-world applications.

4. Conclusions

This paper proposed an improved farmland injurious bird detection algorithm, YOLO-AWK, based on YOLOv11n. To address the issues encountered in farmland bird damage detection, we designed a series of improvements: introducing an AIFI module into the Backbone network to mitigate the confusion between birds and complex farmland backgrounds, integrating a KAN (Kolmogorov–Arnold network) into the C3K2 module to suppress noise and highlight incomplete features, replacing the original CIoU with the WIoUv3 loss function to enhance bounding box regression, and adding a dedicated small-object detection head to improve the recognition of small birds. Experimental results showed that YOLO-AWK achieved significantly improved detection performance on the farmland injurious bird dataset, with final precision, recall, mAP@0.5, and mAP@0.5:0.95 reaching 93.9%, 91.2%, 95.8%, and 75.3%, respectively, representing improvements of 2.7, 2.3, 1.6, and 3.0 percentage points over the original model. These experiments verify that the YOLO-AWK model can stably and accurately identify injurious bird targets in complex farmland environments, demonstrating excellent detection performance and underscoring its application potential and practical value in real-world injurious bird monitoring.
Although the YOLO-AWK model has achieved significant improvements in detection accuracy and recall, it still has some shortcomings. The current optimizations focus primarily on detection performance, with limited consideration given to computational efficiency, real-time detection, and model lightweighting. This may restrict its effectiveness in scenarios with limited device resources or real-time processing requirements, such as on-site agricultural bird monitoring. Therefore, we will focus on further optimizing the model structure while maintaining detection accuracy, reducing computational costs, and exploring methods to enhance model robustness to accommodate the increasingly diverse and dynamic demands of agricultural bird damage monitoring. We will also test the model's performance in real, uncontrolled, and unscreened noisy environments to improve its performance and robustness in actual farmland settings. As part of our ongoing research, we also plan to collect and manually annotate a wider range of bird species data, particularly from orchards and other agricultural settings, to expand the model's adaptability and detection effectiveness in various real-world scenarios.

Author Contributions

Conceptualization, X.Y.; methodology, Y.C.; software, X.X.; validation, Y.C.; formal analysis, M.D.; resources, X.Y.; data curation, X.X.; writing—original draft, Y.C.; writing—review and editing, X.Y.; visualization, X.X.; supervision, M.D. and X.X.; project administration, X.Y., Y.C. and M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (61563012) and the General Project of Guangxi Natural Science Foundation (2021GXNSFAA220074).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Savary, S.; Ficke, A.; Aubertot, J.N.; Hollier, C. Crop Losses Due to Diseases and Their Implications for Global Food Production Losses and Food Security. Food Secur. 2012, 4, 519–537. [Google Scholar] [CrossRef]
  2. De Mey, Y.; Demont, M.; Diagne, M. Estimating Bird Damage to Rice in Africa: Evidence from the Senegal River Valley. J. Agric. Econ. 2012, 63, 175–200. [Google Scholar] [CrossRef]
  3. Hiron, M.; Rubene, D.; Mweresa, C.K.; Ajamma, Y.U.O.; Owino, E.A.; Low, M. Crop Damage by Granivorous Birds Despite Protection Efforts by Human Bird Scarers in a Sorghum Field in Western Kenya. Ostrich 2014, 85, 153–159. [Google Scholar] [CrossRef]
  4. Jiang, X.; Sun, Y.; Chen, F.; Ge, F.; Ouyang, F. Control of Maize Aphids by Natural Enemies and Birds under Different Farmland Landscape Patterns in North China. Chin. J. Biol. Control 2021, 37, 863. [Google Scholar] [CrossRef]
  5. Wood, C.; Qiao, Y.; Li, P.; Ding, P.; Lu, B.; Xi, Y. Implications of Rice Agriculture for Wild Birds in China. Waterbirds 2010, 33, S30–S43. [Google Scholar] [CrossRef]
  6. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  8. Bi, X.; Hu, J.; Xiao, B.; Li, W.; Gao, X. IEMask R-CNN: Information-Enhanced Mask R-CNN. IEEE Trans. Big Data 2023, 9, 688–700. [Google Scholar] [CrossRef]
  9. Zhai, S.; Shang, D.; Wang, S.; Dong, S. DF-SSD: An Improved SSD Object Detection Algorithm Based on DenseNet and Feature Fusion. IEEE Access 2020, 8, 24344–24357. [Google Scholar] [CrossRef]
  10. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
  11. Qiu, Z.; Zhu, X.; Liao, C.; Shi, D.; Kuang, Y.; Li, Y.; Zhang, Y. Detection of Bird Species Related to Transmission Line Faults Based on Lightweight Convolutional Neural Network. IET Gener. Transm. Distrib. 2022, 16, 869–881. [Google Scholar] [CrossRef]
  12. Zhao, L.; Li, S. Object detection algorithm based on improved YOLOv3. Electronics 2020, 9, 537. [Google Scholar] [CrossRef]
  13. Gai, R.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2023, 35, 13895–13906. [Google Scholar] [CrossRef]
  14. Yi, X.; Qian, C.; Wu, P.; Maponde, B.T.; Jiang, T.; Ge, W. Research on Fine-Grained Image Recognition of Birds Based on Improved YOLOv5. Sensors 2023, 23, 8204. [Google Scholar] [CrossRef] [PubMed]
  15. Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P.H.S. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662. [Google Scholar] [CrossRef] [PubMed]
  16. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  17. Liang, H.; Zhang, X.; Kong, J.; Zhao, Z.; Ma, K. SMB-YOLOv5: A Lightweight Airport Flying Bird Detection Algorithm Based on Deep Neural Networks. IEEE Access 2024, 12, 84878–84892. [Google Scholar] [CrossRef]
  18. Ristea, N.C.; Madan, N.; Ionescu, R.T.; Nasrollahi, K.; Khan, F.S.; Moeslund, T.B.; Shah, M. Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13576–13586. [Google Scholar]
  19. Jiang, T.; Zhao, J.; Wang, M. Bird Detection on Power Transmission Lines Based on Improved YOLOv7. Appl. Sci. 2023, 13, 11940. [Google Scholar] [CrossRef]
  20. Qian, J.; Lin, J.; Bai, D.; Xu, R.; Lin, H. Omni-Dimensional Dynamic Convolution Meets Bottleneck Transformer: A Novel Improved High Accuracy Forest Fire Smoke Detection Model. Forests 2023, 14, 838. [Google Scholar] [CrossRef]
  21. Bayraktar, I.; Bakirci, M. Attention-augmented YOLO11 for high-precision aircraft detection in synthetic aperture radar imagery. In Proceedings of the 2025 27th International Conference on Digital Signal Processing and Its Applications (DSPA), Moscow, Russia, 26–28 March 2025; pp. 1–6. [Google Scholar] [CrossRef]
  22. Wang, D.; Tan, J.; Wang, H.; Kong, L.; Zhang, C.; Pan, D.; Li, T.; Liu, J. SDS-YOLO: An Improved Vibratory Position Detection Algorithm Based on YOLOv11. Measurement 2025, 244, 116518. [Google Scholar] [CrossRef]
  23. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2024; pp. 16965–16974. [Google Scholar]
  24. Huang, P.; Tian, S.; Su, Y.; Tan, W.; Dong, Y.; Xu, W. IA-CIOU: An Improved IOU Bounding Box Loss Function for SAR Ship Target Detection Methods. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10569–10582. [Google Scholar] [CrossRef]
  25. Deng, L.; Wu, S.; Zhou, J.; Zou, S.; Liu, Q. LSKA-YOLOv8n-WIoU: An Enhanced YOLOv8n Method for Early Fire Detection in Airplane Hangars. Fire 2025, 8, 67. [Google Scholar] [CrossRef]
  26. Somvanshi, S.; Javed, S.A.; Islam, M.M.; Pandit, D.; Das, S. A Survey on Kolmogorov-Arnold Network. ACM Comput. Surv. 2024. [Google Scholar] [CrossRef]
  27. Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with You Only Look Once (YOLO) Algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 2024, 223, 109090. [Google Scholar] [CrossRef]
  28. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar] [CrossRef]
  29. Zhang, Y.-F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  30. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  31. Chen, Z.; Chen, K.; Lin, W.; See, J.; Yu, H.; Ke, Y.; Yang, C. Piou loss: Towards accurate oriented object detection in complex environments. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V.. Springer: Cham, Switzerland, 2020; pp. 195–211. [Google Scholar]
  32. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
  33. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef] [PubMed]
  34. Fatehi, F.; Bagherpour, H.; Parian, J.A. Enhancing the Performance of YOLOv9t Through a Knowledge Distillation Approach for Real-Time Detection of Bloomed Damask Roses in the Field. Smart Agric. Technol. 2025, 10, 100794. [Google Scholar] [CrossRef]
  35. Li, Y.; Guo, Z.; Sun, Y.; Chen, X.; Cao, Y. Weed Detection Algorithms in Rice Fields Based on Improved YOLOv10n. Agriculture 2024, 14, 2066. [Google Scholar] [CrossRef]
  36. Zhou, K.; Jiang, S. Forest Fire Detection Algorithm Based on Improved YOLOv11n. Sensors 2025, 25, 2989. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Pictures of some of the injurious birds.
Figure 2. Sample images after data enhancement.
Figure 3. YOLOv11n model structure diagram.
Figure 4. The structure of AIFI.
Figure 5. C3K2_KAN module structure.
Figure 6. Structural diagram of the improved YOLOv11n model.
Figure 7. Precision comparison for various loss functions.
Figure 8. Comparison of normalized confusion matrices. (a) Normalized confusion matrix of YOLOv11n. (b) Normalized confusion matrix of YOLO-AWK.
Figure 9. Visual comparison of detection results for different models.
Table 1. Number of images of various bird species.

Species | Number
Myna | 594
Pheasant | 535
Sparrow | 593
Turtledove | 515
Crow | 597
Magpie | 564
Egret | 453
Table 2. Loss function comparison.

Loss Function | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5:0.95/%
CIoU | 91.2 | 88.9 | 94.2 | 72.3
DIoU | 91.9 | 86.9 | 94.1 | 72.2
EIoU | 92.3 | 87.2 | 94.5 | 72.0
GIoU | 92.6 | 87.5 | 94.3 | 71.5
PIoU | 92.3 | 87.5 | 94.1 | 72.4
WIoUv3 | 93.0 | 88.3 | 94.6 | 72.4
Table 3. Comparison of the performance of different models.

Models | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5:0.95/% | Parameters/M | GFLOPs | FPS
Faster R-CNN | 81.8 | 83.5 | 83.2 | 59.9 | 28.3 | 164.3 | 40.5
YOLOv5n | 93.0 | 83.9 | 93.1 | 70.2 | 1.8 | 4.1 | 238.1
YOLOv8n | 91.1 | 88.3 | 94.1 | 72.4 | 3.0 | 8.1 | 222.2
YOLOv9t | 90.1 | 87.9 | 93.7 | 72.1 | 2.8 | 11.7 | 149.3
YOLOv10n | 91.6 | 89.3 | 94.5 | 73.6 | 2.69 | 8.2 | 163.9
YOLOv11n | 91.2 | 88.9 | 94.2 | 72.3 | 2.58 | 6.3 | 217.4
YOLO-AWK | 93.9 | 91.2 | 95.8 | 75.3 | 4.02 | 10.5 | 169.5
Table 4. Comparative results of ablation tests.

YOLOv11n | AIFI | WIoUv3 | KAN | Add Small Target Detection Head | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Parameters/M | GFLOPs
✓ | - | - | - | - | 91.2 | 88.9 | 94.2 | 72.3 | 2.58 | 6.3
✓ | ✓ | - | - | - | 90.9 | 89.5 | 94.4 | 72.7 | 3.21 | 6.6
✓ | - | ✓ | - | - | 93.0 | 88.3 | 94.6 | 72.4 | 2.58 | 6.3
✓ | - | - | ✓ | - | 92.8 | 88.6 | 94.6 | 72.9 | 3.32 | 6.3
✓ | - | - | - | ✓ | 91.8 | 90.2 | 94.8 | 72.8 | 2.66 | 10.2
✓ | ✓ | ✓ | - | - | 93.2 | 87.8 | 94.8 | 73.5 | 3.21 | 6.6
✓ | ✓ | ✓ | ✓ | - | 93.8 | 89.4 | 95.3 | 74.3 | 3.95 | 6.6
✓ | ✓ | ✓ | ✓ | ✓ | 93.9 | 91.2 | 95.8 | 75.3 | 4.02 | 10.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
