Article

Advanced Insect Detection Network for UAV-Based Biodiversity Monitoring

by Halimjon Khujamatov 1,†, Shakhnoza Muksimova 1,†, Mirjamol Abdullaev 2, Jinsoo Cho 1 and Heung-Seok Jeon 3,*
1 Department of Computer Engineering, Gachon University, Seongnam-daero, Sujeong-gu, Seongnam-si 1342, Republic of Korea
2 Department of Information Systems and Technologies, Tashkent State University of Economics, Tashkent 100066, Uzbekistan
3 Department of Computer Engineering, Konkuk University, Chungju-si 27478, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2025, 17(6), 962; https://doi.org/10.3390/rs17060962
Submission received: 6 January 2025 / Revised: 24 February 2025 / Accepted: 6 March 2025 / Published: 9 March 2025

Abstract

The Advanced Insect Detection Network (AIDN), which represents a significant advancement in the application of deep learning for ecological monitoring, is specifically designed to enhance the accuracy and efficiency of insect detection from unmanned aerial vehicle (UAV) imagery. Utilizing a novel architecture that incorporates advanced activation and normalization techniques, multi-scale feature fusion, and a custom-tailored loss function, the AIDN addresses the unique challenges posed by the small size, high mobility, and diverse backgrounds of insects in aerial images. In comprehensive testing against established detection models, the AIDN demonstrated superior performance, achieving 92% precision, 88% recall, an F1-score of 90%, and a mean average precision (mAP) score of 89%. These results signify a substantial improvement over traditional models such as YOLO v4, SSD, and Faster R-CNN, which typically show performance metrics approximately 10–15% lower across similar tests. The practical implications of the AIDN are profound, offering significant benefits for agricultural management and biodiversity conservation. By automating the detection and classification processes, the AIDN reduces the labor-intensive tasks of manual insect monitoring, enabling more frequent and accurate data collection. This improvement in data collection quality and frequency enhances decision making in pest management and ecological conservation, leading to more effective interventions and management strategies. The AIDN’s design and capabilities set a new standard in the field, promising scalable and effective solutions for the challenges of UAV-based monitoring. Its ongoing development is expected to integrate additional sensory data and real-time adaptive models to further enhance accuracy and applicability, ensuring its role as a transformative tool in ecological monitoring and environmental science.

1. Introduction

The monitoring and management of insect populations are critical tasks in both ecological research and agricultural practices [1]. Insects play diverse roles in ecosystems, such as pollination, decomposition, and as a food source for other wildlife, making their monitoring vital for ecological balance [2]. Conversely, certain species are pests that damage crops, leading to significant agricultural losses [3]. Traditional methods for monitoring insects, including manual trapping and visual counts, are labor-intensive and often lack the scale and precision required for effective population management [4]. With the advent of UAVs, new possibilities have emerged for environmental monitoring through enhanced spatial coverage and data collection efficiency [5]. However, despite these advancements, the application of UAV technology in insect monitoring faces significant challenges [6]. The primary issues include the small size and high mobility of insects, which make them difficult to detect accurately in the diverse and dynamic backgrounds of natural environments [7]. Furthermore, traditional image processing algorithms struggle to maintain high accuracy when applied to the high-altitude images typically captured by UAVs, where insects appear as small, fast-moving specks [8]. Deep learning has had a significant impact on the detection and recognition of insect species, particularly in the context of agricultural monitoring and pest management [9]. The integration of deep-learning techniques has enabled more accurate and efficient identification of insects, which is crucial for implementing effective pest control strategies and minimizing crop damage [10]. One of the key advantages of using deep learning in insect detection is its ability to process complex visual data and recognize patterns that are difficult for traditional algorithms to handle [11]. Deep learning has facilitated the development of real-time detection systems that can operate in the field, enhancing the timeliness and precision of pest management [12]. The ability to quickly identify pest species allows for more targeted and efficient interventions, reducing the need for broad-spectrum pesticide applications and supporting more sustainable agricultural practices [13]. Despite these advancements, challenges remain, such as the need for high computational resources and the difficulty of detecting very small or camouflaged insects [14]. In their study [15] on enhancing rice production efficiency, the authors highlight the significance of rice as a staple food and its challenges, including natural disasters and pests. They propose a novel solution using an Internet of Things (IoT)-assisted UAV that utilizes AI to detect pests via the Imagga cloud. This method facilitates timely interventions to prevent pest damage, reducing rice wastage during production. Another research study [16] introduces a UAV-based approach employing the YOLOv7 algorithm to automate the inspection of olive tree flies. By utilizing multirotor UAVs equipped with two different digital cameras, the study assesses the effectiveness of computer vision techniques to detect these pests under varied conditions. Ref. [17] introduces an innovative UAV-based visual–acoustic system designed for the early detection of pest invasions in farmlands, specifically targeting grasshoppers. By deploying a system that combines visual and acoustic sensing, the researchers aim to provide greater flexibility and accuracy in field conditions. 
Study [18] introduces a novel monitoring method using multispectral UAV imagery and machine learning to diagnose infections caused by Serenomyces phoenicis and Phoenicococcus marlatti. This method efficiently segments and classifies imagery to estimate the prevalence of disease at the individual tree level, culminating in a robust probabilistic classification model. A study [19] conducted in Hidalgo State, Mexico, implemented a UAV-based remote sensing process alongside a deep-learning framework to identify and map bark beetle damage effectively. Ref. [20] proposes a UAV-based trilateration system that leverages multiple omnidirectional antennas to localize and track flying insects more accurately and swiftly than systems using unidirectional antennas. By incorporating a finite impulse response (FIR) filter, the proposed method reduces noise and errors in the tracking data, demonstrated through various field experiments, including behavior, ground truth, and localization tracking tests. The introduction of a DHMPD-based IoT-UAV system [21] in recent research marks a significant advancement in smart agriculture. This system encompasses several stages, from data acquisition to classification, utilizing methods such as Z-score normalization for image preprocessing and hybrid deep-learning approaches for classification, including enhanced DBN and LSTM models. Innovative strides in agricultural technology have led to the development of the SP-YOLO detection algorithm, designed to overcome the inherent challenges of detecting soybean pests. This enhanced model, built on the YOLOv8n framework, incorporates FasterNet and the novel PConvGLU architecture to refine feature extraction and reduce resource usage. The study [22] reports significant improvements in detection accuracy, with the SP-YOLO model achieving higher precision and recall rates compared to its predecessor, alongside a more efficient operational profile characterized by a higher frame rate and a smaller model size.
Our model, the AIDN, addresses and effectively overcomes several critical limitations inherent in traditional detection models, such as YOLO v4, SSD, and Faster R-CNN, which have previously struggled with the accurate detection of small, fast-moving insects against highly dynamic and diverse backgrounds. The development of the AIDN incorporates a novel architecture that leverages advanced activation and normalization techniques alongside a sophisticated multi-scale feature fusion system. This design significantly improves the model’s capacity to identify relevant features within complex aerial scenes, thereby enhancing detection accuracy. Unlike earlier approaches, the AIDN is optimized for real-time data processing, enabling immediate insights crucial for timely decision making in pest management and ecological conservation. Further distinguishing our work, the AIDN utilizes a custom-tailored loss function that finely balances the aspects of localization, classification, and confidence accuracy. This function is specifically designed to meet the unique challenges posed by aerial insect detection, offering a refined approach compared to the more generic loss functions employed by previous models. Additionally, our solution addresses the scalability and efficiency demands of extensive ecological monitoring. By automating the detection and classification processes, the AIDN facilitates the monitoring of larger areas than is feasible with manual methods, thus providing a more comprehensive understanding of insect populations across varied landscapes. The advancements presented in the AIDN not only solve specific challenges associated with the detection of small and fast-moving targets but also significantly improve the overall effectiveness and efficiency of UAV-based ecological monitoring systems, setting a new standard in the field. The introduction of the AIDN model represents a significant leap forward in the field of remote sensing applied to ecological monitoring. By integrating cutting-edge techniques in machine learning and computer vision, the AIDN is expected to provide several advancements:
  • Through its refined architecture, the AIDN aims to enhance the sensitivity and specificity of insect detection, thereby reducing both false positives and missed detections.
  • By automating the detection process, the AIDN allows for the monitoring of larger areas than is feasible with manual methods, offering a more comprehensive understanding of insect populations across diverse landscapes.
  • The AIDN’s processing framework is designed to support real-time data analysis, providing immediate insights that are crucial for timely decision making in pest management and ecological conservation.
  • Improved monitoring accuracy helps with better conservation efforts for beneficial insect species and more effective control of pest populations, leading to enhanced agricultural productivity and reduced chemical pesticide use.
The AIDN proposes significant advancements over existing models through the incorporation of a novel multi-scale feature fusion module, enhanced attention mechanisms, a custom-tailored loss function, and optimizations for real-time processing. These innovations collectively enable superior detection performance, particularly in challenging UAV-captured imagery where traditional models falter. By addressing specific limitations in current detection technologies, the AIDN sets a new standard in the field, offering both higher accuracy and operational efficiency. This research not only addresses a significant gap in current ecological monitoring practices but also contributes to the broader field of precision agriculture and biodiversity conservation. By demonstrating the feasibility and benefits of UAV-based insect monitoring, it lays the groundwork for future innovations in environmental science and technology.

2. Materials and Methods

This section outlines the methodologies employed to develop, train, and validate the AIDN, a deep-learning model designed for the detection of insects from images captured by UAVs. The approach combines innovative model architecture with robust training techniques and rigorous validation processes to ensure accuracy and reliability in real-world conditions. The methodologies described here are integral to understanding the enhancements the AIDN offers over traditional insect monitoring and detection methods. In the AIDN model, anchor boxes are scaled to cover insect sizes ranging from 10 × 10 to 30 × 30 pixels within 640 × 480 images, reflecting the typical visibility of these targets in UAV imagery. The detection network comprises three heads, each optimized for different scales of feature maps. These heads utilize convolutional kernels of sizes 1 × 1 and 3 × 3 to adeptly manage the trade-off between detection accuracy and computational efficiency, ensuring robust performance across varied sizes and movements of insect targets.
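To make the anchor configuration concrete, the following minimal sketch generates anchor templates spanning the stated 10 × 10 to 30 × 30 pixel range; the specific scales and aspect ratios are illustrative assumptions, not the trained configuration:

```python
import itertools

def make_anchors(scales=(10, 20, 30), ratios=(0.5, 1.0, 2.0)):
    """Generate (w, h) anchor templates covering the 10x10 to 30x30 px
    insect sizes cited for 640x480 UAV frames. Scales and ratios here
    are illustrative assumptions, not the paper's exact settings."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w = s * (r ** 0.5)  # widen or narrow the square anchor by sqrt(ratio)
        h = s / (r ** 0.5)
        anchors.append((round(w, 1), round(h, 1)))
    return anchors

print(make_anchors())  # nine (w, h) templates per feature-map cell
```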
The architecture (Figure 1) of the AIDN is meticulously designed to address the unique challenges posed by the aerial detection of small and fast-moving insects using UAV imagery. By integrating state-of-the-art deep-learning technologies and custom modifications, the AIDN achieves high accuracy and real-time performance, essential for effective ecological monitoring and agricultural applications. This section details the underlying structure of the AIDN, highlighting its innovative components and the rationale behind each design choice. The AIDN utilizes a convolutional neural network (CNN) as its foundational framework, leveraging the powerful feature extraction capabilities inherent to CNNs.

2.1. Multi-Scale Feature Fusion (MSFF) Module

To address the potential loss of important features through successive convolutional layers, our model employs a sophisticated MSFF technique. This technique synthesizes feature maps from F1, F2, and F3, ensuring a comprehensive aggregation of details from multiple scales. By combining these layers, the model retains critical features from each level of abstraction. F3’s semantically rich maps are enhanced through up-sampling, which increases their spatial resolution, allowing high-level contextual information to persist in the final detection output. Simultaneously, the down-sampling of F1’s detail-rich maps ensures that fine-grained features are not lost but effectively disseminated across the network. To further ensure the preservation and emphasis of relevant features, spatial and channel-wise attention mechanisms are incorporated within the MSFF module. These mechanisms prioritize and amplify features critical for detecting and classifying various insect species accurately. By integrating these methods, our model maintains and enhances the essential characteristics necessary for accurate insect detection across diverse environmental settings. The MSFF module is a pivotal enhancement in the AIDN, specifically engineered to tackle one of the significant challenges in UAV-based insect detection: effectively recognizing targets across a broad spectrum of scales. The rationale for the MSFF module stems from the necessity to discern minute insects, which may appear as mere specks in high-altitude UAV imagery, and to integrate these detections into the broader contextual framework of larger environmental features. Effective object detection in high-resolution aerial images demands a nuanced comprehension of both localized and expansive image features. Small insects, the primary focus of detection efforts, can be overlooked by detectors that primarily scrutinize larger-scale attributes. Conversely, a narrow focus on intricate details may result in the misclassification of noise as objects of interest. To mitigate these issues, the MSFF module synthesizes a rich, multi-scaled feature map that encapsulates both detailed and contextual information, thus ensuring a comprehensive detection capability.
The core of the MSFF module is a pyramidal feature hierarchy that processes multiple convolutional layers from the backbone network. This hierarchical architecture is designed to capture features at varying resolutions—beginning with high-resolution, granular feature maps and extending to lower-resolution maps that are imbued with robust semantic content. The integration of up-sampling and down-sampling pathways is pivotal in managing the scale disparity among features extracted at different levels of the network. Up-sampling involves augmenting the spatial resolution of deep, semantically enriched feature maps to enhance their spatial precision. Down-sampling reduces the spatial resolution of shallow, detail-rich feature maps to imbue them with greater semantic depth. These pathways ensure that each feature level is optimally prepared to contribute to the final detection task. Fusion of these multi-scaled features is achieved through convolutional operations that merge up-sampled and down-sampled outputs. This process can be mathematically represented as follows:
$$F_{fused} = \mathrm{Conv}\left(\mathrm{Concat}\left(F_{up}, F_{down}\right)\right)$$
where $F_{up}$ and $F_{down}$ denote the feature maps from the up-sampling and down-sampling pathways, respectively; $\mathrm{Concat}$ represents the concatenation operation; and $\mathrm{Conv}$ denotes a convolutional layer that integrates these features into a unified feature map. To refine the fusion process, attention mechanisms are employed to selectively emphasize features based on their relevance to the detection task. Spatial attention focuses on enhancing regions within the feature map that are more likely to contain targets, while channel-wise attention selectively amplifies the most informative feature channels across the different scales. During operation, inputs from various convolutional layers are fed into the MSFF module. These inputs undergo transformations via the up-sampling and down-sampling pathways, with attention-guided steps interspersed to ensure the focus remains on relevant features. The final output is a sophisticated multi-scale feature map that forms a robust basis for the detection layers in the AIDN, thus significantly bolstering the model’s detection capabilities.
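As a minimal sketch of this fusion step, the operation $\mathrm{Conv}(\mathrm{Concat}(F_{up}, F_{down}))$ can be realized in PyTorch as follows; the channel counts and the choice of F2’s resolution as the fusion grid are assumptions for illustration, not the trained configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSFF(nn.Module):
    """Minimal sketch of the multi-scale feature fusion described above.
    Channel counts and F2's grid as the fusion resolution are assumed."""
    def __init__(self, c1=64, c2=128, c3=256, c_out=128):
        super().__init__()
        self.fuse = nn.Conv2d(c1 + c2 + c3, c_out, kernel_size=3, padding=1)

    def forward(self, f1, f2, f3):
        # Down-sample the shallow, detail-rich map F1 to F2's grid.
        f1_down = F.adaptive_avg_pool2d(f1, f2.shape[-2:])
        # Up-sample the deep, semantically rich map F3 to F2's grid.
        f3_up = F.interpolate(f3, size=f2.shape[-2:], mode="nearest")
        # Concatenate along channels, then Conv integrates the scales.
        return self.fuse(torch.cat([f1_down, f2, f3_up], dim=1))

# Example with plausible backbone shapes for a 640 x 480 input:
f1 = torch.randn(1, 64, 120, 160)
f2 = torch.randn(1, 128, 60, 80)
f3 = torch.randn(1, 256, 30, 40)
print(MSFF()(f1, f2, f3).shape)  # torch.Size([1, 128, 60, 80])
```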

2.2. Attention Mechanisms

In the realm of object detection, particularly in the complex scenarios presented by UAV-based aerial imagery, the deployment of attention mechanisms within a deep-learning architecture significantly augments the model’s ability to discern pertinent features from noisy backgrounds. In the AIDN, attention mechanisms play a critical role in enhancing the detection capabilities of the MSFF module by guiding the model to focus more effectively on regions and features that are indicative of insect presence (Figure 2).
Attention mechanisms in neural networks are inspired by the cognitive process in which humans focus more on specific parts of the visual field rather than processing the whole scene at once. This selective focus allows for a more efficient allocation of computational resources towards areas deemed most informative for a given task. In the context of the AIDN, the attention modules help the network to prioritize spatial and channel-wise features that are more likely to contain relevant information for detecting insects in diverse and often cluttered aerial scenes. The attention mechanisms in the AIDN are integrated within the MSFF module to enhance both the spatial and feature-channel relevance of the information processed by the network. The implementation consists of two primary components: spatial attention and channel-wise attention. The spatial attention component focuses on identifying significant spatial locations within the feature maps. It works by generating a spatial attention map that highlights the regions of the input feature map most relevant to the task. This is typically achieved through operations that compress the channel dimension of the feature map, such as using max pooling and average pooling followed by a convolution layer to predict the attention weights. Mathematically, this can be represented as follows:
$$S(x) = \sigma\left(f_{conv}\left(\left[\mathrm{MaxPool}(x); \mathrm{AvgPool}(x)\right]\right)\right)$$
where $S(x)$ is the spatial attention map, $\sigma$ denotes the sigmoid activation function, $f_{conv}$ represents a convolution operation, and $\mathrm{MaxPool}$ and $\mathrm{AvgPool}$ are the max pooling and average pooling operations across the channel dimension, respectively. The resulting attention map is then multiplied element-wise with the input feature map to emphasize important spatial areas.
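A minimal PyTorch sketch of this spatial attention map follows; the 7 × 7 convolution kernel is a common default and an assumption here:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of S(x) = sigmoid(conv([MaxPool(x); AvgPool(x)])), with
    pooling taken across the channel dimension as defined above."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        max_pool, _ = x.max(dim=1, keepdim=True)  # channel-wise max -> (B,1,H,W)
        avg_pool = x.mean(dim=1, keepdim=True)    # channel-wise mean -> (B,1,H,W)
        attn = torch.sigmoid(self.conv(torch.cat([max_pool, avg_pool], dim=1)))
        return x * attn  # element-wise emphasis of important spatial areas
```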
Unlike spatial attention, which focuses on “where” to focus, channel-wise attention determines “what” features to focus on across the channels of the feature maps. This type of attention assesses the inter-channel relationships of the features and adjusts the response of each channel based on its relevance to the detection task. One common approach to implementing channel-wise attention is through the squeeze-and-excitation (SE) block, which first squeezes the global spatial information into a channel descriptor using global average pooling. This descriptor is then processed through a series of fully connected layers to capture channel-wise dependencies and recalibrate the channel features by scaling them with learned weights. This process is formalized as follows:
$$C(x) = x \cdot \sigma\left(W_2\,\delta\left(W_1\,\mathrm{AvgPool}(x)\right)\right)$$
where $C(x)$ is the output of the channel-wise attention module; $x$ is the input feature map; $\sigma$ and $\delta$ are the sigmoid and ReLU activation functions, respectively; and $W_1$ and $W_2$ are the weights of the fully connected layers.
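The squeeze-and-excitation form above can be sketched as follows; the reduction ratio of 16 is a conventional default assumed for illustration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of C(x) = x * sigmoid(W2 ReLU(W1 AvgPool(x))); the
    reduction ratio is a common default, not the paper's setting."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1 (squeeze)
            nn.ReLU(),                                   # delta
            nn.Linear(channels // reduction, channels),  # W2 (excite)
            nn.Sigmoid(),                                # sigma
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))  # global average pool -> (B, C)
        return x * w.view(b, c, 1, 1)    # recalibrate each channel
```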
Attention mechanisms significantly empower the AIDN to perform with higher precision and reliability, enabling it to meet the demanding requirements of UAV-based insect monitoring. These mechanisms not only refine the feature processing capabilities of the network but also align its focus towards the most critical aspects of the input data, thereby driving substantial improvements in detection performance. The detection heads in the AIDN play a crucial role in identifying and classifying insects captured in UAV-based imagery. Designed to cope with the diverse challenges posed by the variability in insect sizes, their rapid mobility, and complex environmental backdrops, these enhanced detection heads are tailored to improve accuracy and processing speed, thereby enabling more effective monitoring of insect populations.
Localization and classification tasks are separated within the detection heads, each optimized for a specific function. The localization subnetwork calculates the bounding box adjustments, while the classification subnetwork assigns a probability to each class label. This separation is designed to enhance the focus of each network path, improving the overall detection accuracy. The detection heads also incorporate contextual information from surrounding image areas to improve detection reliability. This is implemented through a context-aware mechanism that adjusts the confidence of detections based on the surrounding visual information, mathematically represented as follows:
$$C(x) = \sigma\left(\sum_{j} w_j \cdot x_j\right)$$
where $C(x)$ represents the context-adjusted confidence, $\sigma$ is the sigmoid function, $w_j$ are the learned weights, and $x_j$ represents the contextual features around the detection. Dynamic scaling is applied to the outputs of each detection head during both the training and inference phases. This process adjusts the confidence scores based on the clarity and size of the detected objects in the image, prioritizing detections that are more likely to be true positives. The scaling function can be expressed as follows:
$$S(c, s) = c \times \exp\left(-\alpha \times \left(s - \bar{s}\right)^2\right)$$
here, $S$ is the scaled confidence score, $c$ is the original confidence score, $s$ is the size of the detected object, $\bar{s}$ is the average object size, and $\alpha$ is a scaling factor that adjusts the influence of size discrepancies. The introduction of optimized anchor settings, decoupled detection tasks, and contextual integration significantly elevates the performance of the AIDN. These enhancements ensure that each detection head operates with high precision, effectively localizing and classifying insects across various environmental settings. The dynamic scaling further adds to the robustness, allowing the AIDN to adapt to different sizes and appearances of insects, ensuring that the model’s performance remains consistent across diverse monitoring scenarios.
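A small numerical sketch of the dynamic scaling function illustrates how detections far from the average object size are down-weighted; the value of $\alpha$ below is illustrative, as the paper does not report it:

```python
import math

def scale_confidence(c, s, s_bar, alpha=0.01):
    """Sketch of S(c, s) = c * exp(-alpha * (s - s_bar)^2).
    alpha = 0.01 is an assumed, illustrative value."""
    return c * math.exp(-alpha * (s - s_bar) ** 2)

# A detection matching the average size (20 px) keeps its score;
# one far from it is penalized:
print(scale_confidence(0.9, s=20, s_bar=20))  # 0.9 (no penalty)
print(scale_confidence(0.9, s=35, s_bar=20))  # ~0.095 (penalized)
```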
For the AIDN, the Mish activation function is utilized due to its properties that help in promoting higher-order continuity in the learning process. Mish is a smooth, non-monotonic function defined as follows:
$$\mathrm{Mish}(x) = x \cdot \tanh\left(\ln\left(1 + e^x\right)\right)$$
This function has been chosen because it helps in reducing the likelihood of vanishing gradients—a common issue in deep networks dealing with complex image data. The soft and smooth profile of Mish allows for a better flow of gradients through the network, facilitating more effective training over traditional functions like ReLU. Normalization techniques adjust the activations in a network to have a mean of zero and a unit variance, helping to stabilize the training process by reducing internal covariate shift. In the AIDN, batch normalization is applied after every convolutional layer. Batch normalization (BN) standardizes the inputs to a layer for each mini-batch. This stabilizes the learning process and dramatically reduces the number of training epochs required to train deep networks. The BN transform is defined as follows:
$$BN(x) = \gamma \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$
where $x$ is the input to a layer, $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant added for numerical stability, and $\gamma$ and $\beta$ are parameters to be learned that scale and shift the normalized value.
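The conv → batch norm → Mish ordering described above can be sketched as a single PyTorch block; the layer widths are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mish(x):
    """Mish(x) = x * tanh(ln(1 + e^x)); softplus computes ln(1 + e^x) stably."""
    return x * torch.tanh(F.softplus(x))

# One conv block in the stated ordering (widths are illustrative):
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),  # BN(x) = gamma * (x - mu_B) / sqrt(var_B + eps) + beta
    nn.Mish(),           # built into PyTorch since v1.9
)
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```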
The integration of these attention mechanisms into the AIDN is critical for optimizing detection performance. Spatial attention operates on feature maps generated from early convolutional layers, applying a dynamic weighting that enhances relevant spatial features while diminishing less significant ones. Channel-wise attention, on the other hand, adjusts the weighting across different feature channels, prioritizing those that are most crucial for accurate insect detection. This two-pronged approach ensures that the AIDN can efficiently handle the variability and complexity typical of ecological monitoring imagery.

2.3. Custom Loss Function

The effectiveness of a neural network in tasks such as object detection significantly hinges on the design of its loss function. In the AIDN, a custom loss function is crafted to precisely address the challenges of detecting and classifying insects from UAV-captured imagery. This function combines aspects of localization, classification, and confidence prediction to refine the training process, guiding the network towards optimal performance with respect to the complex task at hand. The custom loss function in the AIDN is structured to comprehensively penalize the various discrepancies between the predicted outputs and the actual ground truth data. It encapsulates the errors in bounding box coordinates, the correctness of the class predictions, and the confidence scores associated with each prediction.
The detection heads utilize sets of anchor boxes, each predefined to match the statistical distribution of insect dimensions. The anchor boxes are adjusted to minimize the localization loss $L_{loc}$, calculated as follows:
$$L_{loc} = \sum_{i \in Pos} \sum_{m \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\left(loc_i^m - g_i^m\right)$$
here, $loc$ denotes the coordinates predicted by the model, $g$ denotes the ground truth box coordinates, and $Pos$ represents the set of positive anchor positions that correspond to actual insect detections. The localization loss $L_{loc}$ focuses on the accuracy of the bounding box predictions, ensuring that the network learns to precisely localize the insects within the images. It is commonly calculated using the smooth L1 loss, which is less sensitive to outliers than the mean squared error and is defined as follows:
$$L_{loc}(x, g) = \sum_{i \in Pos} \sum_{m \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\left(x_i^m - g_i^m\right)$$
where $x$ represents the predicted bounding box coordinates, $g$ represents the ground truth bounding box coordinates, and $Pos$ denotes the set of positive anchor matches. The smooth L1 loss is a combination of L1 loss for larger errors and L2 loss for smaller errors, providing a balanced approach to penalizing localization inaccuracies.
The classification loss $L_{class}$ ensures that each detected object is classified into the correct category. This is typically handled by the cross-entropy loss for multi-class classification tasks, which measures the performance of a classification model whose output is a probability value between 0 and 1. The loss increases as the predicted probability diverges from the actual label. It is computed as follows:
$$L_{class}(p, c) = -\sum_{j=1}^{C} y_j \log\left(p_j\right)$$
where $p$ denotes the predicted probability distribution across classes, $c$ is the actual class, $y_j$ is a binary indicator (0 or 1) of whether class label $j$ is the correct classification for the observation, and $C$ is the number of classes.
The confidence loss $L_{conf}$ addresses the confidence scores of the bounding box predictions, penalizing the network for being uncertain about its predictions. It is vital for minimizing false positives and improving the reliability of the detector. The binary cross-entropy is used for this purpose and is defined as follows:
$$L_{conf}(s) = -\sum_{i \in Pos \cup Neg} \left[y_i \log\left(s_i\right) + \left(1 - y_i\right)\log\left(1 - s_i\right)\right]$$
where $s_i$ is the predicted confidence score for each bounding box, and $y_i$ is 1 if the bounding box overlaps significantly with a ground truth box (positive example) and 0 otherwise (negative example). The overall loss function $L$ used during the training of the AIDN is a weighted sum of these three components:
$$L = \lambda_{loc} L_{loc} + \lambda_{class} L_{class} + \lambda_{conf} L_{conf}$$
where $\lambda_{loc}$, $\lambda_{class}$, and $\lambda_{conf}$ are weights that balance the relative importance of the localization, classification, and confidence components within the total loss function. These weights are tuned based on the specific detection requirements and the nature of the training data to optimize the performance of the network. The custom loss function is fundamental in aligning the learning objectives of the AIDN with the practical requirements of UAV-based insect detection. By meticulously penalizing the errors across localization, classification, and confidence predictions, the function steers the network towards higher precision and reliability, which is crucial for effective monitoring and management of insect populations. This thoughtful construction enhances not only the model’s accuracy but also its applicability in varied and challenging environmental conditions.
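A simplified sketch of this weighted sum follows, assuming predictions already matched to positive anchors and using placeholder $\lambda$ weights rather than the tuned values:

```python
import torch
import torch.nn.functional as F

def aidn_loss(pred_boxes, gt_boxes, pred_logits, gt_labels,
              pred_conf, gt_conf, lam_loc=1.0, lam_cls=1.0, lam_conf=1.0):
    """Sketch of L = lam_loc*L_loc + lam_cls*L_class + lam_conf*L_conf.
    Inputs are assumed pre-matched to anchors; lambdas are placeholders."""
    l_loc = F.smooth_l1_loss(pred_boxes, gt_boxes)                    # localization
    l_cls = F.cross_entropy(pred_logits, gt_labels)                   # classification
    l_conf = F.binary_cross_entropy_with_logits(pred_conf, gt_conf)   # confidence
    return lam_loc * l_loc + lam_cls * l_cls + lam_conf * l_conf

# Example with 8 matched anchors and 9 classes:
loss = aidn_loss(torch.randn(8, 4), torch.randn(8, 4),
                 torch.randn(8, 9), torch.randint(0, 9, (8,)),
                 torch.randn(8), torch.rand(8))
print(loss.item())
```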

3. Results

This section presents the empirical outcomes of deploying the AIDN in UAV-based ecological monitoring. The evaluation was meticulously designed to assess the performance of the AIDN against traditional models and SOTA benchmarks, focusing on key metrics. Precision measures the accuracy of positive predictions by calculating the proportion of correctly detected insects among all detected instances, as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where TP (true positives) refers to correctly detected insects, and FP (false positives) refers to instances incorrectly classified as insects.
Recall measures the model’s ability to detect all actual insects in the dataset by computing the proportion of correctly detected insects out of all actual insects, as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where FN (false negatives) represents the number of actual insects that were not detected by the model.
The F1-score is a balanced metric that considers both precision and recall, ensuring that the model performs well both in detecting all insects and in minimizing false detections. It is calculated as follows:
$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
A higher F1-score indicates a strong balance between precision and recall, which is crucial for evaluating detection models.
Mean average precision (mAP) represents the overall accuracy of the detection model by averaging the precision across multiple confidence thresholds and object classes, as follows:
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
where $AP_i$ (average precision) is calculated for each class separately, and $N$ represents the total number of classes. This metric provides a comprehensive measure of the model’s detection performance across different categories.
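These metrics can be computed directly from raw detection counts. The counts below are illustrative values chosen to roughly reproduce the reported 92/88/90 figures; they are not the paper’s data:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, per the formulas above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (not the paper's data):
p, r, f1 = detection_metrics(tp=880, fp=77, fn=120)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # ~0.92 / 0.88 / 0.90
```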
The performance of the AIDN model is quantitatively evaluated using the mAP at an intersection over union (IoU) threshold of 0.50, denoted as mAP50. This measure is selected to reflect the model’s precision in detecting and localizing insect targets with significant accuracy, suitable for practical applications in UAV-based monitoring. The results demonstrate the AIDN’s capability to significantly enhance insect detection and classification from aerial imagery. Performance comparisons are made through detailed statistical analysis and visual assessments, providing a comprehensive understanding of the model’s effectiveness. The data used in this evaluation were collected under diverse environmental conditions to ensure the robustness and reliability of the findings. The subsequent subsections detail the quantitative assessments, followed by a discussion of the model’s real-world applicability and operational viability based on feedback from field tests.
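For reference, the IoU criterion underlying mAP50 can be computed as follows for axis-aligned boxes; this is the standard definition, sketched here for completeness:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes; a detection
    counts as correct for mAP50 when IoU >= 0.50 with a ground truth box."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 30, 30), (15, 15, 35, 35)))  # ~0.39, below the 0.50 threshold
```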

3.1. Implementation Details

The AIDN is developed within the PyTorch 2.5 framework, chosen for its dynamic computation graph, which is essential for handling complex gradient updates and intuitive model coding. This framework supports rapid prototyping and iterative testing, which are crucial given the experimental nature of the project. Additional support comes from Torchvision for accessing pre-built models and transformations, and from libraries like NumPy and OpenCV for numerical operations and image processing tasks, respectively. These tools facilitate the efficient manipulation of image data, both before they are input into the network and for interpreting the outputs from the model. The computational demands of training the AIDN model, particularly due to the depth of the network and the volume of data, necessitate the use of powerful GPU hardware. Training on NVIDIA GPUs, supported by CUDA and cuDNN, allows the network to process large batches of data in parallel, significantly speeding up the training phases and enabling the real-time analysis capabilities necessary for UAV deployments (Table 1).

3.2. Dataset

The dataset employed for training the AIDN comprises 29,960 annotated images of insects captured using ten time-lapse cameras strategically placed over flowers during the summer of 2019 [23]. The dataset was meticulously curated through a collaborative effort involving automated preliminary detection, citizen science verification, and expert review, ensuring high-quality and reliable annotations. The insects in the dataset represent nine crucial taxa, including bees, hoverflies, butterflies, and beetles, reflecting a diverse cross-section of pollinators and pest species (Figure 3). The dataset is divided into three subsets: training (60%), validation (20%), and test (20%), consisting of 10,000 images collectively. Detailed augmentation techniques such as random rotations (0°, 90°, 180°, and 270°), horizontal flipping, and variations in lighting conditions were applied to enhance training diversity. Notably, classes 10–19, comprising rare insect species, are exclusively included in the test set to evaluate the model’s ability to generalize to new types of data. The images were captured using high-resolution aerial drones equipped with HD cameras, providing images at a resolution of 1920 × 1080 pixels.
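The stated augmentations can be sketched with torchvision transforms as below; the jitter strengths are assumptions, and in a full detection pipeline the flip and rotation would also be applied to the bounding box coordinates, which this image-only sketch omits:

```python
import torchvision.transforms as T

# Sketch of the stated augmentations: rotations restricted to
# 0/90/180/270 degrees, horizontal flips, and lighting variation.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomChoice([T.RandomRotation((a, a)) for a in (0, 90, 180, 270)]),
    T.ColorJitter(brightness=0.3, contrast=0.3),  # lighting-condition variation
    T.ToTensor(),
])
```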
This comprehensive collection provides a robust foundation for the deep-learning model, enabling the accurate identification and classification of small insects against complex natural backgrounds. The high-resolution images capture the intricate details necessary for effective machine-learning analysis. Training, validation, and testing splits were carefully prepared to ensure model robustness and generalizability. The training set was enhanced with data augmentation techniques to address class imbalances and improve the model’s ability to generalize across different insect appearances and environmental conditions. Validation and testing sets were designed to rigorously assess the model’s performance and its ability to handle unseen data effectively. The dataset supports the current study and serves as a valuable resource for the broader research community, offering a benchmark for future developments in automated ecological monitoring using deep-learning technologies.
The training and validation sets contain data from clearly identified classes. The test set includes additional classes to assess the model’s ability to generalize and detect unfamiliar insects. Table 2 provides a detailed view of how each insect class is represented across the different phases of model training and evaluation, highlighting the extensive and varied nature of the dataset.

3.3. Comparison with SOTA Models

The deployment of the AIDN provided a comprehensive set of data illustrating its effectiveness in detecting and classifying insects from UAV-captured imagery. Evaluated across a spectrum of conditions and scenarios, the AIDN’s performance was quantified using several statistical metrics and compared to other established models in the field to demonstrate its improvements and operational viability. The effectiveness of the AIDN was gauged through precision, which measures the accuracy of the model’s positive predictions; recall, which assesses the model’s ability to capture all relevant instances; and the F1-score, which balances precision and recall in scenarios with uneven class distribution.
The AIDN model outperforms all baseline models and SOTA models included in the comparison, achieving a consistent 92% across all metrics. This highlights its robustness and effectiveness in detecting and classifying objects, particularly in challenging environments like insect detection from camera trap images. Among baseline models, YOLO variants show varied performance, with later versions (v8 to v11) generally performing better than earlier versions (v4 to v7), which illustrates improvements in architecture and training methodologies over time. SOTA models from various references exhibit a broad range of outcomes, with models from refs. [16,21] showing particularly high performance in certain metrics. However, none match the across-the-board high scores of the AIDN model (Table 3).
The superior performance of the AIDN model can be attributed to its innovative network architecture, tailored training approach, and meticulous dataset curation. This proves its potential as a leading solution for ecological monitoring and similar applications requiring high accuracy in object detection and classification.
In addition to numerical assessments, the model’s capabilities were visually demonstrated through annotated images from the test dataset (Figure 4). These images clearly show the precision with which the AIDN pinpointed and classified various insect species against complex backgrounds, with significantly fewer false positives and more accurate localization than the baseline models. Feedback from field tests further underscored the model’s applicability in real-world settings. Users noted the model’s efficiency in real-time image processing, a critical feature for UAV operations. The reduction in manual monitoring requirements facilitated by the AIDN’s automation of detection and classification processes allowed for more timely and informed decision making in both agricultural and ecological management practices.
The comprehensive evaluation and application testing of the AIDN highlight its potential to significantly advance the field of UAV-based insect monitoring. By delivering high performance across crucial metrics and demonstrating practical utility in field applications, the AIDN not only validates the effectiveness of its specialized design but also proves its adaptability and robustness across diverse operational environments. This positions the AIDN as a transformative tool for ecological monitoring and agricultural management, promising substantial improvements in data-driven decision making processes.
The AIDN model is adept at detecting multiple insects within a single frame, utilizing its multi-scale detection heads to localize and classify numerous targets concurrently. This ability is crucial for ecological monitoring, where images may contain diverse and densely populated insect communities. Extensive validation shows that our model maintains high accuracy metrics, including a mean average precision (mAP) of 89%, underscoring its effectiveness in complex multi-target scenarios (Figure 3).
To illustrate the AIDN model’s competitive edge in terms of accuracy and efficiency, we present the following table comparing it against other prevalent models in the field. Table 4 includes metrics for precision, recall, F1-score, mAP, GFLOPs, and FPS, providing a comprehensive overview of each model’s performance.
Table 4 showcases the AIDN model’s superior precision, recall, F1-score, and mAP metrics, which are complemented by its competitive computational efficiency (15 GFLOPs) and excellent real-time performance (30 FPS). The comparative data highlight the AIDN’s robustness and efficiency, positioning it as a leading solution for real-time insect detection in UAV-based monitoring. The results also illustrate the trade-offs between computational demand (GFLOPs) and processing speed (FPS) across different models, demonstrating the AIDN’s balanced performance in both metrics (Figure 5).

3.4. Ablation Studies

To comprehensively evaluate the contribution of individual components to the overall performance of the AIDN, we conducted several ablation studies (Table 5). These studies systematically removed or altered specific features of the model to identify their impact on detection accuracy and processing speed. We tested the model’s performance without the MSFF module to determine its effect on the model’s ability to integrate and leverage multi-scale information effectively (Table 5). The model was evaluated after disabling the spatial and channel-wise attention mechanisms to understand their role in enhancing the focus on relevant features within the complex aerial imagery. We replaced the custom-tailored loss function with a standard loss function typically used in similar detection tasks to assess its contribution toward the model’s precision and recall (Figure 6).
Each component was individually removed from the model, and the model was then retrained using the same dataset and under identical conditions. This approach ensured that the impact of each component could be isolated and accurately measured. Performance metrics such as precision, recall, F1-score, and mAP were recorded for each configuration.
Removing the MSFF resulted in the most significant decrease in all performance metrics, highlighting its crucial role in effectively handling multi-scale detections.
The decrease in performance metrics upon disabling the attention mechanisms indicates their importance in prioritizing essential features within the images. The use of a standard loss function led to a slight decrease in performance, underscoring the tailored loss function’s role in optimizing the balance between localization and classification accuracy.

4. Discussion

The AIDN demonstrated significant improvements in the accuracy and reliability of insect detection from UAV imagery, as evidenced by the results of comprehensive testing. The high precision and recall achieved by the AIDN suggest that the model is highly effective at correctly identifying insects in diverse environments and minimizing false negatives, which are crucial for applications in ecological monitoring and precision agriculture. The superior performance of the AIDN, particularly in comparison to established models like YOLO v4, SSD, and Faster R-CNN, can be attributed to its specialized architectural enhancements. The integration of advanced activation and normalization techniques, along with a sophisticated multi-scale feature fusion system and a tailored loss function, has enabled the network to handle the complexities associated with aerial insect detection more adeptly. These features help the AIDN manage variations in insect size, appearance, and movement, as well as the diverse backgrounds over which insects are detected.
The AIDN not only outperforms traditional object detection frameworks but also introduces several innovations that set a new benchmark in the field. Prior models often struggled with the scale and complexity of data derived from UAV-based imagery, particularly when detecting small or fast-moving objects against cluttered backgrounds. The AIDN addresses these challenges through its novel approach to feature fusion and attention mechanisms, which significantly enhance the model’s ability to discern pertinent features within the data. Moreover, the use of a custom loss function, which finely tunes the balance between localization, classification, and confidence accuracy, further refines the model’s output. This approach contrasts with more generic loss functions used in other models, which do not always cater to the specific nuances of insect detection from aerial platforms.
Despite its strengths, the AIDN is not without limitations. The complexity of the model, while beneficial for accuracy, requires substantial computational resources, particularly in terms of GPU memory and processing power. This could limit the deployment of the AIDN in real-time scenarios or on platforms with limited hardware capabilities. Additionally, while the AIDN has been trained and validated on a diverse dataset, its performance might still vary under extreme environmental conditions that were not fully represented in the training data. Ongoing adjustments and retraining with new data collected under a broader range of conditions will be necessary to maintain and enhance the model’s robustness. To extend the applicability of the AIDN to a broader range of ecological monitoring scenarios, particularly those involving smaller or less equipped UAVs, we assessed its deployment feasibility on lower-end hardware. Our analysis indicates that while certain computational reductions are possible by simplifying the model architecture or reducing the resolution of input images, these modifications may impact the accuracy and response time of the model.
The success of the AIDN opens several avenues for future research. One immediate area is the optimization of the network to reduce computational demands without compromising performance, potentially through network pruning or the development of more efficient convolutional operations. Another area is the expansion of the dataset to include more varied and challenging scenarios, particularly those involving rare insect species or extreme weather conditions, to further test and improve the model’s robustness. Long-term research could explore the integration of the AIDN with other types of sensory data, such as thermal or hyperspectral imaging, to enhance detection capabilities further. Additionally, developing a more dynamic model that can adapt its parameters in real time based on feedback from the detection environment could significantly enhance the utility of the AIDN in practical applications. The AIDN represents a significant advancement in the field of UAV-based insect detection, offering substantial improvements over existing models. Its development addresses key challenges in the field and sets a new standard for accuracy and reliability. Continued refinement and adaptation of the model in response to emerging challenges and technological advancements will ensure that the AIDN remains at the forefront of innovations in ecological monitoring and agricultural technology.

5. Conclusions

The development and deployment of the AIDN mark a significant milestone in the field of UAV-based ecological monitoring and precision agriculture. The results from extensive testing and real-world applications demonstrate that the AIDN substantially outperforms existing models in detecting and classifying insects from aerial imagery, showcasing high accuracy across essential metrics such as precision, recall, F1-score, and mean average precision (mAP). The AIDN’s success can be attributed to its innovative design, which incorporates advanced activation and normalization techniques, a sophisticated multi-scale feature fusion system, and a custom loss function tailored to address the specific challenges of aerial insect detection. These features enable the model to effectively process complex visual data, distinguishing insects from varied backgrounds with high precision. The model’s ability to perform reliably across different environmental conditions and its adaptability to various insect species and sizes underscore its potential as a transformative tool for environmental scientists and agricultural professionals. The implications of the AIDN extend beyond the technical sphere, offering practical benefits that could revolutionize ecological monitoring and agricultural practices. By automating the detection and classification of insects, the AIDN reduces the need for manual surveys, which are often labor-intensive and prone to human error. This automation facilitates more frequent and accurate data collection, leading to better-informed decision making in pest management and biodiversity conservation. Moreover, the AIDN’s efficiency in processing and analyzing data in real time supports dynamic response strategies, enhancing the effectiveness of interventions to manage insect populations. While the AIDN has set a new benchmark in its domain, there are opportunities for further enhancements. Future developments could focus on optimizing the model to reduce computational demands, enabling its deployment on less powerful systems and broadening its applicability in various field conditions. Expanding the training dataset to include more diverse environmental scenarios and insect behaviors will improve the model’s robustness and accuracy. Additionally, integrating the AIDN with other technological advancements, such as multispectral imaging and AI-driven analytic platforms, could further enhance its capabilities. The AIDN represents a significant advancement in the application of deep-learning technologies to the challenges of UAV-based insect monitoring. As this field continues to evolve, it offers a scalable, efficient, and effective solution that could significantly impact how ecological data are collected and analyzed, paving the way for more sustainable agricultural practices and enhanced environmental conservation efforts. Future iterations of the AIDN, fueled by ongoing research and technological improvements, promise to further enhance its effectiveness and applicability, maintaining its position at the forefront of ecological and agricultural innovation.

Author Contributions

Methodology, H.K., M.A., S.M., J.C. and H.-S.J.; software, H.K., M.A. and S.M.; validation, J.C. and H.-S.J.; formal analysis, J.C. and H.-S.J.; resources, H.K., M.A., S.M., J.C. and H.-S.J.; data curation, H.K., M.A., S.M., J.C. and H.-S.J.; writing—original draft, H.K., M.A., S.M., J.C. and H.-S.J.; writing—review and editing, H.K., M.A., S.M., J.C. and H.-S.J.; supervision, J.C. and H.-S.J.; project administration, H.K., M.A. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2024-00412141).

Data Availability Statement

All datasets used are available online with open access.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sánchez Herrera, M.; Forero, D.; Calor, A.R.; Romero, G.Q.; Riyaz, M.; Callisto, M.; de Oliveira Roque, F.; Elme-Tumpay, A.; Khan, M.K.; Justino de Faria, A.P.; et al. Systematic challenges and opportunities in insect monitoring: A Global South perspective. Philos. Trans. R. Soc. B 2024, 379, 20230102. [Google Scholar] [CrossRef] [PubMed]
  2. Rajabpour, A.; Yarahmadi, F. Monitoring and Population Density Estimation. In Decision System in Agricultural Pest Management; Springer Nature: Singapore, 2024; pp. 37–67. [Google Scholar]
  3. Keasar, T.; Yair, M.; Gottlieb, D.; Cabra-Leykin, L.; Keasar, C. STARdbi: A pipeline and database for insect monitoring based on automated image analysis. Ecol. Inform. 2024, 80, 102521. [Google Scholar] [CrossRef]
  4. Kariyanna, B.; Sowjanya, M. Unravelling the use of artificial intelligence in management of insect pests. Smart Agric. Technol. 2024, 8, 100517. [Google Scholar] [CrossRef]
  5. Duarte, A.; Borralho, N.; Cabral, P.; Caetano, M. Recent advances in forest insect pests and diseases monitoring using UAV-based data: A systematic review. Forests 2022, 13, 911. [Google Scholar] [CrossRef]
  6. Betti Sorbelli, F.; Coró, F.; Das, S.K.; Palazzetti, L.; Pinotti, C.M. Drone-based Bug Detection in Orchards with Nets: A Novel Orienteering Approach. ACM Trans. Sens. Netw. 2024, 20, 1–28. [Google Scholar] [CrossRef]
  7. Jain, A.; Cunha, F.; Bunsen, M.J.; Cañas, J.S.; Pasi, L.; Pinoy, N.; Helsing, F.; Russo, J.; Botham, M.; Sabourin, M.; et al. Insect identification in the wild: The AMI dataset. arXiv 2024, arXiv:2406.12452. [Google Scholar]
  8. Safarov, F.; Khojamuratova, U.; Komoliddin, M.; Bolikulov, F.; Muksimova, S.; Cho, Y.I. MBGPIN: Multi-Branch Generative Prior Integration Network for Super-Resolution Satellite Imagery. Remote Sens. 2025, 17, 805. [Google Scholar] [CrossRef]
  9. O’Connor, T.; García, O.G.; Cabral, V.; Isacch, J.P. Agroecological farmer perceptions and opinions towards pest management and biodiversity in the Argentine Pampa region. Agroecol. Sustain. Food Syst. 2025, 49, 182–203. [Google Scholar] [CrossRef]
  10. Padhiary, M. The Convergence of Deep Learning, IoT, Sensors, and Farm Machinery in Agriculture. In Designing Sustainable Internet of Things Solutions for Smart Industries; IGI Global: Hershey, PA, USA, 2025; pp. 109–142. [Google Scholar]
  11. Vhatkar, K.N. An intellectual model of pest detection and classification using enhanced optimization-assisted single shot detector and graph attention network. Evol. Intell. 2025, 18, 3. [Google Scholar] [CrossRef]
  12. Zhao, D. Cognitive process and information processing model based on deep learning algorithms. Neural Netw. 2025, 183, 106999. [Google Scholar] [CrossRef] [PubMed]
  13. Santoso, C.B.; Singadji, M.; Purnama, D.G.; Abdel, S.; Kharismawardani, A. Enhancing Apple Leaf Disease Detection with Deep Learning: From Model Training to Android App Integration. J. Appl. Data Sci. 2025, 6, 377–390. [Google Scholar] [CrossRef]
  14. Chen, H.; Wen, C.; Zhang, L.; Ma, Z.; Liu, T.; Wang, G.; Yu, H.; Yang, C.; Yuan, X.; Ren, J. Pest-PVT: A model for multi-class and dense pest detection and counting in field-scale environments. Comput. Electron. Agric. 2025, 230, 109864. [Google Scholar] [CrossRef]
  15. Bhoi, S.K.; Jena, K.K.; Panda, S.K.; Long, H.V.; Kumar, R.; Subbulakshmi, P.; Jebreen, H.B. An Internet of Things assisted Unmanned Aerial Vehicle based artificial intelligence model for rice pest detection. Microprocess. Microsyst. 2021, 80, 103607. [Google Scholar] [CrossRef]
  16. Berger, G.S.; Mendes, J.; Chellal, A.A.; Junior, L.B.; da Silva, Y.M.; Zorawski, M.; Pereira, A.I.; Pinto, M.F.; Castro, J.; Valente, A.; et al. A YOLO-based insect detection: Potential use of small multirotor unmanned aerial vehicles (UAVs) monitoring. In Proceedings of the International Conference on Optimization, Learning Algorithms and Applications, Ponta Delgada, Portugal, 27–29 September 2023; Springer Nature: Cham, Switzerland, 2023; pp. 3–17. [Google Scholar]
  17. Thakre, R.N.; Kunte, P.A.; Chavhan, N.; Dhule, C.; Agrawal, R. UAV Based System For Detection in Integrated Insect Management for Agriculture Using Deep Learning. In Proceedings of the 2023 2nd International Conference on Futuristic Technologies (INCOFT), Belagavi, India, 24–26 November 2023; IEEE: Karnataka, India, 2023; pp. 1–6. [Google Scholar]
  18. Casas, E.; Arbelo, M.; Moreno-Ruiz, J.A.; Hernández-Leal, P.A.; Reyes-Carlos, J.A. UAV-based disease detection in palm groves of Phoenix canariensis using machine learning and multispectral imagery. Remote Sens. 2023, 15, 3584. [Google Scholar] [CrossRef]
  19. Godinez-Garrido, G.; Gonzalez-Islas, J.C.; Gonzalez-Rosas, A.; Flores, M.U.; Miranda-Gomez, J.M.; Gutierrez-Sanchez, M.D.J. Estimation of Damaged Regions by the Bark Beetle in a Mexican Forest Using UAV Images and Deep Learning. Sustainability 2024, 16, 10731. [Google Scholar] [CrossRef]
  20. Pak, J.; Kim, B.; Ju, C.; You, S.H.; Son, H.I. UAV-Based Trilateration System for Localization and Tracking of Radio-Tagged Flying Insects: Development and Field Evaluation. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Detroit, MI, USA, 2023; pp. 1–8. [Google Scholar]
  21. Gokeda, V.; Yalavarthi, R. Deep Hybrid Model for Pest Detection: IoT-UAV-Based Smart Agriculture System. J. Phytopathol. 2024, 172, e13381. [Google Scholar] [CrossRef]
  22. Qin, K.; Zhang, J.; Hu, Y. Identification of Insect Pests on Soybean Leaves Based on SP-YOLO. Agronomy 2024, 14, 1586. [Google Scholar] [CrossRef]
  23. Bjerge, K.; Alison, J.; Dyrmann, M.; Frigaard, C.E.; Mann, H.M.; Høye, T.T. Accurate detection and identification of insects from camera trap images with deep learning. PLOS Sustain. Transform. 2023, 2, e0000051. [Google Scholar] [CrossRef]
Figure 1. Architectural overview of the AIDN. The shallow layer, F1, captures high-resolution, fine-grained details crucial for identifying subtle textures and the small elements of insects. It is particularly vital for recognizing species with minimal morphological differences. The intermediate layer, F2, processes basic forms and shapes from the visual data, which is essential for distinguishing between insect types based on structural features. The deep layer, F3, encodes complex semantic information that aids robust classification against varied backgrounds, facilitating accurate detection even in cluttered or dynamic environments.
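To make the three-level fusion in Figure 1 concrete, the sketch below shows one common way to combine shallow, intermediate, and deep feature maps. It is a minimal illustration in TensorFlow/Keras (the framework listed in Table 1), not the AIDN's published implementation: the function name, channel width, and the assumption that F2 and F3 sit at 1/2 and 1/4 of F1's spatial resolution are ours.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse_multiscale(f1, f2, f3, channels=128):
    """Illustrative multi-scale fusion: project each level to a shared
    channel width, upsample coarser maps to F1's size, then concatenate."""
    p1 = layers.Conv2D(channels, 1, padding="same")(f1)  # F1: fine textures
    p2 = layers.Conv2D(channels, 1, padding="same")(f2)  # F2: shapes/structure
    p3 = layers.Conv2D(channels, 1, padding="same")(f3)  # F3: semantics
    # Assumed strides: F2 at 1/2 and F3 at 1/4 of F1's resolution.
    p2 = layers.UpSampling2D(2, interpolation="bilinear")(p2)
    p3 = layers.UpSampling2D(4, interpolation="bilinear")(p3)
    fused = layers.Concatenate()([p1, p2, p3])
    # A 3x3 convolution blends the concatenated levels into one map.
    return layers.Conv2D(channels, 3, padding="same", activation="relu")(fused)
```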
Figure 2. Attention mechanism.
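This excerpt does not spell out the attention block in Figure 2, so the following is only a generic channel-attention sketch (squeeze-and-excitation style) of the core idea of reweighting feature channels; the reduction ratio and all names are our assumptions, not the paper's design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, reduction=8):
    """Generic channel attention: global pooling squeezes spatial detail,
    a bottleneck MLP scores each channel, and a sigmoid gate rescales x."""
    channels = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)                  # (B, C) squeeze
    w = layers.Dense(channels // reduction, activation="relu")(w)
    w = layers.Dense(channels, activation="sigmoid")(w)     # per-channel gate
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])                        # reweight features
```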
Figure 3. Examples of insect benchmark datasets [23].
Figure 4. Detection results from the AIDN model.
Figure 5. Accuracy (mAP) vs. processing speed (FPS).
Figure 6. Impact of model components on performance.
Table 1. Experimental setup and configuration details.

| Category | Specifications |
|---|---|
| **Hardware Configuration** | |
| CPU | Intel Xeon Processor E5-2640 v4 |
| GPU | NVIDIA Tesla P100 |
| RAM | 64 GB DDR4 |
| **Software Environment** | |
| Operating System | Ubuntu 18.04 LTS |
| Deep-learning Framework | TensorFlow 2.3 |
| Additional Libraries | NumPy 1.19, OpenCV 4.5 |
| **Model Training Parameters** | |
| Learning Rate | 0.001, decaying by 0.1 every 10 epochs |
| Batch Size | 32 |
| Optimizer | Adam |
| Epochs | 50 |
| Regularization | Dropout, rate = 0.5 |
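The training parameters in Table 1 map directly onto a Keras setup. The snippet below is a sketch under those settings: the step-decay schedule, Adam optimizer, dropout rate, batch size, and epoch count come from the table, while the toy backbone, the steps-per-epoch value, and the nine-class output (matching Table 2's training split) are placeholders of ours.

```python
import tensorflow as tf

STEPS_PER_EPOCH = 100  # assumption: depends on dataset size / batch size

# Table 1: LR 0.001, multiplied by 0.1 every 10 epochs (staircase decay).
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=10 * STEPS_PER_EPOCH,
    decay_rate=0.1,
    staircase=True,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)  # Table 1: Adam

# Dummy stand-in model; the real AIDN architecture is described in Figure 1.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),          # Table 1: dropout rate 0.5
    tf.keras.layers.Dense(9, activation="softmax"),
])
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=50, batch_size=32)  # Table 1: 50 epochs, batch 32
```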
Table 2. Classes within the dataset used for the AIDN.

| No. | Class Name | Training Set Size | Validation Set Size | Test Set Size |
|---|---|---|---|---|
| 1 | Coccinella septempunctata | 6344 | 299 | 396 |
| 2 | Apis mellifera | 6934 | 1663 | 2162 |
| 3 | Bombus lapidarius | 1551 | 250 | 269 |
| 4 | Bombus terrestris | 2955 | 240 | 268 |
| 5 | Eupeodes corolla | 3410 | 275 | 573 |
| 6 | Episyrphus balteatus | 1306 | 274 | 415 |
| 7 | Eristalis tenax | 286 | 27 | 31 |
| 8 | Aglais urticae | 286 | 27 | 31 |
| 9 | Vespula vulgaris | 956 | 201 | 217 |
| 10 | Bombus spp. (related to 3, 4) | -- | -- | 667 |
| 11 | Syrphidae (related to 5, 6, 7) | -- | -- | 1982 |
| 12 | Coccinellidae (related to 1) | -- | -- | 27 |
| 13 | Non-Bombus Anthophila (2) | -- | -- | 271 |
| 14 | Rhopalocera (8) | -- | -- | 51 |
| 15 | Non-Anthophila Hymenoptera (9) | -- | -- | 285 |
| 16 | Non-Syrphidae Diptera | -- | -- | 421 |
| 17 | Non-Coccinelidae Coleoptera | -- | -- | 19 |
| 18 | Unclear insect | -- | -- | 489 |
| 19 | Other animal | -- | -- | 231 |
Table 3. Comparative summary of performance metrics across different models.

| Model | Precision | Recall | F1-Score | mAP |
|---|---|---|---|---|
| AIDN | 92% | 91% | 92% | 92% |
| **Baseline models** | | | | |
| YOLO v4 | 80% | 75% | 77% | 76% |
| YOLO v5 | 82% | 81% | 82% | 80% |
| YOLO v6 | 84% | 84% | 83% | 83% |
| YOLO v7 | 83% | 85% | 83% | 81% |
| YOLO v8 | 85% | 84% | 85% | 82% |
| YOLO v9 | 85% | 85% | 84% | 83% |
| YOLO v10 | 85% | 86% | 85% | 82% |
| YOLO v11 | 86% | 85% | 85% | 84% |
| SSD | 78% | 74% | 76% | 74% |
| Faster R-CNN | 82% | 80% | 81% | 79% |
| **SOTA models** | | | | |
| Ref. [6] | 79% | 77% | 77% | 79% |
| Ref. [15] | 81% | 81% | 82% | 81% |
| Ref. [16] | 88% | 87% | 87% | 85% |
| Ref. [17] | 87% | 87% | 88% | 82% |
| Ref. [18] | 79% | 75% | 77% | 77% |
| Ref. [19] | 89% | 87% | 87% | 87% |
| Ref. [20] | 78% | 77% | 79% | 78% |
| Ref. [21] | 89% | 88% | 88% | 87% |
| Ref. [22] | 85% | 85% | 86% | 82% |
Table 4. Comparative performance metrics of detection models.

| Model | Precision (%) | Recall (%) | F1-Score (%) | mAP (%) | GFLOPs | FPS |
|---|---|---|---|---|---|---|
| AIDN | 92 | 91 | 92 | 92 | 15 | 30 |
| YOLO v4 | 80 | 75 | 77 | 76 | 20 | 25 |
| YOLO v5 | 82 | 81 | 82 | 80 | 22 | 28 |
| SSD | 78 | 74 | 76 | 74 | 18 | 22 |
| Faster R-CNN | 82 | 80 | 81 | 79 | 25 | 20 |
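For reference, the precision, recall, and F1 figures in Tables 3 and 4 presumably follow the standard detection definitions, computed from IoU-matched true positives, false positives, and false negatives. A minimal sketch of those formulas, with hypothetical counts that are not the paper's data:

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Standard detection metrics: a prediction counts as a true positive
    when its IoU with a ground-truth box exceeds the matching threshold."""
    precision = tp / (tp + fp)          # fraction of detections that are correct
    recall = tp / (tp + fn)             # fraction of ground truths found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f1 = detection_metrics(tp=920, fp=80, fn=91)
print(f"precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
```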
Table 5. The results of the ablation studies.

| Experimental Setting | MSFF Module | Attention Mechanisms | Custom Loss Function | Precision (%) | Recall (%) | F1-Score (%) | mAP (%) |
|---|---|---|---|---|---|---|---|
| Baseline (Full Model, AIDN) | ✓ | ✓ | ✓ | 92 | 91 | 92 | 92 |
| Without MSFF Module | ✗ | ✓ | ✓ | 87 | 85 | 86 | 86 |
| Without Attention Mechanisms | ✓ | ✗ | ✓ | 89 | 88 | 88 | 88 |
| With Standard Loss Function | ✓ | ✓ | ✗ | 90 | 89 | 89 | 90 |
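Table 5 shows the custom loss contributing roughly 2 mAP points over a standard loss. The paper's exact formulation is not reproduced in this excerpt; as a point of comparison, a typical detector loss combines localization, objectness, and classification terms, as in the generic sketch below (the term choices and weights are assumptions of ours, not the AIDN's published loss).

```python
import tensorflow as tf

def composite_detection_loss(loc_true, loc_pred, obj_true, obj_pred,
                             cls_true, cls_pred,
                             w_loc=1.0, w_obj=1.0, w_cls=1.0):
    """Generic composite detection loss: a robust regression term for box
    coordinates, a binary term for objectness, and a categorical term for
    class labels, combined with tunable weights. Illustrative only."""
    loc = tf.reduce_mean(tf.keras.losses.huber(loc_true, loc_pred))
    obj = tf.reduce_mean(tf.keras.losses.binary_crossentropy(obj_true, obj_pred))
    cls = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(cls_true, cls_pred))
    return w_loc * loc + w_obj * obj + w_cls * cls
```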
