Article

MDAS-YOLO: A Lightweight Adaptive Framework for Multi-Scale and Dense Pest Detection in Apple Orchards

1 College of Mechanical and Electronic Engineering, Shandong Agricultural University, Taian 271018, China
2 Shandong Province Key Laboratory of Horticultural Machinery and Equipment, Taian 271018, China
3 Shandong Xiangchen Technology Group Co., Ltd., Jinan 250000, China
4 Citrus Research Institute, Chinese Academy of Agricultural Sciences, Southwest University, Chongqing 400712, China
5 Shandong Provincial Engineering Laboratory of Agricultural Equipment Intelligence, Taian 271018, China
* Authors to whom correspondence should be addressed.
Horticulturae 2025, 11(11), 1273; https://doi.org/10.3390/horticulturae11111273
Submission received: 4 September 2025 / Revised: 18 October 2025 / Accepted: 21 October 2025 / Published: 22 October 2025
(This article belongs to the Section Insect Pest Management)

Abstract

Accurate monitoring of orchard pests is vital for green and efficient apple production. Yet images captured by intelligent pest-monitoring lamps often contain small targets, weak boundaries, and crowded scenes, which hamper detection accuracy. We present MDAS-YOLO, a lightweight detection framework tailored for smart pest monitoring in apple orchards. At the input stage, we adopt the LIME++ enhancement to mitigate low illumination and non-uniform lighting, improving image quality at the source. On the model side, we integrate three structural innovations: (1) a C3k2-MESA-DSM module in the backbone to explicitly strengthen contours and fine textures via multi-scale edge enhancement and dual-domain feature selection; (2) an AP-BiFPN in the neck to achieve adaptive cross-scale fusion through learnable weighting and differentiated pooling; and (3) a SimAM block before the detection head to perform zero-parameter, pixel-level saliency re-calibration, suppressing background redundancy without extra computation. On a self-built apple-orchard pest dataset, MDAS-YOLO attains 95.68% mAP, outperforming YOLOv11n by 6.97 percentage points while maintaining a superior trade-off among accuracy, model size, and inference speed. Overall, the proposed synergistic pipeline—input enhancement, early edge fidelity, mid-level adaptive fusion, and end-stage lightweight re-calibration—effectively addresses small-scale, weak-boundary, and densely distributed pests, providing a promising and regionally validated approach for intelligent pest monitoring and sustainable orchard management, and offering methodological insights for future multi-regional pest monitoring research.

1. Introduction

Apples are a fruit of high nutritional value and economic importance. In 2023, China produced 34.31 million tons of apples and exported 0.70 million tons, ranking first worldwide in both categories [1]. The green and high-quality development of the apple industry is crucial for improving farmers’ income and advancing rural revitalization.
However, the production cost of apples in China has remained high. In 2023, the average production cost per mu (a Chinese unit of area equal to 1/15 ha) reached 5451 yuan, a 4.62% increase over 2022. One major reason is ineffective pest control, which leads to blind and excessive pesticide application as well as bagging and debagging operations, thereby greatly increasing labor and material costs. The success of pest management in apple orchards depends on whether pests can be monitored in a timely and accurate manner, and on establishing a decision-making system that integrates pest population dynamics, species variation, and resistance evolution. Such a system is essential for identifying key control periods and formulating appropriate strategies [2,3].
Traditional monitoring methods in apple orchards are mainly of two types. The first relies on manual field inspection, which is time-consuming, labor-intensive, and based on outdated statistical approaches. The second uses trapping and forecasting methods, such as pest-monitoring lamps or pheromone traps, followed by on-site judgment or remote diagnosis based on uploaded images. However, trapped pests often stick together and overlap, making accurate recognition difficult [4]. Both approaches depend heavily on subjective human judgment and cannot achieve precise pest identification or classification.
With the ongoing progress of smart agriculture, machine vision and deep learning have been increasingly applied to pest identification, and recognition accuracy continues to improve. Edson et al. [5] proposed a weakly supervised attention model that combines activation mapping with multi-instance learning to classify regions of interest. A dual-weighted activation mapping algorithm further improved the localization of tiny pest regions, enabling precise recognition and positioning of citrus pests. Chathurika et al. [6] introduced a fine-grained classification method for microscopic pests, building a domain-knowledge-driven stacked model based on Vision Transformer (ViT) with a preprocessing module; it achieved 97.8% accuracy on two thrips species. Pattnaik et al. [7] presented a transfer-learning framework using pre-trained deep convolutional networks; DenseNet169 recognized 10 tomato pest categories with 88.83% accuracy. Turkoglu et al. [8] developed an MLP-CNNs approach that couples pre-trained CNN feature extractors (AlexNet, GoogLeNet, DenseNet201) with a multi-model LSTM, where deep features are fused in the LSTM layer to enable accurate detection of apple-orchard pests.
In the domestic literature, Lin et al. [9] proposed a rice planthopper classification method that combines transfer learning (ResNet50 backbone) with Mask R-CNN, and demonstrated robust performance on adhesive and overlapping instances, reaching an average accuracy of 92.3%. Xiang et al. [10] developed Yolo-Pest for small-object detection; by integrating the ConvNeXt module, SE attention, and a controllable channel residual structure, the model achieved precise multi-scale detection under complex backgrounds with 91.9% accuracy. Li et al. [11] designed a lightweight, improved YOLOv5 algorithm for passion-fruit pests; a point-to-line distance loss reduced redundant computation, and adaptive attention in spatial and channel dimensions yielded 96.51% accuracy with an average detection time of 7.7 ms, satisfying real-time requirements. Tian et al. [12] proposed MD-YOLO for multi-scale dense pest detection; DenseNet modules and adaptive attention enhanced feature representation, enabling recognition of small Lepidoptera pests in dense scenes with 88.1% accuracy.
Despite the significant progress achieved in detection accuracy and real-time performance, existing YOLO-based variants still exhibit limitations under orchard pest-monitoring conditions. For instance, GA-YOLO enhances dense-target detection through the integration of SE-CSPGhostNet and ASFF modules but remains insufficient in extracting fine-grained features of small-scale and weak-boundary pests [13]. AHN-YOLO, which incorporates lightweight attention and NWD-IoU loss within the YOLOv11n framework to balance accuracy and efficiency, tends to suffer from feature confusion when dealing with dense or visually similar insect instances [14]. PHD-YOLO, while improving detection robustness in complex backgrounds through multi-branch feature fusion, introduces high structural complexity and computational overhead, making real-time deployment on edge devices challenging [15]. Collectively, these models have not yet effectively addressed the core challenges presented by pest-monitoring-lamp imagery, namely small scale, weak boundaries, and dense overlaps.
To address these gaps, this research introduces MDAS-YOLO, a detection network designed for intelligent pest-monitoring equipment in apple orchards. The main innovations are as follows:
(1)
A C3k2-MESA-DSM module is embedded in the backbone. It explicitly strengthens insect edges and fine-grained textures, and suppresses redundant features through spatial-frequency dual-domain screening.
(2)
An AP-BiFPN is added to the neck. It realizes adaptive fusion of semantics and details across scales by means of learnable weighting and differentiated pooling.
(3)
A SimAM module is placed before the detection head. It performs zero-parameter, pixel-level recalibration, highlights salient insect cues, and suppresses background noise.
Experiments on a self-built apple orchard pest dataset demonstrate a favorable balance among detection accuracy, real-time performance, and model complexity. The approach provides a new technical route for smart orchard pest monitoring and green precision control.

2. Materials and Methods

2.1. Sample Collection

The dataset used in this research was collected by intelligent pest monitoring equipment jointly developed by Shandong Xiangchen Technology Group Co., Ltd. (Jinan, China) and Shandong Agricultural University (Taian, China). The devices were deployed at 18 test sites across Shandong Province, ensuring the diversity and representativeness of samples, as shown in Figure 1a,b.
Each device consists of a frame, an X-shaped impact plate, an insect-attracting lamp, a sex pheromone lure, an industrial camera, a collection unit, and an embedded control unit. The industrial camera has a resolution of 12 megapixels, and the distance from the lens to the calibration board was set to 300 mm. During operation, the lamp and the pheromone lure jointly provide optical and chemical stimuli, leveraging phototaxis and pheromone attraction. Target insects strike the impact plate and fall into a drying unit below. The drying unit dehydrates and inactivates insects at high temperature to prevent sample decay. Dried specimens slide through a guiding funnel onto an insect collecting plate for subsequent collection and imaging. The control unit is microcontroller-based and equipped with a 5G communication module. It triggers the industrial camera every 30 min for automatic imaging, then actuates a sweeper to move specimens from the collecting plate to the collection unit. Images and metadata are transmitted in real time to a cloud management platform via the 5G module.

2.2. Image Enhancement Strategy Based on LIME++

In the intelligent pest-monitoring equipment for apple orchards, image acquisition is often affected by uneven light distribution, vignetting from the circular plate structure, background reflection, and environmental noise, as shown in Figure 1c. These factors obscure insect details, distort color, and reduce overall visibility, thereby weakening the accuracy of subsequent detection and recognition. To address the problem of low-light and uneven-illumination image enhancement, the classical LIME (Low-light Image Enhancement) framework estimates an illumination map and applies adaptive gain guided by it [16,17]. This approach increases brightness in dark regions while preserving details in bright regions. However, the original LIME method remains limited in scenarios with structural lighting imbalance and complex color shifts [18].
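For orientation, the following is a minimal sketch of the classical LIME idea, assuming the common max-RGB illumination initialization and a guided filter standing in for LIME's structure-aware optimization; it is illustrative, not the authors' implementation (cv2.ximgproc requires the opencv-contrib-python package).

```python
import cv2
import numpy as np

def lime_like_enhance(img_bgr: np.ndarray, gamma: float = 0.6,
                      eps: float = 1e-3) -> np.ndarray:
    """Estimate an illumination map and apply illumination-guided gain."""
    img = img_bgr.astype(np.float32) / 255.0
    # Initial illumination estimate: per-pixel maximum over the channels.
    T = img.max(axis=2)
    # Structure-preserving refinement (guided filter stands in for
    # LIME's structure-aware optimization).
    T = cv2.ximgproc.guidedFilter(guide=T, src=T, radius=15, eps=1e-2)
    # Gamma-corrected illumination yields stronger gain in dark regions.
    T = np.clip(T, eps, 1.0) ** gamma
    out = img / T[..., None]
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```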
To address this challenge, an enhanced algorithm named LIME++ is developed. While preserving the fundamental framework of LIME, it introduces three specific improvements tailored to the characteristics of images captured by intelligent pest-monitoring equipment (a combined code sketch follows the list):
(1)
Based on multi-scale guided filtering, radial polynomial fitting was introduced to compensate for circular vignetting and to enhance large-scale illumination uniformity:
$$T_{rad}(r) = \sum_{k=0}^{K} \alpha_k r^k$$
where $T_{rad}(r)$ is the radial illumination approximation function, $r$ is the normalized radius, $\alpha_k$ are the fitting coefficients, and $K$ denotes the polynomial order. This modeling captures the monotonic attenuation pattern in circular imaging caused by light distribution and device structure, enabling effective compensation for large-scale vignetting effects.
(2)
An adaptive Gamma strategy with a dual brightness gating mechanism was designed to dynamically adjust gain, enhancing details in dark regions while suppressing overexposure in bright regions:
$$\gamma(x) = \gamma_{\min} + (\gamma_{\max} - \gamma_{\min})\,\big(1 - T_{ref}(x)\big)^{\eta}$$
where $\gamma(x)$ denotes the adaptive Gamma value, $T_{ref}(x)$ is the reference illumination, $\gamma_{\max}$ and $\gamma_{\min}$ are the maximum and minimum Gamma values, respectively, and $\eta$ is a nonlinear adjustment factor. This mapping ensures stronger enhancement for dark pixels, while bright regions gradually approach linear preservation. In addition, luminance constraints are applied before enhancement and after color restoration to prevent overexposure in bright areas and to maintain structural stability.
(3)
A safe color restoration factor was introduced, which applies bounded correction only to chromatic residuals, thereby avoiding color distortion caused by excessive luminance stretching:
$$F(x) = \exp\!\big(g \cdot (CR(x) - \overline{CR})\big)$$
where $F(x)$ denotes the color restoration factor, $CR(x)$ is the chromatic residual, $\overline{CR}$ is the mean chromatic residual of the entire image, and $g$ is the adjustment coefficient. This bounded correction prevents excessive stretching of luminance and ensures that the enhanced insect colors remain natural and stable.
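A minimal NumPy sketch of the three refinements above, assuming a precomputed illumination map T with values in [0, 1]; the function names and parameter values are illustrative, not the authors' tuned configuration.

```python
import numpy as np

def radial_illumination(T: np.ndarray, K: int = 4) -> np.ndarray:
    """(1) Fit T_rad(r) = sum_k alpha_k * r^k over the normalized radius
    to model circular vignetting."""
    h, w = T.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2.0, xx - w / 2.0)
    r = r / r.max()                              # normalized radius
    alpha = np.polyfit(r.ravel(), T.ravel(), K)  # fitting coefficients
    return np.polyval(alpha, r)                  # radial approximation T_rad

def adaptive_gamma(T_ref: np.ndarray, g_min: float = 1.0,
                   g_max: float = 2.2, eta: float = 2.0) -> np.ndarray:
    """(2) Pixel-wise Gamma map: larger for dark pixels, approaching g_min
    for bright pixels, following the dual brightness gating idea."""
    return g_min + (g_max - g_min) * (1.0 - T_ref) ** eta

def color_restoration(CR: np.ndarray, g: float = 0.3) -> np.ndarray:
    """(3) Bounded correction applied only to chromatic residuals."""
    return np.exp(g * (CR - CR.mean()))
```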
Overall, LIME++ significantly enhances robustness in three aspects—illumination modeling, dynamic gain regulation, and color constraint—allowing pest-monitoring images to achieve more balanced, natural, and stable visual performance under complex lighting conditions, as shown in Figure 1d. To validate its effectiveness, a systematic evaluation was conducted from four perspectives: illumination estimation, the radial illumination profile, the luminance histogram, and the RGB channel histograms.
As shown in Figure 1e, the illumination distribution of the original image is generally low and uneven. Figure 1f presents the enhanced illumination map, where overall brightness is improved and local overexposure is suppressed. Figure 1g shows the radial illumination profile. The enhanced curve (orange, Enh) lies above the original curve (blue, Orig) and is smoother, indicating a more uniform brightness distribution across the disk surface. The lifted tail of the curve demonstrates effective compensation of the vignette region, while the absence of sharp peaks confirms that high-luminance areas were not overstretched. Figure 1h displays the luminance histogram. After enhancement, the main peak shifts to the right, reflecting an increase in mean brightness, and no abnormal accumulation occurs in the highlight range. This indicates that dark regions are elevated under the “brightness gating” constraint, leading to improved global contrast without noticeable noise amplification. Figure 1i presents the RGB channel histograms. After enhancement, the channel distributions exhibit closer shapes and peak positions, and the inter-channel offset is reduced. This verifies that the “safe color restoration” effectively mitigates color shift and prevents oversaturation.

2.3. Construction of the Apple Orchard Pest Dataset

The apple orchard pest dataset was annotated with bounding boxes using the LabelImg tool. A total of 2100 images were selected from those collected by the intelligent pest-monitoring lamps, covering seven common pest species in apple orchards: Grapholita molesta (Oriental fruit moth), Carposina sasakii (Peach fruit borer), Mythimna separata (Armyworm), Helicoverpa armigera (Cotton bollworm), Spodoptera litura (Common cutworm), Cydia pomonella (Codling moth), and Adoxophyes orana (Summer fruit tortrix moth).
According to sample size and class distribution, the dataset was split into training, validation, and test subsets at a ratio of 7:2:1, containing 1470, 420, and 210 images, respectively, with no overlap across sets. To ensure spatiotemporal independence, images from the same monitoring site or time window were not mixed across training, validation, and test partitions. The training set was used for model parameter learning, the validation set for hyperparameter tuning and structural optimization, and the test set for final performance evaluation.
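As a sketch of the spatiotemporally independent split described above, a group-aware splitter keeps all images from one (site, time-window) group inside a single partition; the record/group structure is a hypothetical illustration, not the authors' tooling.

```python
from sklearn.model_selection import GroupShuffleSplit

def grouped_split(n_images: int, groups: list, seed: int = 0):
    """~7:2:1 split with no (site, time-window) group crossing partitions."""
    idx = list(range(n_images))
    # First cut: ~70% train vs. ~30% remainder, split by group.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=seed)
    train, rest = next(gss.split(idx, groups=groups))
    # Second cut: remainder into ~20% val and ~10% test (1/3 of the 30%).
    rest_groups = [groups[i] for i in rest]
    gss2 = GroupShuffleSplit(n_splits=1, test_size=1 / 3, random_state=seed)
    val_rel, test_rel = next(gss2.split(rest, groups=rest_groups))
    return train, rest[val_rel], rest[test_rel]
```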
To enhance model robustness and generalization, the built-in data augmentation strategies of YOLOv11 were applied during training, including Mosaic mixing, HSV color-space perturbation, random flipping, and scale jittering. These augmentations dynamically generate diverse visual samples, effectively mitigating potential overfitting caused by limited data volume and improving detection stability under varying illumination, background, and pest-density conditions.
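These augmentations correspond to standard Ultralytics training flags; the sketch below is hedged: the dataset YAML name is hypothetical, and the values shown are the library defaults rather than necessarily the authors' settings.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(
    data="apple_pests.yaml",            # hypothetical dataset config
    mosaic=1.0,                         # Mosaic mixing probability
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # HSV color-space perturbation
    fliplr=0.5,                         # random horizontal flip
    scale=0.5,                          # scale jittering range
)
```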

2.4. MDAS-YOLO Model Construction

2.4.1. YOLOv11n Algorithm

YOLOv11n, proposed by the Ultralytics team in 2024 as an iterative extension of YOLOv8, consists of three main components: Backbone, Neck, and Head. The Backbone employs the C3k2 module as its core unit, which extends the CSP-Bottleneck by introducing variable convolution kernels and a dual-branch parallel structure to strengthen multi-scale contextual modeling. In addition, the incorporation of SPPF and C2PSA modules further enlarges the receptive field and enhances spatial attention allocation. The Neck integrates multi-scale information through up-sampling and cross-layer feature fusion. The Head replaces the conventional convolution in the classification branch with depthwise separable convolution, thereby reducing computational cost while preserving detection accuracy [19].

2.4.2. MDAS-YOLO Overview

In the apple orchard pest images collected by intelligent pest-monitoring lamps, targets exhibit large scale variation, weak boundaries, and complex backgrounds. The original YOLOv11n shows limitations in adaptive multi-scale fusion and fine-grained representation. To address these issues, MDAS-YOLO was developed on the YOLOv11n framework, with the overall architecture shown in Figure 2. The main improvements are as follows:
(1)
A C3k2-MESA-DSM module was designed to replace the original C3k2 structure in the backbone. Through multi-scale decomposition, parallel edge modeling, and gated fusion, this module explicitly enhances high-frequency boundaries and fine-grained features of pests. Combined with the DSM module, it suppresses background redundancy, thereby improving the fidelity and separability of dense insect contours.
(2)
The neck was reconstructed by introducing the AP-BiFPN, which achieves learnable top-down and bottom-up weighted fusion. Together with differentiated pooling, this design simultaneously strengthens local edges and global context, significantly improving cross-scale semantic and detail complementarity for pests.
(3)
The SimAM module was embedded into the neck as a zero-parameter pixel-level recalibration operator. It highlights pest contours and microstructures while suppressing background redundancy, improving the stability and accuracy of small-object detection without increasing computational cost.

2.4.3. C3k2-MESA-DSM Module

In apple orchard pest images, small objects often exhibit blurred edges and dense distributions. As a result, the C3k2 module in the original YOLOv11n tends to lose fine-grained boundary information through repeated convolution and down-sampling, leading to missed and false detections of small pests. To address this issue, this research designed the C3k2-MESA-DSM module (C3k2–Multi-scale Edge Selection Enhancement and Dual-domain Selection Mechanism) on the basis of the C3k2 structure [20,21]. The core idea is to strengthen insect boundary features and suppress illumination and background noise through a strategy of “multi-scale edge selection enhancement–dual-domain feature screening–residual fusion,” thereby improving the robustness and accuracy of detection [22].
The MESA module aims to enhance fine-grained pest features through multi-scale edge amplification. As shown in Figure 3, parallel convolutions with different receptive fields (3 × 3, 6 × 6, 9 × 9, 12 × 12) are applied to the input feature map $F_{in}$ to capture both local contours and global context. The resulting outputs are adaptively fused to generate a multi-scale edge feature map $H_i$, which strengthens insect boundaries and highlights weak-edge regions.
To further balance the contributions of features at different scales, a lightweight gating mechanism is introduced:
$$E_i = \sigma\!\big(\mathrm{Conv}_{1\times 1}([H_i, S_i])\big) \cdot H_i + \big(1 - \sigma(\cdot)\big) \cdot S_i$$
where $\sigma(\cdot)$ denotes the Sigmoid function, $H_i$ denotes the edge-enhancement feature, and $S_i$ denotes the edge-selection feature. This mechanism dynamically balances the contributions of “explicit enhancement” and “adaptive selection” according to the complexity of the input features.
Finally, a 1 × 1 convolution performs channel compression and spatial alignment, producing edge-preserving, multi-scale features that serve as high-quality inputs for the subsequent DSM module.
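A minimal PyTorch sketch of this gated fusion, assuming the edge-enhancement feature H and the edge-selection feature S share the same shape; the module layout illustrates the gating formula above and is not the authors' exact code.

```python
import torch
import torch.nn as nn

class MESAGate(nn.Module):
    """Gated fusion of edge-enhancement (H) and edge-selection (S) features."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv on the concatenated pair predicts a per-pixel gate.
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Final 1x1 conv for channel compression and spatial alignment.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, H: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([H, S], dim=1)))
        E = g * H + (1.0 - g) * S   # balance enhancement vs. selection
        return self.proj(E)
```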

2.4.4. DSM Module

In the pest-monitoring lamp environment, collected insect images are often affected by background clutter and blurred features, which severely weaken the discriminative capacity of detection models. To address this issue, this research introduces a Dual-Domain Selection Mechanism (DSM), which consists of a Spatial Selection Module (SSM) and a Frequency Selection Module (FSM) [23,24]. As shown in Figure 4, the DSM highlights insect contours and fine-grained features through the joint effect of spatial saliency modeling and frequency component filtering, while effectively suppressing illumination and background interference [25].
Let the input feature be $F \in \mathbb{R}^{C \times H \times W}$. In the Spatial Selection Module (SSM), the feature map first undergoes pooling and convolution operations to obtain a compressed spatial representation. Depth-wise convolution is then applied to extract contextual information across spatial dimensions. The result is tiled and added element-wise to the original features, achieving a fusion of local and global spatial information. The spatially enhanced feature is expressed as:
$$F_{ssm} = F + \mathrm{Tile}\big(\mathrm{DWConv}(\mathrm{Conv}(\mathrm{Pool}(F)))\big)$$
On this basis, the spatially enhanced feature $F_{ssm}$ is fed into the Frequency Selection Module (FSM) for further refinement along the frequency dimension. Specifically, $F_{ssm}$ is expanded and broadcast to obtain frequency responses across channels. Channel-wise averaging is then performed to generate a frequency weight vector $W_f$, which is multiplied element-wise with the input, as follows:
$$F_{fsm} = W_f \odot F_{ssm}$$
where $\odot$ denotes element-wise multiplication. The FSM strengthens high-frequency information associated with insect edges and textures, while suppressing low-frequency redundancy caused by illumination reflections and homogeneous backgrounds.
Finally, DSM fuses the outputs of spatial and frequency processing and feeds them back to the backbone through a residual connection, producing the final output as follows:
$$F_{out} = F + \gamma \cdot F_{fsm}$$
where $\gamma$ is a learnable parameter. Through the complete process of “spatial saliency enhancement–frequency component selection–residual fusion,” DSM jointly models key high-frequency edge features across both spatial and frequency dimensions. This design preserves insect detail information while effectively suppressing background noise and redundant features, thereby substantially improving detection robustness in pest-monitoring lamp scenarios. In summary, the C3k2-MESA-DSM module adaptively models insect boundary information across multiple scales, mitigating edge-detail loss and background interference in complex monitoring environments. This provides strong support for accurate identification of apple orchard pests.
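A minimal PyTorch sketch of the spatial–frequency pipeline above; the pooling size, layer choices, and the interpolation used for the “Tile” step are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSM(nn.Module):
    """Dual-Domain Selection: spatial enhancement, frequency weighting,
    residual fusion (illustrative sketch)."""
    def __init__(self, channels: int, pooled: int = 7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SSM: pooled context, depth-wise conv, tiled back and added.
        ctx = self.dwconv(self.conv(self.pool(x)))
        ctx = F.interpolate(ctx, size=x.shape[-2:], mode="nearest")
        f_ssm = x + ctx
        # FSM: channel-wise frequency weights, multiplied element-wise.
        w_f = torch.sigmoid(f_ssm.mean(dim=(2, 3), keepdim=True))
        f_fsm = w_f * f_ssm
        # Residual fusion back to the backbone.
        return x + self.gamma * f_fsm
```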

2.4.5. AP-BiFPN Module

In apple orchard pest images collected by intelligent pest-monitoring lamps, large variations in insect scale, blurred boundaries, and complex backgrounds often occur. In the original YOLOv11n, the Neck relies on FPN/PAN strategies of up-sampling and concatenation, where fixed equal weights are assigned during multi-scale feature fusion. This prevents adaptive adjustment according to the information content and discriminability of branch features, leading to the attenuation of fine-grained details of small pests during cross-layer transmission and limiting detection accuracy.
To address this limitation, the AP-BiFPN (Adaptive Weighted Bidirectional Feature Pyramid Network) module is introduced into the Neck of YOLOv11n. AP-BiFPN extends BiFPN by incorporating adaptive max pooling and adaptive average pooling, enabling bidirectional adaptive fusion of multi-scale features [26]. The core design of AP-BiFPN is illustrated in Figure 5.
In terms of structural design, AP-BiFPN introduces differentiated pooling operators after weight assignment to enhance scale adaptability. High-resolution branches are processed by adaptive max pooling to strengthen edge responses and local extremes, while low-resolution branches are processed by adaptive average pooling to preserve global context and background structures. After pooling and weighted fusion, the features are alternately stacked in top-down and bottom-up pathways, enabling bidirectional complementarity between semantic information and fine-grained details.
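A minimal sketch of one AP-BiFPN fusion node: fast normalized learnable weights as in BiFPN [26], with adaptive max pooling on the high-resolution branch and average smoothing on the upsampled low-resolution branch. The three-level layout and layer details are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class APFusionNode(nn.Module):
    """Fuse high-res (P3), target (P4), and low-res (P5) features at P4."""
    def __init__(self, channels: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))  # learnable branch weights
        self.eps = eps
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, p3, p4, p5):
        size = p4.shape[-2:]
        # High-res branch: adaptive max pooling keeps edges/local extrema.
        x3 = F.adaptive_max_pool2d(p3, size)
        # Low-res branch: upsample, then average smoothing preserves context.
        x5 = F.avg_pool2d(F.interpolate(p5, size=size, mode="nearest"),
                          kernel_size=3, stride=1, padding=1)
        w = F.relu(self.w)
        w = w / (w.sum() + self.eps)          # fast normalized fusion
        return self.conv(w[0] * x3 + w[1] * p4 + w[2] * x5)
```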

2.4.6. SimAM Module

In apple orchard pest images collected under pest-monitoring lamps, insect targets are typically very small, exhibit diverse postures, and often have blurred edges [27]. These factors cause redundant background features to accumulate and propagate through multi-scale fusion in the Neck, thereby weakening the model’s ability to represent fine-grained insect structures. To address this issue, the Simple Attention Module (SimAM) is introduced into the Neck branches of YOLOv11n and placed between the final feature output and the detection head [28,29]. SimAM explicitly models key features in a parameter-free manner, as illustrated in Figure 6.
The core idea of SimAM is to construct an energy function based on channel statistics to measure the importance of each pixel. For an input feature map $X \in \mathbb{R}^{C \times H \times W}$, the mean and variance are computed along each channel, and the energy function is defined as:
$$E(x_{ijk}) = \frac{(x_{ijk} - \mu)^2}{4(\sigma^2 + \lambda)} + 0.5$$
where $x_{ijk}$ denotes the pixel value at position $(j, k)$ in channel $i$, $\mu$ and $\sigma^2$ are the channel mean and variance, and $\lambda$ is a stabilization constant. Applying the Sigmoid function to $E$ produces a three-dimensional weight tensor $A = \sigma(E)$, which is broadcast to match the dimension of the input feature map. The final enhanced representation is obtained by element-wise multiplication between the input and the weight tensor:
$$X' = X \odot A$$
where $X'$ denotes the enhanced feature map, and $\odot$ indicates element-wise multiplication.
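A minimal sketch of this zero-parameter recalibration, consistent with the public SimAM formulation [27]; λ is the stabilization constant.

```python
import torch

def simam(x: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    """Energy-based pixel weighting: X' = X ⊙ sigmoid(E), no parameters."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2  # (x - mu)^2
    v = d.sum(dim=(2, 3), keepdim=True) / n          # channel variance sigma^2
    e = d / (4 * (v + lam)) + 0.5                    # per-pixel energy
    return x * torch.sigmoid(e)                      # broadcast multiply
```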
In summary, SimAM maintains model lightweightness and real-time performance, while explicitly emphasizing high-frequency details such as insect contours, antennae, and textures. At the same time, it effectively suppresses background interference and low-contrast regions under pest-monitoring lamp conditions, thereby improving the accuracy and robustness of small-object pest detection.

2.4.7. Module Synergy and Feature-Flow Mechanism

MDAS-YOLO establishes a comprehensive detection pipeline in which all modules exhibit strong synergistic interactions. The LIME++ module first performs illumination modeling and contrast enhancement on the input images, providing high-quality visual inputs for subsequent feature extraction. Subsequently, the C3k2-MESA-DSM module conducts multi-scale edge enhancement at the early stage, effectively preserving the contours and fine-grained textures of small targets. The AP-BiFPN module further achieves adaptive fusion of semantic and detailed features across scales, enriching the hierarchical feature representation. Finally, the SimAM module performs pixel-level saliency recalibration prior to prediction, thereby enhancing feature focus under a lightweight configuration. Collectively, these modules form a progressive enhancement pathway from input to output, ensuring unified improvement in visual quality, feature integrity, and detection robustness.

3. Results

3.1. Experimental Setup and Hyperparameter Optimization

The experiments were conducted on a Windows 10 (64-bit) operating system with an Intel® Xeon® Silver 4210R processor and an NVIDIA GeForce RTX 3090 GPU, using PyTorch 2.4.0 as the deep learning framework. GPU parallel acceleration was implemented with CUDA 12.1 to ensure efficient and stable network training.
To improve the convergence speed and generalization capability of the MDAS-YOLO network in apple orchard pest recognition, a transfer learning strategy was adopted. Specifically, weights pre-trained on the COCO dataset were used as initialization parameters and fine-tuned with the apple orchard pest dataset, enabling the transfer of generic feature representations while adapting to the target task distribution. To determine the optimal training hyperparameters, different parameter combinations were systematically compared and analyzed under unified experimental conditions, based on their performance on the validation set. The final optimal training configuration was obtained, as summarized in Table 1.
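A hedged sketch of the transfer-learning setup via the Ultralytics API: COCO-pretrained YOLOv11n weights are loaded and fine-tuned on the orchard dataset. The dataset YAML and hyperparameter values below are placeholders; the tuned configuration is the one reported in Table 1.

```python
from ultralytics import YOLO

# COCO-pretrained weights serve as the initialization.
model = YOLO("yolo11n.pt")
model.train(
    data="apple_pests.yaml",    # hypothetical dataset config
    epochs=300,                 # placeholder; see Table 1
    batch=16, imgsz=640,        # placeholder; see Table 1
    lr0=0.01, optimizer="SGD",  # placeholder; see Table 1
    device=0,
)
```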

3.2. Experimental Evaluation Metrics

To evaluate the recognition performance of the MDAS-YOLO model, precision (P), recall (R), mean average precision (mAP), and the number of parameters were selected as the primary metrics for apple orchard pest identification. Average precision (AP) denotes the recognition accuracy for a single pest category, measured as the area under the precision-recall curve, while mAP is the average of AP across all categories. In addition, GFLOPs was introduced to quantify the computational cost of the model, and FPS (frames per second) was used to assess inference speed, representing the number of images processed per unit time.
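For reference, these metrics can be read off the Ultralytics validator; the sketch below assumes hypothetical weight and dataset file names.

```python
from ultralytics import YOLO

model = YOLO("mdas_yolo_best.pt")  # hypothetical trained weights
metrics = model.val(data="apple_pests.yaml", split="test")
print("P     :", metrics.box.mp)     # mean precision over classes
print("R     :", metrics.box.mr)     # mean recall over classes
print("mAP50 :", metrics.box.map50)  # mAP at IoU 0.50
print("Speed :", metrics.speed)      # ms per image (pre/inference/post)
```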

3.3. Contribution of LIME++ Preprocessing

To quantify the independent contribution of the LIME++ enhancement algorithm, a control experiment was conducted on the YOLOv11n baseline under identical training and inference settings, with and without LIME++ applied. The comparative results are shown in Table 2.
As shown in Table 2, enabling LIME++ leads to gains of 2.2 percentage points in precision, 0.87 in recall, and 2.59 in mAP, demonstrating that the enhancement module effectively improves the visual quality and feature distinguishability of lamp-trap pest images. The improvement mainly results from the illumination modeling and contrast stabilization functions of LIME++, which enhance contour clarity and intensity uniformity, thereby facilitating subsequent feature extraction.
Note that all subsequent experiments (including MDAS-YOLO and all baseline models) were trained and evaluated on datasets uniformly preprocessed with LIME++, ensuring fair comparison under identical input conditions.

3.4. Experiments with Different Models

To verify the advantages of the proposed MDAS-YOLO model for apple orchard pest recognition, experiments were conducted on the self-built pest image dataset collected by intelligent pest-monitoring lamps. Under identical experimental conditions, MDAS-YOLO was compared with mainstream lightweight detectors (YOLOv5n, YOLOv8n, YOLOv10n, YOLOv11n), the classical two-stage method (Faster R-CNN), the single-stage baseline (SSD), and the Transformer-based paradigm model (RT-DETR). The results are summarized in Table 3.
As shown in Table 3, the proposed MDAS-YOLO model achieved the best overall performance on the self-built apple orchard pest dataset, with an mAP of 95.68%. Compared with YOLOv11n, the mAP increased by 6.97 percentage points, with only a modest increase in parameters and computational cost. Relative to lightweight baselines such as YOLOv5n, YOLOv8n, and YOLOv10n, MDAS-YOLO substantially reduced the miss-detection rate for dense, small-scale, and weak-boundary pest samples. In contrast, Faster R-CNN and SSD, constrained by region proposal mechanisms and fixed fusion strategies, struggled to suppress interference under complex backgrounds. Although RT-DETR benefits from global modeling, its high computational overhead failed to yield superior accuracy in weak-texture, small-object scenarios.
The performance improvement of MDAS-YOLO mainly stems from the synergistic effect of three structural innovations: the C3k2-MESA-DSM in the backbone explicitly enhances insect edges and fine-grained textures while suppressing redundancy through dual-domain selection; the AP-BiFPN in the neck adaptively fuses cross-scale semantics and details via learnable weighting; and the SimAM module before the detection head recalibrates pixel-level features in a parameter-free manner, further suppressing noise. Together, these components form a closed-loop mechanism of “edge enhancement–multi-scale fusion–end-stage recalibration,” enabling the model to maintain strong discriminative ability and real-time performance even under the scale variation, boundary weakening, and dense adhesion conditions typical of pest-monitoring lamp scenarios.
To intuitively illustrate the differences among detection models in terms of accuracy, speed, and complexity, the performance comparison was further visualized using a bubble chart and a bar chart, as shown in Figure 7.

3.5. Experiments on Embedding C3k2 at Different Positions

To further verify the optimal embedding position of the C3k2-MESA-DSM module within the YOLOv11n architecture, three comparative experiments were designed based on the original YOLOv11n model. The module was applied separately to the Backbone, to the Neck, and to both Backbone and Neck simultaneously. All experiments were conducted under identical training configurations and datasets. The results are presented in Table 4.
As shown in Table 4, embedding the C3k2-MESA-DSM module into the Backbone yields the best results, with precision, recall, and mAP reaching 90.73%, 88.81%, and 91.99%, respectively. These values outperform the other schemes, while the computational cost only increases from 6.5 to 6.8 GFLOPs, and the model size remains at 5.92 MB, maintaining a favorable balance between complexity and performance. In comparison, embedding into the Neck improved edge responses to some extent, but the overall mAP was only 89.03%, indicating that edge modeling introduced solely at high-level abstract features provides limited gain. Embedding into both Backbone and Neck further raised the mAP to 90.21%; however, the computational cost rose to 7.1 GFLOPs, with diminishing marginal benefits.
In summary, Backbone embedding is optimal because apple orchard pest images are generally characterized by small scale, blurred boundaries, and dense distributions. Enhancing insect contours and fine-grained information through multi-scale edge modeling and dual-domain selection at the early stage of feature extraction maximizes the preservation of boundary information during down-sampling. By contrast, at the Neck stage, features are already highly abstract, and the module provides limited additional benefit, while combined use introduces redundant computation that compromises the trade-off between accuracy and efficiency. These findings confirm that embedding the C3k2-MESA-DSM module in the Backbone not only fully exploits its advantages in edge enhancement and background suppression but also achieves higher detection accuracy and robustness while maintaining lightweight efficiency.
To provide an intuitive comparison of the performance differences caused by embedding C3k2 at different positions, a bar chart visualization was employed, as shown in Figure 8.

3.6. Comparative Experiments on Different Attention Mechanisms

To verify the advantage of the SimAM attention mechanism in the MDAS-YOLO model, several comparative experiments were conducted based on the YOLOv11n baseline. Specifically, the ECA, MetaSA, CBAM, and CAA attention mechanisms were individually embedded between the outputs of the three Neck branches and the detection head. Their performance in apple orchard pest detection was then compared. All experiments were performed under the same dataset and training configurations. The results are summarized in Table 5.
As shown in Table 5, SimAM achieved the best performance in precision, recall, and mAP with almost no increase in parameters or computational cost, improving on the YOLOv11n baseline by 2.53, 2.89, and 1.57 percentage points, respectively. This highlights the advantage of SimAM in constructing a pixel-level energy function from channel statistics. By performing fine-grained saliency recalibration within the feature map, it effectively amplifies high-frequency details such as insect contours and antennae while suppressing the homogeneous background commonly present in pest-monitoring lamp images. Consequently, SimAM demonstrates high robustness in scenarios involving small-scale targets, blurred boundaries, and dense distributions.
In comparison, MetaSA and CAA achieved relatively higher recall, indicating that their dynamic spatial modeling and coordinate-guided mechanisms are beneficial for retrieving weak targets. However, they incur additional overhead of 0.08–0.10 M parameters and 0.15–0.20 GFLOPs, making their trade-off between computation and accuracy less favorable than SimAM. CBAM achieved accuracy close to that of SimAM through joint channel–spatial modeling, but its extra convolutional operations resulted in an increase of 0.14 M parameters and 0.30 GFLOPs, significantly reducing real-time efficiency. ECA achieved slight gains with minimal overhead, confirming the potential of fast channel recalibration in lightweight models. In summary, within pest-monitoring lamp scenarios, SimAM delivers the best detection performance with zero parameters and zero additional computational cost, demonstrating a superior balance between lightweight design and high-accuracy detection.
To provide a clearer visualization of the performance trends across different attention mechanisms, a radar chart was employed, as shown in Figure 9.

3.7. Comparative Experiments on Different Feature Fusion Modules

To systematically evaluate the impact of neck feature fusion strategies on small-object pest detection, this study conducted comparative experiments within the YOLOv11n framework, where AP-BiFPN was compared with five representative fusion modules, including PANet, ASFF, NAS-FPN, HRFPN, and BiFPN. All experiments were performed under the same dataset and training configuration, and the results are presented in Table 6.
As shown in Table 6, AP-BiFPN achieved the best performance in terms of P, R, and mAP, reaching 90.16%, 88.60%, and 90.83%, respectively. Compared with the baseline network, it required only an additional 0.4 M parameters and 0.8 GFLOPs, thereby achieving a favorable balance between accuracy and computational complexity. Its advantage primarily lies in the differentiated pooling and learnable weighting mechanism, which enhances insect body edges and fine-grained textures in high-resolution branches while preserving contextual information in low-resolution branches, thus significantly improving cross-scale feature complementarity.
By contrast, PANet alleviated the information flow bottleneck through a bottom-up path but still relied on equal-weight concatenation, which diluted small-object features. ASFF improved Recall under dense occlusion scenarios through spatial adaptive weighting, but at the cost of increased computational overhead. NAS-FPN enhanced mAP to 90.22% by leveraging automatically searched topologies, but this came with a substantial increase in parameters and GFLOPs, resulting in poor accuracy–efficiency trade-offs. HRFPN preserved high-resolution streams to improve the detection of blurred-boundary objects but significantly increased computational cost. BiFPN showed stable performance in channel-level weighted fusion, yet it was slightly inferior to AP-BiFPN in maintaining edge details and detecting small objects.
Overall, AP-BiFPN demonstrated the most favorable trade-off between computation and accuracy on orchard pest images collected by the intelligent monitoring device. It improved the detection accuracy of small-scale, weak-boundary, and densely clustered pests, while maintaining only a moderate increase in parameters and computational cost.
To provide an intuitive comparison of AP-BiFPN with the five representative fusion modules, radar charts were used for visualization, as shown in Figure 10.

3.8. Visualization Experiments of MDAS-YOLO

Due to the complexity of backgrounds in images collected by pest-monitoring equipment, which often contain background clutter, non-target insects, and occlusion or overlap among multiple pests, detection is subject to significant interference. To comprehensively evaluate model robustness under such conditions, the detection results of MDAS-YOLO and the comparative models were visualized, as shown in Figure 11.
The results demonstrate that MDAS-YOLO can accurately recognize and localize small-scale pest individuals under challenging conditions such as adhesion, occlusion, and background clutter. In multi-insect overlapping scenarios, the enhancement of edges and fine-grained textures by the C3k2-MESA-DSM enables the model to effectively distinguish individual boundaries and maintain detection integrity. Under low-light or blurred conditions, the pixel-level recalibration of SimAM further suppresses redundant features, significantly improving insect saliency and discriminability. In summary, MDAS-YOLO exhibits superior detection performance in complex backgrounds, small-scale targets, and densely adhesive scenarios, validating the effectiveness of the design paradigm of “early edge enhancement, cross-scale adaptive fusion, and end-stage lightweight recalibration.” These findings highlight the stability and adaptability of the model in practical applications and provide reliable support for high-precision pest monitoring with intelligent pest-monitoring lamps in complex field environments.

3.9. Visual Interpretation of Model Attention

To further elucidate the internal mechanism and synergistic behavior of MDAS-YOLO, Grad-CAM visualization was employed to compare attention responses between the baseline YOLOv11n and the proposed MDAS-YOLO, as shown in Figure 12.
The baseline YOLOv11n often produces dispersed activations that partially cover insects and extend to nearby specks, which leads to missed detections or false positives in dense scenes. In contrast, MDAS-YOLO yields compact and discriminative activation regions closely aligned with insect bodies and boundaries. This indicates that C3k2-MESA-DSM strengthens early edge/texture cues of tiny insects, AP-BiFPN performs selective cross-scale fusion that emphasizes informative scales while suppressing clutter, and SimAM applies pixel-level recalibration to attenuate spurious activations without extra parameters. Taken together, these complementary effects manifest a superadditive synergy: the combined model focuses more precisely and robustly than the sum of individual modules, explaining the reduction in false positives and the emergence of additional true positives in challenging lamp-trap scenarios.

3.10. Ablation Study

To verify the independent contributions and synergistic effects of each improved module in MDAS-YOLO, ablation experiments were conducted on the YOLOv11n baseline. The analysis focused on the impacts of the C3k2-MESA-DSM, AP-BiFPN, and SimAM modules on model performance. All experiments were performed under identical training environments and hyperparameter configurations. The results are presented in Table 7.
The experimental results indicate that all three modules achieved significant improvements over the YOLOv11n baseline when used individually, though with distinct performance differences. The C3k2-MESA-DSM provided the largest single-module gain, increasing mAP by 3.28 percentage points. This improvement stems from explicit reinforcement of insect edges and fine-grained textures at the early Backbone stage, combined with dual-domain selection to suppress redundant background features, thereby delivering higher feature fidelity for subsequent fusion. The AP-BiFPN improved mAP by 2.12 percentage points through learnable weighting and differentiated pooling, which enhance semantic–detail complementarity across scales. Notably, this was achieved with only an additional 0.40 M parameters and 0.80 GFLOPs, striking a favorable balance between accuracy and computational cost. SimAM raised mAP by 1.57 percentage points; although the gain was smaller, its parameter-free and computation-free design allows stable enhancement of insect saliency while preserving real-time performance.
For dual-module combinations, C3k2-MESA-DSM + AP-BiFPN achieved an mAP of 93.90%, exceeding the linear sum of their individual improvements, thereby demonstrating a superadditive effect of cross-layer edge fidelity and adaptive fusion. The C3k2-MESA-DSM + SimAM combination raised mAP to 92.40%, though the marginal gain was smaller, suggesting partial redundancy between early-stage edge enhancement and end-stage recalibration. However, this redundancy is beneficial rather than detrimental, acting as a double validation mechanism that reinforces consistent attention on insect boundaries and suppresses background artifacts. This complementary overlap improves stability in dense lamp-trap scenes without introducing significant computational cost, indicating that both modules contribute distinct yet cooperative refinements to feature integrity. The AP-BiFPN + SimAM combination reached 91.95% mAP, close to the linear sum, confirming the complementary nature of mid-level fusion and end-stage recalibration.
Finally, integrating all three modules yielded MDAS-YOLO with an mAP of 95.68%, significantly outperforming both single- and dual-module configurations and far exceeding linear addition. These results confirm that the closed-loop design of “early edge fidelity, mid-level cross-scale adaptive fusion, and end-stage lightweight recalibration” effectively mitigates missed and false detections under small-scale, weak-boundary, and densely distributed conditions in pest-monitoring lamp images. This synergistic design achieves an optimal trade-off between accuracy, model complexity, and real-time performance, demonstrating strong potential for high-precision pest monitoring in intelligent agricultural applications.

4. Discussion

In apple orchard pest images collected by intelligent pest-monitoring lamps, small scale, weak boundaries, and dense distributions remain the key factors limiting detection accuracy. Previous studies have explored these challenges. For example, Xiang et al. proposed Yolo-Pest [10], which introduced ConvNext and SE attention to improve the robustness of small-object detection under complex backgrounds; Tian et al. [12] developed MD-YOLO, which enhanced feature representation through multi-scale dense connections. However, these approaches still show limitations in edge fidelity and cross-scale adaptability, offering only limited gains for weak-texture or adhesive targets.
The proposed MDAS-YOLO addresses these issues through a closed-loop optimization path. The C3k2-MESA-DSM in the Backbone explicitly enhances edges and applies dual-domain selection, effectively preserving insect contours and fine-grained features. The AP-BiFPN in the Neck achieves dynamic semantic–detail complementarity through learnable weighting and differentiated pooling, mitigating the dilution of small-object information during cross-layer transmission. The SimAM module before the detection head performs pixel-level recalibration in a parameter-free manner, suppressing background interference while maintaining lightweight efficiency. Combined with LIME++ image enhancement at the input stage, these modules validate the effectiveness of the collaborative paradigm of “early edge fidelity, mid-level adaptive fusion, and end-stage lightweight recalibration” for orchard pest detection.
Nevertheless, this study has certain limitations. The dataset was primarily collected from apple orchards in Shandong Province, with limited environmental and species diversity, which may restrict the model’s generalization across different ecological conditions. It is also noteworthy that the LIME++ enhancement performs effectively under the illumination and imaging conditions of Shandong orchards, but domain-shift risks may arise under different lighting spectra, seasonal environments, or monitoring devices. In future work, we plan to expand data collection to multiple regions, seasons, and devices, and to assess the robustness and transferability of both the LIME++ enhancement and the detection modules under diverse environmental conditions.
In addition, the modular architecture of MDAS-YOLO demonstrates strong adaptability and scalability. Its multi-scale edge-fidelity (C3k2-MESA-DSM), adaptive cross-scale fusion (AP-BiFPN), and pixel-level recalibration (SimAM) mechanisms are task-generic rather than crop-specific, allowing easy adaptation to various orchard types and monitoring devices. Future studies will further evaluate the model’s transferability and robustness under diverse crop types and varying lighting, canopy, and device conditions, providing a foundation for cross-crop applications in intelligent pest-monitoring systems.

5. Conclusions

This study addresses the challenges of detecting small-scale, weak-boundary, and densely distributed pests in apple orchard images collected by intelligent pest-monitoring lamps. A detection network, MDAS-YOLO, is proposed for intelligent pest-monitoring equipment. In the Backbone, the C3k2-MESA-DSM module is embedded to achieve early edge fidelity and dual-domain feature selection. In the Neck, the AP-BiFPN enables adaptive fusion of cross-scale features. Before the detection head, the SimAM module performs lightweight pixel-level recalibration. Combined with the LIME++ image enhancement algorithm, the model effectively mitigates interference from low illumination and uneven lighting. Experimental results show that MDAS-YOLO achieved an mAP of 95.68% on the self-built apple orchard pest dataset, striking a favorable balance between detection accuracy and model complexity.
In conclusion, MDAS-YOLO substantially enhances the detection of small-scale, weak-boundary, and densely distributed pests, providing an efficient, robust, and scalable solution for intelligent pest monitoring in Shandong apple orchards. Beyond technical performance, this work contributes to sustainable pest management and smart orchard development by enabling early warning, real-time monitoring, and reduced reliance on chemical pesticides. To further strengthen the external validity of MDAS-YOLO, future studies will test the model on cross-regional and multi-season datasets to assess its robustness under diverse ecological conditions. Additionally, we plan to deploy the model on embedded edge devices and integrate it with orchard IoT systems, promoting intelligent, energy-efficient, and eco-friendly pest monitoring in modern agriculture.

Author Contributions

Conceptualization, B.M., H.Z. and J.W.; methodology, B.M., J.X. and R.L.; software, B.M. and J.M.; validation, B.M., B.L. and R.X.; formal analysis, B.M. and J.X.; investigation, B.M., J.M. and R.L.; resources, S.L., X.H. and Y.Z.; data curation, B.M. and R.X.; writing—original draft preparation, B.M.; writing—review and editing, H.Z. and J.W.; visualization, B.M.; supervision, H.Z. and J.W.; project administration, H.Z. and J.W.; funding acquisition, H.Z. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong-Chongqing Science and Technology Collaboration Project, the National Natural Science Foundation of China (32472014 and 32071908), the China Agriculture Research System (CARS-27), the Shandong Province Key R&D Plan (2023TZXD061), the Shandong Province “University Youth Innovation Team” Program (2023KJ160), and the Young Talent of Lifting Engineering for Science and Technology in Shandong (SDAST2024QTA050).

Data Availability Statement

The dataset, training/inference code, and model weights that support the findings of this study are available from the corresponding author upon reasonable request for non-commercial academic research purposes.

Conflicts of Interest

Author Xianliang Hu was employed by Shandong Xiangchen Technology Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. National Bureau of Statistics of China. China Statistical Yearbook 2023; National Bureau of Statistics of China: Beijing, China, 2023.
  2. Zhai, Z.; Cao, Y.; Xu, H.; Yuan, P.; Wang, H. Review of key techniques for crop disease and pest detection. Trans. Chin. Soc. Agric. Mach. 2021, 52, 1–18. [Google Scholar] [CrossRef]
  3. Wang, R.; Li, R.; Chen, T.; Zhang, J.; Xie, C.; Qiu, K.; Chen, P.; Du, J.; Chen, H.; Shao, F.; et al. An automatic system for pest recognition and forecasting. Pest Manag. Sci. 2022, 78, 711–721. [Google Scholar] [CrossRef] [PubMed]
  4. Guo, B.; Wang, J.; Guo, M.; Chen, M.; Chen, T.; Miao, Y. Overview of pest detection and recognition algorithms. Electronics 2024, 13, 3008. [Google Scholar] [CrossRef]
  5. Edson, B.; Helena, M.; Helio, P.; Avila, S. Weakly supervised attention-based models using activation maps for citrus mite and insect pest classification. Comput. Electron. Agric. 2022, 195, 106839. [Google Scholar] [CrossRef]
  6. Chathurika, D.A.; Nisal, M.R.; John, G.; Dorin, A. Fine-grained image classification of microscopic insect pest species: Western Flower thrips and Plague thrips. Comput. Electron. Agric. 2022, 203, 107462. [Google Scholar] [CrossRef]
  7. Pattnaik, G.; Shrivastava, V.; Parvathi, K. Transfer Learning-Based Framework for Classification of Pest in Tomato Plants. Appl. Artif. Intell. 2020, 34, 981–993. [Google Scholar] [CrossRef]
  8. Turkoglu, M.; Hanbay, D.; Sengur, A. Multi-model LSTM-based convolutional neural networks for detection of apple diseases and pests. J. Ambient Intell. Humaniz. Comput. 2022, 13, 3335–3345. [Google Scholar] [CrossRef]
  9. Lin, X.; Zhu, S.; Zhang, J.; Liu, D. Rice Planthopper Image Classification Method Based on Transfer Learning and Mask R-CNN. Trans. Chin. Soc. Agric. Mach. 2019, 50, 201–207. [Google Scholar] [CrossRef]
  10. Xiang, Q.; Huang, X.; Huang, Z.; Chen, X.; Cheng, J.; Tang, X. Yolo-Pest: An Insect Pest Object Detection Algorithm via CAC3 Module. Sensors 2023, 23, 3221. [Google Scholar] [CrossRef]
  11. Li, K.; Wang, J.; Jalil, H.; Wang, H. A fast and lightweight detection algorithm for passion fruit pests based on improved YOLOv5. Comput. Electron. Agric. 2023, 204, 107534. [Google Scholar] [CrossRef]
  12. Tian, Y.; Wang, S.; Li, E.; Yang, G.; Liang, Z.; Tan, M. MD-YOLO: Multi-scale Dense YOLO for small target pest detection. Comput. Electron. Agric. 2023, 213, 108233. [Google Scholar] [CrossRef]
  13. Chen, J.; Ma, A.; Huang, L.; Su, Y.; Li, W.; Zhang, H.; Wang, Z. GA-YOLO: A lightweight YOLO model for dense and occluded grape target detection. Horticulturae 2023, 9, 443. [Google Scholar] [CrossRef]
  14. Zhang, W.; Jiang, F. AHN-YOLO: A Lightweight Tomato Detection Method for Dense Small-Sized Features Based on YOLO Architecture. Horticulturae 2025, 11, 639. [Google Scholar] [CrossRef]
  15. Luo, R.; Zhao, R.; Ding, X.; Peng, S.; Cai, F. High-Precision Complex Orchard Passion Fruit Detection Using the PHD-YOLO Model Improved from YOLOv11n. Horticulturae 2025, 11, 785. [Google Scholar] [CrossRef]
  16. Guo, X.; Li, Y.; Ling, H. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 2016, 26, 982–993. [Google Scholar] [CrossRef]
  17. Guo, X. LIME: A method for low-light image enhancement. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 87–91. [Google Scholar] [CrossRef]
  18. Wu, T.; Glavin, M.; Jones, E. An Efficient Illumination Estimation Algorithm for Low-light Image Enhancement. IEEE Access 2025, 13, 102848–102858. [Google Scholar] [CrossRef]
  19. Hou, P.; Huang, S. BCSM-YOLO: An improved product package recognition algorithm for unmanned retail stores based on YOLOv11. IEEE Access 2025, 13, 139665–139679. [Google Scholar] [CrossRef]
  20. Zhu, R.; Wang, X.; Wu, H.; Liu, S.; Huang, J.; Li, J.; Wang, H. Real-time detection method for potato surface defects based on YOLOv11-MML. Trans. Chin. Soc. Agric. Eng. 2025, 41, 117–126. [Google Scholar] [CrossRef]
  21. Li, P.; Wen, M.; Zeng, Z.; Tian, Y. Cherry Tomato Bunch and Picking Point Detection for Robotic Harvesting Using an RGB-D Sensor and a StarBL-YOLO Network. Horticulturae 2025, 11, 949. [Google Scholar] [CrossRef]
  22. Guo, J.; Huang, X. MFDF-YOLO: A Lightweight Cotton Detection Algorithm for Complex Field Environments. Comput. Eng. Appl. 2025, 1–13. Available online: https://link.cnki.net/urlid/11.2127.tp.20250821.1610.015 (accessed on 3 September 2025).
  23. Cui, Y.; Ren, W.; Cao, X.; Knoll, A. Focal network for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 13001–13011. [Google Scholar] [CrossRef]
  24. Cai, S.; Zhou, X.; Cai, W.; Wei, L.; Mo, Y. Lightweight underwater object detection method based on multi-scale edge information selection. Sci. Rep. 2025, 15, 27681. [Google Scholar] [CrossRef]
  25. Li, J.; Xu, Y.; Lu, J.; Li, S.; Cai, X. Lightweight multi-scale rice pest recognition model based on improved YOLOv11n. Trans. Chin. Soc. Agric. Eng. 2025, 41, 175–183. Available online: https://link.cnki.net/urlid/11.2047.S.20250819.1550.002 (accessed on 3 September 2025).
  26. Tan, M.; Pang, R.; Le, Q. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar] [CrossRef]
  27. Yang, L.; Zhang, R.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
  28. Yang, D.; Huang, Z.; Zheng, C.; Chen, H.; Jiang, X. Detecting tea shoots using improved YOLOv8n. Trans. Chin. Soc. Agric. Eng. 2024, 40, 165–173+313. [Google Scholar] [CrossRef]
  29. Yu, Y.; Li, D.; Song, S.; You, H.; Zhang, L.; Li, J. Ginseng-YOLO: Integrating Local Attention, Efficient Downsampling, and Slide Loss for Robust Ginseng Grading. Horticulturae 2025, 11, 1010. [Google Scholar] [CrossRef]
Figure 1. Pest collection and image enhancement in apple orchards. (a) Intelligent pest-monitoring equipment; (b) sampling sites; (c) raw collected image; (d) image enhanced by LIME++; (e) original illumination map; (f) enhanced illumination map; (g) radial illumination profile; (h) luminance histogram; (i) RGB channel histograms.
Figure 2. Overall architecture of the MDAS-YOLO model.
Figure 3. The C3k2-MESA-DSM module.
Figure 4. The DSM module.
Figure 5. Architecture of the AP-BiFPN module.
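As context for the AP-BiFPN in Figure 5: the module builds on BiFPN's learnable weighted fusion [26]. The PyTorch sketch below shows only that underlying fast-normalized-fusion idea, not the paper's full AP-BiFPN with differentiated pooling; the class name and epsilon value are illustrative choices.

    import torch
    import torch.nn as nn

    class FastNormalizedFusion(nn.Module):
        # Learnable, non-negative weighted sum of same-shape feature maps,
        # following BiFPN's "fast normalized fusion" [26]. Illustrative
        # sketch only; AP-BiFPN adds differentiated pooling on top.
        def __init__(self, num_inputs, eps=1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(num_inputs))
            self.eps = eps

        def forward(self, feats):
            # feats: list of tensors, all shaped (B, C, H, W)
            w = torch.relu(self.w)        # clamp weights to be non-negative
            w = w / (w.sum() + self.eps)  # normalize without softmax
            return sum(wi * fi for wi, fi in zip(w, feats))

    # usage: fuse two aligned pyramid levels
    # fuse = FastNormalizedFusion(num_inputs=2)
    # p4_out = fuse([p4_in, p4_topdown])

ReLU plus sum-normalization keeps the fused weights non-negative at a fraction of the cost of softmax, which is why BiFPN adopts it for repeated cross-scale fusion.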
Figure 6. Architecture of the SimAM module.
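The SimAM block in Figure 6 is the parameter-free attention of ref. [27]; because it derives a per-pixel energy analytically, it adds no learnable weights, which matches the unchanged Params and GFlops in Table 5. A minimal PyTorch sketch of that formulation follows (the function name is ours; the default lambda of 1e-4 follows the original paper):

    import torch

    def simam(x, e_lambda=1e-4):
        # Parameter-free SimAM re-calibration [27]: each activation is
        # scaled by the sigmoid of its inverse energy, so pixels that
        # stand out from their channel mean are amplified.
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n            # per-channel variance
        e_inv = d / (4 * (v + e_lambda)) + 0.5             # inverse energy
        return x * torch.sigmoid(e_inv)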
Figure 7. Comprehensive performance comparison of different models: (a) mAP–FPS bubble chart; (b) parameters comparison; (c) GFlops comparison; (d) model size comparison.
Figure 8. Performance comparison of C3k2 embedding at different positions: (a) comparison of P, R, and mAP; (b) comparison of model size and GFlops.
Figure 9. Comparative performance of different attention mechanisms integrated into YOLOv11n. (a) Detection accuracy comparison in terms of mAP, R and P; (b) model complexity comparison including model size, GFLOPs and Params.
Figure 10. Performance comparison of different feature fusion modules. (a) AP-BiFPN vs. baseline YOLOv11n; (b) AP-BiFPN vs. PANet; (c) AP-BiFPN vs. ASFF; (d) AP-BiFPN vs. NAS-FPN; (e) AP-BiFPN vs. HRFPN; (f) AP-BiFPN vs. BiFPN.
Figure 11. Visualization experiments of MDAS-YOLO.
Figure 12. Visual interpretation of model attention.
Table 1. Model hyperparameter settings.

Parameter               Value
Input image size        640 × 640
Initial learning rate   0.01
Momentum                0.937
Optimizer               SGD
Batch size              8
Weight decay            0.0005
Epochs                  300
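For reproducibility, the Table 1 settings map directly onto an Ultralytics-style training call, as in the hedged sketch below; the dataset YAML path is a placeholder, and the stock YOLOv11n weights stand in for the custom MDAS-YOLO definition, which is not publicly packaged.

    from ultralytics import YOLO

    # Sketch of the Table 1 training configuration with the Ultralytics
    # trainer; "apple_pests.yaml" is a placeholder dataset config.
    model = YOLO("yolo11n.pt")
    model.train(
        data="apple_pests.yaml",  # placeholder path to the dataset YAML
        imgsz=640,                # input image size 640 x 640
        epochs=300,
        batch=8,
        optimizer="SGD",
        lr0=0.01,                 # initial learning rate
        momentum=0.937,
        weight_decay=0.0005,
    )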
Table 2. Comparative results of YOLOv11n with and without LIME++ preprocessing.

Model      LIME++   P/%     R/%     mAP/%
YOLOv11n   ×        85.39   84.36   86.12
YOLOv11n   ✓        87.59   85.23   88.71
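Table 2 isolates the gain from LIME++ preprocessing (+2.59 mAP points). The exact LIME++ formulation follows refs. [16,17,18]; as a rough illustration only, the sketch below implements the classic LIME recipe that it extends, with a Gaussian blur standing in for LIME's structure-aware illumination refinement.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def lime_enhance(img, sigma=3.0, gamma=0.8, eps=1e-3):
        # Minimal LIME-style enhancement [16,17]: estimate the illumination
        # map as the per-pixel max over RGB, smooth it (a Gaussian blur
        # stands in for LIME's structure-aware refinement), gamma-adjust
        # it, and divide it out. img: float32 RGB array in [0, 1].
        t = img.max(axis=2)                    # initial illumination map
        t = gaussian_filter(t, sigma=sigma)    # crude refinement step
        t = np.clip(t, eps, 1.0) ** gamma      # avoid division by ~0
        return np.clip(img / t[..., None], 0.0, 1.0)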
Table 3. Comparative results of different detection models.

Model          mAP/%   FPS   GFlops   Params/M   Size/MB
Faster R-CNN   83.14   27    139.85   41.1       108.13
SSD 300        80.67   68    25.60    26.20      91.35
RT-DETR        84.36   43    136.3    42.00      67.13
YOLOv5n        83.29   82    5.80     2.20       4.06
YOLOv8n        84.38   90    8.70     2.68       6.53
YOLOv10n       86.12   63    6.70     2.30       5.80
YOLOv11n       88.71   76    6.50     2.60       5.61
MDAS-YOLO      95.68   65    7.50     3.20       6.90
Table 4. Experiments on embedding C3k2 at different positions.

Model             P/%     R/%     mAP/%   GFlops   Size/MB
YOLOv11n          87.59   85.23   88.71   6.5      5.61
Backbone          90.73   88.81   91.99   6.8      5.92
Neck              88.63   84.17   89.03   6.6      5.92
Backbone + Neck   89.34   87.25   90.21   7.1      6.15
Table 5. Comparative experiments on different attention mechanisms.

Model               P/%     R/%     mAP/%   Params/M   GFlops   Size/MB
YOLOv11n            87.59   85.23   88.71   2.60       6.50     5.61
Baseline + ECA      88.90   86.40   89.26   2.62       6.53     5.65
Baseline + MetaSA   89.53   87.91   89.62   2.70       6.70     5.75
Baseline + CBAM     89.70   88.03   89.91   2.74       6.80     5.90
Baseline + CAA      89.50   87.83   89.39   2.68       6.65     5.78
Baseline + SimAM    90.12   88.12   90.28   2.60       6.50     5.61
Table 6. Comparative results of different feature fusion modules.

Model                 P/%     R/%     mAP/%   Params/M   GFlops   Size/MB
YOLOv11n              87.59   85.23   88.71   2.60       6.50     5.61
Baseline + PANet      88.43   86.10   89.52   2.75       6.75     5.93
Baseline + ASFF       89.35   87.24   89.93   2.95       7.20     6.37
Baseline + NAS-FPN    89.81   88.19   90.22   3.40       8.10     7.34
Baseline + HRFPN      89.87   88.25   90.12   3.20       7.70     6.90
Baseline + BiFPN      89.92   87.90   90.10   2.90       7.10     6.26
Baseline + AP-BiFPN   90.16   88.60   90.83   3.00       7.30     6.47
Table 7. Results of ablation experiments on different modules of MDAS-YOLO.

YOLOv11n   C3k2-MESA-DSM   AP-BiFPN   SimAM   P/%     R/%     mAP/%   Params/M   GFlops
✓          –               –          –       87.59   85.23   88.71   2.60       6.50
✓          ✓               –          –       90.73   88.81   91.99   2.80       6.70
✓          –               ✓          –       90.16   88.60   90.83   3.00       7.30
✓          –               –          ✓       90.12   88.12   90.28   2.60       6.50
✓          ✓               ✓          –       92.05   90.92   93.90   3.20       7.50
✓          ✓               –          ✓       91.10   89.20   92.40   2.80       6.70
✓          –               ✓          ✓       90.52   88.70   91.95   3.00       7.30
✓          ✓               ✓          ✓       94.38   93.12   95.68   3.20       7.50