1. Introduction
Honeybees (Apis mellifera; hereafter, bees) are economically and ecologically vital pollinators [1] and the source of beekeeping products such as honey, propolis, pollen, and bee venom. In 2022, Colony Collapse Disorder (CCD) decimated bee populations across South Korea. In CCD, the bee population declines rapidly and the colony collapses; the disorder is characterized by the inability of bees to return to the hive [2,3,4,5]. Several factors contribute to CCD, including climate abnormalities caused by global warming, improper management, insufficient nectar sources, and infestations by pests such as Varroa destructor (bee mite) [6,7].
Bee mites parasitize both adult bees and larvae by consuming their fat reserves, resulting in reduced body weight and lifespan [8]. Infestation also weakens beehives and, in severe cases, can lead to complete hive collapse [9]. In addition, bee mites are vectors that transmit viruses and other pathogens to bees [10]. Deformed wing virus (DWV), which is transmitted by bee mites [11,12], is related to CCD. Infected worker bees exhibit impaired development and reduced flight ability [13,14]. Therefore, effectively controlling bee mites is crucial for the beekeeping industry.
Traditional inspection methods remain labor-intensive and highly dependent on the beekeeper’s expertise, making large-scale monitoring impractical [15]. The average beekeeper in Korea manages approximately 120 beehives [16]. Direct inspection of all beehives is further hindered by the aging of beekeepers: 52.6% of the agricultural population in Korea is over 65 years old [17]. Furthermore, bee mites are reddish brown, which makes them difficult to distinguish from bees, and extremely small (1.6 mm (width) × 1.1 mm (length)) [7,18,19]. Detecting bees infected with deformed wing virus (deformed bees) or bee mites by eye within one minute on a beecomb carrying roughly 2000 bees is extremely challenging, and the accuracy of such diagnosis hinges on the beekeeper’s level of expertise [20]. Therefore, a technology that can rapidly and objectively identify bee mites and deformed bees is required [21].
Recent studies have explored image-based computer vision techniques for beekeeping applications. Various systems have been developed, including real-time bee monitoring using cameras [22], video monitoring units (VMU) with reflective surfaces [23], and Image Acquisition Systems (IAS) [24]. However, these methods suffer from limitations such as monitoring restricted to bees outside the hive, duplicate detections, manual intervention requirements, and instability in image acquisition.
Mrozek et al. [22] developed a real-time bee monitoring system using external cameras. While this system provided continuous monitoring of bee activity outside the hive, it was unable to inspect internal hive conditions. Similarly, Bjerge et al. [23] introduced the VMU, which used acrylic windows and reflective surfaces to capture images of bee abdomens as the bees passed through a designated passage. Although this system allowed non-invasive monitoring, it was limited to external bees entering and exiting the hive, leading to potential duplicate detections and incomplete colony assessments.
Lee et al. [24] developed the IAS with a fixed RGB camera to capture images of the beecomb for directly detecting mites. Unlike the VMU, this system enabled direct inspection of bee mites on the beecomb. However, manual intervention was required to rotate the beecomb after capturing one side, and motion-induced instability affected image quality. Additionally, Kongsilp et al. [25] designed an observation hive using an RGB-based image acquisition system to analyze bee behavior patterns. This system incorporated a shading door for non-contact measurements but was limited by its inability to inspect the back of the hive, restricting its applicability for comprehensive colony health monitoring.
With advances in artificial intelligence (AI)-based object detection, deep learning models have been increasingly applied in agricultural and beekeeping research [24,26]. Compared with traditional visual inspection, AI-based object detection improves accuracy and objectivity. However, a significant challenge in training these models is the inherent class imbalance in datasets, where pests like bee mites are far less frequent than bees. Recent research has demonstrated that strategies such as data augmentation and stratified sampling can effectively mitigate this issue, thereby enhancing the detection performance of YOLO models for bee mites [27]. Beekeeping environments require multi-object and small-object detection due to the presence of various elements on the beecomb. In this study, You Only Look Once version 8 (YOLOv8) and version 11 (YOLOv11), state-of-the-art object detection algorithms known for their strong performance in detecting small and overlapping objects, were utilized [28,29].
This study aims to develop and validate an automated AI-based vision inspection system for detecting honeybee pests and diseases using YOLO-based object detection models. The primary contribution of this work is the development of an integrated edge-based inspection system that combines specialized dual-sided image acquisition hardware with a tiling-based inference pipeline, enabling high-resolution beecomb analysis on resource-constrained edge devices while preserving small-object details such as bee mites.
The image characteristics of normal bees, bee mites, and deformed bees were analyzed to construct a structured detection dataset. Furthermore, six YOLO models (YOLOv8 and YOLOv11 architectures, each at three model sizes: nano, small, and large) were systematically evaluated to identify architectures suitable for practical apiary deployment and to investigate the relationship between morphological characteristics and detection errors in small-object recognition.
2. Materials and Methods
2.1. Design of an Automated Beecomb Inspection System
An automated AI-based vision inspection system was developed to identify normal bees, bee mites, and deformed bees, as illustrated in the system architecture and hardware configuration diagrams (Figure 1). The proposed system consists of three main components: (1) a beecomb rotation unit, (2) an image acquisition unit, and (3) a control unit with hierarchical architecture.
2.1.1. Beecomb Rotation Unit
The beecomb rotation unit was designed to stabilize the beecomb and enable precise rotation during inspection. A beecomb supporter frame (508 mm × 290 mm, width × height) held the beecomb securely (Figure 1b). Automated 180° rotation was achieved by integrating a stepping motor (NK245E, Motor Bank, Seoul, Republic of Korea) with the following specifications: 24 V rated voltage, 54 Nm torque, 0.04 gear ratio, and 1.8° step angle. The stepping motor was operated with a microstepping setting of 32, providing a rotational resolution of 160,000 pulses per revolution (pulse/rev) at the output shaft (200 full steps × 32 microsteps × 25, the reciprocal of the 0.04 gear ratio) and a positional resolution of 0.00225° per pulse. Motor control was managed through a motor driver (MSD-224, Motor Bank, Seoul, Republic of Korea), and a photosensor (PM-L24, Panasonic, Osaka, Japan) was installed to establish the initial reference position. When the photosensor detected a reference protrusion on the stepping motor, a signal was transmitted to an Arduino microcontroller, which then determined the initial position of the beecomb rotation unit based on this feedback.
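The stated resolution follows directly from the motor parameters. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes the 0.04 gear ratio means output-shaft turns per motor turn (a 25:1 reduction), which is the reading consistent with the reported 160,000 pulses per revolution.

```python
def output_pulses_per_rev(step_angle_deg: float, microsteps: int, gear_ratio: float) -> int:
    """Pulses required for one full turn of the geared output shaft."""
    full_steps_per_motor_rev = 360.0 / step_angle_deg        # 200 for a 1.8 degree motor
    pulses_per_motor_rev = full_steps_per_motor_rev * microsteps
    # Assumption: gear_ratio = output turns per motor turn (0.04 -> 25 motor turns per output turn).
    return round(pulses_per_motor_rev / gear_ratio)


pulses = output_pulses_per_rev(step_angle_deg=1.8, microsteps=32, gear_ratio=0.04)
deg_per_pulse = 360.0 / pulses
pulses_for_half_turn = pulses // 2   # pulses needed for the 180 degree flip

print(pulses, deg_per_pulse, pulses_for_half_turn)
```

The half-turn value is the pulse count that a 180° rotation command would need under the same assumptions.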
2.1.2. Image Acquisition Unit
High-resolution image acquisition was achieved using a Raspberry Pi Camera Module 3 (Raspberry Pi Foundation, Cambridge, UK) with the following specifications: 4608 × 2592 pixel resolution, 1.4 μm × 1.4 μm pixel size, 66° horizontal field of view (FOV), and 41° vertical FOV. These specifications enabled clear capture of small objects such as bee mites.
2.1.3. Control Unit with Hierarchical Architecture
A hierarchical control structure was adopted to manage the system’s complexity and enhance overall stability and real-time responsiveness (Figure 1a). The upper-level control layer was managed by a Raspberry Pi 5 (16 GB RAM, Raspberry Pi Foundation, Cambridge, UK), which performed overall computational processing and operation scenario management. To accelerate high-load AI model inference, an AI HAT+ (13 TOPS, Raspberry Pi Foundation, Cambridge, UK) neural processing unit (NPU) was connected directly to the Raspberry Pi 5 via a Flexible Printed Circuit (FPC) cable on the PCIe port. The integration of the AI HAT+ enabled the execution of complex YOLO models directly on the edge device. By offloading the computational burden of high-resolution image inference from the CPU to the NPU, the system achieved the real-time responsiveness required for field operations, processing dual-sided images of the beecomb without the need for high-performance external servers or cloud connectivity. The Raspberry Pi 5 CPU, freed from inference tasks, concentrated on system control, sensor data aggregation, and communication tasks, which improved overall system efficiency and responsiveness.
For real-time hardware control requiring precise timing, an Arduino Mega (Arduino LLC, Monza, Italy) served as the lower-level controller, receiving high-level commands from the Raspberry Pi 5 via serial communication. The Arduino Mega independently executed low-level tasks such as stepping motor control and photosensor signal detection. A regulated power supply unit (SMPS, KO ELECTRONIC, Seoul, Korea) provided stable voltage and current to all system components according to their rated specifications. For field deployment and operational mobility, the entire system was powered by a portable battery pack (power station, 324,000 mAh capacity) (DLP8092C, PHILIPS, Amsterdam, The Netherlands), enabling autonomous operation without dependence on external AC power sources and facilitating flexible deployment across apiary locations.
2.2. Optical System Design and Motor Positioning Accuracy
2.2.1. Optimal Imaging Distance Calculation
To capture the entire beecomb in a single image, the optimal distance between the beecomb and camera was determined using the camera’s field of view and target dimensions (Figure 2). The minimum required distance was calculated as:

$$ d_{\min} = \frac{W_h}{2\tan\left(\theta_h/2\right)} \tag{1} $$

where $d_{\min}$ is the minimum distance between beecomb and camera, $W_h$ is the camera’s horizontal measuring range (set to 508 mm, matching the beecomb supporter frame width), and $\theta_h$ is the camera’s horizontal angle of view (66°).
The vertical measuring range was determined as:

$$ W_v = 2\,d_{\min}\tan\left(\theta_v/2\right) \tag{2} $$

where $W_v$ is the vertical measuring range and $\theta_v$ is the vertical angle of view (41°). The calculated vertical measuring range was verified to encompass the entire beecomb-supporter frame height (Figure 2b,c).
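As a quick numerical check of this geometry, the sketch below evaluates the standard pinhole field-of-view relation with the values given in the text (508 mm horizontal range, 66° and 41° angles of view). The function names are hypothetical, and the relation is the usual thin-lens approximation rather than a calibrated camera model.

```python
import math


def min_distance_mm(measuring_range_mm: float, fov_deg: float) -> float:
    """Distance at which a field of view of fov_deg spans measuring_range_mm."""
    return (measuring_range_mm / 2.0) / math.tan(math.radians(fov_deg / 2.0))


def measuring_range_mm(distance_mm: float, fov_deg: float) -> float:
    """Extent covered by a field of view of fov_deg at distance_mm."""
    return 2.0 * distance_mm * math.tan(math.radians(fov_deg / 2.0))


d_min = min_distance_mm(508.0, 66.0)              # horizontal constraint
vertical_range = measuring_range_mm(d_min, 41.0)  # vertical coverage at that distance

print(f"minimum distance: {d_min:.1f} mm")
print(f"vertical range:   {vertical_range:.1f} mm (frame height: 290 mm)")
```

Under these assumptions the minimum distance comes out at roughly 391 mm and the vertical coverage at roughly 292 mm, which is consistent with a 290 mm frame height and with the 400 mm working distance reported for dataset acquisition.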
To validate the suitability of deep learning-based object detection at this fixed distance, the pixel dimensions of bee mites in this study were compared with previously reported values using pixel-scaling Equations (3)–(5) [23]. These equations relate the following quantities: the pixel values of bee mites measured in the previous study [23] and by the system developed in this study; the pixel values of bees measured in the previous study [23] and by the system developed in this study; and the horizontal and vertical pixel values of bee mites measured in the previous study [23] and by the system developed in this study.
2.2.2. Automated Beecomb Rotation Mechanism
To address the limitations of manual rotation, an automated beecomb rotation mechanism was designed to maintain the beecomb in a vertical orientation and rotate it 180° around its vertical axis (Figure 3). The stepping motor provided high-precision rotation, with positional repeatability not exceeding 15 arcseconds (0.00417°), backlash less than 1 arcminute (0.0167°), and output table parallelism and concentricity controlled within 0.01 mm.
Motor control was implemented using a trapezoidal acceleration–deceleration profile. For an 80,000-pulse 180° rotation command, the acceleration phase (first 2000 pulses) used a step interval delay of 0.15 ms, the cruise phase (middle 76,000 pulses) used an interval of 0.015 ms, and the deceleration phase (final 2000 pulses) returned to 0.15 ms. This control strategy minimized mechanical vibrations and ensured smooth rotation throughout the operational cycle.
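The three phases can be written down as a small table and summed. The sketch below does so under two stated assumptions: each phase uses the single constant interval quoted in the text, and each interval is the full period between consecutive pulses. If the firmware applies the delay once per half-cycle or adds per-step overhead, the real rotation would take longer. The function name is hypothetical.

```python
from typing import List, Tuple


def motion_phases(total_pulses: int = 80_000, ramp_pulses: int = 2_000,
                  slow_ms: float = 0.15, fast_ms: float = 0.015) -> List[Tuple[str, int, float]]:
    """Return (phase name, pulse count, per-pulse interval in ms) for a 180 degree move."""
    cruise_pulses = total_pulses - 2 * ramp_pulses
    return [
        ("accelerate", ramp_pulses, slow_ms),
        ("cruise", cruise_pulses, fast_ms),
        ("decelerate", ramp_pulses, slow_ms),
    ]


phases = motion_phases()
total_pulses = sum(count for _, count, _ in phases)
# Time spent on pulse intervals only; an idealised lower bound, not a measured value.
pulse_time_ms = sum(count * interval for _, count, interval in phases)

for name, count, interval in phases:
    print(f"{name:>10}: {count:6d} pulses at {interval} ms")
print(f"total: {total_pulses} pulses, {pulse_time_ms / 1000:.2f} s of pulse intervals")
```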
Absolute position accuracy was ensured through a photosensor-based reference point correction system installed on the rotation table. After each 360° rotation, the photosensor re-detected the reference marker to remove cumulative errors from mechanical backlash or slip and to maintain high positional repeatability.
2.2.3. Image Acquisition Workflow
The overall image acquisition workflow for dual-sided beecomb inspection is presented in a flowchart (Figure 4). Upon signal activation, the image counter was initialized and the beecomb supporter frame moved to the initial reference position. After the operator placed a beecomb on the beecomb supporter frame, the system acquired the i-th image of the front side, rotated the beecomb 180°, and then acquired the (i + 1)-th image of the back side. The image index was then increased by 2 and the system returned to standby mode until the next acquisition trigger.
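The acquisition cycle described above can be sketched as a single function. This is only an illustration of the control flow: `capture` and `rotate_180` are hypothetical callables standing in for the camera and motor interfaces, which the text does not specify, and the file naming is invented for the example.

```python
from typing import Callable, List, Tuple


def inspect_beecomb(index: int,
                    capture: Callable[[int], str],
                    rotate_180: Callable[[], None]) -> Tuple[List[str], int]:
    """Acquire front image i, rotate, acquire back image i + 1, return the next index."""
    front = capture(index)        # i-th image: front side
    rotate_180()                  # flip the beecomb
    back = capture(index + 1)     # (i + 1)-th image: back side
    return [front, back], index + 2


# Stand-in hardware for demonstration only.
log: List[str] = []


def fake_capture(i: int) -> str:
    name = f"img_{i:05d}.jpg"
    log.append(f"capture {name}")
    return name


def fake_rotate() -> None:
    log.append("rotate 180")


index = 0                         # image counter initialised on activation
files_a, index = inspect_beecomb(index, fake_capture, fake_rotate)
files_b, index = inspect_beecomb(index, fake_capture, fake_rotate)
print(files_a, files_b, index)
```

Passing the hardware actions in as callables keeps the sequencing logic testable without a camera or motor attached.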
2.3. Dataset Construction
The dataset for model development was constructed using images acquired by the automated inspection system. Three target object classes were defined: Normal Bee (B), Bee Mite (M), and Deformed Bee (DB). Images containing target objects were captured in 2023, 2024, and 2025 at the National Institute of Agricultural Sciences apiary in Wanju-gun, Jeollabuk-do, and an additional apiary in Gimje-si, Jeollabuk-do, Republic of Korea. Imaging parameters were fixed at a beecomb-to-camera distance of 400 mm, lens diopter of 2.50 D, and exposure time of 12.5 ms.
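The fixed focus setting and the working distance are mutually consistent if the 2.50 D value is read as a focus setting in diopters, that is, the reciprocal of the focus distance in metres. That reading is an assumption, since the text does not define the unit further. A minimal check:

```python
def focus_distance_mm(diopters: float) -> float:
    """Focus distance for a lens setting given in diopters (1 / metres)."""
    return 1000.0 / diopters


distance_mm = focus_distance_mm(2.50)
exposure_us = 12.5 * 1000.0   # the 12.5 ms exposure expressed in microseconds

print(distance_mm, exposure_us)
```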
The data acquisition was performed within an apiary integrated with shading film to minimize the impact of natural lighting fluctuations, such as direct solar radiation and cloud-induced shadows. By stabilizing the ambient light environment through this structural shielding, consistent image acquisition was achieved without additional illumination. This setup allowed the system to operate under semi-outdoor conditions while maintaining stable imaging quality for detecting small objects such as bee mites.
Representative labeling criteria for each class are illustrated in example images (Figure 5). Object pixel counts were calculated using Python 3.9.19 and opencv-python 4.9.0.80 to confirm that acquired images provided sufficient resolution for reliable detection of bee mites and deformed bees.
A total of 1294 raw images were acquired. To preserve the original pixel resolution and maintain the visibility of small-scale features such as bee mites, 640 × 640 pixel regions of interest (ROIs) were manually extracted from the high-resolution raw images (4608 × 2592 pixels). The ROI extraction was performed by cropping image regions containing bee mites or deformed bees so that the target objects were included within the 640 × 640 frame while maintaining sufficient surrounding context. Because the locations of pests and deformed bees varied naturally across the beecomb surface, the resulting ROIs contained naturally varying object positions within the frame.
This object-centered ROI extraction strategy also helped mitigate the severe class imbalance inherent in full-frame beecomb images. Because regions were cropped based on the presence of bee mites or deformed bees, the relative representation of these target classes in the dataset was increased compared with randomly sampled image patches. This approach allowed the models to learn target features under realistic field conditions while partially balancing the dataset distribution.
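The ROIs in this study were extracted manually. Purely as an illustration of the geometry involved, the sketch below computes a 640 × 640 crop window around a target location and shifts it inward when the target lies near the image border. The function is a hypothetical automated analogue, not the authors' procedure, and it centres the target where possible, whereas the manual ROIs had varying object positions.

```python
from typing import Tuple


def roi_window(cx: int, cy: int, img_w: int = 4608, img_h: int = 2592,
               size: int = 640) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) of a size x size crop containing (cx, cy)."""
    half = size // 2
    # Clamp the top-left corner so the window never leaves the image.
    x0 = min(max(cx - half, 0), img_w - size)
    y0 = min(max(cy - half, 0), img_h - size)
    return x0, y0, x0 + size, y0 + size


print(roi_window(2304, 1296))   # target at the image centre
print(roi_window(100, 100))     # target near the top-left corner
print(roi_window(4600, 2590))   # target near the bottom-right corner
```

Because the crop is taken at native resolution, a target keeps the same pixel footprint in the ROI as in the raw image.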
This approach was prioritized to provide the models with high-quality morphological features while ensuring that the detection capability remains robust across various spatial contexts within the input frame, which is essential for identifying very small targets. This cropping method prevented the loss of critical morphological details (such as the fine structures of bee mites and the distinct shapes of deformed wings) that would typically occur if the original images were downscaled to the model’s input size. From the initial 1294 raw images, a total of 2018 ROIs containing the target classes were selected to construct the final dataset for model development. Annotations were created using labelme (ver. 5.2.1) and saved in YOLO-formatted txt files. The resulting dataset contained 33,864 labeled objects, with class-wise distributions summarized in Table 1.
The dataset was partitioned into training, validation, and test sets in a 7:2:1 ratio. To eliminate the risk of data leakage, which is critical for ensuring the validity of performance metrics, a source-aware split strategy was strictly enforced. While the partitioning occurred at the ROI level, we ensured that all ROIs originating from the same primary high-resolution image were assigned to the same subset (train, validation, or test). This isolation by raw image source prevents the model from memorizing specific backgrounds, honey residues, or lighting conditions unique to a particular beecomb frame, thereby ensuring that the evaluation stages reflect the model’s true ability to detect bee mites and deformed bees in novel contexts.
The independent test set (10%) was completely isolated from the training and hyperparameter tuning processes, enabling a reliable evaluation of the model’s generalization capability under unseen field conditions. All models were trained under identical experimental conditions to ensure fair and reproducible comparison.
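A source-aware split of this kind can be sketched by grouping ROIs by their raw image and assigning whole groups to subsets. The code below is a minimal illustration, not the authors' script: the function name, ROI identifiers, and seed are hypothetical. It applies the 7:2:1 ratio to raw images, so the ROI-level ratio is only approximately 7:2:1 when images contribute different numbers of ROIs.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def source_aware_split(roi_to_source: Dict[str, str],
                       ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
                       seed: int = 0) -> Dict[str, List[str]]:
    """Assign every ROI of a given raw image to the same subset."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for roi, source in roi_to_source.items():
        groups[source].append(roi)

    sources = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(sources)

    n_train = round(ratios[0] * len(sources))
    n_val = round(ratios[1] * len(sources))
    parts = {
        "train": sources[:n_train],
        "val": sources[n_train:n_train + n_val],
        "test": sources[n_train + n_val:],
    }
    return {name: [roi for s in srcs for roi in groups[s]] for name, srcs in parts.items()}


# Toy data: 10 raw images with 2 ROIs each (identifiers are made up).
roi_to_source = {f"raw{s:02d}_roi{r}": f"raw{s:02d}" for s in range(10) for r in range(2)}
split = source_aware_split(roi_to_source)
print({name: len(rois) for name, rois in split.items()})
```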
2.4. Development of YOLO-Based Deep Learning Models
2.4.1. Model Selection and Architecture Overview
Six object detection models were developed to systematically evaluate the effects of model architecture and model scale on detection performance. The selected models consisted of YOLOv8 nano, small, and large (YOLOv8n/s/l), and YOLOv11 nano, small, and large (YOLOv11n/s/l), which represent widely used real-time object detection architectures suitable for edge deployment. YOLO-based detectors were selected due to their proven balance between detection accuracy and computational efficiency, making them particularly suitable for resource-constrained edge devices. In addition, YOLOv11 represents a recent architectural advancement over YOLOv8, enabling an evaluation of how updated backbone and attention mechanisms influence detection performance in apiculture imaging tasks.
YOLOv8 employs C2f modules (a faster implementation of the Cross Stage Partial bottleneck with two convolutions) in the Backbone and Neck, which split input features into parallel pathways, one applying repeated Bottleneck blocks and the other preserving skip connections, before fusing them through concatenation. This design improves hierarchical feature propagation and enhances small-object localization capability. YOLOv11 improves upon YOLOv8 by adopting C3k2 (Cross Stage Partial with kernel size 2) blocks to reduce FLOPs and parameters through consecutive smaller kernel operations, and C2PSA (Cross Stage Partial with Parallel Spatial Attention) blocks after the SPPF module to enhance spatial attention on multi-level features. These improvements allow YOLOv11 to achieve comparable accuracy with fewer parameters than YOLOv8 at equivalent model scales (e.g., YOLOv11s: 9.4M vs. YOLOv8s: 11.2M parameters, about 16% fewer).
Furthermore, three model scales (nano, small, and large) were selected to investigate the trade-off between computational efficiency and detection accuracy, which is critical for practical edge-based deployment. This experimental design enabled direct comparison of: (1) architectural differences between YOLOv8 and YOLOv11, and (2) model-scale effects (nano vs. small vs. large) on bee mite and deformed bee detection performance.
2.4.2. Hyperparameters and Training Configuration
To ensure the reproducibility of the study and maintain a fair comparison between the architectures, all YOLO models were trained using identical hyperparameters and environment settings derived from the experimental logs. The models were initialized with COCO-pretrained weights to leverage transfer learning and were trained on two Nvidia RTX A6000 GPUs within a Python 3.10.13 environment.
Training was conducted for a maximum of 1000 epochs, with early stopping (patience = 50 epochs) applied to prevent overfitting. The input resolution was fixed at 640 × 640 pixels, and the batch size was set to 16 for all models.
For optimization, an automatically selected optimizer (SGD or AdamW) was used with an initial learning rate of 0.01, a momentum of 0.937, and a weight decay of 0.0005. A warm-up phase of 3 epochs was applied to stabilize the initial training process. The loss weights were set to 7.5 for Box loss, 0.5 for Class loss, and 1.5 for Distribution Focal Loss. An Intersection over Union (IoU) threshold of 0.7 was used for Non-Maximum Suppression (NMS) in all models.
Crucially, data augmentation techniques such as Mosaic, MixUp, HSV adjustment, flipping, and geometric transformations were not activated during the training process. This decision was made for two primary reasons. First, the objective of this study was to evaluate the intrinsic capability of the YOLO architectures to extract authentic morphological features from high-quality field images without introducing artificial distortions. Because the detection targets—bee mites and wing-deformed bees—are characterized by subtle morphological differences, aggressive augmentation could potentially alter the visual patterns relevant to diagnosis. Second, although each 640 × 640 pixel ROI was extracted based on the presence of bee mites or deformed bees, each frame contained a substantially higher number of normal bee instances. Applying global image augmentation would replicate these dominant background objects and potentially amplify the inherent class imbalance within the dataset. Therefore, augmentation was intentionally disabled to preserve the original object distribution of the field data and allow the models to learn from authentic morphological characteristics of the target pests.
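The configuration described in this subsection can be collected in one place. The sketch below assumes the Ultralytics Python package was the training framework, which the text does not state; the argument names are Ultralytics training settings, while the dataset file `beecomb.yaml` and the weight file names in the usage note are placeholders. The import is deferred so that the settings can be inspected without the package installed.

```python
# Hyperparameters reported in the text, keyed by Ultralytics setting names (assumed framework).
TRAIN_ARGS = dict(
    epochs=1000, patience=50,          # upper bound on epochs, early stopping
    imgsz=640, batch=16,
    optimizer="auto",                  # framework chooses the optimizer
    lr0=0.01, momentum=0.937, weight_decay=0.0005,
    warmup_epochs=3,
    box=7.5, cls=0.5, dfl=1.5,         # loss weights
    iou=0.7,                           # NMS IoU threshold
    pretrained=True,                   # start from COCO-pretrained weights
    device=[0, 1],                     # two GPUs
)

# Every augmentation named in the text switched off.
NO_AUGMENT = dict(
    mosaic=0.0, mixup=0.0,
    hsv_h=0.0, hsv_s=0.0, hsv_v=0.0,
    fliplr=0.0, flipud=0.0,
    degrees=0.0, translate=0.0, scale=0.0, shear=0.0, perspective=0.0,
)


def train(weights: str, data_yaml: str):
    """Train one model; requires the ultralytics package and a dataset YAML."""
    from ultralytics import YOLO      # deferred import
    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS, **NO_AUGMENT)


print(sorted(TRAIN_ARGS), sorted(NO_AUGMENT))
```

A call such as `train("yolov8l.pt", "beecomb.yaml")` or `train("yolo11s.pt", "beecomb.yaml")` would then run one of the six configurations under identical settings.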
2.4.3. Performance Evaluation Metrics
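Model performance was evaluated on images not used during training using Accuracy, Precision, Recall, F1 score, and mAP[0.5], as defined in Equations (6)–(10):

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6} $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{7} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{8} $$

$$ \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9} $$

$$ \mathrm{mAP}[0.5] = \frac{1}{n}\sum_{k=1}^{n} AP_k \tag{10} $$

where TP is the True Positive (correctly detected objects); FP is the False Positive (incorrectly detected non-objects as objects); FN is the False Negative (missed detections of actual objects); TN is the True Negative (correctly identified background regions as non-objects); n is the number of classes (3 in this study: Normal Bee, Bee Mite, Deformed Bee); and $AP_k$ is the Average Precision of class k, calculated as the area under the Precision–Recall curve for that class [30,31].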
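These metrics reduce to a few lines of arithmetic on the confusion counts. The sketch below uses their standard definitions; the counts and per-class AP values in the example are made up for illustration and are not results from this study.

```python
from typing import Sequence


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """Mean of the per-class AP values (area under each class's PR curve)."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative numbers only.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=10)
f1 = f1_score(p, r)
acc = accuracy(tp=90, tn=0, fp=10, fn=10)
m_ap = mean_average_precision([0.9, 0.8, 0.7])
print(p, r, f1, acc, m_ap)
```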
F1-confidence curves were generated to identify optimal confidence thresholds that maximize F1-score for each class. Precision–Recall (PR) curves were plotted to evaluate the trade-off between precision and recall across different confidence thresholds and to calculate mAP[0.5] values. To thoroughly understand model limitations and failure modes, a comprehensive manual inspection was conducted on all test dataset images. Every instance of False Negative (FN, missed detection) and False Positive (FP, incorrect detection) was individually examined and categorized by failure type.
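Selecting the operating point from an F1-confidence curve amounts to sweeping the confidence threshold and keeping the value with the highest F1. The following is a minimal sketch with made-up detections; it assumes each detection has already been matched against the ground truth (flagged as true or false positive), and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def best_f1_threshold(detections: Sequence[Tuple[float, bool]], n_ground_truth: int,
                      thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) maximising F1; detections are (confidence, is_true_positive)."""
    curve: List[Tuple[float, float]] = []
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        fn = n_ground_truth - tp
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        curve.append((t, 2 * tp / denom if denom else 0.0))
    return max(curve, key=lambda item: item[1])


# Illustrative detections for one class with 4 ground-truth objects.
dets = [(0.95, True), (0.90, True), (0.80, True), (0.60, False), (0.50, False), (0.30, True)]
best_t, best_f1 = best_f1_threshold(dets, n_ground_truth=4, thresholds=[0.25, 0.55, 0.70, 0.92])
print(best_t, round(best_f1, 3))
```

In this toy example, lowering the threshold recovers the last true positive but admits two false positives, so the F1 optimum sits at the intermediate threshold.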
To qualitatively interpret model decision-making processes and validate that models attend to biologically relevant image regions, Occlusion Sensitivity Maps (OSM) were computed for representative examples from each class. OSM systematically occludes different regions of input images and measures the resulting reduction in class confidence scores, thereby highlighting which spatial regions are most critical for the model’s predictions. This analysis confirmed whether models focused on biologically meaningful features (e.g., wing structures for deformed bees, mite body shapes for bee mites) rather than spurious background correlations.
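The occlusion procedure can be illustrated on a toy grid. The sketch below is a generic occlusion-sensitivity routine in plain Python, not the authors' implementation: `score_fn` is a hypothetical stand-in for the detector's class confidence, and the patch size, stride, and fill value are arbitrary choices for the example.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_sensitivity(image: Image, score_fn: Callable[[Image], float],
                          patch: int = 2, stride: int = 2, fill: float = 0.0) -> Image:
    """Mean confidence drop observed when each pixel is covered by an occluding patch."""
    h, w = len(image), len(image[0])
    base = score_fn(image)
    total = [[0.0] * w for _ in range(h)]
    hits = [[0] * w for _ in range(h)]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]          # copy, then blank one patch
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    occluded[yy][xx] = fill
            drop = base - score_fn(occluded)
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    total[yy][xx] += drop
                    hits[yy][xx] += 1
    return [[total[r][c] / hits[r][c] if hits[r][c] else 0.0 for c in range(w)]
            for r in range(h)]


# Toy image: the "object" is the bright 2 x 2 block in the top-left corner.
toy = [[1.0, 1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]


def toy_score(img: Image) -> float:
    """Stand-in confidence: mean brightness of the top-left 2 x 2 block."""
    return sum(img[r][c] for r in range(2) for c in range(2)) / 4.0


heat = occlusion_sensitivity(toy, toy_score)
for row in heat:
    print(row)
```

High values in the resulting map mark regions whose occlusion lowers the score most, which is how the maps indicate whether a model relies on the target itself or on its background.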
2.5. Tiling-Based Inference Pipeline and Deployment
To analyze high-resolution images (4608 × 2592 pixels) while preserving fine-scale visual details of small objects such as bee mites, a tiling-based inference strategy was implemented. Directly resizing the full-frame image to the standard YOLO input resolution (640 × 640) would reduce the horizontal resolution by a factor of 7.2 (4608/640). Because a bee mite occupies only about 7 × 7 pixels in the original image, such resizing would compress the mite to approximately 1 pixel in width, resulting in the loss of the morphological information required for reliable detection. Instead, each raw image is divided into 32 tiles of 576 × 648 pixels (an 8 × 4 grid). These tiles are then resized to 640 × 640 pixels, which corresponds to the fixed input resolution required by the YOLO models.
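The tile layout can be reproduced with integer arithmetic. The sketch below assumes an 8 × 4 grid of non-overlapping tiles, which is the layout implied by 32 tiles of 576 × 648 pixels on a 4608 × 2592 frame; the function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def tile_grid(img_w: int = 4608, img_h: int = 2592, cols: int = 8, rows: int = 4) -> List[Box]:
    """Return (x0, y0, x1, y1) for each tile, row by row."""
    tile_w, tile_h = img_w // cols, img_h // rows
    return [(c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            for r in range(rows) for c in range(cols)]


tiles = tile_grid()
tile_w = tiles[0][2] - tiles[0][0]
tile_h = tiles[0][3] - tiles[0][1]

# Scale applied when a tile is resized to the 640 x 640 network input,
# compared with the scale when the whole frame is resized instead.
tile_scale = (640 / tile_w, 640 / tile_h)
full_frame_scale = (640 / 4608, 640 / 2592)

print(len(tiles), (tile_w, tile_h), tile_scale, full_frame_scale)
```

Resizing a tile changes the scale by about +11% horizontally and about -1% vertically, whereas resizing the full frame shrinks it by factors of 7.2 and 4.05, which is why the tiled path keeps mite-sized objects resolvable.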
The inference process is executed on a Raspberry Pi AI HAT+ equipped with a Hailo NPU accelerator, enabling hardware-accelerated processing of each tile. After tile-level inference, Non-Maximum Suppression (NMS) is applied to manage overlapping detections at tile boundaries, and the tiles are merged to reconstruct the original full-resolution output to ensure that features as small as bee mites are accurately preserved.
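Merging tile-level results involves two steps: mapping each box from the 640 × 640 network frame back to full-frame coordinates, and suppressing duplicates. The sketch below shows both in plain Python. It is a generic illustration rather than the deployed pipeline: the NMS is class-agnostic, the 0.7 IoU threshold is borrowed from the training configuration as an assumption, and the boxes in the example are made up.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_full_frame(box: Box, tile_origin: Tuple[int, int],
                  tile_size: Tuple[int, int] = (576, 648), net_size: int = 640) -> Box:
    """Map a box from network-input pixels to full-resolution image pixels."""
    x0, y0, x1, y1 = box
    ox, oy = tile_origin
    tw, th = tile_size
    return (ox + x0 * tw / net_size, oy + y0 * th / net_size,
            ox + x1 * tw / net_size, oy + y1 * th / net_size)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Tuple[Box, float]], iou_thr: float = 0.7) -> List[Tuple[Box, float]]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it above iou_thr."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score))
    return kept


# A box filling the network input of the tile whose origin is (576, 648).
mapped = to_full_frame((0.0, 0.0, 640.0, 640.0), tile_origin=(576, 648))

# Two near-duplicate boxes and one distant box (illustrative values).
merged = nms([((0, 0, 10, 10), 0.9), ((0, 0, 10, 9), 0.8), ((100, 100, 110, 110), 0.7)])
print(mapped, merged)
```

With non-overlapping tiles, duplicates mainly arise when an object straddles a tile edge, so an overlap margin between tiles is a common variation on this scheme.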
4. Discussion
Comprehensive performance analysis revealed a clear phenomenon of model-task specialization: no single model emerged as universally optimal across all detection tasks. YOLOv8l specialized in detecting small, low-contrast bee mites (F1: 92.5%, mAP: 92.1%), leveraging its 43.7M parameters for robust small-object feature learning. Conversely, YOLOv11s specialized in classifying morphologically diverse deformed bees (F1: 95.1%), demonstrating the effectiveness of its C3k2 and C2PSA architectural refinements for handling varied morphological patterns.
These findings indicate that optimal model selection should prioritize target object characteristics rather than assuming newer architectures universally outperform predecessors. For small objects requiring precise localization, parameter-rich models offer advantages; for classification tasks with diverse morphologies, architecturally refined models provide superior efficiency and accuracy. Lightweight models (YOLOv8n, YOLOv11n) consistently underperformed (bee mites miss rate: 9.3%), indicating insufficient capacity for operational apiculture applications where detection reliability directly impacts colony health management.
These specialization patterns were observed under controlled conditions that prioritized data consistency. Specifically, all image acquisition was conducted within a rain-shielding apiary equipped with shading film. This structural environment effectively minimized external variables such as direct solar radiation and fluctuating shadows from cloud movement, allowing for high-quality image capture with a fixed exposure time of 12.5 ms without supplemental lighting. While this controlled setup ensured the reliability of the training dataset, it may limit the generalizability of the system in completely open-field apiaries where lighting conditions are more variable. Therefore, further validation under diverse environmental conditions will be necessary to confirm the robustness of the proposed system.
Furthermore, the system’s optical configuration was designed to accommodate the three-dimensional structure of the beecomb. By using a 2.50 diopter lens at a fixed imaging distance of approximately 400 mm, the system provides a depth of field that is sufficient to cover the typical beecomb cell depth of approximately 12 mm. This optical setup allows bee mites located at different vertical positions within the cells to remain within an acceptable focus range during image acquisition. However, in practical conditions, particularly under high bee density, the detection of bee mites deep inside the hexagonal cells is often limited by physical occlusion from surrounding bees. As a result, the primary objective of the proposed system for rapid field screening is to detect visible bee mites on the surface of adult bees, which serves as a reliable indicator of the overall colony infestation level.
5. Conclusions
This study developed and validated an automated AI-based vision inspection system for detecting bee mites and deformed bees, achieving a 12-fold reduction in inspection time compared to manual methods (20 s vs. 240 s per beecomb). By integrating a motorized beecomb rotation mechanism with YOLO-based deep learning models, the system enables rapid, dual-sided image acquisition and demonstrates significant potential for operational deployment in commercial apiaries.
To ensure the reliability of the performance metrics, the dataset was partitioned at the raw image level before tiling, effectively eliminating data leakage and ensuring spatial independence. The implementation of the tiling-based strategy was instrumental in preserving the fine-scale morphological features of bee mites, which would otherwise be mathematically undetectable through standard image resizing. Comprehensive evaluation of six YOLO models showed a model-task specialization pattern: YOLOv8l achieved the highest performance for small, low-contrast bee mites (F1: 92.5%, mAP[0.5]: 92.1%), while YOLOv11s excelled in classifying morphologically diverse wing deformities (F1: 95.1%). These results indicate that optimal model selection should be determined by the specific visual characteristics of target objects rather than by architectural generation alone.
Granular error analysis further demonstrated systematic relationships between morphological features and detection accuracy. For deformed bees, misclassification rates were associated with overlap in wing-to-body ratios relative to normal bees. DB Type II, which exhibited only a 1.7% difference from normal wing ratios, showed an 18.6% miss rate, whereas DB Type III, with a 19.7% difference, achieved perfect detection. This quantitative relationship suggests that morphological similarity to normal bees strongly influences detection difficulty. The 1.7% difference is therefore identified as a “critical limit of detection” for single-view RGB systems, which motivates future research into multi-view imaging or the use of specific textural features, such as wing venation patterns, rather than relying solely on gross morphology to differentiate these subtle cases.
While this study utilized shading films and fixed exposure settings to establish a stable imaging baseline, further research is required to evaluate system robustness under unconstrained outdoor lighting conditions. Future work will include multi-location dataset collection and field validation across diverse apiary environments. Expanding the dataset to include different geographical regions, seasonal conditions, and honeybee populations will be essential for improving the model’s generalization capability. To further improve environmental robustness, future implementations will integrate an LED illumination system synchronized with the rotation motor to reduce sensitivity to ambient lighting variations. To maintain imaging consistency and prevent motion blur, the LED system will employ indirect illumination, thereby minimizing harsh reflections on the comb surface. Furthermore, the lighting control will be integrated with the motor driver to activate exclusively when the comb is stationary for image acquisition. During the 5 s 180° rotation phase, the illumination will be turned off. This synchronized operation ensures uniform exposure across both sides of the beecomb while optimizing power consumption for long-term field deployment.