Deep Learning Beehive Monitoring System for Early Detection of the Varroa Mite

The Varroa mite is one of the most critical causes of colony collapse disorder in beekeeping. This paper presents an embedded camera module supported by a deep learning algorithm for the early detection of Varroa infestations. The algorithm tries to identify, in real time, bees inside the brood frames that carry the mite. The end-node camera module is placed inside the brood box. It supports offline detection in remote areas with limited network coverage, as well as online transmission of imagery data and mite detection in the cloud. The proposed algorithm uses a deep learning network for bee object detection, followed by an image processing step that identifies the mite on the previously detected objects. Finally, the authors present a proof-of-concept experimentation of their approach, which achieves a total bee and Varroa detection accuracy of close to 70%, and they discuss their experimental results in detail.


Introduction
The decline of bee populations worldwide is mainly caused by the reckless use of pesticides by humans [1] and by the emergence of new resistant strains of bacteria and mites. Varroosis is one of the most resistant bee diseases. It is caused mainly by two mite species, Varroa destructor and Varroa jacobsoni [2]. The disease impairs the growth of the brood, stresses the bees, and reduces their chances of surviving the winter. More specifically, the Varroa mite is one of the most devastating parasites in beekeeping. It attaches to the bee and feeds on its fat cells, causing wing deformations.
The environment inside and around the beehives is vital to the successful establishment and development of the colony. An essential factor in apiary hives that affects both colony survival and honey yield is the ability to manage agricultural interventions and disease treatments (especially for the Varroa mite [3]) and to monitor the conditions inside the beehive [4][5][6].
Most beekeepers use synthetic pesticides to control the mite infestation. However, the mite soon develops resistance to their active compounds, which compromises their effectiveness. The year-round prevalence of the mite is countered in two ways. The first is breeding resistant Apis mellifera strains [7]. The second is applying precise interventions once the mite is detected, such as controlled high temperatures (above 38 °C) or stress events of increased relative humidity (above 70%) at normal brood box temperatures (32-37 °C) [8,9].
The Internet of Everything (IoE) industry is moving quickly into the beekeeping sector. It aims to apply Internet and IoT technologies broadly, together with new smart AI detection algorithms [10][11][12]. Existing applications in agriculture include environmental monitoring and bee monitoring IoT systems [13][14][15][16][17][18]. This paper focuses on technological solutions that use cameras inside the brood box to detect bees and then Varroa mites.
Significant progress has been made over the last few years in monitoring beehive conditions with cameras. These systems aim to replace systems that use IR sensors at the beehive entrance or sound sensors [13,19-21] and that translate sensory information into a condition status. Accordingly, image processing and deep learning techniques have been used to monitor bees and their activity at the hive entrance [22,23]. Image processing methods are the easiest to implement. Deep learning methods, however, give the most reliable detection of conditions such as swarming or external attacks. Nevertheless, if such monitoring is performed only at the beehive entrance, it cannot detect an event early enough for beekeepers to remedy it successfully.
This paper reviews existing technological systems, with a focus on implementing an IoT system that detects the Varroa mite on bees. This parasite leads to major Colony Collapse Disorder (CCD) events around the globe. According to apiarists, Varroa infestation events occur mainly in late spring (April-May) and in the first autumn months. For this reason, this paper presents a new camera sensor system for detecting Varroa inside the beehive. The proposed system uses a camera with image processing motion logic and a pre-trained Convolutional Neural Network (CNN) model for detecting bees.

The rest of this paper is structured as follows:
- Section 2 presents related work on existing technological products for Varroa mite identification and control.
- Section 3 presents the authors' proposed Varroa detection system and its capabilities.
- Section 4 presents the proposed CNN algorithm used by the system.
- Section 5 presents the authors' experimentation with the system (offline and online), its evaluation, and a cross-comparison with existing systems in the literature.
- Section 6 concludes the paper.

Related Work
This section presents related work on detecting and treating Varroa. It is divided into two subsections. The first covers periodic uniform interventions (treatments), which mainly involve chemical substances. The second describes in detail technological detection systems that aim to provide precise mite detection, so that beekeepers can apply targeted interventions.

Related Work on Periodic Varroa Mite Treatments
Beekeepers have been evaluating the efficacy of an acaricide treatment by counting dead mites that drop from brood frames and bees onto the hive bottom board. Methods that they use are the "sugar shake" procedure, washing kits (alcohol wash), and organic substances such as thymol [24,25].
Another Varroa treatment uses smoke-vaporizer kits that vaporize acid mixtures, usually oxalic acid and glycerin. This smoke treatment can be an effective and comparatively cheap method [26]. [27] also mentions spraying antigenic acid inside the brood box as a means of limiting Varroa infestations.
Various chemical substances and materials are used to collect and kill mites in order to determine whether a colony is infested with Varroa. These typically include:
- the ether (or alcohol) roll;
- the powdered-sugar shake;
- chemical sticky boards (Mite Census);
- drone/brood sampling;
- sticky boards with acaricides [27,28].

However, chemical methods have a number of drawbacks. The search for more efficient alternatives has therefore continued, increasingly drawing on emerging technologies.

Related Work on Varroa Mite Detection Technologies
A technological method reported by [29] uses electronic-nose (E-nose) technology to detect Varroa. It detects the changes that the infestation causes in the chemical composition of the air inside a hive. The deciding factor for the effectiveness of this technique is the time of detection.
Sound monitoring systems using Mel or FFT spectrograms are also powerful tools for detecting indications of the Varroa mite. Frequency-amplitude-over-time representations can be combined with SVMs and neural network classifiers to distinguish between different hive states, specifically external attacks, colony stress, swarming, and queen loss [19]. Detection of colony collapse can then be used as an indicator that the Varroa mite is present.
The authors of [13,30,31] also present implementations of low-power Wireless Sensor Network (WSN) technology, assisted by cloud computing. These implementations monitor bee stress behavior, including stress caused by the Varroa mite. The key objective of this research is to use WSN technology to monitor a beehive colony. It collects key information about the activity and environment within the beehive and about the health of the bees. Nevertheless, as mentioned by [13], Varroa mite detection using sensors (temperature, humidity, noise level, and gas sensors) may produce many false detections. Other phenomena, such as swarming, queen loss, or even hunger, may lead to CCD events similar to those caused by the mite.
Var-Gor [32] is a promising device for detecting the Varroa mite early and fighting it at an early stage, owing to its eco-friendly and sustainable nature, accurate results, and futuristic design. Specifically, when a bee contaminated with the Varroa mite enters an uncontaminated hive, the Var-Gor system detects the mite through image capture, template matching, color classification, and segmentation filters. It then sends a warning notification to the beekeeper's phone.
The authors of [33] developed an experimental system for real-time bee monitoring with cameras, including deep learning approaches. Their system is built on a single-board Raspberry Pi (RPi) computer and analyzes video streams of bees to detect varroosis. They applied two distinct detection processes with two CNN models, one for the bees and one for the Varroa. However, the camera is located outside the hive, which makes early detection of the mite difficult. In case of infection, the pictures of the infected bees are transferred to a cloud data center. There they are further analyzed, stored, and used to retrain the CNN models, and a notification is sent to the relevant beekeeper. Using two distinct CNN models makes this approach hard to apply on a standalone RPi device inside the hive, which is what the authors of this paper propose with their offline system approach. Furthermore, image processing techniques can accurately identify the mite on bees detected by an accurately trained CNN, similar to the one the authors of this paper propose. Such techniques include edge detection, Hough transforms, region labeling, and color masking.
The authors of [34] present a camera-based approach in which CNNs are trained on bees manually classified as infected or non-infected. A laser is then used to exterminate the infected bees. As presented, this approach has the drawback of relying on labeled and classified single-bee images. It can perform well when detecting single bees at the beehive entrance or on a white background. It fails significantly, however, when detecting bees inside the frames (where the mites reside), because of the high concentration of bees on each frame. The solution proposed by this paper is to use whole-frame images in which manual bee annotation and localization are performed before CNN training. Since many regions of a frame image contain no useful information, image segmentation and manual identification of the ROIs can offer significantly better results.
The authors of [35] aim to identify the Varroa mite from a limited number of low-quality images. Their model combines CLAHE image enhancement, DCGAN data augmentation, and an optimized CNN classifier to classify bees as infected or healthy from standard bee images. The results show that CLAHE improves sharpness and has a positive effect on CNN performance. Furthermore, DCGAN augmentation provided more promising results than conventional augmentation methods in the infection identification scenario. In conclusion, this vision-based approach appears to be suitable and efficient for identifying Varroa mites on bees.
Finally, the authors of [36] experimented with state-of-the-art object detectors trained on annotated datasets of ill bees and Varroa mites. They used CNN algorithms such as YOLOv5 [37] and SSD [38] to detect Varroa mites and ill bees. Their CNN tests reported an F1-score of 87% for YOLOv5 ill-bee detection and above 70% for SSD Varroa mite detection. The authors also tried the Deep SVDD anomaly detector, but it was not able to model the problem. They further mention that a Jetson Nano can be used as part of a detection end-node device. Their proposed use of YOLOv5 provides significant results; nevertheless, no detection performance results are given. Furthermore, the F1 score combines the model's precision and sensitivity (recall) and is not a measure of the model's accuracy. The mAP score indicates a model with good accuracy, but it is not a credible accuracy metric either.
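A short sketch makes the point about F1 concrete. It computes precision, recall, F1, and accuracy from confusion-matrix counts. The counts are hypothetical and are not taken from [36]. They only show that F1 ignores true negatives, so two detectors with the same F1 score can have very different accuracies.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall (sensitivity), F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # the only metric here that uses TN
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical detectors that differ only in their number of true negatives.
few_tn = confusion_metrics(tp=80, fp=20, fn=20, tn=0)
many_tn = confusion_metrics(tp=80, fp=20, fn=20, tn=880)
print(f"F1 {few_tn['f1']:.2f}, accuracy {few_tn['accuracy']:.2f}")    # F1 0.80, accuracy 0.67
print(f"F1 {many_tn['f1']:.2f}, accuracy {many_tn['accuracy']:.2f}")  # F1 0.80, accuracy 0.96
```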
The following section presents the authors' proposed Varroa Detection system implementing their proposed smart detection algorithm, which focuses on the early detection of the mite to apply a precise treatment.

Proposed Varroa Detection System Implementation
The authors propose a new incident response system for the automatic detection of the Varroa mite. The system includes the following parts: (1) the end-node device, (2) the cloud service for the online detection process, (3) the concentrator device serving the online detection process, and (4) the mobile phone application used for the offline detection process. The parts and capabilities of the end-node detection device are presented in detail in the following subsections.

Beehive Camera End-Node
The beehive camera monitoring module is attached to a plastic beehive frame and placed inside the beehive brood box, as shown in Figure 1. The module includes the following components:
• The camera module component. The end-node detection module includes two camera modules.
  - The first is a 5 MP camera with a fisheye lens (160° field of view) and manually adjusted focus. It is connected to the end-node microprocessor with a 15-pin FFC cable and sits in the middle of the plastic frame (see Figure 1, Camera 1).
  - The second is a 5 MP USB camera located on top of the plastic frame. That frame is covered entirely with a smooth plastic surface so that bees do not build comb or deposit wax on it. This camera has a small LED and a hinged arm that lets it take images above and on top of the frames.
  - Both cameras are connected to a quad-core ARM microprocessor and can be used concurrently to capture bee images inside the brood box.
• The Microprocessor Control Unit (MCU). The MCU is a quad-core embedded 64-bit ARM microprocessor operating at 1 GHz, with 512 MB of LPDDR2 RAM clocked at 450 MHz. The MCU stores camera snapshots on its embedded SD card. If appropriately configured, it also uploads them to the cloud through the Varroa detection service Application Programming Interface (API) created for that purpose (see Figure 1, ARM end-node CPU).
• The data transmission modules. The MCU includes embedded Wi-Fi and Bluetooth 4.2 transponders with Bluetooth Low Energy (BLE) support. The Wi-Fi transponder is used in the online mode of operation and the BLE transponder in the offline mode.
• The autonomous device power component.
  - It includes a 20 W/12 V PV panel connected directly to a 12 V-60 Ah lead-acid SLA/AGM battery (see Figure 1).
  - The battery is placed under the PV panel on top of the beehive. It feeds the ARM MCU through a 12 V voltage regulator with 2 × USB outputs, which powers the end-node device through its micro-USB power port.
  - A deep-discharge battery is used, since the limited battery capacity means the system may become fully discharged, especially at night or on prolonged cloudy days.
• The Wi-Fi concentrator. It is a Wi-Fi access point device that includes an LTE/3G cellular transceiver, and it is used only in the online mode of operation. In online mode, the end-node MCU connects to the concentrator to upload imagery data to the cloud. In offline mode, the MCU uses its BLE interface instead and transmits the beehive's detection output over a distance of up to 4-10 m.
The following subsection describes the end-node device's two modes of operation.
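The energy reasoning behind the battery choice can be sketched with the nominal figures given above. The example uses an average end-node load of 5 W and a usable lead-acid capacity of 50%. Both are hypothetical assumptions, not measurements from this work. The sketch also ignores regulator losses and PV input.

```python
def battery_energy_wh(voltage_v: float, capacity_ah: float) -> float:
    """Nominal stored energy in watt-hours: E = V x Ah."""
    return voltage_v * capacity_ah


def autonomy_hours(voltage_v: float, capacity_ah: float,
                   load_w: float, usable_fraction: float) -> float:
    """Hours a fully charged battery can run a constant load with no PV input.

    usable_fraction is an assumed depth-of-discharge limit (1.0 = fully drained);
    regulator losses are ignored.
    """
    return battery_energy_wh(voltage_v, capacity_ah) * usable_fraction / load_w


# 12 V / 60 Ah battery from the text; the 5 W load and 50% usable capacity are hypothetical.
print(battery_energy_wh(12, 60))           # 720
print(autonomy_hours(12, 60, 5.0, 0.5))    # 72.0
```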

End-Node Device Functionality and Modes of Operation
The end-node device can be used in two modes of operation: offline and online. The mode is selected with a switch placed at the bottom of the MCU casing. The switch is connected to an MCU GPIO digital input pin, and setting it to LOW or HIGH switches between the two modes. The two end-node modes operate as follows:
• Online mode: In this mode of operation, Varroa detection is performed in the cloud. For this purpose, a cloud service and an HTTP-based API have been implemented.
  - The API accepts images uploaded by the end-node MCU through HTTP PUT requests.
  - The end node can also send an HTTP POST request to the cloud API with a JSON body that includes an API key and a beehive id. The API returns the Varroa detection results for this beehive as a JSON object. The results include the base64-encoded images of the Regions Of Interest (ROIs) where the Varroa mite has been detected.
  - The online mode requires the beehive concentrator, which transmits the node data over the Internet via HTTP. It acts as an intermediate gateway between the end nodes and the cloud application service.
  - The concentrator can upload images with an overall bandwidth that varies from 1-7/10-57 Mbps, depending on the gateway distance from the beehive [39]. If the distance is 20-30 m, the bandwidth is limited by the LTE technology used.
• Offline mode: In places with limited Internet connectivity and cellular coverage, the end-node offline mode can be used. This mode does not require the concentrator device.
In this mode, a micro-service containing a version of the detection algorithm runs inside the end-node device and executes the Varroa detection algorithm locally (as presented in Figure 2). The final CSV output is then transmitted through the MCU BLE transponder to the beekeepers' mobile phones. A BLE service with two read characteristics is used: one for the CSV output and one for acquiring the ROI image in which Varroa was detected. The beekeeper can check the status of each beehive by moving close to it, pairing with it, and performing the BLE reads from a mobile phone. The advantages of the offline mode are 20-25% lower end-node energy consumption and no communication provider costs. Nevertheless, BLE pairing can be difficult, especially when many BLE devices are close by, and reading base64-encoded imagery data through characteristics is problematic [40,41]. For this reason, only one Varroa mite ROI is available through the BLE characteristic: the last one detected. The following section describes the Varroa mite detection algorithm used by the end-node devices in the offline and online modes of operation.
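The two online-mode requests can be sketched on the client side using only the Python standard library. The paper does not publish its API schema, so several details here are hypothetical placeholders: the base URL, the URI layout, and the JSON field names (`api_key`, `beehive_id`, `rois`). The sketch only builds the requests; `urllib.request.urlopen(req)` would send them.

```python
import base64
import json
import urllib.request

# Hypothetical base URL and URI layout: the paper does not publish its API schema.
API_BASE = "https://varroa-api.example.com"


def build_image_upload(beehive_id: str, image_name: str, jpeg_bytes: bytes) -> urllib.request.Request:
    """HTTP PUT carrying one JPEG snapshot; the target URI names the image resource."""
    url = f"{API_BASE}/hives/{beehive_id}/images/{image_name}"
    return urllib.request.Request(url, data=jpeg_bytes, method="PUT",
                                  headers={"Content-Type": "image/jpeg"})


def build_results_query(api_key: str, beehive_id: str) -> urllib.request.Request:
    """HTTP POST whose JSON body carries the API key and the beehive id."""
    body = json.dumps({"api_key": api_key, "beehive_id": beehive_id}).encode("utf-8")
    return urllib.request.Request(f"{API_BASE}/results", data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def decode_rois(response_text: str) -> list[bytes]:
    """Decode base64 ROI images from a reply shaped like {"rois": ["...", ...]} (assumed)."""
    return [base64.b64decode(item) for item in json.loads(response_text)["rois"]]


upload = build_image_upload("hive-07", "snap_0001.jpg", b"\xff\xd8...jpeg bytes...")
query = build_results_query("MY-API-KEY", "hive-07")
print(upload.get_method(), upload.full_url)
print(query.get_method(), json.loads(query.data)["beehive_id"])
# urllib.request.urlopen(upload) would actually send the request to the server.
```

Building the two requests separately mirrors the split described above. Uploads are idempotent PUTs addressed by image, while result retrieval is a POST keyed by beehive.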
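Simple arithmetic shows why imagery data is awkward over BLE. Base64 encoding inflates data by a factor of 4/3, and the Bluetooth Core Specification limits a single attribute value to 512 bytes. Even a small ROI image must therefore be served in several pieces through some application-level chunking scheme. The sketch is illustrative only. The 3000-byte ROI size is a hypothetical example, and the paper does not describe a chunking protocol.

```python
import base64
import math

# Maximum length of one GATT attribute value, per the Bluetooth Core Specification.
MAX_ATTRIBUTE_VALUE = 512


def base64_size(raw_bytes: int) -> int:
    """Size of the padded base64 encoding: 4 characters for every 3 input bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def chunks_needed(raw_bytes: int, chunk: int = MAX_ATTRIBUTE_VALUE) -> int:
    """How many attribute-sized pieces the encoded image must be split into."""
    return math.ceil(base64_size(raw_bytes) / chunk)


roi_jpeg = bytes(3000)  # hypothetical stand-in for a small JPEG crop of one bee ROI
encoded = base64.b64encode(roi_jpeg)
print(len(encoded), base64_size(len(roi_jpeg)), chunks_needed(len(roi_jpeg)))  # 4000 4000 8
```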

Proposed Method for Varroa Mite Early Detection
Training a Convolutional Neural Network for an object begins with collecting a sample of images that contain the objects to be detected. If no object localization is applied to the image, the process is a classification process, which classifies the contents of the entire image. If, on the other hand, multiple objects in the image must be localized, the process is a detection process. It may output bounding boxes or masked areas of the detected objects. The former is called object detection, while the latter is called instance segmentation.
Different models can be applied as CNN classifiers or detectors. In this paper, the region-based CNN algorithm Faster R-CNN [42] is used with pre-trained models [43]. Together they produce fixed-size bounding boxes and confidence scores for the object instances present. In these pre-trained models, arbitrary classes can be defined, attached, and trained as part of the model's classes. The most well-known algorithms are the Single Shot MultiBox Detector (SSD) [38] and Faster R-CNN [44]. These algorithms and the models they use are the most common because they were designed to balance efficiency and accuracy.
- SSD passes the input image through a Convolutional Neural Network (CNN) only once and outputs a feature map [38]. The feature map then goes through a convolutional kernel that predicts the bounding boxes and their class probabilities.
- Faster R-CNN uses a small convolutional network called the Region Proposal Network (RPN) to propose regions of interest. For each region, the network predicts the probability of it being background or an object of interest [44].
For their bee detection process, the authors selected MobileNet pre-trained CNN models. These are proven lightweight models with short detection times, and they can be used in embedded systems as part of real-time or near-real-time industrial deep learning applications [45]. Additionally, the authors selected a heavier model, ResNet-50, with five times more trainable parameters, which can still be loaded onto an embedded device. This allows its accuracy to be compared with that of the lightweight models.
In this section, the authors describe their process for detecting the Varroa mite inside the beehive. It includes two steps:
(a) using SSD and Faster R-CNN Convolutional Neural Networks (CNNs) with pre-trained models [46,47] to detect bees inside the brood box frames;
(b) using color masking and Hough transformation to detect Varroa on the previously detected bee objects.
The algorithmic process used to build the neural networks and detect bees comprises five steps. The first four are necessary for the CNN bee object detection process. The fifth is required for the edge detection of the Varroa mite on the previously detected contours. The detection process steps are shown in Figure 2. The CNN training steps and the detection step of the proposed algorithm are as follows:
• Step 1-Initial data acquisition and data cleansing: The initial imagery dataset acquired by the beehive monitoring module is manually analyzed and filtered. This removes blurred images and images of low resolution or low light intensity.
  - Image format: In this experimentation, photos from the 5 MP camera module are acquired at a minimum size of 800 × 600 px (300 dpi). They are compressed in JPEG format with a quality factor Q = 70, giving 350-500 KB per image.
  - Model choice: The CNN networks and algorithms used are the lightest in processing terms for portable devices. They use a minimum training image input size of 640 × 640 px (slightly distorted in height) and cubic interpolation.
  - Use for swarming: The trained CNN is used to address the problem of swarming by counting the concentration of bees above the bee frames and inside the beehive lid.
  - Classes: The detection categories of the authors' classifier are defined by bee counts, and each class is identified by a number of detected bees. The class boundaries can be arbitrary and are set in the detection service configuration file.
  - Dataset size: The initial dataset must therefore contain at least 1000 images per detection class, for a total of 5000 images in our CNN training case.
• Step 2-Image transformation and manual data annotation: All images that passed the cleansing process were manually annotated using the LabelImg [48] tool.
  - Other annotation tools: Labelbox [49], ImgAnnotation [50], and the Computer Vision Annotation Tool [51] are also used for photo annotation. All of them produce output in either JSON or XML format.
  - Image quality: The resolution and clarity of the original images are extremely important, as they facilitate the detection of the Varroa mite.
  - Clarity: To improve clarity, a bilateral filter smooths all images using a degree of smoothing sigma = 0.5-0.8 and a small 7 × 7 kernel.
  - Scaling: All photos must then be scaled to specific, fixed dimensions before being fed into the training network. Scaling uses either cubic interpolation or the EDSR super-resolution process [52]. The target dimensions are those that each training algorithm requires for its smooth operation.
  - Tooling: The OpenCV [53] library is used for the image transformation process. It is part of the second and fifth stages of detection: the second stage precedes the detection of bees by the CNN, and the fifth is the Varroa mite detection stage (see Figure 2).
• Step 3-Training process: Training is based on PyTorch pre-trained Convolutional Neural Network (CNN) models [54] and uses all available system resources.
  - Hardware: The GPU is the essential computer subsystem for training, since it speeds up neural network training. CUDA tools and libraries are used for this purpose according to PyTorch requirements.
  - Models: The CNN is created from pre-existing PyTorch [54] trained models, which are trained further to detect bees. The selected PyTorch models and their capabilities for the detection process are presented in Table 1.
  - Data split: After the image annotation of Step 2 is completed, the images are divided into two sets. The training set contains 70-80% of the annotated photos, and the test set contains the remaining 20-30%. The test set can be further split in half to create a validation set.
  - Output: The model to be used for training is then chosen. The output of the training process is the CNN model used for the detection of bee objects in Figure 2.
• Step 4-Detected bee contours: This step selects bee-detected objects based on the confidence threshold set by the service. A good confidence threshold is a value above 0.5 (50%). Gaussian filtering and cubic interpolation are then applied to the selected contours to scale them to 40 × 50 px (w × h), so that Step 5 can be applied to each distinct ROI.
• Step 5-ROI masking and Varroa mite detection:
  - The scaled bee image is converted from RGB to HSV.
  - An appropriate HSV mask is applied, turning the image into a binary image in which the areas detected by the mask are white and all other areas are black.
  - A Hough transformation detects circular areas within a lower-upper threshold of 10-90 px². If at least one such circular area is detected on a bee object, the bee is marked as infected with varroosis.
  - The original color image is stored in the appropriate output folder, with the detected bee contours and masked Varroa mite areas annotated. The detection results are appended to the CSV output file of the detection service process.
• Detection service process and data output: The detection process is performed by a daemon application installed as a service on a cloud server or on the embedded end-node device, depending on the mode of operation (online or offline). This application loads the inference graph of the CNN into system memory so that the bees, and then the Varroa mite, can be detected. This procedure is performed on each image received from the end-node device through HTTP PUT requests.
The HTTP PUT method requests that the target resource be created, or replaced with the representation enclosed in the body of the PUT message. Thus, if a resource already exists at this URI, the message body is treated as a new, modified version of that resource. Once a PUT request is received, the service detects the bees and then the Varroa mites. It then outputs an updated CSV file containing the number of Varroa mites detected in each photo taken by the end-node device. Figure 2 shows in detail the steps of the detection process implemented on the cloud server (online) or in the embedded end-node device (offline).
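Two bookkeeping tasks from the data acquisition and training steps can be sketched in plain Python. The first maps a bee count to a class identifier through configurable boundaries. The second splits the annotated images into training, test, and validation sets. The boundaries in `CLASS_UPPER_BOUNDS` and the fixed random seed are hypothetical values chosen for illustration. The paper only states that the boundaries are set in the service configuration file. It also states that the split is 70-80% / 20-30%, with the test part halved to form a validation set.

```python
import bisect
import random

# Hypothetical upper bounds of the bee-count classes (the paper keeps them in a config file).
CLASS_UPPER_BOUNDS = [10, 25, 50, 100]  # classes: <=10, 11-25, 26-50, 51-100, >100


def count_to_class(bee_count: int, bounds: list[int] = CLASS_UPPER_BOUNDS) -> int:
    """Data acquisition step: map a detected-bee count to a class identifier."""
    return bisect.bisect_left(bounds, bee_count)


def split_dataset(items, train_fraction: float = 0.8, seed: int = 42):
    """Training step: shuffle, keep train_fraction for training, halve the rest."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    train, rest = shuffled[:n_train], shuffled[n_train:]
    half = len(rest) // 2
    return train, rest[:half], rest[half:]  # training, test, validation


train, test, validation = split_dataset(range(5000))      # 5 classes x 1000 images
print(len(train), len(test), len(validation))             # 4000 500 500
print([count_to_class(n) for n in (3, 10, 11, 60, 250)])  # [0, 0, 1, 3, 4]
```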
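The contour selection and ROI masking steps can be illustrated with a pure-Python sketch of the idea. It is not the authors' code: the paper states that OpenCV is used, so a real implementation would more likely rely on `cv2.cvtColor`, `cv2.inRange`, and `cv2.HoughCircles`.

The sketch makes several assumptions:
- Detections are dicts with a `score` key.
- ROIs are already scaled to 40 × 50 px and given as rows of RGB tuples.
- The HSV bounds for the mite color are hypothetical.
- The Hough radius range is derived from the 10-90 px² area threshold via r = sqrt(A/π).
- The connected-component area gate and the 0.8 vote fraction are additions of this sketch.

```python
import colorsys
import math
from collections import deque

Pixel = tuple[int, int, int]  # (R, G, B), 0-255


def keep_confident(detections: list[dict], min_score: float = 0.5) -> list[dict]:
    """Contour selection: keep CNN bee detections whose confidence reaches the threshold."""
    return [d for d in detections if d["score"] >= min_score]


def hsv_mask(roi: list[list[Pixel]]) -> list[list[bool]]:
    """ROI masking: True where a pixel lies in an assumed reddish-brown 'mite' HSV range."""
    mask = []
    for row in roi:
        flags = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # all in [0, 1]
            flags.append((h <= 0.08 or h >= 0.95) and s >= 0.4 and 0.15 <= v <= 0.6)
        mask.append(flags)
    return mask


def regions(mask: list[list[bool]]) -> list[list[tuple[int, int]]]:
    """4-connected white regions, found by breadth-first search."""
    height, width = len(mask), len(mask[0])
    seen, found = set(), []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or (x, y) in seen:
                continue
            seen.add((x, y))
            queue, region = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            found.append(region)
    return found


def has_circular_blob(mask, min_area=10, max_area=90, min_vote_fraction=0.8, angle_steps=72):
    """Hough-style centre voting on the boundary of each white blob of 10-90 px^2."""
    r_min = max(1, math.ceil(math.sqrt(min_area / math.pi)))  # 10 px^2 -> r = 2
    r_max = math.floor(math.sqrt(max_area / math.pi))          # 90 px^2 -> r = 5
    for region in regions(mask):
        if not min_area <= len(region) <= max_area:
            continue  # assumed area gate: skip blobs outside the 10-90 px^2 range
        pixels = set(region)
        boundary = [(x, y) for x, y in region
                    if any(n not in pixels for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))]
        votes: dict[tuple[int, int], int] = {}
        for x, y in boundary:
            centres = set()  # each boundary pixel votes once per candidate centre
            for r in range(r_min, r_max + 1):
                for k in range(angle_steps):
                    t = 2 * math.pi * k / angle_steps
                    centres.add((x - round(r * math.cos(t)), y - round(r * math.sin(t))))
            for c in centres:
                votes[c] = votes.get(c, 0) + 1
        if max(votes.values()) >= min_vote_fraction * len(boundary):
            return True  # a circular, mite-sized area: the bee is marked as infected
    return False


bee, mite = (200, 150, 50), (120, 40, 30)  # hypothetical body and mite colours
roi = [[mite if (x - 20) ** 2 + (y - 25) ** 2 <= 16 else bee for x in range(40)] for y in range(50)]
print(has_circular_blob(hsv_mask(roi)))  # True: a 49 px disc of radius 4
```

Plain centre voting also rewards some non-circular shapes, such as the corners of a square blob. OpenCV's gradient-based implementation is more selective.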
The following section presents the authors' experimental results using the proposed CNNs combined with computer vision techniques. Figure 3 shows the output of each step of the authors' proposed detection process, as illustrated in the flowchart of Figure 2. The images shown for each detection step were taken during the validation of the Varroa mite detection process.

Experimental Scenarios and Results
This section presents the experimental scenarios implemented on end-node devices. Bees are first detected with a selected CNN, Faster R-CNN. It uses the Region Proposal Network (RPN) to reduce the total image detection time by at least 10× [44]. Faster R-CNN is paired with one of three pre-trained models: MobileNet V2 [55], MobileNet V3 [56], or ResNet-50 FPN [57] (as listed in Table 1). The Varroa mite is then detected using computer vision color masking and Hough transformation.
The detection was validated using two sets of photos. The first contained 100 images from inside the hive without Varroa. The second, taken from a different set of photos, contained 100 photographs of individual bees with Varroa. Together, these images allowed Varroa mite detection to be tested.
For each test, five measurements were taken:
- the time required to load the trained network into system memory;
- the total detection time;
- the average ROI detection time;
- the mean Average Precision (mAP) obtained from testing the models at IoU = 0.5;
- the maximum memory allocation per neural network.
In addition, two measurements were taken for Varroa detection: the average detection accuracy and the Varroa mite detection time for each bee ROI (detected contour).
In addition, the Detection Accuracy (DA) and Mean Detection Accuracy (MDA) metrics (see Equation (1)) were used to evaluate the proposed Varroa detection process. DA and MDA are calculated by manually counting the bees in the test photos and the Varroa mites in each photo of individual bees. These accuracy metrics are described in [58].
Object detection accuracy is also measured by the mean Average Precision (mAP) computed with PyTorch. This is the average of the maximum precision over the detected contours relative to the actual annotated ones. Precision measures prediction accuracy as the fraction of true positives among all positive predictions (true positives plus false positives).
The tests were performed on three different systems: a Raspberry Pi Zero 2 W and two cloud servers.
- The RPi has a quad-core processor clocked at 1 GHz with 364 MB of RAM, 2 GB of swap memory, and a 64-bit GNU/Linux operating system.
- Cloud server version 1 has an octa-core processor clocked at 2.6 GHz, 12 GB of RAM, 4 GB of swap memory, and the Ubuntu Server 18.04 operating system.
- Cloud server version 2 has a 24-core processor clocked at 3.8 GHz, 64 GB of RAM, 60 GB of swap memory, and the Ubuntu 20.04 operating system.
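The sketch below illustrates the IoU = 0.5 criterion behind mAP. It computes the intersection over union of two boxes and greedily matches predictions to annotations to obtain precision and recall. It is a simplified illustration, not PyTorch's evaluation code. The `(x_min, y_min, x_max, y_max)` box format and greedy matching in confidence order are assumptions. A full mAP computation would also average precision over recall levels and classes.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted: list[Box], annotated: list[Box],
                     threshold: float = 0.5) -> tuple[float, float]:
    """Greedily match each prediction to an unused annotation with IoU >= threshold."""
    unmatched = list(annotated)
    tp = 0
    for p in predicted:  # assumed to be sorted by descending confidence
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(predicted) if predicted else 0.0   # TP / (TP + FP)
    recall = tp / len(annotated) if annotated else 0.0      # TP / (TP + FN)
    return precision, recall


preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truth = [(1, 0, 11, 10), (20, 20, 30, 30)]
print(precision_recall(preds, truth))  # two matches, one false positive
```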

Scenario I: Detection System Performance Tests
The first experimental scenario includes the results obtained from the three systems separately. During the tests, all trained models (MobileNet V2, MobileNet V3, ResNet) of the Faster R-CNN algorithm were evaluated for their performance. The results of the execution time and the memory usage during the detection of bees are presented in Tables 2-4.
Based on the results obtained from the end-node device, the Faster R-CNN ResNet-50 model has the shortest network load time into system memory. Its load time is 30% less than that of the MobileNet V2 model and 7% less than that of the MobileNet V3 model. However, its average time to detect bees per photo is longer than that of the MobileNet models. The most efficient model is MobileNet V3: it is 80% faster than Faster R-CNN ResNet-50 and 25% faster than MobileNet V2.
Next, the execution times of the detection process on the cloud services and the end-node device are compared, and the speedup σ is measured. The speedup is computed from the average detection time of each model as $\sigma = T_{8}/T_{24}$. Here $T_{n}$ is the average CNN detection time when detection runs as parallel tasks using all n available cores of a system.
- For the MobileNet V2 and V3 models, similar speedups are observed when moving from n = 8 to n = 24 cores: $\sigma = 3.5$ and $\sigma = 3.4$, respectively.
- The ResNet-50 FPN model achieves a greater speedup of $\sigma = 6.4$, as also shown in Tables 3 and 4.
In terms of per-photo bee detection time, the ResNet-50 FPN model is 88% faster on cloud server version 2 than on version 1. Version 2 has three times as many cores as version 1.
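A few lines of arithmetic check the speedup definition. The detection times in the usage example are hypothetical and are not values from Tables 2-4. `time_reduction` expresses a speedup σ as the fraction of detection time saved, 1 - 1/σ.

```python
def speedup(t_8_cores: float, t_24_cores: float) -> float:
    """sigma = T_8 / T_24: average detection time on 8 cores over that on 24 cores."""
    return t_8_cores / t_24_cores


def time_reduction(sigma: float) -> float:
    """Fraction of detection time saved for a speedup sigma: 1 - 1/sigma."""
    return 1.0 - 1.0 / sigma


# Hypothetical average detection times in seconds per photo (not values from the tables).
sigma = speedup(7.0, 2.0)
print(sigma, f"{time_reduction(sigma):.0%}")  # 3.5 71%
```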

Scenario II: Detection Algorithm Accuracy Tests
Scenario II focuses on the precision results obtained with the trained neural networks on all three devices, operating in either online or offline mode. The metrics used are the Mean Detection Accuracy (MDA) from Equation (1), the mean Average Precision (mAP) calculated by PyTorch, and the Success Frequency (S.F.), computed from Equation (3) based on MDA and from Equation (4) based on mAP, as proposed in [58]. The results for the three systems are presented in Table 5. Using the S.F. metric, the authors can evaluate which of the systems consumes the most energy during bee detection. Comparing the S.F. columns in Table 5 shows that Cloud Server Version 2 consumes less energy for bee detection than the other two systems: it requires 99% less energy than the End Node Device and 40-60% less than Cloud Server Version 1.
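To make the mAP metric concrete, the sketch below computes average precision (AP) for a single class (bee) on a single image, using greedy IoU matching and all-point interpolation. It is a generic textbook version, not the evaluator the authors used; COCO-style evaluators such as TorchMetrics' `MeanAveragePrecision` average over IoU thresholds from 0.50 to 0.95 and use 101-point interpolation, so their numbers will differ. For a whole dataset, detections from all images would be pooled and sorted by confidence before building the precision-recall curve.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    predictions: Sequence[Tuple[float, Box]],  # (confidence score, box)
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Single-class AP with greedy matching and all-point interpolation."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    # Highest-confidence detections are matched first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        recalls.append(tp / len(ground_truth))
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum(
        (mrec[i] - mrec[i - 1]) * mpre[i]
        for i in range(1, len(mrec))
        if mrec[i] != mrec[i - 1]
    )
```

With two ground-truth bees and three detections (correct, spurious, correct, in decreasing confidence), the AP is 5/6: the spurious box in the middle lowers precision for the second half of the recall range.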
The following scenario describes the results of Varroa mite detection, the last step of the proposed Varroa detection procedure, as illustrated in Figure 2 (detection algorithm steps and output).

Scenario III: Evaluation of Varroa Mite Detection Step
In this scenario, 200 photos were used to detect the Varroa mite on bee contours produced by the preceding CNN object detection step. The aim is to test the accuracy and precision of the Varroa mite detection process on bee objects. One hundred of the detected contours depicted bees with Varroa, while the rest depicted bees without Varroa. Applying the HSV masking and Hough transform detection step for Varroa mites to the bee objects yielded the measurements in Table 6, where accuracy and precision are calculated with Equation (5):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5)$$
The variables used in the validation process are: TP (True Positive), the number of photos in which the mite is present and was successfully detected; FN (False Negative), the number of photos that contained Varroa but in which it was not detected; FP (False Positive), the number of images that did not contain Varroa but in which it was detected; and TN (True Negative), the number of images that did not contain the Varroa mite and in which none was detected. In addition, by combining the measurements of Table 6 with those of Scenario II (Table 5), the total accuracy of the whole Varroa detection process can be inferred as $\mathrm{Total\ Acc} = \mathrm{Acc} \times \mathrm{MDA}$. In this way, it can be determined which CNN, combined with image processing, gives the best results for Varroa detection. The MDA factor is needed because the accuracy rates in Table 6 depend to a large extent on the number of images that pass through the Varroa mite detection step. The MobileNet V2 model has the best total accuracy, 68% (8% more than ResNet-50 FPN and 24% more than MobileNet V3).
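Equation (5) and the total-accuracy product can be sketched directly from the confusion counts. The helper names and the example counts below are hypothetical and serve only as illustration; they are not the values in Table 6.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN), as in Equation (5)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): share of Varroa detections that were correct."""
    if tp + fp == 0:
        raise ValueError("no positive detections")
    return tp / (tp + fp)


def total_accuracy(varroa_acc: float, mda: float) -> float:
    """Total Acc = Acc * MDA: Varroa-step accuracy scaled by bee-detection MDA."""
    return varroa_acc * mda
```

With hypothetical counts for 200 contours (TP = 80, FN = 20, FP = 10, TN = 90), accuracy is 0.85 and precision is about 0.89; with a hypothetical MDA of 0.7 the total accuracy would be about 0.60. The product form means a weak bee detector caps the end-to-end result no matter how good the mite step is.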
It should be noted at this point that the authors have not used the Recall and F1-Score metrics. Recall measures the sensitivity of the Varroa mite detection algorithm. This sensitivity depends directly on the size of the masked color space used to recognize the Varroa color hue, which in turn depends on the brightness and sharpness of the photos that go through this process. For this reason, the F1-score, whose calculation depends on recall, has not been taken into account either; only the precision and accuracy metrics have been used.
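For reference, recall and F1 would be computed from the same TP/FN/FP counts as follows. This is a minimal sketch with hypothetical helper names; it shows why F1 inherits recall's sensitivity: at fixed precision, F1 moves with recall, so a mask-dependent recall would shift F1 as well.

```python
def recall(tp: int, fn: int) -> float:
    """Sensitivity: share of Varroa-positive photos in which the mite was detected."""
    if tp + fn == 0:
        raise ValueError("no positive photos")
    return tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Fixed precision, varying recall (e.g. different HSV mask sizes, hypothetically).
for r in (0.5, 0.7, 0.9):
    print(f"recall={r:.1f}  F1={f1_score(0.9, r):.3f}")
```

At a precision of 0.9, F1 rises from about 0.64 to 0.90 as recall goes from 0.5 to 0.9.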
Using the same set of photos for all system modes of operation (online version 1, online version 2, and offline end-node device), there were no differences in the execution time of the Varroa detection step, since in all three cases the average detection time is close to 4 ms. Likewise, the memory used during the Varroa detection step did not change across the three device cases and is close to 0.7 MB.
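The HSV masking and Hough transform step evaluated in this scenario can be illustrated with a dependency-free sketch. It is not the authors' implementation: the HSV thresholds, the fixed mite radius in pixels, the 4-neighbour edge extraction, and the vote threshold are all hypothetical. A practical version would more likely use OpenCV's `cv2.cvtColor`, `cv2.inRange` and `cv2.HoughCircles` on real bee crops; note that `colorsys` returns hue, saturation and value in [0, 1], whereas OpenCV's 8-bit HSV uses a 0-179 hue scale, so thresholds do not transfer directly.

```python
import colorsys
import math
from collections import Counter
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x], 8-bit RGB


def hsv_mask(img: Image, hue=(0.0, 0.08), sat_min=0.4, val=(0.2, 0.9)) -> List[List[bool]]:
    """True where a pixel falls inside a hypothetical reddish-brown mite colour range.
    (Real red hues also wrap around near 1.0; this sketch only checks the low end.)"""
    mask = []
    for row in img:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out.append(hue[0] <= h <= hue[1] and s >= sat_min and val[0] <= v <= val[1])
        mask.append(out)
    return mask


def edge_points(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """Masked pixels (x, y) with at least one unmasked or out-of-image 4-neighbour."""
    height, width = len(mask), len(mask[0])
    points = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < width and 0 <= ny < height) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


def hough_circle(points: List[Tuple[int, int]], radius: float, width: int, height: int) -> Counter:
    """Fixed-radius circular Hough transform: each edge point votes once per candidate centre."""
    votes: Counter = Counter()
    for x, y in points:
        centres = set()
        for deg in range(360):
            t = math.radians(deg)
            cx, cy = round(x - radius * math.cos(t)), round(y - radius * math.sin(t))
            if 0 <= cx < width and 0 <= cy < height:
                centres.add((cx, cy))
        votes.update(centres)
    return votes


def detect_varroa(img: Image, radius: int = 5, min_votes: int = 10) -> Optional[Tuple[int, int]]:
    """Return the best circle centre (x, y) if it gathers enough votes, else None."""
    points = edge_points(hsv_mask(img))
    votes = hough_circle(points, radius, len(img[0]), len(img))
    if not votes:
        return None
    centre, count = votes.most_common(1)[0]
    return centre if count >= min_votes else None
```

On a synthetic white 40×40 crop with a reddish-brown disc of radius 5 centred at (20, 20), `detect_varroa` returns a centre at or next to (20, 20), while a blank crop returns `None`. Masking first keeps the Hough voting restricted to mite-coloured boundaries, which is also why the result depends on the chosen colour range.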

System Cross Comparison with Existing Literature Solutions
Table 7 summarizes the existing camera-based technological solutions for Varroa mite detection presented in the literature. It characterizes each system's capabilities by camera setup (inside the brood box facing the bee frames, or at the beehive door openings), maximum precision recorded, mean detection time per photo, and whether the system supports online and offline operation. Online-only systems implement the CNN in the cloud and require end-node devices that only transmit images, whereas systems that also support an embedded CNN implementation have end-nodes that run the CNN algorithms themselves and require no Internet connectivity. Table 7 shows that most systems use a camera facing the beehive door openings. Such a setup is not suitable for early detection of the Varroa mite, since the mite develops inside the brood-box frames; this makes most existing proposals incapable of early detection. In addition, only the authors' proposed end-node devices and Chazette's system [34] may offer offline operation embedded in the end-node devices.
Mrozek's [33] and Bilik's [36] implementations are mostly cloud-based, with Bilik's algorithm also utilizing GPU cores for the detection process. The authors' end-node detection algorithm achieves precision similar to that of the YOLOv5 algorithm proposed in [36]. The authors also express serious doubts about the reported mean detection time of Chazette's CNN implementation, executed on an embedded system, which achieves more than 72% precision. Accuracy measurements are not compared, since most existing systems use the F1 score as their evaluation metric. For that reason, the authors assume that the reported F1 score values mainly reflect precision, under a constant recall value across all systems examined.
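The assumption that an F1 score reflects precision at a constant recall can be made explicit by solving $F_1 = 2PR/(P+R)$ for $P$, which gives $P = F_1 R/(2R - F_1)$, valid when $2R > F_1$ and meaningful only if the result does not exceed 1. The sketch below is illustrative; the function name and the example recall values are hypothetical.

```python
def precision_from_f1(f1: float, assumed_recall: float) -> float:
    """Invert F1 = 2PR / (P + R) for P at a fixed, assumed recall R."""
    if not (0 < f1 <= 1 and 0 < assumed_recall <= 1):
        raise ValueError("F1 and recall must lie in (0, 1]")
    denom = 2 * assumed_recall - f1
    if denom <= 0:
        raise ValueError("requires F1 < 2R")
    p = f1 * assumed_recall / denom
    if p > 1:
        raise ValueError("F1 is inconsistent with the assumed recall (P > 1)")
    return p
```

The recovered precision depends strongly on the assumed recall: a reported F1 of 0.6 maps back to a precision of 0.6 at an assumed recall of 0.6, but to about 0.45 at an assumed recall of 0.9.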
According to Table 7, the authors' proposed online detection time is about 470 ms, roughly twice that of Mrozek's solution [33]. Nevertheless, the maximum precision achieved by Mrozek's model is 36%, 2.3 times lower than the precision achieved by our model. Comparing the offline operation of our model with Chazette's [34], the authors also note that Chazette's implementation is not an embedded solution but an experimental proof of concept run on a test-bed PC.
The authors have also experimented preliminarily with the EfficientNet model [59], a lightweight model similar to MobileNet V2 and V3 that achieves accuracy similar to MobileNet V2. Nevertheless, loading the trained EfficientNet network into system memory takes 92% more time than the MobileNet V2 model, and the total time required to detect bees using EfficientNet is 20% longer than with MobileNet V2. Given this excessive model load time, the authors chose not to include EfficientNet in this study. As future work, the authors plan to further reduce the offline detection time of their model (which also includes model load time).

Conclusions
This paper presents a novel Varroa mite detection system that uses cameras and deep learning techniques embedded in the beehive for early detection. The authors present both the system and its detection process. The proposed CNN detection algorithm can be deployed either in the cloud or on the end-node devices, with the system's architecture and operation modified accordingly.
The authors have evaluated their system's detection process both online (detection performed as a cloud service) and offline (the end-node device runs the detection algorithm and provides textual detection output over BLE). The experimental results demonstrate that detection in the cloud is at least 3-4 times faster than when the algorithm runs as a service on the end-node embedded device. The online mode also offers the advantage of easily replacing the CNN model with a newly trained one. Its disadvantage is the data-upload cost charged by communication providers for transmitting imagery over NB-IoT or LTE Cat-M1 transceivers. In offline mode, on the other hand, the mean detection time per image of 104 s and the CNN model load time of 125 s (MobileNet V2 model) on the embedded quad-core ARM MCU are 40×-80× longer than the time required for online detection. This long detection interval makes offline mode significantly less energy efficient, which matters because the end-node device is battery operated; it is therefore intended to perform only a minimal number of detections, either once per hour or once per day.
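The effect of the offline timings on a battery budget can be estimated with simple duty-cycle arithmetic. The sketch below uses the reported 125 s load time and 104 s per-image detection time, but assumes (hypothetically) one photo per detection run and a model reload on every run because the device sleeps in between; the average active power draw is also a hypothetical parameter.

```python
MODEL_LOAD_S = 125.0        # reported MobileNet V2 load time on the end node
DETECT_PER_IMAGE_S = 104.0  # reported mean offline detection time per image


def daily_active_seconds(runs_per_day: int, images_per_run: int = 1,
                         reload_model: bool = True) -> float:
    """Seconds per day spent loading the model and detecting (hypothetical schedule)."""
    per_run = (MODEL_LOAD_S if reload_model else 0.0) + images_per_run * DETECT_PER_IMAGE_S
    return runs_per_day * per_run


def daily_energy_wh(active_s: float, active_power_w: float) -> float:
    """Energy in Wh spent while active; active_power_w is a hypothetical average draw."""
    return active_s * active_power_w / 3600.0


hourly = daily_active_seconds(24)  # one detection per hour
daily = daily_active_seconds(1)    # one detection per day
print(hourly, daily, round(daily_energy_wh(hourly, 2.0), 2))
```

Under these assumptions, an hourly schedule keeps the device active for 24 × 229 = 5,496 s (about 1.5 h) per day, against 229 s for a daily schedule; at a hypothetical 2 W average draw, the hourly schedule costs about 3.1 Wh per day. Keeping the model resident between runs would remove the load term, at the price of staying awake.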
The authors' detection experiments used different pretrained backbone models (MobileNet V2, MobileNet V3, ResNet-50) and identified MobileNet V2 as the most accurate one when used in the Faster R-CNN algorithm with the authors' trained network. Finally, the authors evaluated their Varroa mite detection process on actual Varroa incidents, achieving an accuracy of 77% and a precision of 86%.
As future work, the authors plan an extensive evaluation of the proposed system during Varroa mite outbreaks, improving the accuracy of their algorithms, minimizing offline detection time, and further classifying the severity of the detected Varroa incidents.