1. Introduction
Precision agriculture can be used to increase yields and to provide information for decision making, and its application to fruit detection has attracted considerable attention from researchers. Examples of the benefits of fruit detection include yield estimation and mapping [1] and disease control [2]. The growth of the world's population and the resulting higher demand for food are accompanied by a shift in dietary habits toward healthier foods such as fruits and vegetables, which increases the specific demand for this type of produce; at the same time, climate change affects agricultural activities, and the migration of people to cities reduces the labor force available in rural areas, requiring improvements in the efficiency and effectiveness of agricultural practices. Technological evolution allows the automation and robotization of some of these practices, as well as the development of decision support systems that help in their management [3].
The detection of fruits—and peaches in particular—through automatic systems can contribute to improving the efficiency of agricultural cultivation processes, whether through an adequate and sufficient supply of water [4,5,6], fertilizer supply, evaluation of vigor and state of health [7], ripening state, and diseases [2], or even improved weed control [8]. The management of agricultural practices supported by artificial intelligence systems for decision making helps in yield estimation, resource management, and the circular economy [9,10]. These approaches can contribute to increased production and profitability through improved supply contracts and the reduction of fixed costs. Additionally, they improve a crop's environmental sustainability through the reduction of fertilizer use and contribute to the reduction of food loss.
Computer vision systems for fruit detection can be developed for offline (non-real-time) use; that is, images or videos are first captured and stored for later processing [11]. Such models are typically developed to run in the cloud or on a desktop computer and require large amounts of computing resources and memory. However, in certain applications, such as robot vision [12], computer vision models must run on an edge device while performing inference at high speed. In general, edge devices are limited in terms of processing power, memory, and power consumption [13,14]. To adapt image processing applications to these constraints, models such as MobileNets [15,16,17], ShuffleNet [18], SqueezeNet [19], and DenseNet [20] have been developed. Because these models are optimized to run on a CPU, their latency is high, and they are therefore only suitable for "light applications" (e.g., processing approximately one frame per second (FPS)). However, after training, these models can be optimized to run on a graphics processing unit (GPU) with much better inference-time performance [21,22,23].
Tian et al. [24] proposed a modified version of the YOLOv3 detector to detect apples at different growth stages in orchards. The authors used an NVIDIA Tesla V100 server GPU for training and testing. Using 3000 × 3000 resolution images, they achieved an F1 score of 0.817 and an inference time of 0.304 s. It is important to emphasize that the approach used in this study was not portable. Fu et al. [25] developed a vision system based on RGB and Kinect sensors for detecting apples in outdoor orchards. They used the Faster R-CNN model and a desktop PC equipped with an NVIDIA TITAN Xp GPU. For original RGB images at a resolution of 1920 × 1080, they reported a detection performance of 0.79 AP and an inference time of 0.125 s. This approach was likewise not portable. Liu et al. [26] proposed a modified version of YOLOv3 for detecting tomatoes, using circles instead of boxes to locate them. The model received 416 × 416 pixel images as input and achieved a detection performance of 96.4% AP and an inference time of 54 ms on a PC target device. Because the target device was a PC, this approach does not fall into the portable category.
Zhang et al. [22] proposed a lightweight fruit detection algorithm designed specifically for edge devices. The algorithm was based on a light-CSPNet network and YOLOv3. The model was deployed on the NVIDIA Jetson family (Jetson Xavier, Jetson TX2, and Jetson Nano). The detection accuracies for the orange, tomato, and apple datasets were 93%, 88%, and 85% AP, respectively. The detection speeds on the Jetson Xavier reached 46.9, 40.3, and 45.0 ms (for oranges, tomatoes, and apples, respectively) at various image resolutions. This approach falls into the portable category. Huang et al. [23] proposed a modified version of the YOLOv5 detector, adding an attention mechanism and an adaptive fusion method to the citrus detection model. The target device was an NVIDIA Jetson Nano integrated graphics processor. Using images with a resolution of 608 × 608, they achieved a detection performance of 93.32% AP and an edge-computing inference time of 180 ms. Based on the model used and the target device, this approach falls into the portable category. Tsironis et al. [27] adapted the single-shot object detector (SSD) to the underlying object size distribution of the target detection area. They evaluated their adapted model on tomato fruit detection and classification over three maturity stages. With an image resolution of 515 × 512, using a PC with a standard GPU (not portable), the model performed inference at a speed of 200 FPS. In addition, the model was not optimized for edge devices. In another work, Tsironis et al. [28] created a specialized tomato dataset with more than 250 images and a total of 2400 annotations, on which six object detection models were evaluated.
Recently, a state-of-the-art TPU accelerator [29] and the MobileDet detector [30] were developed for general image detection tasks. In this work, we propose the use of these two technologies with a Raspberry Pi target device for a real-time peach fruit detection application. The main contributions of this paper include the following:
We propose the use of a lightweight, hardware-aware MobileDet detector model for a real-time peach fruit detection application, embedded in a Raspberry Pi target device together with a Coral Edge TPU accelerator.
We present a novel dataset of three peach cultivars with annotations and make it available for further study (to our knowledge, this is the first work of its kind).
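As a hedged illustration of the deployment path named in the first contribution, the sketch below shows how a compiled Edge TPU detection model could be invoked on a Raspberry Pi through the pycoral runtime. The model and image paths are placeholders, and `scale_box` is a hypothetical post-processing helper, not part of this paper's released code.

```python
def scale_box(box_norm, width, height):
    """Map a normalized (ymin, xmin, ymax, xmax) box to pixel coordinates."""
    ymin, xmin, ymax, xmax = box_norm
    return (int(xmin * width), int(ymin * height),
            int(xmax * width), int(ymax * height))

def detect_fruit(model_path, image_path, threshold=0.5):
    """Run one inference on a Coral Edge TPU (requires the pycoral runtime)."""
    # Imported lazily so the pure helper above works without TPU hardware.
    from pycoral.utils.edgetpu import make_interpreter
    from pycoral.adapters import common, detect
    from PIL import Image

    interpreter = make_interpreter(model_path)  # loads a *_edgetpu.tflite file
    interpreter.allocate_tensors()
    image = Image.open(image_path).convert('RGB')
    _, scale = common.set_resized_input(
        interpreter, image.size, lambda size: image.resize(size))
    interpreter.invoke()
    # Returns detections above the score threshold, rescaled to the input image.
    return detect.get_objects(interpreter, threshold, scale)
```

This mirrors the standard pycoral detection workflow: resize the image into the interpreter's input tensor, invoke, and read back scored bounding boxes.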
The remainder of this paper is organized as follows. Section 2 presents the equipment used for inference, the image dataset, the object detection model, and the mathematical formulation for model evaluation. The results and discussion confirming the performance of the proposed method are presented in Section 3. Finally, Section 4 concludes the paper and provides guidelines for future work.
4. Conclusions
In this study, we proposed the use of a lightweight, hardware-aware MobileDet detector model for real-time peach fruit detection in conjunction with an edge device and a TPU accelerator. A novel annotated dataset of three peach cultivars was created and made available for further studies.
Models designed to run on a TPU device (i.e., hardware-aware models such as SSD MobileDet and SSD EdgeTPU) performed approximately 4% (AP) better than models that were not designed for the TPU (native models). An important result is that inference was, on average, 20 times faster when the model ran on the TPU than on the CPU. The MobileNetV1 model running on the TPU performed at 21.01 FPS, and the MobileDet model at 19.84 FPS; this loss of 1.17 FPS does not significantly affect performance in practical computer vision applications. Therefore, it is reasonable to use SSD MobileDet to improve detection accuracy. A comparison was made with other approaches; however, a direct comparison was not possible because different datasets and image sizes were used. Considering price, AP, and latency, our approach of using a TPU accelerator is a good alternative for practical applications. Further research could also explore fruit yield estimation based on the approach presented in this paper.
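The practical weight of the 1.17 FPS gap can be checked by converting throughput to per-frame latency; the short sketch below uses only the FPS figures reported above.

```python
def fps_to_latency_ms(fps):
    """Convert throughput (frames per second) to per-frame latency in ms."""
    return 1000.0 / fps

# TPU throughputs reported above for the two detectors.
mobilenet_ms = fps_to_latency_ms(21.01)  # ~47.6 ms per frame
mobiledet_ms = fps_to_latency_ms(19.84)  # ~50.4 ms per frame

# The 1.17 FPS difference amounts to only ~2.8 ms of extra latency per frame.
print(round(mobilenet_ms, 1), round(mobiledet_ms, 1),
      round(mobiledet_ms - mobilenet_ms, 1))
```

Seen as latency, choosing SSD MobileDet costs roughly 2.8 ms per frame in exchange for the higher detection accuracy, which supports the trade-off argued above.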