1. Introduction
The strawberry fruit is favoured by consumers because it is a good source of antioxidants and nutrients [1]. Strawberries are also widely cultivated, ranking first worldwide in the production of small berries [2]. In 2021, the worldwide market for fresh strawberries was valued at approximately USD 24.79 billion, and it is anticipated to grow to around USD 43.33 billion by 2028 [3]. Ripe strawberries are sweet and juicy, possessing considerable economic worth. Because strawberries are non-climacteric fruits, maturing only while attached to the plant, it is imperative to harvest them at peak ripeness to guarantee superior fruit quality [4]. Despite continuous efforts to develop robotic harvesting solutions for strawberries and various other crops, a fully functional commercial system remains elusive, and strawberry harvesting therefore continues to depend heavily on human labour. After harvesting, growers primarily evaluate the ripeness of strawberries by tallying the cultivation period, inspecting fruit colour, and sometimes relying on personal taste. This human element introduces subjectivity, resulting in uneven ripeness among post-harvest strawberries, which significantly undermines their overall quality and uniformity [5]. There is therefore considerable market demand for the automated identification of strawberry ripeness in the field to enhance the precision of selective harvesting [6].
Artificial intelligence has garnered growing interest across various domains and has been integrated into agricultural practices to advance the automation of production processes [7]. Over the past few years, deep learning (DL) has emerged as a catalyst, propelling artificial intelligence to new heights and offering strong solutions to numerous challenges in image recognition [8]. Conventional machine learning approaches require the manual curation of the features that classifiers use to identify patterns. While effective for straightforward or clearly defined problems, these methods often falter when confronted with complex real-world challenges such as object detection. In contrast, deep learning is specifically designed to overcome this constraint, employing deep neural networks that enable computers to perceive, comprehend, and respond to complex scenes. In terms of ripeness detection, Miragaia et al. [9] developed a classification system based on Convolutional Neural Networks (CNNs) to determine the ripening stage of plums. Additional research has documented the use of CNNs for classifying the ripening stages of apples [10], mulberries [11], and bananas [12].
Strawberries are usually difficult to detect due to the significant variability among fruits (e.g., in size and colour) [13]. Detecting strawberries begins with locating a strawberry fruit within an image or video. This object detection step is valuable for counting strawberries in a scene and monitoring their exact positions. While various studies have documented the use of Region-based Convolutional Neural Networks (R-CNN), these are deemed impractical for real-time applications because of the significant time required to perform detection [14]. Conversely, the You Only Look Once (YOLO) approach [15,16], as a one-stage detector, offers a substantial improvement in detection speed. It takes the entire image as the network’s input and directly outputs bounding box positions together with the associated class probabilities of those bounding boxes [17], thereby enabling real-time object detection. Compared with its initial iteration, YOLOv2 introduced a novel joint training approach that allows object detectors to be trained on both detection and classification data [16]. Subsequently, YOLOv3 enhanced the feature extraction backbone network, Darknet53, resulting in improved processing speed [18]. Bochkovskiy et al. [14] proposed YOLOv4 to fine-tune the balance between detection precision and processing speed, and it stood out as an exceptionally advanced detector, faster and more accurate than existing alternatives. As the most recent addition to the YOLO family, YOLOv7 is architected around a trainable bag of freebies, enabling real-time detectors to notably improve accuracy without inflating inference cost. It also substantially increases detection speed by reducing parameter counts and computational demands through “extend” and “compound scaling” strategies [19]. Emerging as the latest benchmark, YOLOv7 outperforms existing object detectors in terms of both speed and accuracy [20]. In the context of identifying strawberry ripeness, early research by Habaragamuwa et al. [21] introduced a Deep Convolutional Neural Network (DCNN) to distinguish two categories of strawberries, mature and immature, using greenhouse images; the resulting deep-learning model attained an average precision of 88.03% for mature strawberries and 77.21% for immature strawberries. More recently, Y. Wang et al. [22] proposed a multi-stage approach for detecting strawberry fruits using YOLOv3, achieving a mean average precision (mAP) of 86.58% and an F1 score of 81.59%. Although these prior studies can discern strawberry ripeness, they have not been evaluated in real-time conditions owing to their slow detection speeds. Consequently, identifying the ripeness of strawberry fruits in practical field settings remains demanding, as evidenced by the modest detection accuracies (i.e., mAP < 90%).
Significant opportunities lie in the integration of emerging digital technologies within the agri-food industry. Augmented reality (AR) technology, in particular, overlays computer-generated virtual information onto the physical world [23]. AR has the potential to transform agricultural applications by enriching the physical world with immersive virtual information, thus overcoming human limitations. In recent years, a growing body of literature has focused on integrating AR into precision farming practices. For instance, Goka et al. [24] proposed a harvest support system for tomato harvesting that uses the Microsoft HoloLens to visualise the sugar content and acidity levels of individual tomatoes. Augmented reality techniques have also been employed to assist in identifying weeds [25], plants [26], and pests [27], guiding users to the locations that need intervention. AR further enables Internet of Things (IoT) information to be overlaid onto an actual crop in real time, allowing farmers to engage with IoT data seamlessly within the real-world setting; this significantly enhances monitoring tasks and reduces the costs of planting operations. It is therefore clear that AR technology plays a pivotal role in advancing agriculture by improving the effectiveness and output of farm management tasks.
Finally, the introduction of autonomous harvesting robots is costly and therefore not a viable option for many small-business growers. At present, most farmers harvest strawberries manually, relying on human observation to judge the level of ripeness, which leads to uncertain harvest quality. Hence, in this work, we aim to develop an AR head-mounted display system that captures images of strawberries and displays the predicted ripeness label in real time by leveraging cutting-edge deep learning technology (i.e., YOLOv7) for the rapid detection of strawberry ripeness in the greenhouse. In summary, the key objectives of this work are (a) developing an object detector for the ripeness identification of strawberries, (b) testing the model performance on a different variety of strawberries for in-field validation, and (c) designing an AR application framework for the real-time detection of the ripeness levels of strawberry fruit in the greenhouse. The primary novelty of this work lies in the integration of AR technology with object detection for the real-time identification of strawberry ripeness. While object detection has been used extensively for various applications, its combination with AR for fruit ripeness assessment is a pioneering concept. This integration offers a visually intuitive and contextually informative platform, bridging the gap between digital predictions and real-world scenarios.
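At a high level, the capture-detect-overlay loop described above can be sketched as below. The detector and overlay functions here are hypothetical stand-ins (assumptions) for YOLOv7 inference and the AR head-mounted display rendering, which this sketch does not model.

```python
# High-level sketch of one iteration of the real-time loop:
# capture a frame -> detect ripeness -> overlay labels on the AR display.
# detect_ripeness() and overlay_labels() are hypothetical stand-ins, not the
# actual YOLOv7 or AR display APIs used in this work.

LABELS = ("unripe", "partially ripe", "ripe")  # illustrative ripeness classes

def detect_ripeness(frame):
    """Stand-in for YOLOv7 inference: returns (box, class_index) detections."""
    # A real system would run the trained network on the camera frame here.
    return [((40, 30, 90, 80), 2), ((120, 60, 160, 100), 0)]

def overlay_labels(frame, detections):
    """Stand-in for the AR overlay: pair each bounding box with its label."""
    return [(box, LABELS[cls]) for box, cls in detections]

def process_frame(frame):
    """One iteration of the capture-detect-overlay loop."""
    return overlay_labels(frame, detect_ripeness(frame))

print(process_frame(frame=None))
```

Running this loop once per camera frame is what keeps the predicted ripeness labels anchored to the physical fruit as the wearer moves through the greenhouse.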
4. Discussion
In this study, we focused on the development of an innovative AR head-mounted display system that employs cutting-edge deep learning technology, specifically YOLOv7, to realise real-time strawberry ripeness detection within greenhouse environments. Our research distinguishes itself from existing methods in several key aspects. Traditional strawberry ripeness assessment methods often rely on manual observation and subjective judgment, leading to inconsistencies and delays in decision making. In contrast, our AR head-mounted display system provides an automated, real-time solution that mitigates human error and offers instantaneous insights into fruit ripeness. While various image-based fruit assessment techniques exist, the integration of AR technology and deep learning object detection for agricultural purposes is a unique departure from conventional practices. Furthermore, our approach goes beyond the confines of controlled laboratory conditions. The validation of our model using various strawberry varieties in real-world greenhouse environments demonstrates its robustness and adaptability.
The proposed strawberry ripeness detection system exhibits certain limitations that warrant consideration. First, lighting conditions can significantly impact the colour representation in the images, which is a critical parameter for our RGB-based ripeness classification. To account for variations in lighting conditions and to enhance the model’s ability to generalise across diverse scenarios, we implemented a data augmentation strategy that adjusts the values of the three HSV channels. Meanwhile, strawberries, often nestled amidst green foliage, can pose challenges for accurate image-based classification: leaves may occlude parts of the fruit, altering the colour appearance captured by the camera. As seen in Figure 8, the YOLOv7 and YOLOv7-multi-scale models mistakenly detected a leaf as an unripe strawberry. Future work involves continually refining the object detection component of our system, leveraging YOLOv7’s robustness to handle occlusions and varying object sizes. Employing a random grid search for fine-tuning is also a limitation compared with contemporary state-of-the-art practices, since random grid searches offer no assurance of discovering the globally optimal hyper-parameters. More advanced techniques, such as Bayesian optimisation, can provide targeted hyper-parameter tuning; for example, Python’s HyperOpt library simplifies tuning with sophisticated algorithms such as Tree-structured Parzen Estimators and therefore requires less manual intervention than a random grid search.
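The HSV-channel augmentation described above can be sketched as follows for a single RGB pixel, using only the standard library. The multiplicative gain ranges are illustrative assumptions, not the values used to train our model.

```python
import colorsys
import random

# Sketch of HSV-channel augmentation: each channel is scaled by a random
# multiplicative gain so the model sees the same fruit under varied lighting.
# The gain ranges below are illustrative, not the trained configuration.

def augment_hsv(rgb, rng, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """Jitter one (r, g, b) pixel (floats in [0, 1]) in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # Draw one gain per channel, e.g. 1 +/- s_gain for saturation.
    gh = 1 + rng.uniform(-h_gain, h_gain)
    gs = 1 + rng.uniform(-s_gain, s_gain)
    gv = 1 + rng.uniform(-v_gain, v_gain)
    h = (h * gh) % 1.0              # hue wraps around the colour wheel
    s = min(max(s * gs, 0.0), 1.0)  # clamp saturation to [0, 1]
    v = min(max(v * gv, 0.0), 1.0)  # clamp value to [0, 1]
    return colorsys.hsv_to_rgb(h, s, v)

rng = random.Random(0)
ripe_red = (0.85, 0.15, 0.10)
print(augment_hsv(ripe_red, rng))  # a slightly shifted shade of red
```

Keeping the hue gain small while allowing larger saturation and value swings preserves the red/green distinction that ripeness classification depends on, while still simulating brightness changes in the greenhouse.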
Other limitations include the dependence on specific hardware for AR applications, which may pose challenges in terms of availability and compatibility. This dependency could limit the widespread adoption of the proposed approach, especially in regions with limited access to specialised equipment. While the system has been initially validated in greenhouse settings, its performance in open-field conditions might differ due to variations in lighting and weather conditions and potential interference from natural elements. Furthermore, the accuracy of the deep learning model depends on the quality of the annotated training data; human errors during annotation could introduce inaccuracies that impact the model’s performance. In addition, the effectiveness of the system depends on the user’s ability to operate and interpret the AR display. Adequate training and familiarisation are essential to ensure accurate and consistent ripeness assessments.
The focus of this work lies in delivering a practical and real-time solution to replace the subjective nature of an immediate visual assessment of strawberry ripeness, introducing a more objective and efficient approach. However, it is important to note that considering a broader spectrum of factors, such as soluble solids, phenols, and Vitamin C content, will undoubtedly contribute to a holistic characterisation of ripeness. The inclusion of multiple characteristics has the potential to facilitate the development of a comprehensive grading or sorting system, enabling farmers to make informed decisions based on harvested strawberry quality. Therefore, future research should refine this work by integrating these essential aspects, further enhancing the accuracy and applicability of this strawberry ripeness classification system.
One of the most immediate practical implications is the acceleration of the decision-making processes in strawberry cultivation. The ability to swiftly and accurately identify ripe strawberries reduces the time required for manual assessment. This is especially valuable in large-scale greenhouse operations where timely harvesting decisions can impact yield quality and minimise waste. More importantly, the implications of our work extend beyond immediate ripeness identification. The real-time overlay of predicted ripeness labels onto physical strawberries provides growers with targeted information. This can guide interventions such as selective harvesting or customised treatment strategies based on the specific needs of individual plants or areas. The successful integration of AR and deep learning opens avenues for broader applications in agricultural practices, including automated sorting and grading systems.