1. Introduction
Strawberry (Fragaria × ananassa) is an economically important berry crop with high nutritional and commercial value. Its bright colour, distinctive flavour, and abundant bioactive compounds, including vitamin C, phenolics, flavonoids, and anthocyanins, make it an important fresh fruit and functional food resource [1]. In commercial production, however, the value of strawberry fruit is strongly dependent on the accuracy of harvest timing. Fruits harvested too early often fail to develop desirable colour, aroma, and sweetness, whereas over-mature fruits are more vulnerable to mechanical damage, softening, decay, and postharvest quality deterioration. Therefore, precise and non-destructive maturity assessment is not only a quality evaluation problem, but also a key technical link connecting harvest scheduling, automatic grading, robotic picking, and postharvest management.
Traditional maturity assessment in strawberry production still relies largely on visual inspection of fruit colour and surface gloss, manual judgement of firmness, and grower experience. Although this approach is flexible, it is inherently subjective and labour-intensive, and its consistency decreases when production scale expands or when fruits are distributed unevenly under dense foliage. With the rapid development of digital agriculture, deep learning-based fruit detection has become an important technical pathway for replacing subjective manual judgement with objective visual perception [2]. In intelligent harvesting systems, visual perception is no longer a separate recognition module; rather, it determines whether the robot can accurately identify fruit targets, estimate their positions, and make reliable picking decisions in real time [3]. For this reason, object detection and localisation have become fundamental tasks in fruit harvesting robots, especially under complex orchard conditions where occlusion, overlapping fruits, illumination variation, and background interference frequently occur [4].
Compared with many other fruits, such as apples, tomatoes, cherries, pitaya, and passion fruit, strawberry maturity detection presents a more fine-grained and unstable visual recognition problem. Strawberries are usually small in size, densely distributed, and easily occluded by leaves or neighbouring fruits. More importantly, their maturity changes gradually rather than discretely, and the visual difference between low- and medium-maturity stages can be subtle under natural illumination. Recent studies have attempted to improve YOLO-based models for strawberry maturity detection. CES-YOLOv8 demonstrated that optimising the feature extraction and detection structure of YOLOv8 can improve strawberry maturity recognition in field scenes [5]. CR-YOLOv9 further showed that multi-stage maturity detection can benefit from enhanced representation of ripening-related visual features [6]. With the emergence of YOLOv11, YOLOv11-GSF provided a newer framework for strawberry ripeness detection in agricultural environments [7]. LBS-YOLO highlighted the need to reduce model complexity while maintaining maturity recognition accuracy [8]. These studies indicate that deep learning has become a feasible tool for strawberry maturity detection. However, in practical field scenarios, low-maturity strawberries remain difficult to detect because they are often small, greenish-white, and visually similar to leaves.
The difficulty of strawberry maturity recognition is closely related to several common challenges in agricultural object detection. For example, green fruit detection based on an optimised YOLOX-m model showed that colour similarity between fruits and leaves can significantly weaken target separability [9]. This problem is highly relevant to immature strawberry detection, where the fruit surface has not yet developed strong red pigmentation. Meanwhile, real-time fruit detection on CPU platforms has shown that agricultural detection models must be evaluated not only for accuracy, but also for their ability to run efficiently on resource-limited devices [10]. Lightweight fruit detection models for orchard environments further confirm that dense distributions, irregular occlusion, and natural illumination changes are persistent constraints in practical deployment [11]. Studies on embedded passion fruit detection also suggest that a compact model design is necessary when detection systems are expected to operate on field robots or portable devices [12]. In automated mulberry harvesting, lightweight YOLOv8n-based detection has been used to improve the deployability of fruit recognition models [13]. Similarly, GreenFruitDetector revealed that low contrast between fruit and vegetation can lead to missed detections when the model fails to preserve weak target cues [14].
Therefore, lightweight design is not merely an engineering preference in agricultural vision; it is a practical requirement imposed by field deployment. In fruit detection tasks, the model must maintain sufficient representational ability to distinguish small and occluded targets while reducing excessive parameters, memory consumption, and floating-point operations. Lightweight YOLOv5s-based pitaya detection has demonstrated that compact models can support fruit recognition under both daytime and nighttime light-supplement environments [15]. The use of S-YOLO for greenhouse tomato detection further illustrates that accuracy and efficiency must be jointly optimised rather than treated as separate objectives [16]. YOLOv8-CML extended this idea to colour-changing melon ripening detection, where the model needs to capture both fruit location and maturity-related colour variation [17]. A pineapple maturity analysis based on MobileNetV3-YOLOv4 also shows that lightweight backbones can reduce computational cost in natural environments [18]. More recent YOLOv11-based apple detection indicates that the latest YOLO architectures are being actively adapted for lightweight orchard perception [19]. The use of GPC-YOLO for tomato maturity detection similarly reflects the trend of redesigning YOLOv8n-like structures for agricultural maturity recognition [20]. Embedded apple detection studies further suggest that model size and inference latency are decisive factors for practical orchard systems [21]. Lightweight cherry detection also confirms that small fruit targets require compact yet detail-preserving feature extraction strategies [22].
Although YOLO-series models have become widely used in agricultural detection, other object detection frameworks still provide useful methodological references. Faster R-CNN-based apple detection has shown strong localisation ability in complex orchard environments [23]. However, its two-stage detection pipeline generally increases computational burden, which is why lightweight Faster R-CNN variants based on MobileNetV3 have been explored for densely planted pitaya orchards [24]. Comparative research on the detection of date fruits further indicates that YOLO and Faster R-CNN differ not only in accuracy, but also in inference speed and deployment suitability [25]. RetinaNet-based fruit detection provides another perspective by using focal loss and multi-scale feature fusion to address class imbalance and complex field backgrounds [26]. Studies comparing YOLOv8, Faster R-CNN, and RetinaNet in olive fruit detection show that the optimal detector must be selected according to both accuracy and real-time requirements [27]. Research on the detection and classification of date fruits also reveals that agricultural detection tasks often involve not only target localisation, but also subtle category discrimination among visually similar fruit classes [28]. These findings suggest that, for strawberry maturity detection, the model should combine the speed advantage of YOLO with stronger feature fusion, shape-aware localisation, and maturity-sensitive classification.
Knowledge distillation offers a promising strategy for improving lightweight models without increasing inference complexity. In object detection, distillation transfers useful feature, response, or relational knowledge from a high-capacity teacher model to a compact student model, thereby improving the performance of small detectors [29]. More general studies on knowledge distillation also regard it as an effective approach for model compression and efficient deployment under limited computational resources [30]. In agricultural scenarios, reconstructed feature and dual distillation have been used to improve lightweight tea shoot detection, showing that distillation can enhance the representation ability of compact models in complex field backgrounds [31]. Knowledge distillation has also been combined with pruning for colour-changing melon ripeness detection, demonstrating its potential in fruit maturity recognition tasks where both accuracy and efficiency are required [32]. Nevertheless, most existing distillation strategies are designed as general compression methods. They do not explicitly consider the specific error sources in strawberry maturity detection, such as weak colour transition in low-maturity fruits, boundary ambiguity under occlusion, and confusion between adjacent maturity stages.
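The response-based form of teacher-student transfer mentioned above can be illustrated with a minimal sketch. It assumes the classic temperature-softened formulation, in which the student is penalised by the Kullback-Leibler divergence between the teacher's and the student's softened class distributions. The function names, the example logits, and the temperature value are illustrative assumptions; this is a generic sketch and not the BCKDloss scheme proposed in this study.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic response-based distillation term (an assumption for
    illustration), not the loss used by SMLO-YOLO.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits for the three maturity classes (low, medium, high).
teacher = [0.2, 2.5, 1.9]
student = [0.1, 1.2, 2.0]
loss = distillation_loss(student, teacher)
```

Because the term only affects training, the student keeps its original size and inference cost; the softened teacher distribution additionally tells the student how similar adjacent maturity stages are, which a one-hot label cannot express.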
Overall, despite progress in accuracy and lightweight design, three gaps remain. First, adaptation to fine-grained cues is insufficient: most models under-capture small low-ripeness targets and the near-elliptic fruit shape, and only weakly model the green-to-red transition and occluded local textures. Second, balancing compactness and accuracy is difficult: some methods sacrifice accuracy for compactness and lack robustness to illumination changes and fruit overlap under real-time constraints. Third, the integration of specialised techniques is limited: cross-domain modules such as partial convolution and learned upsampling are still applied in a fragmented manner, and dedicated datasets for complex field scenes are scarce.
Guided by the practical demands of field maturity detection, this study proposes SMLO-YOLO, an improved detector built on YOLOv11s that combines small-object enhancement, structural compression and knowledge distillation, and evaluates it on a dataset covering complex field scenarios. The main contributions are as follows:
1. Complex-Scene Strawberry Dataset: Images were collected in Sichuan and Shanxi across major cultivars, covering leaf occlusion, fruit overlap and uneven illumination. Three maturity levels—low, medium and high—were annotated, and a high-quality dataset was constructed through augmentation and quality control to support the training of lightweight models.
2. SMLO-YOLO Architecture: To reconcile feature transfer with lightweight design, the HDP-Neck integrates cross-scale alignment (HSPAN), dynamic sampling (DySample) and selective convolution (C3K2_PConv), stabilising feature transfer while reducing redundancy. A decoupled EfficientHead enhances class discrimination, and ShapeIoU improves localisation of near-elliptic fruits, addressing small-fruit misses and boundary bias.
3. BCKDloss Knowledge Distillation: To overcome the limited adaptability of generic distillation, a scheme tailored to small and occluded strawberries transfers the teacher’s discriminative capability without increasing model size, thereby improving robustness under complex field conditions.
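The redundancy reduction attributed to partial convolution can be made concrete with a short cost estimate. The sketch below assumes the usual partial-convolution idea, in which a k × k convolution is applied to only a fraction of the channels and the remaining channels are passed through unchanged. The partial ratio of 1/4, the feature-map size, and the function names are assumptions chosen for illustration; the full C3K2_PConv block contains further layers that are not counted here.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulate count of a dense k x k convolution (stride 1)."""
    return height * width * kernel * kernel * c_in * c_out


def partial_conv_macs(height, width, kernel, channels, partial_ratio=0.25):
    """MACs when only a fraction of the channels is convolved.

    The untouched channels are forwarded as-is, so they add no MACs.
    The ratio 0.25 is an assumed example value.
    """
    c_partial = int(channels * partial_ratio)
    return conv_macs(height, width, kernel, c_partial, c_partial)


# Hypothetical 80 x 80 feature map with 128 channels and a 3 x 3 kernel.
dense = conv_macs(80, 80, 3, 128, 128)
partial = partial_conv_macs(80, 80, 3, 128)
reduction = dense / partial
```

Under these assumptions the cost scales with the square of the partial ratio, so convolving a quarter of the channels needs one sixteenth of the multiply-accumulates of the dense layer.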
4. Discussion
SMLO-YOLO was developed to address the main challenges of field strawberry maturity detection, including small targets, leaf occlusion, fruit overlap, and subtle colour transitions between maturity stages. Previous YOLO-based studies have improved strawberry ripeness detection through feature enhancement and lightweight design, such as CES-YOLOv8, CR-YOLOv9, YOLOv11-GSF, and LBS-YOLO [5,6,7,8]. Compared with these methods, SMLO-YOLO further combines cross-scale feature fusion, shape-aware localisation, and knowledge distillation, which makes it more suitable for complex orchard environments.
The improvement of SMLO-YOLO mainly comes from the coordinated optimisation of several task-specific components. HDP-Neck strengthened multi-scale feature representation for small and occluded fruits, while EfficientHead reduced redundant computation and improved maturity classification. ShapeIoU introduced strawberry-shape information into bounding-box regression, and BCKDloss enhanced the discriminative ability of the lightweight student model without increasing inference complexity. These results suggest that effective strawberry maturity detection depends not only on lightweight model design, but also on preserving weak visual cues related to fruit colour, boundary, and shape.
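As background to the bounding-box regression discussed above, the following minimal sketch computes the plain intersection over union (IoU) of two axis-aligned boxes, the overlap measure on which IoU-based regression losses are built. Shape-aware variants such as ShapeIoU add penalty terms related to box shape and scale on top of this quantity; those terms are not reproduced here. The box format (x1, y1, x2, y2) and the function name are assumptions made for illustration.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes in pixel coordinates.
predicted = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 3.0, 3.0)
overlap = box_iou(predicted, ground_truth)
```

For small fruits, a shift of a few pixels changes this ratio sharply, which is one reason why additional shape- and scale-dependent terms are of interest for small, near-elliptic targets.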
However, several limitations remain. The current validation was limited to three cultivars and two production regions, and the model was mainly tested under daytime field conditions. Its robustness under broader cultivars, different illumination conditions, disease interference, and real edge-device deployment still requires further evaluation. In addition, maturity assessment was mainly based on external visual cues, while physiological indicators such as firmness, soluble solid content, and acidity were not included. Therefore, the current model should be regarded as a visual maturity detection method rather than a complete physiological maturity evaluation system.
5. Conclusions
This study proposed SMLO-YOLO, a lightweight strawberry maturity detection model designed for complex field environments. The model integrates HDP-Neck, EfficientHead, ShapeIoU, and BCKDloss-based knowledge distillation to improve the detection of strawberries at different maturity stages under conditions such as small targets, occlusion, overlap, and subtle colour variation.
The final SMLO-YOLO model achieved an mAP50 of 92.4%, Recall of 86.5%, Precision of 85.3%, 256.41 FPS, 6.49 M parameters, and 15.0 GFLOPs. These results demonstrate that the proposed method can maintain high detection accuracy while preserving a compact model structure and real-time inference capability. Therefore, SMLO-YOLO provides a feasible visual perception solution for strawberry maturity monitoring in agricultural robots, handheld devices, and edge-based orchard systems.
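Two derived quantities can help in reading the reported figures: the F1 score implied by the stated precision and recall, and the per-image latency implied by the stated frame rate. The sketch below uses only the values reported above; the function names are illustrative, and the latency conversion assumes that images are processed one at a time, which is an assumption about the measurement setup.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def latency_ms(frames_per_second):
    """Average time per image in milliseconds, assuming sequential inference."""
    return 1000.0 / frames_per_second


# Values reported for the final SMLO-YOLO model.
f1 = f1_score(0.853, 0.865)
per_image_ms = latency_ms(256.41)
```

With the reported values, this gives an F1 score of approximately 0.859 and an average latency of approximately 3.9 ms per image on the evaluation hardware.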
Future work will expand the dataset, include more cultivars and environmental conditions, introduce maturity-related physiological indicators, and validate the model in closed-loop robotic harvesting scenarios to improve the robustness and replicability of strawberry maturity detection methods.