1. Introduction
Palm fruit is one of the most important oil-producing cash crops in the world [1,2]. Global palm fruit output has risen steadily in recent years, growing from 410.697 million tons in 2019 to 424.587 million tons in 2022, according to the statistical yearbook issued by the Food and Agriculture Organization of the United Nations (FAO) in 2024 [3]. Indonesia and Malaysia are the principal global producers of palm fruit; in 2022, Indonesia produced 257 million tons, representing 60.49% of the global total. Owing to relatively low degrees of industrialization, the primary production regions of Malaysia and Indonesia still rely predominantly on labor-intensive methods, and the adoption of advanced technology there remains limited. Accurate identification of palm fruit ripeness strongly influences oil quality and harvesting efficiency [4]. Ripeness evaluation has traditionally relied on manual visual inspection, which is inefficient and highly subjective [5]. Despite advances in computer vision for fruit detection [6,7], the complex scene characteristics of palm fruits on trees continue to pose significant recognition challenges. Moreover, agricultural scenarios require lightweight models to accommodate the computational limitations of edge devices.
The precise determination of palm fruit ripeness has gained significance as smart farming techniques in palm plantations have evolved rapidly in response to the rising demand for palm oil [8,9,10,11]. Some researchers have determined palm fruit ripeness via image processing techniques grounded in conventional machine learning. Taparugssanagorn et al. [12] proposed an image processing method based on relative entropy for non-destructive prediction of palm fruit ripeness; by applying the Kullback–Leibler distance to oil palm classification, their experiments demonstrated that the algorithm achieves high accuracy and rapid computation. Septiarini et al. [13] employed a machine vision approach to classify oil palm fruits into unripe, ripe, and partially ripe categories: color and texture features were first extracted from the palm fruit images, and an ANN was then applied to predict the ripeness class, achieving an accuracy of 98.3%. Alfatni et al. [14] acquired images under controlled LED lighting using a CCD camera; with a texture-based model combining BGLAM and ANN, they attained 93% accuracy in identifying palm fruit ripeness within the ROI2/ROI3 regions, with a processing time of only 0.4 s. Other researchers have employed deep learning techniques to evaluate palm fruit ripeness. Chang et al. [15] proposed a hybrid color-correction approach: ground-truth images are created from concurrently recorded spectral data and photographs of palm fruit, a color constancy model is trained by mapping the palm fruit images onto various ambient spectra, and a YOLOv8 model for ripeness identification is finally trained on the corrected images. Elwirehardja et al. [16] developed a lightweight CNN model capable of classifying palm fruit ripeness in Android-based applications; by unfreezing three convolutional blocks and employing a specialized data augmentation technique termed “Nine-Corner Cropping”, they improved the network’s ripeness classification accuracy. To advance research and application in deep learning-based palm fruit ripeness recognition, Suharjito et al. [17] released an open-source dataset specifically curated for this task. Collected at a palm oil mill in South Kalimantan, Indonesia, the dataset comprises six ripeness classes: unripe, underripe, ripe, overripe, empty bunches, and abnormal fruits; the dataset was also validated using deep learning methods. Salim et al. [18] employed a genetic algorithm to automatically search for the optimal learning rate, thereby optimizing YOLOv4-tiny and enhancing its accuracy in palm fruit ripeness detection. Still other researchers have used specific physical sensing techniques to explore the correlation between the intrinsic physical properties of palm fruit and its ripeness. Ali et al. [19] proposed an approach for detecting palm fruit ripeness by integrating computer vision with laser backscattering imaging, and notably established a strong correlation between palm fruit color values and oil content. Zolfagharnassab et al. [20] achieved high-precision classification of palm fruits into unripe, ripe, and overripe categories by capturing surface thermal variations via thermal imaging and combining these data with machine learning algorithms. Goh et al. [21] collected reflectance spectra from different regions of palm fruits using an optical spectrometer and observed superior ripeness classification performance for the anterior equatorial region; however, their study was limited by a small sample size and did not account for the effects of field variables such as lighting conditions and humidity on the spectral data.
Traditional machine learning often requires researchers to manually design and select discriminative color and texture features, based on domain knowledge or experience, to distinguish between the various stages of palm fruit ripeness. While integrating specific physical sensing technologies for recognizing palm fruit ripeness is a noteworthy approach, its adoption is constrained by prohibitive costs, portability limitations, and pronounced susceptibility to environmental interference [22,23]. Deep learning-based approaches have emerged as the predominant choice in smart agriculture [24,25,26,27], owing to their operational flexibility, scalability, and cost-effectiveness [28,29,30]. Deep learning demonstrates marked superiority for tasks that involve significant visual variability, such as identifying palm fruit ripeness. Nonetheless, attaining both robustness and lightweight recognition in complex environments, such as palm fruits on trees, remains challenging.
Palm fruit grows on trees enveloped by branches and foliage, and thus frequently appears in complex environments characterized by occlusion and uneven illumination. Static, stop-and-capture recognition is inefficient, while rapid acquisition for swift ripeness assessment introduces image motion blur due to camera movement. The “complex environment” described in this paper therefore mainly involves high-frequency interference under the palm tree canopy, including occlusion by branches and leaves, image motion blur, and uneven lighting. Moreover, the constrained computational resources of agricultural hardware require a more streamlined recognition network. This research presents an efficient network for recognizing palm fruit ripeness, enhanced principally through the integration of StarNet, to overcome the challenge of balancing a lightweight model with robust recognition. The main contributions of this paper are summarized as follows:
This study proposes a novel StarNet embedding-based efficient network for recognizing the ripeness of palm fruit in complex environments. By adopting the StarNet backbone architecture and introducing an optimized C2F-Star module in the neck structure, the network’s feature representation capability is effectively improved while model complexity is significantly reduced.
This research employs a combinatorial optimization approach to enhance the network for identifying palm fruit ripeness. First, StarNet and the LSCD detection head are employed to achieve a lightweight design. The C2F-Star module and DIoU loss are then employed to enhance and stabilize the network, delivering both light weight and robustness.
Evaluation under complex environmental conditions, including uneven lighting, motion blur, and occlusion, confirmed the model’s robustness. The model for recognizing the ripeness of on-tree palm fruit attains 4.5 GFLOPs, 1.37 M parameters, and a size of 2.85 MB.
The remainder of this paper is organized as follows: Section 2 introduces the palm fruit ripeness dataset and the designed on-tree palm fruit ripeness identification network. Section 3 compares and analyzes the results. Section 4 discusses the limitations of the research and possible directions for future work. Finally, Section 5 concludes the article.
3. Results
Training was performed on an NVIDIA GeForce RTX 3060 GPU and a 12th Gen Intel(R) Core(TM) i5-12400F CPU, under Windows 10 with torch 2.1.2+cu118. The computing equipment and environment configuration used in this study are summarized in Table 1. All models are trained in the same environment and on the same equipment. Each model is trained for 300 epochs with a batch size of 8 and an image size of 640. Using the SGD optimizer, the initial learning rate, momentum, and weight decay are set to 0.01, 0.937, and 0.0005, respectively. The remaining experiments use the same hyperparameters, except for the comparison experiments of different models, which use their default hyperparameters. It is important to note that all images used for evaluation in this section differ from those in the training and validation sets employed during model training.
3.1. Evaluation of Model
This study employs five metrics to evaluate the model’s performance. The first metric is the F1-score, the harmonic mean of precision and recall. Precision and recall are calculated using Equations (12) and (13):

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

where TP (True Positive) denotes the number of correctly detected labeled palm fruits, FP (False Positive) denotes the number of predictions that do not correspond to a labeled palm fruit, and FN (False Negative) denotes the number of labeled palm fruit targets that go undetected. The F1-score is then calculated as

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
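As a quick illustration of Equations (12) and (13) and the F1-score, the three metrics can be computed directly from detection counts. This is a minimal sketch; the function and variable names are ours, not taken from the paper’s codebase:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from detection counts (Eqs. 12-13)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: 80 correct detections, 20 false detections, 20 missed fruits
p, r, f1 = precision_recall_f1(80, 20, 20)
```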
The second metric is GFLOPs (giga floating-point operations), which measures the computational cost of a single forward pass and hence the computational efficiency of the model. GFLOPs is the sum of the computational effort of all layers of the network, and its calculation principle can be expressed in the general form

$$\text{FLOPs} = \sum_{l=1}^{L} 2 \cdot C_{in}^{(l)} \cdot K_l^2 \cdot C_{out}^{(l)} \cdot H_l \cdot W_l$$

where the factor 2 accounts for the paired multiplication and addition operations, $C_{in}$ is the number of input channels, $K$ represents the convolution kernel size, $C_{out}$ is the number of output channels, and $H \times W$ represents the output feature map size. The summation traverses the $L$ convolutional layers in the network.
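The per-layer accounting above can be sketched in a few lines. This is an illustrative calculation with made-up layer shapes, not the profiling code used in the paper:

```python
def conv_flops(c_in, k, c_out, h_out, w_out):
    """FLOPs of one convolutional layer: 2 ops (multiply + add) per MAC,
    c_in * k^2 MACs per output element, c_out * h_out * w_out output elements."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# Example: a 3x3 conv from 3 to 16 channels producing a 320x320 feature map
layer_flops = conv_flops(3, 3, 16, 320, 320)
# Network GFLOPs would sum conv_flops over all L layers, then divide by 1e9
gflops = layer_flops / 1e9
```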
The third metric is the number of parameters, which determines the model’s storage and memory footprint. The parameter count is the sum of the parameters of all network layers, and its general expression is

$$\text{Params} = \sum_{l=1}^{L} \left( W_l + b_l + BN_l \right)$$

where $W_l$ represents the number of convolution kernel weights, $b_l$ is the number of bias terms, and $BN_l$ is the number of BN layer parameters. The summation traverses the $L$ learnable layers in the network.
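The parameter count per layer follows the same pattern. A minimal sketch, under the assumption that BN contributes two learnable parameters (γ and β) per output channel and that running statistics are not counted:

```python
def conv_layer_params(c_in, k, c_out, bias=True, bn=True):
    """Parameters of one conv layer: kernel weights + biases + BN parameters."""
    weights = c_in * k * k * c_out        # W_l: convolution kernel weights
    biases = c_out if bias else 0         # b_l: one bias per output channel
    bn_params = 2 * c_out if bn else 0    # BN_l: gamma and beta per channel
    return weights + biases + bn_params

# Example: a 3x3 conv from 3 to 16 channels with bias and BN
total = conv_layer_params(3, 3, 16)
```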
The fourth evaluation metric is mAP@0.5, which represents the average of the per-class AP values across all categories, providing a more comprehensive reflection of the effectiveness of the object detection algorithm. mAP@0.5 can be calculated using the following formula:

$$\text{mAP@0.5} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where $AP_i$ represents the area under the precision–recall curve for class $i$ at an IoU threshold of 0.5, and $N$ represents the number of detection categories.
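The AP term is the area under the per-class precision–recall curve. A simplified sketch using rectangle integration (production implementations such as the COCO evaluator additionally interpolate the precision envelope first):

```python
def average_precision(recalls, precisions):
    """Area under the PR curve; recalls must be sorted in ascending order."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision  # rectangle of width delta-recall
        prev_recall = recall
    return ap

def mean_ap(per_class_aps):
    """mAP@0.5 = mean of per-class APs, each computed at IoU threshold 0.5."""
    return sum(per_class_aps) / len(per_class_aps)
```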
The fifth metric is FPS (frames per second), which measures the real-time capability of the model. The formula for calculating FPS is

$$\text{FPS} = \frac{1000}{t}$$

where $t$ represents the time the model takes to infer a single frame, measured in milliseconds. Together, these metrics provide a comprehensive assessment of the model’s performance.
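FPS follows directly from the measured per-frame latency. A small sketch, where the `infer` callable is a stand-in for a model’s forward pass:

```python
import time

def measure_fps(infer, frames=50):
    """Average frames per second over a fixed number of inference calls."""
    start = time.perf_counter()
    for _ in range(frames):
        infer()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    t = elapsed_ms / frames  # milliseconds per frame
    return 1000.0 / t        # FPS = 1000 / t

# Example with a dummy workload standing in for model inference
fps = measure_fps(lambda: sum(range(1000)))
```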
3.2. Ablation Experiments
In this study, we enhance the YOLOv8 architecture through four key modifications: (1) replacement of the detection head with LSCD, (2) substitution of the original backbone network with StarNet, (3) integration of C2F-Star modules, and (4) adoption of DIoU loss. These refinements collectively establish a network structure optimized for palm fruit ripeness recognition tasks. To systematically assess the individual contributions of each architectural improvement, we conduct ablation studies under identical computational hardware and hyperparameter settings.
As presented in Table 2, replacing the detection head with LSCD reduces model complexity to 80.2% of the original GFLOPs, 78.4% of the original parameters, and 79.0% of the original model size compared to the baseline, while maintaining equivalent recognition accuracy (mAP@0.5). Substituting the original backbone with StarNet further compresses computational requirements to 60.5% of GFLOPs, 52.2% of parameters, and 54.0% of baseline model size. Although this modification incurs a marginal precision reduction, StarNet achieves substantial model streamlining. Subsequently, the C2F-Star module incorporating Star Blocks replaces the original C2F modules. Relative to the network featuring the LSCD detection head and the StarNet backbone, this refinement yields additional compression, with further reductions in GFLOPs, parameters, and model size, while increasing detection precision by 1.4 percentage points. Following these cumulative modifications, the model’s GFLOPs, parameters, and size are reduced to 56.0%, 46.0%, and 48.0% of the baseline, respectively; however, this configuration exhibits a 1.2 percentage point reduction in mAP@0.5 compared to the original model. To mitigate this precision degradation, we replace the default CIoU loss function with DIoU, elevating the final mAP@0.5 to 76.0% and surpassing the baseline model’s performance. Taken together, these improvements raise recognition accuracy while reducing the model’s complexity by roughly half, demonstrating the efficacy of our methodological improvements.
3.3. Recognition Performance at Different Ripeness Stages
This section illustrates the efficacy of our method for recognizing palm fruit across five maturation stages: unripe, underripe, ripe, flower, and abnormal.
Table 3 shows that ripe fruit possesses the most distinctive attributes, attaining the best detection accuracy (mAP@0.5 = 88.5%, F1-score = 82.0%). Unripe fruits also exhibited strong performance (mAP@0.5 = 82.8%, F1-score = 78.1%). Flowers are generally smaller and sometimes concealed by foliage, leading to moderate performance for the flowering stage (mAP@0.5 = 72.1%, F1-score = 71.4%). It is worth noting that the underripe stage is a transitional state between unripe and ripe; its F1-score of 69.9% indicates cases of accurate localization but incorrect classification, caused by the instability of the transitional-state characteristics. The model exhibits limited efficacy in detecting the abnormal stage, which encompasses various scenarios, including disease, deformity, and rodent damage, so its features are highly discrete; the model’s recognition figures for this class fall below the average. Overall, the results indicate that the model prioritizes the reliable identification of harvest-ready ripe fruits, which meets the engineering requirements of orchard operations.
3.4. Comparison of Experimental Results for Different IoU Losses
To validate the effectiveness of our DIoU implementation, this study employs the original CIoU in YOLOv8 as the baseline for comparative experiments with alternative loss functions: EIoU, PIoU, SIoU, and DIoU. CIoU improves localization accuracy by constraining aspect ratios and is suitable for general object detection; it can achieve better mAP in high-resource tasks, but its additional computational overhead makes it less suitable for resource-constrained scenarios. The benefit of EIoU lies in decomposing the aspect ratio loss and attributing greater weight to low-quality predictions, thereby accelerating the convergence of hard samples and rectifying the imbalance between easy and hard samples; it is more appropriate for densely clustered, small objects and strongly distorted shapes. PIoU’s advantage lies in its non-monotonic focusing mechanism, which improves learning efficiency for difficult samples and simplifies computation, a clear advantage in tasks with imbalanced samples. SIoU attains gains in conventional detection tasks via direction awareness and suits contexts with directional consistency; nonetheless, the stringent constraints imposed by its numerous penalty terms conflict with the varied deformations that occur under occlusion, leading to excessive penalization of reasonable deformations. The primary difficulty of our task, however, is environmental interference such as occlusion, blur, and uneven lighting, rather than target density or the inherent difficulty of the objects themselves.
Palm fruit grows on trees, often surrounded by branches and leaves, and is therefore frequently occluded. DIoU does not enforce aspect ratio matching, making it more tolerant to variations in object width and height under occlusion, and it maintains localization capability through its center-point distance term, enhancing positional robustness. DIoU is therefore particularly suitable for recognizing the ripeness of palm fruits on trees. Comparative results for the different loss functions are detailed in Table 4. PIoU, SIoU, and DIoU all yielded positive improvements, whereas EIoU reduced recognition accuracy by 1.2 percentage points. Among the three beneficial loss functions, our adopted DIoU demonstrated the best efficacy, with a 1.7 percentage point increase in recognition accuracy.
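The behavior described above follows from the DIoU definition: IoU minus the squared center-point distance normalized by the squared diagonal of the smallest enclosing box. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format (the training loss would then be 1 − DIoU):

```python
def diou(box_a, box_b):
    """DIoU of two boxes: IoU minus the normalized squared center distance."""
    # Intersection area
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between box centers (the positional robustness term)
    d2 = ((box_a[0] + box_a[2]) / 2 - (box_b[0] + box_b[2]) / 2) ** 2 + \
         ((box_a[1] + box_a[3]) / 2 - (box_b[1] + box_b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return iou - d2 / c2
```

Note that, unlike CIoU, no aspect-ratio term appears, which is why width/height changes under occlusion are not additionally penalized.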
3.5. Performance Comparison of Different Algorithms
To further demonstrate the performance of our model, we conduct comparative experiments against several state-of-the-art detection algorithms, including YOLOv5n, YOLOv8n, YOLOv8s, RT-DETR, and YOLOv10n. As presented in Table 5, our palm fruit ripeness recognition model achieves the lowest computational footprint among all methods, establishing the most compact architecture. Compared to YOLOv10n, the most compact of the other methods, our solution achieves further reductions to 69.2% of its GFLOPs, 60.4% of its parameters, and 52.7% of its model size while maintaining competitive accuracy. Comparative analysis reveals that YOLOv8n and YOLOv8s achieve the highest F1-score (73.7%), marginally exceeding our method by 0.5 percentage points. Meanwhile, our approach attains 76.0% mAP@0.5, slightly below YOLOv10n yet superior to the other algorithms. In inference speed, our model demonstrates competitive FPS, surpassed only by YOLOv5n and YOLOv8n while outperforming all other benchmarked models. Overall, our method considerably reduces GFLOPs and parameters while keeping accuracy satisfactory and the F1-score and FPS metrics within acceptable limits; its performance is generally superior.
3.6. Ripeness Recognition Performance in Complex Environments
The dataset employed in this study primarily consists of close-range shots of palm fruits. While effective for detailed ripeness assessment, exclusive reliance on such near-field imagery is operationally inefficient for large-scale ripeness screening. To further validate the generalizability of the proposed model, we conducted additional tests using newly captured palm fruit images acquired from operationally efficient vantage points. Walking around palm trees in the PT. PALMINA UTAMA palm plantation in Kalimantan, Indonesia, we captured images of palm fruits on location with ordinary smartphones rather than professional equipment; any smartphone with basic photographic functionality is suitable. The ripeness identification model developed in this paper was then applied to the acquired images. As shown in Figure 4, our model effectively identifies the ripeness of palm fruit under these operational conditions, with the palm fruits spanning one-fifth to one-third of the image height. The scaling in Figure 4 demonstrates that the ripeness recognition model remains proficient for palm fruit targets as small as 50 × 50 pixels.
To demonstrate the practical efficacy of our proposed palm fruit ripeness recognition model, we conducted ripeness assessment tests on palm fruits under diverse complex agricultural environments. The inherent growth characteristics of palm fruits—being enveloped by dense foliage canopies on trees—frequently subject them to challenging environments involving occlusion and uneven illumination. Concurrently, static recognition approaches prove inefficient for rapid ripeness assessment, while accelerated data acquisition inevitably introduces motion blur artifacts during camera movement.
Figure 5, Figure 6, and Figure 7, respectively, show the recognition results for palm fruits under uneven lighting, motion blur, and occlusion. The on-tree palm fruit ripeness recognition model proposed in this paper shows excellent recognition performance under all three conditions.
3.7. Model Visualization Analysis
While deep learning proves highly effective for object detection, its opaque internal processes often lack interpretability. To address this, our research employs heatmap visualization techniques that transform “black-box” decision-making into human-comprehensible visual signals. These heatmaps use gradient color mapping to represent the contribution of each pixel location to the model’s predictions, with warmer hues indicating higher contributions and cooler tones lower relevance, thereby visually revealing the regions of visual saliency prioritized by the model. The Grad-CAM++ method is used to generate the heatmaps in this section. This study compares recognition heatmaps between the baseline YOLOv8 model and our proposed palm fruit ripeness recognition model. As depicted in Figure 8, specular reflection occurs at the apex of palm fruit granules under direct illumination, a phenomenon prominently visible in the original imagery due to the fruit’s smooth pericarp. The heatmap analysis reveals that the baseline YOLOv8 model over-attends to granule apex features, rendering it vulnerable to robustness degradation under specular reflection interference. In contrast, our proposed model extracts features holistically across the entire fruit morphology, as evidenced by its heatmap distribution, yielding superior robustness through comprehensive feature extraction.
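For readers unfamiliar with the mechanics, a class-activation heatmap of this kind can be sketched with plain Grad-CAM; this is a simplification of the Grad-CAM++ method used in the paper, which additionally weights gradients with higher-order terms. The activations and gradients here are toy nested lists standing in for tensors from a real network:

```python
def grad_cam(activations, gradients):
    """Simplified Grad-CAM: each channel's weight is its spatially averaged
    gradient; the heatmap is the ReLU of the weighted sum of activation maps.
    activations, gradients: [K][H][W] nested lists for one image."""
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * w for _ in range(h)]
    for ch in range(k):
        # Global-average-pool the gradient of this channel -> importance weight
        weight = sum(sum(row) for row in gradients[ch]) / (h * w)
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += weight * activations[ch][i][j]
    # ReLU keeps only regions contributing positively to the prediction
    return [[max(0.0, v) for v in row] for row in heatmap]
```

In practice the heatmap is then upsampled to the input resolution and overlaid on the image with a warm-to-cool color map.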
4. Discussion
This lightweight method attains an average mAP@0.5 of 76.0% in complex on-tree scenes characterized by uneven lighting, motion blur, and occlusion, substantially reducing network size while preserving recognition reliability in such conditions. This paper enhances the YOLOv8 network by substituting the LSCD detection head, replacing the original backbone with StarNet, incorporating the C2F-Star module, and applying DIoU in a progressive combinatorial optimization, thereby rendering it better suited to palm fruit ripeness recognition.
Following the introduction of StarNet, GFLOPs, parameters, and model size are reduced by 24.7%, 33.5%, and 31.9%, respectively, whereas mAP@0.5 declines by 2.6%. StarNet employs an exponential feature reuse strategy, sharing weights via a recursive structure that does not fully align with the feature scale of the initial convolutional layers. This attenuates shallow texture features during the recursion and diminishes the capacity of the shared weights to optimize targets across different scales. StarNet thus attains its lightweight character via weight reuse, at the expense of representational flexibility. This adverse impact is mitigated by the C2F-Star module and the DIoU loss. The C2F-Star module employs cross-layer skip connections to reintegrate shallow details, counteracting StarNet’s smoothing effect; after its introduction, mAP@0.5 increases by 1.4%. The DIoU loss enhances the center alignment of the bounding box to mitigate positioning drift arising from feature ambiguity and provides a gradient-stable localization loss for correction; after its introduction, mAP@0.5 rises by a further 1.7%. StarNet itself is not a defective module, but its recursive nature requires other parts of the network to compensate for feature diversity: C2F-Star supplies fine detail, and DIoU corrects localization deviations. Through this combination, the final accuracy exceeds that of the baseline model.
The proposed method for assessing palm fruit ripeness balances lightweight characteristics and recognition robustness; nonetheless, it has certain limitations. The present study does not address interference such as precipitation, fog, and extreme times of day (late at night, intense direct sunlight): the model demonstrates resilience under moderate weather conditions, but precipitation, fog, and nighttime conditions require supplementary data and careful tuning. The model currently learns ripe palm fruits thoroughly, satisfying harvesting priorities, but there is room for improvement in identifying underripe and abnormal samples. In the future, the small-object detection layer can be upgraded to adapt it to flower and lesion recognition, and multispectral imaging can be deployed to resolve color confusion through multimodal fusion. With its ultra-small model (2.85 MB) and high frame rate (253 FPS), the current work lays the foundation for mobile deployment: the system could be deployed on mobile devices, allowing on-site workers to obtain ripeness identification results by taking photos with their phones. Furthermore, integrating SLAM technology with the current ripeness identification model to create ripeness maps for palm plantations is a valuable direction for development.
5. Conclusions
This paper presents an efficient network for recognizing palm fruit ripeness, built around StarNet, to reconcile a lightweight design with robustness in complex conditions. The network, built upon YOLOv8, incorporates enhancements to the backbone, neck, detection head, and loss function. The LSCD detection head and StarNet were introduced to diminish the model’s computational complexity and parameter count. The original C2F module is replaced with C2F-Star, enhancing detection accuracy, and the more efficient DIoU supplants the original CIoU, improving model convergence and considerably augmenting accuracy. Following the introduction of StarNet, C2F-Star supplied detailed information while DIoU rectified positional inaccuracies; the integration of these modules yields a network that is both lightweight and highly effective in recognition. Through this progressive combinatorial optimization, the mAP@0.5 of the palm fruit ripeness recognition model developed in this research reaches 76.0%. The model attains 4.5 GFLOPs, 1.37 M parameters, and a model size of 2.85 MB, constituting 56.0%, 46.0%, and 48.0% of the original model, respectively. Robustness evaluation under complex environments, including uneven illumination, motion blur, and occlusion, demonstrates the resilience of our model for recognizing palm ripeness. Complementing these robustness tests, we further validated generalization using newly captured palm fruit images from operationally efficient vantage points distinct from the original dataset; recognition performance on this independent test set confirms the model’s robust generalization.
We employ heatmap visualization to demystify the model’s internal decision-making through intuitive, pixel-level color mapping, which validates the enhanced feature saliency and superior focus of our proposed method. This study also discussed the limitations of the method and pointed out that plantation-level ripeness maps and real-time viewing of fruit ripeness on mobile phones are highly valuable directions for development.