Article

Comparative Analysis of CNN-Based Semantic Segmentation for Apple Tree Canopy Size Recognition in Automated Variable-Rate Spraying

1 Interdisciplinary Program in Smart Agriculture, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon 24341, Republic of Korea
2 Department of Biosystems Engineering, College of Agriculture and Life Sciences, Kangwon National University, Chuncheon 24341, Republic of Korea
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(7), 789; https://doi.org/10.3390/agriculture15070789
Submission received: 4 March 2025 / Revised: 31 March 2025 / Accepted: 4 April 2025 / Published: 6 April 2025
(This article belongs to the Special Issue Agricultural Machinery and Technology for Fruit Orchard Management)

Abstract:
Efficient pest control in orchards is crucial for preserving crop quality and maximizing yield. A key factor in optimizing automated variable-rate spraying is accurate tree canopy size estimation, which helps reduce pesticide overuse while minimizing environmental and health risks. This study evaluates the performance of two advanced convolutional neural networks, PP-LiteSeg and fully convolutional networks (FCNs), for segmenting tree canopies of varying sizes—small, medium, and large—using short-term dense-connection networks (STDC1 and STDC2) as backbones. A dataset of 305 field-collected images was used for model training and evaluation. The results show that FCNs with STDC backbones outperform PP-LiteSeg, delivering superior semantic segmentation accuracy and background classification. The STDC1-based model excels in precision variable-rate spraying, achieving an Intersection-over-Union of up to 0.75, Recall of 0.85, and Precision of approximately 0.85. Meanwhile, the STDC2-based model demonstrates greater optimization stability and faster convergence, making it more suitable for resource-constrained environments. Notably, the STDC2-based model significantly enhances canopy-background differentiation, achieving a background classification Recall of 0.9942. In contrast, PP-LiteSeg struggles with small canopy detection, leading to reduced segmentation accuracy. These findings highlight the potential of FCNs with STDC backbones for automated apple tree canopy recognition, advancing precision agriculture and promoting sustainable pesticide application through improved variable-rate spraying strategies.

1. Introduction

Conventional agriculture relies heavily on agrochemicals, particularly pesticides, to maintain crop productivity and quality. While inadequate pest control can significantly reduce yields [1], excessive and improper pesticide application raises serious concerns due to its harmful effects on non-target organisms, human health, and ecosystems [2]. The misuse of pesticides can accelerate pest resistance, increase production costs, reduce resource efficiency, and destabilize the agroecosystem, ultimately diminishing the effectiveness of pest management strategies. Global pesticide consumption reached approximately 4.3 million metric tons in 2023, with projections from the Food and Agriculture Organization of the United Nations estimating an increase to 4.41 million metric tons by 2027 [3]. This growing reliance on pesticides exacerbates both economic and environmental risks, posing significant challenges to agricultural sustainability [4]. Consequently, reducing pesticide usage is imperative for promoting environmental sustainability and mitigating economic losses.
Modern orchards increasingly integrate advanced pest and disease control technologies to enhance crop health management [5]. The aging agricultural workforce, declining rural labor availability, and harsh working conditions have accelerated the adoption of automation in fruit tree production. Among these tasks, pest control remains one of the most labor-intensive and challenging. A major limitation of current pest management systems is the health risks posed to workers and the environmental contamination resulting from pesticide over-spraying. When spray applications do not account for tree canopy size or ecological conditions, pesticide distribution becomes uneven, leading to increased waste, higher drift risks, and unnecessary chemical exposure [4].
With advancements in intelligent orchard pest control, variable-rate spraying has emerged as an essential technique in precision agriculture. This method dynamically adjusts spray parameters based on tree canopy characteristics, reducing pesticide waste and improving application precision [6,7]. Accurate orchard canopy detection is critical for these systems, as high-quality image datasets are essential for improving segmentation and classification accuracy. Existing canopy detection technologies include ultrasonic sensors [8,9], LiDAR [6,10], and computer vision [11,12], each offering distinct advantages and limitations [13,14].
Ultrasonic-based variable control spraying has shown promising results. For instance, Maghsoudi et al. (2015) [8] integrated ultrasonic sensors with a multi-layer perceptron (MLP) neural network to estimate canopy volume, reducing pesticide use by 34.5%. Nan et al. (2022) [9] optimized ultrasonic-based canopy spraying, achieving a 35% reduction in tracking error, a 60.8% decrease in ground deposition, and a 32.1% reduction in spray flow, significantly lowering costs and pollution. LiDAR-based variable control spraying has further enhanced precision. Liu et al. (2022) [6] combined 3D LiDAR, an encoder, and an IMU for autonomous navigation, achieving pesticide reductions of 32.46%, airborne drift reductions of 44.34%, and ground loss reductions of 58.14%. Luo et al. (2024) [10] applied a LiDAR-based approach to cotton canopy spraying, reducing spray volume by 43.37% while improving deposition efficiency and coverage. However, ultrasonic sensors have limited resolution, which affects canopy recognition, while LiDAR—though highly precise—is costly and sensitive to environmental factors [14,15].
Semantic segmentation, a key technique in computer vision, enables pixel-wise canopy recognition, ensuring precise target area delineation for optimized spray control [16,17,18,19]. Deep learning models such as U-Net, DeepLabV3+, and PSPNet have been widely adopted for high-accuracy segmentation in orchard monitoring and precision agriculture [11,20,21,22,23]. However, traditional convolutional neural network (CNN)-based segmentation models often suffer from high computational complexity, making real-time deployment on resource-constrained agricultural devices challenging [24]. To address this, lightweight deep learning architectures have been introduced to balance computational efficiency with segmentation performance. PP-LiteSeg utilizes depthwise separable convolutions to significantly reduce computational overhead while maintaining robust segmentation accuracy [25,26]. Meanwhile, fully convolutional networks (FCNs) demonstrate strong adaptability to varying object scales, efficient end-to-end training, and superior spatial information preservation due to their fully convolutional nature [12]. These advancements enhance the feasibility of deploying semantic segmentation models in real-time agricultural applications, enabling more efficient and intelligent precision farming solutions.
In the context of variable-rate spraying systems, precise orchard canopy detection is crucial for optimizing pesticide use and improving spraying efficiency. This study compares PP-LiteSeg and FCNs for apple tree canopy segmentation, utilizing short-term dense-connection networks (STDC1 and STDC2) as backbone networks for feature extraction. Through experimental analysis, this study aims to identify the most suitable model for variable-rate spraying in orchards. Beyond accurate canopy segmentation, this research provides valuable insights into optimizing smart agricultural equipment, reducing pesticide waste, enhancing spraying precision, and advancing automation in orchard management.

2. Materials and Methods

2.1. Overall Workflow

Figure 1 illustrates the overall workflow of this study. The process begins with data acquisition, where RGB-D images are captured using an unmanned ground vehicle (UGV)-based platform, accompanied by an environmental survey to ensure comprehensive coverage of tree canopy structures. In the dataset preparation phase, the acquired data are preprocessed, categorized by canopy size, and partitioned into training and testing sets to facilitate robust model development. During the model development phase, segmentation models—PP-LiteSeg and FCNs—are trained using STDC1 and STDC2 backbone networks to achieve precise canopy segmentation and classification. The model evaluation phase involves rigorous performance assessments, including prediction accuracy, classification analysis, and validation metrics, to ensure the model’s reliability and generalizability. Finally, in the model implementation phase, the trained system is deployed for tree canopy size segmentation, followed by an evaluation of its potential to enhance the precision and adaptability of variable-rate spraying in orchard environments.

2.2. Data Acquisition

A high-quality image dataset is critical for improving the accuracy of tree canopy segmentation and classification, which serves as the foundation for precise control in variable-rate spraying systems. To ensure the comprehensive coverage of canopy structures and growth conditions, a systematic data collection campaign was conducted from June to October 2024 at the apple orchard of Kangwon National University’s experimental farm in Chuncheon, South Korea. This period aligns with the peak growing season when apple trees exhibit dense foliage and fully developed canopies, providing an optimal dataset for training and evaluating deep learning models. The collected data not only enhance canopy detection accuracy but also provide precise dimensional information, supporting the optimization of pesticide application strategies in variable-rate spraying systems.
To facilitate automated and consistent data collection, this study employed a robotic platform that integrates the Intel RealSense D435i RGB-D depth camera (Intel Corporation, Santa Clara, CA, USA) and the AgileX Bunker UGV (AgileX Robotics, Shenzhen, China), as shown in Figure 2. The robotic platform navigated orchard inter-row paths characterized by wide tree spacing and moderate canopy height, ensuring a clear forward view for the RGB-D camera and minimizing occlusion. Operating at a controlled speed over slightly uneven terrain, the system was supervised to maintain data quality. These favorable field conditions—open row structure, unobstructed visibility, and stable ground—supported the efficient and repeatable acquisition of high-quality canopy imagery representative of typical orchard environments. The Bunker UGV, designed for high-performance navigation in unstructured environments, ensures stable operation in orchard settings, making it well suited for precision agriculture applications. A laptop, securely mounted on the platform and running Ubuntu 18.04, serves as the central computing and control unit, managing real-time image acquisition, data processing, and trajectory control via USB and wireless communication. The Intel RealSense D435i camera, selected for its depth-sensing capabilities and RealSense SDK support, enables precise parameter adjustments and high-resolution RGB image acquisition. To maximize coverage and minimize occlusions, the camera is mounted on a 20 cm high rear bracket, allowing it to autonomously capture multi-angle canopy images as the UGV traverses the orchard rows. The captured data are stored in real time on the onboard laptop, ensuring data integrity for subsequent segmentation and classification tasks. The technical specifications of the robotic platform and the RGB-D camera are summarized in Table 1.
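To make the acquisition pipeline concrete, the sketch below shows how synchronized, color-aligned RGB-D frames can be captured from a D435i through the RealSense SDK's Python bindings (pyrealsense2). The stream resolutions, frame rate, and in-memory handling shown here are illustrative assumptions; the exact acquisition software used in this study is not specified in the text.

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Illustrative stream settings; the actual resolutions/frame rate are assumptions.
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)  # align depth pixels to the color frame

try:
    frames = align.process(pipeline.wait_for_frames())
    color = np.asanyarray(frames.get_color_frame().get_data())  # (720, 1280, 3)
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # raw depth units
finally:
    pipeline.stop()
```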

2.3. Dataset Preparation

To ensure that the variable-rate spraying system can effectively adapt to different fruit tree types, this study measured the canopy width and height of each apple tree during data acquisition. The trees were categorized based on their total canopy dimensions (height + width) to facilitate precise spray adjustments. Specifically, trees with a total canopy size of less than 350 cm were classified as small. These trees typically feature a compact structure with limited foliage coverage, requiring a minimal spray volume to reduce pesticide waste. Trees with canopy dimensions ranging from 350 to 500 cm were classified as medium. These trees exhibit a balanced canopy structure with moderate coverage, allowing for uniform pesticide distribution and optimized spraying efficiency. Trees exceeding 500 cm were designated as large, with fully developed canopies that require an extended spray range to ensure effective pesticide deposition on the upper leaves. Accurate canopy classification is essential for an effective variable-rate application. Misclassifying a large tree as small may lead to under-application, reducing pest control efficacy, while misclassifying a small tree as large can result in pesticide overuse, elevating operational costs and environmental impact. Therefore, precise canopy assessment plays a critical role in maximizing spray effectiveness while minimizing pesticide waste.
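The size classes defined above reduce to a simple thresholding rule on the total canopy dimensions. The following minimal sketch encodes it; the handling of values falling exactly on the 350 cm and 500 cm boundaries is an assumption, as the text does not state which class they belong to.

```python
def classify_canopy(height_cm: float, width_cm: float) -> str:
    """Map total canopy dimensions (height + width) to a size class
    using the thresholds from Section 2.3."""
    total = height_cm + width_cm
    if total < 350:
        return "small"
    if total <= 500:  # boundary handling at 350/500 cm is an assumption
        return "medium"
    return "large"

# Example: a 210 cm tall tree with a 180 cm wide canopy totals 390 cm -> "medium"
print(classify_canopy(210, 180))
```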
This classification system forms the foundation for optimizing the variable-rate spraying strategy, enabling intelligent adjustments based on canopy size. As a result, pesticide efficiency is enhanced, unnecessary application is minimized, and environmental impact is reduced. Furthermore, this approach improves model adaptability, allowing for generalization across different fruit tree species, canopy structures, and growth stages. This broadens the robustness and practicality of variable-rate spraying systems in orchard applications.
Prior to model training, a series of preprocessing steps were implemented to ensure data quality and consistency. Low-quality images—such as those exhibiting blurring, underexposure, or overexposure—were manually removed to improve dataset reliability. To address class imbalance, stratified sampling was conducted by grouping the images into small, medium, and large canopy categories. No artificial data augmentation techniques (e.g., flipping, rotation, or color jittering) were applied, as the dataset was collected continuously in real orchard environments under naturally varying lighting conditions, camera angles, and canopy structures. This inherent variability enriched the dataset and enhanced the model’s ability to generalize to unseen samples. A total of 305 images were selected and divided into training, validation, and test sets following a 7:2:1 ratio. The training set was used for feature extraction and model learning, the validation set for hyperparameter tuning to prevent overfitting and optimize model performance, and the test set for final evaluation to ensure generalization, consistency, and robustness across diverse canopy structures. All images were resized to 512 × 512 pixels to standardize input dimensions while balancing spatial detail with computational efficiency. Pixel values were normalized to the [0, 1] range to support stable model training. For canopy segmentation and classification, the PaddleSeg framework was adopted as the semantic segmentation framework [27].
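As a sketch of the preprocessing just described, the snippet below resizes images to 512 × 512, normalizes pixel values to [0, 1], and performs a 7:2:1 split. The directory layout and the purely random split are illustrative assumptions; the split in the actual study was stratified by canopy size class.

```python
import random
from pathlib import Path

import cv2
import numpy as np

def preprocess(image_path: str) -> np.ndarray:
    """Resize to 512 x 512 and normalize pixel values to [0, 1]."""
    img = cv2.imread(image_path)  # BGR, uint8
    img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_LINEAR)
    return img.astype(np.float32) / 255.0

def split_701(paths: list, seed: int = 0):
    """Random 7:2:1 train/val/test split (stratification by canopy
    class, as used in the study, is omitted here for brevity)."""
    paths = sorted(paths)
    random.Random(seed).shuffle(paths)
    n_train, n_val = int(0.7 * len(paths)), int(0.2 * len(paths))
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

train, val, test = split_701([str(p) for p in Path("images").glob("*.jpg")])
```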
For image annotation, Labelme software (version 5.5.0, MIT Computer Vision Group, Cambridge, MA, USA) [28] was employed to manually label the tree canopies. The dataset was categorized into three primary classes—small, medium, and large—while non-target areas, such as the orchard floor, support poles, and potential obstructions, were labeled as background. This classification strategy ensured that the model focused solely on tree canopies, thereby improving segmentation accuracy and enhancing robustness in variable orchard conditions. Figure 3 illustrates the dataset annotations, highlighting the distinctions between the classified tree categories.

2.4. Enhanced Deep CNN Architectures for Semantic Segmentation

2.4.1. PP-LiteSeg Model

The PP-LiteSeg model, introduced by Peng et al. (2022) [29], is a lightweight semantic segmentation network optimized for resource-constrained environments. Widely adopted in precision agriculture applications such as crop management and large-scale agricultural monitoring [26,30,31], the model combines efficiency with accuracy. As shown in Figure 4, the architecture consists of three primary components: an encoder, a flexible lightweight decoder (FLD), and a spatial pyramid pooling module (SPPM). This design significantly reduces computational complexity while improving segmentation accuracy and inference speed [29].
The encoder utilizes dense convolutional structures and multi-scale feature extraction to capture both target and background information at varying scales. Hierarchical feature propagation further enhances the model’s representational power, making it well suited for segmentation tasks in orchard environments [26]. The FLD progressively reduces channel dimensions while integrating skip connections to fuse shallow spatial and deep semantic features, ensuring the precise segmentation of tree canopy boundaries [30]. Furthermore, the unified attention fusion module (UAFM) dynamically adjusts feature weights using spatial and channel attention mechanisms, boosting the model’s adaptability to variations in vegetation and lighting conditions across diverse orchard environments [32]. The SPPM employs pyramid pooling strategies to capture global context information with minimal computational cost, enhancing the model’s ability to segment tree canopies effectively [33]. Together, these components enable PP-LiteSeg to capture multi-scale features of fruit trees with high accuracy, making it particularly effective for orchard applications.
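For illustration, the following is a minimal sketch of the spatial-attention variant of the UAFM described by Peng et al. (2022): per-pixel weights derived from channel-wise mean and max descriptors blend the upsampled deep feature with the shallow skip feature. The sketch is written in PyTorch for readability (an assumption, as the study itself uses PaddleSeg), the module and variable names are illustrative, and both inputs are assumed to have matching channel counts after projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionFusion(nn.Module):
    """Simplified UAFM-style fusion: per-pixel weights in [0, 1] decide how
    much of the upsampled deep feature vs. the shallow skip feature to keep."""

    def __init__(self):
        super().__init__()
        # Four 1-channel descriptors: mean and max over channels of each input.
        self.attn = nn.Conv2d(4, 1, kernel_size=3, padding=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Upsample the deep feature to the spatial size of the skip feature.
        deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                             align_corners=False)
        desc = torch.cat([
            deep.mean(dim=1, keepdim=True), deep.amax(dim=1, keepdim=True),
            shallow.mean(dim=1, keepdim=True), shallow.amax(dim=1, keepdim=True),
        ], dim=1)
        alpha = torch.sigmoid(self.attn(desc))          # (N, 1, H, W)
        return alpha * deep + (1.0 - alpha) * shallow   # weighted blend

fuse = SpatialAttentionFusion()
out = fuse(torch.randn(1, 128, 16, 16), torch.randn(1, 128, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```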
Despite its efficiency, PP-LiteSeg faces challenges when working with ultra-high-resolution imagery, especially in complex scenarios such as dense foliage and overlapping objects. Recent innovations, including dynamic convolution, adaptive feature fusion, and multi-modal data processing, have been developed to improve global context modeling and adaptability to heterogeneous environments [34,35,36]. For instance, Zheng et al. (2024) [30] enhanced UAV-based crop classification by integrating adaptive pooling and spatial pyramid techniques for better multi-scale feature extraction. Similarly, Cao et al. (2024) [33] optimized rural road segmentation by employing attention fusion and strip pooling to address unstructured features. Additionally, Zhang and Ling (2024) [37] advanced LiDAR point cloud denoising using WeatherBlock and sequential attention fusion to improve segmentation accuracy under adverse weather conditions. With its modular and efficient design, PP-LiteSeg remains a pivotal solution for automated orchard monitoring and precision agriculture, offering a strong balance between computational efficiency and segmentation accuracy that suits large-scale deployment in dynamic, resource-limited agricultural settings.

2.4.2. FCN Model

FCNs revolutionized semantic segmentation by replacing traditional handcrafted feature engineering with an end-to-end, pixel-wise prediction approach, significantly enhancing both efficiency and adaptability [38]. The FCN leverages convolutional networks for hierarchical feature extraction, using skip connections and upsampling techniques to reconstruct semantic and spatial information. The model architecture consists of an encoder–decoder structure, where skip connections facilitate feature fusion, improving segmentation accuracy [39].
The encoder, based on deep CNNs, extracts hierarchical features through a series of convolution and pooling layers. These layers progressively capture high-level semantic information, reducing spatial resolution and expanding the receptive field while retaining essential object features [40]. As illustrated in Figure 5, the convolutional layers downsample an orchard image from an initial resolution of 256 × 256 pixels to 38 × 38 pixels, encoding the semantic content. The decoder then reconstructs the feature map using transposed convolutions or upsampling, gradually restoring the spatial structures while preserving the semantic integrity [41]. As also shown in Figure 5, the decoder upscales the low-resolution feature map to the original input size, with its design being essential for accurate boundary detection and small-object segmentation. Modern variations in FCN integrate feature fusion and depthwise separable convolutions to further enhance segmentation precision. Skip connections bridge shallow encoder features with deeper decoder representations, mitigating the loss of spatial detail caused by pooling and improving segmentation accuracy, particularly for fine-grained structures like tree canopies. This mechanism ensures the effective integration of spatial features with upsampled outputs, which is vital for boundary segmentation and detecting small objects.
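As a concrete sketch of this skip-connection mechanism, the following PyTorch-style head (an illustration, not the exact architecture evaluated in this study) scores encoder features at 1/8, 1/16, and 1/32 resolution with 1 × 1 convolutions and fuses them FCN-8s style before the final upsampling. Bilinear interpolation stands in for the transposed convolutions of the original formulation, and the channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCNHead(nn.Module):
    """Minimal FCN-8s-style decoder: 1x1 convs score multi-scale encoder
    features, and skip connections sum them before full-resolution output."""

    def __init__(self, c8: int, c16: int, c32: int, num_classes: int):
        super().__init__()
        self.score8 = nn.Conv2d(c8, num_classes, 1)
        self.score16 = nn.Conv2d(c16, num_classes, 1)
        self.score32 = nn.Conv2d(c32, num_classes, 1)

    def forward(self, f8, f16, f32, out_size):
        s = self.score32(f32)
        s = F.interpolate(s, size=f16.shape[2:], mode="bilinear",
                          align_corners=False) + self.score16(f16)
        s = F.interpolate(s, size=f8.shape[2:], mode="bilinear",
                          align_corners=False) + self.score8(f8)
        return F.interpolate(s, size=out_size, mode="bilinear",
                             align_corners=False)

head = FCNHead(256, 512, 1024, num_classes=4)  # background + 3 canopy sizes
logits = head(torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32),
              torch.randn(1, 1024, 16, 16), out_size=(512, 512))
print(logits.shape)  # torch.Size([1, 4, 512, 512])
```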
Compared to traditional CNNs, the FCN’s fully convolutional design eliminates fixed input size constraints and significantly reduces computational complexity, positioning it as a fundamental model in semantic segmentation [42]. The introduction of FCNs by Long et al. (2015) [43] replaced fully connected layers with 1 × 1 convolutions, incorporated skip connections, and applied multi-scale upsampling techniques. This adaptation allowed classification networks like AlexNet, VGG16, and GoogLeNet to be repurposed for segmentation, establishing the foundation for modern segmentation models. Further advancements have refined FCNs’ capabilities, especially in remote sensing applications. For example, SDFCNv2 improves the receptive field and reduces parameter size with hybrid convolutional blocks and spatial-channel fusion modules [42]. Generative adversarial networks (GANs) have been integrated into FCN models to enhance feature representations and segmentation accuracy. GAN-FCN, for instance, uses adversarial training to refine boundaries and reduce misclassification in high-variance regions [44]. Additionally, the Class-Wise FCN optimizes computational efficiency and segmentation accuracy through class-specific feature extraction [45]. These innovations have significantly advanced the FCN’s application in challenging remote sensing scenarios, reinforcing its role as a cornerstone of deep learning-based semantic segmentation.

2.4.3. STDC Backbone

Lightweight network architectures have gained significant attention for their ability to balance segmentation accuracy and computational efficiency, particularly in real-time applications operating under resource-constrained conditions. Representative backbones such as MobileNet and ShuffleNet achieve reduced computational complexity through depthwise separable convolutions and grouped convolutions with channel shuffling, respectively [46]. However, these architectures often suffer from limited cross-channel information interaction and reduced edge-preserving capabilities, which restrict their effectiveness in tasks requiring fine-grained structural parsing, such as tree canopy segmentation in orchard environments. The STDC architecture addresses these limitations by integrating short-term dense connections with multi-scale feature fusion. This design enhances semantic abstraction while preserving spatial detail, enabling the effective capture of complex textures and occluded canopy boundaries with minimal computational overhead. Two STDC variants—STDC1 and STDC2—are employed in this study to accommodate varying computational constraints [29]. STDC1 features a shallower and more compact design, offering lower latency and faster inference, making it ideal for deployment on mobile robotic platforms and edge devices. In contrast, STDC2 incorporates deeper layers and wider feature channels to enhance the semantic representation capacity while still maintaining real-time feasibility [47].
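The sketch below illustrates the stride-1 variant of such a module, following the channel schedule reported by Fan et al. (2021): width halves block by block (C/2, C/4, C/8, C/8) across four convolutional blocks, and the intermediate outputs are concatenated to C output channels. It is a simplified PyTorch illustration rather than the full STDC1/STDC2 backbone used in this study.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin: int, cout: int, k: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class STDCModule(nn.Module):
    """Stride-1 short-term dense-connection module: channel width halves
    block by block, and all intermediate outputs are concatenated so that
    features with different receptive fields are fused at low cost."""

    def __init__(self, cin: int, cout: int):
        super().__init__()
        self.b1 = conv_bn_relu(cin, cout // 2, 1)
        self.b2 = conv_bn_relu(cout // 2, cout // 4, 3)
        self.b3 = conv_bn_relu(cout // 4, cout // 8, 3)
        self.b4 = conv_bn_relu(cout // 8, cout // 8, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.b1(x)
        y2 = self.b2(y1)
        y3 = self.b3(y2)
        y4 = self.b4(y3)
        # C/2 + C/4 + C/8 + C/8 = C output channels
        return torch.cat([y1, y2, y3, y4], dim=1)

m = STDCModule(64, 256)
print(m(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```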
Recent studies have validated the effectiveness of STDC1 and STDC2 in lightweight semantic segmentation. Fan et al. (2021) [48] reported that STDC consistently outperforms MobileNetV3 in terms of both segmentation accuracy and inference speed on benchmarks such as Cityscapes. Gong et al. (2024) [49] further demonstrated that STDC achieves a superior accuracy–speed trade-off compared to MobileNetV3 and ShuffleNetV2 under standardized experimental settings. Similarly, Wang et al. (2024) [50] showed that STDC backbones maintain stronger feature representation and robustness than ShuffleNetV1 when trained under equivalent conditions. Based on this evidence, this study adopts STDC1 and STDC2 as encoder backbones for semantic segmentation and evaluates their performance in the real-time segmentation of apple tree canopies under orchard environments.
As illustrated in Figure 4 and Figure 5, the STDC encoder generates multi-scale feature maps that are fed into either the PP-LiteSeg or FCN decoder. These components are linked via structured feature transfer and resolution alignment, enabling the effective propagation of semantic information while preserving spatial consistency, thereby forming a unified end-to-end segmentation pipeline. Specifically, feature maps from Stage 3 (1/8 resolution) and Stage 4 (1/16 resolution) are transmitted to the UAFMs in the PP-LiteSeg decoder. Meanwhile, the output from Stage 5 (1/32 resolution) is refined by SPPM and subsequently integrated into the decoding stream. In contrast, as shown in Figure 5, the FCN decoder directly receives a high-level feature map (4096 channels) from the encoder, which is passed through a series of fully convolutional layers. The final output, consisting of 21 semantic channels, is then upsampled to the original resolution for dense, pixel-wise segmentation.
Functionally, the model architecture can be conceptualized as a sensory–cognitive system: the STDC encoder acts as the “sensory unit” that extracts hierarchical visual features, while the decoder (either PP-LiteSeg or FCNs) functions as the “cognitive unit” that interprets those features to generate spatially coherent segmentation outputs. PP-LiteSeg emphasizes computational efficiency and spatial detail preservation, while the FCN prioritizes architectural simplicity and robustness. This modular design enables comprehensive performance comparisons between encoder and decoder combinations under real-time orchard conditions and supports optimization for deployment on resource-limited platforms.

2.5. Comprehensive Metrics for Model Performance Evaluation

The performance of the segmentation models was evaluated using three key metrics: Intersection-over-Union (IoU), Precision (P), and Recall (R), as defined in Equations (1)–(3). IoU measures segmentation accuracy by evaluating the overlap between the predicted and actual regions relative to their union. P represents the proportion of correctly identified positive samples (true positives, TPs) among all predicted positives (TPs + false positives, FPs), reducing FPs. R measures the proportion of correctly identified positives (TPs) among all actual positives (TPs + false negatives, FNs), minimizing missed detections [51]. These metrics provide a comprehensive assessment of segmentation accuracy, ensuring the precise detection of apple tree canopies even under challenging conditions.
IoU = (P × R)/(P + R − P × R),  (1)
P = TP/(TP + FP),  (2)
R = TP/(TP + FN).  (3)
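Substituting Equations (2) and (3) into Equation (1) reduces it to the familiar pixel-count form IoU = TP/(TP + FP + FN). The following minimal sketch (an illustration, not the evaluation code used in this study) computes the three per-class metrics from pixel-wise label maps:

```python
import numpy as np

def class_metrics(pred: np.ndarray, gt: np.ndarray, cls: int):
    """Per-class IoU, P, and R from pixel-wise label maps, following
    Equations (1)-(3). Returns 0.0 when a denominator is empty."""
    tp = np.sum((pred == cls) & (gt == cls))
    fp = np.sum((pred == cls) & (gt != cls))
    fn = np.sum((pred != cls) & (gt == cls))
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return iou, p, r

# Example with classes 0-3 = background, small, medium, large
pred = np.array([[0, 1], [1, 3]])
gt = np.array([[0, 1], [2, 3]])
print(class_metrics(pred, gt, cls=1))  # (0.5, 0.5, 1.0)
```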

3. Results

3.1. Empirical Analysis of Model Training Performance

The models evaluated in this study—PP-LiteSeg and FCNs—utilized STDC1 and STDC2 as backbone networks, referred to as PP-LiteSeg_STDC1, PP-LiteSeg_STDC2, FCN_STDC1, and FCN_STDC2. Training was performed with a batch size of 4 over 10,000 iterations, optimizing the models by minimizing the loss function. Figure 6 presents the training loss curves, offering key insights into model convergence over 125 epochs. The vertical axis represents loss values, reflecting the progress of optimization, while the horizontal axis denotes the number of epochs. Initially, all models exhibit high loss values, which gradually decrease as the parameter updates improve alignment with the training data. The convergence of these loss curves indicates the stability and efficiency of the training process.
Among the evaluated models, FCN_STDC1 and FCN_STDC2 show faster convergence and lower final loss values compared to PP-LiteSeg_STDC1 and PP-LiteSeg_STDC2, suggesting superior optimization efficiency. Notably, FCN_STDC2 achieves the lowest and most stable loss curve, especially in the later training stages. Although both FCN models display a similar downward trend in the early epochs, FCN_STDC2 demonstrates a more rapid decline, indicating more effective parameter adjustments. After approximately 80 epochs, FCN_STDC2 maintains a smoother loss curve, while FCN_STDC1 exhibits fluctuations. This enhanced stability, coupled with a lower final loss, underscores FCN_STDC2’s superior optimization performance, making it the most effective architecture for tree canopy segmentation under the given conditions.

3.2. Comparative Evaluation of Model Performance

A comprehensive evaluation is crucial for assessing the effectiveness of segmentation models in apple tree canopy detection. This section applies the dataset and evaluation metrics outlined in Section 2.3 and Section 2.5 to assess segmentation accuracy and robustness. By comparing PP-LiteSeg and FCNs, both integrated with STDC1 and STDC2 backbone networks, this study investigates their ability to capture fine-grained canopy structures while maintaining performance across diverse orchard conditions.
Table 2 presents the evaluation results across different tree canopy size classes. For small tree canopies, FCN_STDC1 achieved the highest IoU of 0.6266. In the medium canopy class, PP-LiteSeg_STDC1 reached the highest IoU of 0.4815, while FCN_STDC1 demonstrated superior performance for large canopies, achieving an IoU of 0.7476. Regarding R, FCN_STDC1 consistently outperformed all other configurations across all size classes, with values of 0.7674 for small, 0.4958 for medium, and 0.8524 for large tree canopies. P varied across the models: FCN_STDC2 achieved the highest P (0.9305) for small canopies, PP-LiteSeg_STDC2 led for medium canopies (0.9376), and PP-LiteSeg_STDC1 attained the highest P (0.8942) for large canopies. Figure 7 visually summarizes these outcomes, illustrating comparative model performance across canopy size classes using (a) IoU, (b) P, and (c) R. Each metric provides unique insights into segmentation quality. IoU reflects the spatial overlap between predicted and ground truth regions, with higher values indicating more accurate boundary delineation. P measures the correctness of positive predictions, indicating the model’s ability to avoid false positives. R evaluates detection completeness, representing the model’s capability to fully capture true canopy areas. From the visual analysis in Figure 7, it is evident that while individual models may perform better under specific conditions, FCN_STDC1 consistently delivers robust and balanced performance across all canopy size classes, making it a strong candidate for generalized deployment in orchard environments.
Notably, FCN_STDC1 outperformed the other models across the 12 evaluation metrics, securing the highest IoU for small and large canopies as well as the background class. Additionally, it exhibited the best R across all canopy size classes and achieved the highest precision for background segmentation. These findings demonstrate FCN_STDC1’s robustness and reliability in orchard environments, providing a solid foundation for selecting an optimal segmentation model for real-time orchard applications. This is particularly relevant for precision spraying, where accurate tree canopy detection is critical for efficiency and adaptability.

3.3. Automated Recognition of Apple Tree Canopy Sizes

The accurate and rapid recognition of apple tree canopy sizes is crucial for automated orchard management and optimized variable-rate spraying. Deep learning-based segmentation models play a pivotal role not only in precisely delineating canopy boundaries, minimizing misclassifications, and enhancing overall recognition accuracy but also in determining the feasibility of real-time deployment through their inference speed and computational efficiency.
Figure 8 illustrates the segmentation performance of various model–backbone configurations using identical input images. From left to right, the three image sets represent small and medium, medium and large, and small and large apple canopy categories. The subfigures include (a) raw input images, (b) corresponding annotated images, and (c)–(f) segmentations produced by PP-LiteSeg_STDC1, PP-LiteSeg_STDC2, FCN_STDC1, and FCN_STDC2, respectively. In these visualizations, red denotes the background, green represents small canopy classes, yellow indicates medium canopy classes, and blue corresponds to large canopy classes. A comparative analysis of the segmentation outcomes reveals significant variations in boundary clarity, misclassification rates, and detail preservation across the models. FCN_STDC1 demonstrates the highest segmentation accuracy, effectively delineating canopy boundaries while minimizing background misclassification. This model excels in structural retention and classification stability, ensuring reliable differentiation between canopy size categories. FCN_STDC2 performs similarly in terms of segmentation accuracy and robustness, but it shows slightly less precise boundary delineation in certain instances, leading to a minor reduction in segmentation precision. In contrast, the PP-LiteSeg models perform relatively weaker. PP-LiteSeg_STDC1 occasionally misclassifies the background, while PP-LiteSeg_STDC2 suffers from more significant misclassification errors, particularly within small canopy categories, affecting overall segmentation reliability.
To evaluate deployment feasibility under real-world conditions, the models were tested on a low-power computing setup (Intel Core i7-11370H CPU and NVIDIA GeForce MX450 GPU). The average inference speeds were as follows: PP-LiteSeg_STDC1—30.4 FPS, PP-LiteSeg_STDC2—11.4 FPS, FCN_STDC1—27.8 FPS, and FCN_STDC2—10.3 FPS. Notably, PP-LiteSeg_STDC1 and FCN_STDC1 achieved near real-time performance, making them suitable for mobile agricultural systems. While PP-LiteSeg_STDC1 offers a slight edge in processing speed, it suffers from increased misclassification, particularly for small canopies. In contrast, FCN_STDC1 provides superior segmentation accuracy—including clearer boundary delineation and reduced background noise—without compromising much on speed. This balance between accuracy and efficiency is crucial for ensuring consistent performance in orchard environments.
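For reference, the snippet below illustrates one common way such frame rates can be estimated: timing repeated forward passes on a fixed-size input after a warm-up phase. It is a generic PyTorch sketch under assumed input dimensions, not the benchmarking code used in this study.

```python
import time

import torch

@torch.no_grad()
def measure_fps(model: torch.nn.Module, n_frames: int = 100,
                size=(1, 3, 512, 512), device: str = "cuda") -> float:
    """Average throughput of repeated forward passes on a fixed-size
    input, after a short warm-up to exclude one-time setup costs."""
    model = model.eval().to(device)
    x = torch.randn(*size, device=device)
    for _ in range(10):              # warm-up (kernel compilation, caches)
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # wait for queued GPU work to finish
    t0 = time.perf_counter()
    for _ in range(n_frames):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return n_frames / (time.perf_counter() - t0)
```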
The inference speed of FCN_STDC1, averaging around 36 milliseconds per frame, falls well within the acceptable range for real-time decision-making in variable-rate spraying applications. In practical orchard scenarios, where the perception–decision–actuation loop includes image acquisition, segmentation, control computation, navigation, and spray execution, the segmentation step contributes minimally to overall system latency. Moreover, the achieved frame rate satisfies the typical threshold for real-time agricultural automation (25–30 FPS), ensuring sufficient temporal resolution for consistent canopy tracking and timely spray activation, even under hardware constraints. In summary, FCN_STDC1 strikes an optimal balance between segmentation accuracy, computational efficiency, and real-time applicability, making it a highly promising candidate for deployment in precision orchard spraying systems.

4. Discussion

This study provides a systematic analysis of the segmentation performance of various model–backbone configurations for apple tree canopy size recognition. The results show that FCN_STDC1 achieved the highest segmentation accuracy, effectively distinguishing canopies of different sizes while minimizing misclassification. In contrast, FCN_STDC2 demonstrated faster convergence and greater optimization stability during training, as reflected in its loss curve. This model exhibited more efficient parameter adjustments, smoother loss reduction, and fewer fluctuations, all while maintaining high segmentation accuracy. These traits make FCN_STDC2 particularly advantageous in resource-constrained environments where stable model training with minimal computational overhead is required.
Notably, FCN_STDC2 also achieved the highest R in background classification, highlighting its strong ability to differentiate tree canopies from the background and minimize false positives. However, its slightly lower IoU compared to FCN_STDC1 in fine-grained boundary segmentation suggests that FCN_STDC2’s optimization strategy prioritizes overall structural stability over precise edge delineation. This distinction makes FCN_STDC1 more suitable for applications that demand highly precise boundary detection, whereas FCN_STDC2 is preferable for orchard management systems that prioritize efficient training and deployment without significant trade-offs in segmentation performance.
Among all the evaluated models, PP-LiteSeg_STDC1 achieved the highest inference speed, reaching 30.4 FPS on a low-power computing platform, thereby demonstrating a clear advantage in real-time processing efficiency. However, despite its speed, the PP-LiteSeg models exhibited notably lower segmentation accuracy—particularly for smaller canopy classes—where reduced R values highlighted their limited detection capability. This deficiency often led to missed canopy regions, compromising overall segmentation performance. The primary cause lies in PP-LiteSeg’s streamlined architecture, which prioritizes inference speed over robust multi-scale feature representation. Its relatively shallow network depth and constrained receptive field limit the model’s ability to capture subtle spatial patterns, especially within small or fragmented canopy areas. While this architecture is advantageous for applications where real-time processing is essential and high precision is not mandatory, it falls short in scenarios demanding fine-grained and accurate segmentation. To overcome these limitations, further improvements may focus on strengthening the feature fusion mechanisms and integrating scale-aware decoding strategies. Such enhancements could enable the model to better capture complex spatial hierarchies across varying canopy sizes, thereby improving segmentation fidelity without significantly increasing computational overhead.
This study introduces a novel integration of lightweight STDC1 and STDC2 backbones into two representative pixel-wise segmentation architectures—PP-LiteSeg and FCNs—and conducts a comparative evaluation of the four resulting configurations for apple tree canopy size recognition in support of automated variable-rate spraying. The implementation of three-level canopy size segmentation (small, medium, large) provides a practical framework for adaptive spraying strategies, supporting reduced agrochemical usage, enhanced spray accuracy, and improved environmental sustainability in orchard management. Among the models, FCN_STDC1 demonstrated the most balanced performance, excelling in segmentation accuracy, boundary delineation, and real-time inference. These characteristics position it as a strong candidate for integration into intelligent spraying systems requiring both accuracy and a timely response.
To comprehensively evaluate model performance, this study compared the detection and segmentation results with several existing methods. As summarized in Table 3, the proposed FCN_STDC1 achieved the highest overall metrics, with an average IoU of 0.70, P of 0.88, and R of 0.78, while maintaining real-time inference at 27.80 FPS on an NVIDIA GeForce MX450 GPU. In contrast, alternative approaches such as RetinaNet, DeepForest, and Detectree2 demonstrated lower accuracy and either lacked speed evaluations or failed to offer integrated classification and segmentation functionalities. These findings suggest that FCN_STDC1 in this study not only delivers high-precision canopy-size-aware segmentation across a range of crown scales but also meets the demands of real-time deployment, making it highly suitable for precision orchard spraying applications.
Despite its contributions, this study has several limitations. The dataset—collected using a robotic RGB-D system under typical orchard conditions—includes diverse canopy sizes, spatial layouts, and viewing angles representative of real-world spraying scenarios. While this provides structured and consistent data aligned with this study’s objectives, its confinement to a single orchard environment limits variability in environmental conditions, orchard types, and canopy architectures. This restriction may constrain the model’s generalizability to broader agricultural contexts. Expanding the dataset to include multiple field environments with greater visual and structural diversity would promote stronger feature generalization, reduce dependence on localized patterns, and mitigate overfitting to scene-specific characteristics. Such enhancements would improve the model’s robustness and adaptability for deployment across varied orchard conditions.
Additionally, this study introduces only minor architectural optimizations, instead focusing on achieving accurate and efficient inference through innovative combinations within established lightweight segmentation frameworks. This design strategy prioritizes practical applicability in automated variable-rate spraying, particularly under the constraints of edge-based deployment. Although this study does not pursue groundbreaking advancements in network architecture, the proposed approach effectively balances inference speed, segmentation accuracy, and model compactness. To meet the increasing requirements for segmentation accuracy and processing efficiency in intelligent and fully automated orchard systems, it is crucial to develop more advanced models and benchmark them against state-of-the-art methods to ensure robustness and real-world applicability.
Moreover, as a promising future direction, integrating multi-modal sensing—such as depth, multispectral imagery, thermal data, and LiDAR—can enhance perception. These complementary modalities offer richer scene understanding beyond RGB input, contributing to more robust segmentation under challenging orchard conditions. Advancing canopy detection through effective multi-sensor fusion will support the development of intelligent, adaptive variable-rate spraying systems that improve precision, efficiency, and environmental sustainability. Further research is needed to develop and validate sensor fusion strategies suited to real-world agricultural deployments.

5. Conclusions

This study introduced a novel deep learning-based semantic segmentation approach that accurately classifies apple tree canopies into small, medium, and large categories using RGB imagery. The proposed method effectively distinguishes canopy sizes from the background, optimizing its potential for precision agriculture applications. Two specialized model implementations were developed: FCN_STDC1, which achieved high segmentation accuracy and precise canopy boundary detection (IoU up to 0.75, R up to 0.85, and P around 0.85), and FCN_STDC2, which demonstrated exceptional canopy-background differentiation with a background classification R of 0.9942, the lowest and most stable loss curve, and faster convergence. These features make FCN_STDC2 particularly suitable for resource-constrained orchard management, offering enhanced optimization stability and training efficiency. By enabling the automatic differentiation of canopy sizes, this method improves pesticide application efficiency, allowing for the dynamic adjustment of spray volumes based on tree dimensions. Future research should focus on validating the generalizability of this method across a broader range of evaluation metrics and experimental conditions. To support practical deployment, future models must also demonstrate stable real-time performance under field constraints such as limited computational resources, variable lighting, and dynamic environmental conditions. Extensive field testing across diverse orchard types, geographic locations, and operational scenarios will be essential to assess the robustness and scalability of the approach. Such efforts will further contribute to the broader advancement of spraying robotics and object recognition technologies in orchard environments, ultimately promoting more effective, efficient, and sustainable agricultural practices.

Author Contributions

Writing—original draft: T.J.; writing—review and editing: X.H.; software: T.J.; methodology: T.J.; investigation: T.J., S.M.K., N.R.K. and H.R.K.; formal analysis: T.J., S.M.K., N.R.K. and H.R.K.; data curation: T.J.; conceptualization: X.H.; validation: T.J., S.M.K., N.R.K. and H.R.K.; supervision: X.H.; resources: X.H.; project administration: X.H. All authors have read and agreed to the published version of this manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)—Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2025-RS-2023-00260267).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this study.

References

  1. Pérez-Lucas, G.; Navarro, G.; Navarro, S. Adapting agriculture and pesticide use in mediterranean regions under climate change scenarios: A comprehensive review. Eur. J. Agron. 2024, 161, 127337. [Google Scholar] [CrossRef]
  2. Omoyajowo, K. Sustainable Environmental Policies: An Impact Analysis of U.S. Regulations on Pesticides and Chemical Discharges. SSRN Preprints 2024.4943299. Available online: https://dx.doi.org/10.2139/ssrn.4943299 (accessed on 9 February 2025).
  3. Statista. Forecast Agricultural Consumption of Pesticides Worldwide from 2023 to 2027 (in 1000 Metric Tons). Available online: https://www.statista.com/statistics/1401556/global-agricultural-use-of-pesticides-forecast/ (accessed on 9 February 2025).
  4. Khan, Z.; Liu, H.; Shen, Y.; Zeng, X. Deep learning improved yolov8 algorithm: Real-time precise instance segmentation of crown region orchard canopies in natural environment. Comput. Electron. Agric. 2024, 224, 109168. [Google Scholar] [CrossRef]
  5. Kansotia, K.; Saikanth, D.R.; Kumar, R.; Pamala, P.; Kumar, P.; Pandey, S.; Kushwaha, T. Role of modern technologies in plant disease management: A comprehensive review of benefits, challenges, and future perspectives. Int. J. Environ. Clim. Change 2023, 13, 1325–1335. [Google Scholar] [CrossRef]
  6. Liu, L.; Liu, Y.; He, X.; Liu, W. Precision variable-rate spraying robot by using single 3D lidar in orchards. Agronomy 2022, 12, 2509. [Google Scholar] [CrossRef]
  7. Liu, H.; Du, Z.; Shen, Y.; Du, W.; Zhang, X. Development and evaluation of an intelligent multivariable spraying robot for orchards and nurseries. Comput. Electron. Agric. 2024, 222, 109056. [Google Scholar] [CrossRef]
  8. Maghsoudi, H.; Minaei, S.; Ghobadian, B.; Masoudi, H. Ultrasonic sensing of pistachio canopy for low-volume precision spraying. Comput. Electron. Agric. 2015, 112, 149–160. [Google Scholar] [CrossRef]
  9. Nan, Y.; Zhang, H.; Zheng, J.; Yang, K.; Ge, Y. Low-volume precision spray for plant pest control using profile variable rate spraying and ultrasonic detection. Front. Plant Sci. 2022, 13, 1042769. [Google Scholar] [CrossRef]
  10. Luo, S.; Wen, S.; Zhang, L.; Lan, Y.; Chen, X. Extraction of crop canopy features and decision-making for variable spraying based on unmanned aerial vehicle lidar data. Comput. Electron. Agric. 2024, 224, 109197. [Google Scholar] [CrossRef]
  11. Xue, X.; Luo, Q.; Bu, M.; Li, Z.; Lyu, S.; Song, S. Citrus tree canopy segmentation of orchard spraying robot based on RGB-D image and the improved DeepLabv3+. Agronomy 2023, 13, 2059. [Google Scholar] [CrossRef]
  12. Wang, X.; Tang, J.; Whitty, M. Side-view apple flower mapping using edge-based fully convolutional networks for variable rate chemical thinning. Comput. Electron. Agric. 2020, 178, 105673. [Google Scholar] [CrossRef]
  13. Gu, C.; Wang, X.; Wang, X.; Yang, F.; Zhai, C. Research progress on variable-rate spraying technology in orchards. Appl. Eng. Agric. 2020, 36, 927–942. [Google Scholar] [CrossRef]
  14. Mahmud, M.S.; Zahid, A.; He, L.; Martin, P. Opportunities and possibilities of developing an advanced precision spraying system for tree fruits. Sensors 2021, 21, 3262. [Google Scholar] [CrossRef] [PubMed]
  15. Abbas, I.; Liu, J.; Faheem, M.; Noor, R.S.; Shaikh, S.A.; Solangi, K.A.; Raza, S.M. Different sensor based intelligent spraying systems in agriculture. Sens. Actuators A Phys. 2020, 316, 112265. [Google Scholar] [CrossRef]
  16. Azizi, A.; Abbaspour-Gilandeh, Y.; Vannier, E.; Dusséaux, R.; Mseri-Gundoshmian, T.; Moghaddam, H.A. Semantic segmentation: A modern approach for identifying soil clods in precision farming. Biosyst. Eng. 2020, 196, 172–182. [Google Scholar] [CrossRef]
  17. Abioye, A.E.; Larbi, P.A.; Hadwan, A.A.K. Deep learning guided variable rate robotic sprayer prototype. Smart Agric. Technol. 2024, 9, 100540. [Google Scholar] [CrossRef]
  18. Hussain, N.; Farooque, A.A.; Schumann, A.W.; McKenzie-Gopsill, A.; Esau, T.; Abbas, F.; Acharya, B.; Zaman, Q. Design and development of a smart variable rate sprayer using deep learning. Remote Sens. 2020, 12, 4091. [Google Scholar] [CrossRef]
  19. Asaei, H.; Jafari, A.; Loghavi, M. Site-specific orchard sprayer equipped with machine vision for chemical usage management. Comput. Electron. Agric. 2019, 162, 431–439. [Google Scholar] [CrossRef]
  20. Pathan, R.K.; Lim, W.L.; Lau, S.L.; Ho, C.C.; Khare, P.; Koneru, R.B. Experimental analysis of U-Net and Mask R-CNN for segmentation of synthetic liquid spray. In Proceedings of the 2022 IEEE International Conference on Computing (ICOCO), Kota Kinabalu, Malaysia, 14–16 November 2022; pp. 237–242. [Google Scholar] [CrossRef]
  21. Li, X.; Duan, F.; Hu, M.; Hua, J.; Du, X. Weed density detection method based on a high weed pressure dataset and improved PSP Net. IEEE Access 2023, 11, 98244–98255. [Google Scholar] [CrossRef]
  22. Zhao, J.; Li, Z.; Lei, Y.; Huang, L. Application of UAV RGB images and improved PSP Net network to the identification of wheat lodging areas. Agronomy 2023, 13, 1309. [Google Scholar] [CrossRef]
  23. Chang, Z.; Li, H.; Chen, D.; Liu, Y.; Zou, C.; Chen, J.; Han, W.; Liu, S.; Zhang, N. Crop type identification using high-resolution remote sensing images based on an improved DeepLabv3+ network. Remote Sens. 2023, 15, 5088. [Google Scholar] [CrossRef]
  24. Czymmek, V.; Köhn, C.; Harders, L.O.; Hussmann, S. Review of energy-efficient embedded system acceleration of convolution neural networks for organic weeding robots. Agriculture 2023, 13, 2103. [Google Scholar] [CrossRef]
  25. Son, H.; Weiland, J. Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 62–69. [Google Scholar] [CrossRef]
  26. Zhang, J.; Qiang, Z.; Lin, H.; Chen, Z.; Li, K.; Zhang, S. Research on tobacco field semantic segmentation method based on multispectral unmanned aerial vehicle data and improved PP-LiteSeg model. Agronomy 2024, 14, 1502. [Google Scholar] [CrossRef]
  27. Liu, Y.; Chu, L.; Chen, G.; Wu, Z.; Chen, Z.; Lai, B.; Hao, Y. PaddleSeg: A high-efficient development toolkit for image segmentation. arXiv 2021, arXiv:2101.06175. [Google Scholar] [CrossRef]
  28. Torralba, A.; Russell, B.C.; Yuen, J. Labelme: Online image annotation and applications. Proc. IEEE 2010, 98, 1467–1484. [Google Scholar] [CrossRef]
  29. Peng, J.; Liu, Y.; Tang, S.; Hao, Y.; Chu, L.; Chen, G.; Wu, Z.; Chen, Z.; Yu, Z.; Du, Y.; et al. PP-LiteSeg: A superior real-time semantic segmentation model. arXiv 2022, arXiv:2204.02681. [Google Scholar] [CrossRef]
  30. Zheng, Z.; Yuan, J.; Yao, W.; Yao, H.; Liu, Q.; Guo, L. Crop classification from drone imagery based on lightweight semantic segmentation methods. Remote Sens. 2024, 16, 4099. [Google Scholar] [CrossRef]
  31. Yang, M.; Huang, C.; Li, Z.; Shao, Y.; Yuan, J.; Yang, W.; Song, P. Autonomous navigation method based on RGB-D camera for a crop phenotyping robot. J. Field Robot. 2024, 41, 2663–2675. [Google Scholar] [CrossRef]
  32. Zhao, S.; Zhao, X.; Huo, Z.; Zhang, F. BMSeNet: Multiscale context pyramid pooling and spatial detail enhancement network for real-time semantic segmentation. Sensors 2024, 24, 5145. [Google Scholar] [CrossRef]
  33. Cao, X.; Tian, Y.; Yao, Z.; Zhao, Y.; Zhang, T. Semantic segmentation network for unstructured rural roads based on improved SPPM and fused multiscale features. Appl. Sci. 2024, 14, 8739. [Google Scholar] [CrossRef]
  34. Xu, H.; Li, C.; Liu, Y.; Shao, G.; Hussain, Z.K. A Fast Rooftop Extraction Deep Learning Method Based on PP-LiteSeg for UAV Imagery. In Proceedings of the 2023 IEEE International Conference on Artificial Intelligence and Automation Control (AIAC), Xiamen, China, 17–19 November 2023; pp. 70–76. [Google Scholar] [CrossRef]
  35. Tan, Y.; Li, X.; Lai, J.; Ai, J. Real-time tunnel lining leakage image semantic segmentation via multiple attention mechanisms. Meas. Sci. Technol. 2024, 35, 075204. [Google Scholar] [CrossRef]
  36. Qu, G.; Wu, Y.; Lv, Z.; Zhao, D.; Lu, Y.; Zhou, K.; Tang, J.; Zhang, Q.; Zhang, A. Road-MobileSeg: Lightweight and accurate road extraction model from remote sensing images for mobile devices. Sensors 2024, 24, 531. [Google Scholar] [CrossRef] [PubMed]
  37. Zhang, W.; Ling, M. An improved point cloud denoising method in adverse weather conditions based on PP-LiteSeg network. PeerJ Comput. Sci. 2024, 10, e1832. [Google Scholar] [CrossRef] [PubMed]
  38. Dang, T.-V.; Bui, N.-T. Multi-scale fully convolutional network-based semantic segmentation for mobile robot navigation. Electronics 2023, 12, 533. [Google Scholar] [CrossRef]
  39. Bilinski, P.; Prisacariu, V. Dense decoder shortcut connections for single-pass semantic segmentation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6596–6605. [Google Scholar]
  40. Xing, Y.; Zhong, L.; Zhong, X. An Encoder-Decoder Network Based FCN Architecture for Semantic Segmentation. Wirel. Commun. Mob. Comput. 2020, 2020, 8861886. [Google Scholar] [CrossRef]
  41. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  42. Chen, G.; Tan, X.; Guo, B.; Zhu, K.; Liao, P.; Wang, T.; Wang, Q.; Zhang, X. SDFCNv2: An improved FCN framework for remote sensing images semantic segmentation. Remote Sens. 2021, 13, 4902. [Google Scholar] [CrossRef]
  43. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  44. Tian, L.; Zhong, X.; Chen, M. Semantic segmentation of remote sensing image based on GAN and FCN network model. Sci. Program. 2021, 2021, 9491376. [Google Scholar] [CrossRef]
  45. Tian, T.; Chu, Z.; Hu, Q.; Ma, L. Class-wise fully convolutional network for semantic segmentation of remote sensing images. Remote Sens. 2021, 13, 3211. [Google Scholar] [CrossRef]
  46. Wang, C.; Chen, X.; Wang, B.; Zhang, L.; Liu, B. SLTM network: Efficient application of lightweight image segmentation technology in detecting drivable areas for unmanned line-marking machines. IEEE Access 2024, 12, 169001–169012. [Google Scholar] [CrossRef]
  47. Wang, X.; Li, Z.; Zhang, Y.; An, G. Water level recognition based on deep learning and character interpolation strategy for stained water gauge. River 2023, 2, 506–517. [Google Scholar] [CrossRef]
  48. Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking BiSeNet for Real-Time Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online, 20–25 June 2021; pp. 9716–9725. [Google Scholar]
  49. Gong, D.; Du, Y.; Li, Z.; Li, D.; Xi, W. Semantic Segmentation for Tool Wear Monitoring: A Comparative Study. In Proceedings of the Seventh International Symposium on Laser Interaction with Matter (LIMIS), Shenzhen, China, 20–22 November 2024. Proc. SPIE 2025, 13543, 135430Y. [Google Scholar] [CrossRef]
  50. Wang, X.; Liu, R.; Yang, X.; Zhang, Q.; Zhou, D. MCFNet: Multi-attentional class feature augmentation network for real-time scene parsing. ACM Trans. Multimedia Comput. Commun. Appl. 2024, 20, 166. [Google Scholar] [CrossRef]
  51. Jin, T.; Han, X.; Wang, P.; Zhang, Z.; Guo, J.; Ding, F. Enhanced deep learning model for apple detection, localization, and counting in complex orchards for robotic arm-based harvesting. Smart Agric. Technol. 2025, 10, 100784. [Google Scholar] [CrossRef]
  52. Culman, M.; Delalieux, S.; Van Tricht, K. Individual palm tree detection using deep learning on RGB imagery to support tree inventory. Remote Sens. 2020, 12, 3476. [Google Scholar] [CrossRef]
  53. Gan, Y.; Wang, Q.; Iio, A. Tree crown detection and delineation in a temperate deciduous forest from UAV RGB imagery using deep learning approaches: Effects of spatial resolution and species characteristics. Remote Sens. 2023, 15, 778. [Google Scholar] [CrossRef]
  54. Weinstein, B.G.; Marconi, S.; Bohlman, S.; Zare, A.; White, E. Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks. Remote Sens. 2019, 11, 1309. [Google Scholar] [CrossRef]
Figure 1. Flowchart depicting the comprehensive study process, from data acquisition and dataset preparation to model development, performance evaluation, and final implementation.
Figure 2. Overview of the experimental field (coordinates: 37°30′ N, 128°15′ E) alongside the data acquisition platform, which includes an RGB-D camera, a robotic UGV platform, and a computer.
Figure 3. Representative dataset annotations showcasing the distinct labeling of small (red) and large (green) tree canopies.
Figure 4. Schematic overview of the PP-LiteSeg model architecture, highlighting its lightweight design and optimized structure for efficient semantic segmentation.
Figure 5. Schematic illustration of the FCN model architecture, showcasing its effective handling of variable object sizes while preserving key spatial details for robust semantic segmentation.
Figure 6. Training loss curves of PP-LiteSeg_STDC1, PP-LiteSeg_STDC2, FCN_STDC1, and FCN_STDC2 over 125 epochs, illustrating their convergence behavior and comparative performance.
Figure 7. Comparative visualization of evaluation metrics across tree canopy size classes for various model–backbone configurations, illustrating (a) Intersection-over-Union (IoU), (b) Precision (P), and (c) Recall (R).
Figure 8. Sequential visualization of segmentation results for paired apple canopy classes (small and medium, medium and large, and small and large) from left to right, depicting (a) raw input images, (b) ground truth annotations, and predictions from (c) PP-LiteSeg_STDC1, (d) PP-LiteSeg_STDC2, (e) FCN_STDC1, and (f) FCN_STDC2.
Table 1. Technical specifications for the Intel RealSense D435i RGB-D camera and AgileX Bunker robotic platform.
| Product Name | Intel RealSense D435i | AgileX Bunker |
|---|---|---|
| Size | 90 × 25 × 25 mm | 1020 × 760 × 360 mm |
| Weight | 0.72 kg | 130 kg |
| Battery | Powered via USB-C | 48 V/30 Ah lithium battery |
| Maximum angle (degrees) | Not applicable | 35° |
| Speed range | Not applicable | 0–15 m/s |
| Receiver | Not applicable | 2.4 GHz/max distance 1 km |
| Communication interface | USB-C | CAN |
| Field of view (FOV) | Horizontal: 87°; vertical: 58°; diagonal: 95° | Not applicable |
| Brushless servo motor | Not applicable | 2 × 650 W |
| Resolution | Depth: 1280 × 720 pixels; RGB: 1920 × 1080 pixels | Not applicable |
| Frame rate | Depth: up to 90 fps; RGB: 30 fps | Not applicable |
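The acquisition settings in Table 1 map directly onto stream-configuration calls in Intel's RealSense SDK. Below is a minimal illustrative sketch using the pyrealsense2 Python bindings, not the authors' acquisition code; both streams are capped at 30 fps here, reflecting the RGB limit listed in Table 1.

```python
# Minimal sketch (assumption: pyrealsense2 bindings; not the authors' code)
# of opening the D435i streams at the resolutions listed in Table 1.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Depth at 1280 x 720 and RGB at 1920 x 1080, both at 30 fps.
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1920, 1080, rs.format.bgr8, 30)

pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    color = np.asanyarray(frames.get_color_frame().get_data())  # H x W x 3 BGR image
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # H x W uint16 depth map
finally:
    pipeline.stop()
```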
Table 2. Comparative evaluation of model–backbone configurations across tree canopy size classes.
| Model | Backbone | Class | IoU | Precision | Recall |
|---|---|---|---|---|---|
| PP-LiteSeg | STDC1 | Background | 0.9525 | 0.9647 | 0.9823 |
| PP-LiteSeg | STDC1 | Small | 0.5020 | 0.7187 | 0.6902 |
| PP-LiteSeg | STDC1 | Medium | 0.4815 | 0.8586 | 0.4399 |
| PP-LiteSeg | STDC1 | Large | 0.7147 | 0.8942 | 0.8323 |
| PP-LiteSeg | STDC2 | Background | 0.9511 | 0.9562 | 0.9045 |
| PP-LiteSeg | STDC2 | Small | 0.4887 | 0.8918 | 0.5106 |
| PP-LiteSeg | STDC2 | Medium | 0.4669 | 0.9376 | 0.4810 |
| PP-LiteSeg | STDC2 | Large | 0.7340 | 0.8770 | 0.8183 |
| FCN | STDC1 | Background | 0.9576 | 0.9712 | 0.9855 |
| FCN | STDC1 | Small | 0.6266 | 0.7735 | 0.7674 |
| FCN | STDC1 | Medium | 0.4706 | 0.9022 | 0.4958 |
| FCN | STDC1 | Large | 0.7476 | 0.8588 | 0.8524 |
| FCN | STDC2 | Background | 0.9570 | 0.9624 | 0.9942 |
| FCN | STDC2 | Small | 0.5912 | 0.9305 | 0.6311 |
| FCN | STDC2 | Medium | 0.4761 | 0.9315 | 0.4933 |
| FCN | STDC2 | Large | 0.7258 | 0.8851 | 0.8014 |
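The per-class scores in Table 2 follow the standard confusion-count definitions of IoU, Precision, and Recall. Below is a minimal NumPy sketch of these computations, assuming integer label masks with an illustrative class encoding (0 = background, 1 = small, 2 = medium, 3 = large) that is not necessarily the authors' own.

```python
# Illustrative per-class IoU/Precision/Recall from integer label masks;
# the class encoding (0 = background, 1/2/3 = small/medium/large) is assumed.
import numpy as np

def class_metrics(pred: np.ndarray, gt: np.ndarray, cls: int):
    tp = np.sum((pred == cls) & (gt == cls))  # true positives
    fp = np.sum((pred == cls) & (gt != cls))  # false positives
    fn = np.sum((pred != cls) & (gt == cls))  # false negatives
    eps = 1e-12  # guards against division by zero when a class is absent
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return iou, precision, recall

# Example: metrics for the "large" canopy class over stacked test-set masks.
# iou, p, r = class_metrics(pred_masks, gt_masks, cls=3)
```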
Table 3. Comparative performance of various methods for apple tree canopy detection and segmentation.
| Method | Object | Accuracy | Speed | Reference |
|---|---|---|---|---|
| RetinaNet | Detection (orchard scene) | P: 0.79; R: 0.65 | No evaluation | [52] |
| DeepForest | Detection and delineation | P: 0.59; R: 0.46 | No evaluation | [53] |
| Detectree2 | Detection and delineation | P: 0.66; R: 0.50 | No evaluation | [53] |
| Semi-supervised | Detection | IoU: 0.5; P: 0.69; R: 0.61 | No evaluation | [54] |
| FCN_STDC1 | Segmentation and classification of different sizes | IoU: 0.70; P: 0.88; R: 0.78 | 27.8 FPS | This study |
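The 27.8 FPS figure reported for FCN_STDC1 in Table 3 corresponds to inference throughput, i.e., forward passes per second. A minimal timing sketch is shown below; `segment` is a hypothetical stand-in for the trained model's inference call, not an API from this study.

```python
# Illustrative FPS measurement; `segment` is a hypothetical inference callable.
import time
import numpy as np

def measure_fps(segment, image: np.ndarray, n_runs: int = 100, warmup: int = 10) -> float:
    for _ in range(warmup):          # warm-up iterations exclude one-off setup cost
        segment(image)
    start = time.perf_counter()
    for _ in range(n_runs):
        segment(image)
    return n_runs / (time.perf_counter() - start)
```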
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
