GL-SeqNet: Global–Local Fusion for Intra-Cluster Tomato Harvesting Sequence Optimization
Abstract
1. Introduction
- (1) A novel GL-SeqNet model is proposed that integrates feature information from the entire tomato cluster and individual tomatoes to realize stepwise optimal target selection.
- (2) An AI-based image object-removal tool is introduced to simulate intra-cluster structural changes in tomatoes after each harvesting action, enabling the construction of a sequence state evolution dataset that better aligns training with real-world conditions.
- (3) A systematic comparison is conducted among RankNet, ListNet, and mean squared error (MSE) loss functions, as well as different input resolutions, to analyze their effects on ranking quality and system real-time performance.
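The state-evolution idea in contribution (2) can be illustrated with a small data-structure sketch. The object-removal step itself is image editing and is not shown. The sketch only assumes that each annotated cluster carries a ground-truth harvesting order, and shows how one such order could be expanded into per-step states in which previously harvested fruits have been erased and the next fruit to pick is the label. The function name `sequence_states` and the dictionary layout are hypothetical and are not taken from the paper.

```python
def sequence_states(order):
    """Expand one ground-truth harvesting order into per-step training states.

    Hypothetical layout: `order` lists fruit IDs in the annotated picking order.
    At step k, fruits order[:k] are assumed to have been erased from the image
    (e.g. by an object-removal tool), and order[k] is the next target.
    """
    states = []
    for step, target in enumerate(order):
        remaining = order[step:]  # fruits still visible in the cluster image
        states.append({
            "step": step,
            # Sorted so the candidate list does not leak the ground-truth order
            "remaining": sorted(remaining),
            "target": target,
        })
    return states


if __name__ == "__main__":
    for s in sequence_states(["f3", "f1", "f5", "f2", "f4"]):
        print(s)
```

Each annotated cluster then yields one state per harvesting action rather than a single sample, which is one way to read "sequence state evolution".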
2. Materials and Methods
2.1. Image Acquisition
2.2. Dataset Construction
2.3. GL-SeqNet Model
2.3.1. GL-SeqNet Model Architecture
- (1) Input and dual-stream encoding: The cluster-cropped image is used as the input to the global branch, while the local images of five mature fruits are provided as inputs to the local branch. Features are extracted by two lightweight backbones, a global backbone and a local backbone, which produce a global feature vector and one local feature vector per candidate fruit, respectively. Leaky ReLU is used as the activation function, and dropout with a rate of 0.2 is applied. Each backbone extracts features through four cascaded residual blocks. Each residual block comprises two 3 × 3 convolutional layers (Conv), each followed by batch normalization (BN) and a Leaky ReLU activation, and its skip connection is configured according to the input and output dimensions and the stride. The channel width is progressively expanded to 16, 32, and 64, providing sufficient representational capacity while the residual connections alleviate vanishing gradients.
- (2) Global and local feature fusion: After feature extraction, the global and local features are fused by concatenation (cat). Specifically, the global feature vector of shape [B, D] is broadcast along the candidate dimension to [B, N, D] and concatenated with the local features along the feature (channel) dimension, yielding a fused representation of shape [B, N, 2D], where B is the batch size, N is the number of candidate fruits (five in this setup), and D is the feature dimension.
- (3) Ranking head: The fused representation is passed to a fully connected layer that outputs a priority score for each candidate fruit; a larger score indicates a higher harvesting priority. During training, the scores can be optimized with RankNet [41], ListNet [42,43], or MSE [44] losses. During inference, the candidates are sorted by score in descending order to generate the intra-cluster harvesting sequence.
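The fusion and ranking steps described above can be sketched in NumPy to make the tensor shapes concrete. This is a minimal sketch, not the authors' implementation: it assumes the backbones have already produced a global feature of shape [B, D] and local features of shape [B, N, D], and it models the fully connected head as a single-output linear layer with hypothetical weights `w` and bias `b`. The function names `fuse` and `rank` are illustrative, and D = 64 in the demo is only an assumption based on the final channel width. In PyTorch the same broadcast would typically be written with `Tensor.expand` followed by `torch.cat`.

```python
import numpy as np


def fuse(global_feat, local_feats):
    """Concatenate the global feature with every candidate's local feature.

    global_feat: [B, D], local_feats: [B, N, D] -> fused: [B, N, 2D]
    """
    B, N, D = local_feats.shape
    # Broadcast the global vector along the candidate axis: [B, D] -> [B, N, D]
    g = np.broadcast_to(global_feat[:, None, :], (B, N, D))
    # Concatenate along the feature axis
    return np.concatenate([g, local_feats], axis=-1)


def rank(fused, w, b):
    """Hypothetical single-output linear head plus descending sort.

    fused: [B, N, 2D], w: [2D], b: scalar -> scores [B, N], order [B, N]
    """
    scores = fused @ w + b  # one priority score per candidate fruit
    # Higher score = higher priority, so sort the negated scores ascending;
    # a stable sort breaks ties in favour of the lower candidate index.
    order = np.argsort(-scores, axis=-1, kind="stable")
    return scores, order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, N, D = 2, 5, 64
    fused = fuse(rng.standard_normal((B, D)), rng.standard_normal((B, N, D)))
    scores, order = rank(fused, rng.standard_normal(2 * D), 0.0)
    print(fused.shape, scores.shape)  # prints (2, 5, 128) (2, 5)
```

Concatenation keeps the cluster-level context identical for every candidate, so differences in the scores within one cluster come from the local features and from how the head weighs them against the shared global context.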
2.3.2. Model Loss Function
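The three training objectives named for the ranking head (RankNet, ListNet, and MSE) can be sketched for a single cluster using their standard textbook forms: RankNet as a pairwise logistic loss over every pair in which one fruit should be ranked above another, ListNet as the cross-entropy between the top-one probability distributions (softmax) of the targets and of the predicted scores, and MSE as a direct regression of the scores. This is an illustrative sketch under stated assumptions, not the paper's exact formulation: the conversion of the ground-truth harvesting order into graded relevance (`relevance_from_order`, first-picked fruit = N − 1, last = 0) is a hypothetical choice, and any scaling factors or batching used by the authors are not reproduced.

```python
import numpy as np


def relevance_from_order(order):
    """Hypothetical target: fruit picked first gets N-1, fruit picked last gets 0."""
    n = len(order)
    rel = np.empty(n)
    for pos, idx in enumerate(order):
        rel[idx] = n - 1 - pos
    return rel


def ranknet_loss(scores, rel):
    """Pairwise logistic loss averaged over pairs (i, j) with rel[i] > rel[j].

    Assumes at least two distinct relevance levels (otherwise no pairs exist).
    """
    diff = scores[:, None] - scores[None, :]  # s_i - s_j for every pair
    mask = rel[:, None] > rel[None, :]        # i should be ranked above j
    # log(1 + exp(-(s_i - s_j))), computed stably with logaddexp
    return np.logaddexp(0.0, -diff[mask]).mean()


def _log_softmax(x):
    x = x - x.max()  # shift for numerical stability
    return x - np.log(np.exp(x).sum())


def listnet_loss(scores, rel):
    """Cross-entropy between top-one probabilities of targets and scores."""
    p_target = np.exp(_log_softmax(rel))
    return -(p_target * _log_softmax(scores)).sum()


def mse_loss(scores, rel):
    """Pointwise regression of the scores onto the relevance targets."""
    return np.mean((scores - rel) ** 2)
```

For a cluster whose annotated order is `[2, 0, 1]`, `relevance_from_order` gives `[1, 0, 2]`; scores that respect this ordering give a lower RankNet and ListNet loss than reversed scores, whereas MSE additionally penalizes the absolute score values, which is one plausible reason the ranking-based losses fit a priority-ordering task better.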
2.4. Evaluation Metrics
2.5. Experimental Setups
3. Results
3.1. Experiments on Different Resolutions and Loss Functions
3.2. Comparison of Different Global and Local Resolution Combinations
4. Discussion
- (1) This study focused on static image-based ranking. It did not systematically assess robustness under dynamic conditions such as wind disturbances and foliage motion, which are common in practical greenhouse environments.
- (2) Although GL-SeqNet effectively models intra-cluster harvesting sequence optimization, it does not incorporate joint optimization with related tasks such as manipulator grasp pose estimation.
- (3) The current dataset scale remains relatively limited and was collected under specific greenhouse conditions. In addition, part of the dataset was constructed using AI-based object removal to simulate post-harvesting cluster states, which may still differ from real harvesting scenarios. Therefore, further validation using multi-greenhouse, multi-device, and real sequential harvesting datasets is still required to comprehensively evaluate the robustness and generalization capability of the proposed method.
- (4) The current framework focuses on image-level intra-cluster harvesting sequence prediction and does not yet integrate key robotic components such as grasp pose estimation, motion planning, and collision avoidance. As a result, the proposed method should be regarded as a decision-level ranking module rather than a fully deployed robotic harvesting system. This also limits the direct evaluation of end-to-end robotic performance in real-world harvesting tasks.
5. Conclusions
- (1) With respect to resolution, low-resolution inputs maintain, and in some configurations improve, ranking accuracy while substantially reducing computational cost (FLOPs); the measured inference time, however, decreases only slightly (for RankNet, from 22.8 ms at 224 × 224 to 22.3 ms at 56 × 56).
- (2) With respect to loss functions, the ranking-based RankNet and ListNet losses substantially outperform MSE (Top-1 accuracy of 0.925–0.950 versus 0.850–0.900), better capturing the priority relationships among fruits within a cluster.
- (3) The best overall performance was achieved with a global resolution of 112 × 112 and a local resolution of 56 × 56 using the RankNet loss, yielding a Top-1 accuracy of 0.950, a PMR of 0.970, and an inference time of only 22.6 ms, together with faster convergence.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Mal, S.; Sarkar, D.; Mandal, B.; Basak, P.; Debnath, S.; Chattopadhyay, A.; Pramanik, K. Improving quality of tomato (Solanum lycopersicum L.) fruits for fresh consumption and processing with optimised boron application. J. Food Compos. Anal. 2025, 140, 107255.
- [2] Szabo, K.; Varvara, R.A.; Ciont, C.; Macri, A.M.; Vodnar, D.C. An updated overview on the revalorization of bioactive compounds derived from tomato production and processing by-products. J. Clean. Prod. 2025, 497, 145–151.
- [3] Zhang, J.; Xiang, L.; Liu, Y.; Jing, D.; Zhang, L.; Liu, Y.; Li, J. Optimizing irrigation schedules of greenhouse tomato based on a comprehensive evaluation model. Agric. Water Manag. 2024, 295, 108741.
- [4] Hidgot, A.; Zeweld, W. Measuring and expounding technical and cost efficiencies of smallholder tomato producers in Northern Ethiopia. Clean. Circ. Bioecon. 2024, 9, 100124.
- [5] Nguyen, G.N.; Singh, Z. Recent advances in research and development for vegetable crops under protected cultivation. Front. Plant Sci. 2024, 15, 1459919.
- [6] Wang, J.; Shan, C.; Gou, F.; Qian, Z.; Ni, Y.; Liu, Z.; Jin, C. A review of key technologies and intelligent applications in soybean mechanized harvesting: Chinese and international perspectives. Biosyst. Eng. 2025, 50, 79–104.
- [7] Rong, J.; Zheng, W.; Qi, Z.; Yuan, T.; Wang, P. RTMFusion: An enhanced dual-stream architecture algorithm fusing RGB and depth features for instance segmentation of tomato organs. Measurement 2025, 239, 115484.
- [8] Sun, T.; Zhang, W.; Gao, X.; Zhang, W.; Li, N.; Miao, Z. Efficient occlusion avoidance based on active deep sensing for harvesting robots. Comput. Electron. Agric. 2024, 225, 109360.
- [9] Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with You Only Look Once (YOLO) algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 2024, 223, 109090.
- [10] Zhang, Y.; Chuah, J.H. An intelligent grading system for mangosteen based on improved convolutional neural network. Knowl. Based Syst. 2025, 309, 112904.
- [11] Zhang, F.; Jin, X.; Jiang, J.; Lin, G.; Wang, M.; An, S.; Lyu, Q. Fine-grained recognition of citrus varieties via wavelet channel attention network. Knowl. Based Syst. 2025, 311, 113128.
- [12] Bhattarai, U.; Karkee, M. A weakly-supervised approach for flower/fruit counting in apple orchards. Comput. Ind. 2022, 138, 103635.
- [13] Ren, Z.; Tang, X.; Ren, G.; Wu, D. Research on improved fast-RCNN target detection algorithm based on Kolmogorov-Arnold network. Appl. Intell. 2026, 56, 63.
- [14] Chen, Y.; Xie, X.; Yin, W.; Li, B.A.; Li, F. Structure guided network for human pose estimation. Appl. Intell. 2023, 53, 21012–21026.
- [15] Jiang, Z.; An, T.; Tong, Z.; Li, Z.; Du, Y.; Xie, T.; Li, R. MTF-Net: A mediator transformer-based fusion network with MOE for 6D object pose estimation. Knowl. Based Syst. 2025, 330, 114674.
- [16] Wu, S.; Wang, B. DRSI-Net: Dual-residual spatial interaction network for multi-person pose estimation. Knowl. Based Syst. 2024, 295, 111836.
- [17] Żywanowski, K.; Łysakowski, M.; Nowicki, M.R.; Jacques, J.T.; Tadeja, S.K.; Bohné, T.; Skrzypczyński, P. Vision-based hand pose estimation methods for augmented reality in industry: Crowdsourced evaluation on HoloLens 2. Comput. Ind. 2025, 171, 104328.
- [18] Govi, E.; Sapienza, D.; Toscani, S.; Cotti, I.; Franchini, G.; Bertogna, M. Addressing challenges in industrial pick and place: A deep learning-based 6 degrees-of-freedom pose estimation solution. Comput. Ind. 2024, 161, 104130.
- [19] Bai, Y.; Mao, S.; Zhou, J.; Zhang, B. Clustered tomato detection and picking point location using machine learning-aided image analysis for automatic robotic harvesting. Precis. Agric. 2023, 24, 727–743.
- [20] Zang, Q.; Zhang, J.; Bo, L.; Xiao, Y.; Gao, G.; Zhang, H.; Ren, Y. A fully automatic adjacent key-points localization framework for minimal repeated pattern detection in printed fabric images. Knowl. Based Syst. 2024, 300, 112157.
- [21] Wu, J.; Lee, H.J. Optimizing offset-regression by relay point for bottom-up human pose estimation. Appl. Intell. 2023, 53, 30535–30551.
- [22] Duan, W.; Wang, F.; Li, H.; Liu, N.; Fu, X. Lameness detection in dairy cows from overhead view: High-precision keypoint localization and multi-feature fusion classification. Front. Vet. Sci. 2025, 12, 1675181.
- [23] Zhao, G.; Dong, S.; Wen, J.; Ban, Y.; Zhang, X. Selective fruit harvesting prediction and 6D pose estimation based on YOLOv7 multi-parameter recognition. Comput. Electron. Agric. 2025, 229, 109815.
- [24] Liu, L.; Li, G.; Du, Y.; Li, X.; Wu, X.; Qiao, Z.; Wang, T. CS-Net: Conv-SimpleFormer network for agricultural image segmentation. Pattern Recognit. 2024, 147, 110140.
- [25] Zhao, R.; Zhu, Y.; Li, Y. An end-to-end lightweight model for grape and picking point simultaneous detection. Biosyst. Eng. 2022, 223, 174–188.
- [26] Li, H.; He, Z.; Wang, Y.; Ding, X.; Cui, Y. Research on the mechanized harvesting strategy for clustered kiwi fruits based on deep reinforcement learning. Comput. Electron. Agric. 2025, 237, 110686.
- [27] Wee, B.S.; Chin, C.S.; Sharma, A. Survey of mushroom harvesting agricultural robots and systems design. IEEE Trans. AgriFood Electron. 2024, 2, 59–80.
- [28] Lin, G.; Xiong, J.; Zhao, R.; Li, X.; Hu, H.; Zhu, L.; Zhang, R. Efficient detection and picking sequence planning of tea buds in a high-density canopy. Comput. Electron. Agric. 2023, 213, 108213.
- [29] Wang, X.; Zhou, J.; Xu, Y.; Liu, Z. Research on low-loss and high-efficiency picking sequence planning of safflower filaments based on improved deep reinforcement learning. Comput. Electron. Agric. 2025, 237, 110692.
- [30] Dai, N.; Fang, J.; Yuan, J.; Liu, X. 3MSP2: Sequential picking planning for multi-fruit congregated tomato harvesting in multi-clusters environment based on multi-views. Comput. Electron. Agric. 2024, 225, 109303.
- [31] Huang, Y.; Lyu, B.; Gao, T.; Wu, X.; Duan, Y. CornMFN: A multimodal fusion network for corn phenology stage identification. Smart Agric. Technol. 2025, 12, 101202.
- [32] Liu, C.; Feng, Q.; Sun, Y.; Li, Y.; Ru, M.; Xu, L. YOLACTFusion: An instance segmentation method for RGB-NIR multimodal image fusion based on an attention mechanism. Comput. Electron. Agric. 2023, 213, 108186.
- [33] Jiang, T.; Li, Y.; Li, Y.; Xing, W.; Yu, M.; Xie, F.; Ta, D. A segmentation knowledge-based global-local attention network for tumor classification in breast ultrasound images. Pattern Recognit. 2025, 171, 112152.
- [34] Restrepo-Arias, J.F.; Branch-Bedoya, J.W.; Awad, G. Image classification on smart agriculture platforms: Systematic literature review. Artif. Intell. Agric. 2024, 13, 1–17.
- [35] Chin, R.; Catal, C.; Kassahun, A. Plant disease detection using drones in precision agriculture. Precis. Agric. 2023, 24, 1663–1682.
- [36] Wan, P.; Toudeshki, A.; Tan, H.; Ehsani, R. A methodology for fresh tomato maturity detection using computer vision. Comput. Electron. Agric. 2018, 146, 43–50.
- [37] Lei, L.; Yang, Q.; Yang, L.; Shen, T.; Wang, R.; Fu, C. Deep learning implementation of image segmentation in agricultural applications: A comprehensive review. Artif. Intell. Rev. 2024, 57, 1.
- [38] Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic segmentation of agricultural images: A survey. Inf. Process. Agric. 2024, 11, 172–186.
- [39] Liu, F.; Liu, H.; Wu, Q.; Han, Z.; Pang, S.; Wang, S.; Zhao, L. Pod-Pose: An efficient top-down keypoint detection model for fine-grained pod phenotyping in mature soybean. Plant Methods 2025, 21, 82.
- [40] Zhang, F.; Gao, J.; Song, C.; Zhou, H.; Zou, K.; Xie, J.; Zhang, J. TPMv2: An end-to-end tomato pose method based on 3D keypoints detection. Comput. Electron. Agric. 2023, 210, 107878.
- [41] Burges, C.J. From RankNet to LambdaRank to LambdaMART: An overview. Learning 2010, 11, 81.
- [42] Cao, Z.; Qin, T.; Liu, T.-Y.; Tsai, M.-F.; Li, H. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 129–136.
- [43] Buyl, M.; Missault, P.; Sondag, P.A. RankFormer: Listwise learning-to-rank using listwide labels. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 3762–3773.
- [44] Hasan, M.; Marjan, M.A.; Uddin, M.P.; Afjal, M.I.; Kardy, S.; Ma, S.; Nam, Y. Ensemble machine learning-based recommendation system for effective prediction of suitable agricultural crop cultivation. Front. Plant Sci. 2023, 14, 1234555.
- [45] Bera, A.; Krejcar, O.; Bhattacharjee, D. Rafa-Net: Region attention network for food items and agricultural stress recognition. IEEE Trans. AgriFood Electron. 2024, 3, 121–133.








| Global Resolution | Local Resolution | Loss Function | Top-1 | PMR | Time/ms |
|---|---|---|---|---|---|
| 224 × 224 | 224 × 224 | RankNet | 0.925 | 0.950 | 22.8 |
| 112 × 112 | 112 × 112 | RankNet | 0.925 | 0.960 | 22.6 |
| 56 × 56 | 56 × 56 | RankNet | 0.950 | 0.970 | 22.3 |
| 224 × 224 | 224 × 224 | ListNet | 0.925 | 0.950 | 22.9 |
| 112 × 112 | 112 × 112 | ListNet | 0.925 | 0.950 | 22.7 |
| 56 × 56 | 56 × 56 | ListNet | 0.925 | 0.950 | 22.2 |
| 224 × 224 | 224 × 224 | MSE | 0.850 | 0.920 | 23.0 |
| 112 × 112 | 112 × 112 | MSE | 0.850 | 0.925 | 22.8 |
| 56 × 56 | 56 × 56 | MSE | 0.900 | 0.945 | 22.3 |

| Global Resolution | Local Resolution | Loss Function | Top-1 | PMR | Time/ms | FLOPs/G |
|---|---|---|---|---|---|---|
| 224 × 224 | 56 × 56 | RankNet | 0.950 | 0.970 | 22.7 | 2.567 |
| 112 × 112 | 56 × 56 | RankNet | 0.950 | 0.970 | 22.6 | 1.100 |
| 56 × 56 | 56 × 56 | RankNet | 0.950 | 0.970 | 22.3 | 0.734 |
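
With the local resolution fixed at 56 × 56, the FLOPs in the table above closely follow the quadratic dependence of convolution cost on input side length. The short calculation below is a sketch that assumes the global-branch cost scales with H × W while the local-branch and ranking-head costs stay constant across rows; it fits the global-branch cost from the 224 × 224 and 56 × 56 rows and then predicts the 112 × 112 row. It predicts about 1.101 G against the reported 1.100 G, and the fitted local-branch cost comes out at roughly five times the fitted global-branch cost at 56 × 56, which is consistent with (though not proof of) five 56 × 56 fruit crops passing through a backbone of similar size. The measured inference time, by contrast, varies by only 0.4 ms across these rows. The names `g56`, `local`, and `predicted` are illustrative only.

```python
# FLOPs (G) reported for RankNet with the local resolution fixed at 56 x 56,
# keyed by the global input side length
reported = {224: 2.567, 112: 1.100, 56: 0.734}

# Assumption: global-branch FLOPs = g56 * (side / 56) ** 2, and everything
# else (local branch + ranking head) is a constant `local`.
# The 224 vs 56 difference then equals (16 - 1) * g56.
g56 = (reported[224] - reported[56]) / ((224 / 56) ** 2 - 1)
local = reported[56] - g56


def predicted(side):
    """Predicted total FLOPs (G) for a given global input side length."""
    return local + g56 * (side / 56) ** 2


for side in (224, 112, 56):
    print(f"{side}x{side}: predicted {predicted(side):.3f} G, "
          f"reported {reported[side]:.3f} G")
print(f"local / global cost at 56x56: {local / g56:.2f}")
```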

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Meng, Z.; Du, S.; Wang, B.; Pan, J.; Hu, D.; Du, X.; Yang, Q. GL-SeqNet: Global–Local Fusion for Intra-Cluster Tomato Harvesting Sequence Optimization. Agriculture 2026, 16, 1322. https://doi.org/10.3390/agriculture16121322
Meng Z, Du S, Wang B, Pan J, Hu D, Du X, Yang Q. GL-SeqNet: Global–Local Fusion for Intra-Cluster Tomato Harvesting Sequence Optimization. Agriculture. 2026; 16(12):1322. https://doi.org/10.3390/agriculture16121322
Meng, Zhichao, Shan Du, Bo Wang, Jun Pan, Dong Hu, Xiaoqiang Du, and Qinghua Yang. 2026. "GL-SeqNet: Global–Local Fusion for Intra-Cluster Tomato Harvesting Sequence Optimization" Agriculture 16, no. 12: 1322. https://doi.org/10.3390/agriculture16121322
Meng, Z., Du, S., Wang, B., Pan, J., Hu, D., Du, X., & Yang, Q. (2026). GL-SeqNet: Global–Local Fusion for Intra-Cluster Tomato Harvesting Sequence Optimization. Agriculture, 16(12), 1322. https://doi.org/10.3390/agriculture16121322

