Abstract
This paper presents a hybrid pipeline based on zero-shot vision models for automatic node count estimation in Lisianthus (Eustoma grandiflorum) cultivation, together with a system for real-time sharing of growth information. The multistage image analysis pipeline integrates Grounding DINO for zero-shot leaf-region detection, MiDaS for monocular depth estimation, and a YOLO-based classifier, and operates on daily time-lapse images from low-cost fixed cameras in commercial greenhouses. The model parameters are derived from field measurements of crops grown in the 2024 season (Trial 1) and then applied to different cropping seasons, growers, and cultivars (Trials 2 and 3) without any additional retraining. Trial 1 shows high accuracy (R² = 0.930, mean absolute error (MAE) = 0.73 nodes). Generalization performance is confirmed in Trial 2 (MAE = 0.45 nodes) and Trial 3 (MAE = 1.14 nodes); across multiple growers and four cultivars, the MAE remains at approximately 1 node. The model captures growth progression despite variations in lighting, plant architecture, and grower practices, although errors increase during early growth stages and when leaf detection is unstable. Furthermore, an automated Discord-based notification system enables real-time sharing of node trends and analytical images, facilitating communication. These results demonstrate the feasibility of combining zero-shot vision models with cloud-based communication tools for sustainable and collaborative floricultural production.
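To illustrate the accuracy metrics reported above, the following minimal sketch computes MAE and R² for a set of estimated node counts against manual counts. The node counts are made-up values for illustration, not data from the trials, and the function names are hypothetical. R² is taken here as the coefficient of determination, which is an assumption about how the reported value was computed (a squared Pearson correlation would give a different number in general).

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted node counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot


# Hypothetical manual node counts and pipeline estimates for five plants.
observed = [4, 6, 8, 10, 12]
predicted = [5, 6, 7, 10, 13]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f} nodes, R^2 = {r2:.3f}")
```

An MAE near 1 in this setting means the estimate is, on average, about one node away from the manual count.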