Article

Hybrid Zero-Shot Node-Count Estimation and Growth-Information Sharing for Lisianthus (Eustoma grandiflorum) Cultivation in Fukushima’s Floricultural Revitalization

1 Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
2 Hama Agricultural Regeneration Research Centre, Fukushima Agricultural Technology Centre, Minamisoma 975-0036, Japan
3 Division of Field Cropping and Horticulture Research, Tohoku Agricultural Research Center, National Agriculture and Food Research Organization (NARO), Morioka 020-0198, Japan
4 Agricultural Radiation Research Center, Tohoku Agricultural Research Center, National Agriculture and Food Research Organization (NARO), Fukushima 960-2156, Japan
* Authors to whom correspondence should be addressed.
Agriculture 2026, 16(3), 296; https://doi.org/10.3390/agriculture16030296
Submission received: 24 December 2025 / Revised: 13 January 2026 / Accepted: 21 January 2026 / Published: 23 January 2026

Abstract

This paper presents a hybrid pipeline based on zero-shot vision models for automatic node count estimation in Lisianthus (Eustoma grandiflorum) cultivation and a system for real-time growth information sharing. The multistage image analysis pipeline integrates Grounding DINO for zero-shot leaf-region detection, MiDaS for monocular depth estimation, and a YOLO-based classifier, using daily time-lapse images from low-cost fixed cameras in commercial greenhouses. The model parameters are derived from field measurements of 2024 seasonal crops (Trial 1) and then applied to different cropping seasons, growers, and cultivars (Trials 2 and 3) without any additional retraining. Trial 1 indicates high accuracy (R2 = 0.930, mean absolute error (MAE) = 0.73). Generalization performance is confirmed in Trials 2 (MAE = 0.45) and 3 (MAE = 1.14); reproducibility across multiple growers and four cultivars yields MAEs within approximately one node. The model effectively captures the growth progression despite variations in lighting, plant architecture, and grower practices, although errors increase during early growth stages and under unstable leaf detection. Furthermore, an automated Discord-based notification system enables real-time sharing of node trends and analytical images, facilitating communication. The feasibility of combining zero-shot vision models with cloud-based communication tools for sustainable and collaborative floricultural production is thus demonstrated.

1. Introduction

1.1. Background and Research Context

Since the Great East Japan Earthquake, agricultural reconstruction and support for new farmers have become key issues in the revitalization of the Hamadori coastal region in Fukushima Prefecture [1]. In particular, in the Sōsō area, the resumption of farming after the disaster has been accompanied by an increasing number of new entrants to cut-flower production [2]. Lisianthus (Eustoma grandiflorum) is one of the major cut flower crops owing to its relatively high market unit price. In the Japanese cut flower market, Lisianthus ranks ninth, with an annual production value of 10 billion yen and a cultivation area of 400 ha [3]. Internationally, it is recognized as one of the world’s top ten cut flowers and is widely supported by consumers owing to its excellent vase life and diverse coloration [4]. However, for inexperienced growers, the cultivation of Lisianthus poses challenges stemming from skill gaps in growth diagnosis and management decision-making [5]. Consequently, a system that enables the sharing of objective growth indicators and provides technical support is required.
Flower crops, such as Lisianthus, exhibit distinct developmental stages according to the progression of node numbers. Node count is an effective monitoring index for describing crop growth and development [6]. This parameter is widely used in production areas as an indicator of early or delayed growth, as well as the timing of irrigation and temperature management. It also serves as a crucial indicator directly linked to pinching schedules and flowering prediction. Conventionally, node counts have been assessed through on-site observation or manual measurement; however, these methods are time-consuming, labor-intensive, and prone to observer subjectivity. Moreover, quantitative growth information based on standardized criteria is essential for comparing growth across multiple fields and providing cultivation guidance. For these reasons, developing technologies that enable objective and automatic evaluation of crop growth is an important challenge from the perspective of reducing the labor burden and standardizing cultivation techniques.
In recent years, advances in smart agriculture have drawn attention to crop-monitoring methods utilizing AI and image analysis technologies. In particular, the use of fixed cameras installed inside greenhouses to automatically collect growth images enables long-term, non-contact, and low-cost observations, making practical adoption at production sites increasingly feasible [7]. However, such image-based analyses often require large amounts of training data and annotation, and constructing generalized models is difficult for horticultural crops with diverse varieties and morphological traits. To address this issue, recently developed vision-language models (VLMs) are expected to serve as general-purpose growth analysis methods with zero-shot capability, leveraging knowledge learned from large-scale datasets to identify previously unseen targets without annotation [8,9]. This is particularly promising in agriculture, where the collected data inherently include variations arising from differences in growers’ production environments, facility structures, lighting conditions at the time of imaging, and crop architecture, depending on the cultivar or management practices [10]. Zero-shot models capable of integrating data across heterogeneous domains have significant potential for improving generalizability.
Furthermore, the mechanisms by which growth data collected on individual farms are shared among stakeholders and used to support decision-making for farm management are of critical importance. Advances in decentralized information technology have enabled the acquisition and analysis of large-scale datasets. Among the various forms of data sharing, effective decision-support systems have been reported to facilitate interactions between farmers’ learning needs and reliable advisors [11]. Although small-scale farmers generally recognize the necessity of information-sharing using ICT, limited ICT skills often constitute a major barrier to adoption [12]. Therefore, beyond simply enabling the technical handling of data, an environment must be established in which farmers can mutually learn, access necessary support, and make practical use of shared information. In particular, to enhance cultivation skills within a local production community, an “information-sharing infrastructure” is indispensable for circulating knowledge by sharing field-collected data in real-time among experts, growers, and extension officers.
Against this background, the present study aimed to integrate AI-based node-count estimation with digital communication tools for sharing growth information and to conduct a practical demonstration of labor-saving and sustainable technical support in Lisianthus cultivation.

1.2. Related Research

Research on automatic evaluation of crop growth has progressed rapidly with the development of smart agriculture. Digital phenotyping, which acquires morphological and physiological information of plants in a nondestructive and high-frequency manner using image analysis, has been highlighted as a fundamental technology that contributes to advanced breeding and cultivation management, and its importance has been emphasized in numerous review papers [13,14]. In recent years, with the advancement of deep learning technologies, research has expanded beyond feature extraction at a single point in time to the quantitative analysis of growth processes using time-series images [15]. However, practical challenges such as the processing of large-scale image datasets and the computational burden involved in creating training data have been highlighted [16], and a robust technical foundation that can withstand real-world implementation must be established.
Among the growth-structural traits, the estimation of finer indicators, such as leaf and node numbers, is a domain in which the limitations of supervised learning become more evident. Fan et al. (2022) proposed a method that integrates segmentation and regression models, demonstrating robust leaf-count estimation even in images containing complex leaf overlap [17]. Deb et al. (2024) developed LC-Net, which uses SegNet-based leaf region maps as auxiliary information and achieves high-precision leaf count estimation across multiple rosette plant species [18]. Hu et al. (2023) employed an improved YOLOv5 to detect stem nodes and automatically calculate internode lengths [19]. However, these methods often strongly depend on specific crops or imaging conditions, and a major challenge is the large amount of annotated data required for model construction.
In floricultural crops, in particular, the constraints of supervised learning become even more severe because of the frequent overlap of leaves and buds, as well as high morphological diversity. Attempts to evaluate flower crop growth include the morphological measurements of carnations [20], RoseTracker for the automatic detection of buds and blooming flowers in roses [21], and field-level flowering estimation using UAV imagery [22]. Recent studies have advanced 3D-based modeling approaches for analyzing floral morphological structures [23,24]. Nevertheless, image-based growth evaluation in floriculture remains limited, and research on fine structural traits, such as node and leaf counts, has not accumulated sufficiently.
In domains where preparing large amounts of labeled data for each crop species is impractical, a new framework that can reduce the dependence on training data is required. Recently, zero-shot learning (ZSL) has attracted increasing attention. ZSL is a framework that enables the inference of classes not observed during training by utilizing attribute information or linguistic descriptions. Its applications have been widely discussed, from image classification to object detection [25,26]. In particular, the emergence of vision-language models (VLMs) pretrained on large-scale image–text paired datasets has greatly expanded the ZSL framework to zero-shot object detection (ZSD).
Zero-shot detection models based on VLMs include ViLD [27] and RegionCLIP [28], which were developed through the distillation of CLIP knowledge and demonstrate performances comparable to those of conventional supervised detectors. GLIP [29] and Grounding DINO [30] integrate phrase grounding with object detection to achieve flexible detection in response to arbitrary textual instructions. With these technological advancements, applications of zero-shot detection have also begun to emerge in the agricultural domain, including the detection of blueberry fruits, disease spots, and weeds [31], instance segmentation in plant factories [9], and pest detection in grains [32]. However, applications to floricultural crops remain limited, and there is scarcely any research applying zero-shot methods to estimate fine structures, such as node or leaf counts.

1.3. Research Objectives

Despite recent advances, existing AI-based growth-estimation techniques have not yet established a stable method applicable to floricultural crops. This limitation arises primarily from leaf overlap and variable lighting conditions encountered in real production environments. Moreover, mechanisms for sharing growth information among multiple growers remain insufficient, and existing systems that focus primarily on environmental data cannot handle morphological growth indicators [5,33]. To address this gap, an integrated platform is needed that can quantify crop growth in a labor-saving and generalizable manner and share the results in real time. The objective of this study is to develop a cultivation support system for Lisianthus (Eustoma grandiflorum) that integrates node-count estimation using AI-based image analysis with a mechanism for sharing growth information. Instead of conventional manual observations, this study aims to implement a system that utilizes daily images captured by fixed cameras to visualize and disseminate growth progression objectively.
Specifically, the system integrates three components:
  • low-cost growth monitoring in greenhouses using fixed cameras;
  • node-count estimation using a hybrid pipeline combining zero-shot vision models (Grounding DINO, MiDaS) with a conventional supervised model (YOLO); and
  • automated information sharing among growers, extension officers, and researchers using Discord.
Through this system, we aim to simultaneously support the quantitative assessment and knowledge sharing of crop growth, thereby contributing to the establishment of sustainable and collaborative cultivation techniques for floricultural production. By targeting the coastal region of Fukushima Prefecture, an area affected by the disaster, this study seeks to generate practical insights that contribute to both regional recovery and the transmission and sharing of agricultural expertise.

2. Materials and Methods

2.1. Overview of the Demonstration Trials

In this study, three demonstration trials targeting Lisianthus (Eustoma grandiflorum) were conducted in Namie Town, Fukushima Prefecture, between 2024 and 2025. These trials aimed to develop and validate a node-count estimation algorithm that utilizes zero-shot vision models. To evaluate the performance of the estimation models from multiple perspectives, growth images were collected under differing cropping seasons, growers, and cultivar conditions.

2.1.1. Cultivation Conditions

The demonstration trials were conducted in 2024 and 2025 in greenhouses managed by floricultural growers in Namie Town, Futaba District, Fukushima Prefecture, Japan. All greenhouses were pipe-frame structures covered with PO film and equipped with natural ventilation through side windows. They were equipped with circulation fans to promote air mixing and ensure a uniform temperature. No heating or cooling systems were installed, and the temperature management relied on ambient outdoor conditions. Photoperiod control was implemented in all trial plots using supplemental lighting.
Irrigation was primarily applied through tube irrigation, supplemented with hand watering when necessary, based on pF meters or soil moisture sensors installed by each grower. Compost application and bed shaping were performed 1–2 months prior to transplanting, and the bed height was standardized at 15–20 cm. The planting layout consisted of a four-row configuration with one row removed from the center, with 75 cm bed spacing (65 cm in Trial 2 only), 12 cm row spacing, and 12 cm plant spacing. Black-and-white mulch was used in all trial plots. Pest and disease control were conducted as needed, depending on the incidence, and weeding was performed manually.

2.1.2. Summary of the Trials

An overview of each trial is presented in Table 1. The trials targeted two cropping types, seasonal cropping (spring transplanting and summer harvesting) and retarded cropping (early-summer transplanting and autumn harvesting). The number of participating growers ranged from one to four, with considerable variation in cultivation experience. Grower A was the most experienced producer and played a leading role in the local community. Grower B was a mid-career producer with the second-highest level of experience. In contrast, Grower C was in their second year of cultivation as of 2025, and Grower D was a newly established grower beginning cultivation that same year.
Five cultivars were used: Julius Lavender (JL; KANEKO SEEDS CO., LTD, Maebashi, Gunma, Japan), Celebrich White (CW; Sumika Agrotech Co., Ltd., Osaka, Osaka, Japan), Happiness White (HW; MIYOSHI & CO., LTD., Setagaya, Tokyo, Japan), Largo Marine (LM; MIYOSHI & CO., LTD., Setagaya, Tokyo, Japan), and NF Antique Pink (AP; NAKASONE Lisianthus Inc., Chikuma, Nagano, Japan), with different cultivars selected according to the cropping type. According to the maturity characteristics reported by the respective seed suppliers, JL is classified as early–medium, CW as medium, HW as medium–late, and LM as late. AP is primarily used for retarded cropping in this region.
The three demonstration trials conducted in this study had distinct objectives. The purpose of each trial was as follows.
  • Trial 1: Development of a Node-Estimation Model (2024 Seasonal Cropping Type)
    Trial 1 was conducted to design a leaf-counting algorithm using zero-shot vision models and to optimize the parameters of the node-estimation formula. Fixed cameras installed in greenhouses in Namie Town were used to capture time-series images of Lisianthus growth, and an algorithm for detecting and counting leaf regions was constructed. In this trial, a basic analytical pipeline was established by integrating model components such as background separation, candidate leaf-region detection, and individual-leaf identification. Based on the resulting leaf-count data, regression analysis was performed to estimate the node numbers. The final model and optimized parameters were used as the baseline for Trials 2 and 3.
  • Trial 2: Evaluation of Estimation Accuracy (2024 Retarded Cropping Type)
    Trial 2 was conducted to evaluate the generalization performance of the model developed in Trial 1 across different cropping types and cultivars. In this trial, node-count estimation was applied to images of other cultivars under the retarded cropping type, and the estimation accuracy was validated through comparison with manual measurements.
  • Trial 3: Verification of Reproducibility and Operational Suitability (2025 Seasonal Cropping Type)
    Trial 3 was conducted the following year using the same seasonal cropping type to verify the reproducibility of the estimation method established in Trial 1. In this trial, node-count estimation was performed not only for the cultivar used in model development (HW) but also for three additional cultivars (JL, CW, and LM) and under conditions including the participation of a new grower (Grower D). This allowed us to assess the stability of the model across different cultivars and growers. Additionally, an operational test was conducted to share the estimation results with growers in real-time, enabling an examination of the model’s practicality and potential for implementation in real-world production settings.

2.2. Automatic Image Acquisition System for Growth Monitoring

In this study, an automated image acquisition system was developed to capture and record the growth process of Lisianthus using fixed cameras installed in the field. The overall system configuration is illustrated in Figure 1. The system was designed to automate the entire sequence of processes from image capture to storage and cloud transfer, and to stably collect image data usable for AI analysis at regular time intervals.
In each greenhouse, 1–4 compact network cameras (ATOM Cam2; ATOM Tech Inc., Yokohama, Kanagawa, Japan) were installed. The cameras were fixed at positions that captured side views of the target plants across the work aisle, ensuring that images were taken from a consistent angle throughout the trial period. The distance between the camera and the plants corresponded to a bed spacing of 75 cm (65 cm in Trial 2), and the camera height was fixed at 25 cm above the substrate surface. Although 10–13 plants were included in each imaging plot, the cropping preprocessing described later standardized the final analysis area to a field of view corresponding to ten plants.
Each camera was connected via Wi-Fi to a local network and was configured to be accessible from a Raspberry Pi (Models 3B, 3B+, and 3A+; Raspberry Pi Ltd., Cambridge, UK) on the same network. The Raspberry Pi polled the IP address of each camera and retrieved still images using the RTSP protocol. The hourly scheduled image capture was automated using the cron scheduler implemented in Raspberry Pi OS (Raspberry Pi Ltd., Cambridge, UK). Images were saved in the JPEG format at a resolution of 1920 × 1080 pixels (Full HD). Natural daylight was used for illumination, without artificial lighting. To mitigate the effects of diurnal and day-to-day variations in lighting intensity, representative daytime images were selected at a fixed time each day, as described in Section 2.3.1. In addition, camera parameters including exposure, shutter speed, ISO sensitivity, and white balance were maintained at default automatic settings to adapt to gradual changes in ambient light conditions. Foggy or severely hazy conditions were rare during the experimental periods and were not treated as a separate factor in the analysis. However, under such low-contrast conditions, leaf detection performance may be degraded, which is recognized as a limitation of the current system and discussed in Section 4.8.
Captured images were saved in automatically generated date-based folders for each camera and subsequently uploaded to a designated directory on Google Drive (Google LLC, Mountain View, CA, USA). The Google Drive API (Google LLC, Mountain View, CA, USA) was used to upload and automate cloud data transfer from the Raspberry Pi [34]. This enabled the centralized cloud-based management of image data collected across different growers’ fields and provided an environment for direct access and processing from Google Colaboratory (Google LLC, Mountain View, CA, USA). Filenames were automatically assigned camera IDs and timestamps to facilitate the matching of grower, cultivar, and capture times during subsequent leaf-counting and node-estimation processes.
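The date-based folder layout and ID/timestamp filename convention can be sketched as a small helper. The exact pattern below (`<root>/<camera_id>/<YYYY-MM-DD>/<camera_id>_<YYYYMMDD_HHMMSS>.jpg`) is an illustrative assumption; the paper states only that camera IDs and timestamps are embedded automatically so that grower, cultivar, and capture time can be matched in later processing steps.

```python
from datetime import datetime
from pathlib import Path

def frame_path(root: str, camera_id: str, ts: datetime) -> Path:
    """Build the date-based folder and timestamped filename for one still image.

    Layout and pattern are hypothetical stand-ins for the conventions
    described in Section 2.2 (camera ID + capture timestamp in each name).
    """
    folder = Path(root) / camera_id / ts.strftime("%Y-%m-%d")
    return folder / f"{camera_id}_{ts.strftime('%Y%m%d_%H%M%S')}.jpg"

# One frame captured by camera "cam03" at 10:00 on 1 May 2024:
p = frame_path("/data/frames", "cam03", datetime(2024, 5, 1, 10, 0, 0))
```

A scheme like this keeps filenames sortable by time and makes the Google Drive directory tree self-describing for the downstream Colaboratory notebook.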
This image acquisition system enabled continuous recording of plant growth without manual intervention and established a foundation for the high-frequency and high-accuracy data collection necessary for comparative analysis across different fields and cultivation environments.

2.3. Leaf Counting Method Using Zero-Shot Models

In this study, we constructed a hybrid image-analysis pipeline that combines zero-shot vision models with an existing deep-learning classifier to automatically estimate leaf numbers from Lisianthus growth images. Leaf counting was performed on time-lapse images obtained from fixed cameras in the following three processing stages:
  • Extraction of representative daytime images
  • Preprocessing via lens distortion correction and cropping
  • Leaf-region detection and counting using a hybrid approach
These processes were implemented in a Python notebook executable on Google Colaboratory (Runtime version: 2025.10; Python 3.12.12) using an NVIDIA Tesla T4 GPU (NVIDIA Corp., Santa Clara, CA, USA). The notebook was designed to directly read and write image data stored on Google Drive using the automatic image acquisition system described in Section 2.2, enabling efficient cloud-based analysis.

2.3.1. Extraction of Representative Daytime Images

Representative daytime images were automatically extracted from hourly time-lapse images captured using each camera in the field. First, the image directory on Google Drive was referenced, and nighttime images were filtered based on timestamp information. An image captured at a predetermined time (e.g., 10:00 a.m.) was selected as the daily representative image. The extraction process was automated using a Python script and organized using the camera ID (folder name). Consequently, the light condition variability caused by changes in solar elevation was minimized, enabling the construction of a time-series dataset suitable for comparison.
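The selection step above can be sketched as a pure function over timestamped filenames. The filename convention (`<cameraID>_<YYYYMMDD_HHMMSS>.jpg`) is an assumption for illustration; the essential logic, filtering by the embedded timestamp and keeping the frame taken at a fixed hour (e.g., 10:00 a.m.), follows the text.

```python
from datetime import datetime
from typing import List

def pick_daily_representatives(filenames: List[str], hour: int = 10) -> List[str]:
    """Select one representative daytime image per day from hourly captures.

    Assumes each name embeds a timestamp, e.g. 'cam03_20240501_100000.jpg'
    (a hypothetical convention); nighttime and off-hour frames are skipped.
    """
    chosen = {}
    for name in sorted(filenames):
        stamp = name.rsplit(".", 1)[0].split("_", 1)[1]   # '20240501_100000'
        ts = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        if ts.hour == hour:
            chosen.setdefault(ts.date(), name)            # first match per day
    return [chosen[d] for d in sorted(chosen)]
```

Selecting at a fixed solar-elevation time is what makes the resulting series comparable day to day.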

2.3.2. Preprocessing of Extracted Images

Image preprocessing aimed to correct camera-dependent imaging conditions and generate a uniform dataset comparable across the time series. To achieve this, lens distortion correction and field-of-view normalization were applied. These steps reduced the influence of differences in shooting positions or angles and ensured stable evaluation of daily growth changes. An outline of the processing is provided below.
  • Lens Calibration and Distortion Correction
    Using the pre-acquired internal parameters of each camera (e.g., focal length, principal point) and distortion coefficients, lens distortion was corrected with the cv2.undistort() function in OpenCV (v4.12.0; OpenCV Foundation, Palo Alto, CA, USA). This mitigated the distortion that occurred from the center toward the periphery of the image and improved the stability of shape recognition for nodes and leaf regions located near the image edges.
  • Cropping and Field-of-View Normalization
    Because each camera image contained multiple plants, the image width was automatically scaled based on the number of captured plants and standardized to the equivalent width of ten plants. Cropping was performed using the lower center as the reference point, and the resulting region was rescaled to the original resolution (1920 × 1080 pixels). This procedure reduced daily and camera-based variations in the field of view and generated standardized images suitable for leaf counting and node estimation.

2.3.3. Leaf Counting Procedure

Leaf counting was performed using a hybrid method that integrated text-guided zero-shot object detection, auxiliary classification, and depth-estimation models. The overall process flow is illustrated in Figure 2. By combining leaf region detection via Grounding DINO (v1.0; IDEA Research, Shenzhen, Guangdong, China), region filtering using a YOLO (v8.0; Ultralytics Inc., Frederick, MD, USA) classifier, and monocular depth estimation via MiDaS (v3.1; Intel Corporation, Santa Clara, CA, USA), we constructed a pipeline capable of estimating leaf numbers for Lisianthus with high accuracy and stability.
First, the candidate leaf regions in the images were detected in a zero-shot manner using Grounding DINO [30]. Grounding DINO is characterized by its ability to recognize general leaf morphological features independent of the crop species, owing to large-scale pretraining. The text prompt “individual leaf on plant body” was used to specify the detection target. During the preliminary prompt design stage, the use of a single term such as “leaf” caused frequent misdetections in which the entire plant canopy was recognized as a leaf region. To suppress such overgeneralized detections, the phrase “on plant body” was explicitly added to constrain the spatial context of the target. In addition, because subsequent filtering steps focused on selecting individual leaves, the term “individual” was included to emphasize the detection of single-leaf structures rather than aggregated leaf regions. No few-shot learning or fine-tuning was performed for domain adaptation, and the model was used solely for zero-shot inference.
Subsequently, a YOLOv8-based classifier [35] was introduced to categorize each detected region into three classes: “single leaf”, “multiple leaves”, and “non-leaf”. Only the regions classified as single leaves were counted as leaf instances. Regions with classification scores below a certain threshold were removed to prevent over-detection. Because the detected candidate regions included duplicates, non-maximum suppression (NMS) was applied to integrate redundant bounding boxes.
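The filtering described above, keeping only confident "single leaf" regions and then merging duplicates with NMS, can be sketched as follows. The score and IoU thresholds are illustrative assumptions; the paper states only that low-score regions are removed and redundant boxes suppressed.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_and_nms(regions, score_thr=0.5, iou_thr=0.5):
    """Keep confident 'single leaf' regions, then merge duplicates via NMS.

    `regions` is a list of (box, label, score) triples standing in for the
    Grounding DINO candidates after YOLOv8 classification; thresholds are
    hypothetical values, not the tuned ones from the paper.
    """
    kept = [(b, s) for b, lab, s in regions
            if lab == "single leaf" and s >= score_thr]
    kept.sort(key=lambda r: r[1], reverse=True)   # highest score first
    final = []
    for box, score in kept:
        if all(iou(box, fb) < iou_thr for fb, _ in final):
            final.append((box, score))
    return [b for b, _ in final]
```

Greedy suppression in score order is the standard NMS formulation, so each surviving box is the most confident member of its overlapping cluster.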
Furthermore, MiDaS was used to estimate a depth map for the entire image and evaluate the three-dimensional foreground–background relationships among the candidate regions [36]. MiDaS is a deep learning model that estimates the relative depth from monocular RGB images and exhibits high zero-shot generalizability owing to its training on combined depth-estimation datasets. As MiDaS outputs relative depth values, outlier regions were removed by excluding those whose depth values fell outside ±1σ of the depth distribution. This reduces false detections caused by adjacent plants, beds, and background structures, such as stakes or windbreak nets. Although a wider threshold such as ±3σ was effective in suppressing distant background regions, it was insufficient to remove regions from adjacent plant rows; therefore, a ±1σ threshold was empirically adopted.
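The ±1σ depth screen can be sketched as a one-pass statistical filter. Here `region_depths` maps each candidate region to one representative relative-depth value (e.g., the median MiDaS value inside its box, which is an assumption, since the paper does not state how a box is reduced to a single depth).

```python
import statistics

def depth_filter(region_depths, n_sigma=1.0):
    """Drop regions whose relative depth lies outside mean ± n_sigma · σ.

    With n_sigma = 1.0 this reproduces the ±1σ band adopted in the paper,
    keeping the monitored row while rejecting adjacent rows and background
    structures such as stakes or windbreak nets.
    """
    values = list(region_depths.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    lo, hi = mu - n_sigma * sigma, mu + n_sigma * sigma
    return [i for i, d in region_depths.items() if lo <= d <= hi]
```

Because MiDaS depths are only relative, a distribution-based band like this is preferable to any fixed metric threshold.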
Finally, the outputs from the three models were integrated to visualize the valid leaf regions and automatically calculate the leaf numbers for each image. The calculated leaf counts were organized as data corresponding to ten plants per image and were used to compare daily growth progression, as well as to serve as input for the node-estimation model described in Section 2.4.

2.4. Node Estimation and Accuracy Evaluation

In this study, the node number of Lisianthus (Eustoma grandiflorum) was automatically calculated based on the leaf count estimates obtained in the previous section, and the estimation accuracy was evaluated. Node number is a key indicator representing a plant’s growth stage and is widely used in this region as a criterion for judging growth progress and determining the appropriate timing for irrigation and temperature management. Traditionally, node counting relied on manual surveys or visual assessments by growers and extension officers. In this study, we propose a new method that statistically estimates node numbers from leaf counts, enabling nondestructive and continuous monitoring.

2.4.1. Node Estimation

Node estimation was performed using the leaf counts obtained using the hybrid approach. Lisianthus is a dicotyledonous plant with an opposite phyllotaxy, forming one pair of opposite leaves at each node as the stem elongates. Therefore, the node number exhibits a linear relationship with the leaf number. In this study, the relationship between the leaf number and node number was expressed using the linear approximation shown in Equation (1):
N_node = 0.09 × L̄_5d + 1.03,  (1)
where N_node is the estimated node number, and L̄_5d is the 5-day moving average of the leaf number. Because the leaf counts obtained in Section 2.3.3 contained short-term noise caused by variations in lighting conditions and detection instability, a 5-day moving average (L̄_5d) was used to smooth the leaf count time series. The 5-day window was determined in Trial 1 to be the optimal balance, as longer windows obscured short-term developmental changes, whereas shorter windows caused unstable fluctuations. This window length is also consistent with the biophysical growth characteristics of Lisianthus; based on recorded node count data for HW, one node increase required approximately 6–7 days during the target growth stage, indicating that node progression occurs on a multi-day time scale rather than daily. This process suppresses irregular day-to-day fluctuations and enables a more stable representation of continuous leaf development.
The coefficients of the linear equation (0.09 and 1.03) were derived through least-squares regression using manually surveyed node data collected from the fields of the three growers in Trial 1. The observed node number was calculated as the mean of the 36 plants per grower. Field surveys were conducted weekly, with ten surveys performed during the monitoring period.
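Equation (1) applied to a daily leaf-count series can be sketched as below. The coefficients are the least-squares values from Trial 1; the trailing (rather than centered) 5-day window is an implementation assumption, chosen so that each day's estimate uses only past observations.

```python
def moving_average(series, window=5):
    """Trailing moving average; the window is shorter while it fills up."""
    return [sum(series[max(0, i - window + 1):i + 1]) /
            (i - max(0, i - window + 1) + 1) for i in range(len(series))]

def estimate_nodes(daily_leaf_counts, a=0.09, b=1.03, window=5):
    """Equation (1): N_node = a · L̄_5d + b, over a daily leaf-count series.

    a = 0.09 and b = 1.03 are the Trial 1 regression coefficients; the
    trailing-window choice is an assumption not fixed by the paper.
    """
    return [a * lbar + b for lbar in moving_average(daily_leaf_counts, window)]
```

For example, a plot averaging 50 detected leaves over the past five days maps to roughly 5.5 bolting nodes under these coefficients.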
In this study, node number refers specifically to the number of bolting nodes. The number of bolting nodes is defined as the number of nodes formed after stem elongation (bolting), with an internode length of at least 1 cm and fully developed leaves [24].
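The smoothing and linear mapping described above can be sketched in Python as follows. This is a minimal illustration, not the study's actual implementation: the function names are ours, the coefficients 0.09 and 1.03 are those reported in Equation (1), and `fit_linear` shows how such coefficients can be derived by ordinary least squares from manually surveyed node data.

```python
from statistics import mean

def moving_average(values, window=5):
    """Smooth a daily leaf-count series with a trailing moving average.

    Early days use as many values as are available, so the output has the
    same length as the input.
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(mean(values[lo:i + 1]))
    return out

def estimate_nodes(leaf_counts, slope=0.09, intercept=1.03, window=5):
    """Apply Equation (1) to the 5-day moving average of daily leaf counts."""
    smoothed = moving_average(leaf_counts, window)
    return [slope * l + intercept for l in smoothed]

def fit_linear(x, y):
    """Ordinary least squares for slope and intercept, as used to derive
    the coefficients (0.09, 1.03) from the Trial 1 field surveys."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

For example, a stable smoothed leaf count of 50 maps to an estimated node number of 0.09 × 50 + 1.03 = 5.53 nodes.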

2.4.2. Node Number Validation

To evaluate the accuracy of the node-estimation model, manual node measurements were conducted in Trials 2 and 3 and compared with the automated estimates determined using Equation (1). The survey plants were selected from plots that exhibited moderate growth within each field. In Trial 2, the mean value of 36 plants was used for each cultivar–grower combination, and in Trial 3, the mean value of 15 plants was used. These values reflect the growth of representative plants.
Field surveys were conducted every two weeks, three times in Trial 2 and five times in Trial 3. The collected observational node data were used for comparative analysis of the estimated values. Because the number of data points in Trial 2 was extremely small (n = 3), performing a statistically meaningful regression analysis was deemed inappropriate; therefore, only the mean absolute error (MAE) and root mean square error (RMSE) between the estimated and observed values were reported. In Trial 3, considering the possibility that the estimation accuracy might vary by grower or cultivar, both an overall regression analysis, including all data, and separate regression analyses by grower and cultivar were conducted.
For each survey date, the estimated node number for the same day was matched with the observed value, and three evaluation metrics were calculated: mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R2). MAE represents the magnitude of the difference between the estimated and observed values, with smaller values indicating smaller errors. RMSE provides additional information by penalizing larger errors more strongly and is therefore sensitive to occasional large deviations between estimated and observed values. R2 expresses the degree to which the estimated values explain the variation in observed values and serves as an indicator of how well the model captures growth progression. Using MAE, RMSE, and R2 together, the model performance was comprehensively evaluated from the perspectives of absolute error and reproducibility of growth dynamics.
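The three metrics above follow their standard definitions and can be computed as in the following sketch (function name ours):

```python
import math

def evaluation_metrics(observed, estimated):
    """Compute MAE, RMSE, and R2 between matched observed and estimated
    node numbers for the survey dates."""
    errors = [o - e for o, e in zip(observed, estimated)]
    n = len(errors)
    mae = sum(abs(err) for err in errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(err ** 2 for err in errors)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

Note that R2 computed this way penalizes systematic bias as well as scatter, which is why MAE and RMSE are reported alongside it.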

2.5. Evaluation of Accuracy–Complexity Trade-Off

In addition to estimation accuracy, the computational cost of the proposed pipeline was quantitatively evaluated to clarify the trade-off between accuracy and computational complexity. Because the proposed method integrates multiple large-scale zero-shot models, its computational burden is expected to be substantially higher than that of conventional single-stage detectors. Therefore, a baseline method based on a pure YOLOv5 detector (v5.0; Ultralytics Inc., Frederick, MD, USA) was introduced for benchmarking.
For both YOLOv5 and the proposed method, computational complexity was assessed using four indicators: the number of model parameters, floating-point operations (FLOPs), inference latency, and peak GPU memory usage. All benchmarks were conducted with an input resolution of 1280 × 1280 pixels and batch size 1. To ensure a fair comparison, inference latency was measured as the model-only execution time, excluding disk input/output operations. Each model was warmed up before measurement, and the reported values represent averages over 50 images. FLOPs were obtained using the PyTorch Profiler (PyTorch v2.8.0; Meta Platforms Inc., Menlo Park, CA, USA) by executing the full inference pipeline, and peak GPU memory was recorded.
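The timing protocol (warm-up iterations followed by model-only averaging over 50 preloaded images) can be illustrated with the minimal sketch below. Here `model_fn` is a stand-in for the actual inference pipeline; in the study itself, FLOPs and peak GPU memory were additionally recorded with the PyTorch Profiler, which this generic sketch does not reproduce.

```python
import time

def benchmark_latency(model_fn, inputs, warmup=5, runs=50):
    """Measure model-only inference latency.

    Inputs are assumed to be preloaded in memory so that disk I/O is
    excluded from the timed region. Warm-up iterations are executed but
    not timed; the mean latency over `runs` images is returned in seconds.
    """
    for x in inputs[:warmup]:
        model_fn(x)                     # warm-up (caches, lazy init) -- not timed
    times = []
    for x in inputs[:runs]:
        t0 = time.perf_counter()
        model_fn(x)                     # timed: model execution only
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```

The same harness can be applied to both the single-stage baseline and the multi-stage pipeline, so that the latency figures in Table 5 are directly comparable.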
The leaf-count and node-estimation accuracies obtained in Section 2.3 and Section 2.4 were then analyzed together with these computational metrics. This enabled a direct comparison of how much additional computational cost was required to achieve a given improvement in estimation accuracy, providing a quantitative basis for evaluating the practical trade-off between model complexity and performance in real-world greenhouse monitoring.

2.6. Sharing of Growth Information

In this study, an automated notification system was developed using the online communication platform Discord (Discord Inc., San Francisco, CA, USA) to share node-estimation results among growers, extension officers, and researchers for growth management and technical guidance [37]. The system was implemented using a Python script, which was automatically executed upon completion of the node-estimation process. After each daily analysis, graphs showing node number progression for each cultivar and representative images were posted to a designated Discord channel, enabling stakeholders to access the same information immediately.
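A minimal sketch of such a notification step is shown below, using Discord's standard webhook endpoint. The webhook URL is a placeholder and the message format is illustrative rather than the exact one used in the study; posting graphs and analytical images would additionally require a multipart file upload, which is omitted here.

```python
import json
import urllib.request

# Placeholder only -- a real webhook URL is issued per channel by Discord.
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def build_payload(cultivar, node_estimate, date_str):
    """Assemble the JSON body for a Discord webhook message summarizing
    the day's node estimate for one cultivar (illustrative format)."""
    content = f"[{date_str}] {cultivar}: estimated node number = {node_estimate:.2f}"
    return {"content": content}

def post_notification(payload, url=WEBHOOK_URL):
    """Send the payload to the Discord channel via HTTP POST."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because webhooks require no bot hosting, a script like this can run unattended at the end of the daily analysis job.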
This notification function was integrated into a remote greenhouse-monitoring system that was previously developed and operated by the authors [33]. In this system, environmental data, such as greenhouse air temperature, relative humidity, soil temperature, and solar radiation, are automatically delivered to Discord. Consequently, both growth information (node number trends and leaf count images) and environmental information can be viewed in a unified interface, allowing stakeholders to provide guidance and make management decisions based on the relationship between environmental fluctuations and plant responses.
The shared Discord channel was designed to facilitate real-time communication through comment threads, enabling discussion of growth conditions and adjustments to management strategies. These functions substantially improve the efficiency of on-site growth monitoring and information sharing, and further contribute to strengthening knowledge exchange and technical support among growers within the region.

3. Results

3.1. Construction of the Node Count Estimation Model (Trial 1: Seasonal Crop 2024)

In the 2024 seasonal cropping trial, a node-count estimation model based on zero-shot vision models was constructed, and its initial performance was evaluated using data from three fields in Namie Town (Growers A, B, and C). In this trial, a regression analysis was performed to model the relationship between the counted leaf number obtained by the proposed system and the manually observed node number. The relationship between the counted leaf number using the system (horizontal axis) and the manually observed node number (vertical axis) is shown in Figure 3, and the estimation accuracy metrics are summarized in Table 2. A strong linear relationship was observed with a coefficient of determination of R2 = 0.930 and a mean absolute error (MAE) of 0.73 (n = 26). Trends were generally consistent across growers, and the model successfully reproduced the pattern of increase in node number, even during the later growth stages when leaf density increased. This indicates that the model effectively functioned as a quantitative indicator of the growth stage.
Examining errors by grower, Growers B and C showed MAE values of 0.68 and 0.57, respectively, indicating deviations within ±0.7 nodes from the observed values. In contrast, Grower A exhibited a slightly higher MAE of 0.83 with a tendency to underestimate the number of nodes, particularly in the later growth stages.
These results demonstrate that node-estimation using zero-shot vision models can achieve sufficient accuracy without requiring additional domain adaptation, such as few-shot learning or fine-tuning, and that a stable estimation performance can be maintained across growers under different imaging conditions. The model constructed in this trial was subsequently applied to accuracy verification trials (Trials 2 and 3), enabling further evaluation of reproducibility across diverse production environments.

3.2. Verification of Node-Estimation Accuracy (Trial 2: Retarding Crop 2024)

In Trial 2, the accuracy of the node-estimation model constructed in Trial 1 was evaluated using a dataset from a different cropping type (retarded cropping). This trial targeted Grower A, and the model parameters obtained in Trial 1 were applied without retraining, allowing the assessment of the model’s generalization performance across seasonal and cultivar changes. The mean absolute error (MAE) was 0.45, indicating that the estimation errors remained within one node and that the estimated values were closely aligned with the observed measurements (RMSE = 0.46). In particular, the model consistently reproduced the node numbers during the intermediate growth stage at approximately four to six nodes.
These results show that the zero-shot-based node-estimation model developed in Trial 1 can maintain a stable performance across different cropping seasons without additional training. Based on these findings, Trial 3 expanded the number of growers and cultivars to evaluate the reproducibility under a broader range of conditions.

3.3. Re-Evaluation of Node-Estimation Accuracy (Trial 3: Seasonal Crop 2025)

In Trial 3, the node-estimation model constructed and validated in Trial 1 during the previous year was applied directly to the data from the following year to verify its reproducibility and generalization performance across multiple growers and cultivars.
As shown in Figure 4, the overall estimated values exhibited good agreement with the observed node numbers, yielding R2 = 0.768 and MAE = 1.14 (n = 46). Although the sample size increased compared to Trials 1 and 2, and the coefficient of determination decreased, estimation errors generally remained within approximately ±1 node, indicating that the model retained a certain level of reproducibility. However, in the early growth stages (approximately 1–4 nodes), the stability of leaf detection decreased, resulting in the estimated node numbers clustering around one, thereby tending toward underestimation. When the Trial 3 data were stratified by growth stage, the early stage (observed node number <4, n = 14) exhibited markedly larger relative errors in terms of the mean absolute percentage error (MAPE = 43.4%; MAE = 1.16 nodes, RMSE = 1.30 nodes) than the later stage (≥4 nodes, n = 32; MAPE = 16.5%; MAE = 1.11 nodes, RMSE = 1.38 nodes).
The regression results for each grower are shown in Figure 5, and the accuracy metrics are listed in Table 3. Although an underestimation was observed in the early growth stages for all growers, the estimated values for Growers A and D increased in alignment with the 1:1 line during the mid- to late-growth stages, maintaining an approximately linear relationship. During the late growth stages, variability, likely attributable to cultivar differences, was observed. For Grower C, the estimated number of nodes tended to be lower throughout the entire period. In the latter half of the image analysis period (12 June), the average plant height of Grower C was 25.7 cm, compared with 30.2 cm for Grower A and 38.4 cm for Grower D (four-cultivar mean). Although initial plant height did not differ markedly at transplanting, Grower C consistently exhibited weaker growth during the analysis period, which likely contributed to lower node estimates. Overall, the MAE values for each grower ranged from 0.91 to 1.32 nodes, with errors for all growers remaining within approximately ±1 node.
Furthermore, cultivar-specific regression analyses for the four cultivars (JL, CW, HW, and LM) are presented in Figure 6, and the accuracy metrics are listed in Table 4. All cultivars exhibited a clear positive correlation between the estimated and observed node numbers, confirming that the linear model-based estimation approach was effective across cultivars. The MAE values for all cultivars were within 1.3 nodes, and no extreme degradation in performance attributable to cultivar characteristics was observed. For JL and CW, the MAE values were 1.30 and 1.17, respectively, indicating moderate accuracy. HW, the cultivar used to determine the model parameters, achieved an MAE of 1.11, indicating high consistency. LM, a late-maturing cultivar, was the only cultivar with an MAE below 1.0 (MAE = 0.93), and it exhibited the most stable performance among the four cultivars. Comparing the cultivar maturity groups, late-maturing cultivars tended to show lower errors and more stable estimates than mid-maturing cultivars (Table 4). These findings suggest a possible association between node-estimation accuracy and cultivar maturity characteristics.
Finally, the node estimates obtained in Trial 3 were automatically integrated into the growth information–sharing system using Discord and utilized for feedback among growers, extension officers, and researchers.

3.4. Benchmark Comparison with a Single-Stage YOLO Detector

To characterize the computational and practical properties of the proposed pipeline, we compared it with a pure YOLOv5 detector, which directly estimates leaf counts using a single-stage detection framework. In contrast, the proposed method employs a multi-stage pipeline integrating Grounding DINO, a YOLO-based classifier, and MiDaS.
As shown in Table 5, YOLOv5 is computationally lightweight, with 7.0 million parameters, 63 GFLOPs, and an average inference time of 34 ms per image. In contrast, the proposed method requires 518 million parameters and 860 GFLOPs, resulting in an inference time of 1811 ms and peak GPU memory usage of 4.8 GB. This difference mainly arises from the transformer-based architectures of Grounding DINO and MiDaS and the repeated classification of multiple candidate regions (on average 88 per image).
Despite the higher computational cost, the proposed method achieved substantially better node-estimation accuracy. As shown in Table 6, in Trial 1 the coefficient of determination increased from 0.723 to 0.930 and the RMSE decreased from 1.73 to 0.91, corresponding to an approximately 47% reduction in error. In Trial 3, the proposed method also showed slightly higher R2 and lower RMSE and MAE than YOLOv5, indicating comparable or better generalization.
Overall, these results reveal a clear accuracy–complexity trade-off: although the proposed method is much more computationally demanding than YOLOv5, it provides more accurate and robust node estimation under diverse field conditions.

3.5. Sharing of Growth Information via Discord Notifications

In Trial 3, an automated notification system using Discord was employed to share daily node-estimation data among growers, extension officers, and researchers. The notification content was automatically generated using a Python script and sent immediately after the analysis on Google Colaboratory was completed. Each notification included a graph of the node number progression up to the current day and analytical images showing the leaf detection results, allowing members of the channel to promptly grasp the growth status of each grower (Figure 7). The notifications displayed the node-estimation results for each grower and cultivar in graphical form, enabling simultaneous confirmation of both numerical indicators and visual information.
Notifications were issued twice a week (every 2–3 days), reflecting the updated node estimates derived from camera images in each field. Following each notification, comments and questions from growers and extension officers were posted on the channel, leading to discussions directly related to growth management, such as irrigation volume and method, timing of additional fertilization, and pest and disease control strategies, based on the observed progression of growth in the images. Additionally, among growers cultivating the same cultivar across multiple fields, a graphical comparison of the growth progress served as an opportunity for mutual learning, allowing them to exchange information regarding differences in cultivation practices, such as irrigation and temperature management.

4. Discussion

4.1. Effectiveness of the Zero-Shot Approach for Node Estimation

The node-estimation model constructed in this study demonstrated the effectiveness of the zero-shot approach centered on Grounding DINO, yielding strong performance in Trial 1 with R2 = 0.930 and an MAE of 0.73 nodes. Furthermore, the trimodel hybrid architecture—combining Grounding DINO with MiDaS for depth estimation and a YOLO classifier—effectively suppressed greenhouse-specific noise (background structures, overlapping plants, and variations in lighting conditions) and contributed to stable leaf counting. In addition, the morphological characteristics of Lisianthus, which exhibit opposite phyllotaxy favorable for leaf count–based estimation, also supported the accuracy of the model. The fact that estimation errors generally remained within ±1 node suggests that the proposed method is suitable for practical use as a growth-stage indicator in production environments.
A previous study involving 3D modeling analysis reported a node-estimation error of 1.2 nodes [24]. In Trial 3 of the present study, the overall analysis across all growers yielded an MAE of 1.14 nodes, and grower-specific analyses showed MAE values ranging from 0.91 to 1.32 nodes. These results indicate that the performance obtained in this study is comparable to or, under certain conditions, may exceed that of previous research.

4.2. Generalization Performance Across Cropping Seasons and Growers

The node-estimation model constructed in this study exhibited strong generalization performance across different cropping seasons and growers. When the model developed using the seasonal cropping data in Trial 1 was applied to the retarded cropping dataset in Trial 2 without additional training, it maintained a high accuracy (MAE = 0.45). Moreover, the model could be reused as-is in the seasonal cropping trial conducted the following year (Trial 3), where it achieved an accuracy of approximately ±1 node (MAE = 1.14) even under more diverse conditions involving multiple growers and cultivars. This demonstrates the practical applicability of the model for real-world deployment.
Furthermore, the fact that the estimation accuracy did not markedly deteriorate among growers with different levels of cultivation experience suggests that the model may be capable of absorbing a certain degree of variability in plant growth arising from differences in grower practices.

4.3. Influence of Growth Stage Differences on Node-Estimation Error

In Trial 3, node numbers tended to be underestimated during the early growth stage owing to delayed bolting and limited stem elongation. Previous studies have reported that, in young leaves, the accuracy of leaf segmentation decreases because of variations in leaf shape, orientation, and occlusion caused by overlapping tissues [38]. Another study noted that dramatic morphological differences between juvenile and mature leaves could complicate the construction of unified models across different growth stages [39]. In the present study, morphological bottlenecks, such as small leaf size, overlapping leaf structures, and short internode length in the early stages (below approximately four nodes), likely contributed to reduced estimation accuracy. Consistent with this interpretation, quantitative evaluation showed that relative errors were markedly higher during the early growth stage, whereas MAPE decreased substantially once plants exceeded approximately four nodes. Because node-based management becomes practically relevant mainly after bolting, this indicates that the proposed method is primarily reliable from the mid-growth stage onward, while estimates during the earliest stages should be interpreted with caution.
In addition, Trial 3 revealed that Grower C consistently showed lower estimated node numbers throughout the period and that variability increased in the later growth stages owing to cultivar-specific traits. In the case of Grower C, reduced growth vigor, characterized by lower plant height and limited stem elongation, likely increased leaf occlusion and decreased the visibility of individual leaves, leading to systematic underestimation of node numbers. These findings indicate that differences in growth characteristics contribute to node-estimation errors. This represents a practical limitation of the system when applied to real production environments, and highlights an important issue to be addressed in future model improvements.

4.4. Effects of Cultivar Differences and Maturity Characteristics

In phenotyping research, it is well-established that differences in leaf morphology and plant architecture across cultivars influence model performance [40,41]. Consistent with these findings, the cultivar-specific analyses in the present study demonstrated that inherent variations in leaf morphology and plant form among Lisianthus cultivars can affect node-estimation accuracy.
A particularly notable result was that the late-maturing cultivar Largo Marine (LM) exhibited the smallest error (MAE ≈ 0.9). This may be attributable to the greater number of days required to form each node and the comparatively longer internode length in the late-maturing cultivars, which likely reduced leaf overlap and stabilized leaf recognition in the images. Prior research on other crops has reported that the maturity class influences leaf emergence and internode elongation [42,43].
In contrast, the early–medium-maturing cultivar Julius Lavender (JL) and the medium-maturing Celebrich White (CW) exhibited larger errors than Largo Marine (LM), likely due to occlusion caused by greater leaf overlap, which destabilized leaf detection. The cultivar Happiness White (HW), which was used in the model construction, also showed comparatively high accuracy in the subsequent year’s trial, suggesting that growth characteristics similar to those in the training environment contributed to improved performance.
Overall, these findings suggest a relationship between cultivar maturity class (early–mid–late) and node-estimation accuracy; late-maturing cultivars tend to exhibit more stable leaf detection and lower estimation errors.

4.5. Influence of Imaging Conditions on Estimation Accuracy

In Trial 1, the node-estimation error for Grower A was slightly larger than that for Growers B and C. One plausible explanation is the influence of the camera’s field of view, which is affected by its installation height and angle. In Trial 1, the camera used by Grower A was installed at a slightly more downward-facing angle than those of the other growers, which limited the ability to capture the increased number of leaves during later growth stages. This likely contributed to the underestimation of node numbers observed particularly in the late growth period for Grower A.
In actual production fields, maintaining a constant field of view with fixed cameras can be challenging, owing to physical and spatial constraints. Nevertheless, when comparing across fields, standardizing imaging conditions as much as possible, along with applying preprocessing steps, such as lens distortion correction and cropping images to a “ten-plant equivalent width,” as implemented in this study, can partially compensate for differences in camera installation and improve consistency across growers.

4.6. Accuracy–Complexity Trade-Off of the Proposed Pipeline

Based on the benchmark results in Section 3.4, the proposed multi-stage pipeline exhibits a clear trade-off between estimation accuracy and computational cost. Compared with the single-stage YOLOv5 detector, the proposed method requires substantially more parameters, FLOPs, and GPU memory, and therefore shows much higher inference latency. However, this increased computational burden yields a pronounced improvement in node-estimation accuracy, particularly in Trial 1, where the coefficient of determination increased from 0.723 to 0.930 and the RMSE was reduced by nearly half. In Trial 3, which involved different growers and cultivars in the following year, the proposed method also achieved slightly better R2 and lower error metrics than YOLOv5, indicating that the accuracy gains were not limited to the training season.
This trade-off suggests that the proposed method is best suited for high-precision phenotyping and monitoring scenarios in which robustness and accuracy are prioritized over real-time performance. Owing to the zero-shot nature of Grounding DINO and MiDaS, the pipeline can be applied to new cultivars, growers, and cropping conditions without retraining the zero-shot models. Only minimal additional training is required for the auxiliary classifier, which is a major advantage for deployment in heterogeneous agricultural environments. In contrast, for applications with strict time or resource constraints, such as real-time field robotics or embedded systems, lightweight detectors such as YOLOv5 remain a more appropriate choice.

4.7. Practical Implementation and the Effects of Information Sharing on Communication

The growth monitoring system developed in this study demonstrated stable performance even with a low-cost hardware configuration combining a Raspberry Pi and an ATOM Cam2. Cloud integration via Google Drive facilitated seamless operations, from image capture to analysis and data storage, substantially reducing the barriers to adoption in production settings.
Moreover, the automated sharing of growth information via Discord enabled growers, extension officers, and researchers to reference the same data simultaneously. Bidirectional communication facilitated through comment features improved the accuracy of growth assessment. Specifically, the near elimination of data transmission delays enabled stakeholders to monitor daily growth conditions in real time. Expert growers, extension officers, and researchers remotely observed the cultivation status and provided timely guidance when necessary.
Integrating node-number trends with environmental data collected from the existing environmental monitoring system [5,33]—such as temperature, humidity, and solar radiation—further enhanced the interpretation of growth delays and management differences, accelerating decision-making. These outcomes suggest that the system improves information-sharing efficiency among stakeholders and functions effectively as a practical field-ready technology for real-world implementation. From a practical perspective, the proposed system can be applied to real-world farming scenarios such as remote monitoring of crop growth progression, early identification of growth delays, and comparative assessment of cultivation practices across multiple fields or growers. By providing objective and continuous growth indicators derived from daily images, the system has the potential to reduce reliance on subjective visual assessments and to support growers with limited experience. In addition, the low-cost and remote-based nature of the system makes it particularly suitable for small-scale or labor-constrained farming operations, as well as for regions where farms are geographically dispersed and direct, frequent on-site technical guidance is difficult, such as areas undergoing agricultural revitalization.
The data sharing system constructed in this study is small-scale, decentralized, and flexible, offering advantages in enabling farmers to make decisions with enhanced transparency [44]. However, the volume of collected data is substantial, and challenges remain regarding how to organize accumulated big data and share it in a form that is easily interpretable for growers [45]. While the present study focuses on demonstrating the system implementation and operational feasibility, quantitative evaluation of the effectiveness of information sharing—such as its impact on skill acquisition by new growers—would require psychological and behavioral analyses, which are beyond the scope of this study and should be addressed in future work.

4.8. Limitations of the Demonstration Trials and Future Challenges

This study is based on three demonstration trials conducted in a specific region, Namie Town, Fukushima Prefecture, and therefore requires further validation in other regions, cropping types, and crop species. Additionally, the cultivation periods examined were limited to spring–summer and autumn, and data are lacking for winter conditions, where low temperatures may influence imaging characteristics and plant growth behavior.
Moreover, the proposed system assumes conditions under which plants generally maintain normal growth. The behavior of this model under abnormal growth conditions—such as lodging, excessive elongation, chemical injury, or pest and disease damage—has not been sufficiently evaluated. Future studies should not only examine the limits of node estimation under such conditions, but also consider integrating disease-detection algorithms and assessing the reliability of node estimates by comparing normal and abnormal growth patterns. Indeed, in Trial 3, analysis for Grower B was not possible because of severe growth suppression.
Because the experiments were conducted in greenhouse environments, the effects of extreme outdoor weather events were not directly investigated. In addition, although the study site experiences relatively stable natural light conditions, the logic for selecting representative daytime images to maximize analytical accuracy remains undeveloped. More robust methods for extracting and correcting images suitable for analysis under real-world production environments are required. Furthermore, rare visibility-degrading phenomena inside facilities, such as fog or haze, were not explicitly evaluated and remain important subjects for further detailed examination, as they may affect image contrast and detection stability under certain conditions.
Extension of the proposed framework to other crop species also represents an important future challenge. The authors conducted preliminary trials applying the same leaf-counting pipeline to stock (Matthiola incana (L.) R. Br.), another floricultural crop, and confirmed that the leaf-counting process functioned effectively regardless of crop species. However, node estimation required further improvement owing to occlusion effects specific to plant architecture. These observations suggest that, while the proposed approach has a certain degree of generality, estimation accuracy must be carefully evaluated for each crop species.
Given these limitations, additional validation across diverse regions, seasons, crop species, and environmental conditions is essential to improve the implementation and generalizability of the system.

5. Conclusions

This study presented a hybrid zero-shot–based framework for estimating node numbers and sharing growth information in lisianthus cultivation under real greenhouse conditions in Fukushima. By integrating a vision–language model (Grounding DINO), depth estimation (MiDaS), and a lightweight classifier (YOLO), the proposed system enabled robust leaf detection and node estimation without crop-specific retraining of the zero-shot models, while effectively suppressing background noise and occlusion.
Through three demonstration trials conducted across different cropping seasons, years, growers, and cultivars, the system achieved practical accuracy, with typical estimation errors within approximately ±1 node and stable generalization performance. Additional analyses showed that estimation reliability improves after the early growth stage (around four nodes), providing useful guidance for practical deployment.
Beyond phenotyping accuracy, the study demonstrated a low-cost, field-deployable information-sharing platform using consumer-grade cameras, Raspberry Pi devices, and cloud services. This infrastructure allowed growers, extension officers, and researchers to simultaneously monitor growth dynamics and exchange feedback in near real time, offering a practical foundation for digital support in floricultural production, particularly in regions with dispersed farms and limited on-site advisory capacity.
While further validation across regions, crop species, and stress conditions is required, the proposed framework provides a practical, generalizable, and extensible data-driven approach to crop growth monitoring that can support real-world floricultural production. Ultimately, it can also help rebuild resilient agricultural communities that sustain people living and working in post-disaster regions.

Author Contributions

Conceptualization, H.N. and Y.Y.; methodology, H.N., K.K., O.I. and Y.Y.; software, H.N. and Y.Y.; validation, H.N., K.K., O.I. and Y.Y.; investigation, Y.Y. and K.K.; resources, Y.Y. and K.K.; data curation, H.N. and Y.Y.; writing—original draft preparation, H.N.; writing—review and editing, Y.Y., K.K., O.I., F.H., N.H. and Y.Y.; visualization, H.N.; supervision, F.H. and N.H.; project administration, N.H. and Y.Y.; funding acquisition, N.H. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research is an outcome of “Technological verification for resumption of farming in the specified reconstruction and revitalization base areas” (JPFR25060105), one of the advanced technology development projects in the field of agriculture, forestry, and fisheries funded by the Fukushima Institute for Research, Education and Innovation (F-REI).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data presented in this study are openly available on GitHub at https://github.com/BeieUTokyo/LisianthusNodeCount (accessed on 13 January 2026). The repository and its accompanying documentation are provided only in Japanese.

Acknowledgments

We express our sincere gratitude to Hiroyuki Munakata and Takashi Hirayama of the Hama Agricultural Regeneration Research Centre, Fukushima Agricultural Technology Centre, who served as supervisors for the field survey conducted in this study. We also wish to thank Yuhei Sato (currently in the Agricultural Promotion and Extension Division, Minamiaizu Agriculture and Forestry Office) for his invaluable support in designing and preparing Trial 1. We would also like to express our appreciation to Takanori Yasuda (currently with the Aizubange Agriculture Promotion Sector, Aizu Agriculture and Forestry Office) and Natuki Sasaki of the Futaba Agriculture Promotion Sector, Soso Agriculture and Forestry Office, who reviewed the monitoring system during the trials and provided valuable advice on its practical dissemination. Finally, we gratefully acknowledge the producers who cooperated in the demonstration trials and provided valuable insights for improving the system.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
VLM  Vision–Language Model
ZSL  Zero-Shot Learning
ZSD  Zero-Shot Detection
ICT  Information and Communication Technology
UAV  Unmanned Aerial Vehicle
JL  Julius Lavender
CW  Celebrich White
HW  Happiness White
LM  Largo Marine
AP  NF Antique Pink
RTSP  Real-Time Streaming Protocol
NMS  Non-Maximum Suppression
MAE  Mean Absolute Error
RMSE  Root Mean Square Error
MAPE  Mean Absolute Percentage Error
FLOPs  Floating-Point Operations
R2  Coefficient of Determination
pF  pF value (soil water potential unit)
RGB  Red–Green–Blue (color channels)

Figure 1. Overview of the automated image acquisition system installed in growers’ greenhouses. The system integrates fixed network cameras (ATOM Cam2) positioned beside crop rows with a Raspberry Pi unit that periodically captures images via RTSP. Captured images are stored locally and automatically uploaded to Google Drive via the Drive API, enabling standardized, time-series monitoring for subsequent leaf-counting and node-estimation analyses.
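The periodic capture step in Figure 1 can be sketched with only the Python standard library, shelling out to ffmpeg for the RTSP frame grab. The file-naming scheme, directory layout, and ffmpeg invocation below are illustrative assumptions, not the project's actual scripts (those are in the linked GitHub repository); the Google Drive upload step is omitted.

```python
import datetime
import pathlib
import subprocess

def frame_path(camera_id: str, now: datetime.datetime, root: str = "captures") -> pathlib.Path:
    """Sortable per-camera path, e.g. captures/cam01/20240508_0900.jpg (naming is hypothetical)."""
    return pathlib.Path(root) / camera_id / (now.strftime("%Y%m%d_%H%M") + ".jpg")

def ffmpeg_grab_cmd(rtsp_url: str, out_path: pathlib.Path) -> list:
    """ffmpeg command that saves exactly one frame from an RTSP stream ('-frames:v 1')."""
    return ["ffmpeg", "-y", "-rtsp_transport", "tcp",
            "-i", rtsp_url, "-frames:v", "1", str(out_path)]

def capture_once(rtsp_url: str, camera_id: str, root: str = "captures") -> pathlib.Path:
    """Grab one frame; a cron job or systemd timer on the Raspberry Pi could call this daily."""
    out = frame_path(camera_id, datetime.datetime.now(), root)
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_grab_cmd(rtsp_url, out), check=True)
    return out
```

Timestamped, per-camera filenames keep the uploaded Drive folder sorted chronologically for the downstream leaf-counting analysis.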
Figure 2. Overview of the leaf-counting pipeline integrating Grounding DINO, YOLO, and MiDaS. The pipeline consists of three sequential steps: (1) zero-shot detection of candidate leaf regions using Grounding DINO with a text prompt, (2) classification of detected regions into single, multiple, or background classes using a YOLO-based classifier, and (3) depth-based outlier removal using MiDaS to exclude foreground and background interference. Yellow bounding boxes indicate candidate leaf regions at each stage of the pipeline. Only the validated single-leaf regions were aggregated to compute the final leaf count per image.
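Step (3) of the pipeline discards depth outliers among the candidate boxes. The text does not specify the exact rule, so the sketch below assumes a simple robust criterion: keep a box only if its representative MiDaS depth lies within k median absolute deviations of the stand-wide median.

```python
from statistics import median

def filter_boxes_by_depth(boxes, box_depths, k=2.5):
    """Drop candidate leaf boxes whose representative (e.g. median) relative depth
    deviates strongly from the crop row, i.e. likely foreground or background
    interference. `boxes` are (x1, y1, x2, y2) tuples; `box_depths` holds one
    depth value per box sampled from a MiDaS depth map.
    The median/MAD threshold is an illustrative assumption, not the paper's rule."""
    m = median(box_depths)
    mad = median(abs(d - m) for d in box_depths) or 1e-6  # guard against zero MAD
    return [b for b, d in zip(boxes, box_depths) if abs(d - m) / mad <= k]
```

A median/MAD rule is preferred over mean/standard deviation here because a single far-background detection would otherwise inflate the threshold it is tested against.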
Figure 3. Relationship between the counted leaf number and the manually observed node number in Trial 1 (2024 seasonal crop). The leaf numbers were obtained using a hybrid leaf-count–based estimation model without additional domain adaptation. A strong linear relationship was observed among the three growers (A–C), indicating that the model successfully captured node development under varying field conditions.
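The linear relationship in Figure 3 amounts to a straight-line calibration from automated leaf counts to node numbers. A minimal ordinary least-squares fit is sketched below; the function names are illustrative, and the actual coefficients were derived from the Trial 1 field measurements.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b, e.g. node_number = a*leaf_count + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx

def predict_nodes(leaf_count, a, b):
    """Apply the calibrated line to a new automated leaf count."""
    return a * leaf_count + b
```

Because the calibration is a single global line, it transfers to new seasons and growers (Trials 2 and 3) without retraining, which is exactly the zero-shot deployment the study evaluates.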
Figure 4. Overall relationship between estimated and observed node numbers in Trial 3 (2025 seasonal crop). The node count estimation model developed in the previous year (Trial 1) was applied without retraining. The estimated values were in good agreement with manual observations (R2 = 0.768, MAE = 1.14, n = 46), demonstrating year-to-year reproducibility across multiple growers and cultivars.
Figure 5. Regression results of estimated versus observed node numbers for each grower in Trial 3 (2025 seasonal crop). Although the early growth stages showed consistent underestimation across growers, mid- to late-stage trends for growers A and D aligned closely with the 1:1 line. Grower C exhibited a persistent underestimation throughout the monitoring period. Overall, the estimation errors remained within approximately one node for all the growers.
Figure 6. Regression results of estimated versus observed node numbers for each cultivar in Trial 3 (2025 seasonal crop). All four cultivars exhibited positive linear relationships between the estimated and observed node numbers, indicating that the model performed consistently across the different maturity types. Happiness White (HW), which was included in model development during Trial 1, showed relatively high agreement with observed values, while the late-maturing Largo Marine (LM) displayed the smallest dispersion around the 1:1 line. Mid-maturing cultivars (Julius Lavender (JL), Celebrich White (CW)) demonstrated moderate deviations.
Figure 7. Example of automated growth-monitoring notifications posted to Discord. Daily node-number estimates and representative leaf-detection images were automatically generated using a Python script and posted to a shared Discord channel. The left panel shows the temporal transition of the estimated node numbers for each grower and cultivar, whereas the right panel displays a representative image with overlaid detected leaves. Yellow bounding boxes indicate automatically detected leaf regions. This notification system enables real-time sharing of growth information among growers, extension officers, and researchers, facilitating discussion and decision-making related to crop management.
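The daily notifications in Figure 7 can be driven by a Discord webhook. The message wording and helper names below are assumptions (the real system also attaches the node-trend plot and analysis image), but the JSON POST with a `content` field matches Discord's webhook API.

```python
import json
import urllib.request

def build_message(date_str: str, grower: str, cultivar: str, nodes: float) -> dict:
    """Discord webhook payload; 'content' is the plain-text message body."""
    return {"content": f"[{date_str}] Grower {grower} / {cultivar}: estimated {nodes:.1f} nodes"}

def post_webhook(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the channel's webhook URL; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because webhooks need no bot account or login on the sending side, a single script on the analysis machine can broadcast to a channel shared by growers, extension officers, and researchers.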
Table 1. Summary of the three field trials conducted in this study. The table lists cropping type, growers involved, planting dates, and cultivars used in each trial.

Experiment | Cropping Type (Year) | Objective | Planting Date | Monitoring Start–End Dates | Grower | Cultivar
Trial 1 | Seasonal (2024) | Model Construction | 19 April 2024 | 8 May 2024–23 July 2024 | A, B, and C | HW
Trial 2 | Retarding (2024) | Accuracy Evaluation | 23 August 2024 | 6 September 2024–26 October 2024 | A | AP
Trial 3 | Seasonal (2025) | Reproducibility Assessment | 17 April 2025 | 8 May 2025–26 June 2025 | A, B *, C, and D | HW, JL, CW, LM
* Grower B was excluded from the image analysis because of severe soil-borne diseases that significantly suppressed plant growth.
Table 2. Prediction accuracy of the node-estimation model in Trial 1. The mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2) were calculated using weekly field measurements from the three growers.

Grower | Sample Size (n) | MAE (Nodes) | RMSE (Nodes) | R2
A, B, and C | 26 | 0.73 | 0.91 | 0.930
A | 8 | 0.83 | 1.02 | 0.873
B | 8 | 0.68 | 0.84 | 0.936
C | 10 | 0.57 | 0.76 | 0.952
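The MAE, RMSE, and R2 values reported in Tables 2–4 and 6 follow their standard definitions; a self-contained sketch of the computation (function name is illustrative):

```python
import math

def regression_metrics(observed, estimated):
    """MAE, RMSE, and coefficient of determination (R2) for node-count estimates."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    mae = sum(abs(err) for err in errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(err ** 2 for err in errors)          # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

Note that RMSE penalizes occasional large misses more than MAE, which is why the two diverge most for growers with unstable early-stage leaf detection.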
Table 3. Accuracy metrics of node-count estimation for each grower in Trial 3 (2025 seasonal crop). Summary of estimation accuracy evaluated using mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2). All growers demonstrated MAE values within 0.91–1.32 nodes, indicating that the model maintained practical accuracy across different production environments.

Grower | Sample Size (n) | MAE (Nodes) | RMSE (Nodes) | R2
A, C, and D | 46 | 1.14 | 1.38 | 0.768
A | 15 | 1.32 | 1.51 | 0.728
C | 16 | 1.19 | 1.51 | 0.727
D | 15 | 0.91 | 1.06 | 0.849
Table 4. Accuracy metrics of node-count estimation for each cultivar in Trial 3 (2025 seasonal crop). Summary of mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R2) for the four cultivars. The estimation errors remained within 1.3 nodes across all cultivars, with the late-maturing Largo Marine (LM) achieving the highest accuracy (MAE = 0.93), suggesting that the model performance may vary with the maturity class of the cultivar.

Cultivar | Maturity Class | Sample Size (n) | MAE (Nodes) | RMSE (Nodes) | R2
JL | Early–Mid Season | 12 | 1.30 | 1.51 | 0.643
CW | Mid Season | 11 | 1.17 | 1.33 | 0.716
HW | Mid–Late Season | 11 | 1.11 | 1.35 | 0.774
LM | Late Season | 12 | 0.93 | 1.22 | 0.864
Table 5. Computational complexity of YOLOv5 and the proposed multi-stage pipeline. Comparison of model size (parameters), computational cost (FLOPs), inference latency, and peak GPU memory usage measured at an input resolution of 1280 × 1280 pixels with batch size 1 on an NVIDIA Tesla T4 GPU. Latency was measured for model execution excluding disk input/output operations.

Method | Params (M) | FLOPs (GFLOPs) | Latency (ms) | Peak GPU Mem (MB)
YOLOv5 (detector) | 7.0 | 163 | 34.4 | 119
Proposal (GDINO + YOLOcls + MiDaS) | 518.3 | 860 | 1811 | 4830
Table 6. Accuracy of node number estimation for YOLOv5 and the proposed method. Comparison of the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) across three demonstration trials. Trial 1 was used for model construction, Trial 2 for cross-season validation, and Trial 3 for reproducibility assessment across different growers and cultivars.

Trial | Method | Sample Size (n) | MAE (Nodes) | RMSE (Nodes) | R2
Trial 1 | YOLOv5 | 25 | 1.21 | 1.73 | 0.723
Trial 1 | Proposal | 26 | 0.73 | 0.91 | 0.930
Trial 2 | YOLOv5 | 3 | 0.69 | 0.84 | –
Trial 2 | Proposal | 3 | 0.45 | 0.46 | –
Trial 3 | YOLOv5 | 46 | 1.16 | 1.45 | 0.745
Trial 3 | Proposal | 46 | 1.14 | 1.38 | 0.768

Share and Cite

MDPI and ACS Style

Naito, H.; Kobayashi, K.; Inaba, O.; Hosoi, F.; Hoshi, N.; Yamashita, Y. Hybrid Zero-Shot Node-Count Estimation and Growth-Information Sharing for Lisianthus (Eustoma grandiflorum) Cultivation in Fukushima’s Floricultural Revitalization. Agriculture 2026, 16, 296. https://doi.org/10.3390/agriculture16030296


