Article

Prediction of Optimal Harvest Timing for Melons Through Integration of RGB Images and Greenhouse Environmental Data: A Practical Approach Including Marker Effect Analysis

1
Department of Smart Agriculture Major, Sunchon National University, Suncheon 57922, Republic of Korea
2
Jeollanam-do Agricultural Research & Extension Service, Naju 58213, Republic of Korea
3
Department of Convergence Biosystems Mechanical Engineering, Sunchon National University, Suncheon 57922, Republic of Korea
*
Author to whom correspondence should be addressed.
Agriculture 2026, 16(2), 169; https://doi.org/10.3390/agriculture16020169
Submission received: 28 November 2025 / Revised: 30 December 2025 / Accepted: 31 December 2025 / Published: 9 January 2026

Abstract

Non-destructive prediction of harvest timing is increasingly important in greenhouse melon cultivation, yet image-based methods alone often fail to reflect environmental factors affecting fruit development. Likewise, environmental or fertigation data alone cannot capture fruit-level variation. This gap calls for a multimodal approach integrating both sources of information. This study presents a fusion model combining RGB images with environmental and fertigation data to predict optimal harvest timing for melons. A YOLOv8n-based model detected fruits and estimated diameters under marker and no-marker conditions, while an LSTM processed time-series variables including temperature, humidity, CO2, light intensity, irrigation, and electrical conductivity. The extracted features were fused through a late-fusion strategy, followed by an MLP for predicting diameter, biomass, and harvest date. The marker condition improved detection accuracy; however, the no-marker condition also achieved sufficiently high performance for field application. Diameter and weight showed a strong correlation (R2 > 0.9), and the fusion model accurately predicted the actual harvest date of 28 August 2025. These results demonstrate the practicality of multimodal fusion for reliable, non-destructive harvest prediction and highlight its potential to bridge the gap between controlled experiments and real-world smart farming environments.

1. Introduction

Smart greenhouses have been established as core agricultural infrastructure to precisely manage limited water resources and fertilizers, simultaneously enhancing quality and productivity. Particularly in hydroponic cultivation, decisions on adjusting the volume and cycle of irrigation in response to hourly transpiration changes, monitoring of both irrigation and electrical conductivity (EC), and recycling of spent nutrient solution are critical to maintain an efficient use of water and lessen the environmental impact [1,2,3]. Recent systematic reviews have highlighted the limitations of greenhouse irrigation modes based on empirical conventions; in contrast, the use of feedback control combining sensors, models, and AI, along with fertigation-based management (drainage rate and irrigation EC) delivers significant savings in water and fertilizer use while reducing the quantity of discharges (water, nutrients, and pesticides) [1,2,3].
In greenhouse cultivation of melons, both in South Korea and abroad, it has been repeatedly found that drainage rate and irrigation EC levels significantly affected marketability indicators, namely fruit weight, fruit shape index, and sugar content. In melons grown on coir substrate, for example, a drainage rate of 30% during the fruiting period and medium EC levels (1.5–1.8–2.0 dS·m−1) were beneficial for the growth and quality of the melons, and a 30% drainage rate treatment significantly increased fruit weight in spring cultivations [4,5]. In response to the challenge of making consistent decisions on optimal harvest timing in the field, machine learning research has emerged as a non-destructive approach to predicting melon harvest indicators (e.g., predicted yield, sugar content, flesh hardness) at an early stage [6]. Recently, a number of new methodologies have been developed that automatically assess maturity and traits solely based on RGB images taken in greenhouses or that classify maturity stages in real-time using lightweight object detection models [7,8,9]. The present study builds on the results from these previous studies, with the goal of accurately predicting optimal melon harvest timing by integrating RGB images with environment and irrigation/drainage data of greenhouses and quantitatively comparing performance and feasibility (marker effect) based on the presence or absence of markers (scale/color).
In recent studies, image-based analysis techniques have been recognized as a novel approach for monitoring crops and determining harvest timing. However, it is difficult for image-based analysis alone to fully capture the complex environmental changes within a greenhouse. Environmental factors such as temperature, humidity, CO2 concentration, and solar radiation directly influence photosynthetic rates and fruit enlargement, and irrigation and drainage data are key indicators reflecting the soundness of the root environment and nutrient uptake efficiency [10,11]. For example, it has been reported that a recycling system for spent nutrient solution or a strategy controlling drainage rate could reduce water and fertilizer usage while simultaneously contributing to the reduction in soil and water pollution [12,13]. This is considered a crucial factor that determines the final quality and harvest timing of crops, extending beyond environmental aspects.
In image-based measurement, markers are often employed for calibration. Markers can improve the accuracy of fruit size measurement by providing a reference for length or area within the image. Indeed, computer vision research incorporates color markers or scale bars taken with images to reduce model training errors [14]. In agricultural fields, however, installing and maintaining markers is cumbersome and impractical in the long term. It is therefore important to quantitatively compare performance of monitoring systems with and without markers to validate their applicability in agricultural fields [15].
Thus, the aim of this study was to predict the optimal harvest timing for melons by integrating RGB images, greenhouse environmental data, and irrigation and drainage data, and to quantitatively evaluate differences in performance between marker and no-marker conditions. In contrast with previous studies that relied solely on image-based or environmental data-based analyses, this study adopts a multimodal fusion approach combining image-derived fruit traits with environmental and fertigation dynamics. To the best of our knowledge, this is the first study to provide a quantitative comparison of marker effects for melon harvest prediction within a unified fusion framework, thereby improving the field applicability that prior studies have identified as essential.
Despite recent advances in image-based or environment-based approaches for crop maturity and harvest prediction, several limitations remain. Many existing studies rely on a single data modality, which restricts their ability to capture both fruit-level phenotypic variation and temporal environmental dynamics. In addition, most image-based studies assume controlled calibration conditions, while the practical applicability of markerless measurement in real greenhouse environments has rarely been quantitatively evaluated.
To address these gaps, this study makes the following contributions.
First, we propose a multimodal fusion framework that integrates RGB image-based fruit traits with time-series greenhouse environmental and fertigation data to predict melon harvest readiness in a non-destructive manner.
Second, we quantitatively evaluate the effect of calibration markers by comparing prediction performance under marker and no-marker conditions, thereby assessing the feasibility of field-level deployment.
Third, we demonstrate that the proposed CNN–LSTM–MLP fusion model can accurately predict the actual harvest date in a commercial greenhouse setting, bridging the gap between experimental calibration and practical smart farming applications.

2. Related Works

2.1. Research on Image-Based Fruit Detection and Growth Monitoring

Because of their non-destructive and automatable nature, image-based techniques are actively utilized in agriculture to measure growth. In particular, the advancement of deep learning-based object detection techniques has played an increasingly important role in estimating the location, number, size, and maturity of individual fruit. By applying a convolutional neural network (CNN)-based methodology, namely the Faster R-CNN model, to apples, mangoes, and almonds, Bargoti and Underwood [16] reported reliable fruit detection performance even in complex orchard settings. Mirhaji et al. [17] validated the capability of YOLO-V4 to quantify fruit load based on RGB images in orange orchards. Additionally, Afonso et al. [18] demonstrated the practical feasibility of Mask R-CNN for fruit detection and counting on tomatoes in greenhouse settings.
Recent research has been extended to fruit size estimation beyond simple detection. Kim et al. [19] proposed a system for estimating the diameter of plums using RGB-D images and demonstrated the effectiveness of image–depth fusion techniques through a comparison of several detection models (Faster R-CNN, EfficientDet, and SSD). Ferrer-Ferrer et al. [20] introduced an approach that simultaneously performs fruit detection and size estimation using a multi-task neural network, thereby enhancing processing efficiency.
In addition to primary research, there are a number of reviews of the evolving methodology in crop monitoring. Abebe et al. [21] reviewed image-based high-speed phenotyping technologies in the management of horticultural crops and also discussed the potential for integrating RGB, thermal, fluorescence, and hyperspectral sensors for estimating growth, yield, and quality traits. Tong et al. [22] analyzed the scope of deep learning-based plant growth monitoring research and reported a recent trend toward expanding from single-time point classification to time-series growth estimation. In an analysis of the latest trends in fruit detection and identification, Xiao et al. [23] noted that the major challenges included small fruit size, occlusion, insufficient data, and model lightweighting.

2.2. Research Connecting Environmental Data and Growth

The major environmental factors within greenhouses (temperature, relative humidity, CO2 concentration, light intensity) are known to be key determinants of fruit growth and yield variation. Mohmed et al. [24] applied a Bayesian neural network using data from Chinese solar greenhouses to quantify the effects of changes in the temperature, relative humidity, CO2 concentration and radiation levels in the greenhouse on daily growth rates and yield indicators (fresh weight, dry weight, leaf area, etc.). This approach suggests that the trajectory of environmental conditions itself could be used as an important indicator for predicting optimal harvest timing.
In a growing number of studies, deep learning approaches are utilized to capture the nonlinear relationship between greenhouse environments and crop growth. Gong et al. [25] predicted tomato yield from greenhouse data using a time-series model (combining TCN and RNN), achieving lower error rates than conventional machine learning-based models. Moreover, the authors found that input configurations considering both environmental factors and past yields resulted in optimal performance, indicating the necessity of multimodal data fusion.
In a study conducted in South Korea, Sim et al. [26] predicted strawberry growth based on environmental data. Specifically, the yield of soil-grown strawberries in a greenhouse was predicted by combining the environmental data (temperature, humidity, soil moisture, etc.) with growth data. The authors found that vapor pressure deficit, photosynthetically active radiation, and relative humidity were important variables for yield, whereas air and soil temperatures were significant variables for growth. The results showed that yield could be predicted with high accuracy using environmental data alone.
In addition, there has been research considering the linkage between environmental control and growth. Mahmood et al. [27] proposed a robust MPC framework by combining an interpretive energy balance model with a data-based neural network model for greenhouse temperature control, which reduced seasonal energy consumption by up to 23.61% while lowering temperature tracking error. These results showed that the environmental control strategy could regulate crop growth and maturation rates in an indirect manner.
In particular, CO2 concentration is a key variable that is directly linked to photosynthesis and fruit quality in C3 crops. Wang et al. [28] reported that an elevation to intermediate CO2 concentrations of 550–650 μmol∙mol−1 increased the average yield of C3 crops by approximately 18% and also improved photosynthetic efficiency, quality, and water use efficiency at CO2 concentrations of 800 to 1000 μmol∙mol−1. These results suggest that CO2 management could potentially be used as a supplementary factor for predicting optimal harvest timing in C3 crops, such as melons.

2.3. Research on Yield Prediction Through the Fusion of Image and Environmental Data

There have been several recent attempts to combine object-level representations provided by image information with the temporal and physiological context provided by environmental sensors. Wen et al. [29] combined RGB image features collected in a greenhouse with time-series environmental records (temperature, humidity, light intensity, watering, etc.) at the individual fruit level, to predict sugar content in strawberries. The authors found that the combination of image and environmental data significantly reduced errors compared to single-modal methods. This is an empirical case that demonstrated the utility of multimodal fusion in predicting quality or ripeness at the level of individual fruit.
Abd-Elrahman et al. [30] combined canopy indicators (area, volume, height, etc.) extracted from ground images with meteorological data and historical yield information, which improved the accuracy of strawberry yield prediction by 10–29% at 3–21 days before harvest. This outcome suggests that modeling based on a combination of image-derived variables and environmental/history variables could be beneficial in predicting short-term harvest timing.
There is also a reported case of fusion for direct prediction of harvest date. Nakano et al. [31] combined estimation of leaf age based on drone images with the daily average temperature (cumulative temperature) from weather mesh data in order to predict the harvest date (reaching 40 leaf days old) of field-grown lettuce, with an average error of 2.35 days, meeting the practical standard (±3.5 days). This showed that a combination of growth indicators estimated from images with an environmental (temperature) model could be directly applied to the prediction of optimal harvest timing.
Furthermore, under greenhouse conditions, Lin et al. [32] integrated time-series image features with multiple environmental and growth-related traits and proposed a two-stage multi-feature fusion model for predicting the harvest date of strawberries. The proposed approach integrated the maturation progress information provided by the time-series images with the variation in maturation rate provided by the environmental time-series, thereby improving the prediction performance.
In short, monitoring strategies that integrate images and environmental data have been evolving, with the shared aim of increasing the feasibility of predicting optimal harvest timing and short-term yield. These research efforts are based on the assumption that combined input modeling can more accurately capture the maturity status of individual plants and predict the progression of maturation rates based on meteorological and management factors, both in greenhouses and in open field cultivation. The research question of the present study is in line with this trend, the goal being to predict optimal harvest timing for melons through the empirical integration of RGB images and environmental data of greenhouses.

2.4. Comparative Research on Marker-Based and No-Marker-Based Approaches

One of the most significant challenges in image-based fruit size estimation is the issue of scale calibration. In an early study addressing this issue, a method incorporating attached reference markers was adopted for size estimation. Gongal et al. [33] applied a 3D machine vision system to estimate the diameter and volume of apples and found that calibration using markers could achieve high agreement with actual measurements. However, it has been pointed out that the feasibility of marker-based approaches is limited due to the difficulties in installation and management of markers, the need for worker intervention, and the masking of markers by fruits or leaves.
Accordingly, investigators have pursued no-marker (markerless) approaches. Ferrer-Ferrer et al. [20] simultaneously performed fruit detection and size estimation using a multi-task neural network, thereby demonstrating the feasibility of accurate size estimation without using any reference markers. In addition, Gené-Mola et al. [34] extracted dimensional features of apple fruits from the depth information of an RGB-D camera and, on that basis, proposed a multimodal training model to estimate the size and volume of fruits. Furthermore, Bortolotti et al. [35] demonstrated that a computer vision system combining a depth camera and a neural network can estimate fruit size without markers, even in the field; that is, under conditions that differ from those of the laboratory.
The results of these studies suggest that while marker-based approaches have advantages—such as relative simplicity and high initial accuracy—they have limitations in terms of automation and large-scale application. In contrast, no-marker approaches require complex modeling and large training data but may be more advantageous for securing field automation and feasibility in the long term. In this context, we aimed to analyze the performance of models predicting optimal harvest timing, with or without the use of markers, thereby assessing the practical applicability of these approaches. Table 1 summarizes the related studies described above.

3. Materials and Methods

3.1. Experimental Environment in the Greenhouse

This study was conducted in the Energy-Self-Sufficient Smart Farm Research Greenhouse at the Jeonnam Agricultural Research and Extension Services (1508, Senam-ro, Sanpo-myeon, Naju-si, Jeollanam-do, Republic of Korea, 58213). The cultivars used were ‘Damas’ and ‘Supia’ melons, which were transplanted on 10 June 2025; fertilizer was added between 1 July and 4 July. On 28 August, cultivation ended, and the melons were harvested.
The mode of hydroponic cultivation was as follows. The melons were grown in a coir substrate, incorporating an automated irrigation and drainage system. To manage the nutrient solution, the drainage rate was controlled to maintain approximately 30%, and EC was controlled within the range of 1.5–2.0 dS·m−1, depending on the growth stage. The temperature in the greenhouse was maintained at 25–30 °C during the day and 18–22 °C at night, with relative humidity maintained at 60–70%. To activate photosynthesis, CO2 concentration was controlled within the range of 400–800 μmol·mol−1.
We installed various environmental sensors and control devices inside the greenhouse. Temperature and relative humidity sensors were positioned at the height of the crop canopy, and the CO2 sensor was placed near the central pathway. To measure light conditions, we installed a photosynthetically active radiation (PAR) sensor at the center of the ceiling. Irrigation and drainage volume, along with EC, were recorded via flow meters and sensors connected to an automatic nutrient solution supply system. Environmental and fertigation data were measured at 1 min intervals and were stored on a local server. Timestamps were also recorded so that these records could be synchronized with the image data.
The experimental period was from 10 June 2025 (transplantation) to 28 August (harvest), during which time the environmental data and image data were continuously collected within the greenhouse. The environmental data collected in this study included temperature, humidity, CO2 concentration, light intensity, irrigation and drainage volumes, as well as EC in irrigation and drainage. These data were subsequently combined with image data for fusion analysis.
Shown in Figure 1 and Figure 2 are external views of the experimental greenhouse. The key parameters of the experimental setup are listed in Table 2.

3.2. Image Data Acquisition System

Image data were acquired while moving a standard USB webcam mounted on a rail-based lift along the greenhouse pathways. The camera was positioned to face the equatorial region of the fruit and stored images at regular intervals while moving at a constant speed along the rail from its starting point to its end point. Images were acquired once every week, in the morning (10:00 a.m.), from fruit set through harvest (early July–28 August). All timestamps were recorded in KST (UTC+9) so that the images could be synchronized with the environmental and fertigation time series.
Camera settings were fixed to the 8-bit RGB color space, a resolution of 1280 × 720 px (HD), and the lossless PNG storage format. File names followed the format rgb_YYYYmmdd_HHMMSS.png, which encodes the capture date and time and makes separate metadata files unnecessary. Data were organized hierarchically by date, camera, and marker presence as follows:
/2025.08.25/cam/rgb/marker/rgb_20250825_105634.png
/2025.08.25/cam/rgb/nomarker/rgb_20250825_105648.png
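Because the capture time and marker condition are encoded in the path itself, they can be recovered without a metadata file. The following is a minimal sketch of such a parser, assuming the directory layout shown above; the function name parse_image_path and the returned dictionary keys are hypothetical and are not taken from the study's code.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

# KST is UTC+9, as stated in the text.
KST = timezone(timedelta(hours=9))


def parse_image_path(path: str) -> dict:
    """Recover capture time and marker condition from a dataset path.

    Assumes the layout /<date>/cam/rgb/<condition>/rgb_YYYYmmdd_HHMMSS.png.
    """
    p = PurePosixPath(path)
    # The stem looks like "rgb_20250825_105634"; drop the "rgb_" prefix.
    stamp = p.stem.split("_", 1)[1]
    captured = datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=KST)
    # The parent directory name is "marker" or "nomarker".
    return {"captured": captured, "condition": p.parent.name}


info = parse_image_path("/2025.08.25/cam/rgb/marker/rgb_20250825_105634.png")
print(info["captured"].isoformat(), info["condition"])
```

A timezone-aware timestamp of this kind can then be matched against the 1 min environmental records by rounding to the nearest minute.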
To obtain an absolute scale for fruit size estimation, we utilized a 100 mm × 100 mm ArUco marker. This approach, which uses markers as calibration reference points within images, is widely employed and has been shown to be effective in enhancing the accuracy of fruit diameter and volume estimation [33,34]. The dataset was collected under both marker and no-marker conditions, enabling a quantitative comparison of calibration accuracy and field applicability.
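The idea of using the marker as a scale reference can be illustrated with a short calculation: the known 100 mm side length divided by the marker's side length in pixels gives a mm-per-pixel factor, which is then applied to the fruit width in pixels. This is only a sketch under the assumption that the marker and the fruit lie at roughly the same distance from the camera; the corner coordinates and fruit width below are made-up values, and the function names are hypothetical.

```python
import math

MARKER_SIDE_MM = 100.0  # physical side length of the ArUco marker (from the text)


def side_length_px(corners):
    """Mean side length in pixels of a quadrilateral given 4 (x, y) corners in order."""
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return sum(sides) / 4.0


def diameter_mm(fruit_width_px, marker_corners):
    """Convert a fruit width in pixels to millimetres using the marker as scale."""
    mm_per_px = MARKER_SIDE_MM / side_length_px(marker_corners)
    return fruit_width_px * mm_per_px


# Made-up example: the marker appears 200 px wide, the fruit 290 px wide.
corners = [(100, 100), (300, 100), (300, 300), (100, 300)]
print(diameter_mm(290, corners))
```

Under the no-marker condition no such factor is available from the image, which is why the two conditions are compared separately.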
Shown in Figure 3 is a webcam moving along a rail inside the greenhouse and the ArUco marker placed beside a fruit.

3.3. Data Labeling and Alignment

The collected image data were labeled for training the object detection and size estimation models. The Computer Vision Annotation Tool (CVAT) was used to label the images; this tool is similar to EXACT [36] and Quick Annotator [37]. In addition, a dedicated research project (melon_harvest) was created to manage the data.
Two classes of labels were defined: Melon and Marker. All objects were specified as a polygon (segmentation) type. As for the Melon class, we reflected the actual boundary of the fruit as accurately as possible. When fruits were visually obstructed by leaves, vines, or greenhouse structures, the presence or absence of occlusion was annotated using a binary Boolean attribute (occluded: true/false). This attribute was recorded for data description and analysis and was not directly used as a model input. The Marker class was based on the 100 mm × 100 mm ArUco marker placed within the image, which was used as the calibration reference for the estimation of fruit size.
We labeled only the images assigned to the Training set, not the entire dataset. The validation (Val) and test (Test) sets were intentionally left unlabeled so that model performance could be evaluated independently. This design ensured a sufficient amount of labeled data for model training while maintaining the integrity of the evaluation data.
A summary of the labeling rules is shown in Table 3, and some examples of the labeling are shown in Figure 4: panel (a) shows a case where both the melon and marker were labeled under the marker condition, and (b) is an example where only the fruit was labeled under the no-marker condition.

3.4. Organization of Dataset

The final dataset consisted of a total of 1112 RGB images. The dataset was partitioned into Training, Validation, and Test sets, with 545, 149, and 418 images, respectively.
To partition the dataset, a domain holdout strategy was applied rather than simple randomization. Images collected on 25 August 2025 were used as the Training and Validation sets (approximately 8:2 ratio), and all the images captured on 8, 12, and 18 August 2025 were used as the Test set. Because the training and evaluation data came from different dates, this split tests the temporal generalization of the model.
Images were managed hierarchically based on acquisition date, camera (Webcam A and Webcam B), and marker presence (marker and no-marker). Webcam A and Webcam B were standard USB cameras installed at different locations, both with the same resolution (1280 × 720 px) and operating in RGB mode. With respect to marker presence, images were divided into those with and those without a 100 mm × 100 mm ArUco marker.
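A date-based holdout of this kind can be sketched as follows. The dates come from the text, but the function name holdout_split, the record format, the random seed, and the shuffling step within the 25 August pool are assumptions made for illustration; the study does not describe how images were assigned within that pool.

```python
import random

# Acquisition dates taken from the text (ISO format).
TRAIN_VAL_DATES = {"2025-08-25"}
TEST_DATES = {"2025-08-08", "2025-08-12", "2025-08-18"}


def holdout_split(records, val_fraction=0.2, seed=0):
    """Split (image_name, iso_date) records by acquisition date.

    Test images come only from TEST_DATES; the remaining pool is
    shuffled and divided into training and validation subsets.
    """
    splits = {"train": [], "val": [], "test": []}
    pool = []
    for name, day in records:
        if day in TEST_DATES:
            splits["test"].append(name)
        elif day in TRAIN_VAL_DATES:
            pool.append(name)
    random.Random(seed).shuffle(pool)
    n_val = round(len(pool) * val_fraction)
    splits["val"] = pool[:n_val]
    splits["train"] = pool[n_val:]
    return splits


# Hypothetical records: 10 images from 25 August, 5 from 12 August.
records = [(f"a{i}.png", "2025-08-25") for i in range(10)]
records += [(f"b{i}.png", "2025-08-12") for i in range(5)]
splits = holdout_split(records)
print({k: len(v) for k, v in splits.items()})
```

The key property is that no acquisition date contributes to both the training pool and the test set.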
The details of the dataset distribution are summarized in Table 4.

3.5. Fusion Model Design

In this study, we combined multimodal inputs (image data and environmental/fertigation data) and designed a fusion model to analyze melon growth status and predict optimal harvest timing. The model follows a CNN–LSTM structure with late fusion, an approach that has been reported to be effective in studies predicting crop growth [29,32].
With respect to image data, RGB images acquired under marker and no-marker conditions were used as input to detect melon objects and estimate their sizes, for which a CNN-type object detection network based on the YOLOv8n segmentation model was applied [17,31]. The model output consisted of fruit detection results (location and count) and estimated diameter values, which were subsequently combined with environmental and fertigation data.
Environmental and fertigation data included temperature, relative humidity, CO2 concentration, light intensity, irrigation/drainage volumes, and irrigation/drainage ECs. These data were processed as time-series inputs, and a long short-term memory (LSTM) network, a type of recurrent neural network (RNN), was adopted to capture both short-term and long-term dependencies [25,26]. The output from the LSTM module was converted into feature vectors summarizing changes in environmental factors and fertigation management conditions.
In the final fusion stage, we combined the feature vectors extracted from the CNN and LSTM using a concatenation method and entered them into the MLP. The final output consisted of (1) estimated fruit diameter, (2) estimated biomass, and (3) predicted harvest date.
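The late-fusion step itself is simple: the two feature vectors are concatenated and passed through fully connected layers that produce three outputs. The sketch below shows only that data flow in plain Python. The feature values, vector lengths, hidden size, and random untrained weights are all placeholders chosen for illustration, and the CNN and LSTM feature extractors are not reproduced, so the printed numbers carry no meaning.

```python
import random


def dense(x, weights, bias, relu=True):
    """One fully connected layer: y = W x + b, optionally followed by ReLU."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def late_fusion_forward(image_feat, env_feat, layers):
    """Concatenate the two feature vectors and run them through a 2-layer MLP."""
    fused = list(image_feat) + list(env_feat)  # late fusion by concatenation
    hidden = dense(fused, *layers[0])
    return dense(hidden, *layers[1], relu=False)  # diameter, biomass, harvest date


rng = random.Random(0)
image_feat = [0.4, 0.7, 0.1, 0.9]  # placeholder for image-derived features
env_feat = [0.2, 0.5, 0.3]         # placeholder for the LSTM summary vector
layers = [init_layer(7, 8, rng), init_layer(8, 3, rng)]
outputs = late_fusion_forward(image_feat, env_feat, layers)
print(len(outputs))
```

Because fusion happens after each branch has produced its own features, either branch can be retrained or replaced without changing the other.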
This CNN–LSTM–MLP fusion structure was expected to provide more accurate predictions than models based on a single modality (RGB images or environmental data alone). Specifically, the model was designed to quantify the performance difference between laboratory-calibrated conditions and actual application in agricultural fields by comparing marker and no-marker conditions.
Shown in Figure 5 is a schematic diagram of the fusion model proposed in this study.

3.6. Environmental Data–Based Prediction Model

3.6.1. Overview of Environmental, Growth, and Fertigation Data

Environmental data were collected from the experimental greenhouse for approximately 80 days, from the transplanting date of 10 June 2025 to the harvest date of 28 August 2025. Measurements were recorded at one-minute intervals and included internal temperature, relative humidity, CO2 concentration, and light intensity. Irrigation and drainage-related variables were also monitored, including irrigation volume, drainage volume, and electrical conductivity (EC), which were used to assess water and nutrient stress within the root-zone environment.
Growth data were obtained through destructive sampling of melon fruits and pre-harvest fruit survey files. Diameter and weight measurements were linked to the corresponding environmental conditions on each date to construct the ground-truth growth timeline.
These datasets were used to develop a predictive model estimating melon growth and harvest timing based solely on environmental and fertigation factors.

3.6.2. Derived Environmental Variables

To incorporate the temporal dynamics of the greenhouse environment, multiple derived variables were computed from the raw sensor data. These included vapor pressure deficit (VPD), daily solar radiation, growing degree days (GDD), 7-day moving average temperature, 7-day moving average VPD, 7-day average CO2 concentration, and the cumulative solar radiation over the preceding 7 days.
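Several of these derived variables can be computed with a few lines of code. The text does not state which formulas were used, so the sketch below rests on assumptions: VPD is computed from the Tetens approximation of saturation vapor pressure, GDD uses a hypothetical base temperature of 10 °C, and the 7-day averages are trailing windows that shorten at the start of the series. All function names are illustrative.

```python
import math


def vpd_kpa(temp_c, rh_percent):
    """Vapor pressure deficit in kPa (Tetens approximation, assumed here)."""
    saturation = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))
    return saturation * (1.0 - rh_percent / 100.0)


def cumulative_gdd(daily_mean_temps, base_c=10.0):
    """Running sum of growing degree days; base_c is a hypothetical base temperature."""
    total, out = 0.0, []
    for t in daily_mean_temps:
        total += max(0.0, t - base_c)
        out.append(total)
    return out


def trailing_mean(values, window=7):
    """Trailing moving average; early entries use the days available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


print(round(vpd_kpa(25.0, 60.0), 3))   # VPD at 25 degC and 60 % RH
print(cumulative_gdd([12.0, 8.0, 15.0]))
print(trailing_mean([1.0, 2.0, 3.0], window=2))
```

A 7-day cumulative solar radiation follows the same windowing pattern with a sum in place of the mean.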
These derived metrics capture short-term and long-term variations in heat accumulation, water stress, photosynthetic activity, and environmental load—factors known to drive fruit enlargement and maturation.
Each derived metric was matched with pre-harvest fruit diameter measurements (average of fruit length and width) by date to construct the training dataset.

3.6.3. Regression Model Construction

Because of the limited sample size, a linear regression model was employed to predict melon fruit diameter from the derived environmental variables. The input features consisted of cumulative GDD, daily and weekly VPD, 7-day moving average temperature, 7-day cumulative solar radiation, and 7-day average CO2 concentration.
The goal of this model was to determine the extent to which environmental conditions alone could reproduce the fruit growth trajectory and estimate the harvest threshold diameter.

3.6.4. Cross-Validation Procedure

Model performance was evaluated using Leave-One-Out Cross-Validation (LOOCV), a method suitable for small datasets. For each iteration, one sample was excluded as the test instance, and the model was trained on the remaining samples. Prediction accuracy was assessed using the mean absolute error (MAE) and root-mean-square error (RMSE) between predicted and observed fruit diameters.
This procedure ensured robust evaluation of the environmental-data-only model and enabled direct comparison with the multimodal fusion-based prediction model presented earlier in this study.
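The LOOCV loop described above can be sketched in a few lines. For brevity this example fits a single-predictor least-squares line (for instance, diameter against cumulative GDD), whereas the study's model used several derived variables; the data values are made up, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv(xs, ys):
    """Leave-one-out cross-validation; returns (MAE, RMSE) of held-out errors."""
    errors = []
    for i in range(len(xs)):
        # Train on every sample except the i-th, then predict the i-th.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (slope * xs[i] + intercept))
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up values: cumulative GDD versus fruit diameter (mm).
gdd = [100.0, 250.0, 400.0, 550.0, 700.0]
diameter = [40.0, 75.0, 105.0, 130.0, 148.0]
mae, rmse = loocv(gdd, diameter)
print(mae, rmse)
```

With n samples the model is refitted n times, which is affordable only because the dataset is small.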

3.7. Integrated Data Preprocessing and Harvest Prediction Procedure

During data preprocessing, environmental variables—including cumulative daily solar radiation, temperature, relative humidity, VPD, and EC—were aggregated using the transplanting date (10 June 2025) as the reference point. Fruit diameter values (average of length and width) were interpolated into a continuous daily time series using pre-harvest growth survey records.
To calibrate the integrated prediction model, destructive sampling measurements of fruit diameter and weight were used as reference values. These measurements were aligned with the interpolated diameter timeline to construct a unified growth dataset.
The final harvest prediction was generated by applying the estimated diameter trajectory to a predetermined harvest readiness threshold. The predicted harvest date was defined as the time point at which the estimated fruit diameter reached this threshold level.
To clarify the rationale for threshold selection, a fruit diameter of 150 mm was adopted in this study based on commercial cultivation practices for greenhouse-grown melons in South Korea, where fruits reaching approximately 145–150 mm in diameter are generally regarded as marketable in terms of size and weight. Destructive sampling conducted near the harvest stage further confirmed that melons approaching this diameter corresponded to commercially acceptable biomass levels.
It should be noted that this threshold is not intended to represent a universal maturity criterion applicable to all cultivars, seasons, or cultivation environments. Rather, it serves as a case-specific and practice-driven reference value for validating the proposed prediction framework. The proposed methodology is inherently extensible and can accommodate alternative harvest indicators or threshold values when applied to different cultivars or production conditions.
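The threshold rule described in this section can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify the interpolation method, so linear interpolation between survey dates is assumed here, and the function names and survey values are hypothetical.

```python
from datetime import date, timedelta

def interpolate_daily(survey):
    """Linearly interpolate sparse (date, diameter_mm) records to a daily series."""
    survey = sorted(survey)
    series = []
    for (d0, v0), (d1, v1) in zip(survey, survey[1:]):
        span = (d1 - d0).days
        for k in range(span):
            series.append(v0 + (v1 - v0) * k / span)
    series.append(survey[-1][1])                 # include the final survey value
    return survey[0][0], series

def predict_harvest_date(start, daily_diameters_mm, threshold_mm=150.0):
    """Return the first date on which the diameter reaches the threshold, or None."""
    for offset, diameter in enumerate(daily_diameters_mm):
        if diameter >= threshold_mm:
            return start + timedelta(days=offset)
    return None

# Hypothetical survey records (not measurements from this study).
survey = [
    (date(2025, 8, 1), 100.0),
    (date(2025, 8, 11), 130.0),
    (date(2025, 8, 21), 160.0),
]
start, series = interpolate_daily(survey)
harvest = predict_harvest_date(start, series, threshold_mm=150.0)
```

Keeping the threshold as a parameter reflects the point made above that 150 mm is a case-specific value and can be replaced for other cultivars or production conditions.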

3.8. Model Performance Metrics

Object detection performance was evaluated using mAP@0.5, Precision, and Recall, which are standard metrics for assessing fruit detection [19,20]. Growth prediction performance was assessed using R2, MAE, and RMSE, which are commonly employed in fruit size and biomass regression tasks [19,20].

3.8.1. Evaluating Object Detection Performance

To accurately detect melons within an image, mean average precision (mAP) was used as the main metric. In particular, we calculated mAP at an intersection over union (IoU) threshold of 0.5 (mAP@0.5). In addition, values for Precision and Recall were calculated to comprehensively analyze the detection performance of the model.
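To make the IoU criterion concrete, the sketch below computes IoU for two axis-aligned boxes and derives Precision and Recall from match counts. It is an illustration only: the study uses polygon masks, for which the same intersection-over-union ratio is taken over pixels, and the function names and box coordinates are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical prediction and ground-truth boxes (pixels).
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
is_match_at_05 = iou >= 0.5      # a detection counts as a true positive only if IoU >= 0.5
```

In this example the boxes overlap by half their width, giving an IoU of one third, so the prediction would not count as a true positive at the 0.5 threshold.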

3.8.2. Metrics for Growth Prediction

For regression tasks predicting continuous variables such as fruit diameter, biomass, and predicted harvest date, we utilized the following metrics:
Coefficient of Determination (R2): the proportion of variance in the observed values that the model explains.
Mean Absolute Error (MAE): the average absolute difference between the predicted and measured values.
Root Mean Squared Error (RMSE): the square root of the mean of the squared errors, which is sensitive to large errors.
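For reference, the standard definitions of these three metrics are written out below. The notation is an assumption introduced here: y_i is the observed value, \hat{y}_i the predicted value, \bar{y} the mean of the observed values, and n the number of samples.

```latex
R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}, \qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}
```

Because RMSE squares each error before averaging, RMSE is never smaller than MAE, and a large gap between the two points to a few large errors.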

3.8.3. Comparative Analysis of Conditions

To assess differences in model performance between the marker and no-marker datasets, we compared the results of both conditions using the same model. The statistical significance of the performance difference was tested using a paired t-test. This comparison evaluates both how much marker-based calibration improves prediction accuracy and how applicable the model is in the field under no-marker conditions.
The definitions and equations relevant to the performance evaluation metrics are summarized in Table 5.
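The paired t-test used for the marker versus no-marker comparison can be sketched as follows. This is a minimal illustration using only the standard library: it returns the t statistic and degrees of freedom (a p-value would require a t-distribution table or a statistics package), and the per-fruit error values are hypothetical.

```python
import math

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("paired samples must have equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_diff / math.sqrt(var / n)
    return t, n - 1

# Hypothetical absolute diameter errors (mm) for the same fruits under each condition.
no_marker_err = [3.0, 4.0, 5.0, 6.0]
marker_err = [2.0, 2.0, 4.0, 4.0]
t_stat, dof = paired_t_statistic(no_marker_err, marker_err)
```

The test is paired because each fruit is measured under both conditions, so the analysis operates on per-fruit differences rather than on two independent groups.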

4. Results

4.1. Evaluation of Melon and Marker Detection Performance

To assess the performance of the model trained to detect and segment melons and markers in RGB images, labels were generated using CVAT, with two classes (Melon and Marker) and a polygon (segmentation) type. The Melon class was also assigned an attribute for occlusion. For markers, a 100 × 100 mm square ArUco marker was used for calibration to an absolute scale. The definitions of these labels and the purpose of using markers are described in Section 3.
Data were partitioned according to a date-based (domain) holdout strategy (25 August for training and validation; 8, 12, and 18 August for testing) to enable evaluation of temporal generalization. Detection performance was evaluated according to mAP@0.5, Precision, and Recall as defined in Section 3.8. Cameras were denoted as Webcam A/B throughout the experiments.
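The role of the marker as an absolute scale reference can be illustrated with a simple pixel-to-millimetre conversion. This is a sketch under assumptions not stated in this section: the marker and the fruit are taken to lie at a similar distance from the camera and roughly parallel to the image plane, the marker corners are given in order around the square, and all function names and pixel values are hypothetical.

```python
import math

MARKER_SIDE_MM = 100.0   # side length of the ArUco marker reported in the text

def mm_per_pixel(marker_corners_px):
    """Scale factor from the mean side length of the detected marker (4 ordered corners)."""
    sides = [
        math.dist(marker_corners_px[i], marker_corners_px[(i + 1) % 4])
        for i in range(4)
    ]
    return MARKER_SIDE_MM / (sum(sides) / 4)

def fruit_diameter_mm(length_px, width_px, scale):
    """Diameter as the average of fruit length and width, converted to mm."""
    return 0.5 * (length_px + width_px) * scale

# Hypothetical detections: a 200-pixel marker and a 280 x 300 pixel fruit.
scale = mm_per_pixel([(0, 0), (200, 0), (200, 200), (0, 200)])
diameter = fruit_diameter_mm(280, 300, scale)
```

Without a marker in the frame, the scale factor has to come from another source, such as a fixed camera-to-fruit distance, which is one plausible reason for larger size errors under the no-marker condition.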

4.1.1. Dataset, Labeling, Model Training, and Evaluation Protocol

(1) Data and labeling
Labeling tool: CVAT (Project: melon_harvest), polygon-based instance segmentation.
Class/attributes: melon (polygon, occluded: Y/N), marker (polygon). The marker was a 100 mm × 100 mm ArUco marker, used as a reference for calibrating diameter/distance.
Labeling range: In principle, labels were limited to the training and validation sets; the test set was preserved to ensure independence of evaluation.
(2) Settings for data partitioning and evaluation
Strategy: date-based (domain) holdout. Data from 25 August were used for training and validation (e.g., 8:2 ratio), and data from 8, 12, and 18 August were used only for testing. The purpose was to determine detection performance on dates not used for training.
Evaluation metrics: mAP@0.5, Precision, Recall as defined in Section 3.8 (definitions and labeling per Table 5).
(3) Model and training configuration
Model family: YOLO family instance segmentation (2-class: melon and marker).
Input resolution: Resized from the 1280 × 720 original to the training pipeline specification (e.g., 640 × 640) to match the camera setting in Section 3.
Training epochs: 100 epochs (best.pt selected based on the best epoch).
Augmentation: Standard augmentation (horizontal flip, color space transformation, random timepoints/cropping, etc.) was applied; boundary preservation was prioritized even under the mixed marker/no-marker conditions.
Output products: Training logs, checkpoints (best.pt, last.pt), and validation-set inference results (visualizations and converted label format) were generated.
(4) Inference and aggregation rules
Multi-class aggregation: AP was calculated for each class, and mAP@0.5 was then reported. Occluded samples were evaluated within the same class (with subset analysis by the occlusion attribute).
Comparison of conditions: The marker and no-marker conditions were compared using the same model and settings (with paired t-tests for significance when necessary), following the comparison design in Section 3.8.3.
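The date-based holdout in item (2) can be sketched as follows. This is an illustration, not the authors' pipeline: the function name, the image identifiers, and the random seed are hypothetical, and only the capture dates and the 8:2 ratio follow the text.

```python
import random
from datetime import date

def date_holdout_split(samples, train_dates, test_dates, val_fraction=0.2, seed=0):
    """samples: list of (image_id, capture_date). Returns (train, val, test) id lists."""
    pool = sorted(img for img, day in samples if day in train_dates)
    test = sorted(img for img, day in samples if day in test_dates)
    random.Random(seed).shuffle(pool)          # reproducible shuffle before the train/val cut
    n_val = round(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val], test

# Hypothetical image identifiers; only the capture dates follow the text.
samples = [(f"aug25_{i:02d}", date(2025, 8, 25)) for i in range(10)]
samples += [
    ("aug08_00", date(2025, 8, 8)),
    ("aug12_00", date(2025, 8, 12)),
    ("aug18_00", date(2025, 8, 18)),
]
train, val, test = date_holdout_split(
    samples,
    train_dates={date(2025, 8, 25)},
    test_dates={date(2025, 8, 8), date(2025, 8, 12), date(2025, 8, 18)},
)
```

Splitting by capture date rather than by random image keeps every test image from a day the model never saw, which is what allows the evaluation to speak to temporal generalization.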

4.1.2. Quantitative Performance

The results of the object detection performance under marker and no-marker conditions are summarized in Table 6. Overall, detection accuracy was found to be greater under the marker condition, which can be interpreted as the marker functioning as a calibration reference for detecting fruit size and location.
In the Melon class, the marker condition yielded stable performance with mAP@0.5 of 0.92, Precision of 0.91, and Recall of 0.90, whereas the no-marker condition showed slight decreases to 0.89, 0.88, and 0.87, respectively. In the Marker class, high performance was achieved, with mAP@0.5 of 0.95, Precision of 0.94, and Recall of 0.93. These results indicate that the calibration reference provided by the marker improved melon detection performance. They also suggest field applicability, as mAP@0.5 remained at 0.89 even under the no-marker condition.
The detection results from the training and validation datasets by condition (A_marker, A_nomarker, B_marker, B_nomarker) are shown in Figure 6 and Figure 7. Under the marker condition, both melons and markers were clearly detected with distinct boundaries. Under the no-marker condition, detection performance slightly decreased for small fruits or occluded samples.
Figure 8 and Figure 9 present detection results on the independent test dataset from two different camera positions (Camera A and Camera B). While Figure 8 shows representative examples of marker and no-marker detection from Camera A, Figure 9 provides additional examples from Camera B, where lighting and occlusion conditions differ, highlighting how the model behaves under more challenging field conditions.

4.1.3. Qualitative Analysis

In addition to the quantitative comparison of metrics, we qualitatively compared the detection results under the marker and no-marker conditions. Representative examples are shown in Figure 10 and Figure 11.
Under the marker condition, melon boundaries were detected robustly: even occluded fruits (those hidden behind leaves or stems) were identified with relatively high reliability. In particular, for melons with large diameters or those positioned close to the marker, the bounding boxes and segmentation masks aligned closely with the actual fruit boundaries. Under the no-marker condition, by contrast, most melons were successfully detected, but the boundaries of small or occluded fruits tended to be detected incompletely.
In addition, we observed false positives, in which some leaves or stems in the background were falsely detected as melons under the no-marker condition; in contrast, the number of false detections was significantly reduced under the marker condition. Overall, the field applicability of the model remains viable, given that most melons were reliably recognized even under the no-marker condition, although the model was trained on the assumption of an environment without markers.

4.2. Prediction of Diameters and Weights

4.2.1. Performance of Diameter Estimation

We estimated melon diameter from the image-based detection results and compared the estimates with measured values. The model showed high correlation under both marker and no-marker conditions, with a coefficient of determination (R2) above 0.90. The RMSE was lower under the marker condition, indicating that the calibration standard improved the accuracy of diameter estimation. Under the no-marker condition, the predicted values agreed well with the measured values overall, although some deviation was observed for fruits with small diameters.
A summary of the quantitative performance of the model in terms of diameter prediction is shown in Table 7. Because the diameter–weight correlation is visually presented in the weight prediction analysis, we omitted the diameter scatter plot.

4.2.2. Weight Prediction Performance

We predicted melon biomass using a regression model with estimated diameter values as input. A cubic polynomial regression equation was applied, with the model defined as follows:
$$\mathrm{Weight}\,(\mathrm{g}) = a \cdot D + b \cdot D^{2} + c \cdot D^{3} + d$$
where D is the fruit diameter (cm), and the coefficients a, b, c, and d were estimated from destructive test data. The prediction results agreed well with the measured weights obtained from destructive sampling, with the average error kept within tens of grams. Figure 12 illustrates the relationship between measured and predicted values for the entire dataset, where blue and red points represent measured and predicted values, respectively. Quantitative comparisons by condition are shown in Table 8.
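Fitting the cubic diameter-to-weight relation can be sketched with NumPy's `polyfit`. This is an illustration only: the diameter and weight values are synthetic, generated from arbitrary coefficients chosen for the example, and are not the coefficients or data of this study.

```python
import numpy as np

def fit_weight_model(diameter_cm, weight_g):
    """Least-squares fit of Weight = a*D + b*D^2 + c*D^3 + d; returns (a, b, c, d)."""
    # np.polyfit returns coefficients from the highest power down: [c, b, a, d].
    c, b, a, d = np.polyfit(diameter_cm, weight_g, deg=3)
    return a, b, c, d

def predict_weight(diameter_cm, coeffs):
    """Evaluate the cubic model for one or more diameters (cm)."""
    a, b, c, d = coeffs
    D = np.asarray(diameter_cm, dtype=float)
    return a * D + b * D ** 2 + c * D ** 3 + d

# Synthetic data from arbitrary coefficients (a=1.0, b=0.2, c=0.4, d=5.0).
D = np.array([8.0, 10.0, 12.0, 13.0, 14.0, 16.0])
W = 1.0 * D + 0.2 * D ** 2 + 0.4 * D ** 3 + 5.0
coeffs = fit_weight_model(D, W)
w_at_10cm = float(predict_weight(10.0, coeffs))
```

A cubic term is a natural choice because the volume of a roughly spherical fruit scales with the cube of its diameter; with few destructive samples, however, the fitted curve should only be trusted within the range of measured diameters.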

4.2.3. Correlation and Comprehensive Analysis of Diameter–Weight

There was a strong correlation between diameter and weight in melons, indicating that weight increases with diameter during fruit development. The polynomial regression model constructed in this study was trained using destructive sampling data and effectively explained the relationship between diameter and weight, with a coefficient of determination (R2) of 0.9 or higher. Because weight is predicted from the estimated diameter, the accuracy of diameter estimation directly affects the performance of weight prediction.
Under the marker condition, diameters were estimated reliably, resulting in a reduced RMSE for weight prediction. Under the no-marker condition, however, diameter estimates deviated for small fruits or occluded samples, leading to under- or over-estimation of weight for some fruits. Nevertheless, the average error relative to measured values remained within ±120 g under the no-marker condition, supporting its feasibility for field application.
Collectively, our results indicate that the diameter–weight conversion model could be feasible for precision analysis in laboratory settings under the marker condition and could achieve sufficient accuracy for determining actual harvest timing even under the no-marker condition. This suggests that our image-based measurement approach could be applied practically in agricultural fields regardless of whether markers are used or not.

4.3. Results of Environmental Data–Based Harvest Prediction

In evaluating the performance of the regression model based on environmental data, the mean absolute error (MAE) and the RMSE in LOOCV were found to be 7.99 mm and 9.58 mm, respectively. This was within an error range of about ±1 cm relative to the actual fruit diameter, indicating relatively accurate predictions even with a limited sample size.
The predicted diameter of melons followed an increasing trend similar to that of the measured values from pre-harvest fruit surveys. In particular, environmental factors alone could reproduce the growth curve to some extent, because the model reflects growth patterns that depend on growing degree days (GDD) and solar radiation. The comparison of the predicted diameter curve based on environmental data and the observed diameter from field surveys is shown in Figure 13. The predicted value (blue line) had a similar trend to the observed value (red dots) and reached the harvest threshold diameter (150 mm, green dotted line) on 24 August (black dotted line).
Based on destructive sampling data, the model predicted that the diameter would reach the harvest threshold (150 mm) on 24 August 2025, which was taken as the initial predicted harvest date. Considering that the measured diameter reached 140–145 mm between 21 and 25 August, the predicted timing was in close alignment with the actual optimal harvest timing. This demonstrates the potential for predicting optimal harvest timing using environmental data alone, which could be further improved by integrating image data.

4.4. Prediction of Optimal Harvest Timing Based on Integrated Data

Using the fusion model that combines image-derived diameter estimates with environmental and fertigation variables, we predicted the optimal harvest timing for melons. The predicted harvest date was 28 August 2025, which exactly matched the actual harvest date recorded in the greenhouse. This prediction corresponded to the time when the estimated fruit diameter reached the commercial maturity threshold of 150 mm.
Notably, destructive sampling on the harvest date yielded a fruit diameter of 147.7 mm and a weight of 1.68 kg, which closely aligned with the estimated diameter curve and weight prediction results from the fusion model. Figure 14 visualizes the predicted diameter progression, the threshold diameter, and the measured destructive sampling values. The predicted and measured harvest data are summarized in Table 9.

5. Discussion

Several limitations related to data acquisition and dataset scale should be acknowledged. RGB images were collected at a weekly interval, which may be sparser than high-frequency phenotyping datasets designed to capture fine-grained morphological or color changes. However, this acquisition frequency was intentionally selected to reflect realistic operational constraints in commercial greenhouse environments, where frequent image collection may not be feasible due to labor, equipment, or system limitations.
The final dataset consisted of 1112 RGB images, representing a moderate-scale dataset appropriate for a field-oriented validation study rather than large-scale model generalization. To address data quality under practical conditions, the presence or absence of occlusion caused by leaves, vines, or greenhouse structures was annotated as a binary Boolean attribute (occluded: true/false) during the labeling process. This attribute was used solely for data description and analysis and was not directly incorporated as a model input. While these factors may limit direct generalization across diverse cultivation scenarios, the dataset design prioritizes practical applicability and provides a realistic benchmark for evaluating multimodal harvest prediction under real greenhouse conditions.
The comparison between marker and no-marker conditions was intentionally designed to evaluate the trade-off between calibration accuracy and field applicability. While the presence of markers improved size estimation accuracy, the no-marker condition still achieved sufficiently high prediction performance to support practical deployment in commercial greenhouse environments. This result indicates that the proposed framework can bridge the gap between laboratory-calibrated measurements and realistic markerless field operation, which is essential for scalable smart farming applications.
In this study, we presented a model that predicts optimal harvest timing for melons by combining RGB images with greenhouse environmental and fertigation data. The model reliably detected melons and markers using a YOLOv8n-based object detection module. Under marker conditions, mAP@0.5, precision, and recall were higher than those under no-marker conditions. As reported in previous studies [17,33], this performance difference can be attributed to markers providing an absolute scale for fruit size estimation, thereby contributing to improved accuracy. Nevertheless, the model still achieved a high level of detection performance under markerless conditions, supporting its feasibility for field application.
In the diameter–weight conversion model, the polynomial regression equation derived from destructive sampling data showed high explanatory power (R2 > 0.9). This finding is consistent with previous studies [19,20] reporting strong correlations between fruit diameter and weight. Moreover, the prediction error remained within ±120 g even under no-marker conditions, indicating that the model provides sufficient accuracy for practical harvest decision-making in the field.
The LSTM-based prediction module using environmental and fertigation data effectively captured temporal changes in key factors such as growing degree days (GDD), vapor pressure deficit (VPD), and light intensity. The predicted diameter trajectory exhibited a similar increasing trend to the observed values, suggesting that environmental data alone can contribute to harvest timing estimation to a certain extent. This result is consistent with previous studies [25,26,29] that predicted crop yield or growth in other horticultural crops, such as tomatoes and strawberries, using environmental time-series data.
Compared with single-modality approaches, the proposed CNN–LSTM–MLP fusion model achieved higher prediction precision by simultaneously considering image-based fruit traits alongside environmental and fertigation data. In particular, this study quantitatively compared performance differences between marker and no-marker conditions, thereby identifying the gap between laboratory-based calibration environments and realistic field-based applications. This work holds academic significance as one of the first studies to apply image–environmental data fusion for harvest date prediction in melons, extending approaches previously reported for crops such as strawberries and lettuce [29,31,32].
The results of this study should be interpreted in light of several remaining limitations. The dataset covered only a single cultivation season (summer 2025), and additional validation across different seasons and cultivation environments is required to enhance generalizability. Furthermore, reduced detection and diameter estimation performance for small fruits and heavily occluded samples under markerless conditions remain challenges that warrant further investigation. Future studies should aim to improve robustness by expanding dataset scale, validating under diverse camera and lighting conditions, and incorporating self-supervised or adaptive calibration techniques.

6. Conclusions

In this study, we proposed a multimodal approach to predict the optimal harvest timing for melons by combining RGB images with environmental and fertigation data from greenhouses. Using a YOLOv8n-based object detection model, melons and markers were reliably detected, and higher accuracy was obtained under the marker condition. However, the performance remained high even under the no-marker condition, indicating the feasibility of markerless measurement in agricultural field settings.
The diameter–weight regression analysis revealed a strong correlation between fruit diameter and biomass. Destructive sampling-based calibration enhanced the prediction performance. Moreover, the LSTM-based analysis of environmental data contributed to predicting diameter growth curves and harvest dates by capturing temporal changes in major environmental factors. Ultimately, the CNN–LSTM–MLP fusion model outperformed single-modality approaches, yielding an accurate prediction of the actual harvest date.
These results can serve as baseline data to support precise prediction of melon harvest timing and automated decision-making. This study also contributes to closing the gap between laboratory experiments and field application by quantitatively validating the performance difference based on the presence or absence of markers. Future research should extend the applicability of this model by collecting datasets across seasons and cultivation environments, improving the detection of small and occluded fruits, and performing experiments in conjunction with autonomous harvesting robots.

Author Contributions

Conceptualization, K.Y.; methodology, K.Y.; software, K.Y.; validation, K.Y. and M.L.; formal analysis, K.Y.; investigation, K.Y., S.J., J.L., and U.J.; resources, M.L. and S.J.; data curation, K.Y., J.L., and U.J.; writing—original draft preparation, K.Y.; writing—review and editing, K.Y. and M.L.; visualization, K.Y.; supervision, M.L.; project administration, K.Y. and M.L.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Korea Institute of Planning and Evaluation for Technology in Food, Agriculture and Forestry (IPET) and Korea Smart Farm R&D Foundation (KosFarm) through Smart Farm Innovation Technology Development Program, funded by Ministry of Agriculture, Food and Rural Affairs (MAFRA) and Ministry of Science and ICT (MSIT), Rural Development Administration (RDA) (Grant No. RS-2025-02219360).

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
YOLO: You Only Look Once
LSTM: Long Short-Term Memory
MLP: Multi-Layer Perceptron
RNN: Recurrent Neural Network
mAP: mean Average Precision
IoU: Intersection over Union
R2: Coefficient of Determination
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
VPD: Vapor Pressure Deficit
GDD: Growing Degree Days
EC: Electrical Conductivity
PAR: Photosynthetically Active Radiation
DAT: Days After Transplanting
CVAT: Computer Vision Annotation Tool

References

  1. Nikolaou, G.; Neocleous, D.; Katsoulas, N.; Kittas, C. Irrigation of Greenhouse Crops: An Overview. Horticulturae 2019, 5, 7.
  2. Soussi, M.; Chaibi, M.T.; Buchholz, M.; Saghrouni, Z. Comprehensive review on climate control and cooling systems in greenhouses under hot and arid conditions. Agronomy 2022, 12, 626.
  3. Savvas, D.; Giannothanasis, E.; Ntanasi, T.; Karavidas, I.; Ntatsi, G. State of the Art and New Technologies to Recycle the Fertigation Effluents in Closed Soilless Cropping Systems Aiming to Maximise Water and Nutrient Use Efficiency in Greenhouse Crops. Agronomy 2024, 14, 61.
  4. Lim, M.Y.; Choi, S.H.; Choi, G.L.; Kim, S.H.; Jeong, H.J. Effects of Irrigation Amount on Fruiting Period and EC Level by Growth Period on Growth and Quality of Melon (Cucumis melo L.) Using Coir Substrate Hydroponics During Autumn Cultivation. Hortic. Sci. Technol. 2021, 39, 446–455.
  5. Choi, S.H.; Lim, M.Y.; Choi, G.L.; Kim, S.H.; Jeong, H.J. Growth and Quality of Two Melon Cultivars in Hydroponics Affected by Mixing Ratio of Coir Substrate and Different Irrigation Amount on Spring Season. J. Bio-Environ. Control 2019, 28, 376–387.
  6. Qian, C.; Du, T.; Sun, S.; Liu, W.; Zheng, H.; Wang, J. An Integrated Learning Algorithm for Early Prediction of Melon Harvest. Sci. Rep. 2022, 12, 18199.
  7. Xu, S.; Shen, J.; Wei, Y.; Li, Y.; He, Y.; Hu, H.; Feng, X. Automatic Plant Phenotyping Analysis of Melon (Cucumis melo L.) Germplasm Resources Using Deep Learning Methods and Computer Vision. Plant Methods 2024, 20, 166.
  8. Jing, X.; Wang, Y.; Li, D.; Pan, W. Melon Ripeness Detection by an Improved Object Detection Algorithm for Resource-Constrained Environments (MRD-YOLO). Plant Methods 2024, 20, 127.
  9. Chen, G.; Yang, Y.; Zhang, X.; Fan, M.; Li, Z.; Zhai, Y. A Lightweight Color-Changing Melon Ripeness Detection Algorithm Based on Model Pruning and Knowledge Distillation Leveraging Dilated Residual and Multi-Screening Path Aggregation. Front. Plant Sci. 2024, 15, 1406593.
  10. Blok, C.; Voogt, W.; Barbagli, T. Reducing Nutrient Imbalance in Recirculating Drainage Solution of Stone Wool Grown Tomato. Agric. Water Manag. 2023, 285, 108360.
  11. Neocleous, D.; Savvas, D. Validating a Smart Nutrient Solution Replenishment Strategy to Save Water and Nutrients in Hydroponic Crops. Front. Environ. Sci. 2022, 10, 965964.
  12. Malík, M.; Praus, L.; Tlustoš, P. Comparison of recirculation and drain-to-waste hydroponic systems in relation to medical cannabis (Cannabis sativa L.) plants. Ind. Crops Prod. 2023, 202, 117059.
  13. Feldmann, M.J.; Tabb, A. Cost-effective, high-throughput phenotyping system for 3D reconstruction of fruit form. Plant Phenome J. 2022, 5, e20029.
  14. Neupane, C.; Koirala, A.; Walsh, K.B. Fruit Sizing in Orchard: A Review from Caliper to Machine Vision with Deep Learning. Sensors 2023, 23, 3868.
  15. Espejo-Garcia, B.; Mylonas, N.; Athanasakos, L.; Fountas, S. Towards Practical Artificial Intelligence Applications in Agriculture: A Review. Comput. Electron. Agric. 2021, 190, 106414.
  16. Bargoti, S.; Underwood, J. Image segmentation for fruit detection and yield estimation in apple orchards. J. Field Robot. 2017, 34, 1039–1060.
  17. Mirhaji, H.; Asakereh, A.; Mehdizadeh, S.A. Fruit detection and load estimation of an orange orchard using the YOLO models through simple approaches in different imaging and illumination conditions. Comput. Electron. Agric. 2021, 191, 106533.
  18. Afonso, M.; Fonteijn, H.; Fiorentin, F.S.; Lensink, D.; Mooij, M.; Faber, N.; Polder, G.; Wehrens, R. Tomato fruit detection and counting in greenhouses using deep learning. Front. Plant Sci. 2020, 11, 571299.
  19. Kim, E.-C.; Hong, S.-J.; Kim, S.-Y.; Lee, C.-H.; Kim, S.; Kim, H.-J.; Kim, G. CNN-based object detection and growth estimation of plum fruit (Prunus mume) using RGB-D imaging techniques. Sci. Rep. 2022, 12, 21251.
  20. Ferrer-Ferrer, M.; Ruiz-Hidalgo, J.; Gregorio, E. Simultaneous fruit detection and size estimation using multitask deep neural networks. Biosyst. Eng. 2023, 233, 63–75.
  21. Abebe, A.M.; Kim, Y.; Kim, J.; Kim, S.L.; Baek, J. Image-Based High-Throughput Phenotyping in Horticultural Crops. Plants 2023, 12, 2061.
  22. Tong, Y.-S.; Lee, T.-H.; Yen, K.-S. Deep Learning for Image-Based Plant Growth Monitoring: A Review. Int. J. Eng. Technol. Innov. 2022, 12, 225–246.
  23. Xiao, F.; Wang, H.; Xu, Y.; Zhang, R. Fruit Detection and Recognition Based on Deep Learning for Automatic Harvesting: An Overview and Review. Agronomy 2023, 13, 1625.
  24. Mohmed, G.; Heynes, X.; Naser, A.; Sun, W.; Hardy, K.; Grundy, S.; Lu, C. Modelling daily plant growth response to environmental conditions in Chinese solar greenhouse using Bayesian neural network. Sci. Rep. 2023, 13, 4379.
  25. Gong, L.; Yu, M.; Jiang, S.; Cutsuridis, V.; Pearson, S. Deep learning-based prediction on greenhouse crop yield combined TCN and RNN. Sensors 2021, 21, 4537.
  26. Sim, H.S.; Kim, D.S.; Ahn, M.G.; Ahn, S.R.; Kim, S.K. Prediction of Strawberry Growth and Fruit Yield based on Environmental and Growth Data in a Greenhouse for Soil Cultivation with Applied Autonomous Facilities. Hortic. Sci. Technol. 2020, 38, 840–849. Available online: https://www.hst-j.org/articles/article/Qbad/ (accessed on 10 November 2025).
  27. Mahmood, F.; Govindan, R.; Bermak, A.; Yang, D.; Al-Ansari, T. Data-driven robust model predictive control for greenhouse temperature control and energy utilisation assessment. Appl. Energy 2023, 343, 121190.
  28. Wang, A.; Lv, J.; Wang, J.; Shi, K. CO2 enrichment in greenhouse production: Towards a sustainable approach. Front. Plant Sci. 2022, 13, 1029901.
  29. Wen, J.; Abeel, T.; de Weerdt, M. “How sweet are your strawberries?”: Predicting sugariness using non-destructive and affordable hardware. Front. Plant Sci. 2023, 14, 1160645.
  30. Abd-Elrahman, A.; Wu, F.; Agehara, S.; Britt, K. Improving strawberry yield prediction by integrating ground-based canopy images in modeling approaches. ISPRS Int. J. Geo-Inf. 2021, 10, 239.
  31. Nakano, S.; Fujii, N.; Koyama, R.; Uno, Y. Prediction of Lettuce Harvest Date and Evaluation of Data for Yield Estimation Using Artificial Intelligence Analysis of Aerial Drone Images. Hortic. J. 2025, advance online publication.
  32. Lin, Z.; Liu, W.; Wang, S. Strawberry harvest date prediction using multi-feature fusion deep learning in plant factory. Comput. Electron. Agric. 2025, 234, 110174.
  33. Gongal, A.; Karkee, M.; Amatya, S. Apple fruit size estimation using a 3D machine vision system. Inf. Process. Agric. 2018, 5, 498–503.
  34. Gené-Mola, J.; Vilaplana, V.; Rosell-Polo, J.R.; Gregorio, E.; Morros, J.R.; Ruiz-Hidalgo, J.; Sanz, R. Multi-modal deep learning for Fuji apple detection using RGB-D cameras. Comput. Electron. Agric. 2019, 162, 689–698.
  35. Bortolotti, G.; Piani, M.; Gullino, M.; Mengoli, D.; Franceschini, C.; Corelli Grappadelli, L.; Manfrini, L. A computer vision system for apple fruit sizing by means of low-cost depth camera and neural network application. Precis. Agric. 2024, 25, 2740–2757.
  36. Marzahl, C.; Aubreville, M.; Bertram, C.A.; Maier, J.; Maier, R.; Klopfleisch, R.; Maier, A.; Bergler, C.; Kröger, C.; Voigt, J. EXACT: A collaboration toolset for algorithm-aided annotation of images with annotation version control. arXiv 2020, arXiv:2004.14595.
  37. Miao, R.; Toth, R.; Zhou, Y.; Madabhushi, A.; Janowczyk, A. Quick Annotator: An open-source digital pathology based rapid image annotation tool. arXiv 2021, arXiv:2101.02183.
Figure 1. External view of the Energy-Self-Sufficient Smart Farm Research Greenhouse at Jeonnam Agricultural Research and Extension Services (Naju-si).
Figure 2. Cultivation of melon cultivars in the experimental greenhouse: (a) ‘Damas’, a Korean commercial melon cultivar (transplanted on 10 June 2025); (b) ‘Supia’, a Korean commercial melon cultivar (transplanted on 10 June 2025).
Figure 3. Image acquisition system using a webcam mounted on a rail-based lift moving along the greenhouse pathway (a), with 100 mm × 100 mm ArUco fiducial markers (not QR codes) placed near the fruits for camera calibration and pose estimation (b).
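The role of the 100 mm marker as a scale reference can be illustrated with a short sketch. This is a minimal example, not the authors' calibration procedure: it assumes the marker and the fruit lie at a similar distance from the camera and are roughly parallel to the image plane, and the function names and pixel values are hypothetical.

```python
# Minimal sketch: converting pixel measurements to millimetres with a marker
# of known size. Function names and example pixel values are hypothetical.

MARKER_SIDE_MM = 100.0  # physical side length of the ArUco marker (Figure 3)


def mm_per_pixel(marker_side_px: float, marker_side_mm: float = MARKER_SIDE_MM) -> float:
    """Image scale derived from the marker's apparent side length in pixels."""
    if marker_side_px <= 0:
        raise ValueError("marker_side_px must be positive")
    return marker_side_mm / marker_side_px


def fruit_diameter_mm(fruit_width_px: float, marker_side_px: float) -> float:
    """Fruit diameter estimate, assuming fruit and marker share the same scale."""
    return fruit_width_px * mm_per_pixel(marker_side_px)


if __name__ == "__main__":
    # Hypothetical detection: marker side spans 200 px, fruit spans 290 px.
    print(fruit_diameter_mm(fruit_width_px=290.0, marker_side_px=200.0))
```

Under the no-marker condition no such reference is available in the image, so the scale has to come from another source, for example a fixed camera-to-fruit geometry.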
Figure 4. Annotation examples using CVAT: (a) marker condition (melon + 100 mm × 100 mm ArUco fiducial marker (not a QR code), polygon); (b) no-marker condition (melon only, polygon with occlusion status annotated as a binary Boolean attribute: occluded true/false).
Figure 5. Schematic diagram of the fusion model combining CNN-based fruit detection from RGB images with LSTM-based environmental and fertigation data analysis.
Figure 6. Detection results from the training and validation sets using Camera A: (a) marker condition and (b) no-marker condition. The square patterns shown in the marker condition are ArUco fiducial markers (not QR codes) used for camera calibration and scale reference.
Figure 7. Detection results from the training and validation sets using Camera B: (a) marker condition and (b) no-marker condition. The square patterns shown in the marker condition are ArUco fiducial markers (not QR codes) used for camera calibration and scale reference.
Figure 8. Detection results on the independent test dataset from Camera A: (a) marker condition and (b) no-marker condition.
Figure 9. Additional detection results on the independent test dataset from Camera B: (a) marker condition and (b) no-marker condition.
Figure 10. Detection of occluded melons under (a) marker and (b) no-marker conditions.
Figure 11. Examples of false positives and missed detections under the no-marker condition (b), compared with the marker condition (a).
Figure 12. Scatter plot comparing measured and predicted melon weights using a regression model fitted to destructive-sampling data.
Figure 13. Predicted and observed melon diameter based on environmental data.
Figure 14. Melon growth prediction and harvest timing based on fused data (diameter vs. weight by DAT).
Table 1. Summary of related studies.

| Study | Crop | Data Type | Model/Method | Main Findings | Limitations |
|---|---|---|---|---|---|
| Bargoti & Underwood (2017) [16] | Apple, Mango, Almond | RGB images | Faster R-CNN | Achieved stable fruit detection in complex orchard environments | Did not include fruit size or maturity estimation |
| Mirhaji et al. (2021) [17] | Orange | RGB images | YOLO-V4 | Quantified fruit load under varying lighting conditions | Limited to a specific crop and environment |
| Afonso et al. (2020) [18] | Tomato | RGB images | Mask R-CNN | Accurate fruit detection and counting in greenhouse | No time-series growth estimation |
| Kim et al. (2022) [19] | Plum | RGB-D images | Faster R-CNN, EfficientDet, SSD | Improved fruit diameter estimation; validated RGB-D fusion | Needs extension to broader multimodal learning |
| Ferrer-Ferrer et al. (2023) [20] | Multiple fruits | RGB images | Multitask DNN | Performed detection and size estimation simultaneously | Limited dataset diversity |
| Mohmed et al. (2023) [24] | Tomato (greenhouse) | Environmental data (T, RH, CO2, radiation) | Bayesian Neural Network | Quantified effects of environment on growth and yield | No image data included |
| Gong et al. (2021) [25] | Tomato | Environmental data | TCN + RNN | Improved yield prediction by capturing time-series patterns | Restricted to tomato dataset |
| Sim et al. (2020) [26] | Strawberry | Environmental + growth data | Regression/ML | Identified key variables (VPD, PAR, RH) for yield prediction | Did not use RGB images |
| Wen et al. (2023) [29] | Strawberry | RGB + environmental data (T, RH, light, irrigation) | Feature fusion | Improved sugar content prediction; reduced error vs. unimodal | Small-scale dataset |
| Nakano et al. (2025) [31] | Lettuce | Drone images + weather data | AI-based fusion | Predicted harvest date with mean error of 2.35 days | Outdoor crop; limited greenhouse application |
| Lin et al. (2025) [32] | Strawberry | Time-series images + environmental data | Multi-feature fusion DL | Enhanced harvest date prediction performance | Lack of application to other crops (e.g., melon) |
| Gongal et al. (2018) [33] | Apple | 3D machine vision + marker | Machine vision | Achieved accurate diameter and volume estimation | Marker installation is cumbersome |
| Gené-Mola et al. (2019) [34] | Apple | RGB-D (markerless) | Multimodal DL | Estimated fruit size without markers | Requires complex modeling and large training data |
| Bortolotti et al. (2024) [35] | Apple | Depth camera (markerless) | DL-based CV system | Demonstrated field-level markerless fruit sizing | Limited to specific orchard conditions |
Table 2. Overview of the experimental setup, including greenhouse location, cultivars, cultivation schedule, and control conditions.

| Category | Description |
|---|---|
| Location | Energy-Self-Sufficient Smart Farm Research Greenhouse, Jeonnam Agricultural Research and Extension Services (1508, Senam-ro, Sanpo-myeon, Naju-si, Jeollanam-do, Republic of Korea, 58213) |
| Cultivars | ‘Damas’, ‘Supia’ |
| Transplanting | 10 June 2025 |
| Pollination | 1–4 July 2025 |
| Harvesting | 28 August 2025 |
| Cultivation Type | Soil-less culture with coir substrate, automated irrigation and drainage system |
| Climate Control | Daytime 25–30 °C, Nighttime 18–22 °C, RH 60–70%, CO2 400–800 μmol·mol⁻¹ |
| Nutrient Control | Drainage ratio ≈ 30%, EC 1.5–2.0 dS·m⁻¹ |
Table 3. Definition of annotation classes and attributes used in the melon dataset.

| Class | Type | Attribute | Description |
|---|---|---|---|
| Melon | Polygon | Occluded (True/False) | Presence or absence of visual obstruction (occluded: true/false) |
| Marker | Polygon | - | 100 mm × 100 mm ArUco marker for calibration |
Table 4. Distribution of images across training, validation, and test sets according to date, camera type, and marker condition.

| Date | Camera | Marker Condition | Training | Validation | Test | Total |
|---|---|---|---|---|---|---|
| 25 August 2025 | Webcam A | Marker | 206 | 60 | - | 266 |
| 25 August 2025 | Webcam A | No-marker | 124 | 31 | - | 155 |
| 25 August 2025 | Webcam B | Marker | 139 | 38 | - | 177 |
| 25 August 2025 | Webcam B | No-marker | 76 | 20 | - | 96 |
| 18 August 2025 | Webcam A | No-marker | - | - | 85 | 85 |
| 18 August 2025 | Webcam B | No-marker | - | - | 52 | 52 |
| 12 August 2025 | Webcam A | No-marker | - | - | 85 | 85 |
| 12 August 2025 | Webcam B | No-marker | - | - | 52 | 52 |
| 8 August 2025 | Webcam A | No-marker | - | - | 90 | 90 |
| 8 August 2025 | Webcam B | No-marker | - | - | 54 | 54 |
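As a consistency check on Table 4, the per-split totals can be tallied directly from the rows. The sketch below transcribes the table as printed; treating "-" and the blank Test cells for Webcam B on 25 August as zero is an assumption, and the names `ROWS` and `split_totals` are illustrative only.

```python
# Minimal sketch: tallying the dataset split in Table 4.
# Columns: (date, camera, condition, training, validation, test, total).
# "-" and blank cells in the table are treated as 0 (assumption).

ROWS = [
    ("2025-08-25", "A", "Marker", 206, 60, 0, 266),
    ("2025-08-25", "A", "No-marker", 124, 31, 0, 155),
    ("2025-08-25", "B", "Marker", 139, 38, 0, 177),
    ("2025-08-25", "B", "No-marker", 76, 20, 0, 96),
    ("2025-08-18", "A", "No-marker", 0, 0, 85, 85),
    ("2025-08-18", "B", "No-marker", 0, 0, 52, 52),
    ("2025-08-12", "A", "No-marker", 0, 0, 85, 85),
    ("2025-08-12", "B", "No-marker", 0, 0, 52, 52),
    ("2025-08-08", "A", "No-marker", 0, 0, 90, 90),
    ("2025-08-08", "B", "No-marker", 0, 0, 54, 54),
]


def split_totals(rows):
    """Return (training, validation, test) image counts summed over all rows."""
    training = sum(r[3] for r in rows)
    validation = sum(r[4] for r in rows)
    test = sum(r[5] for r in rows)
    return training, validation, test


def rows_are_consistent(rows):
    """Check that each row's Total equals training + validation + test."""
    return all(r[3] + r[4] + r[5] == r[6] for r in rows)


if __name__ == "__main__":
    print(split_totals(ROWS), rows_are_consistent(ROWS))
```

With these rows, all training and validation images come from 25 August, while the test images come from the three earlier dates and are all no-marker.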
Table 5. Definition of evaluation metrics used for detection and regression tasks.

| Metric | Type | Definition | Equation |
|---|---|---|---|
| mAP@0.5 | Detection | Mean Average Precision at IoU threshold 0.5; evaluates overall detection accuracy | $mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$ |
| Precision | Detection | Ratio of correctly detected objects among all detected objects | $Precision = \frac{TP}{TP + FP}$ |
| Recall | Detection | Ratio of correctly detected objects among all ground-truth objects | $Recall = \frac{TP}{TP + FN}$ |
| R² | Regression | Proportion of variance in observed values explained by predictions | $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$ |
| MAE | Regression | Mean absolute difference between predicted and observed values | $MAE = \frac{1}{n}\sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert$ |
| RMSE | Regression | Root of mean squared error; penalizes larger errors more heavily | $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ |
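The definitions in Table 5 translate directly into code. The following is a minimal pure-Python sketch of those formulas, not the evaluation code used in the study; the function names and the sample values in the usage section are hypothetical.

```python
# Minimal sketch of the metrics defined in Table 5 (function names are illustrative).
import math


def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def mean_average_precision(ap_per_class):
    """mAP = (1/N) * sum of the per-class average precision values AP_i."""
    return sum(ap_per_class) / len(ap_per_class)


def regression_metrics(observed, predicted):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(y - p) for y, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


if __name__ == "__main__":
    # Hypothetical diameters in mm, for illustration only.
    observed = [100.0, 110.0, 120.0, 130.0]
    predicted = [102.0, 108.0, 123.0, 129.0]
    print(regression_metrics(observed, predicted))
```

Because RMSE squares each residual before averaging, it is never smaller than MAE on the same data, which matches the pattern in the reported diameter and weight results.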
Table 6. Detection of melons and markers under marker vs. no-marker conditions.

| Condition | Class | mAP@0.5 | Precision | Recall |
|---|---|---|---|---|
| Marker | Melon | 0.92 | 0.91 | 0.90 |
| Marker | Marker | 0.95 | 0.94 | 0.93 |
| No-marker | Melon | 0.89 | 0.88 | 0.87 |
Table 7. Performance of diameter estimation under marker vs. no-marker conditions.

| Condition | R² | MAE (mm) | RMSE (mm) |
|---|---|---|---|
| Marker | 0.92 | 5.8 | 7.2 |
| No-marker | 0.90 | 6.5 | 8.1 |
Table 8. Performance of weight prediction under marker and no-marker conditions.

| Condition | R² | MAE (g) | RMSE (g) |
|---|---|---|---|
| Marker | 0.91 | 84.2 | 102.5 |
| No-marker | 0.89 | 95.7 | 118.3 |
Table 9. Predicted and actual harvest data.

| Category | Harvest Date | DAT (Days After Transplanting) | Fruit Diameter (mm) | Fruit Weight (kg) |
|---|---|---|---|---|
| Predicted | 28 August 2025 | 79 | 150.0 (threshold) | 1.65 (model est.) |
| Measured | 28 August 2025 | 79 | 147.7 | 1.68 |
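The DAT value in Table 9 follows from the transplanting date in Table 2 (10 June 2025) and the harvest date (28 August 2025). The short sketch below reproduces that arithmetic and the relative gap between the predicted and measured values; the helper names are illustrative, and the relative-error figures are derived here from the table rather than reported by the authors.

```python
# Minimal sketch: DAT arithmetic and predicted-vs-measured gaps for Table 9.
from datetime import date

TRANSPLANT_DATE = date(2025, 6, 10)  # Table 2
HARVEST_DATE = date(2025, 8, 28)     # Tables 2 and 9


def days_after_transplanting(day: date, transplant: date = TRANSPLANT_DATE) -> int:
    """Number of days elapsed between transplanting and the given day."""
    return (day - transplant).days


def relative_error_pct(predicted: float, measured: float) -> float:
    """Absolute difference between prediction and measurement, in percent of the measurement."""
    return abs(predicted - measured) / measured * 100.0


if __name__ == "__main__":
    print(days_after_transplanting(HARVEST_DATE))         # 79
    print(round(relative_error_pct(150.0, 147.7), 2))     # diameter, mm
    print(round(relative_error_pct(1.65, 1.68), 2))       # weight, kg
```

Note that the predicted diameter of 150.0 mm is a harvest threshold rather than a free model output, so its gap to the measured 147.7 mm describes how closely the fruit matched the threshold on the harvest date.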
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, K.; Jung, S.; Lee, J.; Jung, U.; Lee, M. Prediction of Optimal Harvest Timing for Melons Through Integration of RGB Images and Greenhouse Environmental Data: A Practical Approach Including Marker Effect Analysis. Agriculture 2026, 16, 169. https://doi.org/10.3390/agriculture16020169
