
RNN-Based Approach for Broccoli Harvest Time Forecast

Faculty of Engineering, Kyoto University of Advanced Science, Kyoto 615-8577, Japan
Air Water Inc., Osaka 542-0081, Japan
Institute of Industrial Science, The University of Tokyo, Tokyo 113-8654, Japan
Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8563, Japan
Graduate School of Bioresources, Mie University, Mie 514-8507, Japan
Faculty of Engineering, Alexandria University, Alexandria 11432, Egypt
Department of Civil and Environmental Engineering, School of Environment and Society, Tokyo Institute of Technology, Tokyo 152-8550, Japan
Author to whom correspondence should be addressed.
Agronomy 2024, 14(2), 361
Submission received: 16 January 2024 / Revised: 7 February 2024 / Accepted: 8 February 2024 / Published: 10 February 2024
(This article belongs to the Section Precision and Digital Agriculture)


This article investigates approaches for broccoli harvest time prediction through the application of various machine learning models. The experiment was conducted on a commercial farm in Ecuador and integrates in situ weather and broccoli growing cycle observations made over seven years. The study evaluates persistence, thermal, and calendar models, demonstrating their strengths and limitations in calculating the optimal broccoli harvest day. Additionally, Recurrent Neural Network (RNN) models with Long Short-Term Memory (LSTM) layers were developed, which achieved an average error of less than 2.5 days when combined with outputs from the calendar model. In the final comparison, the RNN models outperformed both the thermal and calendar models, which had errors of 3.14 and 2.5 days, respectively. Furthermore, this article explores how model accuracy is affected by using Global Ensemble Forecast System forecast weather data as a supplementary source to the in situ observations. The analysis revealed that extending the input with a 9-day forecast had a limited effect on the experimental field, reducing the error by up to 0.04 days. The findings provide insights into the effectiveness of different modeling approaches for optimizing broccoli harvest times, emphasizing the potential of RNN techniques in agricultural decision making.

1. Introduction

Over the years, modeling plant growth throughout its various developmental stages has gained considerable attention in scientific research [1]. Understanding the dependencies among variables influencing plant development has numerous applications in agriculture. When applied at the farm level, advanced models serve as tools for signaling the optimal timing of agronomic practices such as pruning, harvesting, and postharvest procedures. The forecast provided by these models enables farmers to make informed decisions on farm management [2], such as transportation logistics, storage allocation, and the formulation of marketing campaigns.
A challenge in the pursuit of accurate models arises from the natural complexity of the physiological processes responsible for plant growth. A perfect model must account for the numerous variables, each introducing its own level of uncertainty. The cumulative effect of these uncertainties often results in a model with substantial error accumulation, making it unsuitable for reliable predictions. To overcome this challenge, many researchers have narrowed the focus to specific key factors, with a particular emphasis on climate conditions [3]. These conditions include, but are not limited to, temperature, solar radiation, and water availability, which are some of the most significant for plant development. Other limitations are caused by the inherent variability in plant structures and the durations of the harvesting period, ranging from a few weeks to an entire season [4].
Several attempts have been made to construct a comprehensive phenological model for broccoli to determine the optimal harvest time. In one study [5] conducted in Ibaraki prefecture, Japan, the researchers focused on the “Ohayou” broccoli cultivar, utilizing temperature and the total accumulated solar radiation to estimate the dry matter weight and predict production quantities. Similarly, investigations in northeast Germany [6] centered on defining the optimal harvest window for the “Ironman F1” cultivar, leveraging daily air temperature on the field. The findings of these studies indicate the potential to accurately predict the broccoli harvest day based on meteorological factors such as temperature and solar radiation, achieving at least a four-day accuracy. While examining the other factors influencing broccoli growth, studies have pointed to parameters like the sowing day [7], fertilizer amount, and plant density [8] as significant contributors.
On the other hand, Mourao and Brito [9] chose a modeling approach based solely on the total accumulated temperature. This methodology demonstrated low errors and consistent outcomes across various planting sites while requiring a minimal set of parameters. Given the variability of input parameters, it is crucial to account for the distinct growth patterns observed at each stage of development from seeding to maturity, such as spear initiation, head formation, and flowering; for this reason, studies such as [10,11] have opted for the independent modeling of each stage. However, a subsequent study by the same researcher [12] found minimal differences between the model parameters configured for each phenological stage, highlighting the similar growth pattern of various cultivars on the same testing field.
Recent strides in the field of phenological phase forecasting for hazel and chestnut have showcased the efficiency of neural network (NN) models. In particular, Czernecki et al. [13] showed the capability of NN models in handling meteorological data. Other works [14,15] have used convolutional neural networks for forecasting wheat, soybean, and corn yield. However, in the context of broccoli farming, such an approach would fail due to a low spatial resolution of remotely sensed data and the small diameter of individual broccoli heads. On the other hand, novel studies [16,17] have emphasized the efficiency of recurrent neural networks (RNNs) in modeling time series data, suggesting that the division of atmospheric conditions into timesteps for sequential input into the neural network would result in improved forecasts for corn and wheat harvesting. While departing from traditional polynomial regression architectures, these models retained similar input features, such as temperature and solar radiation accumulation. For instance, LSTM networks were employed in [18] to improve apple production, exhibiting a three-day error in the average harvest day prediction. In a similar context, a study [19] demonstrated the estimation of bok choy production using advanced feature selection techniques to reduce the number of training parameters. In conclusion, the recent interest in well-suited RNN models for the task has prompted this study to develop similar approaches for optimizing broccoli plant harvest times.
To this end, this study demonstrates multiple approaches to predicting the optimal broccoli harvest time. Firstly, RNN-based models are developed that capture temporal changes in atmospheric parameters. Secondly, the proposed models are evaluated alongside traditional models, such as thermal and seasonal averages. Thirdly, the possibility of extending the input for RNNs with forecast data is evaluated. In summary, the primary objective of this investigation is to address challenges in predicting the harvest time of broccoli through the application of machine learning techniques.

2. Materials and Methods

2.1. Data Sources

2.1.1. Broccoli Dataset

The dataset for this experiment was sourced from a commercial farm in Ecuador, specifically from two field locations, La Merced and Cochabamba, both situated south of Quito, as shown in Figure 1. Given the favorable environmental conditions for broccoli cultivation in this region, farming activities were conducted throughout the entire year. The dataset contained a list of dates for the “Avenger” cultivar, including transplantation and harvest. The dataset was collected over a span of seven years, from 2015 to 2022, resulting in a total of 779 samples. During the germination stage of each batch, it was ensured that the plants received enough water and were exposed to natural sunlight until the emergence of true leaves, typically occurring around day 33. Subsequently, the seedlings were transplanted into the field, maintaining a consistent spacing of 50 cm between plants. Continuous monitoring was undertaken until the plants reached a head weight of 500 g or more, a criterion suitable for harvest readiness, so this stage was defined as maturity. Cropping was performed multiple times per batch so that only matured heads were harvested, minimizing production losses. However, the harvest time in this study was defined as the date of the first cropping, and this is the date recorded in the dataset. All mentioned models are configured to estimate this date, providing the closest optimal harvest to satisfy business needs.
Observations of cropping between 5 September 2019 and 9 April 2020 were excluded from the dataset due to their anomalous behavior, probably caused by business demands or volcanic activity depositing layers of ash onto the fields. This impact significantly altered harvest timings, making the data during this period non-representative for model training. Additionally, two records around 6 September 2016 were discarded as they contained exceptionally low values of growing duration (78 days). These anomalies were inconsistent with observations taken on adjacent days and were likely a result of the business’s necessity to hasten production to meet increased demand. As a result, the field dataset contained 682 samples, where the fastest growth ended in 81 days and the longest in 107 days. The dataset was divided into two sets: a training set comprising the first 60% of the samples chronologically and a validation set comprising the remaining 40%. The training set was exclusively utilized for deriving optimal model parameters, ensuring that the model learned from a substantial portion of the data. The validation set was employed to test the model’s performance on independent and previously unseen data. This chronological split helps to detect overfitting and gives a realistic estimate of how the models perform on new data.

2.1.2. Weather Dataset

This study systematically measured field weather conditions using various sensors from 2015 to 2023. Over this period, the data collection process was conducted four times: January 2015 to January 2020, January 2020 to June 2021, June to September 2021, and January 2022 to May 2023, each time with differently calibrated sensors. Throughout these periods, measurements of average, minimum, and maximum temperature, average wind speed, and daily total precipitation were recorded. Since each sensor was calibrated independently, the mean correction method [20] was employed to address discrepancies in sensor measurements. When applying this methodology, it was assumed that the most recent observations were the most reliable and would be considered a reference for future use. Consequently, each variable’s mean and standard deviation were computed from the observations made in 2022 and 2023, denoted as $\mu_{base}$ and $\sigma_{base}$. For individual sensors, the observations for each variable (denoted as $x$) underwent the transformation $x_{corr} = \mu_{base} + (x - \mu_{s}) \frac{\sigma_{base}}{\sigma_{s}}$, where $\mu_{s}$ and $\sigma_{s}$ represent the annual mean and standard deviation for the specific sensor. Notably, in the case of the sensor active from June to September 2021, seasonal $\mu_{base}$ and $\sigma_{base}$ were computed only for June, July, and August instead of the whole year. This adjustment was necessary due to the limited number of observed measurements, preventing the compilation of complete yearly statistics. The mean correction method ensures consistency in mean and standard deviation in the dataset by aligning individual sensor measurements with the reference values derived from the most recent observations. Statistics over the included features after adjustments are shown in Table 1 for each season. Figure 2 shows the monthly averages of the same corrected features.
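The mean correction described above can be illustrated with a minimal Python sketch. The function name `mean_correct`, the sample readings, and the reference statistics are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def mean_correct(values, mu_base, sigma_base):
    """Rescale one sensor's readings to the reference mean and standard deviation.

    Implements x_corr = mu_base + (x - mu_s) * sigma_base / sigma_s, where
    mu_s and sigma_s are computed from the sensor's own readings.
    Assumption: population standard deviation (pstdev) is used.
    """
    mu_s = mean(values)
    sigma_s = pstdev(values)
    if sigma_s == 0:
        raise ValueError("sensor readings have zero variance")
    scale = sigma_base / sigma_s
    return [mu_base + (x - mu_s) * scale for x in values]


# Hypothetical readings from an older sensor and hypothetical reference statistics.
old_sensor = [10.0, 12.0, 14.0, 16.0]
corrected = mean_correct(old_sensor, mu_base=11.4, sigma_base=1.0)
```

After the transformation, the corrected series has the reference mean and standard deviation by construction, which is the consistency property the method relies on.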

2.1.3. Forecast Dataset

The weather forecast data were used as a supplementary data source alongside field observations in order to increase the amount of information about weather conditions. Considering the data availability and spatial resolution, the Global Ensemble Forecast System (GEFS), developed by the National Centers for Environmental Prediction (NCEP) [21,22], was used in this study. The GEFS dataset is accessible on a 0.25-degree Gaussian grid, and has provided forecasts for ten days with a temporal resolution of 3 h per timestep since 2000. Given that the agricultural fields were situated between grid points, the widely adopted inverse distance interpolation method [23] was used to interpolate the forecasted weather data to the exact coordinates of the field locations where weather sensors were installed. To ensure alignment between field observations and GEFS forecasts, parameters including average, minimum, and maximum 2 m temperatures, average 10 m wind speed, and total precipitation were extracted and organized into 24 h groups corresponding to Ecuadorian days (GMT-5). Because the GEFS forecast is produced daily according to Greenwich Mean Time, conversion to the Ecuadorian time zone truncates the forecast for the 10th day in Ecuador by 5 h. Therefore, only the data for the first nine days of the forecast were utilized in the analysis, with predictions for the last day excluded. This approach aimed to maintain accuracy and consistency in integrating forecasted weather conditions with field observations.
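As an illustration of inverse distance interpolation between grid points, the following Python sketch may help. The function name `idw`, the distance exponent of 2, the use of plain Euclidean distance in degrees, and the grid values are all assumptions made for this example; the study does not specify these details.

```python
import math


def idw(grid_points, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    grid_points: list of ((lat, lon), value) pairs from surrounding grid nodes.
    target: (lat, lon) of the field location.
    Assumption: Euclidean distance in degrees and weights 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (lat, lon), value in grid_points:
        dist = math.hypot(lat - target[0], lon - target[1])
        if dist == 0.0:
            # The target coincides with a grid node, so use its value directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical 0.25-degree grid cell with 2 m temperatures at its four corners.
cell = [
    ((-0.50, -78.50), 12.0),
    ((-0.50, -78.25), 14.0),
    ((-0.25, -78.50), 11.0),
    ((-0.25, -78.25), 13.0),
]
center = idw(cell, (-0.375, -78.375))
```

At the cell center all four corners are equally distant, so the estimate reduces to their plain average; closer to one corner, that corner dominates.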
This study used the linear regression (LR) approach [24] as a mitigation strategy to address biases inherent in the model outputs caused by complex topography around the experimental site. This bias is shown in Table A1, where the Mean Absolute Error between in situ observations and GEFS is computed. Specifically, for each forecasted variable, nine distinct LR models were developed. Each model was configured to correct the bias on a specific forecast day. Training these models involved utilizing field weather observations as true values, with forecasted variables as inputs. The acquired LR models were subsequently applied to transform the entire selection of the GEFS subset.
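A minimal sketch of the per-lead-day correction, assuming a simple one-variable least-squares line for each forecast day, is given below. The function names and the sample values are hypothetical, and the study may have used a different regression setup.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_lead_day_models(pairs_by_lead):
    """Fit one line per forecast lead day.

    pairs_by_lead: {lead_day: (forecast_values, observed_values)}
    """
    return {lead: fit_line(fc, obs) for lead, (fc, obs) in pairs_by_lead.items()}


def correct(models, lead_day, forecast_value):
    """Apply the correction fitted for the given lead day."""
    slope, intercept = models[lead_day]
    return slope * forecast_value + intercept


# Hypothetical forecast/observation pairs for two lead days of one variable.
pairs = {
    1: ([4.0, 6.0, 8.0], [7.0, 8.0, 9.0]),
    2: ([4.0, 6.0, 8.0], [8.0, 9.0, 10.0]),
}
models = fit_lead_day_models(pairs)
```

Keeping a separate model per lead day lets the correction differ between, for example, the first and the ninth forecast day, at the cost of fitting nine models per variable.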

2.2. Harvest Models

This study introduces multiple models for estimating broccoli harvest time, simplified to calculating the growing days (GDs), that is, the number of days between transplanting (planting day, PD) and harvest day.

2.2.1. Persistence Model

The initial approach, termed the “persistence model”, sets a constant number of GDs, regardless of seasonal variations. The constant output was configured to be equal to the mean value of GDs in the training dataset. This assumption establishes a baseline prediction, providing insight into the accuracy achieved when neglecting variables that exhibit temporal fluctuations. The resulting persistence model serves as a reference point against which the subsequent, more complex models can be compared and evaluated.

2.2.2. Thermal Model

Thermal models, proposed in [25], served as a framework to establish a relationship between temperature and the growth progression of broccoli. These models operate under the assumption that at each stage of its life, broccoli exhibits optimal growing temperatures, as well as lower and higher temperature bounds. Crossing these extreme bounds, $t_{min}$ and $t_{opt}$, halts progress, outlining the sensitivity of broccoli growth to temperature fluctuations. Furthermore, the model defines that the accumulation of average daily temperature, $T = \sum_{i=1}^{days} \frac{t_{min} + t_{max}}{2}$ (where, inside the sum, $t_{min}$ and $t_{max}$ are the minimum and maximum temperatures of day $i$), referred to as thermal time, serves as a metric to state the current stage of phenotypic development. This study used linear regression to establish the relationship between thermal time accumulated over the entire period from transplantation to harvest and the corresponding number of days required for growth.
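The accumulation of thermal time can be sketched in Python as follows. The function name `thermal_time` and the sample temperatures are hypothetical. How the bounds are applied is also an assumption: here a day whose mean temperature falls outside the bounds simply contributes nothing, which is only one possible reading of "halts progress".

```python
def thermal_time(daily_min, daily_max, lower=None, upper=None):
    """Accumulate the daily mean temperature (t_min + t_max) / 2 over all days.

    Assumption: when bounds are given, days whose mean temperature lies
    outside [lower, upper] are skipped and add nothing to the total.
    """
    total = 0.0
    for t_lo, t_hi in zip(daily_min, daily_max):
        t_mean = (t_lo + t_hi) / 2.0
        if lower is not None and t_mean < lower:
            continue
        if upper is not None and t_mean > upper:
            continue
        total += t_mean
    return total


# Hypothetical three-day record of daily minimum and maximum temperatures.
mins = [8.0, 7.0, 15.0]
maxs = [18.0, 17.0, 31.0]
unbounded = thermal_time(mins, maxs)
bounded = thermal_time(mins, maxs, lower=4.0, upper=22.0)
```

In this example the third day has a mean of 23 degrees, so it is counted in the unbounded total but skipped when an upper bound of 22 is applied.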

2.2.3. Calendar Model

The first introduced models capable of incorporating variable features and generating forecasts are statistical calendar models (CMs). In this context, these models utilize information solely about PD to estimate the corresponding harvest day. Operating on the assumption that seasonal influence is the most significant factor, these models infer that the day number within a year can serve as a metric for the average speed of broccoli growth. The underlying idea could be expanded to the idea that broccoli growth strongly correlates with the seasons, implying a cyclical nature in growing duration over the years. Consequently, a transformation from the day of the year, starting from 1 January to 31 December, is applied to deduce the number of days required for growth.
This study explored two distinct approaches for the structure of this transformation function. The first approach involves approximating the shape of the dataset with a sine wave (Sine). In this case, the optimization goal was to minimize the mean square difference between the curve and the original points. Parameters such as amplitude and phase shift for the best-fitting sine wave were obtained using optimization techniques available in the scikit-learn library [26]. The sine wave was then used to map new observations to the original data points’ space, where the period from 1 January to 31 December each year corresponded to the interval $[0, 2\pi)$. The second method, the Average Window (AW) technique, commonly used for trend estimation, calculates the average growing duration over a predefined range around the selected point based on training data. However, the challenge arises near the range’s edges, representing the year’s beginning and end, where an average window might extend beyond the available range. Therefore, the whole sequence was repeated on both sides, ensuring that edge cases were handled correctly and accounted for points in calendar order. The advantage of these models lies in their independence from hard-to-measure variables, enabling immediate forecasting after planting into the field.
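A sine wave with a fixed one-year period can be fitted in several ways. The sketch below does not reproduce the optimizer used in the study; instead it swaps in a closed-form linear least-squares fit, using the identity A sin(θ + φ) = a sin θ + b cos θ with a = A cos φ and b = A sin φ. All function names, the 365-day year, and the synthetic data are assumptions made for illustration.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - rest) / m[r][r]
    return x


def day_angle(day_of_year, days_in_year=365):
    """Map 1 January .. 31 December onto [0, 2*pi)."""
    return 2.0 * math.pi * (day_of_year - 1) / days_in_year


def fit_sine(days_of_year, growing_days):
    """Least-squares fit of gd = offset + amplitude * sin(angle + phase)."""
    rows = [(1.0, math.sin(day_angle(d)), math.cos(day_angle(d)))
            for d in days_of_year]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, growing_days)) for i in range(3)]
    offset, a, b = solve3(ata, aty)
    return offset, math.hypot(a, b), math.atan2(b, a)


def predict_sine(params, day_of_year):
    offset, amplitude, phase = params
    return offset + amplitude * math.sin(day_angle(day_of_year) + phase)


# Synthetic check: recover hypothetical parameters from noise-free samples.
true_params = (92.0, 4.0, 0.7)
days = list(range(1, 366, 5))
gds = [predict_sine(true_params, d) for d in days]
params = fit_sine(days, gds)
```

Because the period is fixed, the problem is linear in the offset and the two sine/cosine coefficients, so no iterative optimizer or starting guess is needed in this variant.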

2.2.4. RNN Model

The foundation of estimating plant growing speed lies in the seasonal influence and daily variations in meteorological parameters deviating from the seasonal mean. RNN models have emerged as powerful tools for handling time series data, particularly daily sensor measurements. The model architecture used in this study contained one LSTM [27] layer followed by four fully connected layers. The complete model description and list of hyperparameters are shown in Figure 3. Training was controlled by early stopping with a patience of 10 epochs. For the optimizer, Adam was chosen with an initial learning rate of 0.001. Mean Absolute Error was used as the loss function.
The input features for these models encompass minimum, maximum, and average temperatures, average wind speed, total daily precipitation, thermal time, and planting date. In addition to the base atmospheric parameters, 14-day accumulations and averages for each feature were added as a supplement to compensate for the lack of information about past events, totaling eighteen features. All features underwent normalization within the range of −1 to 1 before entering the neural network. This selection of variables is explained by the fact that all atmospheric parameters might play a role in influencing plant growing speed.
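The 14-day supplements and the scaling step could look roughly like the Python sketch below. The function names are hypothetical, and two details are assumptions: windows at the start of a series are shortened to the days available, and the scaling limits are taken from the training data.

```python
def rolling_sum(values, window=14):
    """Trailing sum over the last `window` days (fewer at the series start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running)
    return out


def rolling_mean(values, window=14):
    """Trailing mean over the last `window` days (fewer at the series start)."""
    sums = rolling_sum(values, window)
    return [s / min(i + 1, window) for i, s in enumerate(sums)]


def scale_to_unit_range(values, lo, hi):
    """Map [lo, hi] linearly onto [-1, 1].

    Assumption: lo and hi are the feature's limits in the training data.
    """
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical daily precipitation in mm for 20 days.
precip = [float(d % 3) for d in range(20)]
precip_14d_total = rolling_sum(precip)
precip_14d_mean = rolling_mean(precip)
precip_scaled = scale_to_unit_range(precip, lo=0.0, hi=2.0)
```

Each accumulated or averaged series has the same length as the raw series, so it can be stacked with the base features as additional input channels per timestep.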
A modified approach to output value normalization was implemented to simplify training complexity. Rather than directly normalizing output values, subtracting the average GDs was performed beforehand, effectively reducing the range of potential outcomes. By design, the mean GD value can be immediately estimated at the time of planting by the persistence and calendar models. The capability of CMs to produce seasonal averages instead of yearly averages was the factor for integrating RNN models with a CM, where the CM initially generates rough estimations, and the subsequent RNN model refines these forecasts by adjusting the initial estimations. Consequently, two RNN-based models were developed: RNN with sine (RNN-S) and RNN with average window calendar (RNN-AW) models. The RNN model dynamically generates predictions with each new day’s data, as shown in Figure 4a. Model run day (RD) was defined as the number of days between PD and the date when a model generated a prediction. It signifies the amount of historical information about weather conditions that, when increasing in number, should lead to more accurate results. In this context, an extension of field observations with GEFS predictions in Figure 4b that serves as input to RNN models could lower the final error. In order to provide a robust estimation of the models’ performance, a forward-chaining cross-validation approach was used with each batch consisting of 2 years of observations. The testing was performed two times: with one and two batches in the training dataset. In the final performance report, errors for each metric were averaged.
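The forward-chaining cross-validation described above can be sketched as follows. The function names and the sample identifiers are hypothetical; the sketch only assumes that batches are ordered in time and that each fold trains on all batches preceding the test batch.

```python
def forward_chaining_folds(batches):
    """Yield (train, test) pairs in which training data always precede test data.

    batches: list of chronologically ordered batches (lists of samples).
    Fold k trains on the first k batches and tests on batch k + 1.
    """
    for k in range(1, len(batches)):
        train = [sample for batch in batches[:k] for sample in batch]
        yield train, batches[k]


def average_metric(fold_errors):
    """Average one error metric over all folds for the final report."""
    return sum(fold_errors) / len(fold_errors)


# Hypothetical sample ids grouped into three consecutive two-year batches.
batches = [["a1", "a2"], ["b1", "b2"], ["c1"]]
folds = list(forward_chaining_folds(batches))
```

With three batches this produces two folds, one with a single batch and one with two batches in the training set, which matches the two test runs described in the text.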

2.3. Evaluation Metrics

In evaluating broccoli growth models, multiple performance metrics were used. First, Mean Absolute Error (MAE) was employed to determine the average forecast error that is expected from introduced techniques. MAE is calculated as
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$
where $n$ is the number of planting–harvest observations in the test dataset, and $y_i$ and $x_i$ are the actual and predicted GDs, respectively. Secondly, in order to correctly handle median bias, Root Mean Square Error (RMSE) was used.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2}$$
Thirdly, the Coefficient of Determination ($R^2$) was added to the analysis to provide a measure of how well the observed GDs are estimated by the models.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{\sum_{i=1}^{n} (y_i - \hat{y})^2}$$
where $\hat{y}$ is the mean GDs in the test dataset.
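For reference, the three metrics can be computed with a few lines of Python. The function names and the sample values are hypothetical; the formulas follow the standard definitions of MAE, RMSE, and the Coefficient of Determination.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted growing days."""
    return sum(abs(y - x) for y, x in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted growing days."""
    squared = sum((y - x) ** 2 for y, x in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2(actual, predicted):
    """Coefficient of Determination relative to the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - x) ** 2 for y, x in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted growing days for four batches.
actual = [90.0, 92.0, 94.0, 96.0]
predicted = [91.0, 92.0, 93.0, 98.0]
scores = (mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Note that a constant predictor equal to the mean of the test data gives a value of zero for the third metric, so a persistence-style model fitted on different data can score below zero.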

3. Results

3.1. Model Parameters

The persistence model, configured for a constant 92-day growth duration derived from the training dataset’s mean, served as a baseline. The required thermal time of the thermal model was calculated over the whole period of growth and is equal to 1147 °C·days, which is close to the value of 1272 derived in the previous study [12]. The model parameters $t_{min}$ and $t_{opt}$ were obtained with the same optimizing algorithm as for the Sine model. The resulting values of 4 and 22 °C are also similar to the values of 0 and 20 °C derived in [28].
This experiment aimed to identify the optimal window size required for the analysis of the Average Window model. In order to ensure the independence of the testing dataset when calculating the optimal window, the training dataset was divided in half, with 30% allocated for training and another 30% for validation. A search of all possible window sizes, ranging from 3 to 99 days, was conducted. The results indicated a gradual increase in the MAE for all the datasets, starting from 19 days. Consequently, a window size of 19 days was selected as the optimal choice for the Average Window model due to showing the lowest error in the validation dataset. This decision was made to balance capturing relevant temporal patterns and avoid overfitting the data.
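The window search can be sketched in Python as below. Instead of repeating the sequence on both sides of the year, this sketch uses a circular day distance, which handles the year boundary in an equivalent way. The function names, the 365-day year, and the sample records are hypothetical.

```python
def circular_window_mean(train, center_day, window, days_in_year=365):
    """Mean growing days of training samples within window // 2 days of center_day.

    train: list of (day_of_year, growing_days) pairs.
    The day distance wraps around the year boundary, so late-December and
    early-January samples count as neighbours.
    """
    half = window // 2
    selected = []
    for day, gd in train:
        diff = abs(day - center_day)
        diff = min(diff, days_in_year - diff)
        if diff <= half:
            selected.append(gd)
    return sum(selected) / len(selected) if selected else None


def best_window(train, valid, candidates):
    """Return (window, validation MAE) for the candidate with the lowest MAE."""
    best = None
    for window in candidates:
        errors = []
        for day, gd in valid:
            estimate = circular_window_mean(train, day, window)
            if estimate is not None:
                errors.append(abs(gd - estimate))
        if not errors:
            continue  # no validation sample could be estimated with this window
        score = sum(errors) / len(errors)
        if best is None or score < best[1]:
            best = (window, score)
    return best


# Hypothetical training and validation records: (planting day of year, GDs).
train = [(360, 90.0), (3, 94.0), (180, 100.0)]
valid = [(1, 92.0)]
choice = best_window(train, valid, candidates=[3, 19])
```

In the example, a 19-day window centred on 1 January picks up both the late-December and the early-January sample, while the mid-year sample is ignored.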
The exploration of parameters for the CM resulted in achieving MAE values of 2.06 and 2.47 days for the Average Window and Sine models, respectively. In contrast, the standard deviation for these approximations yielded closely aligned values of 2.56 and 2.63, respectively. Findings from the best-fitting curves in Figure 5 strongly suggest that the data can be effectively approximated with a sine wave, similar to the performance achieved through the Average Window filtering method.

3.2. Model Performance

The thermal model, characterized by parameters like the required thermal time, approximated the plant growth speed more closely at the beginning (January–April) and end (November–December) of the year, as shown in Figure 6a. However, disparities in the results for other periods indicate room for model improvement. The calendar models in Figure 6b,c showcased a superior performance compared to the thermal model, in line with training data approximations.
Nevertheless, the adjustments with RNN models in Figure 6b,c demonstrated enhanced accuracy in capturing local features. RNN models, capable of daily predictions, underwent benchmarking on day 50, giving farmers at least 30 days’ notice before harvest. When adjusted, particularly the RNN-S, these models outperformed their base calendar models, capturing local features like a decrease in the growth speed for January. The study’s primary objective of assessing the improvement in the RNN-type forecast models with additional data revealed a general tendency of a decreasing MAE with increasing RD, as shown in Figure 7a,b, particularly within the initial 60 days. This emphasized the significance of early atmospheric measurements. Beyond day 60, the model performance stabilized, with error-increasing peaks around 19, 40, and 90 days, probably associated with distinct stages of broccoli development, such as the onset of head formation around day 40 [29].
When comparing the performance of all the models in Table 2, it becomes evident that the persistence model exhibits the least favorable results, with an MAE of 3.97 and an RMSE of 4.77 days. Next in accuracy is the thermal model, with an MAE of 3.14, an RMSE of 3.92, and a low $R^2$ of 0.32. Both calendar models outperformed the persistence and thermal models. They demonstrated similar results to one another across all the metrics, with the Sine model exhibiting slightly superior metrics compared to the Average Window. Notably, on day 50, the RNN models surpassed their base calendar models in terms of the MAE by 0.18 and 0.15 days, with the RNN-S model achieving the best scores: an MAE of 2.25 days, an RMSE of 2.94 days, and an $R^2$ of 0.61.

3.3. Forecast Extension

The application of the linear regression model demonstrated in Table A1 showed notable reductions in the MAE across all the considered variables. For the unaltered GEFS forecasts, the MAE for the maximum, minimum, and average temperatures remained relatively consistent at 1.9, 7.9, and 3.7 degrees Celsius, respectively. However, applying the linear regression technique resulted in substantial improvements, reducing the MAE to 1.1, 1.6, and 0.7 degrees, respectively. Notably, the minimum temperature exhibited the highest MAE with a value of 1.6. In the context of the wind speed, the corrected forecast maintained a consistent MAE of around 0.4 m/s, showcasing a significant enhancement over the original forecast, which had an MAE of 1.3. Conversely, the correction yielded limited improvement for the daily precipitation variable, with the corrected model’s MAE at 2.9 compared to the original’s MAE of 5.1. Across all the variables, the error exhibited an observable but relatively small increase, highlighting the inherent uncertainties in numerical weather models in precipitation forecasting.
Testing dates starting from day 40 and ending at day 70 with a 5-day interval were chosen for the experiment to assess the performance of the models with the extension of the GEFS weather forecast. Due to the limited access to forecast data in 2021, caused by an upgrade of the GEFS model, the testing was performed only for the PD in 2022. Subsequently, the nine-day extension was incorporated using the GEFS forecast from that starting day. In Table 3, the performance of the RNN models is illustrated. Notably, the forecast extension contributed to an enhancement in the MAE for the days preceding day 60 for RNN-S. For the RNN-AW model, this improvement was observed up to the 55-day threshold. Beyond this point, errors increased for both models, with the RNN-AW model experiencing a 0.02-day error difference on day 70.
Student’s t-tests with a significance level of α = 0.05 were employed to assess the impact of forecast extension compared to the field-only data. The analysis in Table 4 reveals that, for both models, the calculated p-values exceeded 0.05. This observation indicates insufficient evidence to reject the null hypothesis, suggesting that the forecast extension does not result in a statistically significant difference in distribution compared to the field-only data. Although the forecast extension did not show a significant improvement in this study, combining it with RNN models remains a promising direction for plant growth modeling.

4. Discussion

The success of the calendar model can be attributed to the cyclic nature of broccoli growth throughout the year, allowing seasonal variations to be approximated by accounting for many historical observations. Notably, the sine wave demonstrated superior performance in testing compared to the commonly used method of trend extraction with the average filter.
Integrating the RNN models with the calendar model resulted in improved accuracy, dynamically adjusting and converging toward a constant estimation. Compared with the models proposed by Wei-Ming [8], the current model requires a larger set of input variables and processing resources, which might not be accessible at some fields. On the other hand, an extensive range of input features ensures the stable operation of the RNN models over the entire year instead of only in the late spring and summer periods, which were used by Tan’s models [29].
While the error curves for both RNN-S and RNN-AW in Figure 7 tended to decrease, occasional error fluctuations caused by model imperfections and external factors led to an increase in the MAE and RMSE for short periods. Several factors hindered the performance of the RNN-like models. Firstly, they did not account for the phenological stage of the plant, which encompasses diverse growing patterns. Secondly, because the model was trained on commercial field data, business demands may have artificially influenced specific harvest timings. Thirdly, the stable weather conditions in the valleys of the Ecuadorian mountains, allowing for year-round broccoli farming, might lower the significance of individual atmospheric variables. Nevertheless, the proposed architecture of the RNN-like models successfully showed its capability to keep track of the weather conditions that affect the GDs prior to the RD. We suggest that improvements be made to the interpretation of the input features, such as changing the accumulation window. Another interpretation is that changes in the expected harvest time can be attributed to deviations of temperature and precipitation from their seasonal averages. While the minimum, maximum, and average daily temperatures remained almost consistent throughout the years, with mean values of 7.8, 17.0, and 11.4 degrees Celsius, the RNN models showed that even weak fluctuations are responsible for changes in prediction. High sensitivity can become a weakness for the model when measurements are made with a low precision or a different sensor.
Weather forecasts can be a reliable data source for the atmospheric state, but it is important to consider the error introduced by downscaling them to a field location. The limited impact of the 9-day weather forecast extension is likely caused by the internal limitations of the numerical weather model that were not corrected by LR and by the small increase in RD of only 9 days. However, a lower error was observed when estimating the harvest timings before day 55 compared to the model without forecast extension, showing a potential for the usage of weather forecasts. In conclusion, more complex techniques for forecast bias correction and downscaling should be considered in the future, as the error directly impacts the final accuracy of broccoli models.

5. Conclusions

This study explored the application of various plant models to estimate the optimal broccoli harvest timings in Ecuador, comparing them with traditional models such as the persistence and thermal models. The results reveal that the persistence model could not fully capture changes in the maturity timings, and its lowest MAE was 3.97 days. The thermal model in this study shared parameters similar to those in previous research by Tan et al. [12], and its accuracy, with an RMSE of 4.4 days, underscored the significance of thermal time as a parameter for estimating the growth speed. Nevertheless, the proposed calendar and RNN models consistently outperformed them on all the metrics, for example with an RMSE below 3 days. It is essential to consider the potential application of these models in other climate zones suitable for broccoli farming where forecasts are more accurate and meteorological variables vary more over the year, such as the Mediterranean countries. Another area of future research lies in using unmanned aerial vehicles to monitor broccoli maturity continuously and to adjust the inputs of the proposed models.
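The MAE and RMSE values quoted above can be computed as in the following minimal sketch. It assumes that the persistence baseline returns one constant value, taken here as the mean number of growing days over the training cycles. The helper names and the growing-day numbers are hypothetical and do not come from the study's dataset.

```python
import math


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def root_mean_squared_error(predicted, observed):
    """Square root of the average squared difference."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical growing days (planting to harvest) for a few crop cycles.
train_growing_days = [88, 92, 90, 95, 85]
test_growing_days = [87, 93, 91]

# Assumed persistence baseline: always predict the training mean.
baseline = sum(train_growing_days) / len(train_growing_days)
persistence_pred = [baseline] * len(test_growing_days)

mae_days = mean_absolute_error(persistence_pred, test_growing_days)
rmse_days = root_mean_squared_error(persistence_pred, test_growing_days)
print(f"MAE = {mae_days:.2f} days, RMSE = {rmse_days:.2f} days")
```

RMSE is never smaller than MAE and grows faster when a few predictions miss by many days, so reporting both metrics shows whether a model's error is evenly spread or dominated by outliers.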

Author Contributions

Conceptualization, M.L. and N.U.; data curation, M.L., R.K. and N.U.; formal analysis, M.L.; funding acquisition, K.O. and N.U.; investigation, M.L. and N.U.; methodology, M.L. and N.U.; project administration, K.O. and N.U.; resources, R.K.; software, M.L.; supervision, K.O. and N.U.; validation, M.L.; visualization, M.L.; writing—original draft, M.L. and N.U.; writing—review & editing, R.K., K.O., K.Y., I.A. and S.I.S. All authors have read and agreed to the published version of the manuscript.


Funding

This work was financially supported by the collaborative research framework of “Technology for IoT Sensing and Analysis” established between The University of Tokyo and Air Water Inc. N.U. was also funded by JSPS KAKENHI Grant Numbers JP21H01442, JP19K15096, and JP18KK0117, by MEXT KAKENHI JP21H05178, and by the Japan Science and Technology Agency (JST) Belmont Forum. S.I.S. was funded by the Advanced Research Grant of Kyoto University of Advanced Science (KUAS).

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Air Water Inc. and are available from Ryoji Korei with the permission of Air Water Inc.


Acknowledgments

The author wants to express gratitude to the Nippon Foundation for providing financial and non-material aid for Ukrainian students in Japan.

Conflicts of Interest

Ryoji Korei was employed by the company Air Water Inc. An author has received research grants from funding agencies, as explained in the Funding section. The funders had a role in the collection of data, in the writing of the manuscript, and in the decision to publish the results.

Appendix A

Table A1. Mean Absolute Error (MAE) of GEFS forecast without correction. The error when the forecast was corrected with linear regression is shown in parentheses.
Variable                    GEFS Forecast Day
Average Temperature (°C)    3.72  3.66  3.68  3.71  3.73  3.74  3.75  3.74  3.75
Minimum Temperature (°C)    2.09  1.83  1.85  1.87  1.89  1.9   1.91  1.91  1.92
Maximum Temperature (°C)    7.53  7.56  7.6   7.63  7.65  7.67  7.68  7.67  7.67
Wind Speed (m/s)            1.3   1.35  1.36  1.33  1.31  1.32  1.33  1.33  1.35
Total Precipitation (mm)    4.59  5.9   5.76  5.59  5.53  5.


References

  1. Jin, X.; Kumar, L.; Li, Z.; Feng, H.; Xu, X.; Yang, G.; Wang, J. A Review of Data Assimilation of Remote Sensing and Crop Models. Eur. J. Agron. 2018, 92, 141–152.
  2. Saiz-Rubio, V.; Rovira-Más, F. From Smart Farming towards Agriculture 5.0: A Review on Crop Data Management. Agronomy 2020, 10, 207.
  3. Wheeler, T.R.; Hong, T.D.; Ellis, R.H.; Batts, G.R.; Morison, J.I.L.; Hadley, P. The Duration and Rate of Grain Growth, and Harvest Index, of Wheat (Triticum aestivum L.) in Response to Temperature and CO2. J. Exp. Bot. 1996, 47, 623–630.
  4. Williams, J.R.; Jones, C.A.; Kiniry, J.R.; Spanel, D.A. The EPIC Crop Growth Model. Trans. ASAE 1989, 32, 0497–0511.
  5. Ohishi, M.; Takahashi, M.; Fukuda, M.; Sato, F. Developing a Growth Model to Predict Dry Matter Production in Broccoli (Brassica oleracea L. var. italica) “Ohayou”. Hortic. J. 2023, 92, 77–87.
  6. Lindemann-Zutz, K.; Fricke, A.; Stützel, H. Prediction of Time to Harvest and Its Variability of Broccoli (Brassica oleracea var. italica) Part II. Growth Model Description, Parameterisation and Field Evaluation. Sci. Hortic. 2016, 200, 151–160.
  7. Diputado, M.T., Jr.; Nichols, M.A. The Effect of Sowing Date and Cultivar on the Maturity Characteristics of Broccoli (Brassica oleraceae var. italica). Acta Hortic. 1989, 247, 59–66.
  8. Wei-ming, L.; En-guo, W. Mathematical Modeling of Broccoli Cultivation and Growth Period and Yield of Flower Heads. In Computer and Computing Technologies in Agriculture VIII; Springer International Publishing: Cham, Switzerland, 2015; pp. 94–98.
  9. de Maria Mourao, I.; Brito, L.M. Empirical models for harvest date prediction in broccoli (Brassica oleracea L. var. italica Plenck). Acta Hortic. 2000, 239, 47–53.
  10. Fujime, Y. Studies on Thermal Conditions of Curd Formation and Development in Cauliflower and Broccoli, with Special Reference to Abnormal Curd Development; Kagawa University: Takamatsu City, Japan, 1983.
  11. Tan, D.K.Y.; Wearing, A.H.; Rickert, K.G.; Birch, C.J. Detection of Floral Initiation in Broccoli (Brassica oleracea L. var. italica Plenck) Based on Electron Micrograph Standards of Shoot Apices. Aust. J. Exp. Agric. 1998, 38, 313–318.
  12. Tan, D.K.Y.; Birch, C.J.; Wearing, A.H.; Rickert, K.G. Predicting Broccoli Development: II. Comparison and Validation of Thermal Time Models. Sci. Hortic. 2000, 86, 89–101.
  13. Czernecki, B.; Nowosad, J.; Jabłońska, K. Machine Learning Modeling of Plant Phenology Based on Coupling Satellite and Gridded Meteorological Dataset. Int. J. Biometeorol. 2018, 62, 1297–1309.
  14. Gavahi, K.; Abbaszadeh, P.; Moradkhani, H. DeepYield: A Combined Convolutional Neural Network with Long Short-Term Memory for Crop Yield Forecasting. Expert Syst. Appl. 2021, 184, 115511.
  15. Srivastava, A.K.; Safaei, N.; Khaki, S.; Lopez, G.; Zeng, W.; Ewert, F.; Gaiser, T.; Rahimi, J. Winter Wheat Yield Prediction Using Convolutional Neural Networks from Environmental and Phenological Data. Sci. Rep. 2022, 12, 3215.
  16. Khaki, S.; Wang, L.; Archontoulis, S.V. A CNN-RNN Framework for Crop Yield Prediction. Front. Plant Sci. 2019, 10, 1750.
  17. Haider, S.A.; Naqvi, S.R.; Akram, T.; Umar, G.A.; Shahzad, A.; Sial, M.R.; Khaliq, S.; Kamran, M. LSTM Neural Network Based Forecasting Model for Wheat Production in Pakistan. Agronomy 2019, 9, 72.
  18. Boechel, T.; Policarpo, L.M.; Ramos, G.d.O.; da Rosa Righi, R.; Singh, D. Prediction of Harvest Time of Apple Trees: An RNN-Based Approach. Algorithms 2022, 15, 95.
  19. Liu, S.-C.; Jian, Q.-Y.; Wen, H.-Y.; Chung, C.-H. A Crop Harvest Time Prediction Model for Better Sustainability, Integrating Feature Selection and Artificial Intelligence Methods. Sustain. Sci. Pract. Policy 2022, 14, 14101.
  20. Jain, A.; Nandakumar, K.; Ross, A. Score Normalization in Multimodal Biometric Systems. Pattern Recognit. 2005, 38, 2270–2285.
  21. NOAA Global Ensemble Forecast System (GEFS). Available online: (accessed on 1 September 2023).
  22. Zhou, X.; Zhu, Y.; Hou, D.; Fu, B.; Li, W.; Guan, H.; Sinsky, E.; Kolczynski, W.; Xue, X.; Luo, Y.; et al. The Development of the NCEP Global Ensemble Forecast System Version 12. Weather Forecast. 2022, 37, 1069–1084.
  23. Zimmerman, D.; Pavlik, C.; Ruggles, A.; Armstrong, M.P. An Experimental Comparison of Ordinary and Universal Kriging and Inverse Distance Weighting. Math. Geol. 1999, 31, 375–390.
  24. Huth, R. Statistical Downscaling of Daily Temperature in Central Europe. J. Clim. 2002, 15, 1731–1742.
  25. Holzworth, D.P. DEVEL: A Crop Development Modelling Tool; Queensland Department of Primary Industries: Brisbane City, QLD, Australia, 1991.
  26. Swiler, L.P.; Roberts, R.M.; Sullivan, S.P.; Stucky-Mack, N.J.; Vugrin, K.W. Confidence Region Estimation Techniques for Nonlinear Regression: Three Case Studies; United States Department of Energy: Washington, DC, USA, 2005.
  27. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  28. Tan, D.K.Y.; Birch, C.J.; Wearing, A.H.; Rickert, K.G. Predicting Broccoli Development I. Development Is Predominantly Determined by Temperature rather than Photoperiod. Sci. Hortic. 2000, 84, 227–243.
  29. Tan, D.K.Y.; Birch, C.J.; Wearing, A.H.; Rickert, K.G. Modelling Broccoli Development, Yield and Quality. Available online: (accessed on 24 November 2023).
Figure 1. The location of the broccoli field depicted on three map scales. A red star indicates the relative field location.
Figure 2. Monthly average for each month derived from years 2015 to 2023: (a) Temperature fields; (b) average daily wind speed and monthly total received precipitation. Error bars show the 10th and 90th percentile for measurements taken in each month.
Figure 3. Architecture description of the used RNN model.
Figure 4. Timeline of the predictions with RNN-type models: (a) weather observations are only from the field; (b) weather observations from the field are combined with a forecast.
Figure 5. Approximation of derived curves on the training data from 1 January to 31 December: (a) best-fitting sine curve; (b) prediction curve obtained from averaging window approximation.
Figure 6. Output of the models when applied to the whole in situ weather dataset: (a) thermal model; (b) Sine and RNN-S models; (c) Average Window and RNN-AW models.
Figure 7. Changes in MAE for selected RDs for RNN-type models. The dotted lines represent an MAE for a base calendar model used for each case: (a) RNN with Sine calendar model; (b) RNN with AW calendar model.
Table 1. Mean, median, and standard deviation over the available weather dataset.
Wind Speed
Standard deviation  1.64  1.81  0.94  4.04  0.61
Table 2. MAE, RMSE, and R² metrics for the models used in this study. R² was not calculated for the persistence model because its output is constant. Numbers in bold indicate the lowest error obtained for each metric.
Metric  Persistence  Thermal  Average Window  Sine  RNN-S
(Day 50)
(Day 50)
Table 3. MAE comparison of models when extending meteorological data with nine additional days from the forecast. Numbers in bold indicate the lowest error obtained for each RD.
Field Only  Field with Forecast  Field Only  Field with Forecast
Table 4. Obtained p-value for model outputs when extending with forecast data.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lohachov, M.; Korei, R.; Oki, K.; Yoshida, K.; Azechi, I.; Salem, S.I.; Utsumi, N. RNN-Based Approach for Broccoli Harvest Time Forecast. Agronomy 2024, 14, 361.
