1. Introduction
The agricultural sector is facing an intensifying labor shortage, which exacerbates the workload on farmworkers and necessitates the development of sophisticated predictive tools. Smart agriculture, which integrates information and communications technology, big data, and intelligent devices, has emerged as a viable solution for optimizing farm operations and increasing productivity. Globally, various studies have underscored its efficacy in enhancing agricultural practices, including improved water management through Internet of Things (IoT)-based environmental monitoring [1] and yield enhancement via IoT-integrated machine learning models [2]. In Japan, field experiments have been instrumental in visualizing cultivation conditions using environmental data and in enhancing agricultural operations with smart devices [3].
Fruit harvest timing prediction models have been investigated in previous studies. de Souza et al. [4] developed an artificial neural network (ANN) model for harvesting bananas, using the average temperature, maximum temperature, minimum temperature, precipitation, and sunshine duration as input variables. Their results indicated that the model could predict harvest timing with an error margin of 0.3%. In a previous study [5], we developed an ANN model to predict the peak harvest dates for ‘Tonewase’, ‘Hiratanenashi’, and ‘Fuyu’ persimmons. This model utilizes meteorological variables that influence fruit coloration and can predict the peak harvest date with an accuracy of approximately three days. It was trained using peak harvest date data derived from arrival volume records at the JA Nara Prefecture Nishiyoshino Sorting Facility (JA Peak data) and cultivar adaptability test data from the Nara Prefecture Agriculture Research and Development Center in Japan (CT Peak data). Although the model demonstrated promising performance, it requires additional validation at the orchard level to confirm its robustness under varying growing conditions.
Image-based harvest prediction methods have been investigated for fruit crops. A method that integrates image processing with convolutional neural networks has been proposed for grape yield estimation, achieving an error rate of 11.8% [6]. Furthermore, backpropagation neural networks have been used to predict apple yields based on fruit canopy images, with a root mean square error (RMSE) of 2.34 kg per tree [7]. For citrus yield predictions, a long short-term memory (LSTM)-based model has been developed, which achieved error rates of 4.53% per tree and 7.22% for total yield prediction [8]. These studies highlight the potential of image-based deep learning techniques in yield prediction. However, applying these methods to other fruit trees presents challenges owing to the scarcity of historical image datasets and the extended maturation period of fruit trees, which complicates data accumulation.
Numerical data such as meteorological records offer a robust alternative for yield prediction. Unlike image datasets, numerical datasets often come with extensive historical records and are well suited to predictive modeling. For example, prior research on blueberries used numerical data, including variables such as bee species composition and weather conditions, to predict yields. The models evaluated were multiple linear regression, boosted decision trees, random forests, and extreme gradient boosting (XGBoost). Among these, XGBoost demonstrated the highest accuracy, with an error rate of 5.444% [9].
This study focuses on persimmon production in the Gojo-Yoshino region of Nara Prefecture, a prominent persimmon-growing area in Japan [10]. To address labor shortages, previous studies have explored the use of smart devices [11] and cultivation data [12,13] to enhance agricultural efficiency. Within the framework of the “Development and Improvement Program of Strategic Smart Agricultural Technology Grants”, demonstration experiments are underway to assess remote and automated irrigation systems operated via a dedicated communication network. Moreover, predictive models for harvest timing and yield are being developed to aid in more effective planning and decision making.
In this region, seasonal workers are customarily hired to alleviate labor shortages during the harvest season. Planning seasonal employment early helps secure labor reliably, and doing so requires advance information on harvest timing and yield. Conventionally, workforce planning has depended on empirical knowledge to predict both. However, the escalating impacts of climate change [14,15] and the diminishing availability of agricultural labor are rendering conventional forecasting methods increasingly unreliable. Consequently, advanced predictive methodologies that utilize machine learning and meteorological data analysis are urgently needed.
Harvest timing is chiefly influenced by two physiological factors, namely, fruit enlargement and coloration, both of which are affected by meteorological conditions. For example, the ‘Fuyu’ persimmon grows optimally within a temperature range of 20–25 °C but shows growth suppression when temperatures rise above 25 °C. Additionally, temperatures exceeding 30 °C can adversely affect proper fruit coloration [16]. In contrast, ‘Hiratanenashi’ exhibits a secondary enlargement phase in response to the cooler temperatures of autumn [17], while ‘Tonewase’ is notably sensitive to temperature fluctuations in late August [18]. Prior studies have shown that summer irrigation promotes fruit enlargement in ‘Fuyu’, whereas exclusion of rainfall during this period inhibits it [19]. Furthermore, inadequate sunlight has been documented to negatively impact both fruit growth and coloration [20,21].
Several studies have investigated predictive models for the timing of persimmon harvests. For ‘Fuyu’, the first occurrence of a daily average temperature below 23 °C is linked with the onset of peak harvest, indicating that harvest timing can be estimated by late September [22]. For ‘Hiratanenashi’, a multiple regression model utilizing mid-July average temperature and mid-September maximum temperature has achieved a prediction error margin of 2–3 days [23]. However, these models only provide predictions starting in September, which is often too close to the harvest period to be useful for workforce planning. To optimize labor allocation, developing models that can predict harvest timing several months in advance is essential. To the best of our knowledge, no prior studies have developed yield prediction models specifically for persimmons.
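The ‘Fuyu’ threshold rule described above can be illustrated with a minimal sketch. It only locates the first day on which the daily average temperature falls below 23 °C; how [22] maps that day to a peak harvest date is not reproduced here. The function name `first_day_below` and the temperature values are hypothetical and serve only as an illustration.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def first_day_below(
    dates: Sequence[date],
    daily_mean_temps_c: Sequence[float],
    threshold_c: float = 23.0,
) -> Optional[date]:
    """Return the first date whose daily mean temperature is below the threshold.

    Returns None when no day in the series falls below the threshold.
    """
    for day, temp in zip(dates, daily_mean_temps_c):
        if temp < threshold_c:
            return day
    return None


# Invented example series (not measured data): six days starting 1 September.
start = date(2023, 9, 1)
dates = [start + timedelta(days=i) for i in range(6)]
temps = [26.1, 24.8, 23.4, 22.9, 23.6, 21.7]

trigger_day = first_day_below(dates, temps)
print(trigger_day)
```

Because the trigger day can only be observed once it occurs, a rule of this kind cannot issue a forecast earlier than September, which is the limitation noted above.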
The authors are currently conducting a smart agriculture project in Japan’s hilly and mountainous regions. This paper reports on predictive models for estimating the harvest timing and yield of fruit trees within this project. In particular, this study aims to refine the harvest peak date prediction model and to develop a yield prediction model utilizing meteorological data. Whereas previous studies have often been limited to validation at a single site due to difficulties in collecting data from multiple locations, this study utilizes harvest data collected from multiple sites. The harvest peak date model is enhanced with a more comprehensive dataset, and its performance is assessed at the field level. The yield prediction model is developed in two variants: one for estimating total yield at the regional level and the other for estimating yield for individual orchards. The cultivars targeted for harvest timing prediction are ‘Tonewase’ and ‘Fuyu’, while ‘Tonewase’, ‘Hiratanenashi’, and ‘Fuyu’ are targeted for yield prediction. By integrating predictive modeling techniques into smart agriculture, this study aims to improve the efficiency of harvest scheduling and resource allocation in persimmon production.
4. Conclusions
This study developed predictive models for persimmon harvest timing and yield, achieving high levels of accuracy by integrating meteorological data with machine learning techniques. For harvest timing prediction, increasing the dataset size significantly reduced the variability in prediction results. Moreover, capturing site-specific variations in peak harvest dates necessitated a more extensive array of input features. Future research should explore a more detailed analysis of the interrelationships among these features to further refine model accuracy. Additionally, this study utilized accumulated meteorological data to construct a pseudo-time-series dataset and developed predictive models using an ANN. The adoption of time-series-specific architectures, such as LSTM, is anticipated to enhance prediction performance. This study focused on major cultivars for which relatively large datasets were available. Validating the model on other cultivars would broaden the range of cultivars to which it can be applied.
For yield prediction, the model that incorporated regional meteorological variables demonstrated a low error rate of approximately 10%, effectively capturing regional yield trends. However, the field-level models exhibited higher error rates, indicating that regional input variables alone do not sufficiently capture field-specific variations. Yield variability at the field level is influenced by both inter-orchard and intra-orchard factors. Integrating elevation data into the field-level model improved prediction accuracy by addressing inter-orchard variability. However, to further increase model robustness, additional factors such as soil properties, irrigation management, and cultivation techniques should be integrated. Developing a generalized model that remains effective across different regions will necessitate identifying key variables that encapsulate both inter-orchard and intra-orchard variations.
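The pseudo-time-series construction mentioned above can be pictured with a small sketch. It assumes, purely for illustration, that "accumulated meteorological data" means running totals of a daily variable sampled at fixed intervals. This section does not specify the exact construction, so the function name `accumulated_features`, the 10-day step, and the sample values are all hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def accumulated_features(daily_values: Sequence[float], step: int = 10) -> List[float]:
    """Return running totals of a daily variable, sampled every `step` days.

    Each element is the cumulative sum from day 1 up to and including
    day step, 2*step, ... so the sequence encodes how the total builds up
    over the season (an assumed reading of "accumulated" data).
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    running = list(accumulate(daily_values))
    return [running[i] for i in range(step - 1, len(running), step)]


# Invented example: 30 days with a constant daily value of 1.
features = accumulated_features([1] * 30, step=10)
print(features)  # [10, 20, 30]
```

A fixed-length vector of this kind can be fed to a feed-forward ANN, whereas an LSTM would instead consume the ordered sequence step by step.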
No correlation was observed between the predictions of the two models.
Additionally, this study identified common meteorological factors that influence both harvest timing and yield. Maximum temperature emerged as a critical factor for ‘Tonewase’, while minimum temperature was more impactful for ‘Fuyu’. These factors are crucial for both harvest timing and yield, highlighting the potential to develop a unified model that simultaneously predicts both dimensions. Employing a multi-task learning framework capable of forecasting harvest timing and yield within a single model could lead to further enhancements in agricultural planning and decision-making.
By implementing AI-driven prediction models, this research advances smart agriculture, facilitating optimized labor management and harvest scheduling in persimmon production. The developed harvest prediction model demonstrates high accuracy at the regional level, making it valuable for optimizing production and distribution. However, the use of only meteorological data or a limited set of field observations constrains model accuracy, particularly in field-scale applications and generalization. Incorporating qualitative data, such as soil conditions and cultivation practices, as well as data from a larger number of fields with diverse topographic characteristics, is expected to improve model accuracy and generalizability. Future studies should investigate additional environmental factors and pursue long-term dataset expansion to enhance the practical applicability of these models.