1. Introduction
Rice is not only the most important crop in Asia but also a staple food in many other countries. Currently, approximately 5.6 billion people (80% of the world’s population) rely on rice as their staple food. According to the World Agricultural Supply and Demand Estimates (WASDE), annual worldwide rice consumption in 2015 was 4.842 million tons, while rice production was 4.71 million tons. Rice consumption is expected to reach 8 million tons in 2030 owing to population and economic growth, and it is predicted that 9 billion people will depend on rice for food in 2050 [1]. Rice blast disease (also simply called “blast”), which affects most rice-producing countries, has already spread to approximately 85 countries [2], causing losses of food that could feed 60 million people per year. The rice blast pathogen Magnaporthe oryzae is a fungus that causes disease throughout the rice growth period and develops on the nodes, leaves, collars, necks, panicles and seeds. The disease is generally divided into leaf blast and panicle blast phases; however, depending on where and when onset occurs, blast may progress through leaf, collar, neck node and panicle blast phases.
The rice variety, environmental factors and the pathogen all influence the occurrence of blast. In particular, when environmental conditions favor the rice blast fungus during the harvest season, extremely large yield losses can occur; in blast-susceptible varieties, yield reductions of up to 65% have been reported [3]. Environmental factors associated with disease development include weather conditions, fertilizer management, soil and fungicide control. Owing to recent weather conditions and fertilizer management methods, conditions have been less favorable to rice blast, and the occurrence of leaf blast has decreased. In particular, higher temperatures during the growing season due to global warming and the disappearance of the monsoon in July [4], along with the spread of cultivation techniques that use less nitrogen fertilizer, are considered the main reasons for the decrease in leaf blast. In addition, the cultivation and propagation of blast-resistant rice varieties, initiated in the 1960s, appear to have made the most significant contribution to the reduction of rice blast [5].
However, despite the release of new blast-resistant rice varieties each year, a number of cases have been reported in which resistant varieties became susceptible within a few years [6]. The National Institute of Crop Science, Rural Development Administration (RDA), Suwon, Republic of Korea, continuously monitors the occurrence of rice blast and the responses of both newly bred and existing rice varieties to rice blast at test sites throughout the country, with the aim of developing methods to prevent blast outbreaks. However, because many farmers prefer rice cultivars with superior taste over disease-resistant varieties, the damage caused by the disease is likely to increase. It is therefore desirable to prevent damage by introducing various resistance genes into the cultivars most widely grown on farms [7].
In recent years, the use of chemical fertilizers has decreased to meet consumer demand for environmentally friendly agricultural products; however, this change can increase the occurrence of rice blast. Further, in 2016 and 2017, rice blast disease became a serious problem in Bangladesh and India, and blast disease affects not only rice but also wheat. Preventing blast disease is therefore essential, and early prediction of its occurrence would greatly help to achieve this goal.
Most existing studies on rice blast prediction are based on meteorological variables associated with blast occurrence and progression, such as leaf wetness duration, temperature, relative humidity and rainfall; these studies also consider other factors such as host variety and nitrogen fertilization and then analyze the correlations among the considered variables [8,9,10,11]. The most commonly used weather variables are air temperature, relative humidity and rainfall, and blast simulation systems have been constructed based on such weather information [12]. However, these studies were hindered by the difficulty of identifying the mathematical relationship between blast disease and environmental factors and by the complex interactions among the plant, the pathogen and the environment; consequently, the resulting models are difficult to apply in practice.
In addition to mathematical formulations of the relationship between blast and environmental factors, attempts have been made to predict rice blast disease using data-driven machine learning methods. Kaundal et al. used weather variables, namely maximum and minimum temperature, maximum and minimum relative humidity, rainfall and the number of rainy days per week, as inputs to multiple regression, back-propagation neural networks, generalized neural networks and support vector machines (SVMs) to predict blast occurrence in India [13]. Comparing the predictions with actual blast occurrence, they found the SVM to be the most suitable machine learning method for blast prediction, and they now provide an SVM-based rice blast prediction web server. However, their study [13] did not consider the historical patterns of the input variables (such as temperature, relative humidity and rainfall) in the modeling, nor did it include non-meteorological variables.
Malicdem and Fernandez proposed a blast prediction model for the northern Philippines using an artificial neural network and an SVM [14]. Given the weather conditions, they generated models for classifying possible rice blast occurrence at specific rice growth stages and for predicting onset severity. Principal component analysis (PCA) of the effect of weather on rice blast onset showed that precipitation had the greatest influence (48%), followed by minimum temperature (31%), maximum temperature (17%) and humidity (3%). However, that study used no non-weather variables and did not consider the temporal patterns of the weather variables. In addition, the model predicts blast occurrence from the weather conditions of the same year, which limits its usefulness for preemptive blast prevention.
A potentially effective approach to blast disease prediction is the use of artificial intelligence techniques. Because of limited data and computational power, artificial neural network applications have mainly employed shallow networks with a single hidden layer. Recently, deep-learning-based algorithms, known as deep neural networks, have attracted attention for their high accuracy in many fields, including image classification, face recognition and speech recognition [15,16,17]. Unlike traditional machine learning methods, which first extract hand-crafted features and then feed them into simple classifiers such as SVMs, deep neural networks perform feature extraction and classification simultaneously. Their most important advantage is the ability to learn important features directly from data; consequently, deep neural networks achieve high performance when sufficient data are available [18].
Among deep-learning algorithms, recurrent neural networks (RNNs) can effectively learn patterns from data containing temporal or sequential information. In particular, long short-term memory networks (LSTMs), a type of RNN, are becoming prominent [17,19]. LSTMs are designed to overcome the vanishing and exploding gradient problems of conventional RNNs and can efficiently capture long-term dependencies through memory cells and gates [20]. Many recent studies have shown that LSTMs outperform basic deep feedforward neural networks on time series data [21,22,23].
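To make the role of the memory cell and gates concrete, the following is a minimal pure-Python sketch of one forward step of a standard LSTM cell (input, forget and output gates plus a candidate cell state), run over a short sequence such as three yearly feature vectors. The function names, the parameter layout (`params` mapping each gate to an input weight matrix `W`, a recurrent weight matrix `U` and a bias `b`) and all sizes are illustrative assumptions; this is not the network configuration used in this study, which would normally be built with a deep learning framework.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical layout: gate name -> (W: input weights, U: recurrent weights, b: bias)
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    # Pre-activation W·x + U·h + b for one gate (one row per hidden unit).
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Params) -> Tuple[Vector, Vector]:
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate values
    # Memory cell: keep a gated share of the old state, add a gated share of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # Hidden state: gated, squashed view of the memory cell.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def run_lstm(sequence: List[Vector], hidden_size: int, params: Params) -> Vector:
    # Process the sequence oldest first and return the final hidden state,
    # which a classifier layer could then map to blast classes.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

Because the forget gate scales the previous cell state and the update is additive, information can be carried across steps without being repeatedly squashed, which is the mechanism that mitigates vanishing gradients.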
In this study, to establish a system for early prediction of rice blast occurrence, we apply artificial intelligence techniques to past blast severity data and historical climatic data, unlike previous studies that require climate information from the same year as the forecast. We develop a region-specific model that predicts the incidence of rice blast in the target area for the coming year by applying LSTMs to past rice blast disease scores from four representative rice-producing regions in South Korea and to the historical climatic data of the target area. The proposed model does not use data from the year to be predicted as input; instead, it uses only data from the previous three years. In addition, we analyze how the LSTM input factors, i.e., rice blast disease scores, air temperature, relative humidity and sunshine hours, influence predictive accuracy. Finally, the rice varieties cultivated over the largest area in Korea (according to the Korean RDA) are selected to evaluate the rice blast prediction results.
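The three-year input window described above can be illustrated with a small sketch that turns per-year records into training pairs of the form (previous three years, target year). The record layout (`records` mapping a year to a feature vector and a blast class label 0–2) and the function name `make_windows` are hypothetical; the study's actual preprocessing is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical per-year record: (feature vector, blast class label 0-2).
# Features might be, e.g., [blast score, mean temperature, relative humidity, sunshine hours].
YearRecord = Tuple[List[float], int]
Sample = Tuple[int, List[List[float]], int]


def make_windows(records: Dict[int, YearRecord], window: int = 3) -> List[Sample]:
    """Build (target_year, input_sequence, label) samples.

    The input sequence holds the feature vectors of the `window` years before
    the target year (oldest first); the target year's own features are never used.
    Years whose full history is missing are skipped.
    """
    samples: List[Sample] = []
    for year in sorted(records):
        past_years = [year - k for k in range(window, 0, -1)]
        if all(y in records for y in past_years):
            sequence = [records[y][0] for y in past_years]
            samples.append((year, sequence, records[year][1]))
    return samples


if __name__ == "__main__":
    # Toy data for 2003-2008: one feature per year, arbitrary labels.
    toy = {y: ([float(y - 2000)], y % 3) for y in range(2003, 2009)}
    for target, seq, label in make_windows(toy):
        print(target, seq, label)
```

Skipping years with an incomplete history keeps every sample's input strictly in the past relative to its label, which mirrors the constraint that the year being predicted contributes no input data.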
The Rural Development Administration conducts inspections and forecasts of disease and insect pest occurrence, including rice blast, in South Korea. Decisions on whether to control pests are based on the information obtained from these inspections and forecasts, which is sent to municipal agricultural technology centers to help farmers perform timely pest control. The proposed system for early prediction of rice blast occurrence can contribute to disease management by the related administrative departments. Because the proposed forecasting system is specific to regions and cultivars, a web-server-based system in which users simply select a cultivar and a region could also support individual rice growers by providing the rice blast prediction for the selected cultivar in the selected region.
The remainder of this paper is organized as follows. Section 2 describes the data and LSTMs, while Section 3 describes the experimental procedures. In Section 4, we present and discuss the experimental results and draw conclusions.
4. Results and Discussion
Following the procedure described in Section 3, we developed a rice blast prediction model for the Cheolwon, Icheon and Milyang areas that supports different combinations of input variables. The model predicts the rice blast occurrence in each region in the following year based on the rice blast scores for the previous three years in Cheolwon, Icheon, Milyang and Naju and/or the temperature, relative humidity and sunshine hours data for each region.
Table 5 lists the prediction performance for rice blast occurrence in the Cheolwon, Icheon and Milyang regions for each model variation (see Table 4). To assess performance, we used two standard metrics: accuracy and F1-score. Accuracy is the proportion of correctly classified test samples among all test samples, and the F1-score is the harmonic mean of precision and recall.
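For reference, both metrics can be computed for the three-class setting as follows. This is a minimal sketch; the text does not state how per-class F1-scores were combined into one number, so macro-averaging (the unweighted mean over classes 0–2) is an assumption here.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Share of test samples whose predicted class equals the true class.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Tuple[int, ...] = (0, 1, 2)) -> float:
    # Assumption: per-class F1-scores are averaged with equal weight (macro F1).
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall (0 if both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]  # hypothetical labels
    y_pred = [0, 1, 1, 1, 2, 0]
    print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

A weighted average (by class frequency) would give a different value when classes are imbalanced, so the averaging scheme matters when comparing F1-scores across regions.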
We first discuss the performance of Blast_LSTM in predicting the incidence of rice blast one year ahead from past rice blast occurrence. Recall that the trained model classifies predictions into three groups: resistant, moderately resistant and susceptible. Thus, if information on past rice blast occurrence did not contribute to predicting the following year's occurrence, the accuracy would be approximately 33%, the expected accuracy of guessing among the three classes at random. However, as shown in Table 5, Blast_LSTM achieved 62.3% accuracy (F1-score: 59.6%) for the Cheolwon region and 61.5% (59.8%) and 46.9% (44.9%) for the Icheon and Milyang regions, respectively, all considerably higher than this 33% threshold. The accuracy for the Cheolwon region was particularly high, at 62.3%. These results indicate that information on past rice blast occurrence helps predict rice blast occurrence in the following year. Based on this finding, Blast_LSTM was used as the base model for comparison with the other model variations.
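The 33% reference point corresponds to guessing each of the three classes with equal probability, whose expected accuracy is 1/3 regardless of how the true labels are distributed. A baseline that always predicts the most frequent class, by contrast, depends on the class balance and can exceed 33% when classes are imbalanced. The sketch below, an illustration rather than part of the study's evaluation, computes both for a hypothetical list of labels.

```python
from collections import Counter
from typing import Sequence


def uniform_guess_accuracy(labels: Sequence[int], n_classes: int = 3) -> float:
    # E[accuracy] = sum_c P(true = c) * P(guess = c) = sum_c P(true = c) / n_classes
    counts = Counter(labels)
    n = len(labels)
    return sum(counts[c] / n / n_classes for c in range(n_classes))


def majority_class_accuracy(labels: Sequence[int]) -> float:
    # Accuracy of always predicting the most frequent class.
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 2]  # hypothetical, imbalanced toy labels
    print(round(uniform_guess_accuracy(labels), 3))  # 0.333
    print(majority_class_accuracy(labels))           # 0.6
```

When the test labels are imbalanced, the majority-class figure is the stricter of the two reference points.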
Compared with the Blast_LSTM base model, BlastT_LSTM, which combines past average temperature information with past rice blast incidence, improved the accuracy by 5.5% for the Cheolwon region model (from 62.3% to 65.7%; F1-score improvement: 2.3%), by 2.1% (F1-score: 1.5%) for the Icheon model and by 8.7% (F1-score: 13.8%) for the Milyang model; these improvements are relative to the base model. Therefore, average temperature information from the preceding three years is helpful for predicting rice blast occurrence. Notably, Blast_LSTM was less accurate for the Milyang area than for the other regions, and adding temperature information in BlastT_LSTM improved the prediction performance for this region. As shown in Figure 7a, the average temperatures of Cheolwon, Icheon and Milyang increased gradually from 2003 to 2016, changing slowly over several years. Therefore, even without knowing the following year's temperature, the temperature information from the past three years is thought to improve the accuracy of blast occurrence prediction.
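The improvement figures are easiest to read as relative changes: for Cheolwon, moving from 62.3% to 65.7% is a gain of 3.4 percentage points, which corresponds to the reported 5.5% relative to the base accuracy. The short sketch below makes the distinction explicit; the helper names are illustrative.

```python
def relative_change(old: float, new: float) -> float:
    """Change of `new` relative to `old`, in percent."""
    return (new - old) / old * 100.0


def percentage_point_change(old: float, new: float) -> float:
    """Absolute difference between two percentages."""
    return new - old


if __name__ == "__main__":
    # Cheolwon accuracy: Blast_LSTM to BlastT_LSTM (values quoted in the text)
    print(round(relative_change(62.3, 65.7), 1))          # 5.5
    print(round(percentage_point_change(62.3, 65.7), 1))  # 3.4
```

Relative changes compound multiplicatively across successive model variations, so they should not simply be added when comparing non-adjacent variations.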
Compared with BlastT_LSTM, the model variation that additionally incorporates relative humidity, BlastTH_LSTM, improved the F1-score by 3.9%, 1.2% and 2.9% for the Cheolwon, Icheon and Milyang regions, respectively. Compared with Blast_LSTM, which uses only rice blast disease scores as input, the F1-scores improved by 6.4%, 2.7% and 17.1% for Cheolwon, Icheon and Milyang, respectively. The influence of each input variable on the prediction cannot be determined accurately by examining the correlations between input variables. However, because BlastTH_LSTM outperforms the preceding model variations, past relative humidity also appears to be an important factor in rice blast prediction.
Compared with BlastTH_LSTM, BlastTHS_LSTM, the model variation that also incorporates sunshine hours, improved the F1-scores by 0.3%, 1.1% and 1.5% and the accuracy by 0.7%, 0.6% and 4.0% for the Cheolwon, Icheon and Milyang regions, respectively. For all three regions, BlastTHS_LSTM, which incorporates the rice blast incidence scores together with average temperature, relative humidity and sunshine hour data, had the highest accuracy among the examined model variations.
Finally, we discuss the prediction results of Climate_LSTM, which considers only the climate data of the previous three years. This model variation excludes information on past rice blast incidence and yields the lowest prediction accuracy for all three regions, ranging between 44.4% and 55.2%. Nevertheless, because these accuracies remain above the 33% chance level, the results confirm that past climate information alone can help predict rice blast incidence. Note that this approach differs from methods that forecast the rice blast occurrence of the current year based on climate information from that same year [8,9,10,11]. Compared with the base model, Blast_LSTM, which considers only past rice blast disease scores without climate information, the accuracy of Climate_LSTM was 11.4% lower for Cheolwon and 14.3% lower for Icheon. For the Milyang region, the accuracy was also lower, by 5.3%; although the size of the decline differed, the F1-scores were also considerably lower.
Figure 9 compares the prediction accuracies of the Climate_LSTM, Blast_LSTM and BlastTHS_LSTM model variations by region. Figure 9 and Table 5 show that the prediction performance of each model variation differed from region to region. For all model variations, the highest prediction performance was obtained for the Cheolwon region, followed by the Icheon region, whose accuracy was 0.8–4.3 percentage points lower than that for Cheolwon. The lowest prediction performance was obtained for the Milyang region for all model variations, with accuracy 10.8–15.4 percentage points lower than that for Cheolwon.
The deep learning model used in this study follows a data-driven approach in which the training data are critical, because the LSTM model learns the features needed for prediction from the training data itself. We divided the data from 2003 to 2016 chronologically into training (70%), validation (10%) and test (20%) sets. Therefore, the test set contains the most recent data and the training set the oldest. The distribution of Classes 0–2 varies over time; Figure 10 shows the class distributions of the training (historical) and test (most recent) data for each region.
As Figure 10 shows, the class distributions of the training and test data are similar for the Cheolwon region but differ considerably for the Milyang region. These differences between the training and test data may explain the marked reduction in prediction accuracy for Milyang.
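The chronological 70/10/20 split and the comparison of class distributions can be sketched as follows. The split function assumes the samples are already sorted oldest first; the rounding of split sizes and the use of total variation distance to summarize how far two class distributions differ are choices made for this illustration, not details reported in the study.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], train: float = 0.7,
                        val: float = 0.1) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples (oldest first) without shuffling.

    Whatever remains after the training and validation shares becomes the
    test set, so the test set always holds the most recent samples.
    """
    items = list(samples)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def class_proportions(labels: Sequence[int],
                      classes: Tuple[int, ...] = (0, 1, 2)) -> Dict[int, float]:
    # Fraction of samples in each class (classes absent from `labels` get 0).
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def total_variation(p: Dict[int, float], q: Dict[int, float]) -> float:
    # 0 means identical class distributions; 1 means no overlap at all.
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


if __name__ == "__main__":
    labels = [0, 0, 1, 0, 1, 1, 2, 2, 2, 2]  # hypothetical labels, oldest first
    tr, va, te = chronological_split(labels)
    print(len(tr), len(va), len(te))  # 7 1 2
    print(total_variation(class_proportions(tr), class_proportions(te)))
```

A large distance between the training and test proportions signals the kind of temporal shift described for Milyang, where a model fitted to older class frequencies is evaluated on a differently distributed recent period.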
The prediction system proposed in this study was developed to be cultivar-agnostic. Therefore, for any cultivar, the rice blast incidence in a given year can be predicted from the data of the preceding three years using the models developed in this study. To examine this, we selected the rice varieties most widely cultivated in Korea and analyzed the predictions of the developed model variations.
According to the Korean RDA, high-yielding rice varieties with superior taste are predominantly grown each year. The most popular rice varieties in Korea are Jopyeong, Samdeok, Onnuri, Nampyeong, Hwanggeumnuri, Koshihikari, Saenuri, Unkwang, Ilpum, Chucheong, Dongjinchal, Ilmi, Odae, Daean, Samgwang, Hopum, Chilbo, Saeilmi, Wungwang and Sindongjin. Data for 17 of these varieties (all except Chilbo, Saeilmi and Wungwang) were included in the test dataset, and the effectiveness of each model variation was analyzed by examining its rice blast incidence predictions. Table 6 lists the average prediction accuracies for the 17 varieties in Cheolwon, Icheon and Milyang. Among the three regions, the most accurate results were obtained for Cheolwon. The BlastTHS_LSTM model variation, which uses the past blast occurrence scores, average temperature, relative humidity and sunshine hours as input variables, yielded the highest accuracies among all variations: 79.4%, 64.7% and 55.6% for the Cheolwon, Icheon and Milyang regions, respectively.
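Because these accuracies are averaged over cultivars, it may help to make the averaging explicit. Under one plausible reading (an assumption, since the exact procedure is not spelled out here), accuracy is first computed per cultivar and then averaged with equal weight across cultivars; this can differ from pooling all test samples when cultivars have different numbers of samples. The cultivar names in the test data are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, int]  # (cultivar, true class, predicted class)


def per_cultivar_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for cultivar, y_true, y_pred in records:
        totals[cultivar] += 1
        hits[cultivar] += int(y_true == y_pred)
    return {c: hits[c] / totals[c] for c in totals}


def mean_cultivar_accuracy(records: Iterable[Record]) -> float:
    # Each cultivar counts equally, however many test samples it has.
    acc = per_cultivar_accuracy(records)
    return sum(acc.values()) / len(acc)


def pooled_accuracy(records: Iterable[Record]) -> float:
    # Each test sample counts equally.
    items = list(records)
    return sum(t == p for _, t, p in items) / len(items)
```

With very few test samples per cultivar, each per-cultivar accuracy can take only a handful of values (for example 0, 0.5 or 1 with two samples), so individual cultivar figures are coarse.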
Figure 11 shows the prediction accuracy of BlastTHS_LSTM for each cultivar. Accuracies were higher for some cultivars than for others; however, the sample size for each cultivar was small, ranging from 1 to 4. These results suggest that rice blast information for cultivars grown in Korea over the previous three years, in conjunction with climate information, can be used to predict the occurrence of rice blast a year later, which will be helpful for preventing blast disease. If weather conditions such as humidity vary within an area, the blast population in that area is also expected to vary, as is the response of rice to blast. Although the accuracies of the proposed models are not very high, this work is a meaningful starting point as, to our knowledge, the first LSTM-based rice blast prediction model. Model performance could be further improved by adding more training data and optimizing the LSTM architecture.
The rice blast fungus is a representative model phytopathogenic fungus to which the gene-for-gene concept of host–pathogen interaction applies. To date, more than 40 resistance genes have been identified in rice, and 9 of the corresponding pathogen avirulence genes have been identified through molecular biology and functional genetics [37,38,39,40,41,42]. Introducing resistance genes through breeding is considered the most effective strategy for controlling rice blast. To understand the races, or pathotypes, of the rice blast fungus in South Korea and to collect information for introducing resistance genes into rice varieties, the population-level distribution of avirulence genes and transposons in the fungus has been determined through DNA-fingerprinting studies using molecular markers [43]. These studies indicate that the pathotypes are becoming more diverse than in the past, despite the overall decrease in rice blast throughout South Korea [44]. This diversity in pathogenicity is presumed to result from genetic variation in the rice blast fungus, which may give rise to strains that can infect resistant cultivars [6]. The breakdown of resistance, in which resistant varieties become susceptible, has been reported for many crops, including rice [45]. Because the rice blast prediction system presented in this study can provide results for specific rice varieties, it could be of considerable assistance to rice blast researchers compared with conventional rice blast predictions. Providing these data to rice breeders could also help suggest directions for resistance breeding. As rice blast resistance genes continue to be studied, combining all of this information will further support resistance breeding.
In this study, we incorporated different combinations of input variables, i.e., rice blast occurrence scores and temperature, humidity and sunshine hours data, into the developed model variations. For all regions, the predictions were most accurate when the rice blast occurrence scores and the temperature, humidity and sunshine hours data were all considered. Note that other studies involving artificial intelligence have indicated that plant disease occurrence depends on the combination of pathogen, environmental conditions and host plant.
In addition, we found that early prediction of rice blast occurrence based on climate data for the past three years is possible, and that prediction, and hence blast prevention, can be further improved by incorporating the rice blast occurrence in each of those years. In this study, we developed models for three representative regions in South Korea to analyze the feasibility of the LSTM-based methodology and analyzed the prediction results for 17 varieties. Beyond these three regions, rice blast monitoring data are available for 9 regions and 358 rice varieties in South Korea, so the framework used in this study can be extended to a prediction system covering the remaining regions and all varieties, which would greatly increase the utility of the proposed LSTM models. In addition, because deep learning models such as the one used here lend themselves to transfer learning, the approach could readily be applied to data from other countries or regions. Thus, although this study was based on data from South Korea, the findings and the developed system could benefit the many countries in which rice is grown as a primary crop.