Article

Dynamic Prediction of Chilo suppressalis Occurrence in Rice Based on Deep Learning

1
College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
2
Hunan Engineering Research Center of Rural and Agriculture Informatization, Changsha 410128, China
3
College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
4
Station of Plant Protection and Quarantine of Hunan Province, Changsha 410005, China
*
Authors to whom correspondence should be addressed.
Processes 2021, 9(12), 2166; https://doi.org/10.3390/pr9122166
Submission received: 14 October 2021 / Revised: 16 November 2021 / Accepted: 17 November 2021 / Published: 1 December 2021

Abstract

(1) Background: The striped rice stem borer (SRSB), Chilo suppressalis, has severely diminished the yield and quality of rice in China. A timely and accurate prediction of the rice pest population can facilitate the design of a pest control strategy. (2) Methods: In this study, we applied multiple linear regression (MLR), gradient boosting decision tree (GBDT), and deep auto-regressive (DeepAR) models to the dynamic prediction of SRSB population occurrence during the crop season from 2000 to 2020 in Hunan province, China, using weather factors and time series of related pests. (3) Results: This research demonstrated the potential of the deep learning method in integrated pest management through the qualitative and quantitative evaluation of a reasonable validation dataset (the average coefficients of determination $R^2_{\mathrm{mean}}$ for the DeepAR, GBDT, and MLR models were 0.952, 0.500, and 0.166, respectively). (4) Conclusions: The DeepAR model with integrated ground-based meteorological variables, time series of related pests, and time features achieved the most accurate dynamic forecasting of the population occurrence quantity of SRSB as compared with MLR and GBDT.

1. Introduction

Chilo suppressalis (striped rice stem borer, hereafter referred to as SRSB), the most widely distributed and destructive rice pest [1], is also the worst rice pest in China [2]. The larvae of SRSB eat rice stems, which causes dead hearts in the tillering stage and white earheads during the heading stage, and can finally lead to dead sheaths [3] (Figure 1). Annually, China suffers severe rice yield reduction and economic losses from SRSB [3,4,5,6]. This destruction is caused in part by the rapid growth of pest populations, which makes outbreaks difficult for farmers to predict. Continuously monitoring and accurately predicting the dynamic changes in the pest population during the crop growth period may help protect rice from SRSB.
The insect population can be affected by many factors; both abiotic and biotic factors are believed to be responsible for changes in the insect population [1]. The effects of abiotic factors such as climate variables have been well-documented [7]. Therefore, an adequate early warning of an SRSB infestation combined with meteorological factors can support plant protection efforts. Apart from being threatened by SRSB, rice is also negatively affected by various pests such as the rice planthopper (hereafter referred to as RPH) and the paddy leaf roller (hereafter referred to as PLR). Thus, the species and the numbers of other pests in the region will also have effects on the population development trends for SRSB.
An agricultural pest prediction model is built on the pest occurrence mechanism, mathematical statistics, time series analyses, together with the critical factors affecting pest occurrence, which can provide information on pest occurrence, severity, and development trends. Differentiated by the principles of prediction methods, the current range of pest prediction models include statistical regression models, machine learning models, and deep learning models.
Statistical modeling for the early prediction of pest risks is one strategy that has been widely adopted [8], and whose essence is to ascertain the relationships between variables in the form of fitting equations. The predicting steps involve: performing a statistical analysis with the historical data on pests, extracting the relationship between the target pests y and a related factor x , establishing the mathematical equations, and then making a quantitative prediction of pests by means of these equations. This kind of prediction approach treats pest occurrence as a separate system, without considering the occurrence process and mechanism. The most commonly used approach is the multiple linear regression (MLR) model. The severe difficulty for statistical regression methods lies in choosing the relevant factor x ; most researchers currently tend to build predicting models with relevant meteorological factors [9,10]. Some researchers have found that combining weather factors with other factors, such as variety, soil, fertilization, etc., can improve a model’s prediction capability [11,12]. The statistical learning-based methods focusing on finding the linear relations between variables have high interpretability. However, most problems in real-life production show rich, non-linear links for which traditional statistical regression methods do not work.
Machine learning methods have strong predictive capability: they automatically fit and tune model parameters, obtain the model that best fits the current datasets, and predict with that model. The accuracy and speed of machine learning methods improve as the amount of data increases, which is what distinguishes them from traditional statistical regression methods. Machine learning methods can also learn non-linear relationships; consequently, machine learning-based regression analysis has become the mainstream in agricultural pest prediction, with support vector machines [13] and decision trees [10] as the two commonly adopted machine learning prediction algorithms. However, machine learning algorithms are so diverse that it is difficult for researchers to choose one for practical problems. Moreover, the pros and cons of machine learning algorithms differ: for example, the SVM is inefficient in processing large samples of data [14], while the performance of neural networks improves with an increase in data volume [15], although this also easily leads to higher computational costs and to overfitting in traditional neural networks [16].
Deep learning, a branch of machine learning, is an algorithm using the artificial neural networks as an architecture to characterize and learn data [17,18,19,20,21]. The algorithm is extensively applied in most traditional fields [22,23,24,25], and some progress has been made in the field of agricultural pest prediction in recent years [26]. With the reduction of hardware costs and the improvement of algorithms, deep learning-based methods will become a leading research topic for agricultural pest prediction.
Aiming at the prediction of SRSB occurrence in rice, and combining this with ground meteorological observation data and related pest time sequence data, this paper constructs a multi-dimensional dynamic probability prediction model based on the use of deep learning for time series analyses. The model presented in the paper is more applicable than the traditional pest prediction models and can realize a dynamic timing prediction of pests. The key works included in the paper are as follows: (1) investigating the relationships between ground meteorological data, related pests, and SRSB; (2) comparing the performances of the models using only meteorological variables to that of the models combining meteorological variables with the time series of related pests; (3) developing a deep learning-based dynamic probability prediction DeepAR model for the occurrence of SRSB; (4) evaluating the performance of the DeepAR model using the traditional MLR model and the machine learning GBDT model.
Our method is expected to improve the management of SRSB for the following reasons:
(1) Our study suggests that combining related pest time series data with the ground meteorological data can improve the model’s prediction accuracy as compared to previous studies using only the ground meteorological data;
(2) Combining weather and associated pest time series with deep learning-based DeepAR models can provide more accurate predictions than the traditional MLR and the machine learning GBDT. These findings could be utilized to support an integrated pest management (IPM) program to help farmers reduce the use of pesticides and minimize crop loss in rice paddy fields.

2. Materials and Methods

2.1. Study Areas

This paper mainly studied the dynamic population change of SRSB in Hunan Province, China, and the area selection was based on the following considerations:
  • Areas have a high number of insects;
  • Areas have a long history of rice cultivation;
  • Area characteristics can represent different regions in Hunan Province, China.
Based on these considerations, A (Hongjiang), B (Yuanjiang), C (Dong’an), D (Linli), and E (Liling) were selected as the study areas (Figure 2). Hunan Province lies in the area with the most extensive rice farming in China. The selected areas have hot, rainy summers, which favor the occurrence of SRSB.

2.2. Data Collection

2.2.1. Pest Data

The pest data came from the daily records of the rice pest light traps for major insect pests in the crop monitoring and early warning information system in Hunan Province, China. Pest species include 11 rice pests (Table 1), such as SRSB, RPH, and PLR. Adult pests were collected by a light trap set from 18:00 to 6:00 the next day, located in areas A (2000–2020), B (2000–2020), C (2000–2020), D (2000–2020), and E (2010–2020). Plant protection workers removed the insects from the traps every morning, and subsequently identified and counted them.

2.2.2. Meteorological Data

Meteorological data were obtained from the daily ground meteorological records downloaded from the National Meteorological Center, spanning the years 2000–2020 and including 19 factors such as temperature, precipitation, and sunshine duration; the detailed information is shown in Table 2.

2.2.3. Time Features

Pest occurrence is a typical time series problem with prominent temporal characteristics, so extracting time features from the pest data facilitates the construction of more accurate predictive models. We extracted the following time features: years, seasons, months, weeks, weekdays, and days. Seasons were defined as March–May for spring, June–August for summer, September–November for autumn, and December to the following February for winter. Weeks were counted in blocks of seven days, with 52 weeks per year. The weekday feature recorded working day information according to the Gregorian calendar, mainly because the acquisition of pest data required manual recording.
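The time features described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `time_features`, the way leftover days are folded into week 52, and the Monday-to-Friday working day rule (public holidays ignored) are assumptions that the text does not specify.

```python
from datetime import date

# Season boundaries follow the text: Mar-May spring, Jun-Aug summer,
# Sep-Nov autumn, Dec-Feb winter.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def time_features(d: date) -> dict:
    """Return the calendar features for one observation date."""
    day_of_year = d.timetuple().tm_yday
    # Assumption: weeks are consecutive 7-day blocks counted from 1 January,
    # and the leftover day(s) at the end of the year are folded into week 52.
    week = min((day_of_year - 1) // 7 + 1, 52)
    return {
        "year": d.year,
        "season": SEASON_BY_MONTH[d.month],
        "month": d.month,
        "week": week,
        # Assumption: Monday to Friday are working days; holidays are ignored.
        "is_workday": d.weekday() < 5,
        "day": d.day,
    }
```

Calling `time_features(date(2020, 7, 15))` yields one feature row for that day; applying the function to every date in a series gives the time feature columns.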

2.2.4. Data Preprocessing

The original pest data had some missing and outlier values. The missing values were interpolated using the average adjacent position interpolation method: for each missing value, we took the five valid values before it and the five valid values after it, calculated their arithmetic mean, and used this mean to fill the gap. The outliers were processed using the exponentially weighted averages method, defined as follows:
$$y_t = \frac{x_t + (1-\alpha)\,x_{t-1} + (1-\alpha)^2\,x_{t-2} + \cdots + (1-\alpha)^t\,x_0}{1 + (1-\alpha) + (1-\alpha)^2 + \cdots + (1-\alpha)^t},$$
where $\alpha$ is the smoothing factor, taking values between 0 and 1, $y_t$ is the smoothed value at time $t$, and $x_t$ is the value at time $t$ before smoothing. In this paper, a sliding window of seven days with a step of one day was used to smooth the pest data.
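The two preprocessing steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the value `alpha=0.5` is arbitrary because the paper does not report the smoothing factor, and the sketch smooths every point instead of first detecting outliers, since the detection rule is not described.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]], k: int = 5) -> List[float]:
    """Replace each None with the mean of up to k valid values on each side."""
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        before = [x for x in reversed(values[:i]) if x is not None][:k]
        after = [x for x in values[i + 1:] if x is not None][:k]
        neighbours = before + after
        if not neighbours:
            raise ValueError("series has no valid values")
        filled.append(sum(neighbours) / len(neighbours))
    return filled


def ewma(window: List[float], alpha: float) -> float:
    """Exponentially weighted average of a window; the last element is x_t."""
    num = den = 0.0
    for age, x in enumerate(reversed(window)):  # age 0 is the newest value
        weight = (1.0 - alpha) ** age
        num += weight * x
        den += weight
    return num / den


def smooth(series: List[float], alpha: float = 0.5, window: int = 7) -> List[float]:
    """Slide a window of `window` days forward one day at a time."""
    return [
        ewma(series[max(0, t - window + 1): t + 1], alpha)
        for t in range(len(series))
    ]
```

A typical call chain would be `smooth(fill_missing(raw_counts))`. Within each window, the weights match the numerator and denominator of the formula above, truncated to the seven most recent days.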
The meteorological data were processed in the same way. There were no meteorological stations in some parts of the study area; for these parts, this paper used meteorological stations in nearby cities and counties (Table 3).
Some time series of related pests contain only a single unique value (that is, they are constant), and such series do not help with model construction. In addition, some variables are collinear. Collinear variables carry largely the same information, so keeping all of them raises the complexity of the model. Therefore, we removed the constant series and the excess collinear variables during the model construction. In this paper, high-quality pest variables, meteorological variables, and time datasets (Table 4) were constructed, laying the material basis for the subsequent analysis and prediction.

2.3. Methods

2.3.1. Datasets Preparation

To predict the SRSB, weather variables (including TEMP, RH, and PRCP) and the associated time series of related pests were included as input variables. Furthermore, the daily SRSB light trap catches were natural log-transformed before analysis to satisfy the regression hypothesis [27,28]. The SRSB light trap catches were treated as an output variable in all models and an input variable in the autoregressive model.
The datasets of all variables for the crop seasons in E (Liling) from 2010 to 2019, and for the other study areas from 2000 to 2018, were used as training datasets. The E (Liling) training datasets contained 3726 samples, and the training datasets of the other study areas contained 7013 samples. The Liling observations from 2020 and the observations from the other regions from 2019 to 2020 were used as test datasets to verify the model. We chose data from March to October to develop the models, as this period was commonly used to plan pest monitoring. All the details are summarized in Table 5.

2.3.2. Model

Multiple Linear Regression (MLR)

Pearson correlation analysis was used to obtain the relationship between SRSB and meteorological data, associated pest time series, and time features. Taking the significant correlation coefficient (R) as the standard, we selected appropriate variables to develop the linear model of SRSB.
An MLR model using stepwise selection was established in three scenarios: (1) only meteorological variables were considered to estimate the maximum determination coefficient of the SRSB (the R square); (2) meteorological variables and time series-related pests were combined to estimate the maximum determination coefficient of the SRSB (the R square); (3) meteorological variables, time series-related pests, and time features were considered to estimate the maximum determination coefficient of the SRSB (the R square).
MLR is a statistical method of regression for analyzing the relationship of an individual dependent variable with two or more independent variables [29], which can be demonstrated as follows:
$$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_i x_i + \cdots + \alpha_k x_k + \varepsilon,$$
Here $y$ is the dependent variable, $x_i$ is the $i$-th independent variable, $\alpha_0$ represents the intercept, $\alpha_i$ is the slope of $y$ with respect to $x_i$, and $\varepsilon$ is the residual. Stepwise regression can automatically select the most relevant independent variables when the number of independent variables is large and it is not possible to fit all potential models [30]. We used Python’s statsmodels library to implement the MLR model.
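To make the stepwise idea concrete, the sketch below fits the linear model above by ordinary least squares and adds one variable at a time while the adjusted $R^2$ keeps improving. It is a dependency-free illustration, not the statsmodels workflow used in the paper: the function names are hypothetical, and forward selection by adjusted $R^2$ is an assumed criterion because the exact stepwise rule is not stated.

```python
from typing import Dict, List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]  # fails if the predictors are exactly collinear
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns: List[List[float]], y: List[float]) -> Tuple[List[float], float]:
    """Fit y = a0 + sum(a_i * x_i); return (coefficients, adjusted R^2)."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations (X^T X) a = X^T y; adequate for a small sketch.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k + 1)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return coef, 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_stepwise(data: Dict[str, List[float]], y: List[float]) -> List[str]:
    """Greedily add the variable that most improves adjusted R^2."""
    selected: List[str] = []
    best = float("-inf")
    while len(selected) < len(data):
        scores = {
            name: fit_ols([data[s] for s in selected + [name]], y)[1]
            for name in data if name not in selected
        }
        name = max(scores, key=scores.get)
        if scores[name] <= best + 1e-9:  # stop when no candidate helps
            break
        selected.append(name)
        best = scores[name]
    return selected
```

In practice, a library fit such as an OLS model from statsmodels would replace `fit_ols`, and it also reports the standard errors and p-values that this sketch omits.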

Gradient Boosting Decision Tree (GBDT)

The Gradient Boosting Decision Tree (GBDT), an iterative decision tree algorithm proposed by Jerome Friedman in 1999, is a representative ensemble method. GBDT uses regression trees as base learners and combines them with the gradient boosting algorithm [31].
To train the GBDT model, we used a grid search combined with 5-fold cross-validation [32]. The hyperparameters were optimized to obtain the best-performing GBDT model for the current datasets. The training and test datasets contained all variables (meteorological, related pest, and time features). We selected the model with the highest R2 as the best GBDT model, and calculated and plotted the importance of its input variables. This model was developed using the LightGBM library for Python.
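The core of gradient boosting with regression trees can be shown with a deliberately small sketch: each round fits a one-split tree (a stump) to the current residuals, which are the negative gradient of the squared-error loss, and adds a shrunken copy of it to the ensemble. This is not LightGBM and not the tuned model from the paper; the single-feature restriction, the function names, and the default `n_trees` and `learning_rate` are assumptions chosen for brevity.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(x: List[float], residual: List[float]) -> Stump:
    """Best single-split regression tree on one feature (squared error)."""
    xs = sorted(set(x))
    if len(xs) < 2:
        raise ValueError("need at least two distinct feature values")
    best_sse, best_stump = None, None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (thr, lv, rv)
    return best_stump


def predict_stump(stump: Stump, xi: float) -> float:
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def fit_gbdt(x: List[float], y: List[float],
             n_trees: int = 50, learning_rate: float = 0.1) -> Model:
    base = sum(y) / len(y)  # initial constant prediction
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of 0.5 * (y - f)^2 with respect to f.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict_gbdt(model: Model, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
```

A grid search with 5-fold cross-validation, as described above, would wrap a fit like this in a loop over candidate values of settings such as `n_trees` and `learning_rate`, keeping the combination with the best average validation score.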

DeepAR Model

DeepAR is a probabilistic prediction method based on auto-regressive recurrent neural networks. The approach solves the prediction problem by combining an appropriate likelihood with non-linear data transformations learned by a deep neural network. DeepAR takes advantage of an LSTM-based recurrent neural network architecture [33,34]. It also builds on previous deep learning work on time-series data [35,36,37] to address the probabilistic prediction problem. Deep networks allow for more abstract data representation through more complex transformations [21] and thus generally outperform shallow and broad neural networks.
DeepAR has the following advantages. First, it performs probabilistic prediction by Monte Carlo sampling and can calculate consistent quantile estimates across all sub-ranges of the prediction range. Second, the method does not assume Gaussian noise; it supports a broad range of likelihood functions and allows users to select the one most suitable for the statistical properties of the data. Third, by learning from similar data, it can provide predictions for series with little or no history, which conventional one-dimensional prediction methods cannot do. Finally, DeepAR can learn seasonal behavior and complex dependencies with minimal human intervention [38].
We used the deep learning DeepAR time series prediction model, combining the time series of related pests, meteorological variables, and time features, to predict the daily SRSB light trap catch. The training and test datasets contained all variables (meteorological, pest, and time).
We used the GluonTS library, based on the MXNet framework, to build the DeepAR model of the rice SRSB, and selected the negative binomial distribution as the likelihood function of the DeepAR model; all other hyperparameters were left at their default values.

2.3.3. Evaluation Metrics

Multiple metrics can be used to analyze the performance of our predictions; we chose four metrics that are widely used in time series forecasting. We used the Coefficient of Determination (R2), Mean Absolute Error (MAE), Symmetric Mean Absolute Percentage Error (sMAPE), and Root Mean Square Error (RMSE) to evaluate the prediction models.

Coefficient of Determination (R2)

R2 measures the proportion of the variance in the dependent variable that the independent variables can explain, and is used to judge the explanatory power of the regression model [39,40,41].
Suppose that a dataset includes $n$ observations $y_1, \ldots, y_n$, and that the corresponding model predictions are $f_1, \ldots, f_n$. Defining the residual as $e_i = y_i - f_i$, the average observed value is calculated as follows:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$
The total sum of squares can thus be obtained with:
$$SS_{tot} = \sum_{i} \left(y_i - \bar{y}\right)^2,$$
The sum of the squares of residuals can be calculated with the following formula:
$$SS_{res} = \sum_{i} \left(y_i - f_i\right)^2 = \sum_{i} e_i^2,$$
Thus, the determination coefficient can be defined as follows:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}},$$
$R^2$ usually ranges from 0 to 1. $R^2$ can be more informative than sMAPE, MAE, MAPE, MSE, and RMSE in regression analysis evaluation [42].

Mean Absolute Error (MAE)

MAE is the mean of the absolute distance between the value predicted by the model $f_i$ and the true value $y_i$ of the sample. MAE is calculated as:
$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - f_i\right|,$$

Symmetric Mean Absolute Percentage Error (sMAPE)

sMAPE is an accuracy measure based on percentage (or relative) errors. It is usually defined as follows:
$$sMAPE = \frac{100\%}{n}\sum_{i=1}^{n} \frac{\left|f_i - y_i\right|}{\left(\left|f_i\right| + \left|y_i\right|\right)/2},$$

Root Mean Square Error (RMSE)

RMSE is widely used to measure the differences between values predicted by a model and the values observed. It is defined as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(f_i - y_i\right)^2},$$
In general, lower MAE, sMAPE, and RMSE are better than higher values, and all three metrics are non-negative. But for the R2, higher is better.
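The four metrics can be computed directly from their definitions, as in the sketch below. The function names are illustrative, and one detail is an assumption: when both the prediction and the observation are zero, the sMAPE term is counted as zero to avoid dividing by zero, a convention the paper does not state.

```python
import math
from typing import Sequence


def r2_score(y: Sequence[float], f: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def mae(y: Sequence[float], f: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(yi - fi) for yi, fi in zip(y, f)) / len(y)


def smape(y: Sequence[float], f: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    total = 0.0
    for yi, fi in zip(y, f):
        denom = (abs(fi) + abs(yi)) / 2.0
        if denom > 0.0:  # assumption: a 0/0 term contributes zero error
            total += abs(fi - yi) / denom
    return 100.0 * total / len(y)


def rmse(y: Sequence[float], f: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((fi - yi) ** 2 for yi, fi in zip(y, f)) / len(y))
```

For observations `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]`, only the last point contributes an error, so the MAE is 0.25, the RMSE is 0.5, and $R^2$ is 0.8.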

3. Results

3.1. Relationship between Climatic Variables, Time Series of Related Pests, and Time Features, and the SRSB Light Trap Catch

The correlation coefficient (R) was calculated between the natural log-transformed SRSB light trap catch and the selected environmental variables (climatic variables, related pest, and time features), and the correlation coefficient (R) and sig. (p > |t|) were calculated for five study regions and then averaged (Table 6). Our results show that the SRSB light trap catch had a significant positive correlation with the RPH light trap catch (R = 0.458 ± 0.111, p > |t|), and had an extremely significant positive correlation with the PSB light trap catch (R = 0.271 ± 0.098, p > |t|), but had a significant negative correlation with AP (R = −0.445 ± 0.070, p > |t|) and the season (R = −0.247 ± 0.079, p > |t|). Meanwhile, there was some correlation between the SRSB light trap catch and the TEMP, SDD, and PLR light trap catch, but it was not significant. Extremely significant and significant correlation variables were included in the linear and non-linear models to predict SRSB light trap catches.

3.2. Multiple Linear Regression Prediction

Using a stepwise selected MLR model, we combined meteorological, associated pest, and time features (Table 7). Coef is the MLR model coefficient, which indicates the contribution of each variable to the model. Std err is the standard error of the coefficient estimate. The t statistic and its p-value (p > |t|) indicate the significance of the effect of each independent variable on the dependent variable. The meteorological variable AP was significantly and negatively correlated with the SRSB light trap catch. The light trap catches of the related pests RPH, YSB, and PSB and the time variable Season were negatively correlated with the SRSB light trap catch.
The model using meteorological variables alone (Model 1) explains approximately 35% (Adj.R2 = 0.346) of the variability in the SRSB light trap catch; the model based on meteorological variables and related pests (Model 2) explains 39.9% (Adj.R2 = 0.399); and the model based on meteorological variables, related pests, and time features (Model 3) explains 40% (Adj.R2 = 0.400). The variance inflation factor (VIF) for all the input variables was less than three, indicating no multicollinearity among the variables. Based on the adjusted R2, the model combining meteorological variables, associated pests, and time features (Model 3) was selected as the best model.
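For reference, the two diagnostics used in this comparison are conventionally defined as shown below. These are the standard textbook definitions and are stated here as an assumption, since the paper does not write them out: $n$ is the number of observations, $k$ is the number of independent variables in the model, and $R_j^2$ is the coefficient of determination obtained by regressing $x_j$ on the remaining independent variables.

$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1}, \qquad VIF_j = \frac{1}{1 - R_j^2}$$

Because the adjusted $R^2$ penalizes each additional variable, the small gain from Model 2 (0.399) to Model 3 (0.400) indicates that the time features add little explanatory power beyond the meteorological and related pest variables in the linear setting.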
According to the results of the stepwise regression shown in Table 7, the prediction model of the Yuanjiang can be represented using the following regression equation:
$$\ln(\mathrm{SRSB}) = 535.9426 + 45.5917 \times \mathrm{AP} + 0.1283 \times \mathrm{RPH} + 0.4709 \times \mathrm{PSB} + 2.1272 \times \mathrm{YSB} + 0.005 \times \mathrm{Season},$$
The dependent variable ln(SRSB) indicates the natural logarithm of the SRSB light trap catch. The independent variables AP, RPH, PSB, YSB, and Season indicate the AP, RPH light trap catch, PSB light trap catch, YSB light trap catch, and Season, respectively.
The MLR model of the other SRSB light trap catch of the study area was obtained using the same method, and a summary of the MLR model for the training datasets of each study area is shown in Table 8. R2 and Adj.R2 represent the MLR fitting accuracy of the training datasets, and N represents the length of the training datasets. The results show that in different study regions, stepwise regression selected different independent variables.
The average coefficient of determination, minimum coefficient of determination, and maximum coefficient of determination of the MLR model based on the test dataset in the study areas (Linli, Liling, Yuanjiang, Dong’an, and Hongjiang) are $R^2_{\mathrm{mean}} = 0.166$, $R^2_{\mathrm{min}} = 0.083$, and $R^2_{\mathrm{max}} = 0.312$, respectively.

3.3. GBDT Model Prediction

Based on the training dataset, the GBDT models from the different study regions (Linli, Liling, Yuanjiang, Dong’an, and Hongjiang) yielded different results (R2 = 0.420, 0.104, 0.639, 0.509, and 0.564, RMSE = 0.999, 1.799, 0.860, 1.078, and 0.733). Figure 3 shows the importance of the GBDT model input variables for predicting the natural log-transformed SRSB light trap catch, ln(SRSB). The season is the least important input variable in the Yuanjiang GBDT model. Weeks and Year are the most important input variables in the GBDT model.
The average coefficient of determination, minimum coefficient of determination, and maximum coefficient of determination of the GBDT model based on the test dataset in the study areas (Linli, Liling, Yuanjiang, Dong’an, and Hongjiang) are $R^2_{\mathrm{mean}} = 0.500$, $R^2_{\mathrm{min}} = 0.295$, and $R^2_{\mathrm{max}} = 0.687$, respectively.

3.4. DeepAR Model Prediction

DeepAR is auto-regressive: the value at the previous time step is used as an input at the current time step. During training, and within the conditioning range during prediction, these values are observed and therefore available. The prediction range must be treated differently: there, the time-series values are not available because they are the results to be predicted. Therefore, samples drawn from the likelihood function (whose parameters were predicted in the previous step) are used as the input values for the current time step.
Figure 4 shows the learning process of the DeepAR model [43], with the training process on the left and the prediction process on the right. After training, the historical data ($t < t_0$) were entered into the network to obtain the initial hidden state $h_{i,t_0-1}$, and the prediction results were then obtained using ancestral sampling. More specifically, at each time step $t = t_0, t_0+1, \ldots, T$, a value $\bar{z}_{i,t}$ was randomly sampled and then used as part of the input for the next time step. In this way, a series of sampled values from $t_0$ to $T$ could be obtained on the time scale, and these sampled values could then be used to calculate the required target value.
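Ancestral sampling as described above can be illustrated with a toy model. In the sketch below, `toy_step` is a hypothetical stand-in for the trained LSTM (it is not GluonTS code and has no learned weights), and the negative binomial shape value of 2.0 is arbitrary. What the sketch does follow from the text is the procedure itself: warm up the hidden state on the observed history, then repeatedly sample a value from the likelihood and feed it back as the next input.

```python
import math
import random
from typing import List, Tuple


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_neg_binomial(mean: float, shape: float, rng: random.Random) -> int:
    """Gamma-Poisson mixture: E[z] = mean, Var[z] = mean + mean**2 / shape."""
    rate = rng.gammavariate(shape, mean / shape)
    return sample_poisson(rate, rng)


def toy_step(prev_z: float, hidden: float) -> Tuple[float, float]:
    """Hypothetical network step: returns (likelihood mean, new hidden state)."""
    new_hidden = 0.8 * hidden + 0.2 * prev_z
    return 0.5 + new_hidden, new_hidden


def ancestral_paths(history: List[float], horizon: int,
                    n_paths: int, seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    # Conditioning range: feed observed values to reach the state at t0 - 1.
    hidden = 0.0
    for z in history[:-1]:
        _, hidden = toy_step(z, hidden)
    paths = []
    for _ in range(n_paths):
        h, prev = hidden, history[-1]
        path = []
        for _ in range(horizon):
            mean, h = toy_step(prev, h)
            prev = sample_neg_binomial(mean, shape=2.0, rng=rng)
            path.append(prev)  # the sample becomes the next step's input
        paths.append(path)
    return paths


def quantile(values: List[int], q: float) -> int:
    """Empirical quantile by sorted index (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]
```

With, for example, 200 sampled paths, the empirical quantiles of the values at each forecast day give a prediction interval, and the mean or median gives a point forecast.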
The average coefficient of determination, minimum coefficient of determination, and maximum coefficient of determination of the DeepAR model based on the test dataset in the study areas (Linli, Liling, Yuanjiang, Dong’an, and Hongjiang) are $R^2_{\mathrm{mean}} = 0.952$, $R^2_{\mathrm{min}} = 0.945$, and $R^2_{\mathrm{max}} = 0.958$, respectively.

3.5. MLR, GBDT, and DeepAR Model Validation and Performance Comparison

Figure 5 compares the true and predicted values of the test datasets of the rice SRSB light trap capture after natural log transformation (ln) in different study regions under different models (traditional MLR model, machine learning GBDT, and deep learning DeepAR model).
We found that for all study regions, the MLR model could not correctly fit the actual values. Even negative predicted values were obtained for some periods (for example, in Hongjiang from December 2019 to January 2020), which have a large gap with the actual value. The GBDT model had good prediction results, although the prediction values in Hongjiang and Yuanjiang still could not accurately fit the actual values. However, the trend of the SRSB light trap catch was correctly reflected. It showed the upward and downward movement of the SRSB light trap catch in some periods (for example, in Hongjiang from April 2019 to October 2020, and in Yuanjiang from April 2019 to October 2020). The DeepAR model had the best predictions, and actual values could be accurately fitted in all study regions and periods.
The results show that the deep learning DeepAR model produced good predictions for all sites as compared to the traditional MLR model and the machine learning GBDT model.
Figure 6 compares the observed natural log-transformed SRSB light trap catch with the values predicted by the MLR, GBDT, and DeepAR models, respectively. In terms of R2 and RMSE, the DeepAR models produced more accurate predictions than the MLR and GBDT models, and the GBDT was more accurate than the MLR. The R2 values for DeepAR, GBDT, and MLR were 0.944–0.960, 0.295–0.687, and 0.083–0.312, respectively, and the RMSE values were 0.228–0.425, 0.733–1.271, and 1.158–1.576, respectively.
Figure 7 shows the performance of MLR, GBDT, and DeepAR on the evaluation metrics MAE, RMSE, sMAPE, and R2 in the different study areas. We found that the MLR model showed the worst performance for the SRSB light trap catches in Hongjiang, Yuanjiang, Dong’an, and Linli. In Liling, the GBDT model had the worst performance, probably because Liling had the smallest dataset (compared to the other study regions). The DeepAR model (MAE, RMSE, sMAPE, and R2 were 0.125–0.245, 0.228–0.425, 0.360–0.657, and 0.945–0.959, respectively) had the best performance in all areas, outperforming the MLR (MAE, RMSE, sMAPE, and R2 were 0.856–1.297, 1.158–1.576, 0.808–1.414, and 0.083–0.312, respectively) and the GBDT (MAE, RMSE, sMAPE, and R2 were 0.494–0.981, 0.733–1.271, 0.003–1.296, and 0.295–0.687, respectively) models in terms of stability and accuracy.
In conclusion, it is feasible to predict the SRSB light trap catch using the deep learning DeepAR model.

4. Discussion

Predicting pest populations helps specify pest management strategies, reduce the use of pesticides, and is an integral part of the successful implementation of IPM. For pest prediction models, weather variables such as temperature, humidity, rainfall, and sunshine duration are often used as abiotic predictors in model development [8,9,10,11,12,13,26]. We found that TEMP [1,7], RH, and SDD were positively associated with the SRSB light trap catch, while AP and PRCP were negatively associated with the SRSB light trap catch. WDSP, EVP, and MXWDSP were also associated with the SRSB light trap catch. Generally, when WDSP, EVP, and MXWDSP are moderate, the amount of SRSB is the highest.
The rice light trap was used to capture rice pests in order to study the relationship between them. We found a significant positive correlation between SRSB and RPH, and a highly significant positive correlation between SRSB and PSB. This indicates an interactional relationship between rice pests, which could be used to predict some areas with little or even no historical pest data, especially for migratory pests such as the Spodoptera odorata.
The stepwise multivariate regression model established in this study showed that the model which combined meteorological variables, associated pests, and time features (adjusted R2 = 0.400) was more accurate than the model using meteorological variables alone (adjusted R2 = 0.346).
The GBDT model is a suitable choice for predicting pest occurrence. Our study showed that the GBDT model produced more accurate pest predictions than the MLR model. The deep learning DeepAR model obtained the best predictions, probably because of the extended data cycles we used, and deep learning is known to perform well with large samples. Our study showed that the deep learning DeepAR model predicted the natural log-transformed SRSB light trap catch with an average accuracy of 95.2% (the average prediction accuracy of the MLR model was 16.6%, and that of the GBDT model was 50.0%), which has a good application value. Since the rice field is an open environment, the factors driving the growth of the SRSB population are variable. In addition to weather and pest-related factors, rice growth phenology, natural enemies, rice varieties, pest prevention, control information, and even farmer practices may affect population dynamics. The observed and predicted kurtosis differences in SRSB may be due to seasonality and changes in the surrounding environment. Rice-related pest factors with a larger area can be considered in future work.

5. Conclusions

In this study, we presented a prediction model for SRSB population occurrence in the Hunan Province of China by integrating time series variables of ground weather, the number of related pests captured by light traps, and the number of SRSB captured by light traps. The MLR, GBDT, and DeepAR models were constructed based on the abovementioned variables. MLR was used to study the predictive power of meteorological variables alone or combined with related pest and time variables. At the same time, the GBDT and DeepAR models were established to enhance the model prediction performance compared to MLR.
Based on the correlation coefficients of the MLR models, the main features selected for the SRSB light trap catch in each study area were as follows: Yuanjiang used AP, RPH, PSB, YSB, and Season; Hongjiang used RHmin, RPH, PSB, YSB, and Season; Dong’an used TEMP, RPH, YSB, and Season; Linli used TEMP, PSB, YSB, PLR, and Season; and Liling used TEMP, PSB, YSB, PLR, and Season. The GBDT model performed better than the MLR model in four regions (Hongjiang, Yuanjiang, Dong’an, and Linli), and DeepAR performed better than both MLR and GBDT in all areas.
In conclusion, the deep learning-based DeepAR model can dynamically predict SRSB populations by combining ground meteorological variables, associated pest variables, and time variables derived from the pest variables, and it can be applied to the timely management of crop pests after proper validation in different regions. We anticipate that these results can be integrated with an online rice pest monitoring and intelligent prediction system developed by the Hunan Provincial Department of Agriculture to support an effective early pest warning system.
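For reference, the fitted MLR model for Yuanjiang can be written as a simple linear function of the selected features, using the coefficients reported in Table 8. The sketch below is illustrative only: the function and dictionary names are ours, the response is assumed to be the natural log-transformed SRSB catch, and the scaling of the predictors is not restated in this section, so inputs must be on whatever scale the fitted model used. The feature values passed in the example are placeholders.

```python
# Coefficients for Yuanjiang as reported in Table 8.
YUANJIANG_COEFS = {
    "const": 535.943,
    "AP": -45.592,
    "RPH": 0.128,
    "PSB": 0.471,
    "YSB": 2.127,
    "Season": -0.005,
}

def predict_ln_srsb(features, coefs=YUANJIANG_COEFS):
    """Linear predictor: const + sum(coef_i * x_i) over the selected features."""
    missing = [name for name in coefs if name != "const" and name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return coefs["const"] + sum(
        coefs[name] * features[name] for name in coefs if name != "const"
    )

# Placeholder inputs (all zero) simply return the intercept.
baseline = predict_ln_srsb({"AP": 0, "RPH": 0, "PSB": 0, "YSB": 0, "Season": 0})
print(baseline)
```

Holding the other features fixed, each unit increase in a feature shifts the predicted value by that feature's coefficient, which is how the signs in Table 8 should be read.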

Author Contributions

Conceptualization, Y.L. and S.T.; methodology, Y.L.; software, Y.L. and H.Y.; validation, Y.L., S.T., and H.Y.; formal analysis, Y.L.; investigation, Y.L.; resources, S.T., Z.Z., and C.L.; data curation, Y.L., R.Z., and H.Y.; writing—original draft preparation, Y.L., R.Z., and H.Y.; writing—review and editing, Y.L., C.L., and H.Y.; visualization, Y.L.; supervision, S.T. and C.L.; project administration, S.T.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Fund Project of China (31772157).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflict of interest with respect to the research, authorship, and publication of this article.

References

  1. Feng, Q.L. Physiology and interaction of insects with environmental factors. J. Integr. Agric. 2020, 19, 1411–1416.
  2. Sun, Y.; Xu, L.; Chen, Q.; Qin, W.; Huang, S.; Jiang, Y.; Qin, H. Chlorantraniliprole resistance and its biochemical and new molecular target mechanisms in laboratory and field strains of Chilo suppressalis (Walker). Pest Manag. Sci. 2018, 74, 1416–1423.
  3. Muralidharan, K.; Pasalu, I.C. Assessments of crop losses in rice ecosystems due to stem borer damage (Lepidoptera: Pyralidae). Crop Prot. 2006, 25, 409–417.
  4. Chen, M.; Shelton, A.; Ye, G. Insect-Resistant Genetically Modified Rice in China: From Research to Commercialization. Annu. Rev. Entomol. 2011, 56, 81–101.
  5. He, Y.; Zhang, J.; Gao, C.; Su, J.; Chen, J.; Shen, J. Regression analysis of dynamics of insecticide resistance in field populations of Chilo suppressalis (Lepidoptera: Crambidae) during 2002–2011 in China. J. Econ. Entomol. 2013, 106, 1832–1837.
  6. Wang, Y.N.; Ke, K.Q.; Li, Y.H.; Han, L.Z.; Liu, Y.M.; Hua, H.X.; Peng, Y.F. Comparison of three transgenic Bt rice lines for insecticidal protein expression and resistance against a target pest, Chilo suppressalis (Lepidoptera: Crambidae). Insect Sci. 2016, 23, 78–87.
  7. Qiang, C.K.; Du, Y.Z.; Yu, L.Y.; Qin, Y.H.; Feng, W.J. Effects of temperature stress on physiological indices of Chilo suppressalis Walker (Lepidoptera: Pyralidae) diapause larvae. Chin. J. Appl. Ecol. 2012, 23, 1365–1369.
  8. Skawsang, S.; Nagai, M.; Tripathi, N.K.; Soni, P. Predicting Rice Pest Population Occurrence with Satellite-Derived Crop Phenology, Ground Meteorological Observation, and Machine Learning: A Case Study for the Central Plain of Thailand. Appl. Sci. 2019, 9, 4846.
  9. Aparecido, L.; Rolim, G.; De Moraes, J.R.D.S.; Costa, C.; Souza, P. Machine learning algorithms for forecasting the incidence of Coffea arabica pests and diseases. Int. J. Biometeorol. 2020, 64, 671–688.
  10. Holloway, P.; Kudenko, D.; Bell, J.R. Dynamic selection of environmental variables to improve the prediction of aphid phenology: A machine learning approach. Ecol. Indic. 2018, 88, 512–521.
  11. Narayanasamy, M.; Kennedy, J.; Geethalakshmi, V. Weather Based Pest Forewarning Model for Major Insect Pests of Rice—An Effective Way for Insect Pest Prediction. Annu. Res. Rev. Biol. 2017, 21, 1–13.
  12. Poggi, S.; Le Cointe, R.; Riou, J.; Larroudé, P.; Thibord, J.; Plantegenest, M. Relative influence of climate and agroenvironmental factors on wireworm damage risk in maize crops. J. Pest Sci. 2018, 91, 585–599.
  13. Gu, Y.H.; Yoo, S.J.; Park, C.J.; Kim, Y.H.; Park, S.K.; Kim, J.S.; Lim, J.H. BLITE-SVR: New forecasting model for late blight on potato using support-vector regression. Comput. Electron. Agric. 2016, 130, 169–176.
  14. Ni, T.; Zhai, J. A matrix-free smoothing algorithm for large-scale support vector machines. Inf. Sci. 2016, 358, 29–43.
  15. Feng, S.; Zhou, H.; Dong, H. Using deep neural network with small dataset to predict material defects. Mater. Des. 2019, 162, 300–310.
  16. Salman, S.; Liu, X. Overfitting Mechanism and Avoidance in Deep Neural Networks. arXiv 2019, arXiv:1901.06566.
  17. Bengio, Y. Learning Deep Architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127.
  18. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  19. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE T. Pattern Anal. 2013, 35, 1798–1828.
  20. Deng, L.; Yu, D. Deep Learning: Methods and Applications. Found. Trends Signal Process. 2014, 7, 197–387.
  21. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  22. Luo, X.; Li, J.; Chen, M.; Yang, X.; Li, X. Ophthalmic Disease Detection via Deep Learning with a Novel Mixture Loss Function. IEEE J. Biomed. Health Inform. 2021, 25, 3332–3339.
  23. Chen, M.; Li, Y.; Luo, X.; Wang, W.; Wang, L.; Zhao, W. A Novel Human Activity Recognition Scheme for Smart Health Using Multilayer Extreme Learning Machine. IEEE Internet Things J. 2019, 6, 1410–1418.
  24. Sun, J.; Luo, X.; Gao, H.; Wang, W.; Gao, Y.; Yang, X. Categorizing Malware via A Word2Vec-based Temporal Convolutional Network Scheme. J. Cloud Comput. 2020, 9, 1–14.
  25. Luo, X.; Sun, J.K.; Wang, L.; Wang, W.P.; Zhao, W.B.; Wu, J.S.; Wang, J.H.; Zhang, Z.J. Short-Term Wind Speed Forecasting via Stacked Extreme Learning Machine With Generalized Correntropy. IEEE Trans. Ind. Inform. 2018, 14, 4963–4971.
  26. Wahyono, T.; Yaya, H.; Haryono, S.; Saleh, A.B. Enhanced lstm multivariate time series forecasting for crop pest attack prediction. ICIC Express Lett. 2020, 10, 943–949.
  27. Yan, Y.; Feng, C.; Wan, M.P.; Chang, K.T. Multiple Regression and Artificial Neural Network for the Prediction of Crop Pest Risks; Springer International Publishing: Cham, Switzerland, 2015; pp. 73–84. ISBN 1865-1348.
  28. Yamamura, K.; Yokozawa, M.; Nishimori, M.; Ueda, Y.; Yokosuka, T. How to analyze long-term insect population dynamics under climate change: 50-year data of three insect pests in paddy fields. Popul. Ecol. 2006, 48, 31–48.
  29. Ghani, I.M.M.; Ahmad, S. Stepwise Multiple Regression Method to Forecast Fish Landing. Procedia-Soc. Behav. Sci. 2010, 8, 549–554.
  30. Amiri, S.S.; Mottahedi, M.; Asadi, S. Using multiple regression analysis to develop energy consumption indicators for commercial buildings in the U.S. Energy Build. 2015, 109, 209–216.
  31. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  32. Picard, R.R.; Cook, R.D. Cross-Validation of Regression Models. Publ. Am. Stat. Assoc. 1984, 79, 575–583.
  33. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  34. Hochreiter, S.; Schmidhuber, J. LSTM can solve hard long time lag problems. In Proceedings of the 9th International Conference on Neural Information Processing Systems, Denver, CO, USA, 3–5 December 1996; pp. 473–479.
  35. Graves, A. Generating Sequences With Recurrent Neural Networks. arXiv 2013, arXiv:1308.0850.
  36. Van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv 2016, arXiv:1609.03499.
  37. Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent Neural Network Regularization. arXiv 2014, arXiv:1409.2329.
  38. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191.
  39. Draper, N.; Smith, H. Applied Regression Analysis, 2nd ed.; John Wiley: New York, NY, USA, 1981; ISBN 978-0-471-02995-3.
  40. Glantz, S.A.V.; Slinker, B.K. Primer of Applied Regression and Analysis of Variance; McGraw-Hill: New York, NY, USA, 1990; ISBN 0070234078.
  41. Carpenter, R.G. Principles and procedures of statistics, with special reference to the biological sciences. Eugen. Rev. 1960, 52, 172–173.
  42. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
  43. Time Series Prediction—Telesens. Available online: https://www.telesens.co/2019/06/08/time-series-prediction/ (accessed on 21 September 2021).
Figure 1. The main symptom in rice of damage by SRSB.
Figure 2. Locations of study areas and light traps.
Figure 3. Variable importance derived from the GBDT model for natural log-transformed ln (SRSB) light trap catches in Yuanjiang.
Figure 4. Forecasting process of DeepAR.
Figure 5. Predicted results of the SRSB light trap catches from the test datasets using the MLR, GBDT, and DeepAR models.
Figure 6. Actual versus predicted natural log-transformed ln (SRSB) in (a) Hongjiang, (b) Yuanjiang, (c) Dong’an, (d) Linli, and (e) Liling.
Figure 7. Predicted performance of MLR, GBDT, and DeepAR in different areas.
Table 1. The main rice pests captured by the light traps.
| Number | Name | Abbreviation | Latin Name |
|---|---|---|---|
| 0 | rice planthopper | RPH | - |
| 1 | paddy leaf roller | PLR | Cnaphalocrocis medinalis |
| 2 | striped rice stem borer | SRSB | Chilo suppressalis |
| 3 | pink sugarcane borer | PSB | Sesamia grisescens |
| 4 | yellow stem borer | YSB | Scirpophaga incertulas |
| 5 | rice green semilooper | RGS | Naranga diffusa |
| 6 | rice plant weevil | RPW | Echinocnemus squameus |
| 7 | rice water weevil | RWW | Lissorhoptrus oryzophilus |
| 8 | gall midge | GM | Orseoia oryzae |
| 9 | paddy armyworm | PA | Mythimna separata |
| 10 | - | Other * | - |
* ‘Other’ is the sum of other species captured by our light traps apart from the rice pests shown (Numbers 0–9).
Table 2. Types and units of meteorological factors.
| Number | Type | Abbreviation | Unit |
|---|---|---|---|
| 0 | Temperature | TEMP | °C |
| 1 | Maximum temperature | Tmax | °C |
| 2 | Minimum temperature | Tmin | °C |
| 3 | Average relative humidity | RH | % |
| 4 | Minimum relative humidity | RHmin | % |
| 5 | Wind speed | WDSP | m/s |
| 6 | Maximum wind speed | MXWDSP | m/s |
| 7 | Maximum wind direction | MXWDD | 16 directions |
| 8 | Extreme wind speed | EXWDSP | m/s |
| 9 | Extreme wind direction | EXWDD | 16 directions |
| 10 | Precipitation | PRCP | mm |
| 11 | Evaporation | EVP | mm |
| 12 | Atmospheric pressure | AP | Pa |
| 13 | Maximum atmospheric pressure | APmax | Pa |
| 14 | Minimum atmospheric pressure | APmin | Pa |
| 15 | Skin temperature | SKT | °C |
| 16 | Maximum skin temperature | SKTmax | °C |
| 17 | Minimum skin temperature | SKTmin | °C |
| 18 | Sunshine duration | SDD | h |
Table 3. The study areas and the corresponding meteorological stations.
| Number | Study Area | Meteorological Station |
|---|---|---|
| 0 | Liling | Zhuzhou |
| 1 | Hongjiang | Zhijiang Dong Autonomous County |
| 2 | Dong’an | Lingling |
| 3 | Yuanjiang | Yuanjiang |
| 4 | Linli | Shimen |
Table 4. Rice pests, weather, and time datasets.
| Weather Variables | Weather Variables (continued) | Time Series of Related Pests | Time Features |
|---|---|---|---|
| TEMP | EVP | RPH | Year |
| RH | AP | PLR | Season |
| RHmin | PRCP | SRSB | Month |
| WDSP | EXWDD | PSB | Weeks |
| MXWDSP | EXWDSP | YSB | - |
| MXWDD | SDD | - | - |
| SKTmax | - | - | - |
Table 5. Details of the data used in model development.
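| Site | Place | Input Variable | Output Variable | Month (Yearly) | Training Data | Testing Data |
|---|---|---|---|---|---|---|
| A | Hongjiang | Weather variables, time series of related pests, and time features | Chilo suppressalis (SRSB) | March to October | 2000 to 2018 | 2019 to 2020 |
| B | Yuanjiang | Weather variables, time series of related pests, and time features | Chilo suppressalis (SRSB) | March to October | 2000 to 2018 | 2019 to 2020 |
| C | Dong’an | Weather variables, time series of related pests, and time features | Chilo suppressalis (SRSB) | March to October | 2000 to 2018 | 2019 to 2020 |
| D | Linli | Weather variables, time series of related pests, and time features | Chilo suppressalis (SRSB) | March to October | 2000 to 2018 | 2019 to 2020 |
| E | Liling | Weather variables, time series of related pests, and time features | Chilo suppressalis (SRSB) | March to October | 2010 to 2019 | 2020 |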
Table 6. Pearson’s correlation coefficient (R) between the natural log-transformed SRSB light trap catch and its associated pest, meteorological, and time features from March to October in the rice crop season.
| Variable Type | External Variable | Correlation Coefficient (R) | Sig. (p > \|t\|) |
|---|---|---|---|
| Related pests | RPH | 0.458 ± 0.111 | 0.008 ± 0.011 * |
| Related pests | PLR | 0.368 ± 0.068 | 0.123 ± 0.179 |
| Related pests | PSB | 0.271 ± 0.098 | 0.000 ± 0.000 ** |
| Related pests | YSB | 0.086 ± 0.029 | 0.119 ± 0.265 |
| Weather | TEMP | 0.449 ± 0.046 | 0.213 ± 0.374 |
| Weather | RH | −0.031 ± 0.047 | 0.048 ± 0.235 |
| Weather | RHmin | −0.041 ± 0.064 | 0.082 ± 0.133 |
| Weather | WDSP | 0.049 ± 0.136 | 0.052 ± 0.048 |
| Weather | MXWDSP | 0.161 ± 0.083 | 0.121 ± 0.131 |
| Weather | MXWDD | 0.086 ± 0.161 | 0.335 ± 0.364 |
| Weather | EXWDSP | 0.175 ± 0.061 | 0.352 ± 0.355 |
| Weather | EXWDD | 0.098 ± 0.155 | 0.334 ± 0.291 |
| Weather | SDD | 0.282 ± 0.013 | 0.112 ± 0.174 |
| Weather | PRCP | 0.091 ± 0.039 | 0.090 ± 0.121 |
| Weather | EVP | 0.409 ± 0.033 | 0.073 ± 0.163 |
| Weather | AP | −0.445 ± 0.070 | 0.013 ± 0.029 * |
| Weather | SKT | 0.469 ± 0.050 | 0.208 ± 0.209 |
| Time | Weeks | 0.112 ± 0.042 | 0.562 ± 0.264 |
| Time | Month | 0.113 ± 0.041 | 0.477 ± 0.238 |
| Time | Year | 0.146 ± 0.087 | 0.191 ± 0.418 |
| Time | Season | −0.247 ± 0.079 | 0.001 ± 0.001 * |
** Extremely significant, * significant.
Table 7. Statistical diagnostics of the stepwise multiple linear regression models (taking the Yuanjiang SRSB as an example).
| Model | Variable | Coef | Std Err | t | p > \|t\| | VIF < 3 |
|---|---|---|---|---|---|---|
| Weather | Const. | 708.1329 | 11.604 | 61.026 | 0.000 | |
| | AP | −61.3799 | 1.007 | −60.977 | 0.000 | True |
| | N = 7013 | R2 = 0.347 | Adj. R2 = 0.346 | | | |
| Weather and related pests time series | Const. | 536.1561 | 13.543 | 39.591 | 0.000 | |
| | AP | −46.4895 | 1.175 | −39.568 | 0.000 | True |
| | RPH | 0.1205 | 0.006 | 19.113 | 0.000 | True |
| | YSB | 2.1725 | 0.194 | 11.182 | 0.000 | True |
| | PSB | 0.4584 | 0.043 | 10.694 | 0.000 | True |
| | N = 7013 | R2 = 0.3998 | Adj. R2 = 0.399 | | | |
| Weather, time series of related pests, and time features | Const. | 535.9426 | 13.535 | 39.589 | 0.000 | |
| | AP | −45.5927 | 1.210 | −37.688 | 0.000 | True |
| | RPH | 0.1283 | 0.007 | 18.889 | 0.000 | True |
| | YSB | 2.1272 | 0.195 | 10.924 | 0.000 | True |
| | PSB | 0.4709 | 0.043 | 10.943 | 0.000 | True |
| | Season | −0.0050 | 0.002 | −3.067 | 0.002 | True |
| | N = 7013 | R2 = 0.400 | Adj. R2 = 0.400 | | | |
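As a reading aid for Table 7, each t value is the coefficient divided by its standard error. The short sketch below recomputes this ratio for the two rows of the weather-only model; the function name is ours, and because the table reports rounded standard errors, the recomputed values agree with the reported ones only approximately.

```python
def t_statistic(coef, std_err):
    """t value for testing a regression coefficient against zero."""
    return coef / std_err

# (variable, coefficient, standard error, reported t) from Table 7, weather-only model.
rows = [
    ("Const.", 708.1329, 11.604, 61.026),
    ("AP", -61.3799, 1.007, -60.977),
]

for name, coef, std_err, reported_t in rows:
    t = t_statistic(coef, std_err)
    print(f"{name}: recomputed t = {t:.3f}, reported t = {reported_t}")
```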
Table 8. Summary of the MLR model for the training datasets of each study area.
| Variable | Yuanjiang (R2 = 0.400, Adj. R2 = 0.400, N = 7013) | Hongjiang (R2 = 0.379, Adj. R2 = 0.378, N = 7013) | Dong’an (R2 = 0.398, Adj. R2 = 0.398, N = 7013) | Linli (R2 = 0.257, Adj. R2 = 0.256, N = 7013) | Liling (R2 = 0.359, Adj. R2 = 0.358, N = 3726) |
|---|---|---|---|---|---|
| Const. | 535.943 | 2.146 | 0.425 | −0.489 | −0.952 |
| TEMP | 0 | 0 | 0.071 | 0.494 | 0.701 |
| RHmin | 0 | −0.412 | 0 | 0 | 0 |
| AP | −45.592 | 0 | 0 | 0 | 0 |
| RPH | 0.128 | 0.209 | 0.258 | 0 | 0 |
| PSB | 0.471 | 0.759 | 0 | 2.123 | 0.430 |
| YSB | 2.127 | −0.235 | 3.359 | 0.169 | 3.070 |
| PLR | 0 | 0 | 0 | 0.054 | 0.397 |
| Season | −0.005 | −0.134 | −0.165 | −0.159 | −0.175 |
| Weeks | 0 | −0.007 | 0 | 0 | 0 |
| Month | 0 | 0 | −0.030 | 0 | 0 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Tan, S.; Liang, Y.; Zheng, R.; Yuan, H.; Zhang, Z.; Long, C. Dynamic Prediction of Chilo suppressalis Occurrence in Rice Based on Deep Learning. Processes 2021, 9, 2166. https://doi.org/10.3390/pr9122166
