Rice Mapping Using a BiLSTM-Attention Model from Multitemporal Sentinel-1 Data

: Timely and accurate rice distribution information is needed to ensure the sustainable development of food production and food security. With its unique advantages, synthetic aperture radar (SAR) can monitor the rice distribution in tropical and subtropical areas under any type of weather condition. This study proposes an accurate rice extraction and mapping framework that can solve the issues of low sample production efﬁciency and fragmented rice plots when prior information on rice distribution is insufﬁcient. The experiment was carried out using multitemporal Sentinel-1A Data in Zhanjiang, China. First, the temporal characteristic map was used for the visualization of rice distribution to improve the efﬁciency of rice sample production. Second, rice classiﬁcation was carried out based on the BiLSTM-Attention model, which focuses on learning the key information of rice and non-rice in the backscattering coefﬁcient curve and gives different types of attention to rice and non-rice features. Finally, the rice classiﬁcation results were optimized based on the high-precision global land cover classiﬁcation map. The experimental results showed that the classiﬁcation accuracy of the proposed framework on the test dataset was 0.9351, the kappa coefﬁcient was 0.8703, and the extracted plots maintained good integrity. Compared with the statistical data, the consistency reached 94.6%. Therefore, the framework proposed in this study can be used to extract rice distribution information accurately and efﬁciently.


Introduction
Rice is one of the most important food crops in the world, and more than half of the world's population relies on rice as a staple food [1]. With the continuous growth of population and consumption, the global demand for rice will increase for at least another 40 years [2]. Nearly 496 million metric tons of milled rice were produced in 2019 worldwide (http://www.worldagriculturalproduction.com/crops/rice.aspx) accessed on 20 September 2021. China's rice output exceeded 209 million tons in 2019, becoming the world's leading rice producer, followed by India and Indonesia. Almost all rice areas in China are irrigated, which makes China's production even higher [3]. A reliable and accurate rice classification map is an important prerequisite for spatiotemporal rice monitoring and yield estimation [4,5], and it is also an important data source for food policy formulation and food security assessment [6][7][8].
Compared with traditional land resource survey methods, remote sensing technology has a large spatial coverage and a low cost, is not limited by season, and can provide timely and effective rice information [9]. Rice planting areas are mainly distributed in tropical and subtropical monsoon climates that share similar periods of rain and heat, increasing the difficulty of obtaining reliable high-resolution optical time series data [10]. Synthetic aperture radar (SAR) can work under any weather conditions and is very sensitive to the false alarm or misclassification. Therefore, in order to improve the classification accuracy, further post-processing is needed.
To address the abovementioned issues, a multitemporal rice extraction and mapping framework was designed. First, the statistical parameter characteristic maps of time series data were used to assist rice sample production and improve the efficiency of sample generation. Second, the attention mechanism [49] was introduced into the BiLSTM network model to strengthen the learning of rice temporal features and improve the accuracy of rice extraction. Finally, the classification results were optimized by using FROM-GLC10 (Finer Resolution Observation and Monitoring of Global Land Cover) [50]. The body of this paper is organized as follows. Section 2 introduces materials and the proposed method, and Section 3 introduces the experimental results and analysis. Section 4 provides a discussion of results. Finally, a conclusion is drawn.  Figure 1. Zhanjiang City, with a total area of 13,225.44 km 2 , is the largest rice planting area in Guangdong Province, and it is known as the "granary of western Guangdong". Zhanjiang city has a tropical monsoon climate and a subtropical monsoon climate. The annual active accumulated temperature ≥10 • C was 8000~8500 • C. The terrain is dominated by plains and platforms, and paddy fields are mainly distributed in coastal plains and intermountain basins. The rice planting cycle in Zhanjiang City is mainly from April to December. The planting system is a one-year multi-cropping system dominated by double cropping indica rice, which implements water and drought rotation with sugarcane, peanut, potato, beans, and other crops in the same year or the next year.

SAR Data
To fully ensure the integrity of the rice planting cycle in the SAR time series data, total of 66 C-band (frequency = 5.406 GHz, wavelength~6 cm) SAR images of the Sentinel-1A (S1A) satellite spanning March 2019 to December 2019 were used. The Sentinel-1 images used were dual polarization (VV and VH) GRD products in interferometric broadband (IW) imaging mode [51]. The coverages of the adjacent track S1A data used in this paper are presented in Figure 1b, and the list of SAR data is shown in Table 1.

Methodology
As mentioned above, the following issues are present in the research of rice extraction from multitemporal SAR data: (1) it is very difficult to construct rice samples using only SAR time series data without rice prior distribution information; (2) the rice planting cycle in tropical or subtropical areas is complex, and the existing rice extraction methods do not make full use of the temporal characteristics of rice, and the classification accuracy needs to be improved; (3) additionally, small rice plots are often affected by small roads and shadows. There are some false alarms in the extraction results, so the classification results need to be optimized. Therefore, this paper proposes a rice extraction and mapping method using multitemporal SAR data, as shown in Figure 2. This research was conducted in the following parts: (1) pixel-level rice sample production based on temporal statistical characteristics; (2) the BiLSTM-Attention network model constructed by combining BiLSTM model and attention mechanism for rice region, and (3) the optimization of classification results based on FROM-GLC10 data.

Preprocessing
Because VH polarization is superior to VV polarization in monitoring rice phenology, especially during the rice flooding period [52,53], the VH polarization was chosen. Several preprocessing steps were carried out. First, the S1A level-1 GRD data format were imported to generate the VH intensity images. Second, the multitemporal intensity image in the same coverage area were registered using ENVI software. Then, the De Grandi Spatio-temporal Filter was used to filter the intensity image in the time-space combination domain. Finally, Shuttle Radar Topography Mission (SRTM)-90 m DEM was used to calibrate and geocode the intensity map, and the intensity data value was converted into the backscattering coefficient on the logarithmic dB scale. The pixel size of the orthophoto is 10 m, which is reprojected to the UTM region 49 N in the WGS-84 geographic coordinate system.

Time Series Curves of Different Landcovers
To understand the time series characteristics of rice and non-rice in the study area, typical rice, buildings, water, and vegetation samples in the study area were selected for time series curve analysis. The sample areas of four types of ground objects were selected from 157-63, 157-66 and 84-65, a total of 12 sample boxes were selected, and each box contains 20 pixels * 20 pixels. Figure 3 shows the distribution diagram of the selected four types of landcovers.
The average of 400 sample points in each box was calculated to obtain the average of the time series curve of the four types of landcovers, as shown in Figure 4.
Among the four types of ground objects, the average backscattering coefficient of buildings was the highest, and that of water was the lowest. The average backscattering coefficient of non-rice vegetation was higher than that of rice. Moreover, because there was no flooding period for non-rice vegetation, the minimum value of its time series curve was greater than that of rice.  Different from other dryland crops and vegetation, there was an agricultural flooding period in the growth process of rice, at which the backscattering coefficient of rice was close to that of water. The transplanting time of early rice was approximately April, and the harvesting time was approximately from the end of July to the beginning of August. The transplanting time of late rice was approximately from the end of July to the beginning of August, and the harvesting time was approximately December. The rice in the three frames was rice-1, rice-2 and rice-3. They started transplanting at the corresponding first time, when the rice was in the flooding period. With the growth of rice, the backscattering coefficient reached the maximum at almost the eighth time. When the rice entered the mature stage, the backscattering coefficient began to decrease, and the harvest was completed at the beginning of August and entered the next growth cycle of late rice. The results showed that the growth cycle of rice in the three frames had a certain synchronization. Although the data of the three frames at the corresponding time were not completely consistent, the maximum time difference was only 6 days, which was not enough to affect the phenological analysis of rice. The backscatter curves of three rice samples had some fluctuations, and a possible explanation was different soil conditions.

Rice Sample Production Based on Optimal Time Series Statistical Parameters
In order to calculate the efficiency, four simple time series statistical parameters were selected for comparative analysis of four ground objects, including maximum, minimum, average and variance. The average represents the relatively concentrated position in the time series data, the maximum value and the time series minimum value reflect the range of data change, and the variance reflects the dispersion of time series data. The results were shown in Figure 5. According to Figure 5, the maximum value of rice was close to the vegetation, the minimum value of rice was close to the water body, the variance of rice was large, and the average was lower than that of vegetation. The maximum, minimum, and average values of buildings were the highest. The maximum, minimum, and the average of the water body were the lowest.
Then, the three parameters were arbitrarily selected to synthesize the false color composite image. As shown in Figure 6, rice was displayed in red in combination 1 (R: maximum G: minimum B: average), blue-purple in combination 2 (R: maximum G: minimum B: variance) and combination 3 (R: maximum G: average B: variance), and dark blue-purple in combination 4 (R: Average G: Minimum B: Variance). Two subareas containing rice and water were selected for comparative analysis, as shown in Figures 7 and 8. First, which false color image had high color discrimination between rice area and other ground objects was analyzed. As shown in Figure 7a, the central belt area was a rice area. In Figure 7b, the area was shown in red, which was significantly different from the green shown by the surrounding features. In Figure 7c-e the rice area was shown in blue, but some surrounding non-rice areas, such as small roads and building shadows, were also shown in blue, which was easy to be confused with rice and was not conducive to rice sample extraction. Therefore, the combination (R: maximum G: minimum B: average) shown in Figure 7b was more suitable for extracting rice areas. The next step was to confirm the separability of rice and water on the false color images. Figure 8a showed an area containing both rice and water. Figure 8c-e could not accurately distinguish rice from water, only the color of rice area and water in Figure 8b was significantly different. Therefore, according to the above results, among the four time series statistical parameter combinations, the combination 1, i.e., the maximum, minimum, and average synthetic false color images, had the best visualization effect on rice in this study area. After determining the optimal time series statistical parameter characteristic image, the production of SAR rice sample sets was carried out. The specific steps are as follows: (1) based on the rice position displayed in the false color image, the corresponding rice position in the time series SAR data is preliminarily determined to quickly locate the rice region; (2) cross validation using Google Earth's optical data; (3) manually draw the boundaries of rice and non-rice plots and complete the production of sample sets.
The samples were divided into two classes: rice and non-rice. The distribution of sample points is shown in Figure 9. There were 300,000 sample points (150,000 sample points in each category), of which 210,000 samples were used for model training (70%), 60,000 samples were used for model training verification (20%), and 30,000 samples were used for the model performance test (10%).

BiLSTM-Attention Model
The Bi-LSTM structure consists of a forward LSTM layer and a backward LSTM layer, which can be used to understand the past and future information in time series data [46]. Because the output of the BiLSTM model at a given time depends on both the previous time period and the next time period, the BiLSTM model has a stronger ability to process contextual information than the one-way LSTM model.
The rice planting patterns in tropical or subtropical regions are complex and diverse. The existing research methods have yet to improve the ability of learning time series information of rice, which makes it difficult to achieve high-precision extraction of rice distribution. It is necessary to strengthen the study of important temporal characteristics of rice and non-rice land types, and strengthen the separability of rice and non-rice, to improve the extraction results of rice. However, the various time-dimensional features extracted from the time series data by the BiLSTM model have the same weight in the decisionmaking process of the classification results, which will weaken the role of important time-dimensional features in the classification process and affect the classification results. Therefore, it is necessary to assign different weights to the various time-dimensional features obtained by the BiLSTM model to give full play to the contribution of different time-dimensional features to the classification results.
To solve the abovementioned issues, a BiLSTM-Attention network model was designed combining a BiLSTM model and an attention mechanism to realize high-precision rice extraction.
The core of the model was composed of two BiLSTM layers (each layer had five LSTM units, and the hidden dimension of each LSTM unit was 256), one attention layer, two full connection layers, and a softmax function, as shown in Figure 10. The input of the model was the vector composed of the sequential backscattering coefficient of VH polarization at each sample point. Since the time dimension of time series data was 22, its size was 22 * 1. Each BiLSTM layer consisted of a forward LSTM layer and a backward LSTM layer. Then, the rice timing features learned by the two BiLSTM layers were input into the attention layer. The core idea of the attention layer was to learn task-related features by suppressing irrelevant parts in pattern recognition, as shown in Figure 10.
The attention layer forced the network to focus on the rice extraction task, was more sensitive to the unique information of different classes in the time series data, paid attention to extracting the effective information that could be used for classification in the SAR time series, endowed it with the ability of different "attention", and kept consistent with the classification information in the whole time series data. When faced with more complicated rice extraction tasks in tropical and subtropical regions, the presence of the attention layer enabled the network model to reduce the misclassification of rice and non-rice.
First, the hidden vector h it obtained from the two BiLSTM layers was input into a single-layer neural network to get u it , then the transposition of u it and u w , were multiplied and then normalized by softmax to get the weight α it . Subsequently, α it and h it were multiplied and summed to get the weighted vector c i . Finally, the output of attention c i successively was sent to two fully connected layers and one softmax layer to get the final classification result.
where h it represents the hidden vector at time t of the ith sample, α it , W w and u w are the weights, b w is bias, and c it represents the output of the attention mechanism. The hidden vector h it obtained from BiLSTM obtains u it after activating the function. Additionally, u w and W w were randomly initialized.
The BiLSTM-Attention model could effectively mine the change information between the previous time and the next time in the SAR time series data and could discern the high-dimensional time features of rice and non-rice from the time series data. Additionally, by learning the variation characteristics of the temporal backscatter coefficient of the rice growth cycle and the variation characteristics of the temporal backscatter coefficient of non-rice, the model could extract the key temporal data for rice and non-rice, strengthen the ability to distinguish rice and non-rice, and help to improve the classification effect of the model.

Optimization of Classification Results Based on FROM-GLC10
Due to the fragmentation of rice plots in the study area and the impact of buildings and water bodies, there may be a misclassification of rice in the classification results. Further post-processing was needed to improve the classification results.
In 2019, the research team of Professor Gong Peng, Department of Earth System Science at Tsinghua University, released the method and results of global surface coverage mapping with 10 m resolution (FROM-GLC10), which can be passed through http://data. ess.tsinghua.edu.cn (accessed on 22 January 2021) free download. The experimental results show that the overall accuracy of FROM-GLC10 product is 72.76% [50].
As shown in Figure 3, the water layer mask and impermeable layer mask were extracted from FROM-GLC10, and then the rice classification results were optimized using the intersection of the initial extraction results and the mask layer.

Accuracy Evaluation
In this research, the precision indicators of the confusion matrix widely used in crop classification research were used, including accuracy, precision, recall, F1, and kappa [54][55][56]. (TP + TN + FN + FP) 2 (9) where TP is the number of the rice pixels truly classified as rice pixels, TN is the number of non-rice pixels truly classified as non-rice pixels, FP is the number of non-rice pixels falsely classified as rice, FN is the number of rice pixels falsely classified as non-rice pixels, and Pe is the expected accuracy.

Parameter Settings
The BiLSTM-Attention model was built through the PyTorch framework. The version of Python is 3.7, and the version of PyTorch employed in this study is 1.2.0. All the processes were performed on a Windows 7 workstation with a NVIDIA GeForce GTX 1080 Ti graphics card. The batch size was set to 64, the initial learning rate was 0.001, and the learning rate was adjusted according to the epoch training times. The attenuation step of the learning rate was 10, and the multiplication factor of the updating learning rate was 0.1. Using the Adam optimizer, the optimized loss function was cross entropy, which was the standard loss function used in all multiclassification tasks and has acceptable results in secondary classification tasks [57].

Results
In order to verify the effectiveness of our proposed method, we carried out three experiments: (1) the comparison of our proposed method with BiLSTM model and RF classification method; (2) comparative analysis before and after optimization by using FROM-GLC10; (3) comparison between our experimental results and agricultural statistics.

Comparison of Rice Classification Methods
In this experiment, the BiLSTM method and the classical machine learning method RF were selected for comparative analysis, and the five evaluation indexes introduced in Section 2.2.5 were used for quantitative evaluation.
To ensure the accuracy of the comparison results, the BiLSTM model had the same BiLSTM layers and parameter settings with the BiLSTM-Attention model. The BiLSTM model was also built through the PyTorch framework.
Random forest, like its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest spits out a class prediction and the class with the most votes becomes the model's prediction. The implementation of the RF method is shown in [58]. By setting the maximum depth and the number of samples on the node, the tree construction can be stopped, which can reduce the computational complexity of the algorithm and the correlation between sub-samples. In our experiment, RF and parameter tuning were realized by using Python and Sklearn libraries. The version of Sklearn libraries was 0.24.2. The number of trees was 100, the maximum tree depth was 22.
The quantitative results of different methods on the test dataset mentioned in the Section 2.2.3 are shown in Table 2. The accuracy of BiLSTM-Attention was 0.9351, which was significantly better than that of BiLSTM (0.9012) and RF (0.8809). This result showed that compared with BiLSTM and RF, the BiLSTM-Attention model achieved higher classification accuracy.
A test area was selected for detailed comparative analysis, as shown in Figure 11. Figure 11b shows the RF classification results. There were some broken missing areas. It was possible that the structure of RF itself limited its ability to learn the temporal characteristics of rice. The areas missed in the classification results of BiLSTM shown in Figure 11c were reduced and the plots were relatively complete. It was found that the time series curve of missed rice in the classification results of BiLSTM model and RF had obvious flooding period signal. When the signal in harvest period is not obvious, the model discriminates it into non-rice, resulting in missed detection of rice. Compared with the classification results of the BiLSTM and RF, the rice plots in the classification results of the BiLSTM-Attention in Figure 11d were relatively complete and less missing, indicating that the proposed method had better discrimination ability between rice and non-rice.

Optimization of Classification Results Based on FROM-GLC10 Data
Due to the complex background of the test area and the fragmentation of rice plots, there are still some false alarms after classification by the BiLSTM-Attention model. These false alarms are mainly small water bodies and impermeable layers. FROM-GLC10 data were used to remove false alarms in the original classification results. Figure 12 shows an example of using FROM-GLC10 data to remove false alarms. As shown in Figure 12a, this area contains water bodies and impermeable layers. As shown in Figure 12b, small roads, riverside vegetation and some water bodies were wrongly classified as rice in the original classification results. The water layer and the impermeable layer extracted from FROM-GLC10 are respectively shown in Figure 12c-e is the optimized result. The false alarm removed (Figure 12f) accounted for about 2.4% of the extracted rice area.  Figure 13 shows the distribution results of rice in Zhanjiang city using the proposed framework. The rice area in the rice distribution map was 180,012.39 ha, and the statistical data of paddy fields in Zhanjiang city in 2018 released by the Department of Natural Resources of Guangdong Province was 190,280.02 ha (http://nr.gd.gov.cn/zwgknew/ tzgg/gg/content/post_2719232.html, accessed on 6 March 2021). If the area of paddy fields was the area of rice, the overall detection accuracy of our method is 94.6%, which is in good agreement with the statistical data. The difference between the rice area in the classification results and the statistical data was 10,267.63 ha, accounting for approximately 5.4% of the statistical data. Table 3 shows the classified areas of rice in various districts and counties of Zhanjiang City. It can be seen from Table 3 that there were some omissions in in some districts and counties. There were some omissions in the extraction results of rice in Lianjiang City. The possible reason is that the terrain of Lianjiang City is mainly mountainous, and the complex terrain affects the extraction of rice. Potou District, Chikan District and Xiashan district also had the same problems. It may be that the administrative divisions in these areas are very small, the distribution of buildings and streets is very dense, and the paddy fields are small and scattered, resulting in the decline of detection accuracy. In the future, with the help of higher precision DEM data and landcover data, the omission can be reduced.

Discussion
In this study, our goal was to study how to use SAR data to extract rice in tropical or subtropical areas based on deep learning methods. Based on our proposed method, the rice area of Zhanjiang City is successfully extracted by using Sentinel-1 data.
Both the classification method based on deep learning and the traditional machine learning method need a certain amount of rice sample data. Most existing studies used the open land cover classification map drawn by government agencies as the ground truth value of rice extraction research [32,47,48], but the coverage of these land cover classification maps is limited and cannot be updated in time to meet the research needs. In addition, researchers could obtain the basic truth value of rice distribution through field investigations [43]. However, this method is time-consuming and laborious. When field investigation is impossible, rice samples are often selected based on remote sensing images. Due to the imaging mechanism of SAR images, the interpretation of SAR images is much more difficult than optical images. At present, the common solution is to locate the rice planting area by using the time series curve of the backscattering coefficient of SAR image and optical data [24,27,30,39,59]. It is a great challenge for human eyes to interpret rice region on SAR gray images. It is an effective strategy to use the combination of characteristic parameters to form a false color image to increase the color difference between rice and other ground objects as much as possible and achieve the best interpretation effect. Based on the analysis of the statistical characteristics of time series backscatter coefficients of rice and non-rice in Zhanjiang City, this paper compared the color combination methods of multiple statistical parameters, selected the feature combination strategy most suitable for extracting rice region, realized the rapid positioning of rice and improved the efficiency of sample production.
There are many successful cases of rice classification methods based on traditional machine learning or deep learning [32,39,41,52,60]. In 2016, Nguyen et al. used the decision tree method to realize rice recognition based on Sentinel-1 time series data, with an accuracy of 87.2% [52]. Bazzi et al. used RF and DT classifiers with Sentinel-1 SAR data time series between May 2017 and September 2017 to map the rice area over the Camargue region of France [32]. The overall accuracies of both methods were better than 95%. However, the derived indicators used in these machine learning methods are too dependent on the prior knowledge of specific regions, and it is difficult to be directly applied to other regions. In addition, they all studied single cropping rice and were not suitable for rice areas with complex planting patterns. Ndikumana et al. carried out a comparative experimental study of deep learning methods and traditional machine learning methods in crop classification of SAR data. Based on 25 Sentinel-1 images, they carried out crop classification in Camage, France. The experimental results showed that LSTM and GRU classifiers were significantly better than the classical methods [41]. Wang et.al combined 11 Sentinel-2 images and 23 Sentinel-1 GRD images covering the Tongxiang County of China's Zhejiang Province and then put them to the designed LSTM classifier to obtain a paddy rice map [60]. The overall accuracy was up to 0.937. Filho et al. used 60 scenes of Sentinel-1 VH data from 2017 to 2018 and BiLSTM to classify rice in Rio Grande do Sul state of Brazil [39]. The results of the BiLSTM model were better than the LSTM model. RNNs have achieved some success in the field of rice extraction, but these models give the same weight to the time dimension features with different importance in the classification decision-making process, which affects the final classification accuracy. We added the attention model to the BiLSTM model, which could fully mine the favorable time series information, gave different weights to various time dimension features in the classification decision-making process, and strengthened the separability of rice and non-rice, so as to improve the classification performance of the model.
In the absence of a large amount of prior knowledge of rice, there will inevitably be some misclassification in the original classification results, so the original classification results need to be post-processed. Many researchers used post-processing methods to optimize the classification results [36,[61][62][63]. Therefore, we used FROM-GLC10 for the post-processing of rice extraction results, which reduced the false alarm to a certain extent.
Whether compared with other methods or with statistical data, our proposed method has achieved good results, which shows that our method has certain practical value in the extraction of tropical and subtropical rice. However, there are still some deficiencies in the current research results. In mountainous areas, the mountains and shadows in SAR images cause the omission of rice. Secondly, the riverside vegetation has similar temporal characteristics with rice, which leads to false alarm in rice extraction results. In the future, we will add some negative sample training to further improve the performance of the method.

Conclusions
According to the application requirements of tropical and subtropical rice monitoring, this study proposed a set of rice extraction and mapping frameworks, including rice sample making method using time characteristics, rice classification method based on BiLSTM-Attention model, and post-processing method based on high-precision global land cover. Using 66 scenes of Sentinel-1 data in 2019 and the proposed framework, rice mapping was carried out in Zhanjiang City, China. Experimental results show that the time series feature combination strategy of time series maximum, time series minimum, and average can intuitively reflect the distribution of rice and improve the production efficiency of samples. The accuracy of rice region extraction by the proposed method is 0.9351, which is better than BiLSTM and RF methods, and the extracted plots maintain good integrity.
In the coming years, we will carry out large-scale rice mapping research based on multitemporal SAR data, further improve the classification accuracy, and promote rice yield estimation based on yield estimation models, so as to provide valuable information for the formulation of food policy.

Data Availability Statement:
The Sentinel-1 data presented in this study are openly and freely available at https://urs.earthdata.nasa.gov/, accessed on 15 April 2020.