Rice Crop Detection Using LSTM, Bi-LSTM, and Machine Learning Models from Sentinel-1 Time Series

: The Synthetic Aperture Radar (SAR) time series allows describing the rice phenological cycle by the backscattering time signature. Therefore, the advent of the Copernicus Sentinel-1 program expands studies of radar data (C-band) for rice monitoring at regional scales, due to the high temporal resolution and free data distribution. Recurrent Neural Network (RNN) model has reached state-of-the-art in the pattern recognition of time-sequenced data, obtaining a signiﬁcant advantage at crop classiﬁcation on the remote sensing images. One of the most used approaches in the RNN model is the Long Short-Term Memory (LSTM) model and its improvements, such as Bidirectional LSTM (Bi-LSTM). Bi-LSTM models are more e ﬀ ective as their output depends on the previous and the next segment, in contrast to the unidirectional LSTM models. The present research aims to map rice crops from Sentinel-1 time series (band C) using LSTM and Bi-LSTM models in West Rio Grande do Sul (Brazil). We compared the results with traditional Machine Learning techniques: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Normal Bayes (NB). The developed methodology can be subdivided into the following steps: (a) acquisition of the Sentinel time series over two years; (b) data pre-processing and minimizing noise from 3D spatial-temporal ﬁlters and smoothing with Savitzky-Golay ﬁlter; (c) time series classiﬁcation procedures; (d) accuracy analysis and comparison among the methods. The results show high overall accuracy and Kappa ( > 97% for all methods and metrics). Bi-LSTM was the best model, presenting statistical di ﬀ erences in the McNemar test with a signiﬁcance of 0.05. However, LSTM and Traditional Machine Learning models also achieved high accuracy values. The study establishes an adequate methodology for mapping the rice crops in West Rio Grande do Sul.


Introduction
Spatiotemporal monitoring of rice plantations is crucial information for the development of policies for economic growth, food security, and environmental conservation [1].In many regions of the world, official data used to estimate cultivated areas is based on surveys of statistical data through field visits and farmers' interviews, which is a very slow, expensive, and laborious process [2].In this context, remote sensing images allow systematic coverage of a wide geographic area over time, quickly and at a low cost.Thus, in recent decades, different remote sensing data and digital image processing have been developed to monitor crops [3][4][5], especially rice plantations [6][7][8].
In mapping rice paddies, many remote sensing studies were carried out before 2000, in the 1980s and 1990s, using optical images [9][10][11], microwave images [12][13][14], and a combination of these two data [15].An essential approach to rice plantation detection came with phenological behavior analysis, which requires a dense time series to distinguish it from other crops or between different rice-growing systems.In the optical image time series, various sensors have been evaluated in mapping rice planting: Landsat Series (Terrestrial Remote Sensing Satellite) [16][17][18], NOAA AVHRR [19][20][21], SPOT [22][23][24], and MODIS [25][26][27][28][29][30].However, optical data analysis requires sensors with a high temporal resolution to acquire enough cloudless images to create reliable time series.In certain places, there are severe climatic limitations to obtain cloudless images throughout the rice-growing season.
The advent of the European Space Agency's Sentinel-1A images (C-band) has significantly increased the volume of high revisit time SAR data with free availability, resulting in greater dissemination and advancement in crop mapping research.These denser time-series SAR data allow us to capture short phenological stages, increasing the classification capacity.Monitoring and mapping the pattern of rice cultivation from Sentinel-1 time-series images has been tested in different locations around the world: Bangladesh [56], China [57], France [58], India [56,59], Japan [60], Spain [57], USA [57], Vietnam [57,[61][62][63][64].Some studies also integrate the Sentinel-1 image with optical images from the Landsat 8 Operational Land Imager (OLI) and the Sentinel-2 A/B [65][66][67][68].
Recently, Deep Learning has reached the state-of-the-art in the field of computer vision, obtaining a significant advantage in the classification of natural images [69][70][71] and remote sensing data [72][73][74].In studies of dense time series, the Recurrent Neural Network (RNN) methods are the most promising due to their ability to analyze sequential data [75].Among RNNs, the Long Short-Term Memory (LSTM) model [76] is widely used to capture time correlation efficiently.Therefore, LSTM allows the evaluation of plantations' phenological variation, detecting pixel coherence between time sequence data.In remote sensing data, the LSTM model was applied in: change detection in bi-temporal images [75,77], time-series classification to distinguish crops [78][79][80][81][82], land use/land cover [83,84], and vegetation dynamics [85].
This research aims to evaluate methods to detect rice cultivation in southern Brazil from the Sentinel-1 time series, using the LSTM and Bidirectional LSTM (Bi-LSTM) methods.The study compared the results based on LSTM and Bidirectional LSTM (Bi-LSTM) with traditional machine learning techniques: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Normal Bayes (NB).

Study Area
Brazil was the tenth largest producer of milled rice in 2018/2019 [86].Apart from Asian countries, Brazil became the most significant global producer.However, the southern region concentrates the national production, with 80% of the production, where the State of Rio Grande do Sul is the largest Brazilian producer [87].The Rio Grande do Sul (composed of 497 municipalities) is the ninth-largest Brazilian State (281,730.2km 2 ), representing more than 3% of the Brazilian territory.In 2019, the State's population was 11,377,278 habitants (6% of Brazil) with a demographic density of 40.3 habitants/km 2 [88].The production of rough rice in the Rio Grande do Sul in 2018 was 8401 million tons in a planted area of 1068 million hectares, representing 71% of the national production, which was 11,749 million tons [89].The study area is the rice-producing region of Uruguaiana, which has been successively the national leader in rice production among Brazilian municipalities [89].It covers the municipality and its surroundings in the Uruguay River basin, located west of the State of Rio Grande do Sul, on the border with Argentina (Figure 1).national production, with 80% of the production, where the State of Rio Grande do Sul is the largest Brazilian producer [87].The Rio Grande do Sul (composed of 497 municipalities) is the ninth-largest Brazilian State (281,730.2km 2 ), representing more than 3% of the Brazilian territory.In 2019, the State's population was 11,377,278 habitants (6% of Brazil) with a demographic density of 40.3 habitants/km 2 [88].The production of rough rice in the Rio Grande do Sul in 2018 was 8401 million tons in a planted area of 1068 million hectares, representing 71% of the national production, which was 11,749 million tons [89].The study area is the rice-producing region of Uruguaiana, which has been successively the national leader in rice production among Brazilian municipalities [89].It covers the municipality and its surroundings in the Uruguay River basin, located west of the State of Rio Grande do Sul, on the border with Argentina (Figure 1).This region is on the Campaign Plateau with the presence of flatforms associated with the Uruguay River and its tributaries (Quaraí, Ibicui, and Butuí-Icamaquã) [90,91].The rocky substrate consists of intermediate volcanic rocks (andesites and basalts) [92].The area belongs to the Pampa Biome, where the Vachellia caven grasslands predominate [93].However, there is a high grassland conversion to crop areas, mainly from rice and secondarily to soybeans, corn, and vegetables.
However, the expansion of irrigated rice fields in flooded plain intensifies the fragmentation of wetlands in southern Brazil, which contains approximately 72% of the fragments smaller than 1 km 2 [94].Many studies analyze the effects of rice plantations in south Brazil biodiversity [95][96][97][98].In this context, Maltchik et al. [99] propose good practices in managing rice production to conserve biodiversity within production systems.Besides, increasing irrigation to obtain higher productivity than rainfed cultivation intensifies the need for water resource management to avoid contingencies and equalize the demand for agriculture, livestock, and industry.

Materials and Methods
Image processing included the following steps (Figure 2 This region is on the Campaign Plateau with the presence of flatforms associated with the Uruguay River and its tributaries (Quaraí, Ibicui, and Butuí-Icamaquã) [90,91].The rocky substrate consists of intermediate volcanic rocks (andesites and basalts) [92].The area belongs to the Pampa Biome, where the Vachellia caven grasslands predominate [93].However, there is a high grassland conversion to crop areas, mainly from rice and secondarily to soybeans, corn, and vegetables.
However, the expansion of irrigated rice fields in flooded plain intensifies the fragmentation of wetlands in southern Brazil, which contains approximately 72% of the fragments smaller than 1 km 2 [94].Many studies analyze the effects of rice plantations in south Brazil biodiversity [95][96][97][98].In this context, Maltchik et al. [99] propose good practices in managing rice production to conserve biodiversity within production systems.Besides, increasing irrigation to obtain higher productivity than rainfed cultivation intensifies the need for water resource management to avoid contingencies and equalize the demand for agriculture, livestock, and industry.

Materials and Methods
Image processing included the following steps (Figure 2

Dataset and Pre-Processing
The research used a Sentinel-1 time series over two years (2017 and 2018).The images are available free of charge by the European Space Agency (ESA) (https://scihub.copernicus.eu/dhus/#/home).The Sentinel-1 images correspond to the C band (5.4 GHz) with four modes with different spatial resolutions and swath width [100].The present research used the product Ground Range Detected (GRD) in Interferometric Wide Swath mode, with a 10-m resolution image and a large swath width (250 km).The Sentinel-1 revisit cycle is 12 days, obtaining 30 images per year for the study area, which totals 60 images for the 2017-2018 period.We adopted two years of images because the rice cycle starts in half a year and ends in the next.Besides, investigates indicate that the adoption of two-year time signatures facilitates the detection of targets

Dataset and Pre-Processing
The research used a Sentinel-1 time series over two years (2017 and 2018).The images are available free of charge by the European Space Agency (ESA) (https://scihub.copernicus.eu/dhus/#/home).The Sentinel-1 images correspond to the C band (5.4 GHz) with four modes with different spatial resolutions and swath width [100].The present research used the product Ground Range Detected (GRD) in Interferometric Wide Swath mode, with a 10-m resolution image and a large swath width (250 km).The Sentinel-1 revisit cycle is 12 days, obtaining 30 images per year for the study area, which totals 60 images for the 2017-2018 period.We adopted two years of images because the rice cycle starts in half a year and ends in the next.Besides, investigates indicate that the adoption of two-year time signatures facilitates the detection of targets with phenological cycles [101,102].This temporal resolution allows following the phenological behavior of the rice plantation.In this research, we used Vertical transmit and Horizontal receive (VH) polarization since other studies have shown that the VH polarized backscatter is more sensitive to rice-crop growth than other polarizations [62,65].The pre-processing of the images used the Sentinel Application Platform (SNAP) software, considering the following workflow [103]: apply orbit file; calibration (procedure that converts digital pixel values to radiometrically calibrated SAR backscatter); a range-Doppler terrain correction by utilizing the SRTM digital terrain model; and linear conversion in decibels (dB).The corrected images (2017-2018) were stacking in just one file generated a temporal cube.

SAR Image Denoising
The speckle is inherent in SAR images, establishing a granular appearance with light and dark pixels, affecting the identification of surface targets.Speckle increases confusion between different types of surfaces, which are worst in low contrast areas.Therefore, noise filtering is a prerequisite for radar imagery applications.Following other strategies to reduce noise and create a high-quality satellite image time series, we use a two-stage filtering scheme [104,105]: (a) elimination of speckle noise using a 3D-Gamma filter, and (b) smoothing of time series using the method of Savitzky and Golay (S-G) [106].
Most filters operate in the spatial domain, using moving windows that perform statistical calculations of central value or adaptive methods.However, noise reduction in the SAR time series allows approaches that integrate the spatial and temporal dimensions to improve the quality of noise reduction [107][108][109][110].In this context, three-dimensional (3D) filters use spatial-temporal neighborhood information to derive the noise-free value [111,112] (Figure 3).Considering only the spatial domain, the effect of sliding window filters only increases with the expansion of the window size, which reduces the effective spatial resolution.On the other hand, spatiotemporal filters increase the amount of information in the time domain, reducing the edge-blurred effect.For example, the central pixel value of a 3 × 3 window uses nine pixels in the calculation, while a 3 × 3 × 3 cubic window uses 27 pixels in the same area.We adapted the Gamma adaptive filter [113] for a 3D window considering spatiotemporal data.The Gamma filter allows image conservation, filtering homogeneous surfaces, and preserving the edges.The window size used was 5 × 5 in the spatial domain and 9 in the temporal dimension.
with phenological cycles [101,102].This temporal resolution allows following the phenological behavior of the rice plantation.In this research, we used Vertical transmit and Horizontal receive (VH) polarization since other studies have shown that the VH polarized backscatter is more sensitive to rice-crop growth than other polarizations [62,65].The pre-processing of the images used the Sentinel Application Platform (SNAP) software, considering the following workflow [103]: apply orbit file; calibration (procedure that converts digital pixel values to radiometrically calibrated SAR backscatter); a range-Doppler terrain correction by utilizing the SRTM digital terrain model; and linear conversion in decibels (dB).The corrected images (2017-2018) were stacking in just one file generated a temporal cube.

SAR Image Denoising
The speckle is inherent in SAR images, establishing a granular appearance with light and dark pixels, affecting the identification of surface targets.Speckle increases confusion between different types of surfaces, which are worst in low contrast areas.Therefore, noise filtering is a prerequisite for radar imagery applications.Following other strategies to reduce noise and create a high-quality satellite image time series, we use a two-stage filtering scheme [104,105]: (a) elimination of speckle noise using a 3D-Gamma filter, and (b) smoothing of time series using the method of Savitzky and Golay (S-G) [106].
Most filters operate in the spatial domain, using moving windows that perform statistical calculations of central value or adaptive methods.However, noise reduction in the SAR time series allows approaches that integrate the spatial and temporal dimensions to improve the quality of noise reduction [107][108][109][110].In this context, three-dimensional (3D) filters use spatial-temporal neighborhood information to derive the noise-free value [111,112] (Figure 3).Considering only the spatial domain, the effect of sliding window filters only increases with the expansion of the window size, which reduces the effective spatial resolution.On the other hand, spatiotemporal filters increase the amount of information in the time domain, reducing the edge-blurred effect.For example, the central pixel value of a 3 × 3 window uses nine pixels in the calculation, while a 3 × 3 × 3 cubic window uses 27 pixels in the same area.We adapted the Gamma adaptive filter [113] for a 3D window considering spatiotemporal data.The Gamma filter allows image conservation, filtering homogeneous surfaces, and preserving the edges.The window size used was 5 × 5 in the spatial domain and 9 in the temporal dimension.Besides, we use the Savitzky and Golay (S-G) method for smoothing the SAR data in the temporal domain [104].The S-G filter combines noise elimination and preserves the phenological attributes (height, shape, and asymmetry) [101,114,115].This procedure established a long-term change trend curve that eliminates small interferences still present and establishes a gradual process of the annual rice-crop cycle.

RNN and Traditional Machine Learning Models
RNN architectures are very efficient when dealing with sequenced data, and its application has increased in the last few years in many scientific areas [116].Hochreiter et al. [76] proposed Long Short-Term Memory (LSTM) architecture, one of the most common RNN architecture, which shows state-of-the-art results for sequential data tasks because of its ability to capture long time Besides, we use the Savitzky and Golay (S-G) method for smoothing the SAR data in the temporal domain [104].The S-G filter combines noise elimination and preserves the phenological attributes (height, shape, and asymmetry) [101,114,115].This procedure established a long-term change trend curve that eliminates small interferences still present and establishes a gradual process of the annual rice-crop cycle.

RNN and Traditional Machine Learning Models
RNN architectures are very efficient when dealing with sequenced data, and its application has increased in the last few years in many scientific areas [116].Hochreiter et al. [76] proposed Long Short-Term Memory (LSTM) architecture, one of the most common RNN architecture, which shows state-of-the-art results for sequential data tasks because of its ability to capture long time dependencies [117].The architecture structure presents a memory cell that holds a current state over different sequential instances and non-linear dependencies that control information entering and exiting the cell [117,118].Figure 4 shows LSTM architecture, where Xt is the input vector, Ct is the memory from the current block, and ht is the output of the current block.Sigmoid (σ) and hyperbolic tangent (tanh) are the nonlinearities and the vector operations are element-wise multiplication (x) and element-wise concatenation (+).Furthermore, Ct-1 and ht-1 are the memory and output from the previous block.
dependencies [117].The architecture structure presents a memory cell that holds a current state over different sequential instances and non-linear dependencies that control information entering and exiting the cell [117,118].Figure 4 shows LSTM architecture, where Xt is the input vector, Ct is the memory from the current block, and ht is the output of the current block.Sigmoid (σ) and hyperbolic tangent (tanh) are the nonlinearities and the vector operations are element-wise multiplication (x) and element-wise concatenation (+).Furthermore, Ct-1 and ht-1 are the memory and output from the previous block.Many studies have been carried out to improve the LSTM architecture.One of the most powerful approaches is Bidirectional LSTM (Bi-LSTM).Bi-LSTM models are generally more effective when handling contextual information since their output at a given time depends on both previous and next segments, contrasting with LSTM models that are unidirectional [119,120].The Bi-LSTM architecture has a forward and a backward layer that consists of LSTM units to apprehend past and future information (Figure 5).Many studies have been carried out to improve the LSTM architecture.One of the most powerful approaches is Bidirectional LSTM (Bi-LSTM).Bi-LSTM models are generally more effective when handling contextual information since their output at a given time depends on both previous and next segments, contrasting with LSTM models that are unidirectional [119,120].The Bi-LSTM architecture has a forward and a backward layer that consists of LSTM units to apprehend past and future information (Figure 5).dependencies [117].The architecture structure presents a memory cell that holds a current state over different sequential instances and non-linear dependencies that control information entering and exiting the cell [117,118].Figure 4 shows LSTM architecture, where Xt is the input vector, Ct is the memory from the current block, and ht is the output of the current block.Sigmoid (σ) and hyperbolic tangent (tanh) are the nonlinearities and the vector operations are element-wise multiplication (x) and element-wise concatenation (+).Furthermore, Ct-1 and ht-1 are the memory and output from the previous block.Many studies have been carried out to improve the LSTM architecture.One of the most powerful approaches is Bidirectional LSTM (Bi-LSTM).Bi-LSTM models are generally more effective when handling contextual information since their output at a given time depends on both previous and next segments, contrasting with LSTM models that are unidirectional [119,120].The Bi-LSTM architecture has a forward and a backward layer that consists of LSTM units to apprehend past and future information (Figure 5).In the present research, we used the LSTM and Bi-LSTM with two hidden layers (122 neurons in the first layer and 61 neurons in the second layer) and softmax activation function (7 outputs).We trained the models with the following hyperparameters: (a) 5000 epochs; (b) dropout rate of 0.2; (c) Adam optimizer with a starting learning rate of 0.01 (divided by 10 after 1000 epochs); and (d) mini-batch size of 128.Moreover, we used categorical cross-entropy loss function [121] presented in Equation ( 1), where i is the element's number, y is the ground truth label, y' is the predicted label, and K is the number of classes.Furthermore, to generalize the behavior from the loss function, Equation (2) shows the cost where Ω is the total set of weights and m is the total number of elements.
In this paper, we compared the RNN methods (LSTM and Bi-LSTM) with traditional Machine Learning methods used in the remote sensing application: (1) Support Vector Machine (SVM) [122]; (2) Random Forest (RF) [123][124][125]; (3) k-NN [126], and (4) Normal Bayes.The SVM algorithm allocates prediction vectors into a higher dimensional plane, separating the different classes using linear or non-linear kernel functions [127].The Random Forest algorithm introduced by Breiman [128] combines a high amount of binary classification trees with several bootstrap values, obtaining the final value from the majority vote from all individual trees.The k-NN method is widely used in remote sensing imagery and does not require training a model since the classification of a given point assumes the majority vote from its k nearest neighbors [129].For last, the Bayesian classifier predicts a given class considering feature vectors from each class are normally distributed, usually presenting better results in data that has a high degree of feature dependencies [130].
The dataset was a manual collection of 28,000-pixel samples (denoised temporal signatures), containing seven classes with equal distribution (4000 samples per class), showing a well-distributed, systematic, and stratified sampling [131] (Figure 6).In this manual sampling of the ground truth points, we used previous surveys of mapping rice crops and high-resolution images from Google Earth to determine the exact location of known points.The sampling process considered balanced data sets that positively impacted the classification results [132].The train/validation/test split had a total of 19,600-pixel samples for training (70%), 5600-pixel samples for validation (20%), and 2800-pixel samples for testing (10%).The mapping considered the following land-use/land-cover classes: rice crops, the fallow period between rice crops, bodies of water (rivers, lakes, and reservoir), riparian forest/reforestation, grassland, flooded grassland, and other types of crops.We disregard the areas of the city using a mask.In the study region, the vast majority of the rice grown occurred in lowland areas using flood irrigation.The visual interpretation of high-resolution images quickly detects this type of management.The planting has a characteristic pattern, containing the dikes that follow the contour curve and the transversal dikes that divide the land into rectangular plots.The captured and retention of water allows the field to flood during the growing season.

Accuracy Assessment
Accurate analysis of remote sensing image classifications is essential to compare algorithms and establish product quality.In the present study, we use the accuracy metrics from confusion matrix widely used in studies of land-use/land-cover classification, including the overall accuracy, Kappa coefficient, and commission and omission errors [133,134].Overall accuracy is the most straightforward metric, calculated by dividing the sum of pixels correctly classified by the total number of pixels analyzed [134].Congalton [133] introduced the Kappa coefficient for accuracy analysis of remote sensing data.The Kappa coefficient is a popular measure of agreement among raters (intermediate reliability) with a value range of −1 to +1 [135].Commission and omission errors are a very effective way of representing the accuracy of each classified category.Commission error (Type I-false positive) occurs when classifying a pixel as a land-cover/land-use category that is not.Meanwhile, the omission error (Type II-false negative) is not to classify the pixel as a land-cover/landuse category when it is.
We used the McNemar test [136] to assess the statistical significance of differences in remote sensing classification accuracy [137].McNemar's test principle is based on the evaluation of a contingency table with a 2 × 2 dimension considering only correct and incorrect points for two different methods (Table 1).

Accuracy Assessment
Accurate analysis of remote sensing image classifications is essential to compare algorithms and establish product quality.In the present study, we use the accuracy metrics from confusion matrix widely used in studies of land-use/land-cover classification, including the overall accuracy, Kappa coefficient, and commission and omission errors [133,134].Overall accuracy is the most straightforward metric, calculated by dividing the sum of pixels correctly classified by the total number of pixels analyzed [134].Congalton [133] introduced the Kappa coefficient for accuracy analysis of remote sensing data.The Kappa coefficient is a popular measure of agreement among raters (intermediate reliability) with a value range of −1 to +1 [135].Commission and omission errors are a very effective way of representing the accuracy of each classified category.Commission error (Type I-false positive) occurs when classifying a pixel as a land-cover/land-use category that is not.Meanwhile, the omission error (Type II-false negative) is not to classify the pixel as a land-cover/land-use category when it is.
We used the McNemar test [136] to assess the statistical significance of differences in remote sensing classification accuracy [137].McNemar's test principle is based on the evaluation of a contingency table with a 2 × 2 dimension considering only correct and incorrect points for two different methods (Table 1).In the relative comparison between different classification algorithms, the chi-square (χ2) statistic has one degree of freedom and uses exclusively discordant samples (f 12 e f 21 ) (Equation ( 3)).The null hypothesis of marginal homogeneity means equivalent results between the two classifications.If the result χ 2 is higher than the values of the χ 2 distribution table, we reject the null hypothesis and assume that the marginal proportion of each classification method is significant and with different behavior.As in other studies that compare classification algorithms, we used the same set of validation data for McNemar's analysis, consisting of a sample independent of the training set.Therefore, the test used 1200 samples per mapping unit (totaling 8400 samples).

Image Denoising
The combination of the 3D Gamma and S-G filters exhibited good results in the noise elimination of the Sentinel time series.The 3D Gamma filter provides an intense minimization of speckles, but still maintains some minor unwanted alterations in the temporal trajectories.The S-G filter with a window size of 19 refined the result obtained by the 3D Gamma filter, smoothing the temporal profile without interfering with the maximum and minimum values and ensuring data integrity.Figure 7 demonstrates the effects of the application of filtering techniques in the Sentinel-1 time series.

Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 26
In the relative comparison between different classification algorithms, the chi-square (χ2) statistic has one degree of freedom and uses exclusively discordant samples (f12 e f21) (Equation ( 3)).The null hypothesis of marginal homogeneity means equivalent results between the two classifications.If the result χ 2 is higher than the values of the χ 2 distribution table, we reject the null hypothesis and assume that the marginal proportion of each classification method is significant and with different behavior.As in other studies that compare classification algorithms, we used the same set of validation data for McNemar's analysis, consisting of a sample independent of the training set.Therefore, the test used 1200 samples per mapping unit (totaling 8400 samples).

Image Denoising
The combination of the 3D Gamma and S-G filters exhibited good results in the noise elimination of the Sentinel time series.The 3D Gamma filter provides an intense minimization of speckles, but still maintains some minor unwanted alterations in the temporal trajectories.The S-G filter with a window size of 19 refined the result obtained by the 3D Gamma filter, smoothing the temporal profile without interfering with the maximum and minimum values and ensuring data integrity.Figure 7 demonstrates the effects of the application of filtering techniques in the Sentinel-1 time series.Effect of noise elimination in rice crop temporal trajectory using a 3D convolutional window in the spatial direction "x" and "y" and the temporal direction "t" (x = 5, y = 5, t = 9) and a second smoothing filter using the Savitzky and Golay (S-G) method.The original data is in black, 3D Gamma filter in red, and smoothing with S-G method in green.

Temporal Backscattering Signatures
The study area is flat, which reduces the sensitivity to terrain variations and improves the detection of targets.Figures 8-13 show the average backscattering time series in the cross-polarized data (vertical transmit, horizontal receive-VH) with their corresponding standard deviation for all targets and detailed images of some classes.The temporal profiles show distinctive seasonal backscatter patterns for the analyzed targets, which offers a high contrast of the rice cultivation areas to their surroundings.The riparian forest shows a backscattering time series with the highest values over the whole time series with a stable backscatter between −15.50 and −13.00 dB.Open water Figure 7. Effect of noise elimination in rice crop temporal trajectory using a 3D convolutional window in the spatial direction "x" and "y" and the temporal direction "t" (x = 5, y = 5, t = 9) and a second smoothing filter using the Savitzky and Golay (S-G) method.The original data is in black, 3D Gamma filter in red, and smoothing with S-G method in green.

Temporal Backscattering Signatures
The study area is flat, which reduces the sensitivity to terrain variations and improves the detection of targets.Figures 8-13 show the average backscattering time series in the cross-polarized data (vertical transmit, horizontal receive-VH) with their corresponding standard deviation for all targets and detailed images of some classes.The temporal profiles show distinctive seasonal backscatter patterns for the analyzed targets, which offers a high contrast of the rice cultivation areas to their surroundings.The riparian forest shows a backscattering time series with the highest values over the whole time series with a stable backscatter between −15.50 and −13.00 dB.Open water surfaces have the lowest backscatter values (<−24.5 dB throughout the year) since the surface has a specular behavior that reflects the emitted sensor energy.
The rice temporal series almost had a Gaussian behavior for the crop cycles (Figure 8).This Gaussian behavior for rice planting is in agreement with the study developed by Bazzi et al. [58].In the study region (Western Frontier and Upper Valley of Uruguay), favorable sowing periods for medium-cycle cultivars are from September 21 to November 20 [138].While for early-cycle cultivars are from 11 October to 10 December [138].Although varieties with different cycle lengths can coincide, early-cycle cultivars are delayed by about 10 to 15 days to avoid the critical cold period in December that is harmful to the crop [138].Very late sowing is avoided due to low productivity levels.The harvested area of the 2017/18 crop reached the percentage of 35% of the area harvested close to 29 March and 95% close to 10 May.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 26 surfaces have the lowest backscatter values (<−24.5 dB throughout the year) since the surface has a specular behavior that reflects the emitted sensor energy.The rice temporal series almost had a Gaussian behavior for the crop cycles (Figure 8).This Gaussian behavior for rice planting is in agreement with the study developed by Bazzi et al. [58].In the study region (Western Frontier and Upper Valley of Uruguay), favorable sowing periods for medium-cycle cultivars are from September 21 to November 20 [138].While for early-cycle cultivars are from 11 October to 10 December [138].Although varieties with different cycle lengths can coincide, early-cycle cultivars are delayed by about 10 to 15 days to avoid the critical cold period in December that is harmful to the crop [138].Very late sowing is avoided due to low productivity levels.The harvested area of the 2017/18 crop reached the percentage of 35% of the area harvested close to 29 March and 95% close to 10 May.In addition to the dates of cultivation, an essential factor in the description of temporal signatures is agronomic practices.The irrigated rice tillage systems for the 2017/2018 harvest in Western Frontier (RS) showed a predominance of minimal tillage use by 74.5% of the planted areas [139].The remaining regions present 24.1% with the traditional system and 0.3% with the sum of notillage and the pre-germinated system [139].The minimum tillage promotes soil conservation, making management with low interference that supports the presence of vegetation cover and reduces surface irregularities caused by harvesters.Sowing takes place directly in the soil, under a vegetation cover previously desiccated with herbicide.According to Li et al. [140], the minimum tillage system can generate a grain yield of 2.1% higher than conventional planting and produce economic benefits by 11.0%.Therefore, the high use of minimum tillage is a simple conservation practice in humid areas, not preventing rice transplantation or increasing labor costs.
Thus, the rice-crop time series shows the lowest σ° values at the beginning of planting between September and November (with average value on 28 October, 2017).During this period, the use of herbicide eliminates the vegetation cover, followed by flooding of rice fields, resulting in low SAR values.Figure 9 shows the Google Earth image from the beginning of the flood of the field in 2016.In addition to the dates of cultivation, an essential factor in the description of temporal signatures is agronomic practices.The irrigated rice tillage systems for the 2017/2018 harvest in Western Frontier (RS) showed a predominance of minimal tillage use by 74.5% of the planted areas [139].The remaining regions present 24.1% with the traditional system and 0.3% with the sum of no-tillage and the pre-germinated system [139].The minimum tillage promotes soil conservation, making management with low interference that supports the presence of vegetation cover and reduces surface irregularities caused by harvesters.Sowing takes place directly in the soil, under a vegetation cover previously desiccated with herbicide.According to Li et al. [140], the minimum tillage system can generate a grain yield of 2.1% higher than conventional planting and produce economic benefits by 11.0%.Therefore, the high use of minimum tillage is a simple conservation practice in humid areas, not preventing rice transplantation or increasing labor costs.
Thus, the rice-crop time series shows the lowest σ • values at the beginning of planting between September and November (with average value on 28 October 2017).During this period, the use of herbicide eliminates the vegetation cover, followed by flooding of rice fields, resulting in low SAR values.Figure 9 shows the Google Earth image from the beginning of the flood of the field in 2016.With the growth of planting, the σ • values gradually increase, reaching its peak in May, when the backscatter values start to fall due to the beginning of the harvest.The decrease is gradual due to the practice of minimum tillage that keeps the vegetation cover.The lowest values return in 2018 between October and November with the beginning of the new planting cycle.Figure 10 shows Google Earth and Sentinel-1 images related to different moments of the rice plantation and their respective positions in the temporal trajectory.In the time signature, the date of 13 January 2017 shows the stage of development of the planting, while the period of 27 August 2018 shows the initial phase of the new planting cycle.

Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 26
With the growth of planting, the σ° values gradually increase, reaching its peak in May, when the backscatter values start to fall due to the beginning of the harvest.The decrease is gradual due to the practice of minimum tillage that keeps the vegetation cover.The lowest values return in 2018 between October and November with the beginning of the new planting cycle.Figure 10 shows Google Earth and Sentinel-1 images related to different moments of the rice plantation and their respective positions in the temporal trajectory.In the time signature, the date of 13 January, 2017 shows the stage of development of the planting, while the period of 27 August, 2018 shows the initial phase of the new planting cycle.The rice planting areas with irrigation dikes also exhibit temporal signatures without a significant drop in σ° values (Figure 11).Generally, these areas correspond to a fallow period that can extend for up to two years and are sometimes underutilized by cattle.The rice planting areas with irrigation dikes also exhibit temporal signatures without a significant drop in σ° values (Figure 8).The presence of other plantations was merged in a class characterized by slightly higher values and lower amplitude than rice planting (Figure 12).The rice planting areas with irrigation dikes also exhibit temporal signatures without a significant drop in σ • values (Figure 11).Generally, these areas correspond to a fallow period that can extend for up to two years and are sometimes underutilized by cattle.The rice planting areas with irrigation dikes also exhibit temporal signatures without a significant drop in σ • values (Figure 8).The presence of other plantations was merged in a class characterized by slightly higher values and lower amplitude than rice planting (Figure 12).Finally, we separated the floodplain areas with a distinctive feature due to the flood event's presence in May-June 2017 (Figure 13).Therefore, these flooded grasslands show a significant decrease in backscatter values during the flood event (blue bar), acquiring well-marked temporal signatures (Figure 13).The duration of the slope in the backscatter values depends on the extent of the flood and its intensity may vary according to its position on the ground.Finally, we separated the floodplain areas with a distinctive feature due to the flood event's presence in May-June 2017 (Figure 13).Therefore, these flooded grasslands show a significant decrease in backscatter values during the flood event (blue bar), acquiring well-marked temporal signatures (Figure 13).The duration of the slope in the backscatter values depends on the extent of the flood and its intensity may vary according to its position on the ground.Finally, we separated the floodplain areas with a distinctive feature due to the flood event's presence in May-June 2017 (Figure 13).Therefore, these flooded grasslands show a significant decrease in backscatter values during the flood event (blue bar), acquiring well-marked temporal signatures (Figure 13).The duration of the slope in the backscatter values depends on the extent of the flood and its intensity may vary according to its position on the ground.

Comparison between RNN and Traditional Machine Learning Methods
The training stage for each model presented different procedures to achieve the optimal configuration.To obtain the best LSTM and Bi-LSTM models, we monitored the categorical validation loss.Both models presented low categorical loss values (0.027 for Bi-LSTM and 0.029 for LSTM).Traditional Machine Learning methods require parameter tuning.SVM with third-degree polynomial kernels, RF with 150 trees, k-NN with ten neighbors, and Normal Bayes with 1 × 10 −9 variation smoothing presented the best results.After performing the optimal hyper-parameter tuning, we analyzed the model's performances by comparing their test samples confusion matrixes (Figure 14), accuracy, kappa score (Table 2), Commission and Omission Errors (Tables 3 and 4), and McNemar Test (Table 5).

Comparison between RNN and Traditional Machine Learning Methods
The training stage for each model presented different procedures to achieve the optimal configuration.To obtain the best LSTM and Bi-LSTM models, we monitored the categorical validation loss.Both models presented low categorical loss values (0.027 for Bi-LSTM and 0.029 for LSTM).Traditional Machine Learning methods require parameter tuning.SVM with third-degree polynomial kernels, RF with 150 trees, k-NN with ten neighbors, and Normal Bayes with 1 × 10 −9 variation smoothing presented the best results.After performing the optimal hyper-parameter tuning, we analyzed the model's performances by comparing their test samples confusion matrixes (Figure 14), accuracy, kappa score (Table 2), Commission and Omission Errors (Tables 3 and 4), and McNemar Test (Table 5).Figure 15 shows the results of the four best models.The macro analysis presented in Table 2 shows high accuracy and kappa values for all models (>97% in all metrics).The filter applied in the raw data can explain this behavior because it increases inter-class similarities and maximizes extra-classes differences.In addition, the seven classes are very different from each other, increasing the classifier's effectiveness.Despite the high metrics in the macro analysis, there is a significant difference within the most critical classes for this study (1 and 2).We can observe from the Confusion Matrixes (Figure 14) that Classes 3, 4, 5, 6, and 7 present very similar results for all models.Classes 1 and 2 show the most significant values of confusion between classes, which are the most critical for the present research.Furthermore, the evaluation of the Commission and Omission errors within those classes clearly indicates that the RNN models had more consistent results than other methods.To verify the statistical differences between the models, we performed the McNemar test presented in Table 5.Thus, using a significance level of 0.05, two pairs of models are equivalent (RF/SVM and NB/k-NN).Despite the high metrics in the macro analysis, there is a significant difference within the most critical classes for this study (1 and 2).We can observe from the Confusion Matrixes (Figure 14) that Classes 3, 4, 5, 6, and 7 present very similar results for all models.Classes 1 and 2 show the most significant values of confusion between classes, which are the most critical for the present research.Furthermore, the evaluation of the Commission and Omission errors within those classes clearly indicates that the RNN models had more consistent results than other methods.To verify the statistical differences between the models, we performed the McNemar test presented in Table 5.Thus, using a significance level of 0.05, two pairs of models are equivalent (RF/SVM and NB/k-NN).
The results obtained show that even though the Deep Learning and Machine Learning models presented high values, the Deep Learning models had statistical advantages and had a significant improvement in the most critical classes.Even though Bi-LSTM and LSTM presented statistical differences, the high values of these models within classes 1 and 2 are due to the temporal signature behavior, which shows a relatively symmetrical shape with sinusoidal (periodic) characteristics.In this scenario, the forward and backward sequences within the RNN model's trajectory are very similar, with opposing derivatives for each instance in the time-sequenced data, yet with the same approximate module.
Furthermore, another significant advantage of the RNN models is the computational cost in the classification.The time to classify the 13000 × 13000 pixel image is about 50 min for Bi-LSTM and LSTM, 104 min for SVM, 110 min for Random Forest, 13 h for k-NN, and 3 h for Normal Bayes.The RNN methods are two times faster than SVM (the faster Traditional Machine Learning method) in the classification processing, having a much more useful application in practical scenarios.

Discussion
There are few remote sensing studies for detecting rice cultivation in Brazilian regions (although Brazil is one of the largest rice producers globally), mainly using Deep Learning methods, such as LSTM and Bi-LSTM.In this research, the best results for the LSTM and SVM models are in line with other research developed in the classification of remotely sensed data.Commonly, remote sensing classifications using LSTM show higher accuracy than other machine learning methods.In the detection of urban features using Landsat 7 images, Lyu et al. [77] show better performance of LSTM (95% OA) compared to SVM (80% OA) and Decision Tree (70% OA).Rußwurm and Körner [79] obtain better results with LSTM (90.6% OA) than CNN (89.2% OA) and SVM (40.9% OA) for land cover classification using a temporal sequence of Sentinel-2 images.Mou et al. (2018b) describe the superiority of the combination CNN and LSTM (~98% OA) over SVM (~95% OA) and Decision Tree (~85% OA) for change detection in multispectral images.
However, other researches demonstrate worse LSTM performance when compared to other learning machine methods.In the detection of winter wheat in MODIS images, He et al. [78] obtained a better RF result (0.72 F-Score) than attention-based LSTM (0.71 F-Score).Zhong et al. [141] classified summer crops in California, using the Landsat Enhanced Vegetation Index (EVI) time series.LSTM had the worst performance among all classifiers (82.41%OA and 0.67 F-Score), such as Conv1D-based model (85.54%OA and 0.73 F-Score), XGBoost (84.17%OA and 0.69 F-Score), MLP (83.81%OA and 0.69 F-Score), RF (83.38%OA and 0.67 F-Score), and SVM (82.45% OA and 0.68 F-Score).
Among the researches that implemented Machine Learning methods in rice crops, Onojeghuo et al. [65] compared SVM and RF in rice crops located in China, performing different studies in VV and VH images.The best combination had a prevalence of RF (95.2% OA) over SVM (82.5% OA).Son et al. [64] also compared SVM and RF in rice crops in the South Vietnam area.Even though RF better performed than SVM, the difference was smaller (86% OA from RF and 83.4% from SVM).
The present research had higher accuracy metrics and lower discrepancies between models than the studies related above.This behavior can be explained by the high temporal resolution (61-step steps) added with intense smoothing caused by the 3D filter, which increased the intra-class similarities and highlighted the extra-class differences, facilitating the classification algorithms.The results obtained in this research show that RNN and Machine Learning methods bring significant benefits for its ease, accuracy, and simplicity, being much more useful than cense information used nowadays.Furthermore, the RNN shows significant advantages when comparing the processing time of classification, since it is much faster than any traditional Machine Learning method.
The hybrid combination of convolutional and recurrent networks has applications in this study area in future studies.In this context, Teimouri et al. [81] combined the Fully Convolutional Network (FCN) and Long-Term Convolutional Memory (ConvLSTM) to classify 14 main classes of crops using Sentinel-1 temporal images in Denmark, reaching an accuracy of 86% and Intersection over Union of 0.64, respectively.Another important future research is to evaluate the classification performance using the VH and VV bands together.

Conclusions
Crop mapping using SAR temporal series is an alternative to the census survey by field interviews because of its low cost and speed, supporting the Rio Grande do Sul territorial planning.In this article, we applied different models to identify large-scale rice crops from the Sentinel-1 time-series imagery, covering the municipality of Uruguaiana, Brazil's largest producer.The model application considered a multiclass mapping to delimit the rice plantations in comparison to its surroundings.The temporal behavior of the backscatter coefficient of Sentinel-1 in the polarization of the VH on different types of targets has features and patterns that differentiate them from rice planting plots.We created a sufficiently extensive and balanced training and test dataset to improve results and statistical analysis.RNN models presented a state-of-the-art performance in identifying rice crops, with accuracy and Kappa values greater than 0.98.The comparison of these models indicates that Bi-LSTM presented the best results with statistical differences considering the significance of 0.05.An advantage of Deep Learning models is the speed with GPU acceleration being significantly faster than the traditional machine learning methods in the classification processing.The high accuracy values are due to the distinction between time series behaviors.Rice plantations present a gradual and wide variation throughout the year, acquiring a distinct temporal signature for each target.Besides, continuous and denser time series obtained from SAR data, without interference or data loss, improve the accuracy and generalizability of models.The results demonstrate that Bi-LSTM and LSTM models achieve better performance when compared to traditional Machine Learning methods.Future research may include new architectures, integrating temporal and spatial domains, such as CNN and RNN.Another approach is to analyze non-periodic time series data, which tends to highlight the differences between Bi-LSTM and LSTM, also presenting a more significant gap between Deep Learning and traditional Machine Learning models.
): (a) acquisition of the Sentinel time series (10-m resolution) considering the phenological behavior of the plantation; (b) data pre-processing and minimizing noise from 3D spatial-temporal filters and smoothing with Savistky-Golay filter; (c) comparison of classification methods (LSTM, Bidirectional LSTM, SVM, RF, k-NN, and NB) and; (d) accuracy analysis.Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 26 ): (a) acquisition of the Sentinel time series (10-m resolution) considering the phenological behavior of the plantation; (b) data preprocessing and minimizing noise from 3D spatial-temporal filters and smoothing with Savistky-Golay filter; (c) comparison of classification methods (LSTM, Bidirectional LSTM, SVM, RF, k-NN, and NB) and; (d) accuracy analysis.

Figure 3 .
Figure 3. 3D convolutional filter considering a time series data with spatial dimensions "x" and "y" and temporal dimension "t".

Figure 3 .
Figure 3. 3D convolutional filter considering a time series data with spatial dimensions "x" and "y" and temporal dimension "t".

Figure 6 .
Figure 6.Map of ground truth points.

Figure 6 .
Figure 6.Map of ground truth points.

Figure 7 .
Figure 7. Effect of noise elimination in rice crop temporal trajectory using a 3D convolutional window in the spatial direction "x" and "y" and the temporal direction "t" (x = 5, y = 5, t = 9) and a second smoothing filter using the Savitzky and Golay (S-G) method.The original data is in black, 3D Gamma filter in red, and smoothing with S-G method in green.

Figure 8 .
Figure 8. Behavior of the mean time series with the respective standard deviation bars for the main targets (rice planting, bodies of water, riparian forest, and grassland) present in the region of Uruguaina, southern Brazil.

Figure 8 .
Figure 8. Behavior of the mean time series with the respective standard deviation bars for the main targets (rice planting, bodies of water, riparian forest, and grassland) present in the region of Uruguaina, southern Brazil.

Figure 9 .
Figure 9. Google Earth image demonstrating the flooding of rice fields in the Uruguaina region, southern Brazil.

Figure 9 .
Figure 9. Google Earth image demonstrating the flooding of rice fields in the Uruguaina region, southern Brazil.

Figure 10 .
Figure 10.Google Earth and Sentinel 1 images for rice crop at two different times: planting under development (13 January, 2017) and at the beginning of planting (27 August, 2018).The time trajectory of the pixel marked in red in the Sentinel 1 images shows the positioning of the scenes over time (red arrows).

Figure 10 .
Figure 10.Google Earth and Sentinel 1 images for rice crop at two different times: planting under development (13 January 2017) and at the beginning of planting (27 August 2018).The time trajectory of the pixel marked in red in the Sentinel 1 images shows the positioning of the scenes over time (red arrows).

Figure 11 .
Figure 11.The average time series behavior with the respective standard deviation bars for areas during the fallow period between rice cultivation, without showing the typical characteristic of the beginning of planting.

Figure 12 .
Figure 12.Behavior of the average time series with the respective standard deviation bars for the other plantations (background areas).

Figure 11 .
Figure 11.The average time series behavior with the respective standard deviation bars for areas during the fallow period between rice cultivation, without showing the typical characteristic of the beginning of planting.

Figure 11 .
Figure 11.The average time series behavior with the respective standard deviation bars for areas during the fallow period between rice cultivation, without showing the typical characteristic of the beginning of planting.

Figure 12 .
Figure 12.Behavior of the average time series with the respective standard deviation bars for the other plantations (background areas).

Figure 12 .
Figure 12.Behavior of the average time series with the respective standard deviation bars for the other plantations (background areas).

Figure 13 .
Figure 13.Behavior of the mean time series with the respective standard deviation bars for grassland regions in floodplain, southern Brazil.Sentinel-1 images in the VH polarization representing: (A) flood period, and (B) dry period.

Figure 13 .
Figure 13.Behavior of the mean time series with the respective standard deviation bars for grassland regions in floodplain, southern Brazil.Sentinel-1 images in the VH polarization representing: (A) flood period, and (B) dry period.

Figure 15 .
Figure 15.Comparison of classification results within the four best methods in Uruguayan region, located in South Brazil: (a) LSTM, (b) Bi-LSTM, (c) SVM, and (d) RF.

Figure 15 .
Figure 15.Comparison of classification results within the four best methods in Uruguayan region, located in South Brazil: (a) LSTM, (b) Bi-LSTM, (c) SVM, and (d) RF.

Table 1 .
Data layout for McNemar test between two classification results.

Table 1 .
Data layout for McNemar test between two classification results.

Table 2 .
Mean accuracy, Kappa, and F-score values for the models evaluated.

Table 3 .
Representation of commission errors for every class.

Table 4 .
Representation of omission errors for every class.

Table 5 .
McNemar test, where the green cells represent statistically equal models, and the red cells represent statistically different models, using a significance of 0.05.