The difficulty of paddy rice mapping in Thailand lies in two aspects. On the one hand, paddy rice in Thailand follows various growth patterns. The temporal profiles of the backscattering coefficient ($\sigma^0$) of paddy rice fields may differ significantly from one another, so it is difficult to directly apply classification methods that rely on a fixed relationship between phenology and time. On the other hand, the complicated distribution of croplands in Thailand, which is shaped by topography, places high demands on the classification method. A reliable method is needed that can identify small and scattered paddy rice fields while remaining robust to sporadically distributed false alarms.
As a solution, this paper proposes a paddy rice mapping scheme that is independent of the diverse cultivation patterns in Thailand and is capable of extracting small ground parcels despite their broken and irregular shapes. The flow chart of the proposed method is presented in Figure 5. First, the multitemporal Sentinel-1 images are preprocessed to obtain calibrated backscattering coefficient ($\sigma^0$) images. The necessary preprocessing steps include coregistration, filtering, and geocoding. Then, features that capture the key information during growth are extracted from the multitemporal SAR data. Considering the large data volume involved in large-area paddy rice mapping, temporal statistic features of the time-series SAR data are computed and stacked as the input to the deep learning network. To fully utilize the pixel-level semantics of the features, the U-Net model [51] is adopted to train the paddy rice prediction model; its decoder is built from deconvolution layers instead of pooling layers and can therefore recover the spatial details of the feature images. Finally, to improve the compactness and homogeneity of the classification result, a fully connected CRF is introduced to refine the output of the U-Net model.
2.3.2. Extraction of Temporal Statistic Features
As illustrated above, the rice planting cycle in tropical countries such as Thailand can be so complex that a general temporal evolution model of the backscattering coefficient $\sigma^0$ can hardly be summarized. Hence, three simple but effective temporal statistic features are defined from the dense time-series images, which describe the most prominent SAR characteristics during paddy rice growth.
With the dramatic changes in leaves, stems, and fruits, the interactions between microwave radiation and the crop canopy vary with time, leading to a large range of variations during plant growth and periodic changes in multiyear observations. By contrast, for non-agricultural objects, such as buildings, water, and forest, the change of $\sigma^0$ with time is less significant, so the temporal variance of $\sigma^0$ in annual SAR observations is the key to separating croplands from other land covers. The computation of the temporal variance $\sigma^0_{\mathrm{var}}$ can be expressed as follows, where $N$ indicates the number of images, $\sigma^0_i$ is the value in the $i$-th image, and $\bar{\sigma}^0$ refers to the temporal average value of the $N$ images:
$$\sigma^0_{\mathrm{var}} = \frac{1}{N}\sum_{i=1}^{N}\left(\sigma^0_i - \bar{\sigma}^0\right)^2 \tag{1}$$
Different from dryland crops, paddy rice has a transplanting stage during which the fields are flooded [57]. In this period, the $\sigma^0$ of paddy rice is only slightly higher than that of a water body, which clearly distinguishes its backscattering characteristics from those of other crops. Previous studies on paddy rice mapping or crop classification have demonstrated the potential of the temporal behavior of $\sigma^0$ to distinguish paddy rice from other crops [54,55]. Therefore, in this study, the temporal minimum of $\sigma^0$ in each frame is computed with Equation (2) to determine whether a flooded period exists and to discriminate rice from other crops:
$$\sigma^0_{\min} = \min_{i = 1, \dots, N} \sigma^0_i \tag{2}$$
Influenced by factors such as monsoons, floods, and aquatic plants, some water bodies can also display seasonal backscattering changes. If the identification of paddy rice relies only on the temporal variance and minimum of $\sigma^0$, false alarms might occur because water bodies are misclassified. Since the $\sigma^0$ of paddy rice rises substantially in the vegetative and reproductive stages, the temporal maximum of $\sigma^0$ in SAR sequences is useful to avoid the influence of water. The computation of the temporal maximum $\sigma^0_{\max}$ can be expressed as:
$$\sigma^0_{\max} = \max_{i = 1, \dots, N} \sigma^0_i \tag{3}$$
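The three temporal statistic features can be illustrated with a minimal NumPy sketch. It assumes the preprocessed time series of one polarization is available as an array of shape (N, H, W) in dB, and it assumes a population variance (division by N); the function name `temporal_features` and the toy values are hypothetical and only serve the illustration.

```python
import numpy as np

def temporal_features(stack):
    """Per-pixel temporal variance, minimum, and maximum.

    stack: array of shape (N, H, W), one calibrated backscatter image per date
    (assumed to be in dB and already coregistered).
    """
    stack = np.asarray(stack, dtype=np.float64)
    if stack.ndim != 3:
        raise ValueError("expected an array of shape (N, H, W)")
    mean = stack.mean(axis=0)                    # temporal average per pixel
    var = ((stack - mean) ** 2).mean(axis=0)     # assumption: 1/N normalization
    return var, stack.min(axis=0), stack.max(axis=0)

# Toy series (hypothetical values): a rice-like pixel with a flooding dip
# and a forest-like pixel with stable backscatter.
rice = [-14.0, -22.0, -15.0, -13.0]
forest = [-13.0, -13.5, -13.0, -12.5]
stack = np.array([rice, forest]).T.reshape(4, 1, 2)   # (N=4, H=1, W=2)

var, vmin, vmax = temporal_features(stack)
print(var, vmin, vmax)
```

In this toy case the rice-like pixel has a much larger variance and a much lower minimum than the forest-like pixel, while both reach a similar maximum, which is the contrast the three features are meant to capture.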
To illustrate the potential of the temporal features, Figure 7 depicts the mean VH and VV $\sigma^0$ curves of several regions of interest (ROIs), whose locations are shown in Figure 7a. The error bars represent the standard deviation within each ROI. First, the mean VH and VV $\sigma^0$ curves of different land covers are compared in Figure 7d,e. Compared to other land covers, such as buildings, non-rice crops, and forest, the VH and VV $\sigma^0$ curves of paddy rice both show obvious fluctuations, especially from March to October 2019. Meanwhile, from 6 October 2018 to 4 April 2019, the standard deviations of paddy rice were no less than 2 dB and even reached 4.32 dB (in VH polarization) on 23 November 2018. Other land covers, such as buildings and non-rice crops, also have high standard deviations, especially in VV polarization. As a result, in both VH and VV polarizations, the $\sigma^0$ values of paddy rice and other land covers overlap with each other, which is likely to cause misidentifications if traditional thresholding methods are used.
To further inspect the diverse paddy rice cultivation patterns, six adjacent paddy rice parcels were selected, whose locations are indicated by the yellow box in Figure 7a. Figure 7b shows the false-color image composited from the VH $\sigma^0$ on 11 March, 22 May, and 14 August 2019, whereas Figure 7c displays the false-color image of the temporal maximum, minimum, and variance of $\sigma^0$ in VH polarization. In Figure 7b, these ROIs display different colors, indicating different backscattering intensities in VH polarization on 11 March, 22 May, and 14 August. In other words, even though they are spatially close to each other, these ROIs had different cultivation practices, which is also validated by the mean VH and VV $\sigma^0$ curves shown in Figure 7e,f. For ROI 1 and ROI 2, triple-season paddy rice was cultivated during the observing period: the first season lasted from October to February, the second season lasted from February to May, and the final season lasted from May to September. For ROI 3, two complete paddy rice growing seasons can be observed: the first one was from December to April, and the second one was from April to August. As for ROI 4, only one complete paddy rice growing season is observed, which lasted from December to June. The first two seasons of ROI 5 were similar to those of ROI 1 and ROI 2, but the third paddy rice season was much longer. ROI 6 was also cultivated with double-season paddy rice, but its growing cycle differs from that of ROI 3: the first season lasted from December to May, and the second one was from May to September. As demonstrated by Figure 7b,f,g, the diversity of paddy rice cultivation patterns in Thailand is precisely the most notable characteristic of paddy rice in tropical areas. In contrast, despite the various cultivation practices, these ROIs display similar hues in Figure 7c (purple, magenta, and red), which indicates that the temporal features are capable of capturing the key information of paddy rice fields even under different growing patterns.
Furthermore, Figure 8 gives the false-color images and corresponding Google Earth optical images of typical land covers. Paddy rice is distinguished by high values of the temporal maximum and variance together with a low value of the temporal minimum, which usually leads to magenta in the false-color image. Since the hue is affected by the intensity of the temporal maximum and variance relative to the whole frame, paddy rice fields sometimes also appear red or dark blue. Compared to paddy rice, the temporal minimum of non-rice crops is much higher. In other words, the green component is higher, resulting in green, dark yellow, or brown in the RGB image. Water bodies appear as dark regions because all three features are low. Land covers with stable backscattering intensities, such as buildings and forest, generally have a very low temporal variance and a high temporal minimum; as a result, these land covers appear yellow or green in the false-color image.
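The hues described above are consistent with mapping the temporal maximum, minimum, and variance to the red, green, and blue channels, respectively. This channel order is inferred from the described colors rather than stated explicitly, and the percentile stretch is likewise an assumption. Under these assumptions, a false-color composite could be sketched as follows (the function names and toy values are hypothetical):

```python
import numpy as np

def stretch(band, low=2.0, high=98.0):
    """Linearly stretch one feature band to [0, 1] between two percentiles."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = np.percentile(band, [low, high])
    if hi <= lo:                      # flat band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

def false_color(t_max, t_min, t_var, low=2.0, high=98.0):
    """Stack features as R = maximum, G = minimum, B = variance (assumed order)."""
    return np.dstack([stretch(t_max, low, high),
                      stretch(t_min, low, high),
                      stretch(t_var, low, high)])

# Toy 2 x 2 scene (hypothetical values): [0,0] rice, [0,1] water,
# [1,0] forest, [1,1] non-rice crop.
t_max = np.array([[-13.0, -20.0], [-12.0, -11.0]])
t_min = np.array([[-22.0, -25.0], [-14.0, -16.0]])
t_var = np.array([[12.5, 1.0], [0.2, 6.0]])

rgb = false_color(t_max, t_min, t_var, low=0.0, high=100.0)
print(np.round(rgb, 2))
```

With these toy values the rice pixel has strong red and blue but weak green (magenta), the water pixel is dark in all channels, and the forest pixel has strong red and green but no blue (yellow to green), in line with the description of Figure 8.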
Previous studies have demonstrated that, in C-band, the correlation between paddy rice parameters and cross-polarized backscatter (HV or VH) is slightly higher than that for VV polarization, and that cross-polarization performs better in paddy rice identification [52,55,58,59,60]. VH polarization is mainly affected by the volume scattering of the canopy, whereas VV polarization is affected by the double-bounce and surface scattering of the canopy and ground surface. The disappearance of standing water, the changes in reflection between stems and the ground surface, and the variations in the vertical structure of paddy rice during the vegetative stage all contribute to VV polarization. It is therefore more difficult to summarize the growth pattern of paddy rice using VV polarization, which is also confirmed by the comparison in Figure 7f,g. Therefore, in this study, only VH polarization is used for paddy rice mapping.
Figure 9 displays the temporal features of VH polarization extracted from frame 99-16. The false-color image shown in Figure 9d is taken as the training dataset, and the corresponding label image is displayed in Figure 9e. Figure 10 shows some examples of the training patches for the paddy rice mapping model, which were randomly selected from the training dataset. After the model was trained, the false-color images of all other frames were extracted to constitute the classification dataset to be predicted. Figure 11 shows the false-color temporal feature images of the whole of Thailand, which were mosaicked and harmonized to obtain consistent hues. The automatic mosaicking was accomplished with ENVI 5.3. In overlapping regions, the frames on the west and north sides were placed in front of those on the east and south sides.
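The random selection of training patches from a false-color feature image and its label image can be sketched as below. The patch size, the number of patches, and the function name `sample_patches` are hypothetical, since these settings are not specified in this section; the only point illustrated is that each image patch and label patch must be cut from the same window.

```python
import numpy as np

def sample_patches(image, label, patch=256, count=8, seed=0):
    """Cut `count` random, spatially aligned patches from an image/label pair.

    image: (H, W, C) feature image; label: (H, W) label image.
    """
    rng = np.random.default_rng(seed)
    h, w = label.shape
    if image.shape[:2] != (h, w):
        raise ValueError("image and label must share height and width")
    if patch > h or patch > w:
        raise ValueError("patch is larger than the image")
    # Upper-left corners; the upper bound of rng.integers is exclusive.
    rows = rng.integers(0, h - patch + 1, size=count)
    cols = rng.integers(0, w - patch + 1, size=count)
    xs = np.stack([image[r:r + patch, c:c + patch] for r, c in zip(rows, cols)])
    ys = np.stack([label[r:r + patch, c:c + patch] for r, c in zip(rows, cols)])
    return xs, ys

# Toy data: the label is copied into every channel so alignment can be checked.
label = (np.arange(64 * 80).reshape(64, 80) % 2).astype(np.uint8)
image = np.dstack([label, label, label]).astype(np.float32)
xs, ys = sample_patches(image, label, patch=16, count=5, seed=1)
print(xs.shape, ys.shape)
```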
2.3.3. The Modified U-Net Model for Paddy Rice Mapping
In this study, the U-Net model was utilized to accomplish the paddy rice mapping task, and a fully connected CRF module was introduced to refine its output and improve the performance of paddy rice extraction in Thailand. The flowchart of the proposed model is shown in Figure 12.
The classical CNN structure can only tell whether a certain class exists in the input image but cannot predict the semantic label of each pixel. In contrast, FCN outputs a pixel-by-pixel semantic label image corresponding to the input image by replacing the fully connected layers of the CNN with convolution layers, so that the output classification map maintains the same resolution as the input image [61]. In this study, we applied the U-Net model to extract a high-resolution paddy rice map of Thailand. As an improved FCN model, U-Net extracts high-level semantic features while maintaining the spatial details of the input image [51]. The structure of U-Net is displayed in Figure 13. The model contains 23 convolution layers in total. The encoder consists of five down-sampling units, where each unit is composed of two 3 × 3 convolution layers and a 2 × 2 max-pooling layer. The decoder contains four up-sampling units, where each unit is composed of two 3 × 3 convolution layers and a 2 × 2 deconvolution layer. Finally, the feature maps output by the last up-sampling unit are converted to probability maps by a 1 × 1 convolution layer, where the number of probability maps equals the number of classes, and each pixel value of a map represents the probability that the pixel belongs to the corresponding class.
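As a quick consistency check of the layer count, the small sketch below tallies the layers from the description. It assumes that the 2 × 2 deconvolution layers are counted among the 23 convolution layers, as in the original U-Net design. The size trace further assumes padded convolutions and pooling only between the five encoder units (four poolings), so that the output keeps the input resolution. Both are assumptions, and the function names are hypothetical.

```python
def count_unet_layers(encoder_units=5, decoder_units=4, convs_per_unit=2):
    """Tally convolution-type layers of the described U-Net."""
    encoder = encoder_units * convs_per_unit
    decoder = decoder_units * (convs_per_unit + 1)  # +1: the 2x2 deconvolution
    head = 1                                        # final 1x1 convolution
    return {"encoder": encoder, "decoder": decoder, "head": head,
            "total": encoder + decoder + head}

def trace_sizes(size, poolings=4):
    """Spatial size after each encoder unit and each up-sampling unit.

    Assumes padded convolutions, so only pooling and deconvolution change
    the size; the input size must be divisible by 2 ** poolings.
    """
    if size % (2 ** poolings) != 0:
        raise ValueError("input size must be divisible by 2 ** poolings")
    down = [size // (2 ** k) for k in range(poolings + 1)]
    up = [down[-1] * (2 ** k) for k in range(1, poolings + 1)]
    return down, up

counts = count_unet_layers()
down, up = trace_sizes(256)
print(counts)    # total of 23 under the stated counting assumption
print(down, up)
```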
To improve the training efficiency, we introduce the Batch Normalization (BN) layer [62] into the original U-Net model. Before each convolution layer, a BN layer is applied to the input of the activation function (such as a sigmoid or ReLU function) to ensure that the input data follow the same distribution, with a mean of 0 and a variance of 1. The formula for BN can be expressed as:
$$y_i = \gamma \, \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$
where $x_i$ represents the input in mini-batch $B$, $\mu_B$ and $\sigma_B^2$ represent the mean and variance of the mini-batch, $\gamma$ and $\beta$ are scale and bias parameters that need to be trained, and $\epsilon$ is a smoothing term that ensures that the denominator is never zero.
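For illustration, the training-time BN transform can be written in a few lines of NumPy. This sketch assumes feature maps of shape (batch, channels, height, width) with statistics computed per channel, and it omits the running averages that deep learning frameworks keep for inference; the function name `batch_norm` is hypothetical.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel over the mini-batch, then scale and shift.

    x: (B, C, H, W); gamma, beta: (C,) trainable scale and bias.
    """
    x = np.asarray(x, dtype=np.float64)
    mu = x.mean(axis=(0, 2, 3), keepdims=True)    # mini-batch mean per channel
    var = x.var(axis=(0, 2, 3), keepdims=True)    # mini-batch variance per channel
    x_hat = (x - mu) / np.sqrt(var + eps)         # eps keeps the denominator > 0
    gamma = np.asarray(gamma, dtype=np.float64).reshape(1, -1, 1, 1)
    beta = np.asarray(beta, dtype=np.float64).reshape(1, -1, 1, 1)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(8, 4, 6, 6))       # toy activations
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
y2 = batch_norm(x, gamma=2.0 * np.ones(4), beta=np.ones(4))
print(y.mean(axis=(0, 2, 3)), y.var(axis=(0, 2, 3)))
```

With unit scale and zero bias, each channel of the output has approximately zero mean and unit variance regardless of the statistics of the input.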
As mentioned above, the paddy rice plots in Thailand are small and fragmented, with unclear edges, which may cause broken plots and rough boundaries in the classification results. To solve this problem, this study introduces a fully connected CRF [53] to refine the output of the U-Net model.
CRF is essentially an undirected graph model based on Markov Random Fields (MRF), which can describe the dependence or spatial correlation between pixels [63]. As shown in Figure 14a, each node in a CRF is composed of the label $x_i$ and the value $y_i$ of pixel $i$, and the edge between two nodes denotes the relationship between the two corresponding pixels. By exploiting the spatial correlation between nearby pixels, isolated misidentifications can be effectively suppressed, leading to more consistent classification results. As a modification of CRF, the fully connected CRF links all the pixels in the image to avoid the over-smoothing caused by spatial modeling in a limited neighborhood, as shown in Figure 14b. The pixel value $y_i$ and the spatial distances between pixel $i$ and the other pixels are both considered when modifying the label $x_i$. The energy function of the fully connected CRF can be expressed by:
$$E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j), \qquad \psi_p(x_i, x_j) = \mu(x_i, x_j) \sum_{m} w^{(m)} k^{(m)}(\mathbf{f}_i, \mathbf{f}_j)$$
where $\psi_u(x_i)$ is the unary potential energy provided by $P(x_i)$, which is the probability map generated by the softmax function of the U-Net model; $\psi_p(x_i, x_j)$ is the binary potential energy provided by the pair of pixels $i$ and $j$; $\mu(x_i, x_j)$ is a label compatibility function; $w^{(m)}$ is the linear combination weight; $k^{(m)}(\mathbf{f}_i, \mathbf{f}_j)$ is the Gaussian kernel function that jointly considers spatial similarity and pixel value similarity and assigns the same semantic label to similar pixels; and $\mathbf{f}_i$ and $\mathbf{f}_j$ are the feature vectors of pixels $i$ and $j$ in an arbitrary feature space. The detailed expressions of $k^{(m)}$ are given in [53], which presents a fast inference algorithm for the fully connected CRF. The probability maps produced by U-Net, together with the original feature images, are used to calculate the energy function of the fully connected CRF, which is minimized iteratively to obtain the final paddy rice mapping results.
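To make the iterative minimization concrete, the following is a naive mean-field sketch for a fully connected CRF on a tiny image. It is not the fast inference algorithm of [53]: all pairwise kernel values are stored explicitly, which is only feasible for toy sizes. The Potts label compatibility, the two Gaussian kernels (appearance and smoothness), and all parameter names and values are assumptions chosen for illustration.

```python
import numpy as np

def dense_crf_mean_field(prob, feats, w_app=3.0, w_smooth=1.0,
                         theta_pos=3.0, theta_feat=1.0, theta_smooth=1.0,
                         iters=5):
    """Naive O(N^2) mean-field inference for a fully connected CRF.

    prob:  (H, W, L) class probabilities, e.g. a softmax output.
    feats: (H, W, C) feature image used by the appearance kernel.
    Returns the refined marginals with shape (H, W, L).
    """
    h, w, n_labels = prob.shape
    p = prob.reshape(-1, n_labels).astype(np.float64)
    unary = -np.log(np.clip(p, 1e-8, 1.0))           # unary potential
    rows, cols = np.mgrid[0:h, 0:w]
    pos = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(np.float64)
    f = feats.reshape(h * w, -1).astype(np.float64)
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    d_feat = ((f[:, None, :] - f[None, :, :]) ** 2).sum(-1)
    # Weighted sum of an appearance kernel and a smoothness kernel.
    kernel = (w_app * np.exp(-d_pos / (2 * theta_pos ** 2)
                             - d_feat / (2 * theta_feat ** 2))
              + w_smooth * np.exp(-d_pos / (2 * theta_smooth ** 2)))
    np.fill_diagonal(kernel, 0.0)                    # a pixel sends no message to itself
    q = p / p.sum(axis=1, keepdims=True)
    for _ in range(iters):
        msg = kernel @ q                             # per-label support from all other pixels
        # Potts compatibility: the cost of label l is the support for the other labels.
        pairwise = msg.sum(axis=1, keepdims=True) - msg
        logits = -unary - pairwise
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q.reshape(h, w, n_labels)

# Toy example: a homogeneous 5 x 5 field (class 0 = rice) with one isolated
# pixel that the classifier assigns to class 1.
prob = np.zeros((5, 5, 2))
prob[..., 0], prob[..., 1] = 0.8, 0.2
prob[2, 2] = [0.4, 0.6]
feats = np.zeros((5, 5, 1))
q = dense_crf_mean_field(prob, feats)
print(q.argmax(axis=-1))
```

In this toy case the isolated pixel is relabeled to match its homogeneous surroundings, which mirrors the intended effect of the CRF on sporadic false alarms and broken plots.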