Classification and Mapping of Paddy Rice by Combining Landsat and SAR Time Series Data

Rice is an important food resource, and the demand for rice has increased as population has expanded. Therefore, accurate paddy rice classification and monitoring are necessary to identify and forecast rice production. Satellite data have been often used to produce paddy rice maps with more frequent update cycle (e.g., every year) than field surveys. Many satellite data, including both optical and SAR sensor data (e.g., Landsat, MODIS, and ALOS PALSAR), have been employed to classify paddy rice. In the present study, time series data from Landsat, RADARSAT-1, and ALOS PALSAR satellite sensors were synergistically used to classify paddy rice through machine learning approaches over two different climate regions (sites A and B). Six schemes considering the composition of various combinations of input data by sensor and collection date were evaluated. Scheme 6 that fused optical and SAR sensor time series data at the decision level yielded the highest accuracy (98.67% for site A and 93.87% for site B). Performance of paddy rice classification was better in site A than site B, which consists of heterogeneous land cover and has low data availability due to a high cloud cover rate. This study also proposed Paddy Rice Mapping Index (PMI) considering spectral and phenological characteristics of paddy rice. PMI represented well the spatial distribution of paddy rice in both regions. Google Earth Engine was adopted to produce paddy rice maps over larger areas using the proposed PMI-based approach.


Introduction
Rice is one of the important food resources for most of the world's population, especially those in Asia [1,2]. The annual average of rice consumption per capita was approximately 54 kg during 2017-2018 (Agricultural Market Information System (AMIS), http://statistics.amis-outlook.org/), and the consumption has increased with increasing world population [3]. Muthayya et al. [4] noted that the demand for rice over the next 30 years is expected to increase by 90% in Asia. Paddy rice mapping and monitoring are crucial for food security and agricultural mitigation because they allow for identifying and forecasting rice production [5]. In general, the annual or seasonal surveys have been conducted for measuring cultivated areas during the crop growing season [5,6]. The sampling survey, however, is costly, labor-intensive, and time-consuming. In addition, the survey based paddy rice mapping has a relatively slow update cycle (e.g., every five years). Rice cultivated areas often change due to human activities as well as natural disasters such as drought, floods, and wildfire. Thus, up-to-date approaches to improve the identification of paddy rice areas over two distinct regions with different climatic and environmental characteristics: 1. paddy rice classification using multi-sensor fusion (optical sensor and SAR) and multi-temporal data through machine learning approaches (RF and SVM); 2.
development of a paddy rice mapping index (PMI) while considering spectral and phenological characteristics; and, 3. paddy rice classification using Google Earth Engine based on the PMI approach over larger areas.
The present study aims at advance paddy rice classification by expanding the dimensionality of data on both spectral and temporal domains. We conducted paddy rice classification through fusion of optical sensor and SAR time series data using two machine learning approaches, RF and SVM. This study examined six schemes to identify the effect of data dimensionality on paddy rice classification. The six schemes composed of various combinations of input data by sensor and collection date considering the phenology of paddy rice over two different regions. This study also proposed Paddy rice Mapping Index (PMI) when considering the spectral and temporal characteristics of paddy rice that can be applied to areas that have different climatic and environmental characteristics.

Study Area
The study area consisted of two regions with different climatic and environmental characteristics ( Figure 1). The first study site (site A) was Sutter County in California State, United States, which is known for its high rice productivity. About 90% of site A is composed of vegetated areas, and almost 40% of the vegetated areas are paddy rice. Temperature ranges from 4.06 • C in January to 36.11 • C in July on average, and the annual precipitation is around 558.8 mm. In site A, rice is planted mid-May by dropping rice seeds into flooded paddies from small planes (i.e., direct seeding) and harvested between September and October ( Figure 2). The second study site (site B) was Dangjin in South Korea. Site B is one of the highest rice production regions in South Korea. Site B has relatively heterogeneous land covers, with many fragmented patches when compared to site A. The annual mean temperature is 11.4 • C and the annual precipitation is 1158.7 mm. Since South Korea has a rainy season between June and July caused by the Asian Monsoon, it is not easy to collect cloud-free satellite optical sensor images during this time. Rice planting and harvesting in site B occur at similar time intervals as site A ( Figure 2). Unlike site A, manual or machine-based rice transplantation in May is common in site B.

Satellite Data
In this study, two types of time series satellite data (optical and SAR) were used. Multispectral Landsat 5, 7, and 8 level-1 images were acquired from the United States Geological Survey (USGS)  (Tables 1 and 2). Unlike site A, where enough cloud free images for every month are available, it is hard to obtain clear images at site B during the rainy season (June to July). Figure 2 shows data availability with crop schedules by month. Atmospheric correction using ENVI Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) to Landsat images were conducted to produce scaled reflectance data. Then, Normalized Difference Water Index (NDWI) and Normalized Difference Vegetation Index (NDVI) were calculated using Green, Red, and NIR reflectance data (Equations (1) and (2)). Multi temporal SAR images from the Advanced Land Observing Satellite (ALOS) Phased Array type L-band SAR (PALSAR) (site A) and RADARSAT-1 (site B) for horizontal transmit and horizontal receive (HH) polarization data were used. According to Zhang et al. [32] and Le Toan et al. [33], among the SAR signals at different polarizations, horizontal (HH) and vertical (VV) polarizations, the HH backscattered signals are higher and are more dominated by specular reflection for the rice canopy and water surfaces than the VV backscattered ones. HH polarization data have been used in many paddy rice mapping studies [17,32,34,35]. ALOS PALSAR L band fine beam data with HH polarization at 6.25 m spatial resolution were used. A total of 10 images from ALOS PALSAR between 2006 and 2009 were used for site A (Table 1). A total of 6 images collected from the RADARSAT-1 C band standard mode with HH polarization at 30 m spatial resolution between 2003 and 2005 were used for site B ( Table 2). Preprocessing of SAR data was carried out with Alaska Satellite Facility (ASF) Map Ready software (http://www.asf.alaska.edu), and the backscatter coefficient σ0 (dB) was calculated through Equation (3) [36].
where DN is the digital number of a SAR image and CF is a calibration factor. SAR data were resampled to a 30 m pixel size and were projected to the Universal Transverse Mercator (UTM) coordinate system to collocate with Landsat data. SAR images have high speckle noise; hence, many filters, including Lee, Kuan, Frost, and Gamma MAP have been used to despeckle SAR images in the literature. In this study, Lee filter [37], one of the well-known despeckling filters, was applied to remove noise with 3 × 3 window size. Geometric correction was conducted using ninety-five ground control points (GCP) collected from both Landsat and SAR images, and the resultant root mean square error (RMSE) ranged from 10 m to 15 m (less than one pixel). Elevation data at 30 m resolution from the Shuttle Radar Topography Mission (SRTM) digital elevation models (DEM) that was obtained from Earth Explorer (https://earthexplorer.usgs.gov/) was used as an input variable for paddy rice classification. DEM provides land cover related information. For example, most paddy rice is located in relatively flat areas and low elevations. Many studies have used elevation data as input variable for land cover classification [38][39][40].

Reference Data
Ground reference data were collected from the 2004 Sutter County land use survey data and United States Department of Agriculture (USDA) 2009 Cropland Data Layer (CDL) with 1:100,000 scale (site A) and 2002 Korea land cover data with 1:25,000 scale (site B), which were provided by California Department of Water Resource, USDA, and Korea Ministry of Environment, respectively. USDA CDL was produced using Landsat data and Resourcesat-1 based on a decision tree approach [41], and Korea land cover was produced using Landsat, Indian Remote Sensing satellite-1C (IRS-1C), and Satellite Pour l'Observation de la 5 (SPOT 5) Terre data. Since there was a 2-3 years gap between ground reference data and satellite data, we refined ground reference data using high resolution Google Earth images that were collected in the corresponding years through visual inspection. There are nineteen (site A) and twenty-two (site B) land cover classes in the ground reference data and they were aggregated to six (site A) and eight (site B) land cover classes by grouping similar classes (Table 3). A total of 1100 sample points were randomly extracted from input variables at each site. These samples were divided into training data (800 samples) and test data (300 samples), and the number of samples for each class was determined considering the area percentage for each class throughout the study area. The original land cover, aggregated land cover, the number of samples, and the area percentage of each class are summarized in Table 3.

Methodology
A total of 10 input variables-Landsat bands 1-5 and 7, NDVI, NDWI, SAR backscattering coefficients, and DEM-were used for paddy rice classification. Figure 3 shows a flowchart of this study. Classification models were constructed using training data through two machine learning approaches, RF and SVM.

Methodology
A total of 10 input variables-Landsat bands 1-5 and 7, NDVI, NDWI, SAR backscattering coefficients, and DEM-were used for paddy rice classification. Figure 3 shows a flowchart of this study. Classification models were constructed using training data through two machine learning approaches, RF and SVM.
RF is based on classification and regression trees (CART), which is a non-parametric model and constructs rule sets from data [42,43]. RF has two randomization processes, randomly selecting a subset of training samples (e.g., ~70%) for each tree and selecting a subset of variables at each node of a tree (e.g., squared root of the number of total variables). RF consists of many decision trees (the number of trees is 1000 in this study), and each tree has a decision. Finally, the class of an unknown pixel is determined through the majority voting of the results of trees. RF has been successfully conducted for various classification tasks in the literature [38,[44][45][46][47][48][49][50][51]. RF also provides information about relative variable importance. Relative variable importance is based on an increased mean squared error (MSE) using out-of-bag data (i.e., unused training samples for each tree), which is calculated when randomly permuting an independent variable [52,53]. In this study, RF was implemented by R software (https://www.r-project.org/) through Random forest package with default settings (i.e., the number of randomly sampled variables as candidates at each split is the square root of the number of input variables; the minimum size of the terminal node is 1), except for the number of trees.  RF is based on classification and regression trees (CART), which is a non-parametric model and constructs rule sets from data [42,43]. RF has two randomization processes, randomly selecting a subset of training samples (e.g.,~70%) for each tree and selecting a subset of variables at each node of a tree (e.g., squared root of the number of total variables). RF consists of many decision trees (the number of trees is 1000 in this study), and each tree has a decision. Finally, the class of an unknown pixel is determined through the majority voting of the results of trees. RF has been successfully conducted for various classification tasks in the literature [38,[44][45][46][47][48][49][50][51]. RF also provides information about relative variable importance. Relative variable importance is based on an increased mean squared error (MSE) using out-of-bag data (i.e., unused training samples for each tree), which is calculated when randomly permuting an independent variable [52,53]. In this study, RF was implemented by R software (https://www.r-project.org/) through Random forest package with default settings (i.e., the number of randomly sampled variables as candidates at each split is the square root of the number of input variables; the minimum size of the terminal node is 1), except for the number of trees.
SVM is a supervised non-parametric approach and has been used in many studies in the remote sensing field [54][55][56][57][58][59][60]. The use of SVM has significantly increased because it works well in cases of limited training data sets and produces higher accuracy than other traditional approaches, such as Maximum Likelihood (ML) [61,62]. SVM conducts classification by determining hyperplanes that optimally separates classes [63]. One hyperplane only separates two classes so multiple hyperplanes are used to classify multiple classes. Kernel functions are typically adopted to transform the data dimensionality to make it easy to identify optimum hyperplanes. There are various kernel functions including linear, polynomial, radial basis function (RBF), and sigmoid [64]. This study used RBF kernel, widely used in classification tasks, and a grid-search method was used to determine the optimum values of parameters (i.e., gamma in the kernel function and penalty parameter) [65,66].
In this study, six schemes were tested to identify the improvement of classification performance when fusing multi-sensor time series data. Accuracy assessment was conducted under each scenario using test data (300 samples). Overall accuracies of multi-class classification (six classes for site A and eight classes for site B) and binary classification (i.e., paddy rice vs. non-paddy rice) were calculated for six schemes. The six schemes were designed considering sensor types and collection dates. In order to evaluate the significant difference between classifications, the McNemar test, a non-parametric test based on the proportion correct/incorrect was conducted. Large error matrices can be collapsed to a binary size (i.e., 2 × 2) as the binary distinction between correct and incorrect class allocations should be focused for comparison [67,68]. The standardized normal test statistics (z) from the McNemar test is calculated as: where f AB indicates the samples correctly identified in classification A, but is incorrectly classified in classification B; and, f BA means the samples correctly identified in classification B, but incorrectly classified in classification A. The 5% level of significance (α = 0.05) was used for the McNemar test. Important spectral and temporal characteristics for paddy rice classification were identified from the temporal pattern of each variable and variable importance, as provided by RF. In this study, Paddy rice Mapping Index (PMI) was proposed by considering the phenological characteristics of paddy rice in NIR and SAR backscattering coefficients. NIR reflectance and SAR backscattering coefficients are low during the planting/transplanting season (May) and increase during the growing season (Figures 4 and 5). Paddy rice maps were produced from two classification models and PMI. The spatial distribution of the paddy rice maps was also compared with the reference paddy rice maps. PMI-based paddy rice mapping over a larger area (i.e., entire California State) was also conducted using Google Earth Engine.      Tables 4 and 5 show the accuracy assessment results of each approach for both multi-class and binary classifications. This experiment is repeated for 10 times, and the highest overall accuracy and kappa coefficient are shown in Tables 4 and 5. In scheme 1, summer season data showed high accuracy in both sites because of the greenness of the paddy rice. While schemes 2 and 6 showed the highest accuracy in site A (100% and 1.00) and site B (99.33% and 0.99). It is not surprising that the accuracy is high when using time series data since they provide the phenology information of paddy rice. The use of SAR data only (i.e., scheme 3) did not work well for paddy rice classification, resulting in relatively low accuracy (~91%). The McNemar test results showed that there was a significant difference between scheme 3 and the other schemes for both sites (Tables 6 and 7). It is because scheme 3 uses SAR data of all dates only for classification and showed worse accuracy than the other schemes, which indicates that optical sensor data are crucial for paddy rice classification. In site B, scheme 5 showed a significantly higher accuracy than scheme 4, which implies that data from the growing season would be more contributing than data from the no growing season (from November to March) for paddy rice classification.

Evaluation of Paddy Rice Classification
There is no big difference in classification accuracy by classification model. The McNemar test also showed that there was no significant difference between RF and SVM classifications (Tables 6 and 7). However, the accuracy of SVM was slightly higher (~1-3%) than that of RF at both of the sites. The producer's accuracy (PA) and user's accuracy (UA) of SVM were also higher than RF (Supplementary Tables S1 and S2). The classification accuracies for site A (95% (0.93) in multiclass classification, 100% (1.00) in binary classification) were higher than those for site B (88.67% (0.86) in multiclass classification, 99.33% (0.98) in binary classification). The PA and UA in site A were also higher than those in site B (Supplementary Tables S1 and S2). In comparison with site A, site B is very cloudy during summer due to the Asian Monsoon, so there were not enough Landsat data for site B during summer which is important to classify land cover and paddy rice.

Analysis of Temporal and Spectral Characteristics of Paddy Rice
Figures 4 and 5 shows the temporal patterns of nine variables (visible, NIR and SWIR bands, and SAR backscattering) for each land cover class at sites A and B, respectively. Although atmospheric correction was conducted through ENVI FLAASH, there are some fluctuations ( Figure 5). This is because the graphs are not in order of years, but in order of months to better show both the phenological and spectral patterns of the paddy rice (refer to Tables 1 and 2). Fluctuant patterns are more visible at site B, as it suffers from more severe atmospheric affects (e.g., clouds, fogs) than site A. Built-up class (refer to Table 3) generally results in a flat temporal pattern (Figure 4), but there are outliers in the built up class on 29 March, 1 June, and 3 June due to the atmospheric effect ( Figure 5). SAR backscattering coefficients were not relatively affected by atmospheric conditions and the order of years with the flat temporal pattern in built-up class (Figures 4 and 5).
There are no noticeable temporal characteristics in the visible bands (bands 1-3). The reflectance is low and shows typical characteristics, including the highest reflectance in built up areas, and the lowest reflectance in water. The reflectance of paddy rice and vegetation (e.g., field, forest, grass) increase in the NIR band during their growing season. In paddy rice, the reflectance of the NIR band shows a peak in August when photosynthesis is most active, and decreases in September to October when the color of paddy rice turns yellow (Figures 4 and 5). SWIR2 is sensitive to leaf water content, and the more leaf water content there is, the lower the reflectance of SWIR2 is. Paddy rice shows the lowest reflectance of SWIR2 during the growing season, including the transplanting season (May) when the paddy is flooded (Figures 4 and 5).
NDVI increases during the growing season in vegetation (paddy rice, grass, field, and forest), and paddy rice especially grows after transplanting (May). Contrary to NDVI, NDWI sensitive to water deficit during the growing season in vegetation. SAR backscattering shows the lowest roughness in water and the highest (brightest) in built up. Paddy rice shows a distinct phenological pattern in comparison with other land cover classes because paddy rice is flooded during the planting (transplanting) season, unlike other vegetation classes that have a continuously increased/decreased pattern (Figures 4 and 5).
While paddy rice is distinguishable between July and September at site A, vegetation related classes (paddy rice, field, forest, and grass) show a similar pattern in NIR, MIR2, NDVI, and NDWI at site B. This is because sites A and B have different climatic and environmental characteristics: data availability due to clouds (site B has a high cloud cover rate) and the composition of land cover, especially in vegetation types (site B has more forests and denser vegetation areas than site A). However, the difference between planting and harvest seasons of paddy rice is distinct in both sites, and the characteristics of paddy rice are well depicted, especially in NIR spectral bands and SAR backscattering coefficients.

Paddy Rice Mapping Index (PMI)
PMI was developed when considering the phenological characteristics of paddy rice in terms of NIR and SAR backscattering coefficients (Equations (5) and (6)). PMI means the slope of NIR or SAR backscattering coefficients between the transplanting and harvest seasons because the difference of NIR and SAR backscattering coefficients between the two seasons in paddy rice is much higher than other classes. Since optical sensor data has higher data availability at a lower cost than SAR data, NIR based PMI might have a higher practicality than SAR based PMI.
Since PMI is related to the probability of a pixel being paddy rice, it is necessary to determine an optimum threshold to identify paddy rice areas. The optimization of the threshold of NIR based PMI was conducted using reference data for the entire study areas (3,354,085 pixels at site A and 2,694,510 pixels at site B) based on the overall accuracy of binary classification (paddy rice or non-paddy rice). The optimization of SAR based PMI was not conducted due to the limited number of time series data. Figure 6 shows the change of accuracy with increasing thresholds of NIR-based PMI from −0.4 to 0.4 at interval of 0.02 for 13 years for both sites A and B (from 2004 to 2016). There are missing years due to clouds. PMI was calculated using NIR images from Landsat 5, 7, and 8. The highest accuracies were achieved when the threshold was near 0.2 for 10 years at both sites. Site A (~85%) shows higher accuracy than site B (~80%) because site A consisted of relatively homogeneous land cover, as discussed above. Figure 6c shows the box plot of optimal thresholds for 13 years at both of the sites. We determined 0.2 to be the optimal threshold of PMI, and a pixel is determined to be paddy rice when the PMI value of the pixel is greater than 0.2. In addition, it should be noted that the accuracy did not change much with the thresholds between 0.18-0.22 for both sites.
Remote Sens. 2018, 10, x FOR PEER REVIEW 13 of 22 were achieved when the threshold was near 0.2 for 10 years at both sites. Site A (~85%) shows higher accuracy than site B (~80%) because site A consisted of relatively homogeneous land cover, as discussed above. Figure 6c shows the box plot of optimal thresholds for 13 years at both of the sites. We determined 0.2 to be the optimal threshold of PMI, and a pixel is determined to be paddy rice when the PMI value of the pixel is greater than 0.2. In addition, it should be noted that the accuracy did not change much with the thresholds between 0.18-0.22 for both sites.

Comparisons of Spatial Distributions of Paddy Rice Maps
The visual comparison of paddy rice maps from RF-, SVM-, PMI (NIR)-, and PMI (SAR)-derived classification at both sites appears in Figures 7 and 8. Machine learning-based paddy rice maps (Figures 7c,d and 8c,d) were produced through scheme 6, which resulted in the best performance among the six schemes (Tables 4 and 5) using RF and SVM classifiers. There is not much difference between two classifiers at site A, while the SVM result shows clearer paddy rice parcels than RF at site B. Paddy rice was not well detected and underestimated at site B because paddy rice and field are mixed within a pixel, especially at site B (Figure 8c-f).

Comparisons of Spatial Distributions of Paddy Rice Maps
The visual comparison of paddy rice maps from RF-, SVM-, PMI (NIR)-, and PMI (SAR)-derived classification at both sites appears in Figures 7 and 8. Machine learning-based paddy rice maps (Figures 7c,d and 8c,d) were produced through scheme 6, which resulted in the best performance among the six schemes (Tables 4 and 5) using RF and SVM classifiers. There is not much difference between two classifiers at site A, while the SVM result shows clearer paddy rice parcels than RF at site B. Paddy rice was not well detected and underestimated at site B because paddy rice and field are mixed within a pixel, especially at site B (Figure 8c-f).   PMI based paddy rice maps were calculated using NIR and SAR backscattering coefficients (Equations (5) and (6)) and, the thresholds applied were 0.2 (NIR) and −0.15 (SAR). The threshold of SAR based PMI was not fully optimized because the availability of time series of SAR data was very limited, so the threshold, (−0.15), was derived by averaging one year PMI optimized thresholds from two sites. SAR based PMI produced more noisy paddy rice maps than NIR based PMI due to speckle, which is one of the limitations of SAR images [69]. The overall accuracy of binary classification (paddy rice or non-paddy rice) for the entire study areas (3,354,085 pixels at site A and 2,694,510 pixels at site B) was calculated. That classification results with two machine learning classifiers (~95% at site A and ~90% at site B) were more accurate than PMI (NIR) based results (~87% at site A and ~83% at site B). PMI based paddy rice maps were calculated using NIR and SAR backscattering coefficients (Equations (5) and (6)) and, the thresholds applied were 0.2 (NIR) and −0.15 (SAR). The threshold of SAR based PMI was not fully optimized because the availability of time series of SAR data was very limited, so the threshold, (−0.15), was derived by averaging one year PMI optimized thresholds from two sites. SAR based PMI produced more noisy paddy rice maps than NIR based PMI due to speckle, which is one of the limitations of SAR images [69]. The overall accuracy of binary classification (paddy rice or non-paddy rice) for the entire study areas (3,354,085 pixels at site A and 2,694,510 pixels at site B) was calculated. That classification results with two machine learning classifiers (~95% at site A and 90% at site B) were more accurate than PMI (NIR) based results (~87% at site A and~83% at site B).

PMI-Based Paddy Rice Mapping Using Google Earth Engine
PMI was applied to a larger area (paddy rice cultivated area in California State; Figure 9) using the threshold of 0.2. Figure 9 shows PMI based paddy rice maps on the Google Earth Engine platform. The reference paddy rice map was obtained from USDM CDL 2016. Paddy rice was well detected through PMI, although there were some false alarms that were caused by clouds. These results indicate that PMI can effectively map and monitor yearly paddy rice areas at a national or larger scale through Google Earth Engine through the improvement of computing capacity. We also tested the PMI-based paddy rice mapping for other regions where rice is cultivated (Pohang in South Korea, Acadia county in Louisiana State, and Arkansas county in Arkansas state, United State; Supplementary Figure S1). Paddy rice was well detected through PMI where the fields are flooded during the transplanting season (Pohang and Acadia County). PMI was applied to a larger area (paddy rice cultivated area in California State; Figure 9) using the threshold of 0.2. Figure 9 shows PMI based paddy rice maps on the Google Earth Engine platform. The reference paddy rice map was obtained from USDM CDL 2016. Paddy rice was well detected through PMI, although there were some false alarms that were caused by clouds. These results indicate that PMI can effectively map and monitor yearly paddy rice areas at a national or larger scale through Google Earth Engine through the improvement of computing capacity. We also tested the PMI-based paddy rice mapping for other regions where rice is cultivated (Pohang in South Korea, Acadia county in Louisiana State, and Arkansas county in Arkansas state, United State; Supplementary Figure S1). Paddy rice was well detected through PMI where the fields are flooded during the transplanting season (Pohang and Acadia County).

Discussion
Paddy rice classification was conducted by expanding the dimensionality of data on both spectral and temporal domains using two machine learning approaches, RF and SVM. The six schemes that were composed of various combinations of input data by sensor and collection date were carried out, and the effect of data dimensionality was analyzed over two different regions. The spectral and temporal characteristics of paddy rice were identified, and Paddy rice Mapping Index (PMI) was proposed based on the phenological characteristics of paddy rice.
SAR data contributed to paddy rice classification more for site A than site B, which implies that it might not be ideal to use SAR data only for paddy rice classification in areas with rugged terrains and heterogeneous land cover. Nonetheless, SAR data were able to improve classification accuracy when using with Landsat data. Although it is considered that the fusion of Landsat and SAR might not be necessary for areas with low cloud cover, such a fusion can be effective to paddy rice classification for Asia because most regions in Asia are very cloudy, especially for the growing season, which limits the availability of optical time series data.

Discussion
Paddy rice classification was conducted by expanding the dimensionality of data on both spectral and temporal domains using two machine learning approaches, RF and SVM. The six schemes that were composed of various combinations of input data by sensor and collection date were carried out, and the effect of data dimensionality was analyzed over two different regions. The spectral and temporal characteristics of paddy rice were identified, and Paddy rice Mapping Index (PMI) was proposed based on the phenological characteristics of paddy rice.
SAR data contributed to paddy rice classification more for site A than site B, which implies that it might not be ideal to use SAR data only for paddy rice classification in areas with rugged terrains and heterogeneous land cover. Nonetheless, SAR data were able to improve classification accuracy when using with Landsat data. Although it is considered that the fusion of Landsat and SAR might not be necessary for areas with low cloud cover, such a fusion can be effective to paddy rice classification for Asia because most regions in Asia are very cloudy, especially for the growing season, which limits the availability of optical time series data.
The performance of SVM was better than that of RF at both sites, which is consistent with Pouteau et al. [70] and Raczko & Zagajewski [71]. In this study, classifications were conducted in a 10-dimensional feature space for scheme 1, and more than 60-dimensional feature space for the other schemes. As SVM is known to work well in a large dimensional feature space, it could be better for SVM to classify paddy rice than RF [70]. Raczko and Zagajewski [71] stated that the number of training samples has an impact on the performance of a classifier, and Kavzoglu & Mather [72] stated that more than 400 pixels per class are needed for robust classification. In this study, a relatively small number of training samples (less than 200 pixels per class) were used, which might explain the better performance of SVM than RF because SVM works well with a small sample size as well as mixed pixels [56,73].
Land cover was more heterogeneous in site B than site A with a higher number of classes (six classes in site A and eight classes in site B). The population density of site B (249 people/km 2 ) was about four times more than that of site A (61 people/km 2 ), and classes were more patched and thus more mixed within a pixel in site B than site A. Those factors influenced the classification accuracy at site B. The size of paddy rice fields needs to be considered in the future research as discussed in Dong et al. [17]. Higher resolution data, such as RapidEye, are needed to classify paddy rice where the size of paddy rice field is small, such as in site B.
The approach proposed in this study have benefited from the use of phenological characteristics of paddy rice (i.e., time series data). The phenology of paddy rice in terms of various spectral bands and vegetation indices can be used for numerous applications because such phenology is a relative characteristic that varies by region. There are many fields that use the phenological characteristics of crops (e.g., paddy rice), such as classifying different vegetation (crop) types [74,75], crop yield estimation [76,77], gross and net primary production (GPP/NPP) estimation [78], and quantifying crop water requirements [79]. In addition, the approach that was developed in this study can be applied to any inaccessible regions where in situ data are not available, such as North Korea. For example, previous studies conducted classification over inaccessible regions applying training samples from different areas that have similar climate and environmental characteristics [80] or unsupervised classification [81].
While machine learning-based classification requires training data, which is often challenging for certain areas, PMI-based paddy rice mapping does not require any in situ data for training. Although the validation accuracy of PMI based results was relatively lower than those of the machine learning approaches, the spatial distribution of paddy rice well matched that of the reference paddy rice data. PMI can be applied to different regions with the same threshold (greater than 0.2) or an adjusted threshold considering the dates of data acquisition. Consequently, PMI is considered to be useful to map paddy rice because PMI is calculated using a very simple equation and needs only two Landsat images or similar satellite data with NIR bands during the planting and harvest seasons. An annual update of paddy rice maps is possible using PMI. However, the availability of optical sensor data due to cloud contamination could be a limiting factor, especially for East Asia. This limitation can be overcome using geostationary satellite data, such as planned Geostationary Ocean Color Imager (GOCI)-2, which have high temporal resolution (~hourly) with 250 m spatial resolution. It is expected that fully clear images during two seasons can be successfully obtained every year using geostationary satellite data.
Paddy rice mapping at a regional or continental scale can be conducted using PMI through Google Earth Engine with an aid of the continued improvement of computing capacity. However, in this study, paddy rice detected using PMI was somewhat confused with other crops where paddy rice is planted in dry fields. It is one of the limitations of using PMI, in that PMI can be successfully applied only where paddy rice is planted in flooded fields.

Conclusions
As the population in the world increases, the demand of rice also increases. Thus, providing accurate and up-to-date paddy rice maps is very important. A satellite-based approach produces paddy rice maps with a fast update cycle in comparison with the traditional survey approach. In this study, Landsat, SAR, and SRTM data were used to classify paddy rice at two sites that have different climatic and environmental characteristics. This study evaluated six schemes to identify the improvement of classification performance through the investigation of the characteristics of study areas, classifiers, and fusion of multi-sensor time series data. As expected, the classification accuracy was the highest when using multi-sensor time series data (scheme 6). Site A showed better performance than site B because Site A consisted of more homogeneous land cover and had higher data availability during the growing season than site B. The performance of SVM classifier was slightly better (accuracy of 1-3%) than RF, especially at site B because SVM works well in a small sample size and at a mixed pixel. Paddy Rice Mapping Index (PMI) considering spectral and phenological characteristics of paddy rice was proposed in this study. NIR and SAR backscattering coefficients detected the phenological characteristic of paddy rice well. Although the performance of PMI is slightly worse than the RF or SVM classifications based on multi-sensor time series data, PMI is useful to operationally map paddy rice, because PMI is simply calculated only using two scenes and does not require training data, which implies that the PMI-based approach can be applied to different regions. In addition, paddy rice can be easily mapped using the Google Earth Engine platform.
There are some limitations in the use of PMI. PMI can be applied only where paddy rice areas are flooded during the transplanting season. As the current threshold, greater than 0.21, based on the results for 13 years is a little arbitrary, more robust and adaptive thresholding approaches should be considered. There are some misclassified pixels due to the heterogeneous composition of the land cover within a pixel at site B. In further studies, the size of paddy rice fields need to be considered when selecting satellite images, and higher resolution data, such as RapidEye or Worldview series, need to be evaluated to detect small size paddy rice patches.