Subpixel Inundation Mapping Using Landsat-8 OLI and UAV Data for a Wetland Region on the Zoige Plateau, China

Wetland inundation is crucial to the survival and prosperity of fauna and flora communities in wetland ecosystems. Even small changes in surface inundation may have a substantial impact on the characteristics and function of the wetland ecosystem. This study presented a novel method for wetland inundation mapping at a subpixel scale in a typical wetland region on the Zoige Plateau, northeast Tibetan Plateau, China, by combining unmanned aerial vehicle (UAV) and Landsat-8 Operational Land Imager (OLI) data. A reference subpixel inundation percentage (SIP) map at the Landsat-8 OLI 30 m pixel scale was first generated using high resolution UAV data (0.16 m). The reference SIP map and Landsat-8 OLI imagery were then used to develop SIP estimation models using three different retrieval methods: linear spectral unmixing (LSU), artificial neural networks (ANN), and regression tree (RT). Based on observations from 2014, the model developed with the RT method provided the best fit for mapping wetland SIP (R² = 0.933, RMSE = 8.73%) compared to the other two methods. The proposed RT model was validated with observations from 2013, and the estimated SIP was highly correlated with the reference SIP, with an R² of 0.986 and an RMSE of 9.84%. This study highlighted the value of combining high resolution UAV data with globally and freely available Landsat data, using the developed approach, for monitoring fine, gradual inundation change patterns in wetland ecosystems.


Introduction
Wetlands are among the most important types of ecosystems and perform many vital functions, including water storage and purification, flood and erosion control, shoreline protection, conservation of biological diversity, and provision of wildlife habitat and fishery resources for human communities [1-3]. As one of the most important abiotic factors, the wetland inundation extent largely governs the function of the wetland ecosystem and its consequent effects on the interactions between the land and the atmosphere [4]. In addition, it is well known that even small changes in the inundation regime may have a substantial impact on ecosystem characteristics and function [5]. Therefore, accurate mapping of wetland inundation is required to capture wetland inundation dynamics, to understand the influences of climate change and human activities, and to monitor the responses of the terrestrial ecosystem [6,7].
To ascertain wetland inundation changes at a regional scale, remote sensing has proven to be an economical and efficient tool [8,9]. Traditionally, land cover classification of remote sensing images is a popular way of defining the land surface characteristics of each observed pixel. However, this approach is often ineffective or involves a high level of uncertainty because it assigns a single land cover type to each pixel, which is especially problematic for moderate spatial resolution pixels. Rather than classifying each pixel as water surface or non-water surface, it is more appropriate to label the proportion of wetland inundation within each pixel at a subpixel scale. To estimate subpixel land cover proportions from remotely sensed data, different methods have been developed based on the pixel signal from remote sensing observations and the spectral differences between land surface components [5,10-12].
Generally, according to how they simulate the pixel signal, these methods can be grouped into three categories: physical-based models, spectral mixture models, and regression models [13]. Physical-based models are often complex and are generally premature for application to land cover [14]; therefore, they were not used in this study. Among spectral mixture models, the linear spectral unmixing (LSU) method is commonly used. It assumes that there is no significant multiple scattering between the different surface components, so that the pixel signal is a linear combination of the signals from each surface component (endmember). Each endmember must be predefined with this method. LSU was originally designed to identify the percent distribution of different land covers in coarse/medium resolution imagery (e.g., Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat-8 Operational Land Imager (OLI) images) [15-17]. This method has been widely used to derive the proportions of each endmember in a mixed pixel because of its ease of use and its reasonable physical meaning in interpreting spectral mixing [18,19]. Compared to physical-based and spectral mixture models, regression models do not require prior knowledge and are able to directly yield interpretable results on the subpixel proportions. Therefore, many regression models, both linear and nonlinear, have been developed to downscale mixed pixels [20]. Among these, the artificial neural network (ANN) and regression tree (RT) are typical nonlinear regression methods, which have been used to retrieve subpixel surface component proportions in a variety of studies with satisfactory results [5,10,21,22].
Several studies have assessed the performance of these three model types. Halabisky et al. [11] used the LSU method to reconstruct semi-arid wetland surface water dynamics based on a time series of Landsat satellite images from 1984 to 2011, and the LSU method worked well even for small wetlands (<1800 m²). Weng and Hu [23] applied both the ANN and LSU methods to estimate impervious surfaces with medium spatial resolution satellite images from the Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and the Landsat-7 Enhanced Thematic Mapper Plus (ETM+). The results clearly demonstrated the superior performance of the ANN method over the LSU method, owing to the ability of ANN to handle the nonlinear mixing of the image spectrum. Huang et al. [5] applied the RT method to retrieve the wetland subpixel inundation percentage with Landsat data and airborne Light Detection and Ranging (LiDAR) intensity data and found that the RT model performed better than stepwise linear regression. These methods have thus been applied in a range of fields. However, few studies have examined and compared the effectiveness of current typical subpixel unmixing methods for estimating the wetland subpixel inundation percentage (SIP). Meanwhile, an efficient, accurate, and robust method for monitoring the continuous distribution of wetland inundation over a large area is urgently needed to track wetland dynamics and monitor wetland degradation [24]. Therefore, this study aims to assess the potential and effectiveness of the three typical methods (LSU, RT, and ANN) for estimating wetland SIP and to propose the best method for mapping wetland SIP.
In this study, highly accurate reference SIP data are essential for a reliable assessment of the different methods. Huang et al. [5] used LiDAR intensity data to produce 1 m resolution inundation maps and to obtain the reference SIP at the Landsat 30 m resolution scale. Their multi-year estimation results showed this procedure to be a viable option. However, the routine use of LiDAR data for mapping wetlands and their dynamics is not feasible in many areas because of the high cost and limited availability of the data, which is a significant obstacle to applying the method. The emergence of unmanned aerial vehicle (UAV) technology in recent years offers a new opportunity to map wetland inundation at the sub-meter scale. As an important complement to traditional remote sensing platforms, UAV technology has significant advantages: it is small, has low to moderate costs, is flexible, and does not need highly trained pilots. In addition, the high spatial resolution of UAV images partly ensures the purity of each image pixel and provides a direct way to distinguish water bodies from non-water features, such as small inundations, grass, and soil patches, which cannot be detected with conventional manned aerial photography or satellite imagery [25]. UAV-based images can be used to produce high-resolution inundation maps that bridge the gap between ground-based wetland inundation information and medium resolution satellite data. Therefore, another purpose of this study is to investigate the feasibility of using UAV-based data for wetland SIP mapping in combination with medium resolution Landsat data.
To serve these purposes, Landsat-8 OLI data and UAV-based data of a typical wetland region on the Zoige Plateau in China were used in this paper to compare the performance of the three typical methods (ANN, LSU, and RT) for estimating the SIP of wetlands at the Landsat pixel scale. The ultimate goal is to provide a reliable approach for long-term wetland monitoring to support management decision making in terms of balancing economic interests and nature conservation.

Study Area
The study area is a typical wetland degradation transect in the Zoige Wetland National Nature Reserve on the Zoige Plateau (Figure 1). The plateau, located in the northeastern part of the Tibetan Plateau in China, is the largest high-altitude marsh area in the world [26]. The transect, running from the southwest to the northeast, has an obvious water gradient that induces land surface changes from the large water surface of the lake, through permanent wet marshes close to the lake and a transition zone covered by meadows that are temporarily inundated by water, to permanently dry grasslands (Figure 2). The vegetation in this region is dominated by Kobresia tibetica, Carex muliensis, and Festuca nivina [27]. The inundation of the transition zone is strongly affected by climate conditions, especially by precipitation in the region. The regulation of the dam of the lake also greatly influences the water level of the lake and the inundation of the wetland. The Zoige wetlands, which have an important ecological function of retaining water upstream of the Yellow River [28], are inundated or saturated for a relatively short period in the summer. The period of this area has the highest groundwater level and is largely inundated, which usually occurs around

Overview of the Approaches
In this paper, the proposed approach for modeling SIP using the Landsat-8 OLI data and UAV data consisted of three major steps. In the first step, the UAV image was prepared, preprocessed, and classified by an object-based image analysis to derive the UAV land cover map. The UAV land cover map was then aggregated to the OLI pixel grid to create the reference 30 m SIP map. In the second step, the available surface reflectance of the Landsat-8 OLI image was preprocessed, and the spectral indices were calculated in preparation for the modeling. Finally, the UAV-based reference SIP for 2014 was used to train and evaluate the three SIP models (LSU, ANN, and RT) and to select the optimal model. The prediction ability of the best model was validated against the UAV-based reference SIP of 2013. The best model was then used to predict the wetland SIP beyond the transect, across the Zoige Wetland Nature Reserve of the Zoige Plateau. Figure 3 shows a flowchart of these steps.
Remote Sens. 2017, 9, 31

Figure 3. Flowchart of the overall approach for mapping the subpixel inundation percentage (SIP) using Landsat-8 Operational Land Imager (OLI) and unmanned aerial vehicle (UAV) data. The flowchart is organized into Step 1, Step 2, and Step 3, ending with best-method validation and application. RT, ANN, and LSU are the abbreviations of regression tree, artificial neural networks, and linear spectral unmixing, respectively.

Deriving the Reference SIP from the UAV Data
Because the Zoige wetland usually reaches its highest annual inundation level in the summer, and to keep the UAV data consistent in time with the satellite observations, nearly simultaneous UAV-based remote sensing experiments were conducted on 31 July 2013 and 25 July 2014. In 2014, the UAV-based observation occurred one day prior to the Landsat-8 observation. In 2013, the UAV observation lagged the Landsat-8 observation by approximately one week. Because the acquisition dates of the 2014 UAV dataset and Landsat image were closer than those of 2013, the 2014 UAV imagery was used for model development and the comparison study, and the 2013 UAV imagery was used to validate the best-performing method.

UAV Data
A fixed-wing UAV (Freebird, China), which has better wind resistance than a rotary-wing UAV, was chosen to acquire the images. The onboard sensor is a digital camera (Canon 5D Mark II) with three optical bands (red, green, and blue). During the experiment, the flight height of the UAV was set to 800 m above ground level (4250 m a.s.l.) over the transect to provide a high spatial resolution at ground level (0.16 m). The 80% forward lap and 60% side lap of the photography system guarantee that there are no gaps between neighboring images. A total of 265 pictures were obtained to cover the transect area, which is 7 km long and 1 km wide. Subsequently, geometric/topographic corrections were applied to generate the digital orthophoto map (DOM) using MAP-AT software, which can automatically process the acquired imagery and altitude data using the ground control points (GCPs), synchronized GPS positions, and the roll, pitch, and yaw of each image. The GCPs were collected at surface features that are easily distinguished or have a large color contrast compared to nearby features (e.g., houses, road intersections, or artificial white plates). GCPs on artificial white plates were measured at the center of the plate by real-time kinematic (RTK) positioning. Each GCP was measured three times, and the mean value was taken as the coordinate of the point.
Figure 4 shows the UAV image of the study area (6.4 × 1.2 km²) acquired in 2014. Five eagle-eye images were selected to show the unique ability of the UAV data to present fine inundation conditions at different locations. Sites A and B are permanent wetland located in the lake. Site C is located at the edge of the lake, and sites D and E in the transition zone are meadows that are temporarily inundated.

Reference SIP Extraction from the UAV Data
Because of the high spatial resolution of the UAV data, it is reasonable to label each UAV image pixel as either water surface or non-water surface and then to calculate the SIP at the Landsat-8 OLI pixel scale. An object-based image analysis (OBIA) was used for UAV image classification because geometrical and contextual features can be incorporated into the classification [29,30]. The approach segments the UAV image into ecological patches and then applies a decision tree model at the object level, which improves the classification accuracy.
Based on the UAV classification map, the reference SIP can be obtained through spatial aggregation. The general form of the spatial aggregation is as follows:
$$SIP_r = \frac{\sum_{i=1}^{n} S_{w,i}}{S_{oli}} \times 100\% \qquad (1)$$
where $SIP_r$ is the reference SIP for a 30 m grid cell of the Landsat image; $S_{w,i}$ is the area of the i-th UAV water pixel (m²); n is the number of water pixels of the UAV classification map within the 30 m grid cell; and $S_{oli}$ is the area of a 30 m grid cell of the Landsat-8 OLI image (m²).
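The spatial aggregation can be illustrated with a minimal Python sketch. It assumes a binary water mask in which all UAV pixels have the same area, an integer number of UAV pixels per coarse cell, and perfectly aligned grids. These are simplifying assumptions: 30 m divided by 0.16 m is not an integer, so a real workflow would need resampling or an area-weighted overlay. The function name `aggregate_sip` and the toy mask are hypothetical.

```python
import numpy as np

def aggregate_sip(water_mask, block):
    """Return SIP (%) per coarse cell from a binary fine-resolution water mask.

    water_mask: 2-D array, 1 = water, 0 = non-water (one value per UAV pixel).
    block: number of UAV pixels along one side of a coarse cell (assumed integer).
    """
    mask = np.asarray(water_mask, dtype=float)
    rows = (mask.shape[0] // block) * block  # drop incomplete cells at the edges
    cols = (mask.shape[1] // block) * block
    cells = mask[:rows, :cols].reshape(rows // block, block, cols // block, block)
    # With equal-area fine pixels, water area / cell area is the mean of the mask.
    return 100.0 * cells.mean(axis=(1, 3))

# Toy example: a 4 x 4 UAV mask aggregated to 2 x 2 coarse cells.
demo_mask = np.zeros((4, 4), dtype=int)
demo_mask[:2, :2] = 1   # top-left coarse cell is fully water
demo_mask[2, 2] = 1     # one of four fine pixels in the bottom-right cell
demo_sip = aggregate_sip(demo_mask, block=2)
```

In this toy case the top-left cell is fully inundated and the bottom-right cell is one quarter inundated.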

Landsat-8 OLI Data
Two Landsat-8 OLI scenes, acquired on 26 July 2014 and 23 July 2013 from path 131 and row 37, were used as the satellite data for wetland SIP mapping in this study. They were downloaded from the USGS National Center for Earth Resources Observation and Science (EROS) [31], where the Landsat-8 Surface Reflectance and Spectral Indices products are provided. Therefore, no further radiometric calibration or atmospheric correction was needed for the datasets. Although the geolocation accuracy of the Landsat-8 OLI image is better than 0.4 pixels [32], the remaining subpixel systematic geometric error should not be ignored in a subpixel analysis. The Landsat-8 OLI images were therefore georeferenced using easily distinguished houses and road intersections from the UAV images. Co-registration was also performed for the Landsat-8 OLI images according to the coherence feature in the spectral-spatial feature space based on the Landsat image.

Variables Derived from Landsat Data
Previous studies have found that the original Landsat bands and a suite of indices are useful for inundation modeling [5]. Therefore, the surface reflectance (band 1-7), two vegetation indices (normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI)), and four normalized difference water indices (NDWI_1, NDWI_2, NDWI_3, and MNDWI) were selected as the explanatory variables for model construction with the ANN and RT methods. The equations for calculating the indices are provided in Table 1.
Table 1 (body not recovered; the surviving reference entry is Xu [38]). Note: band 2 to band 6 are the surface reflectance of each band.
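As an illustration of how such explanatory variables are computed, the following Python sketch evaluates NDVI, EVI, and MNDWI from surface reflectance using their standard textbook definitions and the Landsat-8 OLI band order (band 2 blue, band 3 green, band 4 red, band 5 near infrared, band 6 shortwave infrared 1). The definitions of NDWI_1, NDWI_2, and NDWI_3 are not recoverable from the text and are therefore omitted. The function name and the reflectance values are hypothetical, and zero denominators are not handled.

```python
import numpy as np

def spectral_indices(blue, green, red, nir, swir1):
    """Compute NDVI, EVI, and MNDWI from surface reflectance arrays (0-1 scale)."""
    blue, green, red, nir, swir1 = (
        np.asarray(b, dtype=float) for b in (blue, green, red, nir, swir1)
    )
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    mndwi = (green - swir1) / (green + swir1)
    return {"NDVI": ndvi, "EVI": evi, "MNDWI": mndwi}

# Two made-up pixels: index 0 is water-like, index 1 is dense-grass-like.
demo_indices = spectral_indices(
    blue=[0.05, 0.03],
    green=[0.05, 0.08],
    red=[0.04, 0.05],
    nir=[0.02, 0.45],
    swir1=[0.01, 0.25],
)
```

For these made-up values the water-like pixel has a negative NDVI and a positive MNDWI, and the grass-like pixel shows the opposite pattern, which is why such indices help separate inundated from vegetated surfaces.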

Modeling Approaches
Of the three methods investigated for estimating the SIP at the Landsat pixel scale, the LSU method derives the SIP directly from the Landsat data by finding the spectral endmembers with the help of the UAV data, whereas the other two methods fit a functional relationship between the Landsat-8 spectral bands and spectral indices and the UAV-based reference SIP.
(1) LSU method
LSU is a commonly used approach to derive the fraction of each land cover (endmember) in the mixed pixels of remotely sensed imagery. The approach assumes that the spectral signal of a pixel measured by the sensor is a linear combination of the pure spectra of the different endmembers, weighted by their corresponding fractional cover in the pixel [12,39].
In this study, the image-based method, which selects endmembers from the satellite image itself, was adopted because of its ease of use and because the endmember spectra are derived at the same scale as the original image. When one land cover type of the UAV classification map accounted for more than 95% of a 30 m grid cell of the Landsat data, the corresponding Landsat pixel was treated as a pure pixel and as a candidate endmember. A set of candidate endmember spectra was then prepared. Finally, four endmembers were chosen to represent the overall land cover condition of the study area: (1) Soil; (2) Grass1; (3) Grass2; and (4) Water. Two grass classes were defined separately to account for the water underneath the grass: Grass1 represents full grass cover on land, while Grass2 represents full grass cover growing in water. The LSU method was then used to decompose the image based on the four endmember spectra. The equation can be described as:
$$\rho(\lambda_i) = f_{soil}\,\rho_{soil}(\lambda_i) + f_{grass1}\,\rho_{grass1}(\lambda_i) + f_{grass2}\,\rho_{grass2}(\lambda_i) + SIP\,\rho_{water}(\lambda_i) + \varepsilon(\lambda_i)$$
where i = 1, ..., 7 indexes the spectral bands; $\rho(\lambda_i)$ is the spectral reflectance in band i of a pixel that contains the four endmembers; $\rho_{soil}(\lambda_i)$, $\rho_{grass1}(\lambda_i)$, $\rho_{grass2}(\lambda_i)$, and $\rho_{water}(\lambda_i)$ are the known spectral reflectances of the four endmembers in band i; $f_{soil}$, $f_{grass1}$, $f_{grass2}$, and SIP are the proportions of the endmembers soil, grass1, grass2, and water within the pixel, respectively; and $\varepsilon(\lambda_i)$ is noise, which can be interpreted as a measurement error for band i [12]. During the fitting process, additional constraints should be imposed to obtain a physically meaningful result: the endmember fractions should be non-negative and sum to 1.
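The constrained unmixing can be sketched in Python as follows. This is not the solver used in the study, which the text does not specify. It is a brute-force illustration that is practical only for a handful of endmembers: for every subset of endmembers it solves the sum-to-one least squares problem in closed form through the KKT system, discards solutions with negative fractions, and keeps the feasible solution with the smallest residual. The endmember spectra below are invented for illustration, and the name `unmix_fcls` is hypothetical.

```python
import itertools
import numpy as np

def unmix_fcls(endmembers, pixel):
    """Non-negative, sum-to-one least squares unmixing by subset enumeration.

    endmembers: (n_bands, n_end) matrix of endmember spectra.
    pixel: (n_bands,) reflectance of the mixed pixel.
    Returns the (n_end,) vector of fractions.
    """
    n_end = endmembers.shape[1]
    best_f, best_err = None, np.inf
    for k in range(1, n_end + 1):
        for subset in itertools.combinations(range(n_end), k):
            idx = list(subset)
            E = endmembers[:, idx]
            # KKT system of: min ||E f - pixel||^2  subject to  sum(f) = 1
            kkt = np.zeros((k + 1, k + 1))
            kkt[:k, :k] = 2.0 * E.T @ E
            kkt[:k, k] = 1.0
            kkt[k, :k] = 1.0
            rhs = np.append(2.0 * E.T @ pixel, 1.0)
            f_sub = np.linalg.lstsq(kkt, rhs, rcond=None)[0][:k]
            if np.any(f_sub < -1e-9) or abs(f_sub.sum() - 1.0) > 1e-6:
                continue  # infeasible for this subset
            f = np.zeros(n_end)
            f[idx] = np.clip(f_sub, 0.0, None)
            err = float(np.sum((endmembers @ f - pixel) ** 2))
            if err < best_err:
                best_f, best_err = f, err
    return best_f

# Invented 7-band spectra; columns are soil, grass1, grass2, water.
ENDMEMBERS = np.array([
    [0.10, 0.03, 0.03, 0.05],
    [0.12, 0.04, 0.04, 0.05],
    [0.15, 0.08, 0.06, 0.04],
    [0.20, 0.05, 0.04, 0.03],
    [0.28, 0.45, 0.30, 0.02],
    [0.35, 0.25, 0.12, 0.01],
    [0.30, 0.12, 0.05, 0.01],
])
true_fractions = np.array([0.2, 0.1, 0.3, 0.4])
mixed_pixel = ENDMEMBERS @ true_fractions      # noise-free synthetic pixel
estimated = unmix_fcls(ENDMEMBERS, mixed_pixel)
sip_percent = 100.0 * estimated[3]             # water fraction expressed as SIP
```

With noise-free synthetic input the known fractions are recovered. With real imagery the residual term is nonzero and the quality of the result depends strongly on how representative the endmember spectra are.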

(2) ANN method
The back-propagation neural network (BPNN) is one of the most widely used artificial neural networks (ANN) in the remote sensing community [40]. In the BPNN structure used in this study, 13 variables (band 1 to band 7, EVI, NDVI, NDWI_1, NDWI_2, NDWI_3, and MNDWI) formed the 13 neurons of the input layer. The single node of the output layer was SIP. The number of nodes in the hidden layer was optimized by separating the whole dataset into training and validation data. To minimize the undesired effects of the random initialization of the optimization routine, the entire cross-validation procedure was repeated 1000 times. All parameters were determined by trial and error and represent the average values of these 1000 simulations. The validation results in Figure 5 show that the root mean squared error (RMSE) became stable when the number of nodes was set to 12 or greater. Therefore, the number of hidden layer nodes was set to 12 in this study. A unipolar sigmoid transfer function was selected between the input layer and the hidden layer, and a linear transfer function was used in the output layer. The network was trained with the Levenberg-Marquardt algorithm (TrainLM), a common training algorithm for BPNN. The early stopping technique was used to avoid overfitting [41].
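The network structure described above (13 inputs, 12 sigmoid hidden nodes, one linear output) can be sketched in Python with NumPy. Two departures from the text are deliberate and should be read as assumptions: the sketch trains with plain full-batch gradient descent rather than Levenberg-Marquardt, and it omits early stopping and the repeated cross-validation. The synthetic data, learning rate, epoch count, and function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in=13, n_hidden=12):
    """Small random weights for a 13-12-1 network."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Sigmoid hidden layer followed by a linear output node."""
    hidden = sigmoid(X @ params["W1"] + params["b1"])
    return hidden, (hidden @ params["W2"] + params["b2"]).ravel()

def mse(params, X, y):
    return float(np.mean((forward(params, X)[1] - y) ** 2))

def train(params, X, y, lr=0.05, epochs=300):
    """Back-propagation with gradient descent (stand-in for Levenberg-Marquardt)."""
    n = len(y)
    for _ in range(epochs):
        hidden, pred = forward(params, X)
        err = (pred - y)[:, None]                        # shape (n, 1)
        grad_W2 = 2.0 * hidden.T @ err / n
        grad_b2 = 2.0 * err.mean(axis=0)
        d_hidden = (err @ params["W2"].T) * hidden * (1.0 - hidden)
        grad_W1 = 2.0 * X.T @ d_hidden / n
        grad_b1 = 2.0 * d_hidden.mean(axis=0)
        params["W1"] -= lr * grad_W1
        params["b1"] -= lr * grad_b1
        params["W2"] -= lr * grad_W2
        params["b2"] -= lr * grad_b2
    return params

# Synthetic stand-in data: 13 predictors and a target scaled to 0-1.
X_demo = rng.uniform(0.0, 1.0, (200, 13))
y_demo = X_demo[:, :3].mean(axis=1)
net = init_params()
loss_before = mse(net, X_demo, y_demo)
net = train(net, X_demo, y_demo)
loss_after = mse(net, X_demo, y_demo)
pred_demo = forward(net, X_demo)[1]
```

The target is scaled to the range 0 to 1 here only to keep the gradient steps stable; predictions can be multiplied by 100 to express SIP in percent.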
(3) RT method
The RT method, which produces a rule-based regression model from training data, can fit a complex nonlinear relationship under different rules. Each rule contains one or more conditions under which a linear sub-model is established. The RT method can approximate a nonlinear relationship between predictive and target variables without a priori knowledge, and it accepts both continuous and discrete input variables [5,42]. Such approaches have proven more effective than simple techniques, including multivariate linear regression, and are also easier to interpret than neural networks [13]. In this study, Cubist [43] was used to implement the RT method and to model SIP.
Cubist is a tool for generating rule-based predictive models that balance accurate predictions against intelligibility. This package has been used successfully in many studies [5,41,42,44,45].
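To make the idea of conditions paired with linear sub-models concrete, the following Python sketch evaluates a small hand-written rule set. It does not call Cubist and does not reproduce the rules fitted in the study: the rules, thresholds, and coefficients are invented. Averaging the predictions of all rules that apply to a sample is an assumption about how overlapping rules would be combined, and clipping to the 0-100% range is an added convenience.

```python
import operator

OPS = {">": operator.gt, "<=": operator.le}

# Invented rules: each has conditions and a linear sub-model for SIP (%).
RULES = [
    {"conditions": [("MNDWI", ">", 0.2)],
     "intercept": 60.0, "coefs": {"MNDWI": 50.0}},
    {"conditions": [("MNDWI", "<=", 0.2), ("NDVI", ">", 0.5)],
     "intercept": 5.0, "coefs": {"MNDWI": 10.0}},
    {"conditions": [("MNDWI", "<=", 0.2), ("NDVI", "<=", 0.5)],
     "intercept": 20.0, "coefs": {"MNDWI": 40.0, "NDVI": -10.0}},
]

def rule_applies(rule, sample):
    """True when every condition of the rule holds for the sample."""
    return all(OPS[op](sample[name], value) for name, op, value in rule["conditions"])

def predict_sip(sample, rules=RULES):
    """Average the linear sub-models of all applicable rules, clipped to 0-100."""
    outputs = [
        rule["intercept"] + sum(c * sample[name] for name, c in rule["coefs"].items())
        for rule in rules
        if rule_applies(rule, sample)
    ]
    if not outputs:
        raise ValueError("no rule applies to this sample")
    return min(100.0, max(0.0, sum(outputs) / len(outputs)))

wet_pixel = predict_sip({"MNDWI": 0.6, "NDVI": -0.1})
grass_pixel = predict_sip({"MNDWI": -0.3, "NDVI": 0.8})
```

Because each rule is a readable condition plus a linear formula, a fitted model of this kind can be inspected directly, which is the interpretability advantage noted above.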
In this paper, the approach for modeling SIP with the RT method consisted of data preparation, modeling, and model application. First, the 30 m reference SIP data were derived from the UAV classification map by spatial aggregation. The UAV-based reference SIP was then divided into training (70%) and evaluation (30%) data. Second, Cubist was used to model the SIP based on the reference SIP and the 13 variables (band 1 to band 7, EVI, NDVI, NDWI_1, NDWI_2, NDWI_3, and MNDWI), and the accuracy of the constructed model was evaluated with the remaining data. Finally, the constructed model was applied to the area beyond the transect.

Evaluation Metrics
In the method comparison study, the training and test accuracy of the LSU, ANN, and RT methods were measured using the coefficient of determination (R²) and the RMSE. R² reflects the correlation between the reference SIP and the predicted SIP. RMSE measures the absolute difference between the estimated and reference values. Additionally, to validate the spatial accuracy of the prediction, the mean absolute error (MAE) was adopted to compare the predicted SIP map to the reference SIP map at the pixel level and the image level. The MAE can be calculated as:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|SIP_{p,i} - SIP_{r,i}\right|$$
in which i = 1, 2, ..., n indexes the pixels, n is the total number of pixels, and $SIP_{p,i}$ and $SIP_{r,i}$ are the predicted and reference SIP values of pixel i, respectively.
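The three metrics can be computed with a few lines of Python. R² is implemented here as the squared Pearson correlation between predicted and reference SIP, which matches the correlation-based description above but is an assumption; some studies instead use 1 minus the ratio of residual to total sum of squares. The sample values are invented.

```python
import numpy as np

def r_squared(pred, ref):
    """Squared Pearson correlation between predicted and reference values."""
    r = np.corrcoef(np.asarray(pred, float), np.asarray(ref, float))[0, 1]
    return float(r ** 2)

def rmse(pred, ref):
    """Root mean squared error, in the same units as the inputs."""
    diff = np.asarray(pred, float) - np.asarray(ref, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(pred, ref):
    """Mean absolute error, in the same units as the inputs."""
    diff = np.asarray(pred, float) - np.asarray(ref, float)
    return float(np.mean(np.abs(diff)))

# Invented SIP values in percent.
ref_sip = [0.0, 50.0, 100.0]
pred_sip = [10.0, 50.0, 90.0]
demo_metrics = (r_squared(pred_sip, ref_sip), rmse(pred_sip, ref_sip), mae(pred_sip, ref_sip))
```

The invented example also shows why R² alone is insufficient: the predictions are perfectly correlated with the reference, yet the RMSE and MAE are clearly nonzero because the predictions are compressed toward the middle of the range.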

Reference SIP
Based on the UAV image, the classification result was derived using the OBIA and decision tree methods. The area percentages of water, grass, and soil are 14.36%, 71.22%, and 14.42%, respectively, as illustrated in Figure 6. From southwest to northeast, the map shows the large water surface of the lake, some small ditches, bare soil that is temporarily inundated by water, and grass distributed among them. An independent accuracy assessment was performed using a stratified random sample of 236 points within the classification transect. Table 2 shows the producer's accuracy (PA), user's accuracy (UA), overall accuracy, and the kappa coefficient. All categories have relatively high PA, from 91.07% to 95.65%, mainly because the alpine land cover is relatively simple and homogeneous. Soil has the lowest accuracy (91.07%) for both PA and UA. This can be partly explained by the spectral confusion between soil, sparse grass cover, and shallow water. The overall accuracy and kappa coefficient are 93.34% and 0.9, respectively. In addition, the accuracy of the classification map was further validated with ground-based measurements taken during the synchronous satellite-aircraft-ground experiment over the transect, and the classification map is consistent with the field conditions. The accuracy assessment generally confirms that the classification result is reliable enough to produce the reference SIP on Landsat-8 OLI pixels for further analysis.
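For readers unfamiliar with these accuracy measures, the Python sketch below derives PA, UA, overall accuracy, and the kappa coefficient from a confusion matrix. The counts are invented and do not reproduce Table 2. The layout (rows are classified classes, columns are reference classes) is an assumed convention.

```python
import numpy as np

def accuracy_report(confusion):
    """Accuracy measures from a confusion matrix.

    Rows are classified (map) classes and columns are reference classes.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    row_sums = cm.sum(axis=1)   # pixels mapped to each class
    col_sums = cm.sum(axis=0)   # reference pixels of each class
    overall = diag.sum() / total
    expected = (row_sums * col_sums).sum() / total ** 2  # chance agreement
    return {
        "users": diag / row_sums,
        "producers": diag / col_sums,
        "overall": overall,
        "kappa": (overall - expected) / (1.0 - expected),
    }

# Invented counts for three classes (water, grass, soil).
demo_cm = [[50, 2, 1],
           [3, 40, 2],
           [2, 3, 47]]
demo_report = accuracy_report(demo_cm)
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance given the class proportions.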
To obtain the reference SIP map, the classification map was overlaid on the 30 m grid of the Landsat data, and the SIP of each 30 m pixel was calculated using Equation (1). The pixel values of the SIP map range from 0% (non-water surface) to 100% (water surface) (Figure 7). The histogram of the SIP values shows that pure water and pure non-water pixels at the 30 m resolution occupy a large part of the map: 75.82% of the pixels have an SIP value of less than 5%, and 6.27% of the pixels have an SIP value greater than 95%.
As noted above, a large share of the pixels lies at the two ends of the SIP range (close to 0% and 100%). If all pixels were involved in the estimation process, these easily recognizable water and non-water pixels would dominate the assessment and produce an unrealistically high overall accuracy that does not represent the real performance of the methods. What matters is that the methods estimate the correct SIP value for pixels that are only partly covered by water. Therefore, a stratified sampling method was adopted in this study according to the distribution of the SIP. The binning ranges were chosen based on a heuristic analysis of the SIP predictions to ensure that each SIP bin represents the spectral and spatial variability of different SIP values.
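The aggregation and sampling steps can be sketched as follows. The code assumes that the SIP of a coarse pixel is the percentage of fine-resolution cells classified as water, which is how Equation (1) is read here. The helper names, the integer aggregation factor, and the bin edges are hypothetical. In practice 30 m / 0.16 m is not an integer, so a georeferenced overlay would be needed instead of simple block averaging.

```python
import numpy as np

def block_sip(water_mask, factor):
    """Percentage of water cells inside each factor x factor block."""
    mask = np.asarray(water_mask, dtype=float)
    rows = (mask.shape[0] // factor) * factor  # drop incomplete edge blocks
    cols = (mask.shape[1] // factor) * factor
    blocks = mask[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return 100.0 * blocks.mean(axis=(1, 3))

def stratified_sample(sip, bin_edges, per_bin, seed=0):
    """Draw up to per_bin pixel indices from each SIP bin."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(sip, dtype=float).ravel()
    # Only interior edges are needed: values below the first go to bin 0, and so on.
    bin_id = np.digitize(flat, bin_edges[1:-1])
    chosen = []
    for b in range(len(bin_edges) - 1):
        members = np.flatnonzero(bin_id == b)
        if members.size == 0:
            continue
        take = min(per_bin, members.size)
        chosen.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)

# Toy 4 x 4 water mask (1 = water) aggregated with a factor of 2.
mask = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
sip = block_sip(mask, 2)
sample = stratified_sample(sip, bin_edges=[0, 5, 95, 100], per_bin=1)
```

Capping the number of pixels per bin keeps the abundant near-0% and near-100% pixels from dominating the training and evaluation sets.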

Correlation Analysis between the Reference SIP and the Landsat-8 OLI Spectral Data
A correlation analysis was conducted to study the relationships between the SIP and the surface reflectance of individual Landsat bands and the spectral indices. As shown in Table 3, EVI had the highest R² value, explaining 75% of the SIP variance. Bands 5, 6, and 7, NDWI_2, and MNDWI were also highly correlated with SIP, with R² values above 0.6. NDWI_1 and NDWI_3 had the lowest correlation with SIP (R² = 0.26 for both). The correlation analysis gives a general picture of the relationship between SIP and the candidate variables and indicates which variables are more suitable for SIP estimation.
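A per-variable screening of this kind can be sketched as below. NDVI and MNDWI follow their standard definitions (for OLI, green, red, NIR, and SWIR1 correspond to bands 3, 4, 5, and 6). The three NDWI variants are not reproduced because their definitions are not given in this section. All reflectance values are toy numbers, and R² is again assumed to be the squared Pearson correlation.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def mndwi(green, swir1):
    """Modified normalized difference water index."""
    return (green - swir1) / (green + swir1)

def r_squared(x, y):
    """Squared Pearson correlation between a predictor and the SIP."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)

# Toy surface reflectance for five pixels (illustrative, not from the study).
sip = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
green = np.array([0.08, 0.08, 0.07, 0.07, 0.06])
red = np.array([0.06, 0.06, 0.06, 0.05, 0.05])
nir = np.array([0.35, 0.28, 0.21, 0.14, 0.07])
swir1 = 0.25 - 0.002 * sip

predictors = {
    "NIR": nir,
    "SWIR1": swir1,
    "NDVI": ndvi(red, nir),
    "MNDWI": mndwi(green, swir1),
}
scores = {name: r_squared(values, sip) for name, values in predictors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```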

SIP Estimation with LSU Method
The LSU method directly decomposes the reflectance of each image pixel into the fractional cover of the components on the ground [15]. Therefore, all the reference SIP values of pixels partially covered by water were used to validate the predictive performance of the LSU method. Figure 8 shows the correlation between the Landsat-derived SIP from the LSU method and the reference SIP. The regression line reveals that the SIP predicted by the LSU method fits well with the SIP calculated from the UAV data, with an R² value of 0.869 and an RMSE value of 10.55%.

Figure 8. The R² and root mean squared error (RMSE) of the relationship between the UAV-based reference SIP and the Landsat-predicted SIP using the LSU method in 2014. Axes: LSU-derived SIP (%) and UAV-based reference SIP (%). The red dotted line is the 1:1 line, and the blue solid line is the linear regression between the predicted and UAV-based reference SIP.
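To make the unmixing idea concrete, the sketch below solves a linear mixture model for one pixel with a sum-to-one constraint imposed through a heavily weighted extra equation. It is only an illustration of the principle. The endmember spectra are invented, the function name `unmix` is hypothetical, and the clip-and-renormalise step is a crude stand-in for a properly constrained solver. Reading the water fraction as the SIP is also an assumption.

```python
import numpy as np

def unmix(pixel, endmembers, weight=1000.0):
    """Estimate endmember fractions for one pixel spectrum.

    endmembers has shape (n_bands, n_endmembers); each column is a pure spectrum.
    """
    E = np.asarray(endmembers, dtype=float)
    p = np.asarray(pixel, dtype=float)
    # Append a heavily weighted row so the fractions sum to approximately 1.
    A = np.vstack([E, weight * np.ones(E.shape[1])])
    b = np.append(p, weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Crude non-negativity step: clip negatives, then renormalise.
    fractions = np.clip(fractions, 0.0, None)
    return fractions / fractions.sum()

# Invented four-band endmember spectra (not measured values).
water = np.array([0.05, 0.04, 0.02, 0.01])
grass = np.array([0.05, 0.04, 0.40, 0.20])
soil = np.array([0.10, 0.15, 0.25, 0.30])
E = np.column_stack([water, grass, soil])

true_fractions = np.array([0.3, 0.5, 0.2])
pixel = E @ true_fractions            # perfectly linear, noise-free mixture
fractions = unmix(pixel, E)
sip_estimate = 100.0 * fractions[0]   # water fraction expressed as SIP (%)
```

With a noise-free linear mixture the fractions are recovered almost exactly. Real pixels deviate from the linear assumption and from the chosen endmembers, which is where the uncertainty of LSU enters.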

SIP estimation with ANN and RT Method
According to the results of the correlation analysis in Section 3.2, 13 predictor variables (band 1 to band 7, EVI, NDVI, NDWI_1, NDWI_2, NDWI_3, and MNDWI) were selected for use in the ANN and RT methods. In the estimation process, 70% of the UAV-derived SIPs were randomly selected as training data for both methods. The remaining 30% were used as validation data to evaluate the models that the ANN and RT methods derived from the training data.
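A random 70/30 partition of this kind might look as follows. The function name, the fixed seed, and the sample count are hypothetical; the section does not say how the random selection was implemented.

```python
import numpy as np

def split_70_30(n_samples, train_fraction=0.7, seed=0):
    """Return disjoint index arrays for training and validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

# Example with 100 hypothetical SIP samples.
train_idx, valid_idx = split_70_30(100)
```

Fixing the seed makes the partition reproducible, so that both methods are trained and validated on exactly the same samples.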

(1) ANN performance
Figure 9 shows the correlation between the Landsat-derived SIP from the ANN method and the reference SIP. The regression line reveals that the ANN method predicts the SIP more accurately than the LSU method. For the training data (Figure 9a), the overall performance improves markedly compared to the LSU method, with an R² value of 0.927 and an RMSE value of 8.95%. When the estimation model generated by the ANN method was applied to the validation data, a similar result was obtained, with a slightly lower R² (0.926) and RMSE (8.87%) (Figure 9b).

(2) RT performance
The RT model used to estimate the SIP of the transect showed estimation accuracy patterns similar to those of the ANN method (Figure 10a). However, on the training data the RT method had a higher R² value (0.935) and a lower RMSE value (8.54%). On the remaining 30% of the reference SIP used for validation, the RT method also performed slightly better than the ANN method, with an R² of 0.933 versus 0.926 and an RMSE of 8.73% versus 8.87% (Figures 9b and 10b). Comparing the statistics of the three methods, it can be concluded that the RT method gives the best results whether evaluated with the training or the validation data.

Spatial Pattern of the SIP Estimation Error Using LSU, ANN, and RT Methods
To understand the spatial performance of the three models, the LSU, ANN, and RT methods were each used to map the SIP of the transect (Figure 11a-c). The three maps share a common SIP pattern that depicts the variation of the SIP along the water gradient.
Based on the UAV-based reference SIP, the absolute differences between the reference SIP and the Landsat-derived SIP were calculated for the three methods and are shown in Figure 11d-f, respectively. For all three methods, the transition areas between the water and the grassland show high uncertainty in the SIP estimation. The MAEs of the predictions are 4.97%, 4.90%, and 3.49% for the LSU, ANN, and RT methods, respectively. The LSU method shows high uncertainty in the peatland SIP estimation because the spectra of shallow water and peat soil are mixed (Figure 11d). This is consistent with the scatterplot in Figure 8, in which the SIP is generally overestimated when the value is lower than 20%. Although the ANN method estimates the SIP better than the LSU method according to the scatterplot in Figure 9, high uncertainties still exist for the bare soil area (Figure 11e). Of the three methods, the RT method performs best and yields the smallest absolute errors.
In addition to the absolute error maps, Figure 12a-c presents the estimation error histograms of the three methods. A negative SIP error indicates that the method overestimates the SIP. The mean (ME) and standard deviation (SD) of the SIP error are provided for each method to compare their performance. The closer the ME is to zero and the smaller the SD, the better the method performs. The ME values show that all three methods slightly overestimate the SIP, with ME values of −3.17%, −2.89%, and −1.57% for the LSU, ANN, and RT methods, respectively. The comparison suggests that the estimation with the RT method agrees best with the reference SIP map, with the smallest ME magnitude (−1.57%) and the smallest SD (5.55%).
In conclusion, comparisons between the spatial patterns of the estimation results of the three methods suggest that the RT method provides the most reasonable estimation results, with the errors falling within a relatively small range. Few areas are over- or underestimated by this method. Most of the remaining errors are located around small inundation patches or along the edges of large patches, and they appear to be associated with residual misregistration between the UAV data and the Landsat image (see more details in Section 4.4).
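The signed error statistics behind such a histogram can be sketched as follows. The sign convention (error = reference minus predicted, so that negative values mean overestimation) is inferred from the statement above and is an assumption. The function name and the values are hypothetical.

```python
import numpy as np

def error_stats(reference, predicted):
    """Mean error (ME) and standard deviation (SD) of reference minus predicted SIP."""
    error = np.asarray(reference, dtype=float) - np.asarray(predicted, dtype=float)
    return float(error.mean()), float(error.std())

# Toy SIP values in %, not data from the study.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 21.0, 33.0, 40.0]
me, sd = error_stats(reference, predicted)
```

Unlike the MAE, the ME lets positive and negative errors cancel, so it describes the bias of a method rather than the typical size of its errors.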

Comparison of the Performance of the Three Methods
The comparison between the SIP estimation results and the UAV-based reference SIP revealed that the RT method outperformed the LSU and ANN methods, with the highest R² (0.933) and the lowest RMSE (8.73%) when the derived model was evaluated with the validation data. The LSU and ANN methods performed less well, with RMSE values of 10.55% and 8.87%, respectively.
For the LSU method, the selection of endmembers is the most essential step, and the endmembers should represent the local land cover comprehensively and accurately. Although the land cover of the study area seems homogeneous at the satellite pixel scale, it is still highly complex because of differences in the inundation condition. Taking the surface condition shown in Figure 2a as an example, Grass2 is an emergent aquatic plant in or near the lake. Although Grass2 covers 100% of the OLI pixel in the UAV image, water can be seen beneath the grass from the nearby environment. Figure 13 shows the spectra of the two grass types (dry grass and waterlogged grass) as measured by a spectrometer in the field. The spectra are quite different, although both maintain the spectral pattern of vegetation. Therefore, it is reasonable to select Grass2 as an independent endmember, because this type of grass cannot simply be unmixed into Grass1 and water spectra. Although the grass was separated into Grass1 and Grass2 on the UAV image to account for the waterlogged effect, the mixture of Grass1 and Grass2 within a relatively coarse Landsat-8 OLI pixel makes it very difficult to define or select a pure endmember. Consequently, this spectral confusion leads to incorrect unmixing between them and introduces high uncertainty in the final SIP calculation. In addition, the linear assumption of the LSU method is another factor that influences the accuracy.
Compared to the LSU method, the ANN and RT methods do not require endmember selection and thus avoid the uncertainty in this process. Moreover, the nonlinear relationships established by these two methods have an advantage over the linear assumption of the LSU method. Additionally, the stratified sampling applied in the training data selection greatly reduces the impact of the non-uniform distribution of the SIP values. Therefore, the estimation results of the ANN and RT methods are generally better than those of the LSU method.
Although the ANN method is a widely used algorithm and retrieved the SIP with acceptable accuracy in this study, there is no complete unified theory for determining the structure of the network. First, the number of hidden layers and hidden-layer nodes must be determined, and this choice is affected by many factors, including the training samples and the number of input and output nodes. It has been proven that a network with a single hidden layer can realize arbitrary nonlinear mappings simply by increasing the number of neurons. Therefore, a trial-and-error technique was used to find the appropriate number of hidden-layer nodes. The whole calibration process was repeated 1000 times to avoid random errors related to the random initialization at the start of the optimization, and the average results of the 1000 replicates were used. Second, the proper setup of the ANN parameters remains a challenge. Although some heuristics have been developed for designing and implementing ANNs, they are not straightforward. For instance, too large a learning rate can make the model unstable, whereas too small a learning rate can leave the optimization trapped in a local optimum [10]. Although the final network was chosen by trial and error, the selection of a suitable network architecture is an important issue that needs further research, and it greatly affects the application of the ANN method in wetland inundation monitoring.
In contrast, the RT method approximates complex nonlinear relations by partitioning a dataset into subsets, within each of which the relationship can be simplified to a linear model [5,13]. In this study it outperformed the other methods in all cases. It was not only more effective than simple techniques, such as multivariate linear regression and LSU, but also easier to interpret than the neural networks. This conclusion is consistent with previous studies [5,45]. Because the RT method is mostly automated and requires only limited computing time, it shows increasing potential for cost-effective SIP mapping over large areas.
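The partition-then-fit idea described above can be illustrated with a very small model tree. This sketch is not the software used in the study. It greedily picks the split that most reduces the squared error of per-subset linear fits, and all names, parameters, and data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def sse(X, y):
    """Sum of squared errors of a linear fit to (X, y)."""
    return float(np.sum((y - predict_linear(fit_linear(X, y), X)) ** 2))

def build_tree(X, y, depth=0, max_depth=2, min_leaf=10):
    """Recursively split where per-subset linear fits reduce the error most."""
    node = {"coef": fit_linear(X, y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return node
    best = (sse(X, y), None, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            total = sse(X[left], y[left]) + sse(X[~left], y[~left])
            if total < best[0] - 1e-12:
                best = (total, j, t)
    if best[1] is None:
        return node  # no split improves on a single linear model
    _, j, t = best
    left = X[:, j] <= t
    node.update(
        feature=j,
        threshold=t,
        left=build_tree(X[left], y[left], depth + 1, max_depth, min_leaf),
        right=build_tree(X[~left], y[~left], depth + 1, max_depth, min_leaf),
    )
    return node

def predict_one(node, x):
    """Walk down the tree and apply the linear model of the reached leaf."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return float(predict_linear(node["coef"], x[None, :])[0])

# Toy piecewise-linear relation between one predictor and the SIP.
X = np.linspace(0.0, 1.0, 101)[:, None]
y = np.maximum(0.0, 100.0 - 200.0 * X[:, 0])
tree = build_tree(X, y)
global_sse = sse(X, y)  # error of a single linear model for comparison
```

On this toy relation, a single linear model leaves a clearly non-zero error, whereas one split with a linear model on each side reproduces the data almost exactly. That is the sense in which the partitioning captures nonlinearity.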

Validation of the Best Prediction Model in 2013
As concluded in Section 3, the RT method outperformed both the LSU and ANN methods, so the prediction model generated with the RT method in 2014 was used to produce the SIP map for 2013 and thereby evaluate the model performance. As in the comparison study with the 2014 data, a reference SIP map was generated from the UAV image. The prediction model was then applied directly to the 2013 Landsat-8 OLI image to obtain the predicted SIP map, and the estimation results were validated against the 2013 reference SIP map. To present the general features of the estimation results, the UAV-based SIP values of 2013 were divided into 2% bins. The mean UAV-based and Landsat-predicted SIP values were then calculated separately for each bin, together with the corresponding standard deviation. Figure 14 presents the linear fit between the mean predicted SIP and the mean reference SIP. A relatively good correlation was found between the two datasets, with an R² of 0.986 and an RMSE of 9.84%, and the standard deviations of the 2% bins were generally within 20%. However, considerable overestimation and minor underestimation can be observed in the low and high SIP value regions, respectively. This can be partly explained by the fact that the UAV image was acquired 8 days after the Landsat-8 OLI observation. According to the meteorological data, rain fell on four of these eight days, which induced inconsistency between the estimation results and the reference data. In addition, the 2013 Landsat-8 OLI image of the study area was partly covered by thin cloud, which might distort the land surface spectra and influence the accuracy of the SIP estimation.
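The binning step can be sketched as follows. The function name and the toy values are hypothetical, and the per-bin standard deviation is computed here for the predicted SIP, which is one possible reading of "the corresponding standard deviation".

```python
import numpy as np

def binned_means(reference, predicted, width=2.0):
    """Per-bin mean reference SIP, mean predicted SIP, and SD of predicted SIP."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n_bins = int(np.ceil(100.0 / width))
    # A reference SIP of exactly 100% is placed in the last bin.
    bin_id = np.minimum((ref // width).astype(int), n_bins - 1)
    rows = []
    for b in np.unique(bin_id):
        sel = bin_id == b
        rows.append((ref[sel].mean(), pred[sel].mean(), pred[sel].std()))
    return np.array(rows)

# Toy SIP values in %, not data from the study.
reference = [0.5, 1.5, 2.5, 3.5, 99.0, 100.0]
predicted = [1.0, 3.0, 2.0, 4.0, 98.0, 100.0]
table = binned_means(reference, predicted)
# Linear fit between the bin means, analogous to a fit on binned values.
slope, intercept = np.polyfit(table[:, 0], table[:, 1], 1)
```

Averaging within bins removes much of the pixel-to-pixel scatter, so a correlation computed on bin means can be considerably higher than one computed on individual pixels.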

Application in the Zoige Wetland National Nature Reserve
The validation in the preceding section confirmed the strong capability of the RT method in SIP estimation. Therefore, the RT method was applied to generate the SIP map of the Zoige Wetland National Nature Reserve in 2014. The predicted result shown in Figure 15 presents a reasonable spatial pattern of the SIP that follows the spatial distribution of the major wetlands. Because no field investigation of the wetland inundation condition was available at the satellite overpass time, it is hard to validate the estimated SIP map for the whole region. However, a comparison between the original Landsat-8 true color image and the estimated SIP in the three zoom-in windows shows a clear SIP gradient from the grassland to the water bodies (lake and river) in all three windows. The lake and river are predicted with the highest SIP values. Where a 30 m Landsat-8 pixel covers the edge of a water body, its SIP value is approximately 50%. The plausibility of the SIP estimation is also reflected in the changing width of the river shown in Figure 15C,F. Where the river is wider than one Landsat pixel (30 m), the SIP is 100%. In other cases, the SIP values vary from 0% to 100%. The SIP values of the small river branch in the figure also support the result derived from the 2014 RT model.

Uncertainty Analysis
The performance comparison and the application study indicated that the proposed approach based on the RT method is an effective way to integrate UAV and Landsat-8 images for wetland inundation mapping. However, the approach is still strongly influenced by two sources of uncertainty.
(1) Spectral uncertainty of the UAV data. The high spatial resolution of the UAV data greatly helps the classification of land cover, and the accuracy assessment showed that the UAV-based land cover map achieved an overall accuracy of 93.64% and a kappa coefficient of 0.9. However, some misclassifications occurred during the process. Because the Zoige wetland is a typical highland peatland, its dark peat soil was sometimes confused with the water surface owing to their similar spectral characteristics. Emergent grass in shallow water also makes it difficult to determine whether a surface is water or grassland. In addition, the three bands of the true color UAV images provide only limited spectral information about the land surface. The commonly used NDVI and NDWI, which are important for vegetation and water extraction, cannot be calculated from these three bands. Consequently, these factors introduce uncertainty into the classification results and affect the accuracy of the reference SIP. In the future, an approach could be developed to estimate the proportion of each class within mixed UAV pixels, or multispectral UAV images could be acquired.
(2) Uncertainty of geolocation matching between the UAV and satellite images. Although the GCPs used for the geometric correction of the UAV image were collected through real-time kinematic (RTK) positioning, they still carry sub-meter errors. Meanwhile, the geolocation accuracy of the Landsat-8 OLI image is scarcely better than 0.4 pixels [32]. To match the two datasets, the commonly used image-to-image registration was first performed using easily distinguished houses, road intersections, and white plates. Because the spatial resolutions of the two images differ by more than an order of magnitude, further co-registration was implemented by manually adjusting the Landsat-8 OLI image according to the coherence of its spectral-spatial features with those of the UAV image. Such a subjective adjustment of the Landsat-8 OLI image clearly cannot fully remove the geometric errors between the two datasets, and the residual mismatch means that the pixel inundation condition calculated from the UAV image may not represent the real condition of the corresponding Landsat pixel. In addition, a likely source of error is submerged plants, which are frequently distributed over the transition between water and land. Therefore, low values are overestimated and high values are underestimated (Figure 12a-c), and most of the errors are located around small inundation patches or along the edges of larger patches (Figure 11d-f). Despite this underestimation and overestimation, the proposed method still provides useful information on the hydrology of small wetlands (Figure 15).
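The effect of a geolocation mismatch on the reference SIP can be sketched with a one-dimensional toy transect. This is an illustrative assumption rather than the procedure used in this study: the function name, the 10 fine cells per coarse pixel, and the land-to-water profile are all hypothetical, and the offset of 4 fine cells simply mimics a 0.4 pixel shift of the coarse grid.

```python
def sip_profile(water, block, offset=0):
    """SIP (%) of consecutive coarse pixels along a 1-D transect of fine cells
    (1 = water, 0 = land), with the coarse grid origin displaced by `offset`
    fine cells. Incomplete pixels at the end of the transect are dropped."""
    out = []
    start = offset
    while start + block <= len(water):
        window = water[start:start + block]
        out.append(100.0 * sum(window) / block)
        start += block
    return out


# Hypothetical transect: 25 fine cells of land followed by 25 cells of water.
transect = [0] * 25 + [1] * 25

aligned = sip_profile(transect, block=10)            # perfectly registered grid
shifted = sip_profile(transect, block=10, offset=4)  # grid displaced by 0.4 pixel

# Absolute SIP difference for the coarse pixels present in both profiles.
errors = [abs(a - s) for a, s in zip(aligned, shifted)]
print(aligned)  # [0.0, 0.0, 50.0, 100.0, 100.0]
print(shifted)  # [0.0, 0.0, 90.0, 100.0]
print(errors)   # [0.0, 0.0, 40.0, 0.0]
```

In this toy case the pure land and pure water pixels are unaffected, while the pixel containing the shoreline changes by 40 percentage points, which is consistent with errors concentrating along the edges of inundation patches.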

Conclusions
Reliable and up-to-date inundation information is essential for improving the conservation and management of wetlands and for sustaining the ecosystem services they provide. In this study, we compared the ability of three common methods (LSU, ANN, and RT) to extract SIP information by combining Landsat-8 data with UAV imagery. The RT method outperformed the other two methods when applied to the validation data in 2014, with an R² of 0.933 and an RMSE of 8.73%. The spatial pattern analysis of the estimates from the three methods also indicated the good performance of the RT method. The comparison suggested that combining UAV imagery with the RT method is appropriate for wetland SIP investigation. To further validate the proposed method, the model was applied to observations from 2013. The RT method yielded SIP values with a high correlation (R² = 0.986) and a low RMSE (9.84%) relative to the reference SIP. In addition, the application of the method to SIP mapping of the Zoige Wetland National Nature Reserve confirmed that it is useful for obtaining the regional distribution of the SIP.
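As a reminder of how the two reported metrics behave, the following minimal sketch computes an RMSE and an R² between reference and estimated SIP values. It assumes that R² is the squared Pearson correlation of a linear fit between the two series, which matches the regression lines mentioned in the figure captions but is not stated explicitly here; the function names and sample values are hypothetical and are not data from this study.

```python
import math


def rmse(reference, estimated):
    """Root mean squared error between paired reference and estimated values."""
    n = len(reference)
    return math.sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimated)) / n)


def r_squared(reference, estimated):
    """Squared Pearson correlation between the two series (assumed definition)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimated) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimated))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov * cov / (var_r * var_e)


# Hypothetical SIP values (%): low values are overestimated and high values
# are underestimated, but the estimates still lie on a straight line.
reference_sip = [0.0, 25.0, 50.0, 75.0, 100.0]
estimated_sip = [10.0, 30.0, 50.0, 70.0, 90.0]

r2 = r_squared(reference_sip, estimated_sip)
err = rmse(reference_sip, estimated_sip)
print(round(r2, 3), round(err, 2))
```

With these sample values the estimates are a linear function of the reference, so R² equals 1 up to rounding while the RMSE is about 7.07%. A very high R² can therefore coexist with a non-negligible RMSE when low values are overestimated and high values underestimated.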
With the successful operation of the Landsat-8 satellite, global Landsat data will continue to be acquired and distributed at no charge. At the same time, UAVs are well suited to rapid observation because they offer low cost, flexible launch and landing options, safety, and ultra-high spatial resolution. Such data will be adopted in an increasing number of applications. Therefore, the approach developed in this study has the potential to track past inundation changes and to continuously monitor ongoing and future inundation changes in more areas than those demonstrated here. Finer SIP maps and the change products derived with the RT method may be useful for improving model studies of wetland hydrology, evapotranspiration, and stream runoff. The related research is still ongoing, and further investigation is urgently needed. More UAV images will be acquired over a longer period so that the proposed model can be further verified at different spatial and temporal scales.

Figure 1. Location of the Zoige Plateau (a); the Zoige Wetland National Nature Reserve (ZNWNR) (b); and the typical wetland degradation transect studied in this work (c).

Figure 2. Photographs of the typical land cover types over the transect: (a) permanent wetland; (b,c) intermittent wetland in wet and dry condition; and (d) dry grassland.

Figure 4. The UAV image of the transect in the Zoige wetland, acquired in July 2014. A and B are small patches of grass in the lake, C is the edge of the lake, and D and E show areas with mixed water and grass.

Figure 5. Training and validation errors in the SIP estimation as a function of the number of neurons in the hidden layer.

Figure 6. Classification map of the transect based on the UAV image acquired in July 2014.

Figure 7. The reference SIP map of the transect in 2014 derived from the UAV classification map.

Figure 8. The R² and root mean squared error (RMSE) of the relationships between the UAV-based reference and Landsat-predicted SIP using the LSU method in 2014. The red dotted lines are the 1:1 lines, and the blue solid lines are linear regressions between the predicted and UAV-based reference SIP.

Figure 9. The R² and RMSE of the relationships between the reference SIP and Landsat-predicted SIP using ANN in 2014 for the training data (a) and the validation data (b). The red dotted lines are the 1:1 lines, and the blue solid lines are linear regressions between the predicted and UAV-based reference SIP.

Figure 10. The R² and RMSE of the relationships between the reference and Landsat-predicted SIP using RT in 2014 for the training data (a) and the validation data (b). The red dotted lines are the 1:1 lines, and the blue solid lines are linear regressions between the predicted and UAV-based reference SIP.

Figure 11. Spatial distribution of the estimated SIP using the LSU, ANN, and RT methods (left, a-c), and the corresponding absolute SIP estimation error relative to the reference SIP map in July 2014 (right, d-f).

Figure 12. The SIP estimation error histogram based on the differences between the reference SIP and the estimated SIP using the LSU (a), ANN (b), and RT (c) methods. SD and ME represent the standard deviation and mean value of the SIP error, respectively.

Figure 13. Spectra of the dry grass and waterlogged grass as measured by the field spectrometer.

Figure 14. Comparison of the mean SIP values of the RT prediction for 2013 with the mean reference SIP values within 2% bins of the 2013 reference SIP map. The mean prediction and its standard deviation in each 2% bin are shown as a black dot and a gray bar, respectively. The red dotted and blue solid lines represent the 1:1 line and the fitted line, respectively.

Figure 15. The SIP map of the Zoige Wetland National Nature Reserve derived from the Landsat-8 OLI data using the RT prediction model in 2014. The OLI image (1) and three zoom-in windows (A-C) are shown with bands 6, 5, and 4 in red, green, and blue, respectively, alongside the corresponding SIP map (2) and the SIP estimates for the three regions of interest (D-F).

Table 2. Confusion matrix and accuracy estimates for the classified map.
Note: The numbers of correctly classified testing samples are in boldface.

Table 3. Coefficient of determination (R²) of the linear relationship between the SIP and Landsat bands and derived indices.
