A Study of Spatial Soil Moisture Estimation Using a Multiple Linear Regression Model and MODIS Land Surface Temperature Data Corrected by Conditional Merging

This study attempts to estimate spatial soil moisture in South Korea (99,000 km²) from January 2013 to December 2015 using a multiple linear regression (MLR) model and the Terra moderate-resolution imaging spectroradiometer (MODIS) land surface temperature (LST) and normalized difference vegetation index (NDVI) data. The MODIS NDVI was used to reflect vegetation variations. Observed precipitation was measured using the automatic weather stations (AWSs) of the Korea Meteorological Administration (KMA), and soil moisture data were recorded at 58 stations operated by various institutions. Prior to MLR analysis, satellite LST data were corrected by applying the conditional merging (CM) technique and observed LST data from 71 KMA stations. The coefficient of determination (R²) of the original LST and observed LST was 0.71, and the R² of corrected LST and observed LST was 0.95 for 3 selected LST stations. The R² values of the corrected LST were greater than 0.83 at all 71 LST stations. The regression coefficients of the MLR model were estimated seasonally considering the five-day antecedent precipitation. The p-values of all the regression coefficients were less than 0.05, and the R² values were between 0.28 and 0.67. The R² values below 0.5 arise because the soil classification at each observation site was not completely accurate. Additionally, the observations at most of the soil moisture monitoring stations used in this study started in December 2014, and the soil moisture measurements had not yet stabilized. Notably, R² and root mean square error (RMSE) in winter were poor because of the many missing values and the uncertainty in observations caused by soil freezing and mechanical errors. Thus, the prediction accuracy is low in winter due to the difficulty of establishing an appropriate regression model.
Specifically, the estimated map of the soil moisture index (SMI) can be used to better understand the severity of droughts with the variability of soil moisture.


Introduction
Soil moisture (SM) is an important state variable governing the partitioning of rainfall into runoff and water that infiltrates the soil. Although the water contained in soil is only a tiny fraction of all the water on the Earth, it influences important extreme events, such as floods and droughts [1]. SM is the water stored among soil particles and in soil pores. Moreover, SM is a hydrometric factor that plays a crucial role in the exchange of water and energy between the land and the atmosphere. Notably, SM has been studied in agriculture with regard to plant growth, in water resources with regard to rainfall-runoff, and in meteorology with regard to interactions between the atmosphere and land [2].
There are multiple ways to estimate SM, including in situ networks and satellite remote sensing. Traditional in situ measurements provide valuable information on SM at different soil depths. Many field techniques are available, such as oven drying, neutron probe, time/frequency domain reflectometry (TDR/FDR), and capacitance measurements [3], as well as tension-measuring and gravimetric methods. However, these methods are expensive and time consuming when used over large areas. Most yield point measurements, which provide information on the SM content at specific locations only. In addition, the reliability of the point data can be poor because of short observation periods and many missing values. Nevertheless, in situ SM is regarded as the true value of SM and is commonly used as a reference to validate remotely sensed SM retrievals [4–6]. However, point-scale in situ SM data are difficult to use as spatial SM over large areas [7–9].
The estimation of spatial SM can be divided into direct measurement using microwave satellites and indirect measurement using land surface parameters related to SM without using microwave satellite data. First, microwave satellites are widely used because they allow the continuous observation of spatial SM over a wide area [10,11]. However, there are several limitations to directly and immediately using satellite-based SM data: (1) the relatively low spatiotemporal resolution of the data; (2) regionally optimized algorithms for satellite soil moisture estimation are available only from the National Aeronautics and Space Administration (NASA), European Space Agency (ESA), and Japan Aerospace Exploration Agency (JAXA); (3) the satellite data have not been calibrated and verified; and (4) radio interference. Nevertheless, many studies have calibrated satellite-based spatial SM against point-based SM observations over wide areas [12–14]. For example, Kim et al. [15] obtained AMSR2 spatial SM data and corrected them using observed SM data. In addition, other studies have proposed methods for downscaling microwave satellite SM using MODIS and in situ data [16–19]. Second, spatial SM can be estimated from land surface parameters such as LST and NDVI without using microwave satellite data [20]. Moreover, LST has a unique relationship with spatial SM [21]. Examples of non-microwave applications using moderate resolution imaging spectroradiometer (MODIS) LST data are as follows. Cai et al. [22] retrieved SM from Terra MODIS L1B data. Kim et al. [23] improved the spatial SM of AMSR-E through the integration of MODIS land surface temperature (LST), the enhanced vegetation index (EVI) and albedo. MODIS data have thus been used successfully to increase the accuracy of spatial SM estimates.
Although MODIS LST has been used in various ways, it should be regenerated using gap-filling and correction processes, which can convert global data to regional data. Notably, daily MODIS LST datasets (MOD11_L2) have missing values due to clouds and atmospheric conditions [24]. In some previous studies, MODIS LST was corrected by spatial interpolation and geostatistical methods [24,25]. To overcome this issue, geostatistical methods (e.g., kriging, inverse distance weighting and spline methods) can be applied to ground measurements to match their spatial scale to that of satellite-based products [26,27]. In rainfall research, various strategies for combining satellite-based and observed data have been widely used to overcome the limited areal representativeness of point-scale measurements and the high variability in satellite-based datasets. One such method, called "conditional merging (CM)" by Sinclair et al. [28], uses the radar field to estimate the error associated with ordinary kriging of rain gauge data and to correct it [29]. In general, previous studies that have used the CM method yielded reasonable results compared to ground-based measurements and exhibited improved spatial and temporal variability [30].
The overall goal of this study was to estimate spatial SM based on MODIS LST and normalized difference vegetation index (NDVI) data via multiple linear regression (MLR) analysis (Figure 1). The specific objectives of the study were as follows: (1) to correct MODIS LST using the CM method; (2) to develop the MLR model using corrected MODIS LST, MODIS NDVI data, and interpolated precipitation (PCP); (3) to generate the spatial distribution of SM using normalized regression coefficients; (4) to assess spatial SM estimates; and (5) to compare the SM index (SMI) and standardized precipitation index (SPI) to evaluate the usability of spatial SM.

Materials and Methods
The study analyzed SM based on satellite data via correlation analysis. Principal component analysis was used to select independent variables effectively among the environmental attributes of SM. Finally, the MLR model was developed using MODIS LST, MODIS NDVI, and PCP under various regression scenarios. In this study, various satellite data and observed point data were used at 1 km spatial resolution (Table 1). The soil map data (soil type, field capacity, and wilting point) obtained from the Korea Rural Development Administration (KRDA) were rasterized from a 1/25,000 vector map.

Satellite Data
LST is a key variable used widely for monitoring the surface radiation budget, climate change, the hydrological cycle and ecosystems. Despite the recognized importance of LST, ground observations of LST are not yet adequate for assessing diurnal cycles or analyzing seasonal and inter-annual variability because of large spatiotemporal variations. MODIS LST (MOD11A1) and MODIS NDVI (MOD13Q1) were used as satellite datasets in this study. The MODIS land products include the energy balance product, vegetation parameter product, and land cover/land use product (http://modis-land.gsfc.nasa.gov/). These products have been widely applied in global and regional monitoring, modeling and assessment.

MODIS LST
MODIS LST is estimated using a statistical regression-based method based on 12 MODIS thermal infrared (TIR) bands, with an option for non-linear physical retrieval. The regression coefficients of the statistical retrieval are derived using a fast radiative transfer model with atmospheric characteristics taken from a dataset of 12,208 global profiles of atmospheric temperature, moisture, and ozone [31]. Notably, the accuracy of the product may be affected by errors in land surface emissivity (Eva Borbas, personal communication). The MODIS LST (MOD11) product is retrieved using the generalized split-window algorithm [32,33]:

$$T_s = C + \left(A_1 + A_2\,\frac{1-\varepsilon}{\varepsilon} + A_3\,\frac{\Delta\varepsilon}{\varepsilon^{2}}\right)\frac{T_{31}+T_{32}}{2} + \left(B_1 + B_2\,\frac{1-\varepsilon}{\varepsilon} + B_3\,\frac{\Delta\varepsilon}{\varepsilon^{2}}\right)\frac{T_{31}-T_{32}}{2} \quad (1)$$

$$\varepsilon = 0.5\,(\varepsilon_{31}+\varepsilon_{32}) \quad (2)$$

$$\Delta\varepsilon = \varepsilon_{31}-\varepsilon_{32} \quad (3)$$

where $T_s$ is LST, $T_{31}$ and $T_{32}$ are brightness temperatures in MODIS bands 31 and 32, respectively, $\varepsilon_{31}$ and $\varepsilon_{32}$ are surface emissivities in MODIS bands 31 and 32, respectively, and $C$, $A_1$, $A_2$, $A_3$, $B_1$, $B_2$, and $B_3$ are regression coefficients [34]. MOD11A1 data are daily level-3 global LST products gridded in the Sinusoidal projection (version 4) at a spatial resolution of 1 km and collected twice per day using the generalized split-window algorithm [33,35,36].

MODIS NDVI
The MODIS sensors onboard the Terra satellite acquire data in 36 discrete spectral channels with a spatial resolution of 250 m for the visible bands, 500 m for the near-infrared bands, and 1000 m for the remaining thermal infrared bands (http://modis.gsfc.nasa.gov/). To obtain the MODIS NDVI data, the MODIS 500-m NDVI 16-day composite scenes (MOD13Q1) from January 2013 to May 2015 were retrieved from the Land Processes Distributed Active Archive Center (LP DAAC, https://lpdaac.usgs.gov/). NDVI is a spatially distributed vegetation condition index based on the difference in reflectance between near infrared and red light. The NDVI reflects the presence of vegetation on a pixel basis and provides measures of the amount and condition of vegetation within a pixel [31]. NDVI values were calculated using Equation (4) in each grid cell:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \quad (4)$$

where NIR is the near infrared band (MODIS band 2), and RED is the red band (MODIS band 1). In this study, the 16-day composite MODIS NDVI was resampled from UTM format and a 500 m resolution to the WGS84 projection and a 1 km spatial resolution to match the MODIS LST reference.
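The per-cell calculation in Equation (4) can be sketched in a few lines of Python. This is only an illustration: the function names and the reflectance values are hypothetical, and returning `None` for cells where both bands are zero is an assumed convention rather than part of the MODIS product definition.

```python
def ndvi(nir, red):
    """Return (NIR - RED) / (NIR + RED), or None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None  # assumed convention: no usable signal in this cell
    return (nir - red) / total


def ndvi_grid(nir_grid, red_grid):
    """Apply ndvi() cell by cell to two equally shaped 2-D lists."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_grid, red_grid)]


# Hypothetical reflectances (MODIS band 2 = NIR, band 1 = RED).
nir_band = [[0.50, 0.30], [0.25, 0.0]]
red_band = [[0.10, 0.10], [0.25, 0.0]]
ndvi_map = ndvi_grid(nir_band, red_band)
```

Dense vegetation gives values approaching 1, while equal NIR and RED reflectance gives 0.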

Automatic Weather Stations (AWSs)
The Korea Meteorological Administration (KMA) operates the AWSs, which include 687 stations throughout South Korea, to continuously monitor weather conditions (Figure 2a). The average distance between stations is approximately 13 km, and the raw data are collected at 10 min intervals. The dataset used in this study is archived at 24 h intervals. In this study, we used precipitation data from the 687 AWSs during the simulation period. First, we obtained minute-scale precipitation data from KMA and compared them with the operation period data. Second, the minute-scale precipitation data were converted into daily data, and the daily data were interpolated using the inverse distance weighting (IDW) method onto the 1 km grid of the MODIS LST reference.
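The IDW interpolation of daily station precipitation onto a regular grid can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the distance exponent of 2, the planar coordinates in km, and the gauge values are all assumptions.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples with coordinates in km.
    power: distance exponent; 2 is a common default and is assumed here.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def idw_grid(stations, nx, ny, cell_km=1.0, power=2.0):
    """Interpolate to the cell centres of an nx-by-ny grid with cell_km spacing."""
    return [[idw(stations, (i + 0.5) * cell_km, (j + 0.5) * cell_km, power)
             for i in range(nx)] for j in range(ny)]


# Hypothetical daily precipitation (mm) at three gauges.
gauges = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 4.0, 0.0)]
grid = idw_grid(gauges, nx=4, ny=4)
```

Because every estimate is a weighted average of the gauge values, the interpolated field never exceeds the observed range, which is one reason IDW is a conservative choice for precipitation.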

Soil Moisture and Soil Data of the Stations
To develop a model for expected SM, the regression model requires observed SM data from various stations. This study used SM data measured by KMA at 9 stations, the Hydrological Survey Center (HSC) at two stations, K-water (Korea Water Resources Corporation) at 7 stations and the Rural Development Administration (RDA) at 40 stations (Table 2 and Figure 2b). The automated agriculture observing system (AAOS) operated by KMA provides observations of weather phenomena that are closely related to agriculture at 10 auxiliary agricultural weather observatories located throughout the country, including the Suwon (SW) meteorological observatory, which is a basic agricultural meteorological observation office according to the technical regulations of the World Meteorological Organization (WMO). In this study, SM data at 10 cm of depth were collected at 9 sites (the Seogwipo and Andong sites did not provide good SM data and were not used). The technical specifications of the KMA data were previously reported [37]. Stations No. 10 to No. 12 are flux towers. Nos. 10 and 11 are within the Han River basin, the largest river basin in South Korea (total area of 34,406 km²), and are operated by HSC. No. 12, which is operated by K-water, is located in the Geum River basin at an elevation of 688.568 m and a height of 25 m to avoid the effects of the reservoir and canopy. Nos. 13 to 18 are installed in the Yongdam Dam watershed in the Geum River basin and are operated by K-water. These data were measured using time domain reflectometry (TDR) at an average depth of 10 cm. The data are provided by the Yongdam Experimental Catchment (http://www.ydew.or.kr/kdrum/main/main.do). The stations from No. 19 to the end of the list are operated by RDA and utilize TDR. These data were provided by the Agricultural Meteorology Information Service (http://weather.rda.go.kr). All SM data were prepared for January 2013 to December 2015, but some data, especially the RDA data, cover only a limited part of that period because of the short observation period.

Conditional Merging (CM) Technique
The CM technique [38–40] is a method of spatial interpolation suited for merging spatially continuous grid-based measurements and point measurements. The method has the advantage of preserving the spatial covariance structure of the spatially continuous grid-based measurements while maintaining the accuracy of the point-based measurements. The algorithm has shown superior performance to traditional geostatistical approaches, especially in obtaining spatial rainfall fields in several regions across the world [29,30,40–42]. The CM technique is also known as the kriging error correction technique based on radar-based or satellite-based data. Geostatistical merging methods such as mean field bias correction (MFB), range-dependent adjustment (RDA), Brandes spatial adjustment (BRA), ordinary kriging (KRI), kriging with external drift (KED), and the CM technique have been tried. Goudenhoofdt et al. [29] performed and evaluated various methods for combining spatial and gauge data. They found that the CM and KED techniques were the best methods for improving the spatial interpolation of gauge values. Therefore, the CM technique was selected among the various methods in this study.
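In this study, the CM technique comprises the following six steps (Figure 3): (a) observed LST is collected at the 71 KMA stations; (b) the LST values measured at the 71 stations are interpolated using ordinary kriging onto the 1 km grid of the MODIS LST reference; (c) satellite LST data are collected at 1 km spatial resolution; (d) the satellite LST values at the 71 station locations are extracted and interpolated using ordinary kriging onto the same 1 km grid; (e) the residual between (c) and (d) is calculated; and (f) the residual from (e) is added to the interpolated observation field from (b) to produce the final merged satellite-observation LST dataset at 1 km spatial resolution.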
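The six CM steps (collect station LST, interpolate it, take the satellite field, interpolate the satellite values sampled at the stations, form the residual, and add the residual to the interpolated observations) can be sketched as below. To keep the example self-contained, ordinary kriging is replaced here by inverse distance weighting; this substitution, the function names, and the toy grid are assumptions made for illustration and do not reproduce the kriging used in the study.

```python
import math


def idw(points, x, y, power=2.0):
    """Stand-in interpolator (IDW), used here in place of ordinary kriging."""
    num = den = 0.0
    for px, py, v in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def conditional_merge(satellite, stations):
    """Merge a satellite grid with station observations.

    satellite: 2-D list indexed [row][col] (step c).
    stations: list of (col, row, observed_value) tuples (step a).
    """
    ny, nx = len(satellite), len(satellite[0])
    # Satellite values sampled at the station cells (first half of step d).
    sat_at_stations = [(c, r, satellite[r][c]) for c, r, _ in stations]
    merged = []
    for r in range(ny):
        row = []
        for c in range(nx):
            obs_field = idw(stations, c, r)          # step b
            sat_field = idw(sat_at_stations, c, r)   # step d
            residual = satellite[r][c] - sat_field   # step e
            row.append(obs_field + residual)         # step f
        merged.append(row)
    return merged


# Toy case: the satellite field equals the "true" field (10 + row + col) plus a 2-degree bias.
satellite = [[12.0, 13.0, 14.0], [13.0, 14.0, 15.0], [14.0, 15.0, 16.0]]
stations = [(0, 0, 10.0), (2, 2, 14.0), (2, 0, 12.0)]
merged = conditional_merge(satellite, stations)
```

In this toy case the merged field reproduces the station values exactly at the station cells and removes the uniform bias elsewhere, while keeping the spatial pattern of the satellite field.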

Multiple Linear Regression Model and Scenarios
Regression analysis is commonly used to measure the relationship between two or more variables, predicting the behavior of a dependent or endogenous variable from one or more independent or explanatory variables. Multiple linear regression (MLR) models are frequently used as empirical models or approximating functions and to establish a mathematical model that describes a real-world phenomenon. Generally, the relationship between the dependent and independent variables is given by Equation (5) [43]:

$$Y = C + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \cdots + \beta_n X_n \quad (5)$$

where $Y$ is the dependent variable, $C$ is a constant, $X_1$, $X_2$, $X_3$, $X_4$ and $X_n$ are independent variables, and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\beta_n$ are regression coefficients. The input data, including MODIS LST, MODIS NDVI, PCP_n, PCP_n-1, PCP_n-2, PCP_n-3, PCP_n-4 and PCP_n-5, were used to develop the model. Notably, PCP_n is the precipitation on the observation day, and PCP_n-5, PCP_n-4, PCP_n-3, PCP_n-2, and PCP_n-1 indicate the antecedent precipitation from five days to one day before (Table 3).

Table 3. Eight regression scenarios for the MLR model using MODIS LST, MODIS NDVI, precipitation of the observation day (PCP_n), one-day antecedent precipitation (PCP_n-1), two-day antecedent precipitation (PCP_n-2), three-day antecedent precipitation (PCP_n-3), four-day antecedent precipitation (PCP_n-4), and five-day antecedent precipitation (PCP_n-5). Note: denotes an independent variable used in the regression scenario.
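As an illustration of how the coefficients in Equation (5) can be estimated, the following sketch fits an MLR model by ordinary least squares through the normal equations. The predictor values and the "true" coefficients are hypothetical, and only three of the predictors (LST, NDVI, PCP_n) are used to keep the example short; the study's own fitting software is not specified, so this should be read as one possible approach.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = C + b1*x1 + ... + bn*xn.

    X: list of rows, each a list of n predictors; y: list of responses.
    Returns [C, b1, ..., bn] by solving the normal equations.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented normal-equation system (A^T A | A^T y).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, x):
    """Evaluate C + sum(b_i * x_i) for one row of predictors."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


# Hypothetical samples: [LST (deg C), NDVI, PCP_n (mm)].
X = [[10, 0.7, 0], [20, 0.2, 5], [25, 0.6, 0],
     [15, 0.3, 12], [30, 0.5, 3], [12, 0.4, 8]]
# Synthetic SM (%) generated from assumed coefficients, so the fit can be checked.
y = [30.0 - 0.5 * lst + 10.0 * nd + 0.2 * pcp for lst, nd, pcp in X]
coeffs = fit_mlr(X, y)
```

Fitting one such model per season and per soil class, as the study does, amounts to calling `fit_mlr` on the corresponding subset of samples.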

Seasonal Analysis and Normalization of the Regression Coefficients
In a previous study [44], several input datasets were selected, such as LST, NDVI, sunshine hours and precipitation, and twelve scenarios were developed according to the combinations of input data. In addition, the correlation increased when the model coefficients were evaluated on a seasonal basis because of the inverse correlation between MODIS NDVI and SM in spring and autumn. Therefore, the MLR regression coefficients were calculated by seasonal analysis.
In general, statistical analysis and normalization should be performed to reduce uncertainty in MLR models. The disadvantage of unnormalized regression is that the independent variables usually have different units. Thus, it is difficult to compare the relative influence of each independent variable on the dependent variable. The unnormalized regression coefficient depends on the measurement scale, while the normalized regression coefficient does not. Normalization typically shows which independent variable has the largest influence on the dependent variable in MLR analysis. In this study, the normalization was performed using the min-max normalization expression given in Equation (6) [45]:

$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \quad (6)$$

where $X$ is the independent variable, and $X_{\min}$ and $X_{\max}$ are its minimum and maximum values.
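Equation (6) rescales each independent variable to the range 0 to 1, which can be sketched as follows. The handling of a constant series (returning zeros) is an assumed convention, and the LST values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed convention for a constant series
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical LST samples in degrees Celsius.
lst_norm = min_max_normalize([5.0, 10.0, 15.0])
```

After this step, regression coefficients fitted on the normalized variables can be compared directly, because every predictor spans the same range.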

Soil Moisture Index (SMI)
Spatially distributed SM is often used in spatial drought indices. The SM-based drought index, known as the SMI, provides the severity and duration of an agricultural drought for an area of interest. Available water for plants is defined as the quantity of soil water between field capacity (FC) and the lower limit of extractable water, which is known as the wilting point (WP), and this stored water is extracted by plant roots [46]. Available water is therefore an important metric for quantifying agricultural droughts if it is converted into an index. In this study, available water is first calculated from the observed or modeled SM and is then normalized by the maximum available water for plants, calculated as the difference between the field capacity and wilting point, to derive the SMI. This index is classified from no drought to extreme drought to quantitatively assess droughts in space and time [47]. In general, the SMI reflects the level of agricultural drought. However, the SMI has not yet been evaluated as an authorized drought index in South Korea because of a lack of spatial confidence. Therefore, this study additionally examined the efficiency of the spatial SMI compared to the standardized precipitation index (SPI), which is widely used as a meteorological drought index. In South Korea, the SPI distribution is provided by the Drought Information System (http://drought.kma.go.kr) of the KMA.
The SMI is computed based on the soil characteristics and SM conditions, and the parameters include FC, WP and SM. The soil map data (FC, WP and soil type) were obtained from the KRDA (Figure 4). Then, the soil map was rasterized at a 1 km spatial resolution. We used these data to estimate the SMI. The dominant soils are sand (31.2%) and loam (38.8%). The FC ranges from 9% to 40%, and the average FC of all regions is 22.1%. Additionally, the WP ranges from 3% to 15%, with an average WP of 5.6% in all regions. The SMI equation is given as follows.
This equation yields SMI values ranging from less than −5 to 0. Thus, the actual SM in the soil column is normalized based on the available water content (AWC) in the soil column. This normalized value is then used to compute the index. The range is chosen via a method similar to that of the U.S. Drought Monitor to maintain consistency and compare the drought severity. An SMI of 0 indicates no drought, but conditions could be heading toward drought or moving out of a drought. An SMI of −1 reflects a low-intensity drought, while an SMI of −5 reflects an extreme drought [47].
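An index with the properties described above can be sketched as follows. The linear form used here, 5 × (F − 1) with F = (SM − WP) / (FC − WP) and an upper cap at 0, is an assumption chosen only because it matches the stated range (0 at field capacity, −5 at the wilting point, below −5 for drier soil); it is illustrative and may differ from the equation actually used in the study. The function name and the input values are hypothetical.

```python
def soil_moisture_index(sm, fc, wp):
    """Assumed linear SMI: 0 at field capacity, -5 at wilting point.

    sm, fc, wp: soil moisture, field capacity, and wilting point in the
    same units (for example volumetric %). Values wetter than field
    capacity are capped at 0 (assumption).
    """
    if fc <= wp:
        raise ValueError("field capacity must exceed wilting point")
    fraction = (sm - wp) / (fc - wp)  # available water fraction
    return min(0.0, 5.0 * (fraction - 1.0))


# Hypothetical cell with FC = 22 % and WP = 6 %.
smi_half = soil_moisture_index(14.0, 22.0, 6.0)
```

Applied cell by cell to the 1 km SM, FC and WP rasters, such a function would produce an SMI map of the kind compared with the SPI later in the paper.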

Corrected MODIS LST Data
The leave-one-out cross-validation method was used to assess the performance of the CM technique in predicting the LST values at ungauged locations. In this technique, observed LST stations are assumed to be nonexistent at given gauge locations, and a spatial interpolation technique (CM) is applied to obtain the values at these points. Then, the estimated values obtained from the CM technique are compared to the original values at all measurement locations. All observed daily LST data were measured at the 71 KMA stations from January 2013 to December 2015 (Figure 5).
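The leave-one-out procedure can be sketched as follows: each station is withheld in turn, its value is predicted from the remaining stations, and the observed and predicted values are paired for later scoring. For brevity the predictor here is plain inverse distance weighting rather than the full CM technique; that substitution, the function names, and the station values are assumptions for illustration.

```python
import math


def idw(points, x, y, power=2.0):
    """Stand-in interpolator (IDW); the study applies the CM technique instead."""
    num = den = 0.0
    for px, py, v in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def leave_one_out(stations, interpolate=idw):
    """Return one (observed, predicted) pair per station.

    stations: list of (x, y, value); interpolate: callable(points, x, y).
    """
    pairs = []
    for i, (x, y, obs) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]  # withhold station i
        pairs.append((obs, interpolate(others, x, y)))
    return pairs


# Hypothetical LST stations (x km, y km, deg C).
lst_stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 0.0, 15.0)]
pairs = leave_one_out(lst_stations)
```

Any interpolator with the same call signature, including a CM implementation, can be passed as `interpolate`.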
To verify the CM-corrected LST, the LST stations marked with green ellipses in Figure 5 were treated as ungauged. The verification stations were chosen considering three conditions: (1) location in coastal or inland regions; (2) proximity to an SM observation station; and (3) land use. On this basis, Stations 129, 192 and 238 were selected, representing a coastal agricultural area (129), a coastal pasture area (192), and an inland forest area (238).
After excluding these three stations, the CM technique was applied to calculate the corrected LST distribution. The coefficient of determination (R²) values were high at all stations, including the verification stations, and varied from 0.89 to 0.99. Overall, the results at the three points excluded from the CM process showed good agreement between observed and simulated values (Figure 6). Finally, monthly spatial distribution maps of corrected LST were generated by the CM technique for 2013 to 2015. Figure 7 shows the monthly spatial distribution maps of corrected LST from April to September in the drought years (2013-2015).

Optimal Regression Equation and Regression Coefficients
To determine the optimal regression scenario, the regression coefficients of the MLR equations were assessed (Table 4). In addition, Scenario 9 was added to assess how the correction of MODIS LST affects R². In the correlation analysis, R² was used to assess the results. The weighted effect of each regression coefficient on SM is given in Table 4. Scenario 1 used only MODIS LST and PCP_n to estimate SM, and the resulting R² was 0.22. However, R² increased to 0.24 and 0.45 when LST, PCP_n, PCP_n-1, PCP_n-2, PCP_n-3, PCP_n-4, and PCP_n-5 were included. From this result, Scenario 8, which included nine independent variables, was selected as the optimal regression equation. Additionally, the R² of Scenario 9, in which the original MODIS LST was applied, was 0.06 lower than that of Scenario 8.
The MLR results showed that R² increased by 0.07 to 0.24 when the coefficients were estimated seasonally, compared with the coefficients that were not estimated seasonally (R² of 0.22 to 0.43). These results indicate that the correlation between observed and estimated SM was improved by considering seasonal and soil type patterns. Nevertheless, R² remained below 0.5 in some cases because the soil classification for each observation site was not complete. It was difficult to classify the soil at the RDA stations (Nos. 19 to 58) because these stations had not stabilized; they have only been measuring SM since December 2014. In addition, the grouping of 12 soil types into four soil types also contributed to the poor accuracy. Therefore, adding more soil moisture observations, extending the observation period, and refining the soil classification to more than four soil types would be expected to improve the simulation results.
The seasonal regression coefficients showed the highest correlation in all four soil classes in spring and summer. Because of the monsoon climate of South Korea, where precipitation is concentrated in summer, many antecedent precipitation values useful for the regression analysis are available in these seasons, which is presumed to explain the high correlation. In contrast, the R² of the four soil types is low in winter because winter SM data are highly uncertain: the soil is frozen, which can cause instrument errors.

Soil Moisture by Regression Coefficients of Optimal Scenario
Using the optimal scenario, the accuracy of SM estimation was evaluated via comparison to observed SM at 58 stations. SM was estimated using the regression coefficients, and the accuracy with respect to the observed SM was verified by R² and root mean square error (RMSE) (Table 6). The graphs of observed SM against estimated SM at major stations are illustrated in Figure 8. For all soil types, R² ranged from 0.30 to 0.76 and RMSE ranged from 0.46% to 12.21%. The overall R² was greater than 0.4, indicating a consistent correlation. Most of the RMSEs were less than 5.0%, but the RMSEs at Stations 1 and 2 were greater than 9.0%. The main errors may have been associated with the artificial water supply. Unlike other stations, these two stations are located near upland crop and paddy field areas. Therefore, the observed SM was likely influenced by the agricultural water supply in addition to precipitation during the irrigation period from April to June. Notably, R² and RMSE in winter were poor because of the many missing values and the uncertainty in observations caused by soil freezing and mechanical errors. Thus, the prediction accuracy is low in winter due to the difficulty of establishing an appropriate regression model. Additionally, some observations with low correlations did not fit the assigned soil properties. After the soil classes are refined in further research, the accuracy at these stations, as well as at the other observation sites, is expected to improve because the regression model depends on the soil class. Currently, 43 of the 58 stations are classified as loam or clay, and SM at these 43 stations is calculated using one regression equation per soil class for each season. Therefore, as mentioned above, improvement of the soil classification is necessary.
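The two accuracy measures used above can be computed as in the following sketch. R² is taken here as the squared Pearson correlation between observed and estimated SM, which is an assumption about the definition used in the study; the sample values are hypothetical.

```python
import math


def r_squared(observed, estimated):
    """Squared Pearson correlation between two equally long series."""
    n = len(observed)
    mo = sum(observed) / n
    me = sum(estimated) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov * cov / (var_o * var_e)


def rmse(observed, estimated):
    """Root mean square error in the units of the input (here % SM)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


# Hypothetical daily SM (%) at one station.
obs_sm = [20.0, 25.0, 30.0]
est_sm = [22.0, 27.0, 32.0]
station_r2 = r_squared(obs_sm, est_sm)
station_rmse = rmse(obs_sm, est_sm)
```

The example shows why both measures are reported: a constant offset leaves the correlation perfect while the RMSE still records a 2% error, as could happen at irrigated stations where observed SM is systematically higher than the precipitation-driven estimate.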

Distribution of Estimated Soil Moisture
The SM distribution by soil type was estimated using normalized regression coefficients (Table 7). The monthly distribution maps of spatial SM and spatial precipitation (PCP) were generated from 2013 to 2015 (Figure 9). From the results of the monthly distribution map, spring drought was severe until March due to the absence of rain in 2013 and 2015. As seen from Figure 9, there was no rainfall in other regions, except for northeastern South Korea, until January. Therefore, SM was relatively low compared with other regions. Because PCP was observed throughout South Korea from March to April, SM increased by 5-6%. However, SM in May decreased due to the absence of rain. These results show that SM depends on the spatial PCP pattern. In particular, in the June SM map, the SM in the western part of the Korean Peninsula sharply decreased because there was no rainfall for the three months after March from 2013 to 2015. This trend is due to the monsoon climate of the area. All areas of South Korea are affected by monsoons, and wet and dry seasons occur each year, with seasonal variations in precipitation. Usually, June through August (summer) is the wet season, and most of the yearly rainfall occurs during this period. Approximately 30% of the annual rainfall occurs in the other 9 months.

Comparison of Drought Index
The SMI and SPI in 2015, an extreme drought year, are illustrated in Figure 10. The SPI was extremely low (dry) in Gyeonggi and Gangwon Provinces (northern part of South Korea) on 1 January 2015. In March and May, the SMI and SPI values show that drought was alleviated by rain. However, the SMI and SPI levels remained severe in Gyeonggi and Gangwon Provinces. This drought was resolved by large-scale rainfall events in July, and the SMI approached zero in southern regions where rainfall occurred. After July, the soil moisture naturally increased and decreased according to the precipitation. This pattern is consistent with the tendency of the SPI. Coupled with the SPI, the SMI can be used as a meteorological drought index in forested areas and as an agricultural drought index in cultivated areas.

Conclusions
This study estimated the spatial SM of South Korea from January 2013 to December 2015 using an MLR model and MODIS satellite data and evaluated the results by comparison with observed SM data at 58 stations. From the original MODIS LST data, daily spatial LST was corrected using the CM technique. Additional satellite data (NDVI of Terra MODIS) were used to reflect vegetation variation. The observed precipitation measured by the AWSs of the KMA during the simulation period was interpolated using the IDW method to match the spatial resolution of 1 km. Although the USDA textural classification, which divides soil into 12 classes, is one of the most widely used soil classification systems, the soil was classified into four types (loam, sand, clay and silt) based on the largest proportion of soil in South Korea. Finally, the regression coefficients of the MLR model were estimated seasonally considering the five-day antecedent precipitation. The primary results are summarized as follows:

1. The R² of MODIS LST corrected by CM was between 0.83 and 0.99 at all LST stations. The corrected values were generally accurate compared to the observed LST.

2. The p-values of all the regression coefficients were less than 0.05, except for a few coefficients of NDVI for silt, indicating statistical significance. The R² values of the regression equations for the 4 soil classes were between 0.28 and 0.67. The R² values below 0.5 arise because the soil classification for each observation site was not completely accurate.

3. The seasonal regression coefficients showed the highest correlation in all four soil classes in summer due to the monsoon climate of South Korea, where precipitation is concentrated in summer. Many antecedent precipitation values useful for the regression analysis are available in this season.

4. When we simulated SM using the estimated regression coefficients, the overall R² was greater than 0.4 at most observation sites (approximately 66%). Therefore, as mentioned above, improvements in soil and season classification are necessary.

5. For distributing spatial SM, normalized regression coefficients were estimated using min-max normalization. Normalization shows which independent variable has the largest influence on the dependent variable in MLR analysis.

6. In the spatial soil moisture distribution, simulated SM tends to increase and decrease with precipitation. This tendency is more clearly seen in the SMI map, where the SMI decreases from −2 to −3, indicating a weak drought. From March to April in 2014, PCP was observed throughout South Korea. Thus, SM also increased by 5-6%. These results showed that approximately 60% of the drought areas predicted by the SMI and SPI overlapped.
The results of the CM technique in this study showed that the accuracy of MODIS LST data improved by 20-30%. In the regression analysis, the most important variables for estimating SM include 2-day, 3-day, 4-day and 5-day antecedent precipitation, LST and NDVI. SM exhibits spatially different patterns, even in areas with the same land use and soil characteristics. Overall, this study develops a high-resolution and accurate spatial distribution of SM for the first time based on satellite data. The results of this study showed that spatial resolution improved by 90% (10 km to 1 km) and R² increased by 62% (0.30 to 0.49) compared with the spatial resolution (10 km) and R² (0.30) of a previous study [48] based on AMSR2 satellite data. Although machine learning methods such as MLR and artificial neural networks (ANN) may achieve better accuracy, there are still some limitations. The MLR model is trained with a large number of samples, and the more training samples that are available, the better the model fits the data. This places a requirement on the abundance of historical satellite data and contemporary in situ SM observations. If the archive of satellite data and in situ SM is not abundant enough, then the relationship cannot be fully represented by historical observation pairs [49]. Notably, this study is the first to spatially estimate SM using MODIS LST corrected using the CM technique in South Korea. Therefore, this study provides a framework for accurate SM prediction in ungauged areas. Future research could improve on these results if the soil classification is further subdivided and the soil moisture regression coefficients are estimated from more observed data.

Figure 1. Flow chart of the study. For the satellite data, MODIS is the Terra moderate-resolution imaging spectroradiometer. For the soil moisture data, AAOS is the automated agriculture observing system operated by KMA, TDR is time domain reflectometry, and RDA is the Rural Development Administration.

Figure 2. Distributions of observation stations: (a) the 687 automatic weather system (AWS) stations for continuous monitoring throughout South Korea and (b) the 58 soil moisture stations used for calibration of the multiple linear regression (MLR) model.
The CM procedure for MODIS LST consists of six steps: (a) observed LST values are collected at the 71 KMA stations; (b) the LST values measured at the 71 stations are interpolated using the ordinary kriging technique at the 1 km spatial resolution of the MODIS LST reference; (c) satellite LST data are collected at 1 km spatial resolution; (d) the satellite LST values at the 71 station locations are extracted and then spatially interpolated using the ordinary kriging technique at the same 1 km resolution; (e) the residual between (c) and (d) is calculated; and (f) the residual values of (e) are added to the data from (b) to produce the final satellite-observed composite LST dataset with 1 km spatial resolution (Figure 3).
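Steps (a) to (f) can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the study: inverse-distance weighting is swapped in as a simple stand-in for ordinary kriging, stations are assumed to sit exactly on grid cell centres, and the function names (`idw_interpolate`, `conditional_merge`) are hypothetical.

```python
def idw_interpolate(points, values, targets, power=2.0):
    """Inverse-distance-weighted interpolation.

    Stand-in for ordinary kriging (an assumption of this sketch).
    It is exact at the input points, as kriging without a nugget is.
    """
    result = []
    for tx, ty in targets:
        num, den, exact = 0.0, 0.0, None
        for (px, py), v in zip(points, values):
            d2 = (tx - px) ** 2 + (ty - py) ** 2
            if d2 == 0:
                exact = v  # target coincides with a station
                break
            w = 1.0 / d2 ** (power / 2.0)
            num += w * v
            den += w
        result.append(exact if exact is not None else num / den)
    return result


def conditional_merge(station_xy, station_lst, grid_xy, satellite_lst):
    """Merge station LST with a gridded satellite LST field.

    station_xy / grid_xy: lists of (x, y) tuples; every station is
    assumed to coincide with one grid cell.
    satellite_lst: one value per grid cell, in grid_xy order (step c).
    """
    # (b) interpolate the observed station LST onto the grid
    obs_field = idw_interpolate(station_xy, station_lst, grid_xy)
    # (d) extract satellite LST at the stations and interpolate it
    cell_index = {xy: i for i, xy in enumerate(grid_xy)}
    sat_at_stations = [satellite_lst[cell_index[xy]] for xy in station_xy]
    sat_field = idw_interpolate(station_xy, sat_at_stations, grid_xy)
    # (e) residual between the satellite field (c) and field (d)
    residual = [s - k for s, k in zip(satellite_lst, sat_field)]
    # (f) add the residual to the interpolated observations (b)
    return [o + r for o, r in zip(obs_field, residual)]


# Synthetic 4 x 4 grid with three stations (placeholder values).
grid = [(x, y) for y in range(4) for x in range(4)]
satellite = [15.0 + 0.5 * x - 0.3 * y for x, y in grid]
stations = [(0, 0), (3, 1), (1, 3)]
observed = [16.2, 17.9, 14.1]
merged = conditional_merge(stations, observed, grid, satellite)
```

With an exact interpolator the merged field reproduces the observed values at the station cells, while away from the stations it retains the spatial pattern of the satellite field.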

Figure 3. Conditional merging process for MODIS LST: (a) Observed LST are collected at 71 stations of KMA; (b) LST values measured at the 71 stations are interpolated using the ordinary kriging technique; (c) Satellite LST data; (d) Satellite LST at the 71 gauging stations are extracted and then spatially interpolated using the ordinary kriging technique; (e) Residual between (c) and (d); (f) Residual values of (e) added to the data from (b).

Figure 4. Distribution map of soil information with a 1 km spatial resolution: (a) Soil type (silt, clay, loam and sand); (b) soil field capacity (FC); and (c) soil wilting point (WP).

Figure 5. Map of observed land surface temperature (LST) stations: ellipses (in green) denote the stations excluded from the conditional merging (CM) process and used for verification. The number above each pentagon (in red) is the LST station number.

Figure 6. Comparison of observed and simulated LST at the verification stations: (a) station 129; (b) station 192; and (c) station 238. The left graphs show the original MODIS LST values. The right graphs show the corrected MODIS LST values after applying conditional merging (CM).

Figure 7. Final monthly spatial distribution maps of LST during drought years.

Figure 8. Comparison of observed soil moisture and predicted soil moisture for each soil type. The black line is observed soil moisture, and red points are soil moisture values predicted using the multiple linear regression model. These graphs are representative results of each soil type.

Figure 10. Comparison of the soil moisture index (SMI) and standardized precipitation index (SPI): green and red dashed circles indicate areas where the SMI and SPI exhibited good agreement.

Table 1. Description of data specifications.

Table 2. Agrometeorological observation network of the Korea Meteorological Administration and the other observation points with soil type.

Table 4. Regression coefficients of multiple linear regression for each scenario.

Table 5. Regression coefficients in four seasons and for four soil types.

Table 6. Summary of the average (2013-2015) R2 and root mean square error (RMSE) values at 58 soil moisture stations.