Estimating Regional Soil Moisture Distribution Based on NDVI and Land Surface Temperature Time Series Data in the Upstream of the Heihe River Watershed, Northwest China

: Temporal and spatial variability of soil moisture has an important impact on hydrological processes in mountainous areas. Understanding such variability requires soil moisture datasets at multiple temporal and spatial scales. Remote sensing is a very e ﬀ ective method to obtain surface (~5 cm depth) soil moisture at the regional scale but cannot directly measure soil moisture at deep soil layers ( > 5 cm depth) currently. This study chose the upstream of the Heihe River Watershed in the Qilian Mountain Ranges in Northwest China as the study area to estimate the proﬁle soil moisture (0–70 cm depth) at the regional scale using satellite Vegetation Index (NDVI) and Land Surface Temperature (LST) products. The study area was divided into 31 zones according to the combination of altitude, vegetation and soil type. Long-term in situ soil moisture observation stations were set up at each of the zones. Soil moisture probe, ECH2O, was used to collect soil moisture at ﬁve layers (0–10, 10–20, 20–30, 30–50 and 50–70 cm) continuously. Multiple linear regression equations of time series MODIS (Moderate-resolution Imaging Spectroradiometer) NDVI, LST and soil moisture were developed for each of the ﬁve soil layers at the 31 zones to estimate the soil moisture (0–70 cm) on a regional scale with a spatial resolution of 1 km 2 and a temporal resolution of 16-d from October, 2013 to September, 2016. The correlation coe ﬃ cient R of the regression equations was between 0.47 and 0.94, the RMSE was 0.03, indicating that the estimation method based on the MODIS NDVI and LST data was suitable and could be applied to alpine mountainous areas with complex topography, soil and vegetation types. The overall pattern of soil moisture spatial distribution indicated that soil moisture was higher in the eastern region than in the western region, and the soil moisture content in the whole study area was 14.5%. The algorithm and results provide novel applications of remote sensing to support soil moisture data acquisition and hydrological research in mountainous areas.


Introduction
Soil moisture is an essential component of the terrestrial water cycle, and serves as a critical link between the precipitation, surface water, groundwater and vegetation water [1][2][3][4][5][6]. It plays an important role in hydrological processes and land surface-atmosphere interactions such as the Soil-Plant-Atmosphere Continuum (SPAC) [7]. Mountains are water towers of rivers, and understanding the distribution of soil moisture over the mountainous areas, is essential for hydrological modeling and water resources management, especially in arid areas, such as Northwest China [8]. Since its strong temporal and spatial variability, soil moisture has an important effect on the distribution of regional water resources, and ecosystem services [9][10][11]. Unfortunately, the impact of soil moisture on both hydrological processes and ecosystem services over the regional scale has been poorly understood due to the lack of long-term, large-scale soil moisture datasets, particularly in such mountainous areas [12].
Over the past decades, acquisition of soil moisture data has evolved from traditional gravimetric methods to the applications of semi-automatic and automatic monitoring of the neutron probe, time-domain reflectometry (TDR), frequency domain reflectometry (FDR). At the regional scale, remote sensing, particularly satellites, provides soil moisture estimates over large areas [13], which is mainly based on the measurement of electromagnetic radiation energy reflected or emitted from the land surface. Satellite soil moisture products mainly include AMSR (Advanced Microwave Scanning Radiometer) [14], ERS (European Remote Sensing Satellite) [15], ASCAT (The Advanced Scatterometer) [16,17], SMOS (Soil Moisture and Ocean Salinity) [18] and SMAP (Soil Moisture Active Passive) [19]. In addition, soil moisture data at the regional scale can also be obtained through model simulation and data assimilation [20]. Common weaknesses of these soil moisture products are that the accuracy and resolution in heterogeneous mountainous areas are still too coarse to meet the study requirements [21,22]. Temperature Vegetation Dryness Index (TVDI) [23] and Apparent Thermal Inertia (ATI) [24,25] are two common methods to estimate soil moisture by using visible light and near-infrared wavebands. However, both the TVDI and ATI methods mainly establish the correlation between the measured soil moisture on the ground and the TVDI or ATI indices to estimate the soil moisture distribution in the region. For alpine and heterogeneous areas, the complexity of the underlying surface leads to a series of uncertainties, such as the inconsistent linear relationship between soil moisture and ATI, resulting in large estimation errors. In addition to these deficiencies, current remote sensing methods mainly concentrate on the relationship between the surface reflection value and the in situ observations to estimate the surface soil moisture at the regional scale [23]. Few studies have estimated the profile soil moisture, particularly in the alpine areas. Li et al. [26] used remotely sensed surface soil moisture to calculate deep soil moisture by a flux model, and pointed out that remote sensing surface soil moisture in wet areas was suitable for estimating profile soil moisture, while the results in dry areas had large errors. Tobin et al. [27] downscaled AMSR-E and ERS-CCI (European remote sensing satellite-climate change initiative) soil moisture data using an exponential filter (ExpF) with soil moisture index derived from MODIS NDVI. Tian et al. [3] coupled the surface soil moisture and profile soil moisture, applied ExpF, artificial neural networks (ANN) and cumulative distribution function matching (CDF) methods in an alpine region to estimate the deep profile soil moisture at the regional scale. Lu et al. [28] investigated a nonlinear autoregressive neural network method with exogenous input (NARXnn) to estimate time series soil moisture by multiple remote sensing data. Estimation of 0-100 cm soil moisture by the principle of maximum entropy achieved better results than those obtained by the exponential decaying function in the Southeastern USA, a subtropical humid area [29]. With the measurement of the spectral characteristics of soil profile, Balet et al. [30] inferred soil moisture conditions based on the MARMIT (MultilAyer Radiative Transfer Model for soIl reflecTance) model. Over the past decades, MODIS LST and NDVI products have been widely used in agriculture, ecosystem and global change research [31,32]. The NDVI is an index that shows the difference between vegetation reflectance in the visible and near-infrared bands and the soil background [33]. Land Surface Temperature (LST) is a key parameter for agricultural drought monitoring, hydrological research and urban thermal environment [34,35]. Both MODIS LST and NDVI data are widely used for soil moisture estimation based on the TVDI and ATI methods (e.g., [21,36]). For example, Yang et al. [37] proposed a trapezoidal space defined by remote sensed vegetation cover and LST to estimate surface soil moisture. Compared with the aforementioned soil moisture products, both MODIS LST and NDVI have higher spatial resolutions in soil moisture monitoring. In addition, both MODIS LST and NDVI are usually used for downscaling the satellite soil moisture products [22,38].
In recent years, many scale transformation studies for soil moisture retrieval have been done. However, the accuracy of most of these works was poor in mountainous areas, and the temporal and spatial resolutions of profile soil moisture estimation were still too coarse to meet the needs of hydrological research in heterogeneous mountain areas. To improve the deficiency in estimating profile soil moisture distribution over mountainous areas, this study proposed a novel approach to estimate profile soil moisture by integrating remote sensing LST and NDVI products and in situ soil moisture observations. The upstream of the Heihe River Watershed in the Qilian Mountain Ranges in Northwest China was chosen as the study area to estimate the profile soil moisture (0-70 cm depth) at the regional scale by remote sensing. The study area was divided into 31 zones based on the combination of altitude, vegetation and soil type. In situ soil moisture observation stations were set up at each of the zones to collect soil moisture at five layers (0-10, 10-20, 20-30, 30-50 and 50-70 cm) continuously for the period of October, 2013 to September, 2016. Multiple time series MODIS NDVI, LST and soil moisture were fitted for each of the five soil layers at the 31 zones to estimate the profile soil moisture distribution on a regional scale. The study aims to provide a novel approach to estimate profile soil moisture distribution by remote sensing to support hydrological research in mountainous areas.

Study Area
The in situ soil moisture data for this study were obtained from the Soil Hydrological Heterogeneity Observation Network in the upstream of the Heihe River Watershed (97 • 46 -101 • 11 E, 38 • 12 -39 • 22 N) Northwest, China (Figure 1a), a 2.75 × 104 km 2 study area with 30 tributaries, elevation ranging from 2000 m to 5580 m (above sea level, a.s.l) on the northeast edge of the Qinghai-Tibet Plateau. The spatial distribution of annual mean temperature ranges from -3.1 • C to 3.6 • C, and annual precipitation ranges from 200 mm to 700 mm [39]. Perennial snow cover and glaciers are present above 4000 m with permafrost extending down to 3700 m. Affected by mountain climate and terrain, there is a remarkable vertical zonal distribution of vegetation and soils in the study area. In the upstream of the Heihe River Watershed, the major vegetation types include coniferous forest (Picea crassifolia), shrub (Potentilla fruticosa), steppe (Stipa purpurea Griseb), alpine meadow (Kobresia pygmaea Clarke), alpine sparse vegetation (Saussurea medusa Maxim) and desert (Sympegma regelii Bunge) [22]. The main soil types include aeolian sandy soil, cold desert soil, alpine meadow soil and mountain swamp chestnut soil [8].  Table 1). (a) the study area in China; (b) the elevation of the study area, black circles represent the in situ sites and crimson circles represent the three automatic meteorological station sites; (c) land use types distribution in the study area; (d) soil types. The locations of in situ monitoring sites and meteorological stations are also indicated in (b), (c) and (d).   We established a Soil Hydrological Heterogeneity Observation Network comprising 31 stations ( Figure 1) in the upstream of the Heihe River Watershed based on the combination of land use/land cover (LULC), soil type and elevation [39,40]. The distribution of soil and vegetation types is shown in Table 1. We used GPS (Garmin Oregon 550) to record latitude, longitude and altitude information for each station, the positional errors were within ±1 m and altitude error was in ±3-5 m. We also recorded soil profile characteristics, vegetation status, root depth, aspect and slope using a geological compass, and took pictures of the sites every summer since June, 2014.

Soil Moisture Measurement
ECH2O 5TE soil moisture probes (DECAGON Devices, USA) and EM50 datalogger were installed at each in situ observation site. The 5TE probe measures the soil volumetric water content by the dielectric constant of the soil, with a precision of ±3% and accuracy of ±1-2% [3,41]. The 5TE probes with a sensing depth of 2.5 cm were vertically centered in each of the sampled layers (0-10 cm, 10-20 cm, 20-30 cm, 30-50 cm and 50-70 cm) [42]. There is one probe in each layer for a total of five at each station. Datalogger EM50 was placed into a waterproof box sealed with high-strength glass glue and wrapped in a thick waterproof bag buried at a distance of at least 50 cm away from the 5TE probes. The system collected soil moisture content data at 30 min intervals. Regular site maintenance took place twice a year at the beginning of June and at the end of October [3].
However, the mountainous study area is topographically complex and hard to reach, and it is a real challenge to maintain the soil moisture in situ observation network in the study area. Wireless data transmission is not possible because there is no mobile communication network coverage in the study area. As a result, there are some gaps in the measurement datasets due to severe weather and unstable batteries or sensors and the sensor damage by livestock and rats [3].

Time Series Data of NDVI and LST
NDVI was used as the vegetation index for the upscaling of the in situ soil moisture in this study. Specifically, we acquired the 16-day composite NDVI dataset MOD13A2 V6 (https://e4ftl01. cr.usgs.gov/MOLT/MOD13A2.006/) of the sixth edition of terrestrial Level 3 standard data products for the period of October, 2013 to September, 2016, with a total of 69 scenes at a spatial resolution of 1 km 2 [43]. We used the 8-d composite Land Surface Temperature (LST) data MOD11A2 V6 (https://e4ftl01.cr.usgs.gov/MOLT/MOD11A2.006/) for the same period of October, 2013 to September, 2016, for a total of 138 scenes with a resolution of 1 km 2 [44]. Since the LST data were of 8-d composite, we used Maximum Value Composites (MVC) to process the LST dataset into a 16-d temporal resolution to match the temporal frequency of the NDVI dataset [45].

Methodology
The remote sensing vegetation index products contain a lot of noise caused by precipitation, cloud cover, human influence and the sensor itself in the data acquisition and process stages [46,47], and these factors can be collectively called random factors or residual parts.
The MOD13A2 V5 NDVI product was divided into the NDVI seasonal part and the NDVI residual part by using the Asymmetric Gaussian fitting method (AG) [48]. The multivariate linear regression fitting was used to establish multiple linear regression equations between the time series soil moisture data and the MODIS LST and NDVI. Subsequently, soil moisture upscaling equations were developed for every profile in the 31 vegetation-soil-elevation zones.
We installed only three automatic weather stations in the study region due to resource constraints. The limited number of precipitation observation stations cannot match the number of in situ soil moisture observations. Our in situ observations covered the three-year period (October, 2013 to September, 2016), and were able to represent the mean precipitation amount in the study region. Therefore, we only used MODIS LST and NDVI remote sensing data to estimate the spatial distribution of soil moisture in the study region.

Asymmetric Gaussian Function Fitting
Based on the noise information, NDVI data products can be divided into seasonal part, NDVI AG , and residual part, NDVI RES as shown in Equation (1). It needs to be reconstructed to remove the residual part [48].
The time series NDVI data curve reconstructed by the AG algorithm, can well express the interannual variation characteristics of the vegetation, and identify all the abnormal values [31]. The AG algorithm is a nonlinear least-squares fitting algorithm based on the Asymmetric Gaussian function. The original NDVI data and NDVI AG were used to obtain the residual part of the time series Vegetation Index (NDVI RES ). The shallow soil moisture was greatly affected by the residual part which was likely to be related to short-term weather. The deep soil moisture was mainly subject to the seasonal changes in weather conditions and plant growth and environmental factors.

Multiple Linear Regression Fittings
In this study, a multiple linear regression method was used to estimate the soil moisture using the in situ soil moisture observations, Vegetation Index (NDVI) and Land Surface Temperature (LST) data. The specific equation is as follows: In Equation (2), SMC is soil volumetric water content (m 3 m −3 ), LST is the Land Surface Temperature (K), NDVI RES and NDVI AG are the residual part and seasonal part of the vegetation index, respectively, a 0 , a 1 , a 2 and a 3 are the coefficients to be determined, a 1 is in m 3 m −3 K -1 , while a 2 and a 3 are dimensionless.
Following Zhang et al. [22], the regression equations were evaluated by complex correlation coefficient (R), the F-test value, the confidence level P and the root mean square error (RMSE). R describes the linear correlation of the dependent variable (soil moisture) and multiple explanatory variables in multivariate linear regression. The F-test was used to test whether the established regression equation is statistically significant. The larger the F-value, the better the fitted regression equation. Confidence level P indicated the significance level of the regression equation. The smaller the p-value, the more significant the equation. RMSE was used to measure the deviation between the predicted value and the true value. The smaller the RMSE value, the closer the simulated value to the observed value, the higher the accuracy of the regression equation. R and RMSE calculation formulas are as follows: where, SMC obs,i is the measured soil moisture value, SMC mod , i is the multiple linear regression fitted soil moisture value, n is the number of observations [22].

Time Series NDVI and LST
The NDVI dataset was the MOD13A2 product from 16 October 2013 to 29 September 2016. The product was the 16-day composite with a total of 69 images. NDVI AG (Figure 2b) was obtained by reconstructing the NDVI time series data with the AG algorithm, and then the NDVI AG and the original NDVI data (Figure 2a) were used to derive the residual NDVI RES (Figure 2c). The reconstructed NDVI time series data (NDVI AG ) based on the AG algorithm and the trend of the original NDVI time series data were basically the same (Figure 2d), and the reconstructed results were used to detect abnormally high and low values in the time series data, to correct the low values, and to reflect seasonal changes in the vegetation index [49]. The seasonal NDVI AG and the residual NDVI RES were consistent with the growth pattern of the vegetation [48].
The selected LST dataset was the MOD11A2 product from 16 October 2013 to 29 September 2016, which was an eight-day composite product with a total of 138 images. The eight-day LST dataset was converted to a 16-day LST images (Figure 3b) to match the NDVI temporal resolution by the MVC method [45].
original NDVI data ( Figure 2a) were used to derive the residual NDVIRES (Figure 2c). The reconstructed NDVI time series data (NDVIAG) based on the AG algorithm and the trend of the original NDVI time series data were basically the same (Figure 2d), and the reconstructed results were used to detect abnormally high and low values in the time series data, to correct the low values, and to reflect seasonal changes in the vegetation index [49]. The seasonal NDVIAG and the residual NDVIRES were consistent with the growth pattern of the vegetation [48]. The selected LST dataset was the MOD11A2 product from October 16th, 2013 to September 29th, 2016, which was an eight-day composite product with a total of 138 images. The eight-day LST dataset was converted to a 16-day LST images (Figure 3b) to match the NDVI temporal resolution by the MVC method [45].

Upscaling of in Situ Soil Moisture
The soil moisture time series data in the 31 in situ observation sites were processed to 16-day mean values corresponding to the NDVI and LST's temporal resolution. Subsequently, multiple linear regressions were established between the in situ soil moisture observations and the NDVI and LST time series data. The five-layer soil moisture upscaling equation coefficients of the 31 in situ observation sites were shown in Table 2 and Tables A1-A4. For the 0-10 cm layer of the site D8, the soil moisture upscaling equation was significant at a 95% confidence level, and all the other models at the rest 31 in situ observation sites were significant at a 99% level. Among them, the correlation coefficient R of the 0-10 cm layer was 0.36-0.93, the RMSE was 0.03 (Table 2); the R and RMSE of 10-20 cm layer were 0.65-0.90, and 0.03 (Table A1), respectively; The R and RMSE of 20-30 cm layer were 0.51-0.95 and 0.03 (Table A2), respectively; the R and RMSE of 30-50 cm layer were 0.47-0.93, and 0.03 (Table A3), respectively; and the R and RMSE of 50-70 cm layer were 0.54-0.94, and 0.02, respectively (Table A4). The accuracy of scaling results was higher than that of Xu et al. [21]. In general, the five-layer RMSE mean values were close to the ECH2O data accuracy (0.03 cm 3 cm −3 ), indicating that the accuracy of the in situ soil moisture estimation models was reasonable.

Upscaling of In Situ Soil Moisture
The soil moisture time series data in the 31 in situ observation sites were processed to 16-day mean values corresponding to the NDVI and LST's temporal resolution. Subsequently, multiple linear regressions were established between the in situ soil moisture observations and the NDVI and LST time series data. The five-layer soil moisture upscaling equation coefficients of the 31 in situ observation sites were shown in Tables 2 and A1, Tables A2-A4. For the 0-10 cm layer of the site D8, the soil moisture upscaling equation was significant at a 95% confidence level, and all the other models at the rest 31 in situ observation sites were significant at a 99% level. Among them, the correlation coefficient R of the 0-10 cm layer was 0.36-0.93, the RMSE was 0.03 (Table 2); the R and RMSE of 10-20 cm layer were 0.65-0.90, and 0.03 (Table A1), respectively; The R and RMSE of 20-30 cm layer were 0.51-0.95 and 0.03 (Table A2), respectively; the R and RMSE of 30-50 cm layer were 0.47-0.93, and 0.03 (Table A3), Remote Sens. 2020, 12, 2414 9 of 20 respectively; and the R and RMSE of 50-70 cm layer were 0.54-0.94, and 0.02, respectively (Table A4). The accuracy of scaling results was higher than that of Xu et al. [21]. In general, the five-layer RMSE mean values were close to the ECH2O data accuracy (0.03 cm 3 cm −3 ), indicating that the accuracy of the in situ soil moisture estimation models was reasonable. The soil moisture of each pixel (with spatial resolution of 1 km 2 ) at the 31 zones from 16 October 2013 to 29 September 2016 was calculated by using the upscaling model to obtain the five-layer soil moisture dataset in the study area for the same study period. The dataset covered an area of 2.75 × 10 4 km 2 , with a temporal resolution of 16-day and a spatial resolution of 1 km 2 (e.g., Figure 5).

Accuracy Evaluation of Upscaling Soil Moisture Models
To evaluate the accuracy of soil moisture upscaling models, the three sites-Biandukou, Dayekou and Kangle, which were not used to establish the regression equations, were used as validation sites, and their soil moisture data were used for correlation test and error analysis ( Table 3). The results of the soil moisture upscaling in the different soil layers showed that the correlation coefficients were between 0.5410 and 0.8940, and the RMSE were between 0.0066 and 0.0549, all of which passed the F-test at 99% significant level. The results showed that the correlation between the upscaled data and the measured data were significant, and the soil moisture upscaling models well expressed the dry and wet conditions of the soil at the regional scale. Comparing the validation results of soil moisture models of the five layers, we found that the R's descending order was 20-30 cm > 10-20 cm > 0-10 cm > 30-50 cm > 50-70 cm, RMSE's descending order was 0-10 cm > 10-20 cm > 30-50 cm > 50-70 cm > 20-30 cm. The trends of the two indices were not consistent, the main reason might be: 1) the variability of soil moisture with time decreased from the top to the bottom soil layer, and the 0-10 cm layer had the largest soil water fluctuation and the largest error, 2) the LST was mainly affected by the 0-10 cm layer of soil and overlying vegetation, 3) NDVI was mainly affected by the vegetation growth, and the root zone soil moisture was very important for vegetation growth, the main vegetation types of the three verification sites were all grassland, with their root system mainly distributed in the 20-30 cm. Therefore, the accuracy of the 20-30 cm soil moisture upscaling model was the highest, and the lowest was the 50-70 cm layer.
Comparison of the soil moisture upscaling results of the three verification sites showed that the RMSE values were getting smaller from the top to the deep soil layers, Kangle appeared to have the best fit. Topographically, the Kangle site was located at a plateau within 1 km 2 pixels. The vegetation type was mainly grass (Stipa Steppe) with homogeneous distribution. The climate was dry, with smaller rainfall (annual 338 mm for Kangle, 428 mm for Dayekou and 702 mm for Biandukou during the 2013-2016 period), the soil moisture was relatively low and stable. Under these climate, topography and vegetation conditions, the LST and NDVI data were of relatively high quality, and the accuracy of the upscaling was high. The Biandukou and Dayekou sites were located on the mountain slope. The terrain was complex within 1 km 2 pixels, and the vegetation types were diverse. In addition, there was a reservoir near the Dayekou site. These factors were likely to have affected the LST and NDVI quality of remote sensing products, and the accuracy of the upscaling model was not as good as that of the Kangle site. Figure 4 shows the changes in soil moisture at different layers over time. As Figure 4a shows, the soil moisture was the lowest in December and the highest in July. The descending soil moisture order in all the layers in December were 20-30 cm > 50-70 cm> 10-20 cm > 30-50 cm > 0-10 cm. The descending soil moisture order in all the layers in July were 20-30 cm >10-20 cm > 0-10 cm > 30-50 cm > 50-70 cm. Among them, the 20-30 cm were the highest among all the layers of the soil profile. During the whole three years, the variation range of each layer was 0-10 cm (coefficient of variation, CV. 12.44%), 10-20 cm (CV. 12.27%), 20-30 cm (CV. 11.86%), 30-50 cm (CV. 10.40%), 50-70 cm (CV. 8.38%), respectively, which was closely related to the effects of precipitation and soil infiltration on soil moisture. Figure 4b shows that the 20-30 cm and 30-50 cm had a relatively high and low but stable variation. Among the four seasons, the precipitation and temperature were low in winter, leading to the decrease of liquid water in the soil, thus the soil moisture was the lowest in the winter. Summer rainfall was the highest and most frequent, therefore the soil moisture content was highest and most variable in the summer.

Temporal Variability of Soil Moisture at the Regional Scale
type was mainly grass (Stipa Steppe) with homogeneous distribution. The climate was dry, with smaller rainfall (annual 338 mm for Kangle, 428 mm for Dayekou and 702 mm for Biandukou during the 2013-2016 period), the soil moisture was relatively low and stable. Under these climate, topography and vegetation conditions, the LST and NDVI data were of relatively high quality, and the accuracy of the upscaling was high. The Biandukou and Dayekou sites were located on the mountain slope. The terrain was complex within 1 km 2 pixels, and the vegetation types were diverse. In addition, there was a reservoir near the Dayekou site. These factors were likely to have affected the LST and NDVI quality of remote sensing products, and the accuracy of the upscaling model was not as good as that of the Kangle site.

Soil Moisture Variability at Different Temporal and Spatial Scales
5.4.1. Temporal Variability of Soil Moisture at the Regional Scale Figure 4 shows the changes in soil moisture at different layers over time. As Figure 4a shows, the soil moisture was the lowest in December and the highest in July. The descending soil moisture order in all the layers in December were 20-30 cm > 50-70 cm> 10-20 cm > 30-50 cm > 0-10 cm. The descending soil moisture order in all the layers in July were 20-30 cm >10-20 cm > 0-10 cm > 30-50 cm > 50-70 cm. Among them, the 20-30 cm were the highest among all the layers of the soil profile. During the whole three years, the variation range of each layer was 0-10 cm (coefficient of variation, CV. 12.44%), 10-20 cm (CV. 12.27%), 20-30 cm (CV. 11.86%), 30-50 cm (CV. 10.40%), 50-70 cm (CV. 8.38%), respectively, which was closely related to the effects of precipitation and soil infiltration on soil moisture. Figure 4b shows that the 20-30 cm and 30-50 cm had a relatively high and low but stable variation. Among the four seasons, the precipitation and temperature were low in winter, leading to the decrease of liquid water in the soil, thus the soil moisture was the lowest in the winter. Summer rainfall was the highest and most frequent, therefore the soil moisture content was highest and most variable in the summer.

Spatial Distribution of Soil Moisture at the Regional Scale
The overall spatial distribution of soil moisture during the growing season from 2014 to 2016 are shown in Figure 5. The weighted mean soil moisture of the three years was 14.50% in the entire study region. Soil moisture was higher in most of the eastern, central and northwestern parts of the region. Overall, the soil moisture in the eastern area was generally higher than that in the central and western areas, and the soil moisture was the lowest in the western area. The spatial patterns were consistent with the precipitation distribution pattern shown in Figure 5d and Tian et al. [42]. The precipitation during the growth period of 2014 to 2016 was highest in the Biandukou, the second highest in the Dayekou, and the lowest in the Kangle automatic weather station, showing a spatial pattern of a declining trend from the eastern to the central and to the western parts of the region. Since the limited number of automatic weather stations and coarse spatial coverage, the precipitation was not incorporated into the regression analysis. with the precipitation distribution pattern shown in Figure 5d and Tian et al. [42]. The precipitation during the growth period of 2014 to 2016 was highest in the Biandukou, the second highest in the Dayekou, and the lowest in the Kangle automatic weather station, showing a spatial pattern of a declining trend from the eastern to the central and to the western parts of the region. Since the limited number of automatic weather stations and coarse spatial coverage, the precipitation was not incorporated into the regression analysis.

Soil Profile Moisture Heterogeneity at the Regional Scale
As shown in Figure 6, soil moisture and its variability at different layers were shown by the relationship between the mean and coefficient of variation (CV). The fitting curves indicated that the variability of soil moisture at the shallow layers was larger than deeper layers. Mean and CV had negative correlations in the 30-50 cm and 50-70 cm layers, indicating that the soil moisture variability was the highest in dry conditions and the lowest in humid conditions (e.g., summer), and the fitting curve between the 30-50 cm layers and the whole profile was close, indicating that the characteristics of the soil moisture change in the 30-50 cm was similar to that of the whole profile. As described in the above section, the most representative of the whole profile was 30-50 cm among the five layers.

Soil Profile Moisture Heterogeneity at the Regional Scale
As shown in Figure 6, soil moisture and its variability at different layers were shown by the relationship between the mean and coefficient of variation (CV). The fitting curves indicated that the variability of soil moisture at the shallow layers was larger than deeper layers. Mean and CV had negative correlations in the 30-50 cm and 50-70 cm layers, indicating that the soil moisture variability was the highest in dry conditions and the lowest in humid conditions (e.g., summer), and the fitting curve between the 30-50 cm layers and the whole profile was close, indicating that the characteristics of the soil moisture change in the 30-50 cm was similar to that of the whole profile. As described in the above section, the most representative of the whole profile was 30-50 cm among the five layers.

Conclusions
Estimating spatial and temporal distribution of soil moisture is a challenge in high-elevation, data-scarce, and heterogeneous mountainous areas like the upstream of the Heihe River Watershed in Northwest China. We proposed a regression model based on the MODIS NDVI and LST to estimate profile soil moisture at the regional scale. Subsequently, we analyzed the spatial and temporal variability of soil moisture at the regional scale. Results showed that the multivariate linear

Conclusions
Estimating spatial and temporal distribution of soil moisture is a challenge in high-elevation, data-scarce, and heterogeneous mountainous areas like the upstream of the Heihe River Watershed in Northwest China. We proposed a regression model based on the MODIS NDVI and LST to estimate profile soil moisture at the regional scale. Subsequently, we analyzed the spatial and temporal variability of soil moisture at the regional scale. Results showed that the multivariate linear regression method could be used to estimate high-resolution soil moisture products in alpine and cold mountainous areas at both shallow and deep soil layers. The soil moisture in the east of the upstream of the Heihe River Watershed was significantly higher than that in the west, and the average soil moisture in the whole region was 14.5% in the 0-70 cm depth. The soil moisture at the 30-50 cm soil layer could reasonably represent the 0-70 cm profile soil moisture. Soil moisture in the 0-10 cm layer had the highest variability while the 20-30 cm layer showed the lowest soil moisture variability among all the layers. The contribution of this study was to estimate profile soil moisture at the regional scale by readily available remote sensing products of NDVI and LST.
However, the microterrain features such as slope and aspect were not considered in the soil-vegetation-elevation sampling zone, which might affect the accuracy of soil moisture upscaling models. Satellite products with high spatial-temporal resolutions need to be integrated with in situ observations and other soil and vegetation datasets to improve the accuracy of estimating soil moisture at the regional scale, especially in data-scarce and topographically complex mountainous regions. Appendix A Table A1. Layer soil moisture estimation model (10-20 cm).