Estimation of All-Weather 1 km MODIS Land Surface Temperature for Humid Summer Days

: Land surface temperature (LST) is used as a critical indicator for various environmental issues because it links land surface fluxes with the surface atmosphere. Moderate-resolution imaging spectroradiometers (MODIS) 1 km LSTs have been widely utilized but have the serious limitation of not being provided under cloudy weather conditions. In this study, we propose two schemes to estimate all-weather 1 km Aqua MODIS daytime (1:30 p.m.) and nighttime (1:30 a.m.) LSTs in South Korea for humid summer days. Scheme 1(S1) is a two-step approach that first estimates 10 km LSTs and then conducts the spatial downscaling of LSTs from 10 km to 1 km. Scheme 2(S2), a one-step algorithm, directly estimates the 1 km all-weather LSTs. Eight advanced microwave scanning radiometer 2 (AMSR2) brightness temperatures, three MODIS-based annual cycle parameters, and six auxiliary variables were used for the LST estimation based on random forest machine learning. To confirm the effectiveness of each scheme, we have performed different validation experiments using clear-sky MODIS LSTs. Moreover, we have validated all-weather LSTs using bias-corrected LSTs from 10 in situ stations. In clear-sky daytime, the performance of S2 was better than S1. However, in cloudy sky daytime, S1 simulated low LSTs better than S2, with an average root mean squared error (RMSE) of 2.6 °C compared to an average RMSE of 3.8 °C over 10 stations. At nighttime, S1 and S2 demonstrated no significant difference in performance both under clear and cloudy sky conditions. When the two schemes were combined, the proposed all-weather LSTs resulted in an average R 2 of 0.82 and 0.74 and with RMSE of 2.5 °C and 1.4 °C for daytime and nighttime, respectively, compared to the in situ data. This paper demonstrates the ability of the two different schemes to produce all-weather dynamic LSTs. The strategy proposed in this study can improve the applicability of LSTs in a variety of research and practical fields, particularly for areas that are very frequently covered with clouds.


Introduction
Land surface temperature (LST) is the radiative temperature of the land surface, which plays a crucial role in understanding various environmental problems such as heatwaves, drought, wildfire, air quality, and urban heat islands [1][2][3][4][5][6][7]. Since LST reflects the energy flux stability at the boundary of the surface and atmosphere, it is also used as a major parameter in modeling global physical processes, including hydrological and biogeochemical cycles [8][9][10]. Therefore, it is important to obtain accurate LST over large areas on both high spatial and temporal domains.
With the continued development of remote sensing technology, LST has been retrieved from satellite data for large areas with high temporal and spatial resolution. Thermal infrared (TIR) sensors are the most widely used in producing satellite-based LST. Several algorithms, such as singlechannel, split-window, and temperature and emissivity separation (TES) techniques, have been developed to provide TIR-based LST [11]. One of the most well-known TIR-based LST datasets is the moderate-resolution imaging spectroradiometer (MODIS) LST onboard the Terra and Aqua satellites. MODIS offers global LST products within a 1-2 K accuracy range, with a relatively high spatial resolution of 1 km, four times a day (two daytime and two nighttime LSTs). In addition, LST products are provided by several other TIR sensors with different specifications in both low earth orbit and geostationary orbit satellites: The visible infrared imaging radiometer suite (VIIRS), spinning enhanced visible and infrared imager (SEVIRI), and advanced spaceborne thermal emission and reflection radiometer (ASTER). Unfortunately, TIR-based LST is significantly affected by weather and atmospheric conditions; particularly, the surface temperature under clouds is not available. Some studies have been conducted to fill the gaps in LST data caused by clouds [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26].
Previous studies aimed at overcoming the lack of TIR-based LST data under cloudy areas can be divided into four groups. The first group reconstructs LSTs in cloudy areas by combining spatially, temporally, or spatiotemporally neighboring clear sky LSTs [13,18,22,25]. In particular, recent studies have looked at modeling by combining multiple algorithms, such as regression and interpolation, using spatial and temporal information of multi-temporal LSTs [16,21,26]. The second group not only uses multiple LSTs, but also auxiliary variables that are highly correlated with LST, to estimate cloudy sky LSTs, using statistical methods such as regression kriging and spline interpolation. However, the critical limitation in the methods of these first two groups is that they assume that LST under cloudy weather conditions is not different from that under clear sky conditions. In general, clouds reduce incoming shortwave radiation during daytime by blocking the Sun, and increase downward longwave radiation during nighttime. Thus, nighttime LSTs in cloudy conditions are only slightly lower than those under clear skies, while the difference in daytime LST is more significant [27]. Consequently, it is essential to model LSTs under cloudy conditions.
The third group uses physical modeling approaches like surface-energy balance (SEB) theory, which is adopted to derive cloudy LSTs from spatially neighboring clear sky LSTs. The effect of the clouds is simulated using a correction term that takes into account surface insolation, air temperature, and wind speed [15,17,23,24]. The SEB techniques, however, require complex parameterization with air temperature and wind speed as input data. Although the variation of LST is assumed to be based on insolation during the daytime, the method is not able to be applied to nighttime [17].
The fourth group uses passive microwave (PMW)-based data to overcome the issue of cloudy areas in TIR-based LST data. PMW-based data are less affected by water vapor and clouds than TIRbased LST data. Brightness temperature (BT), measured by the advanced microwave Sscanning radiometer-earth observing system (AMSR-E) and advanced microwave scanning radiometer 2 (AMSR2) sensors, are frequently used as PMW-based data to estimate LST. Although PMW-based BT has limitations of coarse spatial resolution (10-25 km), it could be used as supporting data in estimating missing values of TIR-based LSTs under cloudy conditions [14]. For example, Shwetha and Kumar resampled 25-km AMSR-E/AMSR2-based BTs directly into 1 km, using them as input variables for artificial neural networks with auxiliary data of elevation, latitude, and longitude to model the allweather 1 km LST [19]. Meanwhile, many studies have derived the PMW-based LST using the original resolution of BT (i.e., 10-25 km), rather than resampling it to 1 km and then downscaling it to a high resolution to merge with TIR-based LST [12,20]. PMW-based methods simulate cloudy sky LSTs based on the fact that PMW can penetrate clouds. However, the previous studies have limitations in terms of spatial accuracy in merging coarse PMW-based data with high-resolution TIR-based LST.
In South Korea, summers can often be scorching, causing a variety of disasters, including heatwaves and tropical nights. During these hot summers, Northeast Asia, especially South Korea, is usually covered by clouds transported by the East Asia monsoon [28]. Therefore, the reconstruction methods using temporally or spatially neighboring clear sky LSTs could not be successfully applied in this area in summer due to a very high cloud cover rate. Moreover, daytime LST on humid summer days (i.e., July and August) in South Korea is generally high under clear sky conditions, but it drops sharply in cloudy weather, such as during the rainy season or typhoon periods. Previous studies, however, have failed to consider the variability (i.e., rapid change) of LST under cloudy conditions. In addition, many studies have used air temperature rather than LST as in situ data to validate their cloudy sky LST predictions [16,29,30]. However, it should be noted that air temperature and LST often show different patterns in regions with heterogeneous land surfaces [31,32].
This study proposes two different schemes for estimating all-weather 1 km MODIS LSTs for humid summer days over South Korea, based on machine learning, using multiple datasets made up of AMSR2 BTs, and the annual cycle parameters (ACPs) of satellite TIR-derived LSTs. The first scheme (S1) is a two-step approach that first estimates 10 km LSTs and then downscales the LSTs from 10 km to 1 km. The second scheme (S2) is a one-step algorithm that directly estimates the 1 km all-weather LSTs. The primary objective of this study is to investigate how well the two schemes that we propose simulate dynamic humid summer LSTs under clear-and cloudy sky conditions through a series of validation processes. The remainder of this paper is organized as follows. Section 2 presents the study area and the data we used. Section 3 introduces the methods in detail, including the framework of our two proposed schemes. In Section 4, the distribution of clear and cloudy sky LSTs in the summer season are analyzed using in situ station data for daytime and nighttime. Then, the two different schemes are evaluated by a series of validations, especially using in situ LSTs for both clear and cloudy conditions. Finally, Section 5 presents the conclusion of this study.

Study Area
The study area is the mainland of South Korea with an area of approximately 99,728 km 2 (latitude 34° N-38.5° N and longitude 126° E-129.5° E) ( Figure 1). South Korea generally has a humid, continental climate affected by the Asian monsoons, with a large amount of precipitation in summer during the rainy season (usually from the end of June to the end of July). The annual mean temperature is about 10-15 °C; August, the hottest month, has a mean temperature of 23-26 °C. Humidity ranges from 60%-75% on a national scale, with summers (July and August) rising to 70%-85%. The southern coast is subject to late-summer typhoons. As seen in Figure 1, the dominant landcover categories of the study area are forest (67.7%), agricultural land (22.2%), urban areas (4.6%), and others, including grass, water, barren, and wetlands (5.5%).

Saetellite Data
The AMSR2 onboard the global change observation mission 1st-water (GCOM-W1) satellite, launched in May 2012, provides global PMW-based BT data. It acquires a set of daytime and nighttime microwave data twice a day: The equator crossing time is 1:30 p.m. for the ascending pass, and 1:30 a.m. for the descending pass. The AMSR-2 has seven frequencies, with both vertical and horizontal polarizations, and approximately 62 × 35, 62 × 35, 42 × 24, 22 × 14, 19 × 11, 12 × 7, and 5 × 3 km spatial resolution at 6.9, 7.3, 10.7, 18.7, 23.8, 36.5, and 89.0 GHz, respectively. Among these, we used the four frequencies (36.5, 23.8, 18.7, and 10.7 GHz) mostly used for the estimation of LST in the previous studies [30]. Low frequency data resampled into a 10 km resolution were downloaded from the Japan Aerospace Exploration Agency (https://gcom-w1.jaxa.jp) for 2013-2018. We used daily MODIS daytime and nighttime Aqua LST data (MYD11A1) because the equatorial-crossing times of Aqua MODIS are nearly the same as those of AMSR2 (1:30 p.m.-daytime and 1:30 a.m.-nighttime). The MYD11A1 LST data, which have 1 km spatial resolution, were retrieved using a generalized splitwindow algorithm [33]. The MYD11A1 products from 2013-2018 were downloaded from Earthdata Search (https://search.earthdata.nasa.gov/search). South Korea's elevation was retrieved from the shuttle radar topography mission (SRTM) digital elevation model (DEM), with 30 m spatial resolution (https://earthexplorer.usgs.gov). Global man-made impervious surface (GMIS) data with 30 m spatial resolution derived from Landsat images for the year of 2010 [34] were obtained to get the fractional impervious surface in this study.

In Situ LST Data
In situ LSTs (1 a.m./p.m. and 2 a.m./p.m.) from 2013 to 2018, obtained from the automated surface observing systems (ASOSs) operated by the Korea Meteorological Administration, were used as reference data. As shown in Figure 1a, a total of 10 ASOSs were selected based on the following conditions: First, the stations close to the coastline were excluded, because satellite-based LST data could be contaminated from the influence of ocean water included in the grid; and second, the stations that have high bias were also excluded from the reference data, after applying the biascorrection method described in Section 3.1.

LST Pre-processing
The 1 km MODIS LSTs with quality control (QC) flags of 'cloud' were considered to be pixels under cloudy sky conditions, and all others were clear sky LSTs. To train the model and validate the estimated LST under cloudy areas, we used in situ LSTs measured at ASOSs. However, there are spatial scale differences between the MODIS LST (1 km) and the point-based ASOS LST; thus, a systematic bias could occur. To solve this problem, we fitted in situ LST and MODIS LST under clear sky conditions using polynomial regression by station [35], to bias-correct in situ LSTs for both daytime and nighttime (Equation (1)).
where X is the in situ LST and Y is the MODIS LST under clear sky conditions. a, b, and c represent the coefficients. We used the quadratic function because many ASOSs have observed that in situ LSTs rise more rapidly than MODIS LSTs at high temperatures in the summertime. The MODIS and in situ LSTs from March to October, starting from 2013 through to 2018 at each station were used for the bias-correction. Only MODIS LSTs with good quality were used based on the QC flags, and the hourly in situ LSTs were linearly interpolated to the MODIS view time. We excluded the winter season (December through February) because the range of winter LSTs is not compatible with that of summer LSTs, which are the target variable of this study. Finally, we selected a total of 10 ASOSs for validation, with a calibration error (RMSE) < 2.5 °C for both daytime and nighttime in July-August (summer season) from the bias-correction analysis. Figure 1 shows the geographic distribution of the ten ASOSs used in this study. Appendix A1 describes the statistical results of before and after bias correction of the 1 km in situ LSTs at the 10 stations.
In addition, clear sky 1 km MODIS LSTs were aggregated to 10 km based on the 10 km AMSR2 grid area. At this stage, only 1 km MODIS LSTs where 95% or more clear sky LSTs exist in a 10 × 10 km 2 area were used for aggregation. We also fitted the in situ LSTs and the 10 km MODIS LSTs under clear sky conditions using polynomial regression in the same way as the bias correction of the 1 km LSTs at the selected 10 ASOS stations to produce the bias-corrected in situ LSTs at 10 km scale. Table 1 describes the input variables used for the estimation of LST under cloudy conditions. A total of 10 AMSR2 BTs were projected onto MODIS LSTs, whilst AMSR2 pixels from near the coastline were masked to eliminate the effects of ocean water. Due to the shift of the flight overpass path, the areas of missing AMSR2 BT values were also excluded.

Pre-Processing of Input Variables
The 1 km MODIS annual cycle parameters (ACPs), including mean annual surface temperature (MAST), yearly amplitude surface temperature (YAST), and LST phase shift relative to the spring equinox (theta) were calculated for each of daytime and nighttime based on the following equation [36].
where d represents the day of the cycle relative to the spring equinox. The ACPs were created for each year of the study period and averaged to construct one MAST, YAST, and theta each for daytime and nighttime. We aggregated the 30 m SRTM elevation and 30 m GMIS impervious surface fraction to the 1 km resolution of MODIS. Latitude and longitude values were extracted from the information contained in the MODIS tiles. This study converted DOY to values ranging from −1 to 1 within a oneyear period, using a sine function that considers seasonality (i.e., setting the middle of summer as 1 and the middle of winter as −1) [37].

Random Forest (RF)
In this study, we applied machine-learning random forest (RF) to estimate the LST for allweather conditions. RF has been widely used to conduct a variety of classification and regression tasks [38][39][40][41][42][43]. RF comprises classification and regression trees, producing a variety of independent trees to make a final decision through two randomizations: (1) Random selection of training samples and (2) random selection of predictor variables at each node of a tree [44]. This randomization is aimed at solving the overfitting problem and reducing the sensitivity of a model that comes from training sample configurations [45]. When a variable is randomly permuted, RF calculates the relative variable importance from out-of-bag (OOB) samples, measuring the mean squared error (MSE) of the OOB portion of samples in each tree. The same process is implemented whenever each input variable is perturbed, and the difference between the two MSEs of all trees is averaged. The larger the increase in the percentage of MSE (%incMSE) of a variable, the greater the contribution of the variable. It should be noted that the relative variable importance is considered as the sum of local contributions, not global importance [46][47][48]. The RF was implemented using the R statistical software through the "ranger" add-on package, with default model parameter settings (e.g., the number of trees set as 500). The "ranger" is a fast implementation of RF, or recursive partitioning, and is known to be suited to high-dimensional data with large datasets [49].

Schemes for Estimating All-Weather LSTs
In this study, we proposed two schemes (S1 and S2) to estimate all-weather MODIS LSTs. The S1 is based on a two-step approach (see Figure 2). The first step is estimating 10 km LSTs for daytime and nighttime. We used the following independent variables: Four different frequencies of the AMSR2 of two polarization BTs (10H/V, 18H/V, 23H/V, and 36H/V), along with five 10 km resampled auxiliary variables (Elev, Lat, Lon, DOY, and year). The aggregated 10 km MODIS LSTs under the clear sky condition were used as a target variable. Moreover, we used the 10 km bias-corrected in situ LSTs under the cloudy sky condition from 10 stations as a dependent variable together to train the model with the real LST characteristics under the cloudy skies. RF was applied to predict LSTs by training the samples from March to October between 2013 and 2018. Finally, the 10 km LSTs were produced based on the developed model using the input variables under all-weather conditions. The second step involved the spatial downscaling of LSTs from 10 km to 1 km. We used RF to develop the downscaling model for each date. The Elev, Imp, Lat, Lon, and ACPs (i.e., MAST, YAST, and theta) were used as input variables, considering the surface properties and spatial distribution characteristics of 1 km LSTs. At first, the 1 km MAST, YAST, and theta with 1 km auxiliary factors-Elev, Imp, Lat, and Lon-were aggregated to 10 km for independent variables for the RF model, and the 10 km LSTs, constructed in the first step, were used as the dependent variable. After the model was trained for each date, the original high-resolution 1 km input variables were put into the RF model to finally produce the 1 km all-weather LSTs, after the residual correction proposed by Hutengs and Vohland [50]. The second scheme S2 is a one-step algorithm that directly estimates 1 km all-weather LSTs ( Figure 3). To create this scheme, first we downscaled all 10 km AMSR2 BTs to 1 km using bilinear resampling. The resampled BTs together with 1 km auxiliary variables (i.e., MAST, YAST, theta, Elev, Imp, Lat, Lon, DOY, and Year) were used as input variables in RF. We used the 1 km clear sky MODIS LSTs and the 1 km bias-corrected in situ LSTs under cloudy skies from ten stations as target variables for daytime and nighttime separately. RF models were built for each year with the samples of the entire study area from March through October being used for training. We put the input variables of all-weather conditions into the developed RF model to produce 1 km LSTs for both clear and cloudy sky conditions.

3.5.A Series of Validations
For the first step of the first scheme, S1, two types of validations were implemented using the clear sky 10 km aggregated MODIS LSTs focusing on the summertime (i.e., July and August). In the first validation, we randomly divided July and August samples into training and validations sets (80%/20% split). In the humid summer, however, most of the area in South Korea is often covered by clouds for the whole day. So, for the second validation, we randomly divided the samples of July and August by date: 80% for training and the remaining 20% for validation. In the second step of S1, LSTs downscaled into 1 km by RF were validated using all clear sky MODIS LSTs in the summer. At this stage, we compared the performance of RF with other downscaling techniques in two ways: (1) A bilinear resampling of 10 km LSTs to 1 km; (2) a lapse-rate technique using Equation (3), which assumes that the temperature difference comes from elevation, suggested by Dual et al. [12]: where LST1km is the downscaled LST at 1 km scale; LST10km represents the predicted 10 km LSTs in step 1; Elv1km is the surface elevation at 1 km scale; Elv10km is the surface elevation at 10 km scale; and the TLR is the temperature lapse rate, which is defined as the rate of temperature decrease with elevation in the troposphere (average TLR value is 6.5 K/km; [51]). For the second scheme, S2, first and second validations similar as in S1 were performed. Moreover, the estimated all-weather 1 km LSTs, including those under cloudy skies, were validated using bias-corrected in situ LSTs of 10 ASOSs in summer. Among the 10 ASOS stations, data at one station were used as validation and the others were used to train the model with in situ cloudy LSTs as targets. This leave one-station out cross-validation (LOOCV) was repeated for all 10 stations. The coefficient of determination (R 2 ; Equation (4)), RMSE (Equation (5)), normalized RMSE (nRMSE; Equation (6)), and bias (Equation (7)) were used to evaluate the performance of the models for each validation step.
where is the measured value, is the predicted value, and represent the maximum and minimum values of observations, and n is the number of samples.  Figure  4) shows that the upper quartile of cloudy sky LSTs is close to the lower-quartile of clear sky LSTs for most in situ stations. One possible reason for this is an increasing cloud-cover on rainy days in humid areas in the summer, where precipitation reduces the surface sensible heat [12,52]. When the sky is heavily cloudy in the daytime, most of the Sun's energy does not reach the surface, preventing the heating up of the Earth. Otherwise, at nighttime, clear sky and cloudy sky LST distributions do not differ significantly for all the stations. There is no incoming radiation from the Sun at night, therefore LSTs do not rise significantly. The average nighttime LSTs-early overnight-under cloudy skies have a nearly identical range as, or only slightly lower than, the clear sky LSTs.

Analysis of Clear and Cloudy Sky in Situ LSTs in Summer
Among the 372 days, the total number of summer days (July-August) for the six years (2013-2018), the size of good-quality clear sky samples was very small, especially for the daytime (see the sample size in Tables A1 and A2). This implies that the temporal or spatial interpolation approach to reconstructing LSTs in cloudy conditions could not be effectively applied to the summer season because of a large amount of cloud contamination.  Figure 5 shows the results of the first (upper) and second (lower) validations for summer daytime and nighttime for Scheme 1(S1). The R 2 values for estimating 10 km daytime and nighttime LSTs were 0.82 and 0.85, respectively, in the first validation. The RMSE values have a daytime LST of 1.40 °C and a nighttime LST of 0.95 °C. Considering their similar nRMSE, the main reason for the RMSE difference between daytime and nighttime is likely the range difference in the temperature distribution. In the second validation, the R 2 (RMSE) values for estimating 10 km daytime and nighttime LSTs were 0.79 (1.55 °C) and 0.88 (1.14 °C), respectively. The accuracy of the two RF-based validations corresponds to the MODIS LST validation error of 1-2 K [53][54][55]. In particular, the second validation results were similar to the first validation in terms of accuracy (i.e., R 2 and nRMSE), even with a separate validation dataset by date. These results imply that the constructed 2-step RF-based model is robust for humid areas where clouds often cover most regions in summer. However, since we performed both validations using the MODIS clear sky LSTs, the effect of clouds was not considered. Thus, accuracy assessment using in situ LST data under cloudy conditions was needed.  Figure 6 shows the validation results of the second step for Scheme 1, using clear sky MODIS LSTs estimated with the bilinear resampling (Bilinear), the lapse rate approach (Lapse rate), and the proposed RF-based algorithm (RF) among the different land covers. RF outperformed the other two techniques with higher correlation and lower RMSE in most land covers for both daytime and nighttime. For daytime, RF showed R 2 of 0.6-0.8 and RMSE of 1.6-3.0 °C with some variation according to land cover. This accuracy is considered significant with an average range value of RMSE similar to approximately 2.0 K, which is the target accuracy in several daytime LST retrieval studies [56][57][58]. The reason why urban areas showed lower performance than other land covers might be that the relatively high LSTs in urban areas were not accurately simulated when downscaling 10 km LST to 1 km, considering the fact that RF underestimated the downscaled LSTs (mean bias: −1.5 °C, not shown). Moreover, it has been reported that MODIS daytime LSTs in urban areas have relatively high uncertainty [16,59]. Note that the RF outperformed the lapse rate approach, which assumes the dependency of cloudy LSTs only on the altitude for all land covers in the daytime [12]. The RF appears to simulate the surface thermal heterogeneity well in daytime, using not only Elev, but also the Imp and ACPs-MAST, YAST, and theta-as input variables.

Evaluation of Two Schemes Using Clear Sky LSTs
For nighttime, the RF model returned highly accurate results, with R 2~0 .8 to 0.9 and RMSE < 1.5 °C with some variations by land cover. The lapse rate approach showed worse performance than the other two methods for most land cover types. Duan et al. used an average lapse rate of 6.5 K/km for both day and night [12]. However, the lapse rate could be applied to the air temperature of the troposphere, not the LSTs, which implies that the LST difference by altitude does not seem to be significant on the local scale. Interestingly, the bilinear and RF models did not show much difference for nighttime, where RF showed slightly better performance than the bilinear interpolation in terms of R 2 and RMSE for most land covers. This suggests that the nighttime LST is thermal-homogenous enough for bilinear interpolation to achieve good results. Figure 6. Model performance of three downscaling approaches (bilinear resampling (bilinear), lapse rate, and random forest (RF)) in the second step for scheme 1 by land cover for daytime (a) and nighttime (b). Table 2 summarizes the results of the first and second validations for a 1 km clear sky LST estimation in S2. The first validation showed excellent performance resulting in R 2 > 0.9 and RMSE < 1 °C for both daytime and nighttime. The second validation, however, dividing samples by date, yielded relatively low accuracy, which is possibly due to the daily LST variations. It is not surprising that the prediction performance of S2 by time over humid areas in the summer would be less accurate than the first validation, because clouds covered most areas for specific days, which might not have been trained well. In particular, cloudy areas in the summer daytime have an LST range different from clear sky areas ( Figure 4); therefore, S2-based 1 km LSTs need further investigation using in situ data under cloudy conditions.  Figures 7 and 8 show the all-weather average daytime and nighttime 10 km LSTs of S1 from July through August for each year. The overall spatial distribution of the daytime and nighttime LSTs appeared similar, but the range of the daytime LSTs (22.1-34.0 °C) was considerably higher than that of the nighttime LSTs (16.0-22.6 °C). This is because the differences in heat capacity and evapotranspiration by land type result in a wide range of LSTs in the daytime, affected by incoming solar radiation [22]. Elevation (see Figure 1) yielded a negative spatial relationship with LSTs for both daytime and nighttime, which is consistent with the results of Bertoldi that showed that LST decreases with increasing elevation because of complex factors such as air temperature and vegetation amount [60]. In this study, therefore, the elevation input variable contributes significantly to the LST estimation model. In terms of land cover (see Figure 1), LSTs were relatively low in both the daytime and nighttime in the forested areas, widely distributed in the study area. Meanwhile, the cropland areas showed higher temperature values than in forests, especially for the western part of the study region, which is consistent with the results of Lee et al. [61]. The urban areas also showed relatively high LSTs compared to other landcovers, because of the highly impervious surfaces in both daytime and nighttime [32,62].

Spatial Distribution Analysis of All-weather LSTs from Two Schemes
The developed 10 km LSTs of S1 also showed a temporal variation over six years (2013-2018). For example, South Korea experienced unprecedented extreme temperatures (i.e., a heatwave) in the summer of 2018 [63], where the developed LSTs show distinctly high-temperature distribution for both daytime and nighttime (Figures 7f and 8f). Furthermore, 10 km LSTs were relatively high at nighttime of summer 2013 (Figure 8a), which is consistent with the analysis of Choi and Lee that showed frequent tropical night events in South Korea in summer 2013 [64]. Overall, the LSTs estimated using S1 on the 10 km scale seem to accurately describe both spatial and temporal patterns of LSTs in summer daytime and nighttime.   Figure 9 shows the mean and variance of the 2013-2018 1 km all-weather LSTs produced by S1 and S2 for summer in both the daytime and nighttime. In the daytime, it should be noted that S2 generally simulated LSTs higher in urban and agricultural areas than S1 (Figure 9a,b). The difference in the estimated LSTs between S1 and S2 might also be due to S2 overestimating the low LSTs under cloudy conditions. The bias of the second validation of daytime was relatively high (~0.16), as shown in Table 2. At nighttime, the S1 and S2 showed similar LST distributions, except urban areas where the S1 LSTs were slightly higher than the S2 ones. Note that the variance of S1 LSTs is quite a lot higher than S2 in most areas in the daytime (Figure  9e-f). The relative variable importance (%) explains this fact as provided by RF ( Figure A1). Among the several input variables, only AMSR2 BTs showed temporal changes. In S2, temporally constant variables, such as Lat, YAST, theta, and MAST, played significant roles in the RF model ( Figure A1c); therefore, S2 could be relatively limited in simulating the temporal variations of LSTs. In the case of S1, AMSR2 BTs were used only to construct the 10 km LSTs first (step 1), whereas ACPs were used only for step 2. Therefore, S1 could simulate the temporal pattern of daily LSTs accurately by downscaling the proposed 10 km LSTs to 1 km for each day in step 2. At nighttime, S1 and S2 had similar LST variance for most areas (Figure 9g-h). In particular, LSTs in summer nighttime have a distinctly lower variance than those in the daytime (Figure 9e-f), which implies that the nighttime does not show significant temperature changes in summer regardless of weather conditions. Many previous studies that predicted 1 km cloudy sky LSTs have used auxiliary variables that are indirectly related to LST, such as elevation, latitude, and longitude [12,19]. It should be noted that ACPs (i.e., MAST, YAST, and theta) have been identified as crucial variables by RF for both S1 and S2 ( Figure A1). This suggests that ACPs are useful in estimating LSTs for all-weather conditions, due to their characteristics representing the spatial distribution of LSTs, without being affected by clouds.

Comparison of Two Schemes Using in Situ LSTs
The clear and cloudy sky 1 km LSTs produced from the two schemes were validated (i.e., LOOCV) using the 1 km bias-corrected in situ LSTs measured at the 10 stations (Table 3 for daytime  and Table 4 for nighttime). It should be noted that S1 showed negative biases for most stations, which implies that S1 tended to underestimate 1 km LSTs under clear sky conditions. In particular, the RMSE of S1 was relatively high with a large negative bias, especially for stations 5. This might be because the station was located in built-up urban areas, where S1 LSTs were possibly underestimated, as presented in Figure 9. On the other hand, in the case of S2, the bias under clear sky conditions was close to zero and slightly positive for most stations. Therefore, S2 produced higher accuracy than S1 for high LSTs under clear sky conditions in the daytime.
In the daytime under cloudy skies, S1 outperformed S2 for most stations. At stations 2, 3, 4, 8, and 10, S1 exhibited a significant increase in R 2 compared to S2. High temporal correlations of S1 imply the effective simulation of temporal variations of LSTs. Unlike the clear sky conditions, the RMSE of S1 was much smaller compared to S2 for most stations under cloudy skies. One possible reason for the high RMSE in S2 is that low LST values under cloudy areas could be overestimated based on the high bias of the S2 RF model under cloudy sky conditions (see also Figure 9). The RMSEs of S1 (from 2.1 to 3.9 °C in summer) for daytime under cloudy skies are comparable to or lower than those from the literature (RMSE of 4.3-8.3 °C for four stations in China [65], RMSE of 5.1-5.6 °C for two sites in Africa [17], RMSE of 1.8-2.7 °C for three stations in North China [30], and RMSE of 3.5-4.4 °C for four stations in China [12]), although those studies focused on all seasons, not only summer. It should be noted that Zhou et al. reported that the accuracy of LST estimation under the cloudy sky conditions in summer was lower than the other seasons [26]. Table 3. The leave one-station out cross-validation (LOOCV) results of the estimated 1 km clear and cloudy sky LSTs from scheme 1 (S1) and scheme 2 (S2) using bias-corrected in situ LSTs at 10 stations during July and August from 2013 to 2018 for daytime. Refer to Figure 1 for station numbers.

Clear Skies
Cloudy Skies

R 2 RMSE (℃) Bias (℃) R 2 RMSE (℃)
Bias (℃) S1 S2 S1 S2 S1 S2 S1 S2 S1 S2 S1 S2  Figure 10 represents the temporal distribution of the S1, S2, and in situ LSTs at station 9 for July-August 2017, when the LSTs dynamically changed. In the daytime, extremely high LSTs were well predicted by S2, such as for 13-July and 7-August, as opposed to S1; however, relatively low LSTs were better predicted by S1. It should be noted that there were days when the very high LSTs sharply dropped, such as between 6 July and 7 July, as well as 13 July and 15 July. S1 simulated these rapid temperature changes better than S2. Furthermore, there were many days with large amounts of precipitation, which resulted in low LSTs (i.e., 10 August and 14 August) in the humid summer. LSTs for such days were also better simulated by S1 with an improved temporal correlation.
At nighttime, R 2 was significantly high in both S1 and S2, the RMSE was less than 1 °C, and the bias converged to zero for most stations under the clear sky conditions (Table 4). Under cloudy skies, S2 yielded higher R 2 at some stations compared to S1. Nevertheless, both S1 and S2 produced RMSE < 2 °C under the cloudy sky conditions, which suggests that the nighttime LSTs are relatively less affected by atmospheric phenomena, such as precipitation and clouds (Figure 10b). Figure 10. Time-series of the estimated LSTs for schemes 1(S1) and 2(S2), and 1 km bias-corrected in situ LSTs with daily precipitation at station 9 for (a) daytime and (b) nighttime during July and August in 2017 (except the missing days due to advanced microwave scanning radiometer (AMSR2) gaps between paths). Daily precipitation data were obtained from automated surface observing systems (ASOSs) operated by the Korea Meteorological Administration. Table 4. The leave one-station out cross-validation (LOOCV) results of the estimated 1 km clear and cloudy sky LSTs from scheme 1(S1) and scheme 2(S2) using 1 km bias-corrected in situ LSTs for 10 stations during July and August from 2013 to 2018 for nighttime. Refer to Figure 1 for station numbers.

Two Scheme Combinations
We further examined the combination of the two schemes (S1 and S2) to improve estimation performance, based on the difference of the LST distribution between clear and cloudy sky conditions, as analyzed in Section 4.1. The LSTs developed from S2 were used for days with relatively high LSTs, whereas S1 was used for days with lower temperatures. Appendix A3 describes the detailed combination methods of the daytime. For nighttime, the average of S1 and S2 LSTs was used since both schemes resulted in high accuracy [66]. Table 5 shows the results of the LOOCV of all-weather LSTs, as well as of S1, S2, and Scheme combined (SC) models, using bias-corrected in situ LSTs for daytime and nighttime. In the daytime SC, R 2 was relatively high and RMSE was distinctly lower at many stations when compared to S1 and S2. For nighttime, the SC exhibited significantly higher R 2 than did S1 and S2 for several stations, and the RMSE of SC was also generally close to S1 or S2, whichever was lower. The superiority of SC to S1 and S2 is consistent with previous studies' findings that the combination of different models improves performance by overcoming the limitations of each individual model [67]. Therefore, we propose all-weather LSTs by combining two different schemes with an average R 2 of 0.82 and 0.74 and with RMSE of 2.5 °C and 1.4 °C for daytime and nighttime, respectively, over the 10 in situ stations. Table 5. The leave one-station out cross-validation (LOOCV) results of the estimated 1 km all-weather LSTs from scheme 1(S1), scheme 2(S2) and scheme combined (SC) using 1 km bias-corrected in situ LSTs at 10 stations during July and August from 2013 to 2018 for daytime. Refer to Figure 1 for station numbers.

Conclusions
In this study, we estimated all-weather 1 km MODIS LSTs for daytime and nighttime in South Korea for the humid summer days. We used eight AMSR2 BTs, three ACPs (i.e., MAST, Yast, and theta), and six auxiliary variables for the LST estimations based on RF machine learning. Both clear sky MODIS LSTs and the bias-corrected in situ LSTs under cloud sky conditions were used as a dependent variable to provide the models with the LST characteristics for clear and cloudy skies. We have proposed two schemes: A two-step approach (S1) first estimates 10 km LSTs and then involves the spatial downscaling of LSTs from 10 km to 1 km. S2 is a one-step algorithm that directly estimates the 1 km all-weather LSTs, which we have evaluated using a series of validations. In clear sky daytime, S2 slightly outperformed S1, but in cloudy sky daytime, S1 had an average R 2 of 0.78 and RMSE of 2.6 °C, an improvement when compared to S2 (R 2 of 0.74 and RMSE of 3.8 °C) for the biascorrected 10 in situ stations. At nighttime, S1 and S2 showed no significant difference in performance regardless of cloud conditions. We further examined the combination of the two schemes (S1 and S2) in order to improve estimation performance, producing promising results, with R 2 of 0.82 and 0.74 and with RMSE of 2.5 °C and 1.4 °C for daytime and nighttime, respectively, over the 10 in situ stations. This study has revealed that the two-step-based S1 was better able to simulate low LSTs in cloudy sky, humid summer daytime conditions (i.e., rainy days) than S2. Moreover, we found that ACPs appear relatively important for the estimation of LSTs in light of spatial variability. To our knowledge, this is the first study to predict all-weather 1 km MODIS LSTs that focuses on humid summer days in great detail. Nevertheless, there is still room for further validation of the constructed LSTs over built-up areas since an insufficient number of in situ stations in urban land cover were used in this study.
Although we have focused this study on South Korea, we believe that the suggested schemes could be successfully used over other regions frequently covered with clouds in humid summer seasons. Recently, new MODIS Aqua LST datasets (MYD21A1D for daytime and MYD21A1N for nighttime) were produced utilizing the ASTER temperature/emissivity separation (TES) technique suitable for hot and humid conditions, although the approach has, thus far, only been tested using a small number of measurements [68]. It is expected that the proposed technique may be used to predict MYD21 LSTs under cloudy sky conditions.

Conflicts of Interest:
The authors declare no conflict of interest.

A.1. Before-and after 1 km Bias-corrected in situ LSTs
Tables A1 and A2 show the relationship between good-quality MODIS LSTs and in situ LSTs before and after bias correction under clear skies, in daytime and nighttime, respectively. In the daytime, before bias correction, the R 2 of the time series data showed a relatively significant range of 0.46-0.76, but the RMSE and bias were quite high. The significant spatial thermal difference within a 1 km grid in the daytime, especially for the summer season, could result in the underestimation of MODIS LSTs when compared to the in situ data [69]. After the bias correction, the RMSE decreased towards 2 °C at the 10 stations in which the range of errors were similar to that of the typical MODIS LST validation results (i.e., ~2 K; [59]). The bias converged close to 0, indicating a very slightly negative signal at some stations after correction. For the nighttime, before bias correction, the temporal R 2 was relatively higher than in the daytime for most stations, and the RMSE and bias were significantly lower than the daytime. These results are consistent with previous studies' findings that the MODIS LST validation error is much lower at nighttime than in the daytime [70,71]. After the bias correction, the RMSEs were under 1.5 °C and the bias was close to zero for all stations. The reason that the nighttime has a lower bias correction error than the daytime is possibly due to the higher thermal homogeneity in one MODIS grid (i.e., 1 km resolution) at the nighttime without solar shortwave radiation. Table A1. The relationship between clear sky MODIS LSTs and the before-and-after 1 km biascorrected in situ LSTs during July and August from 2013 to 2018 for the 10 stations in daytime. Refer to Figure 1 for station numbers.

Station Number
Before  Table A2. The relationship between clear sky MODIS LSTs and the before-and-after 1 km biascorrected in situ LSTs during July and August from 2013 to 2018 for the 10 stations at nighttime. Refer to Figure 1 for station numbers.  Figure A1 shows the normalized relative variable importance calculated from the RF models for the two schemes. The normalization was done by sum to 100%. Daily variable importance of step 2 in S1 were averaged before normalization. Figure A1. Normalized relative variable importance calculated from the RF models of two steps (step 1 and step 2) for scheme 1 (S1), the orange bar; and scheme 2(S2), the blue bar, for daytime (a-c) and nighttime (d-f).

A.3. Two Scheme Combination (SC) Methods of Daytime
In Section 4.1, the upper quartile of cloudy sky LST was close to the lower quartile of clear sky LST for most in situ stations in the daytime. We used the AMSR2 BT to select the dates of daytime high and low LSTs. First, all of the eight daytime AMSR2 BTs listed in Table 1 were aggregated to 1 km as a MODIS grid using bilinear resampling. Temporal correlations of AMSR2 BTs and in situ data at the 10 stations were performed over all-weather conditions in summer; we selected 10V, which showed the highest correlation, as the reference variable (Table A3). The 75th percentile (i.e., upper quartile) value of 1 km AMSR2 BT 10V was then calculated over the summer study periods (July-August, 2013-2018) for each pixel. Finally, the LSTs developed from S2 were used for days when the BT of AMSR2 10V was over the 75th percentile, and S1 was used for days when the BT of AMSR2 10V was below the 75th percentile, based on the results in Section 4.4. Table A3. The temporal correlation (Pearson correlation coefficient; R) results between eight AMSR2 BTs and the 1 km bias-corrected in situ LSTs during July and August from 2013 to 2018 for the 10 stations in daytime. Refer to Figure 1 for station numbers.