Retrieval of Daily Reference Evapotranspiration for Croplands in South Korea Using Machine Learning with Satellite Images and Numerical Weather Prediction Data

: Evapotranspiration (ET) is an important component of the Earth’s energy and water cycle via the interaction between the atmosphere and the land surface. The reference evapotranspiration (ET 0 ) is particularly important in the croplands because it is a convenient and reasonable method for calculating the actual evapotranspiration (AET) that represents the loss of water in the croplands through the soil evaporation and vegetation transpiration. To date, many e ﬀ orts have been made to retrieve ET 0 on a spatially continuous grid. In particular, the Moderate Resolution Imaging Spectroradiometer (MODIS) product is provided with a reasonable spatial resolution of 500 m and a temporal resolution of 8 days. However, the applicability to the local-scale variabilities due to complex and heterogeneous land surfaces in countries like South Korea is not su ﬃ ciently validated. Meanwhile, the AI approaches showed a useful functionality for the ET 0 retrieval on the local scale but have rarely demonstrated a substantial product for a spatially continuous grid. This paper presented a retrieval of the daily reference evapotranspiration (ET 0 ) over a 500 m grid for croplands in South Korea using machine learning (ML) with satellite images and numerical weather prediction data. In a blind test for 2013–2019, the ML-based ET 0 model produced the accuracy statistics with a root mean square error of 1.038 mm / day and a correlation coe ﬃ cient of 0.870. The results of the blind test were stable irrespective of location, year, and month. This outcome is presumably because the input data of the ML-based ET 0 model were suitably arranged spatially and temporally, and the optimization of the model was appropriate. We found that the relative humidity and land surface temperature were the most inﬂuential variables for the ML-based ET 0 model, but the variables with lower importance were also necessary to consider the nonlinearity between the variables. Using the daily ET 0 data produced over the 500 m grid, we conducted a case study to examine agrometeorological characteristics of the croplands in South Korea during the period when heatwave and drought events occurred. Through the experiments, the feasibility of the ML-based ET 0 retrieval was validated, especially for local agrometeorological applications in regions with heterogeneous land surfaces, such as South Korea.


Introduction
Evapotranspiration (ET) is the transport of water vapor to the atmosphere through the evaporation from soil and canopy surfaces and the transpiration from vegetation [1]. ET is an important component of the Earth's energy and water cycles via the interaction between the atmosphere and the land surface in the form of latent heat using net radiation [2]. ET is affected by many agrometeorological variables such as radiation, temperature, humidity, wind, precipitation, and soil moisture. The amount of water transported by ET is generally greater than that by runoff in most agricultural fields [3]. Thus, the amount of irrigation water for the cropland can be determined by the cumulative amount of ET.
Potential evapotranspiration (PET) is the possible amount of ET according to climatic and meteorological conditions, assuming a sufficient supply of water [4]. Actual evapotranspiration (AET) is a substantial amount of ET, as the result of the effects of many factors such as vegetation type, the fraction of vegetation cover (FVC), phenological cycle, soil type, soil moisture, irrigation, and drainage, in addition to meteorological conditions [2]. Water management in the croplands includes the maintenance of a suitable level of soil moisture and the adjustment of irrigation water. The accurate information of AET is essential to determine the amount of irrigation water because the AET represents the loss of water in the croplands through soil evaporation and vegetation transpiration. The in situ AET is usually obtained from the eddy covariance [5] that can be measured by the energy and water exchanges between the land surface and the atmosphere on a micrometeorological tower [6]. It is the most scientifically reliable system for the in situ measurement of AET, but the installation and operation of the eddy covariance instruments on the micrometeorological tower require high costs and many efforts [7].
For the sake of convenience, the AET can be calculated by multiplying a crop coefficient (K c ) by the reference evapotranspiration (ET 0 ) defined as the ET of well-irrigated hypothetical grassland under a specific meteorological condition [2]. Hence, the K c is expressed as the ratio of AET to ET 0 . The ET 0 can have a similar physical quantity as PET, but is in a more agricultural context, assuming the calculation of AET using the K c [8]. The ET 0 for a given point can be easily derived from the Food and Agriculture Organization (FAO) version of the Penman-Monteith (PM) equation only using meteorological parameters [3], without the hydrological and physiological factors such as soil moisture and stomatal resistance. While the meteorological input data for the PM equation is usually obtainable, the physiological and phenological properties of the crops are hard to quantify in detail. So, the K c can be a useful method for the estimation of AET in the croplands. Under considerations of the possible use for the AET estimation, an efficient calculation of ET 0 using meteorological data will be of help for many agrometeorological studies.
Despite the usefulness of K c in the AET estimation, it is usually confined to a specific point because some of the input data are only obtainable from in situ point observations. However, individual point observations cannot fully represent spatially continuous areas, although ET can vary spatially according to the meteorological and cropland variables. To date, many efforts have been made to retrieve PET, ET 0 , and AET over a continuous grid. Most of them tried to expand the point-based calculation of PET, ET 0 , and AET to a continuous grid using satellite images and gridded meteorological data with the linkage to the PM equation [9][10][11][12][13][14][15][16][17] or the Priestley-Taylor (PT) equation with the empirically derived coefficients fitted for the study areas [18][19][20][21][22][23][24][25][26][27][28]. They also introduced additional factors to the retrieval process for a more realistic simulation of ET 0 or AET, such as soil structure [29], terrain effect [9], stomatal conductance [10,13], diurnal temperature difference [24,30], and vegetation greenness [12,24]. Additionally, some previous studies estimated ET 0 or AET using satellite-derived vegetation index and net radiation data with a statistical model like linear regression [30,31]. Recently, artificial intelligence (AI) approaches using the techniques like neural network (NN) or machine learning (ML) have tried to take account of the nonlinearity between ET and the variables for meteorological and land surface conditions, showing a possibility for the retrieval of the gridded ET 0 or AET [32][33][34][35]. Because the PM and PT equations and the AI techniques are already in a stable status, the construction of a reasonable database for the meteorological and land surface conditions is one of the critical points for the effective retrieval of gridded ET 0 and AET.
Currently, the operative gridded ET products are available from the Climatic Research Unit (CRU), TerraClimate, and Moderate Resolution Imaging Spectroradiometer (MODIS). The ET 0 products by the CRU and the TerraClimate are the monthly reanalysis data suitable for long-term climate researches. So, they cannot support examining daily or weekly changes for the applications in agriculture, hydrology, and meteorology. On the other hand, the MODIS product (MOD16A2GF) has a reasonable spatial resolution of 500 m and a temporal resolution of 8 days. However, the applicability to the local-scale variabilities due to complex and heterogeneous land surfaces in countries like South Korea is not sufficiently validated. Meanwhile, the AI approaches have rarely demonstrated a substantial product for the daily gridded ET 0 on the local scale, although they showed a useful functionality for the ET 0 retrieval on the local scale.
In this paper, we present ML-based modeling of daily local-scale ET 0 that can be used for estimating the crop AET, with the consideration of the complexity and nonlinearity between ET 0 and input variables, and thereby produce a daily 500 m ET 0 in the croplands of South Korea. Vegetation data from MODIS, precipitation data from Global Precipitation Measurement (GPM), and meteorological data from Unified Model (UM) Local Data Assimilation and Prediction System (LDAPS) were used as input features for the ML-based daily ET 0 model. The in situ point-based ET 0 data taken by the Korea Meteorological Administration (KMA) were used for training and blind testing of our model for the period between March and November 2013-2019. We also present the accuracy statistics followed by the characteristics of meteorology, hydrology, and vegetation in the croplands of South Korea in recent years using the daily ET 0 retrieved over the 500 m grid.

Overview
For the retrieval of the daily ET 0 over the 500 m grid in South Korea, we used the following input variables: normalized difference vegetation index (NDVI), leaf area index (LAI), and fraction of photosynthetically active radiation (FPAR) from MODIS; standardized precipitation index for three months (SPI3) from GPM; air temperature, land surface temperature, soil temperature, relative humidity, and wind speed from UM LDAPS. The ML models were built using these input data and the in situ ET 0 data taken from KMA ( Figure 1). The database was constructed for the period between March and November of 2013-2019, excluding the winter season when evaporation from soil and transpiration from vegetation are insignificant. For the data processing, we employed the MODIS Reprojection Tool (MRT) and the R programming with Geospatial Data Abstraction Library (GDAL). Additionally, R programming with the h2o library was conducted for the optimization of the ML model and the creation of the ET 0 data over the 500 m grid.

In Situ Reference Evapotranspiration Data
One effective method for quantifying the ET 0 is to use the pan evaporation (E p ) and the pan coefficient (K p ). The E p is the in situ observation using an evaporation pan, and the K p is defined as the ratio of ET 0 to E p . Given an E p and a K p , one can directly obtain the ET 0 even without meteorological observations. In the case of KMA, the K p was prepared in advance by fitting the linear relationship between the observed E p and the ET 0 derived by the PM equation using in situ meteorological data. Through this procedure, KMA provides the ET 0 by the multiplication of E p and K p for the nine points in South Korea ( Figure 2). They are part of the Automated Surface Observing System (ASOS) stations on the croplands having the equipment of evaporation pan. The ASOS meteorological data and the E p data of the nine stations have almost no missing values during 2013-2019. We aggregated the KMA ET 0 data on a daily basis for the calibration/validation for the ML-based daily ET 0 model.
where ∆ is the slope of the vapor pressure curve; R n is the net radiation; G is the soil heat flux; γ is the psychrometric constant; T is the near-surface air temperature; u 2 is the two-dimensional wind speed; e s is the saturation vapor pressure; e a is the actual vapor pressure. The constant 0.408 is for the conversion of the unit from MJ/m 2 /day to mm/day, and the constant 273.15 is for the temperature conversion from Kelvin to Celsius. The constants 900 and 0.34 are the empirical coefficients for the daily ET 0 proposed by FAO [1].
croplands of South Korea in recent years using the daily ET0 retrieved over the 500 m grid.

Overview
For the retrieval of the daily ET0 over the 500 m grid in South Korea, we used the following input variables: normalized difference vegetation index (NDVI), leaf area index (LAI), and fraction of photosynthetically active radiation (FPAR) from MODIS; standardized precipitation index for three months (SPI3) from GPM; air temperature, land surface temperature, soil temperature, relative humidity, and wind speed from UM LDAPS. The ML models were built using these input data and the in situ ET0 data taken from KMA ( Figure 1). The database was constructed for the period between March and November of 2013-2019, excluding the winter season when evaporation from soil and transpiration from vegetation are insignificant. For the data processing, we employed the MODIS Reprojection Tool (MRT) and the R programming with Geospatial Data Abstraction Library (GDAL). Additionally, R programming with the h2o library was conducted for the optimization of the ML model and the creation of the ET0 data over the 500 m grid.

Satellite Images
The MODIS instrument is a satellite sensor operated by the National Aeronautics and Space Administration (NASA) for Earth environmental monitoring onboard the Terra and Aqua satellites with a sun-synchronous orbit. The satellites pass over a specific location at the same local time every day. Terra crosses the equator from north to south (descending path) at approximately 10:30 AM local time. Aqua crosses the equator from south to north (ascending path) at approximately 1:30 PM local time. We obtained the NDVI, LAI, and FPAR products as the indicators of vegetation vitality, biomass, and gross primary production (GPP), from the Aqua MODIS.
NDVI is a commonly used vegetation index that represents the vitality and greenness of vegetation. We used the MYD13C2 monthly product that has a resolution of 0.05 • , instead of the MYD13A1 16-day product with a 500 m resolution because the 16-day 500 m product has many missing pixels due to clouds, particularly in summer. On the other hand, the monthly 0.05 • product has almost no missing pixels through the spatial and temporal smoothing. In general, vegetation greenness changes gradually in the form of a cosine curve over a year, so we converted the monthly product to daily values using the cubic spline method. The LAI, which is calculated using a crop-specific growth coefficient and maximum GPP, can be used as a proxy of the leaf biomass. FPAR is the portion of photosynthetically active radiation absorbed by green vegetation. We obtained LAI and FPAR from the MCD15A2H 8-day Remote Sens. 2020, 12, 3642 5 of 22 product with a resolution of 500 m, based on a combination of Terra and Aqua products, by choosing the best pixel available from all acquisitions of both sensors within the 8-day period.
where ∆ is the slope of the vapor pressure curve; is the net radiation; is the soil heat flux; is the psychrometric constant; is the near-surface air temperature; is the two-dimensional wind speed; is the saturation vapor pressure; is the actual vapor pressure. The constant 0.408 is for the conversion of the unit from MJ/m 2 /day to mm/day, and the constant 273.15 is for the temperature conversion from Kelvin to Celsius. The constants 900 and 0.34 are the empirical coefficients for the daily ET0 proposed by FAO [1].

Satellite Images
The MODIS instrument is a satellite sensor operated by the National Aeronautics and Space Administration (NASA) for Earth environmental monitoring onboard the Terra and Aqua satellites with a sun-synchronous orbit. The satellites pass over a specific location at the same local time every To extract croplands in South Korea, we used the MODIS land-cover type product (MCD12Q1), which was created by the supervised classification using the reflectance of multiple spectral bands, with an annual update over the 500 m grid [36]. According to the classification criteria released by the International Geosphere-Biosphere Programme (IGBP), we extracted the pixels associated with croplands, including type 12 (croplands) and type 14 (cropland/natural vegetation mosaic).
The Global Precipitation Measurement (GPM) mission comprises an international satellite network that provides the next-generation global observations of rain and snow to elucidate Earth's water and energy cycles [37]. We used the TRMM_3B42_Daily version 7 product, which provides better rainfall estimates than previous versions, with significantly lower bias over complex topography [38]. Precipitation is the supply of water to the land surface, which can be infiltrated and evaporated by soil or transpired by vegetation. Because cumulative precipitation is generally more closely associated with ET 0 than daily precipitation, we used SPI3, which is calculated from the z-score of the empirical cumulative distribution function (ECDF) of 3-month accumulated precipitation.
where x i is the ECDF of i-th day; µ is the mean of the entire ECDF; σ is the standard deviation of the ECDF.

Numerical Weather Prediction Data
The LDAPS is the numerical weather prediction model operated by the KMA that is a locally optimized version of the UM from the MetOffice of the United Kingdom (UK). The UM LDAPS provides a 3-hourly weather forecast over a 1.5 km grid using three-dimensional variational data assimilation (3DVAR) with optimized parameterizations for the physical processes such as radiation, boundary layer, convection, and the land-atmosphere interactions. The land surface variables are produced by the UK MetOffice Surface Exchange Scheme (MOSES), which computes the momentum, thermal energy exchange, water flux, and CO2 flux, based on the land-atmosphere interaction scheme in the land surface model (LSM). We extracted the variables air temperature, land surface temperature, soil temperature at 10 cm depth, relative humidity, and wind speed at noon local time every day from 2013 to 2019.

Spatial and Temporal Matchup
The 9 variables collected for the retrieval of ET 0 have different spatial and temporal resolutions, as summarized in Table 1. For use as input features for the ML-based ET 0 model, the variables should be spatially and temporally co-located to construct a matchup database. The MODIS LAI and FPAR data use a sinusoidal projection; the GPM precipitation data use a geographic projection; the LDAPS meteorological variables use a Lambert Conformal Conic (LCC) projection. In addition, the datasets have various spatial resolutions. We converted all datasets into the geographic projections based on latitude and longitude with a 500 m (15 arc-second) grid to construct a matchup database for the various data sources. The transformation of the coordinate reference system (CRS) was performed using the MRT and GDAL. The adjustment of spatial resolution was carried out using the bilinear interpolation that is based on the distance between the center points of each pixel. For temporal resolutions, we adjusted all datasets to an interval of 1 day. MODIS NDVI was converted through cubic spline interpolation, assuming the gradual change over a year. MODIS LAI and FPAR data from the 8-day composite were rearranged to daily values using the nearest neighbor method under the assumption that they do not usually change within 8 days. LDAPS meteorological variables such as air temperature, land surface temperature, soil temperature, relative humidity, and wind speed at noon were treated as representative values for the day.

Random Forest and Its Extensions
RF is an ensemble machine learning model that conducts classification and regression using the bootstrap and bagging methods [39]. The RF generates many decision trees with different input features Remote Sens. 2020, 12, 3642 7 of 22 through random sampling of training data. Resampling is conducted via bootstrapping if necessary, to achieve a suitable distribution of training samples. As an extension of the RF model, the gradient boosting machine (GBM) and the extreme gradient boosting (XGBoost) adopt a boosting method instead of bagging. The boosting considers the prediction performance of each sample during the iteration of training. The XGBoost uses more regularized trees than the GBM for the adjustment of the weighting scheme [40]. We used the three ensemble machine learning methods (RF, GBM, and XGBoost) and compared the test results to select the most suitable model. The number of decision trees (ntree) and the number of variables for splitting the tree branches (mtry) are the critical hyperparameters for the optimization of the three models [41]. We set the ntree parameter to 100 (Figure 3), and the mtry parameter to the number of input features divided by three, based on optimization experiments.

Training and Blind Test
Statistical models are usually optimized through a calibration/validation process for parameter adjustment using a training dataset extracted from the whole database. Data that were not included in the training dataset were used for blind testing of the optimized model. We constructed 16,125 matchups for the ML-based daily ET0 modeling in the croplands of South Korea for the period between March and November of 2013-2019. We first divided the whole matchups into 80% (12,941 records) for training and 20% (3184 records) for the blind test. This sampling was performed with consideration of the monthly distribution of data so that the ML-based ET0 model could learn from the temporally representative dataset. Using the training dataset, we first conducted a 10-fold calibration/validation to optimize the ML-based ET0 model (Figure 4). Then, we conducted a blind test using the remaining data and derived the accuracy statistics, including mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (CC).

Training and Blind Test
Statistical models are usually optimized through a calibration/validation process for parameter adjustment using a training dataset extracted from the whole database. Data that were not included in the training dataset were used for blind testing of the optimized model. We constructed 16,125 matchups for the ML-based daily ET 0 modeling in the croplands of South Korea for the period between March and November of 2013-2019. We first divided the whole matchups into 80% (12,941 records) for training and 20% (3184 records) for the blind test. This sampling was performed with consideration of the monthly distribution of data so that the ML-based ET 0 model could learn from the temporally representative dataset. Using the training dataset, we first conducted a 10-fold calibration/validation to optimize the ML-based ET 0 model ( Figure 4). Then, we conducted a blind test using the remaining data and derived the accuracy statistics, including mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (CC). records) for training and 20% (3184 records) for the blind test. This sampling was performed with consideration of the monthly distribution of data so that the ML-based ET0 model could learn from the temporally representative dataset. Using the training dataset, we first conducted a 10-fold calibration/validation to optimize the ML-based ET0 model (Figure 4). Then, we conducted a blind test using the remaining data and derived the accuracy statistics, including mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (CC).

Comparison with Operative Product
A comparison with an operative product was also conducted to more objectively evaluate our ML-based ET 0 retrievals. The MODIS PET product (MOD16A2GF) is a year-end gap-filled 8-day composite with a 500 m resolution. The retrieval algorithm was based on the PM equation and used daily meteorological reanalysis data along with MODIS products such as vegetation, albedo, and land cover [43].

Retrieval of Daily Reference Evapotranspiration
The optimization of the three ML models (RF, GBM, and XGBoost) was conducted through the training process, followed by the blind test for the optimized models. The accuracy statistics of the three models were very similar, although the RF showed slightly better performance (Table 2). We chose the RF model because it is a commonly used machine learning model of which maintenance and rebuilding are more convenient.  Figure 5 shows the accuracy statistics of the blind test of the RF model. The MBE was 0.007 mm/day, including unbiased predictions without over-or under-estimation. The MAE was 0.790 mm/day, the RMSE was 1.038 mm/day, and the CC was 0.870, indicating quite a strong accordance with the KMA in situ data. A scatterplot also shows that the test retrieval for the 3,184 cases was successful, with few deviations from the 1:1 line. Figure 5 shows the accuracy statistics of the blind test of the RF model. The MBE was 0.007 mm/day, including unbiased predictions without over-or under-estimation. The MAE was 0.790 mm/day, the RMSE was 1.038 mm/day, and the CC was 0.870, indicating quite a strong accordance with the KMA in situ data. A scatterplot also shows that the test retrieval for the 3,184 cases was successful, with few deviations from the 1:1 line. The importance of 9 variables was evaluated based on permutation feature importance (PFI) [41] ( Table 3). Relative humidity, land surface temperature, and air temperature accounted for about 70% of the importance of all variables. The transport of water vapor from the soil and vegetation to the atmosphere is mainly controlled by the heat transfer between the land surface and the atmosphere [44]. Relative humidity (PFI = 33.2%) is directly associated with the vapor pressure deficit (VPD) (e s − e a ) in the PM equation, which represents the capacity of the diffusion of water vapor to the atmosphere. Under the conditions of sufficient soil moisture, a high VPD and low relative humidity result in high evapotranspiration (Figure 6a). Although the degree of stomatal opening, which is associated with the transpiration rate, can be decreased under extremely dry conditions of very high VPD, the plant transpiration is generally increased by the high evaporative capacity [45]. The land surface temperature (PFI = 25.1%) and the air temperature (PFI = 11.9%) can influence the relative humidity. Indeed, the land surface temperature had a positive relationship with ET 0 (Figure 6b), as mentioned in the previous studies [46]. The evaporation can increase if high thermal energy is available to convert the liquid water to water vapor under a sufficient supply of water.    ET 0 is not directly controlled by rainfall or vegetation phenology, according to the definition of ET 0 . However, the high-priority variables of the ET 0 model can have direct or indirect relationships with the other variables, such as rainfall and vegetation (NDVI, LAI, and FPAR). Hence, the variables with lower importance should not be considered unnecessary for the AI modeling of ET 0 to cope with the nonlinearity between the variables. Figure 7 shows that the estimation errors were not significantly influenced by wind speed. Further, the errors were decreased when the wind was strong because the aerodynamic resistance, which is generally governed by wind speed, has less fluctuation against the wind speed under strong winds [47].

Spatial and Temporal Characteristics of the Accuracy Statistics
For a more detailed examination of the daily ET0 retrieval, the accuracy statistics were summarized according to location, year, and month. Table 4 shows the accuracy associated with the nine stations in the croplands of South Korea used for in situ measurements. Station 177 was associated with the highest accuracy, with the MAE of 0.703 mm/day, the RMSE of 0.955 mm/day, and the CC of 0.896, whereas station 254 results were the least accurate, with the MAE of 0.878 mm/day, the RMSE of 1.144 mm/day, and the CC of 0.833. The overall accuracy of the daily ET0 was similar among the nine stations because the ET0 is not affected by the local condition of the land surfaces. Figure 8 shows scatterplots for the nine stations, in which data trends are generally in accordance with the 1:1 line.

Spatial and Temporal Characteristics of the Accuracy Statistics
For a more detailed examination of the daily ET 0 retrieval, the accuracy statistics were summarized according to location, year, and month. Table 4 shows the accuracy associated with the nine stations in the croplands of South Korea used for in situ measurements. Station 177 was associated with the highest accuracy, with the MAE of 0.703 mm/day, the RMSE of 0.955 mm/day, and the CC of 0.896, whereas station 254 results were the least accurate, with the MAE of 0.878 mm/day, the RMSE of 1.144 mm/day, and the CC of 0.833. The overall accuracy of the daily ET 0 was similar among the nine stations because the ET 0 is not affected by the local condition of the land surfaces. Figure 8 shows scatterplots for the nine stations, in which data trends are generally in accordance with the 1:1 line. The temporal characteristics of the accuracy statistics according to year were also examined. The MAE for each year ranged between 0.745 and 0.834 mm/day; RMSE ranged from 0.996 to 1.092 mm/day; the CC ranged between 0.852 and 0.893. No significant differences among years were found, indicating that the accuracy statistics were stable across years (Table 5; Figure 9) because the input features from the satellite images and numerical reanalysis product well reflect the meteorological and land surface conditions according to the time-series of the seven years.  The temporal characteristics of the accuracy statistics according to year were also examined. The MAE for each year ranged between 0.745 and 0.834 mm/day; RMSE ranged from 0.996 to 1.092 mm/day; the CC ranged between 0.852 and 0.893. No significant differences among years were found, indicating that the accuracy statistics were stable across years (Table 5; Figure 9) because the input features from the satellite images and numerical reanalysis product well reflect the meteorological and land surface conditions according to the time-series of the seven years.  We investigated the seasonal characteristics of the accuracy statistics according to the month between March and November. The MAE and the RMSE were somewhat lower in colder months (September to March) than in warmer months (April to August). Because soil evaporation and vegetation transpiration levels are low in the cold season, the amount of ET is relatively small (Table 6), and accordingly, the errors are also small (Table 7). However, this result does not mean that the accuracy was better in the cold season than in the warm season. We also calculated the normalized root mean square error (NRMSE), which is the RMSE divided by the mean of the observation values to account for seasonal variations in the value ranges. Table 7 shows the accuracy statistics by month, including the NRMSE. In contrast to the RMSE, the NRMSE values were similar across all months, indicating that the ET 0 retrievals were stable irrespective of the month (Figure 10). The accuracy was lowest in August, presumably because the quality of satellite data is lowered in the cloudier season.    A comparison with the operative MODIS PET product (MOD16A2GF) was conducted for a more objective assessment of our ET 0 retrievals because the PET can be considered the same physical quantity as ET 0 . Since MOD16A2GF comprises 8-day composite data, we aggregated the daily ET 0 and in situ data into 8-day bins for these comparisons. Table 8 shows the accuracy statistics of the ML model and MODIS product against the in situ data collected during 2013-2019. Both 8-day composite datasets led to good model performance due to the effects of temporal smoothing. However, the results of the ML model were much better than those from the MODIS product for croplands in South Korea ( Figure 11). The RMSE of our retrievals was 73% smaller than that of MODIS PET ((1.142−0.304)/1.142), and the ML model produced an almost perfect CC of 0.982. This difference in performance is because our ML model was locally optimized for South Korea, whereas the MODIS PET algorithm was not specifically designed for South Korea. This finding also indicates that the arrangement of the input features for the ML model was spatially and temporally appropriate. The 8-day LAI/FPAR and the daily NDVI created through time-series interpolation appeared suitable for the estimation of daily ET 0 , as described previously [48,49]. A comparison with the operative MODIS PET product (MOD16A2GF) was conducted for a more objective assessment of our ET0 retrievals because the PET can be considered the same physical quantity as ET0. Since MOD16A2GF comprises 8-day composite data, we aggregated the daily ET0 and in situ data into 8-day bins for these comparisons. Table 8 shows the accuracy statistics of the ML model and MODIS product against the in situ data collected during 2013-2019. Both 8-day composite datasets led to good model performance due to the effects of temporal smoothing. However, the results of the ML model were much better than those from the MODIS product for croplands in South Korea ( Figure 11). The RMSE of our retrievals was 73% smaller than that of MODIS PET ((1.142−0.304)/1.142), and the ML model produced an almost perfect CC of 0.982. This difference in performance is because our ML model was locally optimized for South Korea, whereas the MODIS PET algorithm was not specifically designed for South Korea. This finding also indicates that the arrangement of the input features for the ML model was spatially and temporally appropriate. The 8-day LAI/FPAR and the daily NDVI created through time-series interpolation appeared suitable for the estimation of daily ET0, as described previously [48,49].

Agrometeorological Characteristics in Recent Years
ET0 is a useful indicator to examine the impacts of the meteorological drought on vegetation conditions. Using the daily ET0 retrieved over the 500 m grid for the period between March and November of 2013-2019, a case study was analyzed to clarify the hydrological and meteorological characteristics of croplands in South Korea. We focused on the July and August of 2016-2018, during which heatwave and drought events occurred. From 12 July to 12 August, the land surface temperatures in 2016 and 2017 were typical (midday average, 32.4 °C and 32.2 °C, respectively). By contrast, the land surface temperatures in 2018 were remarkably high (midday average, 36.6 °C) ( Figure 12). During that period, the average rainfall was 3.6 and 7.5 mm/day in 2016 and 2017, respectively, but only 0.8 mm/day in 2018. Because of the high temperatures and low rainfall, soil moisture decreased in 2018 (average, 29.9%) compared to 2016 and 2017 (average, 35.3% and 36.9%, respectively). The ET0 values were higher in 2018 (average, 6.0 mm/day) than in 2016 and 2017 (average, 4.7 mm/day and 5.0 mm/day, respectively). The combination of high land surface temperature and ET0 with low rainfall and soil moisture resulted in a reduced NDVI in 2018 (average, 0.72) than in 2016 and 2017 (average, 0.75 and 0.76, respectively). Figure 13 illustrates the NDVI from August 21 to 28 in 2017 and 2018, showing NDVI values with a different pattern. Insufficient rainfall leads to a deficit in the water supply to the land surface. When the higher ET0 is combined with severe meteorologically dry conditions due to a heatwave, it can cause excessive water loss from the land surface that exceeds the water supply from rainfall and irrigation. Therefore, considering the welldeveloped irrigation systems used in croplands in South Korea, a small degradation in the greenness of croplands can be regarded as indicating a critical deficit of water supply and excessive water consumption in the croplands.

Agrometeorological Characteristics in Recent Years
ET 0 is a useful indicator to examine the impacts of the meteorological drought on vegetation conditions. Using the daily ET 0 retrieved over the 500 m grid for the period between March and November of 2013-2019, a case study was analyzed to clarify the hydrological and meteorological characteristics of croplands in South Korea. We focused on the July and August of 2016-2018, during which heatwave and drought events occurred. From 12 July to 12 August, the land surface temperatures in 2016 and 2017 were typical (midday average, 32.4 • C and 32.2 • C, respectively). By contrast, the land surface temperatures in 2018 were remarkably high (midday average, 36.6 • C) ( Figure 12). During that period, the average rainfall was 3.6 and 7.5 mm/day in 2016 and 2017, respectively, but only 0.8 mm/day in 2018. Because of the high temperatures and low rainfall, soil moisture decreased in 2018 (average, 29.9%) compared to 2016 and 2017 (average, 35.3% and 36.9%, respectively). The ET 0 values were higher in 2018 (average, 6.0 mm/day) than in 2016 and 2017 (average, 4.7 mm/day and 5.0 mm/day, respectively). The combination of high land surface temperature and ET 0 with low rainfall and soil moisture resulted in a reduced NDVI in 2018 (average, 0.72) than in 2016 and 2017 (average, 0.75 and 0.76, respectively). Figure 13 illustrates the NDVI from August 21 to 28 in 2017 and 2018, showing NDVI values with a different pattern. Insufficient rainfall leads to a deficit in the water supply to the land surface. When the higher ET 0 is combined with severe meteorologically dry conditions due to a heatwave, it can cause excessive water loss from the land surface that exceeds the water supply from rainfall and irrigation. Therefore, considering the well-developed irrigation systems used in croplands in South Korea, a small degradation in the greenness of croplands can be regarded as indicating a critical deficit of water supply and excessive water consumption in the croplands. Remote Sens. 2020, 12, x FOR PEER REVIEW 16 of 21   The NDVI in 2018 was lower than in 2017 due to a heatwave, low rainfall, decreased soil moisture and increased ET 0 . The red dashed line denotes the major agricultural areas in South Korea.

Conclusions
In this paper, we presented an ML-based retrieval of daily ET 0 over a 500 m grid for croplands in South Korea using satellite images and NWP data. In a blind test for 2013-2019, the ML-based ET 0 model produced highly accurate results, with an RMSE of 1.038 mm/day and the CC of 0.870. The results of the blind test were stable irrespective of location, year, and month. This outcome is presumably because the input data of the ML-based ET 0 model were suitably arranged spatially and temporally, and the optimization of the model was appropriate. In addition, the performance comparison with the operative MODIS PET product was conducted using 8-day composite data. Our ML model outperformed the MODIS product for croplands in South Korea because the ML model was locally optimized for South Korea, whereas MODIS PET was not. Our ET 0 retrieval is expected to be an official product of the National Meteorological Satellite Center (NMSC) of South Korea since 2021 for the applications of agriculture, hydrology, and meteorology. Using the daily ET 0 data produced over the 500 m grid, we conducted a case study to examine agrometeorological characteristics in July and August of 2016-2018. We found that low rainfall and high ET 0 can damage the surface water balance when combined with a heatwave, resulting in reduced vegetation greenness. Through the experiments, the feasibility of the ML-based ET 0 retrieval was validated, especially for local agrometeorological applications in regions with heterogeneous land surfaces, such as South Korea. However, a more detailed examination is required to test if the 500 m grids created from various spatial resolutions are statistically stable. The AI approaches using satellite images with NWP data can be a viable option for studying interactions between the land surface and atmosphere.