1. Introduction
Evapotranspiration (ET) is the process by which surface water is transferred to the atmosphere, including evaporation from water bodies, soils, and vegetation surfaces, and transpiration by plants [1]. ET is a major component of the global water cycle and a key link between terrestrial water, carbon, and surface energy exchange, with strong links to agriculture, hydrology, meteorology, and ecology [2]. Accurate estimates of ET are important for understanding global climate change, ecological issues, the water cycle, and hydrological processes, as well as for improving water use efficiency, crop yields, and water productivity [3,4,5].
Conventionally, ET is estimated from meteorological data collected at meteorological stations and ground flux monitoring stations in different areas [6]. However, this approach yields only a point estimate for a single station, which is not suitable for large-scale, multi-temporal ET estimation, and installing evapotranspiration measurement systems over a large area requires considerable manpower and material resources [7]. With its rapid development, remote sensing technology has overcome the limited spatial coverage and temporal sampling of single-site observations, owing to its short detection period, timeliness, and ability to observe large areas simultaneously [8,9]. Since the 1980s, the extensive use of polar-orbiting meteorological satellites has provided an efficient and cost-effective means of monitoring ET on a large scale [10]. Many scholars have since studied the simulation and inversion of large-scale ET using the Landsat and MODIS satellite series [11,12].
Remote sensing inversion methods for ET can be classified, according to their underlying mechanism, into process-driven physical inversion methods and data-driven inversion methods [13]. Physical inversion methods include energy balance residual methods and methods based on the Penman–Monteith or Priestley–Taylor formulas. Wang et al. [14] studied ET for different land use types in the Sanmenxia reservoir watershed in China using meteorological data and Landsat 8 OLI_TIRS remote sensing images. Gao et al. [15] studied ET for different land use types based on MODIS and Sentinel-2 data, using the SEBAL model and the Penman–Monteith equation to estimate the ET of field crops. Yang et al. [16] successfully simulated the spatial distribution of ET in the Haihe Plain in northern China by establishing an irrigation application (IA) estimation model and an EI calculation model based on remote sensing data. Liu et al. [17] used data from meteorological stations in Sichuan Province, China, and MODIS data products to validate MODIS-predicted ET against estimated actual ET. The process-driven approach draws on the theory of plant photosynthesis, canopy respiration, and soil evapotranspiration in the ET production mechanism, using simplified ecosystem processes and components to form a model structure for simulating ecosystem carbon, water, and energy exchange [18,19]. However, the heterogeneity of regional surfaces, the complexity of parameter selection, and the cumulative error of different models make process-driven models more complex to apply at large scales and over multiple time phases. The accuracy of their results is also limited by the quality of the input data, making it difficult to obtain the desired regional estimation accuracy.
Data-driven inversion methods include empirical regression methods, machine learning methods, and data fusion methods. Liu et al. [20] used global eddy covariance flux site data to compare the applicability of different machine learning models for the remote sensing estimation of farmland ET, with good results. Han et al. [21] used random forest (RF) to study the relationship between drought and precipitation, temperature, vegetation, and evapotranspiration and, by introducing a joint drought monitoring index (CDMI), explored the spatial distribution of drought in Shaanxi Province. Data-driven inversion methods obtain ET estimates by establishing relationships between actual ET measurements and their associated characteristic parameters. For example, empirical regression and machine learning methods estimate actual ET by directly constructing empirical relationships between remote sensing data, meteorological data, and ET [22,23]. Related studies have shown that complex physical and analytical methods are not necessarily more accurate than simple empirical and statistical methods: it is not necessary to model the physical mechanisms of ET or to obtain all the influencing factors, only to construct relationships among the available remote sensing and meteorological data in order to obtain highly accurate estimates [1]. ET has also been shown to be influenced by rainfall, soil temperature, and other factors [2,17,24]. Therefore, it is feasible to fuse selected soil and meteorological indicators for ET inversion using empirical regression methods or machine learning models.
Satellite remote sensing products currently in use include Landsat, Sentinel-2, MODIS, and AVHRR. The spatial resolutions of AVHRR, MODIS, and Sentinel-2 are 5 km, 500–1000 m, and 10 m, respectively. Reyadh Albarakat [25] and Lida Andalibi et al. [26] found that, although the inversion results of the three sensors were close in the same area, the lower spatial resolution and wider sensor bandwidth of AVHRR and MODIS resulted in more significant water vapor absorption and noisier data. Although Sentinel-2 has a spatial resolution of 10 m, its large data volume and limited image storage services make multi-year time series monitoring difficult. In addition, the 10 m resolution entails a heavy computational load and high computing power, which hinders wider application. In contrast, Landsat is more mature, easier to apply, and widely representative, which makes it suitable for broad application.
Vegetation cover is an important factor influencing surface evapotranspiration. With the extensive use of remotely sensed vegetation indices (VIs), changes in vegetation cover can be monitored on a large scale [27]. The normalized difference vegetation index (NDVI), which compares reflectance in the visible red and near-infrared bands, is the most commonly used VI for quantifying the presence of live green vegetation. By relying on a ratio of band intensities, NDVI removes a large proportion of the noise caused by cloud shadows, topographic and solar angle variations, and atmospheric attenuation in the visible red and near-infrared bands, which makes it less susceptible to illumination conditions [27,28,29].
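As a minimal sketch of the band-ratio idea described above, the following Python snippet computes NDVI as (NIR − Red) / (NIR + Red) and checks that a uniform change in illumination leaves the index unchanged. The function name `ndvi` and the reflectance values are illustrative assumptions and are not taken from the study's data or processing chain.

```python
import numpy as np

def ndvi(red, nir):
    """Return (NIR - Red) / (NIR + Red); pixels with a zero denominator become NaN."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels keep NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances for three pixels (the last one has no signal).
red = np.array([0.10, 0.05, 0.0])
nir = np.array([0.50, 0.45, 0.0])

full_sun = ndvi(red, nir)
# Halving both bands mimics a uniform illumination change (e.g., shading).
shaded = ndvi(0.5 * red, 0.5 * nir)
```

Because both bands are scaled by the same factor, the factor cancels in the ratio, which is the property the paragraph above relies on. Effects that alter the two bands differently are not removed in this way.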
Current remote sensing inversion of ET mainly focuses on evaluating the accuracy or mechanisms of different models, and fewer studies analyze ET changes and their driving factors in the same area across different years (multi-temporal phases). Therefore, this paper took the lower Yangtze River urban agglomeration (Nanjing, Changzhou, and Zhenjiang, hereafter referred to as the three cities) as the study area and used the Landsat 8 OLI_TIRS series to acquire multispectral remote sensing images of the study area for the same period from 2016 to 2021. In addition, the Food and Agriculture Organization of the United Nations (FAO) Harmonized World Soil Database v 1.2 platform was used to obtain ET, soil temperature and humidity, solar short-wave radiation, and heat flux data of the study area for the same period. After dimensionality reduction of the spectral data, a multiple linear regression (MLR) model and an extreme learning machine (ELM) inversion model were established, and the spatial distribution of ET and the characteristic factors driving it in different time periods (2016–2021 and 2017) were analyzed on the basis of the model inversion. This study provides a theoretical basis for modeling the spatial distribution of ET over different time periods.
4. Discussion
In this paper, using Landsat 8 remote sensing images of the lower Yangtze River urban agglomeration from 2016 to 2021, we introduced five ecological and environmental factors involved in the Penman–Monteith equation (soil heat flux, soil temperature, soil humidity, net solar radiation, and vegetation cover) and established multi-temporal ET prediction models based on ELM and MLR, refining the inversion of both models. The results were good, with an R² above 0.59. In addition, the spatial and temporal variation of ET in the study area was analyzed through remote sensing visualization, and the driving effects of soil heat flux and the other factors were analyzed and compared. The correlation coefficients showed that net solar radiation, soil heat flux, soil temperature, and vegetation cover all significantly influenced ET variation and greatly enhanced the accuracy of the prediction models. This finding indicates that combining spectral data with ecological data can support the prediction and accuracy optimization of regional remote sensing ET.
4.1. Data Optimization Studies
For predictive models, correlations between predictors can have unpredictable and inconsistent effects on parameter estimates and their significance, and may lead to biased results [53]. In most remote sensing inversion studies, researchers tend to ignore this issue or simply eliminate strongly collinear variables, which may remove variables that contribute substantially to the prediction and thereby reduce model accuracy. In this study, cluster analysis was used to group variables carrying the same information, and factor analysis was then performed on the groups, which effectively addresses multicollinearity between variables and improves model prediction accuracy and stability. The results showed that the variance inflation factor (VIF) of the variables treated using the cluster factor analysis dropped below 3, while the models still maintained a high R².
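The collinearity check discussed above can be sketched as follows. The snippet computes the variance inflation factor of each predictor as 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The synthetic variables `a`, `b`, and `c` are assumptions made for illustration. Averaging the two collinear columns is only a simple stand-in for the cluster factor analysis used in the study, not a reproduction of it.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (assumption): b nearly duplicates a, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)
c = rng.normal(size=200)

vif_raw = vif(np.column_stack([a, b, c]))

# Stand-in for grouping: replace the collinear pair with one combined variable.
combined = (a + b) / 2.0
vif_grouped = vif(np.column_stack([combined, c]))
```

In this toy setting, the two near-duplicate columns have very large VIF values. After they are merged into one variable, the remaining predictors have VIF values close to 1, which mirrors the drop below 3 reported above while keeping the information from both original variables.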
4.2. Model Accuracy Discussion
Many studies have shown that machine learning models can offer better predictive power in estimating ET [20]. Compared to the traditional Penman–Monteith (PM) equation, the introduction of spectral, meteorological, and ecological data in this study yielded better predictions (R² = 0.59–0.87, RMSE < 0.59) [54]. In addition, when the machine learning model and the linear regression model in this study were compared directly, the difference in inversion accuracy between the two was not significant, and in some years the MLR prediction accuracy was even higher than that of the ELM model. This is because machine learning techniques rely on large amounts of data to achieve high performance; when both models are given the same samples and the sample size is too small for the machine learning model to reach its peak performance, it holds no advantage. In contrast, the traditional linear regression achieved good prediction results based on the theoretical causality of the data [55], and some of the factors introduced on the basis of the PM equation, which themselves have good linear relationships with ET, greatly increased the explanatory power of the multiple linear regression for ET.
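To make the comparison above concrete, the following is a minimal sketch of the two model families on a small synthetic sample, scored with R² and RMSE. It is not the authors' implementation: the hidden-layer size, the tanh activation, the random seeds, and the synthetic near-linear data are all assumptions chosen for illustration. The ELM here follows the common formulation in which the hidden weights are random and fixed and only the output weights are solved by least squares.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_elm(X, y, n_hidden=20, seed=0):
    """Basic ELM: fixed random hidden layer, output weights via pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random biases (not trained)
    hidden = lambda Xn: np.tanh(Xn @ W + b)
    beta = np.linalg.pinv(hidden(X)) @ y         # least-squares output weights
    return lambda Xn: hidden(Xn) @ beta

def r2_rmse(y, y_hat):
    """Coefficient of determination and root-mean-square error."""
    resid = y - y_hat
    centered = y - y.mean()
    ss_res = resid @ resid
    return 1.0 - ss_res / (centered @ centered), np.sqrt(ss_res / len(y))

# Synthetic, near-linear data (assumption): 4 predictors, 120 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + 0.3 * rng.normal(size=120)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

mlr_r2, mlr_rmse = r2_rmse(y_test, fit_mlr(X_train, y_train)(X_test))
elm_r2, elm_rmse = r2_rmse(y_test, fit_elm(X_train, y_train)(X_test))
```

When the underlying relationship is close to linear and the sample is small, a linear model of this kind already captures most of the variance, so a nonlinear learner has little room to improve on it. The exact scores depend on the seed and the hidden-layer size, so this sketch should not be read as reproducing the values reported in the study.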
4.3. Selection of Vegetation Indices
NDVI, a widely used vegetation index, plays an important role in monitoring vegetation. However, environmental factors such as atmospheric conditions, topographic effects, and topographic illumination may affect it significantly [56]. Glenn et al. [57] found that the enhanced vegetation index (EVI), compared to NDVI, improved vegetation monitoring by decoupling the canopy background signal and reducing atmospheric influences. However, the soil adjustment factor in EVI does not eliminate the influence of topographic effects, which limits its use for estimating vegetation cover changes on a large scale. In addition, Moreira et al. [58] found that topographic illumination effects were negligible in areas where the average watershed slope was below 25° and the latitude was below 45°. According to available data, the study area (lower Yangtze River plain) has relatively flat topography, with an average elevation of 20–30 m and an average slope of 2–6°; NDVI is therefore a reasonable choice of parameter for vegetation cover estimation here.
4.4. Spatial- and Temporal-Change-Driven Analysis
Spatially, the distribution of ET in the study area was basically in line with the general trend, with ET mainly concentrated in areas with high vegetation cover, such as farmland and forests on the urban periphery and extensive agricultural land. Temporally, ET showed clear seasonal variation within the year, ranking summer > autumn > spring > winter. According to the preceding analysis of driving factors, ET changes are influenced by net solar radiation intensity, soil heat flux, soil temperature, and soil humidity; because these four driving factors are themselves sensitive to season, they lead to seasonal differences in ET. According to the mean ET values measured at each meteorological station, the maximum annual ET was 912.67 mm in 2018 and the minimum was 845.02 mm in 2021, which is consistent with the results of this paper.