Article

Calibration for Improving the Medium-Range Soil Forecast over Central Tibet: Effects of Objective Metrics’ Diversity

1 Henan Meteorological Bureau, China Meteorological Administration, Zhengzhou 450003, China
2 Key Laboratory of Meteorological Disaster (KLME), Ministry of Education and Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC-FEMD), Nanjing University of Information Science and Technology (NUIST), Nanjing 210044, China
3 Meteorological Observation Centre, China Meteorological Administration, Beijing 100081, China
4 Meteorological Development and Planning Institute, China Meteorological Administration, Beijing 100081, China
5 Tibet Meteorological Observatory, China Meteorological Administration, Lhasa 850000, China
* Authors to whom correspondence should be addressed.
Atmosphere 2024, 15(9), 1107; https://doi.org/10.3390/atmos15091107
Submission received: 14 August 2024 / Revised: 6 September 2024 / Accepted: 8 September 2024 / Published: 11 September 2024
(This article belongs to the Special Issue Climate Change and Regional Sustainability in Arid Lands)

Abstract

The high spatial complexity of soil temperature modeling over semiarid land has challenged the calibration–forecast framework, whose composited objective lacks comprehensive evaluation. Therefore, this study, based on the Noah land surface model (LSM) and its full parameter table, utilizes two global search algorithms and eight objectives with dimensionally varied metrics, combined with dense site observations of soil moisture and temperature over central Tibet, to explore the different metrics’ performances regarding the spatial heterogeneity and uncertainty of regional land surface parameters, the calibration efficiency and effectiveness, and the spatiotemporal complexities of surface forecasting. The results show that the diversity of metrics exerts a greater influence on the calibration–forecast framework than the differences between the global search algorithms. The enhanced multi-objective metric (EMO) and the enhanced Kling–Gupta efficiency (EKGE) have their own advantages and disadvantages in simulations and parameters, respectively. In particular, the EMO composited from the four metrics of correlation coefficient, root mean square error, mean absolute error, and Nash–Sutcliffe efficiency shows relatively balanced performance in surface soil temperature forecasting compared to the other metrics. In addition, the calibration–forecast framework that benefits from the EMO can greatly reduce the spatial complexities of surface soil modeling over semiarid land. In general, these findings enhance the knowledge of the metrics’ advantages in resolving the complexities of the LSM’s parameters and simulations and promote the application of the calibration–forecast framework, thereby potentially improving regional surface forecasting over semiarid regions.

1. Introduction

Soil moisture (SM) and soil temperature (ST) are crucial variables modulating land–atmosphere fluxes [1,2,3,4]. However, due to the complexity of ST modeling over semiarid regions, the ST simulations directly produced by land surface models (LSMs) exhibit spatiotemporal deficiencies, posing challenges for their regional weather and climate applications [5,6]. Research efforts to improve ST simulations have suggested that manually corrected high-sensitivity land parameters can benefit larger-scale ST modeling physics [7,8] and that an auto-calibrated LSM parameter table can benefit the joint SM–ST modeling configuration [9]. Given the great challenges of handling the high nonlinearity of joint SM–ST modeling, e.g., the high-dimensional land parameters and nonlinear physics, composited objectives evaluating the distance between simulations and observations have been proposed to enhance calibration performance (e.g., the Kling–Gupta efficiency) [10], but the relative merits of their internal metrics require more study before robust real-world application [11,12]. Therefore, evaluating the effects of the objective metrics’ diversity on calibration performance in resolving the spatial complexities of surface simulations is of great significance for improving ST modeling and forecasting over semiarid land.
The optimization or identification of LSM parameters has been evolving for decades alongside the regional application of auto-calibration techniques, primarily achieved by utilizing global search algorithms (GSAs), e.g., particle swarm optimization (PSO) [13] and shuffled complex evolution (SCE) [14], to seek the optima of a specific model objective. As a large number of LSM parameters usually decreases a GSA’s efficiency and effectiveness, especially in high-dimensional cases, early research advocated dimensionality reduction through generalized land parameter sensitivity analysis, such as removing parameters insensitive to specific objectives, based on land surface parameter categorization (e.g., soil, vegetation, general, and initial types), to enhance the calibration optima [15,16,17]. Moreover, given the intensifying diversity of land surface model applications (e.g., runoff, fluxes), globally applicable land surface parameter estimation has garnered great attention [18,19,20], but it has been challenged by the largely varied sensitivities of the distinguished LSM parameters (or spatial heterogeneity) over arid and semiarid land (ASAL) [21,22,23].
The Noah LSM, which comprises comprehensive and complex physics coupling the soil, land surface, and atmosphere at finer scales, has been widely employed in numerical weather studies [24] but faces increasingly prominent issues related to the representativeness of its parameters in complex ASAL applications, such as the varying sensitivities of vegetation and general parameters to thermal fluxes [17,25,26]. This poses continuous challenges for refined land surface applications in Tibet, a region with diverse climatic zones; e.g., LSM parameters show diverse advantages in different regions with similar surfaces [7,21,22]. Despite the establishment of a refined SM–ST observation network under a semiarid climate over central east Tibet (i.e., northwest Naqu) [27], which features grassland as the primary land cover and clay as the dominant soil texture, a more comprehensive calibration objective for reducing LSM parameter uncertainty and enhancing surface simulation is still required for robust ST modeling [9,28,29,30].
In fact, with the development of land remote sensing and given the diversity of GSA strategies and applications (e.g., SM, ST, runoff, and fluxes), the objective metric designs of auto-calibration have been greatly developed to enhance LSM modeling performance. For instance, in LSM calibration using multi-source remote sensing data, a multi-objective design that concentrates on the comprehensive inversion characteristics of remotely sensed SM and/or ST observations is essential [31,32,33]. Similarly, in calibration applications aimed at improving the spatial accuracy of surface state predictions, a multi-objective design that considers horizontal variations [34,35,36,37] and/or vertical stratification [38,39,40,41] of states and observations is crucial. Generally, these multi-dimensional metrics can, to a certain extent, address the issues of observational data fusion and multi-state complex error measurement in specific calibration applications, emphasizing the enhanced role of the spatial dimensions of single or multiple land surface state errors as holistic objectives.
Meanwhile, although holistic objectives with differentiated dimensions have enhanced the ability to apply observations in calibration, the inherent scale uncertainties of land surface states (e.g., the distance between simulation and observation may fall into a non-Euclidean space) lead to challenges in assessing and ranking LSM performances in a high-dimensional search space. Therefore, holistic objectives with differentiated metrics have been widely developed to simplify and enhance the calibration [42,43,44]. They can be primarily categorized into flexible (e.g., Pareto front [45,46,47,48], dominated Pareto [49,50,51]) and deterministic approaches. The Pareto front adjusts the cumulative distribution of metrics based on external algorithm storage and aims at general LSM modeling with globally applicable parameters, which can make it an expensive evaluator that is independent of the GSA. Within the GSA’s evaluator, the dominated Pareto compares relationships among metrics, while the deterministic approach combines various metrics and then aims at determining the optimally combined solution of various simulations.
It should be noted that although holistic objectives with differentiated metrics offer a deterministic reference for estimating the diversity of model performances, recent research has determined that the computational methods employed by these metrics often exhibit blind spots even on identical datasets. For instance, integrated metrics such as the Nash–Sutcliffe efficiency and the Kling–Gupta efficiency can be utilized for algorithm comparison, yet for actual model evaluation, direct metrics are still necessary as indicators [52,53,54,55]. In addition, the optimal applicability of direct metrics such as the root mean square error and the mean absolute error in describing data is premised on their distributions conforming to normal and Laplacian distributions, respectively [56,57]. Furthermore, the correlation coefficient is susceptible to the monotonicity and nonlinearity of the two data series [58,59]. Consequently, the performance of these metrics, when combined across different dimensions, often necessitates a comprehensive evaluation of their suitability tailored to the varied spatiotemporal requirements [60,61]. In particular, the holistic SM–ST objective with differentiated metrics has rarely been evaluated during regional LSM applications until now.
Overall, due to the LSM’s high nonlinearity between parameters and simulations, the holistic SM–ST objectives of the calibration–forecast framework lack investigation into their dimensionally varied metrics’ abilities to reduce the spatial complexities of regional ST modeling over ASAL. Therefore, to fill this gap, based on the previously established calibration–forecast framework [9], this study utilizes various SM–ST objectives with differentiated metrics, combined with dense regional soil site observations, to explore the effects of the objective metrics’ diversity on the spatial heterogeneity and uncertainty of regional land surface parameters, on calibration efficiency and effectiveness, and on the temporal and spatial complexities of surface forecasting. This is intended to provide insights into regional ST modeling configuration, aiming to improve medium-range ST forecasting over semiarid regions.

2. Methods

2.1. Calibration Schemes

The utilization of GSA in this study, notably SCE [14] and PSO [62], for automatic calibration of LSMs addresses the challenge of inefficient convergence in high-dimensional parameter spaces [63,64,65]. SCE, with its conservative evolutionary strategy, excels in simplifying model complexities and handling incommensurable information, favoring flexible objectives [15,16,17]. PSO, adopting a radical evolutionary approach, enhances efficiency and effectiveness within constrained parameter spaces, making it suitable for deterministic objectives [66,67]. Under the calibration–forecast framework, both methods offer complementary strengths tailored to the objective that is included in an LSM evaluator [9,68,69].

2.1.1. Evolution Algorithms

As seen in Figure 1, swarm and particle social behaviors are incorporated into the core PSO algorithm process. The PSO algorithm first randomly selects the scaled (or normalized) parameters ($x$) to generate the initial swarm, including each particle’s position and the speed of position change, i.e., $x_i^0, v_i^0, i \in (1, \dots, n_p)$, and then obtains the initial local and global optimal positions ($Pb^0$ and $gb^0$) through evaluating and sorting (steps 1–4). The following procedures are then repeated until the stop criteria are met. (1) Each particle’s speed and position are updated using $v_i^{t+1} = [w v_i^t + c_1 r_1 (pb_i^t - x_i^t) + c_2 r_2 (gb^t - x_i^t)] \times n v_m \times [(r_3 - 1) v_r + 1] + c_3 v_r$ and $x_i^{t+1} = x_i^t + v_i^{t+1}$, respectively (steps 8–9), where $v_m$ and $v_r$ equal 0.5 and 0.15, respectively; $r_1$ and $r_2$ equal 0.5 and 0.15, respectively; $w$ equals 0.9; and $c_1$, $c_2$, and $c_3$ equal 2.0, 2.0, and $10^{-7}$ [9,62], respectively. (2) The particles’ current and previous evaluation values are compared to obtain the current local optima ($Pb^t$) (steps 10–12). (3) The local optima are then sorted to find the current swarm’s optimum ($gb^t$) (step 14). Note that except for $n_p$, all the other parameters that can affect the generality of PSO are only vaguely related to the dimensions of the LSM parameter space.
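A minimal sketch of one such PSO generation may make the update concrete (a sketch assuming minimization; the `evaluate` callback, the clipping to the scaled space, and the random draw of $r_3$ are our assumptions, since the text does not specify them):

```python
import numpy as np

def pso_step(x, v, pb, pb_val, gb, evaluate, n,
             w=0.9, c1=2.0, c2=2.0, c3=1e-7,
             r1=0.5, r2=0.15, vm=0.5, vr=0.15):
    """One PSO generation over scaled (0-1) parameters (minimization).

    x, v     : (n_p, n) particle positions and speeds
    pb       : (n_p, n) local optima; pb_val holds their objective values
    gb       : (n,) current swarm optimum
    evaluate : objective mapping an (n,)-vector to a scalar
    """
    r3 = np.random.rand(x.shape[0], 1)   # assumed random; not fixed in the text
    # Speed update (step 8) with the damping/perturbation terms quoted above
    v = (w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)) \
        * n * vm * ((r3 - 1.0) * vr + 1.0) + c3 * vr
    x = np.clip(x + v, 0.0, 1.0)         # position update (step 9), kept in scaled space
    vals = np.array([evaluate(p) for p in x])
    better = vals < pb_val               # steps 10-12: refresh the local optima Pb
    pb[better], pb_val[better] = x[better], vals[better]
    gb = pb[np.argmin(pb_val)]           # step 14: refresh the swarm optimum gb
    return x, v, pb, pb_val, gb
```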
Moreover, community and individual social behaviors are incorporated into the core SCE algorithm process [14,65]. The SCE algorithm first randomly selects the scaled (or normalized) parameters ($x$) to generate the initial population and then obtains the initial local and global optima ($D^0$ and $gb^0$) through evaluating and sorting with marked orders (steps 1–2). The following procedures are then repeated until the stop criteria are met. (1) The individuals are evaluated and marked to reorganize the original population into $n_c$ communities, each of which has $m$ points (step 4). (2) Complex competitive evolution (CCE) is conducted for each point to identify the triangular probability distribution [20], known as $P_k$ (step 7); the previous generation is determined; and the new individuals of each community are mixed through reflection and mutation to form a new population (steps 6–17). (3) The individuals are then reordered to form $n_c$ new communities (step 19) and obtain the current local and global optima ($D^l$ and $gb^t$) (step 21). Note that $n_c$ and $m$ equal 2 and $2n + 1$, respectively, while the outer cycling number ($n_e$) and the internal and external iteration numbers of CCE ($\alpha$ and $\beta$) equal $n + 1$, $n_c$, and $m$, respectively. Therefore, except for $\alpha$ and $\beta$, all the other five parameters that can affect the generality of SCE are only related to the dimensions of the LSM parameter space.
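For comparison, here is a loose sketch of the shuffled-complex loop condensed from the description above (the reflection and mutation rules follow the standard SCE-UA recipe; the full CCE bookkeeping of steps 6–17, such as re-sorting within each complex, is simplified):

```python
import numpy as np

def sce_generation(pop, vals, evaluate, n_c=2, alpha=2, beta=None):
    """One shuffled-complex generation (simplified SCE-UA, minimization).

    pop  : (n_c*m, n) scaled parameter sets; vals holds their objective values
    """
    n = pop.shape[1]
    m = pop.shape[0] // n_c
    beta = beta or m
    order = np.argsort(vals)                        # rank the whole population
    pop, vals = pop[order], vals[order]
    for k in range(n_c):                            # deal points into communities
        idx = np.arange(k, n_c * m, n_c)
        cplx, cvals = pop[idx].copy(), vals[idx].copy()
        # triangular probability P_k favoring better-ranked points
        p = 2.0 * (m - np.arange(m)) / (m * (m + 1))
        for _ in range(beta):                       # external CCE iterations
            sub = np.random.choice(m, size=min(n + 1, m), replace=False, p=p)
            worst = sub[np.argmax(cvals[sub])]
            centroid = cplx[sub[sub != worst]].mean(axis=0)
            for _ in range(alpha):                  # internal CCE iterations
                trial = 2.0 * centroid - cplx[worst]      # reflection
                if np.any(trial < 0.0) or np.any(trial > 1.0):
                    trial = np.random.rand(n)             # mutation if out of bounds
                f = evaluate(trial)
                if f >= cvals[worst]:
                    trial = np.random.rand(n)             # mutation if not improved
                    f = evaluate(trial)
                cplx[worst], cvals[worst] = trial, f
        pop[idx], vals[idx] = cplx, cvals
    order = np.argsort(vals)                        # shuffle: re-rank merged population
    return pop[order], vals[order]
```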
Note that $x$ is scaled with a threshold normalization equation, i.e., $(x_{init} - x_{min}) / (x_{max} - x_{min})$ [9,25,26,66,67], for both PSO and SCE. The total individual number ($n_c \times m$) of SCE and the total particle number ($n_p$) of PSO both equal $2(2n + 1)$. Additionally, for both SCE and PSO, the GSA stops when the evaluation counter ($i_e$) exceeds $10^5$ Noah runs [9]. The equitable population sizes and stop criteria ensure a relatively equitable investigation and are intended to reduce the chance that the objective metrics’ impact is confounded by the GSA itself.

2.1.2. Optional Evaluator

The evaluator of the above-mentioned GSA algorithms used in our study is shown in Figure 2, which includes a fixed physical constraint and an optional objective function. The physical constraint formula ($f_c$) applies to the soil moisture of the first two surface soil layers ($SMC1$ and $SMC2$; $x(30)$ and $x(31)$ in step 2), which may only vary between the wilting point ($WLTSMC$, $x(20)$) and the soil moisture at which transpiration stress begins ($REFSMC$, $x(16)$) [9,16,31].
Under the condition $r < 0$, when the constraints are in effect, the unscaled parameters drive the LSM to run one simulation, which is evaluated with the corresponding objective value selected by the objective type ($z$), a constant integer in [1, 8]. Note that once $z$ is determined, the corresponding predefined objective metric or function ($f_o$) that measures the distance between simulation and observation is also fixed from the very beginning.
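As a sketch, the evaluator logic might look as follows (the index positions follow the 1-based parameter table quoted above; `run_noah`, `objective`, and `bounds` are placeholders of ours):

```python
import numpy as np

def evaluate_candidate(x_scaled, run_noah, objective, bounds):
    """Evaluator sketch: fixed physical constraint f_c, then one Noah run + f_o.

    x_scaled : scaled parameter vector; 1-based table positions as in the text
               (x(16)=REFSMC, x(20)=WLTSMC, x(30)=SMC1, x(31)=SMC2).
    """
    refsmc, wltsmc = x_scaled[15], x_scaled[19]
    smc1, smc2 = x_scaled[29], x_scaled[30]
    # f_c: the first two layers' moisture must lie between WLTSMC and REFSMC
    if not (wltsmc < smc1 < refsmc and wltsmc < smc2 < refsmc):
        return np.inf                       # reject candidates violating f_c
    lo, hi = bounds                         # unscale before driving the LSM
    params = lo + x_scaled * (hi - lo)
    sim = run_noah(params)                  # one Noah run per evaluation
    return objective(sim)                   # f_o preselected by objective type z
```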

2.2. Composited Metrics

For calibration schemes based on a GSA, the parameter–simulation problem of the LSM is addressed by searching for the optimal parameters and/or simulations that minimize or maximize the objective function ($f_o$), which here has been extended into multiple dimensions, i.e., layers ($n_l$) and variables ($n_e$), to meet a holistic SM–ST objective. Nevertheless, the objective indicating the SIM–OBS distance can vary considerably and largely affects the LSM’s complex evaluation [50,51,52,53,54,55,56,57,58,59]. Therefore, eight metric-differentiated objectives (Table 1), including the widely used direct measurements (i.e., the correlation and various errors), the composited deterministic measurements (i.e., the various enhanced efficiencies), and the composited flexible measurements (i.e., the Pareto nondominated ones), are employed to fulfill a comprehensive investigation of the objective metrics in the present study.
This study mainly examines fixed metrics composed of fundamental measures such as the linear correlation coefficient ($cc$) [58], Kling–Gupta efficiency ($kge$) [10], absolute error ($ae$) [57], Nash–Sutcliffe efficiency ($nse$) [55], and root mean square error ($rmse$) [56] across different dimensions. Specifically, the correlation coefficients (CCS), enhanced Kling–Gupta efficiencies (EKGE), mean absolute errors (MAES), Nash–Sutcliffe efficiencies (NSES), and root mean square errors (RMSES) represent the averages of $cc$, $kge$, $ae$, $nse$, and $rmse$ across both the variable and layer dimensions. Additionally, the enhanced multiple objectives (EMO) integrate the average values of the measure that combines $cc$, $ae$, $nse$, and $rmse$ over the variable and layer dimensions.
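A sketch of these measures and their dimensional averaging is given below, under the standard definitions of $cc$, $rmse$, $mae$, $nse$, and $kge$; the exact EMO composition is defined in Table 1 (not reproduced here), so the combination shown is only one plausible reading:

```python
import numpy as np

def cc(sim, obs):
    return float(np.corrcoef(sim, obs)[0, 1])

def rmse(sim, obs):
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(sim, obs):
    return float(np.mean(np.abs(sim - obs)))

def nse(sim, obs):
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    r, a, b = cc(sim, obs), sim.std() / obs.std(), sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (a - 1) ** 2 + (b - 1) ** 2)

def dim_mean(metric, sims, obss):
    """Average a metric over the variable (n_e = 2) and layer (n_l = 4)
    dimensions; sims and obss are arrays shaped (n_e, n_l, n_t)."""
    return float(np.mean([metric(s, o)
                          for sv, ov in zip(sims, obss)
                          for s, o in zip(sv, ov)]))

def ekge(sims, obss):                    # EKGE: kge averaged over both dimensions
    return dim_mean(kge, sims, obss)

def emo(sims, obss):                     # EMO: one plausible cc/ae/nse/rmse mix
    parts = [rmse, mae, lambda s, o: 1 - cc(s, o), lambda s, o: 1 - nse(s, o)]
    return float(np.mean([dim_mean(m, sims, obss) for m in parts]))
```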
Furthermore, this study composites various metrics across different dimensions within a non-dominant Pareto space [42,43,50,51] to construct the Pareto-dominant objective based on a basic land physical law: since the surface variations of the topsoil layer often determine the sublayers’ variations through infiltration [24], the dimensional objective of the upper layer is assumed to be the solution. The top layer’s objective that is larger (or smaller) than the sublayers’ is taken as the current optimal maximum (or minimum) solution. Consequently, the Pareto-dominant KGE (PKGE) and the Pareto-dominant multi-objective (PMO) denote the dominated top-layer values of EKGE and EMO, respectively.
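Read this way, the dominance test reduces to a few lines (a sketch of the PKGE/PMO logic as we understand it; `None` marks the nonstrict, no-solution case discussed in Section 4.2.2):

```python
def pareto_dominant(layer_vals, maximize=True):
    """Return the top layer's objective only when it dominates all sublayers;
    otherwise signal 'no solution' for this candidate (PKGE/PMO sketch)."""
    top, subs = layer_vals[0], layer_vals[1:]
    dominates = all(top >= s for s in subs) if maximize else all(top <= s for s in subs)
    return top if dominates else None
```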
All the above-mentioned multi-objective metrics’ variable and layer dimensional numbers are 2 and 4, respectively. CCS varies in [−1, 1]. EKGE and NSES both vary in (−∞, 1], and EMO, MAES, and RMSES all vary in [0, +∞). PKGE varies in (−∞, 1], and PMO varies in [0, +∞). Therefore, the value of a metric determines the performance of the evaluator, and the direction of the metric determines the direction of the search; that is, the search continuously approaches the metric’s optimal value (i.e., the final ideal termination condition) on the way toward the calibrated optimal solution.

2.3. Performance Evaluation

2.3.1. Parameter

In this study, parameter heterogeneity is defined as the variations or sensitivities of land parameters across sites. Due to the immense dimensionality of parameter–site sensitivities, parameter relative sensitivities based on the two predefined limits of the parameter space are adopted; e.g., if more (fewer) sites met (failed to meet) a parameter’s limit compared to others, this indicates the sites’ relative sensitivity to that parameter within the limit’s confidence [9,16]. Since the parameter relative sensitivities (or heterogeneity) are usually large while their homogeneity can be small (and thus easily observed), here, to quantify this and simplify the investigation of the metrics’ diversity, we further define homogeneity (H) as the number of parameters with low site sensitivities. Consequently, a low H (>0) quantitatively indicates high heterogeneity for a given metric. Note that when all or no sites cross the parameter’s limits, H equals 0 or 0~, respectively.
A parameter’s spatial uncertainty is defined by the land parameter’s range and outliers across the sites; e.g., for one parameter’s interquartile range (IQR, >0), smaller parameter ranges and outlier numbers indicate fewer uncertainties and fewer unaccountable factors, respectively [9]. In particular, to simplify the analysis of different metrics’ effects on parameter uncertainty, the whole parameter space’s uncertainty is defined as the interquartile range of the ensemble of IQRs of all parameters (or IQRD) in the parameter space. Consequently, the IQRD’s interquartile range and outliers quantify the parameter uncertainties among metrics.
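A small sketch of these two measures (the 1.5 × IQR whisker rule for outliers is our assumption, matching common box-plot practice):

```python
import numpy as np

def iqr(a):
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

def iqrd(param_table):
    """Whole-space uncertainty: the IQR of the per-parameter IQR ensemble.
    param_table is shaped (n_params, n_sites) with calibrated values."""
    per_param = np.array([iqr(row) for row in param_table])  # IQR across sites
    return iqr(per_param), per_param

def outliers(per_param):
    """Outliers of the IQR ensemble via the usual 1.5*IQR whisker rule."""
    q75, q25 = np.percentile(per_param, [75, 25])
    lo, hi = q25 - 1.5 * (q75 - q25), q75 + 1.5 * (q75 - q25)
    return per_param[(per_param < lo) | (per_param > hi)]
```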
In particular, the parameter number with less parameter uncertainty (PNL, >0) and the outlier number reduction of parameter uncertainties (ONR) of PSO compared to SCE are summarized in this study to quantify whether a metric’s parameter uncertainty is affected by the GSA itself. As the heterogeneity and uncertainty differences among metrics account for a metric-informed method’s performance in resolving parameter spatial complexities during SM–ST calibration, the metric with less parameter uncertainty and heterogeneity better meets the LSM configuration demand of surface forecasting.

2.3.2. Objective

As the population position of a generation, e.g., the best (Pb) or medium (Pm, if no solution is met) locations, known as the fitness against the number of LSM runs (or the convergence speed), indicates a method’s calibration efficiency, better fitness values (e.g., larger EKGE values or smaller EMO values) reached with fewer LSM runs indicate higher efficiency; the success rate, which explores the evolution abilities, is usually examined alongside.
Moreover, as the optimal objectives (e.g., the final EKGE or EMO values) could indicate a method’s performance in calibration effectiveness, the larger or smaller optimal objectives that depend on the direction of predefined metrics indicate more effectiveness. Furthermore, since the kernel density distribution of optimal values across different sites demonstrates their spatial enrichment characteristics, the variation in enrichment between different algorithms (such as PSO or SCE) to a certain extent reflects their capacity to address the spatial disparity in SM—ST simulation.

2.3.3. Simulation

To simplify the spatial complexity among the regional datasets, linear fitting between the observations (OBS) and simulations (SIM) of all sites is conducted [60]. The linear fit’s slope (briefed as $s$ hereafter) demonstrates the sensitivity of SIM to OBS, while its coefficient of determination (briefed as $r^2$ hereafter), or the goodness of fit, indicates whether the sensitivity or linear model is robust. Moreover, under the assumption that the errors between SIM and OBS (briefed as $E_{OS}$ hereafter) of all sites are normally distributed, a Gaussian fit of $E_{OS}$ resampled with 100 bins is conducted to generate at most two signals determining the main distribution characteristics, e.g., the amplitude (or frequency, briefed as $f$ hereafter) and center (briefed as $c$ hereafter) [61]. Here, a compound feature of $f$ and $c$ that is closer to the normal distribution indicates better performance, or more consistency with the assumption.
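Both fits can be sketched with standard SciPy tools (a single-peak version for brevity; the bimodal case of Section 4.1.2 would fit a sum of two Gaussians):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import linregress

def linear_fit(obs, sim):
    """Slope s and goodness of fit r^2 between the pooled OBS and SIM."""
    res = linregress(obs, sim)
    return res.slope, res.rvalue ** 2

def gauss(x, f, c, sigma):
    return f * np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def error_signal(e_os, bins=100):
    """Gaussian fit of the SIM-OBS errors resampled into 100 bins; returns
    the amplitude (frequency) f and center c of the main peak."""
    hist, edges = np.histogram(e_os, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p0 = [hist.max(), centers[np.argmax(hist)], np.std(e_os)]
    (f, c, _), _ = curve_fit(gauss, centers, hist, p0=p0, maxfev=5000)
    return f, c
```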
A method’s performance in optimal simulation and forecast is quantified using the spatial differences and similarities of the surface conditions among different datasets, e.g., surface simulations or reanalysis, and observations, by the following equations:
$$RMSE_S = \sqrt{\frac{\sum_{j=1}^{n_s}(s_j - o_j)^2}{n_s}}, \quad CC_S = \frac{\sum_{j=1}^{n_s}\left[(s_j - \overline{s_{n_s}})(o_j - \overline{o_{n_s}})\right]}{\sqrt{\sum_{j=1}^{n_s}(s_j - \overline{s_{n_s}})^2}\sqrt{\sum_{j=1}^{n_s}(o_j - \overline{o_{n_s}})^2}}$$
where $i$ and $j$ represent the $i$th time and the $j$th site, respectively, and $n_s$ represents the total number of stations. A smaller $RMSE_S$ and/or a higher $CC_S$ indicates better performance. Note that $RMSE_S$ and $CC_S$ share the formulas of the metrics RMSES and CCS (see Table 1) but are computed over the spatial sequences here.
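Computed per time step over the site vectors, the two scores reduce to the following sketch (SIM and OBS are assumed to be (n_t, n_s) arrays):

```python
import numpy as np

def spatial_scores(sim_t, obs_t):
    """RMSE_S and CC_S at one time; sim_t and obs_t are (n_s,) site vectors."""
    rmse_s = float(np.sqrt(np.mean((sim_t - obs_t) ** 2)))
    cc_s = float(np.corrcoef(sim_t, obs_t)[0, 1])
    return rmse_s, cc_s

# Applied along the time axis of the (n_t, n_s) SIM/OBS arrays:
# scores = [spatial_scores(s, o) for s, o in zip(SIM, OBS)]
```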
Meanwhile, the Taylor diagram [52,53,54,55,70], which assembles comprehensive statistics (i.e., standard deviation, root mean square difference, and correlation) of the temporal sequences of SIM and OBS, was also created to compare the methods’ skills. Usually, a smaller distance from the reference location (i.e., the OBS’s location) indicates higher skill. Note that the SIM datasets (30 min) were linearly interpolated to 3 h for a broad comparison with the land reanalysis.
In addition, to address the parameter uncertainty and heterogeneity requirements of surface prediction (manifested as variations in calibration performance) as well as the precision demands inherent to surface prediction (evidenced by differences in calibration robustness), this study employs indicators such as the EKGE increment, $RMSE_S$ reduction, and $CC_S$ increment to clarify the performance of the various SM–ST objectives in the calibration–forecast framework, aiming to explore the objective metrics’ advantages and/or weaknesses in both the LSM configuration and the surface forecasts.

3. Experiments

3.1. Model and Data

The Unified Noah LSM was created to better predict the effects of land surface processes on regional weather, climate, and hydrology. It is intended to capture the intricate biophysical, hydrological, and biogeochemical interactions among the soil, the land surface, and the atmosphere at micro- and mesoscales (Figure 3A) [24]. The simple-driver Noah LSM (version 3.4.1, https://ral.ucar.edu/model/unified-noah-lsm, accessed on 31 August 2024) has recently been extended to multi-point applications over central Tibet [9].
The SM–ST observations derive from the highest-altitude soil moisture network in the world (Figure 3B; elevations above 4470 m), which was constructed by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences (ITPCAS) with four soil depths (i.e., 0–5, 10, 20, and 40 cm) [35]. They are assembled into multi-site (i.e., 12) observations for the local warm season (i.e., 1 April to 31 July 2014) over northwest Naqu city, which has a typical semiarid climate, using simple quality control based on time-continuity correction (described in detail in Ref. [9]). In addition, the Global Land Data Assimilation System (GLDAS) [71] gridded soil reanalysis data, with resolutions of 3 h/0.25°, are collected for a broader comparison with the surface simulations in this study.
The gridded meteorological surface datasets that merge a variety of data sources were first developed by ITPCAS, with a 3 h interval and a resolution of 0.1° × 0.1° [72], and are further reassembled into the multi-site LSM forcing dataset by using the inverse distance-weighted quadratic spline interpolation method to drive the Noah LSM.
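For illustration, a plain inverse-distance-weighting step is sketched below (the quadratic-spline refinement used for the actual forcing is omitted, and the great-circle distance is approximated by the Euclidean one):

```python
import numpy as np

def idw(grid_xy, grid_vals, site_xy, power=2.0):
    """Interpolate one forcing field from grid points to a site by plain IDW.

    grid_xy   : (n_g, 2) grid lon/lat; grid_vals their field values
    site_xy   : (2,) site lon/lat
    """
    d = np.linalg.norm(grid_xy - np.asarray(site_xy), axis=1)
    if np.any(d == 0.0):                 # site coincides with a grid point
        return float(grid_vals[np.argmin(d)])
    w = 1.0 / d ** power                 # closer grid points weigh more
    return float(np.sum(w * grid_vals) / np.sum(w))
```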
According to the observational soil and surface characteristics, the multi-site Noah LSM is configured with a four-layer soil depth and a 30 min runtime step; the soil and vegetation types are mainly silt and grassland, while the slope type is assumed to be flat (i.e., type 1). In addition, the forcing time step (3 h) and screen heights (i.e., 10 m for winds and 2 m for temperature) of the LSM are the same as those of the input forcing data [9].

3.2. Experimental Description

A three-month warm-up run (covering 1 April to 1 July 2014) of the multi-site LSM, initialized with the unobserved default parameters (i.e., the “General,” “Vegetation,” “Soil,” and partial “Initial” types) [24,25,26] and the partially observational “Initial” parameters (i.e., SMC1–4 and STC1–4), was first conducted to obtain the default multi-site parameter tables, including the spatially distinguished “Initial” parameters, for the subsequent experimental runs [9]. On this basis, three types of experiments are conducted, as shown in Figure 4.
Following the timeline, (1) a one-month run covering 1 to 31 July (the control run, briefed as CTR hereafter) was conducted to provide the reference surface conditions resulting from the default LSM parameter table configuration ($\Omega_0$). (2) Secondly, the calibration runs with varied multi-objective metrics, covering 1 to 15 July, were conducted to obtain the calibrated multi-site LSMs with metric-informed parameter tables ($\Omega$) and to further investigate the metrics’ impact on the calibration’s ability to resolve the spatial complexities of the Noah LSM (e.g., in terms of parameters, objectives, and simulations). (3) Finally, the above-mentioned objective-informed LSMs ($\Omega$) were run from 15 to 31 July to obtain the hopefully improved surface forecasts and to further investigate the metrics’ impact on the surface forecast (e.g., soil states and surface fluxes).
Following the research flow, (1) the differences between CTR and calibration could account for the calibration performance, and the difference among different calibration runs could account for the metric’s impact on the calibration. (2) Meanwhile, the differences between CTR and calibrated forecast runs could account for the calibrated LSMs’ performances (e.g., in terms of heterogeneity and uncertainty, effectiveness and efficiency, spatial complexities), and the differences among the different calibrated forecast runs should account for the metric’s impact on surface forecast (e.g., in terms of various spatial complexities). (3) After the metrics’ effects on calibration and forecast performances are comprehensively investigated, the best metric configuration of the SM—ST objective in the calibration–forecast framework is finally identified. Notably, all objective metrics within both PSO and SCE algorithms are conducted to explore if the potentially improved surface forecast could be highly affected by the algorithms themselves or not.
In particular, it should be emphasized that this study designs its experiments based on the operational scheme of the LSM, aiming at improving land surface forecasting. Consequently, the near-surface atmospheric forcing conditions (see Section 3.1) are identical in all experiments, meaning that the impact of meteorological forcing on the land surface can be regarded as constant and that surface climate variation factors are neglected [73,74]. Thus, the differences observed in the land surface conditions across all experimental results can be attributed solely to the variations in experimental design.

4. Results

4.1. Case Perspective

As land surface models utilize parameters and forcing inputs to prepare land surface forecasts, the issues of surface simulation and local application in typical semi-arid regions (i.e., rapidly applying calibrated parameters to surface forecasts) are exemplified here. To this end, a review of the spatiotemporal characteristics of the default forcing, initial parameters, and their overall simulation status across different periods, including control, simulation calibration, and forecast verification, is conducted to clarify the fundamental manifestations of the issues involved in this study.

4.1.1. Model Configuration

The site-averaged 3 h meteorological forcing values against time during the study period are shown in Figure 5a–d. During July 2014, the diurnal variation in temperature ($T_{2m}$) mostly ranged between 5 and 15 °C, with an extremely dry atmosphere in which the relative humidity ($RH_{2m}$) values were mostly below 1%. The relatively low wind speed ($WS_{10m}$) generally varied between 0 and 6 m s−1, and the wind direction ($WD_{10m}$) was dominated by southerly flow (between 180° and 270°) from 1 to 10 July and from 16 to 21 July, respectively, but the opposite in the other periods. The incoming shortwave radiation ($SW$) exhibited strong diurnal variation between 0 and 600 W m−2, and the incoming longwave radiation ($LW$) varied between 250 and 350 W m−2. The pressure ($P$) was generally around 586 hPa, and the maximum hourly precipitation ($R_{1h}$) was about 5 mm h−1 on 10 July.
The initial land parameters of all the sites (or default parameters) that need calibration show great variety for the “Initial” types (Figure 5e), and this is especially pronounced for the moisture-related parameters, i.e., SMC and SH2O. This should be attributed to the differences among sites in the three-month pre-experiment simulation. Furthermore, due to the lack of direct observations of the other types of land surface parameters, these are configured using statistical data from optimization experiments based on limited benchmarks from a previous study (serving only as reference inputs for the control experiments, consistent with conventional numerical model configurations) [9]. Therefore, in numerical operations, parameter variations among stations exist mainly in the “Initial” types. Multi-site calibration can account for the differences among stations in the unobserved parameters under the existing observational constraints, namely, the spatial heterogeneity and uncertainty of the parameter space. Consequently, the spatial heterogeneity of the parameters (i.e., the sensitivity of commonly used parameters to different stations, or the number of intersections of the same parameters at different stations in the parameter space) and their uncertainty characteristics (such as the interquartile range and outlier features of the parameters at different stations), in relation to the differences among the various optimization objectives, are the key foci of the further investigation in this study.

4.1.2. Forecast Problem

The CTR simulations and observation datasets for the surface layer are compared in Figure 5f–i. For the whole experimental period, the linear fit for the surface soil moisture (briefed as $SM_{05cm}$ hereafter) exhibited a small increasing slope (about 0.21) with weak consistency, and the surface soil temperature (briefed as $ST_{05cm}$ hereafter) had a larger decreasing slope (about −0.45) with strong differences. Moreover, the slopes of the linear fits of $SM_{05cm}$ for the calibration and forecast periods were 0.22 and 0.15, respectively, and those of $ST_{05cm}$ were −0.48 and −0.4, respectively. This indicates that the surface conditions of the forecast period were slightly better than those of the calibration period. Generally, $SM_{05cm}$ fits better than $ST_{05cm}$.
In addition, the $E_{OS}$ of $SM_{05cm}$ for the whole experimental period had a sharp and narrow distribution centered around 0.15 m³·m⁻³ with a frequency of around 800, while the $E_{OS}$ distributions of the calibration and forecast periods were centered around 0.16 and 0.13 m³·m⁻³, with frequencies of around 500 and 300, respectively. This indicates that $SM_{05cm}$ values were mostly underestimated in all periods, most pronouncedly in the calibration period. Nevertheless, the $E_{OS}$ values of $ST_{05cm}$ for the different periods showed bimodal distributions (Figure 5i), whose centers were located around −4 and 9 K (whole period), −5 and 9 K (calibration period), and −4 and 8 K (forecast period), respectively. This indicates that $ST_{05cm}$ values were both under- and overestimated, with the latter more pronounced. Generally, $SM_{05cm}$ and $ST_{05cm}$ were both underestimated.
In general, although $SM_{05cm}$ in CTR exhibited better consistency with OBS than $ST_{05cm}$, the overall underestimation of the surface simulation by the Noah LSM could be large for regional surface forecast applications. Note that previous studies near our study area improved $ST_{05cm}$ with the Noah LSM by using the ITPCAS forcing datasets or by correcting the heat-sensitive roughness parameter Z0h (also known as CZIL) [7,75]; this, together with the non-negligibly biased $ST_{05cm}$ and the spatially diverse parameter space in CTR, indicates that a more effective calibration is required in the present study. Since multi-objective calibration can reduce these spatiotemporal errors through parameter identification to improve the subsequent forecasts [9], the next focus is on how different objective metrics affect the performance of calibration and forecasting.

4.2. Effects on Calibration

4.2.1. Optimal Parameters

Due to the significant spatial heterogeneity exhibited by most parameters in PSO and SCE, Table 2 summarizes the statistics of H for the four land parameter types in each optimal parameter space of all the experiments (detailed in Figure S1-1), based on the parameter relative sensitivity analysis [9,16,25,26]. For the “Vegetation” type, except for the SCE scenario considering CCS, the H of the other scenarios is zero, indicating great heterogeneity. Regarding the “Soil” type, the H values of the PSO and SCE calibration schemes (briefed as (Hp, Hs) hereafter) based on the EKGE, EMO, MAES, and RMSES metrics are (1, 3), (2, 2), (1, 1), and (2, 1), respectively. For the “General” type, the (Hp, Hs) values based on the CCS, EKGE, EMO, MAES, and RMSES metrics are (1, A), (2, 2), (2, 2), (1, 1), and (1, 1), respectively. For the “Initial” type, the (Hp, Hs) values based on the EKGE, EMO, and MAES metrics are (4, 2), (3, 2), and (2, 1), respectively. Evidently, among all pairs, the spatial homogeneity of the optimal parameters of the “Vegetation” type in PSO and SCE is minimal, suggesting the strongest heterogeneity. Conversely, the “Soil” and “General” types exhibit minimal spatial heterogeneity, while the “Initial” types fall in the middle. Notably, the QTZ and SBETA parameters consistently demonstrate homogeneity, below the parameter space threshold (0.33), across the PSO and SCE schemes based on the EKGE, EMO, MAES, and RMSES metrics.
Considering the heterogeneity disparities of the entire parameter space among metrics, the H counts of all the parameter types in PSO and SCE are further assembled. For CCS, the counts are 1 and 40, respectively; for EKGE, both schemes yield 7; for EMO, the counts are 7 and 6; for MAES, they are 4 and 3; for NSES, the counts are 2 and none; both PKGE and PMO register none; and for RMSES, the counts are 5 and 2. Evidently, there exist substantial variations in the homogeneity or heterogeneity of the parameters among calibration schemes based on different metrics. Notably, CCS exhibits the lowest parameter heterogeneity, followed by EKGE, then EMO, and subsequently MAES and RMSES. NSES displays relatively poor parameter heterogeneity, whereas PKGE and PMO manifest the highest degree of parameter heterogeneity.
To quantify the metrics’ effects on the parameter uncertainties, both the IQR of each parameter and the IQRD of the entire parameter space contributed by the different metrics, as shown in Figure 6, are compared. For PSO, the maximum IQR, about 4.8 for the parameter SNP in the “Vegetation” type, had the largest uncertainty, with EMO making the largest contribution. However, the IQR of about 1.2 for the SBETA parameter in the “General” type behaves oppositely, with EKGE, EMO, and RMSES making the smallest contributions (Figure 6a). For SCE, the maximum IQR is around 1.82 for the CZIL parameter in the “General” type, which has the largest uncertainty, with EMO making the largest contribution. Nevertheless, the IQR of around 0.61 for the parameter CSOIL in the “General” type behaves conversely, with EKGE, EMO, and RMSES making the smallest contributions (Figure 6b). Generally, PSO achieved higher IQRs than SCE for most metrics, and PSO and SCE achieved the lowest uncertainties for the SBETA and CSOIL parameters of the “General” type, respectively. Moreover, the IQRD of PSO across the various metrics exhibits a broader range with more scatter compared to that of SCE (Figure 6c). For PSO, the IQRD median values, ranked from highest to lowest, are PKGE > PMO > NSES > CCS > EMO > EKGE > RMSES > MAES. In contrast, for SCE, the order is PMO > PKGE > EKGE > EMO > MAES > RMSES > CCS. For PSO, the number of outliers in the IQRD is highest for EKGE with 3, followed by EMO and CCS with 2, while the remaining metrics have zero outliers. For SCE, the EMO and RMSES outlier numbers are both 2, followed by EKGE and MAES with 1 outlier each, and the rest are 0 (Figure 6d). In summary, significant differences exist in the IQRD of the entire optimal parameter spaces across the different metrics, with SCE exhibiting smaller IQRD but relatively more outliers. Notably, EKGE and EMO exhibit relatively large numbers of outliers in both PSO and SCE.
To quantify the uncertainty disparities of the different parameter types among the metrics that are induced by the GSAs’ advantages or weaknesses, Table 3 summarizes the IQR and outlier differences (i.e., PNL and ONR) of the different parameter types in PSO’s optimal parameter space relative to SCE’s for the different metrics, while the IQRs and outliers of both the PSO and SCE optimal parameter spaces for each metric are detailed in Figure S1-2. For the “Vegetation” type, all metrics are null except for the PNL value of EKGE, which is 2, while the ONR of all metrics is non-positive. For the “Soil” type, the PNL values are positive for all metrics except CCS, NSES, and PKGE, which are null. The ONR values are positive for EKGE, EMO, PMO, and RMSES and negative for the rest. Regarding the “General” type, all metrics exhibit positive PNL values except NSES, PKGE, and PMO, whose PNL values are null. The ONR values are positive for EKGE, EMO, MAES, and RMSES and negative for the others. In the “Initial” type, only EKGE and EMO have positive PNL values, with the rest being null. The ONR values are positive for all metrics except CCS, PKGE, and PMO, which are non-positive. In summary, summing the PNL values across types, EKGE has the highest total (8), followed by EMO and RMSES (7), then MAES (5), with PMO and CCS having the lowest totals (1). PKGE has no PNL value. For the ONR values, EMO has the highest total (9), followed by EKGE (3), then RMSES (3), while PMO has the lowest (2). The remaining metrics have negative ONR values.
In general, the overall low homogeneities (e.g., the low H values) and PSO’s low relative advantages (e.g., the generally null PNL with non-positive ONR) indicate great parameter heterogeneity and uncertainty challenges and imply a higher calibration frequency demand. In particular, the “Vegetation” parameters show the greatest spatial heterogeneities and uncertainties among all the parameter types, while the “General” type behaves oppositely, which resembles the sensitivity tradeoff of these two parameter types over Tibet [7,8,17,22,23]. Moreover, for each objective metric, SCE consistently achieves lower parameter uncertainty than PSO, albeit at the cost of relatively higher spatial heterogeneity; this tradeoff is in line with the previous study [9]. EKGE and EMO in PSO and EKGE in SCE yield the smallest parameter heterogeneities (e.g., low H values), while MAES in PSO and CCS in SCE exhibit the smallest parameter uncertainties (e.g., low IQRD median values). Meanwhile, compared to the other metrics, EKGE in PSO and EMO in SCE show the most unaccountable factors in the parameter uncertainties, as indicated by having the most IQRD outliers.
Overall, the “General” and “Vegetation” types have generally shown the opposite tradeoffs in LSM parameter calibration performance in terms of heterogeneity and uncertainty. Once the objective metric is determined, SCE favors lowering parameter uncertainty at the cost of heterogeneity compared to PSO during calibration. EKGE and EMO have shown considerable advantages in reducing parameter heterogeneity compared to other metrics, while the former is slightly more powerful. However, the metrics’ best advantages in reducing parameter uncertainties have shown great diversity (e.g., MAES in PSO and CCS in SCE) and overconfidence (e.g., null accountable factors), while the parameter uncertainty differences of one method induced by metric differences are great (this is especially evident in PSO). This demonstrates that both metric and method could highly affect the parameter uncertainties during calibration.

4.2.2. Effectiveness and Efficiency

Figure 7 shows the different metrics’ fitness curves (i.e., Pb and Pm) and the median position and median Noah run number at convergence for the different sites, e.g., ($CP_P$, $CP_N$) for PSO and ($CS_P$, $CS_N$) for SCE. For CCS, both PSO and SCE sharply increased before 3000 Noah runs and both converged to 1, but at around 79,475 and 66,663 runs, respectively. For EKGE, PSO and SCE both sharply increased before 10,000 Noah runs but converged to 0.56 at 99,017 runs and 0.53 at 90,731 runs, respectively. For EMO, PSO and SCE both decreased to 1 before 8000 Noah runs but converged to 1 at 99,297 runs and 1.08 at 82,709 runs, respectively. For MAES, PSO and SCE both quickly decreased to the range 0.7–1.1 before 10,000 Noah runs but converged to 0.79 at 99,765 runs and 0.81 at 94,795 runs, respectively. For NSES, PSO and SCE both instantly reached 1 at 187 runs, the most rapid convergence among all metrics. However, for PKGE and PMO, since volatile fitnesses (which vary within (−∞, 1] and [1, +∞), respectively) are found for all sites in each generation, no strict solutions can be observed. For RMSES, PSO and SCE both sharply decreased to 1 before 5000 Noah runs but converged to 0.97 at 99,391 runs and 0.98 at 94,029 runs, respectively.
Generally, except for PKGE and PMO, the metrics in PSO achieved better effectiveness, as indicated by their better fitness values, but with relatively worse efficiency, as indicated by their larger numbers of runs at convergence compared to SCE. The non-solution performance of the PKGE and PMO metrics in both PSO and SCE indicates their requirement for more Noah runs to achieve convergence, or the potential failure of the Pareto-dominated logic (i.e., that surface improvement likely improves the subsurface). For MAES, NSES, and RMSES, the fitness curve of site C4 is notably biased from (or worse than) those of the other sites. Nevertheless, among all the metrics’ convergences, MAES has the largest range, which could indicate a divergent convergence domain.
Figure 8 presents the success rate curves for calibration across various metrics. For CCS, PSO experiences a decline from 70% to 20% during the first 10,000 Noah runs, followed by a gradual decrease to near zero. In the case of EKGE, PSO initially shows a decline from 80% within the first 5000 Noah runs, subsequently exhibiting two distinct patterns: fluctuations around 40% and 20%, respectively. For EMO, PSO drops from 80% to nearly 0% within the initial 25,000 Noah runs, with some stations subsequently exhibiting strong fluctuations between 0% and 80%. MAES follows a similar trend, with PSO declining from 80% to near 0% within the first 15,000 Noah runs and subsequent intense fluctuations between 0% and 80% at certain stations. For NSES, PSO gradually decreases from 80% to 20% within the first 35,000 Noah runs and remains stable thereafter. PKGE and PMO exhibit similar behavior, with PSO slowly declining from 80% to 20% within the first 20,000 Noah runs and fluctuating slightly around 20% thereafter. SCE’s performance in PKGE resembles that of CCS. In contrast, RMSES displays a fluctuating decline from 80% to 0% within the initial 20,000 Noah runs for PSO, followed by drastic fluctuations between 20% and 80%. However, SCE consistently demonstrates a rapid initial decrease from 80% to 20% across nearly all metrics, maintaining this level thereafter.
For all metrics, the search domain of SCE exhibits a consistent pattern, characterized by an L-shaped thin linear region. In contrast, PSO’s search domain displays significant fluctuations and notable variations across different metrics (e.g., EKGE, EMO, MAES, RMSES), albeit with an overall larger area than SCE. This suggests that for most metrics, PSO demonstrates stronger evolutionary capabilities compared to SCE, which primarily contributes to PSO’s slightly slower convergence rate compared to SCE.
Figure 9 presents the statistical performance of the optimal objectives across all stations for various metrics. For CCS, both PSO and SCE exhibit a concentrated distribution near 1, with PSO displaying a tighter clustering and an outlier at 0.973. In the case of EKGE, PSO and SCE concentrate around 0.58 and 0.53, respectively, with PSO showing a more focused distribution and an outlier at 0.34. For EMO, PSO and SCE are centered near 1 and 1.1, respectively, with PSO displaying a relatively dispersed distribution and an outlier at 1.5. MAES values for PSO and SCE are centered around 0.79 and 0.81, respectively, demonstrating similar distributions. For NSES, PKGE, and PMO, both PSO and SCE have concentrated distributions near 1, with NSES exhibiting a more tightly clustered distribution compared to the other two metrics. Finally, for RMSES, PSO and SCE are centered around 0.9 and 1.1, respectively, with SCE displaying a more focused distribution and both having outliers at around 2.4.
It is evident that for the optimal solutions of PKGE and PMO, both PSO and SCE yield values of 1, indicating the absence of optimal solutions or the need for more time to locate them. In contrast, numerical optimal solutions were achieved for other metrics. Furthermore, while PSO consistently outperformed SCE in attaining better optimal solutions across almost all metrics, significant variations were observed in the enrichment levels of optimal solutions between PSO and SCE under different metrics. For instance, PSO surpassed SCE in CCS and EKGE, whereas SCE surpassed PSO in EMO, MAES, and RMSES. Notably, PSO and SCE exhibited similar performance in NSES. This underscores the disparate spatial variability characteristics of optimal solutions influenced by distinct metrics (whereby the enrichment levels of optimal solutions at different sites reflect the extent of spatial variability or heterogeneity). Additionally, notable outliers were identified in PSO’s performance within CCS, EKGE, and EMO metrics, while both PSO and SCE exhibited outliers in the RMSES metric. This indicates that for RMSES, unquantifiable factors within the spatial variability of optimal solutions are more pronounced, whereas for other metrics, PSO’s performance relative to SCE is more significantly influenced.
Overall, apart from PKGE and PMO, PSO typically exhibits better optimal solutions for the other metrics, i.e., enhanced effectiveness, compared to SCE, albeit at the cost of relatively lower efficiency. Notably, for CCS, EKGE, and RMSES, the optimal solutions obtained by PSO demonstrate higher kernel densities (or lower spatial variability) than those obtained by SCE; for EMO and RMSES, the performance behaves oppositely; and for the other metrics, the kernel densities of the optimal objectives are almost equal. In particular, only the enrichment advantages resulting from metrics such as CCS, EKGE, and RMSES in PSO share the commonality of a positive PNL in the “General” type (Table 3), which likely indicates the “General” type’s advantage in lowering the spatial variability of the calibration objectives [15,16,17]. However, recall that the overall PNL values are much smaller than the LSM parameter dimensions (Table 2 and Table 3), so the optimal objective’s improvement resulting from parameter advantages could be small, while the resulting simulation differences need further investigation.

4.2.3. Optimal Simulation

Linear and Gaussian Fitting

Figure S2-1 presents the linear fitting, whose indicators are the slope and goodness of fit (briefed as ($s$, $r^2$) hereafter), between simulations and observations of $SM_{05cm}$ and $ST_{05cm}$ under the various metrics. For $SM_{05cm}$, PSO’s $s$ values in descending order are EMO, EKGE, RMSES, MAES, PMO, NSES, CCS, PKGE, with the $r^2$ values also descending from EMO to PKGE. In contrast, the order of SCE’s $s$ values is EMO, PMO, EKGE, MAES, PKGE, NSES, RMSES, and CCS, with $r^2$ following a similar but slightly different descending order. For $ST_{05cm}$’s linear fitting (Figure S2-1-2), PSO’s $s$ values are, in order, EKGE, EMO, MAES, RMSES, CCS, NSES, PKGE, PMO, while the $r^2$ values show a distinct ordering: PMO, followed closely by EMO and PKGE, then MAES, RMSES, and NSES closely grouped, then EKGE, and finally CCS. The SCE fitting of $ST_{05cm}$ exhibits a different ordering of $s$ (EKGE, EMO, CCS, RMSES, MAES, NSES, PMO, PKGE) and $r^2$ values (PKGE, PMO, NSES, EKGE, EMO, RMSES, with CCS and MAES closely grouped).
Generally, for $ST_{05cm}$, the metrics NSES, PKGE, and PMO exhibit negative $s$ values in both PSO and SCE, while the rest are positive (Table 4). This indicates that most linear relationships between the calibrated simulations and observations are positively correlated, which aligns with the improvement objectives of this study. Specifically, for EMO and EKGE, the $s$ values of PSO (SCE) in the calibration of $SM_{05cm}$ and $ST_{05cm}$ are 0.96 (0.83) and 0.18 (0.23), respectively, showcasing the best calibration performance (Figure 10). Furthermore, it is noteworthy that for $ST_{05cm}$, the highest $r^2$ value of 0.11 is comparable to the lowest $r^2$ value observed for $SM_{05cm}$ (PKGE), implicitly suggesting a greater challenge in modeling $ST_{05cm}$.
The Gaussian fitting of $E_{OS}$, with indicators of center and frequency (briefed as ($c$, $f$) hereafter), for $SM_{05cm}$ in Figure S2-2-1 reveals the following: CTR is widely distributed, peaking at ~0.15 ($f$ = 297). In PSO and SCE, CCS spans widely around −0.04 ($f$ ≈ 350) and 0.11 ($f$ ≈ 295), EKGE narrowly centers at 0 ($f$ ≈ 1276) and 0 ($f$ ≈ 608), EMO narrowly peaks at 0 ($f$ ≈ 1178) and 0.01 ($f$ ≈ 700), MAES widens slightly at 0.01 ($f$ ≈ 344) and 0.02 ($f$ ≈ 416), NSES is wide at 0.05 ($f$ ≈ 274) and 0.05 ($f$ ≈ 230), PKGE widely centers at 0.08 ($f$ ≈ 322) and 0.11 ($f$ ≈ 325), PMO narrowly peaks at 0.02 ($f$ ≈ 480) and 0.03 ($f$ ≈ 444), and RMSES narrowly centers at −0.02 ($f$ ≈ 426) and 0 ($f$ ≈ 296). Moreover, the ($c$, $f$) of $E_{OS}$ for $ST_{05cm}$ (Figure S2-2-2) shows the following: CTR has a wide bimodal distribution centered at ~7.1 ($f$ ≈ 192) and −3.8 ($f$ ≈ 134). In PSO and SCE, CCS widely centers at ~2.3 ($f$ ≈ 216) and 1.1 ($f$ ≈ 167), EKGE widely centers at ~1.3 ($f$ ≈ 200) and 2.5 ($f$ ≈ 203), EMO centers at ~0.85 ($f$ ≈ 170) and 1.23 ($f$ ≈ 207), MAES centers at ~−0.06 ($f$ ≈ 200) and 0.88 ($f$ ≈ 230), NSES widely centers at ~5.86 ($f$ ≈ 169) and 5.03 ($f$ ≈ 213), PKGE widely centers at ~4.91 ($f$ ≈ 237) and 5.01 ($f$ ≈ 152), PMO widely centers at ~6.1 ($f$ ≈ 300) and 5.19 ($f$ ≈ 224), and RMSES centers at ~0.16 ($f$ ≈ 200) and 1.29 ($f$ ≈ 206).
Generally, for the $E_{OS}$ of $SM_{05cm}$, EKGE’s performance in both PSO and SCE is closest to a normal distribution, whereas for that of $ST_{05cm}$, MAES exhibits the closest resemblance to normality (Figure 11), with EKGE performing relatively poorly (Table 5). This underscores the significant influence of metric discrepancies on the optimal simulation errors, contingent upon the distinct calibration objectives. Furthermore, excessively wide peaks with low frequencies in the unimodal distributions (e.g., CCS, NSES, PKGE, and PMO) indicate dispersed fitting distributions, potentially necessitating multimodal (e.g., more than two peaks) fitting. Conversely, bimodal distributions characterized by narrower peaks may call for a single-peak fitting.
In general, EMO, EKGE, and PMO have slopes greater than 0.7, showing promising $SM_{05cm}$ linear modeling, while EKGE, EMO, MAES, and RMSES have slopes greater than 0.1, showing better $ST_{05cm}$ linear modeling than the other metrics. Furthermore, EKGE and EMO have unbiased errors centered around 0 for $SM_{05cm}$, showing promising $SM_{05cm}$ nonlinear modeling, while MAES, RMSES, and EMO have errors centered around 1 K, showing better $ST_{05cm}$ nonlinear modeling than the other metrics; e.g., EKGE is biased by around 3 K. Overall, EMO shows the most advantages in surface soil modeling among all the metrics.

Spatial Difference and Similarity

Figure 12a depicts the temporal $RMSE_S$ variations of $SM_{05cm}$. CTR is generally the largest, at 0.15 (decreasing during the 5 and 10 July rainfalls), followed by a slight upward trend. For CCS, PSO fluctuates slightly between 0.1 and 0.17, while SCE fluctuates around 0.07. The EKGE and EMO values in PSO (SCE) are around 0.01 (0.03) and 0.01 (0.02), respectively, both trending downward. The MAES values in both PSO and SCE are around 0.04, both declining. The NSES values in both PSO and SCE are around 0.07, trending upward. PKGE in PSO (SCE) is 0.12 (0.1), trending upward. PMO in PSO decreases from 0.1 to 0.05, while the SCE values are around 0.07, slightly decreasing. The RMSES values in PSO (SCE) are 0.04 (0.05), both declining. Moreover, Figure 12b illustrates the overall $RMSE_S$ distribution of $SM_{05cm}$. The median $RMSE_S$ values, ranked from highest to lowest, are as follows for PSO: CCS (0.13) > PKGE (0.12) > NSES (0.08) > PMO (0.07) > RMSES (0.039) > MAES (0.038) > EMO (0.018) > EKGE (0.017); and for SCE: PKGE (0.1) > CCS (0.09) > NSES (0.085) > PMO (0.065) > RMSES (0.056) > MAES (0.039) > EKGE (0.03) > EMO (0.02). Notably, EKGE in PSO and EMO in SCE exhibit the lowest median $RMSE_S$, whereas CCS in PSO and PKGE in SCE have the highest.
Figure 12c depicts temporal variations of the C C S of S M 05 c m . CTR significantly drops 5–6 July (0.5 to −0.6), fluctuating at ~0.2 otherwise. For CCS, PSO is stable at −0.5, and SCE increases 5 July (−0.4 to 0.5) then sharply drops to −0.2. For EKGE, PSO and SCE fluctuate around 1 and 0.8. For EMO, both are ~1. For MAES, PSO increases (0.2 to 1), SCE initially declines 4–5 July (0.4 to 0) and then becomes ~0.2. For NSES, PSO ~0.7 and then drops post 10 July to ~0.5; SCE ~0.2. For PKGE, PSO ~0.45, with a sharp drop 5 July to ~−0.4; SCE is −0.4, with a sharp increase 4 July to 0.5, followed by a sharp drop 6 July to ~−0.3. For PMO, PSO is 0.6, with a sharp drop 6 July to −0.5, followed by a rise to 0.2; SCE is 0.8 then drops 5 July to 0 and rises to 0.4. For RMSES, PSO increases (0.5 to 0.8) and stabilizes ~0.8 post 4 July; SCE is ~0.45, declines 4–6 July, and rises to ~0.26. EMO and EKGE consistently outperform MAES for PSO and SCE, with other metrics displaying varied trends. Moreover, Figure 12d illustrates the overall C C S distribution for S M 05 c m . Median C C S values, ranking from highest to lowest for PSO, are as follows: EMO > EKGE > MAES > RMSES > NSES > PMO > PKGE > CCS; for SCE: EKGE > EMO > MAES > PMO > PKGE > NSES > RMSES > CCS. Notably, EMO in PSO and EKGE in SCE exhibit the highest median C C S , whereas CCS in PSO and SCE has the lowest.
For R M S E S of S T 05 c m , CTR shows a marked diurnal variation, averaging 8 K fluctuations (Figure S2-3-1). Due to overlapping diurnal error ranges, its performance complexity surpasses S M 05 c m . Notably, NSES, PKGE, and PMO peak R M S E S values > 14 K (CTR’s max), indicating inferiority (Figure 12e). Conversely, MAES and RMSES peak at 8 K, surpassing CTR. EKGE and EMO, excluding initial days, also peak near 8 K, outperforming CTR. Median R M S E S (K) values ranking from highest to lowest yields the following order for PSO: PMO (7.5) > PKGE (6) > NSES (5.8) > CCS (4) > EKGE (3.5) > MAES (2.8) > RMSES (2.5) > EMO (2.48); and for SCE: PKGE (6.1) > PMO (5.8) > NSES (5.6) > CCS (3.6) > EKGE (3.3) > MAES (2.9) > RMSES (2.7) > EMO (2.5) (Figure 12f). EMO in both PSO and SCE exhibits the lowest median R M S E S , whereas PMO in PSO and PKGE in SCE possess the highest.
Furthermore, for C C S of S T 05 c m , CTR varies from −0.5 to 0.7, showing distinct diurnal patterns (Figure 12g). Overlapping diurnal error ranges complicate performance compared to S M 05 c m (Figure S2-3-2). In both PSO and SCE, CCS and EKGE maximums are less than 0.7 (CTR’s max), indicating inferiority, while NSE, PKGE, and PMO maximums rival CTR, but minimums are larger than −0.5, outperforming CT; EMO, MAES, and RMSES maximums are around 0.8, exceeding CTR. Hence, C C S performance ranks EMO, MAES, and RMSE as the best, followed by NSE, PKGE, and PMO; CCS and EKGE perform badly. Moreover, Figure 12h illustrates the overall C C S distribution of S T 05 c m . Median C C S values ranking from highest to lowest for PSO are as follows: EMO > MAES > RMSES > NSES > PMO > PKGE > EKGE > CCS; for SCE: EMO > RMSES > CCS > EKGE > MAES > PKGE > PMO > NSES. Notably, EMO exhibits the highest median C C S for both PSO and SCE, whereas CCS and NSES have the lowest.
Overall, for S M 05 c m , EKGE in PSO and EMO in SCE have the lowest median R M S E S when EMO in PSO and EKGE in SCE have the highest median C C S , which shows the complex triangular tradeoffs among metrics, algorithms, and simulation complexities. Nevertheless, for S T 05 c m , EMO in both PSO and SCE exhibits the lowest median R M S E S and the highest median C C S , which demonstrate the significant benefits of EMO in reducing simulation complexities. Generally, for S M 05 c m , EKGE in PSO and EMO in SCE have the best R M S E S and C C S performances, respectively, while for S T 05 c m , EMO in both PSO and SCE has the best R M S E S and C C S performances. It is noted that the EMO’s general advantages of solving the optimal simulations’ complexities are likely in line with its advantages in reducing parameter heterogeneities (Table 2).
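As a concrete note on how such spatial difference ($RMSE_S$) and similarity ($CC_S$) series can be derived, the following minimal sketch computes both statistics across a site network at each output time; the array shapes and the synthetic inputs are assumptions for illustration rather than the study's actual data pipeline.

```python
import numpy as np

def spatial_rmse_cc(sim, obs):
    """Per-time spatial RMSE_S and CC_S over a site network.

    sim, obs: arrays of shape (n_times, n_sites).
    Returns two arrays of length n_times.
    """
    err = sim - obs
    rmse_s = np.sqrt(np.mean(err ** 2, axis=1))
    # Anomalies across the site dimension for the spatial correlation
    sim_a = sim - sim.mean(axis=1, keepdims=True)
    obs_a = obs - obs.mean(axis=1, keepdims=True)
    cc_s = (sim_a * obs_a).sum(axis=1) / (
        np.linalg.norm(sim_a, axis=1) * np.linalg.norm(obs_a, axis=1))
    return rmse_s, cc_s

rng = np.random.default_rng(1)
obs = 0.15 + 0.03 * rng.standard_normal((240, 12))   # hourly output, 12 sites (illustrative)
sim = obs + 0.02 * rng.standard_normal((240, 12))
rmse_s, cc_s = spatial_rmse_cc(sim, obs)
print(rmse_s.mean(), np.median(cc_s))
```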

4.3. Effects on Forecast

4.3.1. Linear and Gaussian Fitting

Figure S3-1 illustrates the disparities in linear fitting (s, r2) between simulations and observations for $SM_{05cm}$ and $ST_{05cm}$ across metrics. For the $SM_{05cm}$ linear fit (Figure S3-1-1), PSO's s values in descending order are as follows: EKGE > EMO > MAES > RMSES > NSES > PMO > CCS > PKGE; the r2 order matches. For SCE, the s values in descending order are as follows: EMO > EKGE > PMO > MAES > NSES > RMSES > CCS > PKGE. The r2 order differs: EKGE > EMO > MAES > PMO > NSES > PKGE > RMSES > CCS. For the $ST_{05cm}$ fit (Figure S3-1-2), PSO's s values are ordered as follows: MAES > RMSES > EMO ≥ EKGE/CCS > NSES > PKGE > PMO; for r2, the order is as follows: PMO > PKGE > EMO > RMSES ≥ MAES/NSES > EKGE > CCS. SCE's s values are in the following order: MAES/CCS > RMSES > EMO/EKGE > NSES > PMO > PKGE; for r2, the order is as follows: PKGE > RMSES/PMO > MAES/EMO > NSES > CCS/EKGE.
Generally, except for $ST_{05cm}$ under NSES, PKGE, and PMO and $SM_{05cm}$ under PKGE, both PSO and SCE exhibit positive s values (Table 6), indicating that the forecasts and observations are positively correlated. Specifically, for $SM_{05cm}$ under EKGE and $ST_{05cm}$ under MAES, the s (r2) values in PSO (SCE) are 0.98 (0.84) and 0.14 (0.15), respectively, showcasing the best performance (Figure 13). Furthermore, it is noteworthy that the highest r2 value (~0.1) for $ST_{05cm}$ under MAES is much smaller than the highest r2 value observed for $SM_{05cm}$ under EKGE, implicitly suggesting a greater challenge in $ST_{05cm}$ forecasting.
The c (f) values of $E_{OS}$ for $SM_{05cm}$ (Figure S3-2-1) reveal the following: CTR is centered at 0.19 (f ≈ 272). In PSO and SCE, the CCS centers are at 0.15 (f ≈ 189) and 0.07 (f ≈ 225), EKGE narrowly centers at 0 (f ≈ 383) and 0 (f ≈ 363), the EMO centers are at 0 (f ≈ 416) and 0 (f ≈ 359), the MAES centers are at −0.01 (f ≈ 359) and 0 (f ≈ 284), NSES centers at 0.06 (f ≈ 343) and 0.05 (f ≈ 322), PKGE bimodally centers at 0.13/0.04 (f ≈ 234/220) and 0.16/0.06 (f ≈ 199/365), PMO widely centers at 0.01 (f ≈ 367) and 0.04 (f ≈ 323), and RMSES centers at −0.02 (f ≈ 293) and 0.01 (f ≈ 326). Furthermore, the c (f) values of $E_{OS}$ for $ST_{05cm}$ (Figure S3-2-2) reveal the following: CTR bimodally centers at 7.28/−3.57 (f ≈ 211/160). In PSO and SCE, CCS widely centers at 3.2 (f ≈ 187) and −0.38 (f ≈ 181), EKGE centers at −0.09 (f ≈ 143) and 3.39 (f ≈ 189), EMO centers at −1.41 (f ≈ 175) and −0.98 (f ≈ 148), MAES centers at 0.49 (f ≈ 181) and 0.29 (f ≈ 206), NSES widely centers at 5.81 (f ≈ 204) and 4.56 (f ≈ 210), PKGE widely centers at 4.9 (f ≈ 214) and 5.7 (f ≈ 217), PMO widely centers at 6.17 (f ≈ 221) and 5.47 (f ≈ 187), and RMSES centers at 0.55 (f ≈ 194) and 0.32 (f ≈ 198).
Generally, for $E_{OS}$ of $SM_{05cm}$, EMO and EKGE in both PSO and SCE are closest to the normal distribution, whereas for $ST_{05cm}$, MAES exhibits the closest resemblance to normality (Figure 14), with EKGE in SCE having a positive bias of around 3.3 K (Table 7). The unbiased $SM_{05cm}$ errors for EKGE and EMO and the unbiased $ST_{05cm}$ errors for MAES are consistent with the surface modeling advantages indicated by the linear fitting. Nevertheless, the inconsistency of the metrics yielding the best Gaussian fitting of $ST_{05cm}$ errors between calibration and forecast underscores the significant influence of metric discrepancies on forecast errors. Furthermore, the excessively wide peaks with low frequencies in unimodal distributions (e.g., CCS, NSES, PKGE, and PMO), consistent with the optimal simulations' Gaussian fitting (see Section 4.2.3), indicate a dispersed fitting distribution, potentially necessitating a multimodal (more than two peaks) fitting.
In general, EMO and EKGE have slopes greater than 0.8, showing promising $SM_{05cm}$ linear modeling, while EMO, MAES, and RMSES have slopes greater than 0.1, showing better $ST_{05cm}$ linear modeling than the other metrics. Furthermore, EKGE and EMO have errors centered around 0 (i.e., unbiased) for $SM_{05cm}$, showing promising $SM_{05cm}$ nonlinear modeling, while MAES, RMSES, and EMO have errors centered around 0.3, 0.3, and −1 K, respectively, showing better $ST_{05cm}$ nonlinear modeling than the other metrics (e.g., EKGE is biased by around 3 K). Overall, EMO shows the most advantages in surface soil modeling among all metrics, consistent with its surface modeling performance during the calibration period (see Section 4.2.3, "Linear and Gaussian Fitting").

4.3.2. Spatial Difference and Similarity

Figure 15a depicts the temporal $RMSE_S$ variations of $SM_{05cm}$. CTR is the largest (~0.15), fluctuating with a dip on 24 July. CCS in PSO ranges around 0.13 and then trends upward, while it remains stable at 0.1 in SCE. EKGE and EMO in PSO/SCE are ~0.02/0.01 and ~0.02/0.02, respectively, with both trending slightly upward. In both PSO and SCE, MAES values are ~0.04, NSES hovers at 0.07 with a slight decline, PKGE is ~0.12, and PMO declines from ~0.07 to 0.0. RMSES values in PSO/SCE are ~0.04/0.06. Furthermore, Figure 15b illustrates the overall $RMSE_S$ distribution of $SM_{05cm}$. The median $RMSE_S$ values ranked from highest to lowest for PSO are as follows: CCS (0.12) > PKGE (0.11) > NSES (0.07) > PMO (0.05) > MAES (0.04) > RMSES (0.036) > EMO (0.028) > EKGE (0.02); for SCE: PKGE (0.11) > CCS (0.1) > NSES (0.07) > RMSES (0.052) > PMO (0.05) > MAES (0.04) > EMO (0.02) > EKGE (0.019). Notably, EKGE has the lowest median $RMSE_S$ for both PSO and SCE, whereas CCS in PSO and PKGE in SCE have the highest.
Figure 15c shows the temporal $CC_S$ variations of $SM_{05cm}$. CTR drops significantly during 20–21 July (0.1 to −0.7) and is stable at ~0.2 otherwise. In both PSO and SCE, CCS hovers around −0.3. EKGE in PSO remains at ~0.8, while in SCE it jitters at ~0.3. EMO is ~1 in both PSO and SCE. MAES in PSO is ~0.8, declining gradually, while in SCE it drops from 0.4 to 0 (5 July) and then rises slightly to ~0.2. NSES in PSO starts at 0.7, dropping to ~0.5 after 10 July, while in SCE it jitters at ~0.2. PKGE in PSO is ~0.45, sharply dropping to ~−0.4 after 5 July, while in SCE it is initially ~−0.4 and then jitters at ~−0.3. PMO in PSO starts at ~0.6, sharply drops to ~−0.5 (6 July), and rises to ~0.2, while in SCE it starts at ~0.8, drops to 0, and rises to ~0.4. RMSES in PSO jitters and rises (0.5 to 0.8), stabilizing at ~0.8 after 4 July, while SCE starts at ~0.45, drops to ~0 (4–6 July), and rises, fluctuating around ~0.26. For $CC_S$ of $SM_{05cm}$, PSO and SCE consistently rank in the order EMO > EKGE > MAES, with the others exhibiting unstable or inferior performance. Figure 15d depicts the overall $CC_S$ distribution of $SM_{05cm}$. The median $CC_S$ ranked from highest to lowest for PSO is as follows: EKGE > EMO > MAES > RMSES > NSES > PMO > PKGE > CCS; for SCE: EKGE > EMO > MAES > PMO > NSES > RMSES > PKGE > CCS. Notably, EKGE has the highest median $CC_S$ for both PSO and SCE, while CCS has the lowest.
Figure 15e displays the temporal $RMSE_S$ variations of $ST_{05cm}$. CTR exhibits pronounced diurnal fluctuations of 10 K. Metric performances are intricate due to the overlapping diurnal error amplitudes (Figure S3-3-1). In both PSO and SCE, the NSES, PKGE, and PMO maximums exceed 15 K (CTR's maximum), indicating inferior performance, while the EMO, MAES, and RMSES maximums are lower than 7 K, superior to CTR. Note that the CCS and EKGE maximums in both PSO and SCE are around 8 K (except 1–2 July), also outperforming CTR. For $RMSE_S$ of $ST_{05cm}$, in both PSO and SCE, EMO, MAES, and RMSES are the smallest, followed by CCS and EKGE, while NSES, PKGE, and PMO are the highest. Moreover, Figure 15f shows the $RMSE_S$ distribution of $ST_{05cm}$. Median $RMSE_S$ values ranked from highest to lowest for PSO are as follows: PMO (7.5) > PKGE (6.9) > NSES (6.6) > CCS (4) > EKGE (3) > RMSES (2.9) > MAES (2.7) > EMO (2); for SCE: PKGE (7) > PMO (6.8) > NSES (6.4) > CCS (3.8) > EKGE (3.3) > MAES (2.7) > RMSES (2.5) > EMO (2.3). EMO in both PSO and SCE has the lowest median $RMSE_S$, whereas PMO in PSO and PKGE in SCE have the highest.
Figure 15g presents the temporal $CC_S$ variations of $ST_{05cm}$. CTR displays strong diurnal fluctuations between −0.7 and 0.7. The overlapping diurnal error amplitudes complicate performance comparison (Figure S3-3-2). Notably, EMO, MAES, and RMSES in both PSO and SCE exceed CTR's extremes, demonstrating superior performance. Moreover, Figure 15h depicts the overall $CC_S$ distribution of $ST_{05cm}$. The median $CC_S$ ranked from highest to lowest for PSO is as follows: EMO > RMSES > MAES > NSES > PMO > PKGE > CCS > EKGE; for SCE: RMSES > CCS > EMO > MAES > EKGE > NSES > PKGE > PMO. Notably, EMO in PSO and RMSES in SCE have the highest median $CC_S$, while EKGE in PSO and PMO in SCE have the lowest.
Overall, for $SM_{05cm}$, EKGE in both PSO and SCE has the lowest median $RMSE_S$ and the highest median $CC_S$, whereas CCS and PKGE behave oppositely. For $ST_{05cm}$, EMO in both algorithms has the lowest median $RMSE_S$, whereas PMO in PSO and PKGE in SCE have the highest. In addition, EMO in PSO and RMSES in SCE have the highest median $CC_S$, while EKGE in PSO and PMO in SCE have the lowest. Generally, for $SM_{05cm}$, EKGE has the best $RMSE_S$ and $CC_S$ performances, while for $ST_{05cm}$, EMO in PSO and RMSES in SCE have the best $RMSE_S$ and $CC_S$ performances, respectively. Nevertheless, EMO shows no notable weaknesses, while EKGE in PSO has the lowest similarity for $ST_{05cm}$; this is consistent with EMO's advantages during the calibration period (see Section 4.2.3, "Spatial Difference and Similarity").

4.3.3. Surface States Intercomparison

Figure 16 presents Taylor diagram plots of the calibrated and CTR simulations of $SM_{05cm}$ during the forecast period, compared with observations and/or GLDAS data, across the various metrics. For the comparison of $SM_{05cm}$ simulations with observations (Figure 16a), CTR exhibits a root mean square difference ($rmsd$) greater than 0.02, surpassing the other simulated metrics and GLDAS. However, the correlation coefficient ($cc$) between CTR and the observations is above 0.5, outperforming the other simulations and GLDAS except for the EKGE and EMO metrics. Additionally, CTR's standard deviation ($std$) reaches approximately 0.03, significantly higher than that of the other simulated metrics and GLDAS. Thus, the EKGE and EMO metrics, when applied in PSO and/or SCE, effectively improve the simulation of $SM_{05cm}$. In the comparison of $ST_{05cm}$ with observations (Figure 16b), CTR, GLDAS, and multiple simulations demonstrate no skill. Nevertheless, as for $SM_{05cm}$, simulations using the EKGE and EMO metrics consistently yield the lowest $rmsd$ and $std$, as well as the highest $cc$, among all evaluated metrics.
Furthermore, for the comparison of sensible heat flux (HFX) with GLDAS (Figure 16c), CTR displays a higher $rmsd$ and lower $cc$ than the other simulated metrics, albeit with a relatively low $std$. This suggests that while most other metrics' HFX simulations outperform CTR in terms of $rmsd$ and $cc$, their $std$ values are relatively increased, with EKGE and EMO ranking as the top two in both PSO and SCE for $std$. In contrast, for the comparison of latent heat flux (LH) with GLDAS (Figure 16d), CTR exhibits a lower $rmsd$ and higher $cc$ than the other simulated metrics but with a relatively high $std$. Notably, CTR's LH simulation surpasses the other metrics in both $rmsd$ and $cc$. Specifically, EKGE and EMO rank as the top two for both $std$ and $rmsd$ in both PSO and SCE, a notable contrast to the findings for HFX.
Overall, for $SM_{05cm}$ and $ST_{05cm}$, EKGE and EMO exhibit high Taylor diagram skills (briefed as TDS hereafter) in both PSO and SCE, significantly outperforming CTR. However, when compared with GLDAS, the TDS of HFX for all metrics in both PSO and SCE is superior to CTR, whereas the performance of LH shows the opposite. Evidently, the enhancement of surface soil moisture and temperature simulations often yields more divergent surface flux simulations, indicating either the high complexity of modeling both surface states and surface fluxes in arid regions or a biased LH in GLDAS.
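For reference, the TDS discussed above condenses three linked statistics: the standard deviation ($std$), the correlation coefficient ($cc$), and the centered root mean square difference ($rmsd$) [70]. A minimal sketch of their computation is given below, with synthetic arrays standing in for the simulations and observations.

```python
import numpy as np

def taylor_stats(sim, obs):
    """Statistics summarized by a Taylor diagram [70].
    The identity rmsd**2 = std_s**2 + std_o**2 - 2*std_s*std_o*cc
    ties the three quantities together geometrically on the diagram.
    """
    sim_a, obs_a = sim - sim.mean(), obs - obs.mean()
    std_s, std_o = sim_a.std(), obs_a.std()
    cc = (sim_a * obs_a).mean() / (std_s * std_o)
    rmsd = np.sqrt(((sim_a - obs_a) ** 2).mean())  # centered RMS difference
    return std_s, std_o, cc, rmsd

rng = np.random.default_rng(2)
obs = 0.15 + 0.03 * rng.standard_normal(720)   # synthetic observations
sim = obs + 0.01 * rng.standard_normal(720)    # synthetic simulations
print(taylor_stats(sim, obs))
```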

4.4. Configure and Benefit

Figure 17 compares the parameter ranges of the "best metric's simulations" between PSO and SCE, alongside the $kge$ values of the various metrics for surface soil moisture simulations against observations. In PSO, the optimal parameter range of EMO is larger than that of EKGE, whereas the opposite holds for SCE, where EMO's optimal parameter range is smaller than EKGE's (Figure 17A). The $kge$ values of $SM_{05cm}$ from the optimal simulations of the different metrics indicate that EKGE achieves the highest $kge$ value in PSO, whereas EMO attains the peak in SCE (Figure 17a). For $ST_{05cm}$, however, EKGE's optimal simulation yields the highest $kge$ value in both PSO and SCE (Figure 17c). In terms of the forecasted $SM_{05cm}$, EKGE consistently produces the highest $kge$ values in both PSO and SCE (Figure 17b). Conversely, for $ST_{05cm}$, CCS achieves the highest $kge$ values in both PSO and SCE, with EKGE following closely (Figure 17d).
Figure 18 illustrates the $RMSE_S$ reductions and $CC_S$ increases of the simulations for the various metrics relative to CTR during the calibration and validation periods. For $SM_{05cm}$ during the calibration period, most metrics except CCS exhibit a reduction in $RMSE_S$ compared to CTR, with EMO and EKGE showing the most significant improvements (Figure 18a), which is also reflected in their highest $CC_S$ (Figure 18e). During the validation period, EKGE and EMO stand out among the metrics, excluding CCS and PKGE, in terms of $RMSE_S$ reduction (Figure 18b), again accompanied by the highest $CC_S$ (Figure 18f). For $ST_{05cm}$ during the calibration period, $RMSE_S$ reductions relative to CTR are observed for most metrics except PKGE, PMO, and NSES, with MAES, RMSES, and EMO demonstrating the most pronounced improvements (Figure 18c), which also correspond to the highest $CC_S$ (Figure 18g). Similarly, during the validation period, $RMSE_S$ reductions are observed for most metrics except PKGE, PMO, and NSES, with MAES, RMSES, and EMO continuing to show the most significant improvements (Figure 18d), accompanied by the highest $CC_S$ (Figure 18h).
Overall, the parameter uncertainty range of EMO is slightly smaller than that of EKGE in both PSO and SCE, but the two metrics exhibit a trade-off in terms of $RMSE_S$ reduction and $CC_S$ increase during the forecast and calibration periods for $SM_{05cm}$ and $ST_{05cm}$. Specifically, EKGE shows the greatest $RMSE_S$ reduction for both the calibration and forecast periods of $SM_{05cm}$ and the largest $CC_S$ increase during the forecast period of $SM_{05cm}$. Notably, EMO demonstrates the largest $RMSE_S$ reduction and $CC_S$ increase for both the forecast and calibration of $ST_{05cm}$, while EKGE performs poorest. This is notably different from the clear advantage of EKGE observed in a nearby study [9], which can be attributed to the use of four layers in all objective metrics in this study; it suggests that the vertical dimension (number of layers) used in the EKGE metric can significantly affect its ability to improve $ST_{05cm}$ forecasting. Additionally, for $ST_{05cm}$, the failure of PKGE and PMO during both the forecast and calibration periods (e.g., even inferior to CTR) indicates the ineffectiveness of surface-layer-dominated Pareto objectives and highlights the limitations of adjusting subsurface simulations through improvements in surface simulations within the Noah LSM.
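As background for the $kge$ diagnostics above, the sketch below computes the standard Kling–Gupta efficiency [10], $kge = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$, together with a hypothetical layer-averaged variant; the equal averaging over four soil layers is only an assumed stand-in for an EKGE-style vertical extension, not the exact construction used in this study.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency [10]; 1 is a perfect score."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def layered_kge(sim_layers, obs_layers):
    """Hypothetical layer-averaged aggregation: mean KGE over soil layers
    (the paper's exact vertical composition of EKGE may differ)."""
    return np.mean([kge(s, o) for s, o in zip(sim_layers, obs_layers)])

rng = np.random.default_rng(3)
obs4 = [0.15 + 0.03 * rng.standard_normal(720) for _ in range(4)]  # four synthetic layers
sim4 = [o + 0.01 * rng.standard_normal(720) for o in obs4]
print(layered_kge(sim4, obs4))
```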

5. Discussion

Though a comparative analysis of the eight kinds of introduced multi-objective calibration has effectively portrayed the multifaceted impacts of metric differences in the SM—ST calibration objective on the month-long calibration–forecast framework, offering insights for ST modeling and forecasting over semi-arid regions, this study is nevertheless subject to limitations: (1) imperfect datasets, such as the unavailability of site-scale forcing data [27,72], which may impose certain spatial effects on the simulations; and (2) the absence of converged solutions for the calibration schemes based on the dominant Pareto metrics (e.g., PKGE and PMO) (Figure 7), which might indicate a need for extended search time or fewer physical constraints [42,43].
The uncertainty of the optimal parameters associated with PSO is higher than that of SCE for all metrics (Figure 6), indicating SCE's advantage in constraining parameter uncertainties, which is consistent with low-dimensional parameter optimizations [31,45,46,47,48]. Furthermore, the generally opposite tradeoff performances in SM—ST calibration between the "General" and "Vegetation" types (Section 4.2.1) suggest, on the one hand, that the "General" type has greater advantages in improving most surface simulations than the other types, consistent with many previous studies [7,17,22,23,25,26], although the "General" parameters are almost unobservable; on the other hand, the partially observable "Vegetation" parameters (e.g., via vegetation remote sensing) should be made variable in the LSM to promote simulation [8]. In particular, EKGE performs best overall among all metrics in reducing parameter spatial heterogeneities and uncertainties during calibration. Moreover, significant parameter heterogeneities are observed across different objective metrics with one algorithm, whereas the parameter heterogeneities are relatively closer across different algorithms with one metric (Figure S1-1 and Table 2), showing that the parameter heterogeneities are more likely affected by the objective metric than by the algorithm itself.
Moreover, the rapid convergence observed in CCS and NSES, as well as the non-convergence in PKGE and PMO, likely signify the presence of locality and sub-optimality, respectively, in the numerical solutions [42,43]. Furthermore, substantial variations in success rates across the different metrics are evident in PSO, whereas minimal changes are observed in SCE (Figure 8), suggesting that the evolutionary capability of the PSO algorithm is significantly affected by metric differences, which is likely attributable to PSO's evolution mechanism [9,68,69]. Notably, the performance differences resulting from different metrics with one algorithm are usually more significant than those resulting from different algorithms with one metric; this is especially evident in both the fitness curves (Figure 7) and the optimal objectives (Figure 9), showing that the metrics have more profound impacts than the algorithm itself on calibration performance in terms of convergence efficiency and objective spatial heterogeneity, which underscores the significance of metric selection in parameter optimization.
In addition, the metrics' differences in surface soil modeling (Table 6 and Table 7) and surface spatial complexity (Figure 15) during forecasting are mostly consistent with those during calibration (Table 4, Table 5, and Figure 12), e.g., EMO showing the most promising performance among the metrics, which indicates the calibration's robustness. Furthermore, EKGE and EMO show greater TDS advantages in the surface soil and HFX states than the other metrics (Figure 16). Moreover, EMO and EKGE have similar parameter uncertainties (Figure 17) but exhibit complex tradeoffs in promoting the surface simulations (Figure 18), which is likely attributable to the diversity of the objectives' internal metrics. In particular, compared to EKGE's advantages in the previous study [9], EKGE's weaknesses in $ST_{05cm}$ modeling and complexity reduction here can be attributed to EKGE's vertical dimension extension. This demonstrates the great vertical heterogeneity issues in regional land surface modeling [28], e.g., one parameter against four different layers, and the necessity of stratification in calibration objectives [38,39,40,41]. Nevertheless, EMO is not the overall best, but its relatively balanced performance with no notable weaknesses could simplify the vertical heterogeneity issues to some extent.
Overall, the fact that the objective metrics' diversity has greater influence on most performances of the calibration–forecast framework than the GSA itself, e.g., on parameter heterogeneity, calibration convergence, and surface spatial complexity, re-emphasizes the need for a favorable metric for a specific LSM calibration, for which EMO could be a better SM—ST objective. Given EMO's balanced performance in solving the various complexities of the calibration–forecast framework, it could likely enhance numerical applications against various soil observations [76], e.g., regional soil data assimilation schemes [77,78,79]. In addition, future work should strengthen the application of EMO to the recently and greatly developed land remote sensing datasets [31,32,33,34,35] and machine learning methods [80,81,82] to improve the spatial representativeness of surface states in regional numerical weather and/or climate modeling.

6. Conclusions

The surface spatial conditions are crucial for both regional hydrology and weather; however, the high spatial complexities involved in improving surface forecasting have challenged the LSM calibration–forecast framework, whose objective metrics' merits need comprehensive investigation to enable robust application. Using the ITPCAS dataset from 1 April to 31 July 2014, this study investigates the performance of various multi-objective metrics, combined with the multi-parameter tables as the criteria of the GSAs, in enhancing Noah LSM calibration and forecasting. Comprehensive comparisons are conducted among these enhancements, such as the optimal land parameters, objectives, and simulations, and the objective-informed forecasts produced by the different metrics, to identify the effect of metric diversity on SM—ST calibration and surface forecasts. The results show the following:
(1) The case presented herein can be succinctly characterized as a configuration–forecasting problem. Initially, in terms of model configuration, the forcing manifests as locally elevated surface temperatures (>5 °C), low relative humidity (<1%), weak wind speeds (<5 m s−1), a shift in wind direction from south to north, low atmospheric pressure (586 hPa), and minor hourly rainfall intensities (<5 mm h−1). Subsequently, within the default model parameter configuration, significant spatial disparities emerge in the parameter space because the static parameters are set to the globally optimal defaults, while the initial parameters are derived from forecasts spanning the preceding three months. The surface forecasting challenge manifests as consistent $SM_{05cm}$ simulations but poor $ST_{05cm}$ simulations. Considering the variations in default model parameters across the calibration and forecast periods, these periods are analyzed separately. Specifically, during the calibration and forecasting periods, the slope/goodness-of-fit (s/r2) values for $SM_{05cm}$ under the default parameter configuration are 0.22/0.14 and 0.15/0.05, respectively, with Gaussian fits of the simulation errors exhibiting positively skewed distributions centered at 0.16 and 0.13 m3 m−3. In contrast, the s/r2 values for the $ST_{05cm}$ simulations are −0.48/0.17 and −0.4/0.15, with their errors displaying broader bimodal distributions.
(2) During the calibration period, the "General" and "Vegetation" types generally show opposite tradeoffs in LSM parameter calibration performance in terms of heterogeneity and uncertainty. SCE consistently achieves lower parameter uncertainty than PSO, albeit at the cost of relatively higher spatial heterogeneity. For parameter uncertainty, MAES in PSO and CCS in SCE exhibit the smallest values for each metric. For parameter heterogeneity, EKGE and EMO in PSO yield the smallest, while EKGE in SCE displays the smallest. Furthermore, apart from PKGE and PMO, PSO typically attains better optimal solutions than SCE for the other metrics, albeit at the cost of relatively lower efficiency. Notably, for CCS, EKGE, and RMSES, the optimal solutions obtained by PSO demonstrate higher kernel densities than those obtained by SCE, while for EMO and MAES, the trend is reversed. Additionally, EMO and EKGE in PSO (SCE), with maximum s (r2) values of 0.96 (0.83) for $SM_{05cm}$ and 0.18 (0.23) for $ST_{05cm}$, respectively, showcase the optimal linear modeling (Table 4). For the Gaussian fitting of $E_{OS}$ of $SM_{05cm}$, EMO and EKGE in both PSO and SCE are closest to the normal distribution, whereas for that of $ST_{05cm}$, EKGE in PSO and MAES in SCE exhibit the closest resemblance to normality (Table 5). Moreover, for $SM_{05cm}$, EKGE in PSO and EMO in SCE have the best $RMSE_S$ and $CC_S$ performances, respectively, while for $ST_{05cm}$, EMO in both PSO and SCE has the best $RMSE_S$ and $CC_S$ performances (Figure 12).
(3) During the forecast period, EKGE and MAES in PSO (SCE), with maximum s (r2) values of 0.98 (0.84) for $SM_{05cm}$ and 0.14 (0.15) for $ST_{05cm}$, respectively, showcase the best linear modeling (Table 6). For $E_{OS}$ of $SM_{05cm}$, EMO and EKGE in both PSO and SCE are closest to the normal distribution, whereas for that of $ST_{05cm}$, EKGE in PSO and MAES in SCE exhibit the closest resemblance to normality (Table 7). For $SM_{05cm}$, EKGE has the best $RMSE_S$ and $CC_S$ performances, while for $ST_{05cm}$, EMO in PSO and RMSES in SCE have the best $RMSE_S$ and $CC_S$ performances, respectively. Furthermore, for $SM_{05cm}$ and $ST_{05cm}$, EKGE and EMO exhibit high Taylor skills in both PSO and SCE, significantly outperforming CTR. However, when compared with GLDAS, the Taylor skills of HFX for all metrics in both PSO and SCE are superior to CTR, whereas the performances of LH show the opposite.
(4) The parameter uncertainty range of EMO is slightly smaller than that of EKGE in both PSO and SCE, but the two metrics exhibit a trade-off in terms of $RMSE_S$ reduction and $CC_S$ increase during the forecast and calibration periods for $SM_{05cm}$ and $ST_{05cm}$. However, owing to the failure of its vertical dimension expansion, EKGE performs poorly in improving $ST_{05cm}$. Ultimately, EMO showcases the best overall benefit in surface forecast improvement among all metrics, together with encouragingly low parameter uncertainties, demonstrating the most promising application performance.
It is noted that for the optimal parameters, MAES in PSO and CCS in SCE exhibit the lowest uncertainties. EKGE and EMO in PSO and EKGE in SCE yield the smallest heterogeneity, while the other metrics demonstrate nearly irregular or non-discriminatory patterns. For the optimal solutions, apart from the Pareto-dominant metrics (e.g., PKGE and PMO), the other metrics do not alter the generality of the GSA algorithms (such as effectiveness and convergence domain). Notably, although CCS and NSES can accelerate GSA convergence, their impacts on model calibration and prediction remain highly uncertain or even negative. Furthermore, regarding the optimal modeling performance for calibration and forecast compared to observations, substantial variations exist among the different metrics. Among them, EMO and EKGE yield the best $SM_{05cm}$ modeling abilities, while EKGE and MAES exhibit the best $ST_{05cm}$ modeling abilities, respectively. For the Gaussian fitting of $E_{OS}$ of $SM_{05cm}$, EMO and EKGE perform optimally, while for that of $ST_{05cm}$, EKGE in PSO and MAES in SCE demonstrate the best performance. For spatial complexity reduction, EKGE performs best for $SM_{05cm}$, while EMO excels for $ST_{05cm}$.
Generally, the opposite tradeoffs of LSM calibrations between the "General" and "Vegetation" types could suggest, respectively, the favorability of the "General" parameters in LSM modeling physics [7,17,22,23,25,26] and the pressing observational demand for the "Vegetation" parameters themselves [8]. SCE achieves smaller parameter uncertainties than PSO, but possibly at the cost of higher parameter heterogeneities, while PSO achieves better optimal objectives than SCE, but likely at the cost of calibration efficiency [9]. Notably, most performance differences of the calibration–forecast framework resulting from the metrics' differences surpass those resulting from the GSAs' differences, which emphasizes the significance of the objective metrics' diversity. EKGE and EMO show low parameter heterogeneities, which should account for their best calibration–forecast performances for $SM_{05cm}$. However, EKGE also shows poor calibration–forecast performances for $ST_{05cm}$, which should be attributed to the unsolved vertical heterogeneities [28]. In particular, EMO's relatively balanced performance, with no notable weaknesses in solving the various surface spatial complexities, could greatly benefit the calibration–forecast framework.
Overall, the objective metrics' diversity shows a greater influence on the calibration–prediction framework than the GSAs' differences. Furthermore, EKGE and EMO favor parameter heterogeneity reduction and simulation complexity reduction, respectively. However, due to the great vertical heterogeneities [28], a holistic SM—ST objective that exactly matches the observational dimensions could generally be less effective, which could hopefully be addressed by calibration objectives that consider the vertical stratification of metrics [38,39,40,41]. Given EMO's balanced performance in this study, it is recommended for application in future calibrations to mitigate the vertical heterogeneity issue. This could enhance the knowledge of metrics' advantages in solving the complexities of the LSM's parameters and simulations and promote the application of the calibration–forecast framework, thereby potentially improving regional surface forecasting over semiarid regions.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/atmos15091107/s1: Figure S1: Effects on optimal parameters; Figure S2: Effects on optimal simulations; Figure S3: Effects on forecasts' improvement.

Author Contributions

Conceptualization, methodology, validation, and formal analysis, Y.G. (Yakai Guo); investigation, resources, and data curation, Y.G. (Yakai Guo), C.S., G.N. and Y.G. (Yong Gao); writing—original draft preparation, Y.G. (Yakai Guo) and C.S.; writing—review and editing, Y.G. (Yakai Guo), C.S., G.N., D.X. and Y.G. (Yong Gao); visualization, G.N.; supervision, Y.G. (Yakai Guo) and C.S.; project administration, Y.G. (Yakai Guo) and B.Y.; funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Henan Provincial Natural Science Foundation Project (grant numbers: 242300421367, 222300420468), the China Meteorological Administration Meteorological Development and Planning Institute Special Research Project (grant number: JCXM2024014), and the Open Project of KLME CIC-FEMD NUIST (grant numbers: KLME201906, KLME202407).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The multiscale observation network on the central Tibetan Plateau (2010–2021) soil temperature and moisture data, and the China meteorological forcing data set (1979–2018) used in this study are openly available from the National Tibetan Plateau Data Center (TPDC; https://data.tpdc.ac.cn, accessed on 22 July 2024). The GLDAS land reanalysis data used in this study are openly available from Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov, accessed on 20 July 2024). The experimental data presented in this study are available on request from the corresponding author, and the data are not publicly available due to privacy concerns and ongoing research using the dataset.

Acknowledgments

We would like to express our gratitude to the Henan Meteorological Bureau, the China Meteorological Administration Meteorological Observation Centre, and the China Meteorological Administration Meteorological Development and Planning Institute for their support in carrying out this study. Special thanks go to the anonymous reviewers for their valuable feedback, which greatly improved this study. We also thank all those who made efforts to advance this work, as well as the fellow travelers encountered along the way.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Min, J.; Guo, Y.; Wang, G. Impacts of Soil Moisture on Typical Frontal Rainstorm in Yangtze River Basin. Atmosphere 2016, 7, 42.
2. Li, K.; Zhang, J.; Wu, L.; Yang, K.; Li, S. The role of soil temperature feedbacks for summer air temperature variability under climate change over East Asia. Earth's Future 2022, 10, e2021EF002377.
3. García-García, A.; Cuesta-Valero, F.J.; Miralles, D.G.; Mahecha, M.D.; Quaas, J.; Reichstein, M.; Zscheischler, J.; Peng, J. Soil heat extremes can outpace air temperature extremes. Nat. Clim. Chang. 2023, 13, 1237–1241.
4. Guo, Y.; Shao, C.; Su, A. Investigation of Land-Atmosphere Coupling during the Extreme Rainstorm of 20 July 2021 over Central East China. Atmosphere 2023, 14, 1474.
5. Gao, Y.; Li, K.; Chen, F.; Jiang, Y.; Lu, C. Assessing and improving Noah-MP land model simulations for the central Tibetan Plateau. J. Geophys. Res. Atmos. 2015, 120, 9258–9278.
6. Li, C.; Lu, H.; Yang, K.; Han, M.; Wright, J.; Chen, Y.; Yu, L.; Xu, S.; Huang, X.; Gong, W. The Evaluation of SMAP Enhanced Soil Moisture Products Using High-Resolution Model Simulations and In-Situ Observations on the Tibetan Plateau. Remote Sens. 2018, 10, 535.
7. Chen, Y.; Yang, K.; He, J.; Qin, J.; Shi, J.; Du, J.; He, Q. Improving land surface temperature modeling for dry land of China. J. Geophys. Res. Atmos. 2011, 116, D20104.
8. He, Q.; Lu, H.; Yang, K.; Zhao, L.; Zou, M. Improving Land Surface Temperature Simulation of NOAH-MP on the Tibetan Plateau. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 6217–6220.
9. Guo, Y.; Yuan, B.; Su, A.; Shao, C.; Gao, Y. Calibration for Improving the Medium-Range Soil Temperature Forecast of a Semiarid Region over Tibet: A Case Study. Atmosphere 2024, 15, 591.
10. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91.
11. Kumar, S.; Kolassa, J.; Reichle, R.; Crow, W.; Lannoy, G.; Rosnay, P.; MacBean, N.; Girotto, M.; Fox, A.; Quaife, T.; et al. An agenda for land data assimilation priorities: Realizing the promise of terrestrial water, energy, and vegetation observations from space. J. Adv. Model. Earth Syst. 2022, 14, e2022MS003259.
12. Ma, Y.M.; Yao, T.D.; Zhong, L.; Wang, B.B.; Xu, X.D.; Hu, Z.Y.; Ma, W.Q.; Sun, F.L.; Han, C.B.; Li, M.S.; et al. Comprehensive study of energy and water exchange over the Tibetan Plateau: A review and perspective: From GAME/Tibet and CAMP/Tibet to TORP, TPEORP, and TPEITORP. Earth-Sci. Rev. 2023, 237, 104312.
13. Kennedy, J.; Eberhart, R.C. Particle swarm optimization. In Proceedings of the ICNN'95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
14. Duan, Q.; Sorooshian, S.; Gupta, V.K. Optimal use of the SCE-UA global optimization method for calibrating watershed models. J. Hydrol. 1994, 158, 265–284.
15. Liu, Y.Q.; Gupta, H.V.; Sorooshian, S.; Bastidas, L.A.; Shuttleworth, W.J. Exploring parameter sensitivities of the land surface using a locally coupled land-atmosphere model. J. Geophys. Res. Atmos. 2004, 109, D21101.
16. Bastidas, L.A.; Hogue, T.S.; Sorooshian, S.; Gupta, H.V.; Shuttleworth, W.J. Parameter sensitivity analysis for different complexity land surface models using multicriteria methods. J. Geophys. Res. Atmos. 2006, 111, 20101.
17. Peng, F.; Sun, G.D. Identifying Sensitive Model Parameter Combinations for Uncertainties in Land Surface Process Simulations over the Tibetan Plateau. Water 2019, 11, 1724.
18. Gudmundsson, L.; Cuntz, M. Soil Parameter Model Intercomparison Project (SP-MIP): Assessing the influence of soil parameters on the variability of Land Surface Models. In Proceedings of the GEWEX–SoilWat Workshop, Leipzig, Germany, 28–30 June 2016; pp. 1–6. Available online: https://www.gewexevents.org/wp-content/uploads/GLASS2017_SP-MIP_Protocol.pdf (accessed on 30 August 2024).
19. Chaney, N.W.; Herman, J.D.; Ek, M.B.; Wood, E.F. Deriving global parameter estimates for the Noah land surface model using FLUXNET and machine learning. J. Geophys. Res. Atmos. 2016, 121, 13218–13235.
20. Zeng, Y.; Anne, V.; Or, D.; Cuntz, M.; Gudmundsson, L.; Weihermueller, L.; Kollet, S.; Vanderborght, J.; Vereecken, H. GEWEX-ISMC SoilWat Project: Taking Stock and Looking Ahead. In Proceedings of the GEWEX GLASS Meeting, Online, 23–25 November 2020; pp. 4–9. Available online: https://gewex.org/gewex-content/files_mf/1633983474Q22021.pdf (accessed on 30 August 2024).
21. Stephens, G.; Polcher, J.; Zeng, X.B.; van Oevelen, P.; Poveda, G.; Bosilovich, M.; Ahn, M.H.; Balsamo, G.; Duan, Q.Y.; Hegerl, G.; et al. The First 30 Years of GEWEX. Bull. Am. Meteorol. Soc. 2023, 104, E126–E157.
22. Zhao, X.; Liu, C.; Tong, B.; Li, Y.; Wang, L.; Ma, Y.; Gao, Z. Study on Surface Process Parameters and Soil Thermal Parameters at Shiquanhe in the Western Qinghai-Xizang Plateau. Plateau Meteorol. 2021, 40, 711–723. (In Chinese)
23. Sun, S.; Chen, B.; Che, T.; Zhang, H.; Chen, J.; Che, M.; Lin, X.; Guo, L. Simulating the Qinghai—Tibetan Plateau seasonal frozen soil moisture and improving model's parameters—A case study in the upper reaches of Heihe River. Plateau Meteorol. 2017, 36, 643–656. (In Chinese)
24. Chen, F.; Dudhia, J. Coupling an advanced land-surface/hydrology model with the Penn State/NCAR MM5 modeling system. Part I, Model implementation and sensitivity. Mon. Weather Rev. 2001, 129, 569–585.
25. Hogue, T.S.; Bastidas, L.A.; Gupta, H.V.; Sorooshian, S. Evaluating model performance and parameter behavior for varying levels of land surface model complexity. Water Resour. Res. 2006, 42, W08430.
26. Rosero, E.; Yang, Z.L.; Gulden, L.E.; Niu, G.Y.; Gochis, D.J. Evaluating Enhanced Hydrological Representations in Noah LSM over Transition Zones: Implications for Model Development. J. Hydrometeorol. 2009, 10, 600–622.
27. Yang, K.; Qin, J.; Zhao, L.; Chen, Y.; Tang, W.; Han, M.; Lazhu; Chen, Z.; Lv, N.; Ding, B.; et al. A multi-scale soil moisture and freeze-thaw monitoring network on the third pole. Bull. Am. Meteorol. Soc. 2013, 94, 1907–1916.
28. Yang, K.; Chen, Y.Y.; Qin, J. Some practical notes on the land surface modeling in the Tibetan Plateau. Hydrol. Earth Syst. Sci. 2009, 13, 687–701.
29. Coon, E.T.; David Moulton, J.; Painter, S.L. Managing complexity in simulations of land surface and near-surface processes. Environ. Model. Softw. 2016, 78, 134–149.
30. Fisher, R.A.; Koven, C.D. Perspectives on the Future of Land Surface Models and the Challenges of Representing Complex Terrestrial Systems. J. Adv. Model. Earth Syst. 2020, 12, e2018MS001453.
31. Crow, W.T.; Wood, E.F.; Pan, M. Multiobjective calibration of land surface model evapotranspiration predictions using streamflow observations and spaceborne surface radiometric temperature retrievals. J. Geophys. Res. Atmos. 2003, 108, 4725.
32. Coudert, B.; Ottle, C.; Boudevillain, B.; Demarty, J.; Guillevic, P. Contribution of thermal infrared remote sensing data in multiobjective calibration of a dual-source SVAT model. J. Hydrometeorol. 2006, 7, 404–420.
33. Khaki, M. Land Surface Model Calibration Using Satellite Remote Sensing Data. Sensors 2023, 23, 1848.
34. Dembélé, M.; Hrachowitz, M.; Savenije, H.H.G.; Mariéthoz, G.; Schaefli, B. Improving the Predictive Skill of a Distributed Hydrological Model by Calibration on Spatial Patterns with Multiple Satellite Data Sets. Water Resour. Res. 2020, 56, e2019WR026085.
35. Zhou, J.; Wu, Z.; Crow, W.T.; Dong, J.; He, H. Improving Spatial Patterns Prior to Land Surface Data Assimilation via Model Calibration Using SMAP Surface Soil Moisture Data. Water Resour. Res. 2020, 56, e2020WR027770.
36. Abhervé, R.; Roques, C.; Gauvain, A.; Longuevergne, L.; Louaisil, S.; Aquilina, L.; de Dreuzy, J.-R. Calibration of groundwater seepage against the spatial distribution of the stream network to assess catchment-scale hydraulic properties. Hydrol. Earth Syst. Sci. 2023, 27, 3221–3239.
37. Adeyeri, O.E.; Folorunsho, A.H.; Ayegbusi, K.I.; Bobde, V.; Adeliyi, T.E.; Ndehedehe, C.E.; Akinsanola, A.A. Land surface dynamics and meteorological forcings modulate land surface temperature characteristics. Sustain. Cities Soc. 2024, 101, 105072.
38. Cunha, A.P.M.A.; Alvalá, R.C.S.; Sampaio, G.; Shimizu, M.H.; Costa, M.H. Calibration and Validation of the Integrated Biosphere Simulator (IBIS) for a Brazilian Semiarid Region. J. Appl. Meteorol. Clim. 2013, 52, 2753–2770.
39. Burke, E.J.; Shuttleworth, W.J.; Houser, P.R. Impact of horizontal and vertical heterogeneities on retrievals using multiangle microwave brightness temperature data. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1495–1501.
40. Hagedorn, B. Hydrograph separation through multi objective optimization: Revealing the importance of a temporally and spatially constrained baseflow solute source. J. Hydrol. 2020, 590, 125349.
41. Kuban, M.; Parajka, J.; Tong, R.; Pfeil, I.; Vreugdenhil, M.; Sleziak, P.; Adam, B.; Szolgay, J.; Kohnová, S.; Hlavcová, K. Incorporating Advanced Scatterometer Surface and Root Zone Soil Moisture Products into the Calibration of a Conceptual Semi-Distributed Hydrological Model. Water 2021, 13, 3366.
42. Zitzler, E.; Thiele, L.; Laumanns, M.; Fonseca, C.M.; da Fonseca, V.G. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Trans. Evol. Comput. 2003, 7, 117–132.
43. Coello, C.A.C.; Lamont, G.B.; Veldhuizen, D.A.V. Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd ed.; Springer: New York, NY, USA, 2007; pp. 1–800.
44. Loridan, T.; Grimmond, C.S.B.; Grossman-Clarke, S.; Chen, F.; Tewari, M.; Manning, K.; Martilli, A.; Kusaka, H.; Best, M. Trade-offs and responsiveness of the single-layer urban canopy parametrization in WRF: An offline evaluation using the MOSCEM optimization algorithm and field observations. Q. J. R. Meteorol. Soc. 2010, 136, 997–1019.
45. Yapo, P.; Gupta, H.; Sorooshian, S. Multi-objective global optimization for hydrologic models. J. Hydrol. 1998, 204, 83–97.
46. Vrugt, J.A.; Gupta, H.V.; Bastidas, L.A.; Bouten, W.; Sorooshian, S. Effective and efficient algorithm for multiobjective optimization of hydrologic models. Water Resour. Res. 2003, 39, 1214.
47. Fenicia, F.; Savenije, H.H.G.; Matgen, P.; Pfister, L. A comparison of alternative multiobjective calibration strategies for hydrological modeling. Water Resour. Res. 2007, 43, W03434.
48. Deng, L.; Guo, S.; Yin, J.; Zeng, Y.; Chen, K. Multi-objective optimization of water resources allocation in Han River basin (China) integrating efficiency, equity and sustainability. Sci. Rep. 2022, 12, 798.
49. Dumedah, G.; Berg, A.A.; Wineberg, M. An Integrated Framework for a Joint Assimilation of Brightness Temperature and Soil Moisture Using the Nondominated Sorting Genetic Algorithm II. J. Hydrometeorol. 2011, 12, 1596–1609.
50. Li, M.; Yang, S.; Liu, X. Pareto or Non-Pareto: Bi-Criterion Evolution in Multiobjective Optimization. IEEE Trans. Evol. Comput. 2016, 20, 645–665.
51. Liu, Y.; Zhu, N.; Li, K.; Li, M.; Zheng, J.; Li, K. An angle dominance criterion for evolutionary many-objective optimization. Inf. Sci. 2020, 509, 376–399.
52. Pool, S.; Vis, M.; Seibert, J. Evaluating model performance: Towards a non-parametric variant of the Kling-Gupta efficiency. Hydrol. Sci. J. 2018, 63, 1941–1953.
53. Knoben, W.J.M.; Freer, J.E.; Woods, R.A. Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrol. Earth Syst. Sci. 2019, 23, 4323–4331.
54. Vrugt, J.A.; de Oliveira, D.Y. Confidence intervals of the Kling-Gupta efficiency. J. Hydrol. 2022, 612, 127968.
55. Mathevet, T.; Le Moine, N.; Andréassian, V.; Gupta, H.; Oudin, L. Multi-objective assessment of hydrological model performances using Nash–Sutcliffe and Kling–Gupta efficiencies on a worldwide large sample of watersheds. Comptes Rendus Geosci. 2023, 355, 117–141.
56. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
57. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022, 15, 5481–5487.
58. Armstrong, R.A. Should Pearson's correlation coefficient be avoided? Ophthalmic Physiol. Opt. 2019, 39, 316–327.
59. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768.
60. Cheng, C.L.; Shalabh; Garg, G. Coefficient of determination for multiple measurement error models. J. Multivar. Anal. 2014, 126, 137–152.
61. Contessi, D.; Recati, A.; Rizzi, M. Phase diagram detection via Gaussian fitting of number probability distribution. Phys. Rev. B 2023, 107, L121403.
62. Eberhart, R.C.; Shi, Y. Comparing inertia weights and constriction factors in particle swarm optimization. Proc. IEEE 2000, 1, 84–88.
63. Clerc, M.; Kennedy, J. The particle swarm—Explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evol. Comput. 2002, 6, 58–73.
64. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle Swarm Optimization: A Comprehensive Survey. IEEE Access 2022, 10, 10031–10061.
65. Naeini, M.R.; Analui, B.; Gupta, H.V.; Duan, Q.; Sorooshian, S. Three decades of the Shuffled Complex Evolution (SCE-UA) optimization algorithm: Review and applications. Sci. Iran. Trans. A Civ. Eng. 2019, 26, 2015–2031.
66. Deng, Y.; Yang, Q.; Zuo, H.; Li, W. Land Surface Model and Particle Swarm Optimization Algorithm Based on the Model-Optimization Method for Improving Soil Moisture Simulation in a Semi-Arid Region. PLoS ONE 2016, 11, e0151576.
67. Yang, Q.; Ling, C.; Du, B.; Wang, L.; Yang, Y. Application of the particle swarm optimization in the land surface model parameters calibration. Plateau Meteorol. 2017, 36, 1060–1071. (In Chinese)
68. Ketabchi, H.; Ataie-Ashtiani, B. Evolutionary algorithms for the optimal management of coastal groundwater: A comparative study toward future challenges. J. Hydrol. 2015, 520, 193–213.
69. Jeon, J.-H.; Park, C.-G.; Engel, B. Comparison of Performance between Genetic Algorithm and SCE-UA for Calibration of SCS-CN Surface Runoff Simulation. Water 2014, 6, 3433–3456.
70. Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192.
71. Rodell, M.; Houser, P.R.; Jambor, U.; Gottschalck, J.; Mitchell, K.; Meng, C.; Arsenault, K.; Cosgrove, B.; Radakovich, J.; Bosilovich, M.; et al. The Global Land Data Assimilation System. Bull. Am. Meteorol. Soc. 2004, 85, 381–394.
72. Yang, K.; He, J.; Tang, W.; Lu, H.; Qin, J.; Chen, Y.; Li, X. China Meteorological Forcing Dataset (1979–2018). TPDC. 2019. Available online: https://data.tpdc.ac.cn/en/data/8028b944-daaa-4511-8769-965612652c49 (accessed on 30 August 2024).
73. Yu, Y.; Pi, Y.; Yu, X.; Ta, Z.; Sun, L.; Disse, M.; Zeng, F.; Li, Y.; Chen, X.; Yu, R. Climate change, water resources and sustainable development in the arid and semi-arid lands of Central Asia in the past 30 years. J. Arid. Land 2018, 11, 1–14.
74. Yu, Y.; Chen, X.; Malik, I.; Wistuba, M.; Cao, Y.G.; Hou, D.D.; Ta, Z.J.; He, J.; Zhang, L.Y.; Yu, R.D.; et al. Spatiotemporal changes in water, land use, and ecosystem services in Central Asia considering climate changes and human activities. J. Arid. Land 2021, 13, 881–890.
75. Chen, Y.Y.; Yang, K.; Zhou, D.G.; Qin, J.; Guo, X.F. Improving the Noah Land Surface Model in Arid Regions with an Appropriate Parameterization of the Thermal Roughness Length. J. Hydrometeorol. 2010, 11, 995–1006.
76. Li, M.; Wu, P.; Ma, Z. A comprehensive evaluation of soil moisture and soil temperature from third-generation atmospheric and land reanalysis data sets. Int. J. Clim. 2020, 40, 5744–5766.
77. Min, J.Z.; Che, L.; Guo, Y.K. Testing and application of a land data assimilation system using automatic weather station data. Trans. Atmos. Sci. 2016, 39, 318–328. (In Chinese)
78. Guo, Y.; Wang, G.; Shen, F.; Min, J. Comparison of two correction schemes on soil moisture assimilation based on the ensemble square root filter. Jiangsu Agric. Sci. 2018, 46, 210–218. (In Chinese)
79. Li, X.; Liu, F.; Ma, C.; Hou, J.; Zheng, D.; Ma, H.; Bai, Y.; Han, X.; Vereecken, H.; Yang, K.; et al. Land Data Assimilation: Harmonizing Theory and Data in Land Surface Process Studies. Rev. Geophys. 2024, 62, e2022RG000801.
80. Bastrikov, V.; MacBean, N.; Bacour, C.; Santaren, D.; Kuppel, S.; Peylin, P. Land surface model parameter optimization using in situ flux data: Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2). Geosci. Model Dev. 2018, 11, 4739–4754.
81. Sawada, Y. Machine Learning Accelerates Parameter Optimization and Uncertainty Assessment of a Land Surface Model. J. Geophys. Res. Atmos. 2020, 125, e2020JD032688.
82. Yu, Y.; Cao, Y.G.; Hou, D.D.; Disse, M.; Brieden, A.; Zhang, H.Y.; Yu, R.D. The study of artificial intelligence for predicting land use changes in an arid ecosystem. J. Geogr. Sci. 2022, 32, 717–734.
Figure 1. The pseudo code of the algorithms used in this study [13,14].
Figure 2. The pseudo code of the evaluator used in this study.
Figure 3. (A) Noah LSM description. (B) Soil observation network, (a) Tibet and the large scale soil observation network location (box); (b) site locations (filled dots) in the large scale soil observation network, with two types of observation networks (the mesoscale and small scale are in red and blue boxes), roads (white line), and the Naqu city (red asterisk); (c) soil sampling sites (filled dots) in the mesoscale soil network (bold black dots were our study sites).
Figure 4. Flowchart of this study. OBS = observations, SIM = simulations, SM = soil moisture, ST = soil temperature, HFX = sensible heat flux, LH = latent heat flux. The superscript * denotes the optimum of the LSM parameter space.
Figure 5. The case overview in the CTR experiment. (a–d) The meteorological forcing, derived from Ref. [9]. (e) The threshold-normalized default parameters of different sites (colored) for calibration. (f,g) The linear and Gaussian fits of the errors between observation and simulation ($E_{OS}$) for $SM_{05cm}$ over different periods (colored; the whole study period in black, the calibration and validation periods in red and blue, respectively). (h,i) are the same as (f,g), but for $ST_{05cm}$.
Figure 6. The different metrics' parameter spatial uncertainties. (a) The stacked interquartile ranges (IQR, colored) of the different optimal parameters for PSO. (b) is the same as (a), but for SCE. (c) The boxplot of the IQR ensembles (i.e., the IQR distributions, IQRD) of the optimal parameter space for the various metrics, and (d) their outlier numbers. The crosses and asterisks represent the extreme and mild outliers, respectively.
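As a reading aid for the IQR-based uncertainty summary in Figure 6, the sketch below counts mild and extreme outliers with the usual Tukey fences; the 1.5x and 3x IQR multipliers are the conventional boxplot defaults and are assumed here, not taken from the paper.

```python
import numpy as np

def iqr_outliers(values):
    """IQR of a sample plus Tukey-fence outlier counts
    (1.5x IQR = mild fence, 3x IQR = extreme fence; conventional defaults)."""
    v = np.asarray(values)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # mild fence
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # extreme fence
    extreme = np.sum((v < outer[0]) | (v > outer[1]))
    mild = np.sum((v < inner[0]) | (v > inner[1])) - extreme
    return float(iqr), int(mild), int(extreme)
```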
Figure 7. The different metrics' impact on calibration effectiveness and efficiency. Fitness curves of different sites (colored) against Noah runs for the metrics CCS (a), EKGE (b), EMO (c), MAES (d), NSES (e), PKGE (f), PMO (g), and RMSES (h) in PSO (solid) and SCE (dashed). The fitness shown is Pb, except for PKGE and PMO, whose fitness is Pm.
Figure 8. Success rate curves of different sites (colored) against Noah runs for metrics CCS (a), EKGE (b), EMO (c), MAES (d), NSES (e), PKGE (f), PMO (g), and RMSES (h) in PSO (top) and SCE (bottom).
Figure 9. The different metrics' impact on optimal objective uncertainties against sites for PSO and SCE. Asterisks represent outliers.
Figure 10. Different metrics' best linear fits against sites for (a) $SM_{05cm}$ and (b) $ST_{05cm}$ during the calibration period. CTR, PSO, and SCE are plotted in black, red, and blue, respectively.
Figure 11. Different metrics' best Gaussian fits of $E_{OS}$ against sites for (a) $SM_{05cm}$ and (b) $ST_{05cm}$ during the calibration period. CTR, PSO, and SCE are plotted in black, red, and blue, respectively. The characteristic "amplitude [peak position, peak width]" values of the Gaussian fits are also displayed. Note that two amplitudes with an identical peak position may be summed into one amplitude.
Figure 12. The different metrics' impact on the optimal surface simulation. (a) The temporally varied $RMSE_S$ and (b) its boxplot for $SM_{05cm}$. (c,d) are the same as (a,b), but showing $CC_S$ for $SM_{05cm}$. (e–h) are the same as (a–d), but for $ST_{05cm}$; note that only the best metric performance is shown in (e,g) to avoid overlaps. The crosses and asterisks represent the extreme and mild outliers, respectively.
Figure 13. Different metrics' best linear fits against sites for (a) $SM_{05cm}$ and (b) $ST_{05cm}$ during the forecast period. CTR, PSO, and SCE are plotted in black, red, and blue, respectively.
Figure 14. Different metrics' best Gaussian fits of $E_{OS}$ against sites for (a) $SM_{05cm}$ and (b) $ST_{05cm}$ during the forecast period. CTR, PSO, and SCE are plotted in black, red, and blue, respectively. The characteristic "amplitude [peak position, peak width]" values of the Gaussian fits are also displayed. Note that two amplitudes with an identical peak position may be summed into one amplitude.
Figure 15. The different metrics' impact on the soil forecast. (a) The temporally varied $RMSE_S$ and (b) its boxplot for $SM_{05cm}$. (c,d) are the same as (a,b), but showing $CC_S$ for $SM_{05cm}$. (e–h) are the same as (a–d), but for $ST_{05cm}$; note that only the best metric performance is shown in (e,g) to avoid overlaps. The crosses and asterisks represent the extreme and mild outliers, respectively.
Figure 16. The different metrics' impact on the surface forecast. (a,b) Taylor diagrams against observations for $SM_{05cm}$ and $ST_{05cm}$, respectively; CTR and GLDAS are shown as crosses and asterisks, respectively, while PSO and SCE are shown as circles and triangles, respectively. (c,d) Taylor diagrams against GLDAS for HFX and LH, respectively; the CTR values are shown as crosses.
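The Taylor diagrams in Figure 16 position each experiment by three linked statistics; a small sketch of how these are conventionally computed (standard definitions, not code from the paper) follows.

```python
import numpy as np

def taylor_stats(sim, ref):
    """Standard-deviation ratio, correlation, and centered RMS difference
    that place one model point on a Taylor diagram (standard definitions)."""
    sim, ref = np.asarray(sim), np.asarray(ref)
    std_ratio = sim.std() / ref.std()
    corr = np.corrcoef(sim, ref)[0, 1]
    crmsd = np.sqrt(np.mean(((sim - sim.mean()) - (ref - ref.mean())) ** 2))
    return std_ratio, corr, crmsd
```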
Figure 17. The best LSM parameters' configuration (A), and the different metrics' impact on the $kge$ indicators of the surface simulation (B) in PSO and SCE. In (B), (a,b) show the discrete distribution (box and scatter) and its density (ridge) of $kge$ for $SM_{05cm}$ during the calibration and forecast periods, respectively, while (c,d) are the same as (a,b), but for $ST_{05cm}$. Note that all metrics are colored according to the legend, except CTR in grey.
Figure 18. The different metrics' impact on the LSM's spatial difference reduction and similarity increment. (a) Time-varied $RMSE_S$ reduction (PSO, solid; SCE, dotted) compared to CTR (left) and the box-plotted $RMSE_S$ reduction during the calibration period for $SM_{05cm}$; (b) is the same as (a), but for the validation period. (c,d) are the same as (a,b), but for $ST_{05cm}$. (e–h) are the same as (a–d), but for the $CC_S$ increments compared to CTR.
Table 1. Description of the objective metrics used in this study.

Metric | Description | Reference | Formula * | Direction, Optima

CCS | Correlation coefficients | [58] | $\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\frac{\sum_{i=1}^{n_t}\left[\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)\left(o_i^{e,l}-\overline{o_{n_t}^{e,l}}\right)\right]}{\sqrt{\sum_{i=1}^{n_t}\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)^2}\cdot\sqrt{\sum_{i=1}^{n_t}\left(o_i^{e,l}-\overline{o_{n_t}^{e,l}}\right)^2}}$ | maximum, 1

EKGE | Enhanced Kling–Gupta efficiency | [9] | $\widetilde{CC}=\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\frac{\sum_{i=1}^{n_t}\left[\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)\left(o_i^{e,l}-\overline{o_{n_t}^{e,l}}\right)\right]}{\sqrt{\sum_{i=1}^{n_t}\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)^2}\sqrt{\sum_{i=1}^{n_t}\left(o_i^{e,l}-\overline{o_{n_t}^{e,l}}\right)^2}}$, $\widetilde{M}=\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\frac{\overline{s_{n_t}^{e,l}}}{\overline{o_{n_t}^{e,l}}}$, $\widetilde{STD}=\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\frac{\sqrt{\sum_{i=1}^{n_t}\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)^2/n_t}}{\sqrt{\sum_{i=1}^{n_t}\left(o_i^{e,l}-\overline{o_{n_t}^{e,l}}\right)^2/n_t}}$, $1-\sqrt{\left(\widetilde{CC}-1\right)^2+\left(\widetilde{STD}-1\right)^2+\left(\widetilde{M}-1\right)^2}$ | maximum, 1

EMO | Enhanced multiple objectives | | $0.25\times\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\left[\left(1-\mathrm{abs}(cc)\right)+rmse+\left(1-nse\right)+mae\right]$ | minimum, 0

MAES | Mean absolute errors | [57] | $\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\frac{1}{n_t}\sum_{i=1}^{n_t}\left|s_i^{e,l}-o_i^{e,l}\right|$ | minimum, 0

NSES | Nash–Sutcliffe efficiencies | [55] | $\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\left[1-\frac{\sum_{i=1}^{n_t}\left(s_i^{e,l}-o_i^{e,l}\right)^2}{\sum_{i=1}^{n_t}\left(s_i^{e,l}-\overline{s_{n_t}^{e,l}}\right)^2}\right]$ | maximum, 1

PKGE | Pareto-dominant KGE | | $kge^{e,l}$, $e\in\{1,\dots,n_e\}$, $l\in\{1,\dots,n_l\}$; if $kge^{e,l+1}<kge^{e,l}$, dominated; else, non-dominated | maximum, 1

PMO | Pareto-dominant MO | | $\left(1-\mathrm{abs}(cc^{e,l})\right)+rmse^{e,l}+\left(1-nse^{e,l}\right)+mae^{e,l}$, $e\in\{1,\dots,n_e\}$, $l\in\{1,\dots,n_l\}$; if $1-\mathrm{abs}(cc^{e,l})<1-\mathrm{abs}(cc^{e,l+1})$, $rmse^{e,l}<rmse^{e,l+1}$, $1-nse^{e,l}<1-nse^{e,l+1}$, and $mae^{e,l}<mae^{e,l+1}$, dominated; else, non-dominated | minimum, 0

RMSES | Root mean square errors | [56] | $\frac{1}{n_e}\sum_{e}^{n_e}\frac{1}{n_l}\sum_{l}^{n_l}\sqrt{\frac{\sum_{i=1}^{n_t}\left(s_i^{e,l}-o_i^{e,l}\right)^2}{n_t}}$ | minimum, 0

* Note that the superscripts $e$ and $l$ are the variable and layer indexes, respectively, and $n_e$ and $n_l$ are the total numbers of variables and layers, respectively. For EKGE, the factors $\widetilde{CC}$, $\widetilde{M}$, and $\widetilde{STD}$ are the vectored objective statistics (correlation coefficient, mean value, and standard deviation, respectively). For EMO, PKGE, and PMO, the lower-case symbols (e.g., $kge$, $cc$, $nse$, $rmse$, and $mae$) represent one-dimensional objectives over temporal sequences [10,55,56,57,58]. The superiority and inferiority relationship between different objectives in Pareto optimality is determined with the non-dominated sorting method; e.g., the top-layer dimensional objective is assumed to be the dominated Pareto solution.
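To make the one-dimensional building blocks of Table 1 concrete, the sketch below implements the lower-case objectives ($cc$, $rmse$, $mae$, $nse$, $kge$) together with the EMO composite and a generic Pareto-dominance test for minimization. The averaging over variables $e$ and layers $l$ is omitted, $nse$ follows the standard definition of Ref. [55], and the function names are illustrative; this is a sketch, not the study's code.

```python
import numpy as np

def cc(s, o):
    """Pearson correlation coefficient between simulation s and observation o."""
    return np.corrcoef(s, o)[0, 1]

def rmse(s, o):
    """Root mean square error."""
    s, o = np.asarray(s), np.asarray(o)
    return float(np.sqrt(np.mean((s - o) ** 2)))

def mae(s, o):
    """Mean absolute error."""
    s, o = np.asarray(s), np.asarray(o)
    return float(np.mean(np.abs(s - o)))

def nse(s, o):
    """Nash-Sutcliffe efficiency (standard form of Ref. [55])."""
    s, o = np.asarray(s), np.asarray(o)
    return 1.0 - np.sum((s - o) ** 2) / np.sum((o - o.mean()) ** 2)

def kge(s, o):
    """Kling-Gupta efficiency [10]: correlation, variability, and bias terms."""
    s, o = np.asarray(s), np.asarray(o)
    r, alpha, beta = cc(s, o), s.std() / o.std(), s.mean() / o.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def emo(s, o):
    """EMO composite of Table 1 for one variable/layer (minimum is best)."""
    return 0.25 * ((1 - abs(cc(s, o))) + rmse(s, o) + (1 - nse(s, o)) + mae(s, o))

def dominates(a, b):
    """Pareto dominance for minimization: objective vector a dominates b if it
    is no worse everywhere and strictly better somewhere (cf. PMO sorting)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))
```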
Table 2. Parameter spatial homogeneity for all metrics.

Metrics | Vegetation (Hp, Hs) * | Soil (Hp, Hs) | General (Hp, Hs) | Initial (Hp, Hs)
CCS | 0, 0~ | 0, 0~ | 1, 0~ | 0, 1
EKGE | 0, 0 | 1, 3 | 2, 2 | 4, 2
EMO | 0, 0 | 2, 2 | 2, 2 | 3, 2
MAES | 0, 0 | 1, 1 | 1, 1 | 2, 1
NSES | 0, 0 | 0, 0 | 0, 0 | 2, 0
PKGE | 0, 0 | 0, 0 | 0, 0 | 0, 0
PMO | 0, 0 | 0, 0 | 0, 0 | 0, 0
RMSES | 0, 0 | 2, 1 | 1, 1 | 2, 0

* Note that bold numbers indicate the lowest spatial heterogeneities among all metrics, and 0~ indicates that no sites cross the parameter's limits, in contrast to 0.
Table 3. Parameter spatial uncertainties comparison for all metrics.

Metrics | Vegetation (PNL, ONR) * | Soil (PNL, ONR) | General (PNL, ONR) | Initial (PNL, ONR)
CCS | NA, −2 | NA, −1 | 1, −2 | NA, −1
EKGE | 2, −1 | 2, 2 | 2, 1 | 2, 1
EMO | NA, −1 | 3, 3 | 3, 2 | 1, 5
MAES | NA, −5 | 2, −1 | 3, 0 | NA, 2
NSES | NA, −5 | NA, −1 | NA, −2 | NA, 2
PKGE | NA, −1 | NA, −2 | NA, −2 | NA, −4
PMO | NA, 0 | 1, 3 | NA, −1 | NA, 0
RMSES | NA, −3 | 4, 4 | 3, 1 | NA, 1

* Note that bold numbers indicate the best performance among all metrics.
Table 4. Linear fits between surface soil simulations and observations against sites for all metrics during the calibration period.

Metrics | PSO $SM_{05cm}$ (s, r²) * | SCE $SM_{05cm}$ (s, r²) | PSO $ST_{05cm}$ (s, r²) | SCE $ST_{05cm}$ (s, r²)
CCS | 0.29, 0.11 | 0.03, 0.01 | 0, 0 | 0.1, 0.01
EKGE | 0.91, 0.9 | 0.73, 0.75 | 0.18, 0.03 | 0.23, 0.05
EMO | 0.96, 0.92 | 0.83, 0.84 | 0.14, 0.1 | 0.11, 0.04
MAES | 0.76, 0.6 | 0.44, 0.55 | 0.13, 0.05 | 0.06, 0.01
NSES | 0.57, 0.39 | 0.25, 0.2 | 0.41, 0.05 | 0.44, 0.08
PKGE | 0.19, 0.04 | 0.26, 0.11 | 0.57, 0.1 | 0.56, 0.11
PMO | 0.68, 0.31 | 0.74, 0.48 | 0.63, 0.11 | 0.51, 0.09
RMSES | 0.77, 0.57 | 0.16, 0.13 | 0.12, 0.05 | 0.09, 0.02

* Note that bold numbers indicate the best performance among all metrics, while italics indicate a negative slope.
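The slope s and determination coefficient r² reported in Tables 4 and 6 summarize ordinary least-squares fits of simulations against observations pooled across sites; a minimal sketch of such a fit (assuming simple OLS with NumPy, with illustrative names) is:

```python
import numpy as np

def linear_fit(obs, sim):
    """OLS slope and r^2 of simulations regressed on observations
    across sites (illustrative sketch, not the study's code)."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    slope, intercept = np.polyfit(obs, sim, 1)
    pred = slope * obs + intercept
    ss_res = np.sum((sim - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((sim - sim.mean()) ** 2)    # total sum of squares
    return slope, 1.0 - ss_res / ss_tot
```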
Table 5. Gaussian fits of $E_{OS}$ of surface soil simulations against sites for all metrics during the calibration period.

Metrics | PSO $SM_{05cm}$ (f, c) * | SCE $SM_{05cm}$ (f, c) | PSO $ST_{05cm}$ (f, c) | SCE $ST_{05cm}$ (f, c)
CCS | 350, −0.04 | 295, 0.11 | 216, 2.13 | 167, 1.07
EKGE | 1276, 0 | 608, 0 | 142, 4.37 | 204, 2.48
EMO | 1178, 0 | 386, 0.01 | 170, 0.85 | 206, 1.23
MAES | 344, 0.01 | 416, 0.02 | 200, −0.06 | 230, 0.88
NSES | 274, 0.05 | 230, 0.05 | 169, 5.86 | 213, 5.03
PKGE | 322, 0.08 | 325, 0.11 | 237, 4.91 | 152, 5.01
PMO | 480, 0.02 | 444, 0.03 | 300, 6.10 | 224, 5.19
RMSES | 426, −0.02 | 296, 0 | 200, 0.16 | 206, 1.29

* Note that bold numbers indicate the best performance among all metrics.
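The Gaussian-fit characteristics reported in Tables 5 and 7, annotated as "amplitude [peak position, peak width]" in Figures 11 and 14, can be obtained by fitting a single Gaussian to the histogram of $E_{OS}$. A sketch with SciPy follows; the bin count and initial guesses are assumptions, and the code is illustrative rather than the study's implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    """Single Gaussian with amplitude amp, peak position mu, peak width sigma."""
    return amp * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fit_error_gaussian(errors, bins=50):
    """Fit one Gaussian to the histogram of E_OS and return
    amplitude, peak position, and peak width (illustrative sketch)."""
    errors = np.asarray(errors)
    counts, edges = np.histogram(errors, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p0 = [counts.max(), errors.mean(), errors.std()]   # assumed initial guess
    (amp, mu, sigma), _ = curve_fit(gaussian, centers, counts, p0=p0)
    return amp, mu, sigma
```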
Table 6. Linear fits between surface soil simulations and observations against sites for all metrics during the forecast period.

Metrics | PSO $SM_{05cm}$ (s, r²) * | SCE $SM_{05cm}$ (s, r²) | PSO $ST_{05cm}$ (s, r²) | SCE $ST_{05cm}$ (s, r²)
CCS | −0.32, 0.08 | −0.07, 0.02 | 0.04, 0 | 0.15, 0.04
EKGE | 0.98, 0.84 | 0.84, 0.84 | 0.04, 0.01 | 0.1, 0.04
EMO | 0.96, 0.78 | 0.86, 0.82 | 0.09, 0.08 | 0.1, 0.07
MAES | 0.83, 0.58 | 0.42, 0.37 | 0.14, 0.07 | 0.15, 0.07
NSES | 0.75, 0.45 | 0.31, 0.27 | 0.45, 0.07 | 0.33, 0.05
PKGE | 0.04, 0 | 0.21, 0.14 | 0.53, 0.1 | 0.54, 0.1
PMO | 0.52, 0.30 | 0.46, 0.31 | 0.58, 0.11 | 0.46, 0.09
RMSES | 0.77, 0.56 | 0.16, 0.08 | 0.13, 0.08 | 0.14, 0.09

* Note that bold numbers indicate the best performance among all metrics, while italics indicate a negative slope.
Table 7. Gaussian fits of $E_{OS}$ of surface soil simulations against sites for all metrics during the forecast period.

Metrics | PSO $SM_{05cm}$ (f, c) * | SCE $SM_{05cm}$ (f, c) | PSO $ST_{05cm}$ (f, c) | SCE $ST_{05cm}$ (f, c)
CCS | 189, 0.15 | 225, 0.07 | 187, 3.2 | 181, −0.38
EKGE | 383, 0 | 363, 0 | 143, −0.09 | 189, 3.39
EMO | 416, 0 | 359, 0 | 175, −1.41 | 148, −0.98
MAES | 359, −0.01 | 284, 0 | 181, 0.49 | 206, 0.29
NSES | 343, 0.06 | 322, 0.05 | 204, 5.81 | 210, 4.56
PKGE | 234, 0.13 | 365, 0.06 | 214, 4.9 | 217, 5.69
PMO | 367, 0.01 | 323, 0.04 | 221, 6.17 | 187, 5.47
RMSES | 293, −0.02 | 326, 0.01 | 194, 0.55 | 198, 0.32

* Note that bold numbers indicate the best performance among all metrics.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
