1. Introduction
Soil salinization is a widespread and increasingly severe form of land degradation globally, which severely restricts the health of terrestrial ecosystems and the sustainable development of agriculture [
1]. Approximately 33% of the world’s land surface has been degraded [
2]. Among the forms of soil degradation, salinization affects 466.36 million ha of surface soil and 377.68 million ha of subsoil in Asia alone [
3,
4]. The area of salt-affected soils covers approximately 100 million hectares in China, with over 80% remaining undeveloped [
5]. Salinization continuously threatens crop growth and species diversity; therefore, real-time dynamic monitoring of soil salinization is of paramount importance for food security and the development of ecological agriculture [
6]. Traditional soil salinity monitoring heavily relies on manual sampling using field soil augers and subsequent laboratory physicochemical analyses. Although this observation method provides extremely high calibration accuracy, it is time-consuming, labor-intensive, and costly [
7,
8]. Furthermore, the high spatial heterogeneity of soil salinity across large-scale farmlands is difficult to characterize by this approach. With the rapid development of remote sensing technology, satellite remote sensing, represented by Landsat and Sentinel, has been widely applied due to its broad swath coverage [
9,
10]. However, constrained by the mixed-pixel problem caused by low spatial resolution, and further restricted by susceptibility to cloud and rain obstruction as well as long revisit cycles, satellite remote sensing struggles to satisfy the demands of modern precision agriculture for continuous and micro-scale surface observation [
11]. In recent years, low-altitude unmanned aerial vehicle (UAV) remote sensing platforms have compensated for these technical limitations by virtue of their flexibility, ultra-high spatial resolution, and low operational costs. Consequently, UAV platforms have become an effective tool for monitoring the micro-scale spatial and temporal patterns of soil salinity in farmlands [
12].
Currently, methods for soil salinity inversion based on spectral information are primarily categorized into empirical approaches and machine learning techniques [
13]. Early or fundamental empirical methods typically involve directly selecting the red and near-infrared bands that are highly sensitive to salinity, or constructing vegetation indices and soil salinity indices through algebraic operations to enhance specific spectral absorption features. Subsequently, empirical equations are established by incorporating statistical techniques such as linear regression or multiple stepwise regression [
14]. However, the relationships among micro-scale soil salinity, moisture, and surface reflectance are extremely complex. Empirical methods rely mainly on linear relationships, which oversimplify the actual physical processes and cannot adequately capture the nonlinear dynamics between salinity and spectral properties; this often limits monitoring accuracy [
15]. Consequently, machine learning methods, equipped with powerful nonlinear fitting and high-dimensional data processing capabilities, have progressively replaced traditional regression approaches and have become the mainstream paradigm for retrieving soil salinity [
16]. Advanced machine learning algorithms can handle complex datasets and capture nonlinear relationships. For instance, Random Forest [
17] (RF) can flexibly model complex and deep internal correlations through its tree ensemble structure, while Support Vector Machine [
18] (SVM) and similar algorithms exhibit outstanding performance in the regression and classification of multispectral nonlinear problems based on kernel function hyperplane mapping mechanisms. Compared with traditional empirical methods, these machine learning algorithms demonstrate superior robustness and generalization abilities, and they have achieved significant success in inversion applications within arid and complex coastal saline-alkali environments [
13–19].
Although machine learning exhibits significant advantages in handling multivariate data, the input feature set is inevitably expanded to enhance weak salinity signals, typically by stacking a massive number of vegetation indices, salinity indices, or multi-source channel features. This practice is highly prone to triggering the curse of dimensionality, introducing severe information redundancy and multicollinearity issues [
20]. Excessive redundant variables not only increase the computational cost of models but also obscure the true contributions of specific bands, thereby degrading the efficiency of model construction and the ultimate prediction accuracy. Therefore, the scientific and prudent selection of the number and types of input variables is of critical importance. Traditional feature selection methods (e.g., correlation analysis, principal component analysis, variable importance in projection, or elastic net algorithms) are mostly constrained to capturing linear features or achieving superficial dimensionality reduction. When confronted with the deep, nonlinear, interactive correlations between soil salinity and multispectral signals, their feature selection capabilities are often highly limited.
Furthermore, the process of salt accumulation and leaching in saline soils exhibits exceedingly strong spatiotemporal variability [
21]. Because of this intense spatial variability, single machine learning methods often show limited predictive capability when confronting complex saline-alkali patches across varying surface environmental gradients. Such single models are highly susceptible to prediction distortion [
22,
23], and they struggle to maintain high and stable monitoring accuracy at regional or field scales. Combining the advantages of multiple independent machine learning algorithms can therefore enhance model stability and predictive capability to a certain extent. Ensemble learning, which integrates multiple base learners to achieve better supervised learning performance than a single learner, has been applied extensively across machine learning tasks. In agricultural research, Das et al. [24] combined advanced spectral techniques with ensemble machine learning to isolate complex mixed surface interferences, which significantly improved the monitoring accuracy of soil salinity content. Moreover, Wang et al. [25] demonstrated that, when high-resolution UAV imagery was used to estimate heterogeneous soil salinity, the robustness and coefficient of determination of a feature-optimized weighted ensemble learning model far exceeded those of traditional single baseline models. However, research on constructing such multi-level ensemble architectures for coastal soil salinity inversion remains relatively limited. At the same time, highly integrated model architectures aggravate the “black-box” nature of the models, which hinders their wider practical application and mechanistic interpretation. To address this challenge, the Shapley Additive Explanations (SHAP) method [26], rooted in game theory, has gradually been introduced into the interpretation of agricultural remote sensing models in recent years. This method quantifies the marginal contributions of input features to the model output from both global and local perspectives, which helps relate purely data-driven models to physical and agronomic understanding. Specifically, Jia et al. [13] used the SHAP interpretability framework to elucidate the black-box decision-making process of a coastal farmland salinity prediction model. This application quantitatively verified the nonlinear physiological response mechanisms between specific spectral index feature thresholds and severe soil salinity stress, and it also supported the internal logical consistency of the ensemble algorithm.
In summary, UAV-based soil salinity inversion in coastal saline-alkali areas still faces three key challenges. First, the strong spatial heterogeneity of soil salinity makes it difficult for a single prediction model to maintain stable performance under complex field conditions. Second, the construction of numerous vegetation and salinity indices from UAV multispectral imagery inevitably introduces redundancy and multicollinearity among input features. Third, although ensemble machine-learning models can improve prediction accuracy, their complex structures also increase the difficulty of interpreting the physical meaning of model decisions. To address these challenges, this study develops an integrated and interpretable workflow for UAV-based coastal soil salinity inversion. The workflow combines PSO-SFLA-based feature selection, stacking ensemble learning, and SHAP-based interpretation. The main contributions of this study are as follows: (1) the performance of VIP, MultiSURF, and PSO-SFLA feature selection strategies is systematically compared under the same UAV multispectral dataset; (2) the PSO-SFLA-selected compact feature subset is used to reduce feature redundancy and improve prediction reliability; and (3) SHAP analysis is introduced to explain the contributions of both base learners and selected spectral features. This workflow provides a practical reference for high-resolution soil salinity mapping and dynamic monitoring in coastal agricultural areas.
3. Results
3.1. Comparative Analysis and Optimization of Multiple Spectral Feature Selection Algorithms
The 37-dimensional spectral feature set constructed in this study (5 original bands and 32 vegetation/salinity indices) inevitably contained extensive information redundancy and multicollinearity. To obtain the feature subset with the highest value for model prediction, three algorithms (VIP, MultiSURF, and PSO-SFLA) were independently applied for feature screening and compared.
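To illustrate how index features of this kind are derived from band reflectances, the following minimal sketch computes several of the indices named later in this section. The formulas are common literature formulations and are assumptions here, because the exact definitions adopted in this study are not given in this section; the function name, the reflectance values, and the WDVI soil-line slope are hypothetical.

```python
import math

def spectral_indices(green, red, nir, soil_line_slope=1.0):
    """Compute a few vegetation/salinity indices from band reflectances (0-1).

    The formulas are common literature formulations (assumed, not taken
    from this study). soil_line_slope is a hypothetical default for WDVI.
    """
    return {
        "GNDVI": (nir - green) / (nir + green),
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "RDVI": (nir - red) / math.sqrt(nir + red),
        "WDVI": nir - soil_line_slope * red,
        "NDSI": (red - nir) / (red + nir),
    }

# Toy reflectances for a single pixel (hypothetical values).
example = spectral_indices(green=0.10, red=0.12, nir=0.30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because several of these indices are algebraic combinations of the same red and near-infrared bands, they are strongly correlated by construction, which is one source of the multicollinearity discussed above.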
3.1.1. Screening Based on the Linear Projection VIP Algorithm
The Variable Importance in Projection (VIP) method evaluates the linear variance contribution of each explanatory variable to the target response variable. As shown in the VIP score plot (Figure 4), with 1.0 used as the screening threshold for feature importance, 9 of the 37 features (the orange region of the figure) exceeded the dashed threshold line and were retained as important features, whereas the remaining features (the cyan region) were discarded as background noise. The figure clearly reflects the stepped distribution of the explanatory power of the spectral features for salinity. However, although the VIP method removed a substantial number of low-value noise variables, redundant and highly correlated indices remained in the selected subset because of the global linear assumption underlying Partial Least Squares Regression (PLSR). This implies that VIP cannot effectively break the multicollinearity among composite spectral indices in complex surface environments, so obvious redundancy persists in its dimensionality reduction results.
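The VIP screening rule can be sketched as follows, assuming the standard definition in which the VIP of feature j is the square root of p times the weighted average of its squared normalized PLS weights, with each latent component weighted by the response variance it explains. The function name and the toy weights are hypothetical; in practice the weights and explained sums of squares would come from a fitted PLSR model.

```python
import math

def vip_scores(weights, explained_ss):
    """VIP scores from PLS weight vectors.

    weights[a][j]: weight of feature j in latent component a.
    explained_ss[a]: sum of squares of the response explained by component a.
    """
    n_features = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_features):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)  # squared length of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_features * acc / total_ss))
    return scores

# Hypothetical two-component PLS fit with four features (toy numbers).
toy_weights = [[0.8, 0.5, 0.3, 0.1], [0.2, 0.7, 0.6, 0.3]]
toy_ss = [6.0, 2.0]
vip = vip_scores(toy_weights, toy_ss)

# Keep features whose VIP exceeds the 1.0 threshold used in the text.
selected = [j for j, v in enumerate(vip) if v > 1.0]
print(selected)
```

Under this definition the squared VIP scores average to 1 across features, which is why 1.0 is a natural threshold. Two correlated features can both exceed it, consistent with the redundancy noted above.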
3.1.2. Screening Based on the Nearest Neighbor Distance-Driven MultiSURF Algorithm
MultiSURF evaluates how well each feature distinguishes between inter-class and intra-class samples in the multidimensional feature space by calculating distances to local nearest-neighbor samples. Figure 5 shows the radial distribution of the weights of all 37 variables. The inner red dashed circle marks the zero-weight boundary determined by the model; features falling inside it are redundant and interfere negatively with the target prediction. Features exceeding the 95% confidence-interval threshold are highlighted in red and were retained as the optimal feature subset. Compared with the purely linear VIP method, MultiSURF captured nonlinear synergistic correlations in the data. However, the dense cluster of high-priority red rays in the polar plot reveals that, for sample sets characterized by highly heterogeneous farmland salinity patches and complex micro-scale soil water-salt transport, MultiSURF still preserved multiple overlapping spectral indices that reflect highly similar salinity stress information, because it lacks a global, swarm-intelligence-based evolutionary elimination mechanism.
3.1.3. Optimization Based on the Swarm Intelligence-Driven PSO-SFLA
In contrast to the traditional filter-based algorithms, PSO-SFLA adopts a wrapper-based feature optimization strategy and proved better able to overcome the spectral multicollinearity induced by complex coastal surface environments. Through five-fold cross-validation, the frequently selected feature variables were confirmed as the final optimal feature subset. In this study, the frequency threshold was set to 5 based on the stability-selection principle. Since 5-fold cross-validation was used, a frequency of 5 indicates that a feature was selected in all five validation folds, corresponding to a 100% selection frequency. This strict criterion was adopted to retain only highly stable variables and to reduce the inclusion of unstable, redundant, or noise-sensitive spectral features. Figure 6 shows a stepped distribution of feature stability, in which the sector radius in polar coordinates corresponds directly to the selection frequency of each variable. After stability validation, the many variables affected by information redundancy and multicollinearity fell within the gray and yellow-orange sectors, which represent low frequencies (1 to 4 times), and were discarded. Conversely, 11 variables (GNDVI, NDSI_reg, SI1, OSAVI, SI1_reg, RDVI, SI-T, NDSI, NIR, WDVI, and Red) were selected in all 5 cross-validation folds. The initial 37 candidate variables were thus reduced to a compact subset of 11 core variables. This reduction substantially decreased the input dimensionality before model construction and helped mitigate the overfitting risk associated with high-dimensional predictors under a limited sample size. The selected variables retained the most informative original-band, vegetation-index, and salinity-index information for soil salinity prediction.
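The frequency-based stability criterion described above can be sketched as follows. This covers only the counting step, not the PSO-SFLA search itself; the function name and the per-fold feature lists are hypothetical toy data rather than results from this study.

```python
from collections import Counter

def stable_features(selected_per_fold, min_frequency):
    """Return features chosen in at least min_frequency folds, plus all counts."""
    # set(fold) guards against a feature being counted twice within one fold.
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    stable = sorted(f for f, c in counts.items() if c >= min_frequency)
    return stable, counts

# Hypothetical per-fold outputs of a wrapper search (toy example).
folds = [
    ["GNDVI", "NIR", "Red", "SI1"],
    ["GNDVI", "NIR", "Red", "OSAVI"],
    ["GNDVI", "NIR", "Red"],
    ["GNDVI", "NIR", "Red", "SI1"],
    ["GNDVI", "NIR", "Red", "WDVI"],
]

# A threshold equal to the number of folds keeps only features selected every time.
stable, counts = stable_features(folds, min_frequency=len(folds))
print(stable)
```

Lowering `min_frequency` would admit features selected in only some folds, trading stability for a larger subset.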
3.2. Testing and Comparative Accuracy Evaluation of Predictive Models
Utilizing the measured soil salinity as the dependent variable and the optimal feature combinations retained by the three different selection methods as independent input variables, a series of surface soil salinity inversion models were developed. Five different algorithms (Ensemble, XGBoost, Ridge, RF, and ET) were integrated into these models, and three distinct feature selection methods (VIP, MultiSURF, and PSO-SFLA) were employed. The accuracy of different salinity inversion models is shown in
Table 3.
From the perspective of model performance, the nonlinear ensemble model exhibited the highest predictive accuracy across all feature selection strategies. It achieved the highest test-set coefficients of determination (R2) among all models, reaching 0.650, 0.718, and 0.758 for the VIP, MultiSURF, and PSO-SFLA methods, respectively, while maintaining low SRMSE and favorable RPIQ values. In contrast, the Ridge, RF, and ET models showed relatively weaker predictive performance. The Ridge model performed particularly poorly when combined with the MultiSURF method: although it reached an R2 of 0.664 on the training set, its test-set R2 was only 0.427. From the perspective of feature selection, the VIP method supported a test-set R2 of only 0.650 within the ensemble model, with an RPIQ of 2.812. By contrast, owing to its optimization-based feature selection strategy, PSO-SFLA yielded higher accuracy even within the Ridge model (R2 of 0.630 on the training set and 0.598 on the test set). For the remaining models using PSO-SFLA, the test-set R2 values ranged from 0.667 to 0.758. The RPIQ indicators likewise corroborated that the predictive capabilities of all models improved after PSO-SFLA was applied. Notably, the Ensemble–PSO-SFLA model achieved an RPIQ slightly above 3.37, indicating good quantitative prediction performance according to the adopted RPIQ classification criterion. Conversely, models based on the VIP and MultiSURF methods reached a maximum RPIQ of 3.134, implying that although their predictive power is limited, they can still be used for the preliminary classification of soil salinity.
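For reference, the accuracy metrics discussed above can be computed as in the following minimal sketch. R2 and RPIQ follow their usual definitions (RPIQ is the interquartile range of the observed values divided by the RMSE). The SRMSE normalization used here, RMSE divided by the observed mean, is an assumption, because the definition adopted in this study is not stated in this section; the function name and the data are hypothetical toy values.

```python
import math
import statistics

def regression_metrics(observed, predicted):
    """Return R2, RMSE, SRMSE (assumed RMSE / mean), and RPIQ."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observed values; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "SRMSE": rmse / mean_obs,   # assumed normalization, may differ from the study
        "RPIQ": (q3 - q1) / rmse,
    }

# Toy observed and predicted salinity values (hypothetical).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.0, 2.5, 4.0, 5.5]
metrics = regression_metrics(obs, pred)
print({k: round(v, 3) for k, v in metrics.items()})
```

Because RPIQ scales the error by the interquartile range rather than the standard deviation, it is less sensitive to skewed salinity distributions than RPD.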
At the model validation level, the training and testing results showed generally consistent trends, indicating that the adopted validation strategy effectively reduced the risk of overfitting. For example, under the PSO-SFLA feature selection strategy, the RF model achieved R2 values of 0.717 and 0.667 on the training and testing sets, respectively. Similar trends were also observed for ET and XGBoost, suggesting that these models maintained stable predictive performance on the independent testing set. A cross-model comparison further showed that, when dealing with soil salinity inversion under complex and highly variable field conditions, the tree-based nonlinear models achieved higher predictive accuracy than Ridge, which was used as the linear baseline. Specifically, the maximum validation R2 of Ridge was 0.598, whereas the test-set R2 values of ET and XGBoost reached 0.703 and 0.676, respectively. These results indicated that the relationship between surface soil salinity and spectral features was not purely linear, but involved deeper nonlinear response patterns.
It can be distinctly observed from
Figure 7 that the accuracies of all models at low-salinity points are significantly higher than those at high-salinity points. Compared with the VIP and MultiSURF feature selection methods, the scatter point distributions under PSO-SFLA aggregate much more tightly around the fitting line.
3.3. Interpretability Analysis of the Model
3.3.1. Contribution Analysis of PSO-SFLA-Selected Spectral Features
To further interpret the contribution of the spectral features selected by PSO-SFLA, SHAP analysis was performed on the final ensemble model. The analysis focused on the 11 selected variables, including GNDVI, NDSI_reg, SI1, OSAVI, SI1_reg, RDVI, SI-T, NDSI, NIR, WDVI, and Red. The contribution patterns of the 11 spectral features selected by PSO-SFLA in the ensemble model are shown in
Figure 8. Among them, GNDVI showed the highest mean absolute SHAP value, indicating that it had the strongest overall influence on the model output. This result suggests that vegetation-related spectral responses played an important role in soil salinity estimation. Red and NIR also showed relatively high SHAP contributions, ranking only after GNDVI. Their high importance indicates that the final model captured not only vegetation stress information but also direct spectral responses associated with soil background reflectance, surface brightness, and salt-affected soil exposure. In addition, OSAVI, NDSI_reg, RDVI, and WDVI showed moderate contributions, further confirming that vegetation indices provided useful information for predicting soil salinity under UAV multispectral observation conditions. These indices are closely related to vegetation coverage, canopy reflectance, and growth status, which may change under salt stress.
The SHAP values also showed whether each feature increased or decreased the predicted salinity. Positive SHAP values indicate that the corresponding feature increased the predicted soil salinity, whereas negative SHAP values indicate that the feature decreased the predicted salinity. For the dominant features, especially GNDVI, Red, NIR, and OSAVI, the wide distribution of SHAP values suggests that their effects on model prediction varied among samples. This result suggests that the relationship between multispectral features and soil salinity was not simply linear. Instead, it may have been affected by vegetation cover, soil exposure, moisture conditions, and the spatial heterogeneity of salinity in coastal areas. Meanwhile, SI1, SI1_reg, SI-T, and NDSI showed relatively lower SHAP rankings. This does not mean that these salinity-related indices were unimportant. Rather, their independent marginal contributions in the final ensemble model were weaker than those of vegetation-related indices and original spectral bands. One possible reason is that part of their information overlapped with Red, NIR, and vegetation indices. Another possible reason is that salinity-sensitive indices may interact with vegetation status and soil background conditions in a nonlinear manner.
Overall, the SHAP results indicate that the final model did not rely on a single type of spectral variable. Instead, it integrated information from vegetation indices, salinity-sensitive indices, and original multispectral bands. The relatively high contributions of GNDVI, Red, NIR, OSAVI, and RDVI suggest that soil salinity prediction in the study area was strongly associated with both vegetation stress responses and red–near-infrared reflectance characteristics. Therefore, the SHAP analysis provides an interpretable explanation for the PSO-SFLA feature selection result and further demonstrates that the selected 11 features had clear and differentiated roles in soil salinity prediction.
3.3.2. Meta-SHAP Interpretation of Base Learners in the Ensemble Model
Since the ensemble model adopted in this study was constructed using Ridge, RF, and ET as base learners and XGBoost as the meta-learner, Meta-SHAP analysis was further conducted to reveal how the prediction outputs of the three base learners contributed to the final fused output of the top-layer XGBoost model. As shown in
Figure 9, the prediction output of the RF base learner showed the largest contribution to the final ensemble prediction across all three feature selection methods. Its Meta-SHAP values exhibited the widest distribution range, indicating that RF had a strong influence on the final model output. In general, high RF prediction values tended to increase the final predicted soil salinity, as indicated by the concentration of high-value red points on the positive side of the SHAP axis. In contrast, low RF prediction values tended to reduce the final prediction, as indicated by the concentration of low-value blue points on the negative side of the SHAP axis. This result suggests that RF provided a stable and dominant nonlinear prediction signal for the meta-learner.
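The idea behind attributing the fused output to the three base-learner predictions can be illustrated with a brute-force Shapley computation. This is a minimal sketch under stated assumptions. The meta-learner here is a hypothetical toy function, not the XGBoost model used in this study. Absent inputs are replaced by a fixed baseline value. The exact enumeration is feasible only because there are just three inputs, and practical SHAP analyses of tree models rely on dedicated algorithms instead.

```python
from itertools import combinations
from math import factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Inputs outside a coalition are set to their baseline value
    (a simplifying assumption for this sketch).
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                values[i] += weight * (model(with_i) - model(without_i))
    return values

# Hypothetical meta-learner over (Ridge, RF, ET) predictions with an RF-ET interaction.
def toy_meta(p):
    ridge, rf, et = p
    return 0.1 * ridge + 0.6 * rf + 0.3 * et + 0.05 * rf * et

x = [2.0, 3.0, 2.5]          # base-learner predictions for one sample (toy values)
baseline = [1.5, 1.5, 1.5]   # reference predictions, e.g. a training mean (toy values)
phi = exact_shapley(toy_meta, x, baseline)
print([round(v, 4) for v in phi])
```

The attributions sum to the difference between the prediction for the sample and the prediction at the baseline. A positive value for a base learner means that its output pushed the fused prediction upward, which is how the point positions on the SHAP axis are read.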
Under the VIP feature selection method (
Figure 9a), the ET base learner showed a relatively complex contribution pattern. Low ET prediction values mainly produced negative Meta-SHAP values, whereas some high ET prediction values contributed positively to the final output. However, compared with RF, the distribution range of ET was narrower, and the separation between high-value and low-value samples was less distinct. The Ridge base learner showed no clear directional pattern under the VIP-based feature subset, because both high and low Ridge prediction values corresponded to positive and negative Meta-SHAP values. This indicates that the linear information provided by Ridge was relatively unstable when the VIP-selected features were used.
Under the MultiSURF feature selection method (
Figure 9b), ET showed an inconsistent contribution pattern. Lower ET prediction values were partly associated with positive Meta-SHAP values, whereas higher ET prediction values were partly distributed on the negative side of the SHAP axis. This result suggests that the ET prediction signal was not fully aligned with the final fused output under the MultiSURF-selected feature subset. One possible reason is that the local-neighbor-based feature selection process retained some redundant or collinear variables, which may have caused divergence among the base learners. Similar to the VIP condition, the Ridge base learner still showed no stable contribution direction, indicating that its linear prediction signal contributed limited regular information to the meta-learner.
Under the PSO-SFLA feature selection method (
Figure 9c), the contribution patterns of all three base learners became more consistent. RF, ET, and Ridge all showed a clearer positive association with the final ensemble output. Their high prediction values were mainly distributed on the positive side of the SHAP axis, whereas their low prediction values were mainly distributed on the negative side. This pattern indicates that the feature subset selected by PSO-SFLA improved the consistency among the base learners and reduced potential conflicts in their prediction outputs. Therefore, the PSO-SFLA-selected low-dimensional feature subset not only improved model accuracy, but also enhanced the interpretability and internal consistency of the ensemble model.
3.4. Spatial Distribution Characteristics of Surface Soil Salinity
Based on the PSO-SFLA feature optimization strategy combined with the ensemble inversion model, a high-resolution spatial distribution map of soil salinity was generated, which reveals the spatial variability of soil salinity within the study area. The inversion map (Figure 10) shows that the salinity of the two fields is generally low to medium but exhibits prominent spatial heterogeneity. Soil salinity across most of the area remains in a lower range (large, continuous green and yellow patches). However, irregular point-like and small-block accumulations of salinity (orange and red) appear at the field edges and in areas with uneven topography. These localized high-salinity spots are primarily associated with small-scale microtopography and lateral water flow within the fields: in slightly depressed areas, water converges and subsequently evaporates, leaving salts in place. Conversely, in flat areas with better drainage, salts are less likely to accumulate excessively.
Notably, the field sampling and UAV imagery acquisition for this experiment were conducted in early December. During this period, winter wheat was in its seedling stage, and the vegetation coverage was relatively low. The shadowing effect exerted by the weak wheat seedling canopy on the surface soil was minimal, thereby enabling pure, high-quality bare-soil reflectance spectral information to be captured by the UAV multispectral imagery. The mixed-pixel interference and spectral saturation effects universally present during the vigorous crop growth periods of summer and autumn were effectively bypassed during this phase. Consequently, the original spectral data utilized for modeling, alongside salinity indices such as NDSI_reg and SI1_reg, were capable of fully exerting their salinity-characterizing functions, thereby maximizing the sensitivity of the model to the background soil salinity information. From the perspective of the laws governing soil water-salt transport, temperatures consistently decline in early winter, and both atmospheric evaporation and crop transpiration are significantly weakened. Distinct from the processes driven by high summer and autumn temperatures—where strong upward capillary rise and surface accumulation of deep salts are induced, precipitating secondary salinization—the low-temperature environment substantially decelerates the process of soil capillary water movement, stabilizing the transport of surface soil salinity. This mechanistically explains the reasons behind the low overall level of inverted salinity and the absence of extensive salt-crust formations with severe salt accumulation on the land surface.
In summary, the spatial distribution pattern delineated in this soil salinity inversion map reflects the coupled effects of seasonal climate, farmland vegetation dynamics, and microtopographic differences. These findings provide a reliable spatial data basis for targeted salt-leaching and amelioration treatments.
4. Discussion
4.1. The Role of Feature Optimization and Interpretation of Physical Mechanisms
In this study, models built on the same ensemble framework but with different feature screening schemes showed a systematic difference in inversion accuracy (Ensemble-PSO-SFLA > Ensemble-MultiSURF > Ensemble-VIP). This result indicates that, when UAV multispectral data are used for machine learning inversion, feature selection remains a critical step in determining the ultimate predictive capability of the model [
45]. Although abundant spectral details are provided by remote sensing imagery, issues of feature redundancy and multicollinearity are also readily introduced [
46]. If a large number of noise-laden features are fed directly into an algorithm, the computational burden increases and, under limited sample sizes, the model tends to learn irrelevant environmental noise rather than valid soil salinity patterns. Consequently, its generalization capability on unseen data is severely constrained. The importance of feature selection and surface-condition differences can also be observed when the present results are compared with previous UAV-based soil salinity inversion studies. Yu et al. [
47] used UAV multispectral imagery in the Yellow River Delta and constructed soil salinity retrieval models using PLSR, MLR, BPNN, SVM, and RF. Their optimal SSRI-based RF model achieved a validation R2 of 0.745, an RMSE of 1.879, and an RPD of 2.211, indicating that the construction and screening of salinity-sensitive spectral information played an important role in improving inversion accuracy. Zhao et al. [48] further showed that UAV multispectral soil salinity inversion accuracy varied substantially among different surface-cover conditions. Their optimal model achieved an R2 of 0.707 for bare land and 0.836 for agricultural land with vegetation cover, suggesting that vegetation coverage, soil background, and surface conditions can strongly affect model performance. In the present study, the Ensemble–PSO-SFLA model achieved an R2 of 0.758, an SRMSE of 0.285, and an RPIQ of 3.382 on the independent testing set. Therefore, the performance of the proposed model should not be interpreted simply as a direct numerical comparison of R2 values, but as evidence that PSO-SFLA-based feature optimization can provide stable prediction ability under coastal saline-alkali farmland conditions with strong spatial heterogeneity.
The substantial disparity in efficacy among the three feature optimization methods adopted in this study stems from differences in their underlying screening mechanisms. Derived from Partial Least Squares Regression, the VIP method belongs to a typical embedded evaluation paradigm and excels at rapidly eliminating clearly irrelevant features; however, it primarily captures linear relationships among variables. Because the spectral reflectance of real soil is highly complex, and that of coastal saline-alkali soil is not a simple linear superposition, VIP frequently neglects nonlinear spectral responses. In contrast, MultiSURF, a filter-based algorithm, captures some nonlinear associations by computing nearest-neighbor sample distances; however, it cannot effectively identify and eliminate highly similar, redundant indices. The best-performing method, PSO-SFLA, is a wrapper-based feature optimization algorithm in which the cross-validation SRMSE is used directly as the criterion for evaluating candidate feature subsets. As a result, the 11 features finally selected form the subset with the best cross-validated performance among the candidates evaluated. This explanation is consistent with previous studies on feature selection and soil salinity inversion. VIP is derived from the PLSR framework and mainly identifies important variables within a linear latent-variable structure; therefore, it may be limited when soil salinity is affected by nonlinear interactions among spectral reflectance, vegetation condition, moisture, and soil background. Previous soil salinity remote-sensing studies have also reported that the relationship between spectral variables and salinity is nonlinear and controlled by multiple factors, making simple linear combinations insufficient for accurately representing salinization processes. In contrast, Relief-family algorithms such as MultiSURF can capture certain complex associations through nearest-neighbor comparisons, but they remain filter-style methods that evaluate feature relevance before model construction. Wrapper-based methods, such as PSO-SFLA, can directly optimize feature subsets according to model prediction performance. Similar conclusions were reported by Xie et al., who found that appropriate feature selection reduced input dimensionality and improved soil salinity estimation accuracy [49], and by Wang et al., who emphasized that feature selection combined with model optimization can improve model generalization and robustness in heterogeneous saline environments [50]. In the present study, PSO-SFLA reduced the original spectral feature space to 11 core variables, weakened the influence of redundant and collinear indices, and made the subsequent SHAP-based interpretation more physically meaningful. Previous SHAP-based soil salinity studies have also shown that interpretable machine learning can help link black-box model outputs with environmental mechanisms.
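The wrapper criterion described above can be illustrated with a minimal sketch. It is not the PSO-SFLA procedure used in this study: a plain random subset search stands in for the swarm-based optimizer, a closed-form ridge regression stands in for the learner, and the names `cv_rmse` and `random_wrapper_search` are hypothetical. The sketch only shows how a cross-validated error can score candidate feature subsets directly.

```python
import numpy as np

def cv_rmse(X, y, subset, n_splits=5, alpha=1.0, seed=0):
    """Cross-validated RMSE of a ridge model using only the columns in `subset`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    cols = list(subset)
    squared_errors = []
    for k in range(n_splits):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        X_train, X_test = X[train][:, cols], X[test][:, cols]
        # Standardize with training-fold statistics only, to avoid leakage.
        mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
        X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd
        y_mean = y[train].mean()
        # Closed-form ridge solution: (X'X + alpha*I) w = X'(y - mean).
        gram = X_train.T @ X_train + alpha * np.eye(len(cols))
        w = np.linalg.solve(gram, X_train.T @ (y[train] - y_mean))
        squared_errors.append((X_test @ w + y_mean - y[test]) ** 2)
    return float(np.sqrt(np.concatenate(squared_errors).mean()))

def random_wrapper_search(X, y, n_iter=200, seed=0):
    """Stand-in for a swarm optimizer: score random subsets, keep the best one."""
    rng = np.random.default_rng(seed)
    best_subset, best_score = None, np.inf
    for _ in range(n_iter):
        mask = rng.random(X.shape[1]) < 0.5
        if not mask.any():
            continue  # skip the empty subset
        subset = np.flatnonzero(mask)
        score = cv_rmse(X, y, subset)
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Because every candidate is scored by the model's own cross-validated error, redundant or noisy features that do not improve prediction are not rewarded, which is the property that distinguishes wrapper methods from filter-style scoring.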
In addition, feature optimization improves the interpretability and practical physical significance of the model. A highly parameterized model containing dozens or even hundreds of black-box input features is difficult to explain clearly and to apply to practical agronomic decision-making. Through the screening performed by PSO-SFLA, the model is focused on the core variables most sensitive to fluctuations in soil salinity and moisture. This dimensionality reduction not only improves the noise resistance of the model but also helps link the data-driven model to physical processes such as ion absorption and water evaporation in actual agro-geological environments.
4.2. Inversion Potential and Mechanism Analysis of the Ensemble Model
The results of this study indicate that the PSO-SFLA-driven ensemble model achieved the best predictive performance for coastal saline-alkali land inversion (independent test set R² = 0.758, SRMSE = 0.285, RPIQ = 3.382); its accuracy was higher than that of the other model combinations tested and compares favorably with values reported in related studies. This indicates that the proposed framework provided good quantitative prediction performance for soil salinity inversion in coastal saline-alkali farmland. The use of RPIQ further strengthens this evaluation because it is based on the interquartile range of the observed data and is therefore suitable for heterogeneous and potentially skewed soil salinity distributions [43]. The higher R² and RPIQ values obtained in this study suggest that the Ensemble-PSO-SFLA model improved prediction accuracy and reliability under the current coastal salinity conditions. This improvement can be attributed to the ability of PSO-SFLA to select stable and informative spectral features and the capacity of the ensemble framework to integrate complementary linear and nonlinear information from different base learners. Therefore, the proposed model not only improved overall prediction accuracy but also reduced prediction bias and smoothing effects for extreme high-salinity samples.
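As a brief illustration of the RPIQ metric discussed above, the following sketch computes it as the interquartile range of the observed values divided by the RMSE. The function names and the quartile interpolation method (`"inclusive"`) are assumptions made for this example; the quartile convention used in the study is not stated here, and different conventions give slightly different values.

```python
import math
import statistics

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: (Q3 - Q1) / RMSE."""
    # Quartiles of the observed values only; the interpolation rule is an assumption.
    q1, _median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)
```

Unlike RPD, which divides the standard deviation by the RMSE, this ratio depends only on the middle half of the observed distribution, so a few extreme high-salinity samples do not inflate it.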
The core advantages of the ensemble model stem primarily from its robust nonlinear fitting and generalization capabilities. The spatial variability of coastal saline-alkali land is a complex process driven by the coupling of multiple natural and anthropogenic factors, and its mapping to multispectral features is highly nonlinear and subject to abrupt local changes. Any single machine learning model inevitably carries its own inductive bias [51]. Because of this bias, a single model can only approximate the true nonlinear relationship from one narrow perspective, which makes it difficult to accommodate both stable global trends and local extreme values at the same time. The merit of an ensemble model lies in its capacity to integrate base learners. In this study, three base learners were combined: the Ridge model, a linear regression with an L2 penalty; the nonlinear RF learner, which aggregates trees grown in parallel on bootstrap samples and random feature subspaces; and the Extremely Randomized Trees (ET) model, which maximizes the randomness of node splitting. Because these learners process the data with dissimilar underlying logic, their residuals on the same sample set are largely uncorrelated and complementary [52]. This provides a robust basis for the XGBoost aggregator to correct systematic biases. The essence of an ensemble model lies not in selecting a single optimal model but in combining multiple suboptimal yet diverse models. By integrating base learners with different mechanisms, the model can capture latent patterns in the data from multiple perspectives, and the meta-learner can integrate multi-layered feature information, thereby maximizing the predictive value of the ensemble [51,52].
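The stacking structure described above can be sketched with scikit-learn as follows. This is an illustrative configuration, not the one used in this study: all hyperparameters are placeholders, the function name `build_stacking_model` is hypothetical, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for the XGBoost aggregator so that the example depends on a single library.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge

def build_stacking_model(seed=0):
    """Ridge + RF + ET base learners combined by a boosted-tree meta-learner."""
    base_learners = [
        ("ridge", Ridge(alpha=1.0)),  # linear model with L2 penalty
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=seed)),
    ]
    # Stand-in for the XGBoost aggregator (assumption made for this sketch).
    meta_learner = GradientBoostingRegressor(
        n_estimators=50, max_depth=2, random_state=seed
    )
    # cv=5: the meta-learner is trained on out-of-fold base predictions,
    # so it never sees predictions made on a base learner's own training data.
    return StackingRegressor(
        estimators=base_learners, final_estimator=meta_learner, cv=5
    )
```

Training the meta-learner on out-of-fold predictions is what lets it learn how the base learners' errors differ without simply rewarding whichever learner overfits the training set most.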
4.3. Feasibility of Constructing Salinity Inversion Models in Small Sample Size Scenarios
In fine-scale regional UAV quantitative remote sensing, sample numbers are constrained by geographical limitations, harsh field sampling environments, and the high cost and long turnaround of laboratory measurements. How to build inversion models with acceptable predictive performance and reliability from a limited number of samples has therefore long been a critical challenge in agricultural remote sensing [53]. To address the overfitting that small samples readily induce, this study adopted strategies in three areas (sampling design, feature dimensionality reduction, and algorithm architecture), which supports the feasibility of constructing robust models under small-sample conditions.
First, during data sampling and dataset partitioning, this study adopted a sampling strategy that balanced randomness and representativeness. Although the overall sample size was limited, the sampling points evenly covered the salinity gradient from non-salinized to severely salinized soils. The statistical analysis (Table 4) shows that the mean, standard deviation, and coefficient of variation were highly consistent between the training set and the test set. This sample structure kept the data distribution stable and avoided the unreliable model evaluation that an uneven data distribution can cause.
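A consistency check of this kind can be sketched as follows. The function names are hypothetical, and the sketch does not reproduce the values in Table 4; it only shows how the mean, standard deviation, and coefficient of variation of two splits might be compared.

```python
import statistics

def describe(values):
    """Mean, sample standard deviation, and coefficient of variation (%)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {"mean": mean, "sd": sd, "cv_percent": 100.0 * sd / mean}

def compare_splits(train, test):
    """Relative difference of each statistic between the two splits."""
    a, b = describe(train), describe(test)
    return {key: abs(a[key] - b[key]) / abs(a[key]) for key in a}
```

Small relative differences across all three statistics suggest that the test set spans a salinity range similar to the training set, so the test metrics are less likely to reflect a distribution shift between the splits.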
Second, model complexity was reduced through feature dimensionality reduction. In small-sample learning, an excess of high-dimensional input features is often a primary cause of overfitting. In this study, PSO-SFLA was used to reduce the initial 37 multispectral and index features to 11 core variables. This process removed a large amount of redundant spectral information and background noise and effectively controlled the dimensionality of the input data. Such targeted dimensionality reduction allowed the machine learning models to focus on relationships that are strongly correlated with soil salinity and limited their tendency to learn local random interference.
Finally, the stability of the model was improved by exploiting the inherent advantages of the ensemble model. In an ensemble model, base models with distinct mechanisms are combined to work together. The Ridge model constrains the influence of outliers through linear regularization, whereas RF and ET build decision trees with randomized sampling and feature splitting, which gives them inherent resistance to overfitting [54]. The XGBoost model then corrects the prediction residuals of the base models. This hybrid strategy, which combines a linear model, tree models, and a boosting algorithm, effectively balanced the bias and variance of the model. The experimental results support this: under the optimal configuration, the performance of the model on the training set (R² = 0.812, SRMSE = 0.285) was close to that on the independent test set (R² = 0.758, SRMSE = 0.315), indicating only a modest drop in accuracy. This suggests that the above mechanisms effectively mitigated the loss of generalization capability caused by the limited sample size.
4.4. Shortcomings and Prospects
Although this study has made positive progress in UAV-based soil salinity inversion in coastal saline-alkali farmland, several limitations should be acknowledged. First, concerning data timeliness and seasonal bias, the field soil samples and UAV multispectral images used in this study were collected only on 4 December 2025, during the winter wheat seedling stage. During this period, the vegetation coverage was relatively low and most of the soil surface remained exposed, which was beneficial for capturing soil background reflectance information. However, soil salinity in coastal farmland is strongly affected by seasonal variations in precipitation, evaporation, groundwater depth, irrigation, and crop growth conditions. Therefore, the applicability and robustness of the model during other key periods, such as the spring irrigation period, summer crop growth period, and autumn fallow period, still require further validation and assessment.
Second, regarding sample size and spatial representativeness, although the 90 soil samples collected using a grid-based random sampling strategy were sufficient for basic model construction and accuracy evaluation, the sample size was still relatively limited for fully characterizing the complex spatial heterogeneity of soil salinity in coastal saline-alkali farmland. In particular, the prediction accuracy at high-salinity points was lower than that at low-salinity points, indicating that the model may still have uncertainty when estimating extreme salinity conditions. Therefore, caution should be exercised when directly extrapolating the proposed model to larger coastal saline-alkali regions or areas with different soil, hydrological, and management conditions.
Future research will focus on several key directions. First, multi-season and multi-year UAV observations should be conducted to reveal the temporal dynamics of soil salinity and improve the seasonal robustness of the inversion model. Second, larger and more representative field datasets should be collected across different coastal saline-alkali regions to strengthen the spatial transferability and external validation of the model. Third, multi-source data fusion should be further explored by integrating UAV multispectral imagery with thermal infrared data, LiDAR-derived terrain factors, soil moisture information, groundwater indicators, and meteorological variables. Finally, future studies should further improve model lightweighting, cross-regional validation, and mechanistic interpretability, so that UAV-based soil salinity inversion can be more effectively applied to precision agriculture and dynamic monitoring of coastal saline-alkali land.