Author Contributions
Conceptualization, S.Y. and L.S.; methodology, S.Y., L.S. and W.D.; software, S.Y.; validation, S.Y., W.D. and D.W.; formal analysis, S.Y. and L.S.; investigation, S.Y., W.D., D.W. and H.G.; resources, L.S., L.Y. and Y.Z.; data curation, S.Y., D.W. and H.G.; writing—original draft preparation, S.Y.; writing—review and editing, L.S., A.R., R.L., H.G. and L.Y.; visualization, S.Y. and D.W.; supervision, L.S.; project administration, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Overview of the proposed workflow. Stage 1 summarizes UAV hyperspectral acquisition, preprocessing, and the construction of the real labeled set (X_real, Y_real) with outlier removal. Stage 2 illustrates the conditional S-WGAN-GP with reflectance-domain constraints (SAM loss, spectral TV loss, range penalty), followed by Teacher-based pseudo-labeling (PLSR + RF) and metric-based filtering. The final regression is trained using Real-only versus Real + Filtered-Syn, and all validations are performed on real samples only (LOOCV).
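The three reflectance-domain constraints named in the caption can be illustrated with a minimal sketch. The exact formulations and weights used in the S-WGAN-GP are not given in this section, so the functions below use the textbook definitions (spectral angle, total variation along the band axis, squared excursion outside an assumed [0, 1] reflectance range). The function names and the toy spectra are hypothetical.

```python
import math

def spectral_angle(a, b):
    """Spectral angle (radians) between two non-zero spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

def spectral_tv(spectrum):
    """Total variation along the band axis: sum of |r[i+1] - r[i]|."""
    return sum(abs(nxt - cur) for cur, nxt in zip(spectrum, spectrum[1:]))

def range_penalty(spectrum, low=0.0, high=1.0):
    """Sum of squared excursions outside the assumed valid range [low, high]."""
    return sum(max(0.0, low - r) ** 2 + max(0.0, r - high) ** 2 for r in spectrum)

# Toy spectra (hypothetical values): same shape, doubled brightness.
real = [0.10, 0.12, 0.15, 0.20]
fake = [0.20, 0.24, 0.30, 0.40]

angle = spectral_angle(real, fake)             # close to zero: SAM ignores scaling
roughness = spectral_tv(real)                  # small for a smooth spectrum
violation = range_penalty([-0.1, 0.5, 1.2])    # penalizes values outside [0, 1]
```

The spectral angle is insensitive to a uniform brightness change, which is why a separate range penalty is useful for keeping generated reflectance physically plausible.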
Figure 2.
Overview map of the study area.
Figure 3.
UAV images and sample point distribution.
Figure 4.
Salt sample organization.
Figure 5.
(a) DJI Matrice 300 RTK and (b) S185 sensors.
Figure 6.
(a) Residual ranking plot from the 5-fold cross-validated SVR model. Samples are sorted by absolute prediction residual |ECe_pred − ECe_obs|; the vertical dashed line indicates the residual threshold (≈14.5 dS⋅m−1) used to flag outliers. Green points denote retained samples (N = 47), and red points denote removed samples (N = 13). (b) Measured–predicted consistency plot for the same SVR-CV results. The dashed line represents the 1:1 reference; retained samples cluster closer to the 1:1 line, whereas removed samples show larger deviations, indicating spectral–chemical inconsistency and motivating their exclusion.
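The residual-based flagging described in panel (a) can be sketched as follows. This is a minimal illustration, not the authors' code: the cross-validated predictions are taken as a plain input list (in practice they would come from the 5-fold SVR), the function name `flag_outliers` and the sample values are hypothetical, and treating a residual equal to the threshold as retained is an assumption.

```python
def flag_outliers(observed, predicted, threshold=14.5):
    """Rank samples by absolute residual and split them at the threshold.

    Returns (retained, removed) as lists of sample indices, each ordered
    by increasing absolute residual.
    """
    residuals = [abs(p - o) for o, p in zip(observed, predicted)]
    ranking = sorted(range(len(residuals)), key=lambda i: residuals[i])
    # Assumption: a residual exactly equal to the threshold is retained.
    retained = [i for i in ranking if residuals[i] <= threshold]
    removed = [i for i in ranking if residuals[i] > threshold]
    return retained, removed

# Hypothetical ECe values in dS/m: measured vs. cross-validated prediction.
observed = [5.0, 12.0, 30.0, 48.0]
predicted = [6.5, 10.0, 14.0, 20.0]

retained, removed = flag_outliers(observed, predicted)
print(retained, removed)
```

Because the predictions come from held-out folds, a large residual reflects a mismatch between a sample's spectrum and its measured salinity rather than overfitting to that sample.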
Figure 7.
Distribution of cleaned soil salinity (ECe) samples across the three sampling sites. The colored shaded areas represent the probability density distribution of the data (violin plots), and the correspondingly colored circles represent individual sample measurements (green for Site A, orange for Site B, and blue for Site C). The internal box plots indicate the median (red horizontal line), interquartile range (box), and overall data range (whiskers) for each site.
Figure 8.
Slice display of the correlation between soil spectral characteristics and soil salinity based on the Optimal Band Combination Analysis (OBCA) algorithm.
Figure 9.
Site-wise selection frequency of the robust OBCA band set.
Figure 10.
Robustness score profile of the selected bands.
Figure 11.
Training dynamics and spectral fidelity evaluation of the S-WGAN-GP model. (a) Evolution of the Generator (G) and Critic (D) losses and the Spectral Angle Mapper (SAM) during training steps; (b) Comparison of the mean spectral reflectance between real observed samples (solid blue line) and GAN-generated synthetic samples (dashed red line).
Figure 12.
Comparison of the mean spectral reflectance between real observed samples (solid blue line) and GAN-generated synthetic samples (dashed red line).
Figure 13.
Visualization of data manifold alignment using a two-dimensional t-Distributed Stochastic Neighbor Embedding (t-SNE) of real labeled samples (blue), unlabeled samples (orange), and generated synthetic samples (green). The plot illustrates the distribution coverage of the three groups and highlights the bridging of the feature gap. The x–y axes are dimensionless embedding coordinates; only relative distances and neighborhood structure are meaningful.
Figure 14.
(a) Frequency distribution of the predicted soil salinity (ECe) for the generated samples; (b) Distribution of the prediction confidence scores from the Teacher framework; (c) Distribution of the prediction uncertainty (standard deviation), indicating the stability of the pseudo-labels.
Figure 15.
Spatial distribution maps of predicted soil salinity (ECe) across the three experimental areas (Site A, Site B, and Site C) derived from the final semi-supervised inversion model. The solid black lines delineate the boundaries of the respective sampling sites.
Figure 16.
Measured versus predicted soil salinity (ECe) for four regression models (RF, PLSR, SVR, and XGB) evaluated on real observations using LOOCV. The top row shows the supervised baseline trained with real samples only, whereas the bottom row shows models trained with the combined real + synthetic dataset. Each blue dot represents a held-out real sample; the orange dotted line denotes the 1:1 reference line, the blue dashed line indicates the linear fitted trend, and the light orange shaded area represents the uncertainty band (e.g., 95% confidence interval) of the fit. Performance statistics (R2, RMSE, MAE, and RPD) are reported in each panel.
Figure 17.
High-resolution spatial distribution maps of predicted soil salinity (ECe) across the three study sites.
Table 1.
UAV sensor parameters.
| Parameters | Specifications |
|---|---|
| Spectral range | 450–950 nm |
| Spectral resolution | 8 nm @ 532 nm |
| Sampling interval | 4 nm |
| Number of channels | 125 |
| Measurement time | 0.1–1000 ms |
| Digital resolution | 12 bit |
| High-resolution imaging speed | 5 Cubes/s |
| Weight | 490 g |
Table 2.
Descriptive statistics of soil salinity (ECe) before and after outlier removal.
| Dataset | N | Min | Max | Mean | Std | CV (%) |
|---|---|---|---|---|---|---|
| Original | 60 | 1.65 | 51.22 | 16.87 | 13.35 | 79.12 |
| Cleaned | 47 | 1.65 | 29.95 | 12.39 | 8.97 | 72.43 |
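As a quick consistency check on the CV (%) column, the coefficient of variation can be recomputed from the tabulated mean and standard deviation. This sketch assumes the usual definition CV = 100 × Std / Mean. Small differences from the tabulated values are expected because the table inputs are already rounded to two decimals.

```python
def coefficient_of_variation(mean, std):
    """Coefficient of variation in percent: 100 * std / mean."""
    return 100.0 * std / mean

# Mean and Std taken from the Original and Cleaned rows of the table.
cv_original = coefficient_of_variation(16.87, 13.35)
cv_cleaned = coefficient_of_variation(12.39, 8.97)

print(f"Original: {cv_original:.2f}%  Cleaned: {cv_cleaned:.2f}%")
```

Both recomputed values agree with the table to within rounding, and the drop in CV after cleaning reflects the removal of the high-salinity extremes (the maximum falls from 51.22 to 29.95).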
Table 3.
Optimal band combinations and correlation coefficients (|r|) for the six constructed spectral indices (TBI1–TBI6) based on the global dataset.
| Feature | Band combination | Absolute Pearson r |
|---|---|---|
| TBI1 | R30, R38, R124 | 0.717387 |
| TBI2 | R29, R31, R38 | 0.668571 |
| TBI3 | R4, R26, R38 | 0.558812 |
| TBI4 | R6, R24, R74 | 0.613876 |
| TBI5 | R29, R38, R125 | 0.705115 |
| TBI6 | R29, R38, R125 | 0.688971 |
Table 4.
Optimal band combinations and absolute Spearman correlation coefficients (|ρ|) for the six constructed spectral indices (TBI1–TBI6) based on the global dataset.
| Feature | Band combination | Absolute Spearman ρ |
|---|---|---|
| TBI1 | R30, R38, R124 | 0.75266 |
| TBI2 | R30, R31, R38 | 0.702012 |
| TBI3 | R5, R25, R37 | 0.569843 |
| TBI4 | R6, R23, R76 | 0.600729 |
| TBI5 | R29, R38, R125 | 0.739015 |
| TBI6 | R29, R38, R125 | 0.707216 |
Table 5.
The final subset of 16 robust salinity-sensitive bands selected based on cross-scenario evaluation.
| Band ID | Center Wavelength (nm) | Spectral Region | Robustness Score | Primary Feature Support |
|---|---|---|---|---|
| R25 | 547 | Visible (Green) | 18 | TBI2, TBI3 |
| R27 | 555 | Visible (Green) | 15 | TBI1, TBI5 |
| R29 | 563 | Visible (Green) | 11 | TBI2, TBI5, TBI6 |
| R58 | 680 | Red Edge | 16.5 | TBI4 |
| R77 | 756 | NIR | 9 | TBI2 |
| R97 | 837 | NIR | 13 | TBI3, TBI4 |
| R100 | 849 | NIR | 14 | TBI2 |
| R102 | 857 | NIR | 14.8 | TBI1, TBI4 |
| R107 | 877 | NIR | 10 | TBI3 |
| R109 | 885 | NIR | 37.5 | All (High Diversity) |
| R111 | 894 | NIR | 25.5 | TBI1, TBI3, TBI5 |
| R115 | 910 | NIR | 22 | TBI1, TBI6 |
| R118 | 922 | NIR | 23 | TBI2, TBI5 |
| R121 | 934 | NIR | 12 | TBI4 |
| R123 | 942 | NIR | 9.5 | TBI2, TBI6 |
| R125 | 950 | NIR | 30 | All (High Diversity) |
Table 6.
Site-wise quality envelope of the final retained synthetic samples after stage filtering.
| Site | N_Real_Original | N_Syn_Final | SAM_Deg (Max) | Dscore (Min) | Conf (Min) | S_Std (Max) |
|---|---|---|---|---|---|---|
| A | 15 | 472 | ≤0.556 | ≥4.226 | ≥0.472 | ≤10.479 |
| B | 16 | 432 | ≤0.895 | ≥4.181 | ≥0.474 | ≤10.331 |
| C | 16 | 282 | ≤0.738 | ≥4.381 | ≥0.546 | ≤7.35 |
Table 7.
Accuracy comparison of inversion models using LOOCV based on Baseline (Real only) and Augmented (Real + Synthetic) strategies.
| Strategy | Training Data Size (N) | Model | R2 | RMSE (dS⋅m−1) | MAE | RPD | Improvement (R2) |
|---|---|---|---|---|---|---|---|
| Real-only training (Baseline) | 47 (Original Only) | SVR | 0.36 | 7.06 | 5.78 | 1.27 | - |
| | | RF | 0.37 | 7.01 | 5.65 | 1.28 | - |
| | | XGB | 0.28 | 7.52 | 6.08 | 1.19 | - |
| | | PLSR | 0.23 | 7.78 | 6.07 | 1.15 | - |
| Real + synthetic training (Proposed) | 1233 (Original + Generated) | SVR | 0.60 | 5.57 | 4.63 | 1.61 | +66.7% |
| | | RF | 0.54 | 5.97 | 4.97 | 1.50 | +45.9% |
| | | XGB | 0.52 | 6.13 | 5.07 | 1.46 | +85.7% |
| | | PLSR | 0.44 | 6.59 | 5.14 | 1.36 | +91.3% |
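The derived columns of this table can be reproduced with a short sketch. It assumes that the Improvement (R2) column is the relative gain over the baseline R2 and that RPD is the standard deviation of the observed values divided by RMSE, using the cleaned-set standard deviation of 8.97 from Table 2. Both are standard definitions but are assumptions here, and the function names are hypothetical.

```python
def relative_improvement(r2_base, r2_new):
    """Relative R2 gain over the baseline, in percent."""
    return 100.0 * (r2_new - r2_base) / r2_base

def rpd(std_obs, rmse):
    """Ratio of performance to deviation: observed std divided by RMSE."""
    return std_obs / rmse

STD_CLEANED = 8.97  # Std of the cleaned ECe samples (Table 2), assumed basis for RPD

# (R2, RMSE) per model, copied from the table.
baseline = {"SVR": (0.36, 7.06), "RF": (0.37, 7.01),
            "XGB": (0.28, 7.52), "PLSR": (0.23, 7.78)}
augmented = {"SVR": (0.60, 5.57), "RF": (0.54, 5.97),
             "XGB": (0.52, 6.13), "PLSR": (0.44, 6.59)}

gains = {m: round(relative_improvement(baseline[m][0], augmented[m][0]), 1)
         for m in baseline}
rpd_augmented = {m: round(rpd(STD_CLEANED, augmented[m][1]), 2) for m in augmented}

# Training size: 47 real samples plus the retained synthetic samples per site (Table 6).
n_train = 47 + 472 + 432 + 282

print(gains, rpd_augmented, n_train)
```

Note that the largest relative gains belong to the models with the weakest baselines (PLSR and XGB), while SVR reaches the highest absolute R2 after augmentation.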