Author Contributions
Conceptualization, A.L.; methodology, A.L. and T.S.; software, A.L., W.L., T.S., Y.J. and J.X.; validation, A.L., T.S. and C.S.; formal analysis, A.L. and T.S.; investigation, A.L. and T.S.; resources, A.L. and T.S.; data curation, A.L. and T.S.; writing—original draft preparation, A.L.; writing—review and editing, Z.Z., W.F. and C.S.; visualization, A.L. and T.S.; supervision, Z.Z., W.F., C.S., Y.J. and J.X.; project administration, A.L. and C.S.; funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Flowchart of the machine learning approach to reconstructing SCS Chl-a concentrations.
Figure 2.
Correlation analysis of the explanatory and predictor variables: Pearson’s correlation matrix plot. Chl-a, chlorophyll-a concentration; Lon, longitude; Lat, latitude; Dep, topography; WSP, wind speed; WSC, wind stress curl; SST, sea-surface temperature; TP, total precipitation; SLHF, sea-surface latent heat flux; SSHF, sea-surface sensible heat flux; LWRF, longwave radiation flux; SWRF, shortwave radiation flux; SLP, sea-level pressure.
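A correlation matrix like the one in Figure 2 can be sketched in a few lines. The example below is a minimal illustration of Pearson's coefficient, not the authors' code; the helper names (`pearson_r`, `correlation_matrix`) and the toy values are assumptions made up for this sketch and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative values (hypothetical), not data from the study.
toy = {
    "Chl-a": [0.10, 0.15, 0.30, 0.45, 0.60],
    "SST": [303.0, 302.0, 300.5, 299.0, 297.5],
    "WSP": [3.0, 4.0, 6.5, 7.0, 9.0],
}
matrix = correlation_matrix(toy)
```

Each entry `matrix[a][b]` lies in [-1, 1]; the diagonal is 1 and the matrix is symmetric, which is why such plots are often drawn as a single triangle.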
Figure 3.
Frequency histograms of the explanatory and predictor variables: (a) log₁₀(Chl-a); (b) depth; (c) monthly mean sea-surface latent heat flux; (d) monthly mean sea-surface sensible heat flux; (e) monthly mean sea-surface net shortwave radiation flux; (f) wind speed.
Figure 4.
Schematic diagram of the leave-one-out cross-validation method.
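The leave-one-out scheme in Figure 4 can be illustrated with a generic sketch: each sample is held out once, the model is fitted on the rest, and the held-out prediction error is accumulated. This section does not state what the study treats as the held-out unit, so the per-sample split, the helper names, and the placeholder mean model below are all assumptions for illustration.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, holding out one sample per fold."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def loocv_rmse(xs, ys, fit, predict):
    """Root-mean-square error of the held-out predictions over all folds."""
    squared_errors = []
    for train, test in leave_one_out_splits(len(xs)):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.append((predict(model, xs[test]) - ys[test]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Placeholder model (hypothetical): always predicts the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = [0, 1, 2, 3]
ys = [1.0, 2.0, 3.0, 4.0]
score = loocv_rmse(xs, ys, fit_mean, predict_mean)
```

Passing `fit` and `predict` as arguments keeps the splitting logic independent of the model, so the same loop could wrap any of the four algorithms compared in the study.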
Figure 5.
Time series of Chl-a concentration missing ratios in the SCS.
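A missing ratio of the kind plotted in Figure 5 is the fraction of grid cells without a valid Chl-a value at each time step. The sketch below assumes that missing pixels are stored as NaN and ignores any land mask; both are assumptions for illustration, and the tiny grids are made up.

```python
import math


def missing_ratio(grid):
    """Fraction of cells in a 2-D grid (list of rows) whose value is NaN."""
    cells = [value for row in grid for value in row]
    if not cells:
        raise ValueError("grid must contain at least one cell")
    missing = sum(1 for value in cells if math.isnan(value))
    return missing / len(cells)


nan = float("nan")

# Two made-up 2 x 2 monthly grids (hypothetical values, not from the study).
monthly_grids = [
    [[0.12, nan], [0.30, 0.25]],
    [[nan, nan], [0.40, nan]],
]
ratios = [missing_ratio(grid) for grid in monthly_grids]
```

Applying the same function to every monthly grid gives one ratio per month, which is the series a plot like Figure 5 would show.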
Figure 6.
Spatial variations in the Chl-a concentration missing ratios across seasons in the SCS: (a) spring; (b) summer; (c) autumn; (d) winter.
Figure 7.
Validation of the spatial imputation results for the monthly mean Chl-a concentration in the SCS: (a) R²; (b) RMSE; (c) MAE.
Figure 8.
Frequency distributions of the Chl-a concentration residuals in the SCS region interpolated based on the month for the four machine learning methods: (a) KNN in January; (b) KNN in July; (c) MLP in January; (d) MLP in July; (e) RF in January; (f) RF in July; (g) GBDT in January; (h) GBDT in July. μ, mean; σ, standard deviation.
Figure 9.
Frequency distributions of the Chl-a concentration residuals in the SCS region interpolated based on the missing ratio for the four machine learning methods: (a) KNN in January; (b) KNN in July; (c) MLP in January; (d) MLP in July; (e) RF in January; (f) RF in July; (g) GBDT in January; (h) GBDT in July. μ, mean; σ, standard deviation.
Figure 10.
Spatial distribution of the Chl-a concentration in the South China Sea predicted by the RF model based on the month and based on the missing ratio: (a) raw Chl-a data for January 2018; (b) RF prediction based on the month, January 2018; (c) RF prediction based on the missing ratio, January 2018; (d) raw Chl-a data for May 2018; (e) RF prediction based on the month, May 2018; (f) RF prediction based on the missing ratio, May 2018.
Figure 11.
Scatter plots of the satellite-derived monthly mean Chl-a concentrations versus those predicted by the RF model based on the month and based on the missing ratio, for January and May 2018: (a) January, based on the month; (b) May, based on the month; (c) January, based on the missing ratio; (d) May, based on the missing ratio.
Figure 12.
Frequency distributions of the residuals from the RF methods for the spatial imputation of the Chl-a concentrations in the SCS: (a,b) predicted based on the month; (c,d) predicted based on the missing ratio. μ, mean; σ, standard deviation.
Table 1.
Description of the datasets used in the study. Chl-a, chlorophyll-a concentration; Lon, longitude; Lat, latitude; Dep, depth; WSP, wind speed; WSC, wind stress curl; SST, sea-surface temperature; TP, total precipitation; SLHF, sea-surface latent heat flux; SSHF, sea-surface sensible heat flux; LWRF, longwave radiation flux; SWRF, shortwave radiation flux; SLP, sea-level pressure.
Dataset | Unit | Min | Max | Spatial Resolution | Grid Size (Pixel) |
---|---|---|---|---|---|
Chl-a | mg·m⁻³ | 0 | 26.6 | 0.04° × 0.04° | 557 × 600 × 240 |
Lon | ° | 99 | 122.5 | 0.25° × 0.25° | 101 × 109 |
Lat | ° | 0 | 23.5 | 0.25° × 0.25° | 101 × 109 |
Dep | m | −5008 | −1 | 0.016° × 0.016° | 1410 × 1409 |
WSP | m·s⁻¹ | 1.4 | 15.4 | 0.25° × 0.25° | 101 × 109 × 240 |
WSC | N·m⁻³ | −2 × 10⁻⁷ | 2.5 × 10⁻⁷ | 0.25° × 0.25° | 101 × 109 × 240 |
SST | K | 285.6 | 304.8 | 0.25° × 0.25° | 101 × 109 × 240 |
TP | m | 0 | 0.04 | 0.25° × 0.25° | 101 × 109 × 240 |
SLHF | J·m⁻² | −3.7 × 10⁷ | 1.2 × 10⁶ | 0.25° × 0.25° | 101 × 109 × 240 |
SSHF | J·m⁻² | −9.8 × 10⁶ | 1.4 × 10⁶ | 0.25° × 0.25° | 101 × 109 × 240 |
LWRF | W·m⁻² | −100.8 | −15.4 | 0.25° × 0.25° | 101 × 109 × 240 |
SWRF | W·m⁻² | 57.7 | 293.2 | 0.25° × 0.25° | 101 × 109 × 240 |
SLP | Pa | 1.00 × 10⁵ | 1.02 × 10⁵ | 0.25° × 0.25° | 101 × 109 × 240 |
Table 2.
Hyperparameters and alternative values for ML algorithms.
ML Algorithm | Hyperparameter | Alternative Values |
---|---|---|
MLP | hidden_layer_sizes | (100 × 1), (50 × 2), (20 × 3) |
MLP | activation | ‘relu’, ‘tanh’ |
MLP | solver | ‘adam’, ‘sgd’ |
MLP | alpha | 0.0001, 0.001, 0.01 |
RF | n_estimators | 50, 100, 150 |
RF | max_depth | 10, 20, 30, 40 |
RF | min_samples_split | 2, 5, 10 |
RF | min_samples_leaf | 1, 2, 4 |
GBDT | n_estimators | 100, 200, 300 |
GBDT | learning_rate | 0.01, 0.1, 0.5 |
GBDT | max_depth | 3, 5, 7 |
GBDT | min_samples_split | 2, 5, 10 |
GBDT | min_samples_leaf | 1, 2, 4 |
KNN | k_values | 3, 5, 7, 9, 11 |
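To give a sense of the search space that Table 2 implies, the sketch below enumerates every combination of the listed values with the standard library. It is an illustration only: this section does not say whether the authors searched the grid exhaustively, and reading "(50 × 2)" as two hidden layers of 50 units is an assumption. The names `PARAM_GRIDS` and `expand_grid` are hypothetical.

```python
from itertools import product

# Candidate values transcribed from Table 2. The tuple reading of
# hidden_layer_sizes (e.g. "(50 x 2)" -> (50, 50)) is an assumption.
PARAM_GRIDS = {
    "MLP": {
        "hidden_layer_sizes": [(100,), (50, 50), (20, 20, 20)],
        "activation": ["relu", "tanh"],
        "solver": ["adam", "sgd"],
        "alpha": [0.0001, 0.001, 0.01],
    },
    "RF": {
        "n_estimators": [50, 100, 150],
        "max_depth": [10, 20, 30, 40],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    "GBDT": {
        "n_estimators": [100, 200, 300],
        "learning_rate": [0.01, 0.1, 0.5],
        "max_depth": [3, 5, 7],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    "KNN": {"k_values": [3, 5, 7, 9, 11]},
}


def expand_grid(grid):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


counts = {algo: len(expand_grid(grid)) for algo, grid in PARAM_GRIDS.items()}
```

Under these assumptions an exhaustive search would evaluate 36 MLP, 108 RF, 243 GBDT, and 5 KNN configurations, each multiplied by the number of cross-validation folds.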
Table 3.
The three performance metrics and their formulas.
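Metrics | Formula |
---|---|
RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$ |
MAE | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$ |
R² | $R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$ |
Standard definitions, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of samples.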
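For readers who want to reproduce the three metrics, the sketch below implements the usual textbook definitions of RMSE, MAE, and R² in plain Python. It assumes those standard definitions are the ones used in the study; the function names and the sample values are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1 - ss_res / ss_tot


# Made-up illustrative values (hypothetical), not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = (rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Because R² compares the model with a constant prediction at the observed mean, it becomes negative whenever the model does worse than that baseline.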
Table 4.
Explanatory and predictor variables used in the machine learning models. Data size is the amount of real data used for training.
Variables | Data | Data Size (Pixels) |
---|---|---|
Explanatory variables | depth | 26,886 |
Explanatory variables | wind speed | 6,130,008 |
Explanatory variables | monthly mean sea-surface net shortwave radiation flux | 6,130,008 |
Explanatory variables | monthly mean sea-surface sensible heat flux | 6,130,008 |
Explanatory variables | monthly mean sea-surface latent heat flux | 6,130,008 |
Predictor variables | Chl-a | 6,130,008 |
Table 5.
Performance of the four machine learning models under the three missing data ratio scenarios.
Missing Ratio (%) | Evaluation Metrics | MLP | RF | GBDT | KNN |
---|---|---|---|---|---|
(0~5) | RMSE | 0.26 | 0.28 | 0.25 | 0.35 |
(0~5) | R² | 0.65 | 0.71 | 0.68 | 0.56 |
(0~5) | MAE | 0.12 | 0.10 | 0.11 | 0.11 |
(5~15) | RMSE | 0.34 | 0.29 | 0.28 | 0.45 |
(5~15) | R² | 0.29 | 0.62 | 0.51 | 0.16 |
(5~15) | MAE | 0.16 | 0.12 | 0.14 | 0.15 |
(15~) | RMSE | 0.35 | 0.30 | 0.29 | 0.48 |
(15~) | R² | 0.27 | 0.66 | 0.54 | 0.23 |
(15~) | MAE | 0.21 | 0.14 | 0.16 | 0.18 |
Table 6.
R² values of the random forest predictions based on the month and based on the missing ratio. RF_BM, prediction based on the month; RF_BMR, prediction based on the missing ratio; MR, missing data ratio.
Month | RF_BM | RF_BMR | MR |
---|---|---|---|
January | 0.617166 | 0.778624 | 39.87% |
February | 0.802298 | 0.692879 | 17.12% |
March | 0.587013 | −0.03889 | 16.80% |
April | 0.864126 | 0.45351 | 9.8% |
May | 0.851278 | 0.739405 | 2.29% |
June | 0.863903 | 0.73687 | 15.04% |
July | 0.835558 | 0.670975 | 25.7% |
August | 0.851366 | 0.733656 | 32.59% |
September | 0.827067 | 0.76705 | 15.88% |
October | 0.744168 | 0.716628 | 10.89% |
November | 0.882253 | 0.874455 | 16.07% |
December | 0.860779 | 0.82279 | 28.13% |
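The monthly R² values in Table 6 can be summarized with a short script. The sketch below only restates the table arithmetically (a month-by-month comparison and the simple mean per method); the variable names are hypothetical, and an unweighted mean across months is an assumption about how one might aggregate, not a result reported by the authors.

```python
# (RF_BM, RF_BMR) R2 values transcribed from Table 6.
R2_BY_MONTH = {
    "January": (0.617166, 0.778624),
    "February": (0.802298, 0.692879),
    "March": (0.587013, -0.03889),
    "April": (0.864126, 0.45351),
    "May": (0.851278, 0.739405),
    "June": (0.863903, 0.73687),
    "July": (0.835558, 0.670975),
    "August": (0.851366, 0.733656),
    "September": (0.827067, 0.76705),
    "October": (0.744168, 0.716628),
    "November": (0.882253, 0.874455),
    "December": (0.860779, 0.82279),
}

# Months in which the month-based prediction has the higher R2.
months_bm_better = [m for m, (bm, bmr) in R2_BY_MONTH.items() if bm > bmr]

# Unweighted mean R2 over the twelve months for each method.
mean_bm = sum(bm for bm, _ in R2_BY_MONTH.values()) / len(R2_BY_MONTH)
mean_bmr = sum(bmr for _, bmr in R2_BY_MONTH.values()) / len(R2_BY_MONTH)
```

With the tabulated values, the month-based prediction has the higher R² in every month except January, and the unweighted means are about 0.80 (RF_BM) and 0.66 (RF_BMR).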