A Systematic Review of Machine Learning Algorithms for Soil Pollutant Detection Using Satellite Imagery
Abstract
1. Introduction
2. Materials and Methods
2.1. Eligibility Criteria: Search Characteristics
2.2. Characteristics of the Accepted Studies
2.2.1. Participants
2.2.2. Detections Considered and Evaluated
- (A) Assessment of satellite imagery: Analysis of images captured by satellites.
- (B) Machine learning methodologies: Evaluation of machine learning techniques used for satellite image analysis.
- (C) Identification of contaminated soil types: Detection and assessment of contamination types by applying the machine learning methods in (B) to the satellite imagery in (A).
- (D) Validation of results: Verification of the contaminated soil types identified in (C) by comparison with on-ground sampling data.
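Validating against on-ground sampling data requires pairing each field sample with the satellite pixel that covers it. The sketch below shows one way to do this, assuming a north-up raster held as a list of rows, sample coordinates already in the raster's coordinate reference system, and the six-coefficient affine geotransform order used by GDAL. The function names `pixel_at` and `pair_samples`, and all numbers in the usage lines, are hypothetical.

```python
import math
from typing import Optional, Sequence

# Affine geotransform in GDAL's order:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = tuple[float, float, float, float, float, float]

def pixel_at(raster: Sequence[Sequence[float]], gt: GeoTransform,
             x: float, y: float) -> Optional[float]:
    """Return the value of the pixel containing map point (x, y), or None if outside.

    Assumes a north-up image (both rotation terms zero) and that (x, y) is in
    the same coordinate reference system as the raster.
    """
    x0, px_w, rot_r, y0, rot_c, px_h = gt
    if rot_r != 0 or rot_c != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    col = math.floor((x - x0) / px_w)
    row = math.floor((y - y0) / px_h)  # px_h is negative for north-up images
    if 0 <= row < len(raster) and 0 <= col < len(raster[row]):
        return raster[row][col]
    return None

def pair_samples(raster, gt, samples):
    """Pair each (x, y, measured_value) field sample with its pixel value."""
    pairs = []
    for x, y, measured in samples:
        value = pixel_at(raster, gt, x, y)
        if value is not None:  # drop samples that fall outside the scene
            pairs.append((value, measured))
    return pairs

# Hypothetical 3 x 3 band at 30 m resolution; values are made up.
band = [[0.11, 0.12, 0.13],
        [0.21, 0.22, 0.23],
        [0.31, 0.32, 0.33]]
gt = (500000.0, 30.0, 0.0, 4000000.0, 0.0, -30.0)
samples = [(500045.0, 3999935.0, 14.2), (600000.0, 3999935.0, 9.8)]
print(pair_samples(band, gt, samples))
```

The resulting (pixel value, measured value) pairs are what a model is trained on and what the accuracy metrics are computed from; real workflows often average a small pixel window instead of taking a single pixel.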
2.2.3. Design of Accepted Studies
2.3. Information Sources
2.4. Search Strategy
- “Machine learning” AND polluta* AND satellite;
- “Deep learning” AND polluta* AND satellite;
- “Artificial intelligence” AND polluta* AND satellite;
- “Machine learning” AND polluta* AND image AND soil;
- “Deep learning” AND polluta* AND image AND soil;
- “Artificial intelligence” AND polluta* AND image AND soil;
- “Machine learning” AND polluta* AND Algorithm AND satellite;
- “Machine learning” AND soil contamina* AND satellite;
- “Deep learning” AND contamina* AND Satellite AND soil;
- “Artificial intelligence” AND soil contamina* AND satellite;
- “Machine learning” AND soil AND contamina* AND image;
- “Deep learning” AND contamina* AND image AND soil;
- “Artificial intelligence” AND soil AND contamina* AND image.
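The search strings above combine phrases, truncation wildcards, and AND. The following sketch approximates how such a string can be evaluated against a title or abstract. It assumes that `*` means "any word ending", that every AND-separated term (quoted or not, so `soil contamina*` too) must appear as a phrase, and that matching is case-insensitive; bibliographic databases differ in stemming and phrase handling, so this is an illustration rather than a reproduction of any database's engine. The helper names are hypothetical.

```python
import re

def term_to_regex(term: str) -> "re.Pattern[str]":
    """Compile one AND-separated search term into a case-insensitive regex.

    A trailing * is read as truncation (any word ending), and a multi-word term
    is read as a phrase whose words must appear consecutively.
    """
    words = term.strip().strip('"“”').split()
    parts = [re.escape(w[:-1]) + r"\w*" if w.endswith("*") else re.escape(w)
             for w in words]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def matches(query: str, text: str) -> bool:
    """True if every AND-separated term of the query occurs in the text."""
    return all(term_to_regex(t).search(text) for t in query.split(" AND "))

query = "“Machine learning” AND polluta* AND satellite"
print(matches(query, "Machine learning maps soil pollutants from satellite imagery"))
print(matches(query, "Machine learning maps soil pollution from satellite imagery"))
```

Under this reading, `polluta*` matches "pollutant" and "pollutants" but not "pollution" or "polluted", which is why the second call returns False.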
2.5. Study Records
2.5.1. Data Management
2.5.2. Selection Process
- Utilization of artificial intelligence techniques, such as machine learning (ML) and deep learning (DL), for image processing.
- Detection of soil pollutants or soil characteristics related to soil pollutants, such as soil evaporation or soil moisture content.
- Exclusive use of satellite images (excluding airborne or drone data and soil testing data) for analysis.
- Validation of data through methods that measure accuracy or error in detecting soil pollutants, such as field soil testing.
- Description and specification of the machine learning methods used for image processing, especially if developed by the author.
- Defined accuracy metrics and presentation of results in quantifiable terms.
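The selection criteria above act as a checklist: a study is retained only if it meets all of them. A minimal sketch of such a screening step follows; recording which criterion failed makes it easy to report reasons for exclusion, as PRISMA flow diagrams require. The `Candidate` record, its field names, and the example study are hypothetical, not part of the review's actual data management.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    ref: str
    uses_ai_for_images: bool      # ML or DL applied to image processing
    targets_soil_pollution: bool  # pollutant, or related property such as soil moisture
    satellite_only: bool          # no airborne, drone, or soil-test-only inputs
    validated: bool               # accuracy/error measured, e.g. by field soil testing
    method_described: bool        # ML method specified
    quantitative_results: bool    # defined accuracy metrics reported

CRITERIA = {
    "AI technique for image processing": lambda c: c.uses_ai_for_images,
    "soil pollutant or related property": lambda c: c.targets_soil_pollution,
    "satellite images only": lambda c: c.satellite_only,
    "validation of results": lambda c: c.validated,
    "ML method described": lambda c: c.method_described,
    "quantified accuracy metrics": lambda c: c.quantitative_results,
}

def failed_criteria(c: Candidate) -> list[str]:
    """Return the names of unmet criteria; an empty list means 'include'."""
    return [name for name, check in CRITERIA.items() if not check(c)]

drone_study = Candidate("hypothetical-1", True, True, False, True, True, True)
print(failed_criteria(drone_study))
```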
2.6. Data Collection Process
- References: The first column of Appendix A gives the reference for each article. After more than 1000 articles were reviewed in accordance with the PRISMA guidelines, 36 articles were selected to align with the specific goals of this review. Further details on these articles are presented in Appendix A.
- Satellite Name: The appendix’s second column lists the satellites used in each article, with some studies focusing on imagery from a single satellite while most compare multiple satellites. This approach aids in selecting the most suitable satellite(s) for specific applications.
- Machine Learning Methods: The types of machine learning (ML) techniques employed by the authors and any unique methodologies they developed.
- Contaminant Types: A list of all contaminants or pollutants studied, as well as other soil characteristics related to soil pollutants. The names of these parameters are presented in the fourth column of the appendix.
- Validation: Validation of each ML method is critical for judging how effectively the technique interprets satellite images. Typically, it involves direct soil sampling at specified depths, which establishes the correlation between the soil samples and the satellite imagery. The number of boreholes varies between study areas, as indicated in the appendix.
- Performance: Machine learning performance metrics serve as quantitative measures to assess model accuracy and effectiveness across classification, regression, or clustering tasks. Each article’s specific performance metrics and the best results are summarized in a table or presented as final values in the appendix.
- Results: The last column of Appendix A provides a concise conclusion for each article, showing how each study’s results align with the systematic review’s purpose. This column helps readers compare the effectiveness of different methods, satellite types, and other parameters for pollution detection.
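Most regression studies in Appendix A report some combination of R², RMSE, MAE, RPD, and RPIQ. The sketch below computes them from paired observed and predicted values, assuming R² is the coefficient of determination (1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation), RPD is the sample standard deviation of the observations divided by RMSE, and RPIQ is the interquartile range of the observations divided by RMSE. Quartile definitions differ between software packages, so RPIQ values may not match a given paper exactly. The function name and data are hypothetical.

```python
import math
import statistics

def regression_metrics(observed, predicted):
    """Common accuracy measures for predicted vs. field-measured values."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_obs = statistics.fmean(observed)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    rpd = statistics.stdev(observed) / rmse   # sample SD of observations / RMSE
    q1, _, q3 = statistics.quantiles(observed, n=4)
    rpiq = (q3 - q1) / rmse                   # interquartile range / RMSE
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "RPD": rpd, "RPIQ": rpiq}

obs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical field measurements
pred = [1.1, 1.9, 3.2, 3.8, 5.0]   # hypothetical model predictions
print(regression_metrics(obs, pred))
```

MAE and RMSE carry the units of the measured quantity (for example mg kg⁻¹), whereas R², RPD, and RPIQ are dimensionless, which is why only the latter can be compared directly across studies of different contaminants.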
2.7. Prioritization and Outcomes
- Validation Sample Size: Studies using larger sample sizes for validation were given higher priority. Substantial sample sizes enhance the statistical robustness and reliability of findings, improving the overall quality of the research.
- Accuracy of Validation Data: Articles that employed the most accurate and reliable validation methods for analyzing images through machine learning were prioritized. High-quality validation data add credibility to research outcomes and strengthen confidence in the reported results.
- Identification of Different Contaminations: Studies that directly identified various contaminations or indirectly inferred them through specific soil characteristics were considered more significant. This approach broadens the research scope, providing insights into a diverse range of soil pollutants and their potential impacts.
- Diverse ML Methods: Articles exploring various machine learning methods for image analysis were prioritized. Employing diverse ML methodologies enables a comprehensive exploration of soil pollutant detection approaches, enhancing the understanding of their effectiveness.
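The first criterion rests on a simple statistical fact: an accuracy estimated from more validation samples has a narrower confidence interval. The sketch below uses the normal-approximation (Wald) interval for a proportion, p ± z·sqrt(p(1 − p)/n), whose half-width shrinks in proportion to 1/sqrt(n), so quadrupling the sample size halves it. The review does not prescribe a particular interval; the Wald form, the accuracy of 0.85, and the sample sizes are illustrative assumptions.

```python
import math

def wald_interval(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion such as OA.

    Clipped to [0, 1]; for small n or accuracies near 1, a Wilson interval is
    usually preferred.
    """
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

for n in (50, 200, 800):
    low, high = wald_interval(0.85, n)
    print(f"n = {n:3d}: OA 0.85, 95% CI {low:.3f} to {high:.3f}")
```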
2.8. The Risk of Bias and Quality Assessment
- Satellite Source Diversity: Evaluated the range and types of satellite data utilized.
- Machine Learning Methodology Variation: Assessed differences in ML approaches across studies.
- Contaminant Diversity: Examined the range of pollutants detected.
- Performance Analysis Methods: Investigated the metrics and validation techniques used.
- Sampling Quantity and Quality: Considered the number and reliability of samples used for model training and validation.
- Validation Data: Assessed the rigor of the accuracy measurement methods.
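For the classification studies, the performance analysis methods above mostly derive from an error (confusion) matrix: overall accuracy (OA), Cohen's kappa, user's accuracy (UA, equal to 1 minus the commission error), and producer's accuracy (PA, equal to 1 minus the omission error). A minimal sketch follows, assuming rows hold the classified (map) labels and columns the reference labels; some papers use the transposed layout, which swaps UA and PA. The two-class counts are hypothetical.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of the error matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)

def users_accuracy(cm, k):
    """Correct / all samples mapped as class k (1 - commission error)."""
    return cm[k][k] / sum(cm[k])

def producers_accuracy(cm, k):
    """Correct / all reference samples of class k (1 - omission error)."""
    return cm[k][k] / sum(row[k] for row in cm)

# Hypothetical counts: rows = mapped class, columns = reference class.
cm = [[50, 10],
      [5, 35]]
print(overall_accuracy(cm), round(cohen_kappa(cm), 4))
print(users_accuracy(cm, 0), producers_accuracy(cm, 0))
```

Because kappa discounts chance agreement, it cannot exceed OA when both are computed from the same matrix, which makes it a useful cross-check on reported values.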
2.9. Article Selection
3. Results, Discussion, and Summary of Evidence
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Ref. | Satellite Name | Method of ML | Environmental Parameters Detected | Validation Type | Validation Sample Numbers | Performance Metrics | Best Performance Results | Result
---|---|---|---|---|---|---|---|---
[22] | Landsat 4 (TM), Landsat 5 (TM), Landsat 6 (ETM), Landsat 7 (ETM+), Landsat 8, Sentinel 2 | Random Forest (RF) | Cr, Fe, Ni, and Zn | Soil samples collected with an auger (0–20 cm depth) | 360 soil samples | R², MAE, MSE | Cr: MAE (Cal) 9.18 ± 1.19, R² (Cal) 0.16, MAE (Val) 8.90 ± 0.40, R² (Val) 0.23; Fe: MAE (Cal) 111.34 ± 24.35, R² (Cal) 0.55, MAE (Val) 78.77 ± 30.11, R² (Val) 0.61; Ni: MAE (Cal) 4.74 ± 0.47, R² (Cal) 0.13, MAE (Val) 3.33 ± 0.38, R² (Val) 0.16; Zn: MAE (Cal) 11.00 ± 2.15, R² (Cal) 0.22, MAE (Val) 8.44 ± 1.20, R² (Val) 0.20 | The clearest discrimination of soil PTEs was obtained from SYSI using a long-term Landsat 5 collection spanning 35 years. Satellite data could efficiently detect PTE contents in soils because of their relation with soil attributes and parent materials.
[23] | Terra, Aqua, NOAA satellites, Landsat satellites, PROBA-V | General regression neural network (GRNN), long short-term memory (LSTM), gated recurrent unit (GRU), and bidirectional LSTM (Bi-LSTM) | Leaf area index (LAI) | Reference maps collected from 2000 to 2016 at 47 sites from the BigFoot, VALERI, and ImagineS networks, covering different dominant biome types | 79 available high-resolution LAI reference maps | Number of sample points (N), R², RMSE, bias, and the percentage of pixels meeting the target accuracy requirement (P) | GLASS V6 LAI achieves higher accuracy, with an RMSE of 0.92 at 250 m and 0.86 at 500 m, compared with 0.98 for PROBA-V at 300 m, 1.08 for GLASS V5, and 0.95 for MODIS C6, both at 500 m. | The GLASS V6 LAI product is more spatiotemporally continuous and of higher quality, presenting more realistic temporal LAI dynamics when surface reflectance is absent for long periods because of persistent cloud/aerosol contamination. The new Bi-LSTM deep learning model runs significantly faster than the GLASS V5 algorithm, avoids reconstructing surface reflectance data, and is more resistant than other methods to noise (cloud and snow contamination) and missing values in surface reflectance, because the Bi-LSTM extracts information across the entire time series of surface reflectance rather than from a single time point.
[24] | Landsat 8 | Random Forest (RF) | Multi-mycotoxin contamination (e.g., deoxynivalenol and zearalenone) | Predictions were validated with the Dutch data in the testing set; the model was then run with the input variables of the external validation set, and the predicted results for 2019 and 2020 were compared separately with the analyzed mycotoxin data (per contamination level) for those two years | - | Confusion matrix, accuracy, and generalization ability | Internal and external validation yielded 0.90–0.99 prediction accuracy. | Machine learning algorithms provide good predictions of mycotoxin risk levels at the regional level in Europe. Such models can be used by collectors, traders, and food safety authorities for logistics in the wheat supply chain, improved mycotoxin control, and risk-based testing.
[25] | Landsat 8 | Support Vector Regression (SVR), Partial Least Squares Regression (PLSR), and Artificial Neural Network (ANN) | Soil copper (Cu) | Soil samples were collected in the study area in 2015, and their Cu concentrations were analyzed and recorded | 138 soil samples with lab-measured Cu concentrations | Coefficient of determination (R²), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and standard error (SE) | The mean adjusted R² obtained by SVR using 20 repetitions of 6-fold cross-validation on the 138 soil samples increases from 0.433 to 0.641, and the mean R² of PLSR and ANN increases from 0.568 to 0.618 and from 0.476 to 0.528, respectively, indicating the necessity and benefit of feature extraction and selection. Although ANN is a popular regression method, SVR outperforms it in that study, with a mean R² of 0.641, 21.4% higher than ANN. RMSE, MAE, and SE also support the superior generalization capability of SVR. | The SVR model with the highest R² was selected to estimate soil Cu concentration over the study area. Compared with the interpolation map, the Cu distribution map generated by the recommended pipeline gives pixel-based Cu estimates with more spatial detail and wider spatial coverage, and its spatial pattern is consistent with the ground-truth land cover classification map. The results show the model’s ability to map soil heavy metal contamination (HMC) at large scale from widely available satellite imagery.
[26] | Sentinel 1, Sentinel 2 | Random Forest (RF) | Oil spill | Oil-free sample locations were selected away from the observed oil spill sites; a 500 m buffer zone around the spill areas excluded all existing spill points from the in situ observations | n = 553; each candidate sample point was classified into one of six thematic categories based on expert knowledge from a number of on-site visits in 2019 and 2020 | Overall accuracy, kappa, LI 95%, UI 95%, F1 scores, user’s classification accuracy (UA), producer’s classification accuracy (PA) | The lowest overall accuracies for Oil spill-I and Oil spill-II were 91.4% and 85.0%, respectively. | Mapping terrestrial oil spills with freely available Sentinel satellite images may therefore be an accurate and efficient means of regularly monitoring oil-impacted areas. Such tools could support an open-access database for oil mapping, enabling indigenous communities to document oil pollution in the remote areas they inhabit and giving local communities, journalists, and civil society organizations reliable proof of environmental damage.
[27] | Landsat 8 | Pixel-based RNN system (Pix RNN), pixel-based single-image NN system (Pix single), pixel-based multi-image NN system (Pix multi), patch-based single-image NN system (Patch single), patch-based multi-image NN system, and the proposed patch-based RNN (PB-RNN) | Land cover classification | Ancillary data were obtained from the Florida Cooperative Land Cover Map and corrected by comparison with GPS-guided field observations and high-resolution Google Earth imagery | A series of 23 Landsat 8 images was used to evaluate the proposed method on a test site within the Florida Everglades ecosystem | Overall accuracy (OA), overall kappa (kappa), error matrix | The proposed system achieves 97.21% classification accuracy and an overall kappa of 0.97, whereas the pixel-based single-image neural network (NN) system achieves 64.74% and 0.58, respectively. | The proposed system achieves significant improvements in both overall and per-category classification accuracy.
[28] | Space Shuttle Endeavour (SRTM-1) | Random Forest (RF), Cubist, linear model, Support Vector Machine (SVM), k-nearest neighbors (KNN) | Pb, Zn, Ba, Fe, Al, and Cr | Samples collected at a depth of 0–10 cm | 120 soil samples | MAE, RMSE, R² | Best method, MAE (mg kg⁻¹), RMSE (mg kg⁻¹), and R² per target: lead: Cubist, 120.97, 264.03, 0.795; zinc: Cubist, 76.33, 193.11, 0.801; barium: Cubist, 21.24, 30.82, 0.55; chromium: Cubist, 5.31, 7.21, 0.37; iron: Cubist, 5179.49, 9357.34, 0.90; aluminum: SVM, 2089.88, 2809.74, 0.84 | In general, the Cubist algorithm predicted Pb, Zn, Ba, and Fe contents better than the other tested models, while SVM gave the best prediction of Al. The methodology offers a fast, low-cost alternative for predicting PTEs in soils and is efficient and economical for monitoring potentially contaminated areas and obtaining quality reference values for soils.
[29] | PROBA-V | Random Forest (RF), Support Vector Machine (SVM) | Cropland suitability assessment | A total of 119 covariates were used for each yearly prediction of cropland suitability classes for soybean cultivation: 47 climate, 24 soil, 6 topographic, and 42 vegetation covariates | Samples, with 119 covariates used per yearly suitability assessment | Accuracy assessment, R², RMSE | RF mean overall accuracy ranged from 76.6% to 68.1% for Subset A and from 80.6% to 79.5% for Subset B. | RF produced better suitability assessments than SVM with a moderate sample count and a large number of complex input covariates. The proposed method overcomes the limitations of conventional GIS-based multicriteria analysis and could draw attention to machine learning in future cropland suitability studies.
[30] | Sentinel 2 | Support Vector Machine (SVM), Random Forest (RF), Classification and Regression Trees (CART) | Land cover | Total data: corn crop 113, sorghum crop 547, water bodies 190, land in recovery 226, urban areas 66, sandy areas 117, tropical rainforest 237, others 170 | 30% of the sample points were used to test the classified maps: 742 for the spring–summer season and 868 for the autumn–winter season | Overall accuracy (OA), kappa index (KI) | Overall accuracy was 0.99 for SVM, 0.95 for RF, and 0.92 for CART; the kappa index was 0.99 for SVM, 0.97 for RF, and 0.94 for CART. | The high humidity of the study area and seasons made the research difficult, but the Google Earth Engine platform proved effective for land use analysis and classification. SVM, RF, and CART, used to classify land use and sorghum and corn crops, obtained different results.
[31] | Sentinel 2, Landsat 8 | Partial Least Squares Regression (PLSR), extreme learning machine (ELM) | Soil organic carbon | Two sampling campaigns (50 samples in October 2015 and 145 in March 2016) collected surface soil samples (0–15 cm) on a 130 m sampling grid | 195 surface soil samples | RMSE, R², ratio of performance to interquartile range (RPIQ) | Hyperspectral images were successfully used to predict SOC stock, SOC, and SBD with PLSR and ELM, with ELM (RPIQ = 2.03, 1.97, 1.64) outperforming PLSR (RPIQ = 1.83, 1.97, 1.53); Sentinel 2 images with ELM obtained the best prediction results (RPIQ = 1.45, 1.25, 1.26). | The study further confirmed the good predictive ability of time series multispectral remote sensing images in low-relief farmland regions, and the mapping strategy can provide additional valuable information for agricultural management and the carbon cycle.
[32] | GAOFEN 1 | Neu-SICR algorithm | Surface soil moisture | In situ soil moisture values measured by probes at observatories of the Soil Climate Analysis Network (SCAN) | Data from 11 SCAN soil moisture observatories | Average relative error (ARE), universal image quality index (UIQI) | ARE: 13.18; UIQI: 0.3143 | The new algorithm enhances the temporal resolution of high-spatial-resolution regional soil moisture observations from remote sensing with good quality and can benefit many soil moisture-based applications and research.
[33] | GAOFEN 5 | Random Forest (RF), ExtraTrees (ET), Adaptive Boosting (ADB), Gradient Descent Boosting Trees (GDB), eXtreme Gradient Boosting (XGB) | Arsenic | Systematic grid sampling on a 40 m regular grid; at each location, soil was placed in 250 mL wide-mouth sampling bottles, and positions were confirmed with real-time kinematic (RTK) mobile station positioning | 976 topsoil samples (0–30 cm) across the whole study area | r, RMSE, MAE | RF maintained relatively high accuracy (r = 0.56) when the sampling grid increased to 100 m, higher than that of GIMs on a 50 m grid (r = 0.42). | Machine learning based on satellite visible and near-infrared (VNIR) reflectance spectroscopy is a promising approach for mapping soil arsenic contamination at brownfield sites with high accuracy and low cost. RF performed best (r = 0.78), reducing prediction errors by 30% compared with traditional GIMs.
[34] | Worldview 2 | U-net Convolutional Neural Network (CNN) | Microplastics pollution | Surface soil (top 2 cm) was randomly collected from each selected 50 cm × 50 cm area with a stainless-steel shovel for each subsample | 6 mixed samples were collected at each site, each composed of three subsamples, covering mulching and non-mulching soil | - | - | The abundance of MPs in soil mulched by dust-proof nets ranged from 272 to 13,752 items/kg, and large particles (>1000 μm) made up a significant proportion (49.83%) of MPs in the study area. The study improves understanding of soil MP pollution and its potential environmental impacts for scientists and policymakers, and offers suggestions to decision-makers for effective legislation and policies that protect human health, the soil, and the wider environment.
[35] | Landsat 4, Landsat 5 (TM) | Random Forest (RF), Extreme Gradient Boosting (XGBoost) | Soil pH | The full set contains soil profiles with geographic location, genetic horizon thickness, organic matter, pH, texture (particle-size distribution), total nitrogen, total phosphorus, and bulk density; pH was measured with a pH meter in a soil:water suspension of 1:2.5 | 4700 soil profiles from China’s Second National Soil Inventory | Coefficient of determination (R²), Root Mean Squared Error (RMSE), and Lin’s Concordance Correlation Coefficient (CC) | The two models combined achieved an acceptable RMSE of 0.71 pH units per point and a Lin’s Concordance Correlation Coefficient of 0.84. | The map provides a benchmark for evaluating the impacts of land use and climate change on soil pH and can guide advisors and agencies deciding on remediation and prevention of soil acidification, salinization, and heavy metal pollution; examples are given for cadmium and mercury.
[36] | SPOT 5 (Satellite Pour l’Observation de la Terre) | Random Forest (RF), geographically weighted regression (GWR) | Zinc (Zn) | A sampling site was randomly chosen within the grids, and its coordinates were recorded with a GPS (global positioning system) receiver. Samples were taken from vegetated or exposed soils in parks, gardens, greenbelts, etc., avoiding impervious areas; at each site, about 1.5 kg of soil (0–20 cm) was collected with a shovel, and plant residues and artificial deposits were removed | 221 soil samples | Accuracy, R values, R², RMSE | The RF and GWR models, built with the key environmental covariates, had leave-one-out cross-validated R values of 0.68 and 0.58 and RMSEs of 0.51 and 0.57, respectively. | Urban functional type, geology, NDVI, elevation, slope, and aspect were key environmental covariates, and urban functional types reflected the spatial variation in Zn better than land use types.
[37] | ZY-1-02D satellite | CART, MLP, SVM, Gaussian process regression (GPR), k-nearest neighbor (KNN), kernel ridge regression (KRR), AdaBoost | Heavy metals (Cr, Cu, and As) | Farmland distribution in the study area was determined from remote sensing images, and sampling points were set at 1 km intervals; field investigation was used to adjust the preset sampling locations and the sequence and route of sample collection | 81 soil samples | R², RMSE, RPD | For Cr, Cu, and As, the AdaBoost model reached validation-set R² values of 0.66, 0.61, and 0.74, respectively. R²c, RMSEc, R²p, RMSEp, and RPD: Cr: 0.73, 3.71, 0.66, 4.52, 2.06; Cu: 0.69, 1.94, 0.61, 2.36, 1.85; As: 0.87, 0.73, 0.74, 0.95, 1.72 | The stacked AdaBoost ensemble learning model provides detailed and reliable data for agricultural ecological protection and industrial pollution control, allowing effective management of heavy metal pollution sources.
[38] | Sentinel 2A | Partial Least Squares Regression (PLSR), back-propagation neural network (BPNN), Random Forest (RF) | Cd, Pb, soil organic matter (SOM), pH, and Fe | Part of each soil sample, passed through a 100-mesh nylon sieve, was used to determine Cd, Pb, and Fe contents; another part, passed through a 10-mesh nylon sieve, was used to determine SOM content and pH. Cd and Pb contents were measured by inductively coupled plasma–mass spectrometry | 640 surface soil samples | R², NRMSE, RPD | Relatively satisfactory estimates of Cd and Pb contents in the farmland of the study area were obtained (maximum validation R² of 0.60 for Cd and 0.63 for Pb). Best method RF in all cases; R², NRMSE, and RPD: Cd (original images): 0.46, 0.101, 1.74; Cd (unmixed images): 0.50, 0.098, 1.80; Pb (original images): 0.52, 0.066, 1.82; Pb (unmixed images): 0.57, 0.062, 1.94; Cd (original images, double data images): 0.55, 0.093, 1.89; Cd (unmixed images, double data images): 0.60, 0.088, 2.01; Pb (original images, double data images): 0.60, 0.060, 2.01; Pb (unmixed images, double data images): 0.63, 0.057, 2.10 | The results provide a theoretical basis and methodological reference for the rapid prediction of Cd and Pb contents in regional farmland.
[39] | Gaofen 5, PRISMA | Convolutional Neural Network (CNN), Random Forest (RF), and Support Vector Machine (SVM) | Plastics: polyethylene (PE), polypropylene (PP), polyvinyl chloride (PVC), polyethylene terephthalate (PET), and polystyrene (PS), plus important industrial plastic types such as acrylonitrile butadiene styrene (ABS), ethylene vinyl acetate (EVA), polyamide (PA), polycarbonate (PC), and polymethyl methacrylate (PMMA) | Samples with varying optical properties (color, brightness, transmissivity) were selected for each plastic type | Over 3000 samples from the three aforementioned spectral libraries | Recall, precision, F1-score, overall accuracy (OA), kappa | The performance of the three (satellite, airborne, and laboratory) models was roughly balanced on the spectral validation data, with overall accuracies of 97%, 96%, and 95% for the CNN, RF, and SVM models, respectively; RF produced very good and reliable results for the data of both sensors. | RF was used to classify the ten plastic types in GF-5 and PRISMA satellite recordings of the same area. Comparing both sensor systems, RF produced high-quality, transferable results for detecting plastic, mainly related to greenhouses, sport fields, photovoltaic installations, and industrial sites.
[40] | Terra, Aqua | Apriori algorithm | Dust | - | - | - | The accuracy of the identified SDSs was estimated at 83.7% using the verification points. | Because it provides generalizable association rules, Apriori proved to be a robust algorithm for Data-Driven Soil Mapping (DSSM).
[41] | Sentinel 2 | Random Forest (RF) | Land cover classification | Different NBS actions, based on reforestation of specific areas, were simulated and compared with the actual situation | - | Error matrix, error of commission (EC), error of omission (EO), producer’s accuracy (PA), user’s accuracy (UA) | Land cover change in the Umia Basin was successfully mapped with Sentinel 2 images, with an OA of 77% and 83%. | Reforestation of the upstream area alone was shown to be almost as beneficial as reforestation of the entire catchment and is economically more viable. This confirms that the methodology reduces flood hazard despite the territorial complexity, facilitating decisions on the use of NBS.
[42] | Landsat 8 | Random Forest (RF) and Cubist | Heavy metals (Ni, Fe, Cu, Mn) | Surface layer samples (0–20 cm depth) | 346 soil samples | R², RMSE | Best method, RMSE, and R²: Fe: RF and Cubist, 0.20, 0.73; Mn: Cubist, 0.19, 0.55; Cu: RF, 0.19, 0.60; Ni: RF, 0.15, 0.67 | Machine learning methods can use easily available environmental data to predict the studied heavy metals at large scale, which is essential for decision-making in sustainable agricultural and environmental management.
[43] | Sentinel 2, CRYOSAT 2, Jason 1 | LSTM model (deep learning), RNN, CNN | Heavy metal pollution (copper) | 19 different types of data, including basic geological data and anomaly data, were selected as experimental datasets | The 19 types of data were input into the stacked LSTM model | Overall accuracy (OA) | The validation dataset includes 31 copper mineral occurrences, of which 9 are classified as grade IV and 17 as grade V, together accounting for 83.87% of all mineral occurrences. | Using the optimized stacked LSTM model to integrate multisource geological features and mine the internal rules of the feature information improves the risk assessment of heavy metal pollution.
[44] | Landsat 8 | Random Forest (RF), generalized boosting methods (GBM), generalized linear models (GLM) | Origin of trace metals (Na, Mg, Al, Si, P, S, K, Ca, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Br, Rb, Sr, Y, Zr, Nb, Mo, Cd, Cs, Ba, La, Ce, Pr, Nd, Hf, W, Pb, Th, U) and soil organic carbon | Soil samples were collected from the topsoil (0–5 cm); this depth ensures that only surface materials that may be displaced by surface runoff are sampled | 79 soil samples with different Nemerow index values were considered for spatial modelling | Receiver operating characteristic (ROC), area under the curve (AUC), and OA | RF performed best, with an accuracy of 83%. In the riparian zone, the landforms ‘steep hills’ and ‘valley’ contributed most to polluted soil areas, with 51% and 27%, respectively. The landform ‘plain’ had the highest contribution (28%) to sediment yield, with a GOF of 0.72 in early-winter events. | Overall, the proposed approach better traces the origin of suspended sediments and trace elements discharged into the river environment.
[45] | Landsat 7, Landsat 8 | Cubist, Random Forest (RF) | Potentially toxic elements (PTEs) (As, Cr, Cu, Ni, Pb, and Zn) and modified pollution index (MPI) | For an accurate estimation, stratified simple random sampling on a 400 ha grid was chosen; since landform, geology, vegetation, land management, etc., showed no heterogeneity, the grid lines were taken as the strata boundaries | 129 surficial soil samples | r, RMSE, bias, CCC, enrichment factors (EFs) of PTEs, and the Modified Pollution Index (MPI) | Calibration (in-bag) R² for all heavy metals: 0.80–0.84; Cubist: 0.19–0.45. Best method RF in all cases; R², concordance, NRMSE, and bias: As: 0.25, 0.30, 0.019, 0.03; Cr: 0.23, 0.27, 0.002, 0.00; Cu: 0.20, 0.39, 0.014, 0.04; Ni: 0.21, 0.25, 0.004, 0.00; Pb: 0.28, 0.31, 0.006, 0.02; Zn: 0.23, 0.28, 0.002, 0.00; MPI: 0.27, 0.37, 0.014, 0.00 | RF performed well in estimating the EFs of several PTEs, and spectral indices using NIR and SWIR bands were key predictors of these PTEs and MPI. The digital maps showed that the study area was enriched with As, Cu, and Pb at moderate to significant levels.
[46] | Gaofen 5 | Random Forest (RF), extreme learning machine (ELM), Support Vector Machine (SVM), back-propagation neural network (BPNN) | Soil heavy metals (Zn, Ni and Cu) | The sampling route was arranged according to the FOREGS Geochemical Mapping Field Manual. | 110 topsoil samples | R2, RMSE, MAE | The estimation accuracy was significantly improved by using the Decision Stump algorithm. | This paper showed that GF-5 hyperspectral imagery can be a reliable data source for mapping soil heavy metals.
Target | Best Method | R2 | RMSE (mg kg−1) | MAE (mg kg−1)
Zn | RF | 0.77 | 9.54 | 7.39
Ni | RF | 0.62 | 3.39 | 2.56
Cu | ELM | 0.56 | 5.02 | 3.73
[47] | Sentinel 1A, Alos Palsar I (SAR), Sentinel 2A | Classification and regression tree (CART), Artificial Neural Network (ANN), Random Forest (RF), k-nearest neighbors (kNN) | Pollution from urban influences on an inland marsh | Samples were collected during fieldwork between 12/01/2018 and 12/04/2018 using two Global Navigation Satellite System (GNSS) Ruide R90-X dual-frequency (L1/L2) receivers. | 450 samples of the wet meadow | Overall accuracy (OA), producer’s accuracy (PA), user’s accuracy (UA) | k-NN had the highest overall accuracy (98.5%), followed by RF (98.3%), ANN (96.0%) and CART (95.5%). All four classifiers exceeded 95% accuracy, showing that all methods have potential for inland marsh delineation. | CART and ANN showed the largest variations in overall accuracy (OA) across the different parameters tested.
[48] | Terra, Aqua (MODIS) | Cubist | Soil loss | Model performance in spatial modelling was assessed using the samples not included in each bootstrap, i.e., the out-of-bag (OOB) samples. | 100 bootstrap samples to assess the uncertainties | R2, RMSE | The average erosion rate in Australia was estimated at 4.16 t ha−1 y−1, and the total annual soil loss at 2788 × 10⁶ tonnes. | Erosion estimates are generally smaller than previous continental estimates using the Revised Universal Soil Loss Equation (RUSLE), particularly in croplands, which might indicate that soil conservation practices have effectively reduced erosion in Australia.
Evaluation | R2 | RMSE
Cross-validation statistics | 0.68 | 0.38
Out-of-bag statistics | 0.69 | 0.02
Test set statistics | 0.71 | 0.01
[49] | Earth Observing-1 (EO-1) | Regression: Partial Least Squares Regression (PLSR), Back Propagation Neural Network (BPNN), Random Forest (RF), K-Nearest Neighbors (KNN); high-risk classification: Support Vector Machine (SVM), Random Forest Classification (RFC), Multi-Layer Perceptron (MLP) | Arsenic | Publicly available soil arsenic concentration data (in mg/kg) for the United States between 2005 and 2020, for both aforementioned land covers, were processed for the top layer of soil (0–5 cm depth). | A total of 55 bare-soil arsenic concentration values, ranging from 1.4 mg/kg to 380 mg/kg, were used for regression analysis | Accuracy, F1-score, F2-score, F0.5-score, Brier score, R2, RMSE | Regression results: see the table below. | These results strongly indicate that soil arsenic contamination can be detected with Hyperion satellite hyperspectral data when combined with preprocessing and machine learning.
Model | R2 | NRMSE
SD + (PLSR) | 0.623 | 0.194
SD + (BPNN) | 0.726 | 0.144
SD + (RF) | 0.746 | 0.136
SD + (KNN) | 0.715 | 0.178
GA + SD + (PLSR) | 0.668 | 0.162
GA + SD + (BPNN) | 0.704 | 0.173
GA + SD + (RF) | 0.805 | 0.132
GA + SD + (KNN) | 0.693 | 0.171
DA + GA + SD + RF | 0.840 | -
Comparison of the evaluation metrics of the three binary classification ML models for the averaged swath data.
Model | Accuracy | F1-Score | F2-Score | F0.5-Score | Brier Score
SVM | 0.647 | 0.688 | 0.658 | 0.751 | 0.272
RFC | 0.639 | 0.678 | 0.649 | 0.737 | 0.252
MLP | 0.693 | 0.728 | 0.704 | 0.772 | 0.279
[50] | Landsat 7 (ETM+) | Cubist (Cu), Random Forest (RF), Regression Tree (RT), Multiple Linear Regression (MLR) | Soil organic carbon (SOC), calcium carbonate equivalent (CCE) and clay content | A total of 334 soil samples were collected from 0 to 30 cm depth. | 334 soil samples | RMSE, R2 and RMSE% | According to RMSE and R2, Cu and RF gave the most accurate predictions for CCE and clay content, respectively, while both RF and Cu showed the highest performance in predicting SOC content. | Remote sensing covariates (Ratio Vegetation Index and band 4) were the most important variables explaining the variability of SOC and CCE content, whereas only topographic attributes explained the variation in clay content.
Soil Properties | Best Model | RMSE Validation | R2 Validation | RMSE Calibration | R2 Calibration
SOC | Cu and RF | 0.34 | 0.55 | 0.14 | 0.93
CCE | RF | 9.96 | 0.23 | 4.56 | 0.89
Clay | RF | 7.86 | 0.15 | 3.53 | 0.92
[51] | Sentinel 2, Landsat 8 | Neural Networks (NNs) and Random Forests (RFs) | Crop productivity | Gross primary productivity (GPP) data were obtained directly from the principal investigators of the sites, and half-hourly data were integrated into daily GPP values, which were then used as reference values to validate the GPP model. | - | R2, RMSE | The final neural network model estimates GPP at the tested flux tower sites with r2 of 0.92 and RMSE of 1.38 g C d−1 m−2, outperforming empirical models based on vegetation indices. | The model successfully estimates GPP across a variety of C3 crop types and environmental conditions, even though it does not use any local information from the corresponding sites. This highlights its potential to map crop productivity from new satellite sensors at a global scale with the help of current Earth observation cloud computing platforms.
Model | Properties | GPP R2 Test | LAI R2 Test | GPP R2 Val | GPP RMSE Val
NN (hidden layers) | (20,12) | 0.92 | 0.62 | 0.88 | 1.38
NN (hidden layers) | (40,20,12) | 0.95 | 0.68 | 0.91 | 1.41
RF (settings) | SW v.1 | 0.90 | 0.51 | 0.87 | 1.70
RF (settings) | SW v.2 | 0.90 | 0.51 | 0.89 | 1.58
SW v.1: the sample weight of data points with GPP below 2 μmol CO2 m−2 s−1 was increased to 2. SW v.2: this sample weight was increased to 20.
[52] | Sentinel 2, Sentinel 3 | Data Mining Sharpener (DMS) based on an ensemble of decision-tree regressors | Evaporation | The land-cover map was based on Corine Land Cover (CLC) 2012, version 18.5, downloaded from the Copernicus Land Monitoring Service; meteorological data were obtained from the ERA-Interim reanalysis data set produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). | - | RMSE, bias, CV, r | The correlation for the sensible heat flux (H) is considerably lower (0.67), which places the correlation of the EF estimates between the two values (0.79). It is not immediately clear why the correlation of H is so much lower than that of the other fluxes (the correlations of G and Rn are 0.99 and 0.90, respectively). | The fluxes derived with sharpened thermal data are of acceptable accuracy (relative error below 20%) and provide more information at flux-tower footprint scale than the corresponding low-resolution fluxes.
[53] | Landsat 5, Landsat 8 | Random Forest (RF) classifier | Soil salinity | Ground truth was taken from the WoSIS Soil Profile Database, which is maintained by ISRIC – World Soil Information and includes over 100,000 georeferenced soil profiles. The upper layer of those soil profiles for which electrical conductivity (ECe) values are available was selected; its thickness varied from 0–5 cm to 0–60 cm. | In total, 15,188 data points were selected and used in further analysis | Confusion matrix, overall accuracy, user’s accuracy, producer’s accuracy | The validation accuracy of the resulting maps was in the range of 67–70%. | Combining soil property maps and thermal infrared imagery allows soil salinity development to be mapped in space and time on a global scale.
[54] | Landsat 8, Sentinel 2 | ANN | Soil erosion | Sampling locations were selected on the basis of the most representative land cover and the overall conditions of the topsoil (about 0–20 cm depth). | 30 surface soil samples | R2, RMSE, ordinary least squares regression (OLSR) and geographically weighted regression (GWR) | The relatively high R2 (67%) for OLSR indicated the potential of field spectroscopy to describe soil health effectively. | The derived maps successfully captured the spatial distribution of SOM, CaCO3 and the K-factor in the GIS environment. The results may contribute to the design of best management measures against erosion and to sound land use planning in the study region. Finally, the results highlighted that soil erosion rates are strongly related to terrain morphology, whereas SOM values alone cannot describe the soil erosion regime.
Soil Erosion Parameter | Satellite Type | Mean RMSE | Mean R2
SOM | Landsat 8 | 0.58 | 0.87
CaCO3 | Landsat 8 | 8.08 | 0.79
K-factor | Landsat 8 | 0.0095 | 0.6
SOM | Sentinel-2 | 0.58 | 0.87
CaCO3 | Sentinel-2 | 7.1 | 0.82
K-factor | Sentinel-2 | 0.0093 | 0.59
[55] | Landsat 8, Sentinel 2 MSI | Cubist model | Soil salinization | At each sampling point, four topsoil samples (0–20 cm) were collected with a soil drill and mixed, and a portable GPS (UniStrong G120, positioning accuracy ≤ 5 m) was used to record the geographic location. | 64 topsoil samples in an arid desert region, the Ebinur Lake Wetland National Nature Reserve (ELWNNR) | R2, RMSE, NRMSE, RPD, RPIQ | Measured soil salinity was significantly correlated with surface soil moisture (Pearson’s r = 0.75). Introducing TCW gave satisfactory estimation performance. Compared with the OLI dataset, combining the MSI dataset with the Cubist model yielded better overall performance (R2 = 0.912, RMSE = 6.462 dS m−1, NRMSE = 9.226%, RPD = 3.400 and RPIQ = 6.824). | The differences between Landsat 8 OLI and Sentinel-2 MSI were distinguishable: MSI imagery, with its finer spatial resolution, performed better than OLI. Combining the RS datasets and their derived TCW within a Cubist framework yielded an accurate regional salinity map. The higher revisit frequency and spectral resolution of MSI data are expected to improve the acquisition of high-quality soil salinity information for desert soils.
[56] | Terra (Aster), Earth Observing-1 (EO-1) (Hyperion), Sentinel 2A, Landsat 8 | Artificial Neural Networks (ANN), Stepwise Multiple Linear Regression (SMLR) and PLSR | Chromium (Cr) | The samples’ mineralogy and Cr concentration were determined and were then subjected to laboratory reflectance spectroscopy in the range of Visible–Near Infrared–Shortwave Infrared (VNIR–SWIR: 350–2500 nm). | 120 soil samples | R2, RMSE, RPD | Using satellite images, SD-SMLR provided the best prediction models with R2 values of 0.61 and 0.53 for Hyperion and Sentinel-2A, respectively. | This study’s findings indicated that applying the best prediction models obtained by spectroscopy to the selected wavebands of Hyperion and Sentinel-2A satellite imagery could be considered a promising technique for rapid, cost-effective and eco-friendly assessment of Cr concentration in highly heterogeneous mining areas.
Model | Satellite | R2 | RMSE
SD-ANN | Aster | 0.21 | 48.81
SD-ANN | Hyperion | 0.61 | 25.82
SD-ANN | Sentinel-2A | 0.33 | 36.29
SD-ANN | Landsat 8-OLI | 0.24 | 45.06
SD-SMLR | Aster | 0.31 | 42.63
SD-SMLR | Hyperion | 0.68 | 23.61
SD-SMLR | Sentinel-2A | 0.53 | 34.51
SD-SMLR | Landsat 8-OLI | 0.45 | 40.58
SD-PLSR | Aster | 0.22 | 45.37
SD-PLSR | Hyperion | 0.54 | 29.11
SD-PLSR | Sentinel-2A | 0.31 | 40.58
SD-PLSR | Landsat 8-OLI | 0.24 | 41.55
[57] | Sentinel 2 | Support Vector Machine Regression (SVMR) | Soil Organic Carbon (SOC), soil texture | The soil samples were taken at 0–10 cm depth as composite samples over a 6 × 6 m area, air-dried, ground, sieved (≤2 mm) and thoroughly mixed before analysis (ISO 11464:2006). | 200 soil samples were collected using a conditioned Latin Hypercube Sampling (cLHS) stratified random strategy | RMSEcv, RMSEp, RPD, Bias | The statistical accuracy attained using the LUCAS library was low, and only the clay estimation model using Hyperion data showed suitable prediction accuracy (RMSE = 7.98, RPD = 1.62). However, for the PONMAC dataset, simulated Sentinel-2 data gave the best results among the imagers for all properties except silt. | The SOC maps also confirmed that in areas with high SOC, Sentinel-2 detected SOC more precisely than the airborne sensors. However, model and map performance clearly decreased for parameters with low contents. The study also emphasized the importance of super-spectral Sentinel-2 data for assessing soil characteristics with frequent revisits over larger areas than is currently possible with laboratory and airborne instruments.
Properties | RMSEcv | RMSEp | RPD | Bias
SOC | 0.14 | 0.14 | 1.60 | 0.03
Clay | 2.87 | 3.05 | 1.27 | −0.44
Silt | 5.71 | 6.28 | 1.13 | −1.89
Sand | 5.93 | 8.22 | 1.02 | −0.95
[62] | Landsat 8 OLI | Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), Random Forest (RF), eXtreme Gradient Boosting (XGBoost) | Soil health indicators, including soil fauna, soil microbes and soil organic matter (SOM) | Data collected from various sources, including soil sample extractions, cultivation in controlled environments and publicly available image databases such as ImageNet | - | Accuracy, classification accuracy, R2 | XGBoost achieved nearly 100% accuracy for the nematode strain Steinernema feltiae NY, with other strains above 97%; a Random Forest (RF) model trained on 168 soil samples achieved a correlation coefficient (r) of 0.74 for SOM prediction using LIFS data; a PLS regression model achieved an R2 of 0.84 for SOM prediction. | The study highlights the integration of ML and computer vision techniques with digital imaging and spectroscopy for soil health assessment. It demonstrates the potential of these technologies to improve the accuracy and efficiency of soil health monitoring, and emphasizes the need for comprehensive datasets and for addressing challenges such as environmental variability.
[63] | Landsat 8/9 | Stepwise Multiple Linear Regression (SMLR), Random Forest (RF) | Co, Cr, Cu, Fe, Mn, Ni, Pb, Zn | Samples collected from topsoil layers (0.3 m) of 19 soil pedons in the harrats arid region, Saudi Arabia | 19 soil pedons | R2, RMSE, NRMSE | SMLR: mean R2 varied between 0.38 (Zn) and 0.54 (Cu), with NRMSEs between 18.53% (Zn) and 26.03% (Cr). RF: mean R2 ranged from 0.17 (Ni) to 0.40 (Cu), with NRMSEs between 19.15% (Co) and 27.76% (Mn) | The study demonstrated the capacity of SMLR to use environmental covariates (ECOVs) to predict heavy metal (HM) concentrations and generate background levels. SMLR performed better than RF in predicting HMs. The established background levels are important for future environmental pollution and monitoring studies in the harrats arid region.
[69] | GaoFen-5 (GF-5) | Stacking model (PLSR, RFR, SVR) | Cd, As, Pb, Cu, Zn | Samples collected from topsoil layers (0–30 cm) using a 30 m × 30 m grid pattern, totaling 415 samples | 415 soil samples | R2, RMSE | Cd: R2 = 0.65, RMSE = 0.29; As: R2 = 0.60, RMSE = 5.19; Pb: R2 = 0.78, RMSE = 37.19; Cu: R2 = 0.85, RMSE = 4.76; Zn: R2 = 0.81, RMSE = 37.32 | The study confirms that integrating geographical environmental factors (GEFs) into the SHMC prediction model significantly improves prediction accuracy. The Stacking model demonstrated higher accuracy than single models, with notable performance improvements for Cd and As. The research emphasizes the potential of advanced hyperspectral remote sensing technology in environmental monitoring.
[72] | Pleiades | Deep Forest Algorithm | Tree counting | Satellite imagery from Pleiades for Kulon Progo district, Yogyakarta, Indonesia | - | F1 Score, Recall, Precision | Experiment 1: F1 = 0.760, Recall = 0.743, Precision = 0.778; Experiment 2: F1 = 0.774, Recall = 0.792, Precision = 0.756; Experiment 3: F1 = 0.779, Recall = 0.789, Precision = 0.769 | The study successfully applied the Deep Forest algorithm to count trees using Pleiades satellite imagery. The best F1 score achieved was 0.779, indicating the algorithm’s potential for accurate tree counting.
[64] | Landsat 8 OLI, NOAA, ASTER-GDEM | Random Forest (RF) | Lead (Pb) | 304 soil samples collected using a 2 × 2 km grid pattern, combined with multisource geographic data including historical and current satellite images | 304 soil samples | R2, RMSE, RPIQ | R2 = 0.85, RMSE = 0.80 mg/kg, RPIQ = 4.09 | The study developed a mapping method for soil potentially toxic elements (PTEs) using temporal–spatial–spectral (TSS) covariates combined with a Random Forest model. The model achieved high accuracy, demonstrating the importance of incorporating temporal parameters into soil PTE mapping for better environmental risk assessment and soil management.
[59] | Sentinel-1, Sentinel-2 | Convolutional Neural Network (CNN) | Land use and land cover (LULC) classification | Images from three different sensors, pre-processing, feature extraction, 10-fold cross-validation, test dataset | Training and validation: 131 polygons; test data: 17 polygons | | | The study demonstrated that the CNN model, when trained with a comprehensive dataset, provides superior accuracy for land use and land cover classification in semi-arid Mediterranean areas. However, the model’s performance varies across classes, highlighting the need for balanced training datasets to avoid overfitting.
[65] | Landsat 8 OLI | Random Forest (RF), Gradient Boosting Machine (GBM), Multi-layer Perceptron (MLP) | Soil salinity, electrical conductivity (EC) | A TDR-350 device was used to measure soil EC, moisture and temperature at 177 points around Maharloo Lake; 70% of the points were used for training and 30% for testing | 177 points | R2, RMSE | GBM R2 = 0.89, RMSE = 0.63; RF R2 = 0.85, RMSE = 0.71; MLP R2 = 0.75, RMSE = 0.88 | GBM performed best in predicting soil salinity, RF also performed well, and MLP performed worst. This approach is highly effective for monitoring and managing soil salinity, particularly in arid and semi-arid regions.
[60] | Sentinel-2, Planet Lab SuperDove (Synthetic SuperDove, SSD, and Actual SuperDove, ASD) | Random Forest Regression (RF), Support Vector Regression (SVR), Linear Regression, K-Nearest Neighbors (KNN), Decision Tree (DT) | Geological formations, land cover types, vegetation indices (NDVI, NDWI) | Ground truth data were collected through a combination of direct field observations and existing geological survey data. Handheld GPS devices were used to mark the exact sample locations, ensuring the accuracy of the spatial data used to train and validate the machine learning models. Various sensors were also used to measure specific environmental parameters, which were then correlated with the satellite imagery to improve model training. | N/A | | Best Performance Results: | The study demonstrated that the Random Forest algorithm, combined with Sentinel-2 and Planet Lab SuperDove imagery, provides highly accurate geological information extraction. This method proved superior to the other machine learning models, offering significant potential for applications in geological mapping and environmental monitoring.
[61] | Sentinel-2, PlanetScope (Dove satellites) | Random Forest Algorithm | Aboveground biomass, spectral reflectance | The study used a combination of destructive sampling, C-Dax pasture meters and rising plate meters (RPM) to gather ground truth data across ten farms. Destructive sampling involved physically harvesting the biomass from specific plots, while C-Dax pasture meters and RPM provided non-destructive measurements through reflectance and height estimations, respectively. | Estimated 12,000 field datasets | R2, RMSE (Root Mean Squared Error), MAE (Mean Absolute Error) | Detailed Best Performance Results: | The integration of Sentinel-2 and Planet SuperDove imagery with a Random Forest algorithm significantly improved the accuracy of pasture biomass estimation. This enhanced model supports more effective pasture management, especially in regions with frequent cloud cover, by providing timely and accurate biomass assessments.
[70] | PRISMA | Random Forest (RF), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) | Chlorophyll-a (Chl-a) concentrations | Data from three sub-alpine lakes, complemented by low-resolution Chlorophyll-a concentration maps | Data from three sub-alpine lakes, complemented by low-resolution Chlorophyll-a concentration maps | Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) | | The SVR model with standard scaling and PCA achieved the best performance in predicting Chlorophyll-a concentrations from PRISMA hyperspectral imagery. The spatial resolution was successfully enhanced from Sentinel-3 to PRISMA, although the models tended to underestimate high Chl-a concentrations, suggesting that additional PRISMA acquisitions are needed to improve accuracy.
[66] | PlanetScope (Dove satellites) | Support Vector Machines (SVMs), Decision Trees (DTs), Random Forests (RFs), Normal Bayes (NB), Artificial Neural Networks (ANNs) | Land use and land cover (LULC) | Empirical experiments, data representation and pre-processing of satellite images | 105 geo-referenced images | Precision, Recall, F-score, Kappa index | ANN classification accuracy: | ANN achieved the highest accuracy for LULC classification, demonstrating the effectiveness of integrating multi-spectral satellite imagery with ML algorithms in Egypt.
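Study [44] selected its 79 samples by their Nemerow index values. The sketch below shows how the commonly used Nemerow integrated pollution index is computed, P_N = sqrt((P_avg² + P_max²) / 2), where each single-factor index P_i = C_i / S_i divides a measured concentration by a reference value. The function names, element concentrations and reference values are hypothetical; [44] does not report here which reference values it used.

```python
import math


def single_factor_indices(concentrations, reference_values):
    """Single-factor pollution index P_i = C_i / S_i for each element."""
    return {el: concentrations[el] / reference_values[el] for el in concentrations}


def nemerow_index(concentrations, reference_values):
    """Nemerow integrated index: sqrt((mean(P)^2 + max(P)^2) / 2)."""
    p = list(single_factor_indices(concentrations, reference_values).values())
    p_avg = sum(p) / len(p)
    p_max = max(p)
    return math.sqrt((p_avg ** 2 + p_max ** 2) / 2)


# Hypothetical topsoil sample and hypothetical reference values (mg/kg).
sample = {"Cd": 0.6, "Pb": 35.0}
reference = {"Cd": 0.3, "Pb": 70.0}
print(single_factor_indices(sample, reference))  # P_Cd = 2.0, P_Pb = 0.5
print(round(nemerow_index(sample, reference), 3))
```

Because the maximum single-factor index enters the formula directly, one strongly enriched element (here Cd) dominates the integrated value even when the average is moderate.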
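The "Concordance" and "Bias" columns of the [45] sub-table presumably correspond to the CCC and bias metrics that [45] lists. Lin's concordance correlation coefficient, ρc = 2·s_op / (s_o² + s_p² + (mean_o − mean_p)²), differs from Pearson's r in that it also penalizes a systematic offset between predictions and observations. A minimal sketch, assuming population (divide-by-n) moments, which is one common convention, and a bias defined as predicted minus observed (sign conventions vary); the data are synthetic.

```python
def lins_ccc(observed, predicted):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def mean_bias(observed, predicted):
    """Mean prediction error (predicted minus observed)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated (r = 1) but shifted by +1
print(lins_ccc(obs, pred), mean_bias(obs, pred))
```

In this example Pearson's r is exactly 1, yet the concordance drops to 2.5 / 3.5 because every prediction is offset by one unit.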
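Most regression studies in the table, for example [46], report R2, RMSE and MAE. A short sketch of these three metrics, assuming R2 is the coefficient of determination 1 − SS_res / SS_tot; some studies instead report the squared Pearson correlation, which can differ on independent validation data. RMSE and MAE carry the units of the target (mg kg−1 in [46]). The function name and data are illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """Coefficient of determination, root mean squared error and mean absolute error."""
    n = len(observed)
    mean_o = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```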
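Studies [47] and [53] assess classified maps with overall, producer's and user's accuracy. All three come from a confusion matrix: OA is the share of correctly classified samples, PA divides each correct count by the reference total for that class (omission errors), and UA divides it by the mapped total for that class (commission errors). The sketch assumes rows hold reference classes and columns hold mapped classes; the orientation varies in the literature, so it should be checked before comparing numbers. Class names and counts are hypothetical.

```python
def accuracy_assessment(matrix, labels):
    """matrix[i][j] = number of samples of reference class i mapped as class j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    producers, users = {}, {}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])              # row sum
        mapped_total = sum(row[i] for row in matrix)  # column sum
        producers[label] = matrix[i][i] / reference_total
        users[label] = matrix[i][i] / mapped_total
    return correct / total, producers, users


labels = ["marsh", "other"]
matrix = [[45, 5],
          [10, 40]]
oa, pa, ua = accuracy_assessment(matrix, labels)
print(oa, pa, ua)
```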
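Study [48] used 100 bootstrap samples and evaluated its Cubist model on the out-of-bag (OOB) samples. Each bootstrap replicate draws n samples with replacement; the points never drawn (on average about 1/e ≈ 36.8% for large n) form that replicate's OOB set. The sketch below illustrates the procedure with a straight-line least-squares model swapped in for Cubist and synthetic data; the function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_oob_rmse(xs, ys, n_boot=100, seed=0):
    """Fit on each bootstrap sample and score it on its out-of-bag points."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        in_bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in in_bag_set]
        if not oob or len({xs[i] for i in in_bag}) < 2:
            continue  # this replicate cannot be fitted or scored
        slope, intercept = fit_line([xs[i] for i in in_bag], [ys[i] for i in in_bag])
        sq = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in oob]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return scores


rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
scores = bootstrap_oob_rmse(xs, ys)
print(len(scores), round(sum(scores) / len(scores), 3))
```

The spread of the OOB scores across replicates gives a rough picture of model uncertainty, which is how [48] describes using its bootstrap samples.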
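Study [49] compares its classifiers with F1, F2 and F0.5 scores and the Brier score. Fβ = (1 + β²)·P·R / (β²·P + R) treats recall as β times as important as precision, so F2 favours recall and F0.5 favours precision. The Brier score is the mean squared difference between a predicted probability and the 0/1 outcome (lower is better). The counts, probabilities and function names below are hypothetical.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def brier_score(probabilities, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Hypothetical high-risk classification: precision 0.75, recall 0.6.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(30, 10, 20, beta), 3))
print(round(brier_score([0.9, 0.2, 0.6], [1, 0, 0]), 4))
```

With precision above recall, F0.5 comes out highest and F2 lowest, matching the intuition that F2 penalizes missed high-risk sites more heavily.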
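The footnote to the [51] sub-table describes Random Forest settings that up-weight low-GPP samples: weight 2 for SW v.1 and, as the partly garbled footnote appears to indicate, 20 for SW v.2. A minimal sketch of building such weights and of a weighted RMSE that shows how they shift the error; the helper names are hypothetical, GPP values are assumed to be in μmol CO2 m−2 s−1 to match the threshold, and [51] does not state which software it used. In scikit-learn, such a list could be passed as `sample_weight` to `RandomForestRegressor.fit`.

```python
import math


def gpp_sample_weights(gpp_values, threshold=2.0, low_gpp_weight=2.0):
    """Weight samples below the GPP threshold more heavily; others get weight 1."""
    return [low_gpp_weight if g < threshold else 1.0 for g in gpp_values]


def weighted_rmse(observed, predicted, weights):
    """RMSE in which each squared error counts in proportion to its weight."""
    num = sum(w * (o - p) ** 2 for o, p, w in zip(observed, predicted, weights))
    return math.sqrt(num / sum(weights))


gpp = [1.0, 3.0, 0.5, 10.0]
print(gpp_sample_weights(gpp))                       # SW v.1-style weights
print(gpp_sample_weights(gpp, low_gpp_weight=20.0))  # SW v.2-style weights
```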
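Study [52] sharpens coarse thermal data with the Data Mining Sharpener, an ensemble of decision-tree regressors relating fine-resolution reflectance (Sentinel-2) to coarse thermal observations (Sentinel-3). The sketch keeps only the general structure of such sharpeners as we understand it: train at the coarse scale, apply the model at the fine scale, then correct residuals so that each coarse pixel's mean is preserved. It swaps in a single-predictor linear regression on a 1-D transect; it is not DMS itself, and the function names and data are hypothetical.

```python
def aggregate(fine, factor):
    """Average consecutive blocks of `factor` fine pixels into coarse pixels (1-D)."""
    if len(fine) % factor:
        raise ValueError("fine grid length must be a multiple of factor")
    return [sum(fine[i:i + factor]) / factor for i in range(0, len(fine), factor)]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sharpen(fine_predictor, coarse_target, factor):
    # 1. Train at coarse scale: aggregated fine predictor -> coarse target.
    slope, intercept = fit_line(aggregate(fine_predictor, factor), coarse_target)
    # 2. Apply the coarse-scale model to every fine pixel.
    fine_est = [slope * x + intercept for x in fine_predictor]
    # 3. Residual correction: shift each block so it averages back to its coarse value.
    sharpened = []
    for k, (target, est_mean) in enumerate(zip(coarse_target, aggregate(fine_est, factor))):
        block = fine_est[k * factor:(k + 1) * factor]
        sharpened.extend(v + (target - est_mean) for v in block)
    return sharpened


# Hypothetical transect: 12 fine vegetation-index pixels, 4 coarse temperature pixels (K).
ndvi = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.2, 0.2, 0.2, 0.8, 0.7, 0.9]
lst_coarse = [310.0, 305.0, 309.0, 300.0]
sharp = sharpen(ndvi, lst_coarse, factor=3)
print([round(v, 2) for v in sharp])
```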
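Study [54] compares ordinary least squares regression (OLSR) with geographically weighted regression (GWR). GWR fits a separate weighted regression at each location, giving nearby observations more weight, so coefficients can vary in space. The sketch below computes the local fit at one location with a Gaussian kernel and a fixed bandwidth, one predictor and 1-D positions; full GWR implementations also select the bandwidth (for example by cross-validation), which is not shown. The synthetic relationship changes slope halfway along the transect, which a single global OLS fit would average away.

```python
import math


def local_fit(positions, xs, ys, at, bandwidth):
    """Weighted least-squares fit of y = b0 + b1*x around location `at`."""
    w = [math.exp(-0.5 * ((p - at) / bandwidth) ** 2) for p in positions]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


positions = list(range(20))
xs = [(p % 5) + 1.0 for p in positions]
# Synthetic data: slope 1 in the first half of the transect, slope 3 in the second.
ys = [(1.0 if p < 10 else 3.0) * x for p, x in zip(positions, xs)]
for loc in (0, 19):
    print(loc, round(local_fit(positions, xs, ys, at=loc, bandwidth=2.0)[1], 3))
```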
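Studies [55] and [64] report RPD and RPIQ, and [55] also reports NRMSE in percent. RPD divides the standard deviation of the observations by the RMSE, and RPIQ divides their interquartile range by the RMSE. Conventions differ (sample or population SD, quartile method, NRMSE normalized by the mean or by the range), so this sketch picks one set: sample SD, Python's default `statistics.quantiles` method, and normalization by the mean. The data are synthetic.

```python
import math
import statistics


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rpd(observed, predicted):
    """Ratio of performance to deviation: sample SD of observations / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return (q3 - q1) / rmse(observed, predicted)


def nrmse_percent(observed, predicted):
    """RMSE normalized by the mean of the observations, in percent."""
    return 100 * rmse(observed, predicted) / statistics.mean(observed)


obs = [2.0, 4.0, 6.0, 8.0, 10.0]
pred = [3.0, 4.0, 5.0, 8.0, 11.0]
print(round(rpd(obs, pred), 3), round(rpiq(obs, pred), 3), round(nrmse_percent(obs, pred), 2))
```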
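Study [57] reports a cross-validated RMSE (RMSEcv) alongside RMSE on a separate prediction set (RMSEp), and [59] used 10-fold cross-validation. In k-fold cross-validation every sample is held out exactly once, and the pooled out-of-fold errors give RMSEcv. The sketch swaps in a straight-line model for SVMR or a CNN; the function names are hypothetical.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse_cv(xs, ys, k=10, seed=0):
    """Pooled RMSE of out-of-fold predictions from a straight-line model."""
    sq_errors = []
    for test in kfold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


rng = random.Random(3)
xs = [float(i) for i in range(30)]
ys = [3.0 * x + 2.0 + rng.gauss(0.0, 1.0) for x in xs]
print(round(rmse_cv(xs, ys), 3))
```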
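The [72] row reports precision, recall and F1 for three experiments. Because F1 is the harmonic mean of precision and recall, the reported F1 values can be cross-checked from the other two columns. Recomputing them, as below, reproduces the reported F1 scores to three decimal places. The function name is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the three experiments in [72].
reported = {
    "Experiment 1": (0.778, 0.743, 0.760),
    "Experiment 2": (0.756, 0.792, 0.774),
    "Experiment 3": (0.769, 0.789, 0.779),
}
for name, (p, r, f1) in reported.items():
    print(name, round(f1_from_precision_recall(p, r), 3), "reported:", f1)
```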
References
- Jiang, N.; Wang, Y.; Chu, J.; Kawasaki, S.; Tang, C.; Cheng, L.; Du, Y.; Shashank, B.S.; Singh, D.N.; Han, X.; et al. Bio-mediated soil improvement: An introspection into processes, materials, characterization and applications. Soil Use Manag. 2022, 38, 68–93.
- Havugimana, E.; Bhople, B.; Kumar, A. Soil Pollution-Major Sources and Types of Soil Pollutants. 2017. Available online: https://www.researchgate.net/publication/321526846 (accessed on 20 April 2024).
- Lallas, P.L. The Stockholm Convention on Persistent Organic Pollutants. Am. J. Int. Law. 2001, 95, 692–708.
- Askari, M.S.; O’Rourke, S.M.; Holden, N.M. Evaluation of soil quality for agricultural production using visible–near-infrared spectroscopy. Geoderma 2015, 243–244, 80–91.
- Slessarev, E.W.; Lin, Y.; Bingham, N.L.; Johnson, J.E.; Dai, Y.; Schimel, J.P.; Chadwick, O.A. Water balance creates a threshold in soil pH at the global scale. Nature 2016, 540, 567–569.
- Hu, B.; Chen, S.; Hu, J.; Xia, F.; Xu, J.; Li, Y.; Shi, Z. Application of portable XRF and VNIR sensors for rapid assessment of soil heavy metal pollution. PLoS ONE 2017, 12, e0172438.
- Horta, A.; Malone, B.; Stockmann, U.; Minasny, B.; Bishop, T.; McBratney, A.; Pallasser, R.; Pozza, L. Potential of integrated field spectroscopy and spatial analysis for enhanced assessment of soil contamination: A prospective review. Geoderma 2015, 241–242, 180–209.
- Liu, L.; Li, W.; Song, W.; Guo, M. Remediation techniques for heavy metal-contaminated soils: Principles and applicability. Sci. Total Environ. 2018, 633, 206–219.
- Jia, X.; O’Connor, D.; Shi, Z.; Hou, D. VIRS based detection in combination with machine learning for mapping soil pollution. Environ. Pollut. 2021, 268, 115845.
- Shi, T.; Guo, L.; Chen, Y.; Wang, W.; Shi, Z.; Li, Q.; Wu, G. Proximal and remote sensing techniques for mapping of soil contamination with heavy metals. Appl. Spectrosc. Rev. 2018, 53, 783–805.
- Tan, K.; Ma, W.; Chen, L.; Wang, H.; Du, Q.; Du, P.; Yan, B.; Liu, R.; Li, H. Estimating the distribution trend of soil heavy metals in mining area from HyMap airborne hyperspectral imagery based on ensemble learning. J. Hazard. Mater. 2021, 401, 123288.
- Ball, J.E.; Anderson, D.T.; Chan, C.S. Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community. J. Appl. Remote Sens. 2017, 11, 42609.
- Fang, K.; Pan, M.; Shen, C. The Value of SMAP for Long-Term Soil Moisture Estimation With the Help of Deep Learning. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2221–2233.
- Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. D 2020, 404, 132306.
- Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111716.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. Available online: http://arxiv.org/abs/1406.1078 (accessed on 12 April 2024).
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv 2015, arXiv:1508.01991. Available online: http://arxiv.org/abs/1508.01991 (accessed on 14 March 2024).
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
- Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.; Colquhoun, H.; Kastner, M.; Levac, D.; Ng, C.; Sharpe, J.P.; Wilson, K.; et al. A scoping review on the conduct and reporting of scoping reviews. BMC Med. Res. Methodol. 2016, 16, 15.
- Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473.
- Mendes, W.d.S.; Demattê, J.A.; de Resende, M.E.B.; Ruiz, L.F.C.; de Mello, D.C.; Rosas, J.T.F.; Silvero, N.E.Q.; Alleoni, L.R.F.; Colzato, M.; Rosin, N.A.; et al. A remote sensing framework to map potential toxic elements in agricultural soils in the humid tropics. Environ. Pollut. 2022, 292, 118397.
- Ma, H.; Liang, S. Development of the GLASS 250-m leaf area index product (version 6) from MODIS data using the bidirectional LSTM deep learning model. Remote Sens. Environ. 2022, 273, 112985.
- Wang, X.; Liu, C.; van der Fels-Klerx, H.J. Regional prediction of multi-mycotoxin contamination of wheat in Europe using machine learning. Food Res. Int. 2022, 159, 111588.
- Fang, Y.; Xu, L.; Wong, A.; Clausi, D.A. Multi-Temporal Landsat-8 Images for Retrieval and Broad Scale Mapping of Soil Copper Concentration Using Empirical Models. Remote Sens. 2022, 14, 2311.
- Löw, F.; Stieglitz, K.; Diemar, O. Terrestrial oil spill mapping using satellite earth observation and machine learning: A case study in South Sudan. J. Environ. Manag. 2021, 298, 113424.
- Sharma, A.; Liu, X.; Yang, X. Land cover classification from multi-temporal, multi-spectral remotely sensed imagery using patch-based recurrent neural networks. Neural Netw. 2018, 105, 346–355.
- de C, É.; Veloso, G.V.; da Fonseca, A.A.; Fernandes-Filho, E.I.; Fontes, M.P.F.; Soares, E.M.B. Predictive modeling of contents of potentially toxic elements using morphometric data, proximal sensing, and chemical and physical properties of soils under mining influence. Sci. Total Environ. 2022, 817, 152972.
- Radočaj, D.; Jurišić, M.; Gašparović, M.; Plaščak, I.; Antonić, O. Cropland suitability assessment using satellite-based biophysical vegetation properties and machine learning. Agronomy 2021, 11, 1620.
- Pech-May, F.; Aquino-Santos, R.; Rios-Toledo, G.; Posadas-Durán, J.P.F. Mapping of Land Cover with Optical Images, Supervised Algorithms, and Google Earth Engine. Sensors 2022, 22, 4729.
- Guo, L.; Sun, X.; Fu, P.; Shi, T.; Dang, L.; Chen, Y.; Linderman, M.; Zhang, G.; Zhang, Y.; Jiang, Q.; et al. Mapping soil organic carbon stock by hyperspectral and time-series multispectral remote sensing images in low-relief agricultural areas. Geoderma 2021, 398, 115118.
- Xing, C.; Chen, N.; Zhang, X.; Gong, J. A machine learning based reconstruction method for satellite remote sensing of soil moisture images with in situ observations. Remote Sens. 2017, 9, 484.
- Jia, X.; Hou, D. Mapping soil arsenic pollution at a brownfield site using satellite hyperspectral imagery and machine learning. Sci. Total Environ. 2023, 857, 159387.
- Chen, Y.; Wu, Y.; Ma, J.; An, Y.; Liu, Q.; Yang, S.; Qu, Y.; Chen, H.; Zhao, W.; Tian, Y. Microplastics pollution in the soil mulched by dust-proof nets: A case study in Beijing, China. Environ. Pollut. 2021, 275, 116600.
- Chen, S.; Liang, Z.; Webster, R.; Zhang, G.; Zhou, Y.; Teng, H.; Hu, B.; Arrouays, D.; Shi, Z. A high-resolution map of soil pH in China made by hybrid modelling of sparse soil data and environmental covariates and its implications for pollution. Sci. Total Environ. 2019, 655, 273–283.
- Shi, T.; Hu, X.; Guo, L.; Su, F.; Tu, W.; Hu, Z.; Liu, H.; Yang, C.; Wang, J.; Zhang, J.; et al. Digital mapping of zinc in urban topsoil using multisource geospatial data and random forest. Sci. Total Environ. 2021, 792, 148455.
- Lin, N.; Jiang, R.; Li, G.; Yang, Q.; Li, D.; Yang, X. Estimating the heavy metal contents in farmland soil from hyperspectral images based on Stacked AdaBoost ensemble learning. Ecol. Indic. 2022, 143, 109330.
- Wang, L.; Zhou, Y.; Liu, J.; Liu, Y.; Zuo, Q.; Li, Q. Exploring the potential of multispectral satellite images for estimating the contents of cadmium and lead in cropland: The effect of the dimidiate pixel model and random forest. J. Clean. Prod. 2022, 367, 132922.
- Zhou, S.; Kaufmann, H.; Bohn, N.; Bochow, M.; Kuester, T.; Segl, K. Identifying distinct plastics in hyperspectral experimental lab-, aircraft-, and satellite data using machine/deep learning methods trained with synthetically mixed spectral data. Remote Sens. Environ. 2022, 281, 113263.
- Papi, R.; Attarchi, S.; Boloorani, A.D.; Samany, N.N. Knowledge discovery of Middle East dust sources using Apriori spatial data mining algorithm. Ecol. Inform. 2022, 72, 101867.
- Acuña-Alonso, C.; Novo, A.; Rodríguez, J.L.; Varandas, S.; Álvarez, X. Modelling and evaluation of land use changes through satellite images in a multifunctional catchment: Social, economic and environmental implications. Ecol. Inform. 2022, 71, 101777.
- Azizi, K.; Ayoubi, S.; Nabiollahi, K.; Garosi, Y.; Gislum, R. Predicting heavy metal contents by applying machine learning approaches and environmental covariates in west of Iran. J. Geochem. Explor. 2022, 233, 106921.
- Cheng, Y.; Zhou, K.; Wang, J.; Cui, S.; Yan, J.; De Maeyer, P.; Van de Voorde, T. Regional metal pollution risk assessment based on a long short-term memory model: A case study of the South Altai Mountain mining area, China. J. Clean. Prod. 2022, 379, 134755.
- Derakhshan-Babaei, F.; Mirchooli, F.; Mohammadi, M.; Nosrati, K.; Egli, M. Tracking the origin of trace metals in a watershed by identifying fingerprints of soils, landscape and river sediments. Sci. Total Environ. 2022, 835, 155583.
- Alvyar, Z.; Shahbazi, F.; Oustan, S.; Dengiz, O.; Minasny, B. Digital mapping of potentially toxic elements enrichment in soils of Urmia Lake due to water level decline. Sci. Total Environ. 2022, 808, 152086. [Google Scholar] [CrossRef]
- Zhang, B.; Guo, B.; Zou, B.; Wei, W.; Lei, Y.; Li, T. Retrieving soil heavy metals concentrations based on GaoFen-5 hyperspectral satellite image at an opencast coal mine, Inner Mongolia, China. Environ. Pollut. 2022, 300, 118981. [Google Scholar] [CrossRef] [PubMed]
- Simioni, J.P.D.; Guasselli, L.A.; de Oliveira, G.G.; Ruiz, L.F.C.; de Oliveira, G. A comparison of data mining techniques and multi-sensor analysis for inland marshes delineation. Wetl. Ecol. Manag. 2020, 28, 577–594. [Google Scholar] [CrossRef]
- Teng, H.; Rossel, R.A.V.; Shi, Z.; Behrens, T.; Chappell, A.; Bui, E. Assimilating satellite imagery and visible-near infrared spectroscopy to model and map soil loss by water erosion in Australia. Environ. Model. Softw. 2016, 77, 156–167. [Google Scholar] [CrossRef]
- Agrawal, A.; Petersen, M.R. Detecting arsenic contamination using satellite imagery and machine learning. Toxics 2021, 9, 333. [Google Scholar] [CrossRef]
- Zeraatpisheh, M.; Ayoubi, S.; Jafari, A.; Tajik, S.; Finke, P. Digital mapping of soil properties using multiple machine learning in a semi-arid region, central Iran. Geoderma 2019, 338, 445–452. [Google Scholar] [CrossRef]
- Wolanin, A.; Camps-Valls, G.; Gómez-Chova, L.; Mateo-García, G.; van der Tol, C.; Zhang, Y.; Guanter, L. Estimating crop primary productivity with Sentinel-2 and Landsat 8 using machine learning methods trained with radiative transfer simulations. Remote Sens. Environ. 2019, 225, 441–457. [Google Scholar] [CrossRef]
- Guzinski, R.; Nieto, H. Evaluating the feasibility of using Sentinel-2 and Sentinel-3 satellites for high-resolution evapotranspiration estimations. Remote Sens. Environ. 2019, 221, 157–172. [Google Scholar] [CrossRef]
- Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Kempen, B.; de Sousa, L. Global mapping of soil salinity change. Remote Sens. Environ. 2019, 231, 111260. [Google Scholar] [CrossRef]
- Alexakis, D.D.; Tapoglou, E.; Vozinaki, A.E.K.; Tsanis, I.K. Integrated use of satellite remote sensing, artificial neural networks, field spectroscopy, and GIS in estimating crucial soil parameters in terms of soil erosion. Remote Sens. 2019, 11, 1106. [Google Scholar] [CrossRef]
- Wang, J.; Ding, J.; Yu, D.; Ma, X.; Zhang, Z.; Ge, X.; Teng, D.; Li, X.; Liang, J.; Guo, Y.; et al. Machine learning-based detection of soil salinity in an arid desert region, Northwest China: A comparison between Landsat-8 OLI and Sentinel-2 MSI. Sci. Total Environ. 2020, 707, 136092. [Google Scholar] [CrossRef]
- Khosravi, V.; Ardejani, F.D.; Gholizadeh, A.; Saberioon, M. Satellite imagery for monitoring and mapping soil chromium pollution in a mine waste dump. Remote Sens. 2021, 13, 1277. [Google Scholar] [CrossRef]
- Gholizadeh, A.; Žižala, D.; Saberioon, M.; Borůvka, L. Soil organic carbon and texture retrieving and mapping using proximal, airborne and Sentinel-2 spectral imaging. Remote Sens. Environ. 2018, 218, 89–103. [Google Scholar] [CrossRef]
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; et al. Updating guidance for reporting systematic reviews: Development of the PRISMA 2020 statement. J. Clin. Epidemiol. 2021, 134, 103–112. [Google Scholar] [CrossRef] [PubMed]
- Alonso-Sarría, F.; Valdivieso-Ros, C.; Gomariz-Castillo, F. Analysis of the hyperparameter optimisation of four machine learning satellite imagery classification methods. Comput. Geosci. 2024, 28, 551–571. [Google Scholar] [CrossRef]
- Yadav, A.; Saraswat, S.; Faujdar, N. Geological Information Extraction from Satellite Imagery Using Machine Learning. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions), ICRITO, Noida, India, 13–14 October 2022; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
- Ogungbuyi, M.G.; Guerschman, J.; Fischer, A.M.; Crabbe, R.A.; Ara, I.; Mohammed, C.; Scarth, P.; Tickle, P.; Whitehead, J.; Harrison, M.T. Improvement of pasture biomass modelling using high-resolution satellite imagery and machine learning. J. Environ. Manag. 2024, 356, 120564. [Google Scholar] [CrossRef]
- Wang, L.; Cheng, Y.; Meftaul, I.M.; Luo, F.; Kabir, M.A.; Doyle, R.; Lin, Z.; Naidu, R. Advancing Soil Health: Challenges and Opportunities in Integrating Digital Imaging, Spectroscopy, and Machine Learning for Bioindicator Analysis. Anal. Chem. 2024, 96, 8109–8123. [Google Scholar] [CrossRef]
- Sulieman, M.M.; Kaya, F.; Keshavarzi, A.; Hussein, A.M.; Al-Farraj, A.S.; Brevik, E.C. Spatial variability of some heavy metals in arid harrats soils: Combining machine learning algorithms and synthetic indexes based-multitemporal Landsat 8/9 to establish background levels. Catena 2024, 234, 107579. [Google Scholar] [CrossRef]
- Xu, X.; Wang, Z.; Song, X.; Zhan, W.; Yang, S. A remote sensing-based strategy for mapping potentially toxic elements of soils: Temporal-spatial-spectral covariates combined with random forest. Environ. Res. 2024, 240, 117570. [Google Scholar] [CrossRef] [PubMed]
- Alamdar, S.; Ghazban, F.; Zarei, A. Efficiency of Machine Learning Algorithms in Soil Salinity Detection Using Landsat-8 OLI Imagery. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 49–55. [Google Scholar] [CrossRef]
- Mahmoud, R.; Hassanin, M.; Al Feel, H.; Badry, R.M. Machine Learning-Based Land Use and Land Cover Mapping Using Multi-Spectral Satellite Imagery: A Case Study in Egypt. Sustainability 2023, 15, 9467. [Google Scholar] [CrossRef]
- European Space Agency. ‘Sentinel-2.’ Sentinel Online. Available online: https://sentinel.esa.int/web/sentinel/missions/sentinel-2 (accessed on 12 April 2024).
- USGS. ‘Landsat.’ Core Science Systems, National Land Imaging Program, United States Geological Survey. Available online: https://www.usgs.gov/core-science-systems/nli/landsat (accessed on 11 May 2023).
- Yao, L.; Xu, M.; Liu, Y.; Niu, R.; Wu, X.; Song, Y. Estimating of heavy metal concentration in agricultural soils from hyperspectral satellite sensor imagery: Considering the sources and migration pathways of pollutants. Ecol. Indic. 2024, 158, 111416. [Google Scholar] [CrossRef]
- Amieva, J.F.; Oxoli, D.; Brovelli, M.A. Machine and Deep Learning Regression of Chlorophyll-a Concentrations in Lakes Using PRISMA Satellite Hyperspectral Imagery. Remote Sens. 2023, 15, 5385. [Google Scholar] [CrossRef]
- Tchounwou, P.B.; Yedjou, C.G.; Patlolla, A.K.; Sutton, D.J. Heavy Metal Toxicity and the Environment; Springer: Basel, Switzerland, 2012; pp. 133–164. [Google Scholar] [CrossRef]
- Wijaya, D.; Aditama, I.S.; Austen, S.A.; Widjaja, C.S.; Jabar, B.A.; Irwansyah, E. Tree Counting with Deep Forest Algorithm for Kulon Progo District in Yogyakarta, Indonesia Using Pleiades Satellite Imagery. 2024. Available online: https://www.researchgate.net/publication/378008284 (accessed on 16 May 2024).

Study | Different Satellite Sources | ML Methodology Variation | Variation of Detected Contaminant | Performance Analysis Alteration | Sampling Quantity | Sampling Quality | Validation Data |
---|---|---|---|---|---|---|---|
[22] | LR | HR | LR | LR | LR | LR | LR |
[23] | LR | LR | UR | HR | LR | LR | LR |
[24] | HR | HR | LR | LR | UR | UR | LR |
[25] | HR | LR | HR | LR | UR | LR | LR |
[26] | LR | HR | UR | LR | LR | LR | LR |
[27] | HR | HR | UR | LR | UR | HR | LR |
[28] | HR | LR | LR | LR | LR | LR | LR |
[29] | LR | LR | UR | LR | HR | LR | LR |
[30] | HR | LR | UR | UR | LR | LR | LR |
[31] | LR | LR | HR | LR | LR | LR | LR |
[32] | HR | HR | UR | LR | HR | LR | LR |
[33] | HR | LR | HR | LR | LR | LR | LR |
[34] | HR | HR | UR | UR | HR | LR | UR |
[35] | HR | LR | UR | LR | LR | LR | LR |
[36] | HR | LR | HR | LR | LR | LR | LR |
[37] | HR | LR | LR | LR | LR | LR | LR |
[38] | HR | LR | LR | LR | LR | LR | LR |
[39] | LR | LR | LR | LR | LR | LR | LR |
[40] | HR | HR | UR | UR | UR | UR | LR |
[41] | HR | HR | UR | HR | HR | LR | LR |
[42] | HR | LR | LR | LR | LR | LR | LR |
[43] | LR | HR | LR | HR | HR | LR | LR |
[44] | HR | LR | UR | LR | LR | LR | LR |
[45] | LR | LR | LR | LR | LR | LR | LR |
[46] | LR | LR | LR | LR | LR | LR | LR |
[47] | HR | LR | LR | HR | LR | LR | LR |
[48] | LR | HR | UR | LR | LR | LR | LR |
[49] | HR | LR | HR | LR | HR | LR | LR |
[50] | HR | LR | UR | LR | LR | LR | LR |
[51] | LR | LR | UR | LR | UR | LR | LR |
[52] | LR | HR | UR | LR | UR | LR | LR |
[53] | LR | HR | UR | HR | LR | LR | LR |
[54] | LR | HR | UR | LR | HR | LR | LR |
[55] | LR | HR | UR | LR | LR | LR | LR |
[56] | LR | LR | HR | LR | LR | LR | LR |
[57] | HR | LR | LR | LR | LR | LR | LR |
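
The per-study ratings in the table above can be condensed into per-column counts. The following Python sketch transcribes the table row by row and tallies how often each code (HR, UR, LR) occurs in each column; the codes are counted as opaque labels, with no interpretation added. The names `DOMAINS`, `TABLE`, `parse_table` and `tally_by_domain` are illustrative and do not come from the review.

```python
from collections import Counter

# Column order of the rating table (one rating per study and column).
DOMAINS = [
    "Different Satellite Sources",
    "ML Methodology Variation",
    "Variation of Detected Contaminant",
    "Performance Analysis Alteration",
    "Sampling Quantity",
    "Sampling Quality",
    "Validation Data",
]

# Reference number followed by its seven ratings, transcribed from the table.
TABLE = """
22 LR HR LR LR LR LR LR
23 LR LR UR HR LR LR LR
24 HR HR LR LR UR UR LR
25 HR LR HR LR UR LR LR
26 LR HR UR LR LR LR LR
27 HR HR UR LR UR HR LR
28 HR LR LR LR LR LR LR
29 LR LR UR LR HR LR LR
30 HR LR UR UR LR LR LR
31 LR LR HR LR LR LR LR
32 HR HR UR LR HR LR LR
33 HR LR HR LR LR LR LR
34 HR HR UR UR HR LR UR
35 HR LR UR LR LR LR LR
36 HR LR HR LR LR LR LR
37 HR LR LR LR LR LR LR
38 HR LR LR LR LR LR LR
39 LR LR LR LR LR LR LR
40 HR HR UR UR UR UR LR
41 HR HR UR HR HR LR LR
42 HR LR LR LR LR LR LR
43 LR HR LR HR HR LR LR
44 HR LR UR LR LR LR LR
45 LR LR LR LR LR LR LR
46 LR LR LR LR LR LR LR
47 HR LR LR HR LR LR LR
48 LR HR UR LR LR LR LR
49 HR LR HR LR HR LR LR
50 HR LR UR LR LR LR LR
51 LR LR UR LR UR LR LR
52 LR HR UR LR UR LR LR
53 LR HR UR HR LR LR LR
54 LR HR UR LR HR LR LR
55 LR HR UR LR LR LR LR
56 LR LR HR LR LR LR LR
57 HR LR LR LR LR LR LR
"""


def parse_table(text):
    """Return {reference number: [seven ratings]} from the transcription."""
    rows = {}
    for line in text.strip().splitlines():
        ref, *ratings = line.split()
        if len(ratings) != len(DOMAINS):
            raise ValueError(f"study [{ref}] has {len(ratings)} ratings")
        rows[int(ref)] = ratings
    return rows


def tally_by_domain(rows):
    """Count how often each code (HR, UR, LR) occurs in each column."""
    return {
        domain: Counter(ratings[i] for ratings in rows.values())
        for i, domain in enumerate(DOMAINS)
    }


if __name__ == "__main__":
    for domain, counts in tally_by_domain(parse_table(TABLE)).items():
        print(f"{domain:34} HR={counts['HR']:2} UR={counts['UR']:2} LR={counts['LR']:2}")
```

Run as a script, it prints one line per column. For instance, "Different Satellite Sources" collects 20 HR and 16 LR ratings, "Variation of Detected Contaminant" holds most of the UR ratings (18), and "Validation Data" has a single UR rating, for study [34].
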

Machine Learning Method | Frequency |
---|---|
Decision Trees (Random Forest (RF), ExtraTrees (ET), Decision Tree, Cubist (Cu), Classification and Regression Trees (CART), Deep Forest Algorithm) | 48 |
Neural Networks (Artificial Neural Network (ANN), Back-Propagation Neural Network (BPNN), General Regression Neural Network (GRNN), Patch-based multi-image NN system (Patch multi), Patch-based single-image NN system (Patch Single), Pixel-based multi-image NN system (Pix multi), Pixel-based single-image NN system (Pix single), Multi-Layer Perceptron (MLP), Extreme Learning Machine (ELM)) | 20 |
Regression Models (Partial Least Squares Regression (PLSR), K-Nearest Neighbors (KNN), Gaussian Process Regression (GPR), Generalized Linear Models (GLMs), Geographically Weighted Regression (GWR), Kernel Ridge Regression (KRR), Linear Regression Model, Stepwise Multiple Linear Regression (SMLR), Multiple Linear Regression (MLR)) | 20 |
Support Vector (Support Vector Machine (SVM), Support Vector Regression (SVR)) | 15 |
Boosting Algorithms (Gradient Descent Boosting Trees (GDB), Adaptive Boosting (ADB), eXtreme Gradient Boosting (XGB), Generalized Boosting Methods (GBMs)) | 11 |
Recurrent Neural Networks (RNNs) (Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit (GRU), Pixel-based RNN system (Pix RNN), Proposed patch-based RNN (PB-RNN)) | 9 |
Convolutional Neural Network (CNN) | 7 |
Naive Bayes | 1 |
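
The frequencies in the table above sum to 131, more than the 36 studies ([22]–[57]) listed in the rating table, so they are read here as method mentions (one study can report several methods). The Python sketch below, an illustration under that reading, converts the counts into percentage shares of all mentions; `METHOD_COUNTS` and `shares` are hypothetical names.

```python
# Mention counts per method family, transcribed from the table above.
METHOD_COUNTS = {
    "Decision Trees": 48,
    "Neural Networks": 20,
    "Regression Models": 20,
    "Support Vector": 15,
    "Boosting Algorithms": 11,
    "Recurrent Neural Networks": 9,
    "Convolutional Neural Network": 7,
    "Naive Bayes": 1,
}


def shares(counts):
    """Return each family's percentage of all mentions."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(METHOD_COUNTS.values())
    print("total mentions:", total)
    # Families listed from most to least frequently mentioned.
    for name, pct in sorted(shares(METHOD_COUNTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:30} {pct:5.1f}%")
    # Optional regrouping: treat all three neural-network rows as one family.
    neural = ("Neural Networks", "Recurrent Neural Networks", "Convolutional Neural Network")
    merged = sum(METHOD_COUNTS[k] for k in neural)
    print(f"all neural families merged: {merged} ({100.0 * merged / total:.1f}%)")
```

Tree-based methods account for 48 of 131 mentions (about 36.6%). Merging the three neural-network rows gives 36 mentions (about 27.5%), which is still fewer than the tree-based group.
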

Environmental Parameter Detected | Frequency |
---|---|
Heavy Metals (copper (Cu), arsenic (As), cadmium (Cd), chromium (Cr), nickel (Ni), lead (Pb), zinc (Zn), iron (Fe)) | 47 |
Soil Characteristics (soil organic carbon, soil organic matter (SOM), surface soil moisture, soil loss, soil erosion, soil texture, clay content, soil pH, soil salinity, evaporation, soil fauna, soil microbes, electrical conductivity (EC), calcium carbonate equivalent (CCE)) | 20 |
Vegetation Properties (land cover, leaf area index (LAI), cropland suitability assessment, crop productivity, tree counting) | 13 |
Plastic Pollution (microplastic pollution and the plastics polyethylene (PE), polypropylene (PP), polyvinyl chloride (PVC), polyethylene terephthalate (PET), polystyrene (PS), acrylonitrile butadiene styrene (ABS), ethylene vinyl acetate (EVA), polyamide (PA), polycarbonate (PC), polymethyl methacrylate (PMMA)) | 11 |
Transition Metals (Ti (titanium), V (vanadium), Cr (chromium), Mn (manganese), Fe (iron), Co (cobalt), Cu (copper), Zr (zirconium), Nb (niobium), Mo (molybdenum), Cd (cadmium), Hf (hafnium), W (tungsten)) | 11 |
Alkali and Alkaline Earth Metals (Na (sodium), Mg (magnesium), K (potassium), Ca (calcium), Sr (strontium), Ba (barium), Cs (cesium)) | 8 |
Lanthanides or Rare Earth Elements (Ce (cerium), Pr (praseodymium), Nd (neodymium), Y (yttrium), La (lanthanum)) | 5 |
Nonmetals (Si (silicon), P (phosphorus), S (sulfur), Br (bromine)) | 4 |
Post-Transition Metals (aluminum (Al), gallium (Ga)) | 3 |
Actinides (Th (thorium), U (uranium)) | 3 |
Other Pollutants (oil spill, dust and its sources, urban pollution affecting inland marshes) | 3 |
Multi-mycotoxin contamination (such as deoxynivalenol and zearalenone) | 2 |
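
Several elements appear in more than one row of the table above (for instance, Cu is listed under both Heavy Metals and Transition Metals), so the rows are overlapping groupings rather than a partition of the detected elements. The Python sketch below, using the hypothetical names `CATEGORIES` and `shared_elements`, transcribes the element symbols of the element-based rows and reports every symbol that is listed under more than one category.

```python
# Element symbols per row of the table above (element-based rows only).
# Molybdenum is entered once, as Mo.
CATEGORIES = {
    "Heavy Metals": {"Cu", "As", "Cd", "Cr", "Ni", "Pb", "Zn", "Fe"},
    "Transition Metals": {"Ti", "V", "Cr", "Mn", "Fe", "Co", "Cu",
                          "Zr", "Nb", "Mo", "Cd", "Hf", "W"},
    "Alkali and Alkaline Earth Metals": {"Na", "Mg", "K", "Ca", "Sr", "Ba", "Cs"},
    "Lanthanides or Rare Earth Elements": {"Ce", "Pr", "Nd", "Y", "La"},
    "Nonmetals": {"Si", "P", "S", "Br"},
    "Post-Transition Metals": {"Al", "Ga"},
    "Actinides": {"Th", "U"},
}


def shared_elements(categories):
    """Return {symbol: [category names]} for symbols listed in two or more rows."""
    listed_in = {}
    for name, symbols in categories.items():
        for symbol in sorted(symbols):
            listed_in.setdefault(symbol, []).append(name)
    return {s: names for s, names in listed_in.items() if len(names) > 1}


if __name__ == "__main__":
    for symbol, names in sorted(shared_elements(CATEGORIES).items()):
        print(symbol, "->", ", ".join(names))
```

With the table as transcribed, it reports Cd, Cr, Cu and Fe, each listed under both Heavy Metals and Transition Metals.
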

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
TavallaieNejad, A.; Vila, M.C.; Paneiro, G.; Baptista, J.S. A Systematic Review of Machine Learning Algorithms for Soil Pollutant Detection Using Satellite Imagery. Remote Sens. 2025, 17, 1207. https://doi.org/10.3390/rs17071207