Machine Learning in Geosciences: A Review of Complex Environmental Monitoring Applications

Abstract: This is a systematic literature review of the application of machine learning (ML) algorithms in geosciences, with a focus on environmental monitoring applications. ML algorithms, with their ability to analyze vast quantities of data, decipher complex relationships, and predict future events, offer promising capabilities for implementing technologies based on more precise and reliable data processing. This review considers several vulnerable and particularly at-risk themes, such as landfills, mining activities, the protection of coastal dunes, illegal discharges into water bodies, and the pollution and degradation of soil and water matrices in large industrial complexes. These environmental monitoring case studies provide an opportunity to better examine the impact of human activities on the environment, with a specific focus on water and soil matrices. The recent literature underscores the increasing importance of ML in these contexts, highlighting a preference for adapted classic models: random forest (RF) (the most widely used), decision trees (DTs), support vector machines (SVMs), artificial neural networks (ANNs), convolutional neural networks (CNNs), principal component analysis (PCA), and many more. In the field of environmental management, these methodologies offer invaluable insights that can steer strategic planning and decision-making based on more accurate image classification, prediction models, object detection and recognition, map classification, and data classification.


Introduction
Machine learning (ML) has significantly revolutionized scientific methodology in geoscience applications by introducing automation, enhancing efficiency, enabling adaptability, ensuring security, and facilitating extensive data analytics [1]. Artificial intelligence (AI), machine learning, and deep learning (DL), all highly cited contemporary technologies, are interconnected but distinct disciplines [2,3]. AI is a wider field that integrates various approaches to create intelligent systems. ML is a branch of AI that emphasizes learning from data and human-imitating algorithms, and DL is a further specialized subset of ML, focusing on the use of deep neural networks for pattern recognition [4]. This review exclusively focuses on the applications of ML within the field of geosciences.
The history of ML begins with cybernetics and the computer sciences in the early 1950s, with the idea of using machines to simulate human learning processes. The primary stages between the 1950s and 1960s created the prototype of early neural networks [5]. The evolution of ML has progressed through distinct phases: rule-based systems (1960s-1970s), connectionism and backpropagation (1980s), a renaissance in the 1990s, and a deep learning resurgence in the 2010s [6]. Each phase marked significant advancements, diversification, and broader practical applications. The ML field gained substantial relevance and investment, evident in its transition from a limited number of global conferences to a proliferation of both national and international events. This shift underscores its increasing significance and widespread interest within the scholarly community.
The application of ML covers four principal domains: prediction, feature importance extraction, anomaly detection, and the discovery of new materials. These categories collectively exemplify the multifaceted utility of ML methodologies in assorted analytical pursuits [7]. All predominant applications follow a uniform procedural framework encompassing model preparation, model development, and post-model creation stages, including the interpretation and determination of applicability domains. This approach is well suited for addressing the intricate challenges in environmental monitoring. Overall, environmental monitoring in geosciences involves the convergence of multiple disciplines and very complex data management. These disciplines are physics, geology, meteorology and atmospheric sciences, oceanography, environmental science, geomorphology, seismology, paleontology, mineralogy and petrology, geophysics, glaciology, hydrology, chemistry, biology, ecology, and anthropology.
On the global stage, scientific investigations into geosciences based on ML applications are predominantly guided by the utilization of supervised ML algorithms. The literature search was carried out on the Clarivate site [8] with the following filters: open access scientific articles from the last four years, limited to the first ten results sorted by relevance, from the principal academic publishing companies specializing in scientific articles (e.g., Elsevier, Springer Nature, MDPI, IEEE, and Frontiers Media SA) in four geoscience fields (geophysics, geomorphology, hydrogeology, and applied geology). The analysis of these articles shows that a notable 56.3% of the content originates from Asia, 12.4% from Europe, 10.4% from Australia, and 8.3% from North America, with equal shares of 6.3% from South America and Africa. A substantial proportion of articles employ supervised learning algorithms: random forest (RF), an ensemble method [9][10][11]; support vector machines (SVMs) [12,13]; logistic regression (LR), a linear model [14,15]; artificial neural networks (ANNs) [16][17][18]; decision trees (DTs) [19,20]; k-nearest neighbors (KNN) [21][22][23]; and Bayesian neural networks (BNNs) [24,25]. The investigation revealed an average utilization of four ML techniques, reflecting the dynamic landscape of machine learning applications (Figure 1). The complete results of the frequencies and number of publications are reported in Table S1.
The structure of the paper is as follows: Section 2 gives an overview of the limits and challenges of geosciences in machine learning algorithms, while Section 3 presents specific geoscience environmental monitoring applications, including quarries and landfills, coastal dune safeguarding, illicit sea discharges, and pollution in different matrices from several sources in industrial complexes. Section 4 presents the conclusions, and Section 5 offers a forward-looking perspective, anticipating strategic developments and innovation. The present review systematically explores the application of machine learning (ML) algorithms within the geosciences, with particular emphasis on their use in environmental monitoring applications, which the subsequent section examines in more depth. Table 1 reports the nomenclature used internationally to distinguish the various ML algorithms.

Overview of the Limits and Challenges of Geosciences in Machine Learning Algorithms
In the field of geoscience environmental monitoring, conventional methodologies include field surveys and measurements, soil-rock-water sampling and geochemical analysis, geodetic and remote sensing, and climate monitoring. These methodologies are fundamental tools for assessing and understanding environmental dynamics, providing crucial insights into various geological and ecological processes.
Traditional sampling methods remain fundamentally important, although promising ML techniques offer notable enhancements across six distinct domains: enhanced accuracy and spatial coverage [26,27]; efficiency in time and resource utilization; an improved understanding of complex models [28][29][30]; adaptability and continual updates; automation and reduced human dependence; and reliability and validation challenges [31,32].
The implementation of ML methodologies in geosciences offers numerous potential advantages in data analysis, notably the efficient analysis of large volumes of data. Before executing ML, a substantial portion of the effort is dedicated to preprocessing and data transformation, entailing tasks such as eliminating redundancy, inconsistency, noise, and heterogeneity, as well as transforming and labeling data. Dealing with big data turns out to be very advantageous, creating the opportunity to diminish reliance on human supervision by learning directly from data characterized by three key concepts: volume, variety, and velocity [33,34]. Analyzing extensive datasets enhances scalability through the proficient management of large data volumes, augments adaptability by refining accuracy iteratively, and facilitates the effective management of data veracity [35]. The ability to model, optimize, and integrate multi-source data, automate complex tasks, and provide forecasts facilitates land management by providing a complete view of the processes. ML algorithms are grouped into four main applications: detecting objects and events, estimating variables, long-term forecasting of variables, and mining relationships in data [36]. In geo-monitoring, advanced methods for estimating landslide movement using unmanned aerial vehicle (UAV) data have been developed, improving accuracy by 8% compared to traditional methods. In parallel, wireless sensor networks (WSNs) have been used to monitor the structural health of homes in areas at risk of ground movement. These technologies, which use artificial intelligence and the Internet of Things, represent the vanguard in remote monitoring, contributing to the prevention of harm and the safety of people [37,38]. In geoscience, the machine learning methodologies outlined by Dramsch et al. (2020) are primarily categorized into developing alternative models to optimize computational efficiency, crafting models to supplement or replace human intervention, and enabling previously unattainable geoscientific activities [39]. Machine learning (ML) methodologies involve supervised, unsupervised, and semi-supervised learning, reinforcement learning (RL), deep learning, explainable AI, and other algorithms (Figure 2) [40].

Figure 2. Classification of artificial intelligence (AI) algorithms. This figure illustrates the broad spectrum of AI algorithms, with a particular focus on machine learning (ML) methods. It is important to note that numerous methodologies span multiple categories (for instance, the deep learning methodology). This overlap signifies the versatility and adaptability of these algorithms in various research and application domains.

Supervised ML Algorithms
Supervised learning encompasses various problem categories and techniques, including classification, regression, neural network-based approaches, ensemble methods, optimization-based techniques, object detection, feature filtering, and dimensionality reduction [40,41]. Specific algorithms and methods, such as boosting methods, neural networks, tree-based methods, regression methods, Bayesian methods, instance-based methods, support vector machines, and deep learning, are employed to address these problems.
The first group of supervised ML algorithms comprises boosting methods, which include adaptive boosting (AdaBoost) and random under-sampling boosting (RUSBoost). AdaBoost improves model performance by combining many weak learners and giving more weight to previously misclassified samples, and RUSBoost adds random under-sampling to address class imbalance.

Neural network methods include artificial neural networks (ANNs), multilayer perceptrons (MLPs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), and Bayesian neural networks (BNNs). Neural network-based algorithms simulate the biological neural networks of the human brain. In artificial neural network (ANN) methods, the "neurons" work together to solve complex problems by extracting trends or detecting patterns [42]. In addition, convolutional neural networks (CNNs) are primarily used for image classification, recurrent neural networks (RNNs) are used for sequential data, such as natural language and time series, and long short-term memory (LSTM) is used to handle the vanishing gradient problem in recurrent neural networks.
The tree-based methods include decision trees (DTs), extremely randomized trees (extra-trees), and random forest (RF). A decision tree (DT) is an algorithm used for regression and classification [43]. The principal idea is to divide a dataset into progressively smaller subsets, and it is widely used for its interpretability and ease of visualization. Extremely randomized trees (extra-trees) are a decision tree adaptation that randomly selects the split points for each node in the tree. Despite lower accuracy, they prove to be quicker to train than traditional decision trees [9].
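The difference between an exhaustive split search and a randomly drawn split point can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the toy feature values, the class names, the function names, and the choice of Gini impurity as the split criterion are all hypothetical and are not taken from the reviewed studies.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(xs, ys):
    """Classic decision tree step: test every midpoint between sorted values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))

def random_split(xs, rng):
    """Extra-trees style step: draw one threshold uniformly at random."""
    return rng.uniform(min(xs), max(xs))

# Hypothetical toy feature (e.g., a spectral index) with two classes.
xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = ["soil", "soil", "soil", "waste", "waste", "waste"]

t_best = best_split(xs, ys)
t_rand = random_split(xs, random.Random(0))
print("exhaustive threshold:", t_best, "impurity:", split_impurity(xs, ys, t_best))
print("random threshold:", t_rand, "impurity:", split_impurity(xs, ys, t_rand))
```

The random threshold skips the search over candidates, which is where the training speed-up comes from; full extra-trees implementations typically draw such thresholds for several candidate features and keep the best of them, which this sketch omits.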
Regression algorithms aim to capture the relationship between an output target variable and input features, facilitating prediction on new data [40]; examples include Gaussian process regression (GPR), stepwise linear regression (SLR), and polynomial kernel regression (PKR). GPR models the distributions of input and output variables using Gaussian processes. A regression model based on a genetic algorithm (GA) optimizes its parameters over successive generations, generating candidate solutions and iteratively refining them to identify the optimal solution.
Gaussian naive Bayes (GNB), which belongs to the Bayesian methods, assumes that features follow a bell-shaped (Gaussian) distribution, making it easier to calculate probabilities and classify data efficiently.
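As a minimal sketch of how the Gaussian assumption simplifies classification, the following example estimates a mean and variance per class and per feature and then picks the class with the highest log posterior. The sample values, feature meanings, class names, and function names are hypothetical, and the small variance floor is an implementation convenience assumed here.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate the prior and the per-feature mean/variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gnb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)

# Hypothetical two-feature water samples (e.g., pH and rescaled conductivity).
X = [[6.9, 0.2], [7.1, 0.3], [7.0, 0.25], [5.0, 1.1], [5.2, 1.0], [4.9, 1.2]]
y = ["clean", "clean", "clean", "polluted", "polluted", "polluted"]

model = fit_gnb(X, y)
print(predict_gnb(model, [7.05, 0.22]))
print(predict_gnb(model, [5.1, 1.05]))
```

Because each feature is treated as independent given the class, training reduces to computing per-class means and variances, which is why the method is cheap even on large datasets.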
In the category of instance-based methods, one of the simplest and most popular nonparametric classifiers is k-nearest neighbors (kNN) [9]. It is mainly employed to classify or predict data points based on their proximity (neighborhood) to the training points. The nearest centroid (NC) method calculates the centroid of each class and classifies new points based on their distance from the centroids.
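The two instance-based rules described above can be sketched side by side. This is an illustrative example only: the coordinates, the class names, the Euclidean distance, and the default k = 3 are assumptions made for the sketch.

```python
import math
from collections import Counter

def knn_predict(X, y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def nearest_centroid_predict(X, y, query):
    """Assign the class whose mean vector (centroid) is closest to the query."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], query))

# Hypothetical two-feature training points for two land-cover classes.
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = ["dune", "dune", "dune", "beach", "beach", "beach"]

print(knn_predict(X, y, [2, 2]))
print(nearest_centroid_predict(X, y, [6.5, 6.5]))
```

kNN keeps every training point and compares the query against all of them, whereas the nearest centroid rule stores only one vector per class, trading flexibility for speed and memory.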
Support vector machines (SVMs) are effective in high-dimensional spaces: they find the optimal hyperplane that maximizes the margin between classes in the feature space, which makes them robust against overfitting [44]. Least squares support vector machines (LSSVMs) integrate SVMs with least squares principles to minimize error by finding a function that approximates the data.
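For reference, the margin-maximization idea can be written compactly. The first problem below is the standard textbook hard-margin formulation for linearly separable data with labels y_i in {-1, +1}; the second is the usual least squares variant in its regression form. The symbols w, b, e_i, \gamma, and \varphi are notation introduced here for illustration and are not taken from the reviewed studies.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b)\ \ge\ 1,\qquad i = 1,\dots,n

\min_{w,\,b,\,e}\ \tfrac{1}{2}\lVert w\rVert^{2} + \tfrac{\gamma}{2}\sum_{i=1}^{n} e_i^{2}
\quad\text{s.t.}\quad y_i = w^{\top}\varphi(x_i) + b + e_i,\qquad i = 1,\dots,n
```

In the first problem the margin width is 2/\lVert w\rVert, so minimizing \lVert w\rVert maximizes the margin; in the second, the inequality constraints are replaced by equalities with squared error terms e_i, which turns training into the solution of a linear system.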
Deep learning is recognized as belonging to the domain of supervised learning algorithms. Moreover, due to its distinctive architecture and methodology, deep learning also constitutes a distinct category within the broader landscape of machine learning techniques.

Unsupervised, Semi-Supervised, and Reinforcement Learning ML Algorithms
Unsupervised algorithms are used for data analysis without specific labels or targets to predict. These algorithms search for patterns or structures in the data without relying on an external dependent variable and can be further subdivided into several categories, including clustering algorithms, dimensionality reduction, optimization-based algorithms, and algorithms based on set theory. The models use previously learned features to recognize the class of newly entered data [45,46].
Clustering algorithms are a set of techniques employed to group similar objects based on certain similarity or dissimilarity metrics, e.g., cluster analysis (CA), the iterative self-organizing data analysis technique (ISODATA), and cluster confusion normalized mutual information (CC-NMI). ISODATA is a clustering algorithm that divides data into clusters based on their statistical properties, iteratively updating centroids and cluster memberships. CC-NMI is a measure of similarity between two cluster partitions, which considers the confusion between clusters and normalizes the result using mutual information.
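To make the idea of comparing two cluster partitions concrete, the sketch below computes plain normalized mutual information (NMI). It is not an implementation of CC-NMI, whose exact definition is not given here; the normalization by the arithmetic mean of the two entropies, the function names, and the toy label vectors are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a cluster assignment."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Mutual information of two partitions, normalized by the mean entropy."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical cluster labels for four samples.
same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, unrelated)
```

The score depends only on how samples are grouped, not on the label names, so two partitions that differ only by a renaming of clusters are rated as identical.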
Dimensionality reduction (DR) algorithms include principal component analysis (PCA). PCA reduces the dimensionality of the data while retaining as much of the variance in the original data as possible.
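As a small worked illustration of "retaining maximum variance", the sketch below reduces two-dimensional points to one dimension using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data points and the function name are hypothetical, and the closed form is used only because the example has two features; general PCA relies on a numerical eigen-solver or singular value decomposition.

```python
import math

def pca_2d(points):
    """First principal axis, explained-variance ratio, and 1-D scores."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean_diag = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mean_diag + radius, mean_diag - radius
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis = (math.cos(angle), math.sin(angle))
    # Project the centered points onto the first principal axis.
    scores = [(p[0] - mx) * axis[0] + (p[1] - my) * axis[1] for p in points]
    return axis, lam1 / (lam1 + lam2), scores

# Hypothetical measurements of two strongly correlated variables (y is about 2x).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
axis, explained, scores = pca_2d(points)
print("axis:", axis, "explained variance ratio:", explained)
```

For these nearly collinear points, a single component carries almost all of the variance, which is the situation in which replacing two correlated variables by one score loses little information.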
A further type of optimization-based algorithm is the self-optimizing machine learning algorithm, which independently tunes its parameters to optimize a given performance metric. Among algorithms based on set theory, fuzzy set theory (FST) is an extension of classical set theory that assigns each element a membership grade between 0 (no affinity) and 1 (full affinity).
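A membership grade between 0 and 1 can be illustrated with a simple membership function. The triangular shape, the function name, and the slope thresholds below are assumptions chosen for the sketch; other shapes (trapezoidal, Gaussian) are equally common.

```python
def triangular_membership(x, low, peak, high):
    """Membership grade in [0, 1] for a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

# Hypothetical fuzzy set "moderate slope", in degrees.
for slope in (3, 10, 15, 20, 30):
    print(slope, triangular_membership(slope, low=5, peak=15, high=25))
```

Unlike a classical set, where a 10-degree slope would be either inside or outside the class, the fuzzy set assigns it a partial grade of 0.5, which is what allows gradual criteria to be combined in multi-criteria analyses.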
Among transformation methods, discrete orthogonal transformation (DOT) methods transform data using discrete orthogonal transformations to improve model analysis or training.
An example of a neural network is U-Net, a distinctive "U-shaped" convolutional neural network (CNN) architecture designed for image segmentation and reconstruction problems.
In the domain of semi-supervised learning algorithms, the positive-unlabeled (PU) learning algorithm stands out. This approach leverages a combined dataset comprising both labeled and unlabeled data to enhance model performance. It operates under the assumption that the unlabeled data pool may encompass both positive and negative examples. This learning methodology proves particularly beneficial in scenarios characterized by an extensive repository of unlabeled data alongside a limited subset of positively labeled data.
Lastly, reinforcement learning is a paradigm of machine learning in which an agent learns to perform actions in an environment, receiving feedback through rewards or penalties, to maximize a specific goal (e.g., policy gradient methods, Q-learning, SARSA (state-action-reward-state-action), and deep Q-networks (DQNs)).
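The reward-driven learning loop can be sketched with tabular Q-learning on a toy problem. Everything in this example is an assumption made for illustration: the one-dimensional corridor environment, the reward of 1 at the right end, the hyperparameter values, and the function name.

```python
import random

def train_q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor; reward 1 at the right end."""
    rng = random.Random(seed)
    actions = (-1, +1)  # index 0 moves left, index 1 moves right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + actions[a], 0), n_states - 1)
            reward = 1.0 if nxt == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q = train_q_learning()
policy = ["right" if right > left else "left" for left, right in q[:-1]]
print(policy)
```

The agent is never told which direction is correct; the preference for moving right emerges because the reward propagates backward through the value estimates. SARSA differs only in the update target, which uses the value of the action actually taken in the next state instead of the maximum.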

Deep Learning
Deep learning (DL) is a subset of machine learning that employs algorithms modeled after the brain's structure and function. It excels in processing large datasets and uncovering complex relationships through multiple levels of abstraction. Specific algorithms and methods, such as artificial neural networks (ANNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative models, large language models, and multipath convolutional neural networks, are employed to address these problems.
Within convolutional neural networks (CNNs), a multitude of sophisticated techniques are employed to enhance performance and accuracy, such as feature pyramid networks (FPNs), U-Net, you only look once, version 3 (YOLOv3), and single-shot detector (SSD) algorithms. U-Net combines a contracting path and an expansive path to improve its performance and accuracy in image segmentation tasks. You only look once (YOLO) is a real-time object detection system that predicts class probabilities and bounding boxes for objects directly from full images in a single pass. Utilizing a 53-layer convolutional neural network, YOLOv3 balances speed and precision. It features bounding box prediction, multi-scale prediction, and class prediction.
Large language models (LLMs) are advanced artificial intelligence systems designed to comprehend and generate human language. They play a crucial role in numerous applications, including chatbots, virtual assistants, and sophisticated search tools.

Explainable AI and Other Algorithms
Explainable AI encompasses methodologies aimed at enhancing interpretability in decision-making processes within artificial intelligence systems. Techniques such as LIME and SHAP provide explanations of individual model predictions. LIME offers local interpretability, while SHAP leverages game theory principles for comprehensive insights. Regression methods, like multivariate adaptive regression splines (MARS), model relationships between variables, enhancing the understanding of complex data structures. In computer vision, object detection, exemplified by the single-shot detector (SSD) algorithms, entails identifying and localizing objects within visual data, enabling diverse applications.
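The game theory principle behind SHAP is the Shapley value, which can be computed exactly when only a few features are involved. The sketch below is not the SHAP library and does not reproduce its approximations; the scoring function, the feature names, and the all-zero baseline are hypothetical choices made for illustration.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all feature orderings (feasible only for a handful of features)."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]  # reveal feature i
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]

# Hypothetical suitability score with an interaction term.
def predict(x):
    slope, distance, permeability = x
    return 2.0 * slope + 1.0 * distance + 0.5 * slope * permeability

instance = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, instance, baseline)
print(phi, sum(phi), predict(instance) - predict(baseline))
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, and the interaction between slope and permeability is shared equally between the two features involved.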

Environmental Monitoring Applications in Geosciences
Environmental monitoring in geosciences is fundamental to understanding and mitigating the impacts of human activities on the environment. It encompasses methodologies and strategies for identifying, analyzing, and establishing environmental parameters to gauge and quantify environmental impacts. This process relies on various testing and evaluation methodologies to furnish crucial insights into environmental conditions and potential hazard levels. This article focuses on select issues within environmental monitoring for several reasons. Firstly, it underscores the urgency and gravity of environmental concerns, given their profound implications for planetary health and human well-being. Additionally, data availability from sources such as satellites, environmental sensors, industrial registries, and other tools has facilitated the choice of pertinent environmental issues for examination. Lastly, considering the practical and socio-economic implications, the potential for substantial enhancements and the promotion of sustainable solutions contribute to tackling genuine social challenges.
The machine learning (ML) methodologies discussed in this article are examined in environmental monitoring applications focused on key environments, including landfills, quarries, coastal dune protection, sea discharge, and complex industrial settings (Table 2, Figure 3).


Quarry and Landfill Monitoring ML Application
The problem of waste management, including unauthorized dumpsites, is a global concern [47,48]. Despite regulatory efforts, landfills have harmful effects on soil, air, water, and biodiversity [49]. The global population is expanding, resulting in a rise in waste generation [50]. The exponential escalation in waste generation has required an increased dependence on and proliferation of landfills for disposal, whether lawful or illicit [51]. Recent publications have highlighted the utilization of machine learning (ML), deep learning (DL), and heuristic models. Al Awadh and Mallick [52] merged multi-criteria decision making (MCDM), fuzzy set theory, GIS, and eXplainable Artificial Intelligence (XAI) to produce a landfill site potential zone (LSPZ) map classification. The model employs geospatial and environmental datasets to discern candidate locations for landfill establishment. It leverages machine learning methodologies, with a focus on an optimized ensemble bagging model, to categorize various regions as prospective landfill sites [53,54]. The study utilizes SHAP (SHapley Additive exPlanations) and LIME (local interpretable model-agnostic explanations) analyses to elucidate the machine learning models and enhance the comprehension of model predictions. A recent investigation introduced a machine learning model employing the positive-unlabeled (PU) learning algorithm within an ensemble framework. This model was validated using the PU-based random forest technique for monitoring and preventing the illegal disposal of hazardous waste (HW) [55]. Furthermore, cluster analysis [56], a statistical technique, enables the unsupervised grouping of elements into classes of similar items for regional water resource protection. An additional proposed methodology uses a multipath convolutional neural network (mp-CNN) to locate waste piles on roads and roadsides. In the test phase, the image classification model showed excellent performance and is usable in developing countries [57]. Torres and Fraternali proposed a novel method that combines a ResNet50 convolutional neural network (CNN) with a feature pyramid network (FPN) to produce a risk map [58,59]. Illegal landfill detection is formulated as a multi-scale scene classification problem, and an accuracy of 88% is reported on a dataset of about 3000 images. Leveraging the single-shot detector (SSD) algorithm, in conjunction with deep learning methodologies and remote sensing techniques, facilitates the real-time detection of objects within video streams, thereby enhancing the efficacy of dumping detection [60]. This amalgamation of advanced technologies underscores the potential for significant advancements in waste management. Moreover, a machine learning technique based on discrete orthogonal transformations (DOTs) is used to identify waste disposal facilities from high-resolution spatial images [61]. Lastly, YOLOv3 (you only look once, version 3) enables the real-time detection of specific objects in videos, live feeds, or images [62].
Quarries, mines, and excavations for material extraction cause environmental problems, with a high risk of environmental degradation and damage, and therefore require monitoring. Below are some examples of the application of ML techniques. Larrea-Gallegos et al. (2023) presented an ML approach with an unsupervised learning algorithm (X-means) and a random forest (RF) classification model to improve strategic planning [63]. Furthermore, Fernández-Alonso et al. (2023) proposed a convolutional neural network (CNN) for the identification of mining remains [64]. The study conducted by Fissha et al. (2023) used a Bayesian neural network (BNN) and other models, such as gradient boosting, k-nearest neighbors, decision trees, and random forest, to predict blast-induced ground vibration [65]. A further article evaluates additional machine learning methods, such as the nearest centroid, random forest, decision trees, and Gaussian naive Bayes, and presents a decision tree algorithm based on the parametric analysis of tunneling-induced ground settlements to understand tunneling-induced ground subsidence [66]. This methodology can aid in the identification of historical subterranean quarries, even when their spatial coordinates have been obscured within highly urbanized locales. In a further study, a CNN, a type of deep learning model, is employed to identify deformations within a national-scale velocity field. The primary objective of the model is to accurately detect and classify various forms of deformation. These include subsidence resulting from coal mining activities, deformations in slate quarries, and alterations due to tunnel engineering works [67]. Another machine learning study focuses on predicting peak particle velocity (PPV) values with a DT model. PPV is a measure of the ground vibration amplitude due to blasting operations in limestone quarries, and it is predicted from the explosive charge weight per delay and the distance from the blast [68].
In conclusion, the advantages of implementing ML and AI in landfill and quarry management include enhanced accuracy, predictive capabilities, and real-time detection capabilities. Advanced ML approaches, such as PU-based random forest, together with explanation techniques such as SHAP and LIME, offer precise and interpretable classification and monitoring, while algorithms like YOLOv3 and SSD provide real-time object detection. However, the disadvantages include the complexity of implementing and interpreting sophisticated models and the dependency on high-quality, extensive datasets.

Coastal Dunes Preservation ML Application
The preservation of coastal dunes is paramount for safeguarding our shorelines. Coastal dunes are pivotal in combating erosion and conserving marine ecosystems [69][70][71]. ML techniques are employed for dune reinforcement, forecasting, monitoring, and sustainable governance. In the study conducted by Pinton et al. (2023), a regression model based on a genetic algorithm (GA) and a random forest (RF) algorithm were utilized to estimate ground elevation in coastal dunes [72]. Another study considers the coastal dunes along Lake Michigan's eastern shoreline and classifies aerial images with the ISODATA classification method [73]. A further study uses three distinct algorithms, ANN, SVM, and RF, to produce high-resolution maps [74]. Mohammadpoor and Eshghizadeh (2021) present an advanced algorithm designed for the precise extraction of dunes from Landsat satellite imagery in both terrestrial and coastal settings. K-nearest neighbors, decision trees, AdaBoost, RUSBoost, and SVM algorithms are used to accurately identify and delineate dune features [75]. Finally, wave runup and coastal dune erosion have been assessed using a Gaussian process (GP), a nonparametric supervised learning method [76].
Summing up, the advantages of using ML and AI in coastal dune protection include precise mapping and erosion prediction with Gaussian process models.However, the disadvantages involve challenges in adapting to environmental variability and high computational costs.

Water Discharges into the Sea ML Application
ML for the analysis, prediction, and comprehension of water discharges into the sea facilitates efficient marine management and environmental conservation efforts. Understanding the impact of discharges, such as wastewater, pollutants, or runoff, on marine ecosystems is imperative. The models enable the prediction of discharges, aiding in water resource planning, disaster prevention, and environmental protection. Through the study of discharges, the optimization of water usage, the prevention of shortages, and the maintenance of ecosystem balance can be achieved. ML models can accurately predict water quality parameters even with limited data, which is crucial for pollution control, ecosystem health, and human well-being. In their 2023 study, Liao et al. utilized the DeepLabv3+ semantic segmentation architecture for monitoring oil spill risk in coastal areas. Their approach relied on polarimetric synthetic aperture radar (SAR) satellite imagery [77]. Magrì et al. (2023) developed machine learning techniques utilizing two distinct generalized linear models: stepwise linear regression (SLR) and polynomial kernel regression (PKR). These models were employed to infer seawater turbidity from Sentinel-2 imagery [78]. In another investigation, various machine learning algorithms, including support vector machines (SVMs), random forest (RF), an artificial neural network (ANN), and combined algorithms, were employed for the detection of sediment discharge in rivers using Sentinel-2 satellite imagery [79]. A recent study employed machine learning methodologies to reconstruct daily sea discharge. Six distinct machine learning algorithms were utilized in the analysis: RF, GPR, SVR, decision tree (DT), least squares support vector machine (LSSVM), and multivariate adaptive regression spline (MARS). The research aimed to accurately model and predict daily discharge patterns at sea [80]. Granata et al. (2018) developed three ML models (M5P regression tree, random forest, and support vector regression) for spring discharge forecasting. These models were constructed using only historical discharge data and cumulative rainfall information [81].
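Forecasting from historical discharge and cumulative rainfall alone amounts to a particular choice of input features. The sketch below shows one way such a feature table could be assembled before being passed to any regression model. It is an illustration under stated assumptions, not the procedure of [81]: the function name `build_features`, the number of lags, the rainfall window, and the sample values are all hypothetical.

```python
# Minimal sketch: turn two time series into a supervised-learning table.
# ASSUMPTION: hypothetical function name, lag count, window length, and data;
# this is not the feature set used in [81].

def build_features(discharge, rainfall, n_lags=2, rain_window=3):
    """Return (X, y).

    Each row of X holds the previous `n_lags` discharge values (most recent
    first) followed by the rainfall summed over the previous `rain_window`
    steps. y holds the discharge at the step being predicted.
    """
    if len(discharge) != len(rainfall):
        raise ValueError("discharge and rainfall must have the same length")
    start = max(n_lags, rain_window)  # first step with a full history
    X, y = [], []
    for t in range(start, len(discharge)):
        lags = [discharge[t - k] for k in range(1, n_lags + 1)]
        cumulative_rain = sum(rainfall[t - rain_window:t])
        X.append(lags + [cumulative_rain])
        y.append(discharge[t])
    return X, y

# Synthetic series, for illustration only
discharge = [10.0, 11.0, 13.0, 12.0, 11.5, 14.0]
rainfall = [0.0, 5.0, 8.0, 0.0, 2.0, 9.0]

X, y = build_features(discharge, rainfall, n_lags=2, rain_window=3)
```

Only past values enter each row, so the table can be used for forecasting without leaking information from the step being predicted.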
The advantages of using ML and AI in sea discharge include comprehensive monitoring and improved pollution control. Models such as DeepLabv3+ and random forest accurately monitor and predict water quality parameters. Nonetheless, disadvantages include data scarcity and the complexity of modeling in diverse environments.

Contaminated Industrial Water and Soil Matrix ML Application
The utilization of machine learning within contaminated industrial complexes holds substantial promise for enhancing pollution monitoring and management, thereby fostering environmental sustainability. Machine learning algorithms make it possible to forecast future pollution levels and to categorize pollution sources based on gathered data, which enables precise intervention strategies. The remediation of polluted areas occurs in both aquatic and terrestrial environments. Machine learning (ML) techniques play a pivotal role in enhancing remediation efforts in both contexts.
Emerging technologies have showcased their capacity to enhance, simulate, and automate water treatment methodologies, surveillance, and ecological system administration. The objective is to safeguard aquatic ecosystems through the observation and identification of contaminants. With the exponential surge in aquatic environmental data, ML has emerged as a pivotal instrument for data scrutiny, categorization, and prognostication [82][83][84][85]. Eutrophication and the proliferation of chlorophyll-bearing algae in water frequently result from inadequate wastewater management and unsustainable agricultural practices. Huang and Zhang (2024) employed four distinct methodologies to ascertain the significant factors influencing chlorophyll-a (Chl-a) content, with the support vector regression (SVR) model demonstrating superior accuracy and precise predictions [86]. A recent investigation examines urban river water quality monitoring through the utilization of a self-optimizing machine learning algorithm applied to multi-source remote sensing data (satellite images, UAV images, and water samples) [87]. In addition, in the study by Zhi et al. (2021), a recurrent neural network (RNN) known as long short-term memory (LSTM) is employed to forecast levels of dissolved oxygen (DO) within riverine environments [88]. Moreover, another article presents a comparative analysis utilizing big data to assess prediction performance and identify key water parameters in surface water quality. The study compared seven traditional and three ensemble learning models, including a decision tree (DT), random forest (RF), and deep cascade forest (DCF) [89]. Furthermore, another article aims to improve the classification of water images with a neural attention network [90]. Finally, an Indian study utilizes cluster analysis (CA) and principal component analysis (PCA) to evaluate heavy metal contamination in aquatic environments [91].
Several scholarly publications have extensively examined the application of ML techniques for the monitoring of pollutants within soil ecosystems in the context of industrial urbanization contamination. Zhao et al. (2023) present a precise prediction framework for soil heavy metal contamination, utilizing an enhanced combination of three distinct machine learning methodologies: extreme gradient boosting (XGB), random forest (RF), and an artificial neural network (ANN) [92]. A further investigation has shown the considerable efficacy of machine learning techniques, notably RF and cubist techniques, in leveraging environmental datasets to forecast concentrations of heavy metals in soil [93]. Moreover, a study employs RF simulations in conjunction with spatial bivariate analysis to discern the presence of heavy metal pollution in agricultural land. Spatial bivariate analysis is utilized to investigate the interplay between soil metal contamination and predominant human activities [94]. In additional research aimed at delineating soil pollution within an arsenic-contaminated agricultural domain, four distinct machine learning methodologies were employed. These methodologies encompassed the support vector machine (SVM), multi-layer perceptron (MLP), random forest (RF), and extreme random forest (ERF) models. Notably, the ERF model exhibited superior performance among the studied methodologies [95,96]. Lastly, in Zhang et al. (2020), three models, RF, ANN, and SVM, allow the source identification and spatial prediction of heavy metals in soil in a rapid urbanization area [97].
Taken together, the advantages of using ML-AI in complex industrial settings include enhanced monitoring and accurate classification. Models including extreme gradient boosting and random forest provide high accuracy in identifying contamination sources. Nevertheless, the disadvantages include the requirement for technical expertise, high computational power, and extensive datasets.

Conclusions
In contemporary environmental monitoring, numerous ML algorithms are employed, with a preference for adapted classic models. Supervised ML methods have been predominantly favored over unsupervised approaches in recent scholarly works. RF is currently the most widely used method in this field of research. RF is favored in machine learning for its adaptability to classification and regression tasks. Its resilience against overfitting is notable and is attributed to the construction of each tree using random data subsets. RF's simplicity facilitates its application and allows for ensemble integration, enhancing model efficacy in solving intricate problems and improving predictive accuracy [9][10][11]. RF is employed for geospatial strategic planning, probability maps, object identification, and general prediction models. In the geoscience applications mentioned previously, it is predominantly utilized for monitoring water discharges and in complex industrial urbanization contamination [64,74,79,81,86,96,98,99].
After RF, the DT stands as a prevalent ML approach, valued for its interpretability, its adaptability to both classification and regression tasks, and its handling of diverse data types, including numerical and categorical variables [20]. Noteworthy for its capability to elucidate decision pathways and accommodate various data complexities, the DT consistently demonstrates robust predictive performance, often rivaling or surpassing more sophisticated methodologies such as RF. In the numerous geoscientific contexts previously cited, its primary application lies in water quality prediction in the context of industrial urbanization and associated contamination [64,66,75,80,86,89]. The support vector machine (SVM) is highly esteemed in the field of machine learning due to its capacity for optimal data classification, exceptional versatility, and efficiency, all achieved without necessitating extensive parameter tuning or adjustments [74,75]. Conversely, the ANN offers the capability to approximate any computable function, facilitate pattern recognition, and address common troubleshooting challenges, leveraging the advancements in computational power [79,97]. SVM and ANN are largely used to produce concentration maps of spatial patterns for complex industrial urbanization contamination. CNN [57,64,90] and PCA [86,91] methodologies are also prominent among the analytical approaches employed in contemporary research.
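The role of random data subsets can be illustrated with a deliberately simplified sketch: each base learner is trained on a bootstrap sample of the data and on one randomly chosen feature, and the ensemble predicts by majority vote. This is an assumption-laden toy, not a full RF implementation: the base learners are one-split decision stumps rather than full trees, and the function names (`fit_stump`, `fit_forest`, `predict`), the number of learners, and the data are hypothetical.

```python
# Minimal sketch of bagging with random feature selection, the idea behind RF.
# ASSUMPTION: one-split stumps instead of full trees; synthetic data;
# hypothetical function names.
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Find the threshold on one feature with the fewest training errors."""
    values = sorted(set(row[feature] for row in X))
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in candidates:
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        left_label = majority(left) if left else majority(y)
        right_label = majority(right) if right else majority(y)
        errors = (sum(yi != left_label for yi in left)
                  + sum(yi != right_label for yi in right))
        if best is None or errors < best[0]:
            best = (errors, (feature, t, left_label, right_label))
    return best[1]

def fit_forest(X, y, n_learners=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature for this learner
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Synthetic two-class data (e.g., 0 = background, 1 = contaminated), illustration only
X = [[0.1, 0.2], [0.4, 0.9], [0.8, 0.3], [1.0, 0.6],
     [5.0, 5.5], [5.3, 6.0], [5.8, 5.1], [6.0, 5.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y, n_learners=25, seed=0)
label_low = predict(forest, [0.5, 0.5])
label_high = predict(forest, [5.5, 5.5])
```

Because every learner sees a different resampled view of the data, individual errors tend to be averaged out by the vote, which is the intuition behind the robustness described above.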

Outlook and Future Research
Future advancements in the AI-ML domain involve improving existing models such as RF, DT, SVM, ANN, CNN, and PCA to elevate their accuracy and efficiency for adapted tasks. Simultaneously, pioneering algorithms could emerge to tackle specific environmental monitoring challenges, potentially outperforming conventional methods. Integrating diverse methodologies stands as an essential avenue for increasing model efficacy. Ensemble methods, for instance, warrant further exploration, as they leverage the complementary strengths of different techniques. The integration of advanced sensors equipped with AI-ML processing capabilities will enable the collection and real-time analysis of environmental data, providing a broader and more detailed coverage of monitored areas. The development of software platforms dedicated to collecting, storing, and analyzing geoscientific data, along with the creation of user-friendly software interfaces, will facilitate end users' effective use of these technologies.
The integration of ML models with Internet of Things (IoT) sensors is paving the way for future landfill development. This combination facilitates continuous monitoring and real-time decision making, aiding policymakers in devising more effective waste management regulations. In quarry fields, using UAV and satellite imagery with ML models enables automated and comprehensive monitoring. The development of hybrid models, which combine multiple ML techniques, is enhancing the accuracy and reliability of predictive models. Moreover, climate change adaptation and community involvement are revolutionizing coastal dune protection. Models are being developed to predict and adapt to the impacts of climate change on coastal dunes. Simultaneously, user-friendly tools are being created for local communities to monitor and protect their coastal areas. Sea discharge management is being advanced through real-time analysis and integration with environmental policies. In complex industrial settings, smart remediation and cross-disciplinary approaches are being employed. These advancements are contributing significantly to environmental conservation and protection.
Prospective research endeavors might examine novel applications, particularly in domains where conventional methods encounter limitations. Prioritizing the development of models resilient to overfitting while maintaining interpretability could spearhead future innovations.

Figure 1. The word cloud (on the left) displays ML techniques in order of importance (indicated by font size) across the four fields of geology (hydrology, geophysics, geomorphology, and applied geology). The word graph (on the right) represents the frequency of machine learning in geological publications worldwide.

Figure 2. Classification of artificial intelligence (AI) algorithms. This figure illustrates the broad spectrum of AI algorithms, with a particular focus on machine learning (ML) methods. It is important to note that numerous methodologies span across multiple categories (for instance, the deep learning methodology). This overlap signifies the versatility and adaptability of these algorithms in various research and application domains.

Figure 3. Categorization of machine learning applications in environmental monitoring. This image provides a comprehensive overview of the broad categories of environmental monitoring applications that leverage ML techniques. The categories include map and image classification, object detection and identification, prediction models, data classification, risk and performance metrics, and soil and water quality assessments.

Table 2. Machine learning environmental monitoring applications. This table categorizes various environmental monitoring applications (e.g., landfill, quarry, safeguarding the coastal dune, discharge into the sea, and complex industrial soil and water contamination) where ML methodologies have been utilized.