Screening and Optimization of Soil Remediation Strategies Assisted by Machine Learning

: A numerical approach assisted by machine learning was developed for screening and optimizing soil remediation strategies. The approach includes a reactive transport model for simulating the remediation cost and effect of applicable remediation technologies and their combinations for a target site. The simulated results were used to establish a relationship between the cost and effect using a machine learning method. The relationship was then used by an optimization method to provide optimal remediation strategies under various constraints and requirements for the target site. The approach was evaluated for a site contaminated with both arsenic and polycyclic aromatic hydrocarbons at a former shipbuilding factory in Guangzhou City, China. An optimal strategy was obtained and successfully implemented at the site, which included the partial excavation of the contaminated soils and natural attenuation of the residual contaminated soils. The advantage of the approach is that it can fully consider the natural attenuation capacity in designing remediation strategies to reduce remediation costs and can provide cost-effective remediation strategies under variable constraints for policymakers. The approach is general and can be applied for screening and optimizing remediation strategies at other remediation sites.


Introduction
Heavy metals and hazardous organic chemicals are common contaminants in soils [1][2][3][4][5].Various approaches have been proposed to remediate soil contaminants at industrial sites, including monitored natural attenuation (MNA), in-situ stabilization, in-situ thermal desorption, excavation (EX), ex-situ treatments, etc. [6][7][8][9][10][11][12].Low-intensity remediation technologies such as MNA have cost-effective advantages over high-intensity remediation technologies such as the EX and ex-situ treatments because they can use the natural attenuation capacity in remediating contaminants [13,14].A challenge with low-intensity remediation technologies is that they often require long durations for natural processes to attenuate contaminants [15][16][17].This will pose a problem for a site targeted for re-development that often has a remediation duration constraint.High-intensity technologies such as EX can significantly shorten remediation durations to meet the site re-development targets [18][19][20].However, the trade-off is the significant increase in remediation cost, especially when the volume of contaminated soils is large [19,21].It would be ideal to combine the high-and low-intensity remediation technologies, using the high-intensity technologies to remediate soils containing the contaminants with a high risk of exposure, and low-intensity technologies to treat low risk contaminants.The challenge is how to design and optimize such a remediation strategy.
The design and optimization of the mixed low-and high-intensity remediation technologies often require the prediction of the fate and transport of contaminants during and

Model Framework
The framework for screening and optimizing remediation strategies for a contaminated site comprises three key components: RTM, ML, and optimization (Figure 2).The RTM is used to simulate the fate and transport of contaminants under various remedial conditions.The results include remediation effects (Figure 2, Yi), such as the residual soil contaminant volumes, groundwater contaminant concentrations, and contaminant discharges to nearby rivers, and remediation parameters (Figure 2, xi), such as the locations and volumes for the implementation of different remediation technologies, the types or concentrations of remediation reagents, etc., that are related to the cost of a remediation strategy.The results are used to train an ML model with a goal to establish the relationships between remediation parameters and remediation effects.The relationships are then used by an optimization algorithm to identify optimal remediation strategies under vari- To investigate the soil contamination, the soil samples were taken to the laboratory for analysis.The soil samples were dried, sieved, and passed through a 2 mm mesh and were then stored for measurements.For As measurement, the soil was first digested with HCl and HNO 3 , and its concentrations were then measured using atomic fluorescence spectrometry (AFS, AFS-9130, Titan Instruments, China) [30].For PAH concentration measurement, the soil was extracted using the Soxhlet extraction method, and the PAHs in the extracts were measured using gas chromatography-mass spectrometry (GC-MS, QP2010 Shimadzu, Kyoto, Japan) [31].The measurement quality control for As was conducted using standard reference soil (GSS-5), with a determination accuracy of 97.5-100.3%.The PAHs measurement quality control was conducted using the soil spiked with standard solutions that had recovery ratios of 79-137% for benzo[a]pyrene (Bap) and 69-82% for dibenz(a,h)anthracene (Dba).
The soil at the site contained As with a concentration range of 1.6 to 210.2 mg/kg, two PAH compounds (Bap and Dba) with Bap concentrations ranging from 0.001 to 4.31 mg/kg, and a Dba concentration from 0.001 to 0.75 mg/kg.Their concentrations in groundwater were below regulatory standards: As: <10 µg/L and PAHs: <10 ng/L.Soil samples were also taken to analyze microbial communities at the site by DNA extraction, 16S rRNA sequencing, and subsequent high-throughput sequencing analysis.The detailed sampling procedure and results are provided in Supporting Materials (Text S1).

Model Framework
The framework for screening and optimizing remediation strategies for a contaminated site comprises three key components: RTM, ML, and optimization (Figure 2).The RTM is used to simulate the fate and transport of contaminants under various remedial conditions.The results include remediation effects (Figure 2, Y i ), such as the residual soil contaminant volumes, groundwater contaminant concentrations, and contaminant discharges to nearby rivers, and remediation parameters (Figure 2, x i ), such as the locations and volumes for the implementation of different remediation technologies, the types or concentrations of remediation reagents, etc., that are related to the cost of a remediation strategy.The results are used to train an ML model with a goal to establish the relationships between remediation parameters and remediation effects.The relationships are then used by an optimization algorithm to identify optimal remediation strategies under various constraints of site-specific remediation requirements.The framework is general and can be applied to any contaminated site.

Model Framework
The framework for screening and optimizing remediation strategies for a contaminated site comprises three key components: RTM, ML, and optimization (Figure 2).The RTM is used to simulate the fate and transport of contaminants under various remedial conditions.The results include remediation effects (Figure 2, Yi), such as the residual soil contaminant volumes, groundwater contaminant concentrations, and contaminant discharges to nearby rivers, and remediation parameters (Figure 2, xi), such as the locations and volumes for the implementation of different remediation technologies, the types or concentrations of remediation reagents, etc., that are related to the cost of a remediation strategy.The results are used to train an ML model with a goal to establish the relationships between remediation parameters and remediation effects.The relationships are then used by an optimization algorithm to identify optimal remediation strategies under various constraints of site-specific remediation requirements.The framework is general and can be applied to any contaminated site.

Reactive Transport Model
A reactive transport simulation software, PFLOTRAN (v5.0, USA) [32], was used as an RTM in this study.In the software, the groundwater flow is simulated using the Richards' equation as follows: where φ is the porosity [-], s is the saturation degree [m The reactive transport of dissolved contaminants such as organic carbon (DOC), As, Bap, and Dba can be described as follows: where C j is the total concentration of component j [mol/L], Ω j is the total flux of component j [mol/(L s)], v jm is the stoichiometric coefficient of chemical component j in the mth reaction, and R m is the mth reaction rate [mol/(L s)].
Various geochemical and biogeochemical reactions controlling the transformation of contaminants can be incorporated into the RTM.In this study, reactions related to As, Bap, and Dba were considered (Figure S2 and Table S3).Because the degradation pathways for the PAHs and As sorption and desorption are often related to iron biogeochemistry [33,34], iron redox reactions were also considered in this study.At the site, the iron(III) content in the soil, as determined by XRF, ranged from 1.6% to 5.5%.
A Monod-type model was used to describe the bioreduction of ferrihydrite, which was often treated as a surrogate for the sorbents of metal and organic contaminants in subsurface systems [35][36][37][38]: where The aerobic microbial respiration was also considered in this study with the following rate expression [39,40]: where k 2 is the oxidation rate constant [L/(mol•h)], C Fe 2+ is the ferrous concentration [mol/L], and The aerobic microbial respiration was also considered in this study with the following rate expression [39,40]: where k 3 is the respiration rate constant [ The rate of DOC production from the transformation of particulate organic carbon (POC) is described below [39,40]: where C POC is the POC concentration [mol/m 3 ], K d,DOC is the distribution coefficient between DOC and POC [L/g], and α DOC is the mass transfer coefficient [1/h].
For PAHs, both aerobic and anaerobic biodegradation processes were considered in this study [25,41].The aerobic reactions and rates are described as follows: C 22 H 14 + 25.5O 2 where k 5 is the aerobic biodegradation rate constant [1/s], C PAHs is the PAH concentration [mol/L], K PAH,5 is the half-saturation constant with respect to PAHs [mol/L], respectively, and other variables are as described before.
The anaerobic reactions and rates are described as follows: C 22 H 14 + 102Fe(OH) 3 where k 6 is the anaerobic biodegradation rates [1/s], K PAH,6 and K Fe 3+ are the half-saturation constants with respect to PAHs and ferrihydrite [mol/m 3 ], respectively, I O 2 is the inhibition constant during anaerobic biodegradation [mol/L], and other variables are described as before.
The co-metabolism of multi-substrate PAH biodegradation was also considered in this study using the following expression [42]: where K si and K sj are the half rate constants for the degradation of PAH compounds i and j [mol/L], and C i and C j are the concentrations of PAH compound i and j [mol/L], respectively.Both As and PAHs can sorb to subsurface materials [43,44].Their competitive sorption was derived by assuming that their sorption follows the Langmuir sorption isotherm as follows: where K As and K PAHs are the sorption affinity constants for As and PAHs [L/mol], respectively, C As and C PAHs are the As and PAH concentrations in the aqueous phase [mol/L], and S t is the total sorption site density [mol/m 3 ].The values of these parameters can be found in Table S2.
For the RTM implementation, a remediation site has to be discretized.For the target site, the high-precision data from a digital elevation map (DEM) and site characterization were used in the discretization.An unstructured triangular grid was generated by COMSOL Multiphysics software (v6.0,USA) with a denser grid in the contaminated areas.Vertically, the bedrock lies approximately 15 m below the ground surface, while contaminants are mainly located at depths of 0-3 m.Consequently, variable vertical discretization intervals were used in this study with an interval of 0.5 m, from 0 to 3 m depth, 1 m, from 3 to 5 m depth, and 5 m, from 5 to 20 m depth, resulting in a total number of the numerical grids of 86,076.The model domain covered the entire study area with the following boundary conditions: tidal water level in the nearby Zhujiang River for the eastern boundary (Figure S3b); ground surface with local precipitation infiltration at the top boundary (Figure S3a); no-flux condition at the bottom boundary where is an impermeable layer; and no-flux condition at other boundaries because they were placed at the watershed edge [45].The measured chemical compositions in the nearby Zhujiang River water samples were used as the chemical concentration conditions at the eastern boundary.The initial concentration distributions of soil contaminants (such as As, Bap, and Dba) were derived from the site characterization and were interpolated by MATLAB R2019b.The initial As and PAH concentrations in groundwater were estimated by assuming their equilibrium with the concentrations in the soil.
There were 7 contaminated zones to be remediated at the site (Figure 3a).For each remediation zone, the effects of individual remediation technologies were simulated using the RTM.The candidate technologies that are acceptable to local agencies include MNA, enhanced natural attenuation (ENA), in-situ stabilization, and EX.The simulation results (Figures S5-S8) indicated that only the EX treatment could meet the site remediation requirements, i.e., the groundwater concentrations of all the contaminants must be lower than drinking water standards (As <10 µg/L, PAHs <10 ng/L [46][47][48]) and the concentrations of residual soil contaminants must be below the standards of industrial land development (As <60 mg/kg, PAHs <0.55 mg/kg [49,50]).However, the volume for the EX treatment can be optimized by treating heavily contaminated soils using the EX treatment, the slightly contaminated soils with enhanced natural attenuation (ENA), and the residual contaminated soils treated with MNA.Consequently, for the subsequent simulations, the EX treatment was placed in the center of radius R1, which was the place with the highest concentrations of contaminants.The ENA was from radius R1 to R2, and the MNA for the remaining contaminated area, where the concentrations of contaminants were relatively low in each zone (Figure 3b).For each zone, R1 and R2 and the corresponding depth D1 and D2 from the ground surface were the variables to be optimized with a goal to reduce the total remediation cost.In this study, the main cost was associated with the volume for the EX and treatment under the constraints that soil and groundwater quality values meet the regulatory standards after remediation.The values for R1, R2, D1, and D2 were assumed to be independent and could be different for different remediation zones (Zone 1-7).This leads to 4 variables for each zone, yielding 28 variables to be optimized for the site with 7 remediation zones.
the regulatory standards after remediation.The values for R1, R2, D1, and D2 were assumed to be independent and could be different for different remediation zones (Zone 1-7).This leads to 4 variables for each zone, yielding 28 variables to be optimized for the site with 7 remediation zones.zone where the center region with a radius R1 and depth D1 is to be treated with EX, the region from R1 to R2 with a depth of D2 to be treated with ENA, and the remaining contaminated soil outside R2 to be treated with MNA in each zone.

ML Model
In this study, the RTM was first used to simulate remediation effects with different combinations of the remediation technologies for the 7 remediation zones.The simulated results were used to train an ML model to establish the relationship between the inputs (variables R1, R2, D1, and D2 for 7 zones) and outputs (maximum concentrations of the contaminants in groundwater and the residual contaminated soil volumes for MNA).The reliability of the ML-based relationship was evaluated during the training process using variables R 2 and RMSE.The data required for training a reliable ML-based relationship was continuously generated by the RTM until the ML model achieved a reliable accuracy.For the current case, 291 cases were simulated for training the ML model, satisfying the ratio requirement of sample size to feature size (280 cases at least) [51].Four different ML models were evaluated in this study: ridge regression, support vector machine (SVR), random forest, and extreme gradient boosting (XGBoost).The five-fold cross-validation was used in this study to obtain a set of optimal hyperparameters for constructing the ML model, where the training and test datasets were divided at a 3:1 ratio.The model with the highest precision was chosen.

Optimization of Remediation Strategies
With the establishment of the ML-based relationship between the remediation input and output variables, an optimization algorithm developed at the University of Arizona (SCE-UA [52]) was used to search for optimal strategies and associated parameters by minimizing the total remediation cost (e.g., EX treatment volume) under different regulatory constraints and site requirements.The SCE-UA algorithm was chosen due to its excellent performance and efficiency in global search [53][54][55] with the detailed optimization procedure described elsewhere [56].zone where the center region with a radius R1 and depth D1 is to be treated with EX, the region from R1 to R2 with a depth of D2 to be treated with ENA, and the remaining contaminated soil outside R2 to be treated with MNA in each zone.

ML Model
In this study, the RTM was first used to simulate remediation effects with different combinations of the remediation technologies for the 7 remediation zones.The simulated results were used to train an ML model to establish the relationship between the inputs (variables R1, R2, D1, and D2 for 7 zones) and outputs (maximum concentrations of the contaminants in groundwater and the residual contaminated soil volumes for MNA).The reliability of the ML-based relationship was evaluated during the training process using variables R 2 and RMSE.The data required for training a reliable ML-based relationship was continuously generated by the RTM until the ML model achieved a reliable accuracy.For the current case, 291 cases were simulated for training the ML model, satisfying the ratio requirement of sample size to feature size (280 cases at least) [51].Four different ML models were evaluated in this study: ridge regression, support vector machine (SVR), random forest, and extreme gradient boosting (XGBoost).The five-fold cross-validation was used in this study to obtain a set of optimal hyperparameters for constructing the ML model, where the training and test datasets were divided at a 3:1 ratio.The model with the highest precision was chosen.

Optimization of Remediation Strategies
With the establishment of the ML-based relationship between the remediation input and output variables, an optimization algorithm developed at the University of Arizona (SCE-UA [52]) was used to search for optimal strategies and associated parameters by minimizing the total remediation cost (e.g., EX treatment volume) under different regulatory constraints and site requirements.The SCE-UA algorithm was chosen due to its excellent performance and efficiency in global search [53][54][55] with the detailed optimization procedure described elsewhere [56].

Simulated Groundwater Flow and Reactive Transport of Contaminants under Natural Conditions
Groundwater flow fluctuated daily and seasonally, as revealed by the fluctuation of groundwater levels at the monitoring wells at the site (Figures 4 and 5).The fluctuation of the groundwater levels was mainly attributed to the fluctuation of nearby river water levels as a result of tidal and seasonal precipitation events (Figure S3).The groundwater levels simulated by the RTM matched well with the observed monitoring wells with a Nash coefficient of 0.83 (Figure 4a), indicating a good agreement between the simulated and observed data [57].The groundwater flow directions at the site changed frequently near the riverside (e.g., GW4).This change diminished with an increasing distance from the river (e.g., GW1).The frequent reversals of groundwater flow directions would increase the exchange of river water with groundwater, promoting the biogeochemical transformation of soil contaminants [58][59][60].
Groundwater flow fluctuated daily and seasonally, as revealed by the fluctuation of groundwater levels at the monitoring wells at the site (Figures 4 and 5).The fluctuation of the groundwater levels was mainly attributed to the fluctuation of nearby river water levels as a result of tidal and seasonal precipitation events (Figure S3).The groundwater levels simulated by the RTM matched well with the observed monitoring wells with a Nash coefficient of 0.83 (Figure 4a), indicating a good agreement between the simulated and observed data [57].The groundwater flow directions at the site changed frequently near the riverside (e.g., GW4).This change diminished with an increasing distance from the river (e.g., GW1).The frequent reversals of groundwater flow directions would increase the exchange of river water with groundwater, promoting the biogeochemical transformation of soil contaminants [58][59][60].The effect of MNA was simulated to predict the fate and reactive transport of the contaminants under natural conditions.The results showed that the As concentration in groundwater gradually increased with time near the contaminant sources, reaching 13.2 µg/L after 10 years (Figure 6a and Figure S5) as a result of the slow reduction of sorbent ferrihydrite.Meanwhile, the As concentration in the soil slowly decreased as it was released to groundwater.The simulated result also showed that PAH oxidation enhanced the As release under anaerobic conditions, a phenomenon consistent with the literature reports [61,62].The biogeochemical processes related to iron reduction and PAH biodegradation considered in this study were feasible at the site where the main microbial compositions were Proteobacteria (42-65%) and Firmicutes (23-34%) at the phylum level and Bacillus, Hydrogenophaga, and Acinetobacter at the genus level (Table 1 and Figure S2).These microorganisms can reduce iron oxides and degrade PAHs.

Bacillus
13-30% Iron bioreduction, PAHs aerobic and anaerobic biodegradation [63][64][65] Hydrogenophaga 5-32% PAHs aerobic and anaerobic biodegradation [63,66] Acinetobacter 2-30% PAHs aerobic biodegradation [67] The As flux from the site to the nearby Zhujiang River was small (Figure 5b) and did not significantly increase the As concentration in the river as it was diluted by the high water flux in the river (Figure 5a).The result indicated that the MNA could remediate the contaminant in the soils at the site without affecting the nearby river water quality.However, it could not meet the time constraint for the site that was targeted for subsequent redevelopment after remediation within a short time duration.It may also cause a problem of groundwater contamination during the long duration of the MNA.Consequently, the MNA alone was not enough to remediate the site.The simulated concentrations of both Bap and Dba in groundwater were lower than the regulatory level (10 ng/L, Figure 6a), while their concentrations in the soil were significantly over the regulatory level (Figure 6b) during the simulation period.In the simulation, the microbial degradation of PAHs was considered.However, their solubility was

Simulated Effects of Active Remediation Technologies
The effects of two active remediation technologies, ENA and EX, were individually simulated to evaluate the changes in contaminant concentrations in groundwater and soil.ENA was implemented with the addition of organic material (lactate in this study) as the electron donor and carbon source for microorganism growth [70,71].The implementation of the ENA resulted in a faster As release rate from the soil than MNA due to the enhanced ferrihydrite reduction by microorganisms with the addition of organic matter (Figure 6a,b).The faster As release led to the earlier arrival of As concentrations in groundwater to above regulatory level, indicating that the ENA would also potentially cause groundwater contamination.The microbial degradation of Bap and Dba was also enhanced by  The As flux from the site to the nearby Zhujiang River was small (Figure 5b) and did not significantly increase the As concentration in the river as it was diluted by the high water flux in the river (Figure 5a).The result indicated that the MNA could remediate the contaminant in the soils at the site without affecting the nearby river water quality.However, it could not meet the time constraint for the site that was targeted for subsequent redevelopment after remediation within a short time duration.It may also cause a problem of groundwater contamination during the long duration of the MNA.Consequently, the MNA alone was not enough to remediate the site.
The simulated concentrations of both Bap and Dba in groundwater were lower than the regulatory level (10 ng/L, Figure 6a), while their concentrations in the soil were significantly over the regulatory level (Figure 6b) during the simulation period.In the simulation, the microbial degradation of PAHs was considered.However, their solubility was low, leading to the low biodegradability of PAHs in the soil [68,69].The results of As and PAHs both indicated that active remediation of the contaminated soil was needed for the targeted site.

Simulated Effects of Active Remediation Technologies
The effects of two active remediation technologies, ENA and EX, were individually simulated to evaluate the changes in contaminant concentrations in groundwater and soil.ENA was implemented with the addition of organic material (lactate in this study) as the electron donor and carbon source for microorganism growth [70,71].The implementation of the ENA resulted in a faster As release rate from the soil than MNA due to the enhanced ferrihydrite reduction by microorganisms with the addition of organic matter (Figure 6a,b).The faster As release led to the earlier arrival of As concentrations in groundwater to above regulatory level, indicating that the ENA would also potentially cause groundwater contamination.The microbial degradation of Bap and Dba was also enhanced by the addition of organic matter.However, the rate decreased significantly when the added organic matter was diminished with time, indicating that it needs to be replenished continuously to maintain the degradation [72].
The EX is a simple and time-effective remediation strategy for contaminant removal [18,73,74].The concentrations of As, Bap, and Dba after EX can both meet groundwater and soil remediation goals (Figures 6 and S8).However, the cost for the EX was much higher than the MNA and ENA as it requires the excavation of a large amount of the contaminated soil and subsequent treatment.Considering the efficiency and cost of the MNA, ENA, and EX, it would be ideal to combine them with the goal of minimizing the remediation cost under the constraints of regulatory and local requirements.The remediation requirements at the site were various, including the discharge of contaminants from the site to the nearby river that would not affect river water quality, no groundwater contamination was allowed during and after the remediation, contaminant concentrations in the soil must meet regulatory criteria, and the remediation duration must meet site-redevelopment schedule.Given these requirements, a remediation strategy, as described in Figure 3, was proposed, which separated each remediation zone into three spatial regions defined by parameters R1, R2, D1, and D2.In order to determine the values of these parameters that would give the minimal remediation cost under the constraints of the remediation requirements, RTM was first used to simulate the reactive transport of contaminants with different combinations of these parameters.The data generated from the RTM simulations were then used to establish the relationships linking these parameter values with corre-sponding remediation costs, which were then used to find an optimal remediation strategy, as described in the next section.

Remediation Strategies and Optimization
The RTM-simulated results showed that different remediation strategies, as generated by adjusting remediation parameters (R1, R2, D1, and D2) for each remediation zone, would lead to different residual volumes of the contaminated soil for subsequent MNA (Figure 7a) and different groundwater concentrations (Figure 7b) as a function of the total volume of the contaminated soil treated with EX and ENA.A larger excavated volume generally, as expected, led to lower groundwater concentrations and residual volumes of the contaminated soil.The result also showed that for a specific EX volume of the contaminated soil, groundwater concentrations and residual volumes of the contaminated soil varied.This is because the different placements of the EX zones within the contaminated areas would lead different remediation results.The placement of the EX at a highly contaminated location within each zone led to lower groundwater concentrations and residual volumes of the contaminated soil despite the same amount of the excavated contaminated soil.These results indicated that extensive RTM simulations are required to find an optimal combination of the remediation parameters (R1, R2, D1, and D2), which would yield the lowest cost of remediation.

Prospect and Limitation
The framework used in this study was general and can be adapted to other contaminated sites by incorporating site-specific data such as hydrology, geochemistry, and biogeochemistry, as well as the site requirements and constraints for remediation.Being module-based, the framework allows any advancements in each module to be readily updated.For example, the ML part is rapidly advancing, becoming more intelligent in handling high-dimensional and large-scale datasets [76][77][78].Such an advance can be readily incorporated into the ML module in the framework.Similarly, the optimization method is also fast evolving with progress, such as swarm intelligence algorithms, hybrid algo- In this study, instead of performing extensive RTM simulations, the results from a select set of simulations, as shown in Figure 7, were used to train an ML model to establish a relationship linking the remediation parameters (R1, R2, D1, and D2) with the simulated results of groundwater concentrations and residual volumes of the contaminants at the site.When properly trained, such an ML-based relationship could replace the RTM to generate simulation results from other combinations of remediation parameters (R1, R2, D1, and D2).In this study, various ML methods were compared to evaluate their performances in fitting the RTM-simulated results, as shown in Figures S9 and S10, and it was found that the XGBoost method had the best performance.
The ML-based relationship from the XGBoost method was then used by an optimization algorithm (SCE-UA) to search for an optimal combination of remediation parameters (R1, R2, D1, and D2) that would give the lowest cost of the remediation and meet the site remediation requirements (Figure 7c,d).During the optimization process, the remediation parameters (R1, R2, D1, and D2) were optimized for each contaminant (As, Bap, and Dba).The optimization results indicated that to meet the groundwater quality standard for As (<10 µg/L) at the site, 5768 m 3 of the contaminated soil with the high concentration has to be excavated, while the other soil can be left for the MNA.Since the simulated groundwater concentrations of Bap and Dba are always less than the groundwater quality standards, the EX treatment of 5768 m 3 of the contaminated soil with the MNA for the residual contaminated soil can meet all groundwater standards for both As and PAH contaminants.The residual contaminated soil would not pose a health risk if kept at the site despite the fact that the concentrations of the contaminants in the residual contaminated soil still exceeded the soil quality standards.If the site needs to meet both the groundwater standards and the soil standards, then the excavated volume would be 40,755 m 3 for the As-contaminated soil, 21,922 m 3 for the Bap-contaminated soil, and 5149 m 3 for the Dba-contaminated soil.Because of the partial overlap of the soils contaminated with these contaminants (Figure 3), the total excavation volume would be less than their addition.The optimization results also showed that adding ENA with a volume of 6910 m 3 would reduce the excavation volume (Table S3).The other residual contaminated soils with their contaminant concentrations above the background levels were left for MNA.The result from the optimization process indicated that the combination of EX and MNA was the most optimum method to treat the contaminated soil.EX is a time-efficient but costly technology for soil remediation, while MNA is a low-cost technology but requires a long duration of remediation action.For targeting a site for re-development, a long duration of remediation is not acceptable.The combination of the EX and MNA can have advantages and minimize the shortcomings of each technology.Here, the EX was optimized to treat the contaminated soils with high concentrations of contaminants, while the areas containing low concentrations of contaminants were left for MNA.Because the areas with low concentrations of contaminants have no risk of health risk exposure, they can be left at the site for MNA without a need for active remediation.Consequently, the combined EX and MNA can reduce the amount of the contaminated soil for EX treatment, and meanwhile, it can significantly reduce the time duration of active remediation.
Under the actual field conditions, the selection of remediation strategies should be considered under different constraints suitable to the specific sites, such as economic, environmental, and social impacts, besides the selection of remediation technologies to meet both regulatory groundwater and soil standards [75].In this context, our framework can readily consider the different constraints during the optimization step that can be tailored to a specific site.At this site, a remediation strategy that combines EX and MNA was eventually selected to meet both groundwater and soil standards because of the current regulatory requirements for the site redevelopment.From the view of remediation cost, treating 5768 m 3 of contaminated soil with EX, followed by MNA for the residual contaminated soil, was cost-effective and could reduce the cost by 91.5%, assuming that the cost for MNA is negligible compared to EX.As a validation of the approach, the simulated results of aqueous As concentrations were compared with the measured concentrations at the monitoring wells (Figure S11).The simulated results generally matched well with the measured results, indicating that the approach was effective and could be used for screening and optimizing soil remediation strategies.
A similar framework to the one reported previously [17] was used in this study, which consists of RTM, ML, and optimization algorithms.However, the previous study focused more on the development of the framework itself with a demonstration site having a relatively simple contamination history and biogeochemistry.The current study focused on solving a real and complex problem using the framework on a site contaminated with both organic and inorganic contaminants, featuring multiple contamination zones as shown in Table 2.This significantly increases the complexity of the biogeochemistry and optimization, providing a realistic demonstration of the framework's application and an optimal remediation strategy for the site.Additionally, the simulated results presented here were compared with the measured data monitored in this study, validating the effectiveness of the framework.

Prospect and Limitation
The framework used in this study was general and can be adapted to other contaminated sites by incorporating site-specific data such as hydrology, geochemistry, and biogeochemistry, as well as the site requirements and constraints for remediation.Being module-based, the framework allows any advancements in each module to be readily updated.For example, the ML part is rapidly advancing, becoming more intelligent in handling high-dimensional and large-scale datasets [76][77][78].Such an advance can be readily incorporated into the ML module in the framework.Similarly, the optimization method is also fast evolving with progress, such as swarm intelligence algorithms, hybrid algorithms, and nature-inspired algorithms that are computationally efficient and fast converging with a strong global search capability [79][80][81].Such advances can also be incorporated into the framework by updating the optimization module.One major limitation of the application of the framework is that extensive site characterization of hydrology and biogeochemistry properties is required to obtain a reliable set of input data for the RTM, which is the foundation of the framework.The high-precision prediction of remediation effects from different potential remediation approaches relies on the accuracy of simulated RTM results, which will ultimately determine the selection of an optimal remediation strategy during the optimization step.

Conclusions
This study developed a numerical approach that can be used for screening and optimizing remediation strategies to clean up contaminants in soil and groundwater systems.The approach consisted of three components: (1) RTM simulations of the fate and transport of contaminants under natural and remedial conditions for preliminary screening of potential remediation technologies; (2) using the RTM simulation data to train an ML model for establishing the relationships linking remediation parameters, such as sizes and locations, for the implementation of specific remediation technologies with remediation effects, such as residual concentrations in groundwater and soil, their subsequent changes after remediation, and the costs associated with the implementation of remediation technologies, such as the excavation volume of the contaminated soil; and (3) using the relationship and global optimization algorithm to find optimal remediation strategies.Using a field site that was contaminated with As and PAHs as an example, this study demonstrated the application of the approach in finding optimal remediation strategies.The optimal approach selected from this study under the constraint of local regulations and site requirements was successfully implemented.
While a site co-contaminated with As and PAHs was demonstrated, the developed approach is general for screening and optimizing remediation strategies at any other sites.One important adaptation in applying the approach to other sites is the biogeochemical part in the RTM, including biogeochemical reactions and parameters, which may be specific to the target site.The ML part of the approach is to reduce computational cost because extensive RTM simulations are required to generate enough data for establishing the relationships between remediation parameters and remediation effects.Such relationships can be established with a limited number of RTM simulations when ML is used.The ML part can be ignored if the remediation site is relatively small and biogeochemical reactions are relatively simple to describe the fate and transport of contaminants.For a large remediation site or a site contaminated with multiple contaminants with complex geochemical behaviors, the ML part is recommended to be included.

Figure 1 .
Figure 1.Location of (a) the study site, site boundaries (red line in (b)), and model boundaries (blue line in (c)).

Figure 1 .
Figure 1.Location of (a) the study site, site boundaries (red line in (b)), and model boundaries (blue line in (c)).

Processes 2024 ,
12, x FOR PEER REVIEW 3 of 19rRNA sequencing, and subsequent high-throughput sequencing analysis.The detailed sampling procedure and results are provided in Supporting Materials (Text S1).

Figure 1 .
Figure 1.Location of (a) the study site, site boundaries (red line in (b)), and model boundaries (blue line in (c)).

Figure 2 .
Figure 2. A model framework for screening and optimizing remediation strategies.

Figure 2 .
Figure 2. A model framework for screening and optimizing remediation strategies.

Figure 3 .
Figure3.Illustration of (a) 7 remediation zones and (b) remediation variables for each remediation zone where the center region with a radius R1 and depth D1 is to be treated with EX, the region from R1 to R2 with a depth of D2 to be treated with ENA, and the remaining contaminated soil outside R2 to be treated with MNA in each zone.

Figure 3 .
Figure3.Illustration of (a) 7 remediation zones and (b) remediation variables for each remediation zone where the center region with a radius R1 and depth D1 is to be treated with EX, the region from R1 to R2 with a depth of D2 to be treated with ENA, and the remaining contaminated soil outside R2 to be treated with MNA in each zone.

Figure 4 .
Figure 4. Observed and simulated groundwater levels at (a-d) individual monitoring wells, (e) all observed and simulated data, and the simulated groundwater velocity at monitoring wells (f) GW1 and GW4.

Figure 5 .
Figure 5.The flux of water (plot (a)) and As (plot (b)) from the site to the river.

Figure 5 .
Figure 5.The flux of water (plot (a)) and As (plot (b)) from the site to the river.

Figure 7 .
Figure 7. Relationship between the residual volume of the contaminated soil vs. the total remediation volume of the contaminated soil (excavated volume) (a) and maximum concentrations of the contaminants in groundwater as a function of total remediation volume of the contaminated soil (b).The optimized residual volume of the contaminated soil as a function of total remediation volume from the ML-SCE-UA results (c) and the maximum As concentration in groundwater as a function of remediation volume (d).

Figure 7 .
Figure 7. Relationship between the residual volume of the contaminated soil vs. the total remediation volume of the contaminated soil (excavated volume) (a) and maximum concentrations of the contaminants in groundwater as a function of total remediation volume of the contaminated soil (b).The optimized residual volume of the contaminated soil as a function of total remediation volume from the ML-SCE-UA results (c) and the maximum As concentration in groundwater as a function of remediation volume (d).
1/s], X is the biomass concentration [mol/L], C O 2 is the DO concentration [mol/L], C DOC is the DOC concentration [mol/L], and K DOC and K O 2 are, respectively, the half-saturation constants with respect to DOC and DO [mol/L].

Table 1 .
Microorganisms and their functions at the target site.

Table 2 .
The comparison between the current and previous study.

Table 2 .
The comparison between the current and previous study.