# Accelerating Contaminant Transport Simulation in MT3DMS Using JASMIN-Based Parallel Computing


## Abstract


## 1. Introduction

## 2. Methodology and Implementations

#### 2.1. Governing Equation

$^{3}$), $t$ denotes time (T), $D$ denotes the hydrodynamic dispersion coefficient tensor (L$^{2}$/T), $x_{i,j}$ denotes the distance along the respective Cartesian coordinate axis (L), $v_{i}$ denotes the Darcy velocity (L/T), $q_{s}$ denotes the volumetric flow rate per unit volume at sources (positive) and sinks (negative) (T$^{-1}$), and $C_{s}$ denotes the concentration of the source or sink flux for the species (M/L$^{3}$).

#### 2.2. JASMIN Data Structures

#### 2.3. Parallel Computing Strategies

#### 2.4. Coupling Flow and Solute Simulation

## 3. Results and Discussion

#### 3.1. Correctness Verification

$^{2}$ with a thickness of 80 m. The problem size can be represented as 20 × 20 × 6 in the x, y, and z directions.

$^{3}$/d per well. Dynamic recharge is applied to the first layer at the rate shown in Figure 8. For the contaminant transport model, a rectangular contaminated area of 200 × 100 m$^{2}$ was located in the first layer, with its upper left corner at (400, 900). The contaminated area was represented as an initial concentration of 5000 mg/L, with no other sources. Three observation wells were placed at obs 1 (325, 825) in layer 1, obs 2 (525, 475) in layer 4, and obs 3 (625, 175) in layer 5. The model was run for 3600 days, divided into 120 stress periods with one time step per stress period. The contaminant transport program automatically calculates the transport steps from the flow velocity and the Courant number. The other parameters of the flow model and the contaminant transport model are shown in Table 1.
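The automatic choice of transport steps can be illustrated with a minimal sketch. It assumes the common advective Courant constraint, in which the step is limited so that solute crosses at most a given fraction of one cell in any direction; it is not taken from the MT3DMS or JMT3D source code. The function names, cell sizes, and velocities are hypothetical. Only the 30-day flow time step (3600 days over 120 stress periods) comes from the text.

```python
import math


def courant_limited_step(dx, dy, dz, vx, vy, vz, courant=1.0):
    """Largest transport step (days) for one cell under a Courant limit.

    Cell sizes are in m and velocities in m/d. Assumed rule (not from the
    paper): dt <= courant * min(dx/|vx|, dy/|vy|, dz/|vz|).
    """
    limits = [
        size / abs(velocity)
        for size, velocity in ((dx, vx), (dy, vy), (dz, vz))
        if velocity != 0.0
    ]
    if not limits:
        return math.inf  # a stagnant cell imposes no limit
    return courant * min(limits)


def transport_steps(flow_step_length, dt_limit):
    """Number of equal transport steps needed to cover one flow time step."""
    if math.isinf(dt_limit):
        return 1
    return max(1, math.ceil(flow_step_length / dt_limit))


# One flow time step per stress period: 3600 days / 120 periods (from the text).
flow_step = 3600 / 120

# Hypothetical cell dimensions and velocities, for illustration only.
dt = courant_limited_step(dx=50.0, dy=50.0, dz=10.0, vx=5.0, vy=1.0, vz=0.05)
n_steps = transport_steps(flow_step, dt)
print(dt, n_steps)
```

In a full model the limit would be the minimum over all active cells, so the fastest cell controls the number of transport steps in each flow time step.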

$^{-6}$. The test was carried out on an SR4803S Linux workstation with four AMD Opteron 6174 CPUs (2.2 GHz), each with 12 cores, and 64 GB of DDR3 memory. The JASMIN framework version 2.1 (CAEP Software Center for High Performance Numerical Simulation, Beijing, China) and the visualization software JAVIS 1.2.3 (CAEP Software Center for High Performance Numerical Simulation, Beijing, China) were used. All codes were compiled with GCC 4.5.1 using the -O2 flag. The solutions of JMT3D on one processor were compared with the numerical solutions of MODFLOW-MT3DMS. The root-mean-square error (RMSE) and the coefficient of determination were used to evaluate correctness: the smaller the RMSE and the closer the coefficient of determination is to 1, the better the agreement.

$R^{2}$ denotes the coefficient of determination, $C_{i,\mathrm{MT3DMS}}$ denotes the concentration in cell $i$ from MT3DMS, $C_{i,\mathrm{JMT3D}}$ denotes the concentration in cell $i$ from JMT3D, and ${\overline{C}}_{i\ast}$ denotes the average concentration of MT3DMS or JMT3D.
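The two metrics can be sketched as follows. The formulas themselves did not survive in this copy of the text, so the code makes assumptions: RMSE is taken as the usual root of the mean squared cell-wise difference, and, because the definitions mention an average concentration for both MT3DMS and JMT3D, the coefficient of determination is assumed to be the squared Pearson correlation. The function names and sample values are hypothetical.

```python
import math


def rmse(reference, candidate):
    """Root-mean-square error between two equally long concentration lists."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return math.sqrt(squared / len(reference))


def r_squared(reference, candidate):
    """Squared Pearson correlation (assumed form of the paper's R^2)."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    if var_r == 0.0 or var_c == 0.0:
        raise ValueError("R^2 is undefined for a constant series")
    return cov ** 2 / (var_r * var_c)


# Hypothetical concentrations (mg/L) in a few cells, for illustration only.
mt3dms = [5000.0, 2500.0, 1200.0, 300.0]
jmt3d = [5000.1, 2499.8, 1200.2, 299.9]
print(rmse(mt3dms, jmt3d), r_squared(mt3dms, jmt3d))
```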

$R^{2}$ values of the observation wells are above 0.99. The concentration at 3600 days gives RMSE = 0.014 mg/L and $R^{2}$ = 1, with an average relative error of $1.72 \times 10^{-3}$. These findings show that JMT3D maintains a small error relative to the original MODFLOW-MT3DMS results. The error arises because JMT3D uses the flow field computed by JOGFLOW. Although the results of JOGFLOW are consistent with those of MODFLOW [25], the flow fields cannot be guaranteed to be exactly identical, and any difference in the flow propagates to the contaminant transport computation. Relative to the high initial concentration and the final concentration, the relative errors ranged from $10^{-6}$ to $10^{-4}$, which is accurate enough for practical purposes. In our opinion, the results agree well with those of MT3DMS. In addition, tests were conducted to compare the serial and parallel runs. The relative errors between them are less than $10^{-6}$ and arise only from rounding errors that vary with the number of cores.

#### 3.2. The Parallel Performance with Different Iteration Methods

$^{-4}$ m/d. The other parameters were unchanged. The flow model was solved with the commonly used conjugate gradient (CG) iteration method. The contaminant transport model was solved with the algebraic multigrid (AMG) method, the biconjugate gradient stabilized (BiCGSTAB) method, the generalized minimal residual (GMRES) method, BiCGSTAB with an AMG preconditioner (AMG-BiCGSTAB), and GMRES with an AMG preconditioner (AMG-GMRES). Each method was tested on 1, 2, 4, 8, 16, 24, 32, and 46 processors. The convergence criteria for flow and contaminant transport were both set to $1 \times 10^{-6}$, and the maximum number of iterations was set to 100. To rule out chance variation, each method was run at least three times.

$S_{p}$ denotes the speedup, $T_{s}$ denotes the wall-clock time of the serial program, $T_{p}$ denotes the wall-clock time of the parallel program, $E_{p}$ denotes the efficiency of the parallel program, and $P$ denotes the number of processors.
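As a small worked example of these quantities, the sketch below assumes the conventional definitions, speedup as serial time divided by parallel time and efficiency as speedup divided by the number of processors; the equations themselves are not present in this copy of the text. The wall-clock times are hypothetical and chosen only to reproduce the speedup of 31.7 on 46 processors reported in the summary.

```python
def speedup(t_serial, t_parallel):
    """Speedup: serial wall-clock time over parallel wall-clock time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("wall-clock times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


# Hypothetical times (s) that give the reported speedup of 31.7 on 46 processors.
s_p = speedup(3170.0, 100.0)
e_p = efficiency(3170.0, 100.0, 46)
print(f"speedup = {s_p:.1f}, efficiency = {e_p:.1%}")
```

Under these definitions, a speedup of 31.7 on 46 processors corresponds to an efficiency of about 69%.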

#### 3.3. The Parallel Performance with High Heterogeneity

#### 3.4. The Parallel Performance with Transient Flow

#### 3.5. Scaling Tests

$T_{A}$ and $T_{B}$ denote the wall-clock time of the AMG and BiCGSTAB methods, respectively. $E_{A}$ and $E_{B}$ denote the efficiency of the AMG and BiCGSTAB methods, respectively. $I_{A}$ and $I_{B}$ denote the number of iterations of the AMG and BiCGSTAB methods, respectively.
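The efficiencies in the scaling table are consistent with a relative efficiency measured against the 28-processor run, that is, the baseline time multiplied by 28 and divided by the product of the time and processor count of the larger run. The sketch below recomputes them from the wall-clock times reported in the table. The formula is inferred from those numbers rather than quoted from the paper, and the function name is hypothetical.

```python
def relative_efficiency(t_base, p_base, t_run, p_run):
    """Efficiency of a run relative to a baseline run, as a fraction."""
    return (t_base * p_base) / (t_run * p_run)


# Wall-clock times (s) from the scaling table: processors -> (AMG, BiCGSTAB).
times = {
    28: (5650, 2133),
    56: (2956, 1708),
    112: (1407, 971),
    224: (644, 516),
}

base_p = 28
base_amg, base_bicgstab = times[base_p]

table = {}
for p, (t_amg, t_bicgstab) in times.items():
    e_amg = 100 * relative_efficiency(base_amg, base_p, t_amg, p)
    e_bicgstab = 100 * relative_efficiency(base_bicgstab, base_p, t_bicgstab, p)
    table[p] = (round(e_amg, 1), round(e_bicgstab, 1))
    print(p, table[p])
```

Values above 100% for AMG on 112 and 224 processors indicate superlinear scaling relative to the 28-processor baseline, while BiCGSTAB remains faster in absolute wall-clock time at every processor count in the table.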

## 4. Summary and Outlook

- MT3DMS was parallelized with a high-performance parallel framework, and models with hundreds of thousands to tens of millions of cells were simulated on tens to hundreds of processors. The resulting parallel program, JMT3D, achieved a speedup of 31.7 and reduced memory consumption by 96% on 46 processors.
- A domain decomposition method and a stencil-based method were implemented, so the equations are solved simultaneously in the same way as in the serial program. The number of iterations does not increase with the number of processors, which keeps the serial and parallel results as consistent as possible.
- The test results showed that the BiCGSTAB method required the least time and achieved a high speedup in most cases. High heterogeneity and transient flow have little effect on the speedup. Coupling the parallel flow and contaminant transport simulations further improves parallel performance, achieving a speedup of 33.45 on 46 processors.
- Contaminant transport simulation is the main time-consuming component of high-resolution flow and contaminant transport simulation. The higher the resolution, the greater the share of time spent on contaminant transport.
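The domain decomposition idea mentioned in the list above can be sketched in one dimension. This is an illustrative partition only, not JASMIN's API or the paper's implementation: the function names and the one-cell ghost layer are assumptions, and the 20-cell extent is taken from the 20 × 20 × 6 verification grid.

```python
def split_range(n_cells, n_parts):
    """Split n_cells into n_parts contiguous, nearly equal [start, stop) blocks."""
    if n_parts < 1 or n_parts > n_cells:
        raise ValueError("need 1 <= n_parts <= n_cells")
    base, extra = divmod(n_cells, n_parts)
    bounds = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` blocks take one additional cell each.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def with_ghost_cells(bounds, n_cells, width=1):
    """Extend each block by `width` ghost cells, clipped at the grid edges."""
    return [(max(0, lo - width), min(n_cells, hi + width)) for lo, hi in bounds]


owned = split_range(20, 4)
ghosted = with_ghost_cells(owned, 20)
print(owned)
print(ghosted)
```

Each process updates only the cells it owns, while the ghost cells hold copies of neighbouring values that a finite-difference stencil needs at block boundaries. Those copies have to be refreshed between iterations, which is the communication cost that grows with the number of subdomains.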

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Wheeler, M.F.; Peszynska, M. Computational engineering and science methodologies for modeling and simulation of subsurface applications. Adv. Water Resour. **2002**, 25, 1147–1173.
2. Zheng, C.M.; Bennett, G.D. Applied Contaminant Transport Modeling, 2nd ed.; Wiley-Interscience: New York, NY, USA, 2002; pp. 273–282.
3. Hammond, G.E.; Lichtner, P.C. Field-scale model for the natural attenuation of uranium at the Hanford 300 Area using high-performance computing. Water Resour. Res. **2010**, 46, 389–390.
4. Gwo, J.P.; D’Azevedo, E.F.; Frenzel, H.; Mayes, M.; Yeh, G.T.; Jardine, P.M.; Salvage, K.M.; Hoffman, F.M. HBGC123D: A high-performance computer model of coupled hydrogeological and biogeochemical processes. Comput. Geosci. **2001**, 27, 1231–1242.
5. Dong, Y.; Li, G. A parallel PCG solver for MODFLOW. Groundwater **2009**, 47, 845–850.
6. Hwang, H.T.; Park, Y.J.; Sudicky, E.A.; Forsyth, P.A. A parallel computational framework to solve flow and transport in integrated surface-subsurface hydrologic systems. Environ. Model. Softw. **2014**, 61, 39–58.
7. Abdelaziz, R.; Le, H.H. MT3DMSP - A parallelized version of the MT3DMS code. J. Afr. Earth Sci. **2014**, 100, 1–6.
8. Wu, Y.S.; Zhang, K.N.; Ding, C.; Pruess, K.; Elmroth, E.; Bodvarsson, G.S. An efficient parallel-computing method for modeling nonisothermal multiphase flow and multicomponent transport in porous and fractured media. Adv. Water Resour. **2002**, 25, 243–261.
9. Kollet, S.J.; Maxwell, R.M. Integrated surface-groundwater flow modeling: A free-surface overland flow boundary condition in a parallel groundwater flow model. Adv. Water Resour. **2006**, 29, 945–958.
10. Kollet, S.J.; Maxwell, R.M.; Woodward, C.S.; Smith, S.; Vanderborght, J.; Vereecken, H.; Simmer, C. Proof of concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources. Water Resour. Res. **2010**, 46, W04201.
11. Zhang, K.; Wu, Y.-S.; Pruess, K. User’s Guide for TOUGH2-MP-a Massively Parallel Version of the TOUGH2 Code; Report LBNL-315E; Lawrence Berkeley National Laboratory: Berkeley, CA, USA, 2008.
12. Wei, X.; Li, W.; Tian, H.; Li, H.; Xu, H.; Xu, T. THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media. Comput. Geosci. **2015**, 80, 26–37.
13. Tang, G.; D’Azevedo, E.F.; Zhang, F.; Parker, J.C.; Watson, D.B.; Jardine, P.M. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers. Comput. Geosci. **2010**, 36, 1451–1460.
14. Mahinthakumar, G.; Saied, F. A hybrid MPI-OpenMP implementation of an implicit finite-element code on parallel architectures. Int. J. High Perform. Comput. Appl. **2002**, 16, 371–393.
15. Le, P.V.V.; Kumar, P. Interaction between ecohydrologic dynamics and microtopographic variability under climate change. Water Resour. Res. **2017**, 53, 8383–8403.
16. Le, P.V.V.; Kumar, P.; Valocchi, A.J.; Dang, H.-V. GPU-based high-performance computing for integrated surface-sub-surface flow modeling. Environ. Model. Softw. **2015**, 73, 1–13.
17. Ji, X.; Li, D.; Cheng, T.; Wang, X.-S.; Wang, Q. Parallelization of MODFLOW using a GPU library. Groundwater **2014**, 52, 618–623.
18. Miller, C.T.; Dawson, C.N.; Farthing, M.W.; Hou, T.Y.; Huang, J.; Kees, C.E.; Kelley, C.T.; Langtangen, H.P. Numerical simulation of water resources problems: Models, methods, and trends. Adv. Water Resour. **2013**, 51, 405–437.
19. Hammond, G.E.; Lichtner, P.C.; Mills, R.T. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN. Water Resour. Res. **2014**, 50, 208–228.
20. SAMRAI: Structured Adaptive Mesh Refinement Application Infrastructure. Available online: http://computation.llnl.gov/projects/samrai (accessed on 9 April 2020).
21. Benzi, M. Preconditioning techniques for large linear systems: A survey. J. Comput. Phys. **2002**, 182, 418–477.
22. PETSc. Available online: http://www.mcs.anl.gov/petsc/index.html (accessed on 9 April 2020).
23. MacNeice, P.; Olson, K.M.; Mobarry, C.; De Fainchtein, R.; Packer, C. PARAMESH: A parallel adaptive mesh refinement community toolkit. Comput. Phys. Commun. **2000**, 126, 330–354.
24. Mo, Z.; Zhang, A.; Cao, X.; Liu, Q.; Xu, X.; An, H.; Pei, W.; Zhu, S. JASMIN: A parallel software infrastructure for scientific computing. Front. Comput. Sci. China **2010**, 4, 480–488.
25. Cheng, T.; Mo, Z.; Shao, J. Accelerating groundwater flow simulation in MODFLOW using JASMIN-Based parallel computing. Groundwater **2014**, 52, 194–205.
26. Cheng, T.; Shao, J.; Cui, Y.; Mo, Z.; Han, Z.; Li, L. Parallel simulation of groundwater flow in the North China Plain. J. Earth Sci.-China **2014**, 25, 1059–1066.
27. Zheng, C.; Wang, P.P. MT3DMS: A Modular Three-Dimensional Multispecies Transport Model for Simulation of Advection, Dispersion, and Chemical Reactions of Contaminants in Groundwater Systems. In Documentation and User’s Guide; Alabama University: Tuscaloosa, AL, USA, 1999.
28. Ehtiat, M.; Mousavi, S.J.; Srinivasan, R. Groundwater modeling under variable operating conditions using SWAT, MODFLOW and MT3DMS: A catchment scale approach to water resources management. Water Resour. Manag. **2018**, 32, 1631–1649.
29. Hecht-Mendez, J.; Molina-Giraldo, N.; Blum, P.; Bayer, P. Evaluating MT3DMS for heat transport simulation of closed geothermal systems. Ground Water **2010**, 48, 741–756.
30. Cao, X.; Mo, Z.; Liu, X.; Xu, X.; Zhang, A. Parallel implementation of fast multipole method based on JASMIN. Sci. China-Inf. Sci. **2011**, 54, 757–766.
31. Xiao, L.; Cao, X.; Cao, Y.; Ai, Z.; Xu, P. Efficient coupling of parallel visualization and simulations on tens of thousands of cores. In Proceedings of the International Conference on Virtual Reality and Visualization, Beijing, China, 4–5 November 2011.
32. Zheng, C.; Hill, M.C.; Hsieh, P.A. MODFLOW-2000, the US Geological Survey Modular Ground-Water Model: User Guide to the LMT6 Package, the Linkage with MT3DMS for Multi-Species Mass Transport Modeling (No. 2001–2082); USGS: Denver, CO, USA, 2001.
33. Guiguer, N.; Franz, T. Visual MODFLOW, User’s Manual; Waterloo Hydrogeologic Inc.: Waterloo, ON, Canada, 1996.
34. Amdahl, G.M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the Spring Joint Computer Conference, Atlantic City, NJ, USA, 18–20 April 1967; ACM: New York, NY, USA, 1967; pp. 483–485.
35. Ghai, A.; Lu, C.; Jiao, X. A comparison of preconditioned Krylov subspace methods for large-scale nonsymmetric linear systems. Numer. Linear Algebra Appl. **2018**.

**Figure 2.** JASMIN modular three-dimensional transport model for multi-species (JMT3D) program flowchart.

**Figure 3.** The operational principle of the basic transport (BTN) package. (**a**) Modular three-dimensional transport model for multi-species (MT3DMS) BTN package; (**b**) JMT3D BTN package.

**Figure 9.** (**a**–**c**) Transport breakthrough curves of JMT3D and MT3DMS at each observation well. (**d**) Three-dimensional profile of the concentration at 3600 days.

**Figure 13.** (**a**) Hydraulic conductivity of layers 1–3 and 5–6. (**b**) Longitudinal dispersion of layers 1–3 and 5–6.

**Figure 15.** (**a**–**e**) Wall-clock time per iteration of the AMG, BiCGSTAB, GMRES, AMG-BiCGSTAB, and AMG-GMRES methods, respectively.

**Figure 18.** (**a**–**e**) Speedup of the AMG, BiCGSTAB, GMRES, AMG-BiCGSTAB, and AMG-GMRES methods, respectively.

| Name | Value | Name | Value | Name | Value |
|---|---|---|---|---|---|
| K1 | 40 m/d | $S_{y}$ | 0.25 | D1 | 5 m |
| K2 | 40 m/d | $S_{s}$ | $2.5 \times 10^{-4}$ | D2 | 5 m |
| K3 | 40 m/d | $S_{s}$ | $2.5 \times 10^{-4}$ | D3 | 5 m |
| K4 | 1 m/d | $S_{s}$ | $2.5 \times 10^{-4}$ | D4 | 1 m |
| K4-2 | 100 m/d | $S_{s}$-2 | $3.0 \times 10^{-4}$ | | |
| K5 | 100 m/d | $S_{s}$ | $3.0 \times 10^{-4}$ | D5 | 5 m |
| K6 | 100 m/d | $S_{s}$ | $3.0 \times 10^{-4}$ | D6 | 5 m |
| Porosity | 0.3 | TRPT | 0.2 | TRPV | 0.1 |

$S_{y}$ and $S_{s}$ denote the specific yield and storage coefficient, respectively; D1 (2, 3, …) denotes the longitudinal dispersivity of layer 1 (2, 3, …); K4-2 and $S_{s}$-2 are parameters for the circular area of layer 4; TRPT is the ratio of horizontal transverse dispersivity to longitudinal dispersivity, and TRPV is the ratio of vertical transverse dispersivity to longitudinal dispersivity.

| Number of Processors | AMG | BiCGSTAB | GMRES | AMG-BiCGSTAB | AMG-GMRES |
|---|---|---|---|---|---|
| 1 | 4780 | 7885 | 12817 | 3710 | 4777 |
| 2 | 4746 | 7885 | 12817 | 3710 | 4742 |
| 4 | 4742 | 7885 | 12817 | 3710 | 4738 |
| 8 | 4839 | 7885 | 12817 | 3710 | 4837 |
| 16 | 4820 | 7885 | 12818 | 3710 | 4818 |
| 24 | 4826 | 7885 | 12817 | 3710 | 4824 |
| 32 | 4835 | 7885 | 12817 | 3710 | 4833 |
| 46 | 4839 | 7885 | 12817 | 3710 | 4837 |

**Table 3.** Percentage of wall-clock time spent in each part of the simulation with the biconjugate gradient stabilized (BiCGSTAB) method (unit: %).

| Processors | Solute | Flow | Others |
|---|---|---|---|
| 1 | 77 | 21 | 2 |
| 2 | 78 | 20 | 2 |
| 4 | 81 | 17 | 2 |
| 8 | 82 | 16 | 2 |
| 16 | 83 | 13 | 4 |
| 24 | 83 | 13 | 4 |
| 32 | 82 | 13 | 5 |
| 46 | 80 | 14 | 6 |

| Processors | $T_{A}$ (s) | $T_{B}$ (s) | $E_{A}$ (%) | $E_{B}$ (%) | $I_{A}$ | $I_{B}$ | Average Memory (MB) |
|---|---|---|---|---|---|---|---|
| 28 | 5650 | 2133 | 100.0 | 100.0 | 1028 | 2677 | 449 |
| 56 | 2956 | 1708 | 95.6 | 62.4 | 1028 | 2676 | 260 |
| 112 | 1407 | 971 | 100.4 | 54.9 | 1029 | 2677 | 141 |
| 224 | 644 | 516 | 109.7 | 51.7 | 1030 | 2676 | 88 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Liu, X.; Zhang, Q.; Cheng, T.
Accelerating Contaminant Transport Simulation in MT3DMS Using JASMIN-Based Parallel Computing. *Water* **2020**, *12*, 1480.
https://doi.org/10.3390/w12051480
