PyTheis—A Python Tool for Analyzing Pump Test Data

: The Theis equation is an important mathematical model used for analyzing drawdown data obtained from pumping tests to estimate aquifer parameters. Since the Theis model is a nonlinear equation, a complex graphical procedure is employed for ﬁtting this equation to pump test data. This graphical method was originally proposed by Theis in the late 1930s, and since then, all the groundwater textbooks have included this ﬁtting method. Over the past 90 years, every groundwater hydrologist has been trained to use this tedious procedure for estimating the values of aquifer transmissivity (T) and storage coefﬁcient (S). Unfortunately, this mechanical procedure does not provide any intuition for understanding the inherent limitations in this manual ﬁtting procedure. Furthermore, it does not provide an estimate for the parameter error. In this study, we employ the public domain coding platform Python to develop a script, namely, PyTheis, which can be used to simultaneously evaluate T and S values, and the error associated with these two parameters. We solve nine test problems to demonstrate the robustness of the Python script. The test problems include several published case studies that use real ﬁeld data. Our tests show that the proposed Python script can efﬁciently solve a variety of pump test problems. The code can also be easily adapted to solve other hydrological problems that require nonlinear curve ﬁtting routines. The results show that PyTheis converged to values that are close to the literature values reported for these eight test problems.


Introduction
Groundwater pumping tests are routinely used to characterize two important aquifer properties-transmissivity (T) and storage coefficient (S). In the published literature, several mathematical models have been developed to predict the hydraulic properties of aquifers based on the appropriate system properties and boundary conditions [1][2][3][4][5][6]. The nonlinear equation proposed by Theis [5] was the first model that was used for predicting transient drawdown patterns in a confined aquifer. Later, the Theis model has been used to develop a variety of graphical procedures for estimating T and S values from drawdown data [1]. Researchers have also extended the Theis model for describing different types of aquifer systems; for example, Hantush [2] developed a semilogarithmic plot for analyzing leaky confined aquifers. Kruseman and de Ridder [3] presented detailed graphical methods for analyzing pump test data for various types of aquifers. These graphical methods can be subjective and their results can have considerable variations [7][8][9][10]. For example, Naderi and Gupta [8] investigated the variations in discharge rates occurring during pump tests and demonstrated their impacts on T and S values. McElwee [10] was one of the early studies that employed a least-squares method to analyze pump test data and they demonstrated that the automated technique provided better results than the traditional graphical fitting method.
The drawdown curve predicted by the Theis model uses a nonlinear function, known as the well function, that is typically evaluated using a type curve [11]. The Theis model is an analytical solution to the transient radial flow equation used for describing groundwater flow in a one-dimensional confined aquifer [5]. Interestingly, the mathematical solution presented by Theis is a well-known solution to the classic radial heat flow problem, which was originally pointed out by Theis' colleague C.I. Lubin, and Theis generously acknowledged this fact in the main text of his seminal paper [5]. Theis later developed a graphic method to use this relationship in an inverse mode to estimate the values of T and S from pump test data. Although Theis did not publish this graphical method, when Jacob [12] provided an example for this graphical procedure in his journal article, he explicitly acknowledged his personal communications with Theis (see p.96 in Jacob [12]). Later, when Wenzel [13] published the details of this graphical procedure, he also acknowledged his communication with Theis in a footnote [see p.88 and 96 in Wenzel [13] report]. Since then all the groundwater textbooks have included this graphical procedure. Therefore, every groundwater hydrogeologist who has been trained in the past 90 years would have had a chance to learn this graphical procedure.
The Theis fitting method is a tedious technique even for experts. Benbarka and Davis [7] conducted a study to evaluate the efficiency of using the Theis approach. The graphical type-curve matching method was completed by 30 technical experts, and the values were compared against the results obtained from an automatic parameter estimation procedure. The results identified the inherent limitation of the graphical estimation procedure due to the variations in the lines drawn based on visual interpretation.
In recent years, several computational tools have been developed to analyze the pump test data. Labadie and Helweg [14] presented a computer algorithm named FASTEP. In this FORTRAN code, a least-squares curve fitting method was used as an alternative to the graphical approach. Reed [15] developed another FORTRAN code for fitting type curves for several types of flow conditions. Maslia and Randolph [16] developed a FORTRAN code, namely, TENSOR2D to analyze the transmissivity tensor for an anisotropic aquifer by using a weighted least-squares optimization procedure. Barlow and Moench [17] developed the WTAQ code, which was also written in FORTRAN, and it employed several analytical solutions for modeling radial flow in both unconfined and confined aquifers. The WTAQ code uses two different parameter estimation methods: a type-curve fitting method that uses a trial-and-error approach, and an automatic parameter-estimation method that minimizes the error. Halford and Kuniansky [18] developed a Microsoft Excel spreadsheet tool for evaluating aquifer parameters. Numerous commercial software tools such as AQTESOLV [19] and AquiferTest Pro [20] have also been developed for analyzing pump test data.
Most automatic curve fitting tools employ a nonlinear fitting algorithm to minimize the sum of squared error between observations and model predictions [7,8,10,[21][22][23] In a recent study, Bateni et al. [24] compared the performance of various nonlinear programming methods including the genetic algorithms (GA) and ant colony (AC) methods to estimate aquifer parameters using five different time-drawdown datasets. The authors concluded that both GA and AC are reliable methods for evaluating aquifer parameters, and they performed better than the graphical methods.
Previous studies have also attempted to quantify the error to assess the reliability of the best-fit parameter values evaluated from pump tests [7,8]. However, these methods can be cumbersome; therefore, none of the standard Theis analysis methods discussed in textbooks provide any error analysis. In this paper, we provide a simple open-source Python script that can fit the Theis model to pump test data to evaluate T and S values, and it can also provide the estimate of error associated with these parameters. The trust-region reflective method [25], available in the scipy.optimize.least_squares function, is used to solve the nonlinear least-squares problem after setting appropriate bounds for the model parameters. The performance of the code was tested by solving several benchmark problems.

Well Function of Theis Curve Type for Pump Test
The Theis equation for modeling flow towards a fully penetrating well pumping at a constant rate in a homogeneous, isotropic confined aquifer of infinite extent can be written as: the well function has been conveniently tabulated [13] for various values of u defined by the following equation: where S is the aquifer storage coefficient (dimensionless), t is the time since pumping began [T], r is the radial distance from the pumping well [L], and u is a dimensionless constant.

The Parameter Fitting Method Used in PyTheis
As discussed in the introduction section, the graphic method originally suggested by Theis and later documented by Jacob [12] and others, is often used in textbooks to fit the drawdown data with Equation (1) to estimate the T and S values. The PyTheis script developed in this study employs a nonlinear solver available in the Python curve-fitting library to obtain the best estimates for T and S. The computational steps used by the fitting procedure are summarized in Figure 1.
the well function has been conveniently tabulated [13] for various values of u defined by the following equation: where S is the aquifer storage coefficient (dimensionless), t is the time since pumping began [T], r is the radial distance from the pumping well [L], and u is a dimensionless constant.

The Parameter Fitting Method Used in PyTheis
As discussed in the introduction section, the graphic method originally suggested by Theis and later documented by Jacob [12] and others, is often used in textbooks to fit the drawdown data with Equation (1) to estimate the T and S values. The PyTheis script developed in this study employs a nonlinear solver available in the Python curve-fitting library to obtain the best estimates for T and S. The computational steps used by the fitting procedure are summarized in Figure 1.  The scipy.optimize.curve_fit function available in the Python library offers three possible nonlinear solvers: the Levenberg-Marquardt (LM), Trust-Region Reflective (TR), and Dog Box (DB) methods. The LM method is a default algorithm for solving nonlinear least-squares problems using this Python function. LM is an unconstrained nonlinear search procedure typically used for solving small-to-medium-sized problems. This method may fail or converge to a local minimum when inappropriate initial estimates are adopted [26]. TR or DB methods, which are constrained nonlinear search algorithms, are better alternatives when using a limited number of data points to fit nonlinear functions. More recently, Memari and Clement [27] applied all three approaches to fit soil moisture data to various soil water retention models and concluded that the TR is the most robust approach for fitting highly nonlinear functions.

Initial Guess Values and Parameter Bounds for T and S Values
To use the TR method, one needs to provide parameter bounds and initial guess values. In PyTheis, the initial guess value for T was always set to 1 m 2 /d. This is a reasonable initial guess since pump tests are usually completed in permeable formations that can produce water. Therefore, assuming that, on average, the aquifer will contain an approximately 1 to 10 m thick layer of fine/silty sand layer with a conductivity of approximately 1 to 0.1 m/day is a good approximation. The lower bound of the T value was assumed to be about 0.01 m 2 /day, and the upper bound was assumed to be 1.0 × 10 5 . This is a broad range that should cover a wide range of soils. Halford and Kuniansky [18] study summarized hydraulic conductivity values that are typically observed under field conditions. The bounds we have recommended here include all possible T values for permeable sand/silt formations reported in this study.
The minimum value of S was set to 1.0 × 10 −6 , which is a typical lower limit of the confined aquifer compressibility value [11]. The maximum value was set to 0.1, which is a typical upper limit for the unconfined specific yield value. The best initial guess value of S was assumed to be 1.0 × 10 −5 , which is the typical value of S observed for confined aquifers [11]. These initial guess values and the lower and upper bounds worked well for solving all the example problems considered in this study. However, if needed, users can always modify these default values.

Estimation of the Error Associated with Transmissivity and Storage Values
Error is an inherent part of every engineering/scientific measurement. One of the major limitations of the Theis graphical method is that the method cannot quantify the error in the estimated values of T and S. Without an error estimate, it is difficult to understand the accuracy of the parameter values, and hence, in most cases, analysts often report the values of T and S using an inappropriate number of significant figures. The Python curve fitting routines employed in this study provides a covariance matrix for the best-fit parameter values. The error associated with the parameter values can be estimated as the square root of the diagonal terms of this covariance matrix. Using this error estimate, the best-fit parameter values can be reported with an appropriate number of significant figures by employing a standard rounding procedure summarized below.
A simple method to determine the number of appropriate significant figures is to express the best-fit parameter values and the associated error values in the scientific notation using a consistent unit. Since the error is an estimate of what we do not know, as a rule of thumb, one should round error estimates to one (max of two, in rare cases) significant figure [28]. Then, one can use the first nonzero digit of the error value to truncate the significant digit of the estimated parameter value [28,29]. For example, if the computer reported a T value of 82.7224131 m 2 /day and the uncertainty is 3.124124, first express these data in the scientific notation as: T = (8.27224131 ± 0.3124124) × 10 1 m 2 /day. After appropriate rounding, the value should then be expressed as: T = (8.3 ± 0.3) × 10 1 or 83 ± 3 m 2 /day. The knowledge of the error (or uncertainty) would allow one to round the estimated parameter values using an appropriate number of significant figures. Although this is a well-known standard procedure, when the error estimate is not available analysts often report the estimated values of model parameters with an unrealistic number of significant figures (e.g., see Bateni [24]). We will discuss the implications of this rounding procedure using the results of some benchmark problems in our results section.

Test Problems Considered in This Study
To demonstrate the use of PyTheis, we first solve a standard problem given in Lohman [30] to demonstrate the importance of the error estimates and the use of proper rounding to report the parameter values with an appropriate number of significant figures. The dataset given by Lohman [30] describes the drawdown patterns in a theoretical confined aquifer. The monitoring wells are placed at distances of 200, 400, and 800 ft from a well pumping at the constant rate of 96,000 ft 3 /day. We use these monitoring data to test the performance of the PyTheis code.
We later solve eight other test problems to demonstrate the use of the PyTheis code for solving a variety of previously studied problems with a wide range of model parameters. These eight test problems allowed us to verify whether the default initial guess values and the parameter bounds set within PyTheis can be used to solve practical problems. Finally, we developed three synthetic datasets by artificially introducing different levels of random error into the well-known Walton [31] problem. These perturbed datasets are used to further test the robustness of the fitting procedure when there are considerable random variations in the field data.

Results and Discussion
The fitted results estimated using PyTheis for the Lohman [30] test problem are shown in Figure 2. The figures show the fittings for the observed drawdown data collected at three different observation wells located at r = 200, 400, and 800 ft away from the pumping well. The drawdown curves estimated by PyTheis closely fitted the data points (see Figure 2). The best-fit T and S values are close to the values reported in Lohman [30]. For example, the PyTheis estimated model parameters based on data from r = 200 ft are: T r=200 = 13,379.2554 ft 2 /day and S r=200 = 2.01716330 × 10 −4 , and the corresponding error values are: 27.3641357 ft 2 /day and 1.42597139 × 10 −6 , respectively. As expected, the computer code reported the best-fit values and error estimates with a large number of significant figures. After appropriate rounding (see the example in Section 2.4), the estimated value of T should be reported as (133.8 ± 0.3) × 10 2 or 13,380 ± 30 ft 2 /day. This value compares well with the graphical solution reported in Lohman's book, which is T = 14,000 ft 2 /day (note Lohman estimated this average value using the data points from all three wells). In this effort, we evaluated three estimates using the drawdown data reported for the three monitoring wells. The parameter values estimated based on the three datasets are summarized in Table 1. It is worth noting that Lohman's analysis excluded some extreme data points; for example, when the r 2 /t values are 10 8 or more the points fell outside the range of the standard 2 × 4 graph paper and hence were excluded in the analysis. In this study, the PyTheis code was able to utilize all the data points for estimating the model parameters. The results show the model parameter values estimated using all three datasets were close to the average values reported by Lohman. Interestingly, the error analysis shows that the PyTheis estimates are more precise than the value reported by Lohman.
To further test the performance of the PyTheis code under different types of flow conditions, we solved the following eight different problems reported in the literature: Walton [31], Fetter and Rayne [32], Driscol [33], McWhorter and Sunda [34], Viessman and Lewis [35], Todd and Mays [36], and two sets of data from Naderi [23]. Our results indicated that the PyTheis code, with the default initial values and parameter bounds, was able to solve all eight problems. The results for the first problem, the Walton problem, are given in Figure 3a. This problem was based on a pump test completed at a confined aquifer in Gridley, Illinois. This dataset was also used by Saleem [21] for comparing the performance of a least-squares fitting method. The result for the second problem is given in Figure 3b; this problem was taken from Fetter [11] textbook, and the solution was documented in the textbook's solution manual [32]. The result for the third test problem is shown in Figure 3c; this problem is based on a synthetic test problem given in Driscol [33]. The result for the fourth problem, shown in Figure 3d, is a test problem discussed in McWhorter and Sunda [34]. This dataset is based on a pump test conducted at a field site in Fort Collins, Colorado. The fifth problem, shown in Figure 3e, is an example problem taken from Viessman and Lewis [35]. The sixth problem is taken from Todd and Mays [36] textbook, and the results are shown in Figure 3f; this is a well-known problem that has been widely used in the published literature [22,23] to validate different fitting methods used for estimating aquifer parameter values.  To further test the performance of the PyTheis code under different types of flow conditions, we solved the following eight different problems reported in the literature: Walton [31], Fetter and Rayne [32], Driscol [33], McWhorter and Sunda [34], Viessman and Lewis [35], Todd and Mays [36], and two sets of data from Naderi [23]. Our results indicated that the PyTheis code, with the default initial values and parameter bounds, was able to solve all eight problems. The results for the first problem, the Walton problem, are given in Figure 3a. This problem was based on a pump test completed at a confined aquifer in Gridley, Illinois. This dataset was also used by Saleem [21] for comparing the performance of a least-squares fitting method. The result for the second problem is given in Figure 3b; this problem was taken from Fetter [11] textbook, and the solution was documented in the textbook's solution manual [32]. The result for the third test problem is shown in Figure 3c; this problem is based on a synthetic test problem given in Driscol [33]. The result for the fourth problem, shown in Figure 3d, is a test problem discussed in McWhorter and Sunda [34]. This dataset is based on a pump test conducted at a field site in Fort Collins, Colorado. The fifth problem, shown in Figure 3e, is an example problem taken from Viessman and Lewis [35]. The sixth problem is taken from Todd and Mays [36] textbook, and the results are shown in Figure 3f; this is a well-known problem that has been widely used in the published literature [22,23] to validate different fitting methods used for estimating aquifer parameter values.  Finally, we selected two different datasets reported in Naderi [23] for our seventh and eighth test problems, and the results are summarized in Figure 3g,h, respectively. Figure 3g results are for a synthetic dataset given in Naderi [23], whereas Figure 3h results are based on a pump test dataset collected at a field site in Shiraz, Iran, and hence, we refer to this dataset as "field data from Naderi (2019)".
Detailed datas for all eight problems, which include pumping rate, distance from the well, the PyTheis estimated parameter and error values, and the parameter values reported in the literature, are given in Table 2. Since the nonlinear estimation method employed by PyTheis uses a solver that calculates the local optimum, we used these multiple datasets to verify whether the default initial values and parameter bounds are appropriate for estimating realistic aquifer parameter values. The results show that PyTheis converged to values that are close to the literature values reported for these eight test problems. Finally, we selected two different datasets reported in Naderi [23] for our seventh and eighth test problems, and the results are summarized in Figure 3g,h, respectively. Figure 3g results are for a synthetic dataset given in Naderi [23], whereas Figure 3h results are based on a pump test dataset collected at a field site in Shiraz, Iran, and hence, we refer to this dataset as "field data from Naderi (2019)." Detailed datas for all eight problems, which include pumping rate, distance from the well, the PyTheis estimated parameter and error values, and the parameter values reported in the literature, are given in Table 2. Since the nonlinear estimation method employed by PyTheis uses a solver that calculates the local optimum, we used these multiple datasets to verify whether the default initial values and parameter bounds are appropriate for estimating realistic aquifer parameter values. The results show that PyTheis converged to values that are close to the literature values reported for these eight test problems.

Sensitivity of Parameter Values to Variations in Field Data
To further test the robustness of the code for solving field problems involving noisy data, we created three synthetic datasets by introducing different levels of random error to the Walton [31] dataset. We used the RANDBETWEEN function in Excel to generate random numbers between the specified range; (−10, 10), (−20, 20), (−40, 40). The field data was perturbed by multiplying these random numbers by the drawdown values and dividing them by 100. The original data along with the three perturbed datasets are given in Appendix B. The PyTheis code was used to fit all four datasets, and the results are shown in Figure 4. As shown in Figure 4a,b, the datasets with a low level of perturbation are located close to the curve estimated by PyTheis over the entire observation period, whereas the datasets with a high level of perturbation (see Figure 4c,d) are scattered farther away from the fitted curve. The best-fit T and S values and the associated errors for all four datasets are summarized in Table 3. The results presented in Table 3 show that the uncertainty in the estimated parameter values increases with an increase in the amount of random error introduced in the dataset. The estimated mean T value for the base case is 123 m 2 /day and the uncertainty is 1 m 2 /day. When 10% noise was included in the dataset, the estimated mean value of T is 122 m 2 /day, and the uncertainty is 4 m 2 /day. The estimated mean value of S for the base case is 2.1 × 10 −5 , and the uncertainty is 4 × 10 −7 ; and for the 10% perturbed case, the mean S value is 2.1 × 10 −5 , and the uncertainty is 1 × 10 −6 . It is interesting to note that even for the highly noisy dataset, the predicted mean parameter values are fairly close to the expected values.

Conclusions
We present the details of a Python tool, namely, PyTheis, for evaluating the aquifer properties T and S, and the error associated with these two important hydrogeological parameters. PyTheis is an excellent alternative to the commonly used graphical estimation method (known as the Theis type-curve method), which involves laborious curve matching steps that can result in subjective estimates. The method proposed in this study can be used within the public domain Python platform, and it is a rational, modern alternative to the traditional, 90-year-old Theis graphical procedure. PyTheis is a flexible computer code that uses open-source tools. The initial guess values and parameter bounds suggested in this study worked well for fitting a wide variety of pump test data. PyTheis is a practical tool for analyzing pump tests, and it is also an excellent teaching tool. Furthermore, this code can be easily adapted to solve other hydrological parameter estimation problems that require nonlinear curve fitting routines.
It is important to note that the current version of PyTheis uses the basic confined flow analytical solution. Therefore, the users must carefully screen the field data to avoid forcing the ideal Theis curve to fit the data points that are inconsistent with the theory. For example, in many cases, early time data points collected during a pump test are strongly influenced by well storage effects. Similarly, river/rock boundary conditions and leaky confining layers will impact the later time data points. Forced fitting the Theis model to these early or later data points will yield incorrect aquifer parameter values. Therefore, the users should use caution and carefully screen the field data before using the code to fit the classic Theis model. Correcting the code for well-storage, boundary-condition, and leaky-aquifer effects is beyond the scope of this study. Future studies should consider including advanced analytical or numerical models that can account for these effects.