Approximate Dynamic Programming Based Control of Proppant Concentration in Hydraulic Fracturing

Hydraulic fracturing has played a crucial role in enhancing the extraction of oil and gas from deep underground sources. The two main objectives of hydraulic fracturing are to produce fractures with a desired fracture geometry and to achieve the target proppant concentration inside the fracture. Recently, some efforts have been made to accomplish these objectives by the model predictive control (MPC) theory based on the assumption that the rock mechanical properties such as the Young’s modulus are known and spatially homogenous. However, this approach may not be optimal if there is an uncertainty in the rock mechanical properties. Furthermore, the computational requirements associated with the MPC approach to calculate the control moves at each sampling time can be significantly high when the underlying process dynamics is described by a nonlinear large-scale system. To address these issues, the current work proposes an approximate dynamic programming (ADP) based approach for the closed-loop control of hydraulic fracturing to achieve the target proppant concentration at the end of pumping. ADP is a model-based control technique which combines a high-fidelity simulation and function approximator to alleviate the “curse-of-dimensionality” associated with the traditional dynamic programming (DP) approach. A series of simulations results is provided to demonstrate the performance of the ADP-based controller in achieving the target proppant concentration at the end of pumping at a fraction of the computational cost required by MPC while handling the uncertainty in the Young’s modulus of the rock formation.


Introduction
Petroleum and natural gas remain an important part of the global energy supply.Recently, the extraction of underground resources such as shale gas and oil, which are trapped in low porosity and ultra-low permeability formations, has become economically feasible due to the application of well-stimulation techniques such as hydraulic fracturing [1] and directional drilling [2].Since its introduction in the 1940s, hydraulic fracturing has gradually developed as a standard practice and has been applied to various types of reservoir formations, and thus it has made significant contributions to the oil and gas industry [3].
In hydraulic fracturing, the ultimate goal is to enhance the productivity of a stimulated (i.e., fractured) well.The process begins with a step referred to as "perforation", in which small explosives are set off at spaced intervals at the wellbore to create initial fracture paths.Next, a fluid called pad is injected at a high pressure to initiate fractures of the rock at the perforated sites.Subsequently, a fracturing fluid called dirty volume consisting of water, additives, and proppant is pumped into the wellbore at sufficiently high pressure and flow rate to further propagate the fracture in the rock formation.Finally, the pumping is stopped, and the fractures are closed due to the natural stress of the rock formation.During the closure process, the remaining fluid seeps into the reservoir and the proppant is trapped inside the fracture.At the end of pumping, the concentration of the proppant should be uniform along the fracture to achieve a highly conductive channel which will result in effective extraction of oil and gas from the reservoir.The overall efficiency of the hydraulic fracturing process depends on (1) the uniformity of proppant concentration across the fracture at the end of pumping and (2) the final fracture geometry.
To produce a fracture with uniform proppant concentration across the fracture and the desired fracture geometry, it is important to generate an optimal pumping schedule.Several efforts in this direction were initially made by Nolte [4], Gu and Desroches [5] and Dontsov and Peirce [6].Specifically, Nolte [4] developed a power-law type pumping schedule based on the conservation of fluid volume; Gu and Desroches [5] proposed a pumping schedule design technique using a detailed forward numerical simulator; and Dontsov and Peirce [6] designed a pumping schedule by taking into account proppant transport in their forward model.
The aforementioned techniques viewed hydraulic fracturing processes as an open-loop problem.Motivated by some advances in real-time measurement techniques such as downhole pressure analysis and microseismic monitoring, several attempts have recently been made to employ model predictive control (MPC) theory to regulate the fracture geometry and proppant concentration.Specifically, the limited availability of real-time measurements has been addressed by utilizing state estimators [7][8][9], and several model order-reduction (MOR) techniques [10][11][12] have been developed to handle the large computational requirements due to dynamic simulation of multiple highly-coupled partial differential equations (PDEs) defined over moving boundaries to describe the hydraulic fracturing process.However, there are two unresolved issues with the MPC approach.First, it is necessary to handle a potentially exorbitant online computational requirement due to the simulation of a nonlinear large-scale system at each sampling time (which is usually the case in hydraulic fracturing) and the use of a long prediction/control horizon to ensure satisfactory performance [13][14][15].Second, the conventional MPC solves a deterministic open-loop optimal control problem at each sampling time.Therefore, it ignores the uncertainty and the feedback at future sampling times [16,17].
The above-mentioned limitations of MPC formulation can be handled by the approximate dynamic programming (ADP) approach [18], particularly circumventing the "curse-of-dimensionality" of the traditional dynamic programming (DP) approach.ADP is a model-based control technique and can be employed to derive an improved control policy, starting with some sub-optimal control policies (or, alternatively, closed-loop identification data).In recent years, ADP has been successfully applied to several applications such as a complex microbial cell reactor characterized by multiple steady states [19], Van de Vusse reaction in an isothermal CSTR [20], integrated plants with a reactor and a distillation column with a recycle [21], and systems described by hyperbolic PDEs [22][23][24].Motivated by these earlier efforts, we present an ADP-based control framework for the closed-loop operation of a hydraulic fracturing process to achieve uniform proppant concentration across the fracture at the end of pumping.
The organization of this paper is as follows: first, a brief introduction of ADP is presented.Second, a high-fidelity model of hydraulic fracturing is constructed based on first-principles.Finally, we discuss the application of ADP to a hydraulic fracturing process and present a series of simulation results that demonstrates the superiority of the ADP-based controller over the standard MPC system in achieving uniform proppant concentration at the end of pumping, which is directly related to the overall productivity of a fractured well.

Approximate Dynamic Programming
Consider an optimal control problem with the set of all possible states and inputs represented by X ⊂ R n x and U ⊂ R n u , respectively, where n x and n u are the number of state and manipulated input variables, respectively.For a deterministic state space system, an optimal state feedback policy can be determined by formulating an optimization problem as follows: where x(t k ) ∈ X is the system state vector and u(t k ) ∈ U is the manipulated input vector at t = t k (i.e., k th sampling time), µ is the function mapping x(t k ) to u(t k ), Π is the set of all valid policies over which optimal µ is to be found, f is the model describing the time evolution of the system states, and Φ(x(t k ), u(t k )) is the single-stage cost incurred at state x(t k ) at t = t k while implementing the control move u(t k ).
DP is an alternative approach for solving multi-stage optimal control problems [25].In DP, we define "cost-to-go" function, denoted by J µ (x), of a starting state x as a sum of single-stage costs incurred from the state x under the control policy µ over the infinite horizon: where it is assumed that J µ (x) is well-defined over the entire X .The objective of DP is to obtain the infinite horizon optimal cost-to-go function (J opt ) that satisfies the following Bellman equation: Once the optimal cost-to-go function J opt is obtained, it can be subsequently employed to find the optimal input profile by solving the following point-wise single-stage optimization at every sampling time t k : There are very few problems such as linear quadratic (Gaussian) optimal control problem for which the Bellman equation can be solved analytically.Alternatively, numerical approaches such as "value iteration" or "policy iteration" can be employed.For systems with continuous state space, these numerical approaches can be employed either by discretizing the state space, or by using a finite dimensional parameterization.However, this can lead to potentially exorbitant computational requirements as the state dimension increases, which is referred to as "curse-of-dimensionality". Therefore, DP has been considered largely impractical for almost all the problems of practical interest.
The curse-of-dimensionality problem of DP can be handled by the ADP approach [18].In contrast to the numerical approach to compute solutions for DP, which obtains the optimal cost-to-go function for the entire continuous state space, the ADP approach limits the control law calculations to only the relevant regions of the state space.These relevant regions are identified by performing closed-loop simulations of the system under some known heuristics or sub-optimal control policies.From the principle of optimality of DP, an improved control policy can be derived by solving the Bellman equation in a recursive manner within the sampled domain of the state space (which is a discrete set consisting of sampled states, and denoted by X sample ).For a system with continuous state space, it is not feasible to restrict the Bellman iterations to a set of discrete sampled states only.Therefore, a function approximator (denoted by J(x)) is used to interpolate cost-to-go values within the sampled states to approximate cost-to-go values in the original continuous state space.In this work, we used K-nearest neighbors (KNN) as a function approximator.In KNN, the cost-to-go at any query point x 0 is computed as a distance-weighted average of its K-nearest neighbors: where and • • • from the query point.x 1 is the closest point from x 0 , followed by x 2 , and so on.

Dynamic Modeling of Hydraulic Fracturing
In this work, a large-scale process model of hydraulic fracturing is developed by adopting the following assumptions: (1) fracture propagation is described by Perkins, Kern, and Nordgren (PKN) model as shown in Figure 1 [26]; (2) the layers above and below have sufficiently large stresses such that the vertical fracture is confined within a single horizontal rock layer; (3) the fracture length is much greater than fracture width; (4) the fluid pressure is constant in the vertical direction; and (5) the fracture is surrounded by an isotropic homogenous elastic material.The dynamic modeling of hydraulic fracturing involves two sub-processes: fracture propagation and proppant transport.A brief description of the governing equations is presented below.The fluid flow rate in the horizontal direction is determined by the following equation for flow of a Newtonian fluid in an elliptical section [2,27]: where P is the net pressure, x ∈ [0, L(t)] is the time-dependent spatial coordinate in the horizontal direction, µ is the fluid viscosity, Q is the fluid flow rate in the horizontal direction, H is the predefined fracture height, and W is the fracture width.For a crack under constant normal pressure, the fracture shape is elliptical as shown in Figure 1.The relationship between the maximum fracture width (i.e., the minor axis of the ellipse) and the net pressure is given as follows [28,29]: where ν is the Poisson ratio of the formation, and E is the Young's modulus of the formation.By taking into account the fracture volume changes and the fluid leak-off into the reservoir, the continuity equation for an incompressible fluid flow inside the fracture is given by [27]: where A = πW H/4 is the cross-sectional area of the elliptic fracture, and U is the fluid leak-off per unit height accounting for both fracture walls, which is determined by the following expression [2,30]: where C leak is the overall leak-off coefficient, t is the elapsed time since fracturing was initiated, and τ(x) is the time at which the fracture propagation has arrived at x for the first time.
Plugging Equations ( 7) and ( 8) into Equation ( 9) results in the following nonlinear parabolic PDE: The initial condition for solving the above equations is that the fracture is closed, that is W(x, 0) = 0.In addition, the two boundary conditions are considered as follows: 1.At the wellbore, the fluid flow rate is specified by Q(x, t) = Q 0 (t), where Q 0 (t) is the fluid injection rate (i.e., the manipulated variable).2. At the fracture tip, x = L(t), the fracture is always closed, that is W(L(t), t) = 0.
The modeling of the injected proppant transport is based on the following assumptions: (1) along the horizontal direction, the injected proppant will travel at the fracturing fluid's velocity; (2) the suspended proppant will settle towards the fracture bottom due to the gravitational force which will lead to the formation of a proppant bank; (3) proppant particles are sufficiently large so that the diffusive flux can be neglected when the convective flux is considered; and (4) because of low proppant concentration, the interactions between the individual particles are neglected, while the drag and gravitational forces acting on the proppant particles are still considered.Based on these assumptions, the advection of the suspended proppant can be expressed by the following equation: where C is the suspended proppant concentration, ∇ is the vector differential operator, V p is the velocity with which the proppant particles are advected, and C 0 (t) is the inlet proppant concentration at the wellbore (i.e., the manipulated variable).
The net velocity of the proppant particles, V p , is dependent on the suspended proppant concentration, C, the fluid velocity, V, and the gravitational settling velocity, V s , which is given by the following expression [31]: The gravity-induced proppant settling velocity in the fracturing fluid, V s , is computed as follows [32]: where ρ sd is the proppant particle density, ρ f is the pure fluid density, g is the gravitational acceleration constant, d is the proppant diameter, and µ is the fracture fluid viscosity where its relation with the proppant concentration can be described by the following empirical model [33]: where µ 0 is the pure fluid viscosity, β is an exponent in the range of 1.2 to 1.8, and C max is the theoretical maximum proppant concentration determined by C max = (1 − φ)ρ sd where φ is the proppant bank porosity.The particles settle out of the flow to the fracture bottom and form a proppant bank.The evolution of proppant bank height, δ, via the proppant settling is given as follows [34,35]: where proppant bank is initially of vanishing thickness, so the initial condition is that δ(x, 0) = 0.The above-mentioned PDE-ODE systems defined over time-dependent spatial domains will be solved using an in-house simulator developed by Siddhamshetty et al. [8].

Simulation of Sub-Optimal Control Policies for ADP
In this section, a nonlinear MPC formulation is presented to obtain sub-optimal control policies for ADP.First, we employed unified fracture design (UFD) technique to obtain the optimal fracture length (L opt ) and width (W opt ) for a specific amount of proppant, M prop , to be injected into a reservoir well [36].Additionally, the target proppant concentration, C target , at the end of hydraulic fracturing is calculated as follows: The high-fidelity process model described in Equations ( 7)-( 17) demands high computational requirements, and thus, it cannot be directly employed for the controller design.While there are a variety of computationally efficient linear [11,37,38] and nonnlinear [10,[39][40][41][42] MOR techniques available, in this work, we developed a reduced-order model (ROM) based on multivariate output error state space (MOESP) algorithm using the simulation results from the high-fidelity model as described in [8].The developed ROM is presented as follows: where x(t k ) represents the vector of states in the state space model at time instant t k , A, B and H represent the system matrices, T is the inlet proppant concentration at the wellbore (i.e., the manipulated input variable), ζ x i is the input time-delay due to the time required for the proppant to travel from the wellbore to a particular location x i .
Remark 1. Please note the linear discrete-time state space model is good for the purpose of this study.To develop the ROM, we varied the input profile so that we can cover the entire range of operating conditions that are being considered in the oil reservoir field.Alternatively, a nonlinear ROM can be used to improve the controller performance as the governing equation is indeed a nonlinear parabolic PDE with the moving boundary.
In hydraulic fracturing, the readily available real-time measurements are limited to the fracture length and the fracture width at the wellbore, which are provided via the processed microseismic and downhole pressure data, respectively [43].To estimate the remaining important state variables such as the proppant concentration across the fracture, we designed a Kalman filter by adding the process and measurement noise to the ROM presented in Equation ( 19) as described in [8]: where v denotes the process noise, and w denotes the measurement noise.The process noise is assumed to be drawn from a zero mean multivariate normal distribution with covariance Q, and the measurement noise is assumed to be zero mean Gaussian white noise with covariance R.
The state estimator algorithm works in a two-step process: prediction and measurement update.Combining these two steps, the Kalman filter equations can be written as follows: where the operator ( •) is used to denote the estimated variables, M(t k ) is the Kalman filter gain, and P(t k ) denotes the covariance of the state estimation error.
To determine sub-optimal control policies for ADP, we employed the following MPC scheme designed by Siddhamshetty et al. [8]: where t f denotes the total treatment time, Q c is a positive definite matrix used to compute the weighted norm, t k is the current time, ∆ is the time interval between sampling times, T is the proppant concentration inside the fracture at six different locations at t = t k , 1 is a 6 × 1 vector whose elements are all ones, W 0 (t k ) and L(t k ) are the only readily available real-time measurements of the fracture width at the wellbore and the fracture length at t = t k , respectively, and C stage,k is the inlet proppant concentration (i.e., the manipulated input) corresponding to the k th time interval i.e., t ∈ [t k , t k+1 ), which can be computed by solving Equation ( 22) with a shrinking prediction horizon N p = t f − t k .
In the above optimization problem, the penalty function, Equation (22a), computes the squared deviation of the proppant concentration from the set-point at 6 different locations across the fracture at the end of pumping.At every sampling time t k , the Kalman filter of Equation ( 21) is initialized to estimate the proppant concentration Ĉ by using the real-time measurements of the fracture width at the wellbore and the fracture length.The constraint of Equation (22d) imposes limits on the concentration profiles to avoid premature termination of the hydraulic fracturing process.The constraint of Equation (22e) demands a monotonic increase in the input proppant concentration with an increment less than 4 ppga/stage, where ppga is a concentration unit used in petroleum engineering that refers to one pound of proppant added to a gallon of water.The constraint of Equation (22f) specifies the total amount of proppant to be injected.The terminal constraint of Equation (22g) employs the optimal fracture geometry, which is calculated by UFD scheme described in the preceding paragraph.
The dynamic model described in Section 3.1 was utilized to simulate the hydraulic fracturing process using the parameters listed in Table 1.In hydraulic fracturing, the characterization of rock mechanical properties is one of the key tasks that has to be performed prior to the model based controller design.This requires the availability of field data.Currently, in the field, a small-scale experiment called the mini-frac test is performed to collect preliminary data that can be used to characterize the geological properties.However, in this work, the model parameters utilized to simulate the hydraulic fracturing process are taken from literature [8,44].Specifically, we considered 48,000 kg of proppant amount to be injected during the entire hydraulic fracturing process.For this fixed amount of proppant, we employed UFD scheme to obtain the corresponding optimal fracture length L opt = 135 m and width W opt = 5.4 mm, which were used as the optimal fracture geometry constraint (Equation (22g)) in the MPC formulation that has to be satisfied at the end of pumping.Then, the target proppant concentration at the end of pumping, C target = 9.765 ppga, was calculated using Equation (18).The positive definite matrix, Q c , was considered to be a diagonal matrix with the diagonal entries equal to 100.The pad time, t p , was fixed to be 220 s and the constant flow rate of Q 0 = 0.03 m 3 /s was used after the pad time.The Kalman filter and feedback control systems were initialized at the end of pad time (i.e., t k ≥ t p ).In the closed-loop simulation, t f and ∆ were chosen to be 1220 s and 100 s, respectively.The proppant pumping schedule was divided into 10 substages and the duration of each substage was identical to ∆.At each sampling time, the controller was called and the first input, C stage,k , of the entire input profile (C stage,k , C stage,k+1 , • • • ) obtained by solving the optimization problem over a prediction horizon length of N p was applied to the dynamic model in a sample-and-hold fashion, and this procedure was repeated at each sampling time until the end of treatment.Please note that the controller performance can be improved by increasing the number of proppant pumping substages while maintaining the number of spatial locations across the fracture, at which we want to achieve the uniform proppant concentration [8].However, it is not viable to have a large number of substages, and hence, we selected 10 substages because we did not observe significant improvement after that.Remark 2. Please note that Kalman filter is applied before the MPC optimization.Specifically, at every sampling time t k , the real-time measurements of the fracture width at the wellbore, W 0 (t k ), and the fracture length, L(t k ), were obtained from the high-fidelity model, which is a virtual experiment.These measurements were then utilized by Kalman filter of Equation ( 21) to estimate the (unmeasurable) proppant concentration inside the fracture at t = t k .Finally, the inlet proppant concentration (i.e., the manipulated input), C stage,k , was obtained by solving the optimization problem with the cost function of Equation (22a) under the constraints described by Equations ( 22b) and (22d)-(22g).The data covering the relevant regions in the continuous state space was obtained from the closed-loop simulations under the above-mentioned MPC scheme by initializing the Kalman filter and the feedback control systems with different fracture width at the wellbore and fracture length values at t = t p , which were obtained by varying the flow rate Q 0 during the pad time-i.e., t ∈ [0, t p ).In this work, the pad time t p was fixed, and therefore, the fracture width at the wellbore and the fracture length at the end of pad time depended only on the flow rate Q 0 .Every different Q 0 profile during the pad time would result in a unique combination of the fracture width at the wellbore and the fracture length at the end of pad time that allows the feedback system to compute a control input profile, which is dissimilar to other profiles, if their Q 0 during the pad time is different.In each closed-loop simulation, the real-time measurements of the fracture width at the wellbore and the fracture length were taken at 10 time instants (i.e., 10 substages).At each time instant, the real-time measurement was utilized by the Kalman filter to estimate the (unmeasurable) proppant concentration inside the fracture.Each measurement can be represented as y(t k ) = [W 0 (t k ), L(t k )] T and the corresponding estimated state can be represented as T where Ĉi (t k ) is the proppant concentration inside the fracture at a specific location i at t = t k .Twenty-four closed-loop simulations were performed by initializing the Kalman filter and the feedback control systems with 24 different initial conditions, resulting in 240 real-time measurements (denoted by X measure ) and the corresponding set of 240 estimated states (denoted by X est ).

Remark 3.
In hydraulic fracturing, the real-time measurement of the proppant concentration inside the fracture is not available due to the remote subterranean location where the fracture propagates.To overcome this challenge, we used the Kalman filter to estimate the proppant concentration based on the available real-time measurements of the fracture width at the wellbore and the fracture length.Therefore, we use the term "estimated states" in Section 3 instead of "sampled states" as described in Section 2.

Initial Cost-to-Go Approximation
For all of the estimated states during the closed-loop simulations, the initial cost-to-go values (J µ 0 ) were computed as follows: where N t = 10 is the number of time instants where the proppant concentrations are estimated by the Kalman filter during each closed-loop simulation, and Φ( Ĉ(t j ), u(t j )) is given by the following equation: A function approximator, denoted as Jµ 0 ( Ĉ), was constructed to obtain the mapping between initial cost-to-go values and the estimated states (i.e., proppant concentration Ĉ) obtained during the closed-loop simulations.We used KNN (with K = 5) as a function approximator.In hydraulic fracturing, the real-time measurements readily available are limited to the fracture width at the wellbore and the fracture length.Therefore, we used only these two available measurements to determine the KNN.The function approximator works in the following way.Suppose there is a new measurement y.First, we determine the KNN of y from the set X measure , which consists of 240 real-time measurements obtained during the closed-loop simulations.Let the KNN of y be y 1 , respectively, from y. Second, we select K proppant concentration vectors Ĉ1 , Ĉ2 , • • • , ĈK from X est which were estimated by the Kalman filter utilizing y 1 , y 2 , • • • , y K during the closed-loop simulations, respectively.Then, the selected K proppant concentration vectors Ĉ1 , Ĉ2 , • • • , ĈK are used in Equation ( 25) to determine the cost-to-go at a new state Ĉ estimated by the Kalman filter utilizing the new measurement y. where The schematic diagram of the above procedure is shown in Figure 2.

Bellman Iteration
Value iteration is employed for offline cost-to-go improvement.In this approach, Bellman equation (Equation ( 3)) is solved iteratively for every estimated state in X est until convergence.At each iteration step, we calculate J i+1 for every estimated state Ĉ(t k ) by solving: where the superscript i denotes the iteration index, J i+1 is the updated cost-to-go value for Ĉ(t k ), Ji ( Ĉ(t k+1 )) is the estimate of cost-to-go value for the successor state Ĉ(t k+1 ) and J0 = Jµ 0 .
After updating the cost-to-go values for all of the estimated states in X est , we fit another function approximator to the resulting Ĉ vs. J i+1 ( Ĉ) data.
In this work, the Bellman iterations converged after six iterations with the following termination condition: 1 where N = 240 is the total number of estimated states during the closed-loop simulations.Figure 3 shows how the average absolute error, 1 In ADP, the time evolution of the system states is described by employing the ROM, developed for the MPC scheme, because of the high computational effort required to solve the high-fidelity model.Remark 4. In the ADP approach, the data covering the relevant regions in the continuous state space can be obtained by performing closed-loop simulations of the system under any known heuristics or sub-optimal control policies.The initial cost-to-go estimates do not affect the quality of the converged cost-to-go function [45,46].Therefore, the performance of the ADP-based controller does not heavily rely on the MPC design.In this work, we employed the MPC scheme developed by Siddhamshetty et al. [8] because it provides a good input profile compared to other available input profiles such as Nolte's pumping schedule.

Online Optimal Control
Once the cost-to-go iteration converges, we compute the control policy online by solving the following optimization problem: where N c = 6 represents the number of iterations required for convergence, and JN c denotes the converged cost-to-go function approximator.
The ADP-based controller is employed to achieve the uniform proppant concentration across the fracture at the end of pumping.To compare the performance of the ADP-based controller with the MPC system, we perform the closed-loop simulations under both of the control systems by initializing the Kalman filter with the fracture width at the wellbore and the fracture length values outside the "training set".We want to note that the "training set" refers to the set of values of the fracture width at the wellbore and the fracture length used to initialize the Kalman filter during the closed-loop simulations under the MPC system as described in Section 3.2.1.The profiles of the injected proppant concentration at the wellbore (i.e., the manipulated input) and proppant concentration across the fracture at the end of pumping using ADP-based controller and the MPC system are shown in Figures 4 and 5.The ADP-based controller shows an improvement over the MPC system in achieving the uniform proppant concentration across the fracture at the end of pumping.The performance of the closed-loop response can be understood based on the total cost-to-go values.Specifically, the cost-to-go values for the ADP-based controller and the MPC system are 179.66× 10 3 and 181.58 × 10 3 , respectively.Furthermore, the ADP-based controller takes less computational time than the MPC system to run a closed-loop simulation as shown in Figure 6.This is because ADP-based controller solves a single-stage optimization problem, Equation ( 29), whereas the MPC system solves a multi-stage optimization problem, Equation (22), at each sampling time.In MPC, the computational time required to solve the optimization problem keeps decreasing with time because the number of remaining stages to be considered keeps decreasing.However, in ADP, the computational time required at each sampling time is similar because ADP solves a single-stage optimization problem at every sampling time.Please note that the calculations were performed on a Dell workstation, powered by Intel(R) Core(TM) i74770 CPU@3.40GHz, running the Windows 8 operating system.We would like to highlight that the reduction in the computational time can be very beneficial to enhance the productivity of the produced wells.Specifically, despite advances in measurement techniques, it still requires interruption by experienced engineers to distinguish useful information from noise.For example, it takes about 1-3 min to post-process the microseismic data.Therefore, having a computationally efficient controller such as ADP-based controllers would compensate for the time delay due to the interruption by experienced engineers.Remark 5.In the ADP-based controller, we did not include the constraints directly but used the data from 24 closed-loop simulations which satisfied the constraints.Specifically, for every sampling time t k , we stored 24 values of the manipulated input obtained from the 24 closed-loop simulations.Let C LB k and C UB k be the minimum and the maximum value of the manipulated input at time instant t k among the 24 stored values, respectively.Finally, at every sampling time t k , Equations ( 27) and (29) were solved with C LB k and C UB k as the lower and the upper bound, respectively.

ADP-Based Control with Plant-Model Mismatch
We also studied the performance of the ADP-based controller when the rock mechanical properties are not available a priori.Specifically, we considered a plant-model mismatch in Young's modulus, E, which significantly affects the controller performance with respect to achieving uniform proppant concentration across the fracture at the end of pumping [44,47].
In this case, the data covering the relevant regions in the state space was obtained from the closed-loop simulations under the MPC system described in Section 3.2.1 by considering the following five different scenarios: (1) no variation in E; (2) 15% increase in E; (3) 5% increase in E; (4) 5% decrease in E; and (5) 15% decrease in E. For each value of E, a total of five closed-loop simulations under the MPC system of Equation (22) were performed by initializing the Kalman filter with different values of the fracture width at the wellbore and the fracture length.As a result, a total of 250 estimated states were obtained from the five scenarios.For all the estimated states, the initial cost-to-go values (J µ 0 ) were computed using Equation (23).Then, we employed KNN (with K = 5) as a function approximator to obtain the mapping between initial cost-to-go values and the estimated states as described in Section 3.2.2.This was followed by the value iteration step in which the Bellman iterations converged after five iterations with the following termination condition: where N = 250 is the total number of estimated states obtained from the five scenarios.Figure 7 shows how the average absolute error, 1 N ∑ N k=1 |J i+1 (x k ) − J i (x k )|, changes in subsequent iterations.Please note that the ROM used to run the closed-loop simulations under the MPC system for five different scenarios is developed with a fixed E, which is 0.5 × 10 10 Pa (i.e., no variation in E).Therefore, it does not capture the plant-model mismatch due to uncertainty in E. In this work, the same ROM with a fixed E is used in the ADP-based controller to demonstrate its capability of handling a plant-model mismatch in E.
To test the performance of the ADP-based controller, we performed the closed-loop simulation with the converged cost-to-go function by considering 10% decrease in E and by initializing the Kalman filter with the fracture width at the wellbore and the fracture length values outside the "training set".Note that the "training set" refers to the set of values of the fracture width at the wellbore and the fracture length used to initialize the Kalman filter during the closed-loop simulations under the MPC system for five scenarios.The profiles of the injected proppant concentration at the wellbore (i.e., the manipulated input) and proppant concentration across the fracture at the end of pumping using the ADP-based controller are shown in Figures 8 and 9.It can be observed in Figure 9 that the ADP-based controller is able to achieve uniform proppant concentration across the fracture at the end of pumping.In other words, it effectively handled the plant-model mismatch in E. Please note that, even with the same rock formation, the performance of hydraulic fracturing can be significantly different [48,49].This variability can, in part, be attributed to the spatially varying rock mechanical properties such as the Young's modulus, E. Therefore, having a controller such as the ADP-based control framework that can handle the plant-model mismatch in E can play a crucial role in enhancing the productivity of the produced well.

Conclusions
In this work, we developed an ADP-based strategy to regulate the proppant concentration across the fracture at the end of pumping in hydraulic fracturing.First, we performed the closed-loop simulations under the standard MPC to obtain measurable outputs, which were then used to estimate unmeasurable states by the Kalman filter, and to initialize the cost-to-go function approximation in ADP.Second, we employed the value iteration for the cost-to-go improvement by performing iterations of the Bellman equation for all the estimated states during the closed-loop simulations under the standard MPC.Lastly, the ADP-based controller with the converged cost-to-go function approximator was employed for the design of a feedback control system to achieve uniform proppant concentration across the fracture at the end of pumping.The generated pumping schedule using the proposed ADP-based control framework was able to produce a uniform proppant concentration that was closer to the target concentration than the pumping schedule generated by the standard MPC.The ADP-based controller was able to generate an online pumping schedule at a fraction of the time required for the standard MPC as we only had to solve the single-stage optimization problem at each sampling time.Furthermore, the ADP-based controller was able to handle the plant-model mismatch in the Young's modulus of a rock formation.Therefore, this method holds promise to control the hydraulic fracturing process by handling uncertainties in the important geological properties.

Figure 2 .
Figure 2. Schematic flow diagram to determine the cost-to-go at a new state Ĉ estimated by the Kalman filter using a new measurement y.

Figure 3 .
Figure 3. Profile of the average absolute error with iteration.

Figure 4 .Figure 5 .
Figure 4. Comparison of the pumping schedule generated using the ADP-based controller and the MPC system.

Figure 6 .
Figure 6.Comparison of the computation time to solve the optimization problem at each sampling time using the ADP-based controller and the MPC system.
the vector of output variables, W 0 (t k ) is the fracture width at the wellbore, L(t k ) is the fracture length, C(x 1 , t k ), • • • , C(x 6 , t k ) are the proppant concentrations at six different locations across the fracture, and

Table 1 .
Model parameters used for the simulation.