Generalized Extreme Value Statistics, Physical Scaling and Forecasts of Oil Production in the Bakken Shale

: We aim to replace the current industry-standard empirical forecasts of oil production from hydrofractured horizontal wells in shales with a statistically and physically robust, accurate and precise method of matching historic well performance and predicting well production for up to two more decades. Our Bakken oil forecasting method extends the previous work on predicting ﬁeldwide gas production in the Barnett shale and merges it with our new scaling of oil production in the Bakken. We ﬁrst divide the existing 14,678 horizontal oil wells in the Bakken into 12 static samples in which reservoir quality and completion technologies are similar. For each sample, we use a purely data-driven non-parametric approach to arrive at an appropriate generalized extreme value (GEV) distribution of oil production from that sample’s dynamic well cohorts with at least 1, 2, 3, . . . years on production. From these well cohorts, we stitch together the P 50 , P 10 , and P 90 statistical well prototypes for each sample. These statistical well prototypes are conditioned by well attrition, hydrofracture deterioration, pressure interference, well interference, progress in technology, and so forth. So far, there has been no physical scaling. Now we ﬁt the parameters of our physical scaling model to the statistical well prototypes, and obtain a smooth extrapolation of oil production that is mechanistic, and not just a decline curve. At late times, we add radial inﬂow from the outside. By calculating the number of potential wells per square mile of each Bakken region (core and noncore), and scheduling future drilling programs, we stack up the extended well prototypes to obtain the plausible forecasts of oil production in the Bakken. We predict that Bakken will ultimately produce 5 billion barrels of oil from the existing wells, with the possible addition of 2 and 6 billion barrels from core and noncore areas, respectively.


Introduction
Over the last decade, crude oil and gas production from hydrofractured shales in the United States has accounted for most of the net increase of global crude oil production. Therefore, it is important to have a reliable, quantitative method for delineating the possible futures oil and gas production in the data-rich US shales. The current industry-standard methods of forecasting production from shales are variants of the empirical decline curve analysis (DCA) [1], developed 75 years ago. Lately, some of the more sophisticated methods, for example, Fractional Decline Curve [2] have become popular.
Unlike other analytical and numerical methods that require numerous reservoir parameters and a lengthy calculation or simulation time, DCA only requires production data to predict future production by extrapolating oil or gas rate observed in a boundary-dominated flow regime. Because of DCA's In this paper, we adopt a hybrid data-driven and physics-based method of predicting oil or gas production in shales that has been introduced in our previous work [21,22]. Here, we consider only black oil production. First, we identify play regions in which reservoir quality is similar, see Figure 1.
In each region, we identify well classes by different completion technologies. Finally, a well class in a region constitutes a well sample. We ensure that oil production from all wells in each sample is statistically uniform, that is, has a unimodal distribution. For each well sample, we then identify well cohorts with at least 1, 2, . . . years on production. In general, well cohorts contain different sets of wells that satisfy the minimum time on production required for each cohort. It turns out that each cohort of wells is superbly characterized by its unique Generalized Extreme Value (GEV) distribution (see Appendix A) of annualized well rates or cumulative well production. Different cohorts in the same sample have different GEV distributions, each with its unique expected value, median and mode. Here we choose the somewhat better GEV fits of the production rate distributions. Each GEV distribution is statistically superior to the corresponding log-normal distribution at the 95% confidence level. When we plot the expected values of the GEV distributions of all wells cohorts in a sample versus elapsed time of production, we obtain this sample's average P 50 statistical well prototype that is purely field data-driven. Now we fit each statistical well prototype with a physical scaling curve that extends this prototype to 30 years on production. The physical scaling curves are based on an analytical solution of the pressure diffusion equation in the hydrofractured horizontal shale well geometry. In previous papers, we comprehensively detailed the physical scaling solutions for shale gas wells [23,24] and shale oil wells [22,25]. Late-time flow from outer reservoir encompassing the stimulated reservoir volume (SRV) [26] was also quantified. We have verified that our physical scaling is equivalent to detailed numerical reservoir simulations [27]. This scaling is much simpler to set up and runs almost infinitely faster than the corresponding reservoir simulations.
At this point, for each well sample in every play region, we have obtained a unique hybrid mean (P 50 ) well prototype with 30 years on production. In each play region, we know how many wells were drilled and completed each month up until current time. We then multiply each well prototype by the number of wells completed per month and stack them up to represent the total historical field rate and future production decline. In this manner, we obtain a 'base' case forecast for all existing wells. This base case forecast is a 'do nothing' scenario with no new wells drilled in the future. For all other forecasts of future field production rate, we first determine the infill potential or the number of wells that can be drilled in the future without causing significant interwell pressure interference (fracture hits). We cover each region with a fish-net grid that consists of one-square-mile pixels. We then calculate the infill potential as the number of wells that can be drilled so that the total number of wells in each pixel is less than the maximum allowable number of wells without fracture hits. Next, based on the infill potentials for all regions, we create future drilling programs to obtain plausible forecasts of oil or gas production. Based on current rig count in the Bakken, we assume a constant overall drilling rate. Finally, we assign the correct well prototype to every future well that will be drilled during each month of a postulated drilling schedule, and sum them up to obtain a forecast scenario.
In this paper, we select the Bakken shale, the current second-largest oil producer in the U.S. with 1.5 million bbl of oil per day, as an illustration. Being one of the oldest shale oil plays, Bakken has been a field laboratory to test drilling and completion technologies and increase well productivity. Currently, Bakken has ∼15,000 active hydrofractured horizontal wells with a few wells that have 18 years of production data. In a previous paper [22], we scaled well-by-well all ∼15,000 wells in the Bakken. We accounted for well refracturing and/or changes in downhole pressure. It turns out that the 12 well prototypes obtained with our hybrid GEV-physical scaling method are as good in duplicating the total field rate as the super-precise scaling of each individual well in our previous work [22]. Given the results of our analyses that are free of bias, policy-makers should not assume that the production boom in the Bakken shale will last decades longer.

Results
Our approach is as follows: • Divide all 14,678 horizontal oil wells in the Bakken shale into 12 samples in which oil production is statistically uniform; • Fit a generalized extreme value distribution to all wells in every sample and obtain 12 stable mean P 50 well prototypes; • Fit the physics-based scaling curves to every statistical well prototype and extend these prototypes smoothly to 30 years on production; • Replace oil production from all existing wells with the 12 extended well prototypes and obtain a 'base' forecast; • Calculate the infill potential for each of the 12 regions in the Bakken; and • Create the plausible infill drilling schedules and forecast total field oil production rate up to the year 2050.  Figure 2. Maps of all 14,678 active horizontal wells completed in the Middle Bakken formation (a) and Three Forks formation (b) colored by maximum daily oil rate. The red lines define the core areas for each reservoir and delineate best producing wells with more than 750 barrels of oil per day. The blue lines define the effective areas for drilling neglecting a few poor producing wells outside. We define the noncore areas as the difference between the effective and core areas. In the Middle Bakken, there are currently 5732 and 4128 wells located in the core and noncore areas, respectively. The other 2672 and 2146 wells are located in the core and noncore areas of Three Forks, respectively.

Design of Well Regions
In the previous work [21], we divided all 13,141 horizontal wells in the Barnett shale into 6 static regions that corresponded to the geographical borders of 6 major gas-producing counties. We justified this approach as follows: (i) Productive Barnett shale is a single gas reservoir; (ii) the Barnett play area is relatively small, and the division into counties captures different reservoir qualities; and (iii) completion technologies have not been revolutionized over time.
However, in the Bakken shale, we introduce a more complicated set of 12 static well samples classified by reservoir, geography, and completion dates. In Bakken, there are two different producing reservoirs, the Middle Bakken and Three Forks. Accordingly, we first split all Bakken wells into the Middle Bakken and Three Forks groups. To account for different reservoir quality, we then divide each of these two well groups into two macro regions: the core and noncore areas, see Figure 2. Finally, we further refine the four Bakken macro regions by splitting them into three classes by completion date intervals, 2000-2012, 2013-2016 and 2017-2019. These classes reflect advancements in well completion technology, such as longer wells, more fracture stages, fewer clusters and perforation shots, bigger hydrofractures, more proppant, and so forth. In the end, we divide the existing 14,678 horizontal oil wells in the Bakken into 12 unique samples listed in Table 1. Figure 3. The procedure of arriving at the Generalized Extreme Value (GEV) well prototypes in a given shale play. (a) Define static regions in which oil production is statistically uniform. (b) For each region, gather the dynamic cohorts of wells with ≥i years on production, i = 1, 2, 3, . . . . (c) Fit GEV probability density function (PDF) to each well cohort. (d) From the corresponding cumulative distribution function (CDF) pick the P 10 , P 50 , and P 90 values for each cohort. (e) Construct time-lapse P 50 well prototype for each region, by connecting all P 50 values of all cohorts. (f) Time-shift and superpose the GEV well prototypes to match past production. Reproduced from our previous work [21].

Statistical Well Prototypes
For each of the 12 Bakken well samples defined in Section 2.1, we construct a statistical well prototype based on the Generalized Extreme Value (GEV, see Appendix A) mean or the P 50 values of annual rates from all well cohorts that exist in that particular region. Figure 3 illustrates the procedure of arriving at the GEV well prototypes. For every GEV distribution fit we acquire values of the location parameter (µ), scale parameter (σ), shape parameter (ξ), confidence interval (CI), mean (P 50 ), median, mode, P 10 and P 90 . In this paper, we show only two GEV fit examples (Figures 4 and 5). Please see Supporting Online Materials-1 to find all GEV fits for all 12 regions of the Bakken shale. The resulting statistical well prototypes are plotted versus the square root of time on production in Figure 6. In the initial year on production, for some well cohorts, time was shifted by subtracting the first 1-3 months on production (see the second column of Table 2) to discount all possible initial disturbances in well performance. After these time shifts, each mean or P 50 well prototype starts from a straight line versus the square root of time on production, as expected in linear flow regime. Later, cumulative oil production bends down due to the inter-hydrofracture pressure interference and exponential decline of production rate [23]. For our GEV distribution fits, the mean or P 50 prototype is always higher than the median, and the median is higher than the mode. The P 10 and P 90 prototypes diverge from the mean with time, indicating that with time the best wells grow better and worst wells worse. A similar trend was also observed in the Barnett shale [21].  (c) GEV CDF with the 95% CI on the residual for the P 10 well.

Physical Scaling Fits
For each region, we fit the statistical P 50 well prototype with a physical scaling curve, see Appendix B, to extend this prototype to 30 years on production. The fitting procedure is detailed in the Materials and Methods section. Briefly, by least squares fit, we find optimum values of the scaling parameters τ and M so that the physics-based well prototypes match the statistical well prototypes. The results are shown in Figure 6. The red solid lines show the most pessimistic versions of each physics-based prototype, which assume that oil is produced only from the interior of the stimulated reservoir volume (SRV). The green solid lines show a more realistic forecast with an assumption that at late times there will be additional exterior flow of a √ t towards the SRV. Finally, one obtains the Estimated Ultimate Recovery (EUR) from each region by identifying the endpoint of cumulative production for the physics-scaling prototype. In Table 2, the scaling parameters, τ, M and a, and EUR values are listed for each region with and without exterior flow. Notice that in general the newest wells have the shortest pressure interference times, τ, and fastest decline rates.  . . ,15 years on production. The dashed lines labeled P 10 and P 90 denote wells whose cumulative production is exceeded by 10% and 90% of wells in each region. The red and green lines are the physics-based scaling curves that match each average well, respectively, with and without exterior flow during late time production. In general, ultimate oil recovery from the core areas is higher than that from the noncore ones. The Middle Bakken wells are slightly more productive than the Three Forks wells. The newly completed wells have much higher initial oil production, but they decline faster, resulting in more or less the same ultimate recovery as older wells. 165 We have replaced actual field production rate from all existing wells in each region with 166 its corresponding well prototypes multiplied by the numbers of actually completed wells and . . , 15 years on production. The dashed lines labeled P 10 and P 90 denote wells whose cumulative production is exceeded by 10% and 90% of wells in each region. The red and green lines are the physics-based scaling curves that match each average well, respectively, with and without exterior flow during late time production. In general, ultimate oil recovery from the core areas is higher than that from the noncore ones. The Middle Bakken wells are slightly more productive than the Three Forks wells. The newly completed wells have much higher initial oil production but they decline faster, resulting in more or less the same ultimate recovery as older wells.

Base or 'Do Nothing' Forecasts
We have replaced actual field production rate from all existing wells in each region with this region's well prototype multiplied by the historic numbers of completed wells, and we time-shifted the products. The superposed prototypes match the historical field rate. Figure 7 shows the sum of the well prototypes with and without exterior flows (green and red lines) versus the total field historical oil rate and cumulative oil (black lines). The results are rather satisfactory. Our GEV prototypes are robust. They capture all peaks of total historical oil rate and flawlessly match the cumulative production. As stated before, the red line forecast is pessimistic. It assumes that there will be no production contribution from the reservoir exterior to the SRVs of individual wells. This pessimistic case may hold if a shale play, here Bakken, is already overdrilled. For the Bakken shale, we can infer that the 14,678 existing wells will ultimately produce 4.5-5.3 billion of oil by 2050. This is a 'base' or 'do nothing' case with no future drilling in the Bakken shale.
(a) (b) Figure 7. The actual and forecasted total field rate (a) and cumulative oil (b) in the Bakken shale. Total field cumulative production curves were obtained by stacking calendar-shifted average wells. Total production rates were obtained by differencing cumulative production. The red and green curves are the physical scaling forecasts with and without exterior flow, while the black line shows the historical production from the existing 14,678 wells. The red physical scaling gives the lower bound of EUR estimate with about 4.5 billion bbl by 2050. Assuming reasonable exterior flow, EUR prediction becomes slightly more than 5 billion bbl.

Infill Potentials
We have calculated the number of wells that can be drilled in the future in every one-square-mile pixel of the grid that covers the entire Bakken play. In order to calculate infill potentials, one should first determine well density. However, the publicly available data rarely provide information about the bottomhole locations of the wells. Instead, only surface locations are reported as latitude-longitude coordinates. Therefore, we have developed an algorithm that allows us to predict the bottomhole well density from surface well locations. See Figure 8 and the Materials and Methods section for more details. As mentioned before in Section 2.1, the Bakken shale has two reservoirs, the Middle Bakken and Three Forks and two significantly different reservoir qualities, the core and noncore areas. Thus the infill potentials should be calculated for these four parts of the Bakken. Supporting Online Materials-2 shows the gridding, well locations, wellhead densities, well densities and infill potentials for the four Bakken parts. Table 3 lists the numbers of wells that can potentially be drilled in the future. We note in passing that infilling the less productive noncore areas to the same well density as the core areas is likely too optimistic. The algorithm is as follows: (1) For each well, record its lateral length and calculate the number of squares, n, intercepted by the lateral. For example, a 5000 ft lateral will occupy one square and a 10,000 ft lateral will occupy two squares because one mile is 5280 ft. (2) Search for the least occupied n squares in all possible directions (i.e., north, northeast, east, southeast, south, southwest, west and northwest).
(3) Increase the value of well density by 1 well/mi 2 for every least dense square found. (4) Repeat the process until all wells in the area of interest are exhausted. (e) Calculate an infill potential map by subtracting the calculated well density from the maximum number of wells, N max (e.g., N max = 4 wells/mi 2 to avoid frac hits). The summation of all values in the map is the infill potential for the one-square-mile grid. In this case, we obtain 3650 wells. As most wells have 10,000 ft laterals and occupy two one-square-mile squares, we divide 3650 by 2 to obtain 1825 infill potential wells in the Three Forks core area.

Future Drilling Scenarios and Infill Forecasts
We have created several future drilling scenarios based on infill potentials listed in Table 3. We assume that the rig count in the Bakken will not significantly change from the current value and 120 wells will be drilled each month between now and 2041 (Figure 9a). By looking at the current trend, many operators have narrowed their drilling choices to only the core areas to avoid spending money on the less productive wells with high watercuts typical of the noncore areas [22]. Thus, it is reasonable to create a drilling schedule that exhausts all potential wells in the core areas first, before moving out to the less productive noncore areas. Figure 9b shows this scenario predicting that the Bakken core area will be completely drilled out by 2022. Later, the drilling in noncore areas can last up to 2041, according to our calculation. See Supporting Online Materials-2 for the more detailed schedules.
(a) (b) Figure 9. (a) The number of drilled wells, completed wells and rig count per month for the Bakken play from the U.S. Energy Information Administration (EIA). The drilled and completed wells are almost the same for each month and are strongly correlated with the number of rigs available. These data reveal an increase in drilling efficiency. In 2015, one rig could drill about 1.2 wells per month, while in 2019 so far, one rig could drill two wells per month. The current drilling rate is constant at about 120 wells/month. (b) A future drilling scenario for the Bakken region up to 2041. The plan is to continue at the current drilling rate of 120 wells/month for both core and noncore areas. The numbers of wells to be infilled in each region were previously calculated using the procedure detailed in Figure 8. The results show that the core areas will be fully drilled by mid 2022, leaving the less productive noncore areas for infilling until 2041.
Next, we assign each new well scheduled to be drilled in the future to its corresponding region and class prototype. The results are shown in Figure 10. The black solid lines are the historical total field oil rate and cumulative oil. The purple lines denote the 'base' forecast from all 14,678 existing wells displayed in Section 2.4. The red lines show the result of adding 4402 new wells in the core area. Because all wells in the core area are high-grade wells, the total oil production rate climbs up its highest production peak at 1.6 million bbl/d in the year 2022. After 2022, there is no further drilling in the core area and the production rate declines by a factor of two within one year. Ultimately, the total of 19,080 wells (existing + new core) will produce 7 billion bbl of oil by the year 2050. The blue lines show the most optimistic case with not only the core areas but also the noncore areas will be drilled out by the operators. The drilling of less productive wells in the noncore areas will maintain the plateau oil rate of 1 million bbl/d up to the year 2041. However, this plateau will require the drilling of additional 26,500 wells (almost twice the current number of wells) which will have high water cuts. Ultimately, the total of 45,580 wells (existing + core + noncore) will produce 13 billion bbl of oil by 2050.
(a) (b) Figure 10. The forecasted total field rate (a) and cumulative oil (b) for Bakken based on the drilling scenario in Figure 9b. Since we plan to infill the core areas first, the field oil rate will reach the all-time production peak of about 1.6 million bbl/d by mid 2022, leaving no 'sweet spots' in the core areas. The continuous infill of the less productive wells in the noncore areas will decline the production to a plateau of 1 million bbl/d. After 2041, no more drilling locations will be left in the Bakken region and the oil rate will steeply decline to half a million bbl/d by 2042 and 0.2 million bbl/d by 2050. Ultimately, the 14,898 existing wells will give an EUR of 5 billion bbl. By adding 4402 wells in the core areas, EUR will increase to 7 billion bbl. By adding another 26,512 wells in the noncore areas, EUR will increase to 13 billion bbl.

Discussion
We have presented an alternative to the current industry-standard empirical forecasts of oil production from hydrofractured horizontal wells in shales. With our hybrid modeling approach, we have matched current oil production in the Bakken rather accurately. We have also delivered an optimal prediction of possible futures of the Bakken shale play for up to three decades.
Our Bakken oil forecasting method extends the previous work on predicting fieldwide gas production in the Barnett shale [21] and merges it with our new scaling of oil production in the Bakken [22]. Our field data-driven statistical well prototypes are conditioned by well attrition, hydrofracture deterioration, pressure interference, well interference, progress in technology, and so forth. With no physical scaling, these prototypes follow the exact physics of linear transient oil flow with pressure interference. Therefore these statistical well prototypes serve as templates to calibrate the parameters of our physical scaling model (τ and M) [23,24] and obtain a smooth time-extrapolation of oil production that is mechanistic and not merely an empirical decline curve. At late times, we add to our extended prototypes some radial inflow from the outside of well SRVs [26].
The extended P 50 well prototypes in Figure 6 can be used to compare ultimate recovery in each of the static regions we have identified in the Bakken. The lower bounds are the extended P 50 prototypes without exterior flow (red lines). In most cases, wells completed in the upper Three Forks reservoir are somewhat less productive than those in the Middle Bakken reservoir. The reasons for this difference are: (1) higher water saturation and water cut in Three Forks, (2) faster decline rate (lower τ) and (3) lower initial oil in place (lower M) [22]. This difference is consistent with the stratigraphic column of the Bakken total petroleum system, where Middle Bakken is sandwiched between two world-class source rocks with 10% TOC, the Upper Bakken Shale and the Lower Bakken Shale. On the other hand, the Three Forks formation is below the Lower Bakken Shale member and is exposed to water-bearing formations beneath [3,[28][29][30]. For the same reasons, the core and effective drilling areas in the Three Forks are smaller than those in the Middle Bakken.
In both reservoirs, production from the core areas is superior to that from the noncore areas. The core area located in the center of Williston Basin has been known as the most oil-prolific location in the Bakken region in North Dakota [3,[28][29][30]. Since the 1950s, oil has been produced there in the thickest, naturally fractured Middle Bakken formation in the Nesson anticline. One inexpensive vertical well drilled in the core area in the 1950s has had ultimate recovery of 200 kbbl, the same as a $10 million hydrofractured horizontal well drilled in the Middle Bakken noncore area nowadays. The noncore areas are less productive because the three Bakken formations: Upper, Middle and Lower Bakken are pinching out upward (are thinner and less mature) near the edges of the Williston basin. Consequently, the noncore areas are producing more water than oil, with watercut exceeding 50% on average.
The newly completed wells have much higher initial oil rates than the older ones, because they have: (1) longer lateral lengths, (2) bigger hydraulic fractures and (3) more fracture stages [22,[31][32][33][34]. However, the newly completed wells decline faster and have essentially the same ultimate recovery as the older wells. The reasons for this behavior have been described elsewhere [35]. Interestingly, in most cases, older wells completed in 2000-2012 have higher ultimate recovery than newer ones, even though their initial production rates are lower. These older wells might have been drilled in the best locations ever in the Bakken region. In addition, shorter lateral lengths and fewer fracture stages may help in maintaining a stable pressure drawdown and prevent reservoir degassing that is unfavorable for future production. For comparison, the average lateral length in 2005 was 5000 ft while the average lateral length in 2019 doubled to 10,000 ft. Historically, the number of hydrofracture stages in the Bakken has increased over time from 8 stages in 2007 [36] to 18 stages in 2009 [31], 35 stages in 2016 [32] and to as many as 60 stages in 2019 [33].
According to our records, more than 90% of the wells completed after 2017 are located in the core areas only. Operators have learned to drill only the best parts of the Williston Basin and avoid the less mature noncore areas. However, after calculating the infill potentials of all areas, we predict that by 2021 there will be no well locations left for future drilling in the core areas. Assuming a constant current drilling rate of 120 wells per month, the total field oil rate in the Bakken will reach record level of about 1.6 million bbl/d in 2021. Without further drilling, production will decline by one-half within a year. Later, operators will be forced to drill in the less productive, high watercut noncore areas along the edges of the Williston Basin. Our findings suggest that policy-makers should not assume that the shale oil boom in the Bakken will last for several decades longer. We recommend that operators not focus only on increasing the initial oil rate. Maintenance of reservoir pressure above the bubble point by preventing over-drilling is key to increasing ultimate oil recovery.

Materials and Methods
We mined public well data in the Bakken region from DrillingInfo database, the Montana Department of Natural Resources and Conservation-Board of Oil and Gas Conservation website and the North Dakota Department of Mineral Resources-Oil and Gas Division website. From DrillingInfo, we selected only 14,678 horizontal oil wells that were completed in the Middle Bakken and Three Forks formations after January 2000. We designed an integrated MATLAB R software package to perform data cleanup, consolidation and unit conversions. The clean production data for each well consist of a vector of elapsed times on production, vector of oil and water production rates and records of wellhead location (latitude and longitude), lateral length, completion date, reservoir description and maximum oil rate. The average reservoir and well lateral properties are listed in Appendix C. Our approach can be divided into six main steps: Define play regions in which oil production is statistically uniform. In each region, follow the steps below: (a) Divide all wells in a play into i = 1, 2, . . . , n well groups, where n is the number of reservoirs. In the Bakken play, for example, there are two main reservoirs. Thus, at this step, we identify two groups of wells, and n = 2 in the Bakken shale.
Further subdivide each of these well groups into j i , i = 1, 2, . . . , n, areas with different reservoir qualities. For example, in the Bakken play, the center of the basin has the thickest oil-prolific layer and hence the highest oil production [3,22,[28][29][30]. Thus, we have delineated an area at the center of the basin (with maximum oil rate >750 bopd) as core area and the rest as noncore area, see Figure 2 for more details. At this step, we have four well groups that fall into four distinct, static play regions, In Bakken, j 1 = 2 and j 2 = 2, that is, there are two reservoir qualities (core and noncore) for each of the reservoirs (Middle Bakken and Three Forks). (c)

Reservoir Region Sample Size
Subdivide wells in each of the four regions (two reservoirs × two reservoir qualities each) by time interval classes that encompass significant changes in well completion technology. In the Bakken play, for instance, the newly completed wells have longer lateral lengths, bigger hydraulic fractures and more fracture stages [22,[31][32][33][34].  Table 1. In Bakken, each of the 4 play regions has 3 completion classes, finally yielding 12 static well samples.

II.
For each well sample, obtain a P 50 well prototype by fitting a Generalized Extreme Value (GEV) distribution to all qualifying sample wells as follows: (a) From a given static well sample-k (a region further subdivided by completion dates), consider a dynamic cohort l k that contains all wells that have at least l k years on production (l k = 1, 2, . . . , t max k and k = 1, 2, . . . , N sample ). For example: (i) There are 2550 wells in Middle Bakken noncore [2000][2001][2002][2003][2004][2005][2006][2007][2008][2009][2010][2011][2012] group. However, there are only 2540 wells with at least one year on production (the other 10 wells have production records with less than 12 months). Thus, we retain these 2540 wells as cohort-1 of this particular group, see Figure  4. (ii) There are 428 wells in Three Forks core [2000][2001][2002][2003][2004][2005][2006][2007][2008][2009][2010][2011][2012] group. But, only 197 wells have production records of at least seven years. As such, these 197 wells are qualified as cohort-7 of this particular well group, see Figure 5. For detailed GEV fits for all dynamic well cohorts and static well groups in the Bakken play, see Supporting Online Materials-1. For Bakken, N sample = 12.
Define a set X l k that takes values of oil production rate (kbbl/year) from all wells with at least l k years on production in sample k.

(c)
For every X l k , fit a Generalized Extreme Value (GEV) distribution using Equation (A1) and obtain the location parameter, µ, scale parameter, σ and shape parameter, ξ. (All of these parameters are further subscripted by l k but we will skip this complication in the notation.) (d) Calculatex l k as the GEV mean of X l k for each year-l k , l k = 1, 2, . . . , t max k , using Equation (A3). (e) For each well sample k, obtain the P 50 k well prototype. We can use the same procedure to obtain other statistic parameters for each well cohort. The P 10 k and P 90 k values can be calculated as the ninetieth and tenth percentiles of X l k , respectively. The median and mode can be calculated using Equations (A5) and (A6). by connecting GEV mean values of all years l k = 1, 2, . . . , t max k P 50 k = (x 1 ,x 2 , . . . ,x t max k ), k = 1, 2, . . . , N sample , kbbl/year. III.
Extend P 50 k (we will subsequently skip the implied subscript k) well prototypes by fitting to them physical scaling curves as follows:  Tables A1 and A2.
The matching values of τ and M for two both scaling curves and for all well samples are detailed in Table 2.
Get the extended cumulative mass produced,m (ktons) by multiplying the fitted M and RF(t/τ) from the master curve where t is now calculated each month The benefit of matching P 50 with a physics-based scaling curve is that we can interpolate and extrapolate production data precisely. Thus, we can change time intervals from years to months and forecast production decades into the future. We recommend to use monthly intervals for precise forecasts that is, t = 1 12 , 2 12 , . . . , 50 years. (d) Obtain the extended P 50 well prototypes for well sample k by differencingm converted to volumeP 50 = ∆m ∆t(ρ o,ST + R s ρ g,ST ) , kbbl/month IV.
Obtain base forecast of total field oil production for existing wells as follows: (a) Create a calendar date series with monthly intervals for example, (1/1/2000, 2/1/2000, . . . , 12/1/2050) and assign N dates as the length of this series.
Create an empty vector s with the length of N dates : (s = [s 1 s 2 · · · s N dates ] = [0 · · · 0]). (c) For each well, find index ii in the calendar date series that brackets well completion date. (d) For each well, add to vector s its correspondingP 50 right-shifted by ii.
(e) Calculate total field oil rate, q and total field cumulative oil, Q for existing wells as follows: q = s/(365.25/12) × 10 −3 million bbl of oil per day (4) V. Calculate infill potential as follows: (a) Create a one-square-mile fishnet inside the boundary of each area defined in step I(b).
Search all existing wells located inside the boundary.
(c) Calculate wellhead density by counting the number of wells on each of the one-square-mile squares. (d) Calculate approximate well density following algorithm below.
i. For each well, calculate the number of squares, n, intercepted by the lateral For example, a 5000 ft lateral occupies one square, a 9000 ft lateral occupies two squares and a 14,000 ft lateral occupies three squares because one mile is 5280 ft. ii.
Search for the least-occupied n squares in all possible directions. iii.
Increase the value of well density by 1 well/mi 2 for every least dense n square found. iv.
Repeat points i-iii until all wells in the area of interest are exhausted.
(e) Calculate infill potential by subtracting the calculated well density from the maximum allowable number of wells that still avoid frac hits. For example, in the Bakken play, the tip-to-tip hydraulic fracture length is roughly 1200 ft. 5280 ft/1200 ft ≈ 4. Therefore, the maximum allowable number of wells to avoid frac hits is 4 wells per square mile. Suppose that at some location, the well density already is 3 wells per square mile. Then, the infill potential is 4 − 3 = 1 well per square mile.

VI.
Obtain an infill forecast of total field oil production after adding future wells.
(a) Create monthly drilling schedules for every infill potential and assume a constant drilling rate based on current rig availability, see Figure 9b and SOM-2.
For every drilling schedule, find index ii in the calendar date series that brackets infill schedule date. Let N f be the number of wells to be drilled on that date. (e) For every drilling schedule, add to vector s f its correspondingP 50 after right-shifting it by ii and multiplying by N f .
(f) Calculate total field oil rate, q f and total field cumulative oil, Q f after infill drilling as follows q f = q + s f /(365.25/12) × 10 −3 million bbl of oil per day (7) where q and Q are total field rate and field cumulative from existing wells calculated before with Equations (4) and (5).

Conclusions
• We have provided a transparent hybrid method of forecasting oil production at shale basin scale.

•
Our statistical approach generates the non-parametric well prototype templates that are used to calibrate our physics-based flow scaling with late-time radial inflow. • In particular, our average P 50 well prototypes follow the physics of linear transient flow and are used to calibrate the physics-based scaling extensions to 30 years on production. • A combination of GEV statistics with physical scaling matches historical production data almost perfectly and gives a smooth, physics-based estimate of future production.
• Our prediction of the Bakken future is optimal in the least square sense [37][38][39][40]; in other words, our prediction is as good as it gets given all data at hand and first order physics of oil recovery in the Bakken shale.

•
Regulators may want to consider our approach as a prerequisite to booking reserves in oil shales.

•
Newly completed wells have almost the same ultimate recovery as the older ones, despite their much higher initial oil rates.

•
Ultimately, we predict that the 14,678 existing wells in the Bakken will produce 5 billion bbl of oil by 2050 (∼340 kbbl/well).

•
After drilling additional 4400 new wells at the rate of 120 wells/month, the core area of the Bakken will be drilled out by 2021 and ultimate recovery will be 7 billion barrels of oil.

•
With 26,500 more wells to be drilled in the noncore area until 2041, ultimate recovery in the Bakken might be 13 billion barrels of oil but drilling of such scale is unlikely to happen. • Policy-makers should not of assume that oil boom in the Bakken shale will last decades longer.

Acknowledgments:
The authors thank the Ali I. Al-Naimi Petroleum Engineering Research Center (ANPERC) at KAUST for supporting this research. We thank the reviewers for their thorough, informative and timely reviews.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. The Generalized Extreme Value (GEV) Distributions
The Generalized Extreme Value (GEV) distribution [41] combines three "Extreme Value" distributions, Weibull [42], Gumbel [43] and Frechet [44] into a single functional form. The data decide which of the three distributions is appropriate. In our case, Fréchet distribution almost always wins.
The probability density function (PDF) of GEV distribution contains three parameters: location, µ; scale, σ; and shape, ξ If ξ is zero, Gumbel distribution results, followed by Weibull distribution, if ξ is negative and by Fréchet distribution if ξ is positive ( Figure A1).
Integrating the GEV PDF, one obtains the cumulative distribution function (CDF) The expected value (mean) of GEV distribution is defined as and the variance is where g k = Γ(1 − kξ) and γ is Euler's constant. Notice that for ξ ≥ 1, the expected value becomes infinite and for ξ ≥ 1 2 the variance goes to infinity. Figure A1. Examples of three GEV distributions with the location parameter, µ = 0, the scale parameter, σ = 1. and three values of the shape parameter i.e., ξ < 0 (Weibull), ξ = 0 (Gumbel) and ξ > 0 (Fréchet).
The median of GEV distribution is defined as and the mode is

Appendix B. Physical Scaling Approach to Forecasting Oil Production in Hydrofractured Shales
The physical scaling approach used in this paper was first derived by Patzek et al. to predict gas production from thousands of hydrofractured horizontal gas wells in the Barnett shale [23,24]. Later, Patzek et al. derived the scaling curve solution for oil production in the Eagle Ford shale [25].
Eftekhari et al. extended the model to include gas inflow from outside of SRVs for the Barnett shale wells [26].

Appendix B.1. Physical Scaling without Exterior Flow
Patzek et al. [25] derived the solution of oil production inside SRV based on a simple model illustrated in Figure A2a. This model assumes bi-linear flow towards hydraulic fracture planes inside SRV with the volume H × 2d × 2L. Briefly, by solving a one-dimensional nonlinear pressure diffusion equation analytically, we obtain a master curve equation for the interior oil production problem wheret is the dimensionless time defined as the ratio between elapsed time on production, t and the characteristic time of pressure diffusion, τ.t Here τ is determined by the onset of pressure interference between two hydrofractures 2d apart and α i is the constant hydraulic diffusivity at initial reservoir conditions  Figure A2. (a) Schematic of bi-linear flow towards hydraulic fractures in a shale well inside the SRV with volume H × 2d × 2L (interior problem). The early radial flow is neglected because the hydrofracture permeability is much higher than that of the matrix. Reproduced from Patzek et al. [25] (b) Illustration of physical scaling approach of the interior oil production problem. Reproduced from Saputra et al. [27] .
The fractional oil recovery factor, RF, is the ratio between the cumulative oil mass production, m and the stimulated mass contained in the SRV, M The physical scaling approach is as follows. By adjusting τ and M, minimize the square of the difference of MRF(t) minus the cumulative mass produced with Equation (A13). A least square fit procedure was used. Figure A2b shows the master curve and scaled cumulative mass produced as the black and red lines.

Appendix B.2. Physical Scaling with Exterior Flow
Eftekhari et al. [26] solved numerically the exterior gas production problem in the Barnett shale. We later extended the same concept to oil shales and proposed an analytical solution of this problem. Figure A3a illustrates interior flow towards the hydraulic fractures inside SRV and exterior flow from the outer reservoir towards SRV. We write a new master curve equation for both interior and exterior flow as follows RF T (t) = RF I (t) + RF E (t) where RF I (t) is the master curve equation for interior oil production problem (Equation (A7)). Eftekhari et al. [26] defined the recovery factor of the exterior problem RF E as a function of the new scaled time,t = t/τ where m E (t ) is the cumulative exterior production and M is the stimulated mass inside SRV defined in Equation (A12) Here τ is the characteristic time scale for exterior production, defined as the time it takes low pressure to diffuse from the wellbore over a distance equal to fracture half-length L Because oil flow from the exterior reservoir is another one-dimensional pressure diffusion problem, one can assume the square root of time solution [26] with a constant slope c 1 Because both the interior and exterior reservoir likely contain similar shale matrix with similar reservoir properties, the constant c 1 can be assumed to be the slope of the interior master curve versus square-root of time (Equation (A7) and Figure A2b), which is constant at early times, say, 0 <t * < 0.2.
RF E can be written as a function of the scaled time for the interior production by substituting Equations (A17) and (A18): where a is defined as Recalling Equation (A14), we obtain a new master curve equation that embodies interior and exterior oil flow as follows Figure A3. (a) Illustration of interior flow towards the hydraulic fractures of a shale well inside SRV and exterior flow from the reservoir beyond SRV. Adapted from Patzek et al. [25] and Eftekhari et al. [26] (b) Illustration of physical scaling approach of the interior and exterior oil production problem. Adapted from Saputra et al. [27] The physical scaling approach for this problem is as follows. By adjusting τ and M, minimize the square of the difference of MRFT(t) minus the cumulative mass produced with Equation (A13). A least square fit procedure was used. Figure A3b shows the scaled cumulative mass as the red line and the new master curve as the black line that deviates from the interior master curve in Equation (A7) (shown as the gray line) at late times on production.