Article

Wellhead Choke Performance for Multiphase Flowback: A Data-Driven Investigation on Shale Gas Wells

1
School of Energy Resources, China University of Geosciences (Beijing), Beijing 100083, China
2
Beijing Key Laboratory of Unconventional Natural Gas Geological Evaluation and Development Engineering, Beijing 100083, China
3
Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 2R3, Canada
*
Author to whom correspondence should be addressed.
Energies 2025, 18(16), 4381; https://doi.org/10.3390/en18164381
Submission received: 9 July 2025 / Revised: 4 August 2025 / Accepted: 14 August 2025 / Published: 17 August 2025

Abstract

Wellhead choke performance is critical for flowback choke-size management in unconventional gas wells. Most existing empirical correlations were originally developed for oil and gas flow, and their accuracy for gas/water multiphase flowback remains uncertain. This study presents a data-driven approach to examine the choke–performance relationship during multiphase flowback. We compiled a flowback dataset containing 18,660 surface measurements from 37 shale gas wells in the Horn River Basin. Using machine learning, we modeled choke performance based on flowback features including water rate, gas/water ratio, wellhead and separator pressures and temperatures, and choke size. The models achieved strong predictive accuracy. Based on the machine learning results, we developed a new choke–performance correlation tailored to multiphase flowback. This model was validated against field data and showed reliable performance. The findings provide a useful tool for optimizing choke-size strategies during flowback in hydraulically fractured gas wells, especially in unconventional reservoirs.

1. Introduction

Gas production from shale/tight gas reservoirs has advanced significantly in recent decades. Hydraulic fracturing is one of the primary techniques for unlocking resources from these low-permeability reservoirs. After hydraulic fracturing, wells undergo a flowback process for fracture cleanup, facilitating gas production [1,2,3,4,5,6,7,8]. Flowback operations commonly involve frequent choke-size adjustments, and flowback rates and pressure are highly sensitive to these adjustments [4,5,9]. Recent studies have demonstrated that flowback choke-size management is critical for preventing proppant return [10,11,12] and securing long-term production from unconventional gas wells [13,14,15,16,17,18]. Effective choke-size management thus requires a reliable performance model to capture the relationships between rates, pressure, and choke size.
Theoretical modeling of choke flow is challenging for multiphase flowback because rates and pressure can change dramatically within hours to days. Also, the flow regime transitions quickly from water-dominated flow in the early stages to gas-dominated flow later [4,5,9,11]. Multiphase flow through a choke is theoretically linked to the Reynolds number, which depends on fluid density, viscosity, and flow velocity [19,20,21,22]. However, the critical Reynolds number for multiphase flowback through chokes remains unclear. Also, studies have reported large variations in flowback water salinity and temperature [23,24], which cause large fluctuations in fluid density and Reynolds number, adding complexity to choke-performance modeling.
Empirical correlations have traditionally been developed to describe multiphase flow through a choke, particularly for gas and oil production [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]. Flowback studies have employed these empirical correlations to describe water and gas flow through a choke [41,42,43]. However, these correlations were originally developed for steady-state oil and gas flow. The accuracy of these correlations and their coefficients in capturing the multiphase flow of water and gas through a choke during flowback remains uncertain.
Empirical correlations of multiphase flow through a choke apply within certain ranges of gas/liquid ratio and flow conditions [21,26,28]. For example, the Gilbert-type correlation assumes critical flow, where the flow velocity at the choke reaches the speed of sound [44]. Under critical flow, the multiphase flow rates are independent of downstream pressure, so the Gilbert-type correlation requires only upstream pressure. However, it remains uncertain whether the sonic velocity conditions are met during flowback, and how downstream pressure affects the accuracy of choke-performance predictions.
A recent flowback study [42] employed the Perkins correlation [34], which incorporates the upstream and downstream pressure to describe multiphase flowback through chokes. However, the robustness of the Perkins correlation for multiphase flowback is limited by the small datasets used for calibration. Also, measurements of downstream pressure are often unavailable during flowback, restricting the broader application of the Perkins model in flowback choke-size management. Beyond pressure, factors such as variations in water salinity and wellhead temperature further complicate accurate choke–performance correlation.
Machine learning (ML) provides an alternative for tackling engineering problems where physical models are incomplete and relationships between variables are highly nonlinear. Numerous studies have applied ML algorithms to model multiphase flow through chokes [45,46,47,48,49,50,51,52], primarily targeting oil and gas flow. ML techniques have also been applied to predict gas and condensate flow behavior through chokes [48]. However, limited work has explored the use of ML to model the multiphase flowback of water and gas through chokes, leaving a gap in understanding flowback dynamics.
In this study, we present an ML investigation to establish the choke–performance correlation for multiphase flowback. We compiled a dataset containing 18,660 flowback records of 37 shale gas wells from the Horn River Basin. We trained and tested ML algorithms on the flowback dataset to capture the relationship between choke size, pressure, and flow rates. We also evaluated the contribution of variables to choke performance. Based on our findings, we developed a simplified empirical correlation to describe gas/water multiphase flow during flowback.

2. Methodology

We conducted this work through the following five key steps: (a) We collected and preprocessed multiphase flowback data from shale gas wells. (b) We assessed the traditional empirical correlations by fitting them to the collected flowback data. (c) We conducted correlation analysis of flowback features to identify potential patterns. (d) We implemented machine learning (ML) algorithms on flowback data to link flowback measurements with choke size, and identified the most important features for the choke–performance relationship. (e) We developed three new choke–performance relationships for multiphase flowback data through multivariate correlation of key features, and compared them with the Gilbert-type relationship.

2.1. Data Collection and Preprocessing

In this study, we compiled a flowback dataset from 37 shale gas wells in the Horn River Basin. The flowback dataset includes 18,660 surface measurements of water rate ($q_w$), gas rate ($q_g$), choke size ($Choke$), wellhead pressure ($P_{wh}$), separator pressure ($P_{se}$), wellhead temperature ($T_{wh}$), separator temperature ($T_{se}$), and salinity. We preprocessed the data to ensure quality and consistency. A percentile-based method was used to identify and remove outliers, with the 5th and 95th percentiles as thresholds. Records with missing values were excluded to prevent bias in the analysis of choke performance.
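The preprocessing described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the use of pandas, the function name `preprocess`, the column names, the per-column application of the 5th/95th percentile band, and the order of the two steps are all assumptions, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def preprocess(df, cols, lower_q=0.05, upper_q=0.95):
    """Drop rows with missing values, then rows outside the percentile band."""
    clean = df.dropna(subset=cols)
    lower = clean[cols].quantile(lower_q)   # per-column lower threshold
    upper = clean[cols].quantile(upper_q)   # per-column upper threshold
    inside = ((clean[cols] >= lower) & (clean[cols] <= upper)).all(axis=1)
    return clean.loc[inside].reset_index(drop=True)


# Synthetic stand-in for the field records (all values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "q_w": rng.lognormal(mean=3.0, sigma=1.0, size=1000),
    "P_wh": rng.normal(loc=20.0, scale=5.0, size=1000),
})
demo.loc[::50, "P_wh"] = np.nan            # inject some missing values
cleaned = preprocess(demo, ["q_w", "P_wh"])
print(len(demo), "->", len(cleaned))
```

Applying the band to every column at once removes a record if any single feature is extreme; a per-feature or per-well treatment would be an equally plausible reading of the text.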

2.2. Empirical Choke–Performance Correlations

In this study, we examined the accuracy of traditional empirical choke–performance correlations for multiphase flowback data by linear regression fitting. Table 1 lists the traditional empirical correlations, together with coefficients for each correlation. We tested empirical choke–performance correlations on flowback data via the following steps: first, we converted the flowback measurements to field units; second, we input the converted wellhead pressure, gas/water ratio, and choke size into empirical formulas, and calculated the water rate using the coefficients listed in Table 1; and third, we conducted linear fitting on the calculated and measured water rate.
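As a rough illustration of these three steps, the sketch below evaluates a Gilbert-type correlation in field units and then fits a line between calculated and measured rates. It is an assumption-laden example: the coefficients (10, 0.546, 1.89) are the classic Gilbert values rather than entries copied from Table 1, water is substituted for oil as the liquid phase, the gauge/absolute pressure distinction is ignored, and the function names and input values are hypothetical.

```python
import numpy as np

MPA_TO_PSI = 145.038   # 1 MPa in psi
M3_TO_BBL = 6.2898     # 1 m3 in barrels
M3_TO_SCF = 35.3147    # 1 m3 in cubic feet


def gilbert_rate_bbl_d(p_wh_mpa, choke_mm, gwr_m3_m3, a=10.0, b=0.546, c=1.89):
    """Liquid rate (bbl/d) from P_wh = a * q * R**b / S**c, solved for q."""
    p_psi = np.asarray(p_wh_mpa, dtype=float) * MPA_TO_PSI
    s_64 = np.asarray(choke_mm, dtype=float) / 25.4 * 64.0        # 64ths of an inch
    r_mscf_bbl = np.asarray(gwr_m3_m3, dtype=float) * M3_TO_SCF / M3_TO_BBL / 1000.0
    return p_psi * s_64**c / (a * r_mscf_bbl**b)


def linear_fit_r2(calculated, measured):
    """Least-squares line measured = slope * calculated + intercept, with R^2."""
    calculated = np.asarray(calculated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    slope, intercept = np.polyfit(calculated, measured, 1)
    fitted = slope * calculated + intercept
    ss_res = np.sum((measured - fitted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up records: wellhead pressure (MPa), choke (mm), GWR (m3/m3), rate (m3/d).
p_wh = np.array([25.0, 22.0, 18.0, 15.0])
choke = np.array([7.938, 12.7, 19.05, 25.4])
gwr = np.array([200.0, 500.0, 1500.0, 4000.0])
q_measured_bbl_d = np.array([300.0, 250.0, 120.0, 60.0]) * M3_TO_BBL

q_calc = gilbert_rate_bbl_d(p_wh, choke, gwr)
slope, intercept, r2 = linear_fit_r2(q_calc, q_measured_bbl_d)
print(slope, intercept, r2)
```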

2.3. Data-Driven Investigation of Multiphase Flowback Through Choke

As illustrated in Figure 1, we investigate the multiphase flowback through choke via the following key procedures:
(1) Correlation Analysis for Feature Selection: We began by analyzing the relationships among flowback variables to select the most relevant features for machine learning. Pearson, Spearman, and Kendall correlation coefficients [56,57,58] were used to evaluate the strength and consistency of these relationships.
(2) Training and Testing Data Split: Next, we split the dataset into training and testing sets. To determine the optimal ratio, we evaluated five scenarios in which the training fraction ranged from 50% to 90%.
(3) Testing Machine Learning Algorithms: We implemented eight ML algorithms in Python 3.10.7 on the flowback dataset, and assessed their performance in describing the choke performance. The ML algorithms include Multiple Linear Regression (MLR), Artificial Neural Network (ANN), Random Forest (RF), Support Vector Machines with both linear and radial basis functions (SVM-Linear and SVM-Radial), and three versions of Extreme Gradient Boosting (XGBDART, XGBLinear, and XGBTree) [59,60]. We chose these ML algorithms mainly because of their wide application in data-driven studies in the oil and gas industry. Each ML algorithm was run 100 times to reduce the effects of randomness in training/testing splits.
In this study, we used the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination ($R^2$) [61] to evaluate the predictive performance of the eight ML algorithms. MAE, RMSE, and $R^2$ are defined by
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{i,\mathrm{measured}} - y_{i,\mathrm{predicted}}\right| \tag{1}$$
where $N$ represents the number of recorded data; $y_{i,\mathrm{measured}}$ is the actual measured value of the wellhead data recorded during flowback; and $y_{i,\mathrm{predicted}}$ is the predicted value.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,\mathrm{measured}} - y_{i,\mathrm{predicted}}\right)^{2}} \tag{2}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_{i,\mathrm{measured}} - y_{i,\mathrm{predicted}}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i,\mathrm{measured}} - y_{i,\mathrm{average}}\right)^{2}} \tag{3}$$
where $y_{i,\mathrm{average}}$ is the mean of the data recorded.
(4) Feature Importance Analysis: We determined the feature importance of flowback measurements using the outputs from ML algorithms. Also, we designed 5 scenarios with different combinations of flowback features to determine the optimum set of flowback features for describing the choke–performance relationship.
(5) Developing the New Choke–Performance Relationship: We established the new choke–performance relationship by considering the key features determined by the feature importance analysis. We also obtained the coefficients for the choke–performance relationship by fitting the relationship to real flowback measurement data. We further conducted a comparative analysis to obtain the recommended choke-size ranges for the new choke–performance relationship.
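The three performance metrics defined in step (3) translate directly into a few lines of NumPy. The sketch below is only a plain transcription of those definitions; the function names and the three-point example are illustrative and not taken from the study.

```python
import numpy as np


def mae(measured, predicted):
    """Mean absolute error."""
    m, p = np.asarray(measured, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(m - p)))


def rmse(measured, predicted):
    """Root mean squared error."""
    m, p = np.asarray(measured, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((m - p) ** 2)))


def r_squared(measured, predicted):
    """Coefficient of determination relative to the mean of the measurements."""
    m, p = np.asarray(measured, dtype=float), np.asarray(predicted, dtype=float)
    ss_res = np.sum((m - p) ** 2)
    ss_tot = np.sum((m - m.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mae(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```

For this toy case the residuals are (0, 0, -1), so MAE is 1/3, RMSE is the square root of 1/3, and $R^2$ is 1 - 1/2 = 0.5.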

3. Field Application with Well/Flowback Information

In this section, we describe the field and well information for the application. We report the flowback behaviors of typical shale gas wells, and the statistical results of the flowback features of the target wells.

3.1. Field and Well Information

In this study, we collected flowback data from 37 shale gas wells completed in the Horn River Basin (more details about the field and target wells can be found in our previous studies [13,62]). The shale gas wells are multi-fractured horizontal wells. After hydraulic fracturing, the wells remained shut in for several months before flowback. Tubing strings were installed at the end of flowback, and thus water and gas mainly flowed through the casing during flowback. In this study, we treat the casing pressure data as the wellhead pressure for flowback.

3.2. Flowback Operations and Surface Measurements of Typical Well

Figure 2 shows the typical flowback profiles of rates, pressure, choke size, and salinity data for a multi-fractured horizontal well completed in the Horn River Shale. The flowback data were reported hourly for target wells over a period of 7–30 days (more details on the flowback measurements can be found in previous studies [9,13,63,64,65]). Flowback generally begins with a relatively small size of choke, and the choke is gradually opened up to accelerate fracture cleanup. Each choke setting typically lasts for several hours to a few days. Most changes in choke size occur early in the flowback period, with the choke eventually stabilizing at a larger size during later stages.
Overall, the flowback profiles show declining trends in wellhead and separator pressure. Water rate initially increases as the choke is enlarged, and then gradually decreases over the flowback period. The decreasing water rate is attributed to pressure depletion and multiphase flow effects within fractures [13,66]. Gas rate follows a pattern similar to that of water rate. Fluctuations in rates and pressure are primarily linked to the choke-size adjustments during flowback. Previous studies [5,67] have reported a “V-shape” behavior in flowback gas/water ratio.
The flowback profiles show pronounced responses of pressure and rates to the change in choke size during early flowback. As shown in Figure 2, a slight increase in choke size leads to a dramatic drop in pressure and increase in rates at early flowback. Comparatively, the flowback rates and pressure become less sensitive to choke-size changes when flowback profiles are stabilized.
Wellhead temperature initially increases, and then declines slightly during flowback. The increase in wellhead temperature is mainly caused by thermal recovery, in which the fracturing water is warmed by the formation rock during the extended shut-in period after hydraulic fracturing [24]. The subsequent decline in wellhead temperature is mainly related to decreasing pressure.
Flowback salinity increases steadily over time and generally stabilizes after several hundred hours. The salinity trend reflects an increase in fluid density, and is thought to result from the mixing of injected fracturing fluid with in situ formation water. This trend has also been used as an indicator of fracture complexity in previous studies [23,68].

3.3. Statistical Analysis of Flowback Surface Measurements

Figure 3 shows the histograms of 18,660 flowback measurements, including $q_w$, GWR, $Choke$, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$, and salinity data (see Table 2 for the statistical results of each flowback measurement).
Most wells undergo hundreds of choke-size adjustments over flowback. The wells started flowback with a small-sized choke at 4.762 to 7.938 mm, and ended at the choke of 63.5 mm within 20 days of flowback (see Table A1 and Figure A1 in Appendix A). Figure 3 plots the distribution of choke size for flowback. The choke size generally varies from 4.762 mm to 50.8 mm (choke sizes greater than 50.8 mm were excluded from analysis due to the limited number of corresponding measurements). The choke-size data generally follow a normal distribution with a mean value of 19.3 mm (standard deviation: 6.2 mm).
Figure 3 shows asymmetric histograms for water rate, gas rate, and wellhead pressure. Water rate data follow a lognormal distribution with a long tail, suggesting a wide range across the dataset. Most wells were opened for flowback with a water rate of hundreds of cubic meters per day, which declined to just a few cubic meters per day within one to two weeks. A harmonic trend has been reported on water rate decline for shale gas wells [13,66]. The large variation in water rate is largely attributed to differences in fracturing treatment size and choke settings during early flowback.

4. Results and Discussions

4.1. Empirical Formula Fitting

Table 3 summarizes the fitting performance ($R^2$) of five empirical formulas [26,27,28,54,55] applied to the flowback dataset. The five empirical formulas were originally developed to model gas and oil flow through chokes. Each formula was fitted to the flowback measurements, and the corresponding coefficients (a, b, c, and d) are also reported in Table 3.
Overall, the results show an $R^2$ below 0.333, indicating that the empirical formulas are not suitable for capturing the complex relationship between flow rates, pressure, and choke size during multiphase flowback of water and gas.

4.2. Engineering Characteristics

In Figure 4, we plot the correlation matrix between flowback parameters ($N$ = 18,660), including $q_w$, GWR, $Choke$, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$, and salinity. We used the Pearson, Spearman, and Kendall correlation coefficients ($r$, $\rho_s$, and $\tau$) to measure the correlation between flowback parameters. Table 4 lists the correlation coefficients between $q_w$ and the remaining seven flowback features.
The signs of $r$, $\rho_s$, and $\tau$ for rates, pressure, and choke size (Table 4) generally align with the trends described by the empirical formulas. Water rate shows a positive correlation with choke size and a negative correlation with gas/water ratio. As shown in Table 4, the results indicate a strong dependence of water rate on GWR, $T_{wh}$, $T_{se}$, and salinity.
Overall, the results highlight strong correlations between choke size and several flowback variables, including $P_{wh}$, $T_{wh}$, $T_{se}$, and GWR. High inter-correlation among flowback measurements suggests strong interdependence between these parameters. Based on this analysis, we selected $Choke$, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$, salinity, and GWR as input features for machine learning.
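For readers who want to reproduce this kind of correlation screening, the three coefficients are available in SciPy. The sketch below is illustrative only: the use of `scipy.stats` is an assumption about tooling, the helper `correlation_triplet` is hypothetical, and the data are a synthetic monotonic but nonlinear relationship chosen to show how the rank-based coefficients differ from Pearson's $r$.

```python
import numpy as np
from scipy import stats


def correlation_triplet(x, y):
    """Return Pearson r, Spearman rho_s, and Kendall tau for two 1-D arrays."""
    r, _ = stats.pearsonr(x, y)
    rho_s, _ = stats.spearmanr(x, y)
    tau, _ = stats.kendalltau(x, y)
    return float(r), float(rho_s), float(tau)


# Synthetic example: y grows monotonically but nonlinearly with x.
x = np.linspace(0.1, 5.0, 50)
y = np.exp(x)
r, rho_s, tau = correlation_triplet(x, y)
print(r, rho_s, tau)
```

Because Spearman and Kendall depend only on ranks, both equal 1 for any strictly increasing relationship, while Pearson's $r$ stays below 1 when the relationship is not linear. Comparing the three is therefore a quick check for monotonic but nonlinear dependence between flowback features.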

4.3. Comparing the Machine Learning Algorithms

Figure 5 compares the ML-predicted and measured water rates during flowback. We evaluated eight ML algorithms: MLR, ANN, RF, RBF-SVM (SVM-Radial), LBF-SVM (SVM-Linear), XGBDART, XGBLinear, and XGBTree (refer to Appendix C for the hyperparameter settings of each ML algorithm). We split the flowback data into training and testing data at a 70:30 ratio. We ran each algorithm 100 times to reduce the impact of random variation in data splitting.
Overall, RF, XGBTree, XGBDART, and ANN achieve higher prediction accuracy than the other four algorithms. Among them, RF consistently outperforms all the others on both training and testing datasets. Comparatively, RBF-SVM, MLR, LBF-SVM, and XGBLinear produce lower accuracy. MLR and LBF-SVM may be limited in capturing the nonlinear relationships between water rate, pressure, and choke size.
In Figure 6, we compare the performance metrics ($R^2$, MAE, and RMSE) of the eight ML algorithms on the testing data. Among the eight algorithms, RF achieves the highest $R^2$ and the lowest MAE and RMSE. Also, the boxplots show a relatively small variation in $R^2$, MAE, and RMSE across 100 runs. We thus recommend RF as a reliable ML algorithm for characterizing multiphase flowback behaviors through chokes.
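The repeated random-split evaluation can be sketched as below for the Random Forest case. This is a hedged illustration, not the study's implementation: scikit-learn is an assumed library, the hyperparameters are defaults rather than the Appendix C settings, the helper name `repeated_split_scores` is hypothetical, and the data are synthetic. The demo uses 5 runs and small forests only to keep it fast, whereas the study reports 100 runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def repeated_split_scores(X, y, n_runs=100, test_size=0.30, n_estimators=100):
    """Return an (n_runs, 3) array of test-set R^2, MAE, and RMSE."""
    scores = []
    for seed in range(n_runs):
        # A new random 70:30 split and a new forest for every run.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        scores.append((r2_score(y_te, pred),
                       mean_absolute_error(y_te, pred),
                       np.sqrt(mean_squared_error(y_te, pred))))
    return np.array(scores)


# Synthetic features and target standing in for the flowback data.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=300)

scores = repeated_split_scores(X, y, n_runs=5, n_estimators=20)
print("mean R2, MAE, RMSE:", scores.mean(axis=0))
print("std  R2, MAE, RMSE:", scores.std(axis=0))
```

Changing `test_size` (for example from 0.50 down to 0.10) gives the same loop a way to compare different training/testing split ratios.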

4.4. Feature Selection for Choke Performance

4.4.1. Feature Importance Analysis from Machine Learning

Figure 7 compares the feature importance for predicting $q_w$ using the RF, XGBTree, XGBDART, and ANN algorithms. Consistently, GWR is the most influential input across the four algorithms. Also, RF, XGBTree, and XGBDART assign high importance to wellhead/separator temperature, followed by wellhead/separator pressure. The results of ANN suggest that choke size makes a notable contribution to water rate prediction. XGBTree and XGBDART further suggest that salinity may slightly affect water rate prediction, although its influence is minor compared with other variables. These findings indicate that GWR, wellhead temperature, wellhead pressure, and choke size are key parameters influencing multiphase flowback of water and gas.
In Figure 7d, the results of ANN exhibit greater variability in feature importance across runs compared with the RF, XGBTree, and XGBDART algorithms. This suggests that feature importance estimates are more sensitive to random data splits in ANN models than in ensemble-based approaches.
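One way to obtain feature importance and its run-to-run variability from a Random Forest is sketched below. The source does not state which importance measure was used, so the impurity-based `feature_importances_` attribute of scikit-learn is an assumption here, as are the helper name, the feature labels attached to the synthetic columns, and the synthetic target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def rf_feature_importance(X, y, names, n_runs=10, n_estimators=50):
    """Mean and standard deviation of RF importances over repeated splits."""
    rows = []
    for seed in range(n_runs):
        X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.30, random_state=seed)
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(X_tr, y_tr)
        rows.append(model.feature_importances_)   # impurity-based, sums to 1
    rows = np.array(rows)
    mean_imp = dict(zip(names, rows.mean(axis=0)))
    std_imp = dict(zip(names, rows.std(axis=0)))
    return mean_imp, std_imp


# Synthetic data: the first column dominates the target by construction.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=400)

mean_imp, std_imp = rf_feature_importance(X, y, ["GWR", "T_wh", "Salinity"])
print(mean_imp)
print(std_imp)
```

The standard deviation across runs plays the role of the spread visible in the boxplots: a large value signals that the ranking depends on the particular split.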

4.4.2. Optimal Feature Combinations

Figure 8 compares the performance metrics for five scenarios using the RF algorithm with different combinations of input features (refer to Table 5 for the combination of features for each scenario). Each scenario was run 100 times to reduce the impact of randomness in training and testing splits (see Figure 9 for the corresponding results of the RF algorithm for the five scenarios with average $R^2$ values for testing data).
As shown in Figure 8, the average $R^2$ generally increases from Scenario 1 to Scenario 5, while RMSE and MAE generally decrease. The addition of separator pressure in Scenario 2 led to a notable reduction in prediction errors, highlighting its importance in characterizing choke performance. These findings support the use of wellhead and separator pressures as practical substitutes for upstream and downstream pressures in flowback analysis.
Comparing Scenarios 2, 3, and 4 highlights that including wellhead and separator temperatures further enhanced model accuracy. However, Scenario 4 achieved the highest average $R^2$ and the lowest RMSE and MAE among the five scenarios, suggesting that the effects of salinity are insignificant for describing the choke–performance relationship during flowback. Therefore, we selected $Choke$, GWR, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$, and $q_w$ for establishing the choke–performance relationship.

4.5. The Effects of Training/Testing Split Ratio

Figure 10 compares the boxplots of $R^2$, RMSE, and MAE for 100 runs using training/testing split ratios in which the training fraction ranges from 50% to 90%. The RF algorithm and the features from Scenario 4 were used for the comparative analysis. Table 6 lists the statistical results of 100 runs for each split ratio.
We observe a general improvement in predictive performance with a higher training fraction. As the split ratio increases, the average $R^2$ rises slightly, while RMSE and MAE decline. The highest accuracy was achieved at a 90% split, suggesting that a larger training set enhances model performance. However, the gains become marginal as the ratio increases.
Figure 10 shows that the variability of prediction results increases with the split ratio. As shown by the widening boxplots in Figure 10, the range of $R^2$, RMSE, and MAE grows from a 50% to a 90% split. Table 6 confirms this trend through the corresponding maximum and minimum values. The greater spread implies reduced model stability at a higher split ratio, which may result from increased noise in the training data or the limited sample size of the test set. To balance accuracy and stability, we select a 70/30 training/testing split for the subsequent machine learning investigations.

5. Establishing the New Choke–Performance Relationship

In this section, we introduce three new choke–performance relationships and compare their fitting accuracy against the traditional Gilbert-type formula.

5.1. Choke–Performance Relationships

Equation (4) describes the Gilbert-type choke–performance relationship, which requires four coefficients (a, b, c, and d). As described by Equations (5)–(7), we propose three new types of choke–performance relationships by incorporating the effects of separator pressure and temperatures at both the wellhead and separator (see the results of feature analysis in Section 4.4). The new choke–performance relationships require six coefficients (a, b, c, d, e, and f), and the coefficients are determined by multivariate least squares fitting.
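$$q_w = a\,P_{wh}^{b}\,Choke^{c}\,GWR^{d} \tag{4}$$
$$q_w = a\,P_{wh}^{b}\,Choke^{c}\,\left(P_{wh} - P_{se}\right)^{e}\,GWR^{d} \tag{5}$$
$$q_w = a\,P_{wh}^{b}\,Choke^{c}\,\left(\frac{T_{wh}}{T_{se}}\right)^{e}\,GWR^{d} \tag{6}$$
$$q_w = a\,P_{wh}^{b}\,Choke^{c}\,\left(P_{wh} - P_{se}\right)^{e}\,\left(\frac{T_{wh}}{T_{se}}\right)^{f}\,GWR^{d} \tag{7}$$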
Here, $q_w$ represents the water production rate, m³/d; $Choke$ denotes the size of the choke valve, mm; $P_{wh}$ indicates the wellhead pressure, MPa; $P_{se}$ represents the pressure in the separator, MPa; $T_{wh}$ signifies the wellhead temperature, K; $T_{se}$ is the temperature in the separator, K; GWR is the gas/water ratio, m³/m³; and a, b, c, d, e, and f are empirical coefficients obtained by multivariate least squares fitting to flowback measurements.
In Table 7, we compare $R^2$ and the coefficients for the four types of choke–performance relationships with and without the condition $P_{se}/P_{wh} < 0.5$ (choke sizes with fewer than 100 measurements were excluded from the fitting). Overall, we observe an improved fitting accuracy by including $T_{wh}$, suggesting the positive influence of $T_{wh}$ on the model.
Comparing the fitting results of Equations (4)–(7) shows that incorporating temperature and separator pressure slightly increases $R^2$. By contrast, comparing Equations (6) and (7) shows that adding separator pressure provides limited further benefit. Also, a high fitting accuracy is reached by Equation (6). Consequently, we employ Equation (6) for subsequent analyses, considering the fitting accuracy under the two pressure ratio conditions.
Interestingly, we reach a relatively higher fitting accuracy after screening the flowback data by $P_{se}/P_{wh} < 0.5$. The results generally align with earlier studies which suggest that the Gilbert-type model performs best under critical flow conditions (i.e., when the downstream-to-upstream pressure ratio is below 0.5). However, we recognize the mismatch between the actual downstream pressure and the separator pressure used in our model. Future studies should examine the critical ratio of $P_{se}/P_{wh}$ for the choke-performance relationships.
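A power-law relationship of this kind can be fitted by ordinary least squares after taking logarithms. The sketch below illustrates that idea for Equation (6) together with the $P_{se}/P_{wh} < 0.5$ screening. Several points are assumptions rather than facts from the study: that the temperature term enters as the ratio $T_{wh}/T_{se}$, that the fit is performed in log space, the helper name `fit_eq6`, and every coefficient and data value, which are synthetic and chosen only so the fit can be checked against known inputs.

```python
import numpy as np


def fit_eq6(q_w, p_wh, choke, t_wh, t_se, gwr):
    """Log-linear least squares for q_w = a * P_wh^b * Choke^c * (T_wh/T_se)^e * GWR^d."""
    q_w, p_wh, choke, t_wh, t_se, gwr = (
        np.asarray(v, dtype=float) for v in (q_w, p_wh, choke, t_wh, t_se, gwr))
    # Columns: intercept ln(a), then the exponents b, c, d, e.
    A = np.column_stack([np.ones_like(q_w), np.log(p_wh), np.log(choke),
                         np.log(gwr), np.log(t_wh / t_se)])
    coef, *_ = np.linalg.lstsq(A, np.log(q_w), rcond=None)
    pred = np.exp(A @ coef)
    r2 = 1.0 - np.sum((q_w - pred) ** 2) / np.sum((q_w - q_w.mean()) ** 2)
    params = {"a": float(np.exp(coef[0])), "b": float(coef[1]), "c": float(coef[2]),
              "d": float(coef[3]), "e": float(coef[4])}
    return params, float(r2)


# Synthetic, noise-free data generated from made-up coefficients.
rng = np.random.default_rng(1)
n = 500
p_wh = rng.uniform(5.0, 30.0, n)      # MPa
p_se = rng.uniform(1.0, 10.0, n)      # MPa
choke = rng.uniform(17.0, 35.0, n)    # mm
gwr = rng.uniform(50.0, 5000.0, n)    # m3/m3
t_wh = rng.uniform(290.0, 340.0, n)   # K
t_se = rng.uniform(280.0, 320.0, n)   # K
true = {"a": 2.0, "b": 0.8, "c": 1.5, "d": -0.6, "e": 3.0}
q_w = (true["a"] * p_wh ** true["b"] * choke ** true["c"]
       * (t_wh / t_se) ** true["e"] * gwr ** true["d"])

# Keep only records that satisfy the pressure-ratio screening.
critical = p_se / p_wh < 0.5
params, r2 = fit_eq6(q_w[critical], p_wh[critical], choke[critical],
                     t_wh[critical], t_se[critical], gwr[critical])
print(params, r2)
```

Fitting in log space minimizes relative rather than absolute errors, so it weights low and high water rates more evenly than a direct nonlinear fit would. Which of the two the reported coefficients correspond to is not stated in the text.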

5.2. The Effects of Choke-Size Range

In Figure 11a, we plot the fitting results of $R^2$ for each choke size (7.938 to 50.800 mm) based on the proposed choke-performance model (Equation (6)). We excluded choke sizes with fewer than 20 data points, and also excluded the choke size of 63.5 mm due to high variation in GWR for the comparative analysis.
As shown in Figure 11a, $R^2$ generally increases as the choke size increases from 7.938 to 19.844 mm, aligning with the increasing number of measurements (see Figure 11a below). The results indicate that larger datasets improve the reliability of the fitted coefficients.
In general, the proposed model performs best for choke sizes between 17.462 and 34.925 mm. In this range, $R^2$ values typically exceed 0.9 (Figure 12). However, exceptions exist at choke sizes of 30 mm and 32.544 mm, which show a lower accuracy (see Figure A2 in Appendix B) primarily due to smaller datasets ($N = 64$ and $N = 444$, respectively). We therefore recommend applying the proposed model and coefficients specifically within the 17.462–34.925 mm range.
In Figure 11a, the results show a decreasing trend in $R^2$ for choke sizes greater than 30.162 mm. The reduced accuracy at large choke sizes is partially attributed to fewer available measurements. Also, Figure 11b shows a relatively larger variation in GWR for choke sizes greater than 30.162 mm, which may further reduce model accuracy.
By applying a natural logarithm to Equation (6), the proposed model becomes conceptually similar to a multiple linear regression. However, the results of the multiple linear regression indicate limited predictive performance at extremely high water rates (refer to Figure 5). The proposed model is thus expected to perform more reliably under stabilized water rate conditions during flowback.
Figure 11c compares $R^2$ with GWR across all the choke sizes. Excluding low-GWR values from the dataset generally reduces $R^2$, suggesting that the proposed model is more effective under low-GWR conditions. However, the limited data for large choke sizes also contribute to the reduced accuracy. Overall, the proposed choke-performance model performs satisfactorily in characterizing multiphase flowback within the recommended choke-size range. Future work is encouraged to validate the model with more flowback data, particularly for larger choke sizes and a wider range of GWR conditions.

6. Significance, Limitations, and Future Works

Our data-driven study demonstrated that ML models are a valuable tool for characterizing multiphase flowback through wellhead chokes. Also, the simplified empirical choke-performance model we proposed provides reliable predictions under multiphase flowback conditions. The ML and empirical models offer tools that help monitor the choke conditions [69] during flowback. The models also offer a practical way to estimate flared gas volumes [70,71,72,73] during early flowback, using common field data such as water rate, choke size, and wellhead pressure. While the results are promising, this study is subject to several limitations that suggest opportunities for future research, as outlined below.
We selected the flowback data from 37 Horn River wells mainly because of the abundance of hourly flowback measurements. The approach itself is not tied to a specific field or basin, but the findings need further validation wherever flowback data from other fields are available.
We acknowledge the potential for data leakage due to the random sampling approach used to split the training and testing datasets across 37 wells. The random sampling method may allow neighboring data points from the same well and day to appear in both training and testing sets, which may artificially boost model performance. To address this concern, we conducted an additional test using a well-by-well data split. Specifically, we used flowback data from 36 wells for training and reserved one well for testing, and the results suggested that the model maintained a high predictive accuracy (see two case studies in Appendix D). The results suggest that the data-leakage effects are likely minimal. Future studies are recommended to employ time-series ML methods for multiphase choke flowback data, which may better capture temporal patterns and reduce the risk of data leakage related to sampling across time.
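A well-by-well split of the kind described above can be expressed with a grouped cross-validation splitter, which guarantees that no well contributes rows to both the training and the testing set. The sketch below is a hedged illustration: scikit-learn's `LeaveOneGroupOut` is an assumed tool, the helper name and well identifiers are hypothetical, the data are synthetic, and four wells stand in for the 37 in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_well_out_r2(X, y, well_ids, n_estimators=50):
    """Train on all wells but one, test on the held-out well; repeat per well."""
    scores = {}
    splitter = LeaveOneGroupOut()
    for train_idx, test_idx in splitter.split(X, y, groups=well_ids):
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        held_out_well = int(well_ids[test_idx][0])
        scores[held_out_well] = r2_score(y[test_idx], model.predict(X[test_idx]))
    return scores


# Synthetic data: 4 hypothetical wells with 60 records each.
rng = np.random.default_rng(0)
well_ids = np.repeat(np.arange(4), 60)
X = rng.uniform(0.0, 1.0, size=(240, 3))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=240)

scores = leave_one_well_out_r2(X, y, well_ids)
print(scores)
```

Comparing these per-well scores with the scores from purely random splits gives a direct estimate of how much the random split may inflate apparent accuracy.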
This study used a standard set of ML algorithms for data-driven investigation on the problem of multiphase flowback through chokes. We chose these algorithms based on their proven effectiveness in petroleum-engineering applications. Using multiple algorithms also allowed us to identify the most suitable one for our flowback data and ensure strong predictive performance. Future studies are recommended to explore more advanced ML techniques to further enhance model accuracy and generalization.

7. Summary and Conclusions

In this study, we compiled a multiphase flowback dataset of 18,660 measurements from 37 shale gas wells in the Horn River Basin. We applied eight machine learning algorithms to the flowback dataset to predict the choke performance, and identified the key features for describing the choke–performance relationship. We further established a new choke–performance relationship for multiphase flowback based on the results of machine learning. A set of coefficients for the new choke–performance relationship is recommended for certain ranges of choke size during flowback. The key conclusions are summarized as follows:
(1) In this study, the empirical choke–performance correlations and coefficients traditionally developed for oil and gas flow are inadequate for modeling gas and water flowback data.
(2) Applying the machine learning algorithms of Random Forest, XGBTree, XGBDART, and ANN on the flowback data leads to a high prediction accuracy in choke performance. The prediction accuracy of Random Forest is improved by including the separator pressure and temperature into the inputs of choke size, gas/water ratio, and wellhead pressure and temperature.
(3) This study developed a new choke–performance relationship which links the flowback rate with choke size, gas/water ratio, wellhead pressure/temperature, and separator temperature. The accuracy of new choke–performance relationships and coefficients has been validated for choke sizes ranging from 17.462 to 34.925 mm. The new choke–performance relationship presented in this study needs further validation in other fields wherever more flowback data are available.

Author Contributions

Conceptualization, Y.F.; Software, K.H. and Y.G.; Validation, Y.F.; Formal analysis, Y.F.; Investigation, K.H. and Y.G.; Data curation, K.H. and Y.G.; Writing—original draft, K.H. and Y.G.; Writing—review & editing, Y.F.; Visualization, K.H.; Project administration, Y.F.; Funding acquisition, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. U22B2073 and No. 52104044).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We sincerely appreciate the financial support from the National Natural Science Foundation of China (Nos. 52104044 and U22B2073). We also thank the British Columbia Energy Regulator (BCER) for making their wells’ data available to the University of Alberta.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
$a, b, c, d, e, f$: The fitting parameters of the formula.
$N$: The number of recorded data.
$q_w$: The water rate, in m³/d.
$q_g$: The gas rate, in m³/d.
$Q_l$: The water rate, in STB/day.
$Choke$: The size of choke, in mm.
$D_{64}$: The size of choke, in 1/64 inch.
$GWR$: The gas-to-water ratio, in m³/m³.
$GWR^{*}$: The gas-to-water ratio, in Scf/STB.
$P_{wh}$: The wellhead pressure, in MPa.
$P_{wh}^{*}$: The wellhead pressure, in psig.
$P_{se}$: The separator pressure, in MPa.
$T_{wh}$: The wellhead temperature, in K.
$T_{se}$: The separator temperature, in K.
$Salinity$: The salinity of flowback water, in ppm.
$r$: Pearson correlation coefficient.
$\rho_s$: Spearman correlation coefficient.
$\tau$: Kendall correlation coefficient.
$R^2$: The coefficient of determination.
RMSE: The root mean squared error.
MAE: The mean absolute error.
ML: Machine learning.
MLR: Multiple linear regression model.
ANN: Artificial neural network model.
RF: Random forest model.
SVM: Support vector machine.
LBF-SVM: Linear basis function–support vector machine model.
RBF-SVM: Radial basis function–support vector machine model.
XGBoost: Extreme gradient boosting.
XGBTree: Extreme gradient boosting tree model.
XGBDART: Dropout additive regression trees model.
XGBLinear: Extreme gradient boosting linear model.

Appendix A

Table A1. Statistical analysis of choke size for each well (17,646 records of measurements after being preprocessed).

| Well Name | First Value (mm) | Maximum (mm) | Minimum (mm) | Average (mm) | Standard Deviation (mm) | Time Range (d) | Number of Choke Records |
|---|---|---|---|---|---|---|---|
| Well-1 | 11.906 | 63.500 | 7.144 | 20.869 | 10.814 | 5 | 638 |
| Well-2 | 9.525 | 63.500 | 7.144 | 36.677 | 24.241 | 21 | 544 |
| Well-3 | 7.938 | 63.500 | 7.938 | 24.962 | 17.890 | 20 | 489 |
| Well-4 | 11.112 | 63.500 | 7.144 | 23.792 | 16.616 | 5 | 546 |
| Well-5 | 11.906 | 63.500 | 7.144 | 19.866 | 13.043 | 1 | 571 |
| Well-6 | 12.700 | 63.500 | 6.350 | 24.168 | 16.943 | 19 | 464 |
| Well-7 | 11.906 | 63.500 | 6.350 | 21.512 | 14.120 | 16 | 482 |
| Well-8 | 7.938 | 63.500 | 7.938 | 20.524 | 10.378 | 20 | 488 |
| Well-9 | 7.938 | 63.500 | 7.938 | 22.353 | 12.428 | 39 | 927 |
| Well-10 | 10.319 | 63.500 | 6.350 | 23.056 | 17.249 | 1 | 422 |
| Well-11 | 12.700 | 63.500 | 6.350 | 45.317 | 22.477 | 29 | 1176 |
| Well-12 | 12.700 | 63.500 | 8.731 | 44.418 | 22.729 | 31 | 789 |
| Well-13 | 9.525 | 63.500 | 9.525 | 25.133 | 12.531 | 26 | 622 |
| Well-14 | 12.700 | 63.500 | 6.350 | 22.159 | 14.038 | 18 | 455 |
| Well-15 | 11.906 | 63.500 | 7.938 | 27.021 | 16.310 | 36 | 868 |
| Well-16 | 12.700 | 63.500 | 4.762 | 23.933 | 16.255 | 13 | 335 |
| Well-17 | 11.906 | 19.844 | 11.906 | 15.624 | 2.142 | 1 | 19 |
| Well-18 | 19.050 | 63.500 | 15.875 | 59.231 | 12.638 | 14 | 378 |
| Well-19 | 9.525 | 63.500 | 9.525 | 55.544 | 17.029 | 7 | 176 |
| Well-20 | 20.638 | 63.500 | 9.525 | 54.721 | 15.928 | 10 | 399 |
| Well-21 | 19.050 | 63.500 | 19.050 | 50.989 | 18.615 | 5 | 122 |
| Well-22 | 17.462 | 63.500 | 9.525 | 55.049 | 15.967 | 11 | 385 |
| Well-23 | 14.288 | 63.500 | 14.288 | 53.382 | 18.926 | 6 | 146 |
| Well-24 | 14.288 | 63.500 | 12.700 | 48.814 | 19.773 | 17 | 506 |
| Well-25 | 12.700 | 63.500 | 12.700 | 38.216 | 20.391 | 4 | 89 |
| Well-26 | 19.050 | 63.500 | 19.050 | 56.599 | 14.649 | 12 | 301 |
| Well-27 | 19.050 | 63.500 | 9.525 | 23.918 | 14.932 | 9 | 701 |
| Well-28 | 11.112 | 63.500 | 11.112 | 26.477 | 17.861 | 14 | 376 |
| Well-29 | 11.112 | 63.500 | 9.525 | 27.281 | 17.060 | 18 | 695 |
| Well-30 | 11.112 | 63.500 | 11.112 | 29.753 | 16.742 | 7 | 186 |
| Well-31 | 9.525 | 63.500 | 7.144 | 19.437 | 13.563 | 14 | 779 |
| Well-32 | 12.700 | 63.500 | 9.525 | 21.150 | 10.096 | 23 | 842 |
| Well-33 | 11.112 | 63.500 | 9.525 | 26.286 | 15.694 | 11 | 344 |
| Well-34 | 12.700 | 63.500 | 4.762 | 20.536 | 11.549 | 20 | 781 |
| Well-35 | 11.112 | 63.500 | 7.938 | 26.496 | 17.535 | 4 | 281 |
| Well-36 | 12.700 | 63.500 | 11.112 | 27.953 | 15.952 | 7 | 324 |
Note: The time range refers to the time spent to reach the maximum choke size.
Figure A1. Statistical analysis of flowback choke size for each well ((a): Well-1 to Well-12, (b): Well-13 to Well-24, and (c): Well-25 to Well-36).

Appendix B

Figure A2. Scatter plot of the values predicted by the new formula (Formula (6)) versus the measured values for flowback data at different choke sizes and gas/water ratios. The black solid line represents the reference line y = x.

Appendix C

Table A2. Summary of ML algorithms and corresponding hyperparameters used in this study.

| Model | Hyperparameters |
|---|---|
| MLR | Repeated k-fold cross-validation. k is 5. Number of repetitions is 100. |
| ANN | Repeated k-fold cross-validation. Maximum iterations are 500. Neurons per hidden layer are 50–100. Weight decay is 0.0001. |
| RF | Number of trees in forest is 100. Number of variables randomly selected is 7. Random split verification (repeated 3 times). |
| RBF-SVM | C parameter is $2^{0}$. Sigma parameter is automatically calculated (gamma = scale). |
| LBF-SVM | 100 independent random divisions verification. C parameter is $2^{0}$. |
| XGBDART | Maximum depth of a tree is 6. Iterations are 100. Step size is 0.1. Subsample ratio of columns is 0.8. Subsample ratio of the training instances is 0.6. |
| XGBLinear | Iterations are 100. Step size is 0.1. |
| XGBTree | Iterations are 100. Step size is 0.1. Lambda is 1.0. |
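The repeated k-fold cross-validation listed for MLR (k = 5, 100 repetitions) can be illustrated with a short index generator. This is a sketch under assumptions rather than the authors' implementation: the function name `repeated_k_fold`, the seed, and the toy sample count of 23 are hypothetical, and the actual study may have relied on a library routine with different shuffling details.

```python
import random
from typing import Iterator, List, Tuple


def repeated_k_fold(
    n_samples: int, k: int = 5, repeats: int = 100, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repetition
        folds = [order[i::k] for i in range(k)]  # k folds of near-equal size
        for i, validation in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, validation


# Toy example: 23 hypothetical samples, 5 folds, 100 repetitions.
cv_splits = list(repeated_k_fold(n_samples=23, k=5, repeats=100))
print(len(cv_splits))  # 5 folds x 100 repetitions
```

Each repetition reshuffles the records before cutting them into k folds, so the 500 resulting train/validation pairs give a distribution of scores rather than a single estimate.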

Appendix D

Figure A3. Two cases illustrating the minimal effect of data leakage from random sampling on predictive accuracy. Flowback measurements from 36 wells were used as the training dataset, and the remaining well’s flowback data were used as the testing dataset.

References

  1. Alkouh, A.; McKetta, S.; Wattenbarger, R.A. Estimation of effective-fracture volume using water-flowback and production data for shale-gas wells. J. Can. Pet. Technol. 2014, 53, 290–303.
  2. Ezulike, O.D. Complementary Workflows for Analyzing Multiphase Flowback and Post-Flowback Production Data in Unconventional Reservoirs. Doctoral Thesis, Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB, Canada, 2017.
  3. Ezulike, D.O.; Dehghanpour, H. Modelling flowback as a transient two-phase depletion process. J. Nat. Gas Sci. Eng. 2014, 19, 258–278.
  4. Clarkson, C.R.; Williams-Kovacs, J. Modeling Two-Phase Flowback of Multifractured Horizontal Wells Completed in Shale. SPE J. 2013, 18, 795–812.
  5. Xu, Y.; Dehghanpour, H.; Ezulike, O.; Virues, C. Effectiveness and time variation of induced fracture volume: Lessons from water flowback analysis. Fuel 2017, 210, 844–858.
  6. Hossain, S.; Ezulike, O.; Fu, Y.; Dehghanpour, H. Average fracture compressibility from flowback data. SPE Prod. Oper. 2021, 36, 516–529.
  7. Moussa, T.; Fu, Y.; Dehghanpour, H.; Hawkes, R. Coupled Versus Stratified Flow of Water and Hydrocarbon During Flowback and Post-flowback Processes. In Proceedings of the SPE Annual Technical Conference and Exhibition, SPE, Denver, CO, USA, 30 March–5 May 2020; p. D031S033R004.
  8. Moussa, T.; Dehghanpour, H.; Fu, Y.; Ezulike, O. The use of flowback data for estimating dynamic fracture volume and its correlation with completion-design parameters: Eagle Ford cases. J. Pet. Sci. Eng. 2020, 195, 107584.
  9. Ezulike, O.D.; Dehghanpour, H. A complementary approach for uncertainty reduction in post-flowback production data analysis. J. Nat. Gas Sci. Eng. 2015, 27, 1074–1091.
  10. Wang, D.; Zhang, J.; Jiang, X.; Feng, J.; Wu, Y.; Li, B.; Lu, M.; Pan, Z. Optimal packing ratio of proppant monolayer for partially-propped horizontal bedding fractures of shale. Gas Sci. Eng. 2025, 135, 205563.
  11. Cheng, Y.; Li, Z.; Fu, Y.; Xu, L. Evaluating the Effects of Proppant Flowback on Fracture Conductivity in Tight Reservoirs: A Combined Analytical Modeling and Simulation Study. Energies 2024, 17, 4250.
  12. Kang, Z.; Liu, Y.T.; Zhang, G.D.; Su, B.; Liu, X.F.; Tang, P.C.; Xia, B.; Hu, Y.F. A new fracturing flowback strategy based on proppant backflow factor (PBF)—A case study from Weirong and Yongchuan shale gas, Sichuan Basin, China. Pet. Sci. Technol. 2025, 1–22.
  13. Fu, Y.; Dehghanpour, H.; Motealleh, S.; Lopez, C.; Hawkes, R. Evaluating Fracture Volume Loss During Flowback and Its Relationship to Choke Size: Fastback vs. Slowback. SPE Prod. Oper. 2019, 34, 615–624.
  14. Deen, T.; Daal, J.; Tucker, J. Maximizing well deliverability in the Eagle Ford shale through flowback operations. In Proceedings of the SPE Annual Technical Conference and Exhibition, SPE, Houston, TX, USA, 28–30 September 2015; p. D011S002R005.
  15. Osiptsov, A.; Garagash, I.; Boronin, S.; Tolmacheva, K.; Lezhnev, K.; Paderin, G. Impact of flowback dynamics on fracture conductivity. J. Pet. Sci. Eng. 2020, 188, 106822.
  16. Bagci, S.; Stolyarov, S. Flowback production optimization for choke size management strategies in unconventional wells. In Proceedings of the SPE Annual Technical Conference and Exhibition, SPE, Calgary, AB, Canada, 30 September–2 October 2019; p. D021S023R001.
  17. Potapenko, D.; Theuveny, B.; Williams, R.; Moncada, K.; Campos, M.; Spesivtsev, P.; Willberg, D. State of the art of flow management for frac plug drillout and flowback. In Proceedings of the SPE Annual Technical Conference and Exhibition, SPE, Calgary, AB, Canada, 30 September–2 October 2019; p. D021S023R002.
  18. Tompkins, D.; Sieker, R.; Koseluk, D.; Cartaya, H. Managed Pressure Flowback in Unconventional Reservoirs: A Permian Basin Case Study. In Proceedings of the SPE/AAPG/SEG Unconventional Resources Technology Conference. URTeC, San Antonio, TX, USA, 1–3 August 2016; pp. 2687–2696.
  19. Darby, R.; Molavi, K. Viscosity correction factor for safety relief valves. Process Saf. Prog. 1997, 16, 80–82.
  20. Darby, R. Correlate pressure drops through fittings. Chem. Eng. 1999, 106, 101–104.
  21. Carstensen, C.M.; Kanstad, S.K. Multiphase Flow Through Chokes-An Evaluation of Frozen, Equilibrium, and Nonequilibrium Flow Models. J. Pet. Sci. Eng. 2022, 215, 110402.
  22. Yan, G.; Li, Z.; Bore, T.; Torres, S.A.G.; Scheuermann, A.; Li, L. A lattice Boltzmann exploration of two-phase displacement in 2D porous media under various pressure boundary conditions. J. Rock Mech. Geotech. Eng. 2022, 14, 1782–1798.
  23. Zolfaghari, A.; Dehghanpour, H.; Ghanbari, E.; Bearinger, D. Fracture characterization using flowback salt-concentration transient. SPE J. 2016, 21, 233–244.
  24. Yang, S.; Lai, F.; Li, Z.; Fu, Y.; Wang, K.; Zhang, L.; Liang, Y. The effect of temperature on flowback data analysis in shale gas reservoirs: A simulation-based study. Energies 2019, 12, 3751.
  25. Omana, R.; Houssiere, C., Jr.; Brown, K.E.; Brill, J.P.; Thompson, R.E. Multiphase flow through chokes. In Proceedings of the SPE Annual Technical Conference and Exhibition, SPE, Denver, CO, USA, 28 September–1 October 1969; p. SPE-2682.
  26. Gilbert, W. Flowing and gas-lift well performance. In Drilling and Production Practice; SPE: Los Angeles, CA, USA, 1954.
  27. Baxendell, P. Bean Performance-Lake Wells; Shell Internal Report; Shell: London, UK, 1957.
  28. Ros, N. An analysis of critical simultaneous gas/liquid flow through a restriction and its application to flowmetering. Appl. Sci. Res. 1960, 9, 374–388.
  29. Fortunati, F. Two-phase flow through wellhead chokes. In Proceedings of the SPE Europec Featured at EAGE Conference and Exhibition, SPE, Amsterdam, The Netherlands, 16–19 May 1972; p. SPE-3742.
  30. Ashford, F. An evaluation of critical multiphase flow performance through wellhead chokes. J. Pet. Technol. 1974, 26, 843–850.
  31. Ashford, F.; Pierce, P.E. The Determination of Multiphase Pressure Drops and Flow Capacities in Downhole Safety Valves (Storm Chokes). Pap. SPE 1974, 5161, 6–9.
  32. Al-Attar, H.; Abdul-Majeed, G. Revised bean performance equation for East Baghdad oil wells. SPE Prod. Eng. 1988, 3, 127–131.
  33. Al-Attar, H.H. New correlations for critical and subcritical two-phase flow through surface chokes in high-rate oil wells. In Proceedings of the SPE Latin America and Caribbean Petroleum Engineering Conference, SPE, Cartagena, Colombia, 31 May–3 June 2009; p. SPE-120788.
  34. Perkins, T.K. Critical and subcritical flow of multiphase mixtures through chokes. SPE Drill. Complet. 1993, 8, 271–276.
  35. Osman, M.E.; Dokla, M.E. Gas condensate flow through chokes. In Proceedings of the SPE Europec featured at EAGE Conference and Exhibition, SPE, The Hague, The Netherlands, 17–20 October 1990; p. SPE-20988.
  36. Mirzaei-Paiaman, A.; Salavati, S. A new empirical correlation for sonic simultaneous flow of oil and gas through wellhead chokes for Persian oil fields. Energy Sources Part A Recovery Util. Environ. Eff. 2013, 35, 817–825.
  37. Al-Khalifa, M.A.; Al-Marhoun, M.A. Application of neural network for two-phase flow through chokes. In Proceedings of the SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, SPE, Al-Khobar, Saudi Arabia, 24–27 May 2013; p. SPE-169597.
  38. Kaydani, H.; Najafzadeh, M.; Mohebbi, A. Wellhead choke performance in oil well pipeline systems based on genetic programming. J. Pipeline Syst. Eng. Pract. 2014, 5, 06014001.
  39. Surbey, D.; Kelkar, B.; Brill, J. Study of subcritical flow through multiple-orifice valves. SPE Prod. Eng. 1988, 3, 103–108.
  40. Surbey, D.; Kelkar, B.; Brill, J. Study of multiphase critical flow through wellhead chokes. SPE Prod. Eng. 1989, 4, 142–146.
  41. Yang, Z.; Dong, Z.; Guo, W.; Zhang, X.; Hou, T.; Zou, L.; Li, W.; Lin, K. Optimizing Choke Operations in Shale Gas Horizontal Wells: A Comprehensive Study. Improv. Oil Gas Recovery 2025, 9.
  42. Cano, P.N.; Irazuzta, V.L.; Curtti, M.A.; Álvarez, M.G. Flowback and Well Testing Operational Learnings During an Early Production Stage in a Vaca Muerta Field. In Proceedings of the SPE Argentina Exploration and Production of Unconventional Resources Symposium. SPE, Buenos Aires, Argentina, 20–22 March 2023; p. D021S009R001.
  43. Jiang, Y.; Tang, W.; Li, Y.; Zhou, X.; Chen, J. Piecewise Gilbert-type correlation for two-phase flowback through wellhead chokes in hydraulically fractured shale gas wells. Pet. Sci. Technol. 2024, 42, 428–447.
  44. Nasriani, H.R.; Khan, K.; Graham, T.; Ndlovu, S.; Nasriani, M.; Mai, J.; Rafiee, M.R. An investigation into sub-critical choke flow performance in high rate gas condensate wells. Energies 2019, 12, 3992.
  45. Dabiri, M.S.; Hadavimoghaddam, F.; Ashoorian, S.; Schaffie, M.; Hemmati-Sarapardeh, A. Modeling liquid rate through wellhead chokes using machine learning techniques. Sci. Rep. 2024, 14, 6945.
  46. Ghorbani, H.; Wood, D.A.; Choubineh, A.; Tatar, A.; Mohamadian, N. Prediction of oil flow rate through an orifice flow meter: Artificial intelligence alternatives compared. Petroleum 2020, 6, 404–414.
  47. Choubineh, A.; Ghorbani, H.; Wood, D.A.; Moosavi, S.R.; Khalafi, E.; Sadatshojaei, E. Improved predictions of wellhead choke liquid critical-flow rates: Modelling based on hybrid neural network training learning based optimization. Fuel 2017, 207, 547–560.
  48. Ghorbani, H.; Moghadasi, J.; Wood, D.A. Prediction of gas flow rates from gas condensate reservoirs through wellhead chokes using a firefly optimization algorithm. J. Nat. Gas Sci. Eng. 2017, 45, 256–271.
  49. Rashid, S.; Ghamartale, A.; Abbasi, J.; Darvish, H.; Tatar, A. Prediction of Critical Multiphase Flow Through Chokes by Using A Rigorous Artificial Neural Network Method. Flow Meas. Instrum. 2019, 69, 101579.
  50. Khamis, M.; Elhaj, M.; Abdulraheem, A. Optimization of choke size for two-phase flow using artificial intelligence. J. Pet. Explor. Prod. Technol. 2020, 10, 14.
  51. Gorjaei, R.G.; Songolzadeh, R.; Torkaman, M.; Safari, M.; Zargar, G. A novel PSO-LSSVM model for predicting liquid rate of two phase flow through wellhead chokes. J. Nat. Gas Sci. Eng. 2015, 24, 10.
  52. AlAjmi, M.D.; Alarifi, S.A.; Mahsoon, A.H. Improving multiphase choke performance prediction and well production test validation using artificial intelligence: A new milestone. In Proceedings of the SPE Digital Energy Conference and Exhibition. SPE, The Woodlands, TX, USA, 3–5 March 2015; p. D031S022R003.
  53. Barjouei, H.S.; Ghorbani, H.; Mohamadian, N.; Wood, D.A.; Davoodi, S.; Moghadasi, J.; Saberi, H. Prediction performance advantages of deep machine learning algorithms for two-phase flow rates through wellhead chokes. J. Pet. Explor. Prod. 2021, 11, 1233–1261.
  54. Achong, I. Revised Bean Performance Formula for Lake Maracaibo Wells; Internal Company Report; Shell Oil Co.: Houston, TX, USA, 1961.
  55. Pilehvari, A. Experimental Study of Critical Two-Phase Flow Through Wellhead Chokes: University of Tulsa; Technical report, Research report; University of Tulsa Fluid Flow Projects: Tulsa, OK, USA, 1981.
  56. Ansari, S.; Mohammadi, M.R.; Bahmaninia, H.; Hemmati-Sarapardeh, A.; Schaffie, M.; Norouzi-Apourvari, S.; Ranjbar, M. Experimental measurement and modeling of asphaltene adsorption onto iron oxide and lime nanoparticles in the presence and absence of water. Sci. Rep. 2023, 13, 122.
  57. Kumar, J.A.; Abirami, S. Aspect-based opinion ranking framework for product reviews using a Spearman’s rank correlation coefficient method. Inf. Sci. 2018, 460–461, 23–41.
  58. Lapata, M. Automatic Evaluation of Information Ordering: Kendall’s Tau. Comput. Linguist. 2006, 32, 471–484.
  59. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  60. Rostamian, A.; Heidaryan, E.; Ostadhassan, M. Evaluation of different machine learning frameworks to predict CNL-FDC-PEF logs via hyperparameters optimization and feature selection. J. Pet. Sci. Eng. 2022, 208, 109463.
  61. Wang, D.; Li, Z.; Fu, Y. Production Forecast of Deep-Coalbed-Methane Wells Based on Long Short-Term Memory and Bayesian Optimization. SPE J. 2024, 29, 3651–3672.
  62. Fu, Y.; Dehghanpour, H. How far can hydraulic fractures go? A comparative analysis of water flowback, tracer, and microseismic data from the Horn River Basin. Mar. Pet. Geol. 2020, 115, 104259.
  63. Abbasi, M.A. A Comparative Study of Flowback Rate and Pressure Transient Behaviour in Multifractured Horizontal Wells. Master’s Thesis, University of Alberta, Edmonton, AB, Canada, 2013.
  64. Abbasi, M.A.; Ezulike, D.O.; Dehghanpour, H.; Hawkes, R.V. A comparative study of flowback rate and pressure transient behavior in multifractured horizontal wells completed in tight gas and oil reservoirs. J. Nat. Gas Sci. Eng. 2014, 17, 82–93.
  65. Xu, Y.; Adefidipe, O.; Dehghanpour, H. Estimating fracture volume using flowback data from the Horn River Basin: A material balance approach. J. Nat. Gas Sci. Eng. 2015, 25, 253–270.
  66. Fu, Y.; Dehghanpour, H. Advances in flowback analysis: Fracturing water production obeys a simple decline model. In Unconventional Shale Gas Development; Elsevier: Amsterdam, The Netherlands, 2022; pp. 299–321.
  67. Ghanbari, E.; Dehghanpour, H. The fate of fracturing water: A field and simulation study. Fuel 2016, 163, 282–294.
  68. Sharak, A.Z. Analysis of Shale-Water Interactions and Flowback Water Chemistry for Fracture Characterization. Doctoral Thesis, Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB, Canada, 2018.
  69. Sæther, J.H. Choke Condition and Performance Monitoring. Master’s Thesis, Norges Teknisk-Naturvitenskapelige Universitet, Trondheim, Norway, 2010.
  70. Glazer, Y.R.; Davidson, F.T.; Lee, J.J.; Webber, M.E. An inventory and engineering assessment of flared gas and liquid waste streams from hydraulic fracturing in the USA. Curr. Sustain./Renew. Energy Rep. 2017, 4, 219–231.
  71. Kamel, A.; Alzahabi, A. Pilot Demonstration of a New Tip for Effective Gas Flaring in the Permian Basin: Part 1. ACS Omega 2023, 8, 47440–47451.
  72. Shaw, J.T.; Allen, G.; Pitt, J.; Shah, A.; Wilde, S.; Stamford, L.; Fan, Z.; Ricketts, H.; Williams, P.I.; Bateson, P.; et al. Methane flux from flowback operations at a shale gas site. J. Air Waste Manag. Assoc. 2020, 70, 1324–1339.
  73. Kar, A.; Bahadur, V. Using excess natural gas for reverse osmosis-based flowback water treatment in US shale fields. Energy 2020, 196, 117145.
Figure 1. Flowchart of data-driven investigation on multiphase flowback through chokes.
Figure 2. The flowback profiles of rates, wellhead/separator pressures and temperature, choke size, and salinity for a typical multi-fractured horizontal well completed in the Horn River Shale.
Figure 3. Statistical distribution of characteristic parameters of 37 wells. The distribution curves were fitted using Gaussian normal distribution and lognormal distribution models.
Figure 4. Correlation matrix of (a) Pearson, (b) Spearman and (c) Kendall coefficients between flowback measurements.
Figure 5. Scatter plots of machine learning-predicted water rate versus measured values for training and testing data using eight algorithms. The training and testing data are represented in black and blue, respectively, with a fixed 7:3 split ratio. The black solid line represents the y = x reference line.
Figure 6. Boxplots comparing the prediction performance, in terms of (a) $R^2$, (b) RMSE, and (c) MAE, of eight machine learning algorithms. Each algorithm was run 100 times for statistical analysis.
Figure 7. Boxplots comparing the variable importance interpreted by the (a) RF, (b) XGBTree, (c) XGBDART, and (d) ANN machine learning algorithms based on the statistical results of 100 runs.
Figure 8. Boxplots of (a) $R^2$, (b) RMSE, and (c) MAE for 5 scenarios with different combinations of features fed into the RF algorithm. Each scenario was run 100 times with different random seeds, giving a total of 500 observations across the five scenarios shown in the boxplots.
Figure 9. RF-predicted versus measured water rates for 5 scenarios, for a selected run with fixed training and testing datasets. The training and testing data are shown in black and blue, respectively.
Figure 10. Boxplots comparing the prediction metrics, including (a) $R^2$, (b) RMSE, and (c) MAE, across training/testing split ratios for Scenario 4 using the RF model. At each split ratio, the RF model was run 100 times with different random seeds.
Figure 11. Fitting results of the new choke–performance relationship to the flowback measurements: (a) $R^2$ for each choke size (upper) and total number of data samples (lower) for multivariate regression fitting between the choke–performance model and the measurements; (b) gas/water ratio as a function of choke size; (c) boxplots comparing the fitted $R^2$ with varying gas/water ratio (choke sizes with a limited number of samples were excluded from the correlation analysis).
Figure 12. Scatter plot of measured water rate versus that calculated by the choke–performance relationship (Equation (6) with coefficients listed in Table 7) for the chokes with a fitting $R^2$ > 0.9. The black solid line represents the reference line y = x. The color transparency of the data points indicates the value of $GWR$.
Table 1. Empirical correlations and coefficients for multiphase flow through chokes [53].

| References | Equation | Empirical Formula | Coefficients |
|---|---|---|---|
| Gilbert [26] | (1) | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.1, b = 1, c = 1.89, d = 0.546 |
| Baxendell [27] | (2) | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.1046, b = 1, c = 1.93, d = 0.546 |
| Ros [28] | (3) | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.05747, b = 1, c = 2.00, d = 0.500 |
| Achong [54] | (4) | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.26178, b = 1, c = 1.88, d = 0.650 |
| Pilehvari [55] | (5) | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.021427, b = 1, c = 2.11, d = 0.313 |

Note that $Q_l$ represents the liquid rate, STB/day; $D_{64}$ denotes the choke size in 1/64 inch; $P_{wh}^{*}$ indicates the wellhead pressure, psig; $GWR^{*}$ is the gas/liquid ratio, Scf/STB; a, b, c, and d are empirical coefficients. All the parameters are in field units.
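As a rough illustration of how the Gilbert-type correlations in Table 1 are evaluated, the sketch below computes the liquid rate from wellhead pressure, choke size, and gas/liquid ratio, with the ratio in the denominator as in Gilbert's original correlation. The function names and conversion helpers are hypothetical. The sketch also assumes that the SI wellhead pressure is a gauge value and that the small difference between standard cubic feet and standard cubic metres reference conditions can be ignored. The operating point reuses the average values from Table 2 purely as example inputs.

```python
# Coefficients (a, b, c, d) as listed in Table 1 (field units).
TABLE_1_COEFFS = {
    "Gilbert": (0.1, 1.0, 1.89, 0.546),
    "Baxendell": (0.1046, 1.0, 1.93, 0.546),
    "Ros": (0.05747, 1.0, 2.00, 0.500),
    "Achong": (0.26178, 1.0, 1.88, 0.650),
    "Pilehvari": (0.021427, 1.0, 2.11, 0.313),
}

# Standard unit-conversion factors (rounded).
MM_PER_INCH = 25.4
PSI_PER_MPA = 145.0377
STB_PER_M3 = 6.2898   # barrels per cubic metre
FT3_PER_M3 = 35.3147  # cubic feet per cubic metre


def choke_mm_to_64ths(choke_mm: float) -> float:
    """Convert a choke diameter in mm to units of 1/64 inch."""
    return choke_mm / MM_PER_INCH * 64.0


def liquid_rate_stb_per_day(p_wh_psig, d_64, gwr_scf_per_stb, coeffs):
    """Gilbert-type form: Q_l = a * P_wh^b * D_64^c / GWR^d (field units)."""
    a, b, c, d = coeffs
    return a * p_wh_psig**b * d_64**c / gwr_scf_per_stb**d


def water_rate_m3_per_day(p_wh_mpa, choke_mm, gwr_m3_per_m3, coeffs):
    """Same correlation with SI inputs and output.

    Assumption: p_wh_mpa is a gauge pressure, so no atmospheric offset is applied.
    """
    p_psig = p_wh_mpa * PSI_PER_MPA
    d_64 = choke_mm_to_64ths(choke_mm)
    gwr_field = gwr_m3_per_m3 * FT3_PER_M3 / STB_PER_M3  # m3/m3 -> Scf/STB
    return liquid_rate_stb_per_day(p_psig, d_64, gwr_field, coeffs) / STB_PER_M3


# Illustrative operating point only (average values from Table 2).
for name, coeffs in TABLE_1_COEFFS.items():
    q_w = water_rate_m3_per_day(12.3, 33.3, 7800.0, coeffs)
    print(f"{name}: {q_w:.1f} m3/d")
```

Because b = 1 for all five coefficient sets, the predicted rate scales linearly with wellhead pressure, while a larger gas/liquid ratio lowers the predicted liquid rate.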
Table 2. Statistics of 18,660 flowback surface measurements from 37 Horn River Shale gas wells.

| Flowback Features | Abbreviations | Maximum Value | Minimum Value | Average Value |
|---|---|---|---|---|
| Gas Rate (m³/d) | $q_g$ | 1,087,000 | 71,830 | 429,803 |
| Water Rate (m³/d) | $q_w$ | 2706 | 0 | 288.6 |
| Gas/Water Ratio (m³/m³) | $GWR$ | 36,693,222 | 33.1 | 7800 |
| Choke Size (mm) | $Choke$ | 63.5 | 4.8 | 33.3 |
| Wellhead Pressure (MPa) | $P_{wh}$ | 31.1 | 2.4 | 12.3 |
| Separator Pressure (MPa) | $P_{se}$ | 9.3 | 1.1 | 4.9 |
| Wellhead Temperature (K) | $T_{wh}$ | 390.0 | 277.0 | 368.8 |
| Separator Temperature (K) | $T_{se}$ | 381.0 | 274.0 | 360.2 |
| Salinity (ppm) | $Salinity$ | 340,000 | 4000 | 30,266 |
Table 3. Fitting results for five empirical formulas and coefficients (note that the parameters and coefficients were developed based on field units).

| Empirical Model | Formulas | Coefficients | $R^2$ |
|---|---|---|---|
| Gilbert [26] | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.1, b = 1, c = 1.89, d = 0.546 | 0.287 |
| Baxendell [27] | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.1046, b = 1, c = 1.93, d = 0.546 | 0.277 |
| Ros [28] | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.05747, b = 1, c = 2.00, d = 0.500 | 0.241 |
| Achong [54] | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.26178, b = 1, c = 1.88, d = 0.650 | 0.333 |
| Pilehvari [55] | $Q_l = \dfrac{a\,(P_{wh}^{*})^{b}\,D_{64}^{c}}{(GWR^{*})^{d}}$ | a = 0.021427, b = 1, c = 2.11, d = 0.313 | 0.145 |
Table 4. Correlation coefficients of water rate with 7 flowback parameters.
| Features | Pearson Coefficient ($r$) | Spearman Coefficient ($\rho_s$) | Kendall Coefficient ($\tau$) |
|---|---|---|---|
| $Choke$ | 0.19 | 0.29 | 0.20 |
| $GWR$ | −0.49 | −0.86 | −0.68 |
| $P_{wh}$ | −0.20 | −0.19 | −0.12 |
| $P_{se}$ | 0.10 | 0.05 | 0.03 |
| $T_{wh}$ | 0.63 | 0.66 | 0.50 |
| $T_{se}$ | 0.60 | 0.64 | 0.48 |
| $Salinity$ | −0.32 | −0.32 | −0.23 |
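The three coefficients in Table 4 measure different things: Pearson's $r$ captures linear association, while Spearman's $\rho_s$ and Kendall's $\tau$ depend only on rank order. The sketch below is a plain-Python illustration of the textbook definitions (with average ranks for ties and the tau-b variant of Kendall's coefficient); it is not the authors' code, and the synthetic data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b from concordant/discordant pair counts (O(n^2))."""
    conc = disc = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom


if __name__ == "__main__":
    # Synthetic, monotonically decreasing but strongly nonlinear relationship.
    gwr = [50.0, 200.0, 1000.0, 5000.0, 20000.0]
    q_w = [1000.0 / g**0.8 for g in gwr]
    print(pearson(gwr, q_w), spearman(gwr, q_w), kendall_tau_b(gwr, q_w))
```

In Table 4, $GWR$ has a Spearman coefficient (−0.86) much larger in magnitude than its Pearson coefficient (−0.49). A monotonic but nonlinear dependence is one pattern that produces such a gap, as the synthetic example illustrates.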
Table 5. Statistics of RF models’ performance metrics for 5 scenarios with different combinations of input features.
| Scenario | Input Features | Performance Metric | Maximum Value | Minimum Value | Average Value |
|---|---|---|---|---|---|
| 1 | $Choke$, $GWR$, $P_{wh}$ | $R^2$ | 0.945 | 0.858 | 0.900 |
| | | $RMSE$ | 68.871 | 39.108 | 53.641 |
| | | $MAE$ | 27.013 | 23.581 | 25.251 |
| 2 | $Choke$, $GWR$, $P_{wh}$, $P_{se}$ | $R^2$ | 0.952 | 0.863 | 0.908 |
| | | $RMSE$ | 67.508 | 36.481 | 51.313 |
| | | $MAE$ | 23.489 | 20.532 | 22.045 |
| 3 | $Choke$, $GWR$, $P_{wh}$, $P_{se}$, $T_{wh}$ | $R^2$ | 0.952 | 0.866 | 0.912 |
| | | $RMSE$ | 66.235 | 36.399 | 50.055 |
| | | $MAE$ | 22.832 | 19.662 | 21.192 |
| 4 | $Choke$, $GWR$, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$ | $R^2$ | 0.953 | 0.865 | 0.914 |
| | | $RMSE$ | 66.503 | 35.533 | 49.466 |
| | | $MAE$ | 22.213 | 18.993 | 20.542 |
| 5 | $Choke$, $GWR$, $P_{wh}$, $P_{se}$, $T_{wh}$, $T_{se}$, $Salinity$ | $R^2$ | 0.954 | 0.865 | 0.914 |
| | | $RMSE$ | 66.490 | 35.465 | 49.575 |
| | | $MAE$ | 22.271 | 18.851 | 20.474 |
Table 6. Statistics of performance metrics of 100 runs of RF model for varying training/testing split ratio.
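| Training/Testing Split Ratio | Performance Metric | Maximum Value | Minimum Value | Average Value |
|---|---|---|---|---|
| 5:5 | $R^2$ | 0.945 | 0.878 | 0.913 |
| | $RMSE$ | 60.963 | 39.349 | 50.161 |
| | $MAE$ | 22.745 | 20.389 | 21.386 |
| 6:4 | $R^2$ | 0.952 | 0.874 | 0.916 |
| | $RMSE$ | 61.166 | 36.388 | 49.154 |
| | $MAE$ | 22.049 | 19.394 | 20.873 |
| 7:3 | $R^2$ | 0.953 | 0.865 | 0.914 |
| | $RMSE$ | 66.512 | 35.542 | 49.470 |
| | $MAE$ | 22.215 | 18.999 | 20.543 |
| 8:2 | $R^2$ | 0.962 | 0.865 | 0.922 |
| | $RMSE$ | 66.996 | 32.576 | 46.925 |
| | $MAE$ | 21.877 | 18.372 | 20.093 |
| 9:1 | $R^2$ | 0.963 | 0.780 | 0.922 |
| | $RMSE$ | 82.405 | 30.325 | 46.219 |
| | $MAE$ | 22.850 | 17.769 | 19.855 |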
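The maximum, minimum, and average values reported for each split ratio come from repeating the train/test procedure many times with different random partitions. The sketch below illustrates that protocol and the three metrics in plain Python. To stay self-contained it uses a one-variable least-squares line as a stand-in model, not a random forest; all function names, the seed, and the synthetic data are assumptions made for illustration.

```python
import random
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Stand-in model: ordinary least-squares line, returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def repeated_split_stats(x, y, train_fraction, n_runs=100, seed=0):
    """Return {metric: (max, min, average)} over n_runs random splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_train = int(round(train_fraction * len(x)))
    scores = {"R2": [], "RMSE": [], "MAE": []}
    for _ in range(n_runs):
        rng.shuffle(idx)  # new random partition on every run
        train, test = idx[:n_train], idx[n_train:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in test]
        y_pred = [slope * x[i] + intercept for i in test]
        scores["R2"].append(r2_score(y_true, y_pred))
        scores["RMSE"].append(rmse(y_true, y_pred))
        scores["MAE"].append(mae(y_true, y_pred))
    return {k: (max(v), min(v), sum(v) / len(v)) for k, v in scores.items()}


if __name__ == "__main__":
    noise = random.Random(42)
    xs = [float(i) for i in range(200)]
    ys = [2.0 * xi + 1.0 + noise.gauss(0.0, 5.0) for xi in xs]
    for ratio in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(ratio, repeated_split_stats(xs, ys, ratio))
```

Smaller test sets tend to give a wider spread between the best and worst runs, which is consistent with the 9:1 row of Table 6 having both the highest maximum and the lowest minimum $R^2$.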
Table 7. Fitting results of four choke-performance relationships with and without the condition of $P_{se}/P_{wh} < 0.5$, respectively (the relationships and coefficients are described in SI units).
| Formula | All Flowback Data: $R^2$ (N = 10,623) | All Flowback Data: Coefficients | Flowback Data with $P_{se}/P_{wh} < 0.5$: $R^2$ (N = 9512) | Flowback Data with $P_{se}/P_{wh} < 0.5$: Coefficients |
|---|---|---|---|---|
| $q_w = \dfrac{a P_{wh}^{b}\, Choke^{c}}{GWR^{d}}$ | 0.919 | a = 292.1303; b = 0.7344; c = 1.2602; d = 0.7921 | 0.921 | a = 480.5830; b = 1.0008; c = 1.3178; d = 0.8091 |
| $q_w = \dfrac{a P_{wh}^{b}\, Choke^{c} \left(\frac{P_{wh}}{P_{se}}\right)^{e}}{GWR^{d}}$ | 0.930 | a = 231.0883; b = 0.8067; c = 1.0591; d = 0.7723; e = −0.3309 | 0.933 | a = 375.1778; b = 1.1307; c = 1.1563; d = 0.7905; e = −0.4238 |
| $q_w = \dfrac{a P_{wh}^{b}\, Choke^{c} \left(\frac{T_{wh}}{T_{se}}\right)^{e}}{GWR^{d}}$ | 0.932 | a = 633.9690; b = 0.5943; c = 1.0874; d = 0.7662; e = −0.8210 | 0.933 | a = 469.2333; b = 0.6581; c = 1.1816; d = 0.7851; e = −0.7563 |
| $q_w = \dfrac{a P_{wh}^{b}\, Choke^{c} \left(\frac{P_{wh}}{P_{se}}\right)^{e} \left(\frac{T_{wh}}{T_{se}}\right)^{f}}{GWR^{d}}$ | 0.932 | a = 616.4024; b = 0.6647; c = 1.0757; d = 0.7664; e = −0.0615; f = −0.6988 | 0.933 | a = 396.6285; b = 1.0439; c = 1.1577; d = 0.7890; e = −0.3490; f = −0.1506 |
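One common way to obtain coefficients for a power-law relationship such as $q_w = a P_{wh}^{b} Choke^{c} / GWR^{d}$ is to take logarithms and solve a linear least-squares problem. The sketch below shows that approach in plain Python, together with a filter for the $P_{se}/P_{wh} < 0.5$ condition. This section does not state which fitting procedure the authors used, so the method here is an assumption; a fit in log space minimizes log-residuals and will generally not reproduce coefficients obtained by nonlinear regression on $q_w$ itself. All function names are hypothetical.

```python
from math import exp, log


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def meets_pressure_condition(p_wh, p_se, limit=0.5):
    """True when the separator/wellhead pressure ratio is below the limit."""
    return p_se / p_wh < limit


def fit_power_law(p_wh, choke, gwr, q_w):
    """Fit ln q_w = ln a + b ln P_wh + c ln Choke - d ln GWR; return (a, b, c, d)."""
    X = [[1.0, log(p), log(c), -log(g)] for p, c, g in zip(p_wh, choke, gwr)]
    y = [log(q) for q in q_w]
    m = 4
    # Normal equations (X^T X) beta = X^T y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(m)]
    ln_a, b, c, d = solve_linear(XtX, Xty)
    return exp(ln_a), b, c, d


if __name__ == "__main__":
    # Synthetic data generated from the first-row "all data" coefficients of Table 7.
    a0, b0, c0, d0 = 292.1303, 0.7344, 1.2602, 0.7921
    grid = [(p, c, g) for p in (5.0, 10.0, 20.0)
            for c in (10.0, 20.0, 40.0) for g in (100.0, 1000.0, 10000.0)]
    p_wh, choke, gwr = (list(col) for col in zip(*grid))
    q_w = [a0 * p**b0 * c**c0 / g**d0 for p, c, g in grid]
    print(fit_power_law(p_wh, choke, gwr, q_w))
```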
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Huang, K.; Fu, Y.; Guo, Y. Wellhead Choke Performance for Multiphase Flowback: A Data-Driven Investigation on Shale Gas Wells. Energies 2025, 18, 4381. https://doi.org/10.3390/en18164381
