Machine Learning Applications in Modelling and Analysis of Base Pressure in Suddenly Expanded Flows

Abstract: Base pressure becomes a decisive factor in governing the base drag of aerodynamic vehicles. While several experimental and numerical methods have already been used for base pressure analysis in suddenly expanded flows, their implementation is quite time consuming. Therefore, we must develop a progressive approach to determine base pressure (β). Furthermore, a direct consideration of the influence of flow and geometric parameters cannot be studied by using these methods. This study develops a platform for data-driven analysis of base pressure (β) prediction in suddenly expanded flows, in which the influence of flow and geometric parameters including Mach number (M), nozzle pressure ratio (η), area ratio (α), and length to diameter ratio (ϕ) have been studied. Three different machine learning (ML) models, namely, artificial neural networks (ANN), support vector machine (SVM), and random forest (RF), have been trained using a large amount of data developed from response equations. The response equations for base pressure (β) were created using the response surface methodology (RSM) approach. The predicted results are compared with the experimental results to validate the proposed platform. The results obtained from this work can be applied in the right way to maximize base pressure in rockets and missiles to minimize base drag.


Introduction
A flow subjected to sudden expansion always involves flow separation and flow reattachment (Figure 1). Such detached, high-speed flows have not yet been thoroughly analyzed, even though much research has been carried out. Their complexity, which includes recirculation, shocks, and high pressure and velocity gradients [1][2][3][4], makes their prediction very uncertain. As the shear layer exits the nozzle, a sub-atmospheric recirculation region develops at the base, with high turbulence, high Reynolds number, and highly compressed shear flow. Such situations are critically important at the base of aerodynamic vehicles like projectiles, missiles, and rockets. The development of a recirculation zone at supersonic Mach numbers for these aerodynamic vehicles with blunt bodies can produce a large base drag, which is detrimental to the vehicles' performance. It has already been documented that base drag accounts for almost 60% of the total drag force for aerodynamic vehicles [5,6]. Korst (1956) was the first to research base pressure in the supersonic Mach number regime. He developed a physical model that predicted base pressure by considering the jet and wake shear flow dependencies. Khan and Rathakrishnan [1,3,4,7] experimentally studied the influence of parameters such as Mach number (M), nozzle pressure ratio (η), area ratio (α), and length to diameter ratio (ϕ) on base pressure (β). They established a direct relationship between normalized base pressure and both the nozzle pressure ratio (η) and the area ratio (α). They also tried to modify base pressure through active control by blowing through microjets [8][9][10].
Here, the air is drawn from the settling chamber and fed to the control chamber for controlling base pressure through blowing. It was established that microjets were successful in manipulating base pressure to a certain extent. Khan and Rathakrishnan [2] also studied boundary layer regulation in submerged bodies by reducing drag. They employed a suction on the boundary layer that eventually augmented pressure leading to separation control in axisymmetric aerodynamic bodies. Yet another method for controlling or regulating base pressure was employing passive flow control. Here, a geometrical adjustment is made downstream of the nozzle exit that causes variation in base pressure. Khurana et al. [11] conducted experiments on the blunt-body by implementing spikes. Spikes of different shapes such as flat, square, and conical were used. The results showed that the spikes were able to manipulate base pressure (β) effectively. Vikramaditya et al. [12] examined the base cavity's effect on pressure fluctuations for a typical missile configuration.
The experiments were conducted in a wind tunnel for Mach numbers ranging from 0.7 to 1.0. The results showed that passive control through base cavities could act as a potential drag reduction technique at different Mach numbers, mainly for jet-off flow conditions. More recently, passive control through a backward-facing step (BFS), with dimples implemented in the base region of the expanded duct, was investigated by Khan et al. [13]. The nozzle pressure ratios (η) used for the study were 1.28, 1.40, 1.55, and 1.70, and the length to diameter ratio (ϕ) was varied from 10 to 4. They included dimples of 3 mm diameter along the square duct at 180° intervals at a pitch circle diameter (PCD) of 23 mm at the nozzle exit. The influence of the dimples on the flow patterns was analyzed through a computational fluid dynamics (CFD) analysis of the pressure and velocity contours. The literature presented above shows that various researchers have demonstrated several ways to determine base pressure through experiments.
Soft computing is an approach that reflects the extraordinary capacity of the human imagination when confronting problems that involve ambiguity and imprecision (Raibaudo et al. [14]). These techniques incorporate various computing models such as fuzzy logic, optimization techniques, regression analysis, neural networks, etc. Quadros et al. [15] used the fuzzy logic approach to predict the primary recirculation region's length developed from the suddenly expanded flow process. Flow simulations were conducted using the CFD analysis [16,17]. The input variables were M, η, and expansion corners (€). Fuzzy logic membership functions, namely triangular, generalized bell shape, and Gaussian membership functions, were used. The results found the triangular membership function to have the least percentage error of 9.0705%.
Similarly, the artificial neural network (ANN) approach has been implemented to predict the aerodynamic coefficients of an airplane configuration. Nejat et al. [18] used the ANN approach to estimate lift coefficients for a selected Reynolds number on a NACA0012 airfoil at various incidence angles. The ANN results were validated using the CFD approach. The work concluded that the ANN approach predicted the lift coefficient reasonably well, enabling a preliminary design at low computational cost. Using CFD and neural networks, Quadros and Khan [19] developed a predictive model for base pressure. A CFD database was created to train the network. The input variables were M, η, and α, and the output was base pressure (β). The Levenberg-Marquardt algorithm was used for optimization. The ANN model successfully predicted base pressure with a regression coefficient ($R^2$) greater than 0.99 and a root mean square error (RMSE) of 0.0032.
Using the ANN weight coefficients, the influence of parameters demonstrated M as the dominating variable that primarily affected base pressure (β). The other studies involve optimizing base pressure (β) conducted by Quadros et al. [20], wherein an experimental design approach was implemented by considering various kinematic and geometric parameters. The experiments were performed as per the L9 Orthogonal array. Regression analysis and variance analysis (ANOVA) were conducted for base pressure (β) by considering M, α, and ϕ. The results found M significantly contributes to base pressure (β), followed by α and ϕ. The regression models developed were sufficiently accurate in making predictions for base pressure (β). The same authors attempted to model and analyze the base pressure from a sudden expanded flow process using the Response Surface Methodology technique (Quadros et al. [21]). The authors implemented the central composite design (CCD) and Box-Behnken design (BBD) to develop non-linear regression models. The BBD model results in making accurate predictions for base pressure compared to the CCD model. The techniques mentioned above have been helpful in the analysis of base pressure. However, these methods cannot accurately account for the parameters' effects as they are predominantly experiment and statistical-based. Therefore, it is imperative to propose a novel approach that could overcome the current issues.
Machine learning (ML) is typically used to characterize problems that contain numerous non-linearities. It is a widespread technique for the prediction of complex problems, such as the stability of metal frameworks [22], design optimization of composite frameworks with limited training data [23], mechanical properties of composite materials [24], damage detection in metallic parts [25], industrial demand forecasting [26], image recognition [27], and health monitoring [28]. The most commonly used ML models for such problems are artificial neural networks (ANN), support vector machines (SVM), and random forests (RF). Of these, the ANN technique comprises diversely interconnected neurons. It is commonly used for problems that involve non-linearity, such as fatigue life prediction [29], fatigue caused by loading [30], and prediction of damage induced by fatigue [31]. The suddenly expanded flow process is strongly non-linear because of significant changes in the flow density as the flow progresses downstream of the nozzle exit into the expanded duct, leading to undesirable base pressure changes.
The RF technique, in contrast, is commonly adopted for classification or regression problems and is trained using bagging and random variable selection. Its applications include modeling mechanical properties of composite/metallic materials [32] and modeling manufacturing processes [33]. The SVM method is derived from statistical learning theory and is mainly based on risk minimization. It is more commonly used for forecasting problems, such as engine life estimation [34], strength analysis of concrete [35], and assessment of compaction properties [36]. While many studies have been conducted to determine and optimize base pressure through different techniques, no studies have reported computing base pressure using these three ML models.
In the present study, data-driven analysis of base pressure (β) is performed by considering specific geometric and kinematic parameters. These parameters include Mach number (M), nozzle pressure ratio (η), area ratio (α), and length to diameter ratio (ϕ).
As a first step, experiments are performed to determine the base pressure for various combinations of the input parameters. The response surface methodology (RSM) [37][38][39][40][41] and regression [42] approaches were implemented to develop the base pressure (β) response equations. The experiments were conducted as per the central composite design (CCD) and the Box-Behnken design (BBD) [38], each of which accommodated four input factors at three levels and consisted of 27 datasets. The response equations developed from these two designs were used to generate the 1000 output data that were used to train the ML models. Subsequently, ANN, RF, and SVM models were employed for base pressure (β) prediction, and the results were compared to assess the accuracy of this novel technique.

Nozzle Design
The present study considered three Mach numbers. The detailed dimensions of one of the nozzles, i.e., Mach 3.0 nozzle, have been shown in Figure 2. The exit diameter of the nozzle is kept constant at 10 mm. The throat diameters of the nozzles of Mach 2.0, 2.5, and 3.0 have been derived from the gas dynamics tables by Genick [43]. The nozzle was designed and fabricated based on theoretical calculations. These calculations gave the design Mach number for the nozzles. However, the nozzle was calibrated to determine the actual Mach number at the nozzle exit once fabricated. The calibrated nozzles showed that the exact Mach number at the nozzle exit was 2.0, 2.5, and 3.0, respectively. The detailed dimensions of the three nozzles are shown in Table 1.

Experimental Setup
The different zones of a suddenly expanded flow process were shown earlier in Figure 1. The experimental setup is shown in Figure 3. The nozzle attached to the settling chamber and the expanded duct is shown in Figure 3a. The pressure tank connections with the settling chamber, which deliver the required pressure to the nozzle, are shown in Figure 3b. The nozzle pressure ratio is defined as the ratio of the pressure set in the settling chamber to atmospheric pressure; the area ratio is the ratio of the expanded duct area to the nozzle exit area. The pressure sensor is attached to the expanded duct, and the data acquisition system displays the base pressure measurement. The actual setup consists of a circular convergent-divergent (C-D) nozzle with eight equidistant holes on its outer circular periphery, denoted by 'm' and 'c'. The holes marked 'm' are used to measure the base pressure, and the holes marked 'c' are used to manipulate base pressure by blowing air in the form of microjets. A tube connecting the blowing chamber and the holes 'c' is used for blowing air. It must be noted that the current research work deals with base pressure without the use of microjets or active control. A pressure transducer (PSI 9010 model) with a range of up to 300 psi was used to record the variation in base pressure. It had 16 channels and recorded measurements at an average of 250 samples/second. The operating temperature range of the pressure transducer was −20 °C to +70 °C. It had a resolution of ±0.003 psi and a reading accuracy within ±1 percent. The uncertainty in the measurement of base pressure was evaluated as per [7] and was found to be approximately ±1.803%. The backpressure values were added to the gauge pressure values to get the absolute pressure values. The base pressure values were then normalized with the atmospheric pressure values.
Before the experiments, it is imperative to check the actual Mach numbers of the fabricated nozzles, because inaccuracies in fabrication can cause a nozzle to deviate from its design Mach number. The nozzles are calibrated by measuring the total pressure at the nozzle exit with a pitot tube. A bow shock forms ahead of the pitot tube, so the tube records the total pressure behind the shock wave, which is given by Equation (1).
Here, $P_{01}$ is the settling chamber pressure, $M_1$ is the Mach number at the nozzle exit, and γ is the ratio of specific heat at constant pressure to that at constant volume (1.4 for air). The nozzle exit Mach number was determined from Equation (1). The design Mach numbers were the same as the actual Mach numbers for all the nozzles used in this study.
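Equation (1) is not reproduced in this text. As a minimal sketch, assuming it is the standard normal-shock total pressure relation (the pitot tube reads the total pressure $P_{02}$ behind the shock while $P_{01}$ is the settling chamber pressure), the calibration step can be illustrated as follows. The function names and the bisection bounds are hypothetical choices made for this illustration only.

```python
def pitot_total_pressure_ratio(mach, gamma=1.4):
    """Return P02/P01 across a normal shock ahead of the pitot tube.

    Assumed form of Equation (1): the standard normal-shock total
    pressure relation for a calorically perfect gas.
    """
    m2 = mach * mach
    first = ((gamma + 1.0) * m2 / ((gamma - 1.0) * m2 + 2.0)) ** (gamma / (gamma - 1.0))
    second = ((gamma + 1.0) / (2.0 * gamma * m2 - (gamma - 1.0))) ** (1.0 / (gamma - 1.0))
    return first * second


def exit_mach_from_pitot(ratio, gamma=1.4, lo=1.0, hi=6.0, tol=1e-10):
    """Recover the exit Mach number from a measured P02/P01 by bisection.

    For M > 1 the ratio decreases monotonically with Mach number,
    so a simple bracketing search is sufficient.
    """
    if not pitot_total_pressure_ratio(hi, gamma) <= ratio <= 1.0:
        raise ValueError("pressure ratio outside the bracketed Mach range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pitot_total_pressure_ratio(mid, gamma) > ratio:
            lo = mid  # ratio still too high: the Mach number must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this assumption, a measured ratio of about 0.72 would correspond to an exit Mach number of about 2.0, which is how a design Mach number can be checked against the calibrated value.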

Response Surface Methodology
The RSM is essential in identifying critical variables for various industrial activities, such as process preparation and experimentation [37][38][39][40][41]. RSM designs let the experimenter vary several factors over a wide range of settings, with each factor set at two or more levels. The two popular RSM designs are the central composite design (CCD) and the Box-Behnken design (BBD) [32]. A particular design is selected to perform the experiments and obtain a response function that depends on the input variables. This concept helps achieve base pressure control through a suitable choice of parameters. Therefore, the RSM approach, consisting of various combinations of input parameters, is implemented to obtain the input-output data used to train the ML models.

CCD and BBD Designs
RSM offers various designs that cover the possible combinations of input parameters. A robust and well-ordered design must be implemented for the current set of experiments. A change in the level of an input variable causes a change in the response function. This is known as the main effect and is of critical significance in the experiment. Because the current problem is non-linear, either a CCD or a BBD could be implemented with the chosen parameters and levels. M, η, α, and ϕ were selected as the input parameters. In the design matrix, the first column is designated for M, the second for η, the third for α, and the fourth for ϕ. The factors and levels are shown in Table 2. The CCD and the BBD each accommodate a total of 27 experiments for a combination of input variables. The detailed CCD and BBD designs showing input parameters at different levels are given in Appendix A.
The CCD and BBD designs were executed in Minitab 19 software. The present work develops non-linear regression models for base pressure (β). The response equations for base pressure as per the CCD and BBD design matrices are given in Equations (2) and (3), respectively.

Data Collection
The performance of ML models generally depends on the quality and quantity of the data used. For our study, one thousand (1000) sets of data have been generated. Of these, 54 were obtained from the CCD and BBD experiments, and the remaining 946 were generated using the response equations for various combinations of input parameters.
Typically, the ML models require massive data to adjust and optimize the parameters during training. It would be impractical to collect such data merely through experiments due to the time and cost involved. Therefore, the response equations developed by CCD and BBD designs through actual experiments were used as an effective tool to generate considerable input-output data by selecting the input parameters within their respective levels and ranges.
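The data generation step can be sketched as follows. This is only an illustration under stated assumptions: the fitted coefficients of Equations (2) and (3) are not reproduced in this text, so `PLACEHOLDER_COEFFS` holds made-up values, the inputs are sampled in coded units between -1 and +1 rather than in physical units, and all function names are hypothetical. A full second-order model in four factors has 1 + 4 + 4 + 6 = 15 terms.

```python
import random
from itertools import combinations

FACTORS = ("M", "eta", "alpha", "phi")  # coded to the interval [-1, +1]


def quadratic_terms(x):
    """Return [1, x_i..., x_i^2..., x_i*x_j...] for a second-order model."""
    terms = [1.0]
    terms += list(x)                                   # linear terms
    terms += [v * v for v in x]                        # square terms
    terms += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]  # interactions
    return terms


def response(x, coeffs):
    """Evaluate the response equation for one coded input vector."""
    terms = quadratic_terms(x)
    if len(terms) != len(coeffs):
        raise ValueError("coefficient count does not match the model terms")
    return sum(c * t for c, t in zip(coeffs, terms))


def generate_dataset(coeffs, n_samples, seed=0):
    """Sample inputs uniformly within the coded levels and evaluate the response."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in FACTORS]
        rows.append((x, response(x, coeffs)))
    return rows


# Placeholder coefficients, NOT the fitted values of Equations (2) and (3).
PLACEHOLDER_COEFFS = [0.5] + [0.1] * 4 + [0.01] * 4 + [0.0] * 6
dataset = generate_dataset(PLACEHOLDER_COEFFS, 946)
```

Sampling only inside the coded levels mirrors the restriction stated above that inputs are selected within their respective levels and ranges; a regression model of this kind should not be trusted outside the region where it was fitted.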

Training Data
Experiments have been conducted using the CCD and BBD designs, and two non-linear regression equations were developed for the response, base pressure (β). Both the CCD and BBD non-linear regression models have been tested for statistical adequacy. For this purpose, 15 random experimental test cases were performed, as shown in Appendix B. These test cases consisted of various combinations of input parameters that differed from those used in the CCD and BBD design matrices but lay within the respective range of levels (refer to Table 2). The regression equations were developed using the Minitab statistical software. It was found that the regression equations were able to predict base pressure accurately, with a mean absolute error (MAE) of 6.40%. More importantly, a vast database was generated using the regression equations. The response/regression Equations (2) and (3) were used to create one thousand (1000) output base pressure values corresponding to various combinations of input parameters within the respective range of levels.

Artificial Neural Network (ANN) Model
The ANN is an ML algorithm inspired by the structure of the brain [44]. The effectiveness of the results produced by an ANN is determined primarily by the training phase. The current study uses a multi-layer perceptron (MLP) neural network. Figure 4 shows the MLP network architecture. The optimal architecture shown in Table 3 helps to achieve results at low computing time and cost. The network architecture is formed by the input, hidden, and output layers, all of which consist of neurons. The behavior of neuron i is represented by Equation (4):
$$y_i = f_i\left(\sum_j w_{ij} x_j - t_i\right) \tag{4}$$
The inputs $x_j$ are multiplied by the weights $w_{ij}$ and summed. When the sum $\sum_j w_{ij} x_j$ exceeds the threshold $t_i$, the neuron output $y_i$ is calculated using the activation function $f_i$. The activation function acts as a mathematical 'gate' that feeds the neuron output to the subsequent layer. The present study uses a sigmoid function as the activation function, $f(x) = \frac{1}{1 + e^{-x}}$. This function ensures that the values passed on to the next neuron remain between 0 and 1. The weights and thresholds are initialized with random values and are then adjusted during training so as to minimize the output error. The current MLP neural network uses a backpropagation algorithm for training. The procedure followed is as per Guerrero et al. [45].
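The neuron behavior described above can be sketched in a few lines. This is a minimal illustration rather than the network used in the study: the function names are hypothetical, the threshold is subtracted from the weighted sum as the description of $t_i$ suggests, and training by backpropagation is not shown.

```python
import math


def sigmoid(x):
    """Sigmoid activation, written to avoid overflow for large negative x."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs minus the threshold, passed through the sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum - threshold)


def layer_output(inputs, weight_rows, thresholds):
    """Outputs of one layer: one weight row and one threshold per neuron."""
    return [neuron_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]
```

Stacking calls to `layer_output`, with the outputs of one layer used as the inputs of the next, gives the forward pass of an MLP.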

Random Forest (RF) Model
The RF is a typical supervised machine learning approach that pairs a non-linear statistical technique with an ensemble learning strategy [46]. In the present work, the RF model is used for base pressure prediction and is compared with the other ML models. The final base pressure prediction is the average of the outputs of the hundreds of decision trees generated by the RF. Each decision tree is formed by leaf nodes and decision nodes, and the decision nodes use a test function to evaluate the input data. Ideally, the training process develops decision trees that are uncorrelated with one another.
The RF models are generally trained by the bootstrap aggregating algorithm [47]. Bootstrap is the process of drawing samples randomly. The training set is $S_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$.
Here, X represents the input vector with m variables, $X = \{x_1, x_2, \ldots, x_m\}$. The function $f(X, S_n)$ is generated at the end of every training stage, and the input data are split at every node. The bootstrap algorithm draws several samples $(S_{n1}, \ldots, S_{nk})$, establishes prediction trees represented by $f(X, S_{n1}), \ldots, f(X, S_{nk})$, and delivers the outputs $Y_1^{pre} = f(X, S_{n1}), \ldots, Y_k^{pre} = f(X, S_{nk})$. The final prediction $Y^{pre}$ [47] is calculated as the average of the tree outputs using Equation (5):
$$Y^{pre} = \frac{1}{k} \sum_{i=1}^{k} Y_i^{pre} \tag{5}$$
Three parameters of prime importance must be adjusted in the RF model: the number of trees $n_{tree}$, the number of variables $n_{var}$, and the depth of the trees $n_{dep}$. Each node in a tree is attached to another node through a direct edge, and every node can be linked to any number of other nodes. The tree depth is the number of edges between the root and the leaf node of the tree. A deeper tree (higher $n_{dep}$) captures more information about the data. The prediction accuracy of the output also increases with an increase in $n_{tree}$. The parameter $n_{var}$ is quite sensitive and performs well when it is equal to about one third of the number of input variables. The present study uses $n_{tree}$ = 100, $n_{var}$ = 2, and $n_{dep}$ = 20 to achieve a reasonable balance between accuracy and computation cost. Figure 5 summarizes the RF regression, and its general operation is as follows:
• Initially, the bootstrap samples are extracted from the training sample.
• A regression tree is established for each extracted bootstrap sample. At every node, the sampled input variables are given a suitable split, which is governed by the adjustment parameters of the RF algorithm. The regression tree is treated as complete when no further split is possible.
• The trained RF model conducts data prediction for the new inputs.
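The three steps listed above can be sketched with a deliberately simplified ensemble. This is an illustration under stated assumptions, not the RF configuration used in the study: each tree is only a depth-1 regression tree (a stump) rather than a tree of depth 20, a random subset of `n_var` input variables is drawn once per tree rather than at every node, and all names are hypothetical.

```python
import random


def fit_stump(rows, features):
    """Fit a depth-1 regression tree using only the given feature indices."""
    ys = [y for _, y in rows]
    mean_y = sum(ys) / len(ys)
    best = (float("inf"), None, None, mean_y, mean_y)
    for f in features:
        values = sorted({x[f] for x, _ in rows})
        for thr in values[:-1]:  # split as x[f] <= thr versus x[f] > thr
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if sse < best[0]:
                best = (sse, f, thr, ml, mr)
    return best[1:]  # (feature, threshold, left mean, right mean)


def predict_stump(stump, x):
    feature, thr, left_mean, right_mean = stump
    if feature is None:  # no split was possible: predict the sample mean
        return left_mean
    return left_mean if x[feature] <= thr else right_mean


def fit_bagged_trees(rows, n_tree=100, n_var=2, seed=0):
    """Step 1 and 2: draw bootstrap samples and grow one tree per sample."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_tree):
        sample = [rng.choice(rows) for _ in rows]        # bootstrap, with replacement
        features = rng.sample(range(n_features), n_var)  # random variable subset
        trees.append(fit_stump(sample, features))
    return trees


def predict_bagged(trees, x):
    """Step 3: average the outputs of all trees, as in Equation (5)."""
    return sum(predict_stump(t, x) for t in trees) / len(trees)
```

Because every tree sees a different bootstrap sample and variable subset, the individual trees disagree slightly, and averaging them reduces the variance of the final prediction.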

Support Vector Machine (SVM) Model
The structure of the SVM model, shown in Figure 6, consists of an input vector $X = \{x_1, x_2, \ldots, x_n\}$ and an output vector $Y = \{y_1, y_2, \ldots, y_m\}$. Here, X is a four-dimensional vector consisting of the input variables M, η, α, and ϕ, and Y represents the output base pressure (β), which depends on the input variables. The SVM establishes the input-output relationship using Equation (6):
$$f(x) = \omega \cdot \phi(x) + b \tag{6}$$
Here, f(x) represents the regression function, ϕ(x) is the non-linear mapping function (not to be confused with the length to diameter ratio ϕ), and ω and b are the weight coefficients and bias term, respectively. The SVM is formulated as a constrained optimization problem [48], which is then converted to a dual optimization problem using a Lagrangian function [49].
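Once the dual problem has been solved, the regression function is usually evaluated through a kernel rather than through ϕ(x) directly. The sketch below illustrates that evaluation only. It assumes a Gaussian (RBF) kernel, which the text does not specify, and it takes the support vectors, dual coefficients, and bias as given instead of solving the optimization problem; all names are hypothetical.

```python
import math


def rbf_kernel(a, b, width=0.5):
    """Gaussian kernel exp(-width * ||a - b||^2); the kernel choice is an assumption."""
    return math.exp(-width * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svr_predict(x, support_vectors, dual_coefs, bias, width=0.5):
    """Evaluate f(x) = sum_i c_i * K(x_i, x) + b for a trained support vector regressor."""
    kernel_sum = sum(
        c * rbf_kernel(sv, x, width) for sv, c in zip(support_vectors, dual_coefs)
    )
    return kernel_sum + bias
```

For example, `svr_predict([2.5, 7.0, 6.25, 7.0], support_vectors, dual_coefs, bias)` would return the predicted value for one combination of M, η, α, and ϕ, given coefficients obtained from training.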

Prediction of Base Pressure by Novel Methods
In the present study, base pressure has been predicted and compared with base pressure values obtained from experiments. A total of 69 experimental results were used for the comparison. Fifty-four experimental results were obtained from the CCD and BBD designs (see Appendix A), and the remaining 15 experimental results were taken from Appendix B.
The predicted base pressure results are plotted against the experimental results in Figure 7. The ANN model predicted base pressure effectively, as all of its predicted base pressure data were within the error bands (±5%). On the other hand, the RF model had four data sets located beyond the error bands, and the SVM model had six. Additionally, the mean squared error (MSE) and the determination coefficient $R^2$ are determined to assess the ML models' performance. They are given by Equations (7) and (8), respectively:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - y_i^{pre}\right)^2}{\sum_{i=1}^{n} \left(y_i - y^{mean}\right)^2} \tag{7}$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - y_i^{pre}\right)^2 \tag{8}$$
where $y_i$ is the ith experimental data, $y_i^{pre}$ is the ith predicted value, $y^{mean}$ is the average experimental value, and n is the number of data points. The closer $R^2$ is to 1.0 and the smaller the MSE, the better the predictability. The MSE values are normalized between 0 and 1 to make the results easier to compare. Figure 8 displays a plot of the predicted base pressure outputs of the various ML models, and it is conclusive that the ANN model performed better than the other models.
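The performance metrics and the error-band check described above can be sketched as follows. The function names are hypothetical, the definitions are the standard ones for $R^2$ and MSE, and the normalization of the MSE between 0 and 1 is not shown because its exact procedure is not stated in the text.

```python
def mse(y_true, y_pred):
    """Mean squared error between experimental and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Determination coefficient: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within_band(y_true, y_pred, band=0.05):
    """Fraction of predictions whose relative error is within +/- band."""
    inside = sum(1 for y, p in zip(y_true, y_pred) if abs(p - y) <= band * abs(y))
    return inside / len(y_true)
```

A model whose predictions all fall within the ±5% band gives `fraction_within_band(...) == 1.0`, which corresponds to the behavior reported for the ANN model.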

Base Pressure Prediction Using the ANN Model
This section examines the influence of the number of hidden layers and neurons on the predicted base pressure values. First, the number of hidden layers is varied from 1 to 4 to study the sensitivity of the ANN model, with 25 neurons assigned to the first hidden layer. The neurons in the various hidden layers are summarized in Table 4. The plot of predicted base pressure against experimental data in Figure 9 shows that most of the predicted base pressure values for all the ANN models lie inside the error bands. Only one predicted data point lay outside the error bands, and this was for the ANN model with three hidden layers. Based on the metrics in Table 4, the ANN model with three hidden layers performed best and was optimal for the data within the ranges (refer to Table 2), as shown in Table 4 ($R^2$ = 0.635). The performance of each of the ANN models with different hidden layers is also shown in Table 4.
Second, the sensitivity of the ANN models is examined by varying the number of neurons. Based on the optimal results obtained above, an ANN model with three hidden layers is adopted. Generally, the precision and accuracy of the predicted base pressure appear to rely mainly on the neurons in the first hidden layer [11]. The number of neurons in the first layer is varied from 15 to 30, while the neurons in the second and third layers are unaltered. Table 5 describes the architectures of the various ANN models. The same database was used to train these ANN models. The plot of predicted versus experimental base pressure is shown in Figure 10, and the ANN predictions agree well with the experiments. Table 5 shows the performance of all the ANN models, and it is evident that an increase in the number of neurons increases the prediction accuracy. However, there is hardly any improvement once the number of neurons is increased beyond 25. Third, only the numbers of neurons in the second and third hidden layers are varied, and the sensitivity is studied. For this case, the number of neurons in the first hidden layer is not modified, and the architecture with three hidden layers is used for all models. The second hidden layer consists of 15 to 35 neurons, and the third hidden layer has 10 to 30 neurons. The details of the ANN models are shown in Table 6.
The results are shown in Figure 11; recall that the best possible $R^2$ value is 1.0 and that a smaller MSE value indicates a better prediction. An increase in the number of neurons in the second and third hidden layers increases $R^2$ and decreases the MSE. The predicted result did not show much difference as the number of neurons in the second hidden layer increased beyond 30. However, as the number of neurons in the third hidden layer was raised, a higher accuracy level in the prediction was observed, with no further difference in the predicted results once the number of neurons in the third hidden layer exceeded 25. The above discussion leads to some crucial conclusions on base pressure prediction by the ANN model. First, the prediction accuracy of the ANN model does not change if there are more than three hidden layers in the ANN architecture. Second, an increase in the number of neurons in the first hidden layer yields improved accuracy. Finally, the number of hidden layers and the number of neurons in the first hidden layer should be set to 3 and 25, respectively, to improve prediction accuracy.

Base Pressure Prediction Using the RF Model
Two aspects of the RF model need to be investigated: (i) the variation of prediction accuracy with the number of training data ($n_{data}$), the number of trees ($n_{tree}$), the number of sensitive parameters ($n_{var}$), and the tree depth ($n_{dep}$), and (ii) the variation of the predicted base pressure with $n_{data}$, $n_{tree}$, $n_{var}$, and $n_{dep}$. The RF model variables and their descriptions are given in Table 7. Figure 12 shows the variation of prediction accuracy with the RF variables. The MSE values obtained here have again been normalized for a better understanding of the results. The metric value versus $n_{data}$ is shown in Figure 12a. It shows that, with an increase in $n_{data}$, $R^2$ increases and MSE decreases. Typically, the prediction accuracy increases with a higher $n_{data}$. Hence, it is imperative to have ample training data to achieve higher accuracy for the predicted results. Figure 12b shows the variation of metric value versus $n_{tree}$ for constant $n_{data}$, $n_{var}$, and $n_{dep}$. Very little change in $R^2$ is observed, while the MSE decreases by almost 20% when $n_{tree}$ is varied from 10 to 100; thus $n_{tree}$ = 100 is preferred. In Figure 12c, the variation of metric value versus $n_{var}$ is shown at constant $n_{data}$, $n_{tree}$, and $n_{dep}$. The $R^2$ and MSE values almost converge when $n_{var}$ is greater than 2. Figure 12d shows the variation of metric value versus $n_{dep}$ when $n_{data}$, $n_{var}$, and $n_{tree}$ are kept constant. A considerable difference in the values of $R^2$ and MSE was observed when $n_{dep}$ was less than 8. Hence, $n_{dep}$ must be maintained at eight or higher to predict base pressure values accurately. To further understand the prediction of base pressure for a given set of parameters (M = 2.5, η = 7, α = 6.25, ϕ = 7), the plots of predicted base pressure for different RF models are shown in Figure 13. For this input combination, the experimental base pressure is β = 0.395.
It can be observed that the error between the RF predicted base pressure and the experimental base pressure is at a minimum when $n_{data}$ = 623, $n_{tree}$ = 20, $n_{var}$ = 3, and $n_{dep}$ = 20. Figure 13a indicates that the predicted base pressure is inaccurate when the number of training data $n_{data}$ is small. Similarly, Figure 13b shows that the predicted base pressure converges when the number of trees $n_{tree}$ is above 50. The curve of predicted base pressure versus $n_{var}$ is shown in Figure 13c. Here, the predicted base pressure does not vary much with the change in $n_{var}$. All the predicted values lie in the range of 0.7 to 0.8. Hence, $n_{var}$ does not influence the predicted base pressure. Finally, the effect of the depth of trees $n_{dep}$ on predicted base pressure is presented in Figure 13d. The predicted base pressure (β) decreased with an increase in $n_{dep}$ beyond 4. From the above discussion, it can be concluded that $n_{data}$ and $n_{dep}$ significantly influence the prediction accuracy of base pressure. The values of $n_{data}$ and $n_{dep}$ must be maintained above 600 and 4, respectively, to obtain an accurate prediction. Additionally, $n_{var}$ hardly influences the predicted base pressure, and the prediction accuracy is much higher when $n_{tree}$ is above 50.

Base Pressure Prediction Using the SVM Model
The SVM model is examined with respect to two aspects of base pressure prediction: (i) the variation of prediction accuracy and predicted base pressure with ń_data, and (ii) the variation of predicted base pressure with the input parameters. Figure 14a shows the variation of the metric values with ń_data. For training data between 400 and 600, the R² value increases and the MSE value decreases. However, as ń_data increases beyond 600, both R² and MSE level off. This implies that a further increase in the training data does not improve the prediction accuracy of base pressure. Figure 14b shows the variation of predicted base pressure with ń_data. Figure 14c shows the variation of the error (%), i.e., the percentage error between the predicted and experimental base pressure, with ń_data. Here, the minimum error is observed when ń_data is slightly higher than 600.
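The metrics referred to in this comparison can be written out explicitly. The following is a minimal sketch, assuming the standard definitions of MSE and R², a normalization of MSE by its largest value across a sweep, and a percentage error taken relative to the experimental value. The text does not state these formulas, so all of them, along with the function names, are assumptions.

```python
def mse(y_true, y_pred):
    # Mean squared error between reference and predicted values.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def normalize_by_max(values):
    # Assumed normalization: divide every MSE in a sweep by the largest one.
    peak = max(values)
    return [v / peak for v in values]

def percent_error(beta_pred, beta_exp):
    # Assumed definition: absolute deviation relative to the experimental value.
    return abs(beta_pred - beta_exp) / abs(beta_exp) * 100.0

BETA_EXP = 0.395  # experimental value quoted for M = 2.5, eta = 7, alpha = 6.25, phi = 7
print(percent_error(0.40, BETA_EXP))  # 0.40 is a hypothetical prediction, not a result
```

Evaluating `percent_error` for models trained on successively larger data sets yields an error curve of the kind plotted against training-set size.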
To validate the SVM model, the predicted base pressure is further studied as a function of the input parameters. For this, the variation of predicted base pressure with Mach number (M), nozzle pressure ratio (η), and area ratio (α) is plotted in Figure 15a-c. Figure 15a shows a linear increase in the predicted base pressure when M is increased from 2.1 to 3.0, with a significant rise from 0.4 to 0.9. This shows that the predicted base pressure is highly sensitive to changes in Mach number [7]. Figure 15b shows a decline in the predicted base pressure when η is increased from 3 to 7 [3]. However, this decline is not as steep as the change observed for M. Hence, the effect of η on the improvement of predicted base pressure is not apparent.
Furthermore, the predicted base pressure showed a significant increase for α up to 5, as shown in Figure 15c. Beyond this value, the predicted output did not vary much. Therefore, it is imperative to design suitable base pressure models to reduce base drag for aerodynamic vehicles. The conclusions drawn from the SVM model are as follows:

• The accuracy of the predicted base pressure is highly dependent on ń_data. The relative error (%) decreases significantly for ń_data between 400 and 600. However, for ń_data above 600, only a trivial change in error (%) is seen.

• The Mach number (M) has a more significant impact on base pressure than the other parameters. Specifically, base pressure can be improved by increasing M, thereby reducing the base drag.
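The one-parameter-at-a-time study behind Figure 15 can be sketched as follows. This is an illustration only: `sweep_parameter`, `sensitivity`, the baseline operating point, and the placeholder predictor with its arbitrary coefficients are hypothetical. Any trained model that exposes a prediction function could be substituted for the placeholder.

```python
def sweep_parameter(predict, baseline, name, values):
    """Vary one input over `values` while holding the others at `baseline`."""
    results = []
    for value in values:
        point = dict(baseline, **{name: value})  # copy, so baseline is not mutated
        results.append((value, predict(point)))
    return results

def sensitivity(results):
    """Average change in prediction per unit change of the swept input."""
    (x0, y0), (x1, y1) = results[0], results[-1]
    return (y1 - y0) / (x1 - x0)

def placeholder_predict(point):
    # Hypothetical linear stand-in for a trained model; coefficients are arbitrary.
    return 0.1 + 0.2 * point["M"] - 0.01 * point["eta"] + 0.02 * point["alpha"]

# Assumed baseline operating point (M, nozzle pressure ratio, area ratio, L/D).
baseline = {"M": 2.5, "eta": 7.0, "alpha": 6.25, "phi": 7.0}

mach_sweep = sweep_parameter(placeholder_predict, baseline, "M", [2.1, 2.4, 2.7, 3.0])
eta_sweep = sweep_parameter(placeholder_predict, baseline, "eta", [3.0, 5.0, 7.0])
print(sensitivity(mach_sweep), sensitivity(eta_sweep))
```

Comparing the magnitudes returned by `sensitivity` for each input gives a simple numerical counterpart to the visual comparison of slopes across the panels of the figure.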

Conclusions
The present study developed a platform for data-driven analysis of base pressure prediction. The predicted base pressure values were analyzed in depth to determine the essential characteristics of the various machine learning models. Parametric studies were also performed to determine how the prediction accuracy and the predicted base pressure vary as the independent variables of the machine learning models are modified. The main conclusions are outlined below.

• Compared to the RF and SVM models, the ANN model best predicted base pressure as it had the highest accuracy. Additionally, the RF model showed better performance than the SVM model.

• The prediction accuracy of the ANN model increased with the number of neurons in the first hidden layer. Additionally, an architecture comprising three hidden layers with 25 neurons in the first hidden layer is recommended to predict base pressure with high accuracy.

• The RF model parameters ń_data and ń_dep affected the predicted base pressure considerably. Values of ń_data ≥ 600 and ń_dep ≥ 4 are recommended to achieve a higher prediction accuracy. The prediction accuracy also improved when ń_tree was maintained above 50.

• A rapid decrease in the relative error (%) with increasing training data, ń_data, was observed for the SVM model. Additionally, the predicted base pressure is sensitive to changes in Mach number. Accordingly, increasing the Mach number (M) helps to increase the base pressure, thereby reducing base drag.

Funding: This research received no external funding.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The participants of this study agree to share data upon request.