2.2. Theoretical Foundations
  2.2.1. Empirical Risk Minimization
Empirical risk minimization (ERM) is a fundamental principle in machine learning that aims to minimize the empirical risk, or training error, on a given dataset. This principle operates under the assumption that minimizing the loss function on training data will yield a model that generalizes well to unseen data. However, ERM is susceptible to overfitting, particularly when dealing with high-dimensional feature spaces or limited sample sizes. The latter scenario is frequently encountered in probe data processing.
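Let the model space be defined as $\mathcal{F}$, containing regression models with parameters $\theta$, as shown in Equation (1).
$$\mathcal{F} = \{\, f \mid Y = f_{\theta}(X),\ \theta \in \Theta \,\} \tag{1}$$
Here, $X$ represents the input to the function $f$, $Y$ represents the output, and $\Theta$ represents the parameter space of the models.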
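The empirical risk, $R_{\mathrm{emp}}(f)$, is typically used to estimate the expected risk or loss on a dataset with $N$ samples, as defined in Equation (2).
$$R_{\mathrm{emp}}(f) = \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) \tag{2}$$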
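$L$ represents the loss function, commonly chosen as the mean squared error (MSE) loss in regression problems, as shown in Equation (3).
$$L\bigl(Y, f(X)\bigr) = \bigl(Y - \hat{Y}\bigr)^2, \qquad \hat{Y} = f(X) \tag{3}$$
In Equation (3), $\hat{Y}$ represents the predicted value, $X$ is the input, and $Y$ represents the actual value. Averaged over the $N$ samples as in Equation (2), this squared error yields the MSE.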
Given these definitions, the ERM principle can be succinctly formulated as the minimization of the empirical risk over all models in the model space, as shown in Equation (4):
$$\min_{f \in \mathcal{F}} \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) \tag{4}$$
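As a minimal sketch of Equations (2) to (4), the snippet below evaluates the empirical risk of a few candidate models under the squared-error loss and keeps the one with the smallest value. The finite candidate list and the toy data are illustrative assumptions, not part of the study.

```python
def squared_loss(y_true, y_pred):
    """Per-sample squared error, the loss L of Equation (3)."""
    return (y_true - y_pred) ** 2


def empirical_risk(model, xs, ys, loss=squared_loss):
    """Average loss over the N samples, as in Equation (2)."""
    return sum(loss(y, model(x)) for x, y in zip(xs, ys)) / len(xs)


def erm(candidates, xs, ys):
    """Return the candidate with the smallest empirical risk (Equation (4))."""
    return min(candidates, key=lambda f: empirical_risk(f, xs, ys))


# Toy data generated by y = 2x (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

# Hypothetical model space: three lines through the origin.
candidates = [lambda x, k=k: k * x for k in (1.0, 2.0, 3.0)]
best = erm(candidates, xs, ys)
```

In practice the model space is continuous and the minimization is carried out over the parameters rather than over a finite list.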
  2.2.2. The Definition of Structural Risk Minimization
While ERM provides a straightforward approach to model fitting, it can lead to overfitting, especially with limited data. To address this limitation, the concept of structural risk minimization (SRM) was introduced as an extension of ERM. SRM incorporates a model complexity penalty term, thereby balancing empirical risk against model complexity. The advantage of SRM lies in its ability to control the model complexity, mitigate overfitting, and enhance generalization performance. Furthermore, SRM provides a theoretical foundation for model selection, facilitating the identification of optimal models within hypothesis classes of varying complexities. The structural risk is defined in Equation (5).
$$R_{\mathrm{srm}}(f) = \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) + \lambda J(f) \tag{5}$$
Here, $J(f)$ represents the penalty function, which measures the complexity of the model $f$. The parameter $\lambda \ge 0$ controls the strength of the penalty, thereby dictating the balance between fitting the training data and keeping the model simple. Additionally, SRM establishes a strong connection to regularization techniques, offering theoretical justification for practices such as L1 and L2 norm regularization. By seeking an equilibrium between empirical risk and model complexity, SRM yields more robust models with improved generalization capabilities, which has made it a cornerstone of modern machine learning applications. The SRM principle is formulated as the minimization of the structural risk over the model space, as shown in Equation (6).
$$\min_{f \in \mathcal{F}} \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr) + \lambda J(f) \tag{6}$$
  2.2.3. Overview of Applied Regression Methods
Table 1 presents a concise overview of the regression methods utilized in this study. The table outlines four critical aspects for each method: symbolic representation, underlying principle (ERM or SRM), specific regression technique, and solution approach. The methods span from classical polynomial regression to advanced techniques such as lasso, ridge, and kernel ridge regression. Each method offers distinct characteristics in terms of model complexity, regularization, and capacity to handle non-linear relationships.
   Polynomial Regression
Polynomial regression extends linear regression to model nonlinear relationships, while maintaining the computational advantages of linear methods. Its key insight lies in transforming nonlinear polynomial terms into linear features through the Vandermonde expansion. Consider a polynomial model of degree $n$:
$$y = w_0 + w_1 x + w_2 x^2 + \cdots + w_n x^n \tag{7}$$
Through the Vandermonde transformation, each power term $x^i$ is treated as an independent feature $z_i$, effectively converting the polynomial model into a linear form:
$$y = w_0 + w_1 z_1 + w_2 z_2 + \cdots + w_n z_n \tag{8}$$
where $w = (w_0, w_1, \ldots, w_n)^{\top}$ represents the model parameters and $z_i = x^i$ for $i = 1, \ldots, n$. This transformation preserves the model’s ability to capture nonlinear relationships, while allowing the use of standard linear regression techniques for parameter estimation. Using the mean squared error (MSE) as the loss function, the optimization problem is expressed as
$$\min_{w} \frac{1}{N}\lVert y - Xw \rVert_2^2 \tag{9}$$
The solution is obtained using ordinary least squares (OLS):
$$\hat{w} = (X^{\top}X)^{-1}X^{\top}y \tag{10}$$
where $X$ is the design matrix with transformed features and $y$ is the vector of the target values.
This approach, as illustrated in Equations (7) through (10), allows polynomial regression to capture nonlinear relationships while utilizing a linear regression framework.
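The Vandermonde expansion followed by an OLS solve can be sketched in a few lines of NumPy. This is an illustration under assumed toy data, with a single input variable and hypothetical function names; it uses `np.linalg.lstsq` in place of the explicit normal-equation inverse because that is numerically safer and yields the same least-squares solution when the design matrix has full column rank.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Fit polynomial coefficients by OLS on a Vandermonde design matrix."""
    # Columns are x^0, x^1, ..., x^degree, i.e. the features z_i = x^i.
    X = np.vander(x, N=degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict_polynomial(w, x):
    """Evaluate the fitted polynomial at new inputs."""
    return np.vander(x, N=len(w), increasing=True) @ w


# Noise-free cubic used as toy data (illustrative only).
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 0.5 * x**3
w = fit_polynomial(x, y, degree=3)
```

The recovered coefficients are ordered from the constant term upward, matching the ordering of the parameters in the linear form.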
  Lasso Regression (L1 Regularization)
Lasso regression incorporates L1 regularization into the linear regression model. The SRM-based optimization problem for lasso regression can be defined as
$$\min_{w} \frac{1}{N}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \tag{11}$$
where $X$ is the design matrix, $y$ is the target vector, $w$ is the coefficient vector, and $\lambda$ is the regularization parameter. As shown in Equation (11), the L1 penalty term encourages sparsity in the model parameters, effectively performing feature selection. The solution is typically obtained using the coordinate descent algorithm, which iteratively optimizes each parameter while holding others fixed.
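The coordinate descent idea can be illustrated with a short NumPy sketch. It assumes the objective $\frac{1}{N}\lVert y - Xw\rVert_2^2 + \lambda\lVert w\rVert_1$ without an intercept; under that scaling each coordinate update is a soft-thresholding step with threshold $\lambda/2$. The function names, the fixed number of sweeps, and the synthetic data are assumptions made for illustration, not the solver used in the study.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/N)||y - Xw||^2 + lam * ||w||_1 one coordinate at a time."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution removed.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n_samples
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return w


# Synthetic data in which only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]
w = lasso_coordinate_descent(X, y, lam=0.1)
```

The irrelevant coefficients are driven exactly to zero, while the two active coefficients are shrunk slightly toward zero, which is the sparsity effect described above.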
  Ridge Regression (L2 Regularization)
Ridge regression employs L2 regularization, which adds a penalty term proportional to the square of the magnitude of the coefficients. The SRM-based optimization problem for ridge regression is
$$\min_{w} \frac{1}{N}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \tag{12}$$
The L2 penalty term in Equation (12) helps to prevent overfitting by shrinking the coefficients towards zero, but unlike lasso regression, it does not typically produce sparse models. The solution is obtained using Cholesky decomposition, which efficiently solves the resulting system of linear equations.
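A hedged sketch of the Cholesky route follows. It assumes the objective $\frac{1}{N}\lVert y - Xw\rVert_2^2 + \lambda\lVert w\rVert_2^2$, whose stationarity condition is the linear system $(X^{\top}X + N\lambda I)\,w = X^{\top}y$. The function name and data are illustrative. NumPy has no dedicated triangular solver, so the general `np.linalg.solve` is applied to the two triangular factors here.

```python
import numpy as np


def ridge_cholesky(X, y, lam):
    """Solve (X^T X + N*lam*I) w = X^T y via a Cholesky factorization."""
    n_samples, n_features = X.shape
    # The system matrix is symmetric positive definite for lam > 0.
    A = X.T @ X + n_samples * lam * np.eye(n_features)
    b = X.T @ y
    L = np.linalg.cholesky(A)       # A = L L^T with L lower triangular
    z = np.linalg.solve(L, b)       # forward step:  L z = b
    return np.linalg.solve(L.T, z)  # backward step: L^T w = z


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

w_small = ridge_cholesky(X, y, lam=1e-3)
w_large = ridge_cholesky(X, y, lam=10.0)
```

Comparing `w_small` and `w_large` shows the shrinkage effect: a larger penalty gives a coefficient vector with a smaller norm, but none of the entries becomes exactly zero.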
  Kernel Ridge Regression
Kernel ridge regression (KRR) is an extension of ridge regression that incorporates kernel methods, similarly to support vector regression (SVR). Kernel methods are effective for handling non-linear data in machine learning. KRR combines the regularization properties of ridge regression with the kernel trick, enabling it to operate implicitly in a high-dimensional feature space without explicitly computing the mapping. The primal optimization problem for KRR can be formulated as
$$\min_{w} \sum_{i=1}^{N}\bigl(y_i - w^{\top}\phi(x_i)\bigr)^2 + \lambda \lVert w \rVert_2^2 \tag{13}$$
where $\phi(\cdot)$ is the mapping function to the feature space and $\lambda$ is the regularization parameter. The data term is written here as an unnormalized sum of squared errors, since a constant factor on it only rescales $\lambda$. However, solving this primal problem directly can be computationally expensive or infeasible for high-dimensional feature spaces. Instead, a dual problem is solved:
$$(K + \lambda I)\,\alpha = y \tag{14}$$
Here, $K$ represents the kernel matrix with entries $K_{ij} = k(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$, $I$ is the identity matrix, $\alpha$ is the vector of the dual variables, and $y$ is the vector of the target values. The kernel trick allows efficient computation without explicitly determining $\phi$.
The solution to the dual problem in Equation (14) is obtained as $\alpha = (K + \lambda I)^{-1}y$, typically using Cholesky decomposition, which is computationally more efficient than directly solving the primal problem in Equation (13). This approach allows KRR to effectively handle non-linear relationships in the data.
Two commonly used kernel functions in KRR are
- Polynomial Kernel: $k(x, x') = (\gamma\, x^{\top}x' + c)^{d}$, where $d$ is the degree of the polynomial, $c$ is a constant, and $\gamma$ scales the inner product.
- Radial Basis Function (RBF) Kernel: $k(x, x') = \exp\bigl(-\gamma \lVert x - x' \rVert^2\bigr)$, where $\gamma$ is a parameter that determines the width of the Gaussian function.
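The dual solve can be illustrated with a compact NumPy sketch using the RBF kernel. It assumes the linear system $(K + \lambda I)\alpha = y$ and predictions of the form $\hat{y}(x) = \sum_i \alpha_i k(x, x_i)$. The helper names, hyperparameter values, and sine-wave data are assumptions chosen only for demonstration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def krr_fit(X, y, lam, gamma):
    """Solve (K + lam*I) alpha = y with a Cholesky factorization."""
    K = rbf_kernel(X, X, gamma)
    L = np.linalg.cholesky(K + lam * np.eye(len(X)))
    return np.linalg.solve(L.T, np.linalg.solve(L, y))


def krr_predict(X_train, alpha, X_new, gamma):
    """Predict as a kernel-weighted sum over the training points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Toy one-dimensional problem: learn sin(x) from 40 samples.
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y = np.sin(X[:, 0])
alpha = krr_fit(X, y, lam=1e-3, gamma=1.0)

X_new = np.array([[1.0], [2.5], [4.0]])
y_new = krr_predict(X, alpha, X_new, gamma=1.0)
```

Note that the feature map never appears in the code: only kernel evaluations between data points are needed, which is the practical meaning of the kernel trick.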
  2.2.4. S-Fold Cross-Validation
S-fold cross-validation is a robust technique for evaluating a model’s generalization error, which refers to its performance on unseen data. This method mitigates the risk of overfitting and provides a more reliable estimate of model performance compared to a single train–test split. It is particularly valuable in this study due to the limited availability of experimental data on probe calibration, as it allows for efficient use of all available data points for both training and validation.
As shown in Figure 2, in S-fold CV, the dataset is randomly partitioned into $S$ equally sized subsets. The model is then trained $S$ times, each time using $S-1$ subsets for training and the remaining subset for testing. The average of the $S$ testing MSEs is used as an estimate of the model’s generalization error $E$:
$$E = \frac{1}{S}\sum_{i=1}^{S} E_i \tag{15}$$
where $E_i$ represents the validation error for the $i$-th fold.
By training and evaluating the model on different subsets of the data, S-fold CV reduces the model’s dependence on any specific data partition, leading to a more stable and accurate assessment of its ability to generalize. This is particularly beneficial when dealing with limited data, as it maximizes the utilization of the available samples.
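The procedure can be sketched as a small generic routine. The `fit` and `predict` callables, the straight-line model, and the data are hypothetical placeholders; `np.array_split` is used so that fold sizes differ by at most one sample when the dataset size is not divisible by S.

```python
import numpy as np


def s_fold_cv(X, y, fit, predict, S, seed=0):
    """Return the mean test MSE over S folds and the per-fold errors."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))   # random partition of the samples
    folds = np.array_split(indices, S)
    fold_errors = []
    for i in range(S):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(S) if j != i])
        model = fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - predict(model, X[test_idx])
        fold_errors.append(float(np.mean(residual ** 2)))
    return float(np.mean(fold_errors)), fold_errors


# Hypothetical model: a straight line fitted by least squares.
def fit_line(x, y):
    A = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def predict_line(w, x):
    return w[0] + w[1] * x


x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0
E, errors = s_fold_cv(x, y, fit_line, predict_line, S=5)
```

Every sample is used for testing exactly once and for training in the remaining S-1 rounds, which is why the scheme suits small datasets.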
  2.2.5. Hyperparameter Optimization
In machine learning, and particularly in the context of multi-hole pressure probe data processing, the performance and generalization capability of regression models are heavily influenced by their hyperparameters. These are parameters that are not learned from the data but are set prior to the training process. Examples include the regularization strength and kernel coefficients. The process of finding optimal hyperparameters is crucial yet challenging, and is often referred to as the “outer loop” of machine learning. In this study, hyperparameter optimization ensured that each regression model achieved its best possible performance, thereby allowing for a fair comparison between the different modeling approaches.
This study employed the Optuna framework for hyperparameter optimization, leveraging the tree-structured Parzen estimator (TPE) algorithm for efficient tuning [31]. TPE, a form of sequential model-based optimization, builds a probabilistic model of the objective function to select promising hyperparameters for evaluation. It is particularly effective for high-dimensional and conditional hyperparameter spaces [32]. The Hyperband algorithm was used for pruning, adaptively allocating resources to promising configurations and terminating poor performers early [33]. This combination of TPE sampling and Hyperband pruning allowed efficient exploration of the hyperparameter space by stopping unpromising trials automatically and concentrating computational resources on the remaining ones.
The hyperparameter optimization process for each regression model in this study followed these steps:
- Definition of the hyperparameter search space specific to each model, including parameters such as regularization strengths and kernel coefficients. 
- Specification of the objective function, typically minimizing the mean squared error on a testing set. 
- Configuration of the Optuna study with the TPE sampler and hyperband pruner. 
- Execution of the optimization process for the given trials, balancing exploration with computational constraints. 
- Selection of the best hyperparameters based on the lowest testing error. 
  2.3. Regression Model Validation
The proposed methodology was rigorously validated using the McCormick function, a well-established benchmark in optimization and machine learning research [34,35]. This bivariate function, defined by Equation (16) and visualized in Figure 3, serves as an ideal proxy for evaluating regression model performance, due to its mathematical properties that closely parallel the complexities encountered in multi-hole pressure probe measurements.
$$f(x_1, x_2) = \sin(x_1 + x_2) + (x_1 - x_2)^2 - 1.5x_1 + 2.5x_2 + 1 \tag{16}$$
The McCormick function exhibits several key mathematical characteristics that made it particularly suitable for validating the proposed approach. Its structure combines trigonometric coupling through $\sin(x_1 + x_2)$, quadratic interaction via $(x_1 - x_2)^2$, and linear terms $-1.5x_1 + 2.5x_2 + 1$, creating a non-convex surface with multiple local minima. This mathematical composition generates a smooth, continuous, and differentiable surface that effectively simulates the complex pressure distributions encountered in turbomachinery flows. The variable coupling in this function mirrors the interdependent nature of flow parameters in probe measurements, where changes in one parameter often influence others in nonlinear ways. Additionally, its well-defined analytical form enables precise quantification of regression performance, allowing for rigorous assessment of both model accuracy and generalization capability across different operating conditions.
To generate a comprehensive modeling validation dataset, the McCormick function was evaluated over a $50 \times 50$ grid spanning the domain
. This resulted in a dataset comprising 2500 data points. In addition, to simulate real-world measurement conditions and assess the models’ resilience to noise, Gaussian noise was introduced into every point in the dataset. The noise was sampled from a normal distribution with a mean of 0.5 and a standard deviation of 2.0.
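A dataset of this kind can be generated with a few lines of NumPy. The sketch below uses the standard form of the McCormick function and the noise parameters stated above. The grid bounds are an assumption: they are the commonly used McCormick search domain, not necessarily the domain used in this study. The 50 by 50 layout is inferred from the 2500 points, and the random seed is arbitrary.

```python
import numpy as np


def mccormick(x1, x2):
    """Standard McCormick test function."""
    return np.sin(x1 + x2) + (x1 - x2) ** 2 - 1.5 * x1 + 2.5 * x2 + 1.0


# Assumed bounds (commonly used McCormick domain, not taken from the study).
x1 = np.linspace(-1.5, 4.0, 50)
x2 = np.linspace(-3.0, 4.0, 50)
X1, X2 = np.meshgrid(x1, x2)
inputs = np.column_stack([X1.ravel(), X2.ravel()])   # 2500 rows, 2 columns
clean = mccormick(inputs[:, 0], inputs[:, 1])

# Gaussian noise with mean 0.5 and standard deviation 2.0, added to every point.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(loc=0.5, scale=2.0, size=clean.shape)
```

Because the noise variance is 4.0, a model that recovers the underlying surface well can be expected to show a test MSE in the neighborhood of that value, with the constant offset of 0.5 absorbed by the intercept.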
        
This study applied hyperparameter optimization to all models, including the traditional polynomial regression method, demonstrating the compatibility of the proposed approach with conventional modeling frameworks. In the hyperparameter optimization process, the number of trials was set to 500. Table 2 presents the hyperparameter configuration for the regression models used in this study. For all models, the parameter $S$ represents the number of S-fold cross-validation subsets, ranging from 15 to 30. The polynomial order, applicable to polynomial, lasso, and ridge regression and to KRR with the polynomial kernel, ranged from 4 to 15, enabling the models to capture increasingly non-linear relationships in the data.
For lasso, ridge, and KRR, the regularization parameter $\lambda$ was explored within the range [0.01, 1000], allowing for a wide spectrum of regularization strengths. The larger the value of $\lambda$, the stronger the regularization.
KRR includes additional kernel-specific parameters. For the RBF kernel, the $\gamma$ parameter, which influences the kernel’s width, was optimized within [0.01, 500]. The polynomial kernel for KRR shared the same $\gamma$ range but also included the polynomial order and the constant term $c$. The order matched the range used in the other polynomial models (4 to 15), while $c$ was explored from 0 to 20.
Table 3 summarizes the optimal hyperparameters and corresponding performance metrics for each model, while Figure 4 illustrates the MSE, error ratio (ER), and $R^2$ values for both the training and test subsets across the different S-fold cross-validation configurations. As shown in Figure 4a, the MSE values fluctuated with varying S, ranging mostly between 3.5 and 5.5. Test errors were generally higher than training errors, indicating some degree of overfitting. KRR exhibited relatively smaller fluctuations compared to the other models. Figure 4b demonstrates the consistently high $R^2$ values (above 0.97) for all models, with training $R^2$ generally exceeding test $R^2$.
The ER, defined as the ratio of test MSE to training MSE (Equation (17)), served as an overfitting indicator.
$$\mathrm{ER} = \frac{\mathrm{MSE}_{\mathrm{test}}}{\mathrm{MSE}_{\mathrm{train}}} \tag{17}$$
Figure 4c shows ER values fluctuating between 0.8 and 1.4. Ridge regression displayed some of the highest ER peaks, particularly around S = 10, while KRR exhibited high ER variability, suggesting sensitivity to the choice of S. As per Table 3, all models had ER values slightly above 1, indicating mild overfitting. Notably, although the polynomial regression (ERM approach) showed the lowest ER among the models, it exhibited the highest standard deviation of test MSE. This contrast suggests that, while the ERM method may appear to have had less overfitting on average, it suffered from poor stability across the different cross-validation configurations. The high variability in test MSE indicates that the ERM approach was the least stable among the methods compared, highlighting a significant drawback of this approach in handling diverse data scenarios.
The SRM-based models (lasso, ridge, and KRR) consistently outperformed the ERM-based polynomial regression model across the various metrics. This superior performance of the SRM models can be attributed to their ability to balance model complexity and empirical risk, leading to better generalization. Specifically,
        
- Lower MSE: All SRM models achieved a lower average test MSE than the polynomial regression model, with ridge regression showing the best performance (4.25432).
- Higher $R^2$: The SRM models demonstrated higher $R^2$ values, indicating a better fit and explanatory power. Ridge regression again led with an $R^2$ of 0.98047.
- Stability: KRR, an SRM approach, exhibited the most stable performance with the lowest standard deviation for test MSE (0.56482), suggesting better robustness across the different cross-validation configurations.
- Overfitting Control: While all models showed mild overfitting (ER > 1), the SRM models generally maintained low ER values, indicating good generalization ability.
In general, the ridge regression model marginally outperformed the others, achieving the lowest average test MSE (4.25432) and highest $R^2$ (0.98047). However, the KRR model demonstrated the most stable performance with the lowest standard deviation of test MSE (0.56482). The optimal configurations favored moderate complexity, with S ranging from 19 to 27 and polynomial orders between 7 and 9 across all models.
The results underscore the advantages of SRM approaches in regression tasks, particularly when dealing with complex relationships and limited data. The ability of SRM models to introduce regularization and control model complexity proves beneficial in achieving a balance between fit and generalization. This validation study demonstrated the value of comprehensive model evaluation and hyperparameter tuning in regression tasks. It reinforces the “no free lunch” (NFL) theorem’s assertion that successful model selection requires experimentation with various algorithms and careful consideration of the specific problem at hand. While each model exhibited strengths in different performance aspects [36,37], the overall superiority of the SRM approaches in this context was evident, highlighting the importance of dataset-specific model selection and a data-driven approach in machine learning.
Through these validation results, the generalization ability, accuracy, and specific model selection of the regression model framework proposed in this paper were effectively verified. This validation laid a solid foundation for the subsequent multi-hole pressure probe data processing. By establishing the robustness and flexibility of the modeling approach across different scenarios, with a particular emphasis on the advantages of SRM methods, it ensured that the framework can be confidently applied to the complex task of analyzing multi-hole pressure probe data, where accurate and reliable regression models are crucial for interpreting the fluid dynamic measurements.
        
  2.4. Linear Cascade Experiment Setup and Probe Data Processing Process
This study utilized a Z-type five-hole pressure probe for flow measurements. Figure 5 illustrates the structural form and geometric definition of the probe. Compared to conventional L-type probes, a Z-type probe offers an additional degree of freedom (DoF) in measurement, resulting in a total of two measurement DoFs. This enhanced flexibility allows for more adaptive positioning at the test outlet in terms of both location and orientation. However, it is important to note that this design typically limits measurements to approximately half of the blade height flow field.
The probe configuration, as shown in Figure 5, is as follows:
- Hole 2 serves as the center hole 
- Holes 1 and 3 define the yaw angle  
- Holes 4 and 5 define the pitch angle  
The probe data processing framework, designed for efficient regression analysis, is illustrated in Figure 6. This framework comprises two primary phases: regression model construction and model application.
The regression model construction phase consisted of four key layers:
- Data Layer: This utilized the calibration data as the foundation for model training. 
- Algorithm Layer: This layer implemented various regression algorithms using the scikit-learn library. 
- Optimization Layer: This employed the Optuna framework for hyperparameter optimization, to enhance model performance. 
- Storage Layer: This layer managed and stored the models and related data using the MySQL database. 
Upon completion of model construction, the “best model” was selected based on performance metrics. This optimized model was then transferred to the probe regression model application phase, which comprised three layers:
- Data Layer: This received and processed the experimental data. 
- Algorithm Layer: This layer applied the trained scikit-learn model for predictions. 
- Storage Layer: This utilized the MySQL database to store the prediction results and relevant information. 
This layered architecture ensured efficient and reliable data processing, while providing excellent scalability and maintainability. By separating the model construction and application, the framework can adapt to various types of probe data and support continuous model optimization and updates. Furthermore, the use of standardized tools and libraries (e.g., scikit-learn, Optuna, and MySQL) enhances the system stability and interoperability, allowing for seamless integration into broader data analysis and experimental management systems.
In summary, this framework offers a robust and flexible solution for probe data processing, enabling researchers to efficiently develop, optimize, and apply regression models in their experimental workflows.
  2.4.1. Probe Calibration: Regression Model Construction
Accurate probe calibration is crucial for reliable flow measurements. The probe calibration was conducted in the Dalian DLH-02 Wind Tunnel, with the calibration facility shown in Figure 7. The calibration setup provided precise control over both pitch and yaw angles. The probe was fixed on the axis O1-O1, controlling its pitchwise motion, which intersected with axis O2-O2 at a
 angle, controlling the yaw-wise motion. The angle control mechanism achieved an accuracy of 
, ensuring precise positioning during calibration.
The pressure sensors used in the calibration had a range of 80–120 kPa, with an accuracy of  full scale, providing highly accurate pressure measurements. The calibration process covered a comprehensive range of angles and flow conditions. Both  and  angles were varied from  to  in increments of . Flow conditions were tested across Mach numbers ranging from  to , with increments of 0.1, resulting in 8 distinct operating conditions. This thorough calibration procedure resulted in a total dataset size of 1352 points.
Equation (18) defines the key parameters used in the five-hole probe calibration. In this equation, the subscripts 1 to 5 represent the pressures measured at the corresponding probe holes.
 denotes the upstream total pressure, 
 the upstream static pressure, and 
 the average pressure of holes 1, 3, 4, and 5. The equation introduces four crucial coefficients: 
 (yaw coefficient), 
 (pitch coefficient), 
 (total pressure coefficient), and 
 (static pressure coefficient). These coefficients were fundamental for characterizing the probe’s performance and formed the basis for the regression model construction.
          
Figure 8 illustrates the probe calibration maps derived from Equation (18), providing a visual representation of the probe’s performance across the various flow conditions.
The core challenge in constructing the regression model for the calibration map was to establish the system of equations represented in Equation (19). This system related the measured pressure coefficients to the flow angles and Mach number, forming the basis for the flow parameter predictions. The modeling methods, processes, and hyperparameter selection and optimization were consistent with those described in Section 2.1 and Section 2.3. The best performing model was selected based on the comprehensive benchmark results, ensuring the optimal accuracy and generalization capability.
          
  2.4.2. Linear Turbine Cascade Experiment: Regression Model Application
The linear turbine cascade experiment was conducted in the Wide-Speed-Range and Variable-Density Continuous Wind Tunnel at the Southern University of Science and Technology (SUSTech). The experiment was performed in an open-loop configuration, with exhaust directed to the atmosphere.
The test object for this experiment was the VKI-RG (also known as VKI-LS59) cascade, which is a well-documented turbine blade profile [38]. This cascade was selected due to its extensive documentation and relevance to high-subsonic turbine applications. The key parameters of the VKI-RG cascade are presented in Table 4. This table provides the essential geometric and aerodynamic characteristics of the cascade, including the chord length, blade height, solidity, stagger angle, and inlet flow conditions. The operating condition for this study was selected with an exit isentropic Mach number of approximately 0.8, representing high-subsonic flow conditions. This condition was particularly relevant for studying compressibility effects in turbine cascades.
The primary objective of the experiment was to measure the wake at the mid-span position of the VKI-RG blade. As illustrated in Figure 9, the probe measurements covered three blade passages from #3 to #5. Data were collected at 102 points, with a spatial interval of 1.25 mm.
The data acquisition process was based on steady-state measurements. After positioning the probe at each measurement location, there was a 3 s wait period, followed by 1 s data collection at a frequency of 10 Hz. Simple time averaging was applied to the collected data. The pressure acquisition device used had a range of 0–300 kPa with an accuracy of  full scale.
The wake measurement plane was positioned at a distance of  from the trailing edge. The probe was installed at an angle of  relative to the measurement plane. Consequently, this installation angle needed to be added to the processed  results to obtain the actual outlet flow angle .
The application of the calibration regression model for subsonic probe measurements, as illustrated in Figure 10, involved an iterative process to accurately interpret the experimental data. This process can be outlined as follows:
        This iterative approach ensured that the interdependencies between the flow parameters were properly accounted for, leading to a more accurate interpretation of the probe measurements. The process leveraged the calibrated regression models 
 and 
 to translate the raw pressure data into meaningful flow characteristics, including flow angles and Mach number. This robust methodology was expected to provide high-fidelity measurements of the complex flow field in the turbine cascade wake.