URBaM: A Novel Surrogate Modelling Method to Determine Design Scaling Rules for Product Families
Abstract
1. Introduction
2. URBaM Modelling Method
- First, the f1,1,1(x1) function is defined in L1. Here, the DOE data are arranged into data groups. Within each group, the design points must differ only in their x1 value, while the remaining design variables are held constant (xz = const., where z ∈ {i + 1…p}) (Step 1.1). Creating data groups with these characteristics requires a full factorial DOE; not all DOE types in the literature can be used for creating URBaM models.
- After the data groups are defined, a univariate regression function is selected (Step 1.2) and each data group is modelled with it. The applied regressions relate the x1 design variable to the output variable y by means of simple univariate (2D) regression functions, and the unknown coefficients are determined by the least squares method (Step 1.3).
- In order to evaluate in later steps how well the applied regression fits the data groups, the coefficient of determination, R2, is recorded together with the number of unknown coefficients that the applied regression function requires to be calculated by the least squares method (Step 1.4). The regressions are applied to each data group (Loop 1.1).
- Multiple simple univariate regression types are available in the literature, and the URBaM model must select one of them prior to defining f1,1,1(x1). Table 3 sets out the types of regression function considered by the URBaM surrogate modelling technique. In order to select the most appropriate function, the model evaluates all the regression functions in Table 3. To this end, the URBaM model repeats the following procedure: (i) it takes one of these functions and applies it to each of the design point groups created in the node, (ii) it calculates the R2 of each data group and determines the mean value, and (iii) it identifies the number of unknown coefficients of the regression function. The same process is repeated with all functions in Table 3 (Loop 1.2). The best option is the function with a high correlation between the regressed variables (and thus a high R2) that, at the same time, does not require a high number of unknown coefficients, in order to avoid overfitting. In other words, the URBaM method selects the function that combines the fewest unknown coefficients with the highest R2 value, taking 0.95 as the minimum acceptable R2 value (Step 1.5).
Figure 2. The steps followed by the URBaM modelling technique.
Table 3. Type of regression functions used by the URBaM model.
Model | Regression Function |
Linear regression | |
Second-order polynomial regression | |
Third-order polynomial regression | |
Logarithmic regression | |
Exponential regression | |
Power regression | |
- The regression function selected by the URBaM method becomes the f1,1,1(x1) function in the root node (N1,1,1). Figure 3a shows an example in which a second-order polynomial regression function is selected for one of the defined design point data groups. Following the same example, Figure 3b shows how the number of slave nodes created from the master node N1,1,1 is the same as the number of unknown coefficients determined by the least squares method in f1,1,1(x1). In this case, the function has three unknown coefficients, so three slave nodes are created from the master node (Step 1.6).
Figure 3. (a) Example of a second-order polynomial regression fitting in N1,1,1 and (b) its master/slave node configuration.
- In each node multiple design point data groups are modelled individually; thus, multiple yi+1,j,t coefficients will be determined. All of them will comprise the yi+1,j,t output variable. Following the example in Figure 3a, the A coefficients obtained in the applied regression of each data group will comprise the y2,1,1 vector, and the B and C coefficients will comprise the y2,2,1 and y2,3,1 vectors, respectively.
- The nodal structure in the layer L2 is defined as a consequence of the selection of f1,1,1(x1) function and its unknown coefficients. Thus, the next step consists of defining all the f2,j,t(x2) functions of the created nodes in the layer L2 (Step 2.1). To this end, the obtained y2,j,t values in L1 are related to each of the corresponding nodes in L2 (Figure 3b). In each node, the y2,j,t values are related to the x2 design variable by simple univariate regression functions, f2,j,t(x2). The procedure that must be followed for defining f2,j,t(x2) is the same used in the definition of the f1,1,1(x1) function, but defining the new regressions using x2 and y2,j,t values (Step 2.2). This process is repeated in all nodes of L2. As a result, all f2,j,1(x2) functions, all the yi+1,j,t vector values, and the L3 nodal structure N3,j,t are determined (Loop 2.1). This process is repeated throughout all the layers, until the last layer Li is reached, where i = p (Loop 2.2).
- In the last layer Lp, the xp design variable is related to yp,j,t by the fp,j,t(xp) regression function in each Np,j,t node of the layer, following the same procedure. However, in this last layer, the unknown coefficients determined by the least squares method in the fp,j,t(xp) functions are not used for building new nodes. These last coefficients, termed Cj,t, are constant terms that the URBaM model uses as inputs for predicting new x test points.
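The layered procedure described in the steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes NumPy, restricts the candidate functions to the three polynomial types (the logarithmic, exponential, and power regressions are omitted for brevity), picks the lowest-order candidate whose mean R2 reaches 0.95, and uses the hypothetical names `mean_r2`, `select_fit`, `fit_node`, and `predict`. The test function and its levels are invented for illustration.

```python
import numpy as np

R2_MIN = 0.95  # minimum mean R2 quoted in Step 1.5


def mean_r2(x, groups, deg):
    """Fit one polynomial of degree `deg` per data group (one column each).

    Returns the least-squares coefficients and the mean R2 over the groups.
    """
    coefs = np.polyfit(x, groups, deg)              # shape (deg + 1, n_groups)
    fitted = np.vander(x, deg + 1) @ coefs
    ss_res = np.sum((groups - fitted) ** 2, axis=0)
    ss_tot = np.sum((groups - groups.mean(axis=0)) ** 2, axis=0)
    # Assumption: a numerically constant group is fitted exactly, so R2 = 1.
    flat = ss_tot <= 1e-12 * (1.0 + np.sum(groups ** 2, axis=0))
    r2 = np.where(flat, 1.0, 1.0 - ss_res / np.where(flat, 1.0, ss_tot))
    return coefs, float(r2.mean())


def select_fit(x, groups):
    """Try polynomial orders 1 to 3 and keep the simplest acceptable one."""
    best = None
    for deg in range(1, min(3, len(x) - 1) + 1):    # needs at least 2 levels
        coefs, r2 = mean_r2(x, groups, deg)
        if r2 >= R2_MIN:
            return coefs
        if best is None or r2 > best[0]:
            best = (r2, coefs)
    return best[1]                                  # fallback: highest mean R2


def fit_node(levels, values):
    """Build the node tree; `values` is the full factorial output grid."""
    x = np.asarray(levels[0], dtype=float)
    groups = values.reshape(len(x), -1)             # one column per data group
    coefs = select_fit(x, groups)
    if len(levels) == 1:
        return coefs[:, 0]                          # last layer: constant terms
    rest = values.shape[1:]
    # Each coefficient becomes the output variable of one slave node.
    return [fit_node(levels[1:], c.reshape(rest)) for c in coefs]


def predict(node, point):
    """Evaluate the tree from the last layer back up to the root node."""
    if isinstance(node, np.ndarray):
        return float(np.polyval(node, point[0]))
    coefs = [predict(child, point[1:]) for child in node]
    return float(np.polyval(coefs, point[0]))


# Hypothetical two-variable full factorial DOE (5 x 4 design points).
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
x2 = np.array([0.0, 1.0, 2.0, 3.0])
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
Y = (2.0 + X2) * X1 ** 2 + 3.0 * X1 + X2 ** 2

model = fit_node([x1, x2], Y)
print(len(model), predict(model, (0.5, 1.5)))
```

In this toy case a second-order polynomial is selected in the first layer, so three slave nodes are created, and the prediction at (0.5, 1.5) can be compared with the analytic value of 4.625.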
3. Validation of the URBaM Modelling Method
3.1. Problem Classification and Surrogate Model Evaluation Metrics
3.1.1. Problem Non-Linearity Definition Metric
3.1.2. Surrogate Model Accuracy Metrics
3.2. Case Studies
- Case study 1: Stem dovetail dimensioning in a downstream slab valve family. Slab valves in downstream applications are used in the transportation of high volumes of fluid. The valve size, Db, ranges from 1″ to 46″, and the valves may work at pressure ratings from 300 lbs up to 2500 lbs. When the valve is in a closed position, the upstream chamber of the valve is pressurized, but not the downstream chamber. The pressure difference generates a normal force between the seat and the gate that must be borne by the valve opening mechanism during maneuvering. One of the critical areas in the maneuvering mechanism is the stem dovetail, the failure of which would render the valve inoperable. Therefore, the aim of this case study is to generate a surrogate model to correctly size the stem dovetail. For this purpose, the maximum equivalent Von Mises stress in the dovetail was set as the output variable of the analysis, while the dovetail arm thickness (h1), the dovetail arm cantilever distance (b1), and the thickness of the dovetail core (t1) were considered the design variables (see Case 1 in Table 5). Thus, a four-design-variable (h1, b1, t1, and Db) full factorial DOE with 832 design points was set for a single pressure rating, P, of 900 lbs. The non-linearity analysis resulted in R2 = 0.33, a high non-linearity (HNL)-level problem.
- Case study 2: Structural bolt dimensioning in a flanged joint of a high-pressure split body ball valve family. The body of this type of valve is divided into two parts that are assembled by means of a flanged joint. The analytical equations correctly size the bolting so as to bear the tensile loads resulting from preloading and from the high internal pressure during the loading stage. This pressure can range from 5000 psi to 15,000 psi, in valve sizes that range from 1 13/16″ to 16 3/4″. However, inadequate flange dimensioning may cause the flange to deflect and consequently couple the tensile loads in the bolting with a bending effect that may compromise the integrity of the joint. Therefore, the aim of this case study is to generate a surrogate model to correctly dimension the flange and the bolting, preventing excessive flange deflections. For this purpose, the maximum deflection angle in the flange was set as the output variable of the analysis, while the bolt metric (Dm), flange thickness (h2), and cantilever distance of the flange (b2) were considered the design variables (see Case 2 in Table 5). Thus, a four-design-variable (Dm, h2, b2, and Db) full factorial DOE with 320 design points was set for a single pressure rating, P, of 5000 psi. The non-linearity analysis classified the present problem as an HNL-level case, due to the value of the coefficient of determination R2 = 0.43.
- Case study 3: Slab dovetail dimensioning in a downstream slab valve family. This case study corresponds to the same valve family as case study 1. However, in this case, the female dovetail of the slab is dimensioned instead of the stem dovetail. Again, the maximum equivalent Von Mises stress in the dovetail was set as the output variable of the analysis, while the dovetail arm thickness (h3), the dovetail arm cantilever distance (b3), and the lateral thickness of the dovetail (t3) were considered the design variables (see Case 3 in Table 5). Thus, a four-design-variable (h3, b3, t3, and Db) full factorial DOE with 832 design points was set for a single pressure rating P of 300 lbs. The non-linearity analysis classified the present problem as a medium non-linearity (MNL)-level case, due to the value of the coefficient of determination R2 = 0.62.
- Case study 4: Stem–ball joint dimensioning for a high-pressure ball valve family. As in case study 2, due to the pressure difference between the upstream chamber and the downstream chamber, high friction forces appear in this case between the ball and the sealing seats. In addition, the resultant forces may be extremely high, as the internal pressure may rise up to 5000–15,000 psi. The friction force combined with the ball diameter generates a resistance torque that has to be borne by the maneuvering mechanism. In particular, one of the critical areas in this mechanism is the trunnion stem–ball joint. Therefore, the aim of the present case study is to generate a surrogate model to correctly dimension the stem–ball joint. For this purpose, the maximum equivalent Von Mises stress was set as the output variable of the analysis, while the trunnion diameter (Dt), the groove height (h4), and the groove width (t4) were considered the design variables (see Case 4 in Table 5). Thus, a three-design-variable (Dt, h4, and t4) full factorial DOE with 64 design points was set for a pressure rating P of 5000 psi and a valve size Db of 2 9/16″. The non-linearity analysis classified the present problem as an MNL-level case, due to the value of the coefficient of determination, R2 = 0.76.
- Case study 5: Flange dimensioning in a bonnet to body joint for a high-pressure ball valve family. This case study is similar to case study 2. However, as commercial valve actuators for valve maneuvering are assembled in the bonnet, bolting diameter and positioning cannot vary in the present case. Therefore, the aim of this case study is to generate a surrogate model to correctly dimension the flange thickness to prevent undesired bending deflections. For this purpose, the deflection angle was set as the output variable of the analysis, while the flange thickness (h5) and the cantilever distance of the bonnet (b5), together with the size of the valve Db were considered the design variables (see Case 5 in Table 5). Thus, a three-design-variable (h5, b5, and Db) full factorial DOE with 112 design points was set for a single pressure rating P of 5000 psi. The non-linearity analysis classified the present problem as a low non-linearity (LNL)-level case, due to the value of the coefficient of determination R2 = 0.85.
- Case study 6: Gate–Seat dimensioning for a high-pressure slab valve family. This case study corresponds to the same valve type defined in Case 1, but for highly pressurized and subsea slab valve families. The present case study analyzes the closure mechanism between the slab and the seat of a valve, where internal sealing must be ensured when the valve is pressurized (up to 5000–15,000 psi) in a closed position for valve sizes that range from 1 13/16″ to 7 1/16″. For this purpose, the contact pressure between the gate and the seat was set as the output variable of the analysis, while the contact thickness of the seat (t6), together with the size of the valve Db, were considered the design variables (see Case 6 in Table 5). Thus, a two-design-variable full factorial DOE with 16 design points was set for a single pressure rating P of 15,000 psi. The non-linearity analysis classified the present problem as an LNL-level case, due to the value of the coefficient of determination R2 = 0.96.
Table 5. Selected case studies: studied area, output variable of interest, design variables, and non-linearity level are detailed.
Case | Valve Type | Studied Area | Output Variable | R2 | Non-Linearity Level |
1 | Downstream Slab Valve | Stem dovetail | Dovetail equivalent Von Mises stress | 0.33 | High non-linearity (HNL) |
2 | High-Pressure Ball Valve | Split body flanged joint | Split body flange bending angle | 0.43 | HNL |
3 | Downstream Slab Valve | Slab dovetail | Dovetail equivalent Von Mises stress | 0.62 | Medium non-linearity (MNL) |
4 | High-Pressure Ball Valve | Stem–ball joint | Von Mises equivalent stress | 0.76 | MNL |
5 | High-Pressure Subsea Ball Valve | Bonnet flanged joint | Bonnet flange bending angle | 0.85 | Low non-linearity (LNL) |
6 | High-Pressure Slab Valve | Gate–Seat closure mechanism | Gate–Seat contact pressure | 0.96 | LNL |
3.3. Configuration of Surrogate Models
- Second-Order Polynomial: From all the possible polynomial model configurations, the full second-order polynomial model, which considers all the possible interactions between design variables, is the most widely used polynomial model configuration [53]. Therefore, this configuration was selected.
- Artificial Neural Networks (ANNs): One-hidden-layer ANN models are the most widely used [48]. However, there is no established rule for defining the optimum number of nodes in the hidden layer. Thus, three arbitrary ANN configurations were selected: (i) 3 nodes, (ii) 5 nodes, and (iii) 10 nodes.
- Radial Basis Function (RBF): The RBF model can be configured according to several types of basis functions: linear, cubic, multiquadric, inverse multiquadric, thin plate spline, and Gaussian. Several authors highlight the potential of multiquadric and inverse multiquadric functions for building accurate RBF models [54,55,56]. Moreover, the effect of the shape parameter c, which governs the RBF functions, has been studied by different authors. E. Acar and M. Rais-Rohani [41] proposed a c = 1 configuration, while M. J. Colaço et al. [57] recommended a c = 1/N configuration, where N corresponds to the number of design points. Therefore, four configurations of the RBF modelling technique were selected: (i) multiquadric with c = 1, (ii) multiquadric with c = 1/N, (iii) inverse multiquadric with c = 1, and (iv) inverse multiquadric with c = 1/N.
- Kriging: From all possible Kriging model configurations, Universal Kriging is one of the most widely used due to its high potential [24]. In particular, Kriging models are commonly configured using a Gaussian correlation function with a pu exponent coefficient of 2 [22,48,58]. In addition, the correlation parameter θu can be defined as a constant or as a variable unknown term for each design variable. Simpson et al. [18] reported that sufficiently good results can be obtained by fixing the correlation parameter as a constant. However, defining a variable correlation parameter is more common than fixing it as a constant. Therefore, in the present work, two Universal Kriging configurations were selected: (i) constant correlation parameter and (ii) variable correlation parameter.
- Multivariate Adaptive Regression Splines (MARS): The main characteristic of the MARS modelling method developed by J. H. Friedman [31] is that the modelled domain is divided into different subdomains by means of knots. Each subdomain can be modelled by linear or cubic piecewise functions. According to [59], the use of a cubic function configuration provides more accurate results for non-noisy data. In the current work, the maximum number of subdomains, Maxsub, was limited according to [60]: Maxsub = min(200, max(20, 2k)), where k is the number of design variables. In addition, as FEM results are considered non-noisy data, the MARS model was configured based on cubic piecewise functions.
- Random Forest (RF): In the current work, each tree is built with 1/3 of the total design variables (NDV) selected randomly according to [61], the number of leaves of each tree NL ≥ 1 [34], the minimum number of design points to split each node in the tree NS ≥ 5 [61], and a standard deviation threshold of 5% in the model response to split a node [62]. Different numbers of trees were evaluated to ensure that the predictions of the RF models converged to a stable value. Specifically, each case study was checked for 100, 200, and 500 trees. It was found that the results converged over 200 trees in all the studied cases. Therefore, the results for 500 trees were used as reference for validation purposes.
- Support Vector Machine Regression (SVMR): From all the possible SVMR configurations, the ε-insensitive model, derived from the machine learning field, is typically used in numerical regression problems [28,29]. In particular, SVMR models that use linear loss functions and Gaussian kernels for non-linear transformations are among the most widely employed in the literature. Therefore, this configuration was selected.
- Standalone Ensemble model based on Penalized Predictive Score weighting Genetic Algorithm (PPS-GA): According to M. B. Salem and L. Tomaso [38], the ensemble models based on Genetic Algorithm (GA) techniques together with a Penalized Predictive Score (PPS) weighting criterion are among the models that present the highest accuracy. In particular, the weighting parameter values determined by the PPS method are obtained minimizing (i) the RMSE of the design points, (ii) the k-fold cross-validation PRESS value, and (iii) the overfitting penalization. The ensemble models are typically built based on second-order polynomial, Kriging, RBF, Gaussian Process, SVMR, or Moving Least Square [22] standalone models [37,38,41,42]. Therefore, in the present work, different configurations of second-order polynomial, Kriging, SVMR, and Moving Least Square standalone models, detailed in Table 6, are considered to build ensemble models.
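To make the RBF configurations listed above more concrete, the following minimal sketch fits an interpolating RBF model with the two shape parameter choices, c = 1 and c = 1/N. It is only an illustration under stated assumptions: it uses NumPy, it assumes the common forms φ(r) = sqrt(r² + c²) for the multiquadric and 1/sqrt(r² + c²) for the inverse multiquadric (the exact expressions used in the study may differ), and the function names `rbf_fit` and `rbf_predict` and the sample data are hypothetical.

```python
import numpy as np


def rbf_fit(X, y, c, inverse=False):
    """Solve Phi w = y for an interpolating (inverse) multiquadric RBF model."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    phi = np.sqrt(sq_dist + c ** 2)        # assumed multiquadric form
    if inverse:
        phi = 1.0 / phi                    # assumed inverse multiquadric form
    weights = np.linalg.solve(phi, np.asarray(y, dtype=float))
    return X, weights, c, inverse


def rbf_predict(model, x_new):
    """Evaluate the fitted RBF model at a single new point."""
    X, weights, c, inverse = model
    sq_dist = np.sum((X - np.asarray(x_new, dtype=float)) ** 2, axis=-1)
    phi = np.sqrt(sq_dist + c ** 2)
    if inverse:
        phi = 1.0 / phi
    return float(phi @ weights)


# Hypothetical design points and responses (N = 5).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = [0.0, 1.0, 1.0, 2.0, 0.75]
N = len(X)

# The four configurations: (inverse) multiquadric with c = 1 and c = 1/N.
for inverse in (False, True):
    for c in (1.0, 1.0 / N):
        model = rbf_fit(X, y, c, inverse)
        worst = max(abs(rbf_predict(model, p) - t) for p, t in zip(X, y))
        print(inverse, c, worst)
```

Because the model interpolates, the error at the design points should be at the level of floating-point round-off for all four configurations; the configurations differ in how they behave between design points.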
4. Results and Discussion
4.1. URBaM Method Evaluation
- In Cases 1 (HNL), 3, and 4 (MNL), where the equivalent stress is evaluated, as well as in Case 6 (LNL), where the contact pressure is evaluated, most of the validation points present an error within ±10%, independent of the non-linearity level. In particular, all points in Case 6 are within the ±10% error range. However, multiple validation points present an error in the ±10–20% range in Cases 1, 3, and 4, and the 20% error limit was surpassed by a few validation points in these cases. In general, the maximum errors occur for lower output variable magnitudes, as these are more sensitive to any deviation when expressed as a percentage.
- Most of the validation points in Case 2 (HNL) were below the ±10% error curve, and in Case 5 (LNL) below the ±20% error curve. However, in these cases, the evaluated output variable corresponds to the deflection angle in degrees, which leads to very low-magnitude values (<2°). Therefore, the predictions are more sensitive to any deviation, and they yield higher percentage errors.
Figure 6. Comparison of the FEM and the URBaM prediction results of the validation points.
- The MAPE value remained stable near the 10% value in all cases except in Case 2 (HNL). In that case, the error reached the 20% value. This error percentage increase is attributed to the fact that several output values are close to zero.
- The NRMSE metric remained stable close to 0.25 for all case studies.
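As a reading aid for the metrics discussed here, the sketch below computes MAPE, NRMSE, and RMAE with the Python standard library. The definitions are assumptions based on common usage in the surrogate modelling literature: MAPE as the mean of |y − ŷ|/|y| in percent, and NRMSE and RMAE as the RMSE and the maximum absolute error divided by the population standard deviation of the reference values. The normalisation used in the study may differ, and the sample values are hypothetical.

```python
import math
from statistics import pstdev


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def nrmse(y_true, y_pred):
    """RMSE normalised by the standard deviation of the reference values (assumed)."""
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return rmse / pstdev(y_true)


def rmae(y_true, y_pred):
    """Maximum absolute error normalised by the same standard deviation (assumed)."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred)) / pstdev(y_true)


# Hypothetical reference (FEM-like) values and surrogate predictions.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 40.0]
print(mape(reference, predicted), nrmse(reference, predicted), rmae(reference, predicted))

# The same absolute deviation (0.1) weighs far more on a small output value.
small_output = mape([0.5], [0.6])
large_output = mape([50.0], [50.1])
print(small_output, large_output)
```

The last two lines illustrate why percentage errors grow for outputs close to zero, such as deflection angles below 2°: an identical absolute deviation gives a percentage error two orders of magnitude larger for the small output.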
4.2. Comparison with Well-Known Surrogate Model Configurations
4.2.1. Case 1 (HNL): Stem Dovetail Dimensioning in a Downstream Slab Valve Family
- The models that present an MAPE below 10% are the PPS-GA (5.4%), RBF_MQ_1/N (5.7%), SVMR (7.2%), UKRI_Const (7.6%), POL2 (8.6%), and URBaM (9.5%) models. In addition, the ANN_5 (12.4%), RF (13.7%), ANN_10 (18%), ANN_3 (18.1%), and MARS (19%) models present errors below the 20% criterion.
- The models that present an NRMSE below 0.5 are the PPS-GA (0.2), RBF_MQ_1/N (0.2), UKRI_Const (0.2), URBaM (0.3), SVMR (0.3), POL2 (0.3), RF (0.4), ANN_5 (0.4), and MARS (0.5) models.
- The models that report an RMAE below 1 are the PPS-GA (0.6), RBF_MQ_1/N (0.7), URBaM (1), and UKRI_Const (1) models.
4.2.2. Case 2 (HNL): Structural Bolting Dimensioning in a Flanged Joint of a High-Pressure Split Body Ball Valve Family
- All the analyzed models present an MAPE value higher than 20%. As previously stated, this fact is attributed to the magnitude of the output variable of Case 2, which is close to zero. The models that present MAPE values closest to the 20% limit are ANN_10 (20.2%) and URBaM (21.4%).
- The models that present an NRMSE below 0.5 are the URBaM (0.1), ANN_10 (0.1), SVMR (0.2), ANN_5 (0.2), RBF_MQ_1/N (0.2), PPS-GA (0.3), UKRI_Const (0.3), ANN_3 (0.3), MARS (0.3), POL2 (0.4), and RF (0.4) models.
- The models that report an RMAE below 1 are the ANN_10 (0.5), URBaM (0.6), UKRI_Const (0.8), PPS-GA (0.9), and RBF_MQ_1/N (0.9) models.
4.2.3. Case 3 (MNL): Slab Dovetail Dimensioning in a Downstream Slab Valve Family
- The models that present an MAPE below 10% are the RBF_MQ_1/N (3.4%) and SVMR (7%) models. However, several models are close to that value: the RF (10.3%), URBaM (10.6%), POL2 (11.4%), MARS (12%), and UKRI_Const (12.1%) models.
- The models that present an NRMSE below 0.5 are the RBF_MQ_1/N (0.1), URBaM (0.2), POL2 (0.2), UKRI_Const (0.2), SVMR (0.2), MARS (0.2), RF (0.2), ANN_3 (0.4), PPS-GA (0.5), and ANN_5 (0.5) models.
- The RMAE of the RBF_MQ_1/N (0.3) and URBaM (0.5) models are below 1.
4.2.4. Case 4 (MNL): Stem–Ball Joint Dimensioning for a High-Pressure Ball Valve Family
- All the analyzed models have an MAPE value higher than 10%. The models that present an error value within the 10–20% domain are the POL2 (11.3%), URBaM (12.9%), UKRI_Const (13.9%), UKRI_Var (14%), ANN_10 (14.2%), RBF_MQ_1/N (14.6%), RBF_MQ_1 (14.7%), PPS-GA (14.8%), RBF_IMQ_1 (14.9%), SVMR (15%), MARS (17.5%), ANN_5 (19.2%), and ANN_3 (19.7%) models.
- The NRMSE of all the analyzed models is below 0.5.
- The models that present an RMAE below 1 are the URBaM (0.5), POL2 (0.6), SVMR (0.7), PPS-GA (0.7), RBF_MQ_1 (0.7), RBF_MQ_1/N (0.7), RBF_IMQ_1 (0.7), UKRI_Var (0.8), UKRI_Const (0.8), RBF_IMQ_1/N (0.8), MARS (0.8), and ANN_3 (1) models.
4.2.5. Case 5 (LNL): Flange Dimensioning in a Bonnet to Body Joint for a High-Pressure Ball Valve Family
- The models that present an MAPE below 10% are the RBF_MQ_1/N (2.8%), UKRI_Const (2.9%), PPS-GA (3%), UKRI_Var (3.2%), SVMR (3.9%), POL2 (6.2%), MARS (6.2%), and URBaM (7.3%) models. However, several models are close to that value: RBF_IMQ_1/N (11.2%), RF (12%), and ANN_5 (12.8%).
- The models that present an NRMSE below 0.5 are the PPS-GA (0.1), UKRI_Var (0.2), UKRI_Const (0.2), SVMR (0.2), RBF_MQ_1/N (0.2), MARS (0.2), URBaM (0.3), POL2 (0.3), RBF_IMQ_1/N (0.4), ANN_10 (0.5), and RF (0.5) models.
- The models that report an RMAE below 1 are the PPS-GA (0.4), UKRI_Var (0.4), UKRI_Const (0.5), SVMR (0.6), RBF_MQ_1/N (0.7), MARS (0.7), and URBaM (0.9) models.
4.2.6. Case 6 (LNL): Gate–Seat Dimensioning for a High-Pressure Slab Valve Family
- The models that present an MAPE below 10% are the UKRI_Var (1.1%), UKRI_Const (1.1%), POL2 (1.3%), URBaM (1.5%), PPS-GA (1.7%), ANN_3 (1.7%), ANN_5 (1.7%), MARS (1.8%), ANN_10 (2.9%), SVMR (3%), RF (3.6%), and RBF_IMQ_1/N (4%) models.
- The models that present an NRMSE below 0.5 are the UKRI_Var (0.1), UKRI_Const (0.1), POL2 (0.1), URBaM (0.2), PPS-GA (0.2), ANN_3 (0.2), ANN_5 (0.2), MARS (0.2), ANN_10 (0.4), SVMR (0.4), and RF (0.5) models. The NRMSE of the RBF_IMQ_1/N (0.6) model is also close to 0.5.
- The models that present an RMAE below 1 are the POL2 (0.3), UKRI_Var (0.4), UKRI_Const (0.4), ANN_3 (0.4), URBaM (0.5), PPS-GA (0.5), ANN_5 (0.5), MARS (0.5), ANN_10 (0.7), SVMR (0.8), and RF (1) models. The RMAE of the RBF_IMQ_1/N (1.2) model is also close to 1.
4.2.7. Comparison Overview
5. Conclusions
- The URBaM model achieved average MAPE, NRMSE, and RMAE errors of 10.5%, 0.22, and 0.66, respectively, with case-to-case standard deviations of 6%, 0.07, and 0.2, demonstrating good capability in adapting to different non-linearity levels with a single configuration.
- The URBaM model was not the optimum choice in all cases, but it was the only model capable of accurately representing the six analyzed cases of different non-linearity levels. The model was always close to the optimum model, with average deviations in MAPE, NRMSE, and RMAE of 3.1%, 0.1, and 0.23, respectively.
- The URBaM model allows reliable scaling rules for mechanical product families to be determined efficiently. The cumbersome and time-consuming process of selecting the optimum surrogate model type and configuration is not required with the URBaM model when determining product family scaling rules, where the non-linearity level is unknown a priori.
- The accuracy level shown by the URBaM model allows the design–analysis iterations for a new family member to be reduced close to zero. Consequently, the design delivery time and the cost of the product are minimized, which is crucial in product delivery strategies where product dimensions are established by the customer, as in Engineer To Order (ETO) strategies.
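The MAPE figures summarised in these conclusions can be cross-checked against the per-case values listed in Sections 4.2.1 to 4.2.6. The short sketch below does this with the Python standard library; it assumes that the standard deviation is the population standard deviation over the six cases and that the lowest MAPE listed for each case corresponds to the optimum model for that case.

```python
from statistics import mean, pstdev

# URBaM MAPE (%) for Cases 1 to 6, as listed in Sections 4.2.1 to 4.2.6.
urbam_mape = [9.5, 21.4, 10.6, 12.9, 7.3, 1.5]
# Lowest MAPE (%) listed for any model in the same case (assumed optimum).
best_mape = [5.4, 20.2, 3.4, 11.3, 2.8, 1.1]

average = mean(urbam_mape)
spread = pstdev(urbam_mape)  # population standard deviation (assumption)
gap = mean([u - b for u, b in zip(urbam_mape, best_mape)])

print(round(average, 2), round(spread, 2), round(gap, 2))
```

Under these assumptions, the mean and the standard deviation agree with the 10.5% and 6% values stated above, while the average gap to the best model comes out at roughly 3.2% from the rounded per-case values, in line with the stated 3.1%.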
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
ANN | Artificial Neural Network |
CAE | Computer-Aided Engineering |
DOE | Design Of Experiment |
ETO | Engineer To Order |
FEM | Finite Element Method |
FVM | Finite Volume Method |
HNL | High Non-Linearity |
LNL | Low Non-Linearity |
MAPE | Mean Absolute Percentage Error |
MARS | Multivariate Adaptive Regression Splines |
MLS | Moving Least Square |
MNL | Medium Non-Linearity |
NRMSE | Normalized Root Mean Square Error |
POL2 | Second-Order Polynomial model |
PPS-GA | Penalized Predictive Score Genetic Algorithm |
QL | Quasi-Linear |
RBF | Radial Basis Function |
RF | Random Forest |
RMAE | Relative Maximum Absolute Error |
SVMR | Support Vector Machine Regression |
URBaM | Univariate Regression Based Multivariate |
Appendix A. Fundamentals of Surrogate Modelling Techniques
Appendix A.1. The Polynomial Model
Appendix A.2. The Artificial Neural Network (ANN) Model
Appendix A.3. The Radial Basis Function (RBF) Model
Appendix A.4. The Kriging Model
Appendix A.5. The Multivariate Adaptive Regression Splines (MARS) Model
Appendix A.6. The Random Forest (RF) Model
Appendix A.7. The Support Vector Machine Regression (SVMR) Model
Appendix A.8. The Standalone Weighted Ensemble Model
References
- Jiao, J.R.; Simpson, T.W.; Siddique, Z. Product family design and platform-based product development: A state-of-the-art review. J. Intell. Manuf. 2007, 18, 5–29. [Google Scholar] [CrossRef]
- Simpson, T.W.; Jiao, J.; Siddique, Z.; Hölttä-Otto, K. Advances in Product Family and Product Platform Design; Springer: New York, NY, USA, 2014; Volume 1. [Google Scholar]
- Simpson, T.W. Product platform design and customization: Status and promise. Ai Edam 2004, 18, 3–20. [Google Scholar] [CrossRef]
- Gao, F.; Xiao, G.; Simpson, T.W. Module-scale-based product platform planning. Res. Eng. Des. 2009, 20, 129–141. [Google Scholar] [CrossRef]
- Ma, J.; Kim, H.M. Product family architecture design with predictive, data-driven product family design method. Res. Eng. Des. 2016, 27, 5–21. [Google Scholar] [CrossRef]
- Nolan, D.C.; Tierney, C.M.; Armstrong, C.G.; Robinson, T.T. Defining simulation intent. Comput. -Aided Des. 2015, 59, 50–63. [Google Scholar] [CrossRef]
- Boussuge, F.; Tierney, C.M.; Vilmart, H.; Robinson, T.T.; Armstrong, C.G.; Nolan, D.C.; Léon, J.-C.; Ulliana, F. Capturing simulation intent in an ontology: CAD and CAE integration application. J. Eng. Des. 2019, 30, 688–725. [Google Scholar] [CrossRef]
- Chai, K.-H.; Wang, Q.; Song, M.; Halman, J.I.; Brombacher, A.C. Understanding competencies in platform-based product development: Antecedents and outcomes. J. Prod. Innov. Manag. 2012, 29, 452–472. [Google Scholar] [CrossRef]
- Craig, R.R., Jr.; Taleff, E.M. Mechanics of Materials; John Wiley & Sons: Hoboken, NJ, USA, 2020. [Google Scholar]
- Wang, G.G.; Shan, S. Review of metamodeling techniques in support of engineering design optimization. In Proceedings of the International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Philadelphia, PA, USA, 10–13 September 2006. [Google Scholar]
- Forrester, A.; Sobester, A.; Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
- Banyay, G.A.; Smith, S.D.; Young, J.S. Sensitivity Analysis of a Nuclear Reactor System Finite Element Model. In Proceedings of the ASME 2018 Verification and Validation Symposium, Minneapolis, MN, USA, 16–18 May 2018. [Google Scholar]
- Kleijnen, J.P. Design and analysis of simulation experiments. In Proceedings of the International Workshop on Simulation, Bergeggi, Italy, 21–23 September 2015; pp. 3–22. [Google Scholar]
- Habashneh, M.; Rad, M.M. Optimizing structural topology design through consideration of fatigue crack propagation. Comput. Methods Appl. Mech. Eng. 2024, 419, 116629. [Google Scholar] [CrossRef]
- Crombecq, K.; Laermans, E.; Dhaene, T. Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling. Eur. J. Oper. Res. 2011, 214, 683–696. [Google Scholar] [CrossRef]
- Garud, S.S.; Karimi, I.A.; Kraft, M. Design of computer experiments: A review. Comput. Chem. Eng. 2017, 106, 71–95. [Google Scholar] [CrossRef]
- Alizadeh, R.; Allen, J.K.; Mistree, F. Managing computational complexity using surrogate models: A critical review. Res. Eng. Des. 2020, 31, 275–298.
- Simpson, T.W.; Poplinski, J.; Koch, P.N.; Allen, J.K. Metamodels for computer-based engineering design: Survey and recommendations. Eng. Comput. 2001, 17, 129–150.
- Mao, J.; Hu, D.; Li, D.; Wang, R.; Song, J. Novel adaptive surrogate model based on LRPIM for probabilistic analysis of turbine disc. Aerosp. Sci. Technol. 2017, 70, 76–87.
- Fang, H.; Horstemeyer, M.F. Global response approximation with radial basis functions. Eng. Optim. 2006, 38, 407–424.
- Song, X.; Lv, L.; Sun, W.; Zhang, J. A radial basis function-based multi-fidelity surrogate model: Exploring correlation between high-fidelity and low-fidelity models. Struct. Multidiscip. Optim. 2019, 60, 965–981.
- Forrester, A.I.; Keane, A.J. Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 2009, 45, 50–79.
- Sacks, J.; Welch, W.J.; Mitchell, T.J.; Wynn, H.P. Design and analysis of computer experiments. Stat. Sci. 1989, 4, 409–423.
- Mukhopadhyay, T.; Chakraborty, S.; Dey, S.; Adhikari, S.; Chowdhury, R. A critical assessment of Kriging model variants for high-fidelity uncertainty quantification in dynamics of composite shells. Arch. Comput. Methods Eng. 2017, 24, 495–518.
- Qian, J.; Yi, J.; Cheng, Y.; Liu, J.; Zhou, Q. A sequential constraints updating approach for Kriging surrogate model-assisted engineering optimization design problem. Eng. Comput. 2020, 36, 993–1009.
- Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818.
- Pavlícek, K.; Kotlan, V.; Doležel, I. Applicability and comparison of surrogate techniques for modeling of selected heating problems. Comput. Math. Appl. 2019, 78, 2897–2910.
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Springer: Berlin/Heidelberg, Germany, 2015; pp. 67–80.
- Schölkopf, B.; Smola, A.J.; Bach, F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2018.
- Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.J.; Vapnik, V. Support vector regression machines. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 1–6 December 1997; pp. 155–161.
- Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19, 1–67.
- Crino, S.; Brown, D.E. Global optimization with multivariate adaptive regression splines. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2007, 37, 333–340.
- Cutler, A.; Cutler, D.R.; Stevens, J.R. Random forests. In Ensemble Machine Learning; Springer: Berlin/Heidelberg, Germany, 2012; pp. 157–175.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Dasari, S.K.; Cheddad, A.; Andersson, P. Random forest surrogate models to support design space exploration in aerospace use-case. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece, 24–26 May 2019; pp. 532–544.
- Müller, J.; Shoemaker, C.A. Influence of ensemble surrogate models and sampling strategy on the solution quality of algorithms for computationally expensive black-box global optimization problems. J. Glob. Optim. 2014, 60, 123–144.
- Goel, T.; Haftka, R.T.; Shyy, W.; Queipo, N.V. Ensemble of surrogates. Struct. Multidiscip. Optim. 2007, 33, 199–216.
- Salem, M.B.; Tomaso, L. Automatic selection for general surrogate models. Struct. Multidiscip. Optim. 2018, 58, 719–734.
- Acar, E. Various approaches for constructing an ensemble of metamodels using local measures. Struct. Multidiscip. Optim. 2010, 42, 879–896.
- Gorissen, D.; De Tommasi, L.; Croon, J.; Dhaene, D. Automatic model type selection with heterogeneous evolution: An application to rf circuit block modeling. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 989–996.
- Acar, E.; Rais-Rohani, M. Ensemble of metamodels with optimized weight factors. Struct. Multidiscip. Optim. 2009, 37, 279–294.
- Viana, F.A.; Haftka, R.T.; Steffen, V. Multiple surrogates: How cross-validation errors can help us to obtain the best predictor. Struct. Multidiscip. Optim. 2009, 39, 439–457.
- Joseph, V.R.; Hung, Y.; Sudjianto, A. Blind kriging: A new method for developing metamodels. J. Mech. Des. 2008, 130, 031102.
- Ghiasi, R.; Ghasemi, M.R.; Noori, M. Comparative studies of metamodeling and AI-Based techniques in damage detection of structures. Adv. Eng. Softw. 2018, 125, 101–112.
- Banyay, G. Surrogate Modeling and Global Sensitivity Analysis Towards Efficient Simulation of Nuclear Reactor Stochastic Dynamics. Ph.D. Thesis, University of Pittsburgh, Pittsburgh, PA, USA, 2019.
- Lu, H.; Li, Q.; Pan, T.; Agarwal, R.K. An adaptive region segmentation combining surrogate model applied to correlate design variables and performance parameters in a transonic axial compressor. Eng. Comput. 2021, 37, 275–291.
- Jin, R.; Chen, W.; Simpson, T.W. Comparative studies of metamodelling techniques under multiple modelling criteria. Struct. Multidiscip. Optim. 2001, 23, 1–13.
- Chen, V.C.; Tsui, K.-L.; Barton, R.R.; Meckesheimer, M. A review on design, modeling and applications of computer experiments. IIE Trans. 2006, 38, 273–291.
- Jia, L.; Alizadeh, R.; Hao, J.; Wang, G.; Allen, J.K.; Mistree, F. A rule-based method for automated surrogate model selection. Adv. Eng. Inform. 2020, 45, 101123.
- Williams, B.; Cremaschi, S. Selection of surrogate modeling techniques for surface approximation and surrogate-based optimization. Chem. Eng. Res. Des. 2021, 170, 76–89.
- Villa-Vialaneix, N.; Follador, M.; Ratto, M.; Leip, A. A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environ. Model. Softw. 2012, 34, 51–66.
- Chen, H.; Loeppky, J.L.; Sacks, J.; Welch, W.J. Analysis methods for computer experiments: How to assess and what counts? Stat. Sci. 2016, 31, 40–60.
- Myers, R.H.; Montgomery, D.C.; Anderson-Cook, C.M. Response Surface Methodology: Process and Product Optimization Using Designed Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Aijazi, A.N.; Glicksman, L.R. Comparison of regression techniques for surrogate models of building energy performance. Proc. SimBuild 2016, 6, 327–334.
- Ozcanan, S.; Atahan, A.O. RBF surrogate model and EN1317 collision safety-based optimization of two guardrails. Struct. Multidiscip. Optim. 2019, 60, 343–362.
- Fang, H.; Rais-Rohani, M.; Liu, Z.; Horstemeyer, M. A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput. Struct. 2005, 83, 2121–2136.
- Colaço, M.J.; Dulikravich, G.S.; Sahoo, D. A comparison of two methods for fitting high dimensional response surfaces. In Proceedings of the Inverse Problems, Design and Optimization Symposium, Miami, FL, USA, 16–18 April 2007; pp. 16–18.
- Levy, S.; Steinberg, D.M. Computer experiments: A review. AStA Adv. Stat. Anal. 2010, 94, 311–324.
- Jekabsons, G. ARESLab: Adaptive Regression Splines Toolbox for Matlab/Octave. Available online: http://www.cs.rtu.lv/jekabsons/regression.html (accessed on 25 August 2025).
- Milborrow, S.; Hastie, T.; Tibshirani, R.; Miller, A.; Lumley, T. Earth: Multivariate Adaptive Regression Splines. Available online: https://cran.r-project.org/web/packages/earth/index.html (accessed on 25 August 2025).
- Jekabsons, G. M5PrimeLab: M5 Regression Tree, Model Tree, and Tree Ensemble Toolbox for Matlab/Octave. Available online: http://www.cs.rtu.lv/jekabsons/regression.html (accessed on 25 August 2025).
- Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes; University of Waikato: Hamilton, New Zealand, 1996.
- Jekabsons, G. Radial Basis Function Interpolation Toolbox for Matlab/Octave. Available online: http://www.cs.rtu.lv/jekabsons/regression.html (accessed on 25 August 2025).
- Viana, F.A.; Gogu, C.; Goel, T. Surrogate modeling: Tricks that endured the test of time and some recent developments. Struct. Multidiscip. Optim. 2021, 64, 2881–2908.
- Zhai, J.; Boukouvala, F. Nonlinear variable selection algorithms for surrogate modeling. AIChE J. 2019, 65, e16601.
- Johnson, R.T.; Montgomery, D.C.; Jones, B.; Parker, P.A. Comparing computer experiments for fitting high-order polynomial metamodels. J. Qual. Technol. 2010, 42, 86–102.
- Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
- Bakin, S.; Hegland, M.; Osborne, M.R. Parallel MARS algorithm based on B-splines. Comput. Stat. 2000, 15, 463–484.
Ref. | Compared Models | Key Findings |
---|---|---|
[43] | Ordinary Kriging, Universal Kriging, Linear/Quadratic trend-based Universal Kriging, and Blind Kriging. | Blind Kriging provided the most accurate predictions. |
[24] | Blind Kriging, Ordinary Kriging, Co-Kriging, and Universal Kriging with pseudo likelihood estimations. | Universal Kriging with maximum likelihood provided the most accurate results. |
[39] | Polynomial, Kriging, RBF, and Standalone Weighted Ensemble across 8 case studies. | The Standalone Weighted Ensemble model was the most accurate in all eight cases. Kriging and RBF showed comparable accuracy in some cases. |
[44] | Kriging, RBF, RF, ANN, MARS, and SVMR. | SVMR provided the most accurate results. |
[38] | Kriging (Matérn basis and linear trend), SVMR (Gaussian kernel), full second-order Polynomial, Moving Least Square (MLS), and PPS-Optimal and PPS-GA Standalone Weighted Ensembles across 15 case studies. | PPS-Optimal and PPS-GA were the most accurate in 9 out of 15 cases. Kriging ranked the same as the PPS models in 3 cases and was the most accurate in 3 cases. Kriging outperformed SVMR in 13 cases. |
[45] | PPS-GA Standalone Weighted Ensemble and Kriging. | PPS-GA provided the most accurate predictions. |
[27] | RBF, RF, and ANN. | RBF and RF slightly outperformed ANN. |
Ref. | Performed Analysis | Main Conclusions |
---|---|---|
[47] | Compared MARS, RBF, Kriging, and Polynomial models across 14 case studies, classified as linear, slightly non-linear, and highly non-linear cases. | RBF was the most accurate for highly non-linear cases, consistent with findings in [20,41]. MARS and Kriging require a high number of design points to accurately represent highly non-linear problems, while [41] concludes Kriging is more suitable for slightly non-linear and large problems. Polynomial models are particularly suitable for linear or quasi-linear cases, also noted by [20,48]. |
[49] | Proposed the AutoSM model, which automatically selects the optimal model among Polynomial, Kriging, MARS, and RBF model configurations. It was tested on 10 benchmark functions and 5 test problems, considering problem non-linearity, scale, amount of data, and smoothness. | The work highlights the negative correlation between non-linearity and accuracy, with substantially higher errors in non-linear cases. |
[50] | Studied the accuracy of MARS, SVMR, ANN, GP, ALAMO, and RF models, focused on the number of design points in optimization problems. | ANN requires a large number of samples for accurate approximation, and RF and SVMR were less accurate regardless of data amount. However, Ref. [51] recommends RF and SVMR over Kriging and ANN, for cases with a high number of design points. |
Non-Linearity Level | R2 Range |
---|---|
High non-linearity (HNL) | 0 ≤ R2 ≤ 0.5 |
Medium non-linearity (MNL) | 0.5 < R2 ≤ 0.8 |
Low non-linearity (LNL) | 0.8 < R2 ≤ 0.98 |
Quasi-linear (QL) | 0.98 < R2 ≤ 1 |
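The thresholds in the table above amount to a simple lookup. The following is a minimal sketch, assuming the R2 value has already been computed elsewhere and lies in [0, 1]; the function name `nonlinearity_level` is hypothetical and does not come from the original work.

```python
def nonlinearity_level(r2: float) -> str:
    """Map an R2 value to the non-linearity level abbreviations of the table above."""
    # Assumption: values outside [0, 1] are rejected, since the table
    # only defines ranges inside that interval.
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R2 must lie in [0, 1] for this classification")
    if r2 <= 0.5:
        return "HNL"  # high non-linearity: 0 <= R2 <= 0.5
    if r2 <= 0.8:
        return "MNL"  # medium non-linearity: 0.5 < R2 <= 0.8
    if r2 <= 0.98:
        return "LNL"  # low non-linearity: 0.8 < R2 <= 0.98
    return "QL"       # quasi-linear: 0.98 < R2 <= 1


# Illustrative inputs only; these are not results from the paper.
for value in (0.42, 0.80, 0.95, 0.99):
    print(value, nonlinearity_level(value))
```

Each upper bound is inclusive, matching the "≤" on the right-hand side of every range in the table.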
Type | Configuration | Abbreviation | Code Source |
---|---|---|---|
URBaM | Regression function selection criteria: R2 ≥ 0.95 and fewest number of unknown coefficients | URBaM | Excel© 365 VBA code |
Second-Order Polynomial | Regression type: Full interaction and quadratic | POL2 | ANSYS® 19.2 DesignXplorer™ |
ANN | Layer structure: 1 hidden layer; 3 nodes | ANN_3 | ANSYS® 19.2 DesignXplorer™ |
ANN | Layer structure: 1 hidden layer; 5 nodes | ANN_5 | ANSYS® 19.2 DesignXplorer™ |
ANN | Layer structure: 1 hidden layer; 10 nodes | ANN_10 | ANSYS® 19.2 DesignXplorer™ |
RBF | Base function type: Multiquadratic; c = 1 | RBF_MQ_1 | MATLAB® 2023 [63] |
RBF | Base function type: Multiquadratic; c = 1/N | RBF_MQ_1/N | MATLAB® 2023 [63] |
RBF | Base function type: Inverse multiquadratic; c = 1 | RBF_IMQ_1 | MATLAB® 2023 [63] |
RBF | Base function type: Inverse multiquadratic; c = 1/N | RBF_IMQ_1/N | MATLAB® 2023 [63] |
Kriging | Trend function: Universal (polynomial); Kernel type: Gaussian, pu = 2; Constant θu | UKRI_Const | ANSYS® 19.2 DesignXplorer™ |
Kriging | Trend function: Universal (polynomial); Kernel type: Gaussian, pu = 2; Variable θu | UKRI_Var | ANSYS® 19.2 DesignXplorer™ |
MARS | Domain partition criteria: Maxsub = min(200, max(20, 2k)); Type of function: Cubic spline approximation | MARS | MATLAB® 2023 [59] |
Random Forest | Tree characteristics: NL ≥ 1; NS ≥ 5; STDerror ≤ 5%; NDV: k/3; Characteristics of the assembly: Nº of trees: 500 | RF | MATLAB® 2023 [61] |
SVMR | Loss function type: ε-insensitive with linear loss function; Non-linear transformation kernel: Gaussian | SVMR | ANSYS® 19.2 DesignXplorer™ |
PPS-GA | Kriging: Trend function: Constant/Polynomial; Kernel type: Gaussian/Cubic/Thin Plate Spline; Kernel variation θu: Constant/variable | PPS-GA | ANSYS® 19.2 DesignXplorer™ |
PPS-GA | Polynomial: Regression types: Linear/quadratic/cross-quadratic | PPS-GA | ANSYS® 19.2 DesignXplorer™ |
PPS-GA | SVMR: Loss function type: Laplacian/ε-insensitive; Kernel type: Linear/Gaussian/Sigmoidal | PPS-GA | ANSYS® 19.2 DesignXplorer™ |
PPS-GA | Moving Least Square (MLS): Polynomial functions: Constant/linear/quadratic; Weight functions: Linear/Gaussian/Wendland | PPS-GA | ANSYS® 19.2 DesignXplorer™ |
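The URBaM criterion listed in the table above (R2 ≥ 0.95 and fewest unknown coefficients) can be illustrated with a short selection routine. This is only a sketch under stated assumptions, not the authors' Excel VBA implementation: the names `Candidate` and `select_regression`, the candidate labels, and the numeric values are hypothetical, ties in the number of coefficients are assumed to be broken by the higher mean R2, and returning `None` when no candidate reaches the threshold is an assumption because that case is not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

R2_MIN = 0.95  # minimum acceptable mean R2, as stated in the table


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical label of a univariate regression function type
    mean_r2: float  # mean R2 over all data groups fitted with this function
    n_coeffs: int   # number of unknown coefficients of the function


def select_regression(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the fewest coefficients among those with mean R2 >= R2_MIN."""
    eligible = [c for c in candidates if c.mean_r2 >= R2_MIN]
    if not eligible:
        # Assumption: no fallback rule is given, so signal "nothing selected".
        return None
    # Fewest coefficients first; assumed tie-break: highest mean R2.
    return min(eligible, key=lambda c: (c.n_coeffs, -c.mean_r2))


# Illustrative values only; these are not results from the paper.
candidates = [
    Candidate("linear", mean_r2=0.91, n_coeffs=2),
    Candidate("power", mean_r2=0.97, n_coeffs=2),
    Candidate("quadratic", mean_r2=0.99, n_coeffs=3),
]
chosen = select_regression(candidates)
print(chosen.name if chosen else "no candidate reaches the R2 threshold")
```

In this example the quadratic candidate has the highest R2 but is passed over in favour of a two-coefficient function that already clears the threshold, which reflects the preference for fewer unknown coefficients as a guard against overfitting.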
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Telleria, X.; Esnaola, J.A.; Ugarte, D.; Ezkurra, M.; Ulacia, I. URBaM: A Novel Surrogate Modelling Method to Determine Design Scaling Rules for Product Families. Appl. Sci. 2025, 15, 9573. https://doi.org/10.3390/app15179573