A Study of the Response Surface Methodology Model with Regression Analysis in Three Fields of Engineering

Chen, Hsuan-Yu; Chen, Chiachung

doi:10.3390/asi8040099

by Hsuan-Yu Chen ¹ and Chiachung Chen ²,*

¹ Africa Industrial Research Center, National Chung Hsing University, Taichung 40227, Taiwan

² Department of Bio-Industrial Mechatronics Engineering, National Chung Hsing University, Taichung 40227, Taiwan

* Author to whom correspondence should be addressed.

Appl. Syst. Innov. 2025, 8(4), 99; https://doi.org/10.3390/asi8040099

Submission received: 16 May 2025 / Revised: 6 July 2025 / Accepted: 16 July 2025 / Published: 21 July 2025

Abstract

Researchers conduct experiments to discover factors influencing the experimental subjects, so the experimental design is essential. The response surface methodology (RSM) is a special experimental design used to evaluate factors significantly affecting a process and determine the optimal conditions for different factors. The relationship between response values and influencing factors is mainly established using regression analysis techniques. These equations are then used to generate contour and surface response plots to provide researchers with further insights. The impact of regression techniques on response surface methodology (RSM) model building has not been studied in detail. This study uses complete regression techniques to analyze sixteen datasets from the literature on semiconductor manufacturing, steel materials, and nanomaterials. Whether each variable significantly affected the response value was assessed using backward elimination and a t-test. The complete regression techniques used in this study included considering the significant influencing variables of the model, testing for normality and constant variance, using predictive performance criteria, and examining influential data points. The results of this study revealed some problems with model building in RSM studies in the literature from three engineering fields, including the direct use of complete equations without statistical testing, deletion of variables with p-values above a preset value without further examination, existence of non-normality and non-constant variance conditions of the dataset without testing, and presence of some influential data points without examination. Researchers should strengthen training in regression techniques to enhance the RSM model-building process.

Keywords: response surface methodology; regression analysis; backward elimination; semiconductor manufacturing; steel materials; nanomaterials

1. Introduction

The results or responses of an engineering system or its process are often affected by many factors. Finding the optimal level of these factors is called optimization. In research evaluating the impact of multiple factors on one or more response values, the most common method to evaluate the most appropriate conditions of multiple factors is the response surface methodology (RSM) [1,2]. RSM is a special experimental design that can be used to establish process operations for new products, improve the utilization rate of operating processes, and reduce resource waste. Through the use of appropriate statistical techniques, it is possible to evaluate the factors that have a significant impact on the process and thus determine the optimal conditions for these factors [1,3].

The special feature of RSM in experimental design is that this method can test multiple factors with a small number of samples. After completing the data collection, regression analysis is used to establish the mathematical relationship between the system response value and its influencing factors. These models are then used to create graphs that illustrate the impact of various factors on the response. Researchers can use these linear or curvilinear distribution graphs to observe this relationship. Due to this unique graphical observation capability, the response surface methodology has been widely used in industry [1,2,3,4].

Two excellent textbooks introduce the RSM concept, two- and three-level factorial designs, experimental designs for fitting response surfaces, and some special topics of RSM [1,2]. Bas and Boyaci [4] discussed essential concepts of RSM, and Asoo et al. [3] introduced its historical background. Review papers on engineering applications of RSM have covered machinability [5], manufacturing [6], biofuel [7,8], energy [9,10], agro-industrial processes [11], and polymers [12].

In an experimental design, the response of the experimental system is measured at different combinations of levels of the influencing factors. These factors serve as the independent variables for the regression analysis. Experimental runs represent a series of tests for the experiment. The response or output of the experiments is used as the dependent variable for further analysis.

The relationship between the response (y) and the input factors (x₁, x₂, …, x_k) of the RSM can be expressed as

y_i = f(x₁, x₂,…, x_k) + ε_i

(1)

where y_i is the response of the RSM model and is called the dependent variable; x₁, x₂, …, and x_k are the influencing variables and are called independent variables in regression; k is the number of factors; and ε_i is the model’s error.

The mathematical model of the RSM includes linear, interaction, and quadratic effects. If the process system involves three process variables, x₁, x₂, and x₃, the form of the RSM equation is

y = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + b₁₁x₁² + b₂₂x₂² + b₃₃x₃² + b₁₂x₁x₂ + b₁₃x₁x₃ + b₂₃x₂x₃ + ε

(2)

If four factors are used, x₁, x₂, x₃, and x₄, the RSM equation is

y = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + b₁₁x₁² + b₂₂x₂² + b₃₃x₃² + b₄₄x₄² + b₁₂x₁x₂ + b₁₃x₁x₃ + b₁₄x₁x₄ + b₂₃x₂x₃ + b₂₄x₂x₄ + b₃₄x₃x₄ + ε

(3)

If the RSM equation includes all variables, as in Equations (2) and (3), it is called a full RSM equation.
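As a minimal sketch of how the full equations above are assembled, the following Python snippet enumerates the terms of a full second-order model for k factors and expands one experimental run into the corresponding regressors. The function names `full_rsm_terms` and `term_row` are hypothetical and are used only for illustration.

```python
from itertools import combinations


def full_rsm_terms(k):
    """Return the term labels of the full second-order RSM model for k factors."""
    factors = [f"x{i}" for i in range(1, k + 1)]
    quadratic = [f"{name}^2" for name in factors]
    interaction = [f"{a}*{b}" for a, b in combinations(factors, 2)]
    # Order follows Equations (2) and (3): constant, linear, quadratic, interaction.
    return ["1"] + factors + quadratic + interaction


def term_row(x):
    """Expand one run (settings of x1..xk) into the regressors of the full model."""
    k = len(x)
    row = [1.0] + list(x) + [v * v for v in x]
    row += [x[i] * x[j] for i, j in combinations(range(k), 2)]
    return row


for k in (3, 4):
    terms = full_rsm_terms(k)
    print(f"k = {k}: {len(terms)} coefficients -> {terms}")
```

The number of coefficients is 1 + 2k + k(k − 1)/2, which gives 10 for three factors and 15 for four factors, in agreement with Equations (2) and (3).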

The relationship between the response (the dependent variable) and the influencing factors (the independent variables) can be established through regression analysis techniques. The resulting regression model is an empirical equation, not a theoretical one.

The major experimental methodologies used for the response surface methodology are the full factorial design (FFD), Box–Behnken design (BBD), and central composite design (CCD) [1,6,13,14,15,16]. The number of studies with each experimental methodology, as determined using Google Scholar, is listed in Table 1.

Table 1 presents the number of papers employing the three experimental designs from 2019 to 2024, as determined by a Google Scholar search. The number of papers using the central composite design is the largest, followed by the full factorial design, while the Box–Behnken design is the least frequent. The number of papers using the central composite design is about 10 times that of papers using the Box–Behnken design. However, the trends differ: the number of papers using the central composite design and the full factorial design has been decreasing year by year, whereas the number of papers using the Box–Behnken design has been increasing.

The experimental matrices in the three methods for RSM design can be found in various publications and textbooks [1,2]. The number of required experiments for the experimental methods is 27, 22, and 13 or more for FFD, BBD, and CCD, respectively.

Many criteria have been proposed to evaluate the fitting agreement and predictive ability of regression models. The essential procedure in regression analysis is to find the variables that significantly affect the response. To evaluate the fitting agreement of a proposed model, the criteria include the F-value of ANOVA, the coefficient of determination R², the adjusted R² (R²_adj), the lack-of-fit test, etc. To assess the significant effect of one variable on the response, the criterion is the t-value or p-value of the coefficient of this variable. The criteria to evaluate the predictive ability are the predicted residual error sum of squares (PRESS) and the predicted R-squared (R²_pred) [17,18,19].
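The fitting and prediction criteria named above can be illustrated with a small sketch. The snippet below uses a one-variable straight-line model so that the leverage values have a closed form; the function name `fit_criteria` and the data are hypothetical and are not taken from the datasets of this study.

```python
def fit_criteria(x, y):
    """Fit y = b0 + b1*x by least squares and return fit and prediction criteria."""
    n, p = len(x), 2  # p = number of estimated coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Diagonal of the hat matrix (leverage) for a straight-line model.
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    sse = sum(e * e for e in resid)
    sst = sum((yi - ybar) ** 2 for yi in y)
    # PRESS sums the squared leave-one-out prediction errors e_i / (1 - h_ii).
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, lever))
    return {
        "R2": 1.0 - sse / sst,
        "R2_adj": 1.0 - (sse / (n - p)) / (sst / (n - 1)),
        "s": (sse / (n - p)) ** 0.5,
        "PRESS": press,
        "R2_pred": 1.0 - press / sst,
    }


# Hypothetical data: three coded levels with two replicates each.
x = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
y = [2.1, 1.9, 3.2, 2.8, 4.1, 3.9]
for name, value in fit_criteria(x, y).items():
    print(f"{name} = {value:.4f}")
```

Because each leave-one-out error is at least as large in magnitude as the ordinary residual, PRESS is never smaller than the residual sum of squares, so R²_pred is never larger than R².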

After the evaluation, the appropriate equation (RSM model) establishes the relationship between the response value and the influencing factors as a mathematical function. Among the graphical methods, contour plots and three-dimensional surface response plots are particularly popular among researchers [1,3,20]. The studied techniques are described in detail in this paper [21].

The accuracy and predictive power of the RSM model determine whether it can effectively find the factors that actually affect the response value. However, engineers often pay little attention to the suitability of the regression equations they use. Many researchers directly use commercial software to calculate the RSM model. Reza et al. [22] emphasized the importance of the validity and accuracy of the RSM model in RSM applications.

Compared with other experimental design methods, the most prominent feature of the response surface method is that it graphically represents the relationship between the influencing factors (variables) and the response value. Contour maps present this relationship in two-dimensional space and illustrate the changes in the response value under various levels of the influencing factors. Three-dimensional response surface plots represent the same information in three-dimensional space, with the response value as the z-coordinate and the two influencing factors as the x- and y-coordinates. If no quadratic polynomial relationship exists, the contour plot consists of straight lines, and the three-dimensional response surface plot appears as a flat plane. If the quadratic polynomial relationship is significant, both graphs are represented by curves. Researchers often use these contour maps to intuitively observe the levels of the influencing factors that lead to the optimal response [1,3,4].

The purpose of performing a response surface methodology (RSM) experiment is to find the optimal operating conditions. If the RSM model represents a linear relationship, the contour plot and the response surface plot indicate the direction in which the response value changes relative to the original experimental design conditions. If the RSM model is a quadratic equation, the two plots indicate a maximum, a minimum, or a saddle point. Establishing a suitable RSM equation therefore ensures that the research results lead to the correct optimization conditions, and the regression technique used to establish the RSM equation needs careful consideration.
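The distinction between a maximum, a minimum, and a saddle point can be made concrete with a short sketch for a two-factor quadratic model. The function name `classify_stationary_point` and the coefficients in the example are hypothetical; they are not taken from any equation in this study.

```python
def classify_stationary_point(b1, b2, b11, b22, b12, tol=1e-12):
    """Locate and classify the stationary point of the two-factor model
    y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    # Setting both partial derivatives to zero gives a 2 x 2 linear system whose
    # matrix is the Hessian [[2*b11, b12], [b12, 2*b22]].
    det = 4.0 * b11 * b22 - b12 * b12
    if abs(det) <= tol:
        # Includes the purely linear model, which only has a direction of change.
        return None, "no unique stationary point"
    x1 = (-2.0 * b22 * b1 + b12 * b2) / det
    x2 = (-2.0 * b11 * b2 + b12 * b1) / det
    if det < 0:
        kind = "saddle point"
    elif b11 > 0:
        kind = "minimum"
    else:
        kind = "maximum"
    return (x1, x2), kind


# Hypothetical coefficients in coded units.
point, kind = classify_stationary_point(b1=0.5, b2=-0.2, b11=0.25, b22=0.2, b12=-0.1)
print(kind, point)
```

If a term is wrongly kept in or dropped from the equation, the determinant and the location of the stationary point change, which is why the choice of the regression equation affects the optimization result.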

To the best of the authors’ knowledge, regression techniques have not been fully considered when evaluating the accuracy and predictive ability of RSM equations. This study compiled 16 datasets from three engineering fields, drawn from the literature and supplemented with experimental datasets. These datasets were used to evaluate the adequacy of the RSM equations. The complete regression technique was illustrated. The adequate RSM models were proposed to highlight the importance of comprehensive regression analysis. The effect of the RSM equations on optimization was further discussed. Some suggestions were proposed for the use of RSM in engineering fields.

2. Materials and Methods

2.1. Data Sources for the Equations of the Response Surface Methodology

Table 2 presents 16 datasets from three engineering fields, used to evaluate the adequacy of RSM equations. These published papers present the original datasets.

The databases included ScienceDirect, Scopus, IEEE Xplore, SpringerLink, Google Scholar, and J-STAGE. Keywords were “RSM”, “Regression analysis”, “Model evaluation”, and “Semiconductor”, or “Steel materials”, or “Nanomaterials”.

The potential publication bias in the selected literature was a limitation of the data available. Some studies did not list their experimental data and presented their RSM results directly. Much helpful information in other non-selected literature was not considered. The publication years of the 16 datasets from three engineering fields ranged from 2004 to 2024. The countries of the authors involved are Algeria, Brazil, China, Korea, Italy, Iran, Kazakhstan, Malaysia, Turkey, and the UK.

The parameters and criteria were estimated using SigmaPlot v.14.0 (SPSS Inc., Chicago, IL, USA).

2.2. Model Building for the Response Surface Methodology

A typical multiple regression model is expressed as

y_i = b₀ + b₁x₁ + … + b_k x_k + ε_i

(4)

where ε_i is the model error.

The independent variables of the regression equation are x₁, x₂, …, x_k, and the parameters to be estimated are the coefficients b₀, b₁, b₂, …, b_k. The coefficients are usually calculated using the least squares method. Many commercial software programs can help researchers perform this calculation easily and quickly.

To establish the relationship between the variables and response of the RSM models, the significant effect of the variables on the response must be evaluated using statistical techniques. However, due to the convenience of using the commercial software programs, the evaluation of regression analysis is easily neglected.

2.3. Basic and Complete Regression Analyses

Regression analysis is a statistical technique used in experimental design to establish the relationship between response values and influencing factors. With the help of convenient commercial software programs, the estimated values of variable parameters can be easily calculated using the least squares method. These software programs also provide statistical tests to validate the regression models, such as analysis of variance tables. An analysis of variance table lists the t-value and p-value for each variable, facilitating further analysis. This process is called basic regression analysis.

Regression analysis does not simply accept all variables in the initial equation. The next step is to screen out the variables that have no significant effect on the response. Because of interactions and multicollinearity among these variables, the variables that appear nonsignificant cannot all be removed at the same time; removing one variable changes the t-values and p-values of those that remain [17,18,39,40,41,42,43].

Several procedures are employed, including the sequential variable selection procedure, the forward procedure, stepwise regression, and backward elimination. In the establishment of the RSM models, the backward elimination procedure is recommended. For the regression analysis, two criteria are used to evaluate the accuracy (fitting agreement) and prediction [17,18,39,40,41,42].

Besides calculating the estimated values of the parameters and screening the effective variables, the detailed technique includes checking for violations of the assumptions and performing influence diagnostics. The influential data points are checked with other criteria [17,18,39,40,41,42,43].

In this study, the complete regression analysis includes performing a sequential variable selection procedure, evaluating RSM model accuracy and prediction using different criteria, checking for violations of assumptions, and performing influence diagnostics to identify influential data points.

2.4. Assumptions Involved in the Regression Analysis

The regression model includes the following assumptions: ε_i is uncorrelated across observations, the mean of ε_i is zero, the variance of ε_i is constant, and the distribution of ε_i follows a normal distribution. Based on this assumption, non-standard conditions involved in regression analysis include the balance between underfitting and overfitting, the non-normality of the dataset, the heterogeneity of the variance, and influential data points [17].

After completing the regression calculation using the least squares method, the first concern is to verify the significant effect of each parameter. Then, these assumptions must be verified, and any non-standard conditions must be addressed if they exist. Another concern of regression analysis is distinguishing between the model’s performance in terms of model fitting and predictive ability. Two criteria are used [17,44,45,46]. Modern concepts in regression modeling have been introduced by Myers [17] and Marinoiu [46]. Checking the significant effect of each parameter and selecting a regression model is called classical regression. The concept of complete regression involves a trade-off between bias and variance when selecting a limited number of important variables, comparing predictive ability, and verifying and addressing the assumptions of regression.

2.5. Establishment of the Model

Once the regression equation is established, contour and 3D surface plots can be generated using commercial software programs, allowing researchers to easily visualize the optimization using these figures. If the selected regression equation is inappropriate, these contour and 3D surface plots are also inappropriate, and the conclusions and suggestions regarding the effects of the independent variables (factors and levels) are meaningless.

Three methods for performing sequential variable selection are forward selection, stepwise regression, and backward elimination. These methods are illustrated in detail [17,18,39,40,41,42,43,47].

In the forward selection procedure, all variables are candidate regressors, and the model starts with only the constant term. The variable that produces the largest R-squared value is entered first, and the resulting equation is referred to as the first equation. Each of the remaining variables is then tried as the second variable of the first equation, and the one that produces the largest R-squared value is selected. The selection of variables continues in this way. At each step, the p-value of the newly selected variable is compared with the preset value. If it is greater than the preset value, the procedure ends.

Stepwise regression is an improvement on forward selection. Variables that were deleted in the previous stage can be re-entered in the selection procedure. The selection procedure is otherwise the same as forward selection.

The backward elimination procedure first fits the regression equation with all variables and determines the p-value of each variable in the model. The variable with the lowest absolute t-value, and therefore the highest p-value, is compared with a preselected significance level, usually 0.05. If its p-value exceeds the preselected level, the variable is removed. The equation is recalculated with the remaining variables, and the variable with the lowest absolute t-value is again identified and its p-value compared with 0.05. These steps are repeated until no further variable is dropped, and the procedure ends. The selected regression model consists of all remaining variables.

The problem with the forward and stepwise methods is that the critical t-values they use are not appropriate in the early stages [17]. Since the model contains few variables in the early stages, the standard error of the estimate is usually overestimated, and the p-values of the variables may be too large, thus preventing important variables from entering the model. Therefore, forward and stepwise procedures often underfit. Mendenhall and Sincich [42] recommend backward elimination because this method includes all possible explanatory variables from the start and then eliminates those that are not important in explaining the response variation. Backward elimination is therefore recommended because it considers the effects of all candidate variables [41,43,48].

In the backward elimination procedure, a t-test statistic for a variable is used to calculate its p-value, which is then used to assess the statistical interpretation of the variable in the regression model [14,48].
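The backward elimination procedure described in this section can be sketched in a few lines of Python. This is an illustrative sketch and not the SigmaPlot implementation used in this study. Two simplifications are assumptions of the sketch: the stopping rule compares the absolute t-value with a fixed critical value `t_crit` instead of computing a p-value, and the data are a hypothetical 3 × 3 factorial in which only x₁ and x₁² truly affect the response. All function names are hypothetical.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_t_values(columns, y):
    """Least squares fit with intercept; returns coefficients and their t-values."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    sse = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2 for i in range(n))
    s2 = sse / (n - p)  # residual mean square
    t = [beta[a] / (s2 * inv[a][a]) ** 0.5 for a in range(p)]
    return beta, t


def backward_eliminate(terms, y, t_crit):
    """Refit and drop the weakest term until every remaining |t| >= t_crit."""
    kept = dict(terms)
    while kept:
        names = list(kept)
        _, t = ols_t_values([kept[name] for name in names], y)
        abs_t = [abs(v) for v in t[1:]]  # the intercept is never dropped
        weakest = min(range(len(names)), key=lambda j: abs_t[j])
        if abs_t[weakest] >= t_crit:
            break
        del kept[names[weakest]]  # remove one term only, then refit
    return list(kept)


# Hypothetical 3 x 3 factorial; the true model contains only x1 and x1^2.
x1 = [-1, 0, 1, -1, 0, 1, -1, 0, 1]
x2 = [-1, -1, -1, 0, 0, 0, 1, 1, 1]
noise = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 0.0, 0.1, -0.1]
y = [10 + 3 * a + 2 * a * a + e for a, e in zip(x1, noise)]
terms = {
    "x1": x1,
    "x2": x2,
    "x1^2": [a * a for a in x1],
    "x2^2": [b * b for b in x2],
    "x1*x2": [a * b for a, b in zip(x1, x2)],
}
print(backward_eliminate(terms, y, t_crit=2.447))
```

Only one term is removed per pass, and the equation is refitted before the next comparison, which is the point of the procedure. A statistics package would normally report the p-value directly, and the critical t-value changes slightly as the residual degrees of freedom increase.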

2.6. Criteria for the Evaluation of RSM Equations

Ten criteria are used to evaluate the RSM equations after calculating the coefficients of the variables. They are listed in Table 3.

R², the coefficient of determination, increases as variables are added, so it is not a reasonable criterion by itself. R²_adj and s account for the number of variables and serve as criteria for model accuracy. The PRESS value is used to compare predictive performance [17,18,39,40,41,42].

The normality test technique employed is the Kolmogorov–Smirnov method, with a cutoff value of p = 0.05. The constant variance test uses the Spearman rank correlation method; the cutoff value is p = 0.05.

The externally studentized residuals, t_i, and the DFFITS_i values are used to examine potentially influential data points. The cutoff for both criteria is ±2.0. If a data point is identified as influential, it should not be removed from the dataset immediately. It may result from experimental or instrumental error, or it may deviate from the trend predicted by the proposed model. Further observations should be made, and more relevant data should be collected under the same experimental conditions.
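The two influence criteria can be illustrated with a minimal sketch for a straight-line model, where the leverage has a closed form. The function name `influence_diagnostics` and the data, which contain one deliberately shifted observation, are hypothetical.

```python
def influence_diagnostics(x, y, cutoff=2.0):
    """Externally studentized residuals t_i and DFFITS_i for y = b0 + b1*x.

    Returns one tuple (run number, t_i, DFFITS_i, flagged) per observation.
    """
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    lever = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    sse = sum(e * e for e in resid)
    rows = []
    for i, (e, h) in enumerate(zip(resid, lever), start=1):
        # Residual variance re-estimated with observation i deleted.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t_i = e / (s2_deleted * (1.0 - h)) ** 0.5
        dffits_i = t_i * (h / (1.0 - h)) ** 0.5
        flagged = abs(t_i) > cutoff or abs(dffits_i) > cutoff
        rows.append((i, t_i, dffits_i, flagged))
    return rows


# Hypothetical data: the 4th observation is shifted away from the trend.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [-3.1, -1.9, -1.1, 2.0, 1.1, 1.9, 3.1]
for run, t_i, dffits_i, flagged in influence_diagnostics(x, y):
    print(f"run {run}: t = {t_i:7.3f}, DFFITS = {dffits_i:7.3f}, influential = {flagged}")
```

In this example only the shifted observation exceeds the ±2.0 cutoff. As stated above, a flagged point is a prompt for further checking, not a reason for automatic deletion.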

In this study, statistical analysis was performed using SigmaPlot V.14.0 (SPSS Inc., Chicago, IL, USA). The contour plots were produced by this software.

3. Results

3.1. Semiconductor Manufacturing

An experimental dataset was reported by Box and Draper [49] and was introduced by Myers et al. [1]. The process involved applying a coating material to a wafer. Several coating thicknesses at different locations on a wafer were measured. The mean y₁ and standard deviation y₂ of the thickness were calculated. The influencing variables were x₁, speed; x₂, pressure; and x₃, distance. These datasets illustrate the backward elimination technique for an adequate equation and are listed in Appendix A.1.

For the y₁ mean of thickness [49], the complete regression procedure is

1. y₁ = 327.615 + 177.011x₁ + 109.422x₂ + 131.472x₃ + 32.022x₁² − 22.378x₂² − 29.061x₃² + 66.033x₁x₂ + 75.458x₁x₃ + 43.583x₂x₃
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (<0.001), x₁² (0.317), x₂² (0.481), x₃² (0.363), x₁x₂ (0.008), x₁x₃ (0.003), x₂x₃ (0.064)
R² = 0.927, R²_adj = 0.888, s = 76.111, PRESS = 337,737.94

(5)

Delete x₂² and recalculate the equation.

2. y₁ = 312.696 + 177.011x₁ + 109.422x₂ + 131.475x₃ + 32.022x₁² − 29.061x₃² + 66.033x₁x₂ + 75.458x₁x₃ + 45.583x₂x₃
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (<0.001), x₁² (0.310), x₃² (0.356), x₁x₂ (0.007), x₁x₃ (0.003), x₂x₃ (0.006)

(6)

Delete x₃² and recalculate the equation.

3. y₁ = 293.322 + 177.011x₁ + 109.422x₂ + 131.475x₃ + 32.022x₁² + 66.033x₁x₂ + 75.458x₁x₃ + 43.458x₂x₃
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (<0.001), x₁² (0.308), x₁x₂ (0.007), x₁x₃ (0.002), x₂x₃ (0.058)

(7)

Delete x₁² and recalculate the equation.

4. y₁ = 314.670 + 177.011x₁ + 109.422x₂ + 131.473x₃ + 66.033x₁x₂ + 75.458x₁x₃ + 47.583x₂x₃
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (<0.001), x₁x₂ (0.006), x₁x₃ (0.002), x₂x₃ (0.0058)
R² = 0.916, R²_adj = 0.891, s = 75.068, PRESS = 288,457.9

(8)

The p-values of all variables are <0.05; this is an adequate equation.
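The change of the constant term from step to step can be checked with simple arithmetic. The sketch below rests on an assumption that is not stated in the text: the design is taken to be a three-level factorial with levels coded −1, 0, and +1, so that each squared column has a mean of 2/3 and is uncorrelated with the other terms. Under that assumption, deleting a quadratic term b_jj x_j² should shift only the constant term, by b_jj × 2/3. The function name `intercept_after_drop` is hypothetical; the numbers are those of Equations (5) to (8).

```python
# Assumption (not stated in the text): three-level factorial coded -1, 0, +1,
# so the mean of each squared column x_j^2 is 2/3.
MEAN_OF_SQUARE = 2.0 / 3.0


def intercept_after_drop(b0, b_jj, mean_sq=MEAN_OF_SQUARE):
    """Constant term expected after deleting the quadratic term b_jj * x_j^2."""
    return b0 + b_jj * mean_sq


# (deleted term, its coefficient, constant before, constant reported after)
steps = [
    ("x2^2", -22.378, 327.615, 312.696),  # Equation (5) -> Equation (6)
    ("x3^2", -29.061, 312.696, 293.322),  # Equation (6) -> Equation (7)
    ("x1^2", 32.022, 293.322, 314.670),   # Equation (7) -> Equation (8)
]

for term, b_jj, before, reported in steps:
    expected = intercept_after_drop(before, b_jj)
    print(f"delete {term}: expected {expected:.3f}, reported {reported:.3f}")
```

Under the stated assumption, the expected and reported constants agree to within rounding at every step, which is consistent with the linear and interaction coefficients remaining essentially unchanged during the elimination.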

The normality and constant variance tests are passed. Influential points include the 9th (t_i = −3.316, DFFITS_i = −2.858), 19th (t_i = 2.255, DFFITS_i = 2.056), and 25th (t_i = −2.322).

The adequate equation involves x₁, x₂, x₃, x₁x₂, x₁x₃, and x₂x₃. No quadratic terms exist in this equation. The contour plots of the response form a plateau, not a surface curve.

The full equation for the y₂ standard deviation with three variables is calculated as follows:

y₂ = 34.904 + 11.522x₁ + 15.317x₂ + 29.183x₃ + 4.189x₁² − 1.328x₂² − 16.772x₃² + 7.717x₁x₂ + 5.117x₁x₃ + 14.075x₂x₃
p-values: x₁ (0.280), x₂ (0.156), x₃ (0.012), x₁² (0.818), x₂² (0.942), x₃² (0.362), x₁x₂ (0.550), x₁x₃ (0.691), x₂x₃ (0.281)
R² = 0.454, R²_adj = 0.615, s = 43.817, PRESS = 93,044.256

(9)

After the execution of the backward elimination, the adequate equation for y₂ is

y₂ = 47.993 + 29.183x₃
p-value: x₃ (0.007)
R² = 0.256, R²_adj = 0.227, s = 42.171, PRESS = 52,441.283

(10)

The normality test is passed, but the constant variance test fails (p < 0.001).

The results of the evaluation of adequate RSM models for semiconductor manufacturing in five studies are listed in Table 4.

Won et al. [23] employed the response surface method to optimize the final polishing of Si wafers. The experimental design was CCD, the response y_SR was surface roughness, and the variables were x₁ applied pressure, x₂ platen speed, and x₃ mixed slurry ratio. The RSM model was not reported in this study. Contour plot and response surface plot curves were produced by full models [23].

The full equation with these datasets is as follows:

y_SR = 1.988 + 0.501x₁ − 0.0440x₂ − 0.172x₃ + 0.235x₁² + 0.0101x₂² + 0.223x₃² + 0.0711x₁x₂ − 0.128x₁x₃ + 0.00125x₂x₃
p-values: x₁ (<0.001), x₂ (0.464), x₃ (0.027), x₁² (0.084), x₂² (0.931), x₃² (0.098), x₁x₂ (0.302), x₁x₃ (0.095), x₂x₃ (0.985)
R² = 0.958, R²_adj = 0.882, s = 0.172, PRESS = 1.299

(11)

The adequate equation was evaluated with the regression technique of backward elimination:

y_SR = 1.991 + 0.501x₁ − 0.172x₃ + 0.238x₁² + 0.225x₃² − 0.128x₁x₃
p-values: x₁ (<0.001), x₃ (0.007), x₁² (0.030), x₃² (0.037), x₁x₃ (0.044)
R² = 0.941, R²_adj = 0.909, s = 0.154, PRESS = 0.646

(12)

The normality and constant variance tests are passed. Two influential data points are found, the 7th and 15th data points.

In this study [23], the authors used the full equation to produce the curved surface plots. However, the adequate equation indicated that the x₂ variable (platen speed) did not significantly affect the response, surface roughness. With the complete regression technique, the adequate RSM equation could help researchers to find the optimal condition.

Figure 1 shows the contour plots for the full and adequate equations. The difference between the two equations resulted in a difference in the distribution of curves between the two figures. The contour and response surface plots produced with the full equation were presented in the study [23]. An incorrect RSM equation could therefore lead to an incorrect interpretation of the plots.

Lee et al. [24] investigated the polishing factors affecting surface roughness using a Box–Behnken design. The polishing factors were x₁, pressure; x₂, wheel speed; and x₃, process time. The authors proposed the full equation as the best equation and found that the R² was 0.974 for this equation. This full equation produced contour and response surface plots.

The full equation proposed by the authors is

y_SR = 9.465 − 24.850x₁ − 0.163x₂ − 0.278x₃ + 46.375x₁² + 0.00309x₂² + 0.00815x₃² − 0.0325x₁x₂ + 0.425x₁x₃ + 0.000150x₂x₃
p-values: x₁ (<0.001), x₂ (0.006), x₃ (0.011), x₁² (0.001), x₂² (0.008), x₃² (0.038), x₁x₂ (0.661), x₁x₃ (0.029), x₂x₃ (0.919)
R² = 0.974, R²_adj = 0.928, s = 0.140, PRESS = 1.559

(13)

However, the p-values of the variables x₁x₂ and x₂x₃ are greater than 0.05.

The adequate equation evaluated with the complete regression technique in this study is

y_SR = 9.565 − 25.501x₁ − 0.168x₂ − 0.275x₃ + 46.375x₁² + 0.00309x₂² + 0.00815x₃² + 0.425x₁x₃
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (0.002), x₁² (<0.001), x₂² (0.002), x₃² (0.014), x₁x₃ (0.010)
R² = 0.973, R²_adj = 0.946, s = 0.121, PRESS = 0.603

(14)

The normality and constant variance tests are passed. Four influential data points are found (4th, 6th, 7th, 12th).

Comparing the full and adequate equations, the adequate equation has a higher value of R²_adj and lower values of s and PRESS. This indicates that the adequate equation has better accuracy (R²_adj, s) and predictive performance (PRESS).

Zhang et al. [25] investigated the optimization of dispatching rules for wafer manufacturing systems. The affecting variables were x₁, the criterion of bottleneck, and x₂ and x₃, which were two coefficients of work-in-progress (WIP) status. The responses included y_CT, the cycle time (CT); y_WIP, work-in-progress (WIP); and y_TP, throughput (TP). The authors proposed the full models [25], and the contour and 3-D response surface plots were produced as curved surface plots.

The results of the complete regression in our study are

y_CT = 932.466 − 10.392x₁ − 31.055x₂ + 0.0685x₁² + 3.849x₂²
R² = 0.813, R²_adj = 0.763, s = 6.755, PRESS = 2307.945

(15)

The normality and constant variance tests were passed. Three influential data points were found.

y_WIP = 10514.817 − 166.362x₁ + 1878.219x₂ + 2653.558x₃ + 1.765x₁² − 164.955x₃²
−22.265x₁x₂
R² = 0.922, R²_adj = 0.886, s = 162.678, PRESS = 948,709.4

(16)

The normality test failed (p = 0.008), and the constant variance test was passed. One data point was influential (13th).

y_TP = −1920.381 + 40.347x₁ + 124.394x₂ + 248.409x₃ − 0.184x₁² − 15.781x₂²
−8.908x₃² − 1.204x₁x₃
R² = 0.768, R²_adj = 0.633, s = 15.875, PRESS = 13,369.7

(17)

The normality test failed (p = 0.008), and the constant variance test failed (p = 0.003). Three influential data points were found (5th, 15th, 18th).

For the y_CT response, x₃ did not significantly affect the response. For the y_WIP response, x₂² was not included in the RSM equation. The authors presented curved surface plots using the full equations. However, these full equations were inappropriate. The y_WIP response violated the normality assumption, and the y_TP response failed both the normality and constant variance tests. Advanced regression techniques are needed to remedy these conditions.

Seo et al. [26] optimized a tungsten chemical mechanical planarization (CMP) slurry for semiconductor manufacturing. The CCD experimental design was employed. The responses y_W and y_Oxide were the removal rates of the thickness of the W and oxide films. The full equations of the two responses were proposed [26] and used to produce the contour and response surface plots.

The adequate equations with datasets listed in the literature [26] were evaluated by complete regression analysis:

y_W = 25.601 + 843.010x₂
R² = 0.721, R²_adj = 0.698, s = 138.318, PRESS = 328.601

(18)

y_Oxide = 187.296 + 0.834x₁ + 10.682x₃
R² = 0.688, R²_adj = 0.636, s = 14.801, PRESS = 4.068

(19)

Both responses passed the tests of normality and constant variance. In this study, only the x₂ variable had a significant effect on y_W. The x₁ and x₃ variables had a linear relationship with y_Oxide. The surface plots were presented in the literature [26]. The effect of using the adequate equation on the contour and response surface plots is evident when the adequate equations are compared with the full equations proposed in the literature [26].

Saleem and Soma [27] used the Box–Behnken design and RSM to study the optimization of MEMS devices. The influencing factors were x₁, top electrode length (TEL); x₂, top electrode width (TEW); x₃, torsion spring length; and x₄, torsion spring width (TSW). The response y_PV was the pull-in voltage.

The full equation, which included x₁, x₂, x₃, x₄, x₁², x₂², x₃², x₄², x₁x₂, x₁x₃, x₁x₄, x₂x₃, x₂x₄, and x₃x₄, was proposed and used to produce the surface plots.

The full equation calculated with the datasets in literature [27] is

y_PV = 27.200 − 7.550x₁ − 2.717x₂ − 6.208x₃ + 3.292x₄ + 0.154x₁² − 0.696x₂² + 1.192x₃² − 0.258x₄² + 0.825x₁x₂ + 1.800x₁x₃ − 0.925x₁x₄ + 0.425x₂x₃ − 0.400x₂x₄ + 1.350x₃x₄
p-values: x₁ (<0.001), x₂ (<0.001), x₃ (<0.001), x₄ (<0.001), x₁² (0.842), x₂² (0.376), x₃² (0.147), x₄² (0.731), x₁x₂ (0.364), x₁x₃ (0.062), x₁x₄ (0.311), x₂x₃ (0.635), x₂x₄ (0.655), x₃x₄ (0.148)
R² = 0.975, R²_adj = 0.945, s = 1.748, PRESS = 211.08

(20)

The adequate equation calculated by backward elimination regression is

y_PV = 26.773 − 7.550x₁ − 2.717x₂ − 6.208x₃ + 3.292x₄ + 1.352x₃² + 1.801x₁x₃
R² = 0.962, R²_adj = 0.95, s = 1.659, PRESS = 100.206

(21)

Comparing the R²_adj and s values, the adequate equation has better accuracy than the full equation. The adequate equation also exhibits better predictive performance, as indicated by the PRESS criterion. However, the normality test failed (p = 0.0046), so further analysis is needed.

3.2. Steel Materials

The results of evaluating adequate equations for steel materials are listed in Table 5.

Noordin et al. [28] investigated the performance of coated carbide tools using a CCD design. The influencing variables were x₁, cutting speed; x₂, feed; and x₃, side cutting edge angle (SCEA), and the responses were y_Ra, surface roughness (Ra), and y_Fc, tangential force (Fc). The regression procedure was introduced in detail, and the backward elimination procedure was used. The model assessment criteria included R², R²_adj, PRESS, and lack of fit [28].

The response y_Ra and y_Fc calculations in this study are presented in Appendix A.2 and Appendix A.3. The complete regression analysis produces the appropriate model, which is consistent with the report by the authors [28]. In our study, the tests of normality and constant variance are performed. Both responses, y_Ra and y_Fc, pass. For the y_Ra response, one influential data point (7th) is identified. Two influential data points (12th, 13th) exist for the y_Fc response.

Bouacha et al. [29] studied the physical properties in hard turning with a cubic boron nitride (CBN) tool. The factors affecting the response were x₁, cutting speed; x₂, feed rate; and x₃, depth of cut. The responses were three surface roughness measures: y_Ra, the arithmetic average of absolute roughness Ra; y_Rt, the maximum height of the profile Rt; and y_Rz, the average maximum height of the profile Rz.

The ANOVA tables of y_Ra, y_Rt, and y_Rz are presented in the literature [29]. The authors used p < 0.05 as the criterion and deleted, in a single step, all variables with p-values greater than 0.05. The coefficients obtained in this first calculation for the variables with p-values < 0.05 were then used to propose the final equation. The interaction of these variables was not considered. The remaining variables of their proposed equations for the three responses are presented in the published table [29].

The adequate equations evaluated and checked with the regression technique are as follows:

y_Ra = 0.285 − 0.00841x₁ + 14.410x₂ + 0.0000215x₁² − 33.681x₂² − 0.0128x₁x₂
R² = 0.991, R²_adj = 0.989, s = 0.018, PRESS = 0.0115

(22)

Both tests (normality and constant variance) are passed, and two influential data points are identified (6th and 13th).

y_Rt = 2.221 − 0.0548x₁ + 86.001x₂ + 0.000136x₁² − 208.333x₂² − 0.068x₁x₂
R² = 0.986, R²_adj = 0.982, s = 0.139, PRESS = 0.676

(23)

Both tests (normality and constant variance) are passed, and two influential data points are found (1st, 4th).

y_Rz = 2.994 − 0.0409x₁ + 40.071x₂ + 1.575x₃ + 0.000951x₁² − 78.472x₂² − 0.0357x₁x₂ − 0.00541x₁x₃
R² = 0.993, R²_adj = 0.990, s = 0.079, PRESS = 0.220

(24)

The normality test failed (p = 0.007). The constant variance test is passed. Two influential data points are found (5th, 6th).

The x₃ variable did not significantly affect y_Ra and y_Rt. For the y_Rz response, the x₃ variable entered only through its linear term and the x₁x₃ interaction, and its quadratic term was insignificant. In that study [29], curved surface plots were produced using the full equations rather than the appropriate models.

Figure 2 shows the contour plots for the complete and adequate equations. The difference in the equations induces a difference in the distribution of curves between the two figures. The contour and response surface plots produced with the complete equation are presented in the study [29]. An inadequate RSM equation can induce incorrect results.
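The same effect can be illustrated numerically with Equations (20) and (21), for which both coefficient sets are given above. The sketch below evaluates the full and adequate pull-in voltage equations at a few points and prints the gap between them. It assumes the factors are expressed in coded units (−1 to +1), which is not stated explicitly in this section; the helper names are illustrative.

```python
FULL = {  # coefficients of Equation (20)
    "1": 27.200, "x1": -7.550, "x2": -2.717, "x3": -6.208, "x4": 3.292,
    "x1^2": 0.154, "x2^2": -0.696, "x3^2": 1.192, "x4^2": -0.258,
    "x1x2": 0.825, "x1x3": 1.800, "x1x4": -0.925, "x2x3": 0.425,
    "x2x4": -0.400, "x3x4": 1.350,
}
ADEQUATE = {  # coefficients of Equation (21)
    "1": 26.773, "x1": -7.550, "x2": -2.717, "x3": -6.208, "x4": 3.292,
    "x3^2": 1.352, "x1x3": 1.801,
}

def term_value(name, x):
    """Evaluate one model term at the point x = (x1, x2, x3, x4)."""
    if name == "1":
        return 1.0
    if name.endswith("^2"):
        return x[int(name[1]) - 1] ** 2
    value = 1.0
    for digit in name.replace("x", ""):  # "x1x3" -> "13"
        value *= x[int(digit) - 1]
    return value

def predict(model, x):
    """Sum coefficient * term over all terms of the model."""
    return sum(coef * term_value(name, x) for name, coef in model.items())

# Assumed coded points: design center and two corners.
for point in [(0, 0, 0, 0), (1, 1, 1, 1), (-1, 1, -1, 1)]:
    full, adequate = predict(FULL, point), predict(ADEQUATE, point)
    print(point, round(full, 3), round(adequate, 3), round(full - adequate, 3))
```

The two equations share their linear coefficients, so the gap comes entirely from the insignificant quadratic and interaction terms, and it grows toward the edges of the design region where contour plots are read for optimization.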

Elbah et al. [30] performed a mixed ceramic tool performance test. The factors affecting performance were x₁, depth of cut; x₂, feed rate; and x₃, cutting speed. The responses were y_Fa, axial force; y_Fr, thrust force; y_Ft, tangential force; and y_Rs, surface roughness.

In the literature [30], four ANOVA tables with four responses were listed, along with the p-values of each variable and a notation indicating whether the variable was significant or not. However, this information was not used. The full equations of the four responses were proposed and used to produce the contour and response surface plots.

The forms of the four responses evaluated by the complete regression technique in our study are as follows:

y_Fa = f(x₁, x₂, x₃, x₁x₃, x₂x₃)

(25)

y_Fr = f(x₁, x₂, x₃, x₁x₂, x₁x₃)

(26)

y_Ft = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₂x₃)

(27)

y_Rs = f(x₁, x₂, x₁x₂)

(28)

The quadratic terms had no significant effect on the responses. The curved surface plots were inappropriate and could easily induce incorrect results.

Campos et al. [31] observed the machining of hardened steels with CCD in an RSM study. The influencing factors were x₁, cutting speed; x₂, feed rate; and x₃, cut depth. The responses were y_Time, time; y_Ra, average surface roughness Ra; and y_Rt, maximum height of the profile surface roughness Rt.

The sequential model test and the ANOVA results of the three responses are presented in the literature [31]. The significant effects are the linear + square model for y_Time, the linear + square model for y_Ra, and the linear + interaction model for y_Rt. In the three ANOVA tables, the p-values of some variables were >0.05. However, the full models of the three responses were proposed. The curved surfaces of both plots were presented.

The forms of the adequate equations in our study are

y_Time = full equation.

(29)

y_Ra = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₂x₃, x₁², x₃²).

(30)

y_Rt = f(x₁, x₁²).

(31)

The adequate equation for y_Time is the full equation, yielding the same results as in the literature [31], and four influential data points are identified. For the Ra response y_Ra, the x₂² variable was not included in the adequate equation, and four influential data points were found. The curved surface plots were inappropriate for this response. For the Rt response y_Rt, only x₁ and x₁² have a significant effect.

Khalil et al. [32] reported optimizing the effect of machining factors on surface roughness for machining AISI D3 steel. The influencing factors were x₁, cutting speed; x₂, feed rate; and x₃, cut depth. The response y_SR was the surface roughness. The results of the ANOVA for the experiment are presented in the literature [32]. The p-values of variables x₁² and x₂² were >0.05. However, the full equation was still adopted, and curved response surface plots were produced.

The form of the adequate equation evaluated by the complete regression technique is

y_SR = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₃²).

(32)

For the quadratic terms, only the x₃² term significantly affects the response.

Using the full equation involving x₁² and x₂² to produce the curved surface plots was inappropriate and could easily cause incorrect results.

3.3. Nanomaterials

The results of evaluating adequate RSM equations for nanomaterials are presented in Table 6.

Pakolpakcil et al. [37] investigated the effect of processing parameters on the aerosol filter of poly nanofiber mats. A three-factor BBD was used. The variables were x₁, concentration; x₂, rotation speed; and x₃, needle size. The response y_Afd was the average fiber diameter. The statistical results of the sequential models showed that the p-value of the quadratic term was <0.05. The ANOVA table in that study [37] indicated that the p-values of x₁x₂, x₁x₃, and x₂² were >0.05. That is, these variables did not have a significant effect on the response. However, the full equation was proposed and used to produce the contour and surface response plots [37].

The calculation of the adequate equation with the datasets in the literature is presented in Appendix A.4. The form of this adequate equation is

y_Afd = 277.462 + 60.375x₁ + 17.001x₂ + 15.875x₃ − 42.058x₁² + 34.442x₃²
+ 17.501x₂x₃

(33)

The variable x₂² is excluded. That is, there is no quadratic relationship between x₂ and the response.

The adequate equation has a similar accuracy to that of the full equation. However, it has a better predictive ability (PRESS = 675.548) than the full equation (PRESS = 1,113,520).

Figure 3 shows the contour plots for the complete and adequate equations. The difference in the equations induces a difference in the distribution of curves between the two figures. The contour and response surface plots produced with the full equation are presented in the study [37]. An inadequate RSM equation can lead to incorrect conclusions.

The normality test is passed. However, the constant variance test failed. One suspicious data point (12th) is found, and further analysis is needed.

Pajaie and Taghizadeh [33] reported optimizing the catalytic performance of synthesized catalysts for the methanol-to-olefin reaction. The BBD experimental design was used. There were three variables: x₁, the MW aging time; x₂, the US aging time; and x₃, the HT time. The two responses were y_ethylene, the yield of ethylene, and y_propylene, the yield of propylene.

Two ANOVA tables for y_ethylene and y_propylene are presented in the literature. For the ethylene yield, p-values of all variables are <0.05. The full equation is an adequate representation of the ethylene yield. The R² value of ethylene is very close to 1.0 (R² = 0.9995).

In the ANOVA table of the propylene yield y_propylene, the p-values of x₂x₃ and x₃² were >0.05. These two variables did not have a significant effect on the response y_propylene. However, the authors used the two full equations to produce the contour and response surface plots [33]. Using the full equation for the propylene yield is inappropriate because the HT time x₃ did not have a quadratic (curved surface) relationship with y_propylene.

There are six suspicious data points for y_ethylene and two suspicious data points for y_propylene. Further investigation needs to be performed.

Jourshabani et al. [34] investigated the factors influencing benzene hydroxylation to phenol using a V/SBA-16 nanoporous catalyst. The CCD was used. The variables were x₁, reaction temperature; x₂, H₂O₂ content; and x₃, catalyst amount. The response was the yield of phenol.

In the ANOVA table for the response, the p-value of the x₁x₃ variable is >0.05. However, the authors proposed the full equation, which was used to produce the surface plots [34].

The adequate equation, evaluated using complete regression analysis, is y_{phenol yield} = f(x₁, x₂, x₃, x₁², x₂², x₃²). The interaction terms (x₁x₂, x₁x₃, x₂x₃) are excluded in this equation. That is, the full equation is inappropriate.

Sheng et al. [35] investigated the optimization of deposition variables to synthesize upright ZnO rod arrays with large diameters. There were four influencing factors: x₁, the concentration of Zn²⁺; x₂, reaction temperature; x₃, reaction time; and x₄, the molar ratio of Zn²⁺. The response y_D was the rod diameter. The authors used the logarithmic transformation for the response. The form of their proposed equation is

Log (y_D + 0.5) = f (x₁, x₂, x₃, x₄, x₃², x₁x₂, x₂x₃, x₂x₄, x₃x₄).

(34)

The untransformed response y_D passed both the normality and constant variance tests, so it is not necessary to transform the response into a logarithmic form. The adequate equation calculated by complete regression analysis in our study is

y_D = 1.182 − 57.722x₁ − 0.0289x₂ + 0.288x₃ + 0.00265x₄ − 0.00822x₃² + 0.778x₁x₂
−0.0000299x₂x₄ − 0.0000643x₃x₄
R² = 0.86, R²_adj = 0.797, s = 0.169, PRESS = 1.18

(35)

Rakhmanova et al. [36] reported the optimization of nanosized zinc oxide synthesis conditions using electrospinning. Three influencing factors were applied: x₁, voltage; x₂, distance; and x₃, calcination temperature. The response, y, was zinc oxide.

The authors did not propose the empirical equation; instead, surface plots of the curve were presented [36]. The full equation evaluated by complete regression with the datasets in the literature is

y_{Zinc oxide} = 302.751 − 0.577x₁ + 1.625x₂ − 0.712x₃ + 0.000348x₁² − 0.00438x₂²
(0.629) (0.970) (0.985) (0.565) (0.997)
−0.432x₃² − 0.00871x₁x₂ + 0.0113x₁x₃ + 0.0897x₂x₃
(0.739) (0.733) (0.566) (0.932)
R² = 0.768, R²_adj = 0.351, s = 7.154, PRESS = 6124.736

(36)

In our study, the adequate equation evaluated by complete regression analysis is

y_{Zinc oxide} = 129.859 − 2.526x₂ − 3.663x₃
R² = 0.651, R²_adj = 0.592, s = 5.668, PRESS = 639.412

(37)

The normality and constant variance tests are passed, and two suspicious data points are found.

The PRESS values for the adequate and complete equations are 639.412 and 6124.736, respectively. The adequate equation offers a substantial improvement in predictive ability.

Sreekumar et al. [38] investigated the optimization of a photovoltaic/thermal system using a Mxene/water nanofluid as the heat transfer fluid via CCD. The four influencing variables were x₁, nanofluid concentration; x₂, nanofluid flow rate; x₃, solar radiation; and x₄, inlet temperature. The four responses were y_nth, thermal efficiency; y_nele, electrical efficiency; y_nex,th, thermal exergy efficiency; and y_nex,ele, electrical exergy efficiency.

The authors proposed full equations for the four responses, which were used to produce curved contour and 3D response surface plots [38]. The coefficients of the variables and their corresponding p-values for the four responses (y_nth, y_nele, y_nex,th, and y_nex,ele) are presented in the table from the study [38]. In this table, many p-values of the parameters in the full equations are >0.05. This indicates that these parameters did not significantly affect the response. However, the complete equations were adopted [38].

The adequate equations of the four responses, evaluated by the complete regression technique, are as follows:

y_nth = 51.078 + 29.749x₁ + 0.257x₂ − 0.631x₄
R² = 0.814, R²_adj = 0.789, s = 4.017, PRESS = 515.04

(38)

The normality and constant variance tests are passed. One suspicious data point is found.

y_nele = 19.452 − 0.462x₁ + 0.00935x₂ − 0.00438x₃ − 0.0539x₄ + 0.0000238x₂x₃
−0.000406x₂x₄
R² = 0.984, R²_adj = 0.979, s = 0.171, PRESS = 1.419

(39)

The normality and constant variance tests are passed. Three suspicious data points are found.

y_nex,th = 0.601 − 3.364x₁ − 0.00775x₂ + 0.00353x₃ − 0.0271x₄ + 19.391x₁²
−0.0000171x₂x₃ + 0.000328x₂x₄ − 0.0000258x₃x₄
R² = 0.979, R²_adj = 0.970, s = 0.119, PRESS = 0.666

(40)

The normality and constant variance tests are passed. Two suspicious data points are found.

y_nex,ele = 20.366 − 0.175x₁ + 0.0225x₂ − 0.00427x₃ − 0.0748x₄ − 0.000192x₂²
−0.00158x₁x₃ + 0.0000196x₂x₃
R² = 0.998, R²_adj = 0.997, s = 0.068, PRESS = 0.175

(41)

The normality test is passed. However, the constant variance test failed (p = 0.047). Two suspicious data points are found.

For y_nth and y_nele, no quadratic terms (x₁², x₂², x₃², or x₄²) are significant, so the curved surface plots were inappropriate. For y_nex,th, x₁² is the only quadratic term in the adequate equation. For y_nex,ele, only the x₂² term exists in the adequate equation. The constant variance test failed for y_nex,ele. Advanced regression techniques should be applied to remedy this condition.

4. Discussion

This study collected 16 papers related to the application of RSM models in three engineering fields. This literature dataset was used to evaluate the adequacy of RSM equations. The evaluation of the adequate RSM model was completed through a complete regression analysis. This analysis calculated the coefficients for all variables, tested the significant effect of each variable, verified the assumptions, and identified influential data points in the regression analysis. It was found that only one paper reported an equation that could fully and correctly express the relationship between the response and the influencing factors [28].

The common issues with the application of RSM in three engineering fields, as identified in the study, are listed in Table 7.

Most papers adopted the full model and then used it to create contour plots and 3D response surface plots, allowing for the observation of the optimization of these variables. In the literature, ANOVA tables typically include the coefficient value of variable parameters, the t-value, and the p-value for each variable. However, this information was not used by some researchers [24,25,26,27,30,32,33,34,38] to evaluate the adequacy of equations.

One study did not report the RSM model [36]. The contour and response surface plots were produced with the full equation.

One study completely deleted unwanted variables after the first regression calculation [29]. When the ANOVA table of regression results indicated that the p-values of some variables were greater than 0.05, these variables were deleted simultaneously [29]. The equation was proposed using the coefficient values of the remaining variables. However, the full equations were still used to yield the contour and response surface plots for this study [29]. This method is unreasonable for model building.

Two studies employed sequential models to investigate the significant effects of linear, interaction, and quadratic terms on the response [31,37]. The p-values of some combinations (linear + interaction, linear + square terms) indicated that these combinations did not significantly affect the response. However, full equations were still used to produce contour and response surface plots.

If the adequate RSM models are not the full equations, the curved contour and 3D response surface plots produced using the full equations are inappropriate, and the optimal conditions of the variables read from these plots are not meaningful.

Some datasets failed the normality test [25,27,29], and others failed the constant variance test [25,37,38]. These datasets need to be transformed to align with the assumptions of regression analysis. Yang et al. [50] emphasized that departures from the homogeneity-of-variance assumption can lead to seriously incorrect results and must be remedied. Sheng et al. [35] used the logarithmic transformation of the response to perform the regression analysis. However, the response y_D of their datasets did not violate the normality assumption, as determined by the Kolmogorov–Smirnov method in our study.
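To make the normality check concrete, the following is a minimal sketch of the Kolmogorov–Smirnov distance between a set of residuals and a normal distribution fitted to them. The exact implementation used in this study is not described in this section, so the function name and details are assumptions. Because the mean and standard deviation are estimated from the same residuals, the statistic should be compared against critical values adjusted for estimated parameters rather than the standard Kolmogorov–Smirnov table.

```python
import math
import statistics

def ks_normal_statistic(residuals):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    n = len(residuals)
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)  # sample standard deviation
    d = 0.0
    for i, value in enumerate(sorted(residuals), start=1):
        cdf = 0.5 * (1.0 + math.erf((value - mean) / (sd * math.sqrt(2.0))))
        # Compare the fitted CDF with the empirical CDF just after and just before value.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical residuals, for illustration only.
d_stat = ks_normal_statistic([-1.0, 0.0, 1.0])
print(round(d_stat, 4))
```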

Remedial measures for heteroscedasticity in regression analysis include transforming the dependent variable (for example, the log transformation, log(y); the square root transformation, y^0.5; or the Box–Cox transformation), using weighted least squares (WLS), segmenting the data, and using generalized least squares (GLS) [51,52,53].
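The response transformations listed above belong to one family. The sketch below implements the Box–Cox transform for a single positive response value; the function name is illustrative. With λ = 0 it reduces to the log transformation, and with λ = 0.5 it is a shifted and scaled square root, so the log and square root remedies are special cases.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response value y for parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires a positive response")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

# Hypothetical response values, for illustration only.
responses = [0.8, 1.5, 2.9, 6.1]
for lam in (1.0, 0.5, 0.0):
    print(lam, [round(box_cox(y, lam), 3) for y in responses])
```

In practice, λ is chosen so that the residuals of the refitted model satisfy the constant variance and normality tests, and the adequate equation is then re-evaluated on the transformed response.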

Influential data points are commonly found in the datasets of the examined literature. These data points may result from experimental error or may indicate the need to study other forms of RSM models. Treating influential points in regression analysis involves determining whether a point is due to a data entry error or is a valid data point with an extreme value [54,55].

If an influential point is due to a mistake (e.g., experimental error or sampling error), it can be corrected or removed. Valid data points with extreme values can be transformed to reduce their influence, or robust regression can be used [54,55].
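The appendix reports studentized residuals (t_i) and DFFITS_i for suspicious points. A minimal NumPy sketch of how these two diagnostics can be computed from one least-squares fit is given below. The data are hypothetical, the function name is illustrative, and the cutoff 2·sqrt(p/n) is a common rule of thumb rather than the threshold used in this study, which is not stated here.

```python
import numpy as np

def influence_measures(X, y):
    """Externally studentized residuals and DFFITS for the model y = X b + e."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                         # leverages
    e = y - H @ y                          # ordinary residuals
    sse = e @ e
    # Residual variance estimated with observation i left out.
    s2_loo = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_loo * (1.0 - h))
    dffits = t * np.sqrt(h / (1.0 - h))
    return t, dffits

# Hypothetical data: a straight line with small noise and one shifted point.
x = np.arange(10.0)
y = 1.0 + 2.0 * x + 0.1 * np.sin(x)
y[9] += 5.0
X = np.column_stack([np.ones_like(x), x])

t, dffits = influence_measures(X, y)
cutoff = 2.0 * np.sqrt(X.shape[1] / X.shape[0])  # assumed rule-of-thumb cutoff
flagged = [int(i) for i in np.flatnonzero(np.abs(dffits) > cutoff)]
print(flagged)
```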

Asoo et al. [3] reported the integration of computer technology in RSM. They described the high-performance computing used to calculate the models and visualize the results, which helped the researchers understand the relationships between influencing variables and responses. Graphical visualization helped the researchers interpret results and make decisions [3]. However, the effect of adequate RSM models on graphical visualization was not considered.

The challenges of RSM include limitations in modeling nonlinear systems, sensitivity to experimental error, model interpretability, and model validation [3]. The nonlinear systems of an RSM experiment can be evaluated with nonlinear regression. The criteria for evaluating linear regression can also be applied to a nonlinear system. Some advanced modeling techniques, such as machine learning, neural networks, and support vector machines, can be applied and incorporated into the regression analysis technique. The experimental error can be further studied and quantified by checking the criteria of influential data points. The model’s interpretability problems can be addressed by incorporating the criteria of regression analysis. The PRESS criterion introduced in this study can be used as the model validation criterion to assess the predictive ability of other RSM models.
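For reference, PRESS is the sum of squared leave-one-out prediction errors, and for a linear model it can be obtained from a single fit through the leverages. The NumPy sketch below shows this shortcut and checks it against explicit refitting. The data and function names are hypothetical and serve only to illustrate the criterion.

```python
import numpy as np

def press(X, y):
    """PRESS from one fit: sum of (residual / (1 - leverage))^2."""
    H = X @ np.linalg.pinv(X)  # hat matrix for a full-column-rank X
    e = y - H @ y
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))

def press_brute_force(X, y):
    """PRESS by refitting the model n times, leaving one run out each time."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total

# Hypothetical one-factor data with a quadratic model in coded units.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, -1.0, 0.0, 1.0])
y = np.array([4.1, 2.9, 2.2, 2.4, 3.8, 4.3, 1.9, 4.0])
X = np.column_stack([np.ones_like(x), x, x**2])

press_fast = press(X, y)
press_loo = press_brute_force(X, y)
print(round(press_fast, 4), round(press_loo, 4))
```

Because each squared residual is inflated by its leverage, a model carrying many insignificant terms tends to show a PRESS far above its residual sum of squares, which is the pattern seen when the full and adequate equations are compared.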

Sample size is an essential criterion for ensuring the power of statistical techniques. Researchers have proposed simple sample size equations to evaluate the required sample size (n) for multiple regression models. These criteria can be applied in the calculation of RSM models. Snee [56] proposed this equation: n ≥ 2p + 20. Green’s equation is n ≥ 8p + 50 [57]. The calculation equation used by Khamis and Kepler is n ≥ 5p + 20 [58]. In these equations, n is the required number of samples, and p is the number of parameters in the RSM models. According to these equations, the number of runs is insufficient for the RSM equations in most of the studies examined. One solution is to increase the number of replicates at each run.
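As a worked example of these three rules, the sketch below computes the required number of runs for a given number of model parameters p. Whether p counts the intercept is not specified in the text, so the example treats p as the number of non-intercept terms; that choice and the function name are assumptions.

```python
def required_runs(p):
    """Minimum number of runs n suggested by the three rules quoted in the text."""
    return {
        "Snee": 2 * p + 20,
        "Green": 8 * p + 50,
        "Khamis and Kepler": 5 * p + 20,
    }

# Full quadratic model in three factors: 3 linear + 3 squared + 3 interaction terms.
rules = required_runs(9)
for name, n_min in rules.items():
    print(f"{name}: n >= {n_min}")
```

Even the least demanding rule asks for 38 runs for a nine-term model, which is more than a single unreplicated three-factor response surface design typically provides.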

The study used sixteen published studies to evaluate the adequacy of the RSM equation. The 16 papers used in this study employed the following commercial regression analysis software: Design Expert (10 papers), Minitab (2 papers), and MODDE (1 paper). Three papers did not report the software used. The three commercial software packages provide detailed analyses of regression calculations, including regression coefficients, t-values, and ANOVA tables. However, most users struggle to utilize the calculation results of these programs. Ten papers, accounting for 62.5% of the total literature analyzed, utilized Design Expert software. However, only one study [28] reported the appropriate equation.

Based on the results of this study, several suggestions are proposed for the application of RSM in experimental design within engineering fields.

For engineers using RSM, receiving complete regression analysis training will help them in their research work. Engineers not only need to be able to use commercial software programs to calculate the estimated values of parameter coefficients but also need to be familiar with screening influencing variables, checking the conditions of regression analysis assumptions, and examining all possible influencing data points. Training in complete regression techniques can enhance researchers’ ability to establish appropriate RSM equations.
Researchers should ask a statistician for help with the experimental design and with verifying the validity of the regression calculation.
Many commercial software programs can calculate RSM models and create precise contour and response surface plots. The backward elimination technique is beneficial in finding an adequate equation. It is recommended to integrate this backward elimination method into commercial software to assist researchers in developing suitable RSM equations.

5. Conclusions

This study compiled sixteen datasets from the literature in three engineering fields to evaluate the adequacy of RSM equations with complete regression analysis. The results of this study raise some critical issues regarding the use of RSM models in engineering research, including the selection of the full equation without considering statistical validation, the removal of all variables with p-values above a preset value, the presence of non-normality and non-constant variance conditions in the data set, and the presence of influential data points.

These issues need to be considered in RSM modeling. The sample size should be increased to enhance statistical power. Some suggestions for engineering researchers include training them in the complete regression technique, seeking the assistance of a statistician in experimental design and data analysis, and incorporating the backward elimination technique into commercial software programs, especially RSM software.

Author Contributions

Conceptualization, H.-Y.C. and C.C.; methodology, H.-Y.C. and C.C.; software, C.C.; formal analysis, H.-Y.C.; investigation, H.-Y.C. and C.C.; data curation, H.-Y.C.; writing—original draft preparation, H.-Y.C. and C.C.; writing—review and editing, H.-Y.C. and C.C.; visualization, C.C.; supervision, C.C.; project administration, C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is unavailable because a statement is still required.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Data from the Coating Experiment [49]

Run	Speed	Pressure	Distance	Mean, y₁	Standard Deviation, y₂
1	−1	−1	−1	24	12.5
2	0	−1	−1	120.3	8.4
3	+1	−1	−1	213.7	42.8
4	−1	0	−1	86	3.5
5	0	0	−1	136.6	80.4
6	+1	0	−1	340.7	16.2
7	−1	+1	−1	112.3	27.6
8	0	+1	−1	256.3	4.6
9	+1	+1	−1	271.7	23.6
10	−1	−1	0	81	0.0
11	0	−1	0	101.7	17.7
12	+1	−1	0	357.0	32.9
13	−1	0	0	171.3	15.0
14	0	0	0	372.0	0.0
15	+1	0	0	501.7	92.5
16	−1	+1	0	264.0	63.5
17	0	+1	0	427.0	88.6
18	+1	+1	0	730.7	21.1
19	−1	−1	+1	220.7	133.8
20	0	−1	+1	239.7	23.5
21	+1	−1	+1	422.0	18.5
22	−1	0	+1	199.0	29.4
23	0	0	+1	485.3	44.7
24	+1	0	+1	673.7	158.2
25	−1	+1	+1	176.7	55.5
26	0	+1	+1	501.0	138.9
27	+1	+1	+1	1010.0	142.4

Appendix A.2. Evaluation of the RSM Models for y_Ra, Ra Surface Roughness [28]

1. y_Ra = 0.210 − 0.0138x₁ + 15.133x₂ − 0.166x₃ + 0.0000308x₁² + 32.24x₂²
(0.403) (0.490) (0.541) (0.245) (0.468)
+ 0.419x₃² − 0.0268x₁x₂ + 0.000110x₁x₃ + 1.375x₂x₃
(0.059) (0.192) (0.773) (0.031)
R² = 0.982, R²_adj = 0.954, s = 0.174, PRESS = 2.470

(A1)

Delete x₁x₃ and recalculate the equation.

2. y_Ra = 0.297 − 0.141x₁ + 15.102x₂ − 0.0827x₃ + 0.0000307x₁² + 33.310x₂²
(0.358) (0.457) (0.559) (0.210) (0.433)
+ 0.0418x₃² − 0.0268x₁x₂ + 1.375x₂x₃
(0.041) (0.159) (0.020)

(A2)

Delete x₂² and recalculate the equation.

3. y_Ra = −0.863 − 0.0177x₁ + 30.425x₂ − 0.0606x₃ + 0.0000366x₁² + 0.0463x₃²
(0.222) (<0.001) (0.652) (0.115) (0.018)
−0.0268x₁x₂ + 1.375x₂x₃
(0.147) (0.015)

(A3)

Delete x₁x₂ and recalculate the equation.

4. y_Ra = 1.022 − 0.0239x₁ + 22.228x₂ − 0.0599x₃ + 0.0000366x₁² + 0.463x₃²
(0.120) (<0.001) (0.680) (0.137) (0.023)
+ 1.372x₂x₃
(0.020)

(A4)

Delete x₁² and recalculate the equation.

5. y_Ra = −2.231 − 0.00125x₁ + 22.228x₂ + 0.00395x₃ + 0.0590x₃² + 1.372x₂x₃
(0.181) (0.001) (0.979) (0.004) (0.020)

(A5)

Although the linear term x₃ is insignificant (p > 0.05), the quadratic term x₃² and the interaction term x₂x₃ are significant in the model. The linear term x₃ is therefore retained in the equation to preserve model hierarchy.

Delete x₁ and recalculate the equation.

6. y_Ra = −2.714 + 22.288x₂ + 0.000289x₃ + 0.0583x₃² + 1.372x₂x₃
(<0.001) (0.0199) (0.005) (0.003)
R² = 0.958, R²_adj = 0.942, s = 0.195, PRESS = 1.013.

(A6)

The normality and constant variance tests are passed. One suspicious data point is found, point 7 (t_i = 2.350, DFFITS_i = 2.058).
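The stepwise procedure above (delete the least significant term, refit, and keep a linear term whenever a higher-order term containing it remains) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it uses a fixed |t| cutoff of 2.0 as a stand-in for the p < 0.05 t-test, because exact p-values require the t-distribution, and the data, term names, and function names are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """t statistics of the least-squares coefficients of y = X b + e."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return beta / np.sqrt(s2 * np.diag(xtx_inv))

def backward_eliminate(columns, y, parents, t_crit=2.0):
    """Drop one term per step until every removable term has |t| >= t_crit.

    columns: dict mapping term name -> 1-D array (intercept added internally).
    parents: dict mapping a higher-order term -> linear terms it depends on.
    """
    active = list(columns)
    while active:
        X = np.column_stack([np.ones(len(y))] + [columns[name] for name in active])
        abs_t = dict(zip(active, np.abs(t_statistics(X, y)[1:])))
        # Linear terms are protected while a dependent higher-order term is active.
        protected = {p for name in active for p in parents.get(name, [])}
        candidates = [name for name in active
                      if name not in protected and abs_t[name] < t_crit]
        if not candidates:
            break
        active.remove(min(candidates, key=abs_t.get))  # least significant term
    return active

# Hypothetical data: only x1 and x2^2 truly affect the response.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
y = 2.0 + 3.0 * x1 + 1.5 * x2**2 + rng.normal(0.0, 0.1, 30)
columns = {"x1": x1, "x2": x2, "x1^2": x1**2, "x2^2": x2**2, "x1x2": x1 * x2}
parents = {"x1^2": ["x1"], "x2^2": ["x2"], "x1x2": ["x1", "x2"]}

kept = backward_eliminate(columns, y, parents)
print(kept)
```

Removing one term at a time matters because the t statistics of the remaining terms change after every refit, which is why deleting all terms with p > 0.05 in a single step can give a different and less reliable equation.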

Appendix A.3. Evaluation of the RSM Models for y_Fc, Tangential Force [28]

1. y_Fc = 264.240 − 0.294x₁ + 199.40x₂ − 7.094x₃ + 0.000473x₁² + 3750.468x₂²
(0.725) (0.858) (0.481) (0.736) (0.143)
+ 2.668x₃² − 0.0707x₁x₂ + 0.00244x₁x₃ + 62.058x₂x₃
(0.029) (0.943) (0.244) (0.051)
R² = 0.994, R²_adj = 0.985, s = 9.039, PRESS = 3864.088

(A7)

Delete x₁x₂ and recalculate the equation.

2. y_Fc = 269.233 − 0.311x₁ + 177.735x₂ − 7.092x₃ + 0.000473x₁² + 3750.468x₂²
(0.675) (0.857) (0.444) (0.715) (0.112)
+ 2.6688x₃² + 0.0277x₁x₃ + 62.050x₂x₃
(0.018) (0.205) (0.034)

(A8)

Delete x₁² and recalculate the equation.

3. y_Fc = 241.953 − 0.0410x₁ + 62.922x₂ − 6.563x₃ + 4000.060x₂² + 2.722x₃²
(0.490) (0.943) (0.445) (0.062) (0.007)
+ 0.0243x₁x₃ + 62.050x₂x₃
(0.178) (0.024)

(A9)

Delete x₁x₃ and recalculate the equation.

4. y_Fc = 261.851 − 0.104x₁ + 58.033x₂ + 0.842x₃ + 4010.659x₂² + 2.764x₃²
(0.027) (0.950) (0.905) (0.072) (0.008)
+ 62.050x₂x₃
(0.027)

(A10)

Delete x₂² and recalculate the equation.

5. y_Fc = 52.737 − 0.103x₁ + 1902.950x₂ + 4.737x₃ + 3.543x₃² + 62.505x₂x₃
(0.046) (<0.001) (0.544) (0.002) (0.046)
R² = 0.988, R²_adj = 0.982, s = 9.654, PRESS = 2279.002

(A11)

The normality and constant variance tests are passed. Two suspicious data points are found, the 12th (t_i = 2.322) and the 13th (t_i = −2.326).

Appendix A.4. Evaluation of the RSM Models for y_Afd, Average Fiber Diameter [37]

1. y_Afd = 274.667 + 60.375x₁ + 17.000x₂ + 15.875x₃ − 41.708x₁² + 4.542x₂²
(<0.001) (<0.014) (0.019) (0.002) (0.535)
+ 34.792x₃² + 6.000x₁x₂ + 9.750x₁x₃ − 17.501x₂x₃
(0.004) (0.402) (0.197) (0.044)
R² = 0.982, R²_adj = 0.95, s = 13.099, PRESS = 1,113,250

(A12)

Delete x₂² and recalculate the equation.

2. y_Afd = 277.462 + 60.375x₁ + 17.000x₂ + 15.875x₃ − 42.058x₁² + 34.44x₃²
(<0.001) (<0.014) (0.019) (0.002) (0.535)
+ 6.001x₁x₂ + 9.750x₁x₃ − 17.501x₂x₃
(0.373) (0.169) (0.031)

(A13)

Delete x₁x₂ and recalculate the equation.

3. y_Afd = 277.462 + 60.375x₁ + 17.000x₂ + 15.875x₃ − 42.058x₁² + 34.442x₃²
(<0.001) (<0.006) (0.009) (<0.001) (0.001)
+ 9.750x₁x₃ − 17.501x₂x₃
(0.160) (0.026)

(A14)

Delete x₁x₃ and recalculate the equation.

4. y_Afd = 277.462 + 60.375x₁ + 17.001x₂ + 15.875x₃ − 42.058x₁² + 34.442x₃² + 17.501x₂x₃
R² = 0.97, R²_adj = 0.947, s = 13.502, PRESS = 675,548

(A15)

The normality test is passed. The constant variance test failed (p ≥ 0.038). One suspicious data point is found, point 12 (DFFITS_i = 2.981).

References

Myers, R.H.; Montgomery, D.C.; Anderson-Cook, C.M. Response Surface Methodology: Process and Product Optimization Using Designed Experiments; John Wiley & Sons: Hoboken, NJ, USA, 2016. [Google Scholar]
Anderson, M.J.; Whitcomb, P.J. RSM Simplified: Optimizing Processes Using Response Surface Methods for Design of Experiments; Productivity Press: University Park, IL, USA, 2016. [Google Scholar]
Asoo, H.R.; Alakali, J.S.; Ikya, J.K.; Yusufu, M.I. Historical background of RSM. In Response Surface Methods-Theory, Applications and Optimization Techniques; IntechOpen: London, UK, 2024. [Google Scholar]
Baş, D.; Boyacı, İ.H. Modeling and optimization I: Usability of response surface methodology. J. Food Eng. 2007, 78, 836–845. [Google Scholar] [CrossRef]
Chelladurai, S.J.S.; Murugan, K.; Ray, A.P.; Upadhyaya, M.; Narasimharaj, V.; Gnanasekaran, S. Optimization of process parameters using response surface methodology: A review. Mater. Today Proc. 2021, 37, 1301–1304. [Google Scholar] [CrossRef]
De Oliveira, L.G.; de Paiva, A.P.; Balestrassi, P.P.; Ferreira, J.R.; da Costa, S.C.; da Silva Campos, P.H. Response surface methodology for advanced manufacturing technology optimization: Theoretical fundamentals, practical guidelines, and survey literature review. Int. J. Adv. Manuf. Technol. 2019, 104, 1785–1837.
Veza, I.; Spraggon, M.; Fattah, I.R.; Idris, M. Response surface methodology (RSM) for optimizing engine performance and emissions fueled with biofuel: Review of RSM for sustainability energy transition. Results Eng. 2023, 18, 101213.
Boshagh, F.; Rostami, K. A review of application of experimental design techniques related to dark fermentative hydrogen production. J. Renew. Energy Environ. 2020, 7, 27–42.
Mäkelä, M. Experimental design and response surface methodology in energy applications: A tutorial review. Energy Convers. Manag. 2017, 151, 630–640.
Mishra, P.; Mohapatra, T.; Sahoo, S.S.; Padhi, B.N.; Giri, N.C.; Emara, A.; AboRas, K.M. Experimental assessment and optimization of the performance of a biodiesel engine using response surface methodology. Energy Sustain. Soc. 2024, 14, 28.
Pais-Chanfrau, J.M.; Núñez-Pérez, J.; del Carmen Espin-Valladares, R.; Lara-Fiallos, M.V.; Trujillo-Toledo, L.E. Uses of the response surface methodology for the optimization of agro-industrial processes. In Response Surface Methodology in Engineering Science; IntechOpen: London, UK, 2021.
Boublia, A.; Lebouachera, S.E.I.; Haddaoui, N.; Guexxout, Z.; Ghriga, A.A.; Hasanzadeh, M.; Benguerba, Y.; Drouiche, N. State-of-the-art review on recent advances in polymer engineering: Modeling and optimization through response surface methodology approach. Polym. Bull. 2023, 80, 5999–6031.
Bezerra, M.A.; Ferreira, S.L.C.; Novaes, C.G.; Dos Santos, A.M.P.; Valasques, G.S.; da Mata Cerqueira, U.M.F.; dos Santos Alves, J.P. Simultaneous optimization of multiple responses and its application in Analytical Chemistry—A review. Talanta 2019, 194, 941–959.
Dejaegher, B.; Vander Heyden, Y. Experimental designs and their recent advances in set-up, data interpretation, and analytical applications. J. Pharm. Biomed. Anal. 2011, 56, 141–158.
Szpisják-Gulyás, N.; Al-Tayawi, A.N.; Horváth, Z.H.; László, Z.; Kertész, S.; Hodúr, C. Methods for experimental design, central composite design and the Box–Behnken design, to optimise operational parameters: A review. Acta Aliment. 2023, 52, 521–537.
Olabinjo, O.O. Response surface techniques as an inevitable tool in optimization process. In Response Surface Methods—Theory, Applications and Optimization Techniques; IntechOpen: London, UK, 2024.
Myers, R.H. Classical and Modern Regression with Applications, 2nd ed.; Duxbury Press: Monterey, CA, USA, 1990.
Berger, D.E. Introduction to Multiple Regression. Master’s Thesis, Claremont Graduate University, Claremont, CA, USA, 2008.
Meloun, M.; Militký, J. Detection of single influential points in OLS regression model building. Anal. Chim. Acta 2001, 439, 169–191.
Bhattacharya, S. Central composite design for response surface methodology and its application in pharmacy. In Response Surface Methodology in Engineering Science; IntechOpen: London, UK, 2021.
Rodrigues, A.C. Response surface analysis: A tutorial for examining linear and curvilinear effects. Rev. Adm. Contemp. 2021, 25, e200293.
Reza, A.; Chen, L.; Mao, X. Response surface methodology for process optimization in livestock wastewater treatment: A review. Heliyon 2024, 10, e30326.
Won, J.K.; Lee, J.H.; Lee, J.T.; Lee, E.S. The selection on the optimal condition of Si-wafer final polishing by combined Taguchi method and respond surface method. Trans. Korean Soc. Eng. A. 2008, 17, 21–28.
Lee, E.S.; Hwang, S.C.; Lee, J.T.; Won, J.K. A study on the characteristic of parameters by the response surface method in final wafer polishing. Int. J. Precis. Eng. Manuf. 2009, 10, 25–30.
Zhang, H.; Jiang, Z.; Guo, C. Simulation-based optimization of dispatching rules for semiconductor wafer fabrication system scheduling by the response surface methodology. Int. J. Adv. Manuf. Technol. 2009, 41, 110–121.
Seo, J.; Kim, J.H.; Lee, M.; You, K.; Moon, J.; Lee, D.H.; Paik, U. Multi-objective optimization of tungsten CMP slurry for advanced semiconductor manufacturing using a response surface methodology. Mater. Des. 2017, 117, 131–138.
Saleem, M.M.; Somá, A. Design of experiments based factorial design and response surface methodology for MEMS optimization. Microsyst. Technol. 2015, 21, 263–276.
Noordin, M.Y.; Venkatesh, V.C.; Sharif, S.; Elting, S.; Abdullah, A. Application of response surface methodology in describing the performance of coated carbide tools when turning AISI 1045 steel. J. Mater. Process. Technol. 2004, 145, 46–58.
Bouacha, K.; Yallese, M.A.; Mabrouki, T.; Rigal, J.F. Statistical analysis of surface roughness and cutting forces using response surface methodology in hard turning of AISI 52100 bearing steel with CBN tool. Int. J. Refract. Met. Hard Mater. 2010, 28, 349–361.
Elbah, M.; Aouici, H.; Meddour, I.; Yallese, M.A.; Boulanouar, L. Application of response surface methodology in describing the performance of mixed ceramic tool when turning AISI 4140 steel. Mech. Ind. 2016, 17, 309.
Campos, d.S.P.H.; de Carvalho Paes, V.; de Carvalho Gonçalves, E.D.; Ferreira, J.R.; Balestrassi, P.P.; Davim, J.P. Optimizing production in machining of hardened steels using response surface methodology. Acta Sci. Technol. 2019, 41, e38091.
Khalil, K.; Mohd, A.; Mohamad, C.O.C.; Faizul, Y.; Ariffin, S.Z. The optimization of machining parameters on surface roughness for AISI D3 steel. J. Phys. Conf. Ser. 2021, 1874, 012063.
Pajaie, H.S.; Taghizadeh, M. Optimization of nano-sized SAPO-34 synthesis in methanol-to-olefin reaction by response surface methodology. J. Indust. Eng. Chem. 2015, 24, 59–70.
Jourshabani, M.; Badiei, A.; Lashgari, N.; Mohammadi Ziarani, G. Application of response surface methodology as an efficient approach for optimization of operational variables in benzene hydroxylation to phenol by V/SBA-16 nanoporous catalyst. J. Nanostructures 2016, 6, 107–115.
Sheng, X.; Cheng, Y.; Yao, Y.; Zhao, Z. Optimization of synthesizing upright ZnO rod arrays with large diameters through response surface methodology. Processes 2020, 8, 655.
Rakhmanova, A.; Kalybekkyzy, S.; Soltabayev, B.; Bissenbay, A.; Kassenova, N.; Bakenov, Z.; Mentbayeva, A. Application of response surface methodology for optimization of nanosized zinc oxide synthesis conditions by electrospinning technique. Nanomaterials 2022, 12, 1733.
Pakolpakçıl, A.; Kılıç, A.; Draczynski, Z. Optimization of the centrifugal spinning parameters to prepare poly (butylene succinate) nanofibers mats for aerosol filter applications. Nanomaterials 2023, 13, 3150.
Sreekumar, S.; Chakrabarti, S.; Hewitt, N.; Mondol, J.D.; Shah, N. Performance prediction and optimization of nanofluid-based PV/T using numerical simulation and response surface methodology. Nanomaterials 2024, 14, 774.
Rawlings, J.O.; Pantula, S.G.; Dickey, D. Applied Regression Analysis; Springer: New York, NY, USA, 1998.
Allen, M.P. Understanding Regression Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004.
Dielman, T.E. Applied Regression Analysis for Business and Economics, 4th ed.; Duxbury/Thomson Learning: Pacific Grove, CA, USA, 2005.
Mendenhall, W.; Sincich, T. Regression Analysis. A Second Course in Statistics, 12th ed.; Pearson: London, UK, 2012.
Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2021.
Ryan, T.P. Modern Regression Methods; John Wiley & Sons: Hoboken, NJ, USA, 2008.
Wilcox, R.R.; Keselman, H.J. Modern regression methods that can substantially increase power and provide a more accurate understanding of associations. Eur. J. Pers. 2012, 26, 165–174.
Marinoiu, C. Classic and modern in regression modelling. Econom. Insights Trends Chall. 2017, 69, 41–50.
Rowley, E.K. Comparison of Variable Selection Methods. Ph.D. Thesis, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, 2019.
Chowdhury, M.Z.I.; Turin, T.C. Variable selection strategies and its importance in clinical prediction modelling. Fam. Med. Community Health 2020, 8, e000262.
Box, G.E.P.; Draper, N.R. Response Surface, Mixtures, and Ridge Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007.
Yang, K.; Tu, J.; Chen, T. Homoscedasticity: An overlooked critical assumption for linear regression. Gen. Psychiatry 2019, 32, e100148.
Wang, G.C.; Akabay, C.K. Heteroscedasticity: How to handle in regression modeling. J. Bus. Forecast. 1994, 13, 11.
Agunbiade, D.A.; Adeboye, N.O. Estimation of heteroscedasticity effects in a classical linear regression model of a cross-sectional data. J. Pro. Appl. Math. 2012, 4, 18–28.
Kumar, N.K. Autocorrelation and heteroscedasticity in regression analysis. J. Business Soc. Sci. 2023, 5, 9–20.
Stevens, J.P. Outliers and influential data points in regression analysis. Psychol. Bull. 1984, 95, 334.
Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2015.
Snee, R.D. Validation of regression models: Methods and examples. Technometrics 1977, 19, 415–428.
Green, S.B. How many subjects does it take to do a regression analysis? Multivar. Behav. Res. 1991, 26, 499–510.
Khamis, H.J.; Kepler, M. Sample size in multiple regression: 20 + 5k. J. Appl. Stat. Sci. 2010, 17, 505–517.

Figure 1. Contour plots of the complete and adequate equations for the final polished surface roughness response of a silicon wafer. The contours show how the choice of RSM equation changes the fitted relationship between the response variable and the influencing variables. (a) Full equation. (b) Adequate equation.

Figure 2. Contour plots of the complete and adequate equations for the average absolute roughness of AISI 52100 bearing steel. The contours show how the choice of RSM equation changes the fitted relationship between the response variable and the influencing variables. (a) Full equation. (b) Adequate equation.

Figure 3. Contour plots of the complete and adequate equations for the average fiber diameter response. The contours show how the choice of RSM equation changes the fitted relationship between the response variable and the influencing variables. (a) Full equation. (b) Adequate equation.

Table 1. The number of studies using each experimental methodology, determined using Google Scholar.

Year	Full Factorial Design (CFD)	Box–Behnken Design (BBD)	Central Composite Design (CCD)
2019	145,000	7250	218,000
2020	151,000	8450	235,000
2021	147,000	9860	224,000
2022	132,000	12,400	207,000
2023	114,000	13,600	159,000
2024	88,300	16,500	113,000

Table 2. Published data in the literature on engineering for evaluating the adequate equation of the response surface methodology.

Study	Targets	Number of Data Points and Experimental Design	Software	Model Evaluation	Criteria for Parameter Selection	Reported Model	Optimization
I. Semiconductor
Won et al. [23] y_SR = surface roughness x₁ = applied pressure x₂ = platen speed x₃ = slurry ratio	Si-wafer polishing	15 CCD	Not reported	Not reported	Not reported	Not reported	Contour plot, response surface plot
Lee et al. [24] y_SR = surface roughness x₁ = pressure x₂ = wheel speed x₃ = time	Final wafer polishing	15 BBD	MINITAB	R²	Not reported	Full models	Contour plot, response surface plot
Zhang et al. [25] y_CT = CT y_WTP = WTP y_TP = TP x₁ = Ub x₂ = C₁ x₃ = C₂	Wafer fabricating	20 CCD	Design Expert, version not mentioned	ANOVA lack of fit, PRESS, R², R²adj	Not reported	Full models	Contour plot, response surface plot
Seo et al. [26] y_w (WAPR) y_oxide (Oxide MRR) x₁ = Free concentration x₂ = H₂O₂ x₃ = SiO₂		15 CCD	MINITAB	ANOVA R², R²adj	t-value p-value	Full models	Contour plot, response surface plot
Saleem and Soma [27] y_PV = pull-in voltage x₁ = TEL x₂ = TEN x₃ = TSL x₄ = TSW	MEMS	27 BBD	Not reported	R², R²adj	F-value p-value	Full models	Contour plot, response surface plot
II. Steel materials
Noordin et al. [28] y_Ra = surface roughness y_Rc = tangential force x₁ = cutting speed x₂ = SCEA	Coated carbide tools AISI 1045 steel	16 CCD	Design Expert Ver. 6.0	ANOVA, R², R²_adj, lack of fit, PRESS	p-value, backward elimination	y_Ra = f(x₂, x₃, x₂x₃, x₃²) y_Rc = f(x₁, x₂, x₃, x₂x₃, x₃²)	Contour plot, response surface plot
Bouacha et al. [29] y_Ra = Ra y_Rt = Rt y_Rz = Rz x₁ = VC x₂ = f x₃ = ap	Surface roughness, cutting forces AISI 52100 steel	27 Taguchi orthogonal array	Not reported	ANOVA, R², R²_adj	F-value, p-value; non-significant variables deleted all at once	y_Ra = f(x₁, x₂, x₃, x₁x₂) y_Rt = f(x₁, x₂, x₁x₂, x₂²) y_Rz = f(x₁, x₂, x₁x₂, x₂²)	Contour plot, response surface plot
Elbah et al. [30] y_Fa = Fa y_Fr = Fr y_Ft = Ft y_Ra = Ra x₁ = depth of cut x₂ = feed rate x₃ = cutting speed	Mixed ceramic tool, AISI 4140 steel	27 CCD	Design Expert 8.0.7	ANOVA, R², R²_adj	F-value; some variables are not significant in ANOVA	Full models for y_Fa, y_Fr, y_Ft, y_Ra	Contour plot, response surface plot
Campos et al. [31] y_Time = Time y_Ra = Ra y_Rt = Rt x₁ = VC x₂ = f x₃ = ap	Machining of hardened steel	19 CCD	Design Expert, version not mentioned	ANOVA, R², R²_adj	Sequential model for some terms	Full models for y_Time, y_Ra, y_Rt	Contour plot, response surface plot
Khalil et al. [32] y_SR = surface roughness x₁ = cutting speed x₂ = feed rate x₃ = depth of cut	Surface roughness AISI D3 steel	20 CCD	Design Expert, version not mentioned	Lack of fit	F-value, p-value (cutoff p < 0.1)	Full model	Contour plot, response surface plot
III. Nanomaterials
Pajaie and Taghizadeh [33] y_Ethylene = ethylene y_Propylene = propylene x₁ = MW aging time x₂ = US aging time x₃ = HT time	Yield	15 BBD	Design Expert ver. 6	R² R²_adj	F-value p-value ANOVA Table.	Full model	Contour plot, response surface plot
Jourshabani et al. [34] y_{phenol yield} = phenol yield x₁ = temperature x₂ = H₂O₂ content x₃ = Catalyst	Benzene hydroxylation	20 CCD	Design Expert Ver. 7.1.3	R² R²_adj	F-value p-value	Full model	Contour plot, response surface plot
Sheng et al. [35] y_TC002 = TC002 y_{Aspect ratio} = Aspect ratio y_D = D x₁ = concentration x₂ = temperature x₃ = catalyst		27 CCD	MODDE Ver. 10	ANOVA, lack of fit	Not reported	y_TC002: not reported; y_{Aspect ratio}: not reported; log(y_D + 0.5) = f(x₁, x₂, x₃, x₄, x₁x₂, x₂x₃, x₂x₄, x₃x₄, x₃²)	Contour plot, response surface plot
Rakhmanova et al. [36] y_Zinc Oxide = zinc oxide synthesis x₁ = applied potential x₂ = distance x₃ = temperature		15 BBD	Design Expert ver. 8.0.7.1	R², R²_adj Lack of fit, PRESS	Not reported	Not reported	Contour plot, response surface plot
Pakolpakcil et al. [37] y_Afd = Average fiber diameter x₁ = concentration x₂ = Rotational speed x₃ = Needle size	Poly nanofiber mats	15 BBD	Design Expert ver. 13	ANOVA sequential model, quadratic, and interaction	F-value p-value	Full model	Contour plot, response surface plot
Sreekumar et al. [38] y_nth = nth y_nele = nele y_nex,th = nex,th y_nex,ele = nex,ele x₁ = φ% x₂ = m x₃ = I x₄ = Ti	Nanofluid-based PV/T	27 CCD	Design Expert, version not mentioned	ANOVA, R², R²_adj	F-value, p-value	Full model	Contour plot, response surface plot

Table 3. Criteria for the evaluation of the RSM equations.

Criterion	Description	Cutoffs
R²	The coefficient of determination measures the proportion of the variability in the response that is explained by the independent variables.	R² value near 1.0
Adjusted R²	This value takes into account the impact of the number of independent variables on the R² value. The closer the adjusted R² (R²_adj) is to 1.0, the better the descriptive ability of the regression equation.	R²_adj value closer to 1.0
s	This is the standard error of the estimate. It represents the variability of the observed responses around the fitted equation and determines the precision of the coefficient estimates.	A smaller s suggests more precise estimates, meaning each estimated coefficient is likely closer to the actual population value.
t-value	The t-value is used to test the null hypothesis that the coefficient of an independent variable is equal to zero.	A large absolute t-value for an independent variable indicates that its coefficient is statistically significant, that is, different from zero.
p-value	The p-value of a coefficient is calculated from its t-value and is used to test the null hypothesis that the coefficient is equal to zero.	The p-value is the probability of obtaining a t-value at least as extreme as the observed one if the coefficient were actually zero. A smaller p-value is stronger evidence that the variable should stay in the model.
PRESS, predicted residual error sum of squares	This evaluates the predictive ability of the regression model.	The smaller, the better
Normality test	The normality test is used to evaluate whether the dataset is normally distributed. The normality test technique used in this study is the Kolmogorov–Smirnov method.	The p-value calculated with this method is compared with the preset value (p = 0.05).
Constant variance test	This assesses whether the variance of the dependent variable (response) is constant over the whole dataset. The testing technique used in this study is the Spearman rank correlation method.	The cutoff value is p = 0.05.
t_i, externally studentized residuals	This is the residual of a data point divided by its standard error, where the standard error is estimated from a model built without that data point.	Values beyond $\pm 2.0$ are usually used to indicate the possibility of an outlier.
DFFITS_i	This evaluates the influence of a data point on its own prediction. It is the change in the fitted value when the data point is removed from model building, scaled by the estimated standard error of the fitted value.	The cutoffs of DFFITS_i are $\pm 2.0$.
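
Several of the criteria in Table 3 follow from a single least-squares fit. The sketch below is illustrative only and does not reproduce the software workflow used in this study. It assumes an ordinary least-squares fit solved through the normal equations in plain Python; the function names (`invert`, `ols_diagnostics`), the two-factor design, and the response values are all hypothetical. It computes R², R²_adj, s, the coefficient t-values, PRESS from the leverages, the externally studentized residuals t_i, and DFFITS_i, and then flags runs with the ±2.0 cutoffs listed in the table.

```python
from math import sqrt


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_diagnostics(X, y):
    """Fit y = X b by least squares and return Table 3 style criteria.

    X is a list of rows that already includes the leading 1 for the intercept.
    """
    n, p = len(X), len(X[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    # Leverage h_i = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
    lev = [sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p))
           for r in X]
    sse = sum(e * e for e in resid)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    dof = n - p
    s = sqrt(sse / dof)
    t_ext, dffits = [], []
    for e, h in zip(resid, lev):
        # Error standard deviation re-estimated with point i left out of the fit.
        s_del = sqrt((sse - e * e / (1 - h)) / (dof - 1))
        t_i = e / (s_del * sqrt(1 - h))
        t_ext.append(t_i)
        dffits.append(t_i * sqrt(h / (1 - h)))
    return {
        "beta": beta,
        "t_values": [b / (s * sqrt(xtx_inv[i][i])) for i, b in enumerate(beta)],
        "r2": 1 - sse / sst,
        "r2_adj": 1 - (sse / dof) / (sst / (n - 1)),
        "s": s,
        "sse": sse,
        "press": sum((e / (1 - h)) ** 2 for e, h in zip(resid, lev)),
        "leverage": lev,
        "t_ext": t_ext,
        "dffits": dffits,
    }


# Hypothetical two-factor face-centred design with three centre runs (coded units).
design = [(-1, -1), (1, -1), (-1, 1), (1, 1), (-1, 0), (1, 0),
          (0, -1), (0, 1), (0, 0), (0, 0), (0, 0)]
y = [8.2, 6.1, 7.4, 9.0, 6.9, 6.8, 6.0, 7.1, 5.2, 5.5, 5.1]  # made-up responses
# Full quadratic model: 1, x1, x2, x1*x2, x1^2, x2^2.
X = [[1, a, b, a * b, a * a, b * b] for a, b in design]

result = ols_diagnostics(X, y)
flagged = [i + 1 for i, (t, d) in enumerate(zip(result["t_ext"], result["dffits"]))
           if abs(t) > 2.0 or abs(d) > 2.0]
print(f"R2={result['r2']:.3f}  R2adj={result['r2_adj']:.3f}  "
      f"s={result['s']:.3f}  PRESS={result['press']:.3f}")
print("Runs flagged by |t_i| > 2 or |DFFITS_i| > 2:", flagged)
```

The normality and constant variance tests in Table 3 are not covered by this sketch; a statistics package would normally supply them, together with the p-values for the coefficients.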

Table 4. Results of the evaluation of adequate RSM equations for semiconductor manufacturing.

Source	Reported Equations	Contour and Surface Response Plot	Adequate Equations	Normality Test	Constant Variance Test	Influential Data Points
Won et al. [23]	Not reported	Curve surface	y_SR = f(x₁, x₃, x₁x₃, x₁², x₃²)	Passed	Passed	2, 15
Lee et al. [24]	Full models	Curve surface	y_SR = f(x₁, x₂, x₃, x₁x₃, x₂², x₃²)	Passed	Passed	4, 6, 7, 12
Zhang et al. [25]	Full models	Curve surface	y_CT = f(x₁, x₂, x₁², x₂²)	Passed	Passed	5, 13, 17
			y_WTP = f(x₁, x₂, x₃, x₁x₂, x₂², x₃²)	Failed	Passed	13
			y_TP = f(x₁, x₂, x₃, x₁², x₂², x₃², x₁x₂)	Failed	Failed	5, 14, 18
Seo et al. [26]	y_W = full model	Curve surface	y_W = f(x₂)	Passed	Passed	12, 14
	y_Oxide = full model		y_Oxide = f(x₁, x₃)	Passed	Passed	None
Saleem and Soma [27]	Full model	Curve surface	y_PV = f(x₁, x₂, x₃, x₄, x₁x₃, x₃²)	Failed	Passed	27

Table 5. Results of the evaluation of adequate RSM equations for steel materials.

Source	Reported Equations	Contour and Surface Response Plot	Adequate Equations	Normality Test	Constant Variance Test	Influential Data Points
Noordin et al. [28]	y_SR = f(x₂, x₃, x₂x₃, x₃²) y_TF = f(x₁, x₂, x₃, x₂x₃, x₃²)	Curve surface	y_SR = f(x₂, x₃, x₁x₂, x₂x₃, x₃²) y_TF = f(x₁, x₂, x₃, x₂x₃, x₃²)	Passed	Passed	7, 12, 13
Bouacha et al. [29]	y_Ra = f(x₁, x₂, x₃, x₁x₂)	Curve surface	y_Ra = f(x₁, x₂, x₁x₂, x₁², x₂²)	Passed	Passed	6, 13
	y_Rt = f(x₁, x₂, x₁x₂, x₂²)	Curve surface	y_Rt = f(x₁, x₂, x₁x₂, x₁², x₂²)	Passed	Passed	1, 4
	y_Rz = f(x₁, x₂, x₁x₂, x₂²)	Curve surface	y_Rz = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₁², x₂²)	Failed	Passed	5, 6
Elbah et al. [30]	Full models for y_Fa, y_Fr, y_Ft, y_Ra	Curve surface	y_Fa = f(x₁, x₂, x₃, x₁x₃, x₂x₃)	Passed	Passed	26
			y_Fr = f(x₁, x₂, x₃, x₁x₂, x₁x₃)	Passed	Passed	24
			y_Ft = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₂x₃)	Passed	Passed	27
			y_Ra = f(x₁, x₂, x₁x₂)	Passed	Passed	None
Campos et al. [31]	Full models	Curve surface	y_Time = full equation	Passed	Passed	1, 8, 16, 17
			y_Ra = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₂x₃, x₁², x₃²) y_Rt = f(x₁, x₁²)	Passed Passed	Passed Passed	1, 3 4, 6 18
Khalil et al. [32]	Full model	Curve surface	y_SR = f(x₁, x₂, x₃, x₁x₂, x₁x₃, x₃²)	Passed	Passed	1, 7, 17, 20

Table 6. Results of the evaluation of adequate RSM equations for nanomaterials.

Source	Reported Equations	Contour and Surface Response Plot	Adequate Equations	Normality Test	Constant Variance Test	Influential Data Points
Pakolpakcil et al. [37]	Full model	Curve surface	y_Afd = f(x₁, x₂, x₃, x₂x₃, x₁², x₃²)	Passed	Failed	12
Pajaie and Taghizadeh [33]	Full model	Curve surface	y_Ethylene = full model y_Propylene = f(x₁, x₂, x₃, x₁x₃, x₁², x₂²)	Passed	Passed	3, 5, 8, 9, 11
Jourshabani et al. [34]	Full model	Curve surface	y_{phenol yield} = f(x₁, x₂, x₃, x₁², x₂², x₃²)	Passed	Passed	9
Sheng et al. [35]	log(y_D + 0.5) = f(x₁, x₂, x₃, x₄, x₁x₂, x₂x₃, x₂x₄, x₃x₄, x₃²)	Curve surface	y_D = f(x₁, x₂, x₃, x₄, x₁x₂, x₂x₄, x₃x₄, x₃²)	Passed	Passed	None
Rakhmanova et al. [36]	Not reported	Curve surface	y_{Zinc oxide} = f(x₂, x₃)	Passed	Passed	6, 14
Sreekumar et al. [38]	Full models for y_nth to y_nex,ele	Curve surface	y_nth = f(x₁, x₂, x₄)	Passed	Passed	18
		Curve surface	y_nele = f(x₁, x₂, x₃, x₄, x₂x₃, x₂x₄)	Passed	Passed	7, 24, 25
		Curve surface	y_nex,th = f(x₁, x₂, x₃, x₂x₃, x₂x₄, x₃x₄)	Passed	Passed	3, 25
		Curve surface	y_nex,ele = f(x₁, x₂, x₃, x₄, x₁x₃, x₂x₃, x₂²)	Passed	Failed	3, 14

Table 7. Common issues with the applications of RSM in the literature.

Issue	Literature
1. The full model was used to generate contour plots and three-dimensional response surface plots, which were then used to optimize the variables. However, the coefficient values, t-values, and p-values reported for each variable in the ANOVA table were not used.	[24,25,26,27,30,32,33,34,38]
2. Contour plots and three-dimensional response surface plots based on the full equation were presented, but the RSM model was not reported.	[36]
3. All variables with p-values higher than the preselected value (usually p = 0.05) were deleted at once.	[29]
4. The ANOVA table of the sequential model was used, and all variables in the linear or squared terms were included directly, without significance testing for each variable.	[31,37]
5. Datasets did not pass the normality test.	[25,27,29]
6. Datasets did not pass the constant variance test.	[25,37,38]
7. Influential data points were found.	[23,24,25,26,27,28,29,30,31,32,33,34,36,37,38]
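
Issue 3 concerns deleting every non-significant term in a single step. Because the t-values of the remaining terms change after each deletion, backward elimination usually removes only the least significant term and then refits before testing again. The following is a minimal sketch of that loop under stated assumptions, not the procedure of any cited study: the function names (`invert`, `t_two_sided_p`, `fit`, `backward_eliminate`), the design, and the responses are hypothetical, the p-value is obtained by numerically integrating the Student's t density, and model hierarchy (keeping a main effect whose interaction is retained) is not enforced.

```python
from math import gamma, pi, sqrt


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_two_sided_p(t, dof, steps=2000):
    """Two-sided p-value of Student's t via Simpson integration of the density."""
    c = gamma((dof + 1) / 2) / (sqrt(dof * pi) * gamma(dof / 2))

    def pdf(x):
        return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

    b = abs(t)
    h = b / steps
    area = pdf(0) + pdf(b) + sum((4 if k % 2 else 2) * pdf(k * h)
                                 for k in range(1, steps))
    return max(0.0, 1 - 2 * area * h / 3)


def fit(X, y):
    """Least-squares fit; returns coefficients, their t-values, and residual dof."""
    n, p = len(X), len(X[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    s2 = sse / (n - p)
    t_vals = [b / sqrt(s2 * xtx_inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_vals, n - p


def backward_eliminate(columns, y, alpha=0.05):
    """Drop ONE term per pass (largest p-value above alpha), then refit.

    columns maps a term name to its list of values; the intercept is always kept.
    """
    kept = list(columns)
    while True:
        X = [[1.0] + [columns[name][k] for name in kept] for k in range(len(y))]
        _, t_vals, dof = fit(X, y)
        p_vals = {name: t_two_sided_p(t, dof) for name, t in zip(kept, t_vals[1:])}
        if not p_vals:
            return kept, p_vals
        worst = max(p_vals, key=p_vals.get)
        if p_vals[worst] <= alpha:
            return kept, p_vals
        kept.remove(worst)


# Hypothetical 3 x 3 design with two extra centre runs (coded units), made-up data.
x1 = [-1, 0, 1, -1, 0, 1, -1, 0, 1, 0, 0]
x2 = [-1, -1, -1, 0, 0, 0, 1, 1, 1, 0, 0]
y = [6.1, 4.9, 10.0, 5.9, 5.1, 10.1, 6.0, 5.1, 9.9, 4.9, 5.0]
terms = {
    "x1": x1,
    "x2": x2,
    "x1x2": [a * b for a, b in zip(x1, x2)],
    "x1sq": [a * a for a in x1],
    "x2sq": [b * b for b in x2],
}

kept, p_values = backward_eliminate(terms, y)
print("Terms retained:", kept)
print("p-values of retained terms:", {k: round(v, 4) for k, v in p_values.items()})
```

Refitting after every single deletion is what distinguishes this loop from the one-step deletion described in issue 3: a term that looks non-significant in the full model can become significant once a correlated term has been removed.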

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
