#### 3.1. Results of The Model

The input variables considered in this research work are listed in Table 1. Two dependent (output) variables were used to build the MARS models: the continuity factor (C_Factor) and the average width of the spots (Ave_Width) forming the centerline segregation. Accordingly, two different MARS models were built, taking C_Factor and Ave_Width, respectively, as the dependent variable.

**Table 1.**
Set of input variables used in this study with their mean and standard deviation.

| Input variables | Name of the variable | Mean | Standard deviation |
|---|---|---|---|
| Total aluminum (measured as weight%) | Al | 0.030 | 0.006 |
| Total manganese (measured as weight%) | Mn | 1.357 | 0.050 |
| Total sulfur (measured as weight%) | S | 0.009 | 0.002 |
| Total carbon (measured as weight%) | C | 0.173 | 0.014 |
| Total phosphorus (measured as weight%) | P | 0.016 | 0.004 |
| Superheating (°C) | Overtemperature | 24.545 | 8.940 |
| Percentage of negative strip | Ratio_Strip | 68.517 | 21.519 |
| Specific flow (m^{3}·s^{−1}) | Specific_Flow | 0.633 | 0.074 |
| Average casting speed (m·s^{−1}) | Ave_Speed | 0.957 | 0.143 |
| Mold oscillation frequency | Freq_Oscillation | 2.043 | 0.688 |
| Temperature in segment 8 (°C) | Temp_Seg8 | 816.472 | 265.506 |
| Temperature in segment 17 (°C) | Temp_Seg17 | 771.911 | 246.454 |
| Silicon (measured as weight%) | Si | 0.201 | 0.048 |

In this research work, two second-order MARS models have been used, so that the basis functions of each model consist of linear and second-order splines, and the maximum number of terms was not limited (no pruning). The results of the MARS models computed using all the available observations are shown in Table 2 and Table 4, which list the 46 and 60 basis functions of the two models, respectively, together with their coefficients. Please note that $h\left(x\right)=x$ if $x>0$ and $h\left(x\right)=0$ if $x\le 0$. The MARS model is therefore a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions as a weighted sum of basis functions called hinge functions [14,15,16,17,18,19,20,21,22,23]. The predicted response for the C factor (C_Factor) and the average width (Ave_Width) fits the original values better than a purely linear model would, since each hinge function introduces a kink in the predicted dependent variable to account for nonlinearities.
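As an illustration of the hinge function and of a model written as a weighted sum of basis functions, the following minimal sketch evaluates a toy one-variable model. The names `hinge`, `mars_predict` and `toy_terms`, as well as the knot and the coefficients, are made up for illustration and are not taken from the fitted models.

```python
import numpy as np


def hinge(x):
    """Hinge function h(x) = x if x > 0, else 0 (element-wise)."""
    return np.maximum(x, 0.0)


def mars_predict(x, terms):
    """Weighted sum of basis functions: sum_i c_i * B_i(x).

    `terms` is a list of (coefficient, basis_function) pairs.
    """
    return sum(c * basis(x) for c, basis in terms)


# Toy model with a single knot at x = 1 (illustrative values only).
toy_terms = [
    (2.0, lambda x: np.ones_like(x)),  # intercept term, B_1 = 1
    (3.0, lambda x: hinge(x - 1.0)),   # active to the right of the knot
    (-1.0, lambda x: hinge(1.0 - x)),  # active to the left of the knot
]

x = np.array([0.0, 1.0, 2.0])
y = mars_predict(x, toy_terms)
print(y)
```

The two mirrored hinges share one knot, so the fitted curve is piecewise linear with a change of slope (a kink) at that knot.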

According to the results shown in Table 3, the most important variables for the prediction of the C factor (output variable) are, in hierarchical order: Si, Temp_Seg8, S, Ratio_Strip, Mn, Temp_Seg17, Al, C, Overtemperature, P, Freq_Oscillation and Ave_Speed. The Specific_Flow input variable is discarded by this model. The most important variable is therefore the silicon concentration (Si). This is because the silicon originates from the detachment of the refractory material during all the steel production steps.

**Table 2.**
List of basis functions of the multivariate adaptive regression splines (MARS) model for the C factor (C_Factor) and their coefficients $c_i$.

| $B_i$ | Definition | $c_i$ |
|---|---|---|
| $B_{1}$ | 1 | 80.112 |
| $B_{2}$ | h(Ratio_Strip − 75.117) | −286.265 |
| $B_{3}$ | h(Ratio_Strip − 75.378) | 471.796 |
| $B_{4}$ | h(Ave_Speed − 1.16) | 6177.268 |
| $B_{5}$ | h(1.16 − Ave_Speed) | 91.964 |
| $B_{6}$ | h(Temp_Seg8 − 870) | 8.631 |
| $B_{7}$ | h(Temp_Seg8 − 889) | −20.522 |
| $B_{8}$ | h(889 − Temp_Seg8) | 0.563 |
| $B_{9}$ | h(Temp_Seg8 − 906) | 11.476 |
| $B_{10}$ | h(Al − 0.0247) | 8358.107 |
| $B_{11}$ | h(Al − 0.0371) | −7741.410 |
| $B_{12}$ | h(Si − 0.2276) × h(889 − Temp_Seg8) | 22.903 |
| $B_{13}$ | h(0.2276 − Si) × h(889 − Temp_Seg8) | −10.688 |
| $B_{14}$ | h(0.2483 − Si) × h(Temp_Seg8 − 870) | 6.243 |
| $B_{15}$ | h(S − 0.0091) × h(Temp_Seg8 − 889) | 433.489 |
| $B_{16}$ | h(0.0194 − P) × h(Temp_Seg8 − 906) | 240.291 |
| $B_{17}$ | h(Freq_Oscillation − 2.43) × h(Ratio_Strip − 75.378) | 697.928 |
| $B_{18}$ | h(75.378 − Ratio_Strip) × h(Temp_Seg8 − 953) | −30.322 |
| $B_{19}$ | h(75.378 − Ratio_Strip) × h(Temp_Seg8 − 938) | 12.800 |
| $B_{20}$ | h(889 − Temp_Seg8) × h(Temp_Seg17 − 883) | 0.433 |
| $B_{21}$ | h(881 − Temp_Seg8) × h(Al − 0.0247) | −35.436 |
| $B_{22}$ | h(Temp_Seg8 − 889) × h(0.0329 − Al) | 537.071 |
| $B_{23}$ | h(Temp_Seg8 − 906) × h(Al − 0.0304) | −219.353 |
| $B_{24}$ | h(Temp_Seg8 − 906) × h(0.0304 − Al) | −961.240 |
| $B_{25}$ | h(C − 0.1863) × h(0.0091 − S) × h(Temp_Seg8 − 889) | −97083.453 |
| $B_{26}$ | h(C − 0.19) × h(75.378 − Ratio_Strip) × h(Temp_Seg8 − 953) | −16338.606 |
| $B_{27}$ | h(C − 0.1739) × h(Temp_Seg8 − 889) × h(Al − 0.0329) | 114852.181 |
| $B_{28}$ | h(Mn − 1.3736) × h(0.0091 − S) × h(Temp_Seg8 − 889) | −16604.875 |
| $B_{29}$ | h(Mn − 1.3464) × h(889 − Temp_Seg8) × h(Temp_Seg17 − 883) | −11.470 |
| $B_{30}$ | h(1.3464 − Mn) × h(889 − Temp_Seg8) × h(Temp_Seg17 − 883) | 38.383 |
| $B_{31}$ | h(0.2276 − Si) × h(P − 0.0166) × h(889 − Temp_Seg8) | 503.269 |
| $B_{32}$ | h(Si − 0.2095) × h(75.378 − Ratio_Strip) × h(953 − Temp_Seg8) | −18.490 |
| $B_{33}$ | h(0.2095 − Si) × h(75.378 − Ratio_Strip) × h(953 − Temp_Seg8) | 0.124 |
| $B_{34}$ | h(0.2483 − Si) × h(Ratio_Strip − 75.977) × h(Temp_Seg8 − 870) | −4789.996 |
| $B_{35}$ | h(0.2483 − Si) × h(Temp_Seg8 − 870) × h(Temp_Seg17 − 815) | −0.133 |
| $B_{36}$ | h(S − 0.0089) × h(Freq_Oscillation − 2.16) × h(899 − Temp_Seg8) | 2206.549 |
| $B_{37}$ | h(S − 0.0089) × h(2.16 − Freq_Oscillation) × h(899 − Temp_Seg8) | 59.436 |
| $B_{38}$ | h(0.0091 − S) × h(75.115 − Ratio_Strip) × h(Temp_Seg8 − 889) | 20180.563 |
| $B_{39}$ | h(S − 0.0091) × h(Overtemperature − 25) × h(Temp_Seg8 − 889) | −200.213 |
| $B_{40}$ | h(S − 0.0091) × h(25 − Overtemperature) × h(Temp_Seg8 − 889) | −36.885 |
| $B_{41}$ | h(0.015 − P) × h(75.378 − Ratio_Strip) × h(Temp_Seg8 − 870) | −1053.411 |
| $B_{42}$ | h(2.43 − Freq_Oscillation) × h(Ratio_Strip − 75.37) × h(Overtemperature − 29) | 613.802 |
| $B_{43}$ | h(75.378 − Ratio_Strip) × h(953 − Temp_Seg8) × h(Al − 0.0383) | 0.443 |
| $B_{44}$ | h(75.378 − Ratio_Strip) × h(953 − Temp_Seg8) × h(0.0383 − Al) | −0.306 |
| $B_{45}$ | h(Ratio_Strip − 75.378) × h(Temp_Seg8 − 870) × h(Al − 0.0314) | 2183.857 |
| $B_{46}$ | h(Temp_Seg8 − 906) × h(Temp_Seg17 − 815) × h(0.0304 − Al) | 3.265 |
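To show how a row of Table 2 is read, the sketch below evaluates the interaction terms $B_{12}$ and $B_{13}$ and their weighted contribution to the C factor. The knots and coefficients are copied from Table 2; the function names and the sample operating point (Si = 0.25, Temp_Seg8 = 880) are hypothetical and chosen only for illustration. This is a partial evaluation of two terms, not the full model.

```python
def hinge(x):
    """Hinge function h(x) = x if x > 0, else 0."""
    return max(x, 0.0)


def b12(si, temp_seg8):
    """B_12 = h(Si - 0.2276) * h(889 - Temp_Seg8), from Table 2."""
    return hinge(si - 0.2276) * hinge(889.0 - temp_seg8)


def b13(si, temp_seg8):
    """B_13 = h(0.2276 - Si) * h(889 - Temp_Seg8), from Table 2."""
    return hinge(0.2276 - si) * hinge(889.0 - temp_seg8)


def si_temp_contribution(si, temp_seg8):
    """Contribution of the two terms, using coefficients c_12 and c_13."""
    return 22.903 * b12(si, temp_seg8) + (-10.688) * b13(si, temp_seg8)


# Hypothetical operating point: Si above its knot, Temp_Seg8 below 889.
contribution = si_temp_contribution(0.25, 880.0)
print(round(contribution, 4))
```

Because each factor is a hinge, a product term is nonzero only in the region where all of its factors are positive. Above Temp_Seg8 = 889, for instance, both terms vanish regardless of the silicon content.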

**Table 3.**
Evaluation of the importance of the variables that form the model for the C factor according to the criteria Nsubsets, GCV and RSS.

| Variable | Nsubsets | GCV | RSS |
|---|---|---|---|
| Si | 45 | 100.0 | 100.0 |
| Temp_Seg8 | 45 | 100.0 | 100.0 |
| S | 44 | 91.8 | 92.1 |
| Ratio_Strip | 44 | 91.8 | 92.1 |
| Mn | 43 | 86.5 | 86.8 |
| Temp_Seg17 | 43 | 86.5 | 86.8 |
| Al | 42 | 81.0 | 81.4 |
| C | 33 | 58.5 | 57.3 |
| Overtemperature | 32 | 57.1 | 55.5 |
| P | 31 | 55.7 | 53.8 |
| Freq_Oscillation | 24 | 49.6 | 44.7 |
| Ave_Speed | 20 | 42.3 | 37.7 |
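For reference, the generalized cross-validation (GCV) criterion commonly used with MARS divides the mean squared residual by a penalty that grows with model complexity. The sketch below assumes that common form. The function name `gcv` and the argument `effective_params` are illustrative, and how the effective number of parameters is counted (typically the number of terms plus a per-knot penalty) depends on the implementation and is not stated in this work.

```python
def gcv(rss, n_obs, effective_params):
    """Generalized cross-validation score.

    GCV = RSS / (n * (1 - C / n)**2), where C is the effective
    number of parameters of the model (an input here, since its
    exact definition is implementation-dependent).
    """
    if not 0 <= effective_params < n_obs:
        raise ValueError("effective_params must lie in [0, n_obs)")
    return rss / (n_obs * (1.0 - effective_params / n_obs) ** 2)


# With no complexity penalty, GCV reduces to the mean squared residual.
print(gcv(100.0, 100, 0))   # 1.0
# A more complex model with the same RSS is penalized.
print(gcv(100.0, 100, 50))  # 4.0
```

The relative GCV and RSS columns are then usually obtained by scaling each variable's accumulated decrease in the criterion so that the largest equals 100.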

**Table 4.**
List of basis functions of the MARS model for the average width (Ave_Width) and their coefficients $c_i$.

| $B_i$ | Definition | $c_i$ |
|---|---|---|
| $B_{1}$ | 1 | 0.2156 |
| $B_{2}$ | h(C − 0.1873) | −177.2487 |
| $B_{3}$ | h(0.1873 − C) | −29.2927 |
| $B_{4}$ | h(Si − 0.2483) | −45.3392 |
| $B_{5}$ | h(P − 0.0174) | 1102.0646 |
| $B_{6}$ | h(Ave_Speed − 1.16) | 245.0236 |
| $B_{7}$ | h(1.16 − Ave_Speed) | 8.2028 |
| $B_{8}$ | h(749 − Temp_Seg17) | −0.0064 |
| $B_{9}$ | h(Temp_Seg17 − 900) | −0.1186 |
| $B_{10}$ | h(Si − 0.2152) × h(Temp_Seg17 − 749) | 0.3908 |
| $B_{11}$ | h(0.2152 − Si) × h(Temp_Seg17 − 749) | 1.0 |
| $B_{12}$ | h(S − 0.0074) × h(Temp_Seg17 − 749) | −4.3071 |
| $B_{13}$ | h(0.0146 − P) × h(Temp_Seg17 − 749) | −11.1187 |
| $B_{14}$ | h(P − 0.0166) × h(749 − Temp_Seg17) | 1.8191 |
| $B_{15}$ | h(0.0166 − P) × h(749 − Temp_Seg17) | 57.9514 |
| $B_{16}$ | h(Freq_Oscillation − 2.53) × h(Temp_Seg17 − 749) | 0.1544 |
| $B_{17}$ | h(Ratio_Strip − 75.572) × h(Temp_Seg17 − 749) | 0.0440 |
| $B_{18}$ | h(75.572 − Ratio_Strip) × h(Temp_Seg17 − 749) | 0.0222 |
| $B_{19}$ | h(Temp_Seg8 − 921) × h(Temp_Seg17 − 749) | 0.0015 |
| $B_{20}$ | h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | 0.0002 |
| $B_{21}$ | h(Temp_Seg8 − 943) × h(Temp_Seg17 − 749) | −0.0017 |
| $B_{22}$ | h(Temp_Seg17 − 749) × h(Al − 0.0325) | 3.2277 |
| $B_{23}$ | h(749 − Temp_Seg17) × h(0.0302 − Al) | 0.7147 |
| $B_{24}$ | h(C − 0.1863) × h(0.0146 − P) × h(Temp_Seg17 − 749) | 25383.1351 |
| $B_{25}$ | h(0.1855 − C) × h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | −0.0074 |
| $B_{26}$ | h(1.4062 − Mn) × h(1.16 − Ave_Speed) × h(Temp_Seg17 − 749) | −0.7150 |
| $B_{27}$ | h(Mn − 1.3506) × h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | −0.0026 |
| $B_{28}$ | h(0.1979 − Si) × h(2.53 − Freq_Oscillation) × h(Temp_Seg17 − 749) | −3.1666 |
| $B_{29}$ | h(0.2152 − Si) × h(Freq_Oscillation − 2.45) × h(Temp_Seg17 − 749) | −6.1100 |
| $B_{30}$ | h(0.2152 − Si) × h(2.45 − Freq_Oscillation) × h(Temp_Seg17 − 749) | −2.4387 |
| $B_{31}$ | h(0.2152 − Si) × h(0.95 − Ave_Speed) × h(Temp_Seg17 − 749) | 7.9399 |
| $B_{32}$ | h(0.1981 − Si) × h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | 0.0111 |
| $B_{33}$ | h(0.1957 − Si) × h(Temp_Seg17 − 749) × h(Al − 0.0325) | 132.0068 |
| $B_{34}$ | h(0.0074 − S) × h(P − 0.0127) × h(Temp_Seg17 − 749) | 24770.8361 |
| $B_{35}$ | h(S − 0.0074) × h(Ratio_Strip − 75.864) × h(Temp_Seg17 − 749) | 118.2158 |
| $B_{36}$ | h(S − 0.0074) × h(Ratio_Strip − 75.977) × h(Temp_Seg17 − 749) | −190.8619 |
| $B_{37}$ | h(S − 0.0116) × h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | 0.0704 |
| $B_{38}$ | h(P − 0.0156) × h(2.53 − Freq_Oscillation) × h(Temp_Seg17 − 749) | 5.5200 |
| $B_{39}$ | h(0.0166 − P) × h(Freq_Oscillation − 1.62) × h(749 − Temp_Seg17) | −71.7430 |
| $B_{40}$ | h(0.0166 − P) × h(1.62 − Freq_Oscillation) × h(749 − Temp_Seg17) | −30.9687 |
| $B_{41}$ | h(P − 0.0146) × h(Ratio_Strip − 75.667) × h(Temp_Seg17 − 749) | −20.3425 |
| $B_{42}$ | h(P − 0.0146) × h(75.667 − Ratio_Strip) × h(Temp_Seg17 − 749) | −4.8084 |
| $B_{43}$ | h(P − 0.0166) × h(Ave_Speed − 0.88) × h(749 − Temp_Seg17) | −78.9370 |
| $B_{44}$ | h(0.0166 − P) × h(Ave_Speed − 1) × h(749 − Temp_Seg17) | −41.3467 |
| $B_{45}$ | h(0.0166 − P) × h(1 − Ave_Speed) × h(749 − Temp_Seg17) | −280.7197 |
| $B_{46}$ | h(P − 0.0166) × h(Overtemperature − 9) × h(749 − Temp_Seg17) | −0.0209 |
| $B_{47}$ | h(P − 0.0146) × h(Temp_Seg8 − 879) × h(Temp_Seg17 − 749) | −0.1404 |
| $B_{48}$ | h(P − 0.0146) × h(879 − Temp_Seg8) × h(Temp_Seg17 − 749) | −0.0658 |
| $B_{49}$ | h(P − 0.0156) × h(Temp_Seg8 − 943) × h(Temp_Seg17 − 749) | 0.2179 |
| $B_{50}$ | h(0.0156 − P) × h(Temp_Seg8 − 943) × h(Temp_Seg17 − 749) | 0.1261 |
| $B_{51}$ | h(Freq_Oscillation − 2.04) × h(1.16 − Ave_Speed) × h(Temp_Seg17 − 749) | 0.1828 |
| $B_{52}$ | h(2.53 − Freq_Oscillation) × h(Ave_Speed − 1.09) × h(Temp_Seg17 − 749) | −3.6134 |
| $B_{53}$ | h(2.53 − Freq_Oscillation) × h(804 − Temp_Seg8) × h(Temp_Seg17 − 749) | 0.0013 |
| $B_{54}$ | h(75.756 − Ratio_Strip) × h(921 − Temp_Seg8) × h(Temp_Seg17 − 749) | −0.0002 |
| $B_{55}$ | h(Specific_Flow − 0.65) × h(Temp_Seg17 − 749) × h(0.0325 − Al) | −119.6246 |
| $B_{56}$ | h(Overtemperature − 30) × h(Temp_Seg8 − 921) × h(Temp_Seg17 − 749) | −0.0005 |
| $B_{57}$ | h(30 − Overtemperature) × h(Temp_Seg8 − 921) × h(Temp_Seg17 − 749) | −0.0001 |
| $B_{58}$ | h(30 − Overtemperature) × h(Temp_Seg17 − 749) × h(0.0325 − Al) | 0.1517 |
| $B_{59}$ | h(Temp_Seg8 − 910) × h(Temp_Seg17 − 749) × h(Al − 0.0325) | −0.1202 |
| $B_{60}$ | h(910 − Temp_Seg8) × h(Temp_Seg17 − 749) × h(Al − 0.0325) | −0.0317 |

Additionally, the results in Table 5 show that the most important variables for the prediction of the average width of the spots forming the centerline segregation (output variable) are, in hierarchical order: S, P, Temp_Seg17, Ratio_Strip, Al, Temp_Seg8, Ave_Speed, Si, Overtemperature, Freq_Oscillation, Mn, C and, finally, Specific_Flow. The most important variable is therefore the sulfur content (S). This is consistent with the fact that a high percentage of sulfur in the composition of steel is detrimental to its properties, causing, for example, pore formation during the welding process.

**Table 5.**
Evaluation of the importance of the variables that form the model for the Average Width of the spots according to the criteria Nsubsets, generalized cross-validation (GCV) and residual sum of squares (RSS).

| Variable | Nsubsets | GCV | RSS |
|---|---|---|---|
| S | 30 | 100.0 | 100.0 |
| P | 29 | 55.3 | 56.0 |
| Temp_Seg17 | 28 | 47.6 | 48.2 |
| Ratio_Strip | 28 | 47.6 | 48.2 |
| Al | 27 | 45.6 | 45.8 |
| Temp_Seg8 | 21 | 29.6 | 29.1 |
| Ave_Speed | 20 | 26.6 | 26.2 |
| Si | 13 | 15.8 | 15.8 |
| Overtemperature | 10 | 12.7 | 12.7 |
| Freq_Oscillation | 44 | 78.3 | 70.5 |
| Mn | 43 | 77.3 | 69.0 |
| C | 39 | 73.0 | 62.9 |
| Specific_Flow | 8 | 31.6 | 24.6 |

Furthermore, a graphical representation of the terms that constitute the two MARS models can be seen in Figure 5 and Figure 6, respectively.

**Figure 5.**
Graphical representation of the terms composing the MARS model for the C factor: (**a**) first order term of the predictor variable Ratio_Strip; (**b**) first order term of the predictor variable Ave_Speed; (**c**) first order term of the predictor variable Temp_Seg8; (**d**) first order term of the predictor variable Aluminum content; (**e**) second order term of the variables Si content and Temp_Seg8; (**f**) second order term of the variables Sulfur contents and Temp_Seg8; (**g**) second order term of the variables P content and Temp_Seg8; (**h**) second order term of the variables Ratio_Strip and Temp_Seg8 value; (**i**) second order term of the variables Temp_Seg8 and Temp_Seg17; (**j**) second order term of the variables Temp_Seg8 and Aluminum content.


**Figure 6.**
Graphical representation of the terms composing the MARS model for the Average Width of the spots forming the centerline segregation: (**a**) first order term of the predictor variable Carbon content; (**b**) first order term of the predictor variable Si; (**c**) first order term of the variable P; (**d**) first order term of the variable Average Speed; (**e**) first order term of the variable Temp_Seg17; (**f**) second order term of the variables Si and Temp_Seg17; (**g**) second order term of the variables S and Temp_Seg17; (**h**) second order term of the variables P and Temp_Seg17; (**i**) second order term of the variables Freq_Oscillation and Temp_Seg17; (**j**) second order term of the variables Ratio_Strip and Temp_Seg17; (**k**) second order term of the variables Temp_Seg8 and Temp_Seg17; (**l**) second order term of the variables Temp_Seg17 and Aluminum.


#### 3.2. The Goodness-Of-Fit for This Approach

It is important to select the model that best fits the experimental data. The criterion considered in this research was the coefficient of determination ${R}^{2}$ [43]. In statistics, the coefficient of determination is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information [17,41,42]. This ratio indicates the proportion of the total variation in the dependent variables (C factor and average width of the spots in our case) that is explained by the MARS model; that is to say, it provides a measure of how well future outcomes are likely to be predicted by the model. A dataset takes values ${t}_{i}$, each of which has an associated modeled value ${y}_{i}$. The former are called the observed values and the latter are often referred to as the predicted values. Variability in the dataset is measured through different sums of squares:

- (1) $S{S}_{tot}=\sum_{i=1}^{n}{\left({t}_{i}-\overline{t}\right)}^{2}$: the total sum of squares, proportional to the sample variance;

- (2) $S{S}_{reg}=\sum_{i=1}^{n}{\left({y}_{i}-\overline{t}\right)}^{2}$: the regression sum of squares, also called the explained sum of squares;

- (3) $S{S}_{err}=\sum_{i=1}^{n}{\left({t}_{i}-{y}_{i}\right)}^{2}$: the residual sum of squares.

In the previous sums, $\overline{t}$ is the mean of the $n$ observed data:

$$\overline{t}=\frac{1}{n}\sum_{i=1}^{n}{t}_{i}$$

A coefficient of determination value of 1.0 indicates that the regression curve fits the data perfectly. In this research work, the two fitted MARS models for the C factor and the Average Width of the spots have coefficients of determination equal to 0.93 and 0.95, respectively. These results indicate a very high goodness of fit for the two MARS models analyzed.
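The three sums of squares can be computed directly from the observed values $t_i$ and the predicted values $y_i$. The sketch below assumes the usual definition $R^2 = 1 - SS_{err}/SS_{tot}$. The function names and the small data arrays are illustrative only and are not taken from the dataset of this work.

```python
import numpy as np


def sums_of_squares(t, y):
    """Return (SS_tot, SS_reg, SS_err) for observed t and predicted y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_bar = t.mean()                    # mean of the n observed values
    ss_tot = np.sum((t - t_bar) ** 2)   # total sum of squares
    ss_reg = np.sum((y - t_bar) ** 2)   # regression (explained) sum of squares
    ss_err = np.sum((t - y) ** 2)       # residual sum of squares
    return ss_tot, ss_reg, ss_err


def r_squared(t, y):
    """Coefficient of determination, assuming R^2 = 1 - SS_err / SS_tot."""
    ss_tot, _, ss_err = sums_of_squares(t, y)
    return 1.0 - ss_err / ss_tot


# Made-up observed and predicted values.
t_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(t_obs, y_pred)
print(round(r2, 4))
```

Under this definition a value of 1.0 is reached only when every predicted value equals its observed value, that is, when the residual sum of squares is zero.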

Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent dataset [44]. It is mainly used when the goal is prediction and one wants to estimate how accurately a predictive model will work in practice. The aim of cross-validation is to define a dataset to test the model in the training phase, in order to limit problems such as overfitting and to give an insight into how the model will generalize to an independent dataset [45].

Therefore, in order to guarantee the predictive ability of the two MARS models built, cross-validation [44,45] was the standard technique used here for finding a suitable set of hyperparameters for the two MARS models built in this research work. In this procedure, the dataset is randomly divided into $l$ disjoint subsets of equal size, and each subset is used once as a validation set, whereas the other $l-1$ subsets are put together to form a training set. In the simplest case, the average accuracy over the $l$ validation sets is used as an estimator for the accuracy of the method. In this research work, 10-fold cross-validation was used; that is to say, to calculate the error criterion, the models were built using 90% of the sample and tested with the remaining 10%, thus simulating as closely as possible the real conditions under which a model would be built in order to later apply it to new observations unrelated to its construction.
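The $l$-fold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: an ordinary least-squares fit stands in for the MARS model, the data are synthetic, and all function and variable names are hypothetical. With 245 observations and 10 folds the subsets cannot be exactly equal, so the split is only as even as possible.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds, rng):
    """Randomly split indices 0..n_samples-1 into n_folds disjoint subsets."""
    return np.array_split(rng.permutation(n_samples), n_folds)


def cross_validate(X, t, n_folds=10, seed=0):
    """Average validation R^2 over the folds.

    Stand-in model: linear least squares with an intercept
    (an assumption for illustration, not the MARS model).
    """
    rng = np.random.default_rng(seed)
    folds = k_fold_indices(len(t), n_folds, rng)
    scores = []
    for k, val_idx in enumerate(folds):
        # The other l - 1 subsets together form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, t[train_idx], rcond=None)
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        pred = A_val @ coef
        ss_err = np.sum((t[val_idx] - pred) ** 2)
        ss_tot = np.sum((t[val_idx] - t[val_idx].mean()) ** 2)
        scores.append(1.0 - ss_err / ss_tot)
    return float(np.mean(scores))


# Synthetic data with 245 observations (values are made up).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(245, 3))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * data_rng.normal(size=245)

score = cross_validate(X, t, n_folds=10)
print(score)
```

Each observation is used for validation exactly once and for training in the remaining folds, so the averaged score reflects performance on data not seen during fitting.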

Finally, the values of the Continuity Factor estimated by the model from 245 experimental observations agree well with the values observed experimentally (see Figure 7). Similarly, Figure 8 shows good agreement between the experimental values of the average width of the spots forming the centerline segregation and the values predicted by the MARS model from 245 experimental observations. Coefficients of determination equal to 0.93 for the Continuity Factor and 0.95 for the Average Width were obtained with these models.

**Figure 7.**
Comparison between the values of the Continuity Factor (C_Factor) observed experimentally and those predicted by the MARS model from 245 actual observations.


**Figure 8.**
Comparison between the values of the Average Width of the spots forming the centerline segregation observed experimentally and those predicted by the MARS model from 245 actual observations.



In order to guarantee the predictive ability of the MARS models, cross-validation was the standard technique used in this research work to find the actual coefficient of determination of each model. The dataset is randomly divided into $l$ disjoint subsets of equal size, and each subset is used once as a validation set, whereas the other $l-1$ subsets are put together to form a training set. In the simplest case, the average accuracy over the $l$ validation sets is used as an estimator for the accuracy of the method. In this way, 10-fold cross-validation was used [14,15,16,17,18,19,20,21,22,23,44,45].

Segregation is a very common and serious problem in steel production. The diagnostic techniques commonly used, which are based on traditional methods (such as evaluating central segregation in continuously cast steel slabs by etching with hydrochloric acid or by sulfur prints), are expensive from both the material and human standpoints. Consequently, the development of alternative diagnostic techniques is necessary. In this sense, the multivariate adaptive regression splines (MARS) technique used in this work is a good choice for helping to prevent segregation. MARS is a nonlinear and non-parametric regression methodology and a flexible procedure for modeling complex relationships that are nearly additive or involve interactions among few variables. It is able to model complex relationships among variables without strong model assumptions. Besides, MARS does not require a long training process and hence can save considerable modeling time when the dataset is particularly large. Therefore, the diagnostic model obtained using the MARS technique is a good methodology for predicting segregation and taking measures in advance to tackle this problem. Moreover, this diagnostic technique has low implementation costs from both the material and human standpoints.

One of the main goals of this research work was the study of the interactions among the input variables. The model developed in this research work was able to predict the segregation in accordance with the actual database.