1. Introduction
In recent years, the dual challenges of accelerated urbanization and aging underground pipeline infrastructure have led to frequent pipeline system failures, emerging as a significant threat to urban safety, environmental quality, and social stability. This has generated a global demand for sustainable pipeline inspection and rehabilitation [1,2]. Traditional excavation-based rehabilitation methods require extensive road surface excavation for renewal or replacement, causing severe environmental damage, resource waste, prolonged construction, noise pollution, and significant disruption to residents’ quality of life and urban traffic operations [3]. In contrast, trenchless rehabilitation technology has become a key technology for achieving sustainable urban infrastructure management owing to its significant advantages of high efficiency, cost-effectiveness, and minimal environmental and traffic impact [4]. Among these methods, spiral wound lining (SWL), as an advanced trenchless rehabilitation technology, plays a critical role in promoting sustainable rehabilitation of urban pipeline systems due to its stable construction quality, capability for water-carrying operations, and wide range of applicable pipe diameters, providing an effective technical pathway to reduce resource consumption, minimize environmental impact, and improve rehabilitation efficiency.
With the widespread adoption of this technology in the underground pipeline rehabilitation industry, designers face the following problems [5,6]: (1) Under complex loading conditions including soil pressure, traffic loads, and fluid load, SWL struggles to meet the minimum stiffness coefficient for structural rehabilitation, thereby rendering the structural repair scheme infeasible; (2) Creep behavior, a critical mechanical property of unplasticized PVC (PVC-U), induces time-dependent deformation when liners endure sustained operational loads, progressively leading to liner failure and service life reduction. To address these issues, this study proposes a sustainable optimization framework for the SWL liner, aiming to enhance ring stiffness (Sp) while limiting material consumption (V) and the total strip profile height (H).
The optimization of complex-section pipeline structures typically relies on experimental approaches, which consume substantial materials, involve complex processing, incur high economic costs, and require lengthy cycles. Moreover, obtaining data through repeated ring stiffness tests is difficult, costly, and time-consuming. Finite element simulation, as a computational analysis technology, offers cost efficiency, rapid implementation, and a wide application range [7,8]. Wang et al. [9] employed finite element analysis software to compare the effects of braided angle and interlayer friction coefficient on the mechanical properties of fiber-reinforced thermoplastic pipes. Their results showed that the braided angle had a significant impact on the mechanical properties of the pipes, whereas the interlayer friction coefficient exerted negligible effects. Other studies have used finite element analysis to perform parametric analyses of the wall thickness, required reinforcement, and concrete crack width of buried reinforced concrete pipelines, establishing an optimal diameter-thickness ratio for a given burial depth and thereby helping designers select the most economical reinforced concrete pipeline [10]. Although finite element simulation has been widely adopted for pipeline structural optimization, its application remains limited in trenchless pipeline rehabilitation, with research gaps persisting in the mechanical properties and optimization methods of the SWL liner.
The integration of machine learning (ML) with experimental methods and numerical simulation provides an efficient research pathway for the development of sustainable pipeline rehabilitation technology, significantly enhancing the time efficiency and resource utilization of design optimization [11]. To achieve the sustainable optimization design goals of SWL, this study developed a systematic three-stage research framework. Stage 1: A ring stiffness test model was established and validated, generating high-quality datasets through numerical simulation to support subsequent predictive modeling. Stage 2: Advanced ML algorithms, including Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost), were employed to establish high-accuracy predictive models for loading force (F) and V, enabling intelligent prediction of structural performance. Stage 3: A genetic optimization algorithm framework was constructed based on the predictive models, conducting multi-objective optimization with maximized F, minimized V, and minimized H to achieve an optimal balance between structural performance and resource efficiency. This framework not only enables rapid performance prediction for various designs but also delivers optimal configurations for rehabilitation projects, effectively reducing design cycles, lowering testing costs, and enhancing design reliability.
3. Prediction Model
First, ANSYS simulation was used to establish the ring stiffness test model, compute the F of different structures, and obtain the dataset. Then, ML was used to predict F and V. Finally, multi-objective optimization was carried out with a genetic algorithm, as shown in Figure 4. This section introduces the ML prediction algorithms, the prediction process, and the model evaluation. The datasets used in this study are provided in Appendix A.
3.1. Model Algorithm
To carry out the multi-objective optimization design of the liner structure, ML models for F and V of the liner were established using RF, SVR, and XGBoost. The performance of these models was compared, and the best model for each output was selected.
RF is an ensemble learning model based on decision trees. Each tree in the forest is trained independently, which makes the algorithm parallel and scalable, and the predictions of the individual trees are averaged to obtain the final prediction [16].
SVR is a nonlinear regression method based on the support vector machine (SVM) model. It can effectively handle the modeling of high-dimensional data with limited samples and offers strong generalization ability and insensitivity to dimensionality [17]. The objective function is shown in Equation (2), where x represents the input data, ω is the weight vector, b is the bias term, δi is the slack variable, C is the regularization parameter, and n is the number of samples.
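For reference, one common soft-margin form of the SVR objective written with the symbols defined above is sketched below. The tube width ε, the targets y_i, and the feature map φ are assumptions introduced here for illustration only, and the exact form used in the study may differ.

```latex
\min_{\omega,\, b,\, \delta}\quad
\frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{n} \delta_i
\qquad \text{s.t.} \qquad
\bigl|\, y_i - \bigl(\omega^{\top}\varphi(x_i) + b\bigr) \bigr| \le \varepsilon + \delta_i,
\quad \delta_i \ge 0,
\quad i = 1, \dots, n
```

In this form, a larger C penalizes deviations outside the ε-tube more heavily, while a smaller C favors a flatter, more regularized function.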
XGBoost is an ML algorithm for classification and regression problems. Instead of averaging independently trained trees, the model constructs a sequence of decision trees, each fitted to the prediction errors (residuals) of the preceding trees [18,19]. The objective function is shown in Equation (3):
$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{j=1}^{k} \Omega\left(f_j\right) \quad (3)$$
where $l(y_i, \hat{y}_i)$ is the loss between the actual value ($y_i$) and the predicted value ($\hat{y}_i$) of sample i, $\Omega(f_j)$ is the complexity of the j-th decision tree, and k is the number of trees.
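Several evaluation indicators commonly used in the literature [20,21] were considered in this study.
R squared (R²), also known as the coefficient of determination, is expressed as Equation (4):
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}} \quad (4)$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted values of sample i, and $\bar{y}$ is the mean of the actual values.
Mean absolute error (MAE) is the average of the absolute differences between the actual and predicted values, calculated as Equation (5):
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (5)$$
The mean square error (MSE) is the average of the squared differences between the actual and predicted values. It is always non-negative and approaches zero as the errors shrink, calculated as Equation (6):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2} \quad (6)$$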
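The three indicators can be sketched in a few lines of plain Python. This is a minimal illustration of the standard definitions, not the code used in the study; the function names and the sample values are hypothetical.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the absolute differences."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, for illustration only (not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
```

A single large error raises MSE more than MAE because the error is squared, which is one reason the indicators are reported together.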
To interpret the predictions of the models, the influence of each input on the prediction was determined. SHAP values quantify the contribution of each input to the model prediction [22], as defined in Equation (7).
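The idea behind SHAP can be illustrated with an exact Shapley-value computation on a toy model. The sketch below is not the `shap` library and not the study's code: the coalition value function `toy_model` and the feature labels are hypothetical, and real SHAP estimates the coalition value from the trained model and background data rather than from a hand-written function.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: weighted average of each feature's marginal contribution."""
    n = len(features)
    phi: Dict[str, float] = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


def toy_model(coalition: FrozenSet[str]) -> float:
    """Hypothetical coalition value: two main effects plus one h-a interaction."""
    value = 0.0
    if "h" in coalition:
        value += 4.0
    if "b" in coalition:
        value += 2.0
    if "h" in coalition and "a" in coalition:
        value += 2.0
    return value


phi = shapley_values(["h", "b", "a"], toy_model)
```

The interaction term is split equally between the two features involved, and the values sum to the difference between the full-coalition and empty-coalition outputs, which is the additivity property that SHAP relies on.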
3.2. Prediction Model of Ring Stiffness
In this study, RF, SVR, and XGBoost models were used to predict the F in the ring stiffness test with varying section parameters. Six parameters were selected as inputs, namely a*, ew*, h*, b*, c*, and r*, and F was taken as the output of the model. The train_test_split function was used to randomly split the data into a training set (70%) and a test set (30%). All input features were standardized. Specifically, the mean and standard deviation of each feature were computed exclusively on the training set. All ML models were trained on the training set and evaluated on the test set to determine their accuracy. To further verify the generalization ability of the models, 5-fold cross-validation (CV) was performed only on the training set, and a fixed random seed of 42 was set. The grid search method [23] was applied to tune the hyperparameters and improve accuracy. The candidate hyperparameters were evaluated based on the average validation performance, and the optimal combination was selected. Table 3 lists the hyperparameters used in the ring stiffness prediction model.
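The split-then-standardize order described above can be sketched with the standard library alone. This is a simplified stand-in for train_test_split and a standard scaler, not the study's code; the helper names and the sample rows are hypothetical, and cross-validation and grid search are omitted.

```python
import random
from statistics import mean, pstdev
from typing import List, Tuple

Row = List[float]


def split_70_30(rows: List[Row], seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then split into 70% training and 30% test rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train: List[Row]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation, computed on the training rows only."""
    columns = list(zip(*train))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, stds


def transform(rows: List[Row], means: List[float], stds: List[float]) -> List[Row]:
    """Apply the training-set statistics to any set of rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical data: 10 samples with 2 features (not the study's dataset).
rows = [[float(i), float(2 * i + 1)] for i in range(10)]

train, test = split_70_30(rows)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)  # reuses training statistics
```

Because the test rows are scaled with statistics learned from the training rows, no information from the test set enters preprocessing, which is the point of computing the mean and standard deviation on the training set only.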
Figure 5 shows the training and test regression graphs for the RF, SVR, and XGBoost models, with the SVR model demonstrating the best performance. On the test data, which were not used for training, the SVR model achieved the highest R² of 0.9873, the lowest MAE of 0.0804, and the lowest MSE of 0.0087.
A single metric is generally insufficient for evaluating an ML model, so Table 4 compares the ML models across several evaluation metrics. Table 4 indicates that the SVR model performed best on both the training and test datasets, followed by RF and XGBoost. All three models achieved an R² greater than 0.98 during training: SVR achieved the highest value at 0.9983, followed by RF at 0.9865 and XGBoost at 0.9859. On the test dataset, SVR achieved the highest R² of 0.9873, RF achieved 0.9786, and XGBoost achieved 0.9700. The gap between training and test R² is below 0.02 for all three models, indicating that none of them overfitted noticeably.
3.3. Prediction Model of Material Consumption
In this study, RF, SVR, and XGBoost models were also used to predict the V under different section parameter variations. Six parameters were selected as inputs, namely a*, ew*, h*, b*, c*, and r*, and V was selected as the output of the model. The train_test_split function was used to randomly split the data into a training set (70%) and a test set (30%). All input features were standardized. Specifically, the mean and standard deviation of each feature were computed exclusively on the training set. All ML models were trained on the training set and evaluated on the test set to determine their accuracy. To further verify the generalization ability of the models, 5-fold cross-validation (CV) was performed only on the training set, and a fixed random seed of 42 was set. The grid search method [23] was applied to tune the hyperparameters and improve accuracy. The candidate hyperparameters were evaluated based on the average validation performance, and the optimal combination was selected. Table 5 lists the hyperparameters used in the V prediction model.
Figure 6 shows the training and testing regression graphs for the RF, SVR, and XGBoost models. The XGBoost model exhibited the best performance. On the test data, which were not used for training, it achieved an R² of 0.9700, an MAE of 0.0200, and an MSE of 0.1200. Table 6 compares the ML models using different evaluation indicators. The XGBoost model shows the best performance on both the training and test datasets, making it the preferred choice for predicting V.
4. Multi-Objective Optimization of the Structure Size of Spiral Wound Strip Profile
A genetic algorithm is a computational model based on population evolution. It facilitates information exchange and survival of the fittest through reproduction, variation, and competition among individuals in the population to gradually approach the optimal solution [24]. The non-dominated sorting genetic algorithm (NSGA) is a common method for solving multi-objective optimization problems. NSGA-II, an improved version of NSGA, offers good exploration, higher optimization efficiency, and a more clearly defined Pareto front [25,26]. NSGA-II was therefore selected to carry out the multi-objective optimization design of the liner.
4.1. Optimal Design
Sp is a crucial index for evaluating the mechanical properties of the liner. Under fixed test conditions (specifically a pipe inner diameter of 1000 mm and sample length of 1000 mm), Sp and F are interconvertible through Equation (1). Consequently, F was adopted as the optimization objective in this study because it can be measured directly in both experiments and simulations. F values were subsequently converted to Sp during post-processing to ensure that this core engineering parameter was explicitly presented in the final results.
At the same time, pipeline design must control both V and H. Since the SWL method involves inserting liners into host pipes, the inner diameter of the repaired pipeline equals the original inner diameter minus 2H. Therefore, the smaller the H, the larger the inner diameter and hence the cross-sectional flow area of the repaired pipeline.
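The effect of H on the repaired pipe can be made concrete with a short calculation. The sketch below uses the 1000 mm host diameter from the test conditions; the 20 mm profile height is a hypothetical value chosen for illustration, and the function names are not from the study.

```python
import math


def relined_inner_diameter(host_inner_diameter_mm: float, profile_height_mm: float) -> float:
    """Inner diameter after lining: original inner diameter minus 2H."""
    if 2.0 * profile_height_mm >= host_inner_diameter_mm:
        raise ValueError("profile height leaves no flow area")
    return host_inner_diameter_mm - 2.0 * profile_height_mm


def flow_area_mm2(inner_diameter_mm: float) -> float:
    """Circular cross-sectional area for a given inner diameter."""
    return math.pi * inner_diameter_mm ** 2 / 4.0


host_d = 1000.0  # mm, host pipe inner diameter from the stated test conditions
H = 20.0         # mm, hypothetical total strip profile height

new_d = relined_inner_diameter(host_d, H)
area_ratio = flow_area_mm2(new_d) / flow_area_mm2(host_d)
```

Because the area scales with the square of the diameter, a 4% reduction in diameter removes roughly 8% of the flow area in this example, which is why H is treated as an objective to minimize.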
4.1.1. Multi-Objective Optimization of F and V
Multi-objective optimization was carried out with maximum F and minimum V as the two objectives. To present the Pareto optimal front more intuitively in the figure, the maximization of F was recast as the minimization of −F, as shown in Equation (8).
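A two-objective problem of this kind is typically written as below. This is a sketch under stated assumptions rather than the study's Equation (8): the design vector is assumed to collect the six section parameters, and the bounds x^L and x^U are placeholders whose values are not given in this text.

```latex
\min_{\mathbf{x}} \; \bigl\{\, -F(\mathbf{x}),\; V(\mathbf{x}) \,\bigr\},
\qquad
\mathbf{x} = \bigl(a^{*},\, \mathit{ew}^{*},\, h^{*},\, b^{*},\, c^{*},\, r^{*}\bigr),
\qquad
\mathbf{x}^{\mathrm{L}} \le \mathbf{x} \le \mathbf{x}^{\mathrm{U}}
```

Here F and V would be evaluated by the trained prediction models rather than by finite element simulation, which is what makes thousands of evaluations affordable.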
With the initial population size set to 100 and the number of generations set to 200, 1284 Pareto optimal solutions were obtained using the NSGA-II optimization algorithm, and the resulting Pareto optimal set is shown in Figure 7.
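The notion of a Pareto optimal set, with F maximized by minimizing −F, can be sketched as a simple non-dominated filter. This is only the dominance test at the core of NSGA-II, not the full algorithm (no crossover, mutation, or crowding distance), and the candidate designs are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate designs as (F, V) pairs; F is to be maximized, V minimized.
designs = [(10.0, 5.0), (12.0, 6.0), (9.0, 6.5), (12.0, 7.0), (8.0, 4.0)]

# Recast maximization of F as minimization of -F.
minimized = [(-f, v) for f, v in designs]
front = pareto_front(minimized)

# Convert back to (F, V) for reporting.
front_designs = [(-neg_f, v) for neg_f, v in front]
```

In this toy set, the design (9.0, 6.5) is discarded because (10.0, 5.0) offers a higher F at a lower V, while the three remaining designs each represent a different trade-off.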
As seen in Figure 7, point A in the Pareto optimal set represents the section design obtained when −F alone is minimized, resulting in the highest liner Sp. Point B represents the design obtained when V alone is minimized, resulting in the smallest material consumption. Point C represents the optimal section design considering both F and V. The section parameters of the three points are listed in Table 7.
4.1.2. Multi-Objective Optimization of F and H
Multi-objective optimization was carried out with maximum F and minimum H as the two objectives. To present the Pareto optimal front more intuitively, the maximization of F was recast as the minimization of −F, as shown in Equation (9).
With the initial population size set to 100 and the number of generations set to 200, 1972 Pareto optimal solutions were obtained using the NSGA-II optimization algorithm, and the resulting Pareto optimal set is shown in Figure 8.
As can be seen from Figure 8, point D in the Pareto optimal set represents the section design obtained when −F alone is minimized, resulting in the highest liner Sp. Point E represents the design obtained when H alone is minimized, resulting in the smallest strip profile height. Point G represents the optimal section design considering both F and H. The section parameters of the three points are listed in Table 8.
4.1.3. Multi-Objective Optimization of F, V, and H
Multi-objective optimization was carried out by taking the maximum F, the minimum V, and the minimum H of the strip profile in the ring stiffness test of the liner as three optimization objectives. To present the Pareto optimal front more intuitively in the figure, the maximization of F was recast as the minimization of −F, as shown in Equation (10).
With the initial population size set to 100 and the number of generations set to 200, 3936 Pareto optimal solutions were obtained using the NSGA-II optimization algorithm, and the resulting Pareto optimal set is shown in Figure 9.
As can be seen from Figure 9, point I in the Pareto optimal set represents the section design obtained when −F alone is minimized, resulting in the highest liner Sp. Point J represents the design obtained when V alone is minimized, resulting in the smallest material consumption. Point K represents the design obtained when H alone is minimized, resulting in the smallest strip profile height. Point L represents the optimal section design considering the three objectives of F, V, and H. The section parameters of the four points are listed in Table 9.
4.2. Optimization Result
The above 10 design schemes were simulated by the finite element method to compare the improvement effect. The calculation results are shown in Table 10.
Point L is the selected multi-objective optimization scheme, where h* = 0.22w, ew* = 0.028w, a* = 0.039w, b* = 0.13w, c* = 0.039w, and r* = 0.0012w. The section of the optimized profile is shown in Figure 10. The Sp of the optimized liner is increased by 24.46%, and the V is only increased by 1.82%.
5. Parameter Analysis
5.1. Single-Factor Experimental Analysis
By varying one parameter at a time, the influences of a*, b*, c*, ew*, h*, and r* on F and V were studied. A total of 75 ring stiffness simulations were carried out; F and V were fitted against each section parameter, and the resulting fitting formulas are shown in Figure 11.
The sensitivity coefficient is defined as the ratio of the rate of change of the evaluation indicator to the rate of change of the sensitive factor [27]. By comparing the sensitivity coefficients of different parameters, the degree of influence of each parameter on the F and V of the profile was determined. The sensitivity coefficient is expressed by Equation (11):
$$\beta = \frac{\Delta Y}{\Delta X} \quad (11)$$
where β is the sensitivity coefficient of sensitive factor Q; ∆Y is the rate of change of the evaluation index, %; ∆X is the rate of change of the sensitive factor, %.
Figure 12 shows the sensitivity analysis of F and V with respect to the 6 parameters. The absolute values of the sensitivity coefficients are compared after max-min normalization. The results show that the sensitivity of F to the factors, in descending order, is h*, a*, ew*, b*, c*, and r*; F is most sensitive to h* and least sensitive to r*. The sensitivity of V to the factors, in descending order, is ew*, h*, c*, a*, b*, and r*; V is most sensitive to ew* and least sensitive to r*.
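The sensitivity coefficient and the max-min normalization of its absolute values can be sketched as follows. The function names, the baseline and perturbed values, and the resulting coefficients are hypothetical and serve only to show the calculation; they are not results from the study.

```python
from typing import Dict


def sensitivity_coefficient(y_base: float, y_new: float, x_base: float, x_new: float) -> float:
    """Ratio of the percentage change in the output to the percentage change in the factor."""
    delta_y = (y_new - y_base) / y_base * 100.0
    delta_x = (x_new - x_base) / x_base * 100.0
    return delta_y / delta_x


def min_max_normalize(coefficients: Dict[str, float]) -> Dict[str, float]:
    """Scale the absolute coefficients to [0, 1] so that factors can be ranked."""
    magnitudes = {name: abs(value) for name, value in coefficients.items()}
    low, high = min(magnitudes.values()), max(magnitudes.values())
    span = high - low
    if span == 0.0:
        return {name: 0.0 for name in magnitudes}
    return {name: (m - low) / span for name, m in magnitudes.items()}


# Hypothetical single-factor runs: each factor is raised by 25% from its baseline.
betas = {
    "h*": sensitivity_coefficient(10.0, 15.0, 1.0, 1.25),  # output +50%
    "b*": sensitivity_coefficient(10.0, 12.5, 1.0, 1.25),  # output +25%
    "r*": sensitivity_coefficient(10.0, 9.5, 1.0, 1.25),   # output -5%
}

normalized = min_max_normalize(betas)
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

Taking absolute values before normalizing means that a factor which lowers the output is ranked by the strength of its effect, not by its sign.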
5.2. Orthogonal Experimental Analysis
In this study, an orthogonal experimental design was carried out for the cross-section parameters, and a test scheme with 6 factors and 5 levels was selected, as shown in Table 11.
ANSYS Workbench software was used to analyze 25 different ring stiffness models of the SWL liner, and the simulation results are shown in Table 12.
According to the range results in Table 12, the order of the influencing factors in the orthogonal test, from primary to secondary, is h* > b* > a* > r* > c* > ew*; that is, the height of the T-rib has the greatest influence on F, followed by the width and height of the top plate. This agrees with the sensitivity analysis in identifying h* as the dominant factor. The wall thickness of the profile, the fillet of the top plate, and the width of the T-rib waist have little influence on the results.
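Range analysis of an orthogonal test can be sketched as follows: for each factor, the responses are averaged per level, and the range is the difference between the largest and smallest level mean. The example uses a small hypothetical L4 array with three two-level factors rather than the 25-run, 6-factor, 5-level design of the study, and the responses are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def range_analysis(
    design: Sequence[Sequence[int]],
    responses: Sequence[float],
    factor_names: Sequence[str],
) -> Dict[str, float]:
    """Range R of each factor: max level mean minus min level mean."""
    ranges: Dict[str, float] = {}
    for j, name in enumerate(factor_names):
        by_level: Dict[int, List[float]] = defaultdict(list)
        for run, y in zip(design, responses):
            by_level[run[j]].append(y)
        level_means = [sum(ys) / len(ys) for ys in by_level.values()]
        ranges[name] = max(level_means) - min(level_means)
    return ranges


# Hypothetical L4(2^3) orthogonal array: each row is one run, each column one factor.
design = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
responses = [10.0, 12.0, 20.0, 22.0]  # hypothetical response, e.g. a loading force
factor_names = ["A", "B", "C"]

ranges = range_analysis(design, responses, factor_names)
order = sorted(ranges, key=ranges.get, reverse=True)
```

A larger range means that changing the level of that factor shifts the mean response more, so the factors are ranked from primary to secondary by decreasing range.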
5.3. SHAP Analysis
SHAP analysis interprets the predictions of ML models by calculating the contribution of each input feature to the prediction and thereby quantifying feature importance. In a SHAP summary plot, features placed higher on the vertical axis are more important to the model. The SHAP importance results are shown in Figure 13, from which the priority order of the influence of the section parameters on F and V is obtained. For F, the influence of the design parameters from high to low is h*, b*, a*, c*, ew*, and r*. For V, the order from high to low is b*, h*, a*, ew*, c*, and r*.
5.4. Analysis of Result Differences
This study employed SHAP analysis, orthogonal experimentation, and sensitivity analysis to evaluate feature importance for F. The SHAP analysis is based on the Shapley value theory in game theory. It quantifies the marginal contribution of each feature to model predictions and explains the importance of the features. Orthogonal experimentation designs a multi-factor orthogonal matrix to focus on the main parameter effects, making it suitable for rapid parameter screening in resource-constrained scenarios. Sensitivity analysis evaluates how input parameter variations contribute to output uncertainty, quantifying global parameter sensitivity.
The SHAP analysis in this study is based on 100 data points and therefore covers a wider feature space. The resulting parameter importance ranking (h* > b* > a* > c* > ew* > r*) has high credibility. Orthogonal experimentation employed a 25-set design matrix. Although the sample size is small, the ranking of main effects (h* > b* > a* > r* > c* > ew*) is highly consistent with the SHAP analysis results for the core parameters (h*, b*, a*). The feature importance ranking from the sensitivity analysis fluctuates considerably because of its small sample size.
All three methods identified h* as having the most significant impact on F. Features b* and a* are ranked in the top three in both SHAP analysis and orthogonal experimentation, confirming them as secondary key parameters. For the other parameters (c*, ew*, and r*), the differences in ranking mainly result from the principles of the methods and the limitations of sample size. Accordingly, this study proposes a hierarchical importance classification: h* as the primary control parameter, b* and a* as secondary optimization targets, and c*, ew*, and r* as tertiary parameters.
5.5. Stress Analysis
Through finite element simulation analysis, the von Mises stress contour plot of the liner was obtained, as shown in Figure 14. The maximum stress in the liner is 17.592 MPa. The stress is mainly distributed in the top plate and the T-rib waist, and it is transferred from the top plate of the strip profile to the bottom plate through the T-rib waist. From a mechanical perspective, the top plate, which directly bears the loads, undergoes deformation under their influence. The resulting deformation-induced stress then propagates along the T-rib waist. This stress transmission and redistribution process plays a vital role in maintaining the overall stability of the liner.
The T-shaped beam structure of the strip profile can be idealized as an I-beam, in which the top and bottom flanges primarily resist the bending moments under loads, thereby providing good bending resistance. Under loading conditions, the load acts directly on the top plate, which serves as the main load-bearing component.
According to Figure 13, F is most significantly influenced by h* and b*. This can be explained by considering the moment of inertia of the cross-section. For analytical simplicity, the strip profile can be idealized as a rectangular cross-section. In such a simplified model, the moment of inertia exhibits a cubic relationship with h* and a linear relationship with b*. Consequently, increasing the values of h* and b* leads to a significant increase in the moment of inertia of the cross-section, which effectively improves the bending performance of the strip profile [28].
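The cubic and linear dependence mentioned above follows from the standard formula for the second moment of area of a rectangle about its centroidal axis. In the sketch below, b and h denote the width and height of the idealized rectangle and stand in for b* and h* only in that simplified sense.

```latex
I = \frac{b\,h^{3}}{12},
\qquad
\frac{\mathrm{d}I}{I} = 3\,\frac{\mathrm{d}h}{h} + \frac{\mathrm{d}b}{b}
```

For example, in this idealized model a 10% increase in h raises I by a factor of 1.1^3 = 1.331 (about 33%), whereas a 10% increase in b raises I by only 10%, which is consistent with h* ranking above b* in importance.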
The T-rib waist mainly carries shear stress; increasing c* improves the shear resistance of the section and thus its ability to bear shear stress. Therefore, the width of the T-rib waist has a certain influence on F and V.
In contrast, F decreases as r* increases. The likely reason is that increasing r* reduces the load-bearing area of the top plate, which affects the bearing performance of the liner; however, this change has only a very weak influence on F and V.
To quantify the relationship between the section parameters h* and b* and the Sp increment (ΔSp), a nonlinear curve fitting method was adopted. The fitting formulas for h* and b* with ΔSp are provided in Equations (12) and (13). The fitting effect is shown in Figure 15. The ΔSp prediction formula, derived from Equations (12) and (13), is presented in Equation (14), with an R² value of 0.9874.