1. Introduction
Transmission lines are critical infrastructure within power grids, providing the stable and continuous electrical supply on which society and the economy depend. Under extreme environmental conditions involving strong winds and ice accretion, transmission lines, which commonly employ bundled conductors, are highly susceptible to pronounced wind-induced vibrations. Such phenomena typically manifest as galloping or sub-span oscillations, which can significantly compromise operational stability and structural integrity [1]. These dynamic instabilities degrade the reliability and safety of electrical systems, often leading to fatigue damage of hardware components, insulator failures, and even widespread power outages, thus posing substantial operational and economic challenges [2,3]. To mitigate these problems, a thorough understanding of the aerodynamic behavior of bundled conductors under diverse weather conditions is essential [4].
Wind-induced vibrations constitute a major threat to the safe operation of transmission lines and are a direct source of economic losses through accelerated component fatigue, increased maintenance interventions, and potential power interruptions. Crucially, the aerodynamic coefficients of bundled or iced conductors provide the load basis for vibration analyses such as galloping, sub-span oscillation, and aeolian vibration, and their accuracy is a necessary condition for reliable force estimation, stability assessment, and response prediction. Improved fidelity in these coefficients helps reduce lifecycle costs and outage risk, while supporting utilities in meeting evolving expectations for grid resilience under extreme weather.
Traditionally, the aerodynamic behavior of conductors has been extensively investigated using wind tunnel experiments [5,6,7,8,9] and computational fluid dynamics (CFD) simulations [10,11,12]. Wind tunnel tests provide valuable direct observations but are typically costly, time-consuming, and limited to simplified scenarios [13]. CFD simulations, meanwhile, offer detailed insights into aerodynamic characteristics under various conditions, such as wind speed, attack angle, icing shape, and conductor configurations [10]. For instance, previous studies have systematically examined how icing conditions influence aerodynamic coefficients of bundled conductors, offering insights into the complex interactions between airflow and conductor geometry [14]. Although significant progress has been achieved through these conventional approaches, limitations remain. CFD simulations, in particular, face inherent drawbacks, such as high computational demands, sensitivity to modeling assumptions, and difficulties in providing the real-time predictive capabilities required for efficient operational decision-making [15]. Moreover, recent wake-interference studies [9,16] have demonstrated that torsional velocity effects and three-dimensional wake interactions among sub-conductors play a crucial role in large-amplitude galloping, thereby challenging the adequacy of simplified 2D turbulence models, such as the Spalart–Allmaras (S-A) formulation. These findings highlight the need to carefully consider the physical limits of CFD modeling when applied to bundled or iced conductors.
To overcome these challenges, recent studies have increasingly turned toward machine learning (ML)-based methods, leveraging their ability to model complex nonlinear relationships efficiently [17,18,19]. ML techniques have shown promising results in rapidly predicting aerodynamic characteristics and related dynamic behaviors in transmission lines [20,21,22]. For example, artificial neural networks and tree-based algorithms have been successfully employed to predict conductor aerodynamic coefficients and dynamic responses under icing and wind-loading scenarios [23]. Nevertheless, performance among different ML techniques varies considerably, and selecting the most suitable modeling approach remains a critical yet controversial issue in practical engineering applications [21].
Recently, machine learning has been applied to the aerodynamic analysis of iced conductors as an efficient alternative to wind tunnel tests. A BP neural network model was used to predict aerodynamic coefficients of iced conductors, showing good agreement with experimental results and confirming its feasibility for galloping studies [24]. In addition, a convolutional neural network (CNN) approach based on composite images was proposed, which achieved high accuracy in predicting drag, lift, and torque coefficients, demonstrating clear advantages over traditional methods [25]. These studies indicate that data-driven approaches for predicting aerodynamic coefficients of iced conductors have become a research hotspot.
This study is structured as follows: Section 2 presents the establishment of a high-fidelity numerical flow field model for six-bundle conductors, and its accuracy is validated against wind tunnel experimental data. The generation of a large-scale aerodynamic coefficient dataset using Latin hypercube sampling, together with the development and optimization of multiple tree-based surrogate models through hyper-parameter tuning and cross-validation, is described in Section 3. In Section 4, comparative performance evaluations are provided, and a global sensitivity analysis is carried out. Potential engineering applications of the proposed surrogate model are then discussed in Section 5. Finally, Section 6 summarizes the conclusions and outlines future research directions.
3. Establishment of the Prediction Model
3.1. Acquisition of Large-Scale Data Samples
To develop a surrogate model with both high accuracy and strong generalization capability, it is essential to obtain a dataset that uniformly and comprehensively covers the high-dimensional input space. In this study, Latin hypercube sampling (LHS) was employed as the core experimental design strategy. LHS is a stratified, quasi-random sampling technique widely recognized for its efficiency in multi-dimensional parameter studies and its advantage over simple random sampling in achieving good space-filling properties [18].
The basic principle of LHS can be described as follows: suppose the input parameter space has n_v dimensions and a total sample size of N. For each dimension, the range is divided into N equally probable intervals. A single value is randomly selected from each interval, ensuring that each interval is sampled exactly once per dimension. These values are then randomly paired across all dimensions to form N distinct sample points. Mathematically, for the j-th dimension and the i-th sample, the sample value x_ij can be expressed as:

x_{ij} = F_j^{-1}\left(\frac{P_{ij} - u_{ij}}{N}\right)

where F_j^{-1} is the inverse cumulative distribution function of the j-th variable, P_ij is the i-th permuted interval index for dimension j, and u_ij is a random number uniformly distributed in [0, 1).
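To make the stratified sampling rule concrete, the following minimal Python sketch generates LHS points on the unit hypercube for uniformly distributed variables (so that the inverse CDF reduces to the identity); the function name and seed are illustrative and not taken from the study's implementation.

```python
import numpy as np

def latin_hypercube(n_samples: int, n_dims: int, seed=None) -> np.ndarray:
    """Stratified LHS on the unit hypercube: one point per interval in every dimension."""
    rng = np.random.default_rng(seed)
    # P[i, j]: randomly permuted interval index (1..N) for sample i and dimension j
    P = np.stack([rng.permutation(n_samples) + 1 for _ in range(n_dims)], axis=1)
    U = rng.uniform(size=(n_samples, n_dims))   # u_ij ~ U[0, 1)
    return (P - U) / n_samples                  # x_ij = F^{-1}((P_ij - u_ij)/N), F = identity here

samples = latin_hypercube(10, 5, seed=42)       # e.g., 10 samples in a 5-D unit cube
print(samples.shape)                            # (10, 5)
```

Each column of the returned array contains exactly one point per stratum, which is what distinguishes LHS from simple random sampling.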
In this research, five key factors affecting the aerodynamic coefficients of six-bundle conductors were selected as input parameters: wind speed, wind attack angle, icing shape, icing thickness, and sub-conductor number. To fully capture engineering scenarios, icing shape (S) included three typical cases: bare conductor, crescent-shaped ice, and sector-shaped ice, as shown in Figure 5. It is noted that the icing angle of the sector-shaped ice is set to 120°. The specific value ranges for each parameter are provided in Table 3. Specifically, icing thickness (T) varies from 0 mm to 50 mm in 1 mm increments, reflecting both light and severe icing events. Wind speed (V) ranges from 0 to 30 m/s, encompassing calm conditions up to severe storms. Wind attack angle (a) is considered from 0° to 180° at 5° intervals, enabling the exploration of a wide variety of wind directions relative to the conductor axis. Finally, sub-conductor number (m) covers all six bundle positions to account for possible spatial variability within the bundle structure.
Based on the Latin hypercube sampling (LHS) strategy, a total of 3580 sample combinations were generated to uniformly explore the five-dimensional design space defined by icing shape, icing thickness, wind speed, wind attack angle, and sub-conductor number. Each sample point represents a unique set of parameter values, and these combinations were used as input cases for high-throughput CFD simulations with the validated model described in Section 2.
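As an illustration of how such a design could be generated, the sketch below draws 3580 LHS points with scipy.stats.qmc and rescales them to the parameter ranges of Table 3; the rounding of the discrete variables (icing shape, attack-angle grid, sub-conductor index) is a plausible assumption rather than the exact procedure used in this study.

```python
import numpy as np
from scipy.stats import qmc

N_SAMPLES = 3580
sampler = qmc.LatinHypercube(d=5, seed=0)
unit = sampler.random(N_SAMPLES)                       # points in [0, 1)^5

# Column order assumed here: V [m/s], a [deg], T [mm], S index, m index
lower = [0.0, 0.0, 0.0, 0.0, 0.0]
upper = [30.0, 180.0, 50.0, 3.0, 6.0]
scaled = qmc.scale(unit, lower, upper)

V = scaled[:, 0]                                       # wind speed, 0-30 m/s
a = np.round(scaled[:, 1] / 5.0) * 5.0                 # attack angle snapped to a 5-degree grid
T = np.round(scaled[:, 2])                             # icing thickness, 1 mm steps
S = np.clip(np.floor(scaled[:, 3]), 0, 2).astype(int)  # 0 = bare, 1 = crescent, 2 = sector
m = np.clip(np.floor(scaled[:, 4]), 0, 5).astype(int) + 1  # sub-conductor 1-6
cases = np.column_stack([V, a, T, S, m])               # one CFD input case per row
```

The resulting `cases` array plays the role of the simulation matrix fed to the validated CFD model.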
The effectiveness of the LHS method in generating well-distributed samples is clearly illustrated in Figure 6. As shown in Figure 6a, the projection of the sample distribution onto the two-dimensional plane of icing thickness (T) and wind speed (V) reveals highly uniform coverage, with no clustering or significant gaps, ensuring that all regions of the parameter space are adequately represented. Figure 6b extends this visualization to three dimensions among icing thickness (T), wind speed (V), and icing shape (S), further confirming that the sampling strategy achieves excellent space-filling properties across both continuous and discrete variables. The stratified, layered pattern in the third dimension reflects the inclusion of all three typical icing shapes, with samples in each layer uniformly spread over the full range of T and V.
This comprehensive and well-balanced dataset not only enhances the representativeness and diversity of training and testing samples, but also provides a robust foundation for developing, evaluating, and generalizing the tree-based machine learning models proposed in this study.
3.2. Tree-Based Machine Learning Methods
Tree-based machine learning algorithms offer significant advantages for regression and classification tasks involving complex, nonlinear relationships and heterogeneous data. Their ability to capture feature interactions, robustness to outliers, and interpretability make them particularly suitable for modeling aerodynamic coefficients, which are influenced by multiple coupled factors, such as wind speed, attack angle, and icing conditions. Given our 3580-sample tabular dataset, ensemble tree methods offer strong sample efficiency and stable generalization with modest tuning, whereas neural network alternatives typically require larger datasets and more elaborate regularization to achieve comparable performance. In this study, several representative tree-based regression algorithms were considered, including DT, RF, ERT, GBDT, and XGBoost.
Among these, XGBoost (extreme gradient boosting) stands out for its superior predictive accuracy and computational efficiency. XGBoost is an advanced ensemble learning algorithm built on the framework of gradient boosting decision trees (GBDTs). Its core concept is to construct a series of weak regression trees in an iterative fashion, each learning to correct the residuals of its predecessors. The final prediction is a weighted sum of all trees. Compared to traditional GBDT, XGBoost introduces regularization, second-order gradient optimization, and parallel computation, which together improve both the accuracy and generalization capability of the model—especially when dealing with structured datasets and high-dimensional regression problems.
In the context of aerodynamic coefficient prediction for bundle conductors, XGBoost’s segmented tree-based structure effectively captures the strong nonlinearity and feature interactions inherent in the data, while its built-in mechanisms provide resilience to noise and missing values. The objective function of XGBoost can be formulated as:

\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

where l(y_i, ŷ_i) denotes the loss function (e.g., mean squared error), Ω(f_k) is the regularization term, T is the number of leaves, w_j are the leaf weights, and γ and λ are regularization parameters.
During tree construction, the optimal feature split is determined by maximizing the gain function:

\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma

in which G_L, G_R and H_L, H_R represent the sums of the first and second derivatives (gradients and Hessians) of the loss function over the left and right child nodes, respectively.
The performance of XGBoost is highly dependent on hyper-parameter tuning. Key parameters include the number of trees (n_estimators), learning rate (learning_rate), maximum tree depth (max_depth), subsample ratio (subsample), and minimum child weight (min_child_weight). These parameters jointly control model complexity, learning efficiency, and the ability to generalize, and are typically optimized through grid search and cross-validation.
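For reference, the snippet below shows how these hyper-parameters appear in the scikit-learn-style XGBoost interface; the numeric values are placeholders, not the tuned settings reported in Table 4.

```python
from xgboost import XGBRegressor

# Placeholder values; the tuned settings used in this study are listed in Table 4
model = XGBRegressor(
    n_estimators=500,        # number of boosted trees
    learning_rate=0.05,      # shrinkage applied to each tree's contribution
    max_depth=6,             # maximum depth of each regression tree
    subsample=0.8,           # fraction of samples drawn per tree
    min_child_weight=1,      # minimum sum of Hessians required in a leaf
    reg_lambda=1.0,          # L2 regularization (lambda in the objective)
    gamma=0.0,               # minimum split gain (gamma in the objective)
    objective="reg:squarederror",
)
```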
3.3. Model Training and Testing
Before developing the surrogate prediction model, data preprocessing is necessary to mitigate the effects of varying scales, dimensions, and outliers, thereby ensuring stability and accuracy during training. In this study, Min–Max scaling was utilized to normalize all five-dimensional input parameters and three-dimensional output variables, linearly mapping each feature to the range [0, 1] as follows:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

where x is the original value, x_min and x_max are the minimum and maximum values of the corresponding feature, and x′ is the normalized value.
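A minimal sketch of this normalization step with scikit-learn's MinMaxScaler is given below; the placeholder arrays X and Y stand in for the LHS input matrix and the CFD-computed aerodynamic coefficients.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder arrays: X would hold the LHS input cases, Y the CFD aerodynamic coefficients
X = np.random.rand(3580, 5)
Y = np.random.rand(3580, 3)

x_scaler = MinMaxScaler(feature_range=(0, 1)).fit(X)
y_scaler = MinMaxScaler(feature_range=(0, 1)).fit(Y)
X_norm = x_scaler.transform(X)      # x' = (x - x_min) / (x_max - x_min), column-wise
Y_norm = y_scaler.transform(Y)
# Predictions made on the normalized scale are mapped back with y_scaler.inverse_transform(...)
```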
The complete process for establishing the aerodynamic prediction model for six-bundle conductors is comprehensively illustrated in Figure 7. After data preprocessing, the dataset was randomly divided into training and testing subsets at a ratio of 4:1, ensuring that the model’s evaluation would be based on previously unseen data and thus provide a reliable assessment of its generalization capability. The training set (80%) was utilized to construct and tune the tree-based surrogate models, with a focus on the XGBoost algorithm due to its superior performance in preliminary comparisons.
A grid search strategy, implemented via the GridSearchCV module from Scikit-learn, was employed to systematically optimize key hyper-parameters. This process was combined with 10-fold cross-validation, allowing the model to be trained and validated on multiple splits of the training data. Through this approach, issues such as overfitting and underfitting could be effectively identified and mitigated, resulting in robust model performance across varying data partitions. The primary hyper-parameters subjected to tuning included the learning rate, maximum tree depth, subsample ratio, minimum child weight, and regularization terms, all of which play a critical role in balancing model complexity and predictive accuracy. The detailed search ranges and the final selected optimal values for these hyper-parameters are provided in Table 4. The convergence behavior of the XGBoost-based surrogate model during training is illustrated in Figure 8. As the number of boosting iterations increases, the cost function (loss) decreases rapidly in the initial stages and gradually stabilizes, indicating effective learning and optimization of the model parameters. This steady decline in loss demonstrates the model’s ability to fit the training data efficiently without signs of overfitting, further validating the appropriateness of the selected hyper-parameters and the overall robustness of the training procedure.
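The following sketch outlines this tuning workflow, assuming the normalized arrays from the preprocessing sketch above; the grid values shown are illustrative and narrower than the actual search ranges in Table 4, and the model is fitted per output coefficient for simplicity.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from xgboost import XGBRegressor

# Placeholder data standing in for the normalized LHS inputs and CFD outputs
X_norm = np.random.rand(3580, 5)
Y_norm = np.random.rand(3580, 3)

# 4:1 split into training and testing subsets
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, Y_norm, test_size=0.2, random_state=42
)

# Illustrative grid; the actual search ranges are given in Table 4
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [4, 6, 8],
    "subsample": [0.7, 0.8, 1.0],
    "min_child_weight": [1, 3, 5],
}
search = GridSearchCV(
    XGBRegressor(n_estimators=500, objective="reg:squarederror"),
    param_grid,
    cv=10,                                  # 10-fold cross-validation
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)
search.fit(X_train, y_train[:, 0])          # per-coefficient fit, e.g., the lift coefficient
best_model = search.best_estimator_
```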
Following optimization, the best-performing surrogate model was further evaluated using the independent testing dataset (20%) to assess its predictive accuracy, stability, and generalization capacity. The entire modeling workflow, including dataset partitioning, hyper-parameter tuning, model training, cross-validation, and performance evaluation, is clearly visualized in Figure 7, offering a transparent and reproducible process for surrogate model development in aerodynamic prediction tasks.
It should be noted that the same modeling and hyper-parameter optimization procedure described above was also applied to all other tree-based algorithms evaluated in this study, including DT, RF, ERT, and GBDT. For each algorithm, an independent grid search combined with 10-fold cross-validation was performed using the training dataset to identify the optimal hyper-parameter settings, ensuring a fair and rigorous comparison among different models. Due to space limitations, the specific optimal parameters for these additional models are not listed in detail in this paper, but the overall workflow and optimization principles are consistent with those outlined for the XGBoost-based model.
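A schematic of how the same workflow extends to the remaining tree-based regressors in scikit-learn is sketched below; the candidate grids are illustrative placeholders rather than the settings actually searched.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.model_selection import GridSearchCV

# X_train, y_train come from the grid-search sketch above; grids here are placeholders
candidates = {
    "DT":   (DecisionTreeRegressor(),     {"max_depth": [4, 8, None]}),
    "RF":   (RandomForestRegressor(),     {"n_estimators": [200, 500], "max_depth": [8, None]}),
    "ERT":  (ExtraTreesRegressor(),       {"n_estimators": [200, 500], "max_depth": [8, None]}),
    "GBDT": (GradientBoostingRegressor(), {"n_estimators": [200, 500], "learning_rate": [0.05, 0.1]}),
}
best_models = {}
for name, (estimator, grid) in candidates.items():
    gs = GridSearchCV(estimator, grid, cv=10, scoring="neg_mean_squared_error", n_jobs=-1)
    gs.fit(X_train, y_train[:, 0])          # per-coefficient fit, e.g., the lift coefficient
    best_models[name] = gs.best_estimator_
```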
After selecting the optimal hyper-parameters, the model was retrained on the entire training dataset, applying regularization constraints and feature weighting to further enhance its predictive accuracy and robustness. Finally, the generalization capability of the trained model was quantitatively assessed using the independent test dataset. Three widely used regression evaluation metrics were employed: coefficient of determination (R2), mean squared error (MSE), and mean absolute error (MAE), calculated as follows:

R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|

where n is the number of test samples, y_i and ŷ_i are the observed and predicted aerodynamic coefficients, respectively, and ȳ is the mean of observed values. An R2 value approaching 1 indicates high predictive accuracy, whereas lower values of MSE and MAE reflect improved model precision. These metrics collectively provide a comprehensive basis for evaluating model accuracy, stability, and generalization performance.
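In code, these three metrics can be evaluated on the held-out test set as follows, continuing the per-coefficient convention of the earlier sketches.

```python
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# X_test, y_test and best_model come from the grid-search sketch above
y_pred = best_model.predict(X_test)
r2  = r2_score(y_test[:, 0], y_pred)
mse = mean_squared_error(y_test[:, 0], y_pred)
mae = mean_absolute_error(y_test[:, 0], y_pred)
print(f"R2 = {r2:.3f}, MSE = {mse:.5f}, MAE = {mae:.5f}")
```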
4. Discussion
To determine the optimal tree-based surrogate model for accurately predicting aerodynamic coefficients of six-bundle conductors, five representative tree-based algorithms—decision tree (DT), random forest (RF), extremely randomized trees (ERTs), gradient boosted decision tree (GBDT), and extreme gradient boosting (XGBoost)—were systematically evaluated. Each algorithm reflects a distinct modeling philosophy: DT represents the simplest single-tree structure, RF utilizes ensemble learning through feature perturbation under the Bagging framework, ERT incorporates complete randomness in feature and threshold selection to enhance robustness, GBDT applies gradient boosting in a sequential manner to reduce bias, and XGBoost extends GBDT with second-order optimization, regularization, and efficient parallelization.
The comparative evaluation was based on three widely used regression metrics—coefficient of determination (R2), mean squared error (MSE), and mean absolute error (MAE)—on both training and testing datasets. The results, summarized in Figure 9, highlight clear performance differences among the models. RF achieved the highest R2 (0.999) and the lowest MSE (9 × 10⁻⁶) on the training set, but this near-perfect fit indicates overfitting, as reflected by its reduced generalization on the test set (R2 = 0.835, MSE = 0.001976). ERT also exhibited very high training accuracy (R2 = 0.965) due to its randomized splitting strategy but suffered a notable performance drop on the test set (R2 = 0.836, MSE = 0.001925), confirming its tendency toward overfitting.
GBDT delivered balanced but less competitive results, with moderate training accuracy (R2 = 0.934) and reduced test performance (R2 = 0.829), suggesting that its sequential boosting process is more sensitive to noise and parameter tuning. DT, as expected for a single-tree model, had the lowest accuracy on both datasets (R2 = 0.922 on training, 0.814 on testing), indicating its limited capacity to capture complex nonlinear relationships.
Among all candidates, XGBoost achieved the best trade-off between fitting accuracy and generalization, with R2 values of 0.981 (training) and 0.855 (testing), and the lowest MAE (0.0156) on the test set. These results demonstrate that XGBoost’s advanced regularization and optimization mechanisms effectively mitigate overfitting while maintaining high predictive precision. The improved generalization capability of XGBoost makes it the most suitable choice for aerodynamic coefficient prediction in scenarios involving diverse and nonlinear interactions among environmental and structural parameters.
The overall comparison of the five tree-based models is summarized in Table 5. Although RF and ERT achieve near-perfect training accuracy, their substantial drop in testing performance reveals a pronounced overfitting tendency. DT and GBDT provide moderate results but fail to match the predictive precision of advanced ensemble methods. XGBoost consistently outperforms other models in test-set accuracy and error metrics, demonstrating superior generalization and robustness for aerodynamic coefficient prediction tasks.
To evaluate the reliability of the XGBoost-based model, an uncertainty quantification analysis was conducted, and the results are shown in Figure 10. As presented in Figure 10a, the residual histogram indicates that prediction errors are highly concentrated around zero, with only a few deviations on the negative side, suggesting that the model exhibits low bias and good overall calibration. Figure 10b further demonstrates the stability of model performance across different validation folds: the R2 values remain consistently high (close to 0.85), while both MSE and MAE show very small dispersion. These results confirm that the proposed tree-based surrogate not only achieves accurate predictions but also provides reliable variance estimates, thereby supporting its robustness in aerodynamic coefficient modeling.
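A sketch of how such fold-wise statistics and residuals can be obtained with scikit-learn's cross_validate is given below; the plotting of Figure 10 itself is omitted, and the variables continue from the earlier sketches.

```python
import numpy as np
from sklearn.model_selection import cross_validate

# best_model, X_train, y_train, X_test, y_test come from the grid-search sketch above
cv_results = cross_validate(
    best_model, X_train, y_train[:, 0], cv=10,
    scoring=("r2", "neg_mean_squared_error", "neg_mean_absolute_error"),
)
print("R2  per fold:", np.round(cv_results["test_r2"], 3))
print("MSE per fold:", np.round(-cv_results["test_neg_mean_squared_error"], 5))
print("MAE per fold:", np.round(-cv_results["test_neg_mean_absolute_error"], 5))

# Residuals on the held-out test set (the quantity histogrammed in Figure 10a)
residuals = y_test[:, 0] - best_model.predict(X_test)
```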
To further validate the predictive capability of the XGBoost-based surrogate model, Figure 11 and Figure 12 present detailed comparisons between the predicted and true aerodynamic coefficients under two representative operating conditions. Figure 11 corresponds to a crescent-shaped ice case with a wind velocity of 10 m/s and an icing thickness of 20 mm, while Figure 12 corresponds to a sector-shaped ice case with a wind velocity of 30 m/s and the same icing thickness. For both scenarios, the variation of the lift coefficient (CL), drag coefficient (CD), and moment coefficient (CM) with wind attack angle (a) is shown for sub-conductor 1.
Across the entire range of a from 0° to 180°, the predicted curves (red dots) closely follow the reference values (black squares), accurately reproducing both the amplitude and phase of the aerodynamic coefficient variations. Minor local deviations are observed at certain angles, particularly near peak and valley regions, which may be attributed to local flow separation complexities not fully captured in the training data. Nonetheless, the overall agreement remains high, with the surrogate model successfully capturing the key aerodynamic features under different icing shapes and wind speeds.
These results confirm that the XGBoost-based model is capable of delivering accurate and stable predictions of aerodynamic coefficients under diverse and challenging conditions, further supporting its applicability in practical engineering analyses of wind-induced vibrations in iced bundled conductors.
Feature importance analysis is a critical step in understanding the internal decision-making process of machine learning models, as it quantifies the relative contribution of each input variable to the model’s predictive performance. This not only enhances the interpretability of the surrogate model but also provides valuable guidance for prioritizing key parameters in subsequent aerodynamic studies and engineering applications.
Within the XGBoost framework, feature importance evaluation revealed, as shown in Figure 13a, that wind attack angle (a) is the most influential factor, contributing approximately 49.38% to the prediction of aerodynamic coefficients. This is followed by icing thickness (T, 20.19%), icing shape (S, 13.58%), sub-conductor number (m, 10.01%), and wind speed (V, 6.83%). These results indicate that the aerodynamic response of six-bundle conductors is dominated by the wind attack angle, while icing-related parameters also exert a significant impact, and wind speed plays a comparatively minor role under the considered scenarios.
To provide a more robust and unbiased interpretation beyond the gain-based metric, permutation importance analysis was also performed. As shown in Figure 13b, this approach identifies sub-conductor number (m) and wind attack angle (a) as the two dominant factors, with wind speed (V) ranking third, whereas icing thickness (T) and icing shape (S) contribute relatively less. The difference arises because gain-based importance reflects model-internal splitting criteria, which tend to favor continuous variables with more potential thresholds (such as wind attack angle), whereas permutation importance directly measures the impact of perturbing each feature on prediction accuracy. The consistency between both methods in highlighting wind attack angle and sub-conductor effects confirms their dominant physical role, while the divergence in the relative ranking of other variables underscores the necessity of combining multiple importance measures. This joint analysis not only reduces potential bias associated with a single metric but also provides a more comprehensive understanding of the aerodynamic drivers of iced bundled conductors.
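The two importance measures can be computed as sketched below, using the booster's gain scores and scikit-learn's permutation_importance; the feature ordering follows the column convention assumed in the sampling sketch above, and the variables continue from the earlier sketches.

```python
from sklearn.inspection import permutation_importance

# best_model, X_test, y_test come from the earlier sketches; column order is assumed
feature_names = ["V", "a", "T", "S", "m"]

# Gain-based importance from the trained booster (keys f0..f4 follow the column order)
gain_scores = best_model.get_booster().get_score(importance_type="gain")
print("Gain-based importance:", gain_scores)

# Permutation importance: mean drop in test-set R2 when each feature is shuffled
perm = permutation_importance(best_model, X_test, y_test[:, 0],
                              scoring="r2", n_repeats=20, random_state=0)
for name, drop in zip(feature_names, perm.importances_mean):
    print(f"{name}: {drop:.4f}")
```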
By integrating this feature importance analysis with the results from model performance evaluations, the XGBoost-based surrogate model is confirmed to be the most effective choice among the investigated tree-based algorithms. It demonstrates high predictive accuracy, strong generalization capability, and clear interpretability, making it particularly suitable for aerodynamic coefficient prediction of bundled conductors under complex wind and icing conditions.
It should be noted that the present surrogate model was trained and validated entirely on CFD-generated data, which have been benchmarked against published wind tunnel measurements to ensure credibility. Nevertheless, direct experimental validation of the surrogate itself against independent wind tunnel or field measurements is still lacking. This limitation will be addressed in future work by designing dedicated wind tunnel campaigns and exploring field monitoring data, which will enable more rigorous verification of surrogate predictions and a deeper quantification of predictive uncertainty under real icing events. In addition, the surrogate delivers predictions within seconds per sample on a standard workstation, a speedup of several orders of magnitude compared with full CFD simulations, which require hours per case. By contrast, the training process remains computationally intensive and is carried out offline using a precomputed CFD database. Future work will, therefore, focus on integrating physical experiments with data-driven approaches to develop real-time prediction capability.
5. Potential Applications of the Developed Surrogate Model
The tree-based surrogate model established in this study provides a powerful and efficient tool for predicting aerodynamic coefficients of six-bundle conductors under diverse wind and icing conditions. Compared with traditional wind tunnel testing and high-fidelity CFD simulations, the surrogate model offers a substantial reduction in computational cost while maintaining high prediction accuracy, making it suitable for a variety of practical and research applications.
The model enables rapid aerodynamic coefficient prediction for typical transmission line configurations, allowing engineers to efficiently evaluate conductor aerodynamics under numerous environmental scenarios. This capability is particularly valuable in the early stages of line design, where multiple configurations and loading conditions must be assessed in a short time. In addition, the surrogate model serves as a key component for wind-induced vibration analysis, including galloping, sub-span oscillations, and wake-induced vibrations. By providing fast and reliable aerodynamic inputs, it facilitates dynamic simulations and stability assessments for conductors under complex environmental loads.
Beyond design-stage applications, the model can be integrated into fast monitoring and risk assessment systems. Coupled with online meteorological and structural monitoring data, the surrogate model can provide near-instantaneous aerodynamic estimates, enabling timely identification of adverse conditions and supporting preventive control measures. Moreover, the modeling framework developed in this study is scalable and adaptable. With appropriate retraining using additional datasets, the surrogate model can be extended to other conductor types, bundle configurations, or even different transmission line components, providing a generalizable methodology for aerodynamic performance evaluation across the power transmission sector.
Overall, the developed surrogate model bridges the gap between high-accuracy aerodynamic analysis and engineering efficiency, offering significant potential for both academic research and practical engineering applications. Nevertheless, due to the scope of this work, certain aspects—such as the influence of D-shaped icing and non-uniform icing—have not yet been considered and will be the subject of future in-depth investigations.
6. Conclusions
This study developed a large-scale aerodynamic coefficient dataset for six-bundle conductors based on high-throughput numerical simulations and established a multi-input, multi-output aerodynamic prediction model using several tree-based algorithms. It is concluded that:
(1) Among the five evaluated tree-based surrogate models (DT, RF, ERT, GBDT, and XGBoost), the XGBoost-based model consistently achieved the highest predictive accuracy and strongest generalization capability. This was evidenced by its superior R2 values and lower MSE and MAE on the testing dataset, indicating a balanced trade-off between fitting precision and robustness.
(2) Global sensitivity analysis revealed that wind attack angle (a) was the most influential factor (49.38% in the gain-based ranking), followed by icing thickness (T, 20.19%), icing shape (S, 13.58%), sub-conductor number (m, 10.01%), and wind speed (V, 6.83%). These findings confirmed that the aerodynamic responses of six-bundle conductors are governed by strong multi-factor coupling effects, with a and T exerting dominant influences.
(3) Compared with conventional CFD simulation workflows, the proposed XGBoost-based surrogate model achieves high-accuracy aerodynamic coefficient predictions within seconds. This substantial improvement in computational efficiency enables rapid evaluation of aerodynamic behavior and supports engineering analyses of wind-induced vibration (galloping, sub-span oscillations, etc.) in UHV transmission lines.
It should be noted that, due to the scope of the present study, certain factors, such as 3D models, D-shaped icing, non-uniform icing distributions, and typhoons with complex wind loads, were not considered. These aspects will be the focus of future research to further enhance the applicability and robustness of the proposed surrogate modeling framework.