Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes

Mohammadagha, Mohsen; Najafi, Mohammad; Kaushal, Vinayak; Jibreen, Ahmad

doi:10.3390/infrastructures10110282

¹ Center for Underground Infrastructure Research and Education (CUIRE), Department of Civil Engineering, The University of Texas at Arlington, Arlington, TX 76019, USA

² Department of Civil, Construction Engineering and Management, The University of Texas at Tyler, Tyler, TX 75799, USA

* Authors to whom correspondence should be addressed.

Infrastructures 2025, 10(11), 282; https://doi.org/10.3390/infrastructures10110282

Submission received: 3 July 2025 / Revised: 16 October 2025 / Accepted: 16 October 2025 / Published: 23 October 2025

(This article belongs to the Special Issue Smart Technologies for Sustainable and Resilient Underground Infrastructures)

Abstract

Urban water infrastructure faces increasing deterioration, necessitating accurate, cost-effective condition assessment. Traditional inspection techniques are intrusive and inefficient, creating demand for scalable machine learning (ML) solutions. This study develops a hybrid ML meta-model to predict underground pipe conditions using a comprehensive dataset of 11,544 records. The objective is to enhance multi-class classification performance while preserving interpretability. A stacked hybrid architecture was employed, integrating Random Forest, LightGBM, and CatBoost models. Following data preprocessing, feature engineering, and correlation analysis, the neural network-based stacking meta-model achieved 96.67% accuracy, surpassing the individual base learners while delivering enhanced robustness through model diversity, improved probability calibration, and consistent performance on the challenging intermediate condition classes, which are essential for condition prioritization. Age emerged as the most influential feature, followed by length, material type, and diameter. ROC-AUC scores ranged from 0.894 to 0.998 across all models and classes, confirming high discriminative capability. This work demonstrates the potential of hybrid architectures for infrastructure diagnostics.

Keywords:

machine learning; meta-learning methods; condition assessment; water pipe; underground infrastructure

1. Introduction

Urban underground pipe networks are fundamental to supporting the health, safety, and economic vitality of cities [1], providing essential services such as water supply, sewage management [2], and stormwater drainage. As the age and complexity of these buried infrastructures [3] increase, so too do the challenges associated with maintenance and timely rehabilitation. Asset managers frequently encounter difficulties in assessing the actual condition [4,5] of underground pipes due to the limited accessibility and visibility inherent in these systems. Traditional assessment methods [6] often involve intrusive, time-consuming, and costly physical inspections, which may not adequately capture the nuanced interplay of factors influencing pipe deterioration [7]. This growing need for efficient, accurate, and scalable assessment strategies has prompted researchers and practitioners to explore advanced data-driven approaches for pipe condition [6] evaluation.

Recent advances in machine learning [8,9] have opened new avenues for condition assessment by leveraging the diverse and voluminous data generated by urban utility networks [10]. By systematically incorporating variables such as pipe age, material, diameter, length, soil properties, slope, and environmental indices, machine learning [11,12] can extract hidden patterns [13,14] and relationships that are difficult to detect through manual analysis. Conventional models [15], such as logistic regression [16] or decision trees, have shown promise in classifying pipe condition [17] states when trained on large-scale datasets. However, the performance of any single model is often limited by dataset complexity [18], the nonlinearity of the underlying processes, and the presence of noisy or missing data. To address these limitations, hybrid approaches [19] that combine multiple machine learning [20,21] algorithms into meta-models [22,23] have emerged as powerful alternatives, offering greater robustness and improved generalizability.

Hybrid machine learning [19] meta-models integrate the strengths of individual learners by pairing integration strategies with meta-learning: rather than merging base-model outputs with a fixed rule such as voting or averaging, as many traditional ensembles do, they train a meta-learner to combine them. This study explores a novel meta-model architecture [24] for classifying water pipe conditions [25] in an urban context. Three diverse base models, Random Forest [26], Light Gradient-Boosting Machine (LightGBM) [27], and Categorical Boosting (CatBoost) [28], are trained on a comprehensive dataset featuring key attributes of urban pipes. Through stacking and meta-learning, their collective predictions are synthesized to achieve higher predictive accuracy and greater resilience to data imperfections. This approach better represents complex interactions [29] among features, can mitigate the risk of overfitting, and provides interpretable insights to guide infrastructure management decisions [30]. Models are evaluated with multiple metrics [31], including accuracy [32], precision [33], recall [34], and F1 score [35], supplemented by visualizations and feature importance analyses [36].

Overall, this research demonstrates the feasibility and advantages of a hybrid machine learning meta-model framework for the condition assessment of urban underground pipes. By incorporating heterogeneous algorithms and focusing on both prediction quality and interpretability, the study addresses critical gaps in current assessment methodologies. The proposed approach fosters more reliable decision-making for urban asset management, potentially reducing maintenance costs, optimizing rehabilitation timing, and ultimately advancing the resilience of urban infrastructure systems. The findings highlight a scalable, data-driven path for municipalities and utility providers to modernize their assessment protocols and improve service continuity in rapidly evolving urban environments. The basic structure of this paper is as follows. Section 2 presents Related Works on machine learning approaches for infrastructure condition assessment; Section 3 introduces the main Methodology including the hybrid meta-model framework and evaluation metrics, and Section 4 presents data exploration. Section 5 shows the experimental results from the stacking architecture and individual model comparisons; Section 6 discusses the results including feature importance analysis and model performance evaluation; and Section 7 concludes this paper.

2. Related Works

Recent advancements in urban underground pipe condition assessment [37] research reveal a shift from traditional inspection techniques toward the integration of machine learning and data-driven strategies [38,39]. This evolution is driven by the need to overcome the limitations of manual and technology-driven approaches, aiming for scalable solutions that support more accurate pipeline health monitoring [40]. To provide a comprehensive background, the following paragraphs critically compare pairs of significant studies, focusing on their methodologies, datasets, and results, to illustrate the progression of this research domain.

To begin, the review by Zheng Liu and Yehuda Kleiner (2013) [41] can be contrasted with the empirical modeling approach of Mosavi et al. (2020) [42]. Liu and Kleiner provided a qualitative evaluation of direct and indirect technologies, such as CCTV, ultrasound, and smart robotics; their review highlighted capabilities (e.g., SmartBall detecting leaks < 0.026 L/h and LeakFinderRT locating leaks to within 10 cm) but did not employ a quantitative dataset [41]. In contrast, Mosavi et al. applied Recursive Feature Elimination and ensemble machine learning (Random Forest, AdaBoost, GamBoost, and Bagged CART) to a dataset of 339 groundwater locations and 15 variables. Random Forest achieved an accuracy of 0.86 and a recall of 0.91, outperforming the boosting models. This comparison illustrates the transition from reviews [43] of technical tools toward integrated data-driven frameworks that deliver quantified predictive accuracy and actionable outcomes [42].

Similarly, a notable comparison can be drawn between Rayhana et al. (2021) [44] and Mohsen Mohammadagha et al. (2025) [45], reflecting the progress from system-wide vision technology reviews to targeted machine learning implementations. While Rayhana et al. synthesized findings from datasets of 100 to over 2 million CCTV and SSET pipe images—demonstrating deep learning models such as DCNN and Faster R-CNN achieving defect detection accuracies up to 98% [44]—Mohammadagha et al. systematically modeled 612 cases of reinforced concrete sewer pipe inspections using both Artificial Neural Networks (ANN) and Multiple Linear Regression (MLR). The ANN model yielded R² = 0.9066, outperforming MLR [45]. Both studies endorse data-intensive approaches but differ in scale, with Rayhana et al. emphasizing automated vision at a network level and Mohammadagha et al. focusing on feature-driven pipeline condition forecasting at the asset level.

Furthermore, when considering the broad landscape of pipeline monitoring, the reviews by Jawwad Latif et al. (2022) [46] and by Liu and Kleiner (2013) [41] illuminate the evolution of sensor integration and methodological sophistication. On the one hand, Latif et al. categorized monitoring into acoustic, electromagnetic, visual, and IoT-enabled methods, discussing visual classifiers such as YOLOv3 and acoustic detection (SmartBall < 0.1 gal/h) while emphasizing the need for robust, cost-effective methods and for machine learning integration [46]. On the other hand, Liu and Kleiner mainly addressed the capabilities and cost limitations of traditional and semi-automated systems [41]. This comparison highlights a shift from static technology evaluation toward advocacy for dynamic, adaptable, and intelligent monitoring platforms.

Equally important, the studies of Dawood et al. (2020) [47] and Rayhana et al. (2021) [44] exemplify the harmonization of artificial intelligence theory with practical image-based inspection. Dawood et al. reviewed 66 studies across seven AI model categories, reporting that ANN models can reach an R² of up to 0.9510 for failure prediction, although their findings relied on synthesizing published results rather than on a single dataset [47]. Conversely, Rayhana et al. showed that vision-based deep learning (e.g., Faster R-CNN) achieved up to 98% accuracy in defect detection across large and diverse image collections [44]. Both works underscore the value of hybrid and data-driven frameworks; however, Dawood et al. foreground the potential of hybrid modeling logic, whereas Rayhana et al. stress the strengths of advanced computer vision in real-world applications.

Ultimately, while previous studies have advanced pipe condition assessment [48] with machine learning, hybrid models [49], and meta-analysis, existing research has neither systematically evaluated nor benchmarked a comprehensive hybrid meta-model that integrates tree-based mixture [50] methods (CatBoost [51], LightGBM [52], and Random Forest) with meta-learning for multi-class urban pipe condition prediction using real, multifaceted operational datasets [53]. In this research, we address these gaps by designing and implementing a unified hybrid meta-learning framework, comparing algorithmic performance and feature importance, and providing interpretable model diagnostics using a large-scale, diverse urban pipe dataset. A comprehensive comparison between previous methodological approaches and the proposed hybrid meta-model framework is presented in Table 1.

3. Methodology

The methodology for this research is structured around a hybrid machine learning meta-model, implemented in Python 3.12.0 (Python Software Foundation, Wilmington, DE, USA) with scikit-learn 1.7.1, LightGBM 4.6.0 (Microsoft Corporation, Redmond, WA, USA), and CatBoost 1.2.8 (Yandex LLC, Moscow, Russia) (all open-source libraries available at https://pypi.org/), and designed for interpretable condition assessment of urban underground water pipes. The dataset comprises 11,544 records of urban potable water distribution pipes and pressurized distribution mains, sourced from New Zealand's municipal infrastructure systems: 3297 records from the South Island and 8247 records from the North Island, covering the Waimakariri District Council (2022) [25] and Matamata-Piako District Council (2024) [26,54,55] networks. Although the original dataset contained additional environmental and operational parameters, this study selected seven core parameters that are the most commonly available and represent the primary factors influencing pipe deterioration. As shown in the workflow diagram in Figure 1, the process begins with data acquisition, collecting features such as age, material, length, diameter, slope, soil properties, and thaw index from operational pipe inventories. The thaw index is a proxy for environmental stress on buried pipe bedding and the surrounding soils; higher thaw indices generally imply deeper seasonal thaw and, consequently, potential stability or serviceability impacts that are relevant to condition assessment. Data cleaning first harmonizes schemas, addresses missing values, and removes obvious errors, after which Exploratory Data Analysis (EDA) identifies outliers and distribution patterns. The workflow visualization uses checkmarks (✅) to denote process validation and X marks (❌) to denote decision points.
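The cleaning steps named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the field names (`age`, `material`, `length`, `diameter`, `slope`, `soil`, `thaw_index`) are hypothetical, missing values are handled by dropping the record (imputation would be an equally valid choice, and the text does not say which was used), and "obvious errors" are taken to mean negative ages or non-positive pipe dimensions.

```python
# Hypothetical field names; the council datasets' actual schema is not given in the text.
CORE_FIELDS = ("age", "material", "length", "diameter", "slope", "soil", "thaw_index")


def clean_records(records):
    """Harmonize material codes, drop incomplete rows, and drop obvious errors."""
    cleaned = []
    for rec in records:
        # Missing values: skip any record lacking one of the seven core parameters.
        if any(rec.get(field) is None for field in CORE_FIELDS):
            continue
        # Obvious errors (assumed definition): negative age or non-positive dimensions.
        if rec["age"] < 0 or rec["length"] <= 0 or rec["diameter"] <= 0:
            continue
        # Schema harmonization: " u-pvc ", "UPVC" and "uPVC" all become "UPVC".
        material = str(rec["material"]).strip().upper().replace("-", "")
        cleaned.append({**rec, "material": material})
    return cleaned


raw = [
    {"age": 12, "material": " u-pvc ", "length": 48.5, "diameter": 100,
     "slope": 0.02, "soil": "clay", "thaw_index": 3.1},
    {"age": None, "material": "AC", "length": 30.0, "diameter": 150,
     "slope": 0.01, "soil": "silt", "thaw_index": 2.4},
    {"age": 55, "material": "CI", "length": -4.0, "diameter": 100,
     "slope": 0.03, "soil": "sand", "thaw_index": 1.9},
]
clean = clean_records(raw)
print(clean)  # only the first record survives, with material "UPVC"
```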

Following preprocessing and feature scaling, three complementary machine learning algorithms are selected as base learners for the stacking ensemble: Random Forest, LightGBM [27], and CatBoost. The selection favors algorithms with diverse learning paradigms to maximize ensemble diversity and predictive performance. Each base model is configured with optimized hyperparameters and a fixed random seed for reproducibility.

The stacking ensemble integrates these base learners through stratified cross-validation, in which out-of-fold predicted class probabilities serve as meta-features to prevent information leakage. A meta-learner selection process then evaluates three candidate meta-learners: a multi-layer (deep) neural network, a Random Forest meta-learner, and a LightGBM meta-learner. Each candidate is trained on the cross-validated class probabilities produced by the base learners (the predict_proba stacking method) and learns how best to weight and combine the individual model outputs.

The selection algorithm automatically chooses the candidate with the highest validation accuracy as the final meta-learner. The meta-learner incorporates regularization techniques, including early stopping and adaptive learning-rate optimization, to prevent overfitting. All meta-learner candidates use the same stratified cross-validation splits to keep the training procedures consistent, while a held-out test split is reserved for unbiased final evaluation.

Performance is assessed with accuracy, precision, recall, F1-score, and multi-class ROC-AUC curves, all computed on the independent held-out test set.
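For reference, one-vs-rest and micro-averaged ROC-AUC can be computed from the rank interpretation of AUC: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. This pure-Python sketch uses toy probabilities and an O(n_pos × n_neg) pairwise loop for clarity; in practice, scikit-learn's `roc_auc_score` provides an efficient implementation.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(proba, y_true, classes):
    """Per-class AUC: class k versus all other classes, scored by column k of proba."""
    return {c: binary_auc([row[k] for row in proba], [t == c for t in y_true])
            for k, c in enumerate(classes)}


def micro_average_auc(proba, y_true, classes):
    """Pool every (sample, class) pair into a single binary problem."""
    scores = [row[k] for row in proba for k in range(len(classes))]
    flags = [t == c for t in y_true for c in classes]
    return binary_auc(scores, flags)


proba = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
y_true = [0, 1, 2, 1]
print(one_vs_rest_auc(proba, y_true, [0, 1, 2]))    # {0: 1.0, 1: 1.0, 2: 1.0}
print(micro_average_auc(proba, y_true, [0, 1, 2]))  # 0.96875
```

The last sample is misclassified (class 1 scored 0.3, class 0 scored 0.6), yet every per-class AUC stays at 1.0 because ranking within each column is still perfect; the pooled micro-average is the metric that registers the error.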

These models rest on a small set of well-established formulas and evaluation metrics that support systematic model comparison and interpretation. Seven formulas drive the assessment: (1) Min–Max Normalization for feature scaling [56]; (2) Accuracy for overall performance; (3) Precision and (4) Recall (Sensitivity) for quantifying the correctness and completeness of class predictions; (5) F1 Score to balance precision and recall, especially under class imbalance; (6) Feature Importance, typically the mean decrease in impurity in tree-based models, to prioritize influential variables; and (7) Stacking Prediction [57], which combines the predictions of diverse base models via a meta-learner for improved overall robustness. Min–Max Normalization rescales each feature to the range [0, 1], so that features with large numeric ranges do not dominate scale-sensitive learners; it is given in Equation (1) [58]:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(1)

where:

X: feature matrix of shape (n, p), with n samples and p features;

x: scalar value of a feature before normalization;

x': normalized (scaled) value of x;

x_{\min}, x_{\max}: minimum and maximum values of a feature across the dataset, used in min–max scaling;

y: ground-truth class-label vector, with multi-class labels in {0, 1, \dots, K - 1} after encoding;

\hat{y}: predicted class label(s);

K: number of classes in the multi-class classification problem;

n: number of samples; p: number of features.
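Equation (1) translates directly into code. This sketch assumes (the text does not say so) that x_min and x_max are learned on the training split only and reused for the test split, a common precaution against leakage; as a result, test values outside the training range map outside [0, 1]. A constant feature is mapped to 0.0 to avoid division by zero.

```python
def fit_min_max(train_column):
    """Learn x_min and x_max from the training split only."""
    return min(train_column), max(train_column)


def min_max_scale(column, x_min, x_max):
    """Apply Equation (1); a constant training column maps to 0.0."""
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


train_age = [0, 25, 50, 100]
x_min, x_max = fit_min_max(train_age)
print(min_max_scale(train_age, x_min, x_max))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale([120], x_min, x_max))      # [1.2]
```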

Accuracy is a fundamental metric for evaluating classification models, representing the proportion of correctly predicted instances among all predictions. As shown in Equation (2), it provides a straightforward measure of overall model performance and is most informative when the classes are balanced [28].

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(2)

Precision quantifies the correctness of positive predictions made by a model. Defined as shown in Equation (3), it measures the proportion of true positives among all instances predicted as positive. High precision indicates that the model makes very few false positive errors, which is vital in many applications.

\text{Precision} = \frac{TP}{TP + FP}

(3)

Recall, also known as sensitivity, assesses a model’s ability to identify all relevant positive cases. The formula, as shown in Equation (4), calculates the proportion of actual positives correctly predicted. High recall is essential when missing positive instances have significant consequences, such as in medical diagnoses or fraud detection.

\text{Recall} = \frac{TP}{TP + FN}

(4)

The F1 score is the harmonic mean of precision and recall, providing a balanced metric for model evaluation, particularly with imbalanced datasets. Calculated as shown in Equation (5), it penalizes extreme values and offers a single, interpretable measure of model effectiveness.

F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(5)

where:

TP: true positives, the number of positive instances correctly classified as positive;

TN: true negatives, the number of negative instances correctly classified as negative;

FP: false positives, the number of negative instances incorrectly classified as positive;

FN: false negatives, the number of positive instances incorrectly classified as negative.
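Equations (2)–(5) are written for a binary problem. For the five condition classes, a common approach (assumed here, since the text does not state the averaging scheme) is to compute precision, recall, and F1 for each class in a one-vs-rest fashion and then average them, while multi-class accuracy is simply the fraction of correct predictions, that is, the diagonal of the confusion matrix divided by the total. The sketch also shows that support-weighted recall always equals accuracy; the identical accuracy and recall reported in Section 5 (96.67%) would be consistent with weighted averaging.

```python
def per_class_metrics(y_true, y_pred):
    """Equations (3)-(5) for each class, treating that class as 'positive' (one-vs-rest)."""
    classes = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report


def accuracy(y_true, y_pred):
    """Multi-class accuracy: correct predictions (the confusion-matrix diagonal) over all."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_average(report, metric):
    """Average a per-class metric, weighting each class by its support."""
    total = sum(r["support"] for r in report.values())
    return sum(r[metric] * r["support"] for r in report.values()) / total


y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]
report = per_class_metrics(y_true, y_pred)
print(accuracy(y_true, y_pred), weighted_average(report, "recall"))  # both equal 5/6
```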

Feature importance [59] quantifies the contribution of each input variable to a model's predictive power. In tree-based models, it is often computed as the mean decrease in impurity: the impurity reduction achieved at every node where a feature is used for splitting, weighted by the fraction of samples reaching that node and summed over those nodes, as shown in Equation (6). This metric aids model interpretation, enabling practitioners to identify and prioritize influential variables in decision-making.

\text{Feature Importance}_{j} = \sum_{t \in \text{splits on } j} \frac{N_{t}}{N} \, \Delta I_{t}

(6)

where t indexes the tree nodes at which feature j is used for splitting, N_t is the number of samples reaching node t, N is the total number of samples, and \Delta I_{t} is the decrease in impurity (such as Gini impurity or entropy) caused by the split at node t.

Stacking prediction combines the outputs of multiple base models through a meta-learner to improve predictive accuracy and robustness. The final prediction is expressed as shown in Equation (7). This approach leverages the strengths of diverse models and often outperforms individual learners in complex tasks.

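\hat{y} = f_{\text{meta}}\left(f_{1}(X), f_{2}(X), \dots, f_{M}(X)\right)

(7)

where f_{1}, \dots, f_{M} are the M base models (M = 3 in this study), each producing predicted class probabilities from the feature matrix X, and f_{\text{meta}} is the meta-learner.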
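Equation (7) can be made concrete for a single pipe: each of the M base models (three in this study) returns a K-class probability vector, the vectors are concatenated into an M × K meta-feature row, and f_meta maps that row to final class probabilities. In this sketch, f_meta is a single softmax (multinomial logistic) layer with hand-set, hypothetical weights; in the study the meta-learner is a trained neural network, and in any real stack its parameters are learned from out-of-fold predictions.

```python
import math


def meta_features(base_probas):
    """Concatenate the K-class probability vectors of M base models into one M*K row."""
    return [p for proba in base_probas for p in proba]


def softmax(scores):
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_meta_learner(weights, biases):
    """f_meta as a multinomial logistic layer: one weight row and one bias per class."""
    def f_meta(z):
        logits = [b + sum(w * x for w, x in zip(row, z)) for row, b in zip(weights, biases)]
        return softmax(logits)
    return f_meta


K, M = 2, 3
# Hand-set weights (hypothetical): class k's logit sums every model's probability for class k.
weights = [[1.0 if j % K == k else 0.0 for j in range(M * K)] for k in range(K)]
f_meta = make_linear_meta_learner(weights, biases=[0.0] * K)

z = meta_features([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]])  # outputs of f_1, f_2, f_3 for one pipe
final_proba = f_meta(z)
y_hat = max(range(K), key=final_proba.__getitem__)
print(y_hat)  # 0
```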

Applying the formulas above, the pipeline deploys and benchmarks Random Forest, LightGBM, and CatBoost models, each trained on the same stratified training/test split, and compares their performance using accuracy, precision, recall, and F1 score. Individual feature importances are computed for transparent, actionable interpretation, and the stacking meta-model aggregates the base models' predictions to further improve performance. Integrating these three models within a single stacking framework is the methodological novelty, allowing the ensemble to learn feature interactions directly from infrastructure data.
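The mean-decrease-in-impurity computation of Equation (6) can be traced by hand on a toy tree. This sketch uses Gini impurity, eight hypothetical pipes, and two splits (one on age, one on material), then normalizes the totals so they sum to 1, similar to the normalized importances reported in Section 5. Forest implementations such as scikit-learn's `feature_importances_` apply the same idea per tree and average over the trees.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Delta I_t: parent impurity minus the sample-weighted impurity of the two children."""
    n_t = len(parent)
    return gini(parent) - len(left) / n_t * gini(left) - len(right) / n_t * gini(right)


def mdi_importance(splits, n_total):
    """Equation (6) per feature, then normalized so the importances sum to 1."""
    raw = {}
    for feature, parent, left, right in splits:
        weighted = len(parent) / n_total * impurity_decrease(parent, left, right)
        raw[feature] = raw.get(feature, 0.0) + weighted
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total else raw


# Toy tree over 8 pipes (condition labels): a root split on age, then a split on material.
root = [1, 1, 1, 1, 3, 3, 5, 5]
splits = [
    ("age", root, [1, 1, 1, 1], [3, 3, 5, 5]),
    ("material", [3, 3, 5, 5], [3, 3], [5, 5]),
]
print(mdi_importance(splits, n_total=len(root)))  # age ≈ 0.6, material ≈ 0.4
```

The root split removes 0.375 units of Gini impurity across all eight pipes, while the material split removes 0.5 units but only over half of them (weight 4/8), which is why age ends up with the larger share.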

In summary, this research advances the condition assessment of urban pipes by integrating a diverse stack of machine learning models, combining tree-based and neural network approaches within a meta-learning framework. Unlike prior studies that focused on classic or single-ensemble models, our workflow systematically benchmarks stacking models, which demonstrate superior performance on water pipe data, reveal domain-driven feature patterns, and yield interpretable decision rules. The principal novelty lies in the comprehensive ensemble architecture with multiple meta-learner candidates, which directly addresses prior gaps in feature interaction learning and transparency. This architecture delivers high predictive accuracy together with interpretability, enabling urban utilities and asset managers to implement data-driven, generalizable, and actionable assessments for maintenance prioritization and long-term infrastructure resilience.

4. Data Exploration

The correlation analysis in Figure 2 reveals feature interdependencies that inform both data preprocessing and model development. The feature correlation matrix shows weak-to-moderate positive correlations between infrastructure characteristics, notably diameter–material (r = 0.24) and diameter–slope (r = 0.47), suggesting that larger-diameter pipes tend to be associated with specific materials and terrain conditions. Notable negative correlations appear between the thaw index and soil properties (r = −0.38) and between the thaw index and slope (r = −0.55), indicating that environmental factors interact in consistent patterns across the network. These statistical relationships align with engineering principles, in which soil composition, environmental conditions, and pipe specifications collectively influence system performance and degradation.
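The text does not name the correlation coefficient used for Figure 2; assuming it is Pearson's r, this sketch shows the computation. It also illustrates a caveat relevant to the diameter–material entry: for a label-encoded categorical feature, r depends on the arbitrary integer codes, so reassigning the codes (hypothetical here) changes the coefficient even though the pipes are the same.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant feature")
    return cov / math.sqrt(ss_x * ss_y)


diameter = [100, 150, 200, 300]
material_a = [0, 1, 1, 2]  # hypothetical coding: PVC=0, MDPE=1, AC=2
material_b = [1, 0, 0, 2]  # same pipes, hypothetical coding: PVC=1, MDPE=0, AC=2
r_a = pearson_r(diameter, material_a)
r_b = pearson_r(diameter, material_b)
print(round(r_a, 3), round(r_b, 3))  # 0.956 0.561
```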

The three-dimensional visualization of pipe condition in Figure 3 illustrates the multifactorial nature of infrastructure deterioration across age (X), diameter (Y), and length (Z). The 3D scatter plot reveals distinct clustering patterns: Condition 1 pipes (excellent condition) dominate ages below 20 years, while deteriorated conditions (Classes 4 and 5) appear predominantly in pipes older than about 40–60 years. Diameter–length relationships are heterogeneous across condition classes, with larger-diameter pipes showing varied condition states regardless of length, which suggests that age remains the primary deterioration driver. These patterns are consistent with domain knowledge on infrastructure lifecycle management and provide empirical support for the high importance of age in the subsequent machine learning pipeline, helping to keep model predictions aligned with established engineering deterioration principles.

The material frequency analysis in Figure 4, displayed on a logarithmic scale, shows the hierarchical dominance of modern polymer-based materials in contemporary water infrastructure systems. Medium-Density Polyethylene (MDPE) represents the overwhelming majority of pipe installations, exceeding 4000 installations and reflecting current industry standards for durability and cost-effectiveness. Unplasticized Polyvinyl Chloride (UPVC) and standard Polyvinyl Chloride (PVC) follow as secondary materials with approximately 1500 and 1000 installations, respectively, while traditional materials, including Asbestos Cement (AC), Cast Iron (CI), and Steel (ST), each account for fewer than 100 installations. This material distribution is consistent with the transition from metallic and cement-based systems to modern polymer solutions and provides essential context for material-based feature importance in deterioration prediction models.

Visualization of age distribution in Figure 5 reveals insights into infrastructure lifecycle patterns across the urban water network. The age histogram demonstrates a characteristic bimodal distribution with peak frequencies concentrated in the 0–20 year range, indicating substantial recent infrastructure investment and renewal activities. A secondary frequency peak emerges around 40–50 years, reflecting historical construction periods, while the logarithmic scale effectively captures the exponential decay in pipe counts for assets exceeding 60 years. This age profile suggests a system undergoing modernization, with the majority of infrastructure representing contemporary installation practices, while legacy components provide critical insights into long-term deterioration patterns for predictive modeling applications.

Building on the age distribution analysis, the kernel density estimation in Figure 6 reveals age–condition relationships that support the observed deterioration patterns. Condition 1 pipes exhibit sharp density peaks at younger ages (below 20 years), consistent with excellent condition ratings for newly installed infrastructure. Conditions 2, 3, and 4 peak at approximately 40, 50, and 50 years, respectively, indicating gradual deterioration through mid-life phases. Most significantly, Condition 5 shows a pronounced density concentration beyond 65 years, supporting age as the primary deterioration predictor and confirming the time-dependent degradation patterns essential for predictive modeling.

The comprehensive pairplot in Figure 7 reveals multivariate relationships across the four primary features (Diameter, Material, Age, Length), colored by condition class. This analysis demonstrates the complexity of condition prediction, with substantial overlap between condition categories across most feature combinations, particularly evident in the scatter plot matrices. The diagonal density plots reveal distinct age-based separation patterns, where Condition 1 concentrates in younger age ranges while Conditions 4 and 5 extend into older infrastructure. However, diameter-length relationships show inter-condition mixing, highlighting the multifactorial nature of infrastructure deterioration and justifying the need for ensemble learning approaches to capture these complex, non-linear feature interactions.

The boxplot analysis in Figure 8 reveals distinct patterns that support the use of these parameters for condition assessment. Age shows the strongest discriminative power: pipes in excellent condition have younger median ages, and the medians increase progressively through the deteriorated conditions, establishing age as the primary indicator. Diameter distributions show consistent medians but increasing variability in deteriorated conditions, suggesting that extreme diameters may be associated with degradation. Material distributions reveal heterogeneous patterns, with certain types associated with specific deterioration pathways, confirming the multifactorial nature of deterioration, which requires sophisticated algorithmic treatment.

5. Results

The hybrid machine learning meta-model developed for urban underground pipe condition assessment was evaluated on the dataset of 11,544 water pipe records, partitioned into 9235 training samples (80%) and 2309 testing samples (20%) using stratified sampling to preserve the class distribution. The dataset includes seven key features (diameter, material, age, length, thaw index, soil properties, and slope), with condition ratings ranging from 1 (excellent) to 5 (poor). This section presents model performance comparisons, ROC analysis, feature importance results, and the confusion matrix of the proposed stacking meta-model architecture. The results demonstrate the superiority of the neural network-based ensemble approach over the individual algorithms and identify age as the most influential factor affecting pipe deterioration, followed by length, material, and diameter.
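The 80/20 counts follow from rounding the fractional test size up: 0.2 × 11,544 = 2308.8, which becomes 2309 test records and leaves 9235 for training, the same rounding that scikit-learn's `train_test_split` applies. This pure-Python sketch reproduces a stratified split with a largest-remainder allocation of per-class test counts; the class counts are hypothetical, and this allocation is one reasonable scheme rather than necessarily scikit-learn's exact procedure.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    rng = random.Random(seed)
    n = len(labels)
    n_test = math.ceil(test_size * n)  # fractional test sizes are rounded up
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    # Largest-remainder allocation: proportional per-class quotas that sum to n_test exactly.
    quota = {c: len(idx) * n_test / n for c, idx in by_class.items()}
    alloc = {c: math.floor(q) for c, q in quota.items()}
    shortfall = n_test - sum(alloc.values())
    for c in sorted(quota, key=lambda c: quota[c] - alloc[c], reverse=True)[:shortfall]:
        alloc[c] += 1
    test_idx = []
    for c, idx in by_class.items():
        rng.shuffle(idx)
        test_idx.extend(idx[:alloc[c]])
    in_test = set(test_idx)
    train_idx = [i for i in range(n) if i not in in_test]
    return train_idx, sorted(test_idx)


# Hypothetical class counts (not the paper's distribution) that total 11,544 records.
labels = [1] * 6000 + [2] * 2500 + [3] * 1800 + [4] * 900 + [5] * 344
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9235 2309
```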

The comparative performance analysis in Figure 9 shows that the meta-learner outperforms the individual models on all evaluation metrics. The stacking ensemble achieves an accuracy of 96.67%, an improvement over the individual base models: Random Forest (96.10%), LightGBM (96.15%), and CatBoost (96.58%). The meta-model also achieves consistently high precision (96.47%), recall (96.67%), and F1-score (96.38%), indicating strong overall predictive capability; per-class performance, including the smaller intermediate classes, is examined through the ROC curves in Figure 10 and the confusion matrix in Figure 13.

The performance gain supports the effectiveness of the hybrid meta-learning approach, in which the meta-learner integrates diverse algorithmic strengths while mitigating individual model limitations. The marginal but consistent improvements across all metrics suggest that the ensemble captures complex feature interactions, particularly when distinguishing intermediate condition classes, where deterioration patterns exhibit subtle variations.
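Because the test split contains 2309 pipes, the reported accuracies can be converted into numbers of correctly classified pipes. Assuming the percentages in Figure 9 are computed on that split and rounded to two decimals, the following sketch recovers the implied counts and checks that each count reproduces its percentage.

```python
N_TEST = 2309
reported_accuracy = {  # percentages as reported for Figure 9
    "Stacking meta-model": 96.67,
    "CatBoost": 96.58,
    "LightGBM": 96.15,
    "Random Forest": 96.10,
}
implied_correct = {name: round(acc / 100 * N_TEST) for name, acc in reported_accuracy.items()}
for name, correct in implied_correct.items():
    # Sanity check: the integer count reproduces the reported two-decimal percentage.
    assert round(100 * correct / N_TEST, 2) == reported_accuracy[name]
extra_correct = {name: implied_correct["Stacking meta-model"] - correct
                 for name, correct in implied_correct.items() if name != "Stacking meta-model"}
print(extra_correct)  # {'CatBoost': 2, 'LightGBM': 12, 'Random Forest': 13}
```

Under this assumption, the meta-model classifies about two more pipes correctly than CatBoost and 12 to 13 more than LightGBM and Random Forest, which quantifies the "marginal but consistent" gain.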

The receiver operating characteristic (ROC) curves in Figure 10 show strong discriminative performance across all models and condition classes, with micro-averaged AUC values exceeding 0.997 for the meta-model. At the class level, Condition 1 (excellent) and Condition 5 (poor) achieve near-perfect AUC, while the intermediate conditions (2–4) maintain AUC scores above 0.95, indicating reliable multi-class discrimination even for subtle deterioration states.

The ROC analysis validates the meta-model’s capacity to maintain high sensitivity and specificity simultaneously across all condition categories. The consistently superior performance of the stacking ensemble compared to individual base learners demonstrates enhanced robustness against false positive and false negative classifications, critical for infrastructure management decisions where misclassification costs can impact maintenance resource allocation and system reliability.

The feature importance analysis in Figure 11 consistently identifies age as the dominant predictor across all models, accounting for approximately 38.5% of the total normalized importance, followed by length (22.6%), material (15.6%), and diameter (13.2%). However, notable algorithmic variations emerge in feature weighting, reflecting distinct model architectures and learning mechanisms. Random Forest exhibits a balanced importance distribution across its ensemble of decision trees, while LightGBM shows greater sensitivity to age-related patterns due to its gradient boosting optimization, which iteratively focuses on difficult cases where temporal deterioration is most pronounced.

CatBoost assigns material a higher importance than the other models do, which can be attributed to its native handling of categorical features; this lets it capture material-specific deterioration pathways without extensive preprocessing. The meta-model shows intermediate importance values that blend those of the individual models while preserving the overall ranking. These differences point to complementary learning approaches: Random Forest's random feature subsampling captures diverse feature interactions, LightGBM's gradient-based optimization emphasizes age-related patterns, and CatBoost's categorical handling sharpens material discrimination. Together, they justify an ensemble in which diverse algorithmic perspectives improve overall predictive robustness.
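
CatBoost's documentation describes its native categorical handling in terms of ordered target statistics: a category is replaced by a smoothed average of the target over rows that come earlier in a random ordering, so no row's own label leaks into its encoding. The Python sketch below is a simplified illustration with hypothetical material codes and a hypothetical binary "poor condition" target, assuming the rows are already randomly ordered; CatBoost itself adds multiple permutations, feature combinations, and multi-class handling, so this is not its implementation.

```python
from typing import Dict, List, Sequence


def ordered_target_statistics(
    categories: Sequence[str], targets: Sequence[float], prior: float, a: float = 1.0
) -> List[float]:
    """Encode each row's category from the targets of *earlier* rows with the same
    category only, smoothed toward `prior` with weight `a` (avoids target leakage)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    encoded: List[float] = []
    for cat, t in zip(categories, targets):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + a * prior) / (n + a))  # uses history before this row
        sums[cat] = s + t                          # then add this row to the history
        counts[cat] = n + 1
    return encoded


materials = ["PVC", "CI", "PVC", "CI", "PVC"]  # hypothetical material codes
poor = [0, 1, 1, 1, 0]                         # hypothetical target: 1 = poor condition
print(ordered_target_statistics(materials, poor, prior=0.5))  # [0.5, 0.5, 0.25, 0.75, 0.5]
```

The third row (PVC) is encoded from the first PVC row alone, (0 + 1·0.5)/(1 + 1) = 0.25; its own target of 1 does not influence its encoding.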

Despite these model-specific variations, the fact that every algorithm identifies age as the primary predictor supports the robustness of this finding and is consistent with age-related deterioration being the main physical process behind pipe condition, although feature importance alone does not establish causation. The remaining features together account for roughly 61.5% of the normalized importance (100% − 38.5%), of which length, material, and diameter contribute 51.4%. This supports a multi-factorial modeling approach: comprehensive condition assessment requires integrating temporal, physical, and environmental variables through diverse algorithmic lenses.

The consolidated feature importance ranking in Figure 12 confirms age as the predominant condition predictor, with a normalized importance of 0.385, identifying temporal factors as the primary deterioration driver across all modeling approaches. Secondary features make meaningful but smaller contributions, namely length (0.226), material (0.156), and diameter (0.132), while environmental factors (thaw index, soil, slope) provide supplementary predictive value and together account for the remaining normalized importance of about 0.101.
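
Averaging importances across models only makes sense after each model's scores are put on a common scale, since tree libraries report importances in different units. A minimal Python sketch, assuming sum-to-one normalization followed by a plain mean (the aggregation rule is not given in this section), is shown below; the toy scores are hypothetical, while the last lines simply check the arithmetic behind the environmental factors' remaining share.

```python
from typing import Dict


def normalize(importances: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's raw importances so they sum to 1 (models report
    importances on different scales, e.g. split counts vs. impurity gains)."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


def average_importance(per_model: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Mean of the normalized importances; a feature missing from a model counts as 0."""
    normalized = [normalize(imp) for imp in per_model.values()]
    features = sorted({f for imp in normalized for f in imp})
    return {f: sum(imp.get(f, 0.0) for imp in normalized) / len(normalized) for f in features}


# Hypothetical raw scores on different scales (not the study's values).
toy = {
    "model_a": {"age": 4.0, "length": 2.0, "material": 2.0},
    "model_b": {"age": 0.5, "length": 0.3, "material": 0.2},
}
avg = average_importance(toy)
print({f: round(v, 3) for f, v in avg.items()})  # {'age': 0.5, 'length': 0.275, 'material': 0.225}

# Reported Figure 12 averages; the remainder belongs to the environmental factors.
reported = {"age": 0.385, "length": 0.226, "material": 0.156, "diameter": 0.132}
remainder = 1.0 - sum(reported.values())
print(round(remainder, 3))  # 0.101
```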

The confusion matrix in Figure 13 shows 1989 correctly classified Condition 1 pipes, with misclassifications concentrated in the intermediate classes. The meta-model performs well on the critical condition states: for Condition 5 (poor), 75 of 79 cases are classified correctly (94.9% recall), and Condition 1 (excellent) is predicted with near-perfect precision. The intermediate conditions remain concentrated on the diagonal but with lower recall: 85 of 109 Condition 2 pipes (78.0%) and 73 of 98 Condition 3 pipes (74.5%) are classified correctly.
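
Per-class recall follows directly from the counts reported for Figure 13: the diagonal count divided by the number of pipes truly in that class. The Python sketch below reproduces those three figures and, using a hypothetical two-class matrix, shows how precision, recall, and F1 are read off a confusion matrix and why a support-weighted F1 can sit well above the macro F1 when one class dominates, as Condition 1 does here. Only the three (correct, total) pairs come from the text; the full Figure 13 matrix is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple


def recall_from_counts(correct: int, total: int) -> float:
    """Per-class recall (sensitivity) in percent: correctly classified / true members."""
    return 100.0 * correct / total


def per_class_metrics(cm: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """Precision, recall, F1, and support per class; rows of `cm` are true
    classes and columns are predicted classes."""
    k = len(cm)
    out = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, but truly another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out.append({"precision": precision, "recall": recall, "f1": f1, "support": tp + fn})
    return out


def macro_and_weighted_f1(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Macro F1 weights every class equally; weighted F1 weights by class support."""
    metrics = per_class_metrics(cm)
    macro = sum(m["f1"] for m in metrics) / len(metrics)
    total = sum(m["support"] for m in metrics)
    weighted = sum(m["f1"] * m["support"] for m in metrics) / total
    return macro, weighted


# Reported Figure 13 counts: (correctly classified, true members of the class).
reported = {"Condition 5": (75, 79), "Condition 2": (85, 109), "Condition 3": (73, 98)}
for label, (correct, total) in reported.items():
    print(label, round(recall_from_counts(correct, total), 1))
# Condition 5 94.9
# Condition 2 78.0
# Condition 3 74.5

# Hypothetical two-class matrix (not from the study) with a dominant class.
macro, weighted = macro_and_weighted_f1([[90, 0], [5, 5]])
print(round(macro, 3), round(weighted, 3))  # 0.82 0.942
```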

The concentration of counts along the main diagonal confirms the meta-model's strong discriminative capability and practical utility for infrastructure management. Off-diagonal counts are small relative to the dominant Condition 1 class, but 22.0% of Condition 2 pipes (24 of 109) and 25.5% of Condition 3 pipes (25 of 98) are misclassified, so separating the intermediate condition states remains the main source of error. This distinction is critical for accurate maintenance prioritization and resource allocation in urban water network management.

The principal component analysis (PCA) in Figure 14 shows distinct decision regions for all models when projected into two-dimensional space, with the meta-learner producing the most refined classification regions. The projection clearly separates excellent pipes (Condition 1) from deteriorated states (Conditions 4–5), while the intermediate conditions overlap, as expected given the gradual nature of infrastructure deterioration. Because a two-component projection discards part of the feature variance, these boundaries are a qualitative visualization rather than the models' exact decision surfaces.
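
The two-dimensional view in Figure 14 comes from projecting the feature space onto its first two principal components. The dependency-free Python sketch below, offered as an illustration rather than the study's code, finds those components by power iteration with deflation on the covariance matrix and projects rows into the plane; it assumes the features are already numeric and scaled. A library routine such as scikit-learn's `PCA(n_components=2)` would normally be used, and drawing decision regions additionally requires mapping a grid of plane points back to model inputs or refitting on the two components, which is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def pca_2d(X: Matrix, iters: int = 500) -> Tuple[List[float], Matrix]:
    """Return column means and up to two principal directions of X, found by
    power iteration on the covariance matrix with deflation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    components: Matrix = []
    for _ in range(min(2, d)):
        v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-zero start vector
        for u in components:                   # start orthogonal to earlier components
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < 1e-12:
            break
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                   # no variance left in this direction
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return means, components


def project(row: List[float], means: List[float], components: Matrix) -> List[float]:
    """Coordinates of one row in the principal-component plane."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # hypothetical collinear data
means, comps = pca_2d(points)
print([project(p, means, comps) for p in points])
```

For these collinear toy points the first component is, up to sign, the unit diagonal direction and every second coordinate is essentially zero, since all the variance lies along one axis.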

Permutation importance analysis of the meta-features in Figure 15 shows that the meta-learner relies mainly on two base learners: LightGBM (56.3%) provides the strongest individual contribution, followed by Random Forest (43.5%). If these shares are normalized to 100%, the two together account for about 99.8%, leaving CatBoost a marginal share of roughly 0.2%, which suggests that most of its predictive information overlaps with that of the other two learners once they are present, even though it may still help on specific complex cases. This distribution is consistent with the ensemble design philosophy, in which diverse algorithmic strengths combine through complementary learning approaches.
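
A per-learner share such as the LightGBM and Random Forest figures above can be obtained by shuffling a base learner's probability columns together and measuring how much the meta-model's score drops. The Python sketch below implements that grouped variant of permutation importance under stated assumptions: accuracy as the score, a hypothetical meta-model that reads only one column, and made-up meta-features. Whether Figure 15 grouped columns this way, or which score it used, is not stated; scikit-learn's `sklearn.inspection.permutation_importance` offers the standard per-column version.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    groups: Dict[str, List[int]],
    n_repeats: int = 10,
    seed: int = 0,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Mean accuracy drop when all columns of a group (e.g. one base learner's
    class probabilities) are shuffled together, plus each group's share of the
    total positive drop."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(r) for r in X])
    drops: Dict[str, float] = {}
    for name, cols in groups.items():
        total = 0.0
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [list(r) for r in X]
            for i, j in enumerate(order):  # same row permutation for every column in the group
                for c in cols:
                    X_perm[i][c] = X[j][c]
            total += baseline - accuracy(y, [predict(r) for r in X_perm])
        drops[name] = total / n_repeats
    positive = sum(max(d, 0.0) for d in drops.values())
    shares = {k: (max(d, 0.0) / positive if positive > 0 else 0.0) for k, d in drops.items()}
    return drops, shares


def toy_meta_model(r: Row) -> int:
    """Hypothetical meta-model that only looks at column 0."""
    return int(r[0] > 0.5)


meta_X = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.7], [0.2, 0.1], [0.7, 0.4], [0.3, 0.6]]
meta_y = [1, 0, 1, 0, 1, 0]
drops, shares = grouped_permutation_importance(
    toy_meta_model, meta_X, meta_y, {"learner_a": [0], "learner_b": [1]}
)
print(shares)
```

Because the toy model ignores column 1, shuffling it never changes a prediction, so `learner_b` receives exactly zero importance; a near-zero share in a real meta-model likewise means that input adds little beyond the others.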

Partial dependence analysis in Figure 16 shows how individual base-learner predictions influence the meta-model's decisions. The meta-model's dependence on Random Forest predictions increases above a probability of 0.6; it is most sensitive to LightGBM predictions in the 0.4–0.8 range, with the steepest response around 0.6; and its response to CatBoost predictions remains steady across the full probability range. The vertical black lines mark where each base model's probability predictions are concentrated, that is, the regions in which the partial dependence estimates are most statistically reliable. These patterns indicate that the meta-learner draws on complementary strengths of the base models.
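
Partial dependence curves like those in Figure 16 are computed by fixing one meta-feature at a grid value for every row, predicting, and averaging. The Python sketch below illustrates that definition with a hypothetical linear meta-model and toy inputs (not the study's network), and computes decile cut points of the observed values, one common way to mark where the data are concentrated; scikit-learn's `PartialDependenceDisplay`, for instance, draws deciles as short vertical lines. Whether the black lines in Figure 16 are deciles is an assumption.

```python
import statistics
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(
    predict_proba: Callable[[Row], float], X: List[Row], feature: int, grid: Sequence[float]
) -> List[float]:
    """For each grid value, overwrite `feature` in every row, predict, and average.
    The result traces the model's average response to that one input."""
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = value
            preds.append(predict_proba(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def data_deciles(X: List[Row], feature: int) -> List[float]:
    """Decile cut points of the observed feature values; partial dependence is
    most trustworthy where these marks are dense."""
    return statistics.quantiles([row[feature] for row in X], n=10)


def meta_model(r: Row) -> float:
    """Hypothetical meta-model: a fixed blend of two base-learner probabilities."""
    return 0.5 * r[0] + 0.5 * r[1]


X_meta = [[0.2, 0.4], [0.6, 0.8]]
print(partial_dependence(meta_model, X_meta, feature=0, grid=[0.0, 0.5, 1.0]))  # about [0.3, 0.55, 0.8]
```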

The results indicate that the proposed hybrid meta-learning framework advances urban underground pipe condition assessment, reaching an accuracy of 96.67% through systematic integration of diverse machine learning paradigms. Consistent performance across multiple evaluation metrics, stable feature importance patterns that identify age as the primary deterioration predictor, and strong multi-class discrimination support the practical applicability of this approach to real-world infrastructure management. Combining Random Forest, LightGBM, and CatBoost base learners through meta-learning yields both high predictive accuracy and interpretable insights, giving municipalities and utility providers a reliable, data-driven tool for optimizing maintenance strategies and enhancing urban infrastructure resilience.

6. Discussion

The results show that the proposed hybrid machine learning meta-model advances urban underground pipe condition assessment. The stacking meta-model achieves 96.67% accuracy, surpassing the individual base learners by integrating the complementary strengths of three tree-based methods (Random Forest, LightGBM, CatBoost) within a unified ensemble framework. The averaged feature importance analysis identifies age as the dominant predictor (38.5%), followed by length (22.6%), material (15.6%), and diameter (13.2%), while environmental factors contribute smaller but meaningful influences. Analyses of feature correlations and distributions [60] further confirm the multi-dimensional nature of pipe degradation and reinforce the need for sophisticated machine learning approaches. The model's strong performance across all condition classes, confirmed through confusion matrix and ROC curve [61] analyses, highlights its practical applicability to data-driven infrastructure management. These findings provide a foundation for intelligent maintenance strategies and optimized resource allocation in urban water networks.

The fusion of diverse algorithmic paradigms within the meta-model architecture is a methodological contribution to the field. By combining complementary learning mechanisms from different tree-ensemble strategies (Random Forest's ensemble diversity, LightGBM's gradient optimization, and CatBoost's categorical handling), this hybrid approach delivers both high accuracy and interpretable results. Beyond water pipe assessment, the meta-learning framework could be adapted to other infrastructure evaluation problems, which would make it a versatile and scalable data-driven decision-support tool.

The evaluation uses stratified data splitting and class-sensitive metrics (macro- and micro-averaged AUC, weighted F1-score) to mitigate potential class-imbalance effects. The high discrimination achieved across all condition classes suggests that the observed performance reflects genuine learnable structure in tabular operational data rather than an artifact of class skew or single-feature dominance. Direct comparison with earlier studies is limited because their tasks, datasets, and metrics differ, but the results compare favorably with related work: Mohammadagha et al. (2025) reported R² = 0.9066 for ANN models on reinforced concrete sewer pipes [45], and Mosavi et al. (2020) reached 86% accuracy for groundwater potential prediction using Random Forest [42]. The consistent ranking of age as the primary predictor corroborates the findings of Dawood et al. (2020) in their comprehensive review of AI methods [47]. Unlike some previous studies that focus on other algorithmic approaches, this research shows that systematic meta-model integration can overcome individual model limitations while preserving interpretability.
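
Stratified splitting keeps each condition class at roughly the same proportion in the training and test sets, which matters when one class (Condition 1) dominates. The stdlib-only Python sketch below shows the idea with hypothetical labels and an assumed 20% test fraction; the study's actual split ratio and tooling are not given in this section (scikit-learn's `train_test_split` with its `stratify` argument is a common choice).

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices so each class keeps (approximately) the same
    proportion in the train and test sets."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)                              # random choice within each class
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Hypothetical imbalanced labels: 80 pipes in class 1, 15 in class 3, 5 in class 5.
labels = [1] * 80 + [3] * 15 + [5] * 5
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))  # Counter({1: 16, 3: 3, 5: 1})
```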

Vision-based approaches reviewed by Rayhana et al. (2021) achieved up to 98% detection accuracy but required large image datasets (up to 2 million images) [44]. The proposed framework addresses a different task, condition rating rather than image-based defect detection, yet achieves comparable accuracy from structured operational data, with likely lower data acquisition and computational requirements and therefore greater practical deployability in resource-constrained municipal environments.

7. Conclusions

This research shows that hybrid machine learning meta-models can advance urban underground pipe condition assessment. The stacking ensemble achieves 96.67% accuracy by systematically integrating Random Forest, LightGBM, and CatBoost, surpassing the individual models while maintaining interpretability. The evaluation on 11,544 pipe records identifies age as the primary deterioration predictor (38.5% of normalized importance), followed by physical characteristics. Robust multi-class discrimination across all condition states (0.894 ≤ AUC ≤ 0.998 across models and classes) supports the framework's practical applicability to infrastructure management and provides a foundation for data-driven maintenance strategies. Beyond predictive accuracy, the methodological contributions include enhanced model interpretability and systematic algorithmic integration. Because the framework relies only on readily available operational data (diameter, material, age, length, environmental factors), it has the potential to scale across diverse urban networks.

Author Contributions

Conceptualization, M.M.; methodology, M.M.; software, M.M.; validation, M.M.; formal analysis, M.M.; investigation, M.M.; resources, M.M.; data curation, M.M.; writing—original draft preparation, M.M.; writing—review and editing, M.M., M.N., V.K. and A.J.; visualization, M.M.; supervision, M.N., V.K. and A.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the author (M.M.) on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tee, K.F.; Khan, L.R.; Chen, H.P.; Alani, A.M. Reliability based life cycle cost optimization for underground pipeline networks. Tunn. Undergr. Space Technol. 2014, 43, 32–40.
2. Najafi, M.; Gokhale, S.B. Trenchless Technology: Pipeline and Utility Design, Construction, and Renewal; McGraw-Hill: New York, NY, USA, 2022; Available online: https://cir.nii.ac.jp/crid/1130000794815910144 (accessed on 19 July 2025).
3. Rogers, C.D.F.; Hao, T.; Costello, S.B.; Burrow, M.P.N.; Metje, N.; Chapman, D.N.; Parker, J.; Armitage, R.J.; Anspach, J.H.; Muggleton, J.M.; et al. Condition assessment of the surface and buried infrastructure—A proposal for integration. Tunn. Undergr. Space Technol. 2012, 28, 202–211.
4. Najafi, M.; Kulandaivel, G. Pipeline Condition Prediction Using Neural Network Models. In Proceedings of the Pipeline Division Specialty Conference, Houston, TX, USA, 21–24 August 2005; pp. 767–781.
5. Thomas, M.; Dorian, B.; Iguchi, T.; Chini, C. Code and data for “Comparing Methods for Water Distribution System Condition Assessment and Forecasting”. Zenodo. 2025. Available online: https://zenodo.org/records/15770108 (accessed on 19 July 2025).
6. Hao, T.; Rogers, C.; Metje, N.; Chapman, D.; Muggleton, J.; Foo, K.; Wang, P.; Pennock, S.; Atkins, P.; Swingler, S.; et al. Condition assessment of the buried utility service infrastructure. Tunn. Undergr. Space Technol. 2012, 28, 331–344.
7. Sun, L.; Shang, Z.; Xia, Y.; Bhowmick, S.; Nagarajaiah, S. Review of Bridge Structural Health Monitoring Aided by Big Data and Artificial Intelligence: From Condition Assessment to Damage Detection. J. Struct. Eng. 2020, 146, 04020073.
8. Banchi, L.; Pereira, J.; Pirandola, S. Generalization in Quantum Machine Learning: A Quantum Information Standpoint. PRX Quantum 2021, 2, 040321.
9. Munikoti, S.; Agarwal, D.; Das, L.; Halappanavar, M.; Natarajan, B. Challenges and Opportunities in Deep Reinforcement Learning With Graph Neural Networks: A Comprehensive Review of Algorithms and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 15051–15071.
10. Khan, Z.; Zayed, T.; Moselhi, O. Structural Condition Assessment of Sewer Pipelines. J. Perform. Constr. Facil. 2010, 24, 170–179.
11. Chen, M.; Challita, U.; Saad, W.; Yin, C.; Debbah, M. Artificial Neural Networks-Based Machine Learning for Wireless Networks: A Tutorial. IEEE Commun. Surv. Tutor. 2019, 21, 3039–3071.
12. Asselman, A.; Khaldi, M.; Aammou, S. Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interact. Learn. Environ. 2023, 31, 3360–3379.
13. Liu, Y.; Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G.; Tan, K.C. A Survey on Evolutionary Neural Architecture Search. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 550–570.
14. Berahmand, K.; Daneshfar, F.; Salehi, E.S.; Li, Y.; Xu, Y. Autoencoders and their applications in machine learning: A survey. Artif. Intell. Rev. 2024, 57, 28.
15. Chauhan, N.K.; Singh, K. A review on conventional machine learning vs deep learning. In Proceedings of the 2018 International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 28–29 September 2018; pp. 347–352.
16. Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 2019, 110, 12–22.
17. Sinha, S.K.; Fieguth, P.W. Neuro-fuzzy network for the classification of buried pipe defects. Autom. Constr. 2006, 15, 73–83.
18. Smith, M.R.; Martinez, T.; Giraud-Carrier, C. An instance level analysis of data complexity. Mach. Learn. 2014, 95, 225–256.
19. Azevedo, B.F.; Rocha, A.M.A.C.; Pereira, A.I. Hybrid approaches to optimization and machine learning methods: A systematic literature review. Mach. Learn. 2024, 113, 4055–4097.
20. Tyralis, H.; Papacharalampous, G.; Langousis, A. Super ensemble learning for daily streamflow forecasting: Large-scale demonstration and comparison with multiple machine learning algorithms. Neural Comput. Appl. 2021, 33, 3053–3068.
21. Russo, D.P.; Zorn, K.M.; Clark, A.M.; Zhu, H.; Ekins, S. Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol. Pharm. 2018, 15, 4361–4370.
22. Dluhoš, P.; Schwarz, D.; Cahn, W.; van Haren, N.; Kahn, R.; Španiel, F.; Horáček, J.; Kašpárek, T.; Schnack, H. Multi-center machine learning in imaging psychiatry: A meta-model approach. Neuroimage 2017, 155, 10–24.
23. Vanschoren, J. Automated Machine Learning: Methods, Systems, Challenges; The Springer Series on Challenges in Machine Learning; Springer, 2019. Available online: http://www.springer.com/series/15602 (accessed on 3 September 2025).
24. Zhang, P.; Wu, H.N.; Chen, R.P.; Chan, T.H.T. Hybrid meta-heuristic and machine learning algorithms for tunneling-induced settlement prediction: A comparative study. Tunn. Undergr. Space Technol. 2020, 99, 103383.
25. Aslam, B.; Maqsoom, A.; Cheema, A.H.; Ullah, F.; Alharbi, A.; Imran, M. Water Quality Management Using Hybrid Machine Learning and Data Mining Algorithms: An Indexing Approach. IEEE Access 2022, 10, 119692–119705.
26. Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818.
27. Yan, J.; Xu, Y.; Cheng, Q.; Jiang, S.; Wang, Q.; Xiao, Y.; Ma, C.; Yan, J.; Wang, X. LightGBM: Accelerated genomically designed crop breeding through ensemble learning. Genome Biol. 2021, 22, 271.
28. Ibrahim, A.A.; Ridwan, R.L.; Muhammed, M.M.; Abdulaziz, R.O.; Saheed, G.A. Comparison of the CatBoost Classifier with other Machine Learning Methods. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 738–748.
29. Tang, Y.; Kurths, J.; Lin, W.; Ott, E.; Kocarev, L. Introduction to Focus Issue: When machine learning meets complex systems: Networks, chaos, and nonlinear dynamics. Chaos 2020, 30, 063151.
30. Kabir, G.; Sadiq, R.; Tesfamariam, S. A review of multi-criteria decision-making methods for infrastructure management. Struct. Infrastruct. Eng. 2014, 10, 1176–1210.
31. Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics. Electronics 2021, 10, 593.
32. Yin, M.; Vaughan, J.W.; Wallach, H. Understanding the effect of accuracy on trust in machine learning models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK, 4–9 May 2019; Association for Computing Machinery: New York, NY, USA, 2019.
33. Michaud, E.J.; Liu, Z.; Tegmark, M. Precision Machine Learning. Entropy 2023, 25, 175.
34. Davis, J.; Goadrich, M. The Relationship Between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
35. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. AAAI Workshop Tech. Rep. 2006, WS-06-06, 1015–1021.
36. Asare, K.O.; Terhorst, Y.; Vega, J.; Peltonen, E.; Lagerspetz, E.; Ferreira, D. Predicting depression from smartphone behavioral markers using machine learning methods, hyperparameter optimization, and feature importance analysis: Exploratory study. JMIR Mhealth Uhealth 2021, 9, e26540.
37. Wang, M.; Luo, H.; Cheng, J.C.P. Towards an automated condition assessment framework of underground sewer pipes based on closed-circuit television (CCTV) images. Tunn. Undergr. Space Technol. 2021, 110, 103840.
38. Pollice, R.; Gomes, G.d.P.; Aldeghi, M.; Hickman, R.J.; Krenn, M.; Lavigne, C.; Lindner-D’addario, M.; Nigam, A.; Ser, C.T.; Yao, Z.; et al. Data-Driven Strategies for Accelerated Materials Design. Acc. Chem. Res. 2021, 54, 849–860.
39. Cheng, J.; Yang, Y.; Tang, X.; Xiong, N.; Zhang, Y.; Lei, F. Generative Adversarial Networks: A Literature Review. KSII Trans. Internet Inf. Syst. (TIIS) 2020, 14, 4625–4647.
40. Yuan, F.-G.; Zargar, S.A.; Chen, Q.; Wang, S. Machine learning for structural health monitoring: Challenges and opportunities. Sens. Smart Struct. Technol. Civ. Mech. Aerosp. Syst. 2020, 11379, 1137903.
41. Liu, Z.; Kleiner, Y. State of the art review of inspection technologies for condition assessment of water pipes. Measurement 2013, 46, 1–15.
42. Mosavi, A.; Hosseini, F.S.; Choubin, B.; Goodarzi, M.; Dineva, A.A.; Sardooi, E.R. Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction. Water Resour. Manag. 2021, 35, 23–37.
43. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
44. Rayhana, R.; Jiao, Y.; Zaji, A.; Liu, Z. Automated Vision Systems for Condition Assessment of Sewer and Water Pipelines. IEEE Trans. Autom. Sci. Eng. 2021, 18, 1861–1878.
45. Mohammadagha, M.; Najafi, M.; Kaushal, V.; Jibreen, A. Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study. 2025. Available online: https://arxiv.org/pdf/2502.00363 (accessed on 19 July 2025).
46. Latif, J.; Shakir, M.Z.; Edwards, N.; Jaszczykowski, M.; Ramzan, N.; Edwards, V. Review on condition monitoring techniques for water pipelines. Measurement 2022, 193, 110895.
47. Dawood, T.; Elwakil, E.; Novoa, H.M.; Delgado, J.F.G. Artificial intelligence for the modeling of water pipes deterioration mechanisms. Autom. Constr. 2020, 120, 103398.
48. Liu, Y.; Bao, Y. Review on automated condition assessment of pipelines with machine learning. Adv. Eng. Inform. 2022, 53, 101687.
49. Mao, Y.; Li, Y.; Teng, F.; Sabonchi, A.K.S.; Azarafza, M.; Zhang, M. Utilizing Hybrid Machine Learning and Soft Computing Techniques for Landslide Susceptibility Mapping in a Drainage Basin. Water 2024, 16, 380.
50. Nunez, I.; Marani, A.; Nehdi, M.L. Mixture Optimization of Recycled Aggregate Concrete Using Hybrid Machine Learning Model. Materials 2020, 13, 4331.
51. Beskopylny, A.N.; Stel’makh, S.A.; Shcherban’, E.M.; Mailyan, L.R.; Meskhi, B.; Razveeva, I.; Chernil’nik, A.; Beskopylny, N. Concrete Strength Prediction Using Machine Learning Methods CatBoost, k-Nearest Neighbors, Support Vector Regression. Appl. Sci. 2022, 12, 10864.
52. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 52.
53. Chauhan, K.; Jani, S.; Thakkar, D.; Dave, R.; Bhatia, J.; Tanwar, S.; Obaidat, M.S. Automated Machine Learning: The New Wave of Machine Learning. In Proceedings of the 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 5–7 March 2020; pp. 205–212.
54. Wikipedia. Matamata-Piako District—Wikipedia. Available online: https://en.wikipedia.org/wiki/Matamata-Piako_District (accessed on 19 July 2025).
55. Wikipedia. Waimakariri District—Wikipedia. Available online: https://en.wikipedia.org/wiki/Waimakariri_District (accessed on 19 July 2025).
56. Cabello-Solorzano, K.; de Araujo, I.O.; Peña, M.; Correia, L.; Tallón-Ballesteros, A.J. The Impact of Data Normalization on the Accuracy of Machine Learning Algorithms: A Comparative Analysis. Lect. Notes Netw. Syst. 2023, 750 LNNS, 344–353.
57. Wang, T.; Zhang, K.; Thé, J.; Yu, H. Accurate prediction of band gap of materials using stacking machine learning model. Comput. Mater. Sci. 2022, 201, 110899.
58. Henderi, H.; Wahyuningsih, T.; Rahwanto, E. Comparison of Min-Max normalization and Z-Score Normalization in the K-nearest neighbor (kNN) Algorithm to Test the Accuracy of Types of Breast Cancer. Int. J. Inform. Inf. Syst. 2021, 4, 13–20.
59. Zien, A.; Krämer, N.; Sonnenburg, S.; Rätsch, G. The Feature Importance Ranking Measure. Lect. Notes Comput. Sci. (Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform.) 2009, 5782 LNAI, 694–709.
60. Hall, M.A. Correlation-Based Feature Selection for Machine Learning. 1999. Available online: https://hdl.handle.net/10289/15043 (accessed on 19 July 2025).
61. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.

Figure 1. Workflow for Underground Condition Assessment Using Meta-Learning Approaches.

Figure 2. Feature Correlation Matrix.

Figure 3. 3D Condition Plot.

Figure 4. Urban water pipe materials.

Figure 5. Urban water pipes: age distribution.

Figure 6. Age distribution reveals deterioration pattern of water pipes.

Figure 7. Feature pairplot matrix.

Figure 8. Boxplots of Key Pipeline Features by Condition Category.

Figure 9. Model Performance Comparison Across Key Metrics.

Figure 10. ROC Curves for All Classification Models.

Figure 11. Feature Importance Analysis Across All Models.

Figure 12. Average Feature Importance Across Models.

Figure 13. Meta-Model Confusion Matrix Performance Results.

Figure 14. PCA Decision Boundaries for All Models.

Figure 15. Permutation Importance of Meta-Features (Neural Network).

Figure 16. Partial Dependence of the Meta-Model on Base-Learner Predictions.

Table 1. Comparison of related studies with this work.

Study	Methodology	Task/Target	Headline Results	Notes vs. This Work
Liu & Kleiner (2013) [41]	Qualitative review of inspection technologies (CCTV, ultrasound, smart robotics)	Inspection capabilities and limitations	Capabilities such as SmartBall leak detection < 0.026 L/h; LeakFinderRT location < 10 cm (illustrative)	Review synthesizes tools, not predictive modeling; complements data-driven ML by framing sensor limits addressed via analytics.
Mosavi et al. (2020) [42]	Ensemble ML with RFE; RF, AdaBoost, GamBoost, Bagged CART	Groundwater potential classification	Random Forest Accuracy ≈ 0.86, Recall ≈ 0.91	Shows ensemble benefit on tabular tasks; smaller dataset and different domain than pipes; supports value of non-linear learners used here.
Rayhana et al. (2021) [44]	Vision-based DL (DCNN, Faster R-CNN) on CCTV/SSET images	Defect detection in images	Up to 98% detection accuracy	Image-centric, network-scale vision; complementary to tabular ML; indicates strong performance of DL when imagery is available.
Dawood et al. (2020) [47]	Review of 66 AI studies for pipe deterioration	Failure/condition modeling landscape	ANN up to R² ≈ 0.9510 (reported across studies)	Highlights ANN potential and hybrid logic; motivates combining diverse inductive biases as done in stacking.
Latif et al. (2022) [46]	Review of acoustic, EM, visual, IoT monitoring + ML	Condition monitoring taxonomy	Emphasizes ML integration need; examples like SmartBall < 0.1 gal/hr	Frames convergence of sensing and ML; supports operational context for analytics pipelines.
Mohammadagha et al. (2025) [45]	ANN and MLR on sewer pipes	Condition prediction (tabular)	ANN R² = 0.9066; ANN > MLR	Tabular supervised learning confirms non-linear gains; smaller dataset and different materials than current work.
This work (proposed)	Stacking meta-model integrating Random Forest, LightGBM, CatBoost; calibrated probabilities; stratified split	5-class condition rating (1 best → 5 worst)	Stacking Accuracy ≈ 0.967; Precision ≈ 0.965; Recall ≈ 0.967; F1 ≈ 0.964; macro-/micro-ROC AUC > 0.95	Outperforms simpler baselines (e.g., RF Accuracy ≈ 0.961; CatBoost Accuracy ≈ 0.966); aligns with top single boosters while adding calibration; Age most influential feature.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mohammadagha, M.; Najafi, M.; Kaushal, V.; Jibreen, A. Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes. Infrastructures 2025, 10, 282. https://doi.org/10.3390/infrastructures10110282

AMA Style

Mohammadagha M, Najafi M, Kaushal V, Jibreen A. Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes. Infrastructures. 2025; 10(11):282. https://doi.org/10.3390/infrastructures10110282

Chicago/Turabian Style

Mohammadagha, Mohsen, Mohammad Najafi, Vinayak Kaushal, and Ahmad Jibreen. 2025. "Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes" Infrastructures 10, no. 11: 282. https://doi.org/10.3390/infrastructures10110282

APA Style

Mohammadagha, M., Najafi, M., Kaushal, V., & Jibreen, A. (2025). Hybrid Machine Learning Meta-Model for the Condition Assessment of Urban Underground Pipes. Infrastructures, 10(11), 282. https://doi.org/10.3390/infrastructures10110282
