1. Introduction
Sewers are essential urban infrastructure, playing a critical role in public sanitation and stormwater management. However, the deterioration of aged sewers has emerged as a significant concern in many countries [1]. The main causes of sewer deterioration include long-term degradation of material properties, poor construction, root intrusion, heavy traffic loads, structural deformation due to ground settlement, infiltration of groundwater, and crown corrosion from biochemical reactions [2,3,4]. These deteriorations can lead to defects such as cracks, fractures, and joint dislocations in sewer pipelines, which may ultimately cause severe issues such as reduced flow capacity, flooding, and sinkholes. Such problems can be prevented by determining the timing of sewer rehabilitation in advance through regular condition assessments. Sewer condition assessments are conducted using various evaluation protocols, such as the Pipeline Assessment Certification Program (PACP) developed by the National Association of Sewer Service Companies (NASSCO) [5,6]. After inspection, a condition grade is assigned to each sewer segment based on the protocol, which in turn determines the priority for rehabilitation. These condition assessment protocols generally consist of the following steps: collecting sewer information, identifying and assessing defect severity, and generating an evaluation report [1,7,8].
Sewer data collection methods can generally be classified into visual, laser profiling, and sonar techniques [9]. Among these, visual inspection using CCTV video recording is the most commonly used method due to its operational simplicity. The recorded images are analyzed by experts through visual observation to identify defects, and this method relies entirely on human interpretation. However, such human-dependent approaches are time-consuming and costly, and they suffer from reduced detection accuracy due to inspector fatigue. Furthermore, the results may vary depending on the inspector’s experience and subjective judgment [10]. To overcome these limitations, it is necessary to develop technologies that can automatically detect sewer defects, thereby reducing the need for human intervention and enabling more objective and quantitative assessments [7,11,12].
With recent advances in artificial intelligence (AI) technology, AI-based approaches have been actively studied in the field of image analysis. One area of research aimed at automating image analysis is computer vision [13,14,15], which enables the acquisition, processing, and analysis of images by AI. However, such techniques are sensitive to lighting conditions, surface contamination, and the variability of defect types, which can lead to inconsistent detection accuracy and difficulty in identifying complex defects [8,12,16].
Convolutional neural networks (CNNs), a subfield of deep learning (DL), have demonstrated excellent performance in image analysis and have been widely applied in fields such as autonomous driving and face recognition [17]. CNN-based automation research has also been conducted for sewer defect detection [8,12,18]. These studies have employed techniques such as dropout [12], data augmentation [8], and image preprocessing using polar coordinate transformation [18] to prevent model overfitting due to limited data. Many prior studies in sewer defect detection have relied on relatively small datasets or architectures that decouple the feature extraction and classification stages. Such approaches inherently limit model adaptability to complex defect patterns and diverse real-world inspection conditions [19]. To address these limitations, recent efforts have increasingly focused on end-to-end deep learning frameworks that integrate the entire inference pipeline into a unified model [20].
End-to-end object detection models such as YOLOv3 and Faster R-CNN have demonstrated strong performance in real-time applications, achieving mean average precision (mAP) scores ranging from 76.2% to 85.7% [7,21,22]. Moreover, the incorporation of attention mechanisms, including lightweight modules such as the Convolutional Block Attention Module (CBAM), which combines channel and spatial attention, has been shown to further improve detection performance in complex visual environments [23]. Enhanced architectures such as CSA-MaskC-RCNN [22] and compact neural networks [24] are also being actively explored to improve model generalizability and field applicability [25].
While acknowledging these advancements, the present study employs a conventional CNN-based image classification model rather than an end-to-end detection approach. This decision is primarily due to the characteristics of the dataset used in this study, which lacks spatial annotations such as bounding boxes required for object detection. Instead, the dataset provides only image-level defect labels, making classification-based architectures (e.g., ResNet, VGG) more appropriate for the problem setting.
To this end, we utilized a large-scale sewer inspection image dataset provided by AI-Hub [26], a national open-data platform operated by the National Information Society Agency (NIA) of Korea. AI-Hub offers diverse, high-quality datasets for artificial intelligence training across multiple industries. Previous studies using AI-Hub datasets have reported high classification accuracy in domains such as vehicle identification [27] and fruit quality grading [28], achieving over 98% accuracy with more than 60,000 images. Building on these precedents, our study adopts a substantially larger dataset of approximately 470,000 images to enhance model robustness under operational conditions.
This study offers the following contributions:
Development of a large-scale, image-based sewer defect classification model that outperforms conventional manual inspection methods in terms of accuracy and scalability.
Demonstration of model optimization through hyperparameter tuning, including dropout rate and L2 regularization, resulting in statistically significant performance improvements.
Identification of data quality issues—such as mislabeled and duplicated defect images—as critical factors affecting model performance, supported by a systematic misclassification analysis.
Quantitative evaluation showing that a higher proportion of low-quality images within a dataset negatively impacts model accuracy, thereby providing empirical evidence of the relationship between data quality and model performance.
Proposal of a data curation strategy that involves early-stage filtering of ambiguous or noisy images and expert validation, contributing to the construction of a more reliable and domain-consistent dataset for sewer infrastructure assessment.
These contributions are expected to serve as a foundation for advancing automated, data-driven sewer condition monitoring systems and overcoming limitations associated with labor-intensive visual inspection practices.
In this study, a CNN model was trained using the large-scale sewer pipeline image dataset provided by AI-Hub, resulting in the development of a high-accuracy defect detection model for sewer pipelines. Additionally, the quality of the training data was evaluated based on factors such as mislabeling, duplicate defect images, and image censorship. The impact of data quality on detection performance was analyzed accordingly. Finally, through additional experiments involving varying proportions of low-quality data, the effect of data quality on model performance was quantitatively assessed, leading to the proposal of an effective data construction strategy.
2. Materials and Methods
Figure 1 provides an overview of the proposed sewer pipeline defect detection model developed in this study. A defect detection model for sewer pipelines was developed using a CNN-based ResNet50 architecture. ResNet50 is a deep neural network composed of 50 layers that incorporates residual learning to mitigate the vanishing gradient problem and improve training performance. ResNet50 has demonstrated superior performance compared to conventional architectures such as AlexNet, ResNet-18, and ResNet-34 in prior studies addressing single-image-based sewer water level estimation tasks [29]. This result highlights the robustness of deeper convolutional neural networks in handling complex and noise-prone sewer CCTV imagery. Based on this evidence, ResNet50 was adopted in the present study as a suitable architecture for image classification under varying quality conditions.
Based on sewer pipeline image data collected from AI-Hub, the images were preprocessed to a resolution of 224 × 224 pixels and used to train a CNN model for classification into four defect categories. The model architecture was built upon ResNet50, and it was designed to output classification results for each category upon receiving test image inputs. The entire process is described in detail in the following subsections.
2.1. Dataset
In this study, internal sewer pipe images from AI-Hub were utilized [26]. The platform provides a vast array of image data applicable to various industrial sectors such as autonomous driving, healthcare, manufacturing, environment, and construction. The dataset consists of approximately 470,000 images, categorized into eight defect types and three non-defect types. For this study, the selected classes for model classification include Normal-In (IN), Joint-Fault (JF), Lateral-Protruding (LP), and Surface Damage (SD), which are among the most frequently occurring defects [30]. These selected categories are illustrated in Figure 2.
High-resolution image data used in CNN training generally provide more detailed information, which can enhance model performance. However, training with high-resolution images also increases computational complexity and leads to longer training times. Typically, CNN models are trained using images ranging in size from 128 × 128 to 256 × 256 pixels [12]. In this study, the images were resized into a square format to match the model’s input requirements, and the resolution was set to 224 × 224 pixels. As shown in Figure 3, 75% of the dataset was used as training data and 25% as validation data. The training dataset consisted of 10,000 images per class, totaling 40,000 images. For model evaluation, a separate test dataset was prepared with 100 images per class, totaling 400 images.
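For reference, the resizing and 75/25 split described above can be reproduced with a minimal PyTorch/torchvision sketch such as the one below; the directory layout, batch size, and ImageNet normalization statistics are assumptions rather than details reported in this paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Resize AI-Hub frames to the 224 x 224 input expected by ResNet50 and
# normalize with ImageNet statistics (appropriate when using pretrained weights).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical directory layout: one sub-folder per class (IN, JF, LP, SD).
dataset = datasets.ImageFolder("data/sewer", transform=preprocess)

# 75% training / 25% validation split, as used in this study.
n_train = int(0.75 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=4)
```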
2.2. Computer System
Typically, deep learning models require large datasets for effective training. However, in specialized domains such as sewer pipeline defect detection, obtaining large-scale datasets can be challenging. Utilizing a pretrained model helps address data scarcity, facilitates faster convergence during training, and improves the model’s generalization capability [31]. The model weights were initialized using pretrained weights from the ImageNet dataset [32]. The model was implemented in Python 3 using the PyTorch (2.4.1) library. Training was conducted in a computing environment consisting of an Intel(R) Core(TM) i5-13400F CPU, an NVIDIA GeForce RTX 4060 GPU, 16 GB of RAM, Windows 10, and CUDA 12.6. The original fully connected layer of ResNet50, designed for 1000-class classification, was modified to output predictions for the four selected sewer defect categories in this study.
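A brief sketch of this model setup follows, loading torchvision's ImageNet-pretrained ResNet50 and replacing its 1000-class head with a four-class output; the specific weight identifier is an assumption tied to recent torchvision releases.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ResNet50 initialized with ImageNet weights, then replace the final
# 1000-way fully connected layer with a 4-way head (IN, JF, LP, SD).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```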
2.3. Model Evaluation
Model testing was conducted using a pre-prepared set of 400 test images. Each experiment was evaluated primarily based on the model’s accuracy. Accuracy refers to the proportion of correct predictions out of all predictions made and serves as a fundamental metric for assessing the overall performance of the model. Accuracy is calculated using the following formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
In the above equation, the terms are defined as follows:
TP (True Positive): The number of defect images correctly classified as defects.
TN (True Negative): The number of normal (non-defect) images correctly classified as normal.
FP (False Positive): The number of normal images incorrectly classified as defects.
FN (False Negative): The number of defect images incorrectly classified as normal.
The classification performance of the final trained model was evaluated using a confusion matrix, which provided detailed insights into class-wise prediction accuracy. To precisely analyze the model’s sensitivity to threshold variations, both the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve were examined. The ROC curve is a useful tool for assessing the model’s classification capability, with an Area Under the Curve (AUC) value approaching 1 indicating superior performance. The PR curve visualizes the relationship between precision and recall for different threshold settings, which is particularly informative for imbalanced datasets. The average precision (AP), defined as the area under the PR curve, also serves as a key indicator of model effectiveness, where higher values imply better performance. These evaluation metrics were comprehensively employed to assess not only the overall performance of the model but also its threshold-dependent behavior and prediction characteristics [33]. Finally, the misclassified images from the final model were analyzed, and the model’s performance was compared in relation to the quality of the training data.
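These metrics can be computed, for example, with scikit-learn. The sketch below uses placeholder arrays in place of the actual test labels and softmax outputs, and the one-vs-rest treatment of the multi-class ROC/PR analysis is an assumption about how the per-class curves were produced.

```python
import numpy as np
from sklearn.metrics import (auc, average_precision_score, confusion_matrix,
                             precision_recall_curve, roc_curve)
from sklearn.preprocessing import label_binarize

CLASSES = ["IN", "JF", "LP", "SD"]

# Placeholders standing in for the real test-set outputs: y_true holds the
# integer labels of the 400 test images, y_score the model's softmax
# probabilities with shape (400, 4).
y_true = np.array([0, 1, 2, 3] * 100)
y_score = np.random.rand(400, 4)
y_score = y_score / y_score.sum(axis=1, keepdims=True)

# Class-wise prediction accuracy via the confusion matrix.
y_pred = y_score.argmax(axis=1)
print(confusion_matrix(y_true, y_pred))

# One-vs-rest ROC (AUC) and PR (AP) analysis per class.
y_true_bin = label_binarize(y_true, classes=[0, 1, 2, 3])
for i, name in enumerate(CLASSES):
    fpr, tpr, _ = roc_curve(y_true_bin[:, i], y_score[:, i])
    precision, recall, _ = precision_recall_curve(y_true_bin[:, i], y_score[:, i])
    ap = average_precision_score(y_true_bin[:, i], y_score[:, i])
    print(f"{name}: AUC = {auc(fpr, tpr):.2f}, AP = {ap:.2f}")
```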
3. Results and Discussion
3.1. Model Training
During model training, CrossEntropyLoss was employed as the loss function and Softmax as the activation function, both of which are well-suited for multi-class classification tasks. These functions enable optimal predictions by considering the probability distribution across multiple classes [34]. For optimization, Stochastic Gradient Descent (SGD) was utilized with a momentum of 0.9 to accelerate convergence [35]. To adjust the learning rate dynamically, a StepLR scheduler was applied, reducing the learning rate by a factor of 10 every seven epochs. This strategy maintained a rapid learning pace in the early stages while ensuring stable optimization in later phases [36].
Figure 4 visualizes the training process of the model. Training and validation steps were alternated in each epoch, and both loss and accuracy were recorded at every epoch. The final model weights were selected based on the epoch that yielded the highest accuracy on the validation dataset.
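A condensed sketch of this training procedure is given below, assuming the model and data loaders from the earlier sketches; the initial learning rate (0.001) and the number of epochs (25) are illustrative assumptions, as they are not reported here.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

# Training configuration described above: cross-entropy loss, SGD with
# momentum 0.9, and a StepLR schedule that divides the learning rate by 10
# every 7 epochs. `model`, `train_loader`, `val_loader`, and `device` are
# assumed to be defined as in the earlier sketches.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

best_acc, best_weights = 0.0, copy.deepcopy(model.state_dict())
for epoch in range(25):
    # --- training phase ---
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

    # --- validation phase: keep the weights of the best epoch ---
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    val_acc = correct / total
    if val_acc > best_acc:
        best_acc, best_weights = val_acc, copy.deepcopy(model.state_dict())

model.load_state_dict(best_weights)
```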
3.2. Hyperparameter Tuning
To optimize the performance of the sewer pipeline defect detection model, dropout and L2 regularization were selected as key hyperparameters and experimentally evaluated. These two techniques are well-known regularization methods that improve the generalization performance of neural network models.
Dropout is a regularization technique that probabilistically deactivates a portion of neurons in the network during training, preventing over-reliance on specific neurons and encouraging the learning of diverse features. While a low dropout rate may lead to overfitting, an excessively high rate can hinder the model’s learning capacity [37]. According to [37], dropout rates between 0.2 and 0.5 tend to yield optimal performance. In this study, dropout rates were adjusted within the 0.2 to 0.5 range to identify the optimal value. Additionally, a higher dropout rate of 0.8 was tested for comparative analysis.
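Since the stock ResNet50 contains no dropout layer, one common placement, assumed here, is immediately before the final fully connected layer; the sketch below shows how the tested dropout rates could be swept.

```python
import torch.nn as nn
from torchvision import models

def build_resnet50_with_dropout(p: float, num_classes: int = 4) -> nn.Module:
    """ResNet50 with a dropout layer in front of the classification head.

    The paper does not state where dropout was inserted; placing it before
    the final fully connected layer is one common choice assumed here.
    """
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Sequential(
        nn.Dropout(p=p),
        nn.Linear(model.fc.in_features, num_classes),
    )
    return model

# Dropout rates evaluated in this study.
for rate in [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.8]:
    model = build_resnet50_with_dropout(rate)
    # ... train and evaluate as in the training sketch above ...
```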
L2 regularization is a technique that enhances model generalization by adding a penalty term to the loss function, which constrains the magnitude of the model’s weights and reduces unnecessary complexity [38]. It is typically implemented by including a regularization term in the loss function, as shown below:
L_total = L_0 + λ Σ_i w_i², where L_0 is the original loss term and w_i are the model weights.
Here, λ is the regularization coefficient, a hyperparameter that determines the extent to which the model’s weights are constrained. By appropriately tuning this coefficient, overfitting can be mitigated, thereby improving model performance [38]. When applying L2 regularization to CNN models, λ values in the range of 0.0001 to 0.01 are generally considered appropriate [39]. In this study, the L2 regularization coefficient was adjusted within the range of 0.0001 to 0.1 to identify the optimal value.
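In PyTorch, an L2 penalty of this form is most commonly applied through the optimizer's weight_decay argument, which is equivalent to L2 regularization for plain SGD; a short sketch covering the explored range follows, with the learning rate again assumed.

```python
import torch.optim as optim

# L2 regularization via the optimizer's weight_decay term. The coefficients
# below span the range explored in this study (0.0001 to 0.1); lambda = 0.01
# gave the best accuracy in combination with a dropout rate of 0.2.
for weight_decay in [0.0001, 0.001, 0.01, 0.1]:
    optimizer = optim.SGD(model.parameters(), lr=0.001,
                          momentum=0.9, weight_decay=weight_decay)
    # ... train and evaluate as in the training sketch above ...
```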
Hyperparameters were tuned to optimize the performance of a CNN model for interpreting internal CCTV footage of sewer pipelines. The dropout rate and L2 regularization coefficient were adjusted to analyze their individual and combined effects on model performance. Experiments with varying dropout rates revealed that a rate of 0.2 yielded the highest accuracy at 90.25%. Dropout rates of 0.25, 0.35, and 0.8 maintained a similar level of performance; however, at other rates, the accuracy either showed negligible improvement or declined. This indicates that an optimal dropout rate exists depending on the structure of the CNN model and the size of the dataset.
Given the depth and complexity of the CNN architecture used, an appropriately configured dropout rate can help the model filter out irrelevant noise while effectively learning key features. In general, too low a dropout rate leaves the model prone to overfitting, whereas dropping too many neurons leads to underfitting. Thus, the optimal dropout rate is influenced by the dataset size and the model architecture [37].
Applying dropout to datasets with fewer than 1000 images often led to decreased model performance, whereas datasets with more than 1000 images showed improved results with dropout [37]. However, for datasets exceeding 10,000 images, the effectiveness of dropout tended to diminish as the dataset size increased. The dataset used in this study consisted of 40,000 images, demonstrating a trend consistent with the findings of the previous study. The accuracy achieved for each dropout rate is presented in Table 1.
The experimental results for different L2 regularization coefficients showed that the highest accuracy, 92.75%, was achieved when λ = 0.01. As the λ value increased, model performance gradually declined. Notably, when λ = 0.1, the accuracy dropped significantly to 78.00%. This decline is attributed to excessive regularization, which overly constrained the model’s weights and hindered effective learning.
Table 2 presents the visual trends and quantitative results of these experiments.
Ref. [39] found that the optimal L2 regularization coefficient was 0.00037. That study used relatively low-resolution datasets such as CIFAR-10 (32 × 32) and MNIST (32 × 32). In contrast, the dataset used in the present study consisted of high-resolution images (224 × 224), which may have necessitated a higher λ value. Furthermore, the LeNet-5 model used in [39] has approximately 60,000 parameters, while the ResNet50 model employed in this study has around 25.6 million parameters. This substantial difference in model complexity likely contributed to the differing optimal regularization values.
Given that ResNet50 is much deeper and more complex than LeNet-5, an appropriate level of L2 regularization is necessary to improve generalization. However, excessive regularization can limit the model’s representational capacity and ultimately degrade performance. Experiments were conducted by varying the dropout rate while fixing the L2 regularization coefficient at 0.01. The results showed that the highest accuracy (92.75%) was achieved when the dropout rate was set to 0.2. Although increased accuracy was also observed for other dropout rates such as 0.3, 0.35, 0.4, 0.5, and 0.8 after applying L2 regularization, the performance unexpectedly decreased when the dropout rate was set to 0.25 or 0.45.
These findings suggest that while dropout and L2 regularization play complementary roles, certain combinations can negatively impact model performance. L2 regularization constrains the magnitude of the weights, preventing the model from becoming overly complex, whereas dropout randomly disables certain neurons during training to reduce reliance on specific weights. Ref. [39] also reported that the combined application of dropout and L2 regularization is more effective in reducing overfitting compared to applying either technique alone.
The results of this study indicated that the combination of a dropout rate of 0.2 and an L2 regularization coefficient of λ = 0.01 yielded the highest performance. While this outcome is generally consistent with findings from previous studies, it was also observed that certain configurations—specifically, dropout rates of 0.25 and 0.45—led to a decrease in performance when both dropout and L2 regularization were applied simultaneously.
Table 3 presents the visual patterns and quantitative results of these experiments.
3.3. Impact of Dataset Quality on Model Performance
The quality of the training dataset plays a critical role in the generalization performance of CNN-based sewer defect detection models. In this section, misclassified images were analyzed based on the final model, which achieved an accuracy of 92.75%, and it was confirmed that low-quality data were included in the training dataset. Additional experiments were conducted by varying the proportion of low-quality data within the training set to evaluate the impact of dataset quality on model performance.
As shown in Figure 5, analysis of the model’s confusion matrix revealed variations in classification accuracy across different classes. The normal (IN) class achieved 100% accuracy, while the surface damage (SD) class showed a high accuracy of 97%. In contrast, the joint failure (JF) and lateral protrusion (LP) classes demonstrated comparatively lower accuracies of 90% and 84%, respectively. Although the model exhibited strong overall performance, these results highlight performance discrepancies across different classes. To quantitatively support these findings, additional analyses were conducted using the ROC and PR curves.
Figure 6 presents the ROC curves for each class, with AUC scores of 1.00 for IN, 0.99 for JF, 0.98 for LP, and 0.97 for SD, indicating that the model effectively distinguishes between classes. In addition, Figure 7 displays the PR curves, where the AP values were 1.00 for IN, 0.93 for JF, 0.95 for LP, and 0.93 for SD. These results demonstrate that the model maintains stable and reliable classification performance across all classes, even under data imbalance conditions, as reflected in consistently high AP values.
To visually analyze the basis of the model’s predictions, Grad-CAM was applied to test images to generate visual explanations. Grad-CAM (Gradient-weighted Class Activation Mapping) is a technique that highlights the regions of an input image where the model focuses when making a prediction, by producing a heatmap of the most salient features [40].
Figure 8 presents an input image (left) alongside its corresponding Grad-CAM visualization (right). In the heatmap, blue regions indicate areas where the model concentrated its attention, while red regions represent areas of lower relevance. The visualization confirms that the model focused on the actual defect areas, suggesting that it successfully learned meaningful visual features relevant to the classification task.
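For reference, Grad-CAM can be reproduced with a few forward/backward hooks. The sketch below is a minimal implementation targeting ResNet50's last convolutional stage and is not necessarily the exact tooling used in this study.

```python
import numpy as np
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the target class score."""
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out.detach()

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    model.eval()
    logits = model(image)                      # image: (1, 3, 224, 224)
    model.zero_grad()
    logits[0, target_class].backward()

    h1.remove()
    h2.remove()

    # Channel weights = global-average-pooled gradients of the class score.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze().cpu().numpy()         # heatmap in [0, 1]

# Example: use the last convolutional stage of ResNet50 as the target layer.
# heatmap = grad_cam(model, input_tensor, target_class=2, target_layer=model.layer4)
```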
While the overall quantitative evaluation results support the strong performance of the model, misclassification may still occur in certain classes due to complex factors such as multiple overlapping defects. Accordingly, the following section provides an analysis of misclassified images to examine the model’s limitations and further investigates the impact of low-quality images present in the training dataset on model performance.
As shown in Figure 9, the model failed to correctly classify an image that simultaneously contains a lateral protrusion (LP) and a joint failure (JF). Although the image was originally labeled as LP, the model misclassified it as JF. As illustrated by the Grad-CAM visualization in Figure 9, the presence of multiple defects appears to have confused the model, leading it to focus on incorrect features during prediction.
In addition to images containing multiple defects, some images may appear to be of low quality due to motion blur or occlusion of key defect features. However, such criteria are inherently subjective and difficult to quantify consistently. To establish a more objective and quantifiable definition of low-quality data, this study defines low-quality images as those containing two or more defect types within a single frame. Based on this criterion, additional experiments were conducted. Examples of such low-quality images are presented in Figure 10. To assess the proportion of flawed images in the training dataset, 4000 images (Set 1) were reviewed, revealing 102 mislabeled or duplicate defect images. The inclusion of these low-quality data in the training process likely contributed to limitations in the model’s accuracy improvement.
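Under the criterion adopted here (two or more defect types annotated in a single frame), flagging low-quality images can be automated whenever per-image label lists are available; the annotation format in the sketch below is hypothetical.

```python
def split_by_quality(annotations):
    """Split image annotations into clean and low-quality subsets, where
    'low-quality' means two or more defect types annotated in one frame
    (the operational definition adopted in this study).

    `annotations` is assumed to map image file names to the list of defect
    labels assigned to that image (a hypothetical format).
    """
    clean, low_quality = {}, {}
    for name, labels in annotations.items():
        if len(set(labels)) >= 2:
            low_quality[name] = labels
        else:
            clean[name] = labels
    return clean, low_quality

# Example: Set 1-A keeps only the clean images; Set 1-B adds extra
# low-quality images drawn from the full dataset.
annotations = {"img_0001.png": ["LP"], "img_0002.png": ["LP", "JF"]}
clean, low_quality = split_by_quality(annotations)
print(len(clean), "clean images,", len(low_quality), "low-quality images")
```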
An experiment was conducted to analyze the impact of low-quality images in the training dataset on model accuracy (Table 4). Set 1-A consisted of 3898 images, from which the 102 identified low-quality images were removed. Set 1-B, on the other hand, was constructed by artificially increasing the number of low-quality images to 250, using additional mislabeled or flawed data from the full dataset.
The training results showed that Set 1-A achieved a 3.5% increase in accuracy compared to the original Set 1, which included low-quality images. The removal of duplicated defect images, censored images, and other low-quality data likely helped the model learn clearer defect patterns, thereby improving classification performance. In contrast, Set 1-B demonstrated a 6.75% decrease in accuracy as the number of low-quality images increased. This suggests that the presence of additional mislabeled data led to greater confusion in identifying defect types, ultimately degrading model performance.
3.4. Model Application
The sewer defect classification model developed in this study can serve as an automated diagnostic tool for practical integration into real-world sewer inspection systems. The deployment of the model in operational settings would involve the following steps:
First, during the data acquisition phase, internal video footage of sewer pipelines is captured using CCTV-equipped drones. Still images are extracted from the footage to serve as the foundational input data that visually represent the condition of the pipe interior.
Second, in the image preprocessing phase, the extracted images are resized and formatted to be compatible with the model’s input specifications.
Third, in the model inference phase, the preprocessed images are fed into the trained model, which classifies each image into one of four categories: intact (IN), joint failure (JF), lateral protrusion (LP), and surface damage (SD). The model outputs the class with the highest predicted probability as the final classification result.
Fourth, during the result interpretation and reporting phase, the classification outcomes are temporally reorganized to detect recurring defect patterns. A diagnostic report is generated, summarizing defect locations, types, and frequencies. This report can subsequently support maintenance prioritization and decision-making processes.
This end-to-end application workflow eliminates the subjectivity of manual inspections and significantly enhances the objectivity and efficiency of sewer condition assessments.
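As an illustration of the inference phase in this workflow, the following sketch classifies a single frame extracted from inspection footage with the trained model; the file path and helper name are hypothetical, and the preprocessing mirrors the training configuration.

```python
import torch
from PIL import Image
from torchvision import transforms

CLASSES = ["IN", "JF", "LP", "SD"]

# Same preprocessing as during training.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_frame(model, frame_path, device="cpu"):
    """Classify a single still image extracted from inspection footage and
    return the predicted class with its softmax probability."""
    image = Image.open(frame_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0).to(device)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    idx = int(probs.argmax())
    return CLASSES[idx], float(probs[idx])

# Example report entry (file name is hypothetical):
# label, confidence = classify_frame(model, "frames/segment_012_t0135.jpg")
```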
4. Conclusions
In this study, a CNN-based automated defect classification model was proposed using image data provided by AI-Hub, aiming to overcome the limitations of conventional sewer pipeline defect detection methods. The experimental results demonstrated that adjusting dropout rates and L2 regularization coefficients led to some improvement in classification accuracy.
Analysis of misclassified images from the final model revealed instances of mislabeling and multiple overlapping defects. These types of low-quality images were also found in the training dataset, which likely contributed to decreased model accuracy. Comparative analysis showed that as the proportion of low-quality data increased, model accuracy declined. While training with large datasets can improve performance up to a certain level, achieving higher model accuracy requires not only a sufficient volume of data but also high data quality.
Datasets contaminated with mislabeled or poor-quality images are difficult to correct due to their large size. Therefore, removing low-quality data from the initial dataset construction phase is essential for building a reliable dataset for future training. To ensure high-quality dataset development, image collection and labeling should involve validation by sewer inspection experts. Incorporating sufficient quantities of verified data into model training is expected to further enhance the accuracy of defect detection.
The technology for detecting sewer pipeline damage using artificial intelligence models should continue to evolve through ongoing research and development to meet future challenges and demands. The single-image-based classification approach lacks temporal and spatial context, which can lead to misclassification in cases involving multiple overlapping defects or ambiguous defect boundaries. Moreover, the ResNet50-based static classification model has limited capacity to capture and incorporate detailed information about the location and morphology of defects. To address these limitations, future work may consider incorporating video-based models that leverage temporal information or applying end-to-end object detection frameworks capable of localizing and classifying defects simultaneously. In addition, the use of explainable AI (XAI) techniques—such as Grad-CAM—can enhance interpretability and trustworthiness by visualizing the model’s decision-making process.