A sewer system is a hidden but very expensive type of infrastructure to maintain [1]. Breakdown of a sewer can result in significant damage to roads and buildings. Furthermore, reduced functionality of the sewers can lead to flooding and exfiltration, for example, which can affect a number of externalities, such as property, traffic disruption, public health and the environment [1]. For these reasons, the sewers’ operators need to replace them in a timely manner, especially if the sewers are critical. However, sewers’ underground location makes them difficult to monitor. Today, monitoring of the sewers is typically done by Closed Circuit Television inspection (CCTV inspection) [1]. CCTV inspection is done by manually sending a TV-inspection robot into the sewer and annotating all observations. As this is very time consuming, expensive, and imprecise due to a number of subjective factors [3], much research has been put into automating these processes [1]. However, full automation of sewer inspection is not imminent. The high costs associated with CCTV-inspection force utilities to prioritize which sewers to inspect. In Denmark, the paradigm for risk-based rehabilitation has been based on area. The areas which should be subject to CCTV-inspection were prioritized based on the age of the pipes and the experience of the operators. Based on the findings in the CCTV-inspections, it was decided whether an area should be rehabilitated or not. This has resulted in rehabilitation of pipes that could have remained operational for several more years, because the inspection showed that they might not last for the whole period until the area would next be chosen for rehabilitation. Today, for economic optimization and better use of the pipes’ lifetime, there is a trend toward risk-based CCTV-inspection planning and rehabilitation on a pipe level.
Maintenance of sewer systems on pipe level entails new requirements for computer systems to keep track of the individual pipes, as the utilities now need to keep track of several tens of thousands of pipes instead of a limited number of areas. To assist the utilities in choosing which sewers to inspect, several decision support systems have been developed [6]. Usually these systems are risk models, consisting of a deterioration model and a consequence model. The deterioration models predict the condition of the sewers or the likelihood of a sewer being in a given condition. The consequence models describe the severity of a potential sewer failure and can include economic, environmental, and social consequences [1]. Generally, the deterioration models suffer from low accuracy.
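As a minimal sketch of how such a risk model could combine its two parts, the example below multiplies a probability of bad condition (from a deterioration model) by a consequence score and ranks the pipes by the product. The multiplicative form, the field names, and all numbers are illustrative assumptions and are not taken from the decision support systems cited above.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    pipe_id: str
    p_bad: float        # hypothetical deterioration-model output, in [0, 1]
    consequence: float  # hypothetical consequence score (arbitrary cost units)


def rank_by_risk(pipes):
    """Return pipes sorted by risk = p_bad * consequence, highest risk first."""
    return sorted(pipes, key=lambda p: p.p_bad * p.consequence, reverse=True)


# Illustrative values only.
pipes = [
    Pipe("A", p_bad=0.8, consequence=1.0),
    Pipe("B", p_bad=0.3, consequence=5.0),
    Pipe("C", p_bad=0.1, consequence=2.0),
]
ranking = [p.pipe_id for p in rank_by_risk(pipes)]
print(ranking)
```

In this toy case pipe B is ranked ahead of pipe A even though A is more likely to be in bad condition, because a failure of B is assumed to be more costly.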
Development of sewer deterioration models is complicated by a high uncertainty in the data. This uncertainty is influenced, among other things, by subjectivity in the annotation of CCTV inspections, lack of data, and subjective selection of which pipes to inspect [3]. Dirksen et al. [4] found that defects with distinct features like roots were easy to find, while the probability of getting a false negative for other defect types varied around 0.25. The probability of a false positive was found to be around 0.04 [4]. Another issue often affecting the deterioration datasets is a lack of information [1], which results in low quality data. Furthermore, the datasets are affected by the fact that they have typically been collected for a specific purpose, such as quality assurance before asset handover or road renovation, diagnosis of malfunctions, and random inspections. This introduces a selective survival bias in the data [1]. Other factors that complicate deterioration modelling are that the datasets in general are highly skewed, both with respect to the number of pipes in the different classes and with respect to the predictor variables [10]. Furthermore, the size of the natural variability between sewers is unknown.
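To illustrate what annotation error rates of this size can mean for a dataset, the sketch below applies Bayes' rule to a binary defect label. The false negative (0.25) and false positive (0.04) probabilities are the values quoted above; the 10% true defect prevalence and the assumption that the error rates are constant and independent of the pipe are hypothetical simplifications.

```python
def annotation_error_effect(prevalence, p_false_negative=0.25, p_false_positive=0.04):
    """Apply Bayes' rule to a binary defect label with imperfect annotation."""
    true_pos = prevalence * (1.0 - p_false_negative)
    false_pos = (1.0 - prevalence) * p_false_positive
    false_neg = prevalence * p_false_negative
    true_neg = (1.0 - prevalence) * (1.0 - p_false_positive)
    observed_rate = true_pos + false_pos  # share of pipes reported as defective
    return {
        "observed_rate": observed_rate,
        "p_defect_given_reported": true_pos / observed_rate,
        "p_defect_given_not_reported": false_neg / (false_neg + true_neg),
    }


# Hypothetical prevalence: 10% of pipes truly have the defect.
result = annotation_error_effect(prevalence=0.10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, roughly two out of three reported defects would be real, which gives a sense of the label noise a deterioration model has to tolerate.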
A large number of deterioration models have been developed; however, a lack of publicly available datasets due to privacy issues makes it difficult to compare the models [5]. Furthermore, the condition state (CS) is typically based on the local standard for CCTV inspection, which can be based on, for example, the European standard [13], the Pipeline Assessment Certification Program [14], or a country-specific standard [12]. Moreover, in order to evaluate the deterioration models, many authors tend to recast the multiclass or regression problem as a binary problem [12]. However, the model performance is very sensitive to how strict the evaluation is designed to be. For example, precision and recall will increase if pipes in both the worst CS and the second worst CS are considered bad pipes, compared to considering only pipes in the worst CS to be in bad condition.
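The sensitivity to the binary split can be sketched with a few lines of Python. The condition states and predictions below are made-up values, and the convention that a higher CS number means a worse condition is an assumption made for the example.

```python
def binary_metrics(true_cs, pred_cs, bad_from):
    """Precision, recall and F1 when every CS >= bad_from counts as 'bad'."""
    pairs = [(t >= bad_from, p >= bad_from) for t, p in zip(true_cs, pred_cs)]
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up condition states on a 1-4 scale (4 assumed to be the worst).
true_cs = [1, 1, 2, 2, 3, 3, 3, 4, 4, 1]
pred_cs = [1, 2, 2, 3, 3, 4, 3, 3, 4, 1]

strict = binary_metrics(true_cs, pred_cs, bad_from=4)   # only the worst CS is bad
lenient = binary_metrics(true_cs, pred_cs, bad_from=3)  # two worst CSs are bad
print("strict :", strict)
print("lenient:", lenient)
```

The same predictions score noticeably better under the lenient split, which is why reported performances are hard to compare unless the definition of a bad pipe is stated.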
In addition to deciding how to define the target variable, the developer of the deterioration model needs to decide which predictor variables to use. Several methods have previously been used for parameter selection and feature importance testing. O’Reilly [21] in 1989 investigated the correlation between defects and individual parameters such as age, material, diameter, location, depth, wastewater type, soil type etc. in 180 km of sewers. Hansen et al. [11] investigated the potential benefits of developing deterioration models based on data groups defined by experts but found no improvement in model performance. Yin et al. [22] used a backward variable elimination process, through which they removed one parameter at a time and examined how the performance changed. Davies et al. [23] used a backward selection method, and Laakso et al. [13] used the Boruta algorithm and found eight features to be influential.
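A generic greedy backward elimination can be sketched as follows. This is not the exact procedure of any of the cited studies: `score_fn` is a hypothetical stand-in for, e.g., a cross-validated model score, and the toy scoring function and feature names are invented to show how a redundant pair of features behaves.

```python
def backward_elimination(features, score_fn, tolerance=0):
    """Greedy backward elimination.

    Repeatedly drops the feature whose removal gives the highest score, as
    long as that score is not more than `tolerance` below the current score.
    Returns the kept features and the (removed feature, score) history.
    """
    kept = list(features)
    current = score_fn(kept)
    history = []
    while len(kept) > 1:
        # Score every candidate subset with exactly one feature left out.
        trials = [(score_fn([f for f in kept if f != cand]), cand) for cand in kept]
        best_score, best_cand = max(trials)
        if best_score < current - tolerance:
            break  # every possible removal costs too much
        kept.remove(best_cand)
        current = best_score
        history.append((best_cand, best_score))
    return kept, history


def toy_score(subset):
    """Hypothetical score in percentage points, for illustration only."""
    s = set(subset)
    score = 50
    if "age" in s or "year_of_construction" in s:
        score += 20  # redundant pair: either feature carries the signal
    if "material" in s:
        score += 10
    if "noise" in s:
        score -= 5   # uninformative feature that slightly hurts the model
    return score


kept, history = backward_elimination(
    ["age", "year_of_construction", "material", "noise"], toy_score
)
print(kept, history)
```

In the toy run one of the two redundant age-related features is removed without any loss, while removing the second one would cost performance, so it is kept.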
Carvalho et al. [10] used eight different methods to investigate the feature importance and found that the different methods showed very different results. For example, if the features are analyzed by removing the most significant features step by step, the importance of the other features will change, as there is often redundancy in the signal from the different predictor variables. This is not encountered when using the built-in feature analysis in Random Forest [10], however. Due to the uncertainty in the data, Roghani et al. [3] found that using the two or three most informative predictor variables was sufficient to build the deterioration model. However, using a deterioration model was better than relying solely on the inspection age.
Mohammadi et al. [24] reviewed 24 statistical and AI-based papers on sewer deterioration. Nineteen of the reviewed papers provided information on whether a parameter was relevant. Nineteen features were considered, and none of the features were used in all the papers. Furthermore, none of the features considered relevant in more than three of the papers were considered relevant in all the papers. This illustrates a high variability in feature importance. Likewise, none of the features whose significance level was specified in more than one case were irrelevant in all the studies they were used in [24]. Finding the most significant features is important, as accessing, extracting, and preprocessing each feature is very time demanding. In a review of deterioration models, Hawari et al. [25] concluded that more work needs to be done to identify which data municipalities should collect in order to develop reliable deterioration models [25].
As described above, the performance of the deterioration models is affected by many conditions, and a number of choices need to be made for each model. This makes it possible to develop well-performing models within academia. However, to create value, the models must meet the utilities’ needs. For example, Guzmán-Fierro et al. [26] worked with a target variable ranging from 1 to 5 but developed a model that considered only the pipes in CS 1 and CS 5. In reality, it is not possible to leave out the pipes in between, at least during the preliminary inspection.
In summary, sewer deterioration modeling has been a hot topic for the last two decades and myriad factors influence the performance of the models. Finding the optimal model cannot necessarily be done by selecting the model with the highest performance according to the literature. Likewise, there is a great deal of disagreement about which predictor variables are significant. The existing sewer deterioration models presented in the literature are characterized by large deviations in data, methodology, etc. Today researchers tend to perform feature analyses on single datasets. However, a rarely touched perspective is the statistical variation in the features influencing the results when using similar datasets.
The contributions of this study are investigations of:
The overall feature importance in a dataset containing information from several different utilities, including identification of potential drawbacks
How the performance and feature importance of the models are affected by how the model developer has distinguished between good and bad pipes
How the feature importance varies between utilities when the parameters in the datasets have been found in the same way for all utilities.
To the best of the authors’ knowledge, this study provides the most comprehensive analysis of feature importance in sewer deterioration modeling and the first investigation of feature importance across several utilities with similar databases. This information adds value to the process of developing deterioration models for utilities with limited budgets.
The following section of the paper, Section 2, provides a description of the data available, preprocessing, model selection, and the method used for feature importance. Section 3 contains three subsections, one for each of the contributions, while Section 4 contains a discussion of the key findings and comparisons to the literature. Section 5 contains a summary of the most important conclusions covered by the paper.
Sewer deterioration modeling is complicated by several influencing factors. In this section the most prominent factors influencing the results are discussed, and the results are compared to previous findings in the literature.
4.1. Representativeness of Data
The results from the baseline experiment underlined the challenges of using historical data for sewer deterioration modeling, as the CCTV-inspections generally have been performed with a specific purpose, introducing a selective survival bias in the data [1]. However, as the datasets are comprehensive, most utilities do not have the finances to create a new dataset. Instead, the model developers must account for this by excluding the features in which the bias is most prominent, such as features related to geographical position. In the long term, utilities should include some spatial randomness in their strategy for CCTV-inspection.
4.2. Definition of Target Variable
Lack of publicly available data [5], numerous different standards for CCTV-inspections, and different methods for evaluation of sewer deterioration models complicate the comparison of deterioration models. This also applies to the performance obtained in experiment two, where the F1-score drops from 0.73 to 0.35 when solely considering pipes in CS four as being in bad condition instead of considering pipes in both CS three and four. However, although the performance was affected, there was a high correlation between the predictor variables relevant for prediction of pipes in CS four and those relevant for pipes in either CS three or four, which indicates that it is fair to make a binary evaluation of the feature importance.
Today CCTV inspections are performed by an operator who manually annotates the observations found in the sewers according to a given standard for CCTV inspections. These observations are often transformed into a general measure of the sewer’s condition. This condition measure can be based either on general standards or on a utility-specific scheme. The benefit of using general standards is increased comparability between utilities, whereas the benefit of a utility-specific performance measure is that it can be adjusted to prioritize the types of defects relevant for the utility. For instance, a utility with limited capacity at the wastewater treatment plant might increase the weight on infiltration. Weighting some defects higher can cause the features related to these defects to become more important in a feature analysis. In the CS used in this study, a higher weight has been put on attached deposits and infiltration relative to other observation types, as shown in Table 2. This is consistent with the results showing a high importance of the relative groundwater level. As the groundwater maps available for this study were based on measurements every 500 m, the actual groundwater level can change significantly between the data points. It is likely that the ground level can compensate for these changes, which will induce a higher weight on this feature in the feature analysis.
4.3. Size of Datasets
When considering pipes in both CS three and CS four as being in bad condition, the performance and the number of relevant features stagnated for datasets with more than 6000 pipes in bad condition. In the initial analysis only pipes in CS four were considered bad. In that analysis the performance and number of relevant features stagnated for datasets with more than 1000–1500 pipes in bad condition. This indicates that the number of bad pipes required for optimal performance is correlated with how the target variable is defined and the total number of pipes inspected.
Furthermore, it is worth noting that when solely considering the utilities with more than 1000 bad pipes, there is more consensus on which features contribute to the performance. For ground level, the percentage of utilities in which it is found to be relevant increases from 69% to 78%. Similar tendencies are present for age (67% to 71%) and relative groundwater level (65% to 73%). A full overview is presented in Table 5.
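The kind of consensus figure quoted above can be computed as the share of utility-specific models in which a feature was found relevant, optionally restricted to utilities with enough bad pipes. The sketch below uses a hypothetical record layout and invented values; it does not reproduce the study's data.

```python
def relevance_share(results, feature, min_bad_pipes=0):
    """Share of utilities with more than `min_bad_pipes` bad pipes whose
    utility-specific model found `feature` relevant (None if no utility qualifies)."""
    selected = [r for r in results if r["n_bad"] > min_bad_pipes]
    if not selected:
        return None
    hits = sum(1 for r in selected if feature in r["relevant"])
    return hits / len(selected)


# Invented example records: one entry per utility.
results = [
    {"utility": "U1", "n_bad": 400, "relevant": {"age"}},
    {"utility": "U2", "n_bad": 1500, "relevant": {"age", "ground_level"}},
    {"utility": "U3", "n_bad": 7000, "relevant": {"ground_level", "length"}},
    {"utility": "U4", "n_bad": 900, "relevant": {"length"}},
]

share_all = relevance_share(results, "ground_level")
share_large = relevance_share(results, "ground_level", min_bad_pipes=1000)
print(share_all, share_large)
```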
4.4. Irregularities in the Step Analysis
For some utilities, the performance improved when predictor variables were removed, indicating overfitting of the model. This was clearest for utility 10, which is also the utility with the smallest amount of training data. For datasets with more than 10,000 pipes, the tendency could still be observed in some cases after cleaning, but removing features did not lead to an increase in performance of more than two to three percent.
In a few cases, the performance suddenly increased when removing one parameter. This could not be explained by stochasticity in the performance or overfitting. An example of this can be seen in Figure 5b. This is most likely because some predictor variables perform well when combined but introduce noise when considered individually.
4.5. Comparison to the Literature
Mohammadi et al. [24] reviewed 24 papers, of which 19 had investigated which features were significant. In Table 6 the results of this study are compared to the findings by Mohammadi et al.
In the review by Mohammadi et al., there is a higher consensus about which predictor variables are significant. The most probable reason for this is that Mohammadi et al. reviewed studies whose authors had preselected a number of predictor variables. For example, four of the papers investigated between two and eight predictor variables and did not find any insignificant variables. In general, there is a consensus that length, age, dimension, groundwater, and wastewater type are often important predictor variables. However, the model developer should consider the specific case when selecting predictor variables, as there is no “gold standard”.
4.6. CCTV-Inspection Planning
The still increasing access to pipe-specific data and the increasing awareness of the benefits related to risk-based pipe inspection and rehabilitation on pipe level are essential when optimizing the management of sewer systems to save costs and resources. Sewer deterioration modeling is an essential element in this; however, the scientific literature dealing with the underlying parameters influencing the deterioration models is sparse. The findings of this study shed light on some of these shortcomings, and the findings can be incorporated in future model development.
Generally, deterioration models can be used to give a snapshot of the sewer system and are used when no CCTV-inspection has been made or when the CCTV inspection is outdated. Typically, the deterioration models are based on datasets which have been collected over several years. Therefore, users of deterioration models should be aware that the predictions of the CSs are evaluated on historical data and thus cannot give a fair prediction of future condition states. For example, plastic pipes were rarely used 50 years ago, and plastic pipes older than 50 years have limited representation in the data. Furthermore, the surrounding environment, material quality etc. change over time. Future predictions of CSs are further complicated by variations in the degradation profile of different defect types. Some defect types, such as defects related to pipe connections or installation of the pipes, occur stochastically and do not degrade over time. Other defects, such as surface damage, degrade over time. Surface damage is often seen in concrete pipes due to the presence of hydrogen sulphide, which erodes the surface over time. Hydrogen sulphide is typically formed in pump pipes. Likewise, the degradation profile for defects related to roots in the pipes depends on the surrounding trees and their growth.
The primary contribution of this paper is a comprehensive analysis of the feature importance in sewer deterioration modeling. The paper addresses factors that influence sewer deterioration modeling and acknowledges weak or missing information in the literature, such as handling of biased datasets, the impact of how bad pipes are defined, and the variations in feature importance between utilities.
Deterioration models are usually based on CCTV-inspections performed over several years with a specific purpose in mind. This is problematic due to a selective survival bias in the data whereby the models do not perform as well on noninspected areas as they do on inspected areas. Ideally the datasets should be random in character, but due to economic constraints this is often infeasible. Instead, model developers should avoid utilization of geographically related parameters. Moreover, utilities should include randomness in their strategy for CCTV inspection.
Changing the definition of when a pipe is in bad condition produced large deviations in model performance. However, in the feature analysis it was the same features that contributed to the performance, although more features contributed when both pipes in CS three and four were considered bad than when only pipes in CS four were considered bad. This indicates that it is fair to use an advantageous split between good and bad pipes when making a feature analysis.
Comparison of feature analysis from 33 different utilities showed a relatively high variance in the number of features contributing to the performance, which features contributed, and the performance obtained by the models. These variations were especially high for utilities with fewer than 6000 pipes in bad condition. It is worth noting that the number of bad pipes depends on the definition of bad pipes. When solely considering pipes in CS four as “bad”, the high variations were primarily present for utilities with fewer than 1000–1500 pipes in bad condition.
No feature was considered relevant in more than 69% of the utility specific models; however, when only considering utilities with more than 1000 bad pipes there was a higher consensus on which features were relevant (up to 78%). For these utilities, the features that contributed to the performance most of the time were ground level (78%), age (71%), groundwater level (73%), wastewater type (61%), length (57%), dimension (57%), year of construction (46%), and year of rehabilitation (39%). As there is a high redundancy between year of construction, year of rehabilitation, and age, removing one of these as a possible predictor variable would most likely induce the others to contribute to the performance in more cases. In 26 out of 33 cases the most important feature was related to either age, year of construction, or year of rehabilitation. On average 6.5 features contributed to the utility specific models.
The overall trends in feature importance found in this work showed consensus with the findings in a review by Mohammadi et al. [24]; however, due to variations in study design of the articles reviewed by Mohammadi et al., the two papers are not comparable on a detailed level.
The added value of this paper is a better understanding of the underlying parameters influencing sewer deterioration modeling and knowledge of feature importance when accounting for the statistical variations between utilities. The exact results related to feature importance are specific to the condition measure used in the study; however, the overall trends are comparable to findings in the literature and can be used to assist the feature selection for sewer deterioration modeling. This is important because feature extraction is a labor-intensive process.