Automating Visual Blockage Classification of Culverts with Deep Learning

Blockage of culverts by transported debris is reported as a main contributor to urban flash floods. Conventional modelling approaches have had little success in addressing the problem, largely because of the unavailability of hydraulic data from peak floods and the highly non-linear behaviour of debris at culverts. This article explores a new dimension of the issue by proposing the use of Intelligent Video Analytic (IVA) algorithms for extracting blockage-related information. The potential of existing Convolutional Neural Network (CNN) algorithms (i.e., DarkNet53, DenseNet121, InceptionResNetV2, InceptionV3, MobileNet, ResNet50, VGG16, EfficientNetB3, NASNet) to predict blockage in a given image is investigated over a custom-collected blockage dataset (i.e., Images of Culvert Openings and Blockage (ICOB)). Models were evaluated on their test-set performance (i.e., accuracy, loss, precision, recall, F1-score, Jaccard-Index), Floating Point Operations (FLOPs), and response time to process a single test instance. From the results, NASNet performed best in classifying blockage, with an accuracy of 85%; however, EfficientNetB3 was recommended for hardware implementation because of its better response time and an accuracy comparable to that of NASNet (i.e., 83%). False Negative (FN) instances, False Positive (FP) instances, and CNN layer activations suggested that background noise and oversimplified labelling criteria were two factors contributing to the degraded performance of existing CNN algorithms.

Project 11: Blockage of Hydraulic Structures [31] was initiated under the Australian Rainfall and Runoff (ARR) [3] framework to study the blockage behaviour and design considerations of hydraulic structures. Under this project, Wollongong City Council (WCC) proposed guidelines for considering hydraulic blockage in the design of hydraulic structures [12,13,18,19,21,31]. However, because relevant supporting data from peak flooding events were unavailable, the proposed guidelines were not adaptive and were based on post-flood visual assessments, which many researchers believe do not correctly represent blockage during peak flooding events [11,12,13]. The guidelines suggested that any culvert with an opening diagonal of 6 m or more is not prone to blockage. However, this claim was supported only by post-flood visual assessments and was not considered economically efficient to implement.
Initially, blockage was defined as the percentage occlusion of the hydraulic structure opening; however, many have argued that hydraulic blockage and visual blockage are two separate concepts. Hydraulic blockage is more complex and has no established relationship with visual blockage. It is associated with the interaction of debris with the culvert and the corresponding effect on fluid dynamics around the culvert; because of the highly non-linear and uncertain behaviour of debris, it is difficult to model and predict using conventional means. From a management and maintenance perspective, multi-dimensional information (i.e., visual blockage status, type of debris material, percentage of blocked openings) extracted using computer vision algorithms may prove helpful in making timely decisions, as suggested in the literature [2,17]. This paper addresses the problem from a different perspective and proposes using visual information extracted through automated analysis for better management of blockage at cross-drainage hydraulic structures. It investigates the potential of CNN algorithms for classifying culvert images as "clear" or "blocked". Existing CNN models (i.e., DarkNet53 [20], DenseNet121 [16], InceptionResNetV2 [25], InceptionV3 [26], MobileNet [15], ResNet50 [14], VGG16 [24], EfficientNetB3 [27], NASNet [32]) pre-trained on ImageNet were adapted to the culvert blockage classification task through transfer learning, and their performance was compared using standard evaluation measures.

ICOB DATASET
The dataset used for this investigation is referred to as "Images of Culvert Openings and Blockage (ICOB)" and consists of images of real culverts collected before and after flooding events. The main sources of images were WCC historical records, online records, and custom-captured images of local culverts. WCC records were screened using a Microsoft Access based application to filter for culvert images with visible openings. The final dataset included 929 culvert images, covering both blocked and clear culverts. The images vary widely from one another (intra-class variation) in terms of culvert type, blockage accumulation, presence of debris materials, illumination conditions, viewpoint, scale, resolution, and background. This high level of diversity within a relatively small dataset makes it challenging for visual analytics, even for a binary classification problem.
The ICOB dataset was manually labelled for binary classification of a given culvert image as "clear" or "blocked". Deciding whether a culvert is visually blocked or clear is not straightforward and may require detailed criteria defined in collaboration with flood management officers; for this article, however, a simple occlusion-based criterion was used. The following subjective annotation criteria were used for labelling ICOB.
• If all of the culvert openings are visible, classify it as "clear".
• If any of the culvert openings is visually occluded by debris material or a foreground object (e.g., debris control structure, vegetation, tree), classify it as "blocked".
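The occlusion-based rule above can be stated compactly in code. The following is a minimal Python sketch only; it is not the procedure used to annotate ICOB, which was labelled manually. The function name `label_culvert` and the per-opening flags are hypothetical and are assumed to be supplied by a human annotator.

```python
from typing import Sequence


def label_culvert(opening_occluded: Sequence[bool]) -> str:
    """Apply the simple occlusion-based labelling rule to one image.

    opening_occluded holds one flag per culvert opening (hypothetical input
    supplied by an annotator): True if the opening is visually occluded by
    debris or a foreground object, False if it is fully visible.
    """
    if len(opening_occluded) == 0:
        raise ValueError("at least one culvert opening is required")
    # Any occluded opening makes the image "blocked"; otherwise it is "clear".
    return "blocked" if any(opening_occluded) else "clear"


print(label_culvert([False, False]))  # clear
print(label_culvert([False, True]))   # blocked
```

Because a single occluded opening is enough to label an image "blocked", the rule ignores how much of the culvert is obstructed, which is one way to see why the criterion is described as simple.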
In total, there were 487 images in the "clear" class and 442 images in the "blocked" class. Figure 1 shows sample instances from each class of ICOB.
Performance of the models was measured in terms of test accuracy, test loss, precision, recall, F1 score, Jaccard-Index, and processing time. In addition, confusion matrices were plotted to assess the Type I and Type II errors. Type I (False Positive (FP)) and Type II (False Negative (FN)) errors [4] are commonly used terms in machine learning, and the main goal of a model is usually to minimize one of the two, depending on which is more critical for the given task. By definition, a Type I error is concluding that a relationship exists when in fact it does not (e.g., classifying an image as "blocked" when there is no blockage). Similarly, a Type II error is rejecting the existence of a relationship when in fact it exists (e.g., classifying an image as "clear" when there is blockage). In the culvert blockage context, the Type II error is the more critical one to minimize, because being notified of a blockage that does not exist is tolerable compared with being notified that a culvert is clear when it is blocked. A Type II error may result in damage because it may leave the response team too little time to clear the blockage before the flow is diverted.
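The evaluation measures listed above can all be derived from the four confusion-matrix counts. The following is a minimal pure-Python sketch, not the evaluation code used in this study. It treats "blocked" as the positive class and, as an assumption, normalises the Type I and Type II error rates by the number of truly clear and truly blocked images respectively. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="blocked"):
    """Return (TP, FP, FN, TN) with `positive` as the class of interest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive="blocked"):
    """Compute the classification measures from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "jaccard": _ratio(tp, tp + fp + fn),
        # Type I: clear images flagged as blocked (assumed per-class rate).
        "type_i_rate": _ratio(fp, fp + tn),
        # Type II: blocked images reported as clear (assumed per-class rate).
        "type_ii_rate": _ratio(fn, fn + tp),
    }


# Toy labels for illustration only; these are not ICOB results.
y_true = ["blocked"] * 4 + ["clear"] * 4
y_pred = ["blocked", "blocked", "blocked", "clear",
          "blocked", "clear", "clear", "clear"]
for name, value in evaluate(y_true, y_pred).items():
    print(f"{name}: {value:.2f}")
```

Under this convention the Type II error rate equals one minus recall for the "blocked" class, so minimizing Type II errors is equivalent to maximizing recall on blocked culverts.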

RESULTS AND DISCUSSIONS
The implemented CNN models were evaluated using the measures defined in Section 3, and the results were compared. Table 1 presents the empirical results of all implemented models on the test dataset in terms of accuracy, loss, precision, recall, F1 score, and Jaccard-Index. From the results, NASNet performed best, with an F1 score of 0.85. EfficientNetB3 and InceptionResNetV2 were second best, with nearly identical performance (F1 score of 0.83). DarkNet53 performed worst, with an F1 score of 0.71. Figure 2 shows the confusion matrices of the implemented CNN models, from which the Type I and Type II errors can be observed. NASNet performed best in terms of Type II error, at only 8%; however, its Type I error was 21%. On the other hand, EfficientNetB3 was reported with almost similar
The implemented CNN models were also compared in terms of processing time to investigate their relative response times. The purpose of this analysis was to assess how readily the proposed models could be implemented in hardware for real-world applications. Model inference time and image processing time were calculated as two measures for comparing the models. Three images of different sizes were used: image 1 of 2048 × 1536, image 2 of 3264 × 2448, and image 3 of 4032 × 3024. From Table 2, it can be observed that MobileNet and DarkNet53 were the fastest models but also the least accurate. NASNet was the slowest but the most accurate. As a trade-off, EfficientNetB3 was relatively fast with accuracy towards the higher end, and it is recommended as a suitable choice for on-board processing. It is important to note that the reported processing times are intended for relative comparison between models and are not a measure of cutting-edge hardware performance. However, given the availability of efficient computing hardware such as the Nvidia Jetson TX2 [9] and Nvidia Jetson Nano [7], it is highly feasible to deploy any of the implemented models for real-world applications (e.g., pedestrian detection [5], wildlife tracking [1]).
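A relative comparison of response times such as the one described above can be obtained with a simple timing harness. The following is a minimal sketch under stated assumptions, not the measurement code used in this study: `mean_latency` and `placeholder_model` are hypothetical names, and the placeholder merely does work proportional to the pixel count in place of a real CNN forward pass. Only the three image sizes are taken from the text.

```python
import time
from statistics import mean


def mean_latency(fn, arg, repeats=5, warmup=1):
    """Average wall-clock seconds of fn(arg) over `repeats` timed calls."""
    for _ in range(warmup):
        fn(arg)  # untimed warm-up calls, so one-off setup cost is excluded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def placeholder_model(image_size):
    """Stand-in for a CNN forward pass; work grows with the pixel count."""
    width, height = image_size
    return sum(range(width * height // 1000)) % 2


# Image sizes from the text; everything else here is illustrative.
image_sizes = {
    "image 1": (2048, 1536),
    "image 2": (3264, 2448),
    "image 3": (4032, 3024),
}
timings = {
    name: mean_latency(placeholder_model, size)
    for name, size in image_sizes.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

Averaging over several repeats after a warm-up call reduces the influence of one-off costs such as model loading or cache population, which matters when the goal is a relative ranking of models and not absolute hardware performance.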

CONCLUSION AND FUTURE DIRECTIONS
The idea of using visual analytics for culvert blockage analysis has been successfully demonstrated by implementing existing CNN models for culvert blockage classification. The Images of Culvert Openings and Blockage (ICOB) dataset has been developed with a diversity of clear and blocked culvert instances for training the CNN models. From the analysis, it has been observed that NASNet performed best among all models in terms of classification performance but was the slowest in the relative comparison of processing times. Based on classification performance and processing time, EfficientNetB3 was the model recommended for deployment in real-world applications. From the FP and FN instances, background noise and oversimplified labelling criteria were identified as potential factors in the degraded performance. A visual-attention-based algorithm and/or a detection-classification pipeline are concepts that could be implemented to address the background noise problem. Furthermore, enhancing the dataset with images from scaled physical models and computer-generated synthetic images is a potential future direction.
ACKNOWLEDGMENT
I would like to thank WCC for providing resources and support to carry out this study. Furthermore, I would like to thank the University of Wollongong (UOW) and the Higher Education Commission (HEC) of Pakistan for funding my PhD studies.