1. Introduction
Globally, flood damage caused by heavy rainfall has been increasing significantly due to climate change [1,2,3,4]. Areas such as semi-basement residential zones, narrow alleyways, underpasses, and lowlands with inadequate drainage infrastructure are particularly vulnerable to flooding. These areas are often inhabited by economically vulnerable populations, so flood events lead to substantial financial losses and challenging recovery efforts [5,6,7]. These residents are exposed to greater risks, as they face relative difficulty in emergency evacuation and have limited capacity to respond swiftly during flood events [8,9].
In 2022, unusually heavy rainfall in Seoul, South Korea, flooded semi-basement residential areas, resulting in the deaths of residents who were unable to evacuate in time. In 2023, the flooding of an underground roadway in Osong, South Korea, likewise caused numerous casualties. In 2024, flash flooding in Valencia, Spain, led to multiple fatalities, including deaths reported in narrow alleyways. These cases indicate that the risks of heavy rainfall and flooding induced by climate change are steadily increasing on a global scale, and that existing disaster prevention systems are insufficient to effectively prevent or respond to all flood-related damage [10,11].
To effectively respond to these emerging risks, it is critical to develop detection methods tailored specifically to flood-vulnerable urban areas. However, research focusing on these areas remains limited. Thus far, research on deep learning-based flood detection and prediction has primarily focused on data from areas near rivers or large-scale inundation zones [12,13]. Moreover, most existing studies have relied on data collected from a distance or on aerial data obtained through unmanned aerial vehicles (UAVs) [14,15,16]. While such approaches may be effective in detecting large-scale river flooding or widespread inundation, they have limitations in accurately identifying localized flood-vulnerable areas within urban environments, such as narrow alleyways or semi-basement residential zones [16,17]. Although some studies have utilized ground-level imagery to detect floods, these images were not specifically tailored to economically and flood-vulnerable areas such as narrow alleyways or semi-basement residences, making it difficult to detect floods effectively in such settings [17,18]. In particular, while UAV-based data can provide high-resolution imagery, it is often difficult to achieve immediate detection and response in rapidly evolving flood situations [19]. Therefore, there is a growing need for data collection and analysis specifically focused on flood-vulnerable areas.
Although damage in urban flood-vulnerable areas is increasing, previous studies have not focused sufficiently on promptly detecting floods in these areas to prevent casualties and property damage. Most studies have relied primarily on remote imagery, and even those using ground-level imagery have rarely targeted economically and flood-vulnerable urban areas. The development of customized flood detection and response systems is therefore essential. To address this gap, this study constructs a specialized ground-level image dataset, AlleyFloodNet, designed to accurately detect flood conditions and support alerts in flood-vulnerable areas such as narrow alleyways, lowlands, and semi-basement residential spaces. Unlike previous studies, this research specifically focuses on ground-level image analysis tailored to these environments. The primary objective of this study is to overcome the limitations of prior research and enable practical responses in real-world disaster situations by fine-tuning various deep learning algorithms to this domain using the AlleyFloodNet dataset.
In this study, various deep learning-based image classification models are fine-tuned on the constructed dataset, whose images were collected from Google and YouTube, adapting each model to accurately classify ground-level imagery from flood-vulnerable urban areas. The performance of the models is evaluated using metrics such as accuracy, precision, recall, and F1 score. Additionally, to further validate the effectiveness and necessity of AlleyFloodNet, this study also utilizes another publicly available ground-level flood classification dataset obtained from Kaggle [20]. Models fine-tuned using this general ground-level imagery dataset—which is not specifically tailored to economically and flood-vulnerable areas—are evaluated on AlleyFloodNet’s test set to emphasize the need for specialized datasets. Misclassifications are also explored to better understand challenging conditions in flood-vulnerable areas and to identify specific visual features that cause detection errors, which can guide future dataset improvements and algorithm refinement. Through these experiments, this study aims to establish the effectiveness and necessity of AlleyFloodNet, ultimately contributing to the development of reliable, real-time flood detection and alert systems for protecting lives and property in economically and flood-vulnerable urban areas.
The rest of this paper is structured as follows. Section 2 (Related Works) systematically analyzes existing flood detection studies from two aspects: dataset construction methods and detection techniques. Section 3 (Materials and Methods) describes the design purpose and criteria of the AlleyFloodNet dataset, data preprocessing methods, and the architectures of the deep learning models used in this study. Section 4 (Experimental Evaluation) details the experimental environment, hyperparameter configurations, performance evaluation methods, the comparative experiment with a general ground-level image dataset, and the misclassification analysis method. Section 5 (Results) presents quantitative evaluation results of the models trained on AlleyFloodNet, comparison results with another dataset, and results of the misclassification analysis. Section 6 (Discussion) interprets the main findings of this research, including the superior performance of AlleyFloodNet and the necessity of viewpoint-specific datasets, and discusses the advantages and significance of AlleyFloodNet, its limitations and ways to address them, and directions for future work. Finally, Section 7 (Conclusions) summarizes the key findings, clarifies the significance and limitations of this study, and suggests directions for future research. The overall workflow of this study is summarized in Figure 1.
2. Related Works
2.1. Dataset-Based Research
The construction of high-quality datasets for flood detection is a prerequisite for reliable analysis, and many researchers have attempted to overcome the limitations of existing flood datasets. For example, Rahnemoonfar et al. pointed out that traditional flood datasets mainly rely on low-resolution satellite imagery with infrequent updates, making rapid damage assessment difficult [16]. To address this, they utilized high-resolution UAV imagery. As a result, they developed a dedicated dataset named FloodNet, which includes detailed imagery of otherwise inaccessible regions and provides pixel-level annotations to distinguish flooded from non-flooded areas, enabling rapid and fine-grained scene analysis. Manaf et al. created a large-scale flood image dataset by integrating multiple benchmark datasets and collecting additional images from the web [21]. Their experiments showed that lightweight CNN models such as MobileNet and Xception outperformed ResNet-50, VGG-16, and Inception-v3, achieving up to 98% accuracy and a 92% F1 score. Karanjit et al. developed FloodIMG, a flood image database collected from various sources including Google searches, social media, traffic cameras, and the USGS [22]. Some studies also collected specialized image data, such as UAV aerial images or urban underpass scenes, to be used as training data for model development [23,24,25,26]. However, most existing datasets focus on large-scale river flooding or general urban environments, lacking specialized data for vulnerable urban areas such as alleyways and semi-basement residences. The AlleyFloodNet dataset proposed in this study aims to address that gap.
2.2. Deep Learning-Based Detection Techniques and Methodologies
A range of deep learning models have been proposed for flood detection, each employing different architectures and techniques depending on the image type and application context. Munawar et al. developed a UAV-based flood detection system by first applying Haar cascade classifiers to identify buildings and roads, and then using these features to train a deep learning model for classifying flood presence [23]. Stateczny et al. proposed a hybrid deep learning model that combines CNN and ResNet architectures. Prior to model training, they applied image preprocessing techniques including K-means clustering and vegetation index calculation to enhance feature quality. Model performance was further improved by optimizing weights through the CHHSSO metaheuristic algorithm [24]. In another study, Munawar et al. extracted 2150 image patches from UAV imagery taken before and after flooding events, and trained a CNN to recognize spatial flood patterns for automated flood assessment [25].
From a model architecture perspective, recent studies have implemented specialized deep learning techniques to enhance flood detection in complex environments, focusing on segmentation accuracy, object-level understanding, and robustness under poor visual conditions. Yoo et al. constructed a U-Net–based segmentation model tailored for identifying flood regions in urban underpasses. The model achieved high accuracy in real-world conditions, demonstrating the effectiveness of fully convolutional architectures in dense urban environments [26]. Zhong et al. implemented an object detection–based flood recognition approach using YOLOv4. By targeting partially submerged features such as vehicle exhaust pipes and pedestrians’ legs, the model inferred inundation status without requiring additional sensors [27]. Vo et al. applied a deep learning–based visual recognition model to detect structural cues of flood vulnerability, such as semi-basement windows or entrances, from urban building facades [28]. Witherow et al. developed a preprocessing-enhanced CNN pipeline that addressed visual noise such as poor lighting, reflection, and occlusions. Their system used edge detection, inpainting, and vehicle removal via R-CNN to extract flood regions with higher accuracy [29]. Zeng et al. enhanced semantic segmentation performance in low-light video data by integrating SRGAN-generated super-resolution images into a DeepLabv3+ framework. This led to superior accuracy in CCTV-based flood detection compared to standard models [30].
This study does not merely propose a new dataset, but establishes AlleyFloodNet as a platform for adapting and revalidating deep learning models in the context of structurally complex and socially vulnerable urban environments. Through fine-tuning high-performing architectures such as ConvNeXt, Vision Transformer, and others, the study demonstrates how existing deep learning methods can be extended to address visual ambiguities, constrained spaces, and non-standard perspectives typical of urban flood-vulnerable areas. AlleyFloodNet thus serves not only as a domain-specific benchmark dataset, but also as a robust experimental framework for advancing flood detection technologies that are resilient to the complexities of real-world urban flooding scenarios.
3. Materials and Methods
3.1. Data Development and Preprocessing
In this study, we developed an original dataset named AlleyFloodNet, composed of flood and non-flood images, to identify the occurrence of flooding using a binary classification approach. Considering that existing datasets mainly consist of either aerial images or general ground-level images not specifically captured in economically vulnerable urban environments, we collected ground-level photographs specifically targeting narrow alleyways, semi-basement residences, and lowlands, areas particularly prone to severe flood damage yet underrepresented in current datasets. The dataset consists of images collected from economically and flood-vulnerable urban areas around the world, including South Korea, Japan, China, Vietnam, the United States, Spain, Italy, and India. These images were captured at close range and from low camera angles via CCTV and smartphones, and were obtained from publicly accessible platforms such as Google and YouTube.
AlleyFloodNet was developed by taking into account the various types of floods that occur in flood-vulnerable areas. Floods can take different forms, such as rapidly flowing and rising water or stagnant and gradually rising water. Depending on the form of the flood, the color of the water can also vary, appearing brown due to mixed soil or remaining relatively clear. Furthermore, the visual characteristics of the water change with the time of day as the amount of light varies, and floods tend to cause more damage under lower lighting conditions. AlleyFloodNet was therefore designed so that models trained on it can classify flood situations effectively across this wide range of conditions, including flood type, water color, and visual changes over time.
AlleyFloodNet was developed by collecting not only photographs of flooding situations taken in alleys and lowlands, but also images captured under non-flooding conditions, to perform binary classification of flooding versus non-flooding. Compared to existing ground-level datasets—which are typically general in nature and include images from various distances and angles—AlleyFloodNet explicitly focuses on economically vulnerable urban areas and contains images captured from close proximity at lower angles, enabling precise detection of flooding in highly localized urban environments such as narrow alleys and semi-basement residences. This allows the deep learning models to quickly detect situations in which water rapidly accumulates, and to accurately assess the actual risk of flooding rather than simply determining the presence of rainfall. Moreover, both flooding and non-flooding data include various objects such as people and vehicles, allowing the model to be trained effectively under diverse environmental conditions. Based on this design, AlleyFloodNet is structured to support effective model training in various flood-prone areas and is expected to contribute to flooding detection in vulnerable regions.
A total of 1110 images were collected for use in this study. The size of AlleyFloodNet was determined with reference to the training set size (637 images) of FloodNet Track1, a well-known dataset for flooding and non-flooding classification. All images were resized to 224 × 224 pixels and normalized using the ImageNet mean and standard deviation. The entire dataset was evenly divided across classes into a training set and a test set at a ratio of 8:2. Additionally, the training set was further split into a training and validation set at a ratio of 8:2, maintaining class balance. As a result, the dataset was divided as follows: the training set consisted of 468 flooding and 380 non-flooding images; the test set included 133 flooding and 129 non-flooding images; and the validation set contained 79 flooding and 90 non-flooding images.
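For illustration, the preprocessing and splitting steps above can be expressed as a minimal PyTorch sketch. The directory layout (`AlleyFloodNet/flood`, `AlleyFloodNet/non_flood`) and the random seed are hypothetical, and a plain `random_split`, unlike the class-balanced split used in this study, does not guarantee stratification:

```python
import torch
from torchvision import datasets, transforms

# Resize to 224 x 224 and normalize with the ImageNet statistics given above.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: AlleyFloodNet/{flood,non_flood}/*.jpg
full_set = datasets.ImageFolder("AlleyFloodNet", transform=preprocess)

# 8:2 train/test split, then 8:2 train/validation split of the training portion.
gen = torch.Generator().manual_seed(42)
n_test = int(0.2 * len(full_set))
train_val, test_set = torch.utils.data.random_split(
    full_set, [len(full_set) - n_test, n_test], generator=gen)
n_val = int(0.2 * len(train_val))
train_set, val_set = torch.utils.data.random_split(
    train_val, [len(train_val) - n_val, n_val], generator=gen)
```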
Figure 2 shows example images from the flood class, while Figure 3 presents example images from the non-flood class in the AlleyFloodNet dataset. In this study, ground-level images retrieved by searching for ‘flood’ on Google and YouTube were manually reviewed and labeled. Specifically, images were classified as ‘flood’ if major structures, roads, vehicles, or people were clearly submerged or structural elements were visibly inundated by water. Conversely, images showing merely wet ground surfaces or heavy rainfall without noticeable water accumulation or visible inundation were classified as ‘non-flood’. Through this classification, the AlleyFloodNet dataset facilitates deep learning models to effectively distinguish between flood and non-flood images, thus providing a robust training foundation for developing real-time flood detection and alert systems.
3.2. Deep Learning Models
In this study, several deep learning models were employed to effectively detect flooding occurrences. Deep learning models, particularly Convolutional Neural Networks (CNNs) and Transformer-based architectures, are known to achieve outstanding performance in complex pattern recognition and image classification tasks. However, training these models from scratch can result in overfitting when sufficient data are not available, and requires substantial computational resources [31,32]. Therefore, this study adopted a fine-tuning approach, using pre-trained weights from models already trained on large-scale image datasets such as ImageNet. This approach improves the generalization capability of the models and allows them to effectively capture the specific flooding patterns of economically vulnerable urban areas represented in AlleyFloodNet. The models utilized in this study include widely recognized CNN-based architectures such as AlexNet, VGG-19, ResNet-50, and DenseNet-121, as well as more recent architectures such as Vision Transformer (ViT) and ConvNeXt.
Ground-level images collected from flood-vulnerable areas were clearly labeled into two classes: flooded and non-flooded. Subsequently, these labeled images underwent preprocessing, including resizing and normalization, to match the input specifications of the selected deep learning models. Then, several well-known deep learning models were fine-tuned using this dataset, enabling each model to learn distinct visual patterns for distinguishing flooded from typical urban images.
3.2.1. AlexNet
AlexNet is a deep convolutional neural network that achieved groundbreaking results in the 2012 ImageNet competition. It consists of five convolutional layers and three fully connected layers, using ReLU activation to speed up training and Dropout to reduce overfitting. The model also introduced Local Response Normalization and made effective use of GPU parallelism to accelerate computation. While it played a crucial role in advancing CNN research, it demands more memory and computational power than more recent architectures like VGGNet and ResNet [33].
3.2.2. VGG-19
VGG-19, part of the VGGNet family, is a deep CNN with 19 layers that stood out in the 2014 ILSVRC. It uses multiple small 3 × 3 convolutional filters stacked together, allowing for greater depth and non-linearity while maintaining computational efficiency. Max pooling reduces spatial dimensions, and two fully connected layers with 4096 neurons handle classification. Though simple in design, VGG-19 offers strong performance, but its depth increases computational demands and can lead to vanishing gradients, sometimes addressed with batch normalization [34].
3.2.3. ResNet-50
ResNet-50 is a deep neural network with 50 layers that introduced residual learning to tackle the vanishing gradient problem in very deep models. Its key innovation is the use of skip connections, which allow feature information to bypass layers, enabling more stable and efficient training. The network uses a 1 × 1, 3 × 3, 1 × 1 bottleneck block structure to reduce computation while preserving accuracy. Thanks to this design, ResNet-50 achieves strong performance on large-scale datasets like ImageNet and has become a widely used model for tasks like object detection and image segmentation due to its robustness and generalization capabilities [35].
3.2.4. DenseNet-121
DenseNet-121 is a deep CNN with 121 layers that introduces dense connectivity, where each layer receives input from all previous layers. This design enhances feature reuse, improves gradient flow, and reduces overfitting. It uses 1 × 1 bottleneck layers and global average pooling to minimize parameters and computational cost. Despite its depth, DenseNet-121 is memory-efficient and achieves strong generalization, especially with smaller datasets. It performs comparably or better than models like VGGNet and ResNet while using significantly fewer parameters, making it well-suited for tasks like medical imaging and object recognition [36].
3.2.5. ViT (Vision Transformer)
Vision Transformer (ViT) is an image classification model that replaces traditional CNNs with a Transformer-based architecture. It splits images into fixed-size patches, embeds them, and processes them using a Transformer encoder with self-attention, enabling better capture of long-range dependencies. ViT shows strong performance on large datasets but requires substantial data and computational resources. On smaller datasets, it may underperform compared to CNNs, though this can be mitigated with transfer learning. Recently, hybrid models combining CNNs and Transformers have been developed to address these challenges [37].
3.2.6. ConvNeXt-Large
ConvNeXt is a modern CNN architecture built on ResNet and enhanced with design ideas from Transformer models like Swin Transformer. It incorporates advanced techniques such as layer normalization, large 7 × 7 convolution kernels, and depthwise separable convolutions to improve efficiency and performance. With a deeper architecture and updated normalization strategies, ConvNeXt matches or exceeds the performance of Transformer-based models in tasks like classification, detection, and segmentation. It challenges the notion that CNNs are outdated, proving they remain powerful and competitive in modern deep learning [38].
4. Experimental Evaluation
4.1. Experimental Setup
In this study, experiments were conducted using PyTorch 2.1.0+cu118 in a Google Colab environment equipped with an NVIDIA A100 GPU to evaluate the performance of image classification models. Input images were resized to 224 × 224 pixels and normalized using the mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225) of the ImageNet dataset to enhance model generalization. The data were loaded in batches of size 32 for training, validation, and testing.
All deep learning models, including the ConvNeXt-Large architecture, were imported directly from the torchvision.models module provided by PyTorch. The pre-trained weights from the ImageNet dataset were utilized, and modifications were applied specifically to the final output layers of these models to match the binary classification task (flood vs. non-flood).
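As an illustrative sketch of this step, the snippet below loads ImageNet-pretrained backbones from `torchvision.models` and swaps the final layer for a single-logit head, which pairs with the BCEWithLogitsLoss setup described next. The layer paths follow torchvision's model definitions; the specific weight enums and the ViT variant (`vit_b_16`) are our assumptions:

```python
import torch.nn as nn
from torchvision import models

def build_model(name: str) -> nn.Module:
    """Load an ImageNet-pretrained backbone and replace its final
    classification layer with a single logit (flood vs. non-flood)."""
    if name == "convnext_large":
        m = models.convnext_large(weights=models.ConvNeXt_Large_Weights.IMAGENET1K_V1)
        m.classifier[2] = nn.Linear(m.classifier[2].in_features, 1)
    elif name == "resnet50":
        m = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        m.fc = nn.Linear(m.fc.in_features, 1)
    elif name == "densenet121":
        m = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        m.classifier = nn.Linear(m.classifier.in_features, 1)
    elif name == "vgg19":
        m = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, 1)
    elif name == "alexnet":
        m = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, 1)
    elif name == "vit_b_16":  # assumed ViT variant; the paper says only "ViT"
        m = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        m.heads.head = nn.Linear(m.heads.head.in_features, 1)
    else:
        raise ValueError(f"unknown model: {name}")
    return m
```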
During training, BCEWithLogitsLoss was employed as the loss function, and the AdamW optimizer was adopted to effectively update model parameters. We utilized an early-stopping mechanism based on validation loss to prevent overfitting and to save computational resources. Specifically, training was set to stop if no improvement in validation loss was observed for four consecutive epochs. Additionally, the average training time per epoch and inference time per single image were measured by recording execution times using Python’s built-in time module.
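A minimal sketch of this training loop, assuming the `build_model` helper above and standard DataLoaders; the learning rate is an assumed value, as it is not specified here:

```python
import time
import torch

def fine_tune(model, train_loader, val_loader, device, patience=4, max_epochs=50):
    """Fine-tune with BCEWithLogitsLoss and AdamW; stop early once the
    validation loss fails to improve for `patience` consecutive epochs."""
    model.to(device)
    criterion = torch.nn.BCEWithLogitsLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed lr
    best_val, stale_epochs = float("inf"), 0

    for epoch in range(max_epochs):
        model.train()
        start = time.time()
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)  # (B, 1) for BCE
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        epoch_time = time.time() - start  # averaged across epochs for Table 2

        # Validation loss drives checkpointing and early stopping.
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images = images.to(device)
                labels = labels.float().unsqueeze(1).to(device)
                val_loss += criterion(model(images), labels).item() * len(images)
                n += len(images)
        val_loss /= n
        print(f"epoch {epoch}: {epoch_time:.2f}s, val loss {val_loss:.4f}")

        if val_loss < best_val:
            best_val, stale_epochs = val_loss, 0
            torch.save(model.state_dict(), "best_model.pt")  # keep best weights
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # early stopping after 4 epochs without improvement
    return model
```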
The training process was monitored in real-time using the tqdm library, which provided continuous feedback on epoch-wise training progress and performance metrics. After completing training, inference time was measured using a single test image to evaluate the model’s applicability for real-time flood detection systems. Additionally, the best-performing model based on validation loss was automatically saved using PyTorch’s built-in torch.save function, ensuring reproducibility and facilitating future reuse. Finally, evaluation metrics such as accuracy, precision, recall, and F1 score were computed using functions provided by the scikit-learn library.
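The metric computation can be sketched as follows, assuming the single-logit models above; the 0.5 decision threshold is our assumption:

```python
import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

@torch.no_grad()
def evaluate(model, test_loader, device):
    """Compute the four metrics reported in Section 5 for a single-logit model."""
    model.eval().to(device)
    y_true, y_pred = [], []
    for images, labels in test_loader:
        logits = model(images.to(device)).squeeze(1)
        preds = (torch.sigmoid(logits) > 0.5).long().cpu()  # assumed threshold
        y_pred.extend(preds.tolist())
        y_true.extend(labels.tolist())
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```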
4.2. Evaluation Metrics
To comprehensively evaluate model performance, accuracy, recall, precision, and F1 score were employed as the primary evaluation metrics. Accuracy represents the proportion of correctly classified samples out of the total samples and serves as a useful indicator of overall model performance. However, in the presence of class imbalance, accuracy alone may be insufficient for performance assessment; thus, additional metrics were analyzed in conjunction. Recall indicates the proportion of actual positive samples that are correctly predicted and is particularly important in domains such as medical diagnosis or anomaly detection, where false negatives can have critical consequences. In contrast, precision measures the proportion of predicted positive samples that are truly positive and is particularly relevant in tasks such as financial fraud detection or spam filtering, where reducing false positives is crucial. Lastly, the F1 score considers the balance between precision and recall by computing their harmonic mean, enabling an unbiased evaluation that does not favor either metric. In this study, these evaluation metrics were collectively analyzed to compare the predictive performance of the models and to identify the most optimal model.
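For reference, letting TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, these metrics are defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$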
4.3. Average Fine-Tuning Time per Epoch and Single Image Inference Time
To compare the training speed and real-time detection capability of each model, the average training time per epoch and single image inference time were measured. The average fine-tuning time per epoch was measured by recording the fine-tuning time for each epoch using the AlleyFloodNet dataset and calculating the mean. This enabled analysis of the differences in training speed across models. Single image inference time was calculated by averaging the prediction times for 224 × 224-sized images from the test set after model training was completed. This was used to evaluate the extent to which each model can process data in real-time within a flood detection system.
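A sketch of the single-image timing measurement, assuming a trained model and the test set above. The warm-up pass and CUDA synchronization are our additions, since timing a GPU forward pass with Python's time module is only meaningful after synchronization:

```python
import time
import torch

@torch.no_grad()
def time_single_inference(model, test_set, device, n_runs=100):
    """Average forward-pass latency for one 224 x 224 test image."""
    model.eval().to(device)
    image, _ = test_set[0]
    batch = image.unsqueeze(0).to(device)  # add batch dim -> (1, 3, 224, 224)

    model(batch)  # warm-up so one-off CUDA initialization is not timed
    if device.type == "cuda":
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(n_runs):
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / n_runs
```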
4.4. Comparative Analysis of Flood Datasets and Misclassification Patterns
To further validate the effectiveness, practical utility, and uniqueness of the AlleyFloodNet dataset, this study conducts comparative experiments using the Flood Classification Dataset [20], a publicly available large-scale ground-level flood image dataset from Kaggle. The Flood Classification Dataset comprises 9296 flood images and 3748 non-flood images, including ground-level, near-field images capturing various flooding scenarios worldwide. However, this dataset primarily consists of ground-level imagery collected from general urban environments, without specific consideration of localized spatial contexts or regional vulnerability characteristics.
In contrast, AlleyFloodNet specifically targets localized flooding scenarios in economically and environmentally vulnerable urban settings, such as narrow alleyways, semi-basement residences, and lowlands. Images in AlleyFloodNet are carefully collected considering the spatial context and unique environmental features associated with urban flooding, such as specific architectural structures, narrow and enclosed spaces, and particular camera angles or viewpoints optimized for urban flood detection. Thus, AlleyFloodNet is distinctly specialized for enhancing detection accuracy and real-world applicability in vulnerable urban flood situations, rather than for generalized flood detection scenarios.
Based on these considerations, this study adopts a cross-dataset experimental design, separating 15% of the Flood Classification Dataset’s training data as a validation set, fine-tuning pre-trained deep learning models using the remaining data, and subsequently evaluating model performance on the AlleyFloodNet dataset. The justification for this experimental design is to empirically assess whether a model trained on ground-level imagery from general urban environments can effectively detect localized flooding events in economically and environmentally vulnerable urban regions. By analyzing how accurately models trained on general urban data can perform on the region-specific AlleyFloodNet dataset, we aim to quantitatively demonstrate the necessity and importance of constructing and utilizing localized, specialized datasets in disaster response and urban flood detection scenarios.
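A condensed sketch of this cross-dataset protocol, reusing the hypothetical `build_model`, `fine_tune`, `evaluate`, and `preprocess` helpers from the Section 4.1 sketches; the dataset paths are illustrative:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical paths; both datasets use the same preprocessing as above.
kaggle_set = datasets.ImageFolder("FloodClassificationDataset", transform=preprocess)
alley_test = datasets.ImageFolder("AlleyFloodNet_test", transform=preprocess)

# Hold out 15% of the Kaggle training data as a validation set.
gen = torch.Generator().manual_seed(42)
n_val = int(0.15 * len(kaggle_set))
kaggle_train, kaggle_val = random_split(
    kaggle_set, [len(kaggle_set) - n_val, n_val], generator=gen)

# Fine-tune on general ground-level imagery only, then evaluate on
# AlleyFloodNet's test set to quantify the domain gap.
model = build_model("convnext_large")
model = fine_tune(model,
                  DataLoader(kaggle_train, batch_size=32, shuffle=True),
                  DataLoader(kaggle_val, batch_size=32),
                  device)
cross_metrics = evaluate(model, DataLoader(alley_test, batch_size=32), device)
```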
5. Results
5.1. Comparison of Model Performance: Accuracy, Precision, Recall, and F1 Score
This study evaluates the performance of six deep learning models—AlexNet, ResNet50, VGG19, Vision Transformer (ViT), DenseNet121, and ConvNeXt-Large—for flood detection using the AlleyFloodNet dataset. The model performances were quantitatively assessed using accuracy, precision, recall, and F1 score, and the detailed results are presented in Table 1.
The results indicate that the ConvNeXt-Large model achieved the highest accuracy (0.9596), recall (0.9767), and F1 score (0.9655). The VGG19 and DenseNet121 models also showed strong performances, both achieving an accuracy of 0.9466 and high precision and recall values. The Vision Transformer (ViT) exhibited competitive performance with an accuracy of 0.9313 and notably high recall (0.9612), though with relatively lower precision (0.9051). Similarly, ResNet50 achieved an accuracy of 0.9275, showing balanced precision (0.9435) and recall (0.9070). AlexNet exhibited comparatively lower overall performance, with an accuracy of 0.9046 and precision of 0.8611, though it maintained high recall (0.9612).
Additionally, training and validation loss/accuracy curves are presented in Figure 4, illustrating rapid convergence and stable validation trends. The confusion matrix of the ConvNeXt-Large model (Figure 5) further demonstrates the model’s robustness, correctly classifying the majority of flood and non-flood images with minimal misclassifications.
5.2. Results of Average Fine-Tuning Time per Epoch and Single Image Inference Time
To evaluate the training speed and real-time applicability of each model, we measured the average fine-tuning time per epoch and the single image inference time. AlexNet and ResNet50 exhibited the fastest training speeds, with average fine-tuning times of approximately 4.73 and 5.58 s per epoch, respectively. In contrast, ConvNeXt-Large required the longest training time at 18.25 s per epoch. Regarding single image inference speed, AlexNet demonstrated the fastest inference at 0.001459 s, followed by ViT with a relatively fast inference time of 0.008279 s. Conversely, DenseNet121 showed the slowest inference speed at 0.100600 s, and ConvNeXt-Large also exhibited a relatively high computational cost with an inference time of 0.012291 s. Thus, although the ConvNeXt-Large model achieved the highest overall performance, it showed limitations in terms of training time and inference speed.
Table 2 compares the average fine-tuning time per epoch and single image inference time.
5.3. Results of Comparative Analysis of Flood Datasets and Misclassification Patterns
Models fine-tuned using AlleyFloodNet generally demonstrated high performance, with ConvNeXt-Large achieving the most outstanding results (accuracy: 0.9596, F1 score: 0.9655). In contrast, the model fine-tuned on the Kaggle Flood Classification Dataset and subsequently evaluated on the AlleyFloodNet test set exhibited a notable decline in performance (accuracy: 0.5611, precision: 0.6750, recall: 0.2093, F1 score: 0.3195). This indicates a significant domain discrepancy between the general urban flood images in the Kaggle dataset and the localized alleyways, semi-basements, and lowland environments specifically targeted by AlleyFloodNet. In other words, general datasets capturing typical urban flood scenes, such as the Kaggle Flood Classification Dataset, may not be sufficient for effectively detecting floods in economically and environmentally vulnerable micro-urban areas. Thus, specialized datasets like AlleyFloodNet are critical for improving detection accuracy in such contexts.
Misclassifications primarily occurred in environments with very low illumination, where even high-performance models struggled to achieve accurate classification due to insufficient visual information. In particular, when dim or distant streetlights illuminated only parts of the scene, the model had difficulty distinguishing between wet surfaces reflecting light and actual flooded areas. Thus, despite the current dataset encompassing diverse environments, its limitation in adequately representing low-light or nighttime flooding scenarios has become apparent. To address these issues, future studies should consider enhancing model robustness through increased collection of nighttime images, data augmentation simulating low-light conditions, or the incorporation of multimodal information such as thermal or infrared imagery.
6. Discussion
In this study, we fine-tuned multiple deep learning models—AlexNet, ResNet50, VGG19, Vision Transformer (ViT), DenseNet121, and ConvNeXt-Large—using the AlleyFloodNet dataset and evaluated their flood detection performance, including training time per epoch and inference speed per image. ConvNeXt-Large demonstrated superior performance across all evaluation metrics, achieving the highest accuracy (0.9596), precision (0.9545), recall (0.9767), and F1 score (0.9655). By assessing both detection accuracy and computational efficiency (training and inference times), this study provides empirical evidence to assist in selecting the most appropriate model for real-time flood detection systems in economically vulnerable urban areas.
Additionally, we conducted comparative experiments using the Kaggle Flood Classification Dataset, another ground-level flood image dataset, to evaluate the unique effectiveness of AlleyFloodNet. The model fine-tuned on the Kaggle dataset and subsequently tested on AlleyFloodNet exhibited significantly reduced performance (accuracy: 0.5611, precision: 0.6750, recall: 0.2093, F1 score: 0.3195). This notable drop in performance underscores the domain discrepancy between general urban flood scenarios in the Kaggle dataset and the localized alleyways, semi-basements, and lowlands targeted by AlleyFloodNet. Thus, it highlights the importance of specialized datasets such as AlleyFloodNet for accurately detecting floods in highly localized urban environments.
Figure 6 visually compares the experimental results obtained using the Kaggle Flood Classification Dataset and AlleyFloodNet, clearly demonstrating the superior performance and effectiveness of AlleyFloodNet in detecting floods within localized urban environments.
Moreover, analysis of misclassified images revealed critical insights into challenging scenarios for flood detection. Most misclassifications occurred in images with ambiguous visual cues—such as partially submerged ground or reflections—and under challenging conditions like low lighting, shadows, or visually unclear boundaries between flooded and non-flooded areas. These findings indicate that future research should specifically address these challenging conditions, potentially through targeted data augmentation techniques or enhanced model architectures designed for robust performance under ambiguous and low-light scenarios.
The practical implications of these findings are significant. Deep learning models fine-tuned using AlleyFloodNet could be integrated into automated flood monitoring systems based on CCTV or smartphone footage. Such systems could continuously monitor economically vulnerable urban areas—including narrow alleyways, semi-basement residences, and lowlands—and rapidly classify images to detect flooding events. Upon detecting flood conditions, these systems could promptly issue location-specific alerts to emergency services and residents, facilitating timely evacuations and targeted disaster responses, thereby enhancing public safety and resilience.
The AlleyFloodNet dataset was constructed by gathering diverse ground-level flood images captured from various global regions through publicly accessible platforms. However, due to practical constraints, collecting data in flood-vulnerable urban areas under diverse weather and rainfall conditions remains challenging. Thus, future research should aim to expand the dataset by incorporating more flood images captured under various rainfall intensities and environmental conditions. This would improve the dataset’s robustness and generalizability, ultimately enhancing the effectiveness of flood detection models in real-world scenarios. Moreover, given the observed differences in training and inference speeds across models, future studies should also investigate lightweight or hybrid model architectures that can deliver high accuracy with reduced computational demands. Such advancements would optimize real-time applicability and enhance the effectiveness of flood detection systems deployed in disaster-prone urban environments.
7. Conclusions
The significance of this study lies in the construction of a specialized, close-range image dataset—AlleyFloodNet—which precisely reflects flood-vulnerable urban areas, including narrow alleyways, semi-basement residences, and lowlands. Unlike conventional satellite- or UAV-based datasets that provide distant-view images, AlleyFloodNet comprises ground-level images collected globally through publicly accessible platforms, specifically designed to effectively capture flood patterns in structurally complex urban environments. The experimental validation conducted with multiple deep learning models demonstrated the dataset’s superior performance in accurately classifying floods in these economically vulnerable urban settings.
Moreover, comparative experiments using the Kaggle Flood Classification Dataset highlighted AlleyFloodNet’s unique strengths. The significant performance gap observed when models fine-tuned on general urban flood imagery were evaluated against AlleyFloodNet underscores the necessity and value of region-specific datasets for effective urban flood detection. Furthermore, analysis of misclassified images provided critical insights into the visual ambiguity and environmental challenges associated with accurate flood detection, emphasizing the need for targeted enhancements such as specialized data augmentation and robust model architectures.
Beyond purely technical advancements, this study holds important societal implications by specifically targeting densely populated areas inhabited by economically vulnerable groups, repeatedly affected by flooding due to climate change. The fine-tuned models utilizing AlleyFloodNet offer a practical foundation for integrating automated real-time flood monitoring systems that use CCTV or smartphone footage to promptly issue location-specific alerts, thereby protecting lives and property.
Future research will focus on expanding AlleyFloodNet by incorporating additional ground-level images captured under diverse rainfall intensities, lighting conditions, and environmental contexts, further enhancing its robustness and generalizability. Additionally, the development of lightweight or hybrid deep learning models, combined with model compression and computational optimization techniques, will be essential to achieving real-time detection capabilities suitable for practical disaster response scenarios. Building upon these findings, subsequent research can develop integrated flood response platforms capable of flood depth prediction, risk visualization, and evacuation route guidance, substantially improving the effectiveness and field applicability of disaster response technologies. Ultimately, these efforts will contribute significantly toward protecting vulnerable populations and enhancing urban resilience against flood disasters intensified by climate change.