Article

DeepSDC: Deep Ensemble Learner for the Classification of Social-Media Flooding Events

1 FAST School of Computing, National University of Computer and Emerging Sciences (FAST-NUCES), Karachi Campus, Karachi 75030, Pakistan
2 Department of Computer Science, University College of Zhob, Balochistan University of IT, Engineering, and Management Sciences, Quetta 85200, Pakistan
3 Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
4 Computer Science Department, Community College, King Saud University, Riyadh 145111, Saudi Arabia
* Authors to whom correspondence should be addressed.
Sustainability 2023, 15(7), 6049; https://doi.org/10.3390/su15076049
Submission received: 27 February 2023 / Revised: 27 March 2023 / Accepted: 28 March 2023 / Published: 31 March 2023

Abstract

Disasters such as earthquakes, droughts, floods, and volcanoes adversely affect human lives and valuable resources. Therefore, various response systems have been designed to mitigate the impact of disasters and facilitate relief activities in their aftermath. These response systems require timely and accurate information about affected areas. In recent years, social media has provided access to high-volume real-time data that can be used for advanced solutions to numerous problems, including disasters. Social-media data combine two modalities (text and associated images), and this information can be used to detect disasters such as floods. This paper proposes an ensemble-learning-based Deep Social Media Data Classification (DeepSDC) approach for social-media flood-event classification. The proposed algorithm uses datasets from Twitter to detect flooding events. DeepSDC uses a two-staged ensemble-learning approach that combines separate models for textual and visual data; these models obtain diverse information from the text and images, and the information is combined through ensemble learning. Additionally, DeepSDC utilizes different augmentation, upsampling, and downsampling techniques to tackle the class-imbalance challenge. The performance of the proposed algorithm is assessed on three publicly available flood-detection datasets. The experimental results show that the proposed DeepSDC produces superior performance compared with several state-of-the-art algorithms. For the three datasets, FRMT, FCSM, and DIRSM, the proposed approach produced F1 scores of 46.52, 92.87, and 92.65, respectively. Mean average precision (MAP@480) scores of 91.29 and 98.94 were obtained on textual data and on the combination of textual and visual data, respectively.

1. Introduction

Over the past 70 years, the number of natural disasters has increased tenfold, and one-third of them are related to flooding events [1,2]. Floods can have a profound effect on the lives of individuals as well as on valuable resources. Additionally, water is a vital substance for life, and floods are one of the primary causes of the wastage of water [1,2]. The European Commission (EC) estimates that EUR 100 billion has been spent on disaster-management activities in the last two decades [3]. Studies have shown that this expenditure could be reduced by 80% if only 20% of the cost were spent on practical disaster-response activities [3]. Therefore, flood warning systems have been introduced to minimize the losses caused by flooding events [4,5,6,7,8,9,10]. The data provided by these systems enable individuals and decision-makers to make informed decisions during flood events. Flood warning systems have four key functions: forecasting flooding events, detecting flooding events, providing timely warnings, and taking appropriate action [11]. Thus, the timely availability and accuracy of data related to a flooding event are critical for the performance of any successful flood-response system.
Over the last two decades, advances in social media have generated a large amount of data, and exploring these data enables effective solutions to many problems, including earthquake detection [12] and flood detection [13,14,15]. In recent years, the use of social-media data in disaster response has gained increasing attention. One example is classifying social-media images and text according to the presence of flooding events [13,14,15]. Several flood-detection algorithms based on social-media textual and visual data have been applied successfully [16,17,18,19,20,21,22,23,24,25].
However, because researchers have focused on a single modality, either text or images, there is still room for improvement. In particular, combining visual and textual information may improve the performance of a flood-detection algorithm. For example, the text of a tweet and its corresponding images can be used jointly for flood classification, and the correlation between the textual and visual data can enhance the predictive ability of the algorithm [13,14,15].
While social-media data can be used to detect flooding events, they also present several challenges that hinder a successful outcome, including insufficient annotated data, class imbalance, and data duplication [13,14,15]. The data available on social media are unstructured and noisy; as a result, removing irrelevant and noisy elements from the data is vital for enhancing the resulting predictions. Another challenge inherent in using social-media data for disaster detection is the scarcity of relevant and targeted data. Data from social media are extracted via keywords, which may produce more irrelevant information than relevant information. For example, almost half of the images in a publicly available flood-related dataset called Disaster Image Retrieval from Social Media (DIRSM) [15] are unrelated to flooding events.
In addition, data duplication is encountered when social-media datasets are used in critical applications such as flood detection. For example, flood-related social-media datasets are gathered during a particular period and may contain the same information posted by different users; thus, multiple copies of a single instance may appear in a dataset. This duplication issue must be resolved before applying a classification algorithm. Moreover, selecting the right class-balancing technique is extremely challenging, especially in flooding-event classification. Additionally, extracting data related to a particular event and assigning manual annotations is time-consuming. Examples of tweets and relevant images from different Twitter-based flood-detection datasets are shown in Figure 1.
Considering the above discussion, this paper proposes the Deep Social Media Data Classification (DeepSDC) approach for flood detection using visual and textual data, which extends our top-ranked methods presented at MediaEval 2020 and 2021 [18,26]. First, images are processed using a combination of the VGG16 [27] and ResNet50 [28] pre-trained networks, and text data are processed using the RoBERTa [29] and XLNet [30] pre-trained models; DeepSDC tackles the class-imbalance problem in both modalities using augmentation, up-sampling, and down-sampling techniques. Second, the obtained information and the predictions from the textual and visual models are combined using an average voting procedure.
The major contributions of the paper are as follows:
  • A two-stage ensemble approach is proposed which combines the information obtained by visual and textual data to predict the flooding event. The proposed design provides improved generalization ability with the help of a diverse range of pre-trained networks.
  • Combined use of visual and textual information is proposed, where the text data from tweets and corresponding images are used for the flood-detection process.
  • The experiments are conducted on three publicly available benchmark datasets: the Flood-related Multimedia Task (FRMT) [13], Flood Classification for Social Multimedia (FCSM) [14], and Disaster Image Retrieval from Social Media (DIRSM) [15]. The performance of the proposed DeepSDC is compared with several state-of-the-art flood-detection algorithms.
The proposed DeepSDC can be helpful for flood-response systems and can provide support in assistance-related activities, such as the detection of passable roads and flooding situations. Additionally, the outcomes of the proposed model can be used to take appropriate precautionary measures and plan remedial action in the case of flood events.
The rest of the paper is organized as follows. The literature-review section discusses various state-of-the-art methods applied to flooding-event datasets. The dataset section presents the datasets used in this research. Next, the methodology section describes the steps of the DeepSDC approach. Results and discussion follow, and finally, the paper concludes with possible future directions.

2. Literature Review

In recent years, several flood-response management systems have been developed to assist emergency responders and decision-makers [13,14,15] based on social-media data. In this section, we present several state-of-the-art flood-detection systems that use Twitter data.

2.1. Social-Media-Based Flood Detection and Response Systems

Flood-event detection using tweet data from Twitter is a research area that aims to automatically identify and classify tweets related to flood events. Twitter is a popular social-media platform where users post short messages, or tweets, which can include text, images, and location data. People often use Twitter to share information and updates during a flood event, making it a valuable data source for flood monitoring and detection. Therefore, MediaEval (a benchmarking initiative focusing on multimedia analysis and retrieval tasks) organized several annual challenges on flood detection, which aim to evaluate the effectiveness of methods for detecting and monitoring flood events using social-media data [13,14,15]. These challenges involve large-scale datasets of tweets and images collected during flood events in different parts of the world. Participants are provided with a dataset and asked to develop methods for identifying flood-related tweets and tracking the evolution of the flood event. Three binary-classification datasets related to detecting flooding events have been developed over the years [13,14,15]. For example, the Flood-related Multimedia Task (FRMT) [13] aims to detect the presence or absence of a flooding situation, and the Flood Classification for Social Multimedia (FCSM) [14] aims to discover passable roads during flooding events. Moreover, the Disaster Image Retrieval from Social Media (DIRSM) [15] dataset also explores the presence or absence of flooding situations. Several algorithms have been developed to detect flooding events from these datasets; the methods applied to each dataset are discussed in the following subsections.

2.1.1. Studies Related to the FRMT Dataset

Said et al. [16] proposed a flood-detection technique using text and images. The proposed system combines image and text features to classify tweets into flood-related and non-flood-related categories. The text features are extracted using various natural-language processing techniques, such as sentiment analysis, keyword extraction, and topic modelling, and the image features are extracted using a convolutional neural network. However, the study used a relatively simple approach to extract image features, which may capture only limited relevant information from the images; therefore, the performance of the proposed method remains limited.
Term Frequency-Inverse Document Frequency (TF-IDF)-based techniques have also been applied to FRMT textual data. TF-IDF evaluates the importance of a word or phrase within a document relative to a corpus (a collection of documents). For example, in [17,18], the authors generated TF-IDF vectors to detect flooding events. However, TF-IDF does not incorporate the linguistic and semantic structure of the tweets; thus, the performance of these algorithms remains limited.

2.1.2. Studies Related to the FCSM Dataset

Lopez-Fuentes et al. [31] applied two approaches to find passable roads using textual and visual data. For visual data, the authors used Inception V3 and other CNN-based models to obtain diverse information. For textual data, the authors used GloVe embeddings, and a long short-term memory (LSTM) network was applied for classification. Finally, the results for the textual and visual features are combined using a softmax output layer. However, GloVe relies on pre-trained word embeddings learned from large text corpora, and these embeddings may not be optimal for a specific task or domain.
Zhao et al. [32] applied various techniques to classify the textual and visual information of the FCSM dataset. For textual information, different N-grams are selected and used to indicate the availability of passable roads; if none of the chosen N-grams are present, the instance is marked as irrelevant, with no passable roads. For visual information, features are extracted using ResNet50 pre-trained on the Places-2 dataset, and classification is performed using a support vector machine (SVM). However, the N-gram model provides limited contextual information. Additionally, N-gram models rely on a fixed vocabulary of words and may not be able to handle words outside that vocabulary.
Bischke et al. [33] combined global and local features for passable-road classification on the FCSM dataset. The proposed approach uses a ResNet152 model pre-trained on the Places365 and ImageNet datasets, and the obtained features are classified using a support vector machine (SVM) classifier. However, this technique focuses on scene-level rather than object-level information for classifying road passability. In contrast, Lopez-Fuentes et al. [31] focused on object information by utilizing pre-trained ImageNet weights, while Zhao et al. [32] concentrated only on scene-based information. Combining scene-level and object-level information may produce a more effective outcome.

2.1.3. Studies Related to the DIRSM Dataset

Zhao et al. [20] proposed an approach for classifying DIRSM instances using textual and visual information. For visual information, the authors assumed that water lies in the lower-middle part of the image; hence, they cropped each image by removing 60% of the upper part and 10% from each side. They also assumed that flooding zones are homogeneous in colour, so a ranking was created based on colour complexity. For textual data, a few water- and flood-related keywords were selected, and textual instances were ranked based on the presence or absence of those keywords and their correlations. Because the method ranks instances based on keywords rather than feature extraction and classification, it may not produce an effective outcome on similar datasets.
Bischke et al. [24] utilized deep-learning-based approaches to classify the DIRSM dataset. The approach uses X-ResNet [34] pre-trained on DeepSentibank [35] and takes advantage of scene-based information, which is helpful in flooding-event detection. For textual data, user tags are chosen for classifying instances. After pre-processing, the Word2Vec model is used to represent textual information, so each textual instance is represented through 200 dimensions. Later, term frequency-inverse document frequency (TF-IDF) is utilized to classify the contents.
Tkachenko et al. [36] utilized conventional feature-extraction techniques and machine-learning-based classifiers to categorize the flooding events in the DIRSM dataset. The description, title, and user tags of each instance are combined for textual information, and machine translation is applied to convert all contents into English. The features extracted from the textual information are then classified using logistic regression. The method might be improved by combining different deep-learning models.

3. Methodology

In this section, we present the details of the proposed DeepSDC algorithm. The algorithm has two major modules: Module I (text module), which uses the text of tweets to identify flooding events, and Module II (visual module), which uses the visual data of tweets. Finally, an average voting ensemble combines the predictions of both modules.
Module I (text module) incorporates the XLNet [30] and RoBERTa [29] pre-trained models to obtain a diverse set of information from textual data and applies ensemble learning to improve performance on text data. Module II (visual module) utilizes the ResNet [28] and VGG [27] pre-trained models. VGG is known for its ability to extract rich features from images through multiple convolutional layers, while ResNet is designed to alleviate the problem of vanishing gradients and improve the training of very deep networks. By combining these models, the resulting system can leverage the strengths of both architectures and achieve higher accuracy on flood-detection tasks. The visual module uses a bagging ensemble approach to reduce overfitting. Once the predictions of both modules are obtained, the final label (Flood/Non-Flood) is produced by a second-stage ensemble for robust performance. The following subsections discuss these modules in detail, and the overall process adopted in DeepSDC is shown in Figure 2.

3.1. DeepSDC Module-I: Textual Module

The first module (Textual Module) of the DeepSDC approach is designed to classify flooding events using textual data retrieved from social-media platforms. In Module I, the textual data are preprocessed, and an ensemble-based deep-learning approach for flooding-event classification is applied. The module translates the entire tweet text into English (including the tweet description and tags) to ensure uniformity. The text is then cleaned using preprocessing techniques, and class balancing is applied. In the final step, multiple pre-trained networks are fine-tuned, and their predictions are combined for effective output. The steps involved in Module I are shown in Figure 3.

3.1.1. Data Translation and Pre-Processing

Social-media data are usually published in multiple languages; the textual data here are available in different languages, including Italian and Spanish. For the sake of uniformity, the textual data are translated into English wherever required. The GoogleTrans library, a Python-based translation library that performs online text translation and also detects the source language, is utilized for this purpose.
Achieving effective results is highly dependent on the quality of the input textual data. Social-media users face few restrictions on uploading textual content; hence, they are free to post text in any language and format, which makes extracting the meaning of the contents problematic. Therefore, various meaningless elements, including Uniform Resource Locators (URLs), punctuation, symbols, and emoticons, are removed from each instance of textual data.
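As a concrete illustration, the sketch below combines the translation and cleaning steps described above in Python. It assumes the googletrans package is installed; the regular expressions and helper names are illustrative assumptions rather than the authors' exact implementation.

```python
import re
from googletrans import Translator  # Python-based wrapper around Google Translate

translator = Translator()

def translate_to_english(text: str) -> str:
    """Translate a tweet into English; googletrans also detects the source language."""
    return translator.translate(text, dest="en").text

def clean_text(text: str) -> str:
    """Remove URLs, user mentions, punctuation, symbols, and emoticons."""
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)          # strip user mentions
    text = re.sub(r"[^\w\s]", " ", text)       # strip punctuation, symbols, emoticons
    return re.sub(r"\s+", " ", text).strip()   # collapse extra whitespace
```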

3.1.2. Class Balancing

In order to extract social-media data, certain keywords such as flood, water, disaster, and destruction are used. However, data extraction also produces irrelevant content. For example, the search results for the keyword “water” may produce tweets mentioning a regular water flow, including lakes and rivers. Moreover, the keyword “disaster” may retrieve contents related to other disasters, including fire events and earthquakes. Therefore, the keyword-based retrieval process may result in a class-imbalance problem, leading to poor generalization and overfitting of the majority class [37]. The class-imbalance problem is observed in the text of all three datasets, including DIRSM, FCSM, and FRMT.
Our findings indicate that, in most cases, the minority class has half or fewer instances than the majority class. Therefore, we used random up-sampling, down-sampling, and their combination, depending on the imbalance ratio. First, if the majority class has twice as many elements as the minority class, the minority-class instances are up-sampled to double their number using a random selection process. Second, if the majority class has more than twice as many instances as the minority class, the majority-class instances are randomly down-sampled to half, and the minority-class instances are up-sampled until both classes are equal. This process can be seen as a computationally efficient version of the selection technique proposed in [38]. Class balancing is applied to text data only when the textual module is used individually for flood detection.
The process of class balancing is given below:
$$\begin{cases} \mathrm{upsample}(T) & \text{if } 2T = M, \\ \mathrm{downsample}(M)\ \text{and}\ \mathrm{upsample}(T) & \text{if } M > 2T, \end{cases} \tag{1}$$
where $T$ represents the number of samples in the minority class and $M$ denotes the number of samples in the majority class. The function $\mathrm{upsample}(\cdot)$ doubles the size of the minority class by duplicating randomly selected samples from $T$ when the majority class has exactly twice as many elements as the minority class, while $\mathrm{downsample}(\cdot)$ randomly reduces the number of majority-class samples to half and $\mathrm{upsample}(\cdot)$ then randomly increases the size of the minority class until both classes are equal.
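A minimal sketch of this balancing rule in Python, assuming the two classes are given as lists of training examples (the helper name and random seed are illustrative):

```python
import random

def balance_classes(minority, majority, seed=0):
    """Balance two classes following the rule in Equation (1)."""
    rng = random.Random(seed)
    if len(majority) > 2 * len(minority):
        # M > 2T: randomly down-sample the majority class to half ...
        majority = rng.sample(majority, len(majority) // 2)
        # ... then up-sample the minority class until both classes are equal.
        minority = minority + [rng.choice(minority)
                               for _ in range(len(majority) - len(minority))]
    else:
        # 2T = M (majority exactly twice the minority): double the minority
        # class by duplicating randomly selected samples.
        minority = minority + [rng.choice(minority) for _ in range(len(minority))]
    return minority, majority
```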

3.1.3. Language Model Combination for Textual Data

The DeepSDC approach utilizes pre-trained transformer-based networks for the feature representation and classification of textual contents. We used the RoBERTa [29] and XLNet [30] pre-trained networks. These models use Transformers [39,40] to capture bidirectional relationships and are suitable for flood-event classification; they were selected based on their performance on similar tasks [40,41]. Both networks are implemented with the help of the Python-based Fastai library [42].
The Robustly Optimized BERT Pretraining Approach (RoBERTa) [29] is a pre-trained model for natural-language processing which improves upon its predecessor, Bidirectional Encoder Representations from Transformers (BERT). To improve performance, RoBERTa increases the data size for training, changes hyper-parameters, eliminates next-sentence prediction, trains on extended sequences, and masks tokens dynamically. Training data for this model include 160 GB of textual data from books, news, stories, and Wikipedia.
We also used XLNet [30] to classify textual data. XLNet is an auto-regressive model that integrates ideas from Transformer-XL [43]. In an autoregressive model, the next token depends on all previous tokens in a sentence; XLNet generalizes this by applying permutation language modeling (PLM), capturing bidirectional context by considering all possible permutations of the words in a sentence. Finally, the class scores of both language models are combined by an average voting ensemble as:
$$h_i^{\text{text}} = \frac{\sigma\left(h_i^{RB}\right) + \sigma\left(h_i^{XL}\right)}{2}, \tag{2}$$
where $h_i^{RB}$ and $h_i^{XL}$ denote the class scores obtained by RoBERTa and XLNet for the $i$th data element, and $\sigma(\cdot)$ represents the sigmoid activation function. The proposed textual module can also be used separately for flood classification using only text data. In this case, the flood predictions are obtained as follows:
$$y_i^{\text{text}} = \begin{cases} 1 & \text{if } h_i^{\text{text}} \geq 0.5, \\ 0 & \text{if } h_i^{\text{text}} < 0.5, \end{cases} \tag{3}$$
where $y_i^{\text{text}}$ denotes the class label of the $i$th data element, obtained using only the textual module of the proposed system. Otherwise, $h_i^{\text{text}}$ is combined with the output of the visual module, as described in Section 3.3. The block diagram of the proposed textual module is shown in Figure 3.
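A minimal sketch of this average-voting step, assuming `roberta_score` and `xlnet_score` are the raw class scores produced by the two fine-tuned language models for a single tweet:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def text_module_prediction(roberta_score: float, xlnet_score: float) -> int:
    """Average the sigmoid class scores (Eq. (2)) and threshold at 0.5 (Eq. (3))."""
    h_text = (sigmoid(roberta_score) + sigmoid(xlnet_score)) / 2.0
    return 1 if h_text >= 0.5 else 0
```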

3.2. DeepSDC Module-II: Visual Module

The second module of DeepSDC is the visual module. This module operates on the images provided in tweets and classifies flooding events using a deep-learning-based bagging approach. A series of pre-trained VGG [27] and ResNet50 [28] networks are fine-tuned to support the binary-classification problem. The visual module reduces the class-imbalance problem by increasing the number of minority-class images through image augmentation: multiple copies of each image are generated using different alterations, including elastic deformation, rotation, and added noise. We used the Augmentor library [44] to generate several altered copies of images from the minority class and thereby increase the quantity of training data.
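The sketch below shows how the Augmentor library [44] can generate altered copies of minority-class images; the directory path and operation parameters are illustrative assumptions, not the authors' exact settings.

```python
import Augmentor

# Point the pipeline at the minority-class ("flood") training images.
pipeline = Augmentor.Pipeline("data/train/flood")
pipeline.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
pipeline.random_distortion(probability=0.5, grid_width=4, grid_height=4,
                           magnitude=8)      # elastic deformation
pipeline.flip_left_right(probability=0.5)
pipeline.sample(2000)  # write 2000 augmented images to an output directory
```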
Following class balancing, the bagging process is initiated by random sampling with replacement to create $n$ bags of equal size, $S_1, S_2, \ldots, S_n$. Each bag $S_k$ contains randomly selected images and their corresponding labels from the training set $S$, such that $S_k \subseteq S$ for $1 \leq k \leq n$. Afterwards, for each bag $S_k$, two models, VGG and ResNet, are fine-tuned, denoted $h_k^{VGG}$ and $h_k^{RN}$, respectively. For the $i$th image, the prediction is obtained by averaging the class probabilities of both models using Equation (4). The block diagram of the proposed visual module is shown in Figure 4. We have already published the DeepSDC visual module in [26].
$$h_i^{\text{image}} = \frac{1}{n} \sum_{k=1}^{n} \frac{\sigma\left(h_k^{VGG}\right) + \sigma\left(h_k^{RN}\right)}{2} \tag{4}$$
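A sketch of this bagging procedure, assuming the images and labels are NumPy arrays; the `fine_tune_vgg` and `fine_tune_resnet` helpers are hypothetical placeholders that return fine-tuned models whose `predict` method yields sigmoid class probabilities:

```python
import numpy as np

def bagged_visual_scores(train_images, train_labels, test_images, n_bags,
                         fine_tune_vgg, fine_tune_resnet):
    """Bagging ensemble of Equation (4)."""
    rng = np.random.default_rng(0)
    scores = np.zeros(len(test_images))
    for _ in range(n_bags):
        # Draw a bag S_k by random sampling with replacement.
        idx = rng.integers(0, len(train_images), size=len(train_images))
        vgg = fine_tune_vgg(train_images[idx], train_labels[idx])
        resnet = fine_tune_resnet(train_images[idx], train_labels[idx])
        # Average the class probabilities of both models for this bag.
        scores += (vgg.predict(test_images).ravel()
                   + resnet.predict(test_images).ravel()) / 2.0
    return scores / n_bags  # h_i^image for every test image
```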

3.3. DeepSDC: Ensemble of Module I and Module II

Since textual and visual data are uploaded as a single entity on social-media platforms, the two types of data are related in content. Therefore, the third phase of the proposed algorithm integrates the textual and visual modules for flood-event classification through a second-stage ensemble process; variants of the two-stage ensemble process have also been applied in several other studies [45,46,47,48]. In DeepSDC, the visual and textual modules are individually trained, and their predictions are combined using an average ensemble approach:
$$y_i^{E} = \begin{cases} 1 & \text{if } \frac{h_i^{\text{image}} + h_i^{\text{text}}}{2} \geq 0.5, \\ 0 & \text{otherwise,} \end{cases} \tag{5}$$
where $h_i^{\text{image}}$ and $h_i^{\text{text}}$ denote the class probabilities obtained by the individually trained image and text modules, respectively, while $y_i^{E}$ represents the label predicted through the ensemble process for the $i$th example (both the text and the image of a tweet).
Two major benefits can be derived from the proposed two-stage ensemble approach. First, it combines the information obtained by the visual and textual modules, predicting the flooding event from visual and textual data simultaneously; a key component of this process is obtaining as much information as possible from various sources (text and corresponding images). Second, the ensemble design effectively integrates the textual and visual modules, combining diverse models for effective flood detection, so the learning algorithm can produce more accurate results. Moreover, the design allows the visual and textual models to be trained in parallel, so the models in both modules can be trained faster.
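A minimal sketch of this second-stage combination, assuming `h_image` and `h_text` are NumPy arrays holding the per-example probabilities produced by the two trained modules:

```python
import numpy as np

def deepsdc_predict(h_image: np.ndarray, h_text: np.ndarray) -> np.ndarray:
    """Second-stage ensemble of Equation (5): average the module
    probabilities and threshold at 0.5."""
    return ((h_image + h_text) / 2.0 >= 0.5).astype(int)
```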

4. Experimental Setup

The experimental setup for the proposed algorithm is described in this section. Here, we present details about the experimental environment, datasets, and evaluation measures.

4.1. Datasets

The proposed approach was validated on various datasets related to flooding disasters. The first dataset used in this research is the Flood-related Multimedia Task (FRMT) [13]. This dataset poses a binary-classification problem based on the presence or absence of a flooding situation. The FRMT dataset was released at the MediaEval 2020 Benchmarking Workshop and contains only Italian tweets, uploaded by different users between 2017 and 2019; each instance contains an image along with the text. There are 7777 tweets in the dataset, of which 5419 form the training set and the remaining 2358 the test set. The training set is highly imbalanced: the positive class contains 21% of the instances, and the negative class contains 79%.
The second dataset, Flood Classification for Social Multimedia (FCSM) [14], was released at the MediaEval 2018 Benchmarking Workshop. This dataset aims to identify passable roads during disasters. It contains 8844 instances, of which 5818 are placed in the training set and 3026 in the test set. The training set is imbalanced: the positive class, which contains information on passable roads, has only 2128 tweets, while the negative class comprises 3685 tweets with little or no information regarding the target.
The third dataset used for evaluation is Disaster Image Retrieval from Social Media (DIRSM) [15], which is related to flood detection and was released by MediaEval in 2017 [15]. It contains 6600 instances, each consisting of text and a relevant image, with 5280 instances in the training set and the remaining 1320 in the test set. The training part is imbalanced: 3360 instances belong to the negative class and 1920 to the positive class. A challenging aspect of the dataset is that various negative-class images also contain the normal flow of water, e.g., a river or lake; hence, it is difficult for an algorithm to differentiate between images depicting the normal flow of water and flooding situations. The details of the datasets are given in Table 1.

4.2. Experimental Environment and Details of the Model Training

The experiments were performed with the help of the Google Colaboratory (Colab) service, a cloud-based service offered by Google. The service is useful for experiments as it supports valuable libraries, including Keras and TensorFlow, which help implement artificial-intelligence-based methods. We used a Tesla T4 graphics processing unit and 12 GB of memory during the experiments.
We used the RoBERTa-base [29] and XLNet-base [30] models for the textual module. These two language models were fine-tuned to accommodate the binary-classification problem; in the training process, only the last two layers of each model were trained, while the weights of the earlier layers were kept frozen. The input to the RoBERTa model is formatted by adding the class label of a particular element, a prefix space, the tokens of that instance, a separator, and padding. Similarly, for XLNet [30], each sentence is formatted with tokens, separators, and class labels. The models in the textual module were trained using a batch size of 16, a learning rate of $10^{-5}$, the Adam optimizer [49], and binary cross-entropy loss [50]. The proposed visual module combines the VGG [27] and ResNet50 [28] pre-trained models. Both models were fine-tuned to support binary classification; only the last layer of each model was trained, while the weights of the earlier layers remained frozen.
We used a hybrid of pre-trained weights from two different datasets. For VGG16, we utilized pre-trained weights from Places365 [51] and ImageNet [52]; the Places365 weights [51] focus on scene-based information, while the ImageNet weights [52] concentrate on object-based information. These models were trained for 50 epochs with learning rates of $10^{-5}$ and $10^{-6}$, respectively. Furthermore, we used the Adam optimizer [49] and the binary cross-entropy loss criterion to train both models.
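A sketch of this fine-tuning setup for one visual model in Keras; dataset loading is omitted, and the exact freezing boundary and classification head are assumptions based on the description above:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

# Pre-trained backbone with frozen weights (only the new head is trained).
base = VGG16(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary Flood/Non-Flood head
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, epochs=50, batch_size=16)
```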

4.3. Evaluation Measures

Multiple evaluation measures were used to assess the performance of the proposed DeepSDC algorithm, including average precision at K, mean average precision (MAP), and the F1 score. For the DIRSM dataset, performance is assessed using the first two measures, while for the FRMT and FCSM datasets, performance is evaluated using the F1 score. These evaluation measures were used in the MediaEval flood-detection tasks over several years, from 2017 to 2020 [13,14,15]. The details of these measures are discussed in the following subsections.

4.3.1. Mean Average Precision and Average Precision at K

The mean average precision is calculated as the average precision at different cut-offs, including 50, 150, 240, and 480. The average precision calculated at a given cut-off reflects the algorithm's confidence at that score level only, while the mean average precision considers the overall performance on the whole dataset.
Average precision at K (AP@K) is a standard evaluation measure for information retrieval and recommendation systems. It measures the quality of a ranked list of retrieved items. First, the results are ranked according to their relevance. Then, precision is computed at each position in the ranked list up to position K, where precision is the fraction of relevant items among the total number of items retrieved at that position. Finally, the precision scores at the positions where relevant items appear are averaged up to position K. AP@K therefore accounts for both precision and ranking order.
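A short sketch of AP@K under one common convention (precision is averaged over the ranks at which relevant items appear, up to the cut-off K); normalization conventions vary across implementations:

```python
def average_precision_at_k(relevance, k):
    """`relevance` is a ranked list of 0/1 relevance flags; returns AP@K."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0
```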

4.3.2. F1 Score

The F-score is an evaluation measure used to assess the performance of a classification model, particularly when dealing with imbalanced datasets. It summarizes the performance of a model by combining precision and recall. Precision is the fraction of true positives among the predicted positives, while recall is the fraction of true positives among the actual positives; in other words, precision measures the accuracy of the positive predictions, while recall measures the ability of the model to find all positive instances. The F-score is the harmonic mean of precision and recall, with a higher value indicating better performance:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{8}$$
where $TP$, $FP$, and $FN$ denote the numbers of true-positive, false-positive, and false-negative predictions, respectively.
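As a worked illustration with assumed counts, suppose $TP = 8$, $FP = 2$, and $FN = 4$; then

$$\mathrm{Precision} = \frac{8}{10} = 0.80, \qquad \mathrm{Recall} = \frac{8}{12} \approx 0.67, \qquad F_1 = \frac{2 \times 8}{2 \times 8 + 2 + 4} = \frac{16}{22} \approx 0.73.$$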

5. Results and Discussion

This section discusses the results of the DeepSDC approach applied to textual data and to the combination of textual and visual data. The performance of the proposed DeepSDC is evaluated on three flood-classification datasets: the Flood-related Multimedia Task (FRMT) [13], Flood Classification for Social Multimedia (FCSM) [14], and Disaster Image Retrieval from Social Media (DIRSM) [15]. These datasets provide the textual data of tweets and associated images, and the aim is binary classification based on flooding events.
The proposed DeepSDC approach is compared with several state-of-the-art flood-classification techniques. A performance comparison is made by providing the text and corresponding images to the visual and textual modules. The outputs from both modules are then combined using an average ensemble method. The performance of the textual module of DeepSDC is also assessed separately using only tweets as textual data. First, we discuss the results of the FRMT dataset, and the results of the DIRSM and FCSM datasets are discussed later in the section.

5.1. Results of DeepSDC: FRMT Dataset

The Flood-related Multimedia Task (FRMT) concerns detecting the presence or absence of a flooding event. It contains tweets in the Italian language, and an image accompanies each tweet. The performance comparison of the DeepSDC approach on the FRMT dataset using the textual module is given in Table 2, and the performance of DeepSDC with the combination of visual and textual modules is presented in Table 3.
With only text data, the DeepSDC Textual module (Module I) surpassed the performance of state-of-the-art text-based flood-detection techniques [16,17,18]. The proposed Textual module produced a 57.92% F1 score compared to the highest reported F1 score of 54.05%. For the combination of visual and textual data, the proposed ensemble of textual and visual modules achieved superior performance compared to other state-of-the-art techniques.
The current state-of-the-art multimodal approach proposed in [18] extracts visual and textual details from the data using VGG-16 and TF-IDF. Similarly, the approach proposed in [17] utilizes the Xception [54] and TF-IDF techniques to incorporate textual and visual details into the flood-classification process. These techniques are useful for processing multimodal data; however, TF-IDF is limited in capturing semantic and linguistic structure, e.g., negation or sarcasm. In contrast, the proposed DeepSDC combines state-of-the-art language models (XLNet and RoBERTa) to obtain more information from the textual data. The proposed design allows the use of diverse textual features in the classification process, and the ensemble design in the visual module helps to produce more details for flood classification than previously proposed techniques.
As shown in Table 3, combining the textual and visual modules on this dataset leads to inferior performance, since DeepSDC considers the visual and textual modules equally important in flood classification. In terms of completeness, consistency, and relevance, the visual data in this dataset do not meet a high-quality standard: several images are incorrectly annotated or appear in both classes, and, in some cases, multiple tweets contain the same image. State-of-the-art techniques face the same problem [16,17], and their performance is also affected when textual and visual data are combined for flood classification. Despite these limitations, DeepSDC produced superior performance compared to other multimodal classification methods.

5.2. Results of DeepSDC: DIRSM Dataset

The second dataset used to evaluate the proposed DeepSDC is Disaster Image Retrieval from Social Media (DIRSM) [15]. This dataset comprises textual data and relevant visual data, and the task is binary classification of images and associated text according to the presence or absence of flooding. On this dataset, the DeepSDC model is evaluated using average precision at a cut-off of 480 and mean average precision at cut-offs of 50, 150, 240, and 480. In our evaluation, we separately compared the textual module and the ensemble of the textual and visual components. The comparative analysis of the proposed textual module is presented in Table 4, and the performance analysis of the ensemble of the visual and textual modules is shown in Table 5.
Based on the experimental results in Table 4, the textual module handles text data more robustly than other state-of-the-art algorithms, producing an average precision of 76.33% and a mean average precision of 91.29%, exceeding the performance of the alternative techniques. As shown in Table 5, the proposed combination of the textual and visual modules produces superior performance, attaining 92.65% average precision and 98.94% mean average precision.
In the case of textual data, the state-of-the-art methods by Dao et al. [22] and Nogueira et al. [21] use the bag-of-words (BoW) approach for text classification. However, BoW ignores word order and structure in a tweet and provides a limited semantic understanding of the text, which can result in a loss of contextual information. The proposed DeepSDC obtains contextual understanding through RoBERTa and XLNet, which provide a deeper understanding of the context of the tweet. Therefore, DeepSDC produces improved results for text data compared with state-of-the-art techniques.
In the case of combining visual and textual data, state-of-the-art methods such as Ahmad et al. [23] use AlexNet and a keyword-ranking approach to combine visual and textual information. However, AlexNet is a relatively shallow CNN compared to ResNet50, which can capture more complex features and intricate representations of visual data. Moreover, the keyword-ranking approach for text relies on a predefined set of keywords, which might not capture all relevant information and may not scale to large datasets. Similarly, the multimodal model proposed by Dao et al. [22] uses bag-of-words (BoW) and CNN features and gives good performance; however, the performance of the BoW model may be limited by the loss of sequential information, lack of context, and inability to handle synonyms. As a result, the combined model may have limited performance.
The proposed DeepSDC employs two different models for visual data to capture diverse visual information, and its textual module incorporates two models (RoBERTa and XLNet, pre-trained on large datasets), which helps capture contextual understanding and provides scalability to large datasets.

5.3. Results of DeepSDC: The FCSM Dataset

The third dataset used for the evaluation of the proposed DeepSDC method is Flood Classification for Social Multimedia (FCSM) [14]. The dataset aims to find passable roads using textual and visual data from tweets. As in the previous experiments, we separately evaluated the performance of the textual module on text data and of the ensemble of the textual and visual modules. The experimental results for the textual module and for the ensemble are shown in Table 6 and Table 7, respectively. The proposed DeepSDC outperformed the existing state-of-the-art algorithms, which clearly shows that it can robustly handle textual data as well as the combination of textual and visual data. For textual data from the FCSM dataset, the comparison of DeepSDC results with state-of-the-art methods is given in Table 6.
Among prior methods, Lopez-Fuentes et al. [31] achieved the highest performance on the textual part of this dataset using a combination of GloVe and LSTM. However, this combination provides limited generalization, especially when the data are noisy or contain errors, and the design may lead to overfitting. Similarly, Hanif et al. [56] adopted a TF-IDF vector for text classification; however, this approach is vulnerable to noisy or irrelevant terms in text data and may have difficulty handling very short or very long tweets.
For the combination of visual and textual information, the highest prior performance was reported by Zhao et al. [32]. This approach adopts ResNet and an N-gram model for visual and textual information, respectively. However, N-gram models can struggle with words not present in the training data, also known as out-of-vocabulary (OOV) words. Likewise, Lopez-Fuentes et al. [31] proposed stacking GloVe and LSTM for multimodal data. GloVe generates a fixed set of word embeddings based on the co-occurrence statistics of words in a text corpus, and these embeddings remain fixed during training; when used with an LSTM, the model applies the same fixed embeddings to each input sequence, which may not provide optimal performance.
In comparison, the proposed DeepSDC avoids overfitting by adopting a bagging ensemble approach and applying different augmentation techniques in the visual module. Furthermore, the models used in the DeepSDC textual module provide more contextual features than the N-gram model. Therefore, the proposed DeepSDC attained better performance than state-of-the-art methods.

5.4. Analysis of Why DeepSDC Works

The inferior performance of the existing state-of-the-art flood-detection methods can be attributed to several factors. First, the datasets are highly class-imbalanced; this problem occurs in all three datasets, where the negative class contains a larger number of examples than the positive class. Moreover, incorrect annotations are also challenging, as similar instances exist in both the positive and negative classes. Additionally, in the FRMT dataset, images of symbols showing meteorological alerts are found in both classes; these signs are almost identical and provide no additional information about the target. Likewise, similar contents in textual instances are also observed, and this duplication further complicates the learning process. In such cases, a learning algorithm cannot guarantee the best performance.
The proposed DeepSDC performed superiorly on all three flood-detection datasets, on both text-only data and the combination of textual and visual data. DeepSDC tackles class-imbalance problems through preprocessing, augmentation, upsampling, and downsampling techniques. Additionally, it employs a two-staged ensemble-learning approach using more than one model for text and visual data: in the first stage, ensemble learning is applied within the visual and textual modules, which focuses on obtaining a diverse set of information from different examples; in the second stage, the information obtained from the various models is combined using the average ensemble process, which improves the average prediction performance compared to that of a single model.

5.5. Analysis of DeepSDC Predictions and Limitations

Examples of tweets correctly classified by DeepSDC are shown in Figure 5, while Figure 6 shows cases of tweets misclassified by the proposed method. Figure 6a illustrates tweets classified as "Flood-event" that actually belong to "Non-Flood", while Figure 6b illustrates tweets classified as "Non-Flood" that actually belong to "Flood-event". This misclassification occurred for several reasons. First, the tweets misclassified as "Non-Flood" contain textual information relevant to the flood but lack sufficient flood-related visual evidence in their images. Second, in many cases, the annotator assigned the label to the tweet by looking at only a single modality.
In the proposed DeepSDC, the text and visual modules contribute equally. Therefore, in cases where the annotator emphasized a single modality (either text or image), the classification performance may be adversely affected. This problem could be addressed by providing properly annotated data or by weighting the contributions of the visual and textual modules.

6. Conclusions

This paper presented a robust flood-classification technique called Deep Social Media Data Classification (DeepSDC). The proposed algorithm utilizes visual and text data extracted from Twitter to classify flooding events. DeepSDC utilizes two modules, one for text data and one for visual data, which are designed based on ensemble learning and help to extract diverse information from text and images simultaneously. The visual module is based on pre-trained VGG and ResNet models, while the textual module utilizes the RoBERTa and XLNet models. Additionally, a second-stage ensemble design is proposed to incorporate the results of the visual and textual modules. DeepSDC adopts different class-balancing and augmentation techniques to tackle class-imbalance problems. The effectiveness of the proposed algorithm is evaluated on three datasets, and its performance is compared with several state-of-the-art flood-detection techniques. The proposed DeepSDC surpassed the performance of state-of-the-art algorithms on all three datasets for different flooding events.
The proposed approach produced limited performance when combining modalities on the FRMT dataset, owing to the assumption of equal contribution of the visual and textual modules. This problem may arise in datasets where the training-data labels were assigned with a focus on a single modality, or where the text or image modality deviates significantly from the target class.
In the future, DeepSDC can be extended in various directions: more pre-trained neural networks for textual and visual data can be added, which may positively affect the results, and the ensemble of the visual and textual modules can be extended by introducing a weighted-average process through attention pooling.

Author Contributions

Conceptualization, M.H. and M.A.T.; methodology, M.H.; supervision, M.A.T. and M.R.; writing—original draft, M.H. and M.W.; writing—review and editing, M.W., M.H., A.M., A.A., M.A.T. and M.R.; funding acquisition, A.M. and A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by Researchers Supporting Project number (RSP2023R309), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the organizers of the MediaEval Benchmarking Initiative for Multimedia Evaluation, which sets out the terms and conditions of data use and establishes the rights and responsibilities of registered users.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. EM-DAT. The International Disaster Database. Center for Research on the Epidemiology of Disasters. 2019. Available online: https://www.emdat.be/ (accessed on 26 February 2023).
  2. Lopez-Fuentes, L.; Farasin, A.; Zaffaroni, M.; Skinnemoen, H.; Garza, P. Deep Learning Models for Road Passability Detection during Flood Events Using Social Media Data. Appl. Sci. 2020, 10, 8783. [Google Scholar] [CrossRef]
  3. EU-Commission. Funding Opportunities to Support Disaster Risk Prevention in the Cohesion Policy 2014–2020 Period; European Commission: Brussels, Belgium, 2014. [Google Scholar]
  4. Esposito, M.; Palma, L.; Belli, A.; Sabbatini, L.; Pierleoni, P. Recent advances in internet of things solutions for early warning systems: A review. Sensors 2022, 22, 2124. [Google Scholar] [CrossRef]
  5. Wu, R.S.; Sin, Y.Y.; Wang, J.X.; Lin, Y.W.; Wu, H.C.; Sukmara, R.B.; Indawati, L.; Hussain, F. Real-time flood warning system application. Water 2022, 14, 1866. [Google Scholar] [CrossRef]
  6. Cao, C.; Xu, P.; Wang, Y.; Chen, J.; Zheng, L.; Niu, C. Flash flood hazard susceptibility mapping using frequency ratio and statistical index methods in coalmine subsidence areas. Sustainability 2016, 8, 948. [Google Scholar] [CrossRef] [Green Version]
  7. Khan, M.Y.A.; ElKashouty, M.; Subyani, A.M.; Tian, F. Flash Flood Assessment and Management for Sustainable Development Using Geospatial Technology and WMS Models in Abha City, Aseer Region, Saudi Arabia. Sustainability 2022, 14, 10430. [Google Scholar] [CrossRef]
  8. Sarkar, S.K.; Ansar, S.B.; Ekram, K.M.M.; Khan, M.H.; Talukdar, S.; Naikoo, M.W.; Islam, A.R.T.; Rahman, A.; Mosavi, A. Developing robust flood susceptibility model with small numbers of parameters in highly fertile regions of Northwest Bangladesh for sustainable flood and agriculture management. Sustainability 2022, 14, 3982. [Google Scholar] [CrossRef]
  9. Pandey, A.C.; Kaushik, K.; Parida, B.R. Google Earth Engine for large-scale flood mapping using SAR data and impact assessment on agriculture and population of Ganga-Brahmaputra basin. Sustainability 2022, 14, 4210. [Google Scholar] [CrossRef]
  10. Attia, W.; Ragab, D.; Abdel-Hamid, A.M.; Marghani, A.M.; Elfadaly, A.; Lasaponara, R. On the Use of Radar and Optical Satellite Imagery for the Monitoring of Flood Hazards on Heritage Sites in Southern Sinai, Egypt. Sustainability 2022, 14, 5500. [Google Scholar] [CrossRef]
  11. Werner, M.; Reggiani, P.; De Roo, A.; Bates, P.; Sprokkereef, E. Flood forecasting and warning at the river basin and at the European scale. Nat. Hazards 2005, 36, 25–42. [Google Scholar] [CrossRef]
  12. George, E.I.; Abraham, C.M. Real-time earthquake detection using Twitter tweets. In AIP Conference Proceedings; AIP Publishing LLC.: Melville, NY, USA, 2022; Volume 2520, p. 030014. [Google Scholar]
  13. Andreadis, S.; Gialampoukidis, I.; Karakostas, A.; Vrochidis, S.; Kompatsiaris, I.; Fiorin, R.; Norbiato, D.; Ferri, M. The flood-related multimedia task at mediaeval 2020. In Proceedings of the MediaEval 2020 Workshop, Online, 14–15 December 2020; pp. 14–15. [Google Scholar]
  14. Benjamin, B.; Patrick, H.; Zhengyu, Z.; Damian, B. The multimedia satellite task at mediaeval 2018: Emergency response for flooding events. In Proceedings of the MediaEval 2018 Workshop (CEUR Workshop Proceedings), Sophia Antipolis, France, 29–31 October 2018. [Google Scholar]
  15. Bischke, B.; Helber, P.; Schulze, C.; Venkat, S.; Dengel, A.; Borth, D. The multimedia satellite task at mediaeval 2017: Emergency response for flooding events. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  16. Said, N.; Ahmad, K.; Gul, A.; Ahmad, N.; Al-Fuqaha, A. Floods Detection in Twitter Text and Images. In Proceedings of the MediaEval 2020 Workshop, Online, 14–15 December 2020. [Google Scholar]
  17. Islam, R.; Alan, W. Flood Detection in Twitter Using a Novel Learning Method for Neural Networks. In Proceedings of the MediaEval 2020 Workshop, Online, 14–15 December 2020. [Google Scholar]
  18. Hanif, M.; Joozer, M.; Huzaifa, M.; Rafi, M. An ensemble based method for the classification of flooding event using social media data. In Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, 29–31 October 2018; pp. 29–31. [Google Scholar]
  19. Avgerinakis, K.; Moumtzidou, A.; Andreadis, S.; Michail, E.; Gialampoukidis, I.; Vrochidis, S.; Kompatsiaris, I. Visual and textual analysis of social media and satellite images for flood detection@ multimedia satellite task MediaEval 2017. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  20. Zhao, Z.; Larson, M.A. Retrieving Social Flooding Images Based on Multimodal Information. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  21. Nogueira, K.; Fadel, S.G.; Dourado, Í.C.; de Oliveira Werneck, R.; Muñoz, J.A.; Penatti, O.A.; Calumby, R.T.; Li, L.; dos Santos, J.A.; da Silva Torres, R. Data-Driven Flood Detection using Neural Networks. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  22. Dao, M.S.; Pham, Q.N.M.; Dang Nguyen, D.T. A domain-based late-fusion for disaster image retrieval from social media. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  23. Ahmad, S.; Ahmad, K.; Ahmad, N.; Conci, N. Convolutional Neural Networks for Disaster Images Retrieval. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  24. Bischke, B.; Bhardwaj, P.; Gautam, A.; Helber, P.; Borth, D.; Dengel, A. Detection of Flooding Events in Social Multimedia and Satellite Imagery using Deep Neural Networks. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  25. Dourado, I.C.; Tabbone, S.; da Silva Torres, R. Event prediction based on unsupervised graph-based rank-fusion models. In Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, Tours, France, 19–21 June 2019; Springer: Berlin, Germany, 2019; pp. 88–98. [Google Scholar]
  26. Hanif, M.; Tahir, M.A.; Rafi, M. VRBagged-Net: Ensemble Based Deep Learning Model for Disaster Event Classification. Electronics 2021, 10, 1411. [Google Scholar] [CrossRef]
  27. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
29. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
30. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the Advances in Neural Information Processing Systems 32, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  31. Lopez-Fuentes, L.; Farasin, A.; Skinnemoen, H.; Garza, P. Deep Learning Models for Passability Detection of Flooded Roads. In Proceedings of the MediaEval 2018 Workshop (CEUR Workshop Proceedings), Sophia Antipolis, France, 29–31 October 2018. [Google Scholar]
  32. Zhao, Z.; Larson, M.; Oostdijk, N. Exploiting Local Semantic Concepts for Flooding-related Social Image Classification. In Proceedings of the MediaEval 2018 Workshop (CEUR Workshop Proceedings), Sophia Antipolis, France, 29–31 October 2018. [Google Scholar]
33. Bischke, B.; Helber, P.; Dengel, A. Global-Local Feature Fusion for Image Classification of Flood Affected Roads from Social Multimedia. In Proceedings of the MediaEval 2018 Workshop (CEUR Workshop Proceedings), Sophia Antipolis, France, 29–31 October 2018. [Google Scholar]
  34. Jou, B.; Chang, S.F. Deep cross residual learning for multitask visual recognition. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 998–1007. [Google Scholar]
35. Chen, T.; Borth, D.; Darrell, T.; Chang, S.F. DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks. arXiv 2014, arXiv:1410.8586. [Google Scholar]
36. Tkachenko, N.; Zubiaga, A.; Procter, R. WISC at MediaEval 2017: Multimedia Satellite Task. 2017. Available online: https://qmro.qmul.ac.uk/xmlui/handle/123456789/56411 (accessed on 26 February 2023).
  37. Sun, Y.; Wong, A.K.; Kamel, M.S. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 687–719. [Google Scholar] [CrossRef]
  38. Barandela, R.; Sánchez, J.S.; Garcıa, V.; Rangel, E. Strategies for learning in class imbalance problems. Pattern Recognit. 2003, 36, 849–851. [Google Scholar] [CrossRef]
  39. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
40. Jagadeesh, M.; Alphonse, P. NIT_COVID-19 at WNUT-2020 Task 2: Deep Learning Model RoBERTa for Identify Informative COVID-19 English Tweets. In Proceedings of the W-NUT@EMNLP, Online, 19 November 2020; pp. 450–454. [Google Scholar]
  41. Abavisani, M.; Wu, L.; Hu, S.; Tetreault, J.; Jaimes, A. Multimodal categorization of crisis events in social media. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14679–14689. [Google Scholar]
  42. Howard, J. Fastai. 2018. Available online: https://github.com/fastai/fastai (accessed on 26 February 2023).
43. Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. arXiv 2019, arXiv:1901.02860. [Google Scholar]
  44. Bloice, M.; Stocker, C.; Holzinger, A. Augmentor: An Image Augmentation Library for Machine Learning. J. Open Source Softw. 2017, 2, 432. [Google Scholar] [CrossRef]
  45. Waqas, M.; Tahir, M.A.; Khan, S.A. Robust bag classification approach for multi-instance learning via subspace fuzzy clustering. Expert Syst. Appl. 2023, 214, 119113. [Google Scholar] [CrossRef]
  46. Waqas, M.; Tahir, M.A.; Qureshi, R. Deep Gaussian mixture model based instance relevance estimation for multiple instance learning applications. Appl. Intell. 2022, 1–16. [Google Scholar] [CrossRef]
  47. Waqas, M.; Khan, Z.; Anjum, S.; Tahir, M.A. Lung-Wise Tuberculosis Analysis and Automatic CT Report Generation with Hybrid Feature and Ensemble Learning. In Proceedings of the CLEF (Working Notes), Thessaloniki, Greece, 22–25 September 2020. [Google Scholar]
  48. Waqas, M.; Tahir, M.A.; Qureshi, R. Ensemble-Based Instance Relevance Estimation in Multiple-Instance Learning. In Proceedings of the 2021 9th European Workshop on Visual Information Processing (EUVIP), Paris, France, 23–25 June 2021; pp. 1–6. [Google Scholar]
  49. Zhang, Z. Improved adam optimizer for deep neural networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2. [Google Scholar]
  50. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
51. Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464. [Google Scholar] [CrossRef] [PubMed]
  52. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
53. Nikoletopoulos, T.; Wolff, C. A Tweet Text Binary Artificial Neural Network Classifier. In Proceedings of the MediaEval 2020 Workshop, Online, 14–15 December 2020. [Google Scholar]
  54. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  55. Lopez-Fuentes, L.; van de Weijer, J.; Bolanos, M.; Skinnemoen, H. Multi-modal Deep Learning Approach for Flood Detection. In Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, 13–15 September 2017. [Google Scholar]
  56. Hanif, M.; Tahir, M.A.; Rafi, M. Detection of passable roads using Ensemble of Global and Local. In Proceedings of the MediaEval 2018 Workshop (CEUR Workshop Proceedings), Sophia Antipolis, France, 29–31 October 2018. [Google Scholar]
Figure 1. Examples of tweets and relevant images from the DIRSM, FCSM, and FRMT datasets are shown in (a–c), respectively.
Figure 2. The proposed DeepSDC algorithm. Module I (Textual Module) and Module II (Visual Module) are shown in (a,b), respectively.
Figure 3. DeepSDC Module I (Textual Module).
Figure 4. The DeepSDC Module II (Visual Module).
Figure 5. Examples of tweets correctly classified by the DeepSDC visual and textual modules.
Figure 6. Examples of tweets misclassified by the DeepSDC visual and textual modules.
Table 1. The details of the datasets.

| Dataset | Year | Objective | Nature | Size | Evaluation Measure |
|---|---|---|---|---|---|
| Disaster Image Retrieval from Social Media (DIRSM) [15] | 2017 | Flood-event detection | Textual and visual | 6600 | Average Precision at K and Mean Average Precision |
| Flood Classification for Social Multimedia (FCSM) [14] | 2018 | Detection of passable roads | Textual and visual | 8844 | F1 Score |
| Flood-related Multimedia Task (FRMT) [13] | 2020 | Flood-event detection using Italian-language tweets | Textual and visual | 7777 | F1 Score |
Table 2. Performance comparison of DeepSDC Module I on text data using the FRMT dataset (the F-score is defined after the table).

| Methods | Techniques Used | F-Score |
|---|---|---|
| Hanif et al. [18] | TF-IDF, Multinomial Naive Bayes | 36.31 |
| Rabiul et al. [17] | TF-IDF, Multinomial Naive Bayes | 41.58 |
| Naina et al. [16] | Bag of Words | 43.70 |
| Nikoletopoulos et al. [53] | ANN, Undersampling | 54.05 |
| DeepSDC (Module-I) | Ensemble of RoBERTa and XLNet | 57.92 |
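For reference, the F-score reported in Tables 2, 3, 6 and 7 is the standard F1 measure, the harmonic mean of precision and recall:

\[ \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]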
Table 3. Comparative analysis of DeepSDC Module I and Module II on the FRMT dataset.

| Methods | Techniques Used | F-Score |
|---|---|---|
| Naina et al. [16] | BoW, ResNet, VGG | 9.00 |
| Rabiul et al. [17] | TF-IDF, Chi-squared, Xception | 14.78 |
| Hanif et al. [18] | TF-IDF, Multinomial Naive Bayes, VGG16 | 27.86 |
| DeepSDC (Module-I & II) | Ensemble of Visual and Textual Modules | 46.52 |
Table 4. Comparative analysis of DeepSDC Module I on the DIRSM dataset. The column AP@480 reports the average precision at a single cutoff of 480; MAP is the mean of the average precisions at cutoffs of 50, 150, 240, and 480 (a sketch of this computation follows the table).

| Methods | Techniques Used | AP@480 | MAP |
|---|---|---|---|
| Dao et al. [22] | BoW | 57.07 | 57.12 |
| Keiller et al. [21] | BoW, Relation Network | 76.71 | 62.63 |
| Laura et al. [55] | GloVe, LSTM | 61.58 | 66.38 |
| Nataliya et al. [36] | BoW, Logistic Regression | 66.78 | 74.37 |
| Zhengyu et al. [20] | Ranking-based fusion | 63.70 | 75.74 |
| DeepSDC (Module-I) | Ensemble of XLNet and RoBERTa | 76.33 | 91.29 |
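To make the retrieval metric concrete, the following minimal Python sketch computes AP@K and the MAP described in the caption of Table 4. It assumes a ranked list of binary relevance labels and normalizes each AP@K by the number of relevant items found within the cutoff, which is one common formulation; the official MediaEval evaluation script may differ in detail.

```python
def average_precision_at_k(relevances, k):
    """AP@k for a ranked list of 0/1 relevance labels (best match first):
    the mean of precision@r over the ranks r <= k that hold a relevant item."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevances, cutoffs=(50, 150, 240, 480)):
    """MAP as reported in Tables 4 and 5: the mean of AP@k over fixed cutoffs."""
    return sum(average_precision_at_k(relevances, k) for k in cutoffs) / len(cutoffs)

# Toy example: the 1st and 3rd ranked results are relevant.
print(mean_average_precision([1, 0, 1, 0, 0], cutoffs=(2, 5)))  # 0.9167
```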
Table 5. Comparative analysis of DeepSDC Module I and Module II on the DIRSM dataset. The column AP@480 reports the average precision at a single cutoff of 480; MAP is the mean of the average precisions at cutoffs of 50, 150, 240, and 480.

| Methods | Techniques Used | AP@480 | MAP |
|---|---|---|---|
| Laura et al. [55] | GloVe, LSTM, Inception | 83.96 | 81.60 |
| Zhengyu et al. [20] | Ranking, CEDD, SVM | 85.43 | 73.16 |
| Keiller et al. [21] | Relation Network, ResNet | 85.63 | 95.84 |
| Dao et al. [22] | BoW, SVM, CNN | 90.39 | 85.41 |
| Sheharyar et al. [23] | AlexNet, SVM | 92.55 | 83.73 |
| DeepSDC (M-I & M-II) | VGG, ResNet, RoBERTa, XLNet | 92.65 | 98.94 |
Table 6. Comparative analysis of DeepSDC Module I on the FCSM dataset.

| Methods | Techniques Used | F-Score |
|---|---|---|
| Zhengyu et al. [32] | Ngram, semantic ranking | 32.60 |
| Hanif et al. [56] | SRKDA, TF-IDF | 58.30 |
| Laura et al. [31] | GloVe, LSTM | 62.56 |
| DeepSDC (Module-I) | RoBERTa, XLNet, late fusion | 68.34 |
Table 7. Comparative analysis of DeepSDC Module I and Module II on the FCSM dataset (an illustrative fusion sketch follows the table).

| Methods | Techniques Used | F-Score |
|---|---|---|
| Hanif et al. [56] | SRKDA, TF-IDF, LIRE features | 74.58 |
| Laura et al. [31] | Stacking, GloVe, LSTM | 86.99 |
| Zhengyu et al. [32] | ResNet, SVM, Ngram | 87.58 |
| DeepSDC (M-I & M-II) | VGG, ResNet, RoBERTa, XLNet | 92.87 |
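The "late fusion" and ensemble entries in Tables 6 and 7 combine the class-probability outputs of the individual textual and visual models. The sketch below shows one simple late-fusion variant, a weighted average of probability vectors, purely as an illustration; the `late_fusion` helper and the 0.5 weight are assumptions for this sketch, not the exact rule or tuned value used by DeepSDC.

```python
import numpy as np

def late_fusion(p_text, p_image, w_text=0.5):
    """Fuse class-probability vectors from a textual and a visual model by
    weighted averaging; w_text = 0.5 is an illustrative, untuned choice."""
    p = w_text * np.asarray(p_text, dtype=float) \
        + (1.0 - w_text) * np.asarray(p_image, dtype=float)
    return p / p.sum(axis=-1, keepdims=True)  # renormalize to a distribution

# Example: binary (not-flood, flood) scores for a single tweet.
fused = late_fusion([0.2, 0.8], [0.4, 0.6])
predicted_label = int(fused.argmax())  # 1 -> flood-related
```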