2. Related Work
Abdel-Salam and Rafea [10] performed a series of investigations to determine the impact of several versions of a Bidirectional Encoder Representations from Transformers (BERT)-based model on text summarization and proposed SqueezeBERTSum, a trained summarization model improved using the SqueezeBERT encoder modification. Their model achieved comparable ROUGE scores with 49% fewer trainable parameters while retaining 98% of the BERTSum baseline system's efficacy.
Conventional approaches often select top-weighted tweets one after another and ignore the relationships between messages when constructing a summary. This process was investigated by Chellal and Boughanem [11], who recommended an innovative method that formulated summary construction as an optimization problem solved with integer linear programming. The success of their approach was demonstrated through trials using the TREC RTF 2015 and TREC RTS 2016 datasets.
Geng et al. [12] examined query-focused summarization and presented a novel summary architecture capable of producing historical summaries of any length of time as well as tailored online summaries. Their approach's efficacy and efficiency were demonstrated via extensive trials conducted on real microblogs.
Keswani and Celis [13] presented a method that used a traditional summarization technique as a black box and generated a summary that was comparatively more dialect-diverse from a small group of phrases, thereby accounting for dialect bias. They demonstrated the effectiveness of their method on Twitter, collecting tweets written in dialects spoken by individuals belonging to various social categories classified by gender, race, or location; in every instance, their method improved dialect diversity compared to conventional summarization methods.
Lin et al. [14] offered Integrity-Aware Extractive–Abstractive (IAEA) summarization, a unique framework for real-time event summarization. They showed experimentally that IAEA could produce more consistent and better summaries than the most advanced methods.
Zhang et al. [15] proposed pre-training a large Transformer-based encoder–decoder model with a novel self-supervised aim on large text corpora. Their model performed surprisingly well on low-resource summarization, outperforming previous state-of-the-art (SOTA) outcomes on six datasets with a mere 1000 samples.
Goyal et al. [16] presented a brand-new method called Mythos that finds events, identifies subevents within an event, and creates an abstract synopsis and plot to offer several perspectives on the event. It performed better than baseline methods in both cases. The summaries produced were compared with summaries from other reference materials, such as Wikipedia and The Guardian.
In an investigation by Wang and Ren [17], the summary-aware attention weight was computed using attended summary vectors and source hidden states. Both human and automatic assessments demonstrated that their model operated significantly better than strong baselines.
Ali et al. [18] identified a solution through the simultaneous consideration of subject sentiments and topic aspects. Their approach could outperform current approaches on standard metrics such as ROUGE-1, as demonstrated by their comparison with SOTA Twitter summarization techniques.
Wu et al. [19] presented an Ortony–Clore–Collins (OCC) model and an opinion summary approach for Chinese microblogging platforms based on convolutional neural networks (CNNs). Experimental findings from the analysis of three real-world microblog databases showed the effectiveness of their proposed strategy.
Although the TREC Incident Streams track dataset was not meant to be used for automated summarization, Dusart et al. [20] used it to evaluate a number of popular current techniques for automatic text summarization, some of which were tailored specifically to Twitter summarization and some of which were not.
Garg et al. [21] proposed a real-time Twitter summarization system for incidents called ontology-based real-time Twitter summarization (OntoRealSumm), which is built on ontologies and produces an overview of disaster-related tweets with limited assistance from humans. OntoRealSumm's efficacy was confirmed by contrasting its performance against cutting-edge methods on ten disaster datasets.
The task of compiling pertinent tweets was addressed by Saini et al. [22], who examined efficiency by maximizing various aspects of the summary using a multi-objective binary differential evolution (MOBDE) search algorithm to choose a portion of tweets. In comparison to current methods, their best-proposed solution (MOOST3) enhanced ROUGE-2 and ROUGE-L by 8.5% and 3.1%, respectively, and the t-test was used to confirm the statistical significance of these improvements.
Li and Zhang [23] investigated two extraction methods for Twitter event summaries. Comparisons demonstrated that these two strategies worked better than other approaches.
Table 1 lists the methods, outcomes, and datasets used to automatically summarize brief texts.
3. Methodology
We obtained Twitter datasets from Kaggle covering a range of subjects, including COVID-19, FIFA, and farmer demonstrations. The methodology included preprocessing the Twitter data using stop word removal, tokenization, stemming, and keyword identification [28]. The initial feature extraction process used TF–IDF. A GO-FC-CNN decreased redundancy and increased summary accuracy. Using ROUGE measures, the model's performance was assessed in comparison with current techniques. An overview of the procedure is shown in Figure 1.
3.1. Datasets
We gathered the FIFA, FARMER PROTEST, and COVID-19 tweet datasets from Kaggle sources.
Several attributes are included in the FIFA World Cup 2022 tweet dataset: an index, the date and time each tweet was created, the number of likes it received, the source platform, the text content, and the mood of the tweet. Public responses to the incident were captured by the sentiment attribute, which has three classes: neutral (2574 tweets), negative (1804 tweets), and positive (2622 tweets).
https://www.kaggle.com/datasets/tirendazacademy/fifa-world-cup-2022-tweets (accessed on 28 August 2024).
The following attributes are part of the FARMER DEMONSTRATIONS dataset: source, medium, retweeted tweet, quoted tweet, mentioned users, sentiment, reply count, retweet count, like count, and user ID. The distribution was as follows: 3034 neutral, 1886 negative, and 2080 positive sentiments. This extensive dataset gathers the necessary information for assessing Twitter activity and attitudes around farmer demonstrations.
https://www.kaggle.com/datasets/prathamsharma123/farmers-protest-tweets-dataset-csv (accessed on 28 August 2024).
The COVID-19 Twitter Dataset (April–June 2021) includes various attributes such as Tweet ID, creation date and time, source platform, text content, language, favorite and retweet counts, original author, hashtags, user mentions, location, cleaned tweet text, and the classes (compound, negative, neutral, and positive). The dataset classifies tweets into neutral (3134), negative (1686), and positive (2180) sentiments.
https://www.kaggle.com/datasets/arunavakrchakraborty/covid19-X-dataset/data (accessed on 28 August 2024).
3.2. Data Preprocessing
Stop word removal, tokenization, stemming, and keyword recognition are all used to standardize terminology. Enhancing the efficacy and precision of summarization algorithms may also require eliminating redundant data, managing absent values, and guaranteeing uniform formatting.
A crucial step in the NLP process is text preprocessing. The importance of NLP preprocessing is demonstrated by the following [28,29]:
Tokenization: Sentence segmentation divides text into meaningful components known as tokens, such as words, characters, and phrases.
Stemming: Stemming eliminates suffixes and prefixes from words to return them to their root form. Word presentation is reduced to its fundamental form.
Stop word removal: This stage involves removing frequently used words that lack semantic content, such as "this", "and", "a", and "the", from textual material.
Keyword identification: This method identifies words or phrases that are important to the topic or setting of the text. Keywords are vital for understanding the primary content and for other activities such as indexing, categorizing, and summarization.
Figure 2 shows the preprocessed output.
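As an illustration of these steps, the sketch below applies them to a single tweet, using NLTK for the stop word list and the Porter stemmer; the cleaning regex, the frequency-based keyword heuristic, and the sample tweet are illustrative assumptions rather than the exact pipeline settings used in our experiments.

```python
import re
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(tweet, num_keywords=5):
    """Tokenize, remove stop words, stem, and flag frequent stems as keywords."""
    # Basic cleaning: strip URLs, mentions, and non-alphabetic characters.
    text = re.sub(r"http\S+|@\w+|[^a-zA-Z\s]", " ", tweet.lower())
    # Tokenization: split the cleaned text into word tokens.
    tokens = text.split()
    # Stop word removal: drop common words with little semantic content.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming: reduce each word to its root form.
    stems = [STEMMER.stem(t) for t in tokens]
    # Keyword identification: here, simply the most frequent stems.
    keywords = [w for w, _ in Counter(stems).most_common(num_keywords)]
    return stems, keywords

tokens, keywords = preprocess(
    "Farmers continue their protest in Delhi as talks with the government stall"
)
print(tokens)
print(keywords)
```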
3.3. Feature Extraction Using TF–IDF
To measure importance, methods such as TF–IDF are employed to assist in identifying the most pertinent details to condense and summarize the original text while maintaining its meaning. The TF–IDF weight assesses the significance of a word to a document inside a collection [30]. The weight increases in direct proportion to the number of times the word is found in the document and is balanced by the frequency of the word in the corpus. For a specific document $d_j$, the term frequency of term $t_i$ is given in Equation (1):

$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}, \quad (1)$$

where $n_{i,j}$ is the number of instances of the examined term $t_i$ in $d_j$, and the denominator is the total number of occurrences of every term in the document $d_j$.

The inverse document frequency, a measure of a term's overall relevance, is obtained by dividing the total number of documents by the number of documents containing the term; the logarithm of the quotient is calculated in Equation (2):

$$idf_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}, \quad (2)$$

where the denominator is the number of documents in which $t_i$ appears, and $|D|$ is the overall number of documents in the collection. The TF–IDF weight combines the two measures, as shown in Equation (3):

$$tfidf_{i,j} = tf_{i,j} \times idf_i. \quad (3)$$
This technique has a limitation in that it cannot be applied to a single document without any additional documents for comparison because it selects keywords based on phrase frequency.
Figure 3 shows the feature extraction output.
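The sketch below is a minimal implementation of Equations (1)–(3), assuming each preprocessed tweet is treated as one document; library vectorizers (e.g., scikit-learn) add smoothing and normalization, so the plain formulas are implemented directly here for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights per Equations (1)-(3) for tokenized documents."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total_terms = sum(counts.values())       # denominator of Eq. (1)
        doc_weights = {}
        for term, n_ij in counts.items():
            tf = n_ij / total_terms              # Eq. (1)
            idf = math.log(n_docs / df[term])    # Eq. (2)
            doc_weights[term] = tf * idf         # Eq. (3)
        weights.append(doc_weights)
    return weights

docs = [["covid", "vaccine", "rollout"], ["covid", "cases", "rise"], ["fifa", "final"]]
print(tf_idf(docs)[0])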
3.4. Extractive Summarization
Our approach to extractive text summarization incorporates an FC-CNN and GO. The process is broken down into the following key steps:
- Step 1.
Initial summaries: The GO first generates initial summary candidates by selecting sentences from the input text based on their TF–IDF scores.
- Step 2.
Evolution: The initial summaries undergo an evolutionary process. The algorithm applies selection, crossover, and mutation operations to enhance the quality of the summaries.
- Step 3.
Iterative improvement: This evolutionary process is repeated across multiple generations, with each generation producing better summary candidates.
- Step 4.
Refinement using an FC-CNN: Once GO produces a set of high-quality summaries, the FC-CNN model further refines them to achieve the best possible extract, ensuring that the final summaries are both concise and informative.
- Step 5.
Final output: The final output is a set of optimized summaries that effectively represent the original text, achieved by combining the strengths of both the GO and the FC-CNN model. As an example, Figure 4 shows one of the high-quality summaries generated by our approach for the twitter_red dataset; the genetic search of Steps 1–3 is sketched below.
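The sketch illustrates Steps 1–3 as a genetic search over binary sentence-selection masks, assuming a simple coverage-based fitness on per-sentence TF–IDF scores; the population size, rates, and length penalty are illustrative assumptions, and the FC-CNN refinement of Steps 4–5 is omitted.

```python
import random

def evolve_summary(sentence_scores, summary_len=3, pop_size=20, generations=50):
    """Genetic search for an extractive summary: each individual is a binary
    mask over sentences; fitness rewards high TF-IDF coverage and penalizes
    deviation from the target summary length."""
    n = len(sentence_scores)

    def fitness(mask):
        coverage = sum(s for s, keep in zip(sentence_scores, mask) if keep)
        return coverage - abs(sum(mask) - summary_len)      # length penalty

    # Step 1: initial candidate summaries (random sentence selections).
    population = [[int(random.random() < 0.4) for _ in range(n)] for _ in range(pop_size)]

    for _ in range(generations):
        # Step 2: selection (tournament), crossover (single point), mutation (bit flip).
        def tournament():
            return max(random.sample(population, 3), key=fitness)

        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            point = random.randint(1, n - 1)
            child = p1[:point] + p2[point:]
            if random.random() < 0.1:                        # mutation
                i = random.randrange(n)
                child[i] = 1 - child[i]
            offspring.append(child)
        # Step 3: keep the best individuals across generations.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]

    best = max(population, key=fitness)
    return [i for i, keep in enumerate(best) if keep]

scores = [0.9, 0.2, 0.7, 0.4, 0.85, 0.1]    # per-sentence TF-IDF scores (toy values)
print(evolve_summary(scores))
```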
3.5. Classification Using GO-FC-CNN
The process of summarizing is improved using classifying techniques to separate important information from unimportant details. This method efficiently removes noise and highlights important data from social media networks by utilizing the GO-FC-CNN model. This improvement ensures the accuracy of the summary by emphasizing important information and lessening the influence of unimportant details. Advanced classification approaches are used to improve overall content relevance and clarity through more accurate and efficient data summarization.
3.5.1. GO-FC-CNN
The GO-FC-CNN model is a hybrid approach designed for text classification and summarization. It enhances the standard FC-CNN architecture by adding fully connected layers and optimizing the convolutional layers using genetic optimization (GO). These additional layers and the optimization improve the model's ability to handle various text formats and classification tasks. The genetic optimization fine-tunes the model, making it more adaptable and accurate. After training, the model categorizes texts into topics and generates concise summaries based on the classified data. This approach efficiently handles short-text classification and summarization.
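For concreteness, the sketch below shows one way such an FC-CNN backbone could be defined in PyTorch, with convolutional feature extraction followed by fully connected layers; the filter count, kernel size, and layer widths are illustrative assumptions that, in our framework, would be selected by the genetic optimization rather than fixed by hand.

```python
import torch
import torch.nn as nn

class FCCNN(nn.Module):
    """Convolutional feature extractor followed by fully connected layers,
    classifying a TF-IDF (or embedded) text vector into topic classes."""
    def __init__(self, input_dim=512, num_classes=3,
                 conv_channels=32, kernel_size=5, hidden_dim=128):
        super().__init__()
        # Convolution over the feature vector treated as a 1-D signal.
        self.conv = nn.Sequential(
            nn.Conv1d(1, conv_channels, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(16),
        )
        # Fully connected layers added on top of the convolutional features.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(conv_channels * 16, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # x: (batch, input_dim) -> add a channel dimension for Conv1d.
        return self.fc(self.conv(x.unsqueeze(1)))

model = FCCNN()
logits = model(torch.randn(4, 512))   # 4 example tweets as 512-dim feature vectors
print(logits.shape)                   # torch.Size([4, 3])
```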
3.5.2. FC-CNN
The FC-CNN can be used to gather textual patterns and for deep feature integration. This allows the architecture to produce accurate and efficient summaries of brief texts by eliminating removable material and extracting significant information. An FCNN comprises an input layer (IL) and multiple hidden layers (HLs) culminating in an output layer (OL). The IL generates a numerical vector from the preprocessed text data, the HLs process the input vectors containing the early features of the text, and the OL holds the summarized text result [31]. For an enhanced understanding of the structure and operation of FCNNs, we describe a five-layered neural network in Figure 5. This FCNN has five neurons in the IL, three parallel HLs with seven neurons each that are independent of one another, and three elements in the OL, the final layer.
Each hidden layer in Figure 5 represents a summarizing layer, and every white ellipse with factors w or z characterizes a summary component. The key points in a document correspond to the input data; they propagate to the downstream summarizing layer via connections between summary nodes, and all summary nodes process the signals throughout this procedure. The output value (OV) of each neuron in HL 1 may be found using Equations (4) and (5), in which each OV of an HL in the summarization strategy is a weighted combination of the input features plus a bias. The features w1 and w2 describe the frequency of significant words along the U and V axis directions, w3 and w4 are decentering variables along the X and Y axes, and w5 indicates the entire length of the text. Each weight relates a neuron of the preceding layer to the ith neuron of the jth HL and expresses the significance of the corresponding text feature, and each HL has its own bias term. Training the network is an efficient method of adjusting the weight and bias coefficients to increase summarization quality; the network has been trained effectively when the final estimated summary is reasonably close to its theoretical value. The equations of the first and second HLs can be expressed in comparable matrix forms, so the equation for the second HL can be summarized as in Equation (6).
Not all OVs can be moved to the following hidden layer; in practice, a summarization model must process every OV. The model's goal is to extract the most important details and remove any unnecessary material to produce a summary that is clear and informative. This extractive summarization method is illustrated in Equation (7). Assuming that, after the summarization method has been applied, each OV of the text's upper layer can be moved to the next layer, the output is summarized via the second HL in Equation (8).
The final output is obtained by propagating the signals through the levels of the summarization framework up to the third HL. The OVs of the final HL sum the features of the input elements, and the weighted summation of the features derived from the input data yields a summary score that is evaluated via a loss function. This summary score is utilized as the final summarized output if it meets the root-mean-square criterion of the loss function shown in Equation (9), which assesses summarization quality by comparing the predicted summary score with the score of the actual content.
If the resulting summary satisfies the selection criterion of the loss function, it is returned as a direct output. If not, the weights of the HLs are adjusted repeatedly until the criterion is satisfied. This process, which takes place during the network's training stage, is referred to as backward transmission (backpropagation). Each HL contains one thousand neurons, and the network produces five estimated values corresponding to the essential elements of the summary. The architecture and parameters of the FCNN for automatic short-text summarization are enhanced using genetic optimization: mutation, crossover, and selection alter a population of neural network configurations to improve summarization performance. The intention is to enhance the performance and efficiency of producing concise and useful text summaries.
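A minimal sketch of this training step follows, assuming an RMSE-style loss for Equation (9); the stand-in network below uses three 1000-neuron hidden layers and five outputs to mirror the description above, while the optimizer, learning rate, and input dimensionality are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_step(model, batch, targets, optimizer):
    """One forward/backward pass: compute an RMSE-style loss (cf. Equation (9))
    and adjust weights and biases via backward transmission (backpropagation)."""
    optimizer.zero_grad()
    predictions = model(batch)
    loss = torch.sqrt(nn.functional.mse_loss(predictions, targets))  # root mean square
    loss.backward()      # backward transmission through the hidden layers
    optimizer.step()     # weight and bias update
    return loss.item()

# Stand-in network: three hidden layers of 1000 neurons, five estimated values.
model = nn.Sequential(
    nn.Linear(512, 1000), nn.ReLU(),
    nn.Linear(1000, 1000), nn.ReLU(),
    nn.Linear(1000, 5),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(8, 512)     # 8 TF-IDF feature vectors (illustrative)
targets = torch.randn(8, 5)     # target summary scores (illustrative)
print(train_step(model, batch, targets, optimizer))
```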
3.5.3. GO
To improve the quality of candidate summaries based on a fitness function, GO repeatedly selects, alters, and recombines them. By prioritizing readability and relevance, this approach develops succinct, coherent summaries and, via evolutionary progress, finally yields the ideal summary.
A population-centered meta-heuristic approach forms the basis of GO, and every member of the population provides an acceptable response [32]. The individuals in GO are modified through crossover, mutation, and selection. Two individuals are chosen at random during the selection process, which improves the population's variability. The crossover mechanism then exchanges values between the chosen individuals (parents) to create new individuals. Next, mutation is used to replace a randomly chosen individual with a randomly chosen value from the search space. Finally, the most exceptional individuals are selected from the newly formed individuals and their parents, depending on their fitness, to form the emerging population. These three GO processes, selection, crossover, and mutation, are repeated, updating the population, until the end requirements are met.
The crossover operator is a fundamental operator in various GO modifications. The single-point approach is the most straightforward crossover; it comprises the two required parents, which are chosen at random from the general population. By singly dividing the information within them at a single point, the parents are employed to create offspring. New solutions are produced by switching the values between the two parents after the single point. By merging the best features of two parent solutions into one potentially better offspring solution, the crossover approach improves the summarization process. The operation of the single-point (S-P) crossover is illustrated graphically in Figure 6.
Although the S-P crossover is a solid option, a different variant is preferable for real-coded applications. The blend crossover (BLX-α) is a real-coded operator. As with the single-point crossover, two parents must be selected from the population; each component of the offspring is then drawn from an interval defined by the corresponding components of the two parents, as explained by Equation (10), where α is a positive number set to 0.5 and the components are the elements extracted from the text segment. In our framework, this procedure is used by GO to optimize the network's parameters and improve the efficiency and accuracy of producing succinct and helpful text summaries.
A mutation is an operator that aids in exploring the neighborhood of a particular solution. As with the crossover, there are various methods for carrying out a mutation. For this type of mutation, an element from the population is taken and changed using a random variable (RV) produced with a Gaussian distribution (GD). This helps the network escape local minima, improves its overall performance, and improves the text summarization quality. The altered solution, a mutated individual, is calculated using the formula in Equation (11), in which Gaussian(σ) is an RV generator that employs a GD with a standard deviation of σ to introduce unpredictability into the summarization procedure, and the individual being mutated is a summary individual selected from the population.
The selection operator plays a crucial role in identifying the population members that crossover and mutation will affect. While there are a variety of such schemes, the roulette wheel is the most widely used. This fitness-based approach operates by giving each individual in the population a selection probability; the population is then divided into many regions, each represented by an individual. If an element $x_i$ in a population of $N$ potential solutions has the fitness value $f_i$, the probability that $x_i$ will be chosen can be calculated as in Equation (12):

$$P(x_i) = \frac{f_i}{\sum_{j=1}^{N} f_j}. \quad (12)$$
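For illustration, the three operators can be sketched as follows, with the blend crossover, Gaussian mutation, and roulette-wheel selection corresponding to Equations (10)–(12); α = 0.5 follows the text above, while the mutation standard deviation and the toy fitness values are assumptions.

```python
import random

ALPHA = 0.5          # blend-crossover constant from Eq. (10)
SIGMA = 0.1          # mutation standard deviation (illustrative)

def blend_crossover(p1, p2):
    """BLX-alpha: each offspring gene is drawn from the parents' interval,
    extended by ALPHA on each side (cf. Eq. (10))."""
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        span = hi - lo
        child.append(random.uniform(lo - ALPHA * span, hi + ALPHA * span))
    return child

def gaussian_mutation(individual):
    """Perturb each gene with a Gaussian random variable (cf. Eq. (11))."""
    return [g + random.gauss(0.0, SIGMA) for g in individual]

def roulette_selection(population, fitnesses):
    """Select an individual with probability f_i / sum_j f_j (Eq. (12))."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]

population = [[random.random() for _ in range(4)] for _ in range(6)]
fitnesses = [sum(ind) for ind in population]          # toy fitness values
parent1 = roulette_selection(population, fitnesses)
parent2 = roulette_selection(population, fitnesses)
print(gaussian_mutation(blend_crossover(parent1, parent2)))
```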
4. Results
This section describes the experimental environment and setup, displayed in Table 2, as well as the effectiveness of the suggested approach; Table 6 shows the overall performance. Using ROUGE metrics, the proposed method is compared with the existing approaches Multi-Feature Maximal Marginal Relevance Bidirectional Encoder Representations from Transformers for Summarization (MFMMR-BertSum) [24], MTLTS [25], Deep Classification and Batch Real-Time Summarization (DCBRTS) [26], Robustly Optimized BERT Approach (RoBERTa), and Convolutional Neural Network–Bidirectional Gated Recurrent Unit with Attention (CNN-BiGRU (Att)) [27].
4.1. Summarization Findings
The ROUGE-L metric, which assesses the quality of text summaries by contrasting them with reference summaries, is used in Table 3 to compare how well various summarization algorithms perform on a particular assignment. The algorithms LSA Summarizer, TextRank, Tweet Ranking, LexRank, and Luhn Summarizer are compared, and their effectiveness is evaluated relative to the TF-IDF-GO-FC-CNN model.
The efficacy of the TF-IDF-GO-FC-CNN algorithm relative to each summarization technique is shown using percentages, where greater percentages indicate better results. The results in Table 3 indicate that the TF-IDF-GO-FC-CNN algorithm outperforms traditional summarization models such as LexRank (90.91%) and TextRank (87.87%), with the lowest ROUGE-L value being 74.94%. This suggests that the proposed model excels in producing summaries that retain the core information of the original text.
Figure 7 shows a comparison of various summarization algorithms.
ROUGE-1: This statistic counts the unigrams (single words) shared by the summary and the reference text. ROUGE-2: This metric counts the bigrams (two consecutive words) shared by the summary and the reference text; examining word order in this way assesses the summary's coherence and fluency. The ROUGE scores are displayed in Table 4. In comparison, TF-IDF-GO-FC-CNN has greater scores for both ROUGE-1 (64.95) and ROUGE-2 (59.5) than MFMMR-BertSum, which achieved a ROUGE-1 score of 42.74 and a ROUGE-2 score of 59.5; this indicates that TF-IDF-GO-FC-CNN generates summaries with superior word and phrase matching to the reference text. Higher ROUGE-1 and ROUGE-2 scores demonstrate that the TF-IDF-GO-FC-CNN algorithm works better than the MFMMR-BertSum method in producing accurate and coherent summaries.
Figure 8 provides a ROUGE score diagram.
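For reference, ROUGE-N recall can be computed by counting overlapping n-grams between a candidate and a reference summary. The sketch below is only an illustration of the metric (the scores in Table 4 come from our evaluation pipeline, not from this sketch), and the example sentences are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped overlapping n-grams divided by reference n-grams."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())          # clipped n-gram overlap
    return overlap / max(sum(ref.values()), 1)

reference = "farmers protest against new agriculture laws in delhi"
candidate = "farmers protest in delhi against agriculture laws"
print(rouge_n_recall(candidate, reference, 1))    # ROUGE-1 recall
print(rouge_n_recall(candidate, reference, 2))    # ROUGE-2 recall
```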
The ROUGE-L measure is used in Table 5 to compare the efficiency of several summarization techniques at different breakpoints. The algorithms evaluated include TextRank, LSA Summarizer, LexRank, Tweet Ranking, and Luhn Summarizer, and performance is measured for three distinct breakpoint ranges: up to 2000, from 2000 to 5000, and from 5000 to 7000.
Table 5 demonstrates the performance variations of the summarization algorithms across these text length ranges. Tweet Ranking shows the highest ROUGE-L score (69.87%) for texts up to 2000 words, TextRank performs best (61.23%) for texts between 2000 and 5000 words, and Tweet Ranking again performs best (58.65%) for texts between 5000 and 7000 words. These outcomes suggest that the efficacy of summarization algorithms can differ greatly depending on the length of the text being summarized.
Figure 9 shows the summarization algorithms across different breakpoints.
4.2. Classification Findings
Accuracy evaluates coherence and informativeness, guaranteeing that the summary retains important details and clarity.
Figure 10 provides a comparative examination of accuracy. While the existing methods MTLTS, DCBRTS, CNN-BiGRU (Att), and RoBERTa achieved 78.6%, 97.30%, 83.5%, and 83.6%, respectively, our proposed GO-FC-CNN methodology achieved 98.00%. The findings demonstrate that our suggested approach substantially outperforms existing methods (Table 6).
Precision evaluates how well a summary maintains the primary concepts of the original text while removing any unnecessary information.
Figure 11 illustrates a precision output.
The GO-FC-CNN technique achieved a precision of 98.30%, which compares favorably with the traditional methods MTLTS, DCBRTS, CNN-BiGRU (Att), and RoBERTa, which achieved precision values of 77%, 98.10%, 86.8%, and 94.1%, respectively. The findings indicate that our suggested method outperforms current techniques by a significant margin in terms of precision.
Recall assesses how well a summary covers all important features, demonstrating the model’s capacity to cover necessary material.
Figure 12 shows a comparable recall result.
The GO-FC-CNN technique achieved a recall of 98.72%, which compares favorably with the recall of the traditional methods MTLTS, DCBRTS, CNN-BiGRU (Att), and RoBERTa, which achieved 76.6%, 98.63%, 84.2%, and 73.1%, respectively. The findings indicate that our suggested method outperforms the current techniques by a significant margin in terms of recall.
The F-score is the harmonic mean of precision and recall, indicating the completeness and correctness of the summary's ability to capture the main information.
Figure 13 provides a comparative exploration of the F1-score.
The GO-FC-CNN strategy we propose achieved an F1-score of 98.61%, which is superior to the F1-scores of the existing techniques MTLTS, DCBRTS, CNN-BiGRU (Att), and RoBERTa, which achieved 76.8%, 98.41%, 83.3%, and 82.3%, respectively. These outcomes demonstrate that our suggested approach outperforms traditional techniques by a substantial margin with regard to the F1-score.
5. Discussion
The MFMMR-BERT Sum approach, which leverages multiple features and BERT to predict sentiment, offers a powerful methodology for short-text summarization. However, its strength comes with several challenges. The model’s reliance on BERT and multiple feature inputs makes it resource-intensive, demanding significant computational power and memory. This can create bottlenecks, especially when applied to larger datasets or when real-time processing is required. Furthermore, the model’s performance can vary based on the specificity of the text. For instance, MFMMR-BERT Sum may excel in summarizing short texts with fewer features, but it might struggle when the text is dense with intricate details or contains numerous diverse features. Such variability in performance raises concerns about its generalizability across different types of texts.
Moreover, the approach faces potential inefficiencies in balancing its management of multiple tasks, as seen in models like MTLTS (Multi-Task Learning for Text Summarization). The complexity of handling multiple tasks simultaneously can result in suboptimal performance if the model fails to adequately manage these tasks. This complexity also increases the training time and resource demands, making it less feasible for deployment in resource-constrained environments. Additionally, MTLTS may struggle with texts that do not conform to clear patterns or standards, leading to less accurate summaries. This issue is compounded by the need for batch processing, which can introduce delays in real-time applications. Such delays may be unacceptable in scenarios in which rapid summarization is critical.
A significant risk associated with deep learning models, including MFMMR-BERT Sum, is the potential to generate shallow or superficial summaries. While deep classification can be effective in identifying key points, it may sometimes fail to capture the nuances and subtleties of a text. This is particularly problematic for short texts, where every word can carry substantial meaning. If the model focuses too much on broader themes, it may miss important contextual details, resulting in summaries that lack depth or relevance. The demand for high computational resources during training and deployment only adds to the complexity, making this approach challenging to scale without substantial infrastructure.
Given these challenges, we propose an alternative approach: automatic short-text summarization using the Genetic Optimized Fully Connected Convolutional Neural Network (GO-FC-CNN). The GO-FC-CNN approach seeks to address the limitations of existing models by optimizing the network structure through a genetic algorithm, allowing for more efficient training and improved generalization. By focusing on short-text summarization, GO-FC-CNN is designed to handle the specific challenges associated with concise text, such as the need to capture subtle nuances without overwhelming computational demands.
One of the key advantages of GO-FC-CNN is its ability to optimize network architecture dynamically, enabling it to perform well across a range of different texts without the need for excessive resource allocation. Unlike MFMMR-BERT Sum, which relies heavily on BERT’s pre-trained embeddings and requires significant memory, GO-FC-CNN is more lightweight and adaptable. This makes it more suitable for real-time applications in which speed and efficiency are paramount. Furthermore, the use of genetic optimization helps refine the model architecture, ensuring that it can effectively balance the need for depth in summaries with the computational constraints typical of real-world deployments.
In addition to addressing resource constraints, GO-FC-CNN also improves upon the challenge of handling texts with unclear or ambiguous features. By leveraging convolutional layers optimized through genetic algorithms, the model can better capture the underlying structure of the text, even when the features are not immediately apparent. This makes it particularly effective for summarizing short texts with which traditional models might struggle due to a lack of clear patterns.
Despite the promising results of the GO-FC-CNN model compared to existing methods, there is potential for enhancement in handling dense or complex text. As shown in Table 5, a performance drop occurs for texts longer than 5000 words, indicating that the model's effectiveness may vary across different datasets, especially those with varying levels of complexity. Although GO-FC-CNN is designed to be more efficient than models like BERT, it still demands considerable computational resources during the training phase. Additionally, the model may encounter challenges with dynamic datasets that evolve over time, requiring continuous updates or retraining to maintain its effectiveness.
In summary, while existing models like MFMMR-BERT Sum and MTLTS offer valuable approaches to text summarization, they are not without their limitations. Their resource-intensive nature, potential for shallow summaries, and challenges in handling complex or ambiguous texts make them less ideal for all applications. The GO-FC-CNN approach provides a promising alternative by offering a more efficient and adaptable solution for automatic short-text summarization. Through the use of genetic optimization and a fully connected convolutional architecture, GO-FC-CNN can achieve high-quality summaries without the need for excessive computational resources, making it a more viable option for a wide range of practical applications. This model could be particularly useful in applications requiring summarization, such as summarizing breaking news articles or condensing social media texts, where both speed and accuracy are critical. The lightweight architecture, combined with its high performance, positions it as a viable tool for practical applications.