Article

Arabic Abstractive Text Summarization Using an Ant Colony System

Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(16), 2613; https://doi.org/10.3390/math13162613
Submission received: 7 July 2025 / Revised: 8 August 2025 / Accepted: 8 August 2025 / Published: 15 August 2025

Abstract

Arabic abstractive summarization presents a complex multi-objective optimization challenge, balancing readability, informativeness, and conciseness. While extractive approaches dominate NLP, abstractive methods—particularly for Arabic—remain underexplored due to linguistic complexity. This study introduces, for the first time, ant colony system (ACS) for Arabic abstractive summarization (named AASAC—Arabic Abstractive Summarization using Ant Colony), framing it as a combinatorial evolutionary optimization task. Our method integrates collocation and word-relation features into heuristic-guided fitness functions, simultaneously optimizing content coverage and linguistic coherence. Evaluations on a benchmark dataset using LemmaRouge, a lemma-based metric that evaluates semantic similarity rather than surface word forms, demonstrate consistent superiority. For 30% summaries, AASAC achieves 51.61% (LemmaRouge-1) and 46.82% (LemmaRouge-L), outperforming baselines by 13.23% and 20.49%, respectively. At 50% summary length, it reaches 64.56% (LemmaRouge-1) and 61.26% (LemmaRouge-L), surpassing baselines by 10.73% and 3.23%. These results highlight AASAC’s effectiveness in addressing multi-objective NLP challenges and establish its potential for evolutionary computation applications in language generation, particularly for complex morphological languages like Arabic.

1. Introduction

In our data-driven digital era, automatic text summarization has become indispensable for efficient information management. This technology aims to produce concise summaries from source texts without human intervention, offering significant time savings across various domains. Beyond facilitating rapid content digestion, it addresses information storage challenges by compressing documents. Its applications span numerous fields including news aggregation, medical record synthesis, educational material condensation, and web content browsing.
The field traces its origins to the pioneering work on technical paper abstracts by Luhn [1]. Modern summarization systems handle both single-document and multi-document inputs, producing outputs that are either extractive (selecting key sentences) or abstractive (generating novel formulations). These outputs range from headlines to full summaries [2]. While extractive methods dominate current research, abstractive approaches—particularly for morphologically rich languages like Arabic—remain underdeveloped due to their inherent complexity involving paraphrasing, sentence fusion, and novel word generation. This challenge is compounded by the slow progress in developing appropriate evaluation metrics.
Current approaches to abstractive summarization employ diverse techniques, including deep learning, discourse analysis, graph-based methods, and hybrid systems. Notably absent from this landscape is the application of Swarm Intelligence (SI) algorithms, despite their proven success in other NLP tasks. While SI methods like Ant Colony Optimization [3,4], Particle Swarm Optimization [5], and Cat Swarm Optimization [6] have shown promise in extractive summarization, their potential for abstractive summarization remains largely unexplored—particularly for Arabic.
We address this gap by framing abstractive summarization as a multi-objective optimization problem and proposing AASAC (Arabic Abstractive Summarization using Ant Colony). Our approach builds upon the ant colony system’s (ACS) proven effectiveness in pathfinding problems like the Traveling Salesman Problem (TSP), adapting it for linguistic optimization.
A key advantage of nature-inspired optimization methods—and non-deep learning approaches in general—is their transparency and credibility, in contrast to the opaque nature of deep neural networks. While deep learning models often suffer from interpretability issues and hallucination [7], AASAC provides full traceability, allowing step-by-step scrutiny of its summarization process. This explainability is particularly valuable in abstractive summarization, where transparency ensures credible and controllable output generation. Key contributions include the following:
  • Nature-Inspired Algorithm: We introduce AASAC (Arabic Abstractive Summarization using Ant Colony), a novel approach for Arabic abstractive summarization that leverages the ant colony system (ACS), a nature-inspired algorithm. This innovative technique leads to superior summarization results.
  • Expanded Dataset: We have expanded the dataset introduced in [8] by incorporating human-generated abstractive summaries. This dataset expansion facilitates a more comprehensive evaluation process, and it is readily accessible to fellow researchers.
  • Semantic Feature Integration: To enhance the efficacy of summarization, we incorporate semantic features as the foundation for fitness functions. This approach significantly enhances the capacity to generate high-quality summaries.
  • Linguistically Aware Evaluation: Recognizing the unique linguistic characteristics of the Arabic language, we advocate the use of the LemmaRouge evaluation metric. This measure accounts for the subtleties and intricacies specific to Arabic, providing more linguistically aware assessment.
The rest of the paper is organized as follows: In Section 2, we provide an overview of related work in the field of abstractive text summarization. Section 3 details the formulation of the abstractive text summarization (ATS) problem. Our proposed summarizer is explained in Section 4. Experimental results and discussion are presented in Section 5. Finally, in Section 6, we conclude the paper and discuss potential directions for future research.

2. Related Works

The literature recognizes that abstractive text summarization systems are limited by the complexities inherent in natural language processing. These challenges have attracted researchers' interest and prompted the exploration of various methods for obtaining abstractive summaries, such as graph-based and semantic-based techniques.
Graph-based methods have traditionally dominated the field. These methods entail representing the text by using a graph data structure and determining an optimal path for generating a summary. For instance, Opinosis [9] and Kumar et al. [10] adopted this approach, with Opinosis incorporating shallow NLP techniques and the latter employing a bigram model to identify significant bigrams for summary generation. These techniques excel in extracting essential information and producing concise summaries. It is important to note that this approach does not involve sentence paraphrasing or the use of synonyms. Some summarization systems initially utilize extractive methods and then transition to generating abstractive summaries, as demonstrated by COMPENDIUM [11,12].
Another approach involves employing a semantic graph reduction technique, as demonstrated in [13]. In their summarization method, they initiate the process by creating a rich semantic graph (RSG), which serves as an ontology-based representation. Subsequently, this RSG is transformed into a more abstracted graph, culminating in the generation of the abstractive summary from the final graph. The utilization of RSGs for Arabic text representation was further explored by the same authors in ongoing work related to Arabic summarization [14,15]. Additionally, another study applied this technique to summarize Hindi texts [16]. In their model, the authors harnessed Domain Ontology and Hindi WordNet to facilitate the selection of diverse synonyms, thus enriching the summary generation process.
Furthermore, Le and Le [17] introduced an abstractive summarization method for the Vietnamese language that distinguishes itself from [9,18] by incorporating anaphora resolution. This innovative approach effectively tackles the challenge of obtaining diverse words or phrases to represent the same concept, even when they exist in different nodes within the graph. The summarizer uses Rhetorical Structure Theory (RST) to streamline sentences. It achieves this by removing less important and redundant clauses from the beginning of a sentence and then reconstructing the refined sentence based on syntactic rules. This summarization technique is elegantly straightforward, as it amalgamates multiple sentences represented within a word graph, employing three predefined cases.
One of the earliest Arabic abstractive summarization systems, presented by Azmi and Altmami [8], was developed on the foundation of a successful RST-based Arabic extractive summarizer [19,20]. In this approach, the sentences generated from the original text are first shortened by removing certain words, such as position names and days. Subsequently, sentence reduction rules are applied to create an abstractive summary. However, it is important to note that this method may result in non-coherent summaries due to the absence of paraphrasing. For the Malayalam language, Kabeer and Idicula [21] employed a semantic graph based on POS (Part Of Speech) tagging. They used a set of features to assign weights to the relationships between two nodes representing the subject and object of a sentence. This process culminated in the generation of a reduced graph, from which an abstractive summary was derived. In the case of Kannada, a guided summarizer called sArAmsha [22] relied on lexical analysis, Information Extraction (IE), and domain templates to generate sentences. A similar approach has also been implemented for the Telugu language [23].
Another relevant line of work emphasizes the role of text segmentation in improving summarization quality. SEGBOT [24] introduced a neural end-to-end segmentation model that leverages bidirectional recurrent networks and a pointer mechanism to detect text boundaries without hand-crafted features. It addresses key challenges in segmenting documents into topical sections and sentences into elementary discourse units (EDUs), which are crucial to structuring coherent summaries. SEGBOT’s outputs have also been shown to enhance downstream applications such as sentiment analysis, suggesting its broader utility in discourse-aware summarization pipelines.
In a related direction, Chau et al. [25] introduced DocSum, a domain-adaptive abstractive summarization framework specifically designed for administrative documents. This work addresses key challenges in processing such texts, including noisy OCR outputs, domain-specific terminology, and scarce annotated data. DocSum employs a two-phase training strategy: (1) pre-training on noisy OCR-transcribed text to enhance robustness, followed by (2) fine-tuning with integrated question–answer pairs to improve semantic relevance. When evaluated on the RVL-CDIP dataset, DocSum demonstrated consistent improvements over a BART-base baseline, with ROUGE-1 scores increasing from 49.52 to 50.72 (+1.20%). Smaller but statistically significant gains were observed in ROUGE-2 (+1.14%) and ROUGE-L (+0.96%). These results highlight the framework’s ability to handle domain-specific nuances while maintaining summary coherence.
Sagheer and Sukkar [26] introduced a hybrid system that combines knowledge base and fuzzy logic techniques for processing domain-specific Arabic text. This system operates by leveraging predefined concepts associated with the domain. Within this framework, the knowledge base serves the purpose of identifying concepts within the input text and extracting semantic relations between these concepts. The resulting sentences generated by the system comprise three essential components: subject, verb, and object. Multiple sentences are produced based on the identified concepts and their relations, and a fuzzy logic system is then employed. This fuzzy logic system computes a fuzzy value for each word in a sentence, utilizing fuzzy rules and defuzzification techniques to rank the summary sentences in descending order based on their fuzzy values. It is worth noting that the system’s performance was evaluated on texts sourced from the Essex Arabic Summaries Corpus. However, it is important to mention that no specific evaluation method was applied to systematically assess and compare the system against other techniques.
In the realm of machine learning and deep learning, Rush et al. [27] introduced a neural attention-based summarization model that utilizes a feed-forward neural network language model (NNLM) [28] and an attention-based model [29] to generate headlines with fixed word lengths. However, this model has certain limitations. It tends to summarize each sentence independently, relies on the source text’s vocabulary, and sometimes constructs sentences with incorrect syntax, as it focuses on reordering words. To address some of these issues, Chopra et al. [30] developed the recurrent attention summarizer (RAS), which incorporates word positions and their word-embedding representations to handle word ordering challenges. Additionally, the encoder–decoder recurrent neural network (RNN) [29] has been a fundamental component in many abstractive summarization models. Nallapati et al. [31] enhanced [29]’s model by adding an attention mechanism and applying the large vocabulary trick (LVT) [32]. They also tackled the problem of out-of-vocabulary (OOV) words by introducing a switching generator–pointer model [33]. However, a drawback of this model was the generation of repetitive phrases, which was mitigated to some extent by employing a Temporal Attention model [34]. Another approach improved handling OOV words by learning when to use the pointer and when to generate a word [35], a technique that enhanced [31]’s model.
The issue of repetition was further addressed by incorporating the coverage model [36] and by implementing an intra-decoder attention mechanism [37]. In Arabic headline summarization [38], a pointer–generator model [35] with an attention mechanism [29] served as a baseline, and a variant with a copy mechanism [33] was developed. The latter model demonstrated improved results compared with the baseline. Notably, the model with a copy mechanism and a length penalty outperformed other variants that incorporated coverage penalties or length and coverage penalties, largely due to considerations related to summary length limitations. To evaluate these models, an Arabic headline summary (AHS) dataset was created. Additionally, another study in Arabic explored sequence-to-sequence models with global attention for generating abstractive headline summaries [39]. They examined the impact of the number of encoder layers for three types of networks: gated recurrent units (GRUs), LSTM, and bidirectional LSTM (BiLSTM). Evaluation using ROUGE and BLEU measures, employing the AHS dataset [38] and Arabic Mogalad_Ndeef (AMN) [40], indicated that the two-layer encoder for GRUs and LSTM achieved better results than the single-layer and three-layer configurations. Conversely, the three-layer BiLSTM encoder outperformed the single-layer and two-layer configurations. Notably, utilizing AraBERT [41] in the data preprocessing stage contributed to improved results.
Furthermore, the RNN architecture initially proposed by [31] underwent modifications for a multi-layer encoder and single-layer decoder summarization model tailored to Arabic [42]. The encoder incorporates three hidden state layers for input text, keywords, and text name entities. These layers employ bidirectional LSTM and feature a global attention mechanism for enhanced performance.
Pre-trained language models, including BERT (Bidirectional Encoder Representations from Transformers) [43] and BART (Bidirectional and Auto-Regressive Transformers) [44], have found applications in abstractive summarization tasks. BART, in particular, distinguished itself from BERT by pre-training both the bidirectional encoder and the auto-regressive decoder. To harness the power of BERT for text summarization, BertSum (BERT architecture for summarization) [45] was introduced, offering two variants: BertSumExt and BertSumAbs. The former focused on extractive summarization, while the latter delved into abstractive summarization. BertSumAbs employed an encoder–decoder architecture [35], where the encoder was a pre-trained BertSum and the decoder was a transformer initialized randomly. To strike a balance between overfitting and underfitting, an Adam optimizer was used for both the encoder and decoder, each with distinct learning rates and warm-up steps.
For Arabic abstractive summarization, Elmadani et al. [46] utilized multilingual BERT [47] to train BertSumAbs, and they evaluated its performance on the KALIMAT dataset by using ROUGE scores. In a separate effort, Kahla et al. [48] fine-tuned multilingual transformer models, including BERT and BART, for the Arabic abstractive summarization task. They also leveraged AraBERT [41], a BERT model specifically trained for Arabic. These models underwent fine-tuning using a corpus collected from Arabic Deutsche Welle (DW) news (https://www.dw.com/ar/, accessed on 5 August 2021). Furthermore, they introduced a cross-lingual transfer-based approach by initially fine-tuning multilingual BERT for abstractive Hungarian and English text summarization and subsequently fine-tuning it for Arabic using the same corpus. Automatic and human evaluations demonstrated that multilingual BERT trained from English outperformed other models, producing summaries that closely resembled the original lead. In contrast, other models tended to generate longer summaries than the lead or contained more grammar and context errors.
Another noteworthy development is AraBART [49], an Arabic pre-trained BART model. AraBART underwent fine-tuning for Arabic abstractive summarization tasks, utilizing datasets from Arabic Gigaword [50] and XL-Sum [51]. The evaluation results revealed that AraBART outperformed the pre-trained Arabic BERT-based model [52], the multilingual mBART model [53], and the mT5 model [54].
Additive Manufacturing (AM) constructs objects layer by layer from digital models, generating extensive unstructured textual data such as design rationales, process parameters, and material specifications. Efficient organization of this knowledge is essential to Design for Additive Manufacturing (DFAM), where traceability and interpretability are critical to informed decision making. Abstractive summarization offers a scalable solution by condensing complex AM content into coherent and actionable insights. Recent advances enhance factual consistency in summarization by integrating structured knowledge extraction methods, including triple classification and knowledge graphs (KGs). For example, AddManBERT [55] employs dependency parsing to extract semantic relations between AM entities (e.g., material–process dependencies) and encodes them as vector representations. Complementary work utilizes neural models with meta-paths to capture hierarchical semantics between entities and relations, while KG-based methods support scalable triple classification from multi-source Fused Deposition Modeling data. These techniques have demonstrated superior classification accuracy and computational efficiency compared with rule-based systems, underscoring the value of KG-augmented summarization in AM knowledge management.
Table 1 provides an overview of abstractive text summarization studies, including details about the corpus used and the scope of the summary. The summary scope can fall into one of three categories: headline, sentence level (where a single sentence serves as the summary), or document level (which generates multiple or a few lengthy sentences).
Today, deep learning forms the backbone of most abstractive summarization models [7]. However, its effectiveness typically assumes access to large-scale training data and substantial computational resources—conditions often unmet for Arabic and other morphologically rich languages. Our ACS-based approach offers a strategically compelling alternative by addressing four critical limitations of neural methods. First, it operates effectively with limited labeled data, making it suitable for specialized Arabic domains where annotated corpora are scarce. Second, its explicit modeling of Arabic root patterns and collocations through interpretable fitness functions introduces morphological awareness, which is often lacking in transformer-based systems without extensive pre-training. Third, the framework inherently supports multi-objective optimization, enabling precise trade-offs between competing priorities such as content density and readability—a capability that requires complex architectural modifications in neural models. Finally, ACS achieves competitive performance without reliance on GPUs, thereby democratizing access to abstractive summarization for Arabic NLP researchers and practitioners with constrained infrastructure. Rather than opposing the neural paradigm, this work expands the methodological repertoire for Arabic NLP, particularly in low-resource, high-interpretability scenarios. The success of AASAC further suggests promising directions for hybrid systems combining neural fluency with nature-inspired optimization.

3. Problem Formulation

The ant colony system (ACS) algorithm, an enhanced variant of Ant Colony Optimization (ACO) [56], provides an effective framework for addressing our multi-objective Arabic abstractive summarization challenge. As a population-based metaheuristic, ACS mimics the emergent intelligence of natural ant colonies, where artificial ants collaboratively explore solution paths while dynamically updating pheromone trails to guide subsequent searches toward optimal solutions. This biologically inspired approach is particularly suited for our task, as it efficiently balances multiple competing objectives—content coverage, linguistic coherence, and summary conciseness—through its distributed optimization mechanism.
We formulate the abstractive summarization task as a graph-based optimization problem, where the source document is represented as a connected network of word nodes. Unlike previous ACO applications in extractive summarization [3] that treated entire sentences as nodes, our AASAC approach operates at the word level to enable finer-grained abstractive generation. Each node encapsulates lexical, collocational, and semantic features that collectively inform our multi-component fitness function. The ACS agents navigate this linguistic landscape, with pheromone dynamics reflecting both local heuristic information (word relations) and global summary quality metrics.
This formulation advances prior work in three key aspects: (1) the graph representation preserves Arabic-specific morphological and syntactic relationships critical to abstractive generation; (2) the optimization process simultaneously considers semantic preservation and linguistic fluency through specialized fitness functions; and (3) the solution path directly feeds into a generation module that produces human-like summaries rather than extracted fragments. Table 2 summarizes the mathematical notation for our ACS adaptation to this novel domain.
Our approach can be outlined as follows: Consider a set of words $W = \{w_1, w_2, \dots\}$ representing the words in the original document. Within this set, each word $w \in W$ is associated with a cost that takes into account factors like its position in the document and frequency of occurrence. These words are interconnected through edges denoted by $j \in J$, with each edge carrying a cost that signifies the sequential relationship between the connected words. Importantly, a word can be linked to multiple other words. Our ultimate objective is to construct a summary by identifying a set of words that maximizes the following expression:
$$\text{Maximize} \quad \sum_{i=1}^{|W|} \sum_{j=1}^{E} \text{score}_{ij}, \tag{1}$$
$$\text{subject to} \quad \sum_{i=1}^{|W|} w_i\, z_i < \pi, \tag{2}$$
where $z_i$ assumes a value of 1 when word $w_i$ is chosen and 0 otherwise, $|W|$ signifies the total number of words in the document, $E$ represents the count of edges in the document, $\text{score}_{ij}$ denotes the cost attributed to the combination of word $w_i$ and edge $j$, and $\pi$ serves as the constraint that limits the overall length of the selected summary words.
The ACS algorithm consists of three main steps in each iteration. Initially, every ant constructs a solution path, essentially creating a word summary. Subsequently, it identifies the best path among all those generated by the ants up to that point. Lastly, there is a global update of the pheromone level for this best path.
In the process of constructing a solution, unlike in the TSP, where all nodes are explored, each ant, denoted by $k$, adds an edge labeled $j$ to its path and adjusts the edge’s pheromone level. This process continues until the path reaches a predefined threshold, represented by $\pi$, which limits the length of the summary. The selection of one edge $j$ over another follows the pseudo-random-proportional rule described by Equation (3):
$$S_{ij}^{k} = \begin{cases} \arg\max_{j \in J_k(i)} \left\{ \tau_{ij}\,(\eta_{ij})^{\beta} \right\}, & \text{if } q \le q_0 \ \text{(exploitation)}, \\ P_{ij}^{k}, & \text{otherwise (exploration)}, \end{cases} \tag{3}$$
where $\tau_{ij}$ represents the pheromone level of edge $(i,j)$, $\eta_{ij}$ denotes the heuristic information value for edge $(i,j)$, $\beta$ is a parameter that determines the relative importance of the heuristic information value, and $q$ and $q_0$ are real values ranging from 0 to 1. Additionally, $J_k(i)$ represents the set of nearest-neighbor nodes that have not been selected by ant $k$, which essentially comprises the n-gram words originating from the current word. The value of $P_{ij}^{k}$ is given by Equation (4),
$$P_{ij}^{k} = \frac{\tau_{ij}\,(\eta_{ij})^{\beta}}{\sum_{u \in J_k(i)} \tau_{iu}\,(\eta_{iu})^{\beta}}, \quad j \in J_k(i), \tag{4}$$
where $j \in J_k(i)$ signifies that $j$ belongs to the set of nearest-neighbor nodes not chosen by ant $k$, and $U$ represents the count of available nodes that have not yet been selected by ant $k$. It is important to note that the denominator of the sum is not zero. When an ant selects an edge, the local update of the edge’s pheromone level takes place using Equation (5), in which the evaporation rate $\delta$ is a real value within the range of 0 to 1:
$$\tau_{ij} \leftarrow (1-\delta)\,\tau_{ij} + \delta\,\tau_0. \tag{5}$$
In the ACS algorithm’s second step, the aim is to identify the best-so-far path among the set of solutions created by the ants, and this is determined based on a fitness function. Finally, the pheromones associated with the best-so-far path are updated globally using Equation (6),
$$\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \rho\,FF(S_{\text{best}}). \tag{6}$$
Here, $\rho$ represents the global pheromone evaporation rate, and $FF(S_{\text{best}})$ signifies the fitness value for the best-so-far solution. We will introduce and define two functions relevant to this process in Section 4.3.
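To make the interplay of Equations (3)–(6) concrete, the following Python sketch shows one way to implement the edge-selection rule and the two pheromone updates. The dictionary-based graph representation, the function names, and the default value for the initial pheromone level are illustrative assumptions, not the actual AASAC implementation.

```python
import random

def select_next(i, candidates, tau, eta, beta=2.0, q0=0.7):
    """Pseudo-random-proportional rule, Equations (3)-(4); candidates assumed non-empty."""
    weights = {j: tau[(i, j)] * eta[(i, j)] ** beta for j in candidates}
    if random.random() <= q0:                      # exploitation: best-weighted edge
        return max(weights, key=weights.get)
    total = sum(weights.values())                  # exploration: roulette wheel on P_ij
    r, acc = random.uniform(0, total), 0.0
    for j, w in weights.items():
        acc += w
        if acc >= r:
            return j
    return j                                       # numerical fallback

def local_update(tau, edge, delta=0.1, tau0=0.01):
    """Equation (5): local pheromone update applied whenever an ant traverses an edge."""
    tau[edge] = (1 - delta) * tau[edge] + delta * tau0

def global_update(tau, best_path_edges, best_fitness, rho=0.1):
    """Equation (6): global update applied only to edges on the best-so-far path."""
    for edge in best_path_edges:
        tau[edge] = (1 - rho) * tau[edge] + rho * best_fitness
```

In this reading, each ant repeatedly calls select_next and local_update until the length threshold $\pi$ is reached; once all ants have built their paths, the best path according to the fitness function receives the global update.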

4. Proposed Approach

To generate an abstractive summary for a document, the AASAC system consists of four stages: preprocessing, representation, modeling, and text generation (see Figure 1). The following is a detailed description of the individual stages.

4.1. Preprocessing

The preprocessing stage is essential to any NLP task to make the text ready for the next stage of generating a summary. Special characters such as document formatting and diacritics are removed. Exclamation and question marks are replaced with the dot used to end a sentence. Using the STANZA library [57] and the PADT treebank (https://github.com/UniversalDependencies/UD_Arabic-PADT, accessed on 9 June 2023), the document is split into tokens by separating the words based on spaces. Then, Universal POS tags (UPOS) and treebank-specific POS (XPOS) features are generated for each word. A set of rules is implemented based on the UPOS tags to filter out unrelated or unimportant words:
  • If two consecutive words have the UPOS tag “X”, which means Other (see Example 1 in Table 3).
  • If a word is tagged “NOUN”, followed by a word tagged “PUNC”, followed by a word tagged “NOUN” (see Example 2 in Table 3).
  • If a word is tagged “X”, followed by a word tagged “PUNC”, followed by a word tagged “X” (see Example 3 in Table 3).
Table 3. Text filtering on the sentence (القاهرة - الوطن أكد د. مصطفى علوي رئيس الهيئة العامة لقصور الثقافة بالقاهرة: “Cairo—Alwatan Dr. Mustafa Alawi, head of the General Authority for Cultural Palaces in Cairo, confirmed”).
Token (English) | UPOS | XPOS
القاهرة (Cairo) | NOUN | N–S1D
- | PUNCT | G–
الوطن (Alwatan) | NOUN | N–S1D
أكد (confirmed) | VERB | VP-A-3MS
د (Dr.) | X | Y–
. | PUNCT | G–
مصطفي (Mostafa) | X | U–
علوي (Alawi) | X | U–
رئيس (head of) | NOUN | N–S1R
الهيئة (Authority) | NOUN | N–S2D
Once the words are filtered, lemmatization [57] is applied to each word to obtain its lemma. The aim is to normalize the words and simplify language processing. Lemmatization is preferred over stemming because a word's meaning matters for the analysis: lemmatization takes the context into account and converts the word into its meaningful base form.
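As an illustration of this stage, the sketch below uses STANZA's Arabic pipeline to tokenize, POS-tag, and lemmatize a text, and applies one possible reading of the three UPOS filtering rules above; the helper name preprocess and the exact removal behavior are assumptions rather than the system's actual implementation.

```python
import stanza

# Arabic pipeline; STANZA's Arabic models are trained on the PADT treebank
nlp = stanza.Pipeline(lang="ar", processors="tokenize,mwt,pos,lemma")

def preprocess(text):
    doc = nlp(text)
    words = [(w.text, w.upos, w.lemma) for s in doc.sentences for w in s.words]
    kept = []
    for i, (tok, upos, lemma) in enumerate(words):
        nxt = words[i + 1][1] if i + 1 < len(words) else None
        nxt2 = words[i + 2][1] if i + 2 < len(words) else None
        # Rule 1: two consecutive words tagged "X" (Other)
        if upos == "X" and nxt == "X":
            continue
        # Rules 2 and 3: NOUN-PUNCT-NOUN or X-PUNCT-X patterns
        if upos in ("NOUN", "X") and nxt == "PUNCT" and nxt2 == upos:
            continue
        kept.append((tok, upos, lemma))  # the lemma becomes a node in the word graph
    return kept
```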

4.2. Representation

We create a graph using Neo4j, a graph database platform (https://neo4j.com, accessed on 24 October 2024). In this graph, words are nodes, and the connections between neighboring words are edges. We have two node types: Lemma and Token. Lemma nodes store word lemmas, while Token nodes represent the original words. For each Lemma node, we calculate |w|, which counts how often word w appears in the document. Additionally, we record the word’s position and the sentence number for each Token node, which is useful for text generation. We establish two edge types: NEXT and WAS. NEXT edges link Lemma nodes in their sequential order, while WAS edges connect Lemma nodes to their corresponding original words in Token nodes. For NEXT edges, we compute the edge count e, representing how many times the connection between two Lemma nodes is repeated. Figure 2 provides a visual representation of the graph structure.
We incorporate additional features at both the node and edge levels to enhance the selection of important words. One of these features is the word-relation feature, which identifies semantic relationships based on the words in a text block. This helps establish connections between various entities, such as people, places, and organizations mentioned in the text. We obtain this feature from the IBM Watson Natural Language Understanding text analytics service (https://www.ibm.com/cloud/watson-natural-language-understanding, accessed on 28 September 2024). For each node, we set this feature as a property if there is a relationship between two words within the text. The property value can be either 0 or 1, indicating the absence or presence of a relation, respectively. Figure 3 provides an example of the “AgentOf” relation, which typically involves two elements: entities and events, with the entity identification playing the primary role in the relationship.
Additionally, we introduce a collocation property, which refers to the frequent occurrence of two or more words together. This property is added to an edge connecting two collocated words, such as (الأمم المتحدة: “United Nations”). The collocation feature helps identify significant words or phrases in the text that are commonly used together, offering a stronger indication of essential information in the text. To set the edge’s property value (0 or 1), indicating the absence or presence of a collocation between edges and nodes, we utilize Maskouk’s Arabic Collocation Dataset [58]. Figure 4 provides an overview of the properties associated with nodes and edges.
Figure 5 shows a sample Arabic text, whose graph representation is shown in Figure 6.
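The graph construction described above could be scripted against Neo4j roughly as follows, using recent versions of the official Python driver. The property names (count, e, pos, sent), the connection settings, and the toy input are illustrative assumptions, not the exact schema used by AASAC.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_word_pair(tx, l1, l2, tok, pos, sent):
    """Merge two Lemma nodes, the NEXT edge between them, and the Token of the first word."""
    tx.run(
        """
        MERGE (a:Lemma {text: $l1})
          ON CREATE SET a.count = 1 ON MATCH SET a.count = a.count + 1
        MERGE (b:Lemma {text: $l2})
          ON CREATE SET b.count = 0
        MERGE (a)-[r:NEXT]->(b)
          ON CREATE SET r.e = 1 ON MATCH SET r.e = r.e + 1
        MERGE (t:Token {text: $tok, pos: $pos, sent: $sent})
        MERGE (a)-[:WAS]->(t)
        """,
        l1=l1, l2=l2, tok=tok, pos=pos, sent=sent,
    )

# toy input: (token, lemma, position, sentence) tuples produced by the preprocessing stage
words = [("الدراسة", "دراسة", 0, 0), ("أظهرت", "أظهر", 1, 0), ("النتائج", "نتيجة", 2, 0)]
with driver.session() as session:
    for (t1, l1, p1, s1), (_, l2, _, _) in zip(words, words[1:]):
        session.execute_write(add_word_pair, l1, l2, t1, p1, s1)
```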

4.3. Modeling

Once the preparation is completed, the ACS algorithm is applied as follows.
Initialization step.
We begin by specifying the number of ants and the number of iterations. Additionally, we mark all nodes and edges as unexplored or unvisited, and we set the pheromone parameter for all edges.
Furthermore, we calculate the candidate list for each node. This list is composed of n-gram adjacent nodes to the given node. To illustrate, Figure 7 provides examples of 1-gram, 2-gram, and 3-gram candidate lists for the lemma node (دراسة: “a study”). When forming the 1-gram list, the candidate list includes the end nodes of the red edges. For the 2-gram list, it encompasses the end nodes of both red and green edges. Finally, the 3-gram list comprises the end nodes of the red, green, and blue edges.
Subsequently, we calculate the heuristic information, considering different features. Lastly, we select a start node and mark it as explored.
Iteration step.
The process continues for a set number of rounds while following these steps: (1) Construct a solution for each ant using Equations (3) and (4). (2) Select the best path based on a fitness function using either Equation (9) or Equation (10). (3) Update the global pheromone using Equation (6).
Output.
We display the final best path, which corresponds to a single path with the highest weight.
For each word $w_i \in W$, we define two heuristic information functions, denoted by $H_1(i)$ and $H_2(i)$, as follows:
$$H_1(i) = \frac{|w_i|}{|W|} + \frac{e_i}{E}, \tag{7}$$
$$H_2(i) = \frac{|w_i|}{|W|} + \frac{e_i}{E} + c_i + r_i, \tag{8}$$
where $|w_i|$ is the frequency of word $w_i$, $e_i$ represents the number of edges connected to word $w_i$, $E$ is the total number of edges, $c_i$ is the edge’s collocation weight, and $r_i$ is the word’s relation weight.
Additionally, we propose two fitness functions, $FF_1$ and $FF_2$; one considers the heuristic information $H_1$ values of all nodes in the best path, given by Equation (9). The other utilizes the relation and collocation features (Equation (10)).
$$FF_1 = \sum_{i \in \text{Best path}} H_1(i), \tag{9}$$
$$FF_2 = \sum_{i=1}^{|W|} \left( \frac{|w_i|}{|W|} + r_i \right) \cdot \left( \frac{e_i}{E} + c_i \right). \tag{10}$$
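Equations (7)–(10) translate directly into code. The sketch below assumes each graph node is represented as a dictionary holding its frequency, connected-edge count, collocation weight, and relation weight; these field names are illustrative.

```python
def H1(node, total_words, total_edges):
    """Equation (7): word frequency plus connectivity."""
    return node["freq"] / total_words + node["edges"] / total_edges

def H2(node, total_words, total_edges):
    """Equation (8): H1 plus the collocation (c_i) and word-relation (r_i) weights."""
    return H1(node, total_words, total_edges) + node["colloc"] + node["relation"]

def FF1(best_path, total_words, total_edges):
    """Equation (9): sum of H1 over the nodes of the best path."""
    return sum(H1(n, total_words, total_edges) for n in best_path)

def FF2(nodes, total_words, total_edges):
    """Equation (10): combines frequency/relation with connectivity/collocation."""
    return sum((n["freq"] / total_words + n["relation"]) *
               (n["edges"] / total_edges + n["colloc"]) for n in nodes)
```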

4.4. Text Generation

The previous stage produces the best path, which contains the set of lemmas used to generate the summary. Since many surface words can map to a single lemma, a series of steps is conducted to retrieve the preserved original words and generate the final summary. As mentioned before, the original word for each lemma is stored in the Token node.

4.4.1. Forward Position Filtering

We extract all tokens related to a lemma. Some Token nodes have multiple words that refer to different positions in the original text. So, we perform filtering by position. For each node (called start node), we check the next node (referred to as the end node). If the start and end nodes are in the same sentence, then the start node is ignored if its position is after the end node, since reversing their positions is not acceptable. If the start node is in a sentence that is before the end-node sentence, then the start node is added to the list of nodes.

4.4.2. Backward Position Filtering

For each end node, if the start and end nodes are in the same sentence, then the end node is added to the list of nodes if there is a position for the end token after any position of the start node. If the end node occurred in any sentences after the start node sentence, then it is added to the list (word list). Another filtering occurs when the start node has only one word. In this case, the end token words are filtered by removing all words that are far away from the start token word. This will decrease the number of words that belong to the same lemma.
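One possible reading of the two position-filtering passes is sketched below, assuming each candidate token is reduced to a (sentence, position) pair; the exact acceptance and tie-breaking rules in AASAC may differ.

```python
def forward_filter(start_tokens, end_tokens):
    """Section 4.4.1: keep a start token only if some end token occurs after it."""
    kept = []
    for s_sent, s_pos in start_tokens:
        for e_sent, e_pos in end_tokens:
            if (s_sent == e_sent and s_pos < e_pos) or s_sent < e_sent:
                kept.append((s_sent, s_pos))
                break
    return kept

def backward_filter(start_tokens, end_tokens):
    """Section 4.4.2: keep an end token only if it occurs after some start token."""
    kept = []
    for e_sent, e_pos in end_tokens:
        for s_sent, s_pos in start_tokens:
            if (s_sent == e_sent and e_pos > s_pos) or e_sent > s_sent:
                kept.append((e_sent, e_pos))
                break
    return kept
```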

4.4.3. Processing Tokens

In this step, further processing is performed on the tokens that are filtered in the previous step. This processing is performed at the token level.
  • If the start token has multiple words and the end token has only one word ending in (هـ), then we remove any start token word that starts with (ال). Once the start node has only one token, then the start and end nodes are concatenated to form one word.
  • Otherwise, when the start node token has multiple words, (a) if a token ends in (ا) and there is another token with the same letters but not ending in (ا), then the former is removed; e.g., in (رئيسا, رئيس), the word (رئيسا) will be removed, since diacritics are not our concern in the summary and to minimize the suggested words. (b) If the previous token starts with (ال) and there is a start token beginning with (ال), then any word from the start token that does not start with (ال) is removed. Otherwise, the word that does not start with (ال) is removed.

4.4.4. Processing Final Summary

In this step, processing is performed on the list of tokens from the previous step. Each Token node having more than one word is concatenated into one string, enclosed in brackets and separated by commas. This string is displayed in the final summary to give the reader options for appropriate words. For example,
يظهر هذا البحث ان شبكية العين (خاصة, الخاصة) المتصلة مباشرة بمنطقة المخ.
If a Token node has only one word, then a set of conditions are checked as follows:
  • If the word token starts with the letter (و) and the next token has only one letter, then the next token is migrated to the next round check as the current token.
  • If the word token is one of the prefix letters (ب، س، ف، ل) and the next token is a preposition, then the token is ignored.
  • If the next token is one of the suffixes (هـ، ها، هم), then it is appended to the current token after some modifications as follows: (a) If the current token is a preposition that ends in (ى), then the ending letter is changed to (ي), e.g., (على + هـ → عليه). (b) If the current token is one of the prepositions (حتى، كي، منذ، مذ، متى، رب), then the next token is ignored, since it cannot form a valid Arabic word after it has been concatenated with the current token. (c) If the current token has only one letter or starts with (ال), then the next token is ignored; e.g., (س + هم → سهم) becomes a different word or not a valid one. (d) If the current token ends in (ة) and does not start with (ال), then (ة) is replaced with (ت), e.g., (طبيعة + هـ → طبيعته). (e) If the current token ends in (ى), then (ى) is replaced with (ا), e.g., (أجرى + ها → أجراها). (A small sketch of rules (a), (d), and (e) follows at the end of this subsection.)
  • If the current token is the letter (ل) and the next one starts with (ال), then the letter (ا) is removed from the next token word, and the tokens are concatenated, e.g., (ل + السماء → للسماء).
  • If the current token is one of the prefixes (س، ب، ف، ل), then it is concatenated with the next one.
  • Otherwise, the next token is migrated to the next iteration as the current token.
Any sentence with fewer than four words is removed. If a sentence ends with a preposition, then it is removed. Any one-letter word is removed unless it is the letter (و) or a full-stop mark.
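To illustrate how such rules look in code, the following sketch covers only rules (a), (d), and (e) of the suffix-handling step; the preposition set and the function name are assumptions, and the remaining rules are omitted.

```python
# illustrative subset of prepositions ending in alef maqsura handled by rule (a)
ATTACHABLE_PREPOSITIONS = {"على", "إلى"}

def attach_suffix(token, suffix):
    """Attach a pronoun suffix such as (ه، ها، هم) to the preceding token."""
    if token in ATTACHABLE_PREPOSITIONS:                      # rule (a): على + ه -> عليه
        return token[:-1] + "ي" + suffix
    if token.endswith("ة") and not token.startswith("ال"):    # rule (d): طبيعة + ه -> طبيعته
        return token[:-1] + "ت" + suffix
    if token.endswith("ى"):                                   # rule (e): أجرى + ها -> أجراها
        return token[:-1] + "ا" + suffix
    return token + suffix
```

With this sketch, attach_suffix("على", "ه") yields "عليه" and attach_suffix("أجرى", "ها") yields "أجراها", matching the examples in the list above.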

5. Results and Discussion

A set of experiments were conducted to evaluate the AASAC system. This section showcases setting up the ant colony system (ACS) parameters, the dataset used for evaluation, the evaluation metric, the experimental results, human evaluation, and the discussion.
To facilitate the integration of ACS with Neo4j, we developed a custom Cypher procedure using the Java programming language. This procedure was then called from Python 2.7 for executing our experiments. All experiments were conducted on a Mac OS X 11.7.3 system equipped with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of RAM.

5.1. Experimental Setup

ACS employs a range of parameters, each with specific values. The termination condition, denoted by the number of iterations, was set to $i = 30$. Given that ACS incorporates local search, the number of ants, $m$, was deliberately chosen as a relatively small value, i.e., $m = 10$. Following the guidelines presented in [56], the evaporation rate $\rho$ was established at 0.1, while $\delta$ was set to 0.1. The weight assigned to heuristic information, denoted by $\beta$, was configured to 2.0, with $\varepsilon$ set to 0.1 and $q_0$ to 0.7. Finally, the initial pheromone trail, $\tau_0$, was calculated as $d/n$, where $n = |W|$ represents the number of nodes (words) in the graph and $d$ denotes the nearest-neighbor distance.
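For convenience, the parameter values above can be grouped into a single configuration object; the dictionary layout is only illustrative, while the values are those just listed.

```python
ACS_PARAMS = {
    "iterations": 30,  # termination condition i
    "ants": 10,        # m, kept small because ACS already performs local search
    "rho": 0.1,        # global pheromone evaporation rate
    "delta": 0.1,      # local pheromone evaporation rate
    "beta": 2.0,       # weight of the heuristic information
    "epsilon": 0.1,
    "q0": 0.7,         # exploitation threshold in Equation (3)
}
# tau_0 is computed per document as d / n, with n = |W| graph nodes
# and d the nearest-neighbor distance, as described above.
```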
We performed several experiments to generate summaries at 30% and 50% of the original text. One of the experiments, called H1FF1, utilized the heuristic information function $H_1$ along with the fitness function $FF_1$. Another experiment, named H2FF2, employed the heuristic information function $H_2$ with the fitness function $FF_2$. Table 4 lists all the experiments.
We employed three different settings for the candidate list, namely, 1-gram, 2-gram, and 3-gram, to generate both H1FF1 and H2FF2 summaries. To indicate the appropriate n-gram, the experimental names are further marked by -n. Consequently, the variations are denoted by HiFFj-n to indicate ACS variations when using the heuristic information function $H_i$ with fitness $FF_j$ and an n-gram candidate list, with $n \in [1, 3]$. For instance, H1FF1-1, H1FF1-2, and H1FF1-3 are our ACS variations when using the heuristic information function $H_1$ with fitness $FF_1$ and 1-gram, 2-gram, and 3-gram candidate lists, respectively.

5.2. Dataset

Due to the absence of a standardized Arabic single-document abstractive summary dataset, we utilized a subset of data shared by [8]. The subset comprises 104 documents of varying lengths, with an average of 239 words each. These documents were accompanied by system-generated summaries of 30% and 50% of the original document size, which we considered a baseline.
The documents were collected from different sources, including 79 documents from Saudi Arabian newspapers Al-Riyadh (https://www.alriyadh.com, accessed on 5 August 2021.) and Al-Jazirah (https://www.al-jazirah.com, accessed on 5 August 2021), and 25 documents from Lebanese newspapers Al-Joumhouria (https://www.aljoumhouria.com, accessed on 5 August 2021) and An-Nahar (https://www.annahar.com, accessed on 5 August 2021). The topics covered in these documents encompassed a range of subjects such as general health, sports, politics, business, and religion.
Figure 8 displays the distribution of the document lengths based on the number of words. The x-axis depicts the document word counts grouped into categories, while the y-axis represents the number of documents falling into each category. The majority of documents contained between 200 and 300 words.
Since the dataset lacked human-authored summaries for comparison, we sought human professionals to summarize the documents to 30% and 50% of their original size. The complete dataset comprising all 104 documents, alongside their corresponding 30% and 50% human summaries, is freely accessible to interested parties.

5.3. Evaluation Metric

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [59] is a common evaluation measure for summaries. It compares a generated summary with one or more reference summaries according to a set of metrics: ROUGE-N compares n-grams, ROUGE-L compares the longest common word sequence, and ROUGE-SU counts skip-bigrams (pairs of words in sentence order, allowing gaps in between) together with unigrams. For example, ROUGE-N is computed as follows:
$$\text{ROUGE-N} = \frac{\#\ \text{overlapping } n\text{-grams}}{\#\ n\text{-grams in reference summary}} = \frac{\sum_{S \in \{RS\}} \sum_{n\text{-gram} \in S} \text{count}_{\text{match}}(n\text{-gram})}{\sum_{S \in \{RS\}} \sum_{n\text{-gram} \in S} \text{count}(n\text{-gram})},$$
where $S$ is a reference summary, RS is the set of reference summaries, N (in ROUGE-N) is the length of the n-gram, and $\text{count}_{\text{match}}(n\text{-gram})$ is the maximum number of n-grams co-occurring in a candidate summary and a set of reference summaries. However, Arabic is a morphologically rich language, so using ROUGE to evaluate Arabic texts does not yield a valid comparison.
To solve this problem, the LemmaRouge metric [60], which converts system summary words into lemmas as a unified surface form, before applying the ROUGE toolkit [61], was used. The LemmaRouge-N is given by
$$\text{LemmaRouge-N} = \frac{\sum_{S \in \{RS\}} \sum_{\text{lemma-}n \in S} \text{count}_{\text{match}}(\text{lemma-}n)}{\sum_{S \in \{RS\}} \sum_{\text{lemma-}n \in S} \text{count}(\text{lemma-}n)},$$
where lemma-n is a sequence of the lemmas of n consecutive words from a given text.
The use of a lemma-based ROUGE metric for evaluating Arabic text summarization systems proves advantageous due to the morphological complexity of the Arabic language. By considering the lemma form of words, which captures the root meaning while disregarding variations in inflections and derivations, the metric provides a more accurate measure of semantic similarity between system-generated summaries and the original text. This approach better accounts for the unique linguistic characteristics of Arabic and improves the evaluation and development of abstractive text summarization systems in Arabic and other Semitic languages. An example highlighting the advantages of using LemmaRouge for Arabic text is shown in Figure 9.
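Conceptually, LemmaRouge only changes the unit of comparison: both summaries are lemmatized first, and ROUGE is then computed over lemma n-grams. The sketch below is a minimal recall-oriented implementation of LemmaRouge-N using STANZA for lemmatization, rather than the actual ROUGE toolkit pipeline used in the paper.

```python
from collections import Counter
import stanza

nlp = stanza.Pipeline(lang="ar", processors="tokenize,mwt,pos,lemma")

def lemma_ngrams(text, n):
    """Count lemma n-grams of a text."""
    lemmas = [w.lemma for s in nlp(text).sentences for w in s.words]
    return Counter(tuple(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1))

def lemma_rouge_n(candidate, references, n=1):
    """Recall-oriented LemmaRouge-N against one or more reference summaries."""
    cand = lemma_ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_counts = lemma_ngrams(ref, n)
        overlap += sum(min(c, cand.get(g, 0)) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0
```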

5.4. Evaluation Results

As most Arabic abstractive summarizers generate one sentence, we compared the performance of our system against the results obtained from the summarizer in [8], which we consider a baseline and call ANSum. We also applied lemmatization to the ANSum summary texts and the reference summary texts.
The LemmaRouge-1 and LemmaRouge-2 metrics were used to measure the coverage of salient information, and LemmaRouge-L was used to evaluate fluency. Table 5 reports LemmaRouge recall and $F_1$ scores for summaries that are 30% of the original length. Each reported score represents the average of three runs. ANSum LemmaRouge scores are the results of comparing the reference summaries with the summaries generated by the system in [8]. As mentioned earlier, we have variants of each of the experiments listed in Table 4. The specific variant depends on the candidate list, which can be 1-gram, 2-gram, or 3-gram. For example, H2FF1-3 indicates an experiment that uses the heuristic information function $H_2$ with fitness $FF_1$ and a 3-gram candidate list.
The results show that the H2FF2-1 variant achieved the highest recall, outperforming all other variants and the ANSum system. It also attained superior $F_1$ scores across all metrics except LemmaRouge-2, indicating that the relation and collocation features enhance performance.
When these features were excluded, the H1FF1-2 variant—using a 2-gram candidate list—performed better than 1-gram or 3-gram lists in terms of LemmaRouge-1 and LemmaRouge-L. The 3-gram candidate lists showed no improvement regardless of feature inclusion. We attribute this to three key factors: (1) the combinatorial sparsity of possible 3-gram combinations reduces their utility as building blocks, (2) longer n-grams impose rigid structural constraints that limit sentence generation flexibility, and (3) the fixed-length structure hinders dynamic optimization between salient coverage and fluency.
The LemmaRouge recall and $F_1$ scores for the evaluations of summaries at 50% of the original text size are reported in Table 6. All our AASAC system variants gained a higher recall score than ANSum, and higher $F_1$ scores in LemmaRouge-1 and LemmaRouge-L. In general, variant H2FF2-2 achieved the best results, except for LemmaRouge-2 recall and the LemmaRouge-2 and LemmaRouge-L $F_1$ scores. As in the case of the 30% summary size, the variant H1FF1-2 gave better results than the 1-gram or 3-gram candidate lists for all scores except LemmaRouge-2 recall and $F_1$.

5.5. Human Evaluation

ROUGE, originally designed for extractive summaries, falls short in assessing abstractive summaries due to their divergence from the source text wording. Abstractive summaries focus on conveying meaning rather than verbatim representation, making ROUGE’s word-matching approach inadequate. Alternative evaluation methods are needed to capture the semantic and conceptual aspects of abstractive summarization accurately.
To minimize the human effort required for manual evaluation, we selected 20 random documents from the dataset and enlisted three human evaluators to assess our summarizer, AASAC. Given that the H2FF2 variant achieved higher LemmaRouge scores, we specifically asked the evaluators to choose the best summary among the H2FF2 variants based on 1-gram, 2-gram, and 3-gram. In Figure 10a, the preferences of the evaluators for each H2FF2 summary are shown, with the scores representing the total number of times a variant was chosen by the three evaluators. Each selection by an evaluator increases the score by one. A maximum score of 60 indicates unanimous agreement among the evaluators for a particular variant across all 20 documents. The results showed that the H2FF2 summary using a 2-gram candidate list received the highest score compared with the other n-grams.
Following that, the evaluators provided assessments of the H2FF2 summary using a 2-gram candidate list, answering four questions: (Q1) Does the summary effectively cover the document’s most important issues? (Q2) Does the summary enable readers to understand the main points of the article? (Q3) How would you rate the summary’s readability? (Q4) What is your overall assessment of the summary’s quality? Each question was answered on a scale of 1 to 5, with 1 indicating strong disagreement, 3 for a neutral response, and 5 for strong agreement. Figure 10b summarizes the questionnaire results, indicating that the summary effectively captures the document’s most important aspects, enabling readers to understand its content with average scores of 3.7 and 3.8, respectively. Additionally, the summary received ratings of 2.9 and 3.1 for readability and overall quality, respectively.

5.6. Discussion

The experimental results demonstrate the ability of ACS to select salient words and generate an informative summary, and they show that incorporating relation and collocation features into the heuristic information and the fitness function boosts performance. Moreover, the results indicate that 3-gram candidate lists do not improve the summary results for either summary size, even when relation and collocation features are added.
Nevertheless, there are occurrences in which the word segmentation performed by the tokenizer is inaccurate, resulting in an adverse impact on the generated summary. For instance, the term (بقايا: “remains”) is incorrectly split into two tokens, namely, (ب) and (قايا). While the character (ب) is a valid Arabic letter, the token (قايا) does not correspond to a valid Arabic word. Similarly, the word (بغداد: “Baghdad”) is divided into (ب) and (غداد). Although both the character (ب) and the token (غداد) exist in the Arabic dictionary, the token (غداد) conveys a different meaning than (بغداد). Utilizing a robust tokenizer is anticipated to address this issue effectively.
Another limitation of our AASAC summarizer manifests when there is an ambiguous selection between more than one word, as the summarizer shows all words’ possibilities. This can be solved by incorporating a grammar model. Repeated words are scarcely generated, and they can be addressed by adding a penalty or cost to the fitness function.
The human evaluators expressed positivity regarding the content of the summary, indicating that our AASAC summarizer effectively captured the main points of the document. However, they remained neutral when evaluating the readability and overall quality of the summary. This suggests that our summarizer may benefit from incorporating language guidance to enhance these aspects during the summary generation process.

6. Conclusions and Future Work

The growing volume of Arabic text content necessitates efficient summarization methods that can quickly distill valuable information. Prior research reveals three critical limitations that this work addresses: (1) existing nature-inspired approaches (ACO, PSO, etc.) remain exclusively extractive, merely selecting sentences rather than rewriting content; (2) few systems optimize for Arabic’s morphological complexity; and (3) none simultaneously address content coverage and linguistic fluency as interdependent objectives. Our AASAC framework breaks these barriers as the first method to (a) apply swarm intelligence (ACS) for true abstractive generation, (b) incorporate Arabic-specific collocations and word relations with LemmaRouge evaluation, and (c) formulate summarization as a multi-objective optimization balancing content preservation (via ACS pathfinding) and fluency (via semantic features). Our domain-independent framework demonstrates strong performance across diverse topics, as evidenced by evaluations on a dataset of over 100 documents using the LemmaRouge metric, outperforming existing Arabic abstractive summarization baselines.
Future work should focus on three key enhancements: first, improving tokenization accuracy through grammar modeling to resolve ambiguous word selections; second, optimizing the fitness function to penalize word repetition; and third, incorporating knowledge-based techniques like Named Entity Recognition, Coreference Resolution, and Sentiment Analysis to deepen semantic understanding. These advancements would further strengthen AASAC’s ability to handle Arabic’s morphological complexity while maintaining the computational efficiency that makes swarm intelligence approaches particularly valuable for real-world applications.

Author Contributions

Conceptualization, A.M.A.; methodology, A.M.A.-N. and A.M.A.; software, A.M.A.-N.; validation, A.M.A.-N.; formal analysis, A.M.A.-N.; investigation, A.M.A.-N.; resources, A.M.A.; data curation, A.M.A. and A.M.A.-N.; writing—original draft, A.M.A.-N.; writing—review and editing, A.M.A.; Supervision, A.M.A. Both authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Ongoing Research Funding Program (ORFFT-2025-006-2), King Saud University, Riyadh, Saudi Arabia, for financial support.

Data Availability Statement

The dataset described in this work is available at http://dx.doi.org/10.13140/RG.2.2.22370.09922, accessed on 5 August 2021.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations (ordered alphabetically) are used in this manuscript:
AASAC: Arabic Abstractive Summarization using Ant Colony
AI: Artificial Intelligence
ACO: Ant Colony Optimization
ACS: Ant Colony System
CSO: Cat Swarm Optimization
NLP: Natural Language Processing
OOV: Out-Of-Vocabulary
POS: Part Of Speech
PSO: Particle Swarm Optimization
RSG: Rich Semantic Graph
RST: Rhetorical Structure Theory
SI: Swarm Intelligence
TSP: Traveling Salesman Problem

References

  1. Luhn, H.P. The automatic creation of literature abstracts. IBM J. Res. Dev. 1958, 2, 159–165. [Google Scholar] [CrossRef]
  2. Yadav, D.; Desai, J.; Yadav, A.K. Automatic Text Summarization Methods: A Comprehensive Review. arXiv 2022, arXiv:2204.01849. [Google Scholar] [CrossRef]
  3. Mosa, M.A.; Hamouda, A.; Marei, M. Graph coloring and ACO based summarization for social networks. Expert Syst. Appl. 2017, 74, 115–126. [Google Scholar] [CrossRef]
  4. Tefrie, K.G.; Sohn, K.A. Autonomous Text Summarization Using Collective Intelligence Based on Nature-Inspired Algorithm. In Proceedings of the International Conference on Mobile and Wireless Technology, Kuala Lumpur, Malaysia, 26–29 June 2017; Springer: Singapore, 2017; pp. 455–464. [Google Scholar]
  5. Binwahlan, M.S.; Salim, N.; Suanmali, L. Swarm Based Text Summarization. In Proceedings of the 2009 International Association of Computer Science and Information Technology-Spring Conference, Singapore, 17–20 April 2009; IEEE Computer Society: Washington, DC, USA, 2009; pp. 145–150. [Google Scholar]
  6. Rautray, R.; Balabantaray, R.C. Cat swarm optimization based evolutionary framework for multi document summarization. Phys. A Stat. Mech. Its Appl. 2017, 477, 174–186. [Google Scholar] [CrossRef]
  7. Almohaimeed, N.; Azmi, A.M. Abstractive text summarization: A comprehensive survey of techniques, systems, and challenges. Comput. Sci. Rev. 2025, 57, 100762. [Google Scholar] [CrossRef]
  8. Azmi, A.M.; Altmami, N.I. An abstractive Arabic text summarizer with user controlled granularity. Inf. Process. Manag. 2018, 54, 903–921. [Google Scholar] [CrossRef]
  9. Ganesan, K.; Zhai, C.; Han, J. Opinosis: A graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China, 23–27 August 2010; Association for Computational Linguistics: Stroudsburg, PA, USA, 2010; pp. 340–348. [Google Scholar]
  10. Kumar, N.; Srinathan, K.; Varma, V. A knowledge induced graph-theoretical model for extract and abstract single document summarization. In Computational Linguistics and Intelligent Text Processing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 408–423. [Google Scholar]
  11. Lloret, E.; Romá-Ferri, M.T.; Palomar, M. COMPENDIUM: A text summarization system for generating abstracts of research papers. Data Knowl. Eng. 2013, 88, 164–175. [Google Scholar] [CrossRef]
  12. Mehdad, Y.; Carenini, G.; Ng, R.T. Abstractive summarization of spoken and written conversations based on phrasal queries. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 23–24 June 2014; Volume 1, pp. 1220–1230. [Google Scholar]
  13. Moawad, I.F.; Aref, M. Semantic graph reduction approach for abstractive Text Summarization. In Proceedings of the IEEE 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), Cairo, Egypt, 27–29 November 2012; pp. 132–138. [Google Scholar]
  14. Ismail, S.; Moawd, I.; Aref, M. Arabic text representation using rich semantic graph: A case study. In Proceedings of the 4th European Conference of Computer Science (ECCS’13), Paris, France, 29–31 October 2013; pp. 148–153. [Google Scholar]
  15. Ismail, S.S.; Aref, M.; Moawad, I. Rich semantic graph: A new semantic text representation approach for Arabic language. In Proceedings of the 17th WSEAS European Computing Conference (ECC’13), Dubrovnik, Croatia, 25–27 June 2013. [Google Scholar]
  16. Subramaniam, M.; Dalal, V. Test Model for Rich Semantic Graph Representation for Hindi Text using Abstractive Method. Int. Res. J. Eng. Technol. (IRJET) 2015, 2, 113–116. [Google Scholar]
  17. Le, H.T.; Le, T.M. An approach to abstractive text summarization. In Proceedings of the IEEE 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), Hanoi, Vietnam, 15–18 December 2013; pp. 371–376. [Google Scholar]
  18. Lloret, E.; Palomar, M. Analyzing the use of word graphs for abstractive text summarization. In Proceedings of the First International Conference on Advances in Information Mining and Management, Barcelona, Spain, 23–29 October 2011; pp. 61–66. [Google Scholar]
  19. Azmi, A.; Al-Thanyyan, S. Ikhtasir—A user selected compression ratio Arabic text summarization system. In Proceedings of the IEEE 2009 5th International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), Dalian, China, 24–27 September 2009; pp. 1–7. [Google Scholar]
  20. Azmi, A.M.; Al-Thanyyan, S. A text summarizer for Arabic. Comput. Speech Lang. 2012, 26, 260–273. [Google Scholar] [CrossRef]
  21. Kabeer, R.; Idicula, S.M. Text summarization for Malayalam documents—An experience. In Proceedings of the IEEE 2014 International Conference on Data Science & Engineering (ICDSE), Kochi, India, 26–28 August 2014; pp. 145–150. [Google Scholar]
  22. Embar, V.R.; Deshpande, S.R.; Vaishnavi, A.; Jain, V.; Kallimani, J.S. sArAmsha-A Kannada abstractive summarizer. In Proceedings of the IEEE 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Mysore, India, 22–25 August 2013; pp. 540–544. [Google Scholar]
  23. Kallimani, J.S.; Srinivasa, K.; Eswara Reddy, B. Information extraction by an abstractive text summarization for an Indian regional language. In Proceedings of the IEEE 2011 7th International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), Tokushima, Japan, 27–29 November 2011; pp. 319–322. [Google Scholar]
  24. Li, J.; Chiu, B.; Shang, S.; Shao, L. Neural text segmentation and its application to sentiment analysis. IEEE Trans. Knowl. Data Eng. 2020, 34, 828–842. [Google Scholar] [CrossRef]
  25. Chau, P.P.M.; Bakkali, S.; Doucet, A. DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Tucson, AZ, USA, 28 February–4 March 2025; pp. 1213–1222. [Google Scholar]
  26. Sagheer, D.; Sukkar, F. A Hybrid Intelligent System for Abstractive Summarization. Int. J. Comput. Appl. 2017, 168, 37–44. [Google Scholar] [CrossRef]
  27. Rush, A.M.; Chopra, S.; Weston, J. A neural attention model for abstractive sentence summarization. arXiv 2015, arXiv:1509.00685. [Google Scholar] [CrossRef]
  28. Bengio, Y.; Ducharme, R.; Vincent, P.; Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 2003, 3, 1137–1155. [Google Scholar]
  29. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  30. Chopra, S.; Auli, M.; Rush, A.M. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the NAACL-HLT, San Diego, CA, USA, 16–17 June 2016; pp. 93–98. [Google Scholar]
  31. Nallapati, R.; Zhou, B.; Gulcehre, C.; Xiang, B. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. arXiv 2016, arXiv:1602.06023. [Google Scholar]
  32. Jean, S.; Cho, K.; Memisevic, R.; Bengio, Y. On Using Very Large Target Vocabulary for Neural Machine Translation. arXiv 2014, arXiv:1412.2007. [Google Scholar]
  33. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; MIT Press: Cambridge, MA, USA, 2015; Volume 2, pp. 2692–2700. [Google Scholar]
  34. Sankaran, B.; Mi, H.; Al-Onaizan, Y.; Ittycheriah, A. Temporal Attention Model for Neural Machine Translation. arXiv 2016, arXiv:1608.02927. [Google Scholar] [CrossRef]
  35. See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1, pp. 1073–1083. [Google Scholar]
  36. Tu, Z.; Lu, Z.; Liu, Y.; Liu, X.; Li, H. Modeling Coverage for Neural Machine Translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; Volume 1, pp. 76–85. [Google Scholar]
  37. Paulus, R.; Xiong, C.; Socher, R. A Deep Reinforced Model for Abstractive Summarization. arXiv 2017, arXiv:1705.04304. [Google Scholar] [CrossRef]
  38. Al-Maleh, M.; Desouki, S. Arabic text summarization using deep learning approach. J. Big Data 2020, 7, 109. [Google Scholar] [CrossRef]
  39. Wazery, Y.M.; Saleh, M.E.; Alharbi, A.; Ali, A.A. Abstractive Arabic Text Summarization Based on Deep Learning. Comput. Intell. Neurosci. 2022, 2022, 1566890. [Google Scholar] [CrossRef]
  40. Zaki, A.M.; Khalil, M.I.; Abbas, H.M. Deep architectures for abstractive text summarization in multiple languages. In Proceedings of the 2019 IEEE 14th International Conference on Computer Engineering and Systems (ICCES), Cairo, Egypt, 17–18 December 2019; pp. 22–27. [Google Scholar]
  41. Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 9–15. [Google Scholar]
  42. Suleiman, D.; Awajan, A. Multilayer encoder and single-layer decoder for abstractive Arabic text summarization. Knowl. Based Syst. 2022, 237, 107791. [Google Scholar] [CrossRef]
  43. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  44. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880. [Google Scholar]
  45. Liu, Y.; Lapata, M. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3730–3740. [Google Scholar]
  46. Elmadani, K.N.; Elgezouli, M.; Showk, A. BERT fine-tuning for Arabic text summarization. In Proceedings of the AfricaNLP Workshop at ICLR 2020, Virtual, 26 April 2020. [Google Scholar]
  47. Pires, T.; Schlinger, E.; Garrette, D. How Multilingual is Multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 4996–5001. [Google Scholar]
  48. Kahla, M.; Yang, Z.G.; Novák, A. Cross-lingual fine-tuning for abstractive Arabic text summarization. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Online, 1–3 September 2021; pp. 655–663. [Google Scholar]
  49. Eddine, M.K.; Tomeh, N.; Habash, N.; Roux, J.L.; Vazirgiannis, M. AraBART: A Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization. arXiv 2022, arXiv:2203.10945. [Google Scholar]
  50. Parker, R.; Graff, D.; Chen, K.; Kong, J.; Maeda, K. Arabic Gigaword Fifth Edition LDC2011T11; Linguistic Data Consortium: Philadelphia, PA, USA, 2011. [Google Scholar]
  51. Hasan, T.; Bhattacharjee, A.; Islam, M.S.; Mubasshir, K.; Li, Y.F.; Kang, Y.B.; Rahman, M.S.; Shahriyar, R. XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 4693–4703. [Google Scholar]
  52. Inoue, G.; Alhafni, B.; Baimukan, N.; Bouamor, H.; Habash, N. The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Virtual), 9 April 2021; pp. 92–104. [Google Scholar]
  53. Liu, Y.; Gu, J.; Goyal, N.; Li, X.; Edunov, S.; Ghazvininejad, M.; Lewis, M.; Zettlemoyer, L. Multilingual Denoising Pre-training for Neural Machine Translation. Trans. Assoc. Comput. Linguist. 2020, 8, 726–742. [Google Scholar] [CrossRef]
  54. Xue, L.; Constant, N.; Roberts, A.; Kale, M.; Al-Rfou, R.; Siddhant, A.; Barua, A.; Raffel, C. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 483–498. [Google Scholar]
  55. Haruna, A.; Noman, K.; Li, Y.; Wang, X.; Hasan, M.J.; Alhassan, A.B. AddManBERT: A combinatorial triples extraction and classification task for establishing a knowledge graph to facilitate design for additive manufacturing. Adv. Eng. Inform. 2025, 67, 103578. [Google Scholar] [CrossRef]
  56. Dorigo, M.; Stutzle, T. Ant Colony Optimization; The MIT Press: London, UK, 2004. [Google Scholar]
  57. Qi, P.; Zhang, Y.; Zhang, Y.; Bolton, J.; Manning, C.D. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020. [Google Scholar]
  58. Zerrouki, T. Maskouk: Arabic Dictionary for Collocations. 2012. Available online: https://pypi.org/project/maskouk-pysqlite/ (accessed on 20 March 2021).
  59. Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, Barcelona, Spain, 25–26 July 2004; Volume 2, pp. 74–81.
  60. Al-Numai, A.M.; Azmi, A.M. LEMMA-ROUGE: An Evaluation Metric for Arabic Abstractive Text Summarization. Indones. J. Comput. Sci. 2023, 12, 1351–1367. [Google Scholar] [CrossRef]
  61. Ganesan, K. ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. arXiv 2018, arXiv:1803.01937. [Google Scholar] [CrossRef]
Figure 1. Diagram showing the four stages of the AASAC summarization system.
Figure 2. Graph representation.
Figure 3. Example of AgentOf relation between the entity “United Nations” and the event “indicated”.
Figure 4. Properties of nodes and edges.
Figure 5. The first few lines of a sample Arabic news text and its English translation. For convenience, we split and colored the first sentence for easy reference in its graphical representation, shown in Figure 6.
Figure 6. Graphical representation of the sample text in Figure 5. The big blue nodes are the lemmas, and the small green nodes are the tokens. Red, green, and pink lines represent the sequences of the first, second, and third phrases of the first sentence, respectively. Gray lines represent the sequences of the other sentences.
Figure 7. A 3-gram candidate list example for the lemma (دراسة: “a study”), indicated by the black node. Red arrows indicate 1-gram connections, green arrows represent 2-gram connections, and blue arrows indicate 3-gram connections.
Figure 8. The distribution of the documents based on their size (number of words) in our dataset.
Figure 9. Comparison of recall calculation using the standard ROUGE-1 and LemmaRouge-1 metrics for an automatically generated (model) summary and a reference summary. The model text reads “The study indicated that,” while the reference text is “A study indicated that.” The precision scores (not shown) are 1/4 and 3/4, respectively, for these metrics. Colored text indicates the matching words.
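To make the distinction behind Figure 9 concrete, here is a minimal sketch of unigram recall computed on surface forms versus on lemmas. The two toy Arabic token sequences and the hand-written lemma table are illustrative assumptions (a real pipeline would use a lemmatizer such as Stanza [57]), so the numbers differ from the figure, but the effect is the same: lemma-level matching credits inflectional variants that surface-form ROUGE misses.

```python
from collections import Counter

# Toy lemma table (assumption): "الدراسة" ("the study") and "دراسة" ("a study") share one lemma.
LEMMA = {"الدراسة": "دراسة", "دراسة": "دراسة", "أشارت": "أشار", "أن": "أن"}

def unigram_recall(model_tokens, reference_tokens):
    """ROUGE-1 style recall: overlapping unigram count divided by the reference length."""
    m, r = Counter(model_tokens), Counter(reference_tokens)
    overlap = sum(min(count, m[tok]) for tok, count in r.items())
    return overlap / sum(r.values())

model = ["أشارت", "الدراسة", "أن"]    # roughly "the study indicated that"
reference = ["أشارت", "دراسة", "أن"]  # roughly "a study indicated that"

print(unigram_recall(model, reference))                  # surface forms: 2/3
print(unigram_recall([LEMMA[t] for t in model],
                     [LEMMA[t] for t in reference]))     # lemmas: 3/3 = 1.0
```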
Figure 10. Results of human assessment of 20 randomly selected summaries produced by the AASAC system using H2FF2. (a) Frequency distribution of evaluator selections for 1-gram, 2-gram, or 3-gram summaries. (b) Average ratings provided by evaluators in response to four questions assessing the quality of 2-gram H2FF2 summaries.
Table 1. A sample list of abstractive summarizers in different languages. Entries are sorted by language.
| Ref. | Language | Summary Scope | Corpus/Dataset |
|------|----------|---------------|----------------|
| [8] | Arabic | Document level | Dedicated dataset (newspapers) |
| [46] | Arabic | Sentence level | KALIMAT |
| [38] | Arabic | Headline | Dedicated dataset (Arabic headline summary) |
| [48] | Arabic | Document level | Dedicated dataset (newspapers) |
| [49] | Arabic | Sentence level | Arabic Gigaword and XL-Sum |
| [39] | Arabic | Headline | Arabic headline summary and Arabic Mogalad_Ndeef |
| [42] | Arabic | Sentence level | Dedicated dataset |
| [9] | English | Sentence level | Dedicated dataset (reviews from Tripadvisor, Amazon, and Edmunds) |
| [11] | English | Document level | Dedicated dataset (50 medical research articles) |
| [10] | English | Document level | DUC-2001 and DUC-2002 |
| [12] | English | Document level | GNU eTraffic archive |
| [27] | English | Sentence level | Gigaword for training and testing, and DUC-2004 for testing |
| [30] | English | Sentence level | Gigaword and DUC-2004 |
| [31] | English | Sentence level | Gigaword, DUC-2004, and CNN/Daily Mail |
| [35] | English | Document level | CNN/Daily Mail |
| [37] | English | Document level | CNN/Daily Mail and New York Times |
| [45] | English | Sentence level | CNN/Daily Mail, New York Times, and Xsum |
| [55] | English | Document level | Unstructured Additive Manufacturing texts |
| [16] | Hindi | Document level | N/A |
| [22] | Kannada | Sentence level | N/A |
| [21] | Malayalam | Document level | Dedicated dataset (25 documents from Malayalam newspapers) |
| [17] | Vietnamese | Document level | Dedicated dataset (50 documents collected from newspapers) |
Table 2. List of symbols and their definition.
| Symbol | Definition |
|--------|------------|
| $m$ | Number of ants. |
| $W$ | Set of words $w$ in the original document. |
| $\lvert W \rvert$ | Number of words in the original document. |
| $w_i$ | Word $i$ ($\in W$). |
| $E$ | Total number of edges in the document. |
| $(i, j)$ | Direct edge linking word $w_i$ and word $w_j$ ($\le E$). |
| $score_{ij}$ | Cost related to edge $(i, j)$. |
| $\pi$ | Threshold value which controls the length of the summary (in words). |
| $q$ | Random variable $\in [0, 1] \subset \mathbb{R}$. |
| $J_k(i)$ | Set of nearest-neighbor nodes not yet selected by ant $k$. |
| $U$ | Number of available nodes that have not yet been selected by ant $k$. |
| $\tau_0$ | Initial value of the pheromone trail. |
| $\tau_{ij}$ | Pheromone value for edge $(i, j)$. |
| $\eta_{ij}$ | Heuristic information value of edge $(i, j)$. |
| $\beta$ | Parameter determining the relative importance of the heuristic information values. |
| $S_{best}$ | Best sub-optimal solution. |
| $\delta$ | Local pheromone evaporation rate $\in (0, 1)$. |
| $\rho$ | Global pheromone evaporation rate. |
| $FF$ | Fitness function value for a solution. |
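For readers unfamiliar with ACS, the following minimal sketch illustrates the standard pseudo-random proportional transition rule and local pheromone update of the ant colony system [56], written with the symbols defined above ($q$, $\beta$, $\tau_{ij}$, $\eta_{ij}$, $\delta$, $\tau_0$). It is a generic textbook sketch under assumed parameter values (e.g., the exploitation threshold `q0` and the toy edge values), not the AASAC implementation.

```python
import random

def choose_next_node(i, J_k, tau, eta, beta=2.0, q0=0.9):
    """Standard ACS transition rule: with probability q0 take the best edge
    (exploitation); otherwise sample a node with probability proportional to
    tau[(i, j)] * eta[(i, j)]**beta (biased exploration). J_k is the set of
    nodes ant k may still visit from node i."""
    weights = {j: tau[(i, j)] * (eta[(i, j)] ** beta) for j in J_k}
    if random.random() <= q0:
        return max(weights, key=weights.get)
    total = sum(weights.values())
    r, acc = random.uniform(0, total), 0.0
    for j, w in weights.items():
        acc += w
        if acc >= r:
            return j
    return j  # numerical fallback

def local_pheromone_update(tau, edge, delta=0.1, tau0=0.01):
    """Local update applied when an ant crosses an edge: partial evaporation
    plus reinforcement toward the initial trail value tau0."""
    tau[edge] = (1 - delta) * tau[edge] + delta * tau0

# Tiny usage example with made-up pheromone/heuristic values on three edges.
tau = {(0, 1): 0.01, (0, 2): 0.02, (0, 3): 0.015}
eta = {(0, 1): 0.5, (0, 2): 0.9, (0, 3): 0.7}
nxt = choose_next_node(0, [1, 2, 3], tau, eta)
local_pheromone_update(tau, (0, nxt))
print(nxt, tau[(0, nxt)])
```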
Table 4. A list summarizing all the experiments.
| Experiment | Heuristic Information Function | Fitness Function |
|------------|--------------------------------|------------------|
| H1FF1 | $H_1$ | $FF_1$ |
| H1FF2 | $H_1$ | $FF_2$ |
| H2FF1 | $H_2$ | $FF_1$ |
| H2FF2 | $H_2$ | $FF_2$ |
Table 5. List of LemmaRouge [60] recall and $F_1$ scores for 30% length summaries. H2FF1-3 denotes our AASAC variant that incorporates the $H_2$ heuristic, the $FF_1$ fitness function, and a 3-gram candidate list. Scores are averaged over three runs. ANSum scores, used as a baseline, compare reference summaries and those generated by the system in [8] from the same dataset. Abbreviations like LROUGE-1 are used to save space. Best scores are in bold.
| System | LROUGE-1 (Recall) | LROUGE-2 (Recall) | LROUGE-L (Recall) | LROUGE-SU4 (Recall) | LROUGE-1 ($F_1$) | LROUGE-2 ($F_1$) | LROUGE-L ($F_1$) | LROUGE-SU4 ($F_1$) |
|--------|-------------------|-------------------|-------------------|---------------------|------------------|------------------|------------------|--------------------|
| ANSum | 0.2795 | 0.1852 | 0.3252 | 0.1928 | 0.3838 | 0.2537 | 0.4207 | 0.2633 |
| H1FF1-1 | 0.4347 ± 0.05 | 0.1980 ± 0.05 | 0.3836 ± 0.06 | 0.2313 ± 0.04 | 0.5099 ± 0.05 | 0.2301 ± 0.06 | 0.4460 ± 0.07 | 0.2666 ± 0.05 |
| H1FF1-2 | 0.4349 ± 0.06 | 0.1891 ± 0.05 | 0.3881 ± 0.07 | 0.2279 ± 0.04 | 0.5102 ± 0.05 | 0.2202 ± 0.06 | 0.4537 ± 0.07 | 0.2638 ± 0.05 |
| H1FF1-3 | 0.4173 ± 0.06 | 0.1839 ± 0.06 | 0.3752 ± 0.07 | 0.2187 ± 0.05 | 0.4957 ± 0.06 | 0.2168 ± 0.06 | 0.4438 ± 0.07 | 0.2562 ± 0.05 |
| H2FF2-1 | 0.4451 ± 0.08 | 0.2102 ± 0.06 | 0.4121 ± 0.08 | 0.2390 ± 0.06 | 0.5161 ± 0.1 | 0.2421 ± 0.07 | 0.4682 ± 0.1 | 0.2725 ± 0.07 |
| H2FF2-2 | 0.4315 ± 0.06 | 0.1946 ± 0.05 | 0.3929 ± 0.07 | 0.2279 ± 0.05 | 0.5086 ± 0.05 | 0.2275 ± 0.06 | 0.4565 ± 0.07 | 0.2645 ± 0.05 |
| H2FF2-3 | 0.4237 ± 0.06 | 0.1916 ± 0.05 | 0.3850 ± 0.07 | 0.2222 ± 0.05 | 0.5020 ± 0.06 | 0.2253 ± 0.06 | 0.4511 ± 0.07 | 0.2593 ± 0.05 |
Table 6. The LemmaRouge recall and $F_1$ scores for summaries of 50% length. The rest is the same as in Table 5.
| System | LROUGE-1 (Recall) | LROUGE-2 (Recall) | LROUGE-L (Recall) | LROUGE-SU4 (Recall) | LROUGE-1 ($F_1$) | LROUGE-2 ($F_1$) | LROUGE-L ($F_1$) | LROUGE-SU4 ($F_1$) |
|--------|-------------------|-------------------|-------------------|---------------------|------------------|------------------|------------------|--------------------|
| ANSum | 0.4109 | 0.3240 | 0.4704 | 0.3289 | 0.5383 | 0.4255 | 0.5803 | 0.4321 |
| H1FF1-1 | 0.5573 ± 0.10 | 0.3410 ± 0.08 | 0.5353 ± 0.09 | 0.3576 ± 0.08 | 0.6250 ± 0.11 | 0.3805 ± 0.09 | 0.5987 ± 0.08 | 0.3968 ± 0.09 |
| H1FF1-2 | 0.5637 ± 0.07 | 0.3331 ± 0.06 | 0.5393 ± 0.08 | 0.3598 ± 0.06 | 0.6332 ± 0.07 | 0.3725 ± 0.07 | 0.6036 ± 0.07 | 0.4004 ± 0.07 |
| H1FF1-3 | 0.5548 ± 0.05 | 0.3327 ± 0.06 | 0.5355 ± 0.06 | 0.3532 ± 0.05 | 0.6309 ± 0.05 | 0.3764 ± 0.06 | 0.6027 ± 0.05 | 0.3978 ± 0.05 |
| H2FF2-1 | 0.5550 ± 0.07 | 0.3429 ± 0.07 | 0.5463 ± 0.08 | 0.3574 ± 0.06 | 0.6351 ± 0.07 | 0.3902 ± 0.07 | 0.6089 ± 0.07 | 0.4046 ± 0.06 |
| H2FF2-2 | 0.5717 ± 0.05 | 0.3394 ± 0.05 | 0.5523 ± 0.06 | 0.3630 ± 0.05 | 0.6456 ± 0.04 | 0.3813 ± 0.06 | 0.6126 ± 0.06 | 0.4057 ± 0.05 |
| H2FF2-3 | 0.5611 ± 0.05 | 0.3385 ± 0.05 | 0.5486 ± 0.06 | 0.3569 ± 0.05 | 0.6375 ± 0.04 | 0.3827 ± 0.05 | 0.6126 ± 0.05 | 0.4015 ± 0.05 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
