Using Graph-Based Maximum Independent Sets with Large Language Models for Extractive Text Summarization

Hark, Cengiz

doi:10.3390/app15126395



by

Cengiz Hark

Department of Computer Engineering, İnönü University, Malatya 44000, Turkey

Appl. Sci. 2025, 15(12), 6395; https://doi.org/10.3390/app15126395

Submission received: 2 May 2025 / Revised: 4 June 2025 / Accepted: 5 June 2025 / Published: 6 June 2025


Abstract

Large Language Models (LLMs) have shown strong performance across various tasks but still face challenges in automatic text summarization. While they are effective in capturing semantic patterns from large corpora, they typically lack mechanisms for encoding structural relationships between sentences or paragraphs. Their high hardware requirements and the limited analysis of their processing efficiency further constrain their applicability. This paper proposes a framework employing the Graph Independent Set approach to extract the essence of textual graphs and address the limitations of LLMs. The framework encapsulates nodes and relations into structural graphs generated through Natural Language Processing (NLP) techniques based on the Maximum Independent Set (MIS) theory. The incorporation of graph-derived structural features enables more semantically cohesive and accurate summarization outcomes. Experiments on the Document Understanding Conference (DUC) and Cable News Network (CNN)/DailyMail datasets are conducted with different summary lengths to evaluate the performance of the framework. The proposed method provides up to a 41.05% (Recall-Oriented Understudy for Gisting Evaluation, ROUGE-2 F1) increase in summary quality and a 60.71% improvement in response times on models such as XLNet, Pegasus, and DistilBERT. The proposed framework enables more informative and concise summaries by embedding structural relationships into LLM-driven semantic representations, while reducing computational costs. In this study, we explore whether integrating MIS-based graph filtering with LLMs significantly enhances both the accuracy and efficiency of extractive text summarization.

Keywords:

graph theory; maximum independent set; LLM; NLP; text summarization

1. Introduction

LLMs have significantly advanced NLP, particularly in tasks such as text comprehension, summarization, and generation [1]. Recent models, including GPT [2] by OpenAI, BERT, RoBERTa, XLNet, ELECTRA, ALBERT [3], Pegasus [4] by Google, and DistilBERT [5] by Hugging Face, have demonstrated a strong performance across a range of NLP applications.

Despite their success, LLMs face difficulties in capturing the structural aspects of language, particularly in tasks such as the generation of abstracts. This limitation often results in outputs that are either repetitive or lack contextual coherence. Their strong dependence on training data characteristics further limits their generalizability [6], making them sensitive to variations in data scope and training strategies [7,8]. Although they are capable of processing large volumes of data conceptually, their inability to model internal structures between text components can lead to inconsistencies, especially when summarizing long and complex texts [9,10]. Their high computational and memory requirements restrict real-time deployment and reduce overall efficiency [11]. These challenges highlight the necessity of integrating structural information into LLMs to improve their summarization performance and produce more contextually coherent summaries.

This paper proposes a novel framework based on the MIS theory to address several limitations of LLMs. These include hardware-related issues such as high computational costs, a limited capacity to process lengthy documents, and significant energy consumption. The framework also targets content-related challenges, including repetitive outputs, the omission of key information, and the lack of structural awareness due to a dependence on training strategies. By modeling textual data as simplified, independent graphs, the proposed method extracts structural features using the MIS algorithm as its core component. Integrating these features with semantic content allows the generation of more coherent and informative summaries. The framework also facilitates a better representation of cause–effect relationships, contributing to an improved summary quality and reliability of output.

The framework begins with preprocessing to prepare documents for structural analysis. It then constructs graph representations that capture relationships between sentences based on shared phrases and semantic similarity. Using the MIS algorithm, the framework identifies independent structures within these graphs to reduce redundancy and simplify the input. This allows LLMs to work with a more structured and concise dataset, improving both the summarization speed and efficiency. Rather than processing entire documents, the model focuses on key sentence connections, which enhances energy efficiency and reduces computational demands. By incorporating structural information, the framework also helps to optimize the input data and mitigates misleading guidance during inference. Experiments conducted on the CNN/DailyMail and DUC-2002 datasets demonstrate that the framework consistently produces accurate and coherent summaries. The approach improves summarization performance by reducing redundancy, minimizing information loss, and lowering the reliance on specific training data and strategies.

Motivation and Objective

LLMs have demonstrated a strong performance not only in NLP but also in fields such as image processing and time-series analysis. However, their large size and high computational and training costs limit their deployment on resource-constrained devices. Another important limitation is their reliance on contextual information while overlooking the structural aspects of language. Although existing studies process large-scale content at the conceptual level, they often fail to effectively capture the internal structure of text. As a result, these models struggle to maintain contextual coherence, particularly when summarizing complex documents. This highlights the need for integrating structural information to improve the summarization capabilities of LLMs.

In this study, a new framework based on the Graph Independent Set is proposed that enables LLMs to be used in extractive text summarization tasks with high performance and minimum cost. The MIS algorithm was chosen because it provides a structural filtering mechanism capable of systematically filtering out contextually irrelevant sentences before summarization. While node centrality-based methods are effective in determining the importance of nodes, they fall short in filtering out nodes with a low content contribution. In contrast, MIS prevents redundancy and semantic overlap, resulting in more compact, focused, and diverse summaries. This approach strengthens the semantic integrity by identifying mutually independent nodes, in contrast to centrality-focused methods that may cause repetition by prioritizing highly connected nodes. Our aim within the scope of this study is to eliminate the limitations of LLMs, including their high computational cost, limited scalability, redundancy in outputs, and their inability to utilize structural relationships within text. Our Graph Independent Set-based study on the effective and efficient use of LLMs seeks answers to the following fundamental research questions, which collectively aim to determine whether integrating MIS-based graph filtering with LLMs can significantly enhance both their summarization accuracy and computational efficiency:

Does the proposed MIS-based framework improve the summarization accuracy compared to existing popular LLMs?
Can the framework reduce computational costs and increase efficiency in extractive summarization tasks?
Does integrating structured information into LLMs enable a more precise modeling of cause–effect relationships within texts?
Can the use of structured information storage mitigate the context loss inherent in transformer architectures?
Does structured and canonical data storage reduce the energy consumption of LLMs by optimizing data loads, and does this integration strengthen their performance?
Can the proposed method achieve a high summarization accuracy at a low cost across different LLMs and datasets?

The remainder of this paper is structured as follows: Section 2 reviews previous studies related to the topic. Section 3 describes the materials and methods used in the study. Section 4 presents the experimental setup and the obtained results. Section 5 discusses the findings. Finally, Section 6 summarizes the paper by highlighting its contributions and offers suggestions for future research.

2. Related Works

This section reviews academic studies that are directly or indirectly related to the proposed framework. The analysis aims to identify prevailing trends in the literature and to assess potential research gaps from a comprehensive perspective. Understanding these foundations is essential for positioning the current study within the broader scientific context. For clarity and thematic coherence, the literature is categorized into four main groups: (1) limitations in integrating graph structures with LLMs, (2) graph-based approaches to text summarization, (3) hybrid architectures combining LLMs with Graph Neural Networks (GNNs), and (4) graph-based models developed specifically for scientific text summarization.

One major limitation of LLMs in handling graph-based data is their inability to process graph structures directly. Nevertheless, integrating graph learning models with LLMs has shown promise for enhancing their performance. For instance, GraphLLM [12] incorporates structural information from graphs into the input of LLMs, achieving a 96.45% reduction in context length. However, its effectiveness is highly dependent on the choice of graph model and the capacity of the LLM, necessitating adaptation for different graph types. In [13], the authors explored node classification using graphs with textual node features. LLMs were used to enrich node representations, which were then processed by a GNN. While this approach leveraged LLMs for improved feature encoding, it also introduced a significant computational overhead. Moreover, over-reliance on the graph structure led to the omission of relational information between nodes. In another study [14], various graph-to-text transformation methods were evaluated to determine their impact on extractive summarization performance. With appropriate representations, performance improvements of up to 61.8% were achieved. However, the results did not generalize well to large-scale graphs, indicating limitations in scalability.

In [15], a benchmark was introduced to assess LLMs’ structural and semantic understanding of graphs. The input design and context structure were identified as key factors affecting performance. The study found that LLMs still perform worse than graph-specialized models.

Studies integrating graph-based text summarization typically utilize relationships at the document and sentence levels. GoSum [16] converts each document into a heterogeneous graph, modeling sentence-level and hierarchical contexts with GNNs before applying LLMs. The model achieved high ROUGE scores on the PubMed and arXiv datasets, but its performance depends heavily on the document structure. MultiBART-GAT (Graph Attention Network) [17] is a BART-based abstractive summarization model that integrates knowledge graphs with multi-source transformers to address the limitations of LLMs in checking the accuracy of information. In this setting, LLMs were effective in controlling the information accuracy, but the model’s knowledge graph creation and multi-source data processing are computationally expensive and complex. The HeterMDS model [18] constructs a heterogeneous graph structure consisting of document, paragraph, and sentence nodes. It employs a Graph Attention Network (GAT) for learning while maintaining structural and content integrity, aiming to produce high-quality summaries.

The combination of GNNs and LLMs shows a strong potential for summarization and other NLP tasks. In [19], GNNs were used to predict protein interactions, while LLMs validated these predictions. This approach achieved a high accuracy and provided meaningful explanations. The study in [20] focused on modeling rich conversational structures, addressing the challenges caused by the unstructured nature of human interactions. Two graph types, Discourse Relationship Graph and Action Graph, were employed. The model performed well in both automatic and human evaluations. In [21], document-level information was incorporated alongside sentence-level relationships using a graph-structured representation. This method improved the summarization quality, especially for long scientific articles, and worked well across different document lengths and types. However, the model’s performance depended on dataset characteristics and required high integration costs.

Some studies apply graph-based summarization techniques to specific domains. In [22], a model integrating neural networks and graph-based methods was proposed for multi-document summarization. The model aims to create effective and meaningful summaries by calculating the relationships between sentences and was tested on the DUC-2004 dataset. Document section hierarchies are represented in the graph’s edges. Offline reinforcement learning is employed, and evaluations on the PubMed and arXiv datasets yield high ROUGE scores. However, since the model selects sentences independently, contextual integrity is not preserved, and the desired level of semantic inference cannot be achieved due to repetitive sentences and heavy processing loads. The study in [23] focused on the ability of LLMs to make sense of graph data and optimize network performance. The proposed framework aims to improve the route optimization and communication resource allocation of unmanned aerial vehicles, and LLM-supported graph analysis showed great potential for network optimization and automatic decision-making processes. Plaza et al. [24] used graph structures for summarizing biomedical texts. In the generated graphs, semantic relationships between concepts were utilized to summarize important content, and nodes were classified using UMLS semantic types. This foundation was later extended by Givchi [25], whose approach considered not only the semantic similarity between concepts but also their frequency of occurrence within documents. This led to more meaningful node connections and thus to more robust graphs. In a study where GNN processes were combined with text-based guidance [26], language-based graph representations were used. The issue of out-of-vocabulary (OOV) tokens in nodes hindering task transferability was addressed through a vocabulary approach. However, the prompt-based GNN processed each graph independently, neglecting semantic similarities across graphs.

Despite significant progress in integrating graph structures with LLMs, several key limitations remain. The most notable is the high computational cost, as LLMs require considerable processing power due to their complexity. Another major drawback is the frequent inability to effectively combine structural and semantic information. Many models focus primarily on either structural relationships or conceptual content, with few successfully integrating both aspects.

This study proposes a framework that simultaneously evaluates structural and semantic information, achieves a high performance with low computational cost, and combines the robust theoretical foundations of graph theory with LLMs. It aims to address the context loss and processing inefficiencies found in existing methods. The need for a general-purpose framework constitutes the primary motivation for this study.

Table 1 presents the main features, advantages, and limitations of the reviewed studies, including the proposed method.

3. Datasets

To evaluate the performance of the proposed summarization model, two text summarization datasets, DUC-2002 [27] and CNN/DailyMail [28], were used in the experimental process. These datasets were selected to compare the model’s performance across different document types and to quantitatively assess its generalization ability. Evaluating the model on diverse contexts is essential. Some information about the datasets is presented in Table 2.

3.1. Description of the DUC Dataset

The Document Understanding Conference-2002 (DUC-2002) dataset is used for multi-document text summarization tasks. It contains sets of news articles, along with human-prepared reference summaries for each set, totaling 567 news articles. The DUC-2002 dataset is valuable due to the complexity of information across document sets and the presence of overlapping content that requires summarization. It is well-suited for evaluating the information fusion capability of summarization models [27].

3.2. Description of the CNN/DailyMail Dataset

The CNN/DailyMail dataset consists of news articles paired with their summaries. It is a single-document summarization dataset featuring a wide range of topics. Created by Hermann et al. [29], this dataset is well-suited for testing linguistic diversity due to its broad topic coverage. It contains approximately 290,000 news articles [28,30].

4. Materials and Methods

The main stages of this work are illustrated in Figure 1. The summarization process begins with a preprocessing step applied to the documents. Following this, graphs are generated from the processed texts. In these graphs, relationships between nodes, each representing a sentence, are established based on shared phrases. The next step is the Graph Independent Set analysis, which forms the core of the proposed framework. The MIS is identified on these graphs, and the nodes comprising the independent sets are extracted. This step aims to remove sentence groups represented by the MIS from the graphs. By doing so, repetitive information in the text graphs is eliminated before the quantitative evaluation. This restriction effectively prevents the inclusion of redundant word groups in the summary and addresses the limitations of LLMs.

The revised graphs’ representative sentences are then passed to the LLMs. This creates a much smaller subset of the original text, enabling LLMs to select sentences more quickly and with better contextual clarity. As a result, LLMs process a considerably more concise dataset compared to the full documents. Sentence coverage scores from XLNet, RoBERTa, DistilBERT, ELECTRA, ALBERT, and Pegasus are used to generate summaries of varying lengths (3, 200, and 400 words). To demonstrate the framework’s reliability and robustness, experiments are conducted on two distinct datasets: CNN/DailyMail and DUC-2002. While CNN/DailyMail offers broad, up-to-date news content, DUC-2002 provides more academic and comparative documents. Including both datasets is important to reflect the framework’s comprehensiveness, despite structural differences. Significant improvements in summarization quality are observed across both datasets. Additionally, the summarization process becomes notably faster.
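The length-constrained selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_summary` is hypothetical, the per-sentence scores are assumed to have been produced beforehand by one of the LLMs, and a simple greedy word budget stands in for whatever length control the framework actually uses.

```python
def select_summary(sentences, scores, max_words):
    """Greedily keep the highest-scoring sentences that fit in a word budget.

    `scores` is assumed to hold one LLM-derived score per sentence
    (placeholder values in the example below).
    """
    # Visit sentences from highest to lowest score.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            chosen.append(i)
            used += n_words
    # Restore the original document order for readability.
    return [sentences[i] for i in sorted(chosen)]


sentences = ["A b c d.", "E f.", "G h i."]
scores = [0.2, 0.9, 0.5]  # hypothetical scores
summary = select_summary(sentences, scores, max_words=5)
```

Emitting the chosen sentences in document order rather than score order is a design choice made here for coherence; the source does not specify the ordering.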

Figure 2 shows the general block diagram of the proposed framework. This framework combines LLMs with graph-based methods for automatic text summarization, enabling LLMs to utilize both conceptual and structural information. The input data consists of texts from the DUC and CNN/DailyMail datasets. To support generalization with quantitative evidence, these texts are preprocessed and simplified while preserving their structural and contextual integrity.

In the second stage, a graph representing sentence phrases and terms is generated from the preprocessed data. Analyses based on Graph Independent Set principles are then conducted to identify the most independent nodes. In the third stage, nodes within the MIS cluster representing sentences least related to the main text are removed, resulting in new graphs containing sentences more closely aligned with the main idea.

In the fourth stage, the most valuable nodes from these graphs are selected using the eigenvector centrality metric. This step increases the understanding of the framework.

Finally, sentences corresponding to nodes with the highest centrality values are identified and selected for further evaluation by LLMs (XLNet, RoBERTa, DistilBERT, ELECTRA, ALBERT, and Pegasus). These pre-trained LLMs perform an additional scoring process on the selected sentences to determine the most suitable candidates for the summary. The refined texts, enriched with structural information, are then input to the LLMs. In the final stage, candidate summaries of the desired lengths are generated from the selected sentences. This framework aims to improve LLMs’ summarization efficiency through sentence selection techniques grounded in Graph Independent Set theory. Overall, the proposed model achieves both a high accuracy and rich summarization quality while reducing computational costs.
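The fourth-stage ranking by eigenvector centrality can be illustrated with a short power-iteration sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the function name and the parameters `iterations` and `tol` are illustrative choices, not taken from the paper.

```python
import math


def eigenvector_centrality(graph, iterations=100, tol=1e-9):
    """Approximate eigenvector centrality by power iteration.

    `graph` maps each node to the set of its neighbours (undirected).
    """
    nodes = list(graph)
    if not nodes:
        return {}
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Iterating with (A + I) instead of A keeps the dominant eigenvector
        # but avoids oscillation on bipartite graphs.
        new = {v: score[v] + sum(score[u] for u in graph[v]) for v in nodes}
        norm = math.sqrt(sum(x * x for x in new.values())) or 1.0
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - score[v]) for v in nodes) < tol:
            return new
        score = new
    return score


# Star graph: node 0 is connected to every other node.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
centrality = eigenvector_centrality(star)
top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:2]
```

In the star example the hub receives the highest score, so its sentence would be forwarded to the LLM first.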

4.1. Integrating Natural Language and Graph-Based Representations

In this subsection, textual graphs are constructed. Creating sentence-based graphs that represent the input texts and extracting structural information from these graphs constitute an important stage. The effectiveness of these graphs in storing structural information directly impacts the quality and efficiency of the candidate summaries. This representation approach is based on modeling each sentence as a node, thereby forming a graph structure. Prior to the graphs’ construction, the texts undergo preprocessing steps, including lemmatization. The proposed framework generates graphs following these preprocessing procedures. Formally, a textual graph is represented as $G = (V, E)$, where $V = \{V_{1}, V_{2}, \dots, V_{n}\}$ denotes the set of nodes (sentences), and $E = \{E_{1}, E_{2}, \dots, E_{n}\}$ denotes the set of edges (relationships between sentences). Typically, full textual graphs are analyzed using dense matrix representations, which have a time complexity of O(N²). An example of a text-based graph is illustrated in Figure 3. This stage provides the foundational framework and representation for subsequent processing steps.
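A minimal sketch of this construction is given below, assuming that two sentences are linked whenever they share at least one non-stopword term. The tokenizer, the tiny stopword list, and the `min_shared` threshold are illustrative assumptions; the paper's preprocessing (including lemmatization) and its semantic-similarity criterion are not reproduced here.

```python
import re
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to"}


def tokenize(sentence):
    """Lowercase and extract alphabetic tokens (stand-in for lemmatization)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def build_sentence_graph(sentences, min_shared=1):
    """Return an adjacency dict: sentence index -> set of linked indices."""
    terms = [tokenize(s) - STOPWORDS for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    # Compare every sentence pair once; this is the O(N^2) step.
    for i, j in combinations(range(len(sentences)), 2):
        if len(terms[i] & terms[j]) >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    return graph


sentences = [
    "The storm hit the coast on Monday.",
    "Officials said the storm damaged many homes.",
    "Local schools will reopen next week.",
]
graph = build_sentence_graph(sentences)
```

Here the first two sentences are connected through the shared term "storm", while the third sentence remains an isolated node.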

4.2. Independent Set

MIS is a fundamental concept applied across various fields, facilitating the analysis of relational structures between nodes in a graph. It is utilized to address practical problems in areas such as network planning [31], social network analysis [32], bioinformatics [33], and NLP [34]. An independent set is defined as a subset of nodes in which no two nodes are adjacent; that is, no edges exist between any pair of nodes within the set. Importantly, multiple independent sets may satisfy this condition in a given graph. The MIS problem is equivalent to the maximum clique problem on the complement graph and is classified as NP-hard for general graph types [35]. Consequently, identifying optimal or near-optimal solutions poses significant computational challenges. Various methods have been proposed in the literature to approximate or solve this problem [36]. Formally, given a graph $G = (V, E)$, an independent set $S \subseteq V(G)$ is a set of nodes with no connecting edges between them. For a maximal independent set, adding any other node to $S$ would introduce an edge within the set, thus violating the independence property. Among all independent sets identified in $G$, the one with the largest cardinality is designated as the MIS. Additionally, $S$ must not be a proper subset of any other independent set in $G$, ensuring the mutual exclusivity or “no conflict” principle among nodes [31,32]. The seminal work in [37] investigates the problem of determining the maximum number of independent sets possible in a graph with $n$ nodes, providing solutions for various graph classes.

Within the scope of this study, the textual content targeted for summarization is represented as a graph structure, as described in the corresponding section. From this graph, independent sets are identified, and the sentences associated with the nodes in these sets are excluded from the summarization process. This exclusion is based on the assumption that such sentences contribute minimally to the core content and should not appear in the final summary. By eliminating these elements, the summarization process can prioritize more semantically relevant and informative content. The remaining sentences are evaluated using node-centric graph metrics to guide the summary generation. The full sequence of these steps is detailed and illustrated in Figure 4.
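The exclusion step can be sketched as follows, assuming the independent set has already been computed and the graph is stored as a dictionary of neighbor sets. The helper names `is_independent` and `drop_nodes` are hypothetical.

```python
def is_independent(graph, nodes):
    """True if no two nodes in `nodes` are adjacent in `graph`."""
    nodes = set(nodes)
    return all(graph[u].isdisjoint(nodes) for u in nodes)


def drop_nodes(graph, nodes):
    """Return the subgraph induced by the nodes NOT in `nodes`."""
    nodes = set(nodes)
    return {v: nbrs - nodes for v, nbrs in graph.items() if v not in nodes}


# Triangle 0-1-2, a pendant node 3 attached to 2, and an isolated node 4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
independent = {1, 3, 4}  # one maximum independent set of this graph

filtered = drop_nodes(graph, independent)
```

After the sentences in the independent set are removed, only nodes 0 and 2 and the edge between them remain for the centrality-based ranking.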

Algorithm 1 takes textual graphs as input and outputs an MIS. This recursive algorithm continues until only independent sets remain or until the graph is empty. In each iteration, the graph is simplified, and eventually all possibilities are scanned to find the MIS.

Algorithm 1. MIS Algorithm: Maximum_Independent_Set (G).
1	function Maximum_Independent_Set $(G)$
2		if all vertices in $V$ have degree 0:
3			return $S = V$
4		if $V$ is empty:
5			return empty set
6		select vertex $u \in V$ with minimum degree
7		// Case 1: $u \notin S$ (u is not included in the set)
8		$G_1 = G - \{u\}$
9		$S_1$ = Maximum_Independent_Set $(G_1)$
10		// Case 2: $u \in S$ (u is included in the set)
11		$G_2 = G - \{u\} -$ neighbors $(u)$
12		$S_2 =$ Maximum_Independent_Set $(G_2) \cup \{u\}$
13		if $|S_1|$ > $|S_2|$ then
14			return $S_1$
15		else
16			return $S_2$
17	end function

The algorithm first checks the base cases. If all node degrees are 0, the graph contains no edges, and all nodes together form an independent set. If there are no nodes in the graph, the independent set is empty (lines 2–5).

The node with the minimum degree (u) is then selected from the remaining nodes (line 6). Because this node has fewer neighbors, fewer nodes are eliminated when it is added to the set, which increases the chance of obtaining a larger independent set.

After this stage, two possibilities are evaluated. In the first, node u is excluded: u is removed from the graph and is not included in the independent set, and the algorithm finds the largest independent set over the remaining nodes (lines 7–9).

In the second, node u is added to the independent set, and u and all its neighbors are removed from the graph before the recursive call (lines 10–12).

In the last stage, the larger of the two sets found is returned (lines 13–16).

Although the Maximum Independent Set (MIS) problem is theoretically classified as NP-hard, the proposed algorithm significantly narrows the search space by employing a “minimum degree first” branching strategy. Therefore, within the current experimental scope, the proposed method demonstrates practical scalability without the need for additional approximation techniques.
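A compact Python rendering of the branching recursion in Algorithm 1 is sketched below. It assumes an adjacency-dictionary graph with sortable node labels, and breaking ties in the minimum-degree choice by label is an added assumption. Both branches are always evaluated, so the worst-case running time remains exponential.

```python
def remove_vertices(graph, removed):
    """Return a copy of `graph` without the vertices in `removed`."""
    return {v: nbrs - removed for v, nbrs in graph.items() if v not in removed}


def maximum_independent_set(graph):
    """Exact MIS by branching on a minimum-degree vertex (exponential time)."""
    if not graph:
        return set()
    if all(len(nbrs) == 0 for nbrs in graph.values()):
        return set(graph)  # no edges left: every vertex is independent
    # Pick a minimum-degree vertex; ties are broken by label (an assumption).
    u = min(graph, key=lambda v: (len(graph[v]), v))
    # Case 1: u is excluded from the set.
    s1 = maximum_independent_set(remove_vertices(graph, {u}))
    # Case 2: u is included, so u and all its neighbours leave the graph.
    s2 = maximum_independent_set(remove_vertices(graph, {u} | graph[u])) | {u}
    return s1 if len(s1) > len(s2) else s2


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mis = maximum_independent_set(path)
```

On the five-node path the only independent set of size three is {0, 2, 4}, which is what the recursion returns.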

Figure 4 illustrates the process step by step. First, the text to be summarized is divided into sentences, with each sentence represented as a node. Similarities and relationships between these sentences are mapped as edges to form a graph, preserving the structural relationships in the text. Then, the MIS algorithm is applied to eliminate sentences with a low information value. After selection, the graph is reconstructed with the remaining nodes, and meaningful sentences for the summary are identified.

4.3. Large Language Models

LLMs represent a major step beyond earlier statistical language models, which formed the basis of early computational linguistics [38]. The emergence of transformers, along with extensive training corpora and pre-training techniques, significantly enhanced models’ capabilities [39]. These models often contain hundreds of billions of parameters that are optimized during training on vast corpora, enabling sophisticated language generation. As the volume of textual data continues to grow, LLMs have driven major advances across NLP tasks. They have been applied in areas such as code generation, medical text processing, information retrieval, classification, and summarization [40,41,42]. LLMs generate output by capturing contextual dependencies through multi-layer transformer architectures. User input is first tokenized, then embedded into vector form, and passed through stacked transformer blocks. The final layer produces task-specific predictions (as shown in Figure 5).

Models such as XLNet, RoBERTa, DistilBERT, ELECTRA, ALBERT, and Pegasus are built on the transformer architecture presented in Figure 5 [44,45,46]. XLNet employs permutation-based training over the transformer architecture, improving its ability to capture word dependencies in longer phrases [47,48]. RoBERTa builds on the BERT architecture and achieves performance gains through a longer, optimized training process with more data [49,50]. DistilBERT has a compact structure that makes it faster and less resource-intensive; it achieves this by reducing the model size while preserving performance [51,52,53,54]. ELECTRA differs from BERT by using a replaced token detection task, allowing for more efficient pre-training, so that high performance can be achieved even with small models [55,56,57]. ALBERT employs factorized embeddings and cross-layer parameter sharing, improving the resource efficiency in deep models [58,59,60]. Similarly, the Pegasus model developed by Google [4] demonstrates significant performance improvements in text summarization compared to traditional models. Its pre-training strategy is particularly effective in handling long and complex input texts.

Despite their capabilities, LLMs are often criticized as black-box systems that lack access to explicit structural features, focusing primarily on conceptual content. Furthermore, their high computational demands and long processing times remain a barrier to widespread deployment. To address these limitations, this study introduces an MIS-based framework that extracts salient structural information from sentence graphs, enhancing both the efficiency and summary quality.

4.4. Performance Metrics

In this study, ROUGE is used to evaluate the summarization performance by comparing model-generated summaries with human-written references. ROUGE measures the overlap of textual units such as n-grams and word sequences to assess the quality of generated summaries. This paper considers several ROUGE variants, including ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S. Widely adopted in NLP tasks like summarization, machine translation, and text generation, ROUGE offers a practical and comparative metric. Its primary goal is to assess how accurately and comprehensively the model captures the content of the reference text. While ROUGE only approximates semantic similarity, it provides a cost-effective alternative to manual evaluation. It reports a single performance score using Precision (P), Recall (R), and the F-measure (F). Precision measures the proportion of relevant units in the candidate summary, while Recall assesses how much of the reference is captured. The F-measure balances P and R using the $\beta$ parameter. These metrics take values between 0 and 1 and are calculated as follows:

R = \frac{\text{correct sentences in the candidate summary}}{\text{total number of sentences in the model summary}}

(1)

P = \frac{\text{correct sentences in the candidate summary}}{\text{total sentences in the candidate summary}}

(2)

\text{F-measure} = \frac{(1 + β^{2}) R P}{R + β^{2} P}

(3)
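Equations (1)–(3) can be illustrated with a minimal Python sketch. It assumes that a sentence counts as "correct" when it appears verbatim in both the candidate and the model (reference) summary; the function name `precision_recall_f` and the sentence identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall_f(candidate, reference, beta=1.0):
    """Sentence-level P, R and F-measure following Equations (1)-(3).

    Assumption: a candidate sentence is "correct" if it also occurs
    in the reference (model) summary.
    """
    cand = set(candidate)
    ref = set(reference)
    correct = len(cand & ref)

    recall = correct / len(ref) if ref else 0.0        # Equation (1)
    precision = correct / len(cand) if cand else 0.0   # Equation (2)

    denom = recall + beta ** 2 * precision
    f_measure = (1 + beta ** 2) * recall * precision / denom if denom else 0.0  # Equation (3)
    return precision, recall, f_measure


# Hypothetical sentence identifiers, not taken from the datasets.
candidate = ["s1", "s2", "s3", "s4"]
reference = ["s1", "s3", "s5"]
p, r, f = precision_recall_f(candidate, reference)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```

With two shared sentences, four candidate sentences, and three reference sentences, P = 2/4 and R = 2/3, so the F-measure with β = 1 is 4/7.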

ROUGE-N specifically measures n-gram overlap, where N represents the n-gram length. If the candidate summary is denoted by s, ROUGE-N is computed as

\text{ROUGE-N} = \frac{\sum_{r \in M} \langle Φ_{N}(r), Φ_{N}(s) \rangle}{\sum_{r \in M} \langle Φ_{N}(r), Φ_{N}(r) \rangle}

(4)

Here, M denotes the set of reference summaries and Φ_N(d) is the binary vector representing the N-grams in document d: its ith component is 1 if the ith N-gram occurs in d and 0 otherwise. ⟨·,·⟩ denotes the inner product of two vectors. ROUGE-L is based on the longest common subsequence between the candidate and the reference summary. ROUGE-W is a variant of ROUGE-L that applies a weighting factor favoring consecutive matches. ROUGE-S measures skip-bigram co-occurrence, and ROUGE-SU extends ROUGE-S by additionally counting unigrams [29,61].
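The following Python sketch illustrates Equation (4) and the subsequence idea behind ROUGE-L. Because the vectors are binary, the inner product of two vectors equals the number of distinct N-grams shared by the two texts, so sets are used in place of explicit vectors. Whitespace tokenization, lowercasing, and the function names are assumptions made for this example; widely used ROUGE packages typically work with clipped N-gram counts rather than binary indicators, so their scores may differ.

```python
def ngram_set(text, n):
    """Distinct N-grams of a text (the non-zero components of the binary vector)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rouge_n(candidate, references, n):
    """Equation (4): summed inner products over the reference set M."""
    cand = ngram_set(candidate, n)
    numerator = sum(len(ngram_set(ref, n) & cand) for ref in references)
    denominator = sum(len(ngram_set(ref, n)) for ref in references)
    return numerator / denominator if denominator else 0.0


def lcs_length(a_tokens, b_tokens):
    """Length of the longest common subsequence, the quantity underlying ROUGE-L."""
    prev = [0] * (len(b_tokens) + 1)
    for a in a_tokens:
        cur = [0]
        for j, b in enumerate(b_tokens, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


# Toy example, not taken from DUC or CNN/DailyMail.
references = ["the cat sat on the mat"]
candidate = "the cat lay on the mat"

r1 = rouge_n(candidate, references, 1)
r2 = rouge_n(candidate, references, 2)
lcs = lcs_length(references[0].split(), candidate.split())
print(r1, r2, lcs)
```

In this toy example, four of the five distinct reference unigrams and three of the five distinct reference bigrams occur in the candidate, and the longest common subsequence ("the cat on the mat") has five tokens.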

5. Experiment Setup and Results

This section describes the experimental setup, including the environment, datasets, and results obtained from evaluating the proposed text summarization framework.

5.1. Experiment Setup

The experiments were conducted on a desktop computer with an AMD Ryzen 7 PRO 4750U processor (1.70 GHz), Radeon Graphics, and 16 GB RAM. All analyses were performed using Python v3.10. The proposed framework was evaluated on two benchmark datasets: DUC and CNN/DailyMail. While DUC contains more formal and academic content, CNN/DailyMail includes contemporary news articles from various domains. The texts were preprocessed with care to preserve their structural and contextual integrity, then converted into graph representations for summarization.

5.2. Experimental Results

This section presents performance analyses based on the application of the proposed text summarization framework to the DUC and CNN/DailyMail.

To provide LLMs with both conceptual and structural information, the DUC and CNN/DailyMail datasets were used as input to the integrated framework. The texts were preprocessed to preserve their structural and contextual integrity and transformed into simplified, structured forms. Graphs were then constructed from sentence-level phrases and key terms extracted from the processed texts. Using graph-based MIS theory, nodes deemed less important were identified and removed. This yielded new graphs composed of structurally inclusive and semantically rich nodes. Node centrality measures were applied to select the most significant nodes, and their corresponding sentences were identified. These selected sentences were further scored by pre-trained LLMs (XLNet, RoBERTa, DistilBERT, ELECTRA, ALBERT, and Pegasus) to determine final summary candidates. This process resulted in refined summaries that integrate both structural and conceptual dimensions. By combining MIS-based graph filtering with LLM evaluation, the framework significantly improves the summarization performance in terms of both accuracy and computational efficiency. The framework’s effectiveness was evaluated using the ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU metrics, along with the sub-metrics Precision, Recall, and F-measure. Comparative analyses were conducted on both datasets to validate the results.
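The graph construction, independent-set filtering, and centrality ranking steps described above can be sketched in Python as follows. This is an illustration under several assumptions that are not taken from this section: edges connect sentences whose Jaccard word overlap reaches a hypothetical threshold, the independent set is built with a simple minimum-degree greedy heuristic (finding a true maximum independent set is NP-hard), and degree in the original graph stands in for the node centrality measure. The final LLM scoring step is omitted.

```python
from itertools import combinations


def build_sentence_graph(sentences, threshold=0.2):
    """Adjacency sets; an edge links two sentences with Jaccard overlap >= threshold."""
    tokens = [set(s.lower().split()) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        similarity = len(tokens[i] & tokens[j]) / len(union) if union else 0.0
        if similarity >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_independent_set(adj):
    """Minimum-degree greedy heuristic: yields a maximal (not necessarily maximum) independent set."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen = []
    while remaining:
        # Pick the node with the fewest remaining neighbours (ties broken by index).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return sorted(chosen)


def select_sentences(sentences, k=2, threshold=0.2):
    """Filter with the independent set, then keep the k most central nodes."""
    adj = build_sentence_graph(sentences, threshold)
    kept = greedy_independent_set(adj)
    ranked = sorted(kept, key=lambda v: (-len(adj[v]), v))  # degree as a stand-in centrality
    return [sentences[v] for v in sorted(ranked[:k])]


# Toy document, not taken from the evaluated datasets.
sentences = [
    "the storm hit the coast on monday",
    "the storm hit the coast early monday",
    "rescue teams reached the flooded villages",
    "rescue teams reached flooded villages by boat",
    "officials expect power to return soon",
]
summary = select_sentences(sentences, k=2)
print(summary)
```

In the toy document, the two near-duplicate sentence pairs are connected by edges, so the independent set keeps only one sentence from each pair. The candidates passed on for scoring are therefore mutually dissimilar, and fewer sentences reach the LLM.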

The framework achieved notable performance gains on the DUC dataset compared to the baseline LLMs used in the experiments. Its effectiveness in summarizing current events was further evaluated using the CNN/DailyMail dataset. To assess its summarization capabilities across different summary lengths, the model was tested on both short and long summaries. This evaluation aimed to measure the model’s ability to retain key information in shorter outputs and preserve a detailed context in longer ones. Table 3 presents the ROUGE scores for 400-word summaries, comparing the proposed framework with baseline LLMs.

The F1 score, which balances Precision and Recall, serves as a more reliable performance metric, especially in scenarios involving class imbalance. Therefore, model performance in this study is primarily reported using the F1 score, calculated as the harmonic mean of Precision and Recall.
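For reference, setting β = 1 in Equation (3) gives the harmonic-mean form of the F1 score. The numerical values below are illustrative only and are not taken from the experiments.

```latex
% Equation (3) with \beta = 1
F_1 = \frac{(1 + 1^{2})\, R P}{R + 1^{2} \cdot P}
    = \frac{2 P R}{P + R}
    = \frac{2}{\frac{1}{P} + \frac{1}{R}}

% Illustrative values
P = 0.50,\quad R = 0.25
\;\Rightarrow\;
F_1 = \frac{2 \cdot 0.50 \cdot 0.25}{0.50 + 0.25} = \frac{0.25}{0.75} \approx 0.333
```

The arithmetic mean of these two values would be 0.375; the harmonic mean is lower because it penalizes the gap between Precision and Recall.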

Experiments conducted on the DUC dataset with 400-word summaries demonstrate that the proposed framework consistently outperforms baseline LLMs. The most significant gain was observed for ALBERT, while all models including ELECTRA showed statistically meaningful improvements. According to the ROUGE-1 metric, which measures unigram overlap, the model achieved notable F1 increases for ALBERT (+0.0568), ELECTRA (+0.0504), RoBERTa (+0.0445), DistilBERT (+0.0109), Pegasus (+0.0294), and XLNet (+0.0086). The ROUGE-2 results, which reflect bigram-level contextual alignment, indicate improved consistency in ALBERT (+0.0678), ELECTRA (+0.0462), RoBERTa (+0.0385), Pegasus (+0.0332), XLNet (+0.0233), and DistilBERT (+0.0001). With ROUGE-L, which emphasizes the preservation of long sentence structures, the proposed model again showed strong gains: ALBERT (+0.0612), ELECTRA (+0.0498), RoBERTa (+0.0331), DistilBERT (+0.0071), XLNet (+0.0094), and Pegasus (+0.0015). ROUGE-W, which evaluates weighted sequence overlap and thus indicates the semantic richness of generated summaries, showed performance gains across all models—highest in ALBERT (+0.0258) and ELECTRA (+0.0223). For phrase-level continuity, as measured by ROUGE-S and ROUGE-SU, the framework improved contextual fluency in ALBERT (+0.0537), RoBERTa (+0.0380), ELECTRA (+0.0329), Pegasus (+0.0260), XLNet (+0.0092), and DistilBERT (+0.0054). Figure 6 provides a visual representation of ROUGE-F1 score improvements across models for the DUC dataset.

Table 4 summarizes the 200-word ROUGE scores of the proposed model compared to the baseline LLMs.

Experimental results on the DUC dataset for 200-word summaries show that the proposed framework significantly outperforms previously evaluated LLMs, with the exception of ELECTRA. One of the key strengths of the model lies in its ability to generate more meaningful and contextually consistent candidate summaries. In terms of ROUGE-1, which measures unigram overlap, the framework yields notable F1 score improvements: +0.0097 for ALBERT, +0.0146 for RoBERTa, +0.0229 for DistilBERT, and +0.0359 for Pegasus. The model also demonstrates strong performance in preserving contextual semantic integrity, as reflected by ROUGE-2 scores. Improvements include +0.0383 for ALBERT, +0.0108 for RoBERTa, +0.0185 for DistilBERT, and +0.0425 for Pegasus. ROUGE-L results, which emphasize the preservation of longer sentence structures, further confirm the model’s effectiveness. The proposed approach reports F1 gains of +0.0192 for ALBERT, +0.0099 for RoBERTa, +0.0301 for DistilBERT, and +0.0372 for Pegasus.

The results also indicate an improvement in computational efficiency. According to the ROUGE-W metric, which captures weighted sequence overlap, the proposed model yields F1 score gains of +0.0067 for ALBERT, +0.0156 for XLNet, and +0.0198 for Pegasus. In terms of contextual ordering and sentence consistency, evaluated through the ROUGE-S and ROUGE-SU metrics, the model reports improvements of +0.0082 and +0.0081 for ALBERT and +0.0332 for Pegasus. Figure 7 provides a graphical summary of the ROUGE-F1 score variations for the DUC dataset with 200-word summaries.

Experiments on the CNN/DailyMail dataset with three-word summaries reveal that the proposed model achieves notable performance gains, particularly when used with XLNet, DistilBERT, and ALBERT. According to the ROUGE-1 metric, which measures word-level overlap, F1 score improvements were observed as follows: +0.0578 for XLNet, +0.1321 for DistilBERT, and +0.0361 for ALBERT. In contrast, the model showed limited or no improvement with RoBERTa and Pegasus. These results indicate that the framework performs especially well with models such as XLNet and ALBERT.

The ROUGE-2 metric measures the contextual accuracy of summaries over two-word sequences. The proposed model demonstrates notable improvements in F1 scores, with enhancements observed in XLNet (+0.0406), ALBERT (+0.0812), and DistilBERT (+0.0147), as shown in Table 5. The ROUGE-L metric, which evaluates the preservation of long sentence structures, revealed positive impacts on F1 scores for XLNet (+0.0557) and ALBERT (+0.0303). However, no improvements were observed in the other LLMs. RoBERTa was disadvantaged in selections lacking contextual awareness. For the ROUGE-W metric, F1 improvements were again observed for XLNet (+0.0341) and ALBERT (+0.0196). Notably, the substantial improvement observed in the XLNet model indicates the effective integration of the proposed approach. In the ROUGE-S and ROUGE-SU metrics, which assess contextual structure and sentence sequencing, the proposed model shows gains of +0.08261 and +0.0257 for XLNet and +0.0087 and +0.0083 for ALBERT, respectively. Figure 8 presents a graphical analysis reflecting the variations in ROUGE F1 scores observed for the CNN/DailyMail dataset and three-word summaries.

The results indicate that the proposed model yields notable performance improvements with XLNet and ALBERT on the CNN/DailyMail dataset. While DistilBERT also showed positive gains, slight performance drops were observed for RoBERTa and Pegasus.

Table 6 presents the processing times of various transformer-based LLMs across two datasets (DUC and CNN/DailyMail) and three summary lengths (200 words, 400 words, and 3 words). Each cell displays the model’s processing times before and after applying the proposed summarization approach, along with the percentage reduction (gain). Notably, time reductions were observed across all models. Among these, ELECTRA demonstrated the fastest processing speeds, while Pegasus exhibited the highest percentage gains. Due to its large size and computational complexity, Pegasus also had the longest initial processing times. Nevertheless, the proposed framework yielded a remarkable 63.1% reduction in its processing time for three-word summaries. DistilBERT, as a compressed LLM, initially displayed shorter processing times; however, it achieved a 57.69% time reduction for 200-word summaries with the proposed approach. These findings suggest that the framework is effective for both large-scale and compressed models. ELECTRA, recognized as one of the fastest baseline models, also demonstrated gains of up to 58.56%. The most striking improvement was observed with ALBERT, which achieved a 60.22% reduction in processing time for DUC-200 and a 51.85% reduction for short text summarization, making it the most efficient model under the proposed framework. Figure 9 illustrates the time improvements observed across the different transformer-based LLMs, datasets, and summarization lengths.
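The percentage reduction (gain) reported per cell is assumed here to be the relative decrease from the baseline time, which the following Python sketch computes. The function name and the timings in the example are hypothetical and are not values from Table 6.

```python
def time_reduction_percent(before_s, after_s):
    """Relative reduction in processing time, in percent of the baseline time."""
    if before_s <= 0:
        raise ValueError("baseline time must be positive")
    return (before_s - after_s) / before_s * 100.0


# Hypothetical timings in seconds, for illustration only.
baseline_time = 10.0
framework_time = 4.0
gain = time_reduction_percent(baseline_time, framework_time)
print(f"{gain:.2f}% reduction")
```

Under this definition, a model that drops from 10 s to 4 s has a 60% gain, regardless of how its absolute times compare with those of other models.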

Time reductions ranged from 54% to 69% for 200-word summaries, indicating that longer texts enhance the framework’s efficiency. For 400-word summaries, the gains were between 48% and 62%. These results show that the proposed approach remains effective even as the text length increases. For short summaries, time reductions varied between 25% and 63%. Since processing times are already low for shorter texts, the relative impact of the framework is more limited. Overall, the proposed method offers substantial time savings for both large and lightweight LLMs. A t-test was conducted to evaluate the impact of the MIS framework across different summary lengths (400-word, 200-word, and 3-word) and ROUGE metrics.

As shown in Table 7, the p-values for all summary lengths and each ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU) are below 0.05. This finding indicates that the improvements in summarization performance are statistically significant and unlikely to be due to chance. The t-statistics and p-values for each summary length and ROUGE metric further support the robustness of the MIS framework and the effectiveness of the method.
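As an illustration of how such a test can be computed, the Python sketch below evaluates a one-sample t-statistic on per-model F1 differences, which is equivalent to a paired t-test on the before and after scores. Whether the test in this study was paired, and which samples it used, is not stated here, so this setup is an assumption and does not reproduce Table 7. The input values are the ROUGE-1 F1 gains reported above for the DUC dataset with 400-word summaries, and 2.571 is the standard two-sided 5% critical value of the t-distribution with 5 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(differences):
    """t-statistic and degrees of freedom for the mean of paired differences."""
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# ROUGE-1 F1 gains (DUC, 400 words): ALBERT, ELECTRA, RoBERTa, DistilBERT, Pegasus, XLNet.
rouge1_gains_duc400 = [0.0568, 0.0504, 0.0445, 0.0109, 0.0294, 0.0086]

t_stat, dof = paired_t_statistic(rouge1_gains_duc400)

# Two-sided critical value at alpha = 0.05 for 5 degrees of freedom.
T_CRITICAL_DOF5 = 2.571
significant = abs(t_stat) > T_CRITICAL_DOF5
print(f"t = {t_stat:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

With only six models, the test has few degrees of freedom, so a large t-statistic is needed to fall below the 0.05 threshold.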

6. Discussion

This study introduces an MIS-based graph filtering approach, integrated with LLMs, to enhance summarization performance. The proposed framework shows clear improvements in accuracy, semantic coherence, and computational efficiency, especially for 200-word summaries. Significant improvements were observed when using models such as XLNet, RoBERTa, DistilBERT, ALBERT, and Pegasus. However, ELECTRA performed relatively poorly, likely due to its masking-based strategy, which may not align well with the framework’s independent node selection. This highlights the need for tailored sentence selection strategies for models like ELECTRA.

Analyses of 400-word summaries show that the framework remains effective with longer content. The results confirm that it improves LLMs’ summarization and supports strong word-level representations. The notable gains with ALBERT likely stem from its ability to maintain semantic coherence. The summaries are structurally aligned with the original texts and preserve long-range contextual links. This suggests that even smaller LLMs can benefit from the framework. Selecting more information-dense sentences also enhances computational efficiency. By removing out-of-context content, the method produces clearer and more coherent summaries. In 200-word summaries, a consistency in word overlap, order, and sentence structure indicates the successful retention of key textual features. The use of MIS theory further supports contextual preservation by filtering irrelevant information.

Evaluations of three-word summaries showed a stronger integration with models like XLNet and ALBERT, leading to a better contextual coherence and handling of long sentence structures. Sentence selection was more effective with these models. The MIS-based structure reduced computational loads and enabled shorter, yet meaningful and dense, summaries. Pegasus, however, underperformed in short summaries, possibly due to its large architecture being less suited to structural modifications in brief contexts. In contrast, its performance improved significantly in 200- and 400-word summaries, suggesting better alignment with long-text summarization tasks.

Overall, the MIS-based approach has improved sentence selection in LLMs, leading to more realistic summaries. Strong results with models like ALBERT, despite its low parameter count, highlight the framework’s versatility. The MIS-driven structural transformation reduced computational demands, allowing a faster and resource-efficient generation of content-rich summaries.

Integrating structural and conceptual information has enabled more reliable inferences and clearer cause–effect relationships within the text. The graph-based structure preserved contextual links, reducing the risk of information loss. In conclusion, the experimental results show that combining MIS-based graph algorithms with LLMs creates an effective synergy in summarization, improving both accuracy and efficiency. The proposed framework offers valuable contributions to the field, both in theory and application.

By extending the capabilities of LLMs, the framework also improves their accessibility and practical utility.

7. Conclusions

The primary goal of this study is to address structural limitations in LLM-based text summarization and improve the accuracy, coherence, and efficiency of the generated summaries. Traditional LLMs emphasize contextual representations while neglecting structural features, often resulting in repetition, loss of context, and semantic drift, especially in long or complex texts. To overcome these issues, this study proposes a method to help LLMs produce more informative and reliable summaries.

To achieve this, a framework is proposed that represents texts as graphs and uses the MIS algorithm to filter sentences for more information-dense content. The method converts texts into sentence-level graph structures and passes the selected sentences to LLMs for summarization. Experimental results show clear improvements in ROUGE metrics. Processing time and energy efficiency also improved significantly.

A potential risk of the MIS algorithm is its tendency to overlook contextual information during sentence selection. In the proposed framework, this is addressed by modeling semantic relationships between sentences using graph structures. The MIS algorithm selects central, information-dense, and mutually dissimilar sentences to preserve the summary’s semantic scope. Performance gains in ROUGE-L and ROUGE-W, both sensitive to sequence and semantic consistency, indicate that contextual integrity is maintained. These results warrant a closer examination of how the MIS-based approach differs from prior graph-based summarization methods. Its novelty lies in structurally filtering content through MIS theory rather than relying on node centrality or attention weights. Instead, it identifies non-adjacent and thus semantically independent sentence clusters. This process removes irrelevant or weakly connected content, improving coherence and consistency while reducing LLMs’ processing load. The model-agnostic design also allows its integration with a wide range of Large Language Models. The experimental results confirm the value of incorporating structural information in graph-based summarizations.

The proposed framework also has limitations. In very short summaries, MIS-based filtering may exclude important contextual expressions, narrowing the informational scope and affecting semantic integrity, especially in real-time applications such as chatbots. Although only two datasets (DUC-2002 and CNN/DailyMail) were used, CNN/DailyMail includes texts from diverse domains (politics, health, sports, and economics), offering a degree of topic generalizability. Future work will explore applications in more technical domains, such as legal and biomedical texts. To assess its consistency, fluency, and informativeness, further analyses supported by human evaluation are planned. This will enable a more robust validation of the framework from both quantitative and qualitative perspectives.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LLM	Large Language Model
NLP	Natural Language Processing
MIS	Maximum Independent Set
DUC	Document Understanding Conference
CNN	Cable News Network
ROUGE	Recall-Oriented Understudy for Gisting Evaluation
GAT	Graph Attention Network
GNN	Graph Neural Network
OOV	Out-of-Vocabulary

References

Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
Floridi, L.; Chiriatti, M. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds Mach. 2020, 30, 681–694.
Shetty, A.M.; DH., M.; Aljunid, M.F. Fine-tuning XLNet for Amazon review sentiment analysis: A comparative evaluation of transformer models. ETRI J. 2025, 1–18.
Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 13–18 July 2020; pp. 11328–11339.
Pourkeyvan, A.; Safa, R.; Sorourkhah, A. Harnessing the Power of Hugging Face Transformers for Predicting Mental Health Disorders in Social Networks. IEEE Access 2024, 12, 28025–28035.
Tsai, Y.-H.; Chang, C.-M.; Chen, K.-H.; Hwang, S.-Y. An Integration of TextGCN and Autoencoder into Aspect-Based Sentiment Analysis; Springer: Berlin/Heidelberg, Germany, 2022; pp. 3–16.
Yang, S.; Duan, X.; Xiao, Z.; Li, Z.; Liu, Y.; Jie, Z.; Tang, D.; Du, H. Sentiment Classification of Chinese Tourism Reviews Based on ERNIE-Gram+GCN. Int. J. Environ. Res. Public Health 2022, 19, 13520.
Zeng, D.; Zha, E.; Kuang, J.; Shen, Y. Multi-label text classification based on semantic-sensitive graph convolutional network. Knowl.-Based Syst. 2024, 284, 111303.
Zuo, Y.; Wu, J.; Zhang, H.; Lin, H.; Wang, F.; Xu, K.; Xiong, H. Topic Modeling of Short Texts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 2105–2114.
Zhao, K.; Huang, L.; Song, R.; Shen, Q.; Xu, H. A Sequential Graph Neural Network for Short Text Classification. Algorithms 2021, 14, 352.
Mouratidis, D.; Kermanidis, K.L. Ensemble and Deep Learning for Language-Independent Automatic Selection of Parallel Data. Algorithms 2019, 12, 26.
Chai, Z.; Zhang, T.; Wu, L.; Han, K.; Hu, X.; Huang, X.; Yang, Y. GraphLLM: Boosting Graph Reasoning Ability of Large Language Model. arXiv 2023, arXiv:2310.05845. Available online: https://arxiv.org/abs/2310.05845 (accessed on 1 March 2025).
Chen, Z.; Mao, H.; Li, H.; Jin, W.; Wen, H.; Wei, X.; Wang, S.; Yin, D.; Fan, W.; Liu, H.; et al. Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. arXiv 2024, arXiv:2307.03393. Available online: https://arxiv.org/abs/2307.03393 (accessed on 1 March 2025).
Fatemi, B.; Halcrow, J.; Perozzi, B. Talk like a Graph: Encoding Graphs for Large Language Models. arXiv 2023, arXiv:2310.04560. Available online: https://arxiv.org/abs/2310.04560 (accessed on 1 March 2025).
Guo, J.; Du, L.; Liu, H.; Zhou, M.; He, X.; Han, S. GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking. arXiv 2023, arXiv:2305.15066. Available online: https://arxiv.org/abs/2305.15066 (accessed on 2 March 2025).
Bian, J.; Huang, X.; Zhou, H.; Huang, T.; Zhu, S. GoSum: Extractive summarization of long documents by reinforcement learning and graph-organized discourse state. Knowl. Inf. Syst. 2024, 66, 7557–7580.
Chen, T.; Wang, X.; Yue, T.; Bai, X.; Le, C.X.; Wang, W. Enhancing Abstractive Summarization with Extracted Knowledge Graphs and Multi-Source Transformers. Appl. Sci. 2023, 13, 7753.
Zeng, G.-H.; Liu, Y.-Q.; Zhang, C.-Y.; Cai, H.-C.; Chen, C.L.P. Adaptive Multi-Document Summarization Via Graph Representation Learning. IEEE Trans. Cogn. Dev. Syst. 2024, 1–12.
Ivanisenko, T.V.; Demenkov, P.S.; Ivanisenko, V.A. An Accurate and Efficient Approach to Knowledge Extraction from Scientific Publications Using Structured Ontology Models, Graph Neural Networks, and Large Language Models. Int. J. Mol. Sci. 2024, 25, 11811.
Chen, J.; Yang, D. Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs. arXiv 2021, arXiv:2104.08400. Available online: https://arxiv.org/abs/2104.08400 (accessed on 2 March 2025).
Cui, P.; Hu, L.; Liu, Y. Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks. arXiv 2020, arXiv:2010.06253. Available online: https://arxiv.org/abs/2010.06253 (accessed on 4 March 2025).
Yasunaga, M.; Zhang, R.; Meelu, K.; Pareek, A.; Srinivasan, K.; Radev, D. Graph-based Neural Multi-Document Summarization. arXiv 2017, arXiv:1706.06681. Available online: https://arxiv.org/abs/1706.06681 (accessed on 3 March 2025).
Sun, G.; Wang, Y.; Niyato, D.; Wang, J.; Wang, X.; Poor, H.V.; Letaief, K.B. Large Language Model (LLM)-enabled Graphs in Dynamic Networking. IEEE Netw. 2024, 1.
Plaza, L.; Díaz, A.; Gervás, P. A semantic graph-based approach to biomedical summarisation. Artif. Intell. Med. 2011, 53, 1–14.
Givchi, A.; Ramezani, R.; Baraani-Dastjerdi, A. Graph-based abstractive biomedical text summarization. J. Biomed. Inform. 2022, 132, 104099.
Zhu, X.; Xue, H.; Zhao, Z.; Xu, W.; Huang, J.; Guo, M.; Wang, Q.; Zhou, K.; Zhang, Y. LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models. arXiv 2025, arXiv:2503.03313. Available online: https://arxiv.org/abs/2503.03313 (accessed on 3 March 2025).
Document Understanding Conferences. Available online: http://duc.nist.gov (accessed on 14 July 2021).
See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. arXiv 2017, arXiv:1704.04368.
Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out (WAS 2004); Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 25–26.
Hermann, K.M.; Kocisky, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching machines to read and comprehend. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 7–12 December 2015.
Moscibroda, T.; Wattenhofer, R. Maximal independent sets in radio networks. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of Distributed Computing, Las Vegas, NV, USA, 17–20 July 2005; ACM: New York, NY, USA, 2005; pp. 148–157.
McIlroy, D. The UNIX Time-Sharing System: Foreword. In Proceedings of the First ACM SIGPLAN Conference on History of Programming Languages (HOPL-I), Los Angeles, CA, USA, 1–3 June 1978; ACM: New York, NY, USA, 2005; pp. 207–212.
Mrzic, A.; Meysman, P.; Bittremieux, W.; Moris, P.; Cule, B.; Goethals, B.; Laukens, K. Grasping frequent subgraph mining for bioinformatics applications. BioData Min. 2018, 11, 20.
Uçkan, T.; Karcı, A. Extractive multi-document text summarization based on graph independent sets. Egypt. Inform. J. 2020, 21, 145–157.
Öztemiz, F. A greedy approach to solve maximum independent set problem: Differential Malatya independent set algorithm. Eng. Sci. Technol. Int. J. 2025, 63, 101995.
Yakut, S.; Öztemiz, F.; Karcı, A. A New Approach Based on Centrality Value in Solving the Maximum Independent Set Problem: Malatya Centrality Algorithm. Comput. Sci. 2023, 8, 16–23.
Jou, M.-J.; Chang, G.J. The Number Of Maximum Independent Sets In Graphs. Taiwan J. Math. 2000, 4, 685–695.
Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2025, arXiv:2303.18223. Available online: https://arxiv.org/abs/2303.18223 (accessed on 5 March 2025).
Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A Survey of Data Augmentation Approaches for NLP. arXiv 2021, arXiv:2105.03075. Available online: https://arxiv.org/abs/2105.03075 (accessed on 5 March 2025).
Novelli, C.; Casolari, F.; Rotolo, A.; Taddeo, M.; Floridi, L. Taking AI risks seriously: A new assessment model for the AI Act. AI Soc. 2024, 39, 2493–2497.
Cai, Y.; Mao, S.; Wu, W.; Wang, Z.; Liang, Y.; Ge, T.; Wu, C.; You, W.; Song, T.; Xia, Y.; et al. Low-code LLM: Graphical User Interface over Large Language Models. arXiv 2024, arXiv:2304.08103. Available online: https://arxiv.org/abs/2304.08103 (accessed on 2 March 2025).
Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confid. Comput. 2024, 4, 100211.
Zhongyu, S.; Zhou, W.; Ding, C.; Xia, M. Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image. ISPRS Int. J. Geo-Inf. 2022, 11, 165.
Kalyan, K.S. A survey of GPT-3 family large language models including ChatGPT and GPT-4. Nat. Lang. Process. J. 2024, 6, 100048.
Liu, Z.; Roberts, R.A.; Lal-Nag, M.; Chen, X.; Huang, R.; Tong, W. AI-based language models powering drug discovery and development. Drug Discov. Today 2021, 26, 2593–2607.
Kour, H.; Gupta, M.K. AI Assisted Attention Mechanism for Hybrid Neural Model to Assess Online Attitudes About COVID-19. Neural Process. Lett. 2023, 55, 2265–2304.
Marrone, S.; Sansone, C. On the transferability of adversarial perturbation attacks against fingerprint based authentication systems. Pattern Recognit. Lett. 2021, 152, 253–259.
Potamias, R.A.; Siolas, G.; Stafylopatis, A.-G. A transformer-based approach to irony and sarcasm detection. Neural Comput. Appl. 2020, 32, 17309–17320.
Yang, T.; Li, F.; Ji, D.; Liang, X.; Xie, T.; Tian, S.; Li, B.; Liang, P. Fine-grained depression analysis based on Chinese micro-blog reviews. Inf. Process. Manag. 2021, 58, 102681.
Capris, T.; Takagi, Y.; Figueiredo, D.; Henriques, J.; Pires, I.M. A Convolutional Neural Network-enabled IoT framework to verify COVID-19 hygiene conditions and authorize access to facilities. Procedia Comput. Sci. 2022, 203, 727–732.
Kinger, S.; Kinger, D.; Thakkar, S.; Bhake, D. Towards smarter hiring: Resume parsing and ranking with YOLOv5 and DistilBERT. Multimed. Tools Appl. 2024, 83, 82069–82087.
Wang, H.; Kang, X.; Ren, F. Emotion-Sentence-DistilBERT: A Sentence-BERT-Based Distillation Model for Text Emotion Classification; Springer: Berlin/Heidelberg, Germany, 2022; pp. 313–322.
Leteno, T.; Gourru, A.; Laclau, C.; Gravier, C. An Investigation of Structures Responsible for Gender Bias in BERT and DistilBERT; Springer: Berlin/Heidelberg, Germany, 2023; pp. 249–261.
Jojoa, M.; Eftekhar, P.; Nowrouzi-Kia, B.; Garcia-Zapirain, B. Natural language processing analysis applied to COVID-19 open-text opinions using a distilBERT model for sentiment categorization. AI Soc. 2024, 39, 883–890.
Qiu, K.; Zhang, Y.; Feng, Y.; Chen, F. LogAnomEX: An Unsupervised Log Anomaly Detection Method Based on Electra-DP and Gated Bilinear Neural Networks. J. Netw. Syst. Manag. 2025, 33, 33.
Yu, J.-L.; Dai, Q.-Q.; Li, G.-B. Deep learning in target prediction and drug repositioning: Recent advances and challenges. Drug Discov. Today 2022, 27, 1796–1814.
Ding, M. The road from MLE to EM to VAE: A brief tutorial. AI Open 2022, 3, 29–34.
Ye, Z.; Zuo, T.; Chen, W.; Li, Y.; Lu, Z. Textual emotion recognition method based on ALBERT-BiLSTM model and SVM-NB classification. Soft Comput. 2023, 27, 5063–5075.
Heim, E.; Ramia, J.A.; Hana, R.A.; Burchert, S.; Carswell, K.; Cornelisz, I.; Cuijpers, P.; El Chammay, R.; Noun, P.; van Klaveren, C.; et al. Step-by-step: Feasibility randomised controlled trial of a mobile-based intervention for depression among populations affected by adversity in Lebanon. Internet Interv. 2021, 24, 100380.
Hunhevicz, J.J.; Hall, D.M. Do you need a blockchain in construction? Use case categories and decision framework for DLT design options. Adv. Eng. Inform. 2020, 45, 101094.
Azmi, A.M.; Altmami, N.I. An abstractive Arabic text summarizer with user controlled granularity. Inf. Process. Manag. 2018, 54, 903–921.

Figure 1. Flowchart for improving extractive summarization using graph-based independent sets in LLMs.

Figure 2. Overall block diagram of the proposed framework.

Figure 3. Example of text-based graph.

Figure 4. Demonstration of graph-based content filtering with MIS selection.

Figure 5. Generalized flow diagram of LLMs. Adapted from [43].

Figure 6. Graphical analysis reflecting the observed variations in ROUGE F1 scores for the DUC dataset and 400-word summaries.

Figure 7. Graphical analysis reflecting the observed variations in ROUGE F1 scores for the DUC dataset and 200-word summaries.

Figure 8. Graphical analysis reflecting the observed variations in ROUGE F1 scores for the CNN/DailyMail dataset.

Figure 9. Effect of the proposed framework on processing time.

Table 1. Comparative analysis of the proposed method and state-of-the-art approaches.

Model	Main Contribution	Advantages	Disadvantages
[12] Graph + LLM	Context length reduction	Performance boost	Model heavily dependent on graph and LLM type
[14] Graph to Text	Converts graphs to text for LLM	Effective with suitable representation	Large-scale graphs less effective
[15] Benchmark for LLM Graph Understanding	Highlights critical factors affecting LLM	Good insight into LLM limits	LLM lags behind graph-specialized models
[16] GoSum	High ROUGE on PubMed/arXiv datasets	Effective summarization	Performance depends on document structure
[17] MultiBART-GAT	Abstractive summarization and info accuracy	Controls info accuracy	Computationally expensive
[18] HeterMDS	Maintains structure and content integrity	High-quality summaries	High integration cost
[19] Protein Interaction Pred.	High-accuracy interaction predictions	Meaningful explanations	Domain-specific
[20] Discourse, Action Graphs	Improves summary correctness	Success in automatic and human evals	Complexity of graphs
[21] Document-Level Graphs	Significant success on scientific datasets	Handles long documents well	Dataset dependency and high cost
[22] Multi-Document Summ.	Effective multi-doc summaries	High ROUGE on DUC-2004	No context integrity, repetitive sentences
[24] Biomed. Text Summ.	Uses UMLS semantic types	Meaningful biomedical summaries	Domain-specific
[25] Extended Biomed. Graphs	More robust graph creation	Improved node connections	Domain-specific
[26] Prompt-Based GNN	Task transferability improvements	Overcomes OOV issue	Ignores semantic similarity across graphs
Proposed framework	MIS-based structural filtering with LLM integration	High summarization accuracy, lower cost	Potential information loss in short summaries

Table 2. Comparison of text summarization datasets and their usage purposes.

Dataset	SD	MD	Data Size	Human Annotation	Purpose of Use
DUC-2002	no	yes	567 documents	yes	Information fusion–Content synthesis
CNN/DailyMail	yes	no	290,000 news articles	yes	Main idea extraction–Summary of information

Table 3. ROUGE scores for 400-word abstracts on the DUC dataset: baseline models versus the proposed framework (Prop.), with absolute improvements in parentheses.

ROUGE Metrics		ROUGE Performance Values (DUC—400-Word Abstracts)
		XLNet		RoBERTa		DistilBERT		ELECTRA		ALBERT		Pegasus
		Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.
Rouge 1	R	0.48489	0.49119▲ (+0.0063)	0.45074	0.50124▲ (+0.0505)	0.48725	0.49911▲ (+0.0118)	0.44350	0.49335▲ (+0.0498)	0.47411	0.53538▲ (+0.0612)	0.50153	0.52736▲ (+0.02583)
	P	0.47843	0.48946▲ (+0.0110)	0.45095	0.48953▲ (+0.0385)	0.48447	0.49456▲ (+0.0100)	0.44173	0.49230▲ (+0.0505)	0.47534	0.52823▲ (+0.0528)	0.49371	0.52692▲ (+0.03321)
	F1	0.48155	0.49018▲ (+0.0086)	0.45067	0.49519▲ (+0.0445)	0.48565	0.49659▲ (+0.0109)	0.44229	0.49278▲ (+0.0504)	0.47468	0.53156▲ (+0.0568)	0.49751	0.52693▲ (+0.02942)
Rouge 2	R	0.15739	0.18030▲ (+0.0229)	0.14651	0.18695▲ (+0.0404)	0.19473	0.19485▲ (+0.0001)	0.12382	0.16971▲ (+0.0458)	0.16493	0.23507▲ (+0.0701)	0.19069	0.22390▲ (+0.03321)
	P	0.15564	0.17936▲ (+0.0237)	0.14585	0.18261▲ (+0.0367)	0.19315	0.19170	0.12259	0.16900▲ (+0.0464)	0.16566	0.23143▲ (+0.0657)	0.18767	0.22250▲ (+0.03483)
	F1	0.15648	0.17978▲ (+0.0233)	0.14612	0.18471▲ (+0.0385)	0.19385	0.19317	0.12313	0.16935▲ (+0.0462)	0.16528	0.23314▲ (+0.0678)	0.18914	0.22313▲ (+0.03399)
Rouge L	R	0.44662	0.45390▲ (+0.0072)	0.41937	0.45791▲ (+0.0385)	0.44972	0.45778▲ (+0.0080)	0.40410	0.45331▲ (+0.0492)	0.43460	0.50005▲ (+0.0654)	0.46817	0.49639▲ (+0.02822)
	P	0.44077	0.45237▲ (+0.0116)	0.41951	0.44727▲ (+0.0277)	0.44708	0.45336▲ (+0.0062)	0.40212	0.45221▲ (+0.0500)	0.43577	0.49323▲ (+0.0574)	0.46088	0.49587▲ (+0.03499)
	F1	0.44360	0.45300▲ (+0.0094)	0.41927	0.45241▲ (+0.0331)	0.44821	0.45534▲ (+0.0071)	0.40283	0.45272▲ (+0.0498)	0.43514	0.49641▲ (+0.0612)	0.46442	0.49594▲ (+0.03152)
Rouge W	R	0.13854	0.13958▲ (+0.0010)	0.12577	0.14008▲ (+0.0143)	0.13673	0.14140▲ (+0.0046)	0.12054	0.13855▲ (+0.0180)	0.13356	0.15578▲ (+0.0222)	0.14722	0.15464▲ (+0.00742)
	P	0.21547	0.21928▲ (+0.0038)	0.19820	0.21550▲ (+0.0173)	0.21439	0.22085▲ (+0.0064)	0.18877	0.21788▲ (+0.0291)	0.21129	0.24207▲ (+0.0307)	0.22846	0.24374▲ (+0.01528)
	F1	0.16860	0.17051▲ (+0.0019)	0.15382	0.16972▲ (+0.0159)	0.16689	0.17231▲ (+0.0054)	0.14701	0.16936▲ (+0.0223)	0.16362	0.18944▲ (+0.0258)	0.17901	0.18916▲ (+0.01015)
Rouge S	R	0.21424	0.22180▲ (+0.0075)	0.18482	0.22801▲ (+0.0431)	0.22186	0.22857▲ (+0.0067)	0.18591	0.21787▲ (+0.0319)	0.20370	0.26192▲ (+0.0582)	0.22906	0.25236▲ (+0.0233)
	P	0.20871	0.21989▲ (+0.0111)	0.18417	0.21726▲ (+0.0330)	0.21888	0.22330▲ (+0.0044)	0.18348	0.21667▲ (+0.0331)	0.20467	0.25468▲ (+0.0500)	0.22189	0.25099▲ (+0.0291)
	F1	0.21129	0.22057▲ (+0.0092)	0.18420	0.22228▲ (+0.0380)	0.21999	0.22544▲ (+0.0054)	0.18420	0.21719▲ (+0.0329)	0.20411	0.25782▲ (+0.0537)	0.22526	0.25130▲ (+0.02604)
Rouge SU	R	0.21556	0.22310▲ (+0.0075)	0.18611	0.22934▲ (+0.0432)	0.22315	0.22988▲ (+0.0067)	0.18717	0.21921▲ (+0.0320)	0.20501	0.26325▲ (+0.0582)	0.23038	0.25369▲ (+0.02331)
	P	0.21001	0.22120▲ (+0.0111)	0.18547	0.21856▲ (+0.0330)	0.22017	0.22460▲ (+0.0044)	0.18474	0.21800▲ (+0.0332)	0.20598	0.25599▲ (+0.0500)	0.22319	0.25233▲ (+0.02914)
	F1	0.21259	0.22188▲ (+0.0092)	0.18550	0.22359▲ (+0.0380)	0.22128	0.22675▲ (+0.0054)	0.18546	0.21853▲ (+0.0330)	0.20541	0.25914▲ (+0.0537)	0.22657	0.25263▲ (+0.02606)

▲ Improvement from baseline.
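
The bracketed values in Table 3 are absolute differences between the proposed and baseline scores. As a minimal sketch, the corresponding relative gain can be computed as shown below. The function name `relative_gain` is hypothetical, and the inputs are the ALBERT ROUGE-2 F1 values from Table 3. The result is close to the 41.05% figure quoted in the abstract, although the text does not state that the figure was derived in exactly this way.

```python
def relative_gain(baseline: float, proposed: float) -> float:
    """Return the improvement over the baseline as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (proposed - baseline) / baseline * 100.0


# ALBERT, ROUGE-2 F1, DUC 400-word abstracts (values copied from Table 3).
albert_baseline, albert_proposed = 0.16528, 0.23314

absolute_delta = albert_proposed - albert_baseline          # what the table reports
gain_pct = relative_gain(albert_baseline, albert_proposed)  # relative view of the same gap

print(f"absolute: +{absolute_delta:.4f}, relative: +{gain_pct:.2f}%")
```

Reporting both views is useful here because a small absolute change on a low baseline, as is typical for ROUGE-2, can correspond to a large relative change.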

Table 4. ROUGE scores for 200-word abstracts on the DUC dataset: baseline models versus the proposed framework (Prop.), with absolute improvements in parentheses.

ROUGE Metrics		ROUGE Performance Values (DUC—200-Word Abstracts)
		XLNet		RoBERTa		DistilBERT		ELECTRA		ALBERT		Pegasus
		Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.
Rouge 1	R	0.38652	0.40223▲ (+0.0157)	0.36219	0.37387▲ (+0.0116)	0.36504	0.39377▲ (+0.0287)	0.39918	0.38935	0.41210	0.42355▲ (+0.0114)	0.42815	0.46058▲ (+0.03243)
	P	0.38991	0.39357▲ (+0.0036)	0.35462	0.37229▲ (+0.0176)	0.36654	0.38293▲ (+0.0163)	0.40108	0.39035	0.40984	0.41782▲ (+0.0079)	0.41472	0.45413▲ (+0.03941)
	F1	0.38774	0.39757▲ (+0.0098)	0.35814	0.37280▲ (+0.0146)	0.36516	0.38811▲ (+0.0229)	0.39980	0.38936	0.41066	0.42032▲ (+0.0096)	0.42110	0.45704▲ (+0.03594)
Rouge 2	R	0.10675	0.12093▲ (+0.0141)	0.08237	0.09336▲ (+0.0109)	0.08405	0.10225▲ (+0.0182)	0.12258	0.09785	0.13295	0.17328▲ (+0.0403)	0.14364	0.18557▲ (+0.04193)
	P	0.10868	0.11744▲ (+0.0087)	0.08075	0.09154▲ (+0.0107)	0.08120	0.09978▲ (+0.0185)	0.12155	0.09786	0.13196	0.16860▲ (+0.0366)	0.13817	0.18131▲ (+0.04314)
	F1	0.10762	0.11911▲ (+0.0114)	0.08151	0.09237▲ (+0.0108)	0.08251	0.10097▲ (+0.0184)	0.12147	0.09776	0.13240	0.17072▲ (+0.0383)	0.14079	0.18332▲ (+0.04253)
Rouge L	R	0.34285	0.36832▲ (+0.0254)	0.32800	0.4235▲ (+0.0955)	0.32469	0.36007▲ (+0.0353)	0.35715	0.35130	0.36813	0.38939▲ (+0.0212)	0.38714	0.42147▲ (+0.03433)
	P	0.34649	0.36020▲ (+0.0137)	0.32139	0.4235▲ (+0.0121)	0.32559	0.34981▲ (+0.0242)	0.35923	0.35256	0.36599	0.38244▲ (+0.0164)	0.37493	0.41512▲ (+0.04019)
	F1	0.34426	0.36397▲ (+0.0197)	0.32446	0.4235▲ (+0.099)	0.32458	0.35472▲ (+0.0301)	0.35790	0.35148	0.36678	0.38607▲ (+0.0192)	0.38073	0.41801▲ (+0.03728)
Rouge W	R	0.11470	0.12901▲ (+0.0143)	0.11311	0.11237	0.11165	0.12231▲ (+0.0106)	0.12332	0.12067	0.12695	0.13287▲ (+0.0059)	0.13477	0.15021▲ (+0.01544)
	P	0.18244	0.19869▲ (+0.0162)	0.17439	0.17599▲ (+0.0016)	0.17629	0.18719▲ (+0.0109)	0.19514	0.19063	0.19868	0.20661▲ (+0.0079)	0.20555	0.23306▲ (+0.02751)
	F1	0.14067	0.15635▲ (+0.0156)	0.13714	0.13708	0.13653	0.14789▲ (+0.0113)	0.15102	0.14762	0.15480	0.16159▲ (+0.0067)	0.16274	0.18259▲ (+0.01985)
Rouge S	R	0.13650	0.14431▲ (+0.0078)	0.12181	0.12721▲ (+0.0054)	0.12333	0.14294▲ (+0.0196)	0.15122	0.14147	0.15531	0.16666▲ (+0.0113)	0.15566	0.18696▲ (+0.0313)
	P	0.13886	0.13740	0.11690	0.12506▲ (+0.0081)	0.12242	0.13501▲ (+0.0125)	0.15210	0.14193	0.15335	0.15891▲ (+0.0055)	0.14543	0.18059▲ (+0.03516)
	F1	0.13706	0.14045▲ (+0.0033)	0.11901	0.12573▲ (+0.0067)	0.12205	0.13870▲ (+0.0166)	0.15114	0.14103	0.15391	0.16211▲ (+0.0082)	0.15005	0.18325▲ (+0.0332)
Rouge SU	R	0.13887	0.14676▲ (+0.0078)	0.12409	0.12956▲ (+0.0054)	0.12562	0.14532▲ (+0.0197)	0.15358	0.14382	0.15777	0.16910▲ (+0.0113)	0.15825	0.18957▲ (+0.03132)
	P	0.14126	0.13979	0.11910	0.12741▲ (+0.0083)	0.12476	0.13742▲ (+0.0126)	0.15449	0.14430	0.15580	0.16136▲ (+0.0055)	0.14791	0.18317▲ (+0.03526)
	F1	0.13944	0.14287▲ (+0.0034)	0.12125	0.12807▲ (+0.0068)	0.12436	0.14103▲ (+0.0166)	0.15352	0.14338	0.15636	0.16455▲ (+0.0081)	0.15258	0.18584▲ (+0.03326)

▲ Improvement from baseline.

Table 5. ROUGE scores for 3-word summaries on the CNN/DailyMail dataset: baseline models versus the proposed framework (Prop.), with absolute improvements in parentheses.

ROUGE Metrics		ROUGE Performance Values (CNN/DailyMail—3-Word Abstracts)
		XLNet		RoBERTa		DistilBERT		ELECTRA		ALBERT		Pegasus
		Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.	Baseline	Prop.
Rouge 1	R	0.20274	0.33916▲ (+0.1364)	0.31928	0.29687	0.34964	0.39377▲ (+0.0441)	0.39918	0.38935	0.25092	0.34894▲ (+0.0980)	0.34134	0.32971
	P	0.17316	0.18269▲ (+0.0095)	0.20929	0.17842	0.21146	0.38293▲ (+0.1714)	0.40108	0.39035	0.18516	0.18799▲ (+0.0028)	0.22404	0.17474
	F1	0.17462	0.23245▲ (+0.0578)	0.24024	0.21643	0.25597	0.38811▲ (+0.1321)	0.39980	0.38936	0.20367	0.23983▲ (+0.0361)	0.25771	0.22518
Rouge 2	R	0.02720	0.09733▲ (+0.0701)	0.10391	0.06482	0.12666	0.10225	0.12258	0.09785	0.04476	0.09292▲ (+0.0481)	0.10747	0.08703
	P	0.02932	0.05161▲ (+0.0222)	0.05958	0.04097	0.06796	0.09978▲ (+0.0318)	0.12155	0.09786	0.03120	0.04988▲ (+0.0186)	0.05933	0.04602
	F1	0.02525	0.06587▲ (+0.0406)	0.07303	0.04920	0.08628	0.10097▲ (+0.0146)	0.12147	0.09776	0.03484	0.06367▲ (+0.0288)	0.07460	0.05945
Rouge L	R	0.15971	0.28513▲ (+0.1254)	0.26122	0.24169	0.29066	0.27501	0.35715	0.35130	0.19718	0.27847▲ (+0.0812)	0.28569	0.26888
	P	0.14087	0.15279▲ (+0.0119)	0.16315	0.14827	0.17326	0.15642	0.35923	0.35256	0.14596	0.14947▲ (+0.0035)	0.18835	0.13953
	F1	0.13901	0.19480▲ (+0.0557)	0.19073	0.17801	0.21101	0.19464	0.35790	0.35148	0.16037	0.19068▲ (+0.0303)	0.21624	0.18105
Rouge W	R	0.07475	0.12937▲ (+0.0546)	0.12172	0.11413	0.13721	0.12538	0.12332	0.12067	0.09331	0.12843▲ (+0.0351)	0.13270	0.12461
	P	0.11091	0.11239▲ (+0.0014)	0.12616	0.11453	0.13400	0.11726	0.19514	0.19063	0.11145	0.11186▲ (+0.0004)	0.14386	0.10589
	F1	0.08302	0.11708▲ (+0.0340)	0.11670	0.11035	0.13165	0.11770	0.15102	0.14762	0.09679	0.11641▲ (+0.0196)	0.13138	0.11225
Rouge S	R	0.03106	0.10828▲ (+0.0772)	0.10053	0.07424	0.12366	0.09124	0.15122	0.14147	0.05610	0.10345▲ (+0.0473)	0.10697	0.09945
	P	0.02826	0.03471▲ (+0.0064)	0.03416	0.02551	0.04174	0.02900	0.15210	0.14193	0.02986	0.02772	0.03548	0.02697
	F1	0.02284	0.04899▲ (+0.0261)	0.04408	0.03507	0.05722	0.04094	0.15114	0.14103	0.03285	0.04162▲ (+0.0087)	0.04659	0.04090
Rouge SU	R	0.03960	0.12013▲ (+0.0805)	0.11230	0.08574	0.13499	0.10416	0.15358	0.14382	0.06701	0.11635▲ (+0.0493)	0.11915	0.11093
	P	0.03572	0.03878▲ (+0.0030)	0.04104	0.03059	0.04719	0.03388	0.15449	0.14430	0.03632	0.03239	0.04262	0.03094
	F1	0.02900	0.05471▲ (+0.0257)	0.05165	0.04152	0.06409	0.04754	0.15352	0.14338	0.03987	0.04817▲ (+0.0083)	0.05440	0.04657

▲ Improvement from baseline.

Table 6. Time savings of the proposed framework for varying abstract lengths in LLMs.

Models	Effect of the Proposed Framework on Processing Time (s)
	DUC		CNN/DailyMail
	200 Words	400 Words	3 Words
XLNet	196.54→77.23▲ (−60.71%)	185.35→82.17▲ (−55.67%)	43.26→24.27▲ (−43.90%)
RoBERTa	122.33→56.24▲ (−54.03%)	125.03→64.49▲ (−48.42%)	32.61→21.73▲ (−33.36%)
DistilBERT	64.14→27.14▲ (−57.69%)	67.58→33.43▲ (−50.53%)	21.57→13.33▲ (−38.20%)
ELECTRA	37.84→15.68▲ (−58.56%)	39.62→19.82▲ (−49.97%)	11.91→8.92▲ (−25.09%)
ALBERT	113.62→45.2▲ (−60.22%)	114.09→55.52▲ (−51.34%)	24.32→11.71▲ (−51.85%)
Pegasus	4478.08→1384.51▲ (−69.08%)	4843.66→1794.01▲ (−62.96%)	903.73→333.52▲ (−63.10%)

▲ Improvement from baseline.
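
The percentages in Table 6 follow from the before and after timings in each cell. The sketch below recomputes them for the DUC 200-word column and also reports the equivalent speedup factor, which the table does not list. The helper names `time_reduction` and `speedup` and the dictionary `duc_200` are hypothetical, and the timings are copied from Table 6.

```python
def time_reduction(before_s: float, after_s: float) -> float:
    """Return the percentage of processing time saved relative to the baseline."""
    return (before_s - after_s) / before_s * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the proposed run is than the baseline."""
    return before_s / after_s


# (baseline, proposed) processing times for DUC 200-word abstracts, from Table 6.
duc_200 = {
    "XLNet": (196.54, 77.23),
    "RoBERTa": (122.33, 56.24),
    "DistilBERT": (64.14, 27.14),
    "ELECTRA": (37.84, 15.68),
    "ALBERT": (113.62, 45.20),
    "Pegasus": (4478.08, 1384.51),
}

for model, (before, after) in duc_200.items():
    saved = time_reduction(before, after)
    factor = speedup(before, after)
    print(f"{model:<10} -{saved:.2f}%  ({factor:.2f}x faster)")
```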

Table 7. Statistical validation of ROUGE improvements via paired t-tests across summary length variants.

	Rouge Metrics	t-Statistic	p-Value
400-word	Rouge-1	5.420	0.0056
	Rouge-2	6.268	0.0033
	Rouge-L	7.407	0.0018
	Rouge-W	6.909	0.0023
	Rouge-S	8.129	0.0012
	Rouge-SU	9.186	0.0008
200-word	Rouge-1	7.301	0.0019
	Rouge-2	6.389	0.0031
	Rouge-L	9.231	0.0008
	Rouge-W	7.714	0.0015
	Rouge-S	10.157	0.0005
	Rouge-SU	8.337	0.0011
3-word	Rouge-1	10.156	0.0005
	Rouge-2	8.500	0.0011
	Rouge-L	12.509	0.0002
	Rouge-W	10.954	0.0004
	Rouge-S	9.436	0.0007
	Rouge-SU	15.922	0.0001
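
For readers unfamiliar with the test reported in Table 7, the sketch below shows how a paired t-statistic is formed from matched baseline and proposed scores. It pairs the six models' ROUGE-1 F1 values from Table 3, which is an assumption: this section does not state which values were paired for Table 7, so the output is not expected to reproduce the tabulated statistics. The function name `paired_t_statistic` is hypothetical.

```python
import math
import statistics


def paired_t_statistic(baseline, proposed):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least two pairs")
    # The test works on per-pair differences, not on the two samples separately.
    diffs = [p - b for b, p in zip(baseline, proposed)]
    std_dev = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if std_dev == 0:
        raise ValueError("all differences are identical; t is undefined")
    std_err = std_dev / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / std_err, len(diffs) - 1


# ROUGE-1 F1, DUC 400-word abstracts, in the model order of Table 3 (assumed pairing).
baseline_f1 = [0.48155, 0.45067, 0.48565, 0.44229, 0.47468, 0.49751]
proposed_f1 = [0.49018, 0.49519, 0.49659, 0.49278, 0.53156, 0.52693]

t_stat, dof = paired_t_statistic(baseline_f1, proposed_f1)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Converting the statistic to a p-value requires the Student t distribution; `scipy.stats.ttest_rel` returns both values directly if SciPy is available.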

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
