Contextual Hypergraph Networks for Enhanced Extractive Summarization: Introducing Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES)
Abstract
1. Introduction
- We propose a unique method for constructing a multi-element hypergraph that captures extensive semantic, narrative, and discourse relationships within texts, offering a more nuanced understanding of document structure.
- We develop the Contextual Homogenization Module (CHM), which effectively integrates the heterogeneous information present in different types of hyperedges, creating a uniform feature space that enhances data processing.
- Our Hypergraph Contextual Attention Module (HCA) implements a dual-level attention mechanism, refining the focus on the most informative parts of the hypergraph and significantly improving the relevance and coherence of the generated summaries.
- The Extractive Read-out Strategy we introduce optimizes the selection of sentences for the summary based on a combination of their intrinsic content value and their contextual importance, as mediated by their hypergraph connections.
2. Related Work
- Utilizing a hypergraph structure to capture complex relationships among sentences, including semantic, narrative, and discourse connections.
- Incorporating a Contextual Homogenization Module (CHM) to harmonize features from diverse hyperedges, enhancing data integration.
- Implementing a Hypergraph Contextual Attention Module (HCA) with a dual-level attention mechanism, focusing on the most salient information.
- Employing the innovative Extractive Read-out Strategy to ensure that summaries reflect the core themes and logical structure of the original text.
3. Methodology
- Input Document Processing: The document is segmented into individual sentences, which are then transformed into high-dimensional vector representations by using pre-trained embeddings.
- Hypergraph Construction: Sentences are represented as nodes in a hypergraph, with different types of hyperedges (semantic, narrative, and discourse) capturing various contextual relationships between sentences.
- Contextual Homogenization Module (CHM): This module harmonizes features from the different hyperedges to create a unified feature set for each node.
- Hypergraph Contextual Attention (HCA) Module: A dual-level attention mechanism is applied to focus on the most relevant sentences and their contextual relationships.
- Extractive Read-out Strategy: The most informative sentences are selected based on their relevance scores, ranked, and combined to form the final summary.
- Output Summary: The final summary is generated, capturing the core themes and logical structure of the original document.
3.1. Contextual Hypergraph Construction
3.1.1. Hypergraph Definition
- Each node represents a sentence in the document.
- Each hyperedge represents a group of sentences that share a specific contextual relationship, described further below; a standard formal statement of this structure follows this list.
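For reference, this description corresponds to the standard hypergraph formalism; the symbols below are assumed notation introduced for this outline rather than the authors' own:

```latex
\mathcal{H} = (\mathcal{V}, \mathcal{E}), \qquad
\mathcal{V} = \{s_1, \dots, s_n\}, \qquad
\mathcal{E} = \mathcal{E}^{\mathrm{sem}} \cup \mathcal{E}^{\mathrm{nar}} \cup \mathcal{E}^{\mathrm{dis}}, \qquad
H_{ij} =
\begin{cases}
1, & s_i \in e_j,\\
0, & \text{otherwise},
\end{cases}
\qquad H \in \{0,1\}^{|\mathcal{V}| \times |\mathcal{E}|},
```

where $H$ is the incidence matrix relating sentences (nodes) to hyperedges.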
3.1.2. Node Representation
3.1.3. Types of Hyperedges
- Semantic hyperedges: They connect nodes that share significant semantic similarities. The semantic similarity between two sentences $s_i$ and $s_j$ is quantified by using the cosine similarity of their embeddings $\mathbf{v}_i$ and $\mathbf{v}_j$: $\mathrm{sim}(s_i, s_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}$. A semantic hyperedge includes all sentence pairs whose similarity exceeds a predefined threshold $\tau$.
- Narrative hyperedges: They link consecutive sentences to maintain the narrative flow. Each narrative hyperedge connects sentences that are adjacent in the text, i.e., it joins each pair $\{s_k, s_{k+1}\}$ of neighboring sentences.
- Discourse hyperedges: They are formed based on the discourse role of sentences within the document structure (e.g., argumentative structure and exposition). Sentences fulfilling similar roles are grouped under the same hyperedge. A Python construction sketch for all three hyperedge types is provided after Algorithm 1.
3.1.4. Hypergraph Construction Algorithm
Algorithm 1: Hypergraph Construction Algorithm
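Complementing Algorithm 1, the following Python sketch shows one way the three hyperedge families could be assembled from sentence embeddings and discourse-role labels. The helper names, the threshold value, and the choice to let every qualifying sentence pair form its own semantic hyperedge are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def build_hyperedges(embeddings, discourse_roles, tau=0.75):
    """Construct semantic, narrative, and discourse hyperedges (illustrative sketch).

    embeddings     : (n, d) array of sentence embeddings, one row per sentence
    discourse_roles: list of n role labels, e.g. "claim", "evidence", "background"
    tau            : semantic-similarity threshold (illustrative value)
    Returns a dict mapping hyperedge type to a list of node-index sets.
    """
    n = embeddings.shape[0]
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T  # pairwise cosine similarities

    # Semantic hyperedges: sentence pairs whose similarity exceeds tau.
    semantic = [{i, j} for i in range(n) for j in range(i + 1, n) if sim[i, j] > tau]

    # Narrative hyperedges: adjacent sentences, preserving narrative flow.
    narrative = [{i, i + 1} for i in range(n - 1)]

    # Discourse hyperedges: group sentences that share the same discourse role.
    by_role = {}
    for i, role in enumerate(discourse_roles):
        by_role.setdefault(role, set()).add(i)
    discourse = [members for members in by_role.values() if len(members) > 1]

    return {"semantic": semantic, "narrative": narrative, "discourse": discourse}
```

In this reading, narrative hyperedges are always pairwise, while discourse hyperedges may span many sentences that share a role.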
3.2. Contextual Homogenization Module (CHM)
Algorithm 2: Contextual Homogenization Module (CHM)
3.2.1. Multimodal Factorized Bilinear Pooling (MFB)
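As context for the CHM, the sketch below follows the MFB formulation of Yu et al. [32]: two input vectors are projected into a shared space, fused with an element-wise product, sum-pooled over groups of k factors, and then power- and L2-normalized. The dimensions and the module interface are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MFBPooling(nn.Module):
    """Minimal sketch of Multimodal Factorized Bilinear pooling (Yu et al. [32])."""

    def __init__(self, dim_x, dim_y, out_dim=256, factor_k=5):
        super().__init__()
        self.out_dim, self.k = out_dim, factor_k
        self.proj_x = nn.Linear(dim_x, out_dim * factor_k)
        self.proj_y = nn.Linear(dim_y, out_dim * factor_k)

    def forward(self, x, y):
        fused = self.proj_x(x) * self.proj_y(y)                   # element-wise fusion
        fused = fused.view(-1, self.out_dim, self.k).sum(dim=2)   # sum pooling over k factors
        fused = torch.sign(fused) * torch.sqrt(torch.abs(fused) + 1e-12)  # power normalization
        return nn.functional.normalize(fused, dim=-1)             # L2 normalization
```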
3.2.2. Feature Integration across Hyperedges
- Semantic feature integration: For semantic hyperedges, the embeddings of connected nodes are pooled together by using an MFB-based approach, which enhances the feature representation by emphasizing shared semantic characteristics.
- Narrative feature integration: Narrative hyperedges connect consecutive sentences, capturing the flow of the narrative. The integration of these features aims to preserve the sequential integrity of the text.
- Discourse feature integration: Discourse hyperedges group sentences by their roles within the document’s structure. Integrating these features helps emphasize the rhetorical significance of each sentence grouping (a fusion sketch follows this list).
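One way to read the integration step is sketched below: for each node, the features carried by its semantic, narrative, and discourse hyperedges are aggregated separately and then fused into a single homogenized vector. The mean aggregation and the linear fusion layer are simplifying assumptions; in the paper's formulation, the MFB-based pooling of Section 3.2.1 would play the role of the fusion operator.

```python
import torch
import torch.nn as nn

class ContextualHomogenization(nn.Module):
    """Sketch of CHM-style fusion: per-hyperedge-type aggregation plus a learned fusion layer."""

    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)  # one slot per hyperedge type

    def forward(self, node_feats, hyperedges):
        """node_feats: (n, d) tensor; hyperedges: dict mapping type -> list of index sets."""
        pooled = []
        for etype in ("semantic", "narrative", "discourse"):
            agg = node_feats.clone()  # fall back to the node's own feature
            for members in hyperedges.get(etype, []):
                idx = torch.tensor(sorted(members))
                # Every member of the hyperedge receives the hyperedge mean (simplified).
                agg[idx] = node_feats[idx].mean(dim=0)
            pooled.append(agg)
        # Concatenate the three views and project back to a unified feature space.
        return torch.relu(self.fuse(torch.cat(pooled, dim=-1)))
```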
3.3. Hypergraph Contextual Attention Module (HCA)
Algorithm 3: Hypergraph Contextual Attention (HCA) Module
3.3.1. Attention Mechanism
3.3.2. Gating Mechanism
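A minimal sketch of how dual-level attention with gating could be realized is given below: node-level attention weights the members of each hyperedge into a hyperedge summary, hyperedge-level attention weights the hyperedges incident on each node into a contextual vector, and a sigmoid gate blends that context with the node's own representation. The layer shapes and scoring functions are assumptions for illustration, not the authors' exact parameterization.

```python
import torch
import torch.nn as nn

class HypergraphContextualAttention(nn.Module):
    """Sketch of dual-level (node-level and hyperedge-level) attention with a gating step."""

    def __init__(self, dim):
        super().__init__()
        self.node_score = nn.Linear(dim, 1)   # scores a node within a hyperedge
        self.edge_score = nn.Linear(dim, 1)   # scores a hyperedge for a node
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, hyperedges):
        """node_feats: (n, d) tensor; hyperedges: list of node-index sets (all types pooled)."""
        n, _ = node_feats.shape

        # Level 1: node-level attention -> one summary vector per hyperedge.
        edge_vecs, membership = [], [[] for _ in range(n)]
        for e_id, members in enumerate(hyperedges):
            idx = torch.tensor(sorted(members))
            alpha = torch.softmax(self.node_score(node_feats[idx]), dim=0)   # (|e|, 1)
            edge_vecs.append((alpha * node_feats[idx]).sum(dim=0))
            for i in idx.tolist():
                membership[i].append(e_id)
        edge_vecs = torch.stack(edge_vecs)                                    # (|E|, d)

        # Level 2: hyperedge-level attention -> contextual vector per node.
        context = torch.zeros_like(node_feats)
        for i, e_ids in enumerate(membership):
            if not e_ids:
                continue
            vecs = edge_vecs[torch.tensor(e_ids)]
            beta = torch.softmax(self.edge_score(vecs), dim=0)
            context[i] = (beta * vecs).sum(dim=0)

        # Gating: blend each node's own feature with its attended context.
        g = torch.sigmoid(self.gate(torch.cat([node_feats, context], dim=-1)))
        return g * node_feats + (1 - g) * context
```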
3.4. Integration and Output
3.5. Extractive Read-Out Strategy
3.5.1. Attention Pooling Mechanism
3.5.2. Sentence Selection for Summary
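A minimal reading of the selection step, under assumed scoring details: each sentence receives a relevance score (here, its similarity to an attention-pooled document vector), the top-k sentences are kept, and they are re-ordered by original position so the compiled summary preserves the document's flow. The scoring function and the value of k are illustrative.

```python
import torch

def extractive_readout(sentence_feats, sentences, k=3):
    """Select the k highest-scoring sentences and return them in document order.

    sentence_feats: (n, d) refined sentence representations (e.g., from the HCA module)
    sentences     : list of n raw sentence strings
    k             : summary length in sentences (illustrative default)
    """
    # Attention pooling: a document vector built from softmax-weighted sentence features.
    weights = torch.softmax(sentence_feats.mean(dim=-1), dim=0)       # (n,)
    doc_vec = (weights.unsqueeze(-1) * sentence_feats).sum(dim=0)     # (d,)

    # Relevance of each sentence = similarity to the pooled document vector.
    scores = sentence_feats @ doc_vec                                 # (n,)

    top = torch.topk(scores, k=min(k, len(sentences))).indices
    chosen = sorted(top.tolist())             # restore original document order
    return " ".join(sentences[i] for i in chosen)
```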
3.5.3. Summary Compilation
3.5.4. Optimization of Selection Criteria
4. Optimization and Training
Optimization Algorithm
Algorithm 4: Optimization Algorithm
- Early stopping: Monitor the ROUGE score on the validation set after each epoch, and stop training when the score does not improve for a predefined number of consecutive epochs.
- Grid search: Systematically vary parameters within specified ranges, and select the configuration that achieves the highest ROUGE score on the validation set (a minimal training-loop sketch follows this list).
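The early-stopping and grid-search procedure could be organized as in the sketch below. The routines train_one_epoch and rouge_on_validation are hypothetical stand-ins for the per-epoch training and validation-scoring steps, and the example grid values are placeholders.

```python
import itertools

def train_with_early_stopping(model, train_one_epoch, rouge_on_validation,
                              max_epochs=100, patience=10):
    """Stop when validation ROUGE has not improved for `patience` consecutive epochs."""
    best_rouge, stale = float("-inf"), 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        rouge = rouge_on_validation(model)
        if rouge > best_rouge:
            best_rouge, stale = rouge, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_rouge

def grid_search(build_model, train_one_epoch, rouge_on_validation, grid):
    """Exhaustively evaluate configurations and keep the best validation ROUGE."""
    best_cfg, best_rouge = None, float("-inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        model = build_model(**cfg)
        rouge = train_with_early_stopping(model, train_one_epoch, rouge_on_validation)
        if rouge > best_rouge:
            best_cfg, best_rouge = cfg, rouge
    return best_cfg, best_rouge

# Example grid (placeholder values):
# grid = {"learning_rate": [1e-4, 5e-5], "similarity_threshold": [0.7, 0.75, 0.8]}
```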
5. Experiments and Results
5.1. Datasets
- XSum: This dataset comprises news articles sourced from the British Broadcasting Corporation (BBC), known for their concise single-sentence summaries. It includes 203,028 articles for training, 11,273 for validation, and 11,332 for testing. Each document is accompanied by a professionally written summary, making it ideal for evaluating summarization on short texts [36].
- CNN/DailyMail: This dataset features news articles paired with multi-point bullet summaries, serving as a moderate-length document source for summarization tasks. The non-anonymized version of the dataset was used, segmented into 287,084 training samples, 13,367 validation samples, and 11,489 test samples. The structure of these summaries provides a unique challenge in capturing essential narrative threads across multiple bullet points [37].
- PubMed: As a representative of scientific and technical document summarization, the PubMed dataset contains lengthy abstracts of biomedical articles. It is composed of 83,233 training documents, 4,946 for validation, and 5,025 for testing. The comprehensive and technical nature of these summaries tests the ability of summarization models to handle complex, information-dense content [38].
5.2. Evaluation Metrics
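Assuming the commonly used rouge-score and bert-score Python packages as the implementation (an assumption on our part; the specific tooling is not named), per-summary ROUGE-1/2/L and BERTScore values can be computed as follows.

```python
from rouge_score import rouge_scorer        # pip install rouge-score
from bert_score import score as bert_score  # pip install bert-score

def evaluate_summary(reference, candidate):
    """Compute ROUGE-1/2/L F1 and BERTScore F1 for one reference/candidate pair."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    results = {name: s.fmeasure for name, s in scorer.score(reference, candidate).items()}

    _, _, f1 = bert_score([candidate], [reference], lang="en")
    results["bertscore_f1"] = f1.item()
    return results
```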
5.3. Baseline Models
- LEAD-3 is a heuristic baseline that extracts the first three sentences of a document, based on the assumption that the lead sentences contain key information. It serves as a simple yet effective comparison point.
- SummaRuNNer [41] is an RNN-based sequence model that evaluates the salience of sentences based on features extracted across the document to generate a summary.
- NeuSum [42] integrates sentence scoring and selection into a single joint model, using an RNN-based encoder–decoder framework to predict the saliency of sentence combinations directly.
- BanditSum [43] implements a reinforcement learning approach, treating the summarization task as a contextual bandit problem where the model learns to choose sentences that maximize the ROUGE metric.
- JECS [44] utilizes a joint extraction and compression strategy, employing a syntactic transformation of the input text to produce concise summaries.
- HIBERT [45] uses a hierarchical bidirectional transformer to encode long documents and perform document-level summarization.
- BERTSUMEXT [46] is the extractive variant of BERTSUM, which fine-tunes pre-trained BERT encoders with stacked inter-sentence Transformer layers to score sentences for extraction.
- NeRoBERTa [47] is a variant that adapts RoBERTa embeddings specifically for nested hierarchical document structures to enhance summarization.
- MatchSum [48] formulates the summarization task as a semantic matching problem between candidate sentences and the document, using a contrastive learning framework.
- HAHSum [49] incorporates a hierarchical attention mechanism with heterogeneous graph representations to refine the summarization process across multiple document levels.
- GenCompareSum [50] is a hybrid model that combines unsupervised abstractive techniques with extractive methods, tailored for summarizing long and complex documents, like scientific papers.
5.4. Experimental Settings
- Preprocessing: Texts were tokenized by using spaCy, and sentences were encoded by using pre-trained BERT-base embeddings to capture deep semantic features (an encoding sketch follows this list).
- Hypergraph construction: A hypergraph was constructed for each document, where nodes represent sentences and hyperedges encapsulate semantic, narrative, and discourse relations.
- Model architecture: The CHM and the HCA module are central to our framework, focusing on homogenizing features and applying dual-level attention, respectively.
- Optimizer: Adam, with a learning rate of 1 × 10⁻⁴.
- Batch size: 32 documents per batch to balance computational efficiency and training stability.
- Early stopping: employed based on the validation loss to prevent overfitting, with a patience parameter of 10 epochs.
- Setup: All baselines were implemented with their standard configurations as reported in their respective publications.
- Preprocessing: Uniform preprocessing was applied across all models for a fair comparison, including the same tokenization and embedding methods used for MCHES.
- Hyperparameter tuning: Each model was tuned individually on the validation set of each dataset to optimize performance.
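Consistent with the preprocessing settings above, a sentence-encoding pass might look like the sketch below; the specific spaCy pipeline name and the mean-pooling of BERT token states into one vector per sentence are assumptions, since only spaCy tokenization and pre-trained BERT-base embeddings are specified.

```python
import spacy
import torch
from transformers import AutoModel, AutoTokenizer

nlp = spacy.load("en_core_web_sm")  # assumed spaCy pipeline
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def encode_document(text):
    """Split a document into sentences and return one BERT-base vector per sentence."""
    sentences = [s.text.strip() for s in nlp(text).sents if s.text.strip()]
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True,
                          max_length=512, return_tensors="pt")
        hidden = bert(**batch).last_hidden_state        # (n, seq_len, 768)
        mask = batch["attention_mask"].unsqueeze(-1)    # (n, seq_len, 1)
        embeddings = (hidden * mask).sum(1) / mask.sum(1)  # mean over real tokens
    return sentences, embeddings                        # embeddings: (n, 768)
```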
5.5. Model Variants
- MCHES-NoCHM: This variant removes the Contextual Homogenization Module to assess the impact of feature homogenization across different hyperedges:
  - Includes: contextual hypergraph, HCA, and Extractive Read-out.
  - Excludes: CHM.
- MCHES-NoHCA: This version excludes the Hypergraph Contextual Attention Module, testing the importance of dual-level attention in enhancing summary relevance:
  - Includes: contextual hypergraph, CHM, and Extractive Read-out.
  - Excludes: HCA.
- MCHES-NoReadout: By omitting the Extractive Read-out Strategy, this variant evaluates the efficacy of the attention-based sentence selection mechanism:
  - Includes: contextual hypergraph, CHM, and HCA.
  - Excludes: Extractive Read-out.
- MCHES-BasicGraph: This model uses a simplified graph structure without the multi-element enhancements to determine the baseline effectiveness of using any hypergraph structure:
  - Includes: basic hypergraph, CHM, HCA, and Extractive Read-out.
  - Replaces the multi-element contextual hypergraph with a basic hypergraph.
5.6. Experimental Results
5.7. Manual Evaluation
- Coherence: how logically connected and consistent the summary is.
- Readability: how easy it is to read and understand the summary.
- Informativeness: how well the summary captures the essential information from the original document.
- Relevance: the relevance of the content included in the summary to the main topics of the original document.
6. Discussion
- Outstanding performance of MCHES:
  - MCHES outperforms the baseline models on all datasets, indicating its effective capture of semantic content and structure.
  - The substantial lead in MoverScore on the PubMed dataset suggests pronounced strength in processing technical and domain-specific texts.
- Significance of MCHES components:
  - The minimal performance drop for the MCHES-NoCHM and MCHES-NoHCA variants suggests a strong base model that is enhanced, rather than defined, by its individual components.
  - The MCHES-NoReadout variant, despite the absence of the read-out strategy, maintains competitive scores, underlining the robustness of the attention mechanisms within the model.
- Performance consistency across datasets:
  - MCHES exhibits consistent performance across different datasets, reinforcing the model’s adaptability and generalizability.
  - The consistently high performance of MCHES-BasicGraph, particularly on the XSum dataset, challenges the assumption that model complexity is a prerequisite for higher semantic alignment.
- Implications for future research:
  - The results prompt a re-evaluation of component contributions in complex models, suggesting potential areas for simplification without significant performance losses.
  - Future research might explore the trade-off between model complexity and performance, especially in the context of domain-specific summarization tasks.
7. Limitations
- Computational complexity: The construction and processing of hypergraphs, especially with multiple types of hyperedges (semantic, narrative, and discourse), can be computationally intensive. This complexity might limit the scalability of MCHES for very large datasets or real-time applications.
- Training data requirements: Like many advanced NLP models, MCHES requires a substantial amount of training data to perform effectively. The requirement for large annotated datasets can be a barrier, particularly in specialized domains where such data may not be readily available.
- Domain specificity: While MCHES has shown superior performance across various datasets, the model’s effectiveness might vary when applied to domain-specific texts not represented in the training data. Adaptation to new domains may require additional fine-tuning.
- Interpretability: The complexity of the hypergraph-based model and the dual-level attention mechanism may reduce the interpretability of the summarization process. Users might find it challenging to understand how specific sentences are selected for inclusion in the summary.
- Resource-intensive nature: The model’s requirements for computational resources, including memory and processing power, are higher compared with simpler models. This could be a limiting factor for deployment in resource-constrained environments.
8. Conclusions
- The MCHES framework significantly outperforms baseline models in terms of ROUGE scores and BERTScore, affirming its effectiveness in capturing salient information and maintaining the semantic integrity of the original texts.
- The ablation study revealed the importance of each component in the MCHES framework, illustrating their synergistic effect on the model’s overall performance.
- Despite the complexity of the summarization task, particularly within specialized domains such as the biomedical literature, MCHES maintained a consistent level of high performance, underscoring its robustness and versatility.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, Y.; Lapata, M. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3730–3740. Available online: https://aclanthology.org/D19-1387 (accessed on 1 April 2024).
- Moratanch, N.; Chitrakala, S. A survey on extractive text summarization. In Proceedings of the 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), IEEE, Chennai, India, 10–11 January 2017. [Google Scholar]
- Gupta, V.; Lehal, G.S. A survey of text summarization extractive techniques. J. Emerg. Technol. Web Intell. 2010, 2, 258–268. [Google Scholar] [CrossRef]
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic text summarization: A comprehensive survey. Expert Syst. Appl. 2021, 165, 113679. [Google Scholar] [CrossRef]
- Mao, R.; Chen, G.; Zhang, X.; Guerin, F.; Cambria, E. GPTEval: A survey on assessments of ChatGPT and GPT-4. arXiv 2023, arXiv:2308.12488. [Google Scholar]
- Yenduri, G.; Ramalingam, M.; Selvi, G.C.; Supriya, Y.; Srivastava, G.; Maddikunta, P.K.R.; Depti, R.G.; Rutvij, H.J.; Prabadevi, B.; Wang, W.; et al. GPT (Generative Pre-trained Transformer)–A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. arXiv 2023, arXiv:2305.10435. [Google Scholar] [CrossRef]
- Kalyan, K.S. A survey of GPT-3 family large language models including ChatGPT and GPT-4. Nat. Lang. Process. J. 2023, 6, 100048. [Google Scholar] [CrossRef]
- Onan, A.; Balbal, K.F. Improving Turkish text sentiment classification through task-specific and universal transformations: An ensemble data augmentation approach. IEEE Access 2024, 12, 4413–4458. [Google Scholar] [CrossRef]
- Nasution, A.H.; Onan, A. ChatGPT Label: Comparing the Quality of Human-Generated and LLM-Generated Annotations in Low-resource Language NLP Tasks. IEEE Access 2024, 12, 71876–71900. [Google Scholar] [CrossRef]
- Yadav, A.K.; Ranvijay; Yadav, R.S.; Maurya, A.K. State-of-the-art approach to extractive text summarization: A comprehensive review. Multimed. Tools Appl. 2023, 82, 29135–29197. [Google Scholar] [CrossRef]
- Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; Tan, J. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv 2024, arXiv:2403.02901. [Google Scholar]
- Van Lierde, H.; Chow, T.W. Query-oriented text summarization based on hypergraph transversals. Inf. Process. Manag. 2019, 56, 1317–1338. [Google Scholar] [CrossRef]
- Wang, W.; Wei, F.; Li, W.; Li, S. Hypersum: Hypergraph based semi-supervised sentence ranking for query-oriented summarization. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, 2–6 November 2009; pp. 1855–1858. [Google Scholar]
- Zhang, H.; Liu, X.; Zhang, J. HEGEL: Hypergraph transformer for long document summarization. arXiv 2022, arXiv:2210.04126. [Google Scholar]
- Onan, A. GTR-GA: Harnessing the power of graph-based neural networks and genetic algorithms for text augmentation. Expert Syst. Appl. 2023, 232, 120908. [Google Scholar] [CrossRef]
- Onan, A. Hierarchical graph-based text classification framework with contextual node embedding and BERT-based dynamic fusion. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101610. [Google Scholar] [CrossRef]
- Onan, A. SRL-ACO: A text augmentation framework based on semantic role labeling and ant colony optimization. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101611. [Google Scholar] [CrossRef]
- Gulati, V.; Kumar, D.; Popescu, D.E.; Hemanth, J.D. Extractive article summarization using integrated TextRank and BM25+ algorithm. Electronics 2023, 12, 372. [Google Scholar] [CrossRef]
- Yadav, J.; Meena, Y.K. Use of fuzzy logic and WordNet for improving performance of extractive automatic text summarization. In Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, Jaipur, India, 21–24 September 2016; pp. 2071–2077. [Google Scholar] [CrossRef]
- Kumar, A.; Sharma, A.; Nayyar, A. Fuzzy logic-based hybrid model for automatic extractive text summarization. In Proceedings of the 2020 5th International Conference on Intelligent Information Technology, Hanoi, Vietnam, 19–22 February 2020; pp. 7–15. [Google Scholar] [CrossRef]
- Grail, Q.; Perez, J.; Gaussier, E. Globalizing BERT-based transformer architectures for long document summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, Virtual Event, 19–23 April 2021; pp. 1792–1810. Available online: https://aclanthology.org/2021.eacl-main.154 (accessed on 1 April 2024).
- Bharathi Mohan, G.; Prasanna Kumar, R.; Parathasarathy, S.; Aravind, S.; Hanish, K.B.; Pavithria, G. Text summarization for big data analytics: A comprehensive review of GPT-2 and BERT approaches. In Data Analytics for Internet of Things Infrastructure; Springer: Berlin/Heidelberg, Germany, 2023; pp. 247–264. [Google Scholar]
- Mallick, C.; Das, A.K.; Dutta, M.; Das, A.K.; Sarkar, A. Graph-based text summarization using modified TextRank. In Soft Computing in Data Analytics: Proceedings of International Conference on SCDA 2018; Springer: Singapore, 2019; pp. 137–146. [Google Scholar]
- Erkan, G.; Radev, D.R. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 2004, 22, 457–479. [Google Scholar] [CrossRef]
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. EdgeSumm: Graph-based framework for automatic text summarization. Inf. Process. Manag. 2020, 57, 102264. [Google Scholar] [CrossRef]
- Belwal, R.C.; Rai, S.; Gupta, A. A new graph-based extractive text summarization using keywords or topic modeling. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 8975–8990. [Google Scholar] [CrossRef]
- Fatima, Q.; Cenek, M. New graph-based text summarization method. In Proceedings of the 2015 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), IEEE, Victoria, BC, Canada, 24–26 August 2015; pp. 396–401. [Google Scholar]
- Suleiman, D.; Awajan, A. Deep learning based abstractive text summarization: Approaches, datasets, evaluation measures, and challenges. Math. Probl. Eng. 2020, 2020, 9365340. [Google Scholar] [CrossRef]
- Joshi, A.; Fidalgo, E.; Alegre, E.; de León, U. Deep learning based text summarization: Approaches, databases and evaluation measures. In Proceedings of the International Conference of Applications of Intelligent Systems, Las Palmas de Gran Canaria, Spain, 10–12 January 2018. [Google Scholar]
- Song, S.; Huang, H.; Ruan, T. Abstractive text summarization using LSTM-CNN based deep learning. Multimed. Tools Appl. 2019, 78, 857–875. [Google Scholar] [CrossRef]
- Zhang, M.; Zhou, G.; Yu, W.; Huang, N.; Liu, W. A comprehensive survey of abstractive text summarization based on deep learning. Comput. Intell. Neurosci. 2022, 2022, 7132226. [Google Scholar] [CrossRef] [PubMed]
- Yu, Z.; Yu, J.; Fan, J.; Tao, D. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1821–1830. [Google Scholar]
- Ji, G.; Liu, K.; He, S.; Zhao, J. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 1. [Google Scholar]
- Lin, C.Y.; Och, F.J. Looking for a few good metrics: ROUGE and its evaluation. In Proceedings of the Ntcir Workshop, Tokyo, Japan, 2–4 June 2004. [Google Scholar]
- Barbella, M.; Tortora, G. Rouge Metric Evaluation for Text Summarization Techniques; SSRN 4120317; Elsevier: Rochester, NY, USA, 2022. [Google Scholar]
- Hasan, T.; Bhattacharjee, A.; Islam, M.S.; Samin, K.; Li, Y.F.; Kang, Y.B.; Rahman, S.M.; Shahriyar, R. XL-sum: Large-scale multilingual abstractive summarization for 44 languages. arXiv 2021, arXiv:2106.13822. [Google Scholar]
- Asif, M.H.; Yaseen, A.U. Comparative Evaluation of Text Similarity Matrices for Enhanced Abstractive Summarization on CNN/Dailymail Corpus. J. Comput. Biomed. Inform. 2023, 6, 208–215. [Google Scholar]
- Gupta, V.; Bharti, P.; Nokhiz, P.; Karnick, H. SumPubMed: Summarization dataset of PubMed scientific articles. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, Virtual Event, 1–6 August 2021; pp. 292–303. [Google Scholar]
- Colombo, P.; Staerman, G.; Clavel, C.; Piantanida, P. Automatic text evaluation through the lens of Wasserstein barycenters. arXiv 2021, arXiv:2108.12463. [Google Scholar]
- Zhao, W.; Peyrard, M.; Liu, F.; Gao, Y.; Meyer, C.M.; Eger, S. MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. arXiv 2019, arXiv:1909.02622. [Google Scholar]
- Narayan, S.; Cohen, S.B.; Lapata, M. Don’t give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. arXiv 2018, arXiv:1808.08745. [Google Scholar]
- Zhou, Q.; Yang, N.; Wei, F.; Huang, S.; Zhou, M.; Zhao, T. Neural document summarization by jointly learning to score and select sentences. arXiv 2018, arXiv:1807.02305. [Google Scholar]
- Dong, Y.; Shen, Y.; Crawford, E.; van Hoof, H.; Cheung, J.C.K. BanditSum: Extractive summarization as a contextual bandit. arXiv 2018, arXiv:1809.09672. [Google Scholar]
- Xu, J.; Durrett, G. Neural extractive text summarization with syntactic compression. arXiv 2019, arXiv:1902.00863. [Google Scholar]
- Zhang, X.; Wei, F.; Zhou, M. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. arXiv 2019, arXiv:1905.06566. [Google Scholar]
- Liu, Y.; Lapata, M. Text summarization with pretrained encoders. arXiv 2019, arXiv:1908.08345. [Google Scholar]
- Kwon, J.; Kobayashi, N.; Kamigaito, H.; Okumura, M. Considering nested tree structure in sentence extractive summarization with pre-trained transformer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 4039–4044. [Google Scholar]
- Zhong, M.; Liu, P.; Chen, Y.; Wang, D.; Qiu, X.; Huang, X. Extractive summarization as text matching. arXiv 2020, arXiv:2004.08795. [Google Scholar]
- Liu, Y.; Zhang, J.G.; Wan, Y.; Xia, C.; He, L.; Yu, P.S. HETFORMER: Heterogeneous transformer with sparse attention for long-text extractive summarization. arXiv 2021, arXiv:2110.06388. [Google Scholar]
- Bishop, J.; Xie, Q.; Ananiadou, S. GenCompareSum: A hybrid unsupervised summarization method using salience. In Proceedings of the 21st Workshop on Biomedical Language Processing, Dublin, Ireland, 26 May 2022; pp. 220–240. [Google Scholar]
| Reference | Technology Used | Performance/Advantages | Disadvantages |
|---|---|---|---|
| Liu and Lapata (2019) [1] | BERT-based encoder–decoder | High performance in capturing contextual nuances; improved ROUGE scores | Computationally expensive; requires large datasets for training |
| Moratanch and Chitrakala (2017) [2] | Statistical and linguistic features | Simple implementation; effective for basic tasks | Fails to capture deep semantic relationships; limited contextual understanding |
| Gupta and Lehal (2010) [3] | TF-IDF, frequency-based methods | Easy to implement; interpretable results | Does not capture document structure; poor performance on long texts |
| El-Kassas et al. (2021) [4] | Deep learning models (RNN, CNN, and Transformer) | Superior performance on complex texts; high accuracy | High computational cost; requires extensive training data |
| Yadav et al. (2023) [10] | Graph-based models (TextRank and LexRank) | Capture relational information among sentences; improved coherence | May struggle with very large documents; computationally intensive |
| Van Lierde and Chow (2019) [12] | Hypergraph-based summarization | Flexible representation of relationships; better integration of document context | Complexity in construction and processing of hypergraphs; requires sophisticated algorithms |
| Dataset | Domain | # Pairs | Avg. Doc Tokens | Avg. Sum Tokens |
|---|---|---|---|---|
| XSum | News | 226,711 | 431 | 23 |
| CNN/DailyMail | News | 311,971 | 781 | 56 |
| PubMed | Scientific | 93,207 | 3,011 | 203 |
ROUGE scores on the CNN/DailyMail dataset:

| Model | R-1 | R-2 | R-L |
|---|---|---|---|
| LEAD-3 | 39.447 | 20.086 | 38.208 |
| SummaRuNNer | 34.862 | 17.258 | 33.516 |
| NeuSum | 37.573 | 18.106 | 33.968 |
| BanditSum | 37.898 | 18.287 | 33.986 |
| JECS | 37.956 | 19.051 | 35.499 |
| HIBERT | 38.378 | 20.029 | 35.806 |
| BERTSUMEXT | 39.638 | 20.161 | 38.269 |
| NeRoBERTa | 40.228 | 21.101 | 38.836 |
| MatchSum | 40.584 | 21.512 | 39.857 |
| HAHSum | 40.929 | 21.546 | 40.043 |
| GenCompareSum | 41.094 | 21.568 | 40.311 |
| MCHES (proposed model) | 44.756 | 24.963 | 42.477 |
| MCHES-NoCHM | 43.847 | 23.491 | 41.601 |
| MCHES-NoHCA | 42.236 | 21.629 | 40.616 |
| MCHES-NoReadout | 42.709 | 22.295 | 41.271 |
| MCHES-BasicGraph | 43.599 | 23.394 | 41.376 |
ROUGE scores on the XSum dataset:

| Model | R-1 | R-2 | R-L |
|---|---|---|---|
| LEAD-3 | 25.512 | 7.355 | 16.069 |
| SummaRuNNer | 23.194 | 5.626 | 14.867 |
| NeuSum | 24.194 | 5.699 | 15.056 |
| BanditSum | 24.577 | 6.416 | 15.150 |
| JECS | 25.005 | 6.709 | 15.332 |
| HIBERT | 25.145 | 6.728 | 15.582 |
| BERTSUMEXT | 25.145 | 6.747 | 15.631 |
| NeRoBERTa | 25.225 | 7.011 | 15.723 |
| MatchSum | 25.431 | 7.022 | 15.893 |
| HAHSum | 25.450 | 7.239 | 15.901 |
| GenCompareSum | 25.593 | 7.552 | 16.199 |
| MCHES (proposed model) | 26.422 | 8.150 | 17.318 |
| MCHES-NoCHM | 25.775 | 7.700 | 16.498 |
| MCHES-NoHCA | 26.059 | 7.771 | 16.758 |
| MCHES-NoReadout | 26.174 | 7.938 | 17.311 |
| MCHES-BasicGraph | 25.605 | 7.576 | 16.260 |
ROUGE scores on the PubMed dataset:

| Model | R-1 | R-2 | R-L |
|---|---|---|---|
| LEAD-3 | 39.554 | 12.822 | 33.352 |
| SummaRuNNer | 37.949 | 12.759 | 32.537 |
| NeuSum | 40.496 | 13.909 | 33.353 |
| BanditSum | 40.601 | 13.987 | 33.983 |
| JECS | 41.120 | 14.095 | 34.129 |
| HIBERT | 41.152 | 14.410 | 34.186 |
| BERTSUMEXT | 41.161 | 14.964 | 34.207 |
| NeRoBERTa | 41.234 | 15.049 | 34.881 |
| MatchSum | 41.527 | 15.214 | 34.955 |
| HAHSum | 41.672 | 15.270 | 34.963 |
| GenCompareSum | 41.892 | 15.345 | 35.222 |
| MCHES (proposed model) | 44.321 | 19.129 | 37.505 |
| MCHES-NoCHM | 41.938 | 15.710 | 35.415 |
| MCHES-NoHCA | 42.148 | 16.605 | 35.683 |
| MCHES-NoReadout | 42.294 | 16.842 | 36.137 |
| MCHES-BasicGraph | 42.533 | 17.873 | 36.968 |
BERTScore results across datasets:

| Model | CNN/DailyMail | XSum | PubMed |
|---|---|---|---|
| LEAD-3 | 58.261 | 87.199 | 86.073 |
| SummaRuNNer | 56.060 | 84.004 | 81.392 |
| NeuSum | 56.576 | 84.882 | 82.460 |
| BanditSum | 56.830 | 85.567 | 82.784 |
| JECS | 56.952 | 86.273 | 83.453 |
| HIBERT | 57.421 | 86.491 | 83.570 |
| BERTSUMEXT | 57.880 | 86.581 | 84.281 |
| NeRoBERTa | 57.999 | 86.766 | 84.684 |
| MatchSum | 58.130 | 87.036 | 85.566 |
| HAHSum | 58.130 | 87.077 | 85.603 |
| GenCompareSum | 58.661 | 87.257 | 86.180 |
| MCHES (proposed model) | 59.995 | 88.424 | 89.285 |
| MCHES-NoCHM | 58.862 | 87.299 | 87.121 |
| MCHES-NoHCA | 59.094 | 87.784 | 87.342 |
| MCHES-NoReadout | 59.153 | 87.833 | 87.980 |
| MCHES-BasicGraph | 59.260 | 88.193 | 88.430 |
MoverScore results across datasets:

| Model | CNN/DailyMail | XSum | PubMed |
|---|---|---|---|
| LEAD-3 | 86.651 | 59.831 | 56.532 |
| SummaRuNNer | 83.318 | 58.486 | 51.685 |
| NeuSum | 84.648 | 58.577 | 52.471 |
| BanditSum | 84.878 | 58.757 | 52.739 |
| JECS | 85.281 | 58.961 | 53.570 |
| HIBERT | 85.396 | 59.286 | 54.607 |
| BERTSUMEXT | 85.397 | 59.429 | 54.971 |
| NeRoBERTa | 85.421 | 59.531 | 55.051 |
| MatchSum | 85.574 | 59.618 | 55.137 |
| HAHSum | 85.698 | 59.783 | 55.245 |
| GenCompareSum | 86.676 | 59.835 | 56.975 |
| MCHES (proposed model) | 87.432 | 60.549 | 59.739 |
| MCHES-NoCHM | 86.879 | 59.974 | 57.232 |
| MCHES-NoHCA | 86.882 | 60.101 | 57.643 |
| MCHES-NoReadout | 87.024 | 60.120 | 58.016 |
| MCHES-BasicGraph | 87.342 | 60.365 | 58.269 |
| Dataset | Algorithm | R-1 | R-2 | R-L | BERTScore |
|---|---|---|---|---|---|
| CNN/DailyMail | TF-IDF | 30.450 | 15.210 | 28.370 | 50.120 |
| CNN/DailyMail | MCHES (proposed) | 44.756 | 24.963 | 42.477 | 59.995 |
| XSum | TF-IDF | 23.320 | 6.540 | 14.890 | 48.750 |
| XSum | MCHES (proposed) | 26.422 | 8.150 | 17.318 | 88.424 |
| PubMed | TF-IDF | 32.140 | 16.890 | 30.120 | 52.450 |
| PubMed | MCHES (proposed) | 44.321 | 19.129 | 37.505 | 89.285 |
| Criterion | HAHSum | GenCompareSum | MCHES |
|---|---|---|---|
| Coherence | 3.5 | 3.8 | 4.2 |
| Readability | 3.7 | 4.0 | 4.3 |
| Informativeness | 3.4 | 3.6 | 4.1 |
| Relevance | 3.6 | 3.7 | 4.2 |