GravRank: A Gravitational Extractive Preprocessing Framework for Abstractive Summarization of Long Documents †
Abstract
1. Introduction
- Interaction-Based Extractive Ranking: A global pairwise interaction model is introduced to compute sentence importance beyond independent scoring and graph-based centrality.
- Integrated Redundancy Control: A softened Plummer potential is embedded in the ranking formulation to suppress reinforcement among semantically similar sentences.
- Deterministic and Unsupervised Formulation: Sentence scores are computed in closed form without iterative propagation, supervised learning, or stochastic optimization.
- Hybrid Summarization Integration and Evaluation: The proposed extractive framework is integrated into a transformer-based hybrid pipeline and evaluated on long-document benchmarks.
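Taken together, these contributions describe an extract-then-abstract pipeline: score sentences with a closed-form global ranking, keep the top-scoring ones in document order, and hand the excerpt to a transformer summarizer. A minimal sketch under assumed interfaces (`rank_fn` and `abstractive_model` are illustrative stand-ins, not the paper's API):

```python
def hybrid_summarize(sentences, rank_fn, abstractive_model, budget=10):
    """Extract-then-abstract pipeline sketch (names are illustrative).

    sentences         : list of sentence strings from the long document
    rank_fn           : callable returning one importance score per sentence
    abstractive_model : callable mapping an excerpt string to a summary
                        (e.g. a BART summarizer)
    budget            : number of sentences kept by the extractive stage
    """
    scores = rank_fn(sentences)
    # Pick the top-scoring sentences, then restore document order so the
    # abstractive model sees a coherent excerpt rather than a ranked list.
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:budget]
    excerpt = " ".join(sentences[i] for i in sorted(top))
    return abstractive_model(excerpt)
```

Because the extractive stage is deterministic and closed form, the only stochastic component of the hybrid system is the abstractive decoder.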
2. Materials and Methods
2.1. Relationship to Kernel Density Estimation and Graph Centrality
2.2. Datasets and Preprocessing
2.3. Sentence Charge Computation
- a. Lexical significance
- b. Semantic centrality
- c. Residual semantic novelty score
- d. Deterministic feature weighting
- e. Feature summary
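The listed features are combined into a single deterministic charge per sentence. The sketch below is only an assumed realization of such a scheme (mean TF-IDF for lexical significance, mean cosine similarity for centrality, distance from the document centroid for novelty, fixed equal weights); it is not the paper's exact formulation:

```python
import numpy as np

def sentence_charges(embeddings, tfidf_scores, weights=(1/3, 1/3, 1/3)):
    """Combine lexical significance, semantic centrality, and residual
    novelty into one non-negative charge per sentence (illustrative).

    embeddings   : (n, d) array of sentence embeddings
    tfidf_scores : (n,) lexical-significance scores (e.g. mean TF-IDF)
    weights      : fixed, deterministic feature weights (assumed equal)
    """
    n = len(embeddings)
    # Semantic centrality: mean cosine similarity to all other sentences.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    centrality = (sim.sum(axis=1) - 1.0) / max(n - 1, 1)

    # Residual novelty: distance to the document centroid, so sentences
    # far from the average content score higher.
    centroid = unit.mean(axis=0)
    novelty = np.linalg.norm(unit - centroid, axis=1)

    # Min-max normalize each feature, then take a fixed weighted sum.
    def norm01(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    w_lex, w_cen, w_nov = weights
    return (w_lex * norm01(np.asarray(tfidf_scores, dtype=float))
            + w_cen * norm01(centrality)
            + w_nov * norm01(novelty))
```

With weights summing to one, every charge falls in [0, 1], which keeps the later pairwise interaction terms on a comparable scale.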
2.4. Global Interaction Energy Modeling
- eᵢ and eⱼ are the embeddings of sentences i and j;
- qᵢ and qⱼ are the sentence charges computed in Section 2.3;
- α is a softening parameter that prevents singularities at small distances.
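With charges qᵢ and embeddings eᵢ in hand, the global interaction score can be sketched in closed form. The Plummer-style softening term α² in the denominator keeps each pairwise interaction bounded for near-duplicate sentences, which is what damps mutual reinforcement among semantically similar content (the exact normalization here is an assumption):

```python
import numpy as np

def gravrank_scores(embeddings, charges, alpha=0.2):
    """Closed-form global interaction score per sentence (sketch).

    E_ij   = q_i * q_j / sqrt(||e_i - e_j||^2 + alpha^2)
    score_i = sum over j != i of E_ij  (no iterative propagation)
    """
    e = np.asarray(embeddings, dtype=float)
    q = np.asarray(charges, dtype=float)
    # Pairwise squared Euclidean distances between sentence embeddings.
    sq = ((e[:, None, :] - e[None, :, :]) ** 2).sum(-1)
    # Softened inverse distance: alpha^2 bounds the interaction at 1/alpha
    # for identical embeddings instead of letting it diverge.
    inv = 1.0 / np.sqrt(sq + alpha ** 2)
    energy = np.outer(q, q) * inv
    np.fill_diagonal(energy, 0.0)  # exclude self-interaction
    return energy.sum(axis=1)
```

Because the score is a single matrix expression, it is deterministic and needs no power-iteration step of the kind TextRank and LexRank rely on.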
2.5. Sentence Selection
2.6. Abstractive Summarization Stage
2.7. Evaluation Protocol
3. Results
3.1. Extractive Summarization Performance (GravRank)
3.2. Statistical Analysis
3.3. Sensitivity Analysis of the Softening Parameter α
3.4. Ablation Study
3.5. Hybrid Summarization Performance
4. Discussion
4.1. Effect of Global Interaction Modeling
4.2. Redundancy Control and the Softened Potential
4.3. Determinism and Statistical Stability
4.4. Hybrid Summarization Behavior
4.5. Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Shakil, H.; Ortiz, Z.; Forbes, G.C.; Kalita, J. Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations. Procedia Comput. Sci. 2024, 244, 238–247. [Google Scholar] [CrossRef]
- Chen, X.; Chen, Z.; Cheng, S. CoTHSSum: Structured long-document summarization via chain-of-thought reasoning and hierarchical segmentation. J. King Saud Univ. Comput. Inf. Sci. 2025, 37, 40. [Google Scholar] [CrossRef]
- Wang, Y.; Zhang, J.; Yang, Z.; Wang, B.; Jin, J.; Liu, Y. Improving extractive summarization with semantic enhancement through topic-injection based BERT model. Inf. Process. Manag. 2024, 61, 103677. [Google Scholar] [CrossRef]
- Bashir, A.S.; Bichi, A.A.; Mahmud, U.; Bello, A.M. Long-Text Abstractive Summarization using Transformer Models: A Systematic Review. J. Braz. Comput. Soc. 2025, 31, 1264–1279. [Google Scholar] [CrossRef]
- Jain, D.; Borah, M.D.; Biswas, A. Summarization of Lengthy Legal Documents via Abstractive Dataset Building: An Extract-then-Assign Approach. Expert Syst. Appl. 2024, 237, 121571. [Google Scholar] [CrossRef]
- Koh, H.Y.; Ju, J.; Liu, M.; Pan, S. An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics. ACM Comput. Surv. 2022, 55, 154. [Google Scholar] [CrossRef]
- Kornilova, A.; Eidelman, V. BillSum: A Corpus for Automatic Summarization of US Legislation. In Proceedings of the 2nd Workshop on New Frontiers in Summarization; Wang, L., Cheung, J.C.K., Carenini, G., Liu, F., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 48–56. [Google Scholar] [CrossRef]
- Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing; Lin, D., Wu, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 404–411. Available online: https://aclanthology.org/W04-3252/ (accessed on 2 June 2025).
- Erkan, G.; Radev, D.R. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. J. Artif. Intell. Res. 2004, 22, 457–479. [Google Scholar] [CrossRef]
- Joshi, A.; Fidalgo, E.; Alegre, E.; Alaiz-Rodriguez, R. RankSum: An unsupervised extractive text summarization based on rank fusion. arXiv 2024, arXiv:2402.05976. [Google Scholar] [CrossRef]
- Zheng, H.; Lapata, M. Sentence Centrality Revisited for Unsupervised Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; Korhonen, A., Traum, D., Màrquez, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 6236–6247. [Google Scholar] [CrossRef]
- Liu, Y.; Lapata, M. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3728–3738. [Google Scholar] [CrossRef]
- Huang, Y.; Yu, Z.; Guo, J.; Xiang, Y.; Xian, Y. Element graph-augmented abstractive summarization for legal public opinion news with graph transformer. Neurocomputing 2021, 460, 166–180. [Google Scholar] [CrossRef]
- Cohan, A.; Dernoncourt, F.; Kim, D.S.; Bui, T.; Kim, S.; Chang, W.; Goharian, N. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. arXiv 2018, arXiv:1804.05685. [Google Scholar] [CrossRef]
- Saitoh, T.R.; Makino, J. A Natural Symmetrization for the Plummer Potential. New Astron. 2012, 17, 76–81. [Google Scholar] [CrossRef]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv 2020, arXiv:1910.13461. [Google Scholar] [CrossRef]
- Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 74–81. Available online: https://aclanthology.org/W04-1013/ (accessed on 13 July 2025).
- Liang, X.; Li, J.; Wu, S.; Zeng, J.; Jiang, Y.; Li, M.; Li, Z. An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework based on Semantic Blocks. arXiv 2022, arXiv:2208.08253. [Google Scholar] [CrossRef]
- Zhang, H.; Liu, X.; Zhang, J. HEGEL: Hypergraph Transformer for Long Document Summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 10167–10176. [Google Scholar] [CrossRef]
- Gu, N.; Ash, E.; Hahnloser, R. MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 6507–6522. [Google Scholar] [CrossRef]
- Jain, D.; Borah, M.D.; Biswas, A. Summarization of legal documents: Where are we now and the way forward. Comput. Sci. Rev. 2021, 40, 100388. [Google Scholar] [CrossRef]
- Zhang, Y.; Ni, A.; Mao, Z.; Wu, C.H.; Zhu, C.; Deb, B.; Awadallah, A.H.; Radev, D.; Zhang, R. Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents. arXiv 2022, arXiv:2110.10150. [Google Scholar] [CrossRef]
- Xie, J.; Cheng, P.; Liang, X.; Dai, Y.; Du, N. Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 13500–13519. [Google Scholar] [CrossRef]
- Moro, G.; Ragazzi, L.; Valgimigli, L.; Frisoni, G.; Sartori, C.; Marfia, G. Efficient Memory-Enhanced Transformer for Long-Document Summarization in Low-Resource Regimes. Sensors 2023, 23, 3542. [Google Scholar] [CrossRef] [PubMed]
- An, C.; Zhong, M.; Geng, Z.; Yang, J.; Qiu, X. RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization. arXiv 2021, arXiv:2109.07943. Available online: http://arxiv.org/abs/2109.07943 (accessed on 10 February 2024).
- Han, C.; Feng, J.; Qi, H. Topic model for long document extractive summarization with sentence-level features and dynamic memory unit. Expert Syst. Appl. 2024, 238, 121873. [Google Scholar] [CrossRef]
| Models | GovReport R-1 | GovReport R-2 | GovReport R-L | BillSum R-1 | BillSum R-2 | BillSum R-L | PubMed R-1 | PubMed R-2 | PubMed R-L | Source |
|---|---|---|---|---|---|---|---|---|---|---|
| ORACLE | 74.87 | 49.02 | 72.48 | 65.24 | 47.09 | 58.81 | 55.05 | 27.48 | 38.66 | [18] |
| BASELINES | | | | | | | | | | |
| LEAD | 50.94 | 19.53 | 48.45 | 40.53 | 18.28 | 34.15 | 35.63 | 12.28 | 25.17 | [18] |
| LEXRANK | 40.16 | 8.85 | 37.65 | 34.39 | 10.05 | 28.93 | 39.19 | 15.87 | 34.53 | [18] |
| TEXTRANK (BERT) | 56.00 | 22.42 | 52.86 | 38.05 | 12.99 | 31.46 | 39.43 | 12.89 | 34.66 | [18] |
| RECENT UNSUPERVISED | | | | | | | | | | |
| PACSUM | 56.89 | 26.88 | 54.33 | 41.11 | 17.24 | 34.54 | 39.79 | 14.00 | 36.09 | [18] |
| C2F-FAR | 57.98 | 27.63 | 55.33 | 42.53 | 17.85 | 35.58 | 40.12 | 14.79 | 36.91 | [18] |
| HIPORANK | -- | -- | -- | -- | -- | -- | 43.58 | 17.00 | 39.31 | [19] |
| GRAVRANK | 58.08 | 23.03 | 52.97 | 43.58 | 21.42 | 34.51 | 43.21 | 16.73 | 38.65 | |
| SUPERVISED MODELS | | | | | | | | | | |
| HEGEL | -- | -- | -- | -- | -- | -- | 47.13 | 21.00 | 42.18 | [19] |
| HISTRUCT+ | -- | -- | -- | -- | -- | -- | 46.59 | 20.39 | 42.11 | [19] |
| DANCER-LSTM | -- | -- | -- | -- | -- | -- | 44.09 | 17.69 | 40.27 | [19] |
| SUMMARUNNER | -- | -- | -- | -- | -- | -- | 43.89 | 18.78 | 30.36 | [20] |
| NEUSUM | 58.94 | 25.38 | 55.80 | -- | -- | -- | 47.46 | 21.92 | 42.87 | [20] |
| MEMSUM | 59.43 | 28.60 | 56.69 | -- | -- | -- | 49.25 | 22.94 | 44.42 | [20] |
| LSTM WITH W2V | -- | -- | -- | 28.89 | 15.26 | 27.83 | -- | -- | -- | [21] |
| LSTM WITH GLOVE | -- | -- | -- | 29.46 | 15.51 | 28.24 | -- | -- | -- | [21] |
| SUMMN | 56.77 | 23.25 | 53.90 | -- | -- | -- | -- | -- | -- | [22] |
| Dataset | GravRank Mean (R-1) | 95% CI Lower | 95% CI Upper |
|---|---|---|---|
| BillSum | 43.45 | 42.75 | 44.14 |
| PubMed | 43.22 | 41.31 | 45.12 |
| GovReport | 58.08 | 57.20 | 58.95 |
| α Value | GovReport R-1 | GovReport R-2 | GovReport R-L | BillSum R-1 | BillSum R-2 | BillSum R-L | PubMed R-1 | PubMed R-2 | PubMed R-L |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 57.12 | 21.85 | 51.94 | 42.11 | 19.84 | 33.62 | 41.96 | 15.98 | 37.82 |
| 0.10 | 57.74 | 22.46 | 52.51 | 42.93 | 20.63 | 34.17 | 42.74 | 16.41 | 38.24 |
| 0.20 | 58.08 | 23.03 | 52.97 | 43.58 | 21.42 | 34.51 | 43.21 | 16.73 | 38.65 |
| 0.40 | 57.91 | 22.81 | 52.73 | 43.12 | 21.01 | 34.29 | 42.98 | 16.55 | 38.44 |
| 0.80 | 57.46 | 22.12 | 52.05 | 42.47 | 20.18 | 33.74 | 42.21 | 16.02 | 37.91 |
| Configuration | BillSum (R-1) | PubMed (R-1) | GovReport (R-1) |
|---|---|---|---|
| Full GravRank | 43.58 | 43.21 | 58.08 |
| Lexical only | 40.22 | 35.14 | 53.56 |
| Centrality only | 39.71 | 40.73 | 54.19 |
| Novelty only | 35.86 | 29.90 | 38.17 |
| Minus | 42.21 | 41.88 | 57.20 |
| Minus | 42.36 | 40.78 | 56.05 |
| Minus | 42.34 | 41.32 | 57.38 |
(a)

| Study | Model | R-1 | R-2 | R-L | BERTScore-F1 |
|---|---|---|---|---|---|
| † | BART | 51.66 | 20.13 | 24.22 | 68.02 |
| [23] | BARTbase | 51.72 | 19.37 | 23.11 | 64.12 |
| [24] | LED (Longformer Encoder–Decoder) | 59.42 | 26.53 | 56.63 | -- |
| [23] | SimCAS + BARTlarge | 59.30 | 25.95 | 27.07 | 68.17 |
| [2] | CoTHSSum | 42.56 | 21.54 | 24.36 | 76.12 |
| [2] | T5 | 27.26 | 8.24 | 18.61 | 50.78 |
| This study | GravRank + BART | 61.18 | 39.93 | 47.49 | 83.24 |
(b)

| Study | Model | R-1 | R-2 | R-L | BERTScore-F1 |
|---|---|---|---|---|---|
| † | BART | 50.98 | 31.34 | 40.46 | 69.89 |
| [5] | ETAROUGE | 33.85 | 15.33 | 30.58 | 69.00 |
| [5] | ETABERTScore | 35.23 | 16.58 | 32.09 | 60.86 |
| [5] | Legal PEGASUS | 34.19 | 16.25 | 30.16 | 59.45 |
| [5] | BigBird-PEGASUS | 30.76 | 13.70 | 27.63 | 56.49 |
| [25] | Retrieval + BART | 56.26 | 34.90 | 52.51 | -- |
| [25] | BART | 51.80 | 33.05 | 47.72 | -- |
| This study | GravRank + BART | 52.55 | 32.32 | 46.45 | 71.01 |
(c)

| Study | Model | R-1 | R-2 | R-L | BERTScore-F1 |
|---|---|---|---|---|---|
| † | BART | 44.87 | 22.64 | 41.90 | 70.89 |
| [26] | PEGASUS | 45.97 | 20.15 | 41.34 | -- |
| [26] | BigBird | 46.32 | 20.65 | 42.33 | -- |
| [26] | HEPOS | 45.80 | 18.61 | 40.69 | -- |
| [23] | BARTbase | 40.36 | 16.60 | 35.02 | 61.77 |
| [23] | SimCAS + BARTlarge | 48.65 | 21.40 | 44.14 | 66.52 |
| [2] | CoTHSSum | 52.14 | 28.46 | 36.45 | 77.36 |
| [2] | T5 | 44.65 | 21.54 | 40.65 | 65.48 |
| This study | GravRank + BART | 47.56 | 24.29 | 43.76 | 76.89 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Bashir, A.S.; Bichi, A.A.; Ado, A. GravRank: A Gravitational Extractive Preprocessing Framework for Abstractive Summarization of Long Documents. Eng. Proc. 2026, 124, 65. https://doi.org/10.3390/engproc2026124065