HiSem-RAG: A Hierarchical Semantic-Driven Retrieval-Augmented Generation Method

Dongju Yang; Junming Wang

doi:10.3390/app16020903


Large-Scale Stream Data Integration and Analysis Technology Beijing Key Laboratory, School of Artificial Intelligence and Computer Science, North China University of Technology, No. 5 Jinyuanzhuang Road, Beijing 100144, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 903; https://doi.org/10.3390/app16020903

This article belongs to the Topic Challenges and Opportunities of Integrating Service Science with Data Science and Artificial Intelligence


Abstract

Traditional retrieval-augmented generation (RAG) methods struggle with hierarchical documents. Their fixed retrieval strategies often cause semantic fragmentation, loss of document structure, and inefficient retrieval. To address these challenges, this paper proposes HiSem-RAG, a hierarchical semantic-driven RAG method. It comprises three key modules: (1) hierarchical semantic indexing, which preserves the boundaries and relationships between sections and paragraphs to reconstruct document context; (2) a bidirectional semantic enhancement mechanism, which incorporates titles and summaries to enable two-way information flow; and (3) a distribution-aware adaptive threshold strategy, which dynamically adjusts the retrieval scope based on similarity distributions to balance accuracy against computational efficiency. On the domain-specific EleQA dataset, HiSem-RAG achieves 82.00% accuracy, outperforming HyDE and RAPTOR by 5.04% and 3.98%, respectively, while also reducing computational cost. On the LongQA dataset, it attains a ROUGE-L score of 0.599 and a BERT_F1 score of 0.839. Ablation studies confirm that the modules are complementary, particularly in long-document scenarios.
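The abstract does not say how the distribution-aware threshold is computed. As a purely illustrative sketch, the Python below assumes a simple mean-plus-k-standard-deviations rule, which is not necessarily the rule HiSem-RAG uses. Under that rule, retrieved chunks are kept only when their similarity rises above the rest of the candidate batch. The function name `adaptive_select` and the parameters `k`, `min_keep`, and `max_keep` are hypothetical.

```python
import statistics
from typing import List, Tuple

Scored = Tuple[str, float]  # (chunk id, similarity score)


def adaptive_select(
    scored: List[Scored],
    k: float = 0.5,      # hypothetical: how far above the mean a score must be
    min_keep: int = 1,   # hypothetical: never return fewer than this many chunks
    max_keep: int = 10,  # hypothetical: cap on retrieval scope (compute budget)
) -> List[Scored]:
    """Keep chunks whose score is >= mean + k * stdev of the whole batch.

    Assumed rule, for illustration only. A flat score distribution (no clear
    winner) yields a threshold near the mean, so many chunks survive. A batch
    with a few strong outliers raises the threshold, so only the outliers survive.
    """
    if not scored:
        return []
    values = [score for _, score in scored]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation of the batch
    threshold = mu + k * sigma

    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    kept = [item for item in ranked if item[1] >= threshold]
    if len(kept) < min_keep:
        # Threshold exceeded every score: fall back to the best few candidates.
        kept = ranked[:min_keep]
    return kept[:max_keep]


if __name__ == "__main__":
    peaked = [("a", 0.9), ("b", 0.3), ("c", 0.3), ("d", 0.3), ("e", 0.3)]
    flat = [("p", 0.5), ("q", 0.5), ("r", 0.5)]
    print(adaptive_select(peaked))  # one clear winner survives
    print(adaptive_select(flat))    # no winner, so all candidates survive
```

Unlike a fixed top-k, this rule lets the number of retrieved chunks follow the shape of the score distribution. `max_keep` bounds the cost when many chunks look equally relevant.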

Keywords:

hierarchical document representation; semantic enhancement; adaptive threshold retrieval; long-form question answering; similarity distribution modeling; large language model
