RAP-RAG: A Retrieval-Augmented Generation Framework with Adaptive Retrieval Task Planning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Acronyms SLM and RAP are used in the Abstract without being defined.
There are multiple acronyms in the manuscript that are used without being defined.
Section 2 should start with a brief introduction on what will be discussed, and not immediately with a sub-section (or paragraph with title).
In line 182 the authors mention a pre-defined threshold. Please provide more information on this.
In section 4, please provide references and cite the datasets and baselines mentioned in subsections 4.1 and 4.2.
Comments on the Quality of English Language
There is a space missing in the Abstract in line 16.
Throughout the entire manuscript there are spaces missing between words and references. E.g. the authors use "scenarios[4]" instead of "scenarios [4]".
Author Response
Dear reviewer,
Thank you very much for your comments and professional advice. These opinions help to improve the academic rigor of our article. Based on your suggestions and requests, we have made the corresponding corrections in the revised manuscript:
Comments1: Acronyms SLM and RAP are used in the Abstract without being defined.
Response1: The acronyms are now defined in the Abstract: SLMs in line 4, "suffer severe performance degradation when using small language models (SLMs)," and RAP in line 6, "we propose Retrieval-Adaptive-Planning RAG (RAP-RAG)."
Comments2: There are multiple acronyms in the manuscript that are used without being defined.
Response2: Apart from SLMs and RAP-RAG, which have now been defined in the text, the remaining abbreviations such as GraphRAG and LightRAG are the proper names of RAG frameworks from prior work and are cited in lines 129, 133, and 136 of the text, respectively.
Comments3: Section 2 should start with a brief introduction on what will be discussed, and not immediately with a sub-section (or paragraph with title).
Response3: A brief supplement has been added to lines 100-105: "A growing body of research is exploring how to enhance the capabilities of language models through external knowledge retrieval, particularly in resource-constrained settings. This research encompasses three key areas: the rise of SLMs as effective alternatives to LLMs, the evolution of RAG frameworks, and the development of adaptive retrieval mechanisms that tailor search strategies to query characteristics. This section reviews representative progress in each area."
Comments4: In line 182 the authors mention a pre-defined threshold. Please provide more information on this.
Response4: Formula (1) has been revised to specify the range of the threshold; the threshold expression has been updated in line 190, and its commonly used range has been added in line 192.
Comments5: In section 4, please provide references and cite the datasets and baselines mentioned in subsections 4.1 and 4.2.
Response5: References to the dataset have been added in lines 463-464, and references to the baseline model have been added in lines 491-504.
Comments6: There is a space missing in the Abstract in line 16.
Response6: The missing space in the Abstract has been corrected.
Comments7: Throughout the entire manuscript there are spaces missing between words and references. E.g. the authors use "scenarios[4]" instead of "scenarios [4]".
Response7: Spacing between words and references has been corrected throughout the manuscript.
Author Response File:  Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper presents a lightweight and novel RAG framework called “RAP-RAG”. The framework is primarily designed to overcome SLM’s limitations in reasoning capabilities in intensive RAG tasks. The framework consists of three components, namely weighted graph indexing, index-based retrieval and adaptive retrieval task planning to augment SLM’s limited reasoning capabilities. Authors claim and demonstrate that it performs better than equivalent RAG approaches with significant reduction in storage requirements.
Strengths:
- RAP-RAG achieves good improvement over competitors in accuracy vs storage size tradeoff.
- Adaptive retrieval is a novel approach to handling retrieval on the fly based on query complexity. I liked the approach of having a separate model make this decision and not unduly burden the generator with yet another task.
- The evaluation was satisfactorily extensive, covering multiple lightweight RAG approaches and multiple SLMs. The comparison metrics utilized make sense within the context of the paper’s objectives and broader trends in the literature.
- I also really liked the ablation experiments that demonstrated the significance of each constituent component of RAP-RAG. I appreciate that the authors established the importance of each of the components/strategies in RAP-RAG through well thought-out experiments.
- I appreciated the detailed captions of tables and figures. They are easy to understand and very successful in establishing what they are depicting/discussing.
- I liked the steps adopted for heterogeneous graph indexing. They make sense in the context of what they are trying to accomplish. The steps are also explained in a very easy-to-understand structure.
Weaknesses:
- Are the planner and the generator the same model in all experiments? I assume that they are, but I would have liked clarification in the text.
- Some metrics are compared in absolute terms while others are in percentages. This can make the results look confusing. For example, the authors mention in line 552 that the base configuration of RAP-RAG achieves performance gains of 2-4% at the cost of “slightly higher” latency. However, if I look at Table 3, the “Local” variant sacrificed ~4.5% accuracy but was ~33% better than the base configuration in terms of latency. So, I do not agree with the authors’ conclusion in line 571 that the base/full configuration strikes a “favorable” trade-off between accuracy and efficiency. Certainly, given that the 33% increase in latency amounts to milliseconds, there are applications where the increase would not be a significant issue. But there are also applications where such latency would not justify gaining ~4.5% more accuracy. Therefore, the authors should rewrite lines 570-571 so that the claim does not come across as a hasty generalization.
- The authors claim in the abstract that there is a ~20% reduction in storage size requirements compared to other approaches. I am looking at the results section and I do not see any particular result that justifies the number 20%. In Figure 7, the authors show that RAP-RAG actually consumes more storage than MiniRAG. Even if I compare against LightRAG, the savings on storage size are closer to ~15% than to ~20%. Also, can the authors clarify in what type of applications it would make sense to save a few MB of storage?
Editorial comments:
The paper has a few grammatical issues. For example, line 166 needs to be corrected. Figure 2 should be made larger so that the text within the top and bottom boxes can be easily read. The organization of Tables 1, 2, and 3 can be improved: at a glance, and without referring back to the text, a reader could mistake “LiHuaWorld”, “MultiHop-RAG” and “Hybrid-SQuAD” for the names of models.
Overall, my recommendation is to accept the manuscript with minor revisions and rewrites, particularly in the results section.
Comments for author File:  Comments.pdf
Author Response
Dear reviewer,
Thank you very much for your comments and professional advice. These opinions help to improve the academic rigor of our article. Based on your suggestions and requests, we have made the corresponding corrections in the revised manuscript.
Comments1: Are the planner and the generator the same model in all experiments? I assume that they are, but I would have liked clarification in the text.
Response1: Clarified in line 510: "In our research, the planner and generator are supported by the same SLMs."
Comments2: Some metrics are compared in absolute terms while others are in percentages. This can make the results look confusing. For example, the authors mention in line 552 that the base configuration of RAP-RAG achieves performance gains of 2-4% at the cost of “slightly higher” latency. However, if I look at Table 3, the “Local” variant sacrificed ~4.5% accuracy but was ~33% better than the base configuration in terms of latency. So, I do not agree with the authors’ conclusion in line 571 that the base/full configuration strikes a “favorable” trade-off between accuracy and efficiency. Certainly, given that the 33% increase in latency amounts to milliseconds, there are applications where the increase would not be a significant issue. But there are also applications where such latency would not justify gaining ~4.5% more accuracy. Therefore, the authors should rewrite lines 570-571 so that the claim does not come across as a hasty generalization.
Response2: Lines 581-587 have been rewritten as follows: "Overall, these results demonstrate that the adaptive planning mechanism in RAP-RAG consistently improves accuracy across diverse settings. This improvement comes with a modest latency increase of typically two to four tenths of a second. The trade-off is justified in accuracy-sensitive applications, where even small gains of two to four percentage points can significantly impact user experience. For latency-critical scenarios, fixed strategies such as Local offer viable alternatives, albeit at the cost of reduced robustness."
Comments3: The authors claim in the abstract that there is a ~20% reduction in storage size requirements compared to other approaches. I am looking at the results section and I do not see any particular result that justifies the number 20%. In Figure 7, the authors show that RAP-RAG actually consumes more storage than MiniRAG. Even if I compare against LightRAG, the savings on storage size are closer to ~15% than to ~20%. Also, can the authors clarify in what type of applications it would make sense to save a few MB of storage?
Response3: The figure has been revised to 15%, and a discussion of the significance of the storage savings has been added to the Discussion section in lines 596-602: “While a 15% storage reduction may seem modest, it can be significant in large-scale or resource-constrained deployments. On mobile or IoT devices with limited storage, reducing the index size by 15% can help improve battery life and enable the simultaneous deployment of multiple models. In a cloud environment serving millions of users, saving 4 MB per instance can add up to terabytes of storage savings across the system. For private, on-premises RAG systems, smaller indexes can improve load times and reduce hardware requirements.”
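For context, the terabyte-level figure quoted above follows from a simple scaling assumption (hypothetically, one 4 MB index per user instance across roughly a million instances): $4\,\mathrm{MB} \times 10^{6} \approx 4\,\mathrm{TB}$.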
Comments4: The paper has a few grammatical issues. For example, line 166 needs to be corrected. Figure 2 should be made larger so that the text within the top and bottom boxes can be easily read. The organization of Tables 1, 2, and 3 can be improved: at a glance, and without referring back to the text, a reader could mistake “LiHuaWorld”, “MultiHop-RAG” and “Hybrid-SQuAD” for the names of models.
Response4: The paragraphs in lines 173-176 have been revised and rewritten, the size of Figure 2 has been adjusted, and the organization of Tables 1, 2, and 3 has been improved, with the dataset being presented in a separate column.
Author Response File:  Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents RAP-RAG, a novel Retrieval-Augmented Generation framework designed for both Small Language Models and Large Language Models. The framework is built on three core components:
1. A heterogeneous weighted graph index that integrates semantic similarity and structural connectivity to enhance retrieval quality.
2. A diverse set of retrieval methods (vector, local, dual, and topology-enhanced) that balance efficiency and reasoning power.
3. An adaptive planner that dynamically selects the most suitable retrieval strategy based on query features.
The proposed system demonstrates strong performance across multiple datasets (LiHuaWorld, MultiHop-RAG, Hybrid-SQuAD), outperforming existing baselines such as GraphRAG, LightRAG, and MiniRAG. The framework is particularly well-suited for resource-constrained environments, making it a valuable contribution to the field.
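For illustration, a minimal sketch of such an adaptive planner is given below. The feature names, thresholds, and routing rules are hypothetical and are not taken from the paper; they only indicate how a lightweight planner could map query features to one of the four retrieval methods without placing this decision on the generator.

    # Illustrative sketch of an adaptive retrieval planner (not the authors' implementation).
    # All feature names and thresholds below are hypothetical assumptions.
    from dataclasses import dataclass

    @dataclass
    class QueryFeatures:
        num_entities: int    # entities detected in the query
        hops_hint: int       # estimated number of reasoning hops
        is_factoid: bool     # short, single-fact question

    def plan_retrieval(f: QueryFeatures) -> str:
        """Route a query to one of four retrieval strategies based on simple features."""
        if f.is_factoid and f.num_entities <= 1:
            return "vector"    # plain dense retrieval suffices
        if f.num_entities == 1 and f.hops_hint <= 1:
            return "local"     # single-entity neighborhood retrieval
        if f.hops_hint >= 2:
            return "topology"  # multi-hop questions exploit graph structure
        return "dual"          # mixed case: combine entity- and chunk-level evidence

    # Example: a two-entity, two-hop question is routed to topology-enhanced retrieval.
    print(plan_retrieval(QueryFeatures(num_entities=2, hops_hint=2, is_factoid=False)))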
However, there are still some issues to be corrected/added:
1. The paper appears to be designed and evaluated solely for English-language datasets. There is no mention of multilingual support or testing. I recommend the authors clarify this limitation and consider evaluating RAP-RAG on multilingual corpora to assess generalizability.
2. In line 104, the authors list several SLMs but omit DeepSeek-R1, which is a competitive and efficient reasoning model. Including DeepSeek-R1 in the comparison would strengthen the evaluation and relevance of the framework.
3. The term SLM (Small Language Model) is used in the abstract (line 6) and introduction (line 32) without definition. Please define it at first mention.
4. There are missing spaces when abbreviations are introduced in parentheses, e.g., “Small Language Models(SLMs)” should be “Small Language Models (SLMs)”.
5. Equations
Equation 1: Variables ej and vj are used but not defined in the surrounding text.
Equation 2: The set Pk is not explained, and the index k = 1, 2,... is missing. Similarly, m should be clarified as well.
Equation 3: Variables pi and pj are not mentioned in the text. Line 194 refers to ai and aj, but these are not used in the equations.
Equation 6: Retrieval methods should be referred to using the same abbreviations (VR, LR, DR, TR) consistently.
Equations 12 and 18: The function K(.) is used but not defined.
Line 331: The sentence “eh is the tail entity” should be corrected to “et is the tail entity”.
Equation 24: The variable w(r) is referenced but not defined in the equation list, although w(r1) and w(r2) are used.
Author Response
Dear reviewer,
Thank you very much for your comments and professional advice. These opinions help to improve the academic rigor of our article. Based on your suggestions and requests, we have made the corresponding corrections in the revised manuscript.
Comments1: The paper appears to be designed and evaluated solely for English-language datasets. There is no mention of multilingual support or testing. I recommend the authors clarify this limitation and consider evaluating RAP-RAG on multilingual corpora to assess generalizability.
Response1: Due to data-quality issues, no suitable Chinese datasets were available to support the experiment. This limitation has been clarified in the Discussion section in lines 614-615: "Furthermore, although our evaluation covers three representative datasets, all of them are in English because suitable multilingual corpora of comparable quality were not available, and the framework's applicability to multilingual datasets requires further verification. In addition, more extensive testing in professional fields such as biomedicine and law is needed to verify its generality."
Comments2: In line 104, the authors list several SLMs but omit DeepSeek-R1, which is a competitive and efficient reasoning model. Including DeepSeek-R1 in the comparison would strengthen the evaluation and relevance of the framework.
Response2: 1. The experimental design of this paper follows existing work; most prior articles use GPT-4o as the baseline model, so GPT-4o was chosen for comparison. 2. This paper conducts experiments with SLMs, so any DeepSeek model would have to be used in its distilled form. The distilled versions of DeepSeek-R1 are based on the Qwen2.5 series; therefore, the SLMs in this experiment include Qwen2.5-1.5B, which is representative of DeepSeek-R1 to a certain extent.
Comments3: The term SLM (Small Language Model) is used in the abstract (line 6) and introduction (line 32) without definition. Please define it at first mention.
Response3: The acronym SLMs is now defined at its first mention in line 4: "suffer severe performance degradation when using small language models (SLMs)."
Comments4: There are missing spaces when abbreviations are introduced in parentheses, e.g., “Small Language Models(SLMs)” should be “Small Language Models (SLMs)”.
Response4: The full text has been reviewed and the spacing in parenthesized abbreviations has been corrected.
Comments5: 
Equation 1: Variables ej and vj are used but not defined in the surrounding text.
Equation 2: The set Pk is not explained, and the index k = 1, 2,... is missing. Similarly, m should be clarified as well.
Equation 3: Variables pi and pj are not mentioned in the text. Line 194 refers to ai and aj, but these are not used in the equations.
Equation 6: Retrieval methods should be referred to using the same abbreviations (VR, LR, DR, TR) consistently.
Equations 12 and 18: The function K(.) is used but not defined.
Line 331: The sentence “eh is the tail entity” should be corrected to “et is the tail entity”.
Equation 24: The variable w(r) is referenced but not defined in the equation list, although w(r1) and w(r2) are used.
Response5: 
1. Formula 1 has been rewritten with the variable definition "Where $e_j$ and $v_j$ represent existing entities in the knowledge base";
2. Formula 2 has been rewritten and explained in lines 198-200.
3. The variable typing errors in Formula 3 have been corrected, with ai and aj replacing pi and pj.
4. The abbreviation references in Formula 6 have been revised.
5. The function $\mathcal{K}(\cdot)$ used in Formulas 12 and 18 has now been defined; line 275 states, "Here, $\mathcal{K}(\cdot)$ is a keyword extraction function."
6. Line 341 has been revised.
7. Formula 24 has been revised, and the definition "Where $w(\cdot)$ denotes the unified weight of the relation" has been added in line 348.
Author Response File:  Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have appropriately addressed all my comments.