Design and Evaluation of Knowledge-Distilled LLM for Improving the Efficiency of School Administrative Document Processing
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Please refer to the attachment.
Comments for author File: Comments.pdf
Author Response
Response to Reviewer 1 Comments
1. Summary
Thank you very much for taking the time to review this manuscript. Please find detailed responses below, and any revisions/corrections are marked in red in the resubmitted file.
2. Questions for General Evaluation

Does the introduction provide sufficient background and include all relevant references?
Reviewer's evaluation: Must be improved
Response and revisions: References revised.

Are all the cited references relevant to the research?
Reviewer's evaluation: Must be improved
Response and revisions: References supplemented and relevance strengthened. Lines 217-253, 286-308, and 336-372 revised.

Is the research design appropriate?
Reviewer's evaluation: Must be improved
Response and revisions: The pipeline has been restructured.

Are the methods adequately described?
Reviewer's evaluation: Must be improved
Response and revisions: A qualitative assessment was added to strengthen the methodology; lines 385-412 revised.

Are the results clearly presented?
Reviewer's evaluation: Must be improved
Response and revisions: The results section has been strengthened.

Are the conclusions supported by the results?
Reviewer's evaluation: Must be improved
Response and revisions: The conclusion section has been strengthened; lines 450-459 supplemented.
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: In relation to Table 2, which outlines the knowledge distillation pipeline, we recommend that the authors include relevant works and literature sources. This addition would enhance the academic rigor of this section. Without incorporating these sources, there is a risk that the content may seem artificially generated rather than grounded in established research. In summary, some descriptions (e.g., Table 2 - Knowledge Distillation Pipeline Overview) come across as an AI-generated summary and lack explicit connections to prior literature.
Response 1: Thank you for pointing this out; I agree with your comment. References have been provided for the research methodology: the pipeline was constructed based on those references, and the work was carried out according to that pipeline. The figures have also been revised to be more detailed. This document was developed over more than two months of trial and error. ※ lines 217-223
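The distillation objective itself is not reproduced in this response. As an illustration only, a response-based distillation loss of the kind such pipelines typically use (temperature-softened teacher targets plus hard-label cross-entropy, after Hinton et al., 2015) can be sketched in plain NumPy. All names and the T and alpha values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Weighted sum of (a) KL divergence from the temperature-softened
    teacher distribution and (b) cross-entropy with the ground-truth token."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))))
    ce = -float(np.log(softmax(student_logits)[hard_label]))
    # T**2 rescales the soft-target term, as in Hinton et al. (2015)
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

In a pipeline like the manuscript's, the teacher logits would come from the fine-tuned teacher model and the student logits from the lightweight model being trained; alpha balances imitating the teacher against fitting the ground-truth token.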
Comments 2: The methodology section is detailed but complex, providing limited conceptual explanations for non-specialist readers. Please offer clearer narrative descriptions and link the pipeline steps to the cited works.
Response 2: Thank you for your feedback. As you suggested, I have clarified the explanation to better align with the flow and incorporated the cited studies into the pipeline. ※ line 224-253
Comments 3: Certain sections are lacking in academic rigor and do not have adequate citation support. |
Response 3: Thank you for pointing that out. I have added the references you mentioned (References 28-37 added).
Comments 4: I have some questions about Table 4 regarding how the authors derived their evaluation results, specifically the Token Accuracy of 91.33%. They mention that this figure is based on "30 public documents," but it's unclear what these 30 documents are. |
Response 4: We have incorporated the suggested token accuracy calculation formula. Additionally, we have excerpted the original documents and the generated documents from the 30 base documents and uploaded them as images. These are public documents used in school administration. ※ line 286-309, line 385-400
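For readers of this response, one common way to compute such a token-level accuracy can be sketched as follows. This is an illustration only: it assumes simple whitespace tokenization, whereas the manuscript's own formula (lines 286-308) is the authoritative definition.

```python
from collections import Counter

def token_accuracy(reference: str, generated: str) -> float:
    """Share of reference tokens reproduced in the generated document.
    Tokenization here is plain whitespace splitting (an assumption;
    the paper may use a subword tokenizer instead)."""
    ref = Counter(reference.split())
    gen = Counter(generated.split())
    matched = sum(min(count, gen[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return matched / total if total else 0.0
```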
Comments 5: The evaluation relies on just 30 public documents, which is a very small sample size to validate the robustness and reliability of the proposed OPLLM-SA model. This limited dataset raises concerns about overfitting and restricts the ability to generalize the findings to wider administrative or multilingual contexts. |
Response 5: I agree with your feedback. However, this study selected 30 diverse official documents to maximize effectiveness with minimal resources, specifically to verify feasibility on low-spec computers in on-premises environments. The rationale and justification for this decision are included in Section 4.2. Furthermore, the limitations of the study you mentioned have been incorporated into the conclusion section.
Comments 6: The evaluation primarily contrasts the OP-LLM-SA student model with its corresponding teacher model and a vanilla student version. However, the study does not provide benchmarking against established lightweight LLM compression techniques or alternative document automation systems, which limits the ability to position the proposed model within the broader landscape of existing approaches. |
Response 6: I agree with your feedback. This study does not provide benchmarking against alternative document automation systems. This limitation arises because document automation systems are not yet implemented in School Administration. Future research will incorporate such systems if available.
Comments 7: Although the paper briefly acknowledges personal data protection and cites compliance with the Korean Personal Information Protection Act, its discussion of ethical guidelines, potential risks of data leakage, and corresponding mitigation strategies remains underdeveloped. Moreover, while the use of ChatGPT-4.0 during manuscript preparation is disclosed, the authors do not sufficiently elaborate on how issues such as bias, hallucination, or accountability are addressed, which raises concerns regarding transparency and responsible AI deployment. |
Response 7: Thank you for your feedback. We judged that operating the system in an on-premises environment allows school administration to manage personal information thoroughly, thereby resolving the privacy concerns. Public institutions in South Korea strictly manage personal information, so use within an internal environment that is not exposed externally should be safe. We reiterate this point in the conclusion.
Comments 8: The performance evaluation relies predominantly on quantitative metrics such as BLEU, ROUGE, BERTScore, token accuracy, and sentence naturalness. While these indicators report relatively high values, the paper offers limited qualitative error analysis or critical discussion of failure cases, particularly in handling long, complex, or ambiguous documents, which restricts a comprehensive understanding of the model's limitations. |
Response 8: I agree with your opinion. To address this point, we conducted an additional qualitative evaluation and have incorporated its findings. We performed a qualitative assessment with six individuals who draft administrative documents, and it received high marks. However, limitations such as the inability to use graphics were identified, and we will address these in subsequent research.
Comments 9: Please note that minor revisions are required for lines 204-212. Kindly change the numbering from ①~④ to (1)~(4).
Response 9: Thank you for pointing that out. I have completed the revisions as you requested.
Comments 10: Please use the standard citation format required by this journal. |
Response 10: Thank you for your feedback. I have completed the revisions to the references as you requested.
Reviewer 2 Report
Comments and Suggestions for Authors

This study focuses on a paper tool for handling school administrative documents, a highly practical research direction. It is hoped the authors can anonymize their findings for open-sourcing, enabling the open-source community to optimize it for multiple languages, not just Korean.

The authors have calculated several evaluation metrics, including text mining and LLM performance-related ones, but the study remains at the traditional LLM research level, without deeply delving into specific administrative scenarios. For instance, the research conclusions underrepresent the need to fill out various formats of forms or generate more detailed content from outlines or summaries of text. The study claims its knowledge distillation-based model is better suited for education administrative document processing than publicly available large models. However, besides privacy and security, the authors could compare their method with publicly available large models to demonstrate tangible improvements.

Administrative documents in higher education institutions often involve repetitive tasks in relatively fixed processing scenarios. For example, filling out forms in various formats where basic information already exists but needs reformatting is a classic document processing need. Similarly, generating detailed content from outlines or summarizing text are common tasks. Yet, the study doesn't sufficiently explain the reasons behind the performance differences of different LLMs in these tasks, making it unclear which models or architectures are better suited for specific tasks in this domain.

Regarding specific details:
- What tool was used to convert the school administration documents to PDF files? (Hong, p. 7, line 223)
- Could you include a diagram of the architecture with the learning server and inference server, and briefly explain their roles? (Hong, p. 8, line 258)
- The study includes text mining performance evaluations but doesn't provide the relevant formulas. Please supplement these formulas. (Hong, p. 8, line 265)
- The study includes performance evaluations of LLMs but doesn't provide the relevant formulas. Please supplement these formulas. (Hong, p. 9, line 308)
- Could you briefly explain why the 3B parameter size of the Llama-3.2-instruct model was chosen? (Hong, p. 10, line 328)

Author Response
Response to Reviewer 2 Comments
1. Summary
Thank you very much for reviewing this manuscript. Please find detailed responses below, and any revisions/corrections are marked in red in the resubmitted file.
2. Questions for General Evaluation

Does the introduction provide sufficient background and include all relevant references?
Reviewer's evaluation: Yes
Response and revisions: References revised.

Are all the cited references relevant to the research?
Reviewer's evaluation: Can be improved
Response and revisions: References supplemented and relevance strengthened. Lines 217-253, 286-308, and 336-372 revised.

Is the research design appropriate?
Reviewer's evaluation: Can be improved
Response and revisions: The pipeline has been restructured.

Are the methods adequately described?
Reviewer's evaluation: Can be improved
Response and revisions: A qualitative assessment was added to strengthen the methodology; lines 385-412 revised.

Are the results clearly presented?
Reviewer's evaluation: Can be improved
Response and revisions: The results section has been strengthened.

Are the conclusions supported by the results?
Reviewer's evaluation: Can be improved
Response and revisions: The conclusion section has been strengthened; lines 450-459 supplemented.
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: Regarding specific details, what tool was used to convert the school administration documents to PDF files? (Hong, p. 7, line 223)
Response 1: Thank you for your feedback. I used the program included with Hangul and Computer 2002.
Comments 2: Could you include a diagram of the architecture with the learning server and inference server, and briefly explain their roles? (Hong, p. 8, line 258) |
Response 2: Thank you for your feedback. We have revised the architecture in Figure 2 to include both the training server and the inference server, and have provided detailed supplementary information. ※ line 224-253
Comments 3: The study includes text mining performance evaluations but doesn't provide the relevant formulas. Please supplement these formulas. (Hong, p. 8, line 265) |
Response 3: Thank you for your feedback. Regarding text mining performance evaluation, we have provided relevant formulas and references, including Token Accuracy. ※ line 286-308
Comments 4: The study includes performance evaluations of LLMs but doesn't provide the relevant formulas. Please supplement these formulas. (Hong, p. 9, line 308)
Response 4: Thank you for your feedback. We have compiled and presented the formulas and details for metrics such as BLEU, ROUGE, and BERTScore, which are used in LLM performance evaluation. ※ line 335-372
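As a rough illustration of the kind of formula supplemented for BLEU (the manuscript's equations at lines 335-372 are authoritative), an unsmoothed sentence-level BLEU with uniform n-gram weights and a brevity penalty can be computed as follows; in practice a library implementation such as sacreBLEU is preferable.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of modified n-gram precisions
    times a brevity penalty. Simplified sketch without smoothing."""
    ref, cand = reference.split(), candidate.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_n = ngrams(cand, n)
        ref_n = ngrams(ref, n)
        overlap = sum(min(c, ref_n[g]) for g, c in cand_n.items())
        total = sum(cand_n.values())
        if total == 0 or overlap == 0:
            return 0.0        # unsmoothed: any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```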
Comments 5: Could you briefly explain why the 3B parameter size of the Llama-3.2-instruct model was chosen? (Hong, p. 10, line 328)
Response 5: The 3B variant of the Llama-3.2-instruct model was selected because it applies a parameter-reduction method and is comparable in size to the proposed lightweight Llama-3.2-Korean-Blossom model, making it an appropriate baseline for comparison.
|
||
|
||
|
||
|
||
|
||
|
Reviewer 3 Report
Comments and Suggestions for Authors
This is a well-structured and highly relevant study that addresses a significant real-world problem: the administrative burden on teachers in public schools due to extensive document processing. The proposed solution, OP-LLM-SA, is both innovative and practical. The use of knowledge distillation to create a lightweight, on-premise LLM is an excellent approach that directly addresses the critical data privacy and security constraints of public institutions, a point that is often overlooked in cloud-centric AI research.
The methodology is sound, following a clear and logical four-step pipeline from data preprocessing to student model evaluation. The performance evaluation is comprehensive, covering text generation quality (Token Accuracy, Sentence Naturalness), system efficiency (CPU/GPU/RAM usage), and comparative LLM performance (BLEU, ROUGE, BERTScore). The results are impressive and strongly support the conclusion that the proposed model is not only effective but also feasible for deployment on standard office hardware. The paper is well-written and makes a valuable contribution to the field of applied NLP and AI in education.
Major Issues:
There are no major issues with this manuscript. The research is well-conceived, executed, and presented.
Minor Issues and Suggestions for Enhancement:
While the study is strong, a few minor additions could further enhance its impact and completeness:
- Details on the Training Dataset: The paper mentions using "30 public documents" as the basis for the dataset. While the results are excellent, this number seems small. It would be beneficial to provide more detail on the dataset's scale. For example, how many instruction-output pairs were ultimately generated from these documents to fine-tune the teacher model and distill knowledge for the student model? Clarifying the size of the distilled dataset would add more context to the model's impressive performance.
- Inclusion of Human Evaluation: The evaluation relies on automatic metrics (BLEU, ROUGE, etc.), which are standard and appropriate. However, given the practical application-oriented nature of this work, a small-scale human evaluation would be a powerful addition. For instance, asking a group of teachers to rate the quality, relevance, and usability of the documents generated by OP-LLM-SA compared to the vanilla student model could provide compelling qualitative evidence of the model's real-world value.
- Discussion on Teacher Adoption and Usability: The paper successfully demonstrates the technical feasibility of the system. The discussion could be strengthened by briefly touching upon the human factors crucial for its successful implementation. The ultimate success of such a tool depends on teachers' willingness to adopt it. Citing literature that explores teachers' attitudes toward new technologies in education would provide a more holistic perspective on the path from technical validation to practical impact.
- Recommended Citation: To enrich this discussion, consider citing research that examines the factors influencing technology acceptance in education. For example, the following paper explores teachers' attitudes and intentions towards using intelligent devices, which is highly relevant to the potential adoption of your proposed system: Enhancing education quality: Exploring teachers' attitudes and intentions towards intelligent MR devices. European Journal of Education, 59(2), e12692. https://doi.org/10.1111/ejed.12692
Author Response
Response to Reviewer 3 Comments
1. Summary
Thank you very much for taking the time to review this manuscript. Please find detailed responses below, and any revisions/corrections are marked in red in the resubmitted file.
3. Point-by-point response to Comments and Suggestions for Authors
Comments 1: Details on the Training Dataset: The paper mentions using "30 public documents" as the basis for the dataset. While the results are excellent, this number seems small. It would be beneficial to provide more detail on the dataset's scale. For example, how many instruction-output pairs were ultimately generated from these documents to fine-tune the teacher model and distill knowledge for the student model? Clarifying the size of the distilled dataset would add more context to the model's impressive performance.
Response 1: Thank you for pointing this out. I agree with your feedback. However, this study selected 30 diverse official documents to maximize effectiveness with minimal resources, specifically to verify feasibility on low-spec computers in on-premises environments. The rationale and justification for this decision are included in Section 4.2. Furthermore, the limitations of the study you mentioned are addressed in the conclusion section.
Comments 2: Inclusion of Human Evaluation: The evaluation relies on automatic metrics (BLEU, ROUGE, etc.), which are standard and appropriate. However, given the practical application-oriented nature of this work, a small-scale human evaluation would be a powerful addition. For instance, asking a group of teachers to rate the quality, relevance, and usability of the documents generated by OP-LLM-SA compared to the vanilla student model could provide compelling qualitative evidence of the model's real-world value.
Response 2: I agree with your feedback. To evaluate quality, we conducted focus group interviews (FGI) for each model, comparing existing documents with generated documents. The results concluded that documents from the proposed model can be used with minimal modification.
Comments 3: Discussion on Teacher Adoption and Usability: The paper successfully demonstrates the technical feasibility of the system. The discussion could be strengthened by briefly touching upon the human factors crucial for its successful implementation. The ultimate success of such a tool depends on teachers' willingness to adopt it. Citing literature that explores teachers' attitudes toward new technologies in education would provide a more holistic perspective on the path from technical validation to practical impact. Recommended Citation: To enrich this discussion, consider citing research that examines the factors influencing technology acceptance in education. For example, the following paper explores teachers' attitudes and intentions towards using intelligent devices, which is highly relevant to the potential adoption of your proposed system: Enhancing education quality: Exploring teachers' attitudes and intentions towards intelligent MR devices. European Journal of Education, 59(2), e12692. https://doi.org/10.1111/ejed.12692
Response 3: I agree with your opinion. I have incorporated the suggested work as reference 37 to make the study more substantive, and based on it, I conducted the qualitative evaluation with teachers.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Please refer to the attachment.
Comments for author File: Comments.pdf
Author Response
Comments 1: Please ensure that all equations in the manuscript are sequentially numbered, and that the corresponding numbers are consistently cited within the main text to facilitate clarity, precision, and ease of reference for readers.
Response 1: Thank you for pointing this out; I agree with your comment. All formulas in the manuscript have been assigned consecutive numbers, and the corresponding citations in the main text have been reorganized.
※ line 288-368
Comments 2: Some sections of the methodology (e.g., knowledge distillation pipeline and LoRA integration) use highly technical language but lack illustrative detail for non-specialist readers. Adding clearer explanations, step-by-step examples, or simplified flow diagrams would make the model design more accessible. For instance, Figure 2 could be accompanied by a concise narrative that walks the reader through each stage of the pipeline.
Response 2: Thank you for your feedback. We have simplified the explanation for readers based on your suggestions.
Comments 3: Tables (e.g., Table 4 and Table 6) present valuable performance metrics, but the results discussion is somewhat repetitive. A synthesized comparative table or figure showing improvements across different models (Teacher, Vanilla Student, OP-LLM-SA) would help highlight the contribution more clearly and reduce redundancy.
Response 3: Thank you for pointing that out. I have given this careful consideration based on your feedback. I diligently revised the content (Table 4 for token values and Table 6 for comparative metrics), but found it challenging to present them comprehensively in the new format, so I have left them as is. I apologize.
Comments 4: The limitations are acknowledged, but the discussion is brief. Expanding this section slightly (for example, by noting potential bias from domain-specific data or the restricted generalizability to non-Korean administrative contexts) would show stronger critical reflection and improve the balance of the conclusion.
Response 4: Thank you for your feedback. We have incorporated your suggestions and included a section on limitations, such as the bias inherent in domain-specific data, in the conclusion.
Comments 5: The evaluation primarily concentrates on metrics related to accuracy and system efficiency, such as token accuracy, BLEU, ROUGE, and GPU usage. However, it fails to adequately consider real-time processing speed or scalability. Given that document generation latency and throughput are crucial for practical implementation in school or government settings, the lack of benchmarks for speed and an analysis of scalability restricts the overall assessment of the system's practical applicability.
Response 5: Your feedback is correct. However, I would appreciate it if you could understand this as foundational research that identifies issues and makes proposals regarding the necessary aspects of the Korean domain. We will strengthen this in subsequent research.
Comments 6: Although the paper emphasizes that on-premise deployment protects personal information, it assumes that privacy is "unconditionally and perfectly protected." This overlooks realistic risks of unexpected data leakage (e.g., misconfigurations, insider threats, or inference attacks). The absence of mitigation strategies, encryption protocols, or ethical safeguards weakens the study's credibility in addressing security concerns.
Response 6: I agree with your feedback, and thank you for it. Korean schools are strengthening their compliance with personal information laws and regulations, and their internal computer networks are strictly segregated, which we believe should be sufficient. However, we acknowledge there may be limitations in addressing unforeseen information security issues. We will note this as a limitation in our conclusion.
Comments 7: The study evaluates the model using only 30 official documents, which is a relatively small dataset for validating performance claims. While the authors argue that this choice is feasible for low-specification computers, the limited sample raises concerns about the generalizability of the results. Incorporating larger and more diverse datasets (such as documents from different types, institutions, or languages) would enhance the reliability and robustness of the findings.
Response 7: Thank you for your feedback. We increased the dataset to 80 items to evaluate the model and enhance its reliability, reflecting your feedback.
Reviewer 2 Report
Comments and Suggestions for Authors
The author has added a lot of content based on the previous comments, but not all issues have been addressed.
The paper format is still not standardized. All formulas lack numbers, and the symbol explanations after the formulas should be written according to the text.
Llama-3.2's Korean language support model demonstrated superior performance compared to other LLMs. However, we need the exact comparative data.
The parameters used in the model training process were not elaborated sufficiently, particularly those related to LoRA.
Line 407 has an obvious spelling error “CHAP GPT” (While there are concerns about the leakage of terms used in schools when using CHAT GPT).
Please revise the document more carefully.
Author Response
The author has added a lot of content based on the previous comments, but not all issues have been addressed.
The paper format is still not standardized. All formulas lack numbers, and the symbol explanations after the formulas should be written according to the text.
Llama-3.2's Korean language support model demonstrated superior performance compared to other LLMs. However, we need the exact comparative data.
The parameters used in the model training process were not elaborated sufficiently, particularly those related to LoRA.
Line 407 has an obvious spelling error “CHAP GPT” (While there are concerns about the leakage of terms used in schools when using CHAT GPT).
Response 1: Thank you for your feedback. All formulas have been renumbered. Thank you.
Once again, I have organized the data for accurate comparison. Thank you.
During the model training process, I have reorganized the parameters and aligned them with the formulas.
Finally, I have also addressed the issue you pointed out on line 407. Thank you.
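The LoRA parameters mentioned in the response above (rank r, scaling alpha, and the low-rank factor matrices A and B) can be illustrated with a minimal NumPy sketch; the sizes below are toy values for illustration, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 16          # illustrative sizes, not the paper's settings

W = rng.normal(size=(d, k))           # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                  # initialized to zero so the delta starts at 0

def lora_forward(x):
    """y = x W^T + (alpha / r) * x A^T B^T; only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

Only A and B are updated during fine-tuning, which is what makes such adapters feasible on low-specification hardware: the frozen weight W contributes no trainable parameters, and the adapter adds only r * (d + k) of them.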
Reviewer 3 Report
Comments and Suggestions for AuthorsThe article presents a study on the development and evaluation of OP-LLM-SA, a knowledge-distilled lightweight large language model designed to enhance the efficiency of school administrative document processing. The model, built on an on-premise AI system, was tested with 30 public documents, achieving a token accuracy of 91.33% and a complete sentence rate of 98.53%. Utilizing Llama-3.2's Korean language support, the model operates efficiently on general office computers with a GPU memory requirement of approximately 4.5 GB. The study highlights its potential to alleviate the administrative burden on teachers in South Korea by enabling real-time document creation and processing within a secure on-premise environment, while adhering to personal information protection regulations.
The study offers several strengths. Firstly, it addresses a critical need by focusing on the automation of public document processing, a significant burden for teachers, thereby enhancing practical applicability. Secondly, the use of knowledge distillation to create a lightweight model is innovative, allowing deployment on low-end servers with reduced costs and power consumption. Thirdly, the emphasis on an on-premise system ensures compliance with South Korea's strict personal information protection laws, adding a layer of security and trustworthiness. The high performance metrics and the model's adaptability to school CPU environments further underscore its potential impact.
Improvement Suggestions
- Enhance Comparative Analysis: The study lacks a detailed comparison with existing on-premise LLMs or similar systems. Including a comparative analysis would better highlight the OP-LLM-SA's unique contributions. For reference, consider Zou, Y., Xu, Z., Zhang, Q., et al. (2025). "Few-Shot Learning With Manifold-Enhanced LLM for Handling Anomalous Perception Inputs in Autonomous Driving" (IEEE Transactions on Intelligent Transportation Systems), which demonstrates advanced model adaptability that could inform this comparison.
- Expand Dataset Scope: The evaluation is based on only 30 public documents, which may not fully reflect the model's scalability. Test the model with a broader range of document types and larger datasets to assess its robustness across varied administrative contexts.
- Detail Future Research Plans: The future research section is brief and lacks specific methodologies for expanding text mining applications. Provide concrete plans, such as integrating the model with other administrative tasks, to strengthen the study's forward-looking impact.
- Improve Performance Metrics Clarity: The token accuracy (91.33%) and complete sentence rate (98.53%) are promising, but the paper should clarify how these metrics were calculated and their relevance to real-world administrative efficiency. Include additional metrics like processing time or error types for a comprehensive evaluation.
- Address Legal Compliance Details: While the study mentions compliance with South Korea's Personal Information Protection Act, it lacks specific details on how the on-premise system ensures data security. Elaborate on technical measures (e.g., encryption, access controls) to reinforce trustworthiness.
- Enhance Model Optimization Discussion: The paper notes the model's efficiency with 4.5 GB GPU usage, but it could further explore optimization techniques to reduce resource demands. For inspiration, refer to Gao, S., Zou, Y., Feng, L. (2025). "A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants" (Electronics), which offers relevant strategies for resource-constrained environments.
- Strengthen Related Work Integration: The related work section could better connect text mining and knowledge distillation trends to the OP-LLM-SA's design. Integrate more examples of how these technologies have been applied in similar public sector contexts to justify the proposed approach.
Author Response
Comments 1: Enhance Comparative Analysis: The study lacks a detailed comparison with existing on-premise LLMs or similar systems. Including a comparative analysis would better highlight the OP-LLM-SA's unique contributions. For reference, consider Zou, Y., Xu, Z., Zhang, Q., et al. (2025). "Few-Shot Learning With Manifold-Enhanced LLM for Handling Anomalous Perception Inputs in Autonomous Driving" (IEEE Transactions on Intelligent Transportation Systems), which demonstrates advanced model adaptability that could inform this comparison.
Response 1: Thank you for your valuable feedback. I have carefully reviewed the reference and incorporated this point into the paper. [Reference 29]
Comments 2: Expand Dataset Scope: The evaluation is based on only 30 public documents, which may not fully reflect the model's scalability. Test the model with a broader range of document types and larger datasets to assess its robustness across varied administrative contexts.
Response 2: Thank you for your feedback. Since the initial feedback, I have been working to expand the dataset and have now increased it to 80 documents.
Comments 3: Detail Future Research Plans: The future research section is brief and lacks specific methodologies for expanding text mining applications. Provide concrete plans, such as integrating the model with other administrative tasks, to strengthen the study's forward-looking impact.
Response 3: The conclusion section has been completely revised to present a concrete future direction for text mining. Thank you.
Response 4: Improve Performance Metrics Clarity: The token accuracy (91.33%) and complete sentence rate (98.53%) are promising, but the paper should clarify how these metrics were calculated and their relevance to real-world administrative efficiency. Include additional metrics like processing time or error types for a comprehensive evaluation.
Response 4: Thank you for your feedback. We have reorganized the performance metrics based on the expanded dataset. Regarding processing time, we have added it to the conclusion section as a direction for future research.
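To make the discussion of the two metrics concrete, the following is a minimal sketch of how token accuracy and complete sentence rate could be computed. The tokenization and the sentence-completeness criterion (terminal punctuation, including an assumed Korean declarative ending) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two evaluation metrics discussed above.
# The matching and completeness criteria are assumptions, not the
# paper's actual procedure.

def token_accuracy(reference_tokens, generated_tokens):
    """Fraction of aligned positions where generated tokens match the reference."""
    matches = sum(r == g for r, g in zip(reference_tokens, generated_tokens))
    return matches / max(len(reference_tokens), 1)

def complete_sentence_rate(sentences):
    """Fraction of generated sentences ending with a terminal mark."""
    terminal = (".", "!", "?", "다.")  # Korean declarative ending assumed
    complete = sum(s.strip().endswith(terminal) for s in sentences)
    return complete / max(len(sentences), 1)

ref = ["the", "meeting", "is", "at", "3pm"]
gen = ["the", "meeting", "is", "at", "4pm"]
print(token_accuracy(ref, gen))                         # 4 of 5 tokens match
print(complete_sentence_rate(["Approved.", "Pending"])) # 1 of 2 complete
```

Under definitions like these, the reported 91.33% and 98.53% would correspond to corpus-level averages over all evaluated documents.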
Comments 5: Address Legal Compliance Details: While the study mentions compliance with South Korea's Personal Information Protection Act, it lacks specific details on how the on-premise system ensures data security. Elaborate on technical measures (e.g., encryption, access controls) to reinforce trustworthiness.
Response 5: Thank you for your feedback. Regarding legal compliance, given that school administration in South Korea is highly regulated, we proposed on-premise operation under the assumption that data remain protected within the institution. Technical safeguards were outside the scope of this paper; we have revised the conclusion to present this as a limitation and to recommend further reinforcement.
Comments 6: Enhance Model Optimization Discussion: The paper notes the model's efficiency with 4.5 GB GPU usage, but it could further explore optimization techniques to reduce resource demands. For inspiration, refer to Gao, S., Zou, Y., Feng, L. (2025). "A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants" (Electronics), which offers relevant strategies for resource-constrained environments.
Response 6: Thank you for the reference. I have read it thoroughly and will strengthen this area in future work, taking its model-optimization strategies into account.
Comments 7: Strengthen Related Work Integration: The related work section could better connect text mining and knowledge distillation trends to the OP-LLM-SA's design. Integrate more examples of how these technologies have been applied in similar public sector contexts to justify the proposed approach.
Response 7: Thank you for your feedback. This study is limited to a specific domain where prior research is scarce; we would appreciate your understanding that it serves as foundational research within its current scope.
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors: Please refer to the attachment.
Comments for author File: Comments.pdf
Author Response
Comments: Equations should be numbered consecutively in the order in which they appear. Furthermore, each equation must be accompanied by its explanation and illustration, with the corresponding equation number explicitly cited in the text. In short, duplicate numbering, such as multiple instances of (1), (2), etc., should be avoided, as this may confuse readers. From row 204, expressions such as "First, (1) … Next, (2) … Next, (3) …" are redundant and confusing. Please use either verbal sequencing (e.g., First, Next, Finally) or numerical referencing (e.g., (1), (2), (3)), but not both simultaneously.
Response: Thank you for your feedback. We have revised the equations and related content once again to incorporate your suggestions.
Comments : Only 80 official documents were used for evaluation, which is a relatively small number. This limits the statistical robustness of the performance metrics and raises concerns about generalizability. Please provide some illustrations in your conclusion.
Response: Thank you for your feedback. We have incorporated your suggestions and, as in the second revision, noted the limitation of the 80-document dataset in the conclusion section.
Comments : There is no discussion of the statistical significance of improvements over baseline models. Please provide a more detailed error analysis, including a quantitative breakdown of error types. If possible, include statistical tests comparing student, teacher, and baseline models.
Response: Thank you for your feedback. We have incorporated your suggestion and reported statistical significance (p-values) in Table 4.
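For paired comparisons of this kind, one simple option is an exact two-sided sign test on per-document outcomes over the same evaluation set. The sketch below is an illustration of the kind of test a reviewer might expect for the student-versus-baseline comparison; the win/loss counts are hypothetical, and this is not necessarily the procedure used in Table 4.

```python
# Exact two-sided sign test on discordant pairs (ties discarded).
# Illustrative only; the win/loss counts are hypothetical.
from math import comb

def sign_test_p(wins_a, wins_b):
    """Two-sided p-value under H0: P(A beats B) = 0.5 on discordant pairs."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Two-sided tail probability under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# e.g., student beats baseline on 18 documents and loses on 5
p = sign_test_p(18, 5)
print(f"p = {p:.4f}")
```

With only 80 documents, an exact test like this avoids the normal-approximation assumptions that larger-sample tests rely on.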
Comments : The study highlights efficiency through low GPU usage and CPU feasibility, but it does not measure real-time performance regarding processing speed, latency, and user load.
Response: Thank you for your feedback. We have incorporated your suggestions into the conclusion section, outlining the limitations of this study so that future research can address aspects such as processing speed and latency.
Comments : The paper emphasizes on-premise deployment for privacy, but it does not explore the potential risks of internal data leaks or adversarial attacks. The section on security considerations needs to be strengthened, including potential vulnerabilities, data governance, and mitigation strategies.
Response: Thank you for your feedback. We have revised the conclusion to address security threats such as internal data-leakage risks and adversarial attacks, and we will strive to strengthen these aspects in future research.
Comments : The related work is acknowledged, but the novelty of this approach is not clearly distinguished from previous studies on LLM compression and distillation in similar areas, such as healthcare and government. It is important to explicitly highlight how this approach differs from existing knowledge distillation (KD)-based LLM compression methods and to explain why processing public documents represents a unique contribution.
Response: Thank you for your feedback. While this study shares common ground with established domains such as government and healthcare, its distinction lies in being the first application specifically to school administration. This point is stated in the conclusion as the significance of the research.
Reviewer 2 Report
Comments and Suggestions for Authors:
- In my previous comment, I wrote 'CHAT GPT,' but it should actually be 'ChatGPT.' I apologize—my input method automatically changed it. Please correct 'CHAT GPT' to 'ChatGPT.'
- In lines 289, 303, 306, and 311, the “-” looks like a Markdown bullet; please switch it to the symbol you used in line 410.
- The symbol explanations after each formula currently list the symbol, a colon, and then the description. While this is concise, papers usually present such information in running text; it would be better to follow that conventional paragraph style.
Author Response
Comments : In my previous comment, I wrote 'CHAT GPT,' but it should actually be 'ChatGPT.' I apologize—my input method automatically changed it. Please correct 'CHAT GPT' to 'ChatGPT.'
In lines 289, 303, 306, and 311, the “-” looks like a Markdown bullet; please switch it to the symbol you used in line 410.
The symbol explanations after each formula currently list the symbol, a colon, and then the description. While this is concise, papers usually present such information in running text; it would be better to follow that conventional paragraph style.
Response: Thank you sincerely for your feedback. I have made the necessary revisions according to the requested format.