LLM-Enhanced Framework for Building Domain-Specific Lexicon for Urban Power Grid Design
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Authors will need to address the following comments in their revised manuscript:
- TF-IDF algorithm in the abstract should be first shown in complete form and then in abbreviated forms.
- Please add a conceptual figure to the Introduction to show the aim of research in this manuscript.
- I understand that the authors built a domain-specific lexicon comprising 3,426 core seed words and 9,856 synonyms. I expected something at least 10x more than these numbers. Would you please explain how you chose these numbers for core seed words and synonyms?
- Why do you think these numbers of core seed words and synonyms are enough to demonstrate your model's acceptability? I think the authors will need to do some sensitivity analysis on the numbers of core seed words and synonyms.
- All equations will need to be cited and have references.
- Section 3.3 is named “experimental results”. Do you mean testing of the LLM?
- Section 4 should be named “conclusion” rather than “discussion”.
- Main parameters of LLMs are set to some values, how did you choose them?
Author Response
Dear Reviewer,
We sincerely appreciate your thorough review and valuable comments on our study. We have carefully addressed each of your suggestions, and the specific revisions are detailed below for your further consideration.
- Regarding the full form of the TF-IDF algorithm
Comments 1: TF-IDF algorithm in the abstract should be first shown in complete form and then in abbreviated forms.
Response 1: We have added the full term "Term Frequency-Inverse Document Frequency (TF-IDF)" upon its first mention in the abstract and cited the foundational work by Salton et al. in the main text.
Location: Line 22 in the abstract and Line 203 in the main text.
- Addition of a conceptual figure in the Introduction
Comments 2: Please add a conceptual figure to the Introduction to show the aim of research in this manuscript.
Response 2: A conceptual diagram illustrating the research framework has been added to the Introduction.
Location: Line 136 and the description above Figure 1.
- Lexicon size expected to be at least 10 times larger
Comments 3: I understand that the authors built a domain-specific lexicon comprising 3,426 core seed words and 9,856 synonyms. I expected something at least 10x more than these numbers. Would you please explain how you chose these numbers for core seed words and synonyms?
Response 3: The scope of this study is limited to the power grid design domain, not the entire power network. The corpus is derived from design standards and documents, referencing the current Chinese industry standard DL/T 1033.10—2016 Electric Power Vocabulary—Part 10: Power Equipment, which includes 1,086 terms after revision. We believe that 3,426 seed words are sufficient to cover the vast majority of domain-specific vocabulary in power grid design.
- Rationale for the number of seed words and synonyms
Comments 4: Why do you think these numbers of core seed words and synonyms are enough to demonstrate your model's acceptability? I think the authors will need to do some sensitivity analysis on the numbers of core seed words and synonyms.
Response 4: A sensitivity analysis has been conducted to demonstrate the impact of different TF-IDF thresholds on the quantity and quality of seed words.
Location: Section 3.5 "Analysis of Dictionary Effectiveness and Sensitivity" or Line 397.
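To illustrate the kind of threshold sweep that Response 4 describes, the following is a minimal sketch only. It assumes a plain textbook TF-IDF with a smoothed IDF rather than the optimized variant used in the manuscript, and the function names `tfidf_scores` and `seed_counts_by_threshold` are hypothetical. It shows how the number of retained seed words shrinks as the TF-IDF threshold rises.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return each term's maximum TF-IDF score over a list of tokenized documents.

    Assumption: textbook TF-IDF with idf = ln(N / df) + 1, not the
    manuscript's optimized variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            best[term] = max(best.get(term, 0.0), tf * idf)
    return best


def seed_counts_by_threshold(docs, thresholds):
    """Count how many candidate seed words survive each TF-IDF threshold."""
    scores = tfidf_scores(docs)
    return {t: sum(1 for s in scores.values() if s >= t) for t in thresholds}
```

Calling `seed_counts_by_threshold` with a list of tokenized documents and a list of thresholds gives one count per threshold. A full sensitivity analysis would pair each count with a quality measure such as expert-judged precision, which this sketch does not attempt.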
- Missing citations for equations
Comments 5: All equations will need to be cited and have references.
Response 5: Mutual information is now cited to Elements of Information Theory. Adjacency entropy is cited to the work of Huang et al.
Location: Line 233 and Line 244 (References 20–21).
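For readers unfamiliar with the two measures named in Response 5, the sketch below shows one common way to compute them for term discovery. It is an illustration under stated assumptions, not the manuscript's equations: mutual information is taken in its pointwise form over adjacent token pairs, adjacency entropy is the Shannon entropy of a word's left or right neighbors, and the function names are hypothetical.

```python
import math
from collections import Counter


def pointwise_mutual_information(tokens, left, right):
    """PMI (in bits) of the adjacent pair (left, right) in a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(left, right)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_pair = pair_count / (len(tokens) - 1)
    p_left = unigrams[left] / len(tokens)
    p_right = unigrams[right] / len(tokens)
    return math.log2(p_pair / (p_left * p_right))


def adjacency_entropy(tokens, word, side="right"):
    """Shannon entropy (in bits) of the tokens adjacent to `word` on one side."""
    offset = 1 if side == "right" else -1
    neighbors = Counter(
        tokens[i + offset]
        for i, tok in enumerate(tokens)
        if tok == word and 0 <= i + offset < len(tokens)
    )
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())
```

A high PMI suggests that two tokens stick together more often than chance would predict, while high adjacency entropy on both sides suggests that a candidate appears in varied contexts and is therefore likely to be a complete term.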
- Adjustment of section headings
Comments 6: Section 3.3 is named “experimental results”. Do you mean testing of the LLM? / Section 4 should be named “conclusion” rather than “discussion”.
Response 6: The original heading "3.3 Experimental Results" has been revised to "3.3 Results of Synonym Mining and Self-Correction." The original heading "4 Discussion" has been revised to "4 Conclusion."
Location: Line 346 and Line 467.
- Basis for LLM parameter selection
Comments 7: Main parameters of LLMs are set to some values, how did you choose them?
Response 7: The rationale for each parameter setting has been elaborated.
Location: A new paragraph added in Section 3.2 "Parameter Settings." (Lines 335-345)
Thank you once again for your insightful feedback. We hope these revisions meet your expectations.
Best regards,
All Authors
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you for this interesting contribution.
The paper presents a step-by-step procedure to build specialized vocabulary for urban power grid design which solves the issues stemming from inconsistent language usage and terminology inconsistencies.
The methodology includes three steps for data preprocessing as well as seed vocabulary extraction through an optimized TF-IDF algorithm followed by synonym mining that uses Large Language Models (LLMs).
A two-way filtering approach functions to improve the vocabulary by including proper engineering words yet eliminating unnecessary terms of different fields. The developed lexicon has been created to improve intelligent design systems which focus on smart urban power grids.
Some suggestions are:
- This research demonstrates a new method which combines LLMs with domain-specific lexicon development though similar methods exist in different research contexts.
- The paper is clear, but improved English expression together with added visual elements would increase its comprehensibility.
- The manuscript will benefit from a wider diversity of data sources including actual engineering case studies with dynamically updated design documents which would improve both real-time adaptability and lexical relevance of the proposed system.
- The clarity of content will increase when writers explain technical jargon together with methodology details particularly about LLMs in synonym mining to readers who do not understand this domain.
- Visual elements such as graphs and tables should be incorporated into the presentation to better explain the obtained results.
- Besides examining lightweight model deployment methods the proposed approach should tackle its present constraints regarding specific LLM architectures through performance enhancements.
- Additional semantic depth in synonym verification would arise when incorporating power-domain knowledge graphs into the process which enables a better parsing of term relationships.
Basically, the research targets an important void in the development of specialized terminology sets for city power distribution networks because such tools are vital for creating intelligent design systems. It combines automated evaluation with expert validation to maintain the reliability of its obtained results. In my view, this research holds high relevance for power grid design practitioners who also work with natural language processing because it provides practical uses. It delivers essential solutions and beneficial insights about domain-specific lexicon construction which generates substantial progress in the field.
I hope that these comments may help the authors increase the quality of the paper.
Kindest regards
Comments on the Quality of English Language
While the document uses acceptable English at most points, there are specific sections where clarification would increase both the clarity and the precision of the text.
Some parts of the text consist of complex sentences that should be simplified to enhance readability.
The document includes technical terminology which may create reading difficulties for audiences outside the field because no explanation has been provided.
The research material would become easier to understand for diverse audiences if authors clarified these specific sections through context or definitions.
In summary, although the English language does not create major comprehension issues the research benefits from additional refinement of chosen aspects.
Author Response
Dear Reviewer,
We sincerely appreciate your thorough review and valuable comments on our study. We have carefully addressed each of your suggestions, and the specific revisions are detailed below for your further consideration.
- Enhancements to Figures and Visualization
Comments 1: This research demonstrates a new method which combines LLMs with domain-specific lexicon development though similar methods exist in different research contexts. / Visual elements such as graphs and tables should be incorporated into the presentation to better explain the obtained results.
Response 1:
A conceptual diagram of the research framework has been added.
Visual representations of the mining algorithm have been introduced.
The visualization of mining results has been optimized for clarity.
Location: Conceptual diagram (Line 136), mining algorithm visualization (Line 313), and optimized mining results (Lines 350 and 356).
- Supplementation of Data Sources
Comments 2: The manuscript will benefit from a wider diversity of data sources including actual engineering case studies with dynamically updated design documents which would improve both real-time adaptability and lexical relevance of the proposed system.
Response 2: Additional design documents and real-world engineering cases have been incorporated into the corpus, with explanations added in Section 2.1.
Location: Lines 169–171.
- Clarification of Technical Terminology
Comments 3: The clarity of content will increase when writers explain technical jargon together with methodology details particularly about LLMs in synonym mining to readers who do not understand this domain.
Response 3: A brief explanation of large language models (LLMs) and prompt engineering methods has been added in Section 2.4.
Location: Lines 276–282.
- Enhance semantic validation through knowledge graphs
Comments 4: Additional semantic depth in synonym verification would arise when incorporating power-domain knowledge graphs into the process which enables a better parsing of term relationships.
Response 4: Your suggestion to enhance semantic validation through knowledge graphs is highly valuable. In the current study, constructing a high-precision domain lexicon serves as a critical prerequisite for knowledge graph development. Therefore, this research focuses on addressing fundamental challenges in term extraction and synonym expansion, laying the data foundation for subsequent entity-relation mining in knowledge graphs. In future work, we plan to build a power design knowledge graph based on this lexicon, as outlined in the revised Section 4.3 ("Limitations and Future Directions").
Thank you once again for your insightful feedback. We hope these revisions meet your expectations.
Best regards,
All Authors
Reviewer 3 Report
Comments and Suggestions for Authors
This study proposes a unique and authentic framework for building a domain-specific lexicon for urban power grid design based on LLM. It is aimed at improving the accuracy and practicality of the lexicon through multi-level term extraction and synonym expansion.
To further improve the article, please pay attention to the following points:
- This kind of research would require a well-defined hypothesis rather than only presenting the research problems.
- The study could give a strong contribution to the field, specifically in the integration of big data in power system design with AI. But, the connection between the field of power system design and the proposed approach seems vague to me.
- Also, the real-world application of the research (such as a case study in urban power grid design) should be clearly presented to show how traditional methods are improved using this approach.
- The method is rigorous and appropriate for the research question. But, the methods should be described in more detail to allow replication by other researchers.
- The results should be validated to avoid overgeneralization or unsupported claims.
- Whenever possible please use universal terms and symbols.
Author Response
Dear Reviewer,
We sincerely appreciate your thorough review and valuable comments on our study. We have carefully addressed each of your suggestions, and the specific revisions are detailed below for your further consideration.
- Clarification of Research Hypothesis
Comments 1: This kind of research would require a well-defined hypothesis rather than only presenting the research problems.
Response 1: A paragraph has been added at the end of the Introduction to explicitly state the research hypothesis: "This study hypothesizes that LLM-based term extraction and synonym expansion can significantly improve the accuracy and coverage of lexicons in the power grid design domain."
Location: Last paragraph of the Introduction (Lines 141–144).
- Strengthening the Connection to Practical Applications
Comments 2: The study could give a strong contribution to the field, specifically in the integration of big data in power system design with AI. But, the connection between the field of power system design and the proposed approach seems vague to me. / Also, the real-world application of the research (such as a case study in urban power grid design) should be clearly presented to show how traditional methods are improved using this approach.
Response 2: We have expanded the discussion on the practical applications of this research, including automated knowledge graph building, intelligent Q&A systems, and intelligent design assistance systems.
Location: Section 4.2 "Main Conclusions" (Lines 487–493).
- Elaboration of Methodological Details
Comments 3: The method is rigorous and appropriate for the research question. But, the methods should be described in more detail to allow replication by other researchers.
Response 3: A flowchart of the synonym mining algorithm has been added. Pseudocode and detailed prompt engineering content have been provided to ensure reproducibility.
Location: Lines 312–318.
- Enhanced Validation of Results
Comments 4: The results should be validated to avoid overgeneralization or unsupported claims.
Response 4: Additional experiments on validity and sensitivity analysis have been conducted to demonstrate the impact of lexicon loading and varying seed word quantities on text classification results.
Location: Section 3.5 "Lexicon Validity and Sensitivity Analysis" (Line 395).
- Consistency of terminology
Comments 5: Whenever possible please use universal terms and symbols.
Response 5: We have checked the consistency of terminology throughout the entire text.
Thank you once again for your insightful feedback. We hope these revisions meet your expectations.
Best regards,
All Authors