Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The research mentions that this study is built on a previous study; it is not clear if the applied methodology is the same. If so, then the research contribution is confined to generating additional relations. This situation should be clearly highlighted to understand the research contribution and focus.
Figure 1 shows the workflow. It needs to be clarified and discussed more, as the illustration is very vague.
Previous research suggested lexicon enrichment with minimum resource usage; the current research relies on resources. Discussing this point could clarify the research's contribution.
The algorithm is well stated. The exit point should be highlighted.
The research aim is to extract text relations such as synonymy and holonymy. The importance of extracting these relations should be discussed, as well as how this supports text extraction and analytics applications. Examples should be included.
A sample (part) of the generated graphs should be illustrated in the paper text as an example.
The results are moderate. An explanation is required.
The results are compared with those of WordNet to highlight the advancement. However, comparing with the literature is also required.
It is not clear if the research focuses on a specific field or the Japanese language in general.
The references list needs to include more recent sources. I recommend including more scientific references than online resources.
Author Response
Comment 1:
The research mentions that this study is built on a previous study; it is not clear if the applied methodology is the same. If so, then the research contribution is confined to generating additional relations. This situation should be clearly highlighted to understand the research contribution and focus.
Response: Thank you for this important point. We acknowledge that the distinction between our previous work and the current study's contribution needs to be more explicit. We have revised the Introduction (Section 1) and the beginning of the Methods section (Section 2) to clearly delineate the contributions of this study. We will explicitly state that while this work builds on the initial concept, it introduces a novel, more complex methodology for constructing and analyzing a multi-relational network, which is the core contribution of this paper.
Comment 2:
Figure 1 shows the workflow. It needs to be clarified and discussed more, as the illustration is very vague.
Response: We agree that the workflow depicted in Figure 1 would benefit from a more detailed explanation in the text.
Action Taken: We have substantially revised the first part of Section 2 (Methods) to provide a step-by-step walkthrough of the process schema shown in Figure 1. We will now explicitly describe each stage of the pipeline—from initializing the graph (Step 1) and selecting a lexeme (Step 2) to the nested loop that generates prompts for each relation type, updates the graph with new nodes and edges, and handles errors (Step 10). This detailed narrative will directly reference the components of Figure 1, making the entire workflow more transparent and easier to follow.
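As a rough illustration of the loop described above (not the authors' actual implementation), the pipeline could be sketched in Python as follows. The relation list, the prompt template, and the names `build_prompt`, `build_graph`, and `fake_llm` are hypothetical; `fake_llm` is a canned stand-in for a real model call, and the graph is held in plain Python containers to keep the sketch dependency-free.

```python
from collections import defaultdict

# Illustrative relation types only; the paper's exact relation set is an assumption here.
RELATION_TYPES = ["synonym", "hypernym", "hyponym", "meronym", "holonym"]


def build_prompt(lexeme, relation):
    """Hypothetical prompt template: one prompt per (lexeme, relation) pair."""
    return f"List Japanese nouns that are a {relation} of '{lexeme}'."


def build_graph(seed_lexemes, query_llm):
    """Process each seed lexeme once; the loop ends when the seed list is exhausted."""
    nodes = set()
    edges = defaultdict(set)  # (source, target) -> relation labels of target w.r.t. source
    errors = []               # (lexeme, relation, message) for failed model calls
    for lexeme in seed_lexemes:
        nodes.add(lexeme)
        for relation in RELATION_TYPES:
            try:
                targets = query_llm(build_prompt(lexeme, relation))
            except Exception as exc:
                # Record the failure and move on to the next relation type.
                errors.append((lexeme, relation, str(exc)))
                continue
            for target in targets:
                nodes.add(target)
                edges[(lexeme, target)].add(relation)
    return nodes, dict(edges), errors


def fake_llm(prompt):
    """Stand-in for a real model call; returns canned answers or simulates a failure."""
    if "hypernym" in prompt and "車" in prompt:
        return ["乗り物"]
    if "meronym" in prompt and "車" in prompt:
        return ["エンジン"]
    if "holonym" in prompt:
        raise RuntimeError("simulated API failure")
    return []


nodes, edges, errors = build_graph(["車", "子供"], fake_llm)
```

Storing a set of labels per node pair keeps multiple relations between the same two words, rather than forcing a single label per edge.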
Comment 3:
Previous research suggested lexicon enrichment with minimum resource usage; the current research relies on resources. Discussing this point could clarify the research's contribution.
Response: Thank you for raising this interesting point about resource dependency. We believe a clarification of what "minimum resource" means in this context will be beneficial.
We will add a paragraph in the Introduction (Section 1) and expand the Discussion (Section 4) to contextualize our approach. We will clarify that our method, while computationally intensive, significantly reduces the reliance on manual expert annotation and the development of language-specific annotated corpora, which is a major bottleneck in lexicography.
Comment 4:
The algorithm is well stated. The exit point should be highlighted.
Response: We appreciate the positive feedback on the algorithm's description. We have revised the algorithm description in Section 2.2. Specifically, in Step 13, we have added a sentence to clarify that the iterative process terminates once every lexeme in the initial seed list (selected from the VDRJ dataset) has been processed.
Comment 5:
The research aim is to extract text relations such as synonymy and holonymy. The importance of extracting these relations should be discussed, as well as how this supports text extraction and analytics applications. Examples should be included.
Response: This is an excellent suggestion. A stronger motivation and practical examples will undoubtedly improve the paper.
Action Taken: We have expanded the Introduction to include a more detailed discussion on the importance of these specific semantic relations for NLP applications. We will include concrete examples, such as:
- How hypernymy/hyponymy is crucial for semantic search (a search for "vehicle" should also retrieve documents about "cars" and "trucks") and for text categorization.
- How meronymy/holonymy is essential for question-answering systems (to understand that if "the engine is broken," then "the car is broken").
- How a rich network of relations improves word sense disambiguation by providing contextual clues based on related concepts.
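The semantic-search example can be made concrete with a small Python sketch. It assumes a toy hyponym table and naive substring matching, both chosen for illustration only; `expand_query`, `search`, `HYPONYMS`, and `DOCS` are hypothetical names, and a real system would match on tokens rather than substrings.

```python
from collections import deque


def expand_query(term, hyponyms):
    """Return the term followed by all of its transitive hyponyms (breadth-first)."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in hyponyms.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def search(term, documents, hyponyms):
    """Keep documents that mention the term or any of its hyponyms."""
    terms = expand_query(term, hyponyms)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Toy data, invented for illustration.
HYPONYMS = {"vehicle": ["car", "truck"], "car": ["taxi"]}
DOCS = ["The truck was late.", "A taxi arrived.", "The weather was fine."]

hits = search("vehicle", DOCS, HYPONYMS)
```

Here a query for "vehicle" retrieves the documents about the truck and the taxi even though neither contains the word "vehicle".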
Comment 6:
A sample (part) of the generated graphs should be illustrated in the paper text as an example.
Response: We agree completely. A visual example would make the structure of our network much more tangible. While Figures 3 and 4 show the visualization tool, a smaller, clearer schematic in the main text is needed.
We will add a new figure in Section 2.4 (Graph Construction). This figure will illustrate a small, localized subgraph centered around a common Japanese noun, such as child - kodomo, similar to the representation in the Illustration. This will provide a clear, concrete example of the multi-relational data we are generating.
Comment 7:
The results are moderate. An explanation is required.
Response: Thank you for this critical observation. The "moderate" alignment scores with WordNet (37.96% exact, 58.70% soft) are indeed a key finding that warrants a more in-depth explanation. Our intention is to frame this not as a weakness, but as an insightful result highlighting the complementary nature of our LLM-generated network.
The discrepancy arises because our network is not intended to be a perfect replication of WordNet. Instead, it captures a different, more dynamic, and context-sensitive view of the lexicon. The reasons for the moderate match include:
- Different Lexical Coverage: WordNet may lack modern or domain-specific terms that are present in the LLM's vast training data.
- Semantic Nuance: The LLM often captures broader, associative relationships that do not fit into WordNet’s strict ontological categories. The significant jump from exact to soft matching (from 38% to 59%) supports this, as it shows that while the specific relation may differ, a valid semantic connection often exists.
- Inherent Linguistic Ambiguity: As we discuss in Section 4.2, many word pairs can have multiple valid relationships depending on the context, making a single one-to-one match difficult.
We have expanded the Discussion (Section 4) with a new subsection titled "Interpreting the Alignment with WordNet: A Complementary Resource." In this section, we directly address the "moderate" results and elaborate on the points above, arguing that the differences represent the unique strengths of an LLM-based approach in capturing the fluid, multifaceted nature of lexical semantics.
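To make the gap between exact and soft matching easier to see, here is a minimal Python sketch under one possible reading of the two metrics: an exact match requires the same word pair with the same relation label in the reference, while a soft match only requires that the pair is linked by some relation. These definitions, the function `alignment_rates`, and the toy triples are assumptions for illustration; the manuscript's actual matching procedure may differ.

```python
def alignment_rates(generated, reference):
    """Return (exact_rate, soft_rate) for sets of (source, target, relation) triples."""
    if not generated:
        raise ValueError("generated must contain at least one triple")
    ref_pairs = {(s, t) for s, t, _ in reference}
    # Exact: same pair and same relation label appear in the reference.
    exact = sum(1 for triple in generated if triple in reference)
    # Soft: the pair is linked in the reference, whatever the label.
    soft = sum(1 for s, t, _ in generated if (s, t) in ref_pairs)
    return exact / len(generated), soft / len(generated)


# Toy triples, invented for illustration.
generated = {
    ("car", "vehicle", "hypernym"),
    ("car", "engine", "meronym"),
    ("car", "automobile", "hypernym"),  # reference labels this pair as synonym
    ("car", "road", "meronym"),         # pair absent from the reference
}
reference = {
    ("car", "vehicle", "hypernym"),
    ("car", "engine", "meronym"),
    ("car", "automobile", "synonym"),
}

exact_rate, soft_rate = alignment_rates(generated, reference)
```

In this toy case the soft rate exceeds the exact rate because one pair is connected in both resources but carries a different label.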
Comment 8:
The results are compared with those of WordNet to highlight the advancement. However, comparing with the literature is also required.
Response: This is a very important and valid point. Situating our work within the broader context of recent research is crucial.
Action Taken: We will perform a thorough review of recent literature on LLM-based construction of lexical networks and knowledge graphs. We will add a new subsection to the Discussion (Section 4) that compares our methodology, scale, and findings with other relevant studies published in the last few years. This comparison will address aspects like the choice of LLM, the types of relations extracted, evaluation metrics, and the scale of the resulting networks, thereby better highlighting our study's contribution to the current state-of-the-art.
Comment 9:
It is not clear if the research focuses on a specific field or the Japanese language in general.
Response: Thank you for pointing out this ambiguity. The focus of our research is the general Japanese language. The initial seed vocabulary was chosen from the Vocabulary Database for Reading Japanese (VDRJ), which covers general nouns across the JLPT proficiency levels (N1-N4), and the GPT-4o model is itself a general-purpose model.
We have revised the Abstract and Introduction to explicitly state that our aim is to construct a large-scale lexical network for the general Japanese language. We will also reiterate this in Section 2.1 (Data Sources and Preprocessing) to make the scope of our work clear from the outset.
Comment 10:
The references list needs to include more recent sources. I recommend including more scientific references than online resources.
Response: We agree that a strong scholarly foundation requires up-to-date, peer-reviewed references.
In conjunction with our response to comment 8, we have conducted an updated literature search and will incorporate several more recent (2023-2025) scientific articles on large language models, knowledge graph construction, and computational lexicography. We will also review our existing references. While citations to online resources are sometimes necessary for datasets, software libraries (e.g., NetworkX), and API documentation to ensure reproducibility, we will ensure that all core theoretical and comparative claims are supported by peer-reviewed academic publications. The reference list will be significantly updated in the revised manuscript.
Once again, we thank the reviewer for their thorough and valuable feedback. We are confident that by addressing these points, we will significantly improve the quality, clarity, and impact of our manuscript. We look forward to submitting a revised version that reflects these changes.
Sincerely,
Doc. Benedikt Perak, PhD and doc. Dragana Špica, PhD
Reviewer 2 Report
Comments and Suggestions for Authors
Comments:
- The approach is clearly novel and makes a meaningful contribution by enriching lexical networks with multiple relation types. It would help if you emphasized more clearly how your method advances beyond existing automated approaches.
- The pipeline is thoughtfully designed, but the lack of strict relation exclusivity introduces semantic noise. Consider adding stronger validation or a hybrid human–machine evaluation step to improve accuracy.
- Some sections, particularly the discussion on multiple relations (Section 4.4), are overly detailed. Streamlining these parts would make the paper more readable. Adding a short related-work comparison with systems like LexicoMatic and LexAN would also strengthen the positioning of your contribution.
- The network expansion results are impressive, but the relatively low alignment with WordNet (37.9%) suggests room for improvement. Including additional baselines or benchmarks would make the advantages of your method clearer.
- The release of datasets and the visualization tool is an excellent step toward practical impact. To go further, consider illustrating specific downstream applications, for example, integration into Japanese language learning platforms or NLP systems.
- The paper acknowledges limitations such as multiple relation assignments, but this could be expanded with clearer strategies for resolving them. Future directions like cross-lingual expansion, better disambiguation methods, and integration with resources such as Wikidata or BabelNet deserve more attention.
- The evaluation currently relies heavily on automated comparison with WordNet. Adding human evaluation, such as inter-annotator agreement studies, would provide stronger evidence of reliability.
- While related work is cited, a more direct comparison with existing automated systems would help readers better understand the relative novelty and performance of your approach.
- The discussion of applications is promising but remains abstract. Including a concrete case study or prototype would greatly enhance the impact and demonstrate real-world relevance.
Author Response
We are very grateful to the reviewer for their thorough and highly constructive feedback. The comments have provided us with a clear roadmap for substantially improving the paper's rigor, positioning, and impact. We agree with the points raised and have outlined a plan below to address each one in our revision.
Grouped Response 1: Positioning, Novelty, and Related Work (Comments 1, 3, 8)
Reviewer's Comments: Emphasize novelty beyond existing automated approaches. Add a short related-work comparison with systems like LexicoMatic and LexAN. A more direct comparison would help readers better understand the novelty.
Response: Thank you for this crucial feedback. We agree that a more direct comparison with specific state-of-the-art systems is essential for clearly positioning our contribution.
We have already created a new subsection titled "Comparison with Recent Literature and State-of-the-Art". We will significantly enhance this section by adding a direct comparative analysis against LexicoMatic and LexAN. This comparison will highlight the key differentiators of our approach, including:
- Our LLM-prompting approach versus their methodologies (e.g., seed-word-based translation for LexicoMatic, graph-based methods for LexAN).
- The scale of our general Japanese network versus the resources they produce.
- Our focus on extracting a broad set of six hierarchical, compositional, and associative relations simultaneously.
This will more clearly articulate the novelty of our work as requested.
Grouped Response 2: Evaluation, Accuracy, and Semantic Noise (Comments 2, 4, 7)
Reviewer's Comments: Lack of strict relation exclusivity introduces semantic noise; consider validation or a hybrid step. Low alignment with WordNet suggests room for improvement; include baselines. Evaluation relies heavily on automated comparison; add human evaluation.
Response: This is an excellent and very important set of points regarding the rigor of our evaluation. We agree that the lack of relation exclusivity is a key finding that needs careful handling and that human evaluation is the gold standard for validation.
We have already written a new section ("Interpreting the Alignment with WordNet") that reframes the moderate alignment as a feature, not a bug, highlighting the complementary and more dynamic nature of our LLM-based network. To address the suggestion for baselines, we will add a statement explaining that establishing direct baselines for a large-scale, multi-relational Japanese graph is challenging due to the lack of comparable public resources, but we will acknowledge this as a limitation and a direction for future benchmark development.
Furthermore, we fully agree that a human evaluation study (e.g., measuring inter-annotator agreement on a subset of the relations) would be the definitive way to assess the quality and "semantic noise" of the network. We have done a preliminary evaluation, as mentioned in our previous work. A study of this kind is a significant undertaking that we believe warrants a follow-up paper. Therefore, for this manuscript, we will add a detailed discussion in our "Limitations and Future Work" section. We will explicitly state that the lack of human evaluation is a current limitation and will outline a clear methodology for how such a study could be conducted in the future to validate the relations more rigorously.
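One common way to quantify inter-annotator agreement in such a study is Cohen's kappa for two annotators. The Python sketch below is only an illustration of that statistic; the choice of kappa, the function `cohens_kappa`, and the "valid"/"invalid" labels are assumptions, not a description of a protocol the authors have committed to.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy judgments on six generated relations, invented for illustration.
annotator_a = ["valid", "valid", "invalid", "valid", "invalid", "valid"]
annotator_b = ["valid", "invalid", "invalid", "valid", "invalid", "valid"]

kappa = cohens_kappa(annotator_a, annotator_b)
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual alternative.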
Grouped Response 3: Applications and Impact (Comments 5, 9)
Reviewer's Comments: Illustrate specific downstream applications (e.g., in language learning). The discussion of applications is promising but abstract; a concrete case study or prototype would enhance impact.
Response: We agree that making the applications more concrete would significantly strengthen the paper's impact. While developing a full prototype is beyond the scope of this methodological paper, we can provide a much more detailed and tangible "use case."
We have added a new subsection to the Discussion titled "A Case Study: Enhancing Japanese Language Education." In this section, we move beyond abstract claims and provide a detailed, concrete description of how our network could be integrated into a language learning platform. We will outline specific examples of exercises that become possible, such as:
- Dynamic quizzes that ask students to identify hypernyms ("Which of these is a type of 乗り物 (vehicle)?").
- Contextual vocabulary building based on meronyms ("A 車 (car) has an エンジン (engine). What does a 自転車 (bicycle) have?").
- Nuanced synonym exploration tools that use the synonymity_strength score to explain subtle differences between words.
This detailed walkthrough will serve as a powerful, concrete illustration of the network's real-world relevance.
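As an illustration of the first exercise type, a hypernym quiz item could be generated from the network roughly as follows. This Python sketch assumes a toy hyponym table and noun list; `hypernym_quiz`, `HYPONYMS_OF`, and `NOUNS` are hypothetical names and do not describe an existing tool.

```python
import random


def hypernym_quiz(hypernym, hyponyms_of, all_nouns, n_options=3, seed=0):
    """Build one multiple-choice item with exactly one correct hyponym."""
    rng = random.Random(seed)  # seeded so the item is reproducible
    correct_pool = sorted(hyponyms_of[hypernym])
    # Distractors: nouns that are neither hyponyms of the target nor the target itself.
    distractor_pool = sorted(set(all_nouns) - set(correct_pool) - {hypernym})
    answer = rng.choice(correct_pool)
    options = rng.sample(distractor_pool, n_options - 1) + [answer]
    rng.shuffle(options)
    return {
        "question": f"Which of these is a type of {hypernym}?",
        "options": options,
        "answer": answer,
    }


# Toy data, invented for illustration.
HYPONYMS_OF = {"乗り物": {"車", "自転車"}}
NOUNS = ["車", "自転車", "乗り物", "エンジン", "子供", "学校"]

quiz = hypernym_quiz("乗り物", HYPONYMS_OF, NOUNS)
```

Drawing distractors only from nouns outside the hyponym set ensures that each item has a single correct answer.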
Grouped Response 4: Limitations and Future Directions (Comment 6)
Reviewer's Comments: Expand on limitations and strategies for resolving them. Future directions like cross-lingual expansion and integration with Wikidata or BabelNet deserve more attention.
Response: Thank you for these excellent and forward-looking suggestions. They provide a clear path for extending this research.
We have substantially expanded our "Limitations and Future Work" section. We discuss the need for human evaluation (as mentioned above) and also explicitly detail the promising future directions you suggested. This will include dedicated paragraphs on:
- Cross-Lingual Expansion: Applying this scalable methodology to other languages.
- Knowledge Base Integration: A strategy for aligning and enriching our network by linking it to large-scale, multilingual knowledge bases like Wikidata and BabelNet.
- Advanced Disambiguation: Methods for resolving the multiple-relation issue, potentially by developing a context-aware model that can prioritize the most likely relation between two words in a given sentence.
Grouped Response 5: Readability (Comment 3)
Reviewer's Comment: Some sections, particularly the discussion on multiple relations (Section 4.4), are overly detailed. Streamlining these parts would make the paper more readable.
Response: We appreciate this editorial feedback. It is easy to get lost in the details, and a more concise presentation will improve the reader's experience. We have reviewed Section 4.4 (our current section on multiple relations), focusing on the core message: that multiple relations are an inherent feature of language that our model captures, and this has implications for NLP. We will move some of the more speculative or redundant examples to an appendix or remove them to ensure the main text is clear, focused, and impactful.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The research could be accepted in its current form.

