Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks

Perak, Benedikt; Špica, Dragana

doi:10.3390/knowledge5040024

Open AccessArticle

Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks

by

Benedikt Perak

^1,*

and

Dragana Špica

^2,*

¹

Faculty of Humanities and Social Sciences, University of Rijeka, 51000 Rijeka, Croatia

²

Faculty of Humanities, Juraj Dobrila University of Pula, 52100 Pula, Croatia

^*

Authors to whom correspondence should be addressed.

Knowledge 2025, 5(4), 24; https://doi.org/10.3390/knowledge5040024

Submission received: 30 August 2025 / Revised: 13 October 2025 / Accepted: 16 October 2025 / Published: 27 October 2025

(This article belongs to the Special Issue Knowledge Management in Learning and Education)

Download

Browse Figures

Review Reports Versions Notes

Abstract

In recent advancements within natural language processing (NLP), lexical networks play a crucial role in representing semantic relationships between words, enhancing applications from word sense disambiguation to educational tools. Traditional methods for constructing lexical networks, however, are resource-intensive, relying heavily on expert lexicographers. Leveraging GPT-4o, a large language model (LLM), our study presents an automated, scalable approach to creating multi-relational Japanese lexical networks for the general Japanese language. This study builds on previous methods of integrating synonyms but extends to other relations such as hyponymy, hypernymy, meronymy, and holonomy. Using a combination of structured prompts and graph-based data storage, the model extracts detailed lexical relationships, which are then systematically validated and encoded. Results reveal a substantial expansion in network size, with over 155,000 nodes and 700,000 edges, enriching Japanese lexical associations with nuanced hierarchical and associative layers. Comparisons with WordNet show substantial alignment in relation types, particularly with soft matching, underscoring the model’s efficacy in reflecting the multifaceted nature of lexical semantics. This work contributes a versatile framework for constructing expansive lexical resources that hold promises for enhancing NLP tasks and educational applications across various languages and domains.

Keywords:

lexical networks; natural language processing; large language models; Japanese lexicon; semantic relations; graph algorithms; hierarchical relations

1. Introduction

In the rapidly evolving field of natural language processing (NLP), lexical networks have emerged as pivotal tools for understanding and manipulating language [1,2,3]. These networks, which represent the semantic and syntactic relationships between words, are crucial for many NLP tasks, such as word sense disambiguation [4,5] or sense labelling [6], and in language education. Despite their importance, the traditional construction of lexical networks has been a labor-intensive process, heavily reliant on expert lexicographers and manual annotation.

Recent advancements in large language models (LLMs), such as GPT-4o, have opened new avenues for automating and scaling the creation of lexical resources. These models, trained on vast amounts of text data, have demonstrated remarkable abilities in generating text, translating languages, and understanding complex instructions [7]. However, their potential for constructing lexical networks remains largely unexplored. This research aims to contribute in that direction.

These networks, which represent semantic and syntactic relationships between words, are crucial for advancing NLP beyond simple keyword matching to a deeper level of semantic understanding. The importance of a multi-relational lexical network lies in its ability to model the world knowledge embedded in language. Each type of semantic relation serves a distinct purpose for text extraction and analytics applications. For instance, Hierarchical Relations (Hypernymy and Hyponymy), i.e., “is-a” relationships, are fundamental for semantic search, query understanding, and knowledge generalization. A search engine equipped with this knowledge can understand that a query for the Japanese term 乗り物 norimono ‘vehicle’ should also retrieve documents mentioning its hyponyms such as 車 kuruma—car, バス basu ‘bus’, and 自転車 jitensha ‘bicycle’. This allows for more intelligent information retrieval and robust text classification. On the other hand, Part–Whole Relations (Meronymy and Holonomy), i.e., “is-part-of” relationships, are essential for applications requiring compositional knowledge, such as question-answering and event extraction. A system that knows an エンジン enjin ‘engine ‘is a meronym of kuruma ‘car’ can correctly infer that “the engine failed” is a statement about a problem with the car. This enables deeper reasoning and more accurate information extraction from text. Finally, Associative Relations (Synonymy and Co-hyponymy) are vital for query expansion and recognizing paraphrasing, ensuring that terms such as 物 mono ‘thing’ and 物品 buppin ‘goods’ are treated as semantically equivalent where appropriate. Co-hyponymy, which links terms sharing a common hypernym (e.g., “car” and “bus”), is critical for word sense disambiguation and building recommendation systems, as the presence of one sibling term can provide strong contextual clues for another. By constructing a comprehensive network that encodes these diverse relationships, we provide a foundational resource that can significantly enhance the performance and sophistication of a wide range of NLP applications.

It is also important to contextualize the term ‘resource’ in the era of large language models. While traditional low-resource NLP has focused on minimizing the dependency on large, annotated corpora, the rise of LLMs introduces a paradigm shift. Our approach, while reliant on a significant computational resource, the GPT-4o model is designed to be “low-resource” in terms of human and language-specific assets. The traditional construction of lexical networks like WordNet requires immense investment in expert lexicographers, manual annotation, and years of development. Our methodology automates this labor-intensive process, significantly reducing the need for this specialized human expertise and pre-existing, manually curated linguistic datasets. This work therefore demonstrates how leveraging a general, albeit large, computational model can drastically lower the barrier to creating rich, large-scale lexical resources, especially for languages where such resources are underdeveloped.

In our previous work, we introduced an innovative approach to developing Japanese lexical networks using LLMs, specifically GPT-4o. By leveraging the extensive synonym and antonym generation capabilities of this model, we established a structured and scalable network, encompassing over 100,000 lexical nodes and over 190,000 edges, i.e., synonyms, antonyms and their mutual senses and mutual domains, across Japanese vocabulary. This foundational study emphasized the effectiveness of combining AI-generated data with graph-based techniques, demonstrating the potential of LLMs to complement traditional lexicographic methods [8]. To evaluate our approach, we referenced traditional Japanese lexical resources, such as Word List by Semantic Principles (WLSP) [9], as well as lists of synonyms from web dictionaries such as Weblio [10] or Goo dictionary [11] which use resources from Japanese WordNet [12,13]. While the Japanese corpus constitutes a modest 0.11% of the total training dataset for GPT-3 [14], this still represents a significant volume of data that can be leveraged for constructing robust lexical networks. Although similar data specifics for GPT-4 are not available, the ongoing development of Japanese LLMs [15,16] suggests a growing potential for these models to enhance lexical resources.

Building on our previous work, the present study extends the initial approach and methodology by incorporating additional semantic relations—such as hyponymy, hypernymy, meronymy, and holonomy—thereby enriching the lexical network with a broader range of relationship types. These relationships offer deeper insights into the structure of Japanese lexicon, facilitating applications that benefit from an intricate understanding of word associations. Our methodology follows the same principles described in the previous research: it employs GPT-4o to iteratively generate and validate complex word relations, followed by their integration into an enriched lexical graph that reflects the hierarchical and associative dimensions of Japanese lexicon.

In this paper we aim to provide enhancements of the methodology for constructing and enriching lexical networks with additional lexemes and multifaceted semantic relationships as well as to present a viable quantitative comparison of the results with existing lexical resources, such as Japanese WordNet. Our previous study established the foundational methodology for using an LLM to build a single-relation (synonymy/antonymy) lexical graph for Japanese. The current research represents a significant expansion in both methodology and scope. The primary contribution, besides the generation of additional relations, includes the development and validation of a multi-relational framework. This required substantial methodological advancements, including:

Designing and testing novel structured prompts for more complex hierarchical (hypernymy, hyponymy) and part–whole (meronymy, holonomy) relations, which are fundamentally different from associative relations, such as synonymy.
Enhancing the graph construction algorithm to handle a multi-layered, directed graph where nodes can have multiple, distinct relationship types simultaneously. This includes managing inverse relations (e.g., hyponym is the inverse of hypernym) automatically.
A new and more complex evaluation methodology, including the comparative analysis between different relation types the soft-matching approach, and the confusion matrix analysis against WordNet, which were not part of the previous work.

Much like a thesaurus, our resource can be utilized for exploring synonyms and semantically related concepts. Although we did not utilize the ontological structures from resources such as WLSP, they may serve as important comparative frameworks for assessing the adaptability of our system. Our results illustrate the potential for this framework to extend across languages and domains in NLP and support applications in language education through advanced semantic representations. This aligns with the findings of research [17], which employed ChatGPT 4 to predict semantic categories of basic vocabulary nouns from the Japanese Learners’ Word List, available online: https://jhlee.sakura.ne.jp/JEV/ (accessed on 31 October 2024) [18].

Building upon our initial findings, which demonstrated that nearly 20% of extracted nouns are (near) synonyms while others exhibit various relation types such as hyponymy, hypernymy, and meronymy, we have developed an innovative approach that leverages these insights. This novel methodology involves a systematic pipeline that begins with lexical input, processes through LLM output, and dynamically stores the results within a graph structure for an efficient extraction of lexical relations. A key advancement in our approach combines the utilization of OpenAI’s structured output capabilities, which ensures that the LLM outputs are fully structured and ready for integration into lexical networks, with graph structured processing of lexical attributes.

Our key contributions include:

Development of algorithms for efficient graph extraction and storage.
Creation of a comprehensive JSON dataset containing the extracted lexical edges, i.e., relations between two lexemes (nodes in the graph).
Comparison with the existing lexical relation resources, i.e., Japanese WordNet.

The updated lexical network, along with the scripts used for its construction, is available in our project repository [19] and represented as the web app at https://jln.syntagent.com/ (accessed on 15 October 2025). Researchers are encouraged to utilize and build upon this resource for further exploration in computational linguistics. This study underscores the synergy between AI-driven data generation and traditional lexicographic expertise, providing a scalable and adaptable framework for constructing comprehensive and nuanced lexical networks across diverse linguistic applications.

2. Methods

The methodology for constructing the Japanese lexical network is a systematic and automated pipeline designed to iteratively expand an initial lexical graph by leveraging the generative capabilities of a Large Language Model (LLM). This process, visualized in Figure 1, ensures a structured and scalable approach to data extraction and network enrichment. The workflow begins by loading an initial lexical dataset and, if available, a pre-existing graph file (Initial Graph (G_initial)). The process then proceeds as follows:

Step 1: Initialize Graph: The system first initializes the graph structure, either by creating a new graph object or by loading a previously saved version. This allows the process to be paused and resumed, ensuring continuity and preserving previously extracted data.

Step 2: Select Lexeme for Processing: The core of the methodology is a nested iterative process. The outer loop begins by selecting a source lexeme from the initial dataset for processing.

Inner Loop for Relation Extraction: For each source lexeme, an inner loop cycles through a predefined set of six lexical relation types (SYNONYM, HYPERNYM, HYPONYM, CO-HYPONYM, MERONYM, and HOLONYM), as shown in the LEXICAL RELATIONS box.

First, the system defines the necessary data structures to handle the LLM’s output (Step 3: Define Data Structures).

A structured prompt is then dynamically generated (Define LLM Prompt) for the current lexeme and relation type.

This prompt is sent to LLM, which processes the request to Generate Lexical Relations.

Graph Updating: The LLM’s response, which contains potential target lexemes and associated semantic metadata, is systematically parsed (Extract Lexical Relations). This new information is then used to Update Lexical Graph. This update involves two key actions:

Add Nodes: If a target lexeme from the LLM’s response does not already exist in the graph, a new node is created. To enrich this new node, the system can cross-reference external Sources: Dictionaries & Corpora.

Add Edges: A directed edge representing the specific lexical relationship is added between the source and target nodes.

Error Handling and Data Integrity: To ensure a robust process, the system includes modules for managing empty or invalid LLM responses (Step 10: Handle Empty Responses) and logs all errors and progress for monitoring (Error Handling and LOG: Errors & Progress).

Saving and Iteration: After the inner loop has completed for all relation types for a single source lexeme, the entire graph is saved to a file (Save Graph). This ensures data integrity and prevents data loss. The outer loop then proceeds to the Next Lexeme Iteration until all source lexemes in the initial dataset have been processed.

This modular and iterative design enables the continuous and scalable expansion of the final output: the Expanded Lexical Graph (G_lexical).

While these methods share characteristics with algorithmic workflows, they are better described as a modular system, combining data structures, machine learning models, and graph-based operations.

2.1. Data Sources and Preprocessing

This study utilizes selections of nouns from Vocabulary Database for Reading Japanese (VDRJ), for Japanese language learners [20] as the initial data source. This database includes over 60,000 Japanese words, labelled for their levels, and for this research, we chose only nouns of pre-2010 JLPT levels 1–4. We began by filtering the dataset to obtain specific categories of nouns: 3215 general common nouns, 1609 general common nouns that can function as verbal nouns, 18 general common nouns that may function as nominal adjectives (-na adjectives), and 13 general common nouns that may function as adverbs. In the initial construction of the synonym graph, we extracted 6200 nouns classified as N0. This process resulted in a network comprising 11,055 unique Japanese lexemes. From this set, we selected lexemes categorized at levels 1–4 of the Japanese Language Proficiency Test (JLPT) to construct a new multi-relational lexical graph for the general Japanese language.

2.2. Algorithmic Extraction of Lexical Data Large Using Language Model (LLM)

The system architecture is driven by a custom Python version 3.12 script that interfaces directly with OpenAI’s GPT-4o model, a large language model (LLM) capable of generating extensive lexical relations [21]. The extraction pipeline operates as an algorithm, shown as Algorithm 1, with following features:

Algorithm 1: Lexical Network Expansion Using LLM

Input: Lexical dataset (lexeme, POS, proficiency, register, frequency, …); LLM (e.g., GPT-4o or other); initial graph G_initial.
Output: Expanded graph G_lexical with nodes = lexemes and edges = {synonym, hypernym, hyponym, co-hyponym, meronym, holonym, mutual domain, mutual sense}.
# Step

Load G ← G_initial from storage.
Select the next lexeme x from the processing list.
Define a LexicalRelation Pydantic model and JSON encoder for relation records.
Draft prompts for x per relation type R ∈ {SYN, HYPER, HYPO, COHYPO, MERO, HOLO} with target counts and inference notes.
Include in each prompt the set of already-linked targets for (x, R) to avoid duplicates.
Query the LLM with ⟨x, R⟩ and receive candidate relations and attributes.
Parse the response into validated, JSON-serializable LexicalRelation objects.
For each candidate (x → y, R): skip if an identical edge already exists.
If y ∉ V(G), verify in external dictionary/corpus; add node y with (kanji, kana, gloss, corpus features).
Add edge (x, y, R, attributes).
If R is undirected (SYN, COHYPO), also add (y, x, R).
If R is hierarchical, add the inverse (HYPER ↔ HYPO) with proper attributes.
Enforce uniqueness constraints per (x, y, R) to maintain integrity.
Log errors/warnings; append progress to a checkpoint text file.
Persist G to disk after each lexeme (atomic save).
Repeat from Step 2 until the seed list is exhausted; return G_lexical.

2.3. Output Data Structures

Each filtered lexeme from the VDRJ is fed to the LLM (GPT-4o), which is systematically prompted to generate several types of lexical relations. Through structured prompts, the system elicits synonyms and additional hierarchical relationships, such as hypernyms and hyponyms, as well as part–whole relations like meronyms and holonyms (up to 10 entries per lexeme per relation). This range of relationships ensures a multifaceted representation of each lexeme within the network. To ensure consistency across the model’s outputs, each lexical relationship is parsed into a predefined format using Pydantic models [22].

This structured output format, exported as JSON, includes fields such as:

synonymity_strength: A numerical value indicating the strength of the synonym relationship between the source word and the target word, on a scale from 0 to 1. A value closer to 1 means the words are very similar or nearly identical in meaning, while a value closer to 0 indicates weaker synonymy.
relation_type: Specifies the type of lexical relationship between the source and target word. Possible values include:
SYNONYM: Words that have the same or very similar meaning.
HYPERNYM: A broader term that encompasses the source term (e.g., “vehicle” is a hypernym for “car”).
HYPONYM: A narrower term that is an instance of a broader term (e.g., “car” is a hyponym for “vehicle”).
CO-HYPONYM: Words that share the same hypernym (e.g., “car” and “truck” are co-hyponyms because they are both hyponyms of “vehicle”).
MERONYM: A part of a whole (e.g., “wheel” is a meronym for “car”).
HOLONYM: A whole that contains a part (e.g., “car” is a holonym for “wheel”).
relation_explanation: A brief explanation in English that describes how the source and target word are related. It provides additional context and understanding of the nature of their relationship.
mutual_domain: The shared domain or category to which both words belong. This helps identify the context or area of knowledge where the words are related (e.g., “medicine,” “technology,” “nature”).

By formalizing the output using the OpenAI structured output [23], the system creates a uniform dataset that can be integrated directly into the graph structure, with each relation pre-annotated with semantic metadata. This structuring is critical for ensuring that the model’s outputs are both coherent and readily applicable to graph-based data storage.

2.4. Graph Construction

The core of the lexical network system lies in the graph construction process. Using NetworkX Python library (version 3.5) [24], each extracted lexical relation is modelled as a directed edge between source and target nodes with specified relation type in a multi directed graph, creating a scalable network structure. The script first attempts to load an existing graph file, serving as a cumulative repository of previously extracted lexical data. This incremental loading approach enables continuity, as new lexemes and relationships can be added iteratively without recreating the network from scratch.

Using a system of iterative prompts, the LLM is directed to generate specific types of lexical relationships per relation type (e.g., synonymy, hypernymy, co-hyponymy, meronymy, holonymy). Each relation type adds a unique layer of semantic information to the network, allowing for detailed semantic analysis. Each extracted relationship is represented as an edge in the graph, connecting a source node (lexeme) with a target node (related lexeme). Edges carry attributes specifying:

relation_type;
synonymity_strength;
relation_type_strength;
relation_explanation;
mutual_sense;
mutual_domain.

To provide a concrete example of the network’s multi-relational depth and complexity, Figure 2a–c visualizes the actual subgraph of 1-hop relations centered on the Japanese lexeme 子供 kodomo ‘child’. This data-driven example, showing the central node and its 45 direct neighbors, demonstrates how a single concept is embedded within a dense web of diverse semantic connections generated by our methodology.

An analysis of these layered visualizations reveals the network’s capacity to capture multifaceted world knowledge. Figure 2a illustrates the structural backbone as the concept’s formal classification. We see clear HYPERNYM links to broader categories like 人間 ningen ‘human being’ and 人 hito ‘person’. Conversely, HYPONYM relations point to more specific instances like 男の子 (otokonoko—boy) and 乳児 nyūji ‘infant’. This hierarchical structure is complemented by compositional knowledge, with MERONYM relations connecting 子供 to its constituent parts, such as 手 te ‘hand’ and 足 ashi ‘foot’. This layer effectively forms the concept’s ontological skeleton.

The Associative Context is rendered in Figure 2b. This panel adds a layer of thematic richness. SYNONYM relations connect子供to semantically close terms, while the numerous CO-HYPONYM relations place it in a context of peers—other concepts that share a similar role or hypernym. These links to lexemes like 学生 gakusei ‘student’, 少年 shōnen ‘youth’, and 妹 imōto ‘younger sister’ enrich the central concept with social roles and relational context that go beyond strict “is-a” hierarchies.

The Holistic View is illustrated in Figure 2c combines all layers, presenting a holistic and densely interconnected view. This integrated visualization is the most accurate representation of the output, showing how the different semantic facets are not separate but deeply intertwined. It is in this combined view that the most powerful insights emerge, as we see abstract and thematic connections to concepts like 社会 shakai ‘society’, 教育 kyōiku ‘education’, and 文化 bunka ‘culture’. This demonstrates that the LLM has not just generated a taxonomy, but a rich knowledge graph that situates the concept within its broader functional and cultural context.

This layered example of the 子供 subgraph powerfully underscores our methodology’s ability to create a semantically rich, multi-faceted lexical resource where each node is a nexus of diverse and deeply integrated information.

To further illustrate the structure of the attributes of lexical relationships within the network, consider the following example of a synonym relationship between two Japanese lexemes, mono ‘thing’ and buppin ‘goods’, both of which refer to tangible items or products. Synonym relationships capture pairs of words with closely related meanings, often substitutable in specific contexts.

Example edge: mono ‘thing’—buppin ‘goods’

Source Lexeme: 物 mono ‘thing’

Target Lexeme: 物品 buppin ‘goods’

relation_type: “SYNONYM”

This attribute specifies the type of relationship between the two lexemes. “SYNONYM” indicates that mono ‘thing’ and buppin ‘goods’ share very similar meanings and can often be used interchangeably in contexts related to tangible items, goods, or products.

synonymity_strength: 0.9

The synonymity_strength attribute quantifies the degree of similarity between mono ‘thing’ and buppin ‘goods’ on a scale from 0 to 1. Here, a value of 0.9 suggests a very high level of synonymy, meaning these words are closely related in meaning and can substitute for each other in many contexts.

relation_type_strength: 0.9

The relation_type_strength attribute measures the confidence in the classification of this relationship as synonymity. A high score of 0.9 indicates strong confidence that this relationship is correctly classified as a synonym pair. This value suggests that the model has high certainty that the synonym classification is appropriate for these two lexemes.

relation_explanation: “Both words refer to tangible items or products, often used in commerce.”

Explanation attribute provides a qualitative explanation of the relationship, giving context to the synonymity between mono ‘thing’ and buppin ‘goods’. In this example, it clarifies that both terms are commonly used to describe physical objects, particularly in contexts related to commerce or goods. This explanation enhances the interpretability of the relationship and helps users understand why these two words are considered synonyms.

For synonym relations and for hierarchical relationships (hypernym ↔ hyponym, meronym ↔ holonym), the system automatically generates symmetric and inverse relations, respectively, to reflect bidirectionality, maintaining a comprehensive and navigable network structure.

Additionally, the script employs helper functions to check for pre-existing edges of the same type, ensuring that no duplicate relations are introduced. This edge-based representation, enriched with relationship metadata, enables highly structured querying and semantic exploration within the graph.

2.5. Mutual Domain and Sense Construction

The HAS_DOMAIN and MUTUAL_SENSE relations are derived from the attribute of the synonym, hypernym, hyponym, co-hyponym, meronym and holonym relations designed to provide additional contextual layers within the lexical network, capturing shared meanings and categorical domains for each lexical relationship.

HAS_DOMAIN represents a categorical or thematic association that groups the source and target words into a shared semantic domain. This relation type is intended to capture broader, often abstract, connections that categorize words into general domains (such as nature, technology, or emotions), which can be useful for thematic or topical organization. The model is prompted to identify a broader category (domain) that both the source and target words would logically fall under.

The Mutual_Domain_explanation provides a brief rationale for why this particular domain is relevant, which aids in interpretability and could enhance downstream applications, such as semantic search or educational tools that rely on domain-level understanding.

MUTUAL_SENSE denotes the shared or overlapping sense between the source and target words, providing a finer level of meaning beyond the broader domain category. This is particularly useful for specifying how two lexemes relate semantically when they share attributes or fall within a similar conceptual space, but without being direct synonyms or hypernyms. The prompt requests the model to generate a shared meaning or concept that captures the common aspect between the source and target words. This mutual sense acts as a middle ground between purely lexical similarity (like synonyms) and broader categorical connections (like domain associations), clarifying the specific conceptual overlap.

Together, HAS_DOMAIN and MUTUAL_SENSE allow the network to represent layered associations. HAS_DOMAIN captures the broader, categorical grouping, while MUTUAL_SENSE refines this connection by highlighting more specific shared aspects of meaning. This layered approach adds interpretative depth, enhancing the network’s utility for nuanced language applications, including educational tools, semantic search, and translation aids.

2.6. Quality Control and Data Integrity

To maintain data quality, the system enforces several integrity checks:

Duplicate and Redundancy Checks: Before integrating a new edge, the system verifies that no pre-existing edge with the same source–target–relation combination exists, ensuring semantic exclusivity.
Relation Type Exclusivity: For a specific source lexeme LLM is instructed to associate each target lexeme with only one specific relation type in any instance, preventing semantic overlap and ensuring clear, mutually exclusive categorization. However, that is not altogether implemented very strictly, and the reasons are detailed in the Section 4.

After each lexeme is processed, the system saves the updated graph file as a backup, preserving incremental updates and ensuring the long-term integrity of the lexical network.

2.7. Graph Enrichment and Lexical Attribute Integration

To enhance the graph’s linguistic richness, the system integrates supplemental attributes for each lexeme:

Frequency Data: Lexeme frequency from the VDRJ dataset is included to indicate common usage levels.
JLPT Proficiency Levels: Adding JLPT proficiency levels provides insights into the complexity and difficulty of each lexeme, particularly valuable for educational applications. The current version employs pre-2010 JLPT scale N4–N1.
Romanized Transliteration: Through the PyKakasi library, romanized forms of each lexeme are added, improving accessibility and enabling use by a broader audience, including non-native speakers.
Centrality measures within the NetworkX framework, such as degree and weighted degree (using a relation_type and synonymity_strenght for weight), are calculated to identify nodes of high connectivity, which indicate lexemes with significant semantic overlap or influence across multiple relationships. By analyzing these central nodes, the system highlights critical lexical items that may serve as semantic anchors within the Japanese language network. An executable Jupyter notebook that reproduces the visualizations and queries is provided in the Supplementary Materials (File S1). The Networkx graph object is provided in the Supplementary Materials (File S2).

3. Results

3.1. Initial Lexical Network Construction

Using the initial single relation methodology, which focused predominantly on synonym (or antonym) relationships, the lexical network was constructed with a foundational set of 60,752 nodes and 127,960 edges. This initial network provided a robust starting point for capturing basic associative relationships among a large group of Japanese nouns. However, the focus on limited relation types restricted the network’s capacity to represent more nuanced semantic relationships, such as hierarchical and part–whole relations, which are essential for a comprehensive lexical understanding. Therefore, the application of the new multi-relational methodology allowed for incorporating additional relation types led to a significant expansion of the lexical network. The multi-relational network at the time of writing comprises 155,233 nodes and 708,782 from the same initial 11,055 edges, representing a 2.6-fold increase in nodes and a 4.9-fold increase in edges compared to the initial structure. This substantial growth underscores the system’s enhanced capability to capture complex semantic relationships, broadening the scope of lexical associations within the network.

3.2. Comparative Analysis of Network Structure

The expanded network not only increased in size but also in depth, as evidenced by the inclusion of multiple relationships that enhance lexical specificity, as represented in Table 1. For example, hypernyms and hyponyms provide hierarchical layers, allowing for a clearer representation of lexical generality and specificity, while meronyms and holonyms capture part-to-whole relationships. These additional layers support a more intricate mapping of Japanese lexemes, offering potential applications in NLP tasks that require fine-grained semantic differentiation.

Table 2 provides a detailed breakdown of the distribution of relation types within the lexical network, highlighting the prevalence and role of each relation in shaping the network’s semantic structure.

The distribution of relation types within the lexical network reveals insightful patterns about the semantic structure captured in the graph. The HYPONYM and HYPERNYM relation types, both at 17.41% due to the initial mirroring of the created edges in either of these inverse lexical relations, represent the most frequent connections, suggesting a strong hierarchical organization where specific terms are associated with broader categories or subcategories. This high frequency indicates the network’s robustness in capturing classification layers.

CO-HYPONYM relations, at 17.30%, closely follow, highlighting the prevalence of terms within shared categories that are neither direct superordinates nor subordinates. This reflects language’s associative complexity, where items like 猫 neko ‘cat’ and 犬 inu dog” might share a common category without one being a subtype of the other.

HAS_DOMAIN and DOMAIN_OF relations, both accounting for 11.18%, provide significant thematic or categorical connections between lexemes, linking words within shared conceptual domains, such as “medicine” and “hospital.” These relations are essential for organizing words by broader themes.

SYNONYM relations (9.30%) and HOLONYM (8.93%) contribute to the network’s depth, establishing connections based on similarity or part–whole relationships, respectively. The presence of MERONYM relations (7.28%) further enriches the structure by detailing part-to-whole associations.

The expanded lexical network integrates multiple types of semantic relationships, each enriched with quantitative attributes such as synonymity_strength and relation_type_strength. The synonymity strength measures the semantic similarity between source and target lexemes, while the relation type strength reflects the confidence in the classification of each relationship type.

Table 3 provides a statistical overview of the different relation types in the multi-relational lexical network, detailing the synonymity strength and the relation type strength for each relation. Synonymity strength is highest in SYNONYM relations (mean = 0.74), reflecting the close semantic similarity expected within synonymous pairs. In contrast, HYPERNYM relations exhibit the lowest mean synonymity strength (0.55), which aligns with their broader categorical nature compared to the specific semantic overlap in synonyms.

Relation Type strength is relatively high across all relation types, with HAS_DOMAIN and DOMAIN_OF relations scoring the highest (mean = 0.77). This suggests a strong model confidence in these domain-based classifications, likely due to clear semantic boundaries that categorize words into specific thematic domains.

The standard deviation values for synonymity strength are generally higher in part–whole relations like HOLONYM and MERONYM, indicating more variability in semantic similarity across part-to-whole associations. This could be attributed to the inherent complexity and diversity within part–whole relationships, where the degree of connection can vary significantly depending on the context.

3.3. Network Structure and Centrality Analysis

A structural analysis of the expanded network reveals key characteristics about its topology and the organization of the Japanese lexicon it represents. The graph is very sparse, with a density of 0.000016, which is typical for large real-world networks.

A key finding is the network’s connectivity. While there are 96,935 weakly connected components, the vast majority of the lexicon is unified within a single giant component. This largest component contains 167,372 nodes, representing 63% of all lexical items in the graph. This indicates a high degree of semantic interconnectedness across a significant portion of the Japanese vocabulary.

To identify the most influential concepts in the network, a degree centrality analysis was performed. The nodes with the highest number of connections are overwhelmingly abstract concepts that organize large areas of knowledge. The top 5 most connected nodes are:

社会 shakai ‘society;: 4874 connections;
文化 bunka ‘culture’: 4864 connections;
生物 seibutsu ‘living things/biology’: 3307 connections;
教育 kyōiku ’education’: 3179 connections;
経済 keizai ’economy’: 2742 connections.

The prominence of these abstract concepts provides a unique insight into the thematic core of the generated lexicon, a point we will explore further in the Discussion.

3.4. Comparison of Lexical Relations in Graph and WordNet

The analysis involved comparing lexical relations derived from the constructed graph against relations identified within WordNet for Japanese lexemes that was taken from the NLTK library and Open Multilingual WordNet [25]. Each pair of source–target lexemes was evaluated to identify relations such as HYPERNYM, HYPONYM, HOLONYM, MERONYM, SYNONYM, and CO-HYPONYM. Relations were compared to see if there was a match between the Graph’s relation type and the relation derived from WordNet.

The following examples highlight cases where a source–target pair has multiple relation types in the graph:

Table 4 and Table 5 present a comparison between graph-derived relations and WordNet relations for selected source–target pairs, along with an indication of whether there is a match between the two. The comparison reveals several key insights into the strengths and limitations of the lexical Graph and WordNet as resources for semantic relationships.

3.4.1. Matching Accuracy and Coverage

The exact match rate of 37.96% indicates that a significant number of Graph-derived relations do not directly align with WordNet relations. However, with soft matching, the alignment improves to 58.70%, suggesting that many source–target pairs share multiple semantic dimensions.

Soft matching allows for better alignment by accounting for lexeme pairs with overlapping or multi-faceted relationships (e.g., HYPONYM and CO-HYPONYM), which are common in nuanced or domain-specific lexicons.

The presence of multiple relation types for the same lexeme pairs, such as アパート apaato ‘apartment’ with コンドミニアム kondominiamu ‘condominium’ or タウンハウス taunhausu ‘townhouse’, illustrates the complexity of lexical semantics. These cases often show both hierarchical (e.g., HYPERNYM, HYPONYM) and associative (CO-HYPONYM) connections, reflecting real-world linguistic nuances where words may belong to overlapping categories. Such multi-relation pairs underscore the importance of context in semantic analysis, as lexemes often have context-dependent relations that cannot be fully captured by a single relation type.

3.4.2. Discrepancies Between Graph and WordNet Relations

For pairs where the Graph and WordNet relations did not align, some discrepancies can be attributed to the inherent differences in data sources. WordNet may lack certain domain-specific terms or interpret hierarchical relationships differently, particularly for lexicons that include modern or specialized terms (e.g., IT terms, media). Additionally, Graph-derived relations rely on automated extraction techniques that may capture broader or alternative associations not encoded in WordNet.

3.4.3. Confusion Matrix Analysis and Relation-Type Statistics

The confusion matrix provides a visual comparison of the alignment between Graph-derived relations and WordNet relations. Each cell in the matrix represents the frequency of occurrences where a specific relation type from the graph matches (or not) the corresponding WordNet relation type. The diagonal cells reflect exact matches, while off-diagonal cells indicate mismatches.

Confusion Matrix heatmap (Figure 3) depicts the frequency distribution of relation types between the Graph and WordNet. The heatmap color scale highlights the intensity of matches, with deeper colors indicating higher counts of alignment. For each unique relation type, precision and recall metrics were calculated to assess the accuracy of the Graph-derived relations relative to WordNet. Precision represents the proportion of true positive relations (i.e., relations correctly matched between Graph and WordNet) out of all relations predicted by the graph as a specific type, while recall measures the proportion of true positives out of all actual instances of a specific relation type in WordNet.

3.4.4. Distribution of WordNet Relations for Each Graph Relation Type

A complementary distribution analysis was conducted to understand how each graph-derived relation type distributes across WordNet relations in order to identify patterns where specific graph relations may partially align with multiple WordNet relation types, indicating the nuanced overlap between hierarchical, associative, and part–whole relations in natural language.

Table 6 provides a comparative view of precision and recall for each relation type. Additionally, the WordNet Distribution column shows the breakdown of WordNet relation types that align with each graph relation, illustrating how often each WordNet relation type matches the graph’s predictions.

SYNONYM and HYPONYM relations have relatively high precision and recall, indicating strong alignment between the Graph and WordNet.

HOLONYM relations show lower precision (7.92%) but moderate recall (32%), suggesting that while they are often identified in WordNet, their classification in the Graph may vary across similar relation types like HYPERNYM and CO-HYPONYM.

The WordNet Distribution reveals instances where relation types blend, as seen with CO-HYPONYM and HYPERNYM, which frequently align with multiple relation types in WordNet, reflecting nuanced overlaps in semantic roles.

This combined analysis helps to clarify which relations align most consistently and where further refinement may be needed for enhanced semantic accuracy in the graph.

High precision and recall values for certain relation types suggest that the graph accurately captures these relations compared to WordNet. For example, SYNONYM relations often show high precision, indicating that the graph effectively distinguishes synonymy with minimal false positives.

Lower precision or recall in some relation types, such as MERONYM or HOLONYM, reflects the complexity of part–whole relationships, which may require more nuanced criteria for accurate classification.

3.4.5. Implications for NLP Applications

The mixed alignment between graph and WordNet relations suggests that integrating multiple resources can enrich lexical networks and improve their applicability for NLP tasks. Systems that can incorporate both hierarchical (WordNet) and associative or domain-specific relations (graph) are likely to achieve greater accuracy in semantic interpretation.

The findings support the use of a hybrid approach for lexical networks, combining manually curated resources like WordNet with dynamically generated relations to capture a wider range of semantics in language.

The high incidence of multiple relations highlights the need for improved relation disambiguation methods in computational lexicography. Further research could focus on defining hierarchical rules to prioritize or layer relations based on context, which may help refine the accuracy of lexical relations in language models.

Future work might explore machine learning approaches to classify and weight relations, leveraging metrics such as relation strength and synonymity to create more sophisticated and context-aware lexical networks.

4. Discussion

Our findings contribute to the ongoing conversation about resource creation in computational linguistics. The methodology presented here exemplifies a trade-off: it substitutes the intensive labor of human experts and manual data curation with computational resources in the form of LLM API calls. This study demonstrates that this trade-off is highly effective for achieving scalability and speed. While not “resource-free,” this approach democratizes the creation of foundational linguistic tools. It makes the development of a comprehensive lexical network—a task that once took a decade—feasible within a much shorter timeframe and adaptable to other languages without requiring a new team of linguistic experts for each one. This re-frames the challenge of resource development from one of finding and funding human expertise to one of effectively prompting and managing computational models.

4.1. Structural Insights and Lexical Organization

The structural properties of the node network offer a profound glimpse into the organizational principles of the Japanese lexicon as captured by a large language model. The analysis reveals an emergent ontology that mirrors the complex, hierarchical, and associative nature of human language, providing insights that go far beyond simple node and edge counts.

4.1.1. The Giant Component as a Model of Lexical Cohesion

A primary finding is the consolidation of 63% of the vocabulary into a single, massive, weakly connected component. The existence of this “giant component” is highly significant. It demonstrates that the vast majority of the Japanese lexical items in our graph are semantically reachable from one another. This cohesive structure suggests that the network is not a mere collection of disconnected word lists but a holistic and integrated model of the lexicon. It implies a high degree of lexical cohesion, where concepts from disparate domains—from everyday objects to scientific terminology—are linked through chains of semantic relationships. This property is crucial for applications that rely on measuring semantic distance or traversing conceptual pathways, as it ensures that paths can be found between a vast and diverse range of lexemes.

4.1.2. Network Topology: A Core–Periphery Lexical Structure

The network’s topology, characterized by extreme sparsity (density ≈ 0.000016) and a large gap between the average degree (8.57) and median degree (1.0), is indicative of a classic core–periphery or scale-free structure. This is not a random network; it is highly organized. This structure suggests that the lexicon operates on a principle of efficiency and cognitive economy:

The Periphery: The vast majority of nodes (the periphery) have very few connections. These represent specialized, domain-specific, or less frequent words that have a narrow semantic scope.
The Core: A small but critical number of nodes (the core) are exceptionally well-connected, acting as hubs. These are not random words but, as our centrality analysis shows, are foundational concepts.

This core–periphery model aligns with theories of the mental lexicon, where a stable core of general, high-frequency vocabulary provides the structural scaffolding for a much larger, dynamic periphery of specialized terms. Our work provides large-scale empirical evidence for this structure, generated automatically from language data.

4.1.3. Centrality Analysis: Uncovering the Conceptual Backbone

Perhaps the most compelling insight comes from the degree centrality analysis. The emergence of abstract, high-level concepts like 社会 shakai ‘society’, 文化 bunka ‘culture’, 生物 ikimono ‘living things’, and 教育 kyooiku ‘education’ as the most central nodes is a profound result. It indicates that the LLM, in processing its training data, has learned to model the conceptual superstructure of human knowledge.

These central nodes function as semantic hubs or conceptual anchors. They are not central because they are simply frequent, but because they are thematically essential, organizing vast downstream domains of related vocabulary. For example, “society” serves as an anchor connecting thousands of nodes related to law, politics, family, communication, and social structures. The identification of this conceptual backbone is a key contribution, as it provides an empirical map of the thematic centers of gravity within the Japanese language. This emergent, data-driven ontology is far more dynamic and potentially more representative of modern language use than a manually curated taxonomy, making it an invaluable resource for tasks like topic modeling, document summarization, and knowledge-graph-based reasoning. The rich, annotated data, including relation_explanation attributes, further supports this, offering explainable pathways through this complex conceptual space.

4.2. Multiple Relationship Types for Same Source Target Pair

Although the model was prompted to provide a single, mutually exclusive relation for each lexeme pair, it frequently deviated from this constraint. Analysis of the 708,782 extracted relations reveals that 155,233 unique source–target pairs are characterized by multiple relation types. This phenomenon is not interpreted as a model failure but rather as a reflection of the intrinsic nature of lexical semantics. In natural language, the boundaries between relation types are often fluid and context-dependent, rendering any single, discrete categorization artificially limiting. The LLM’s behavior is a direct consequence of its training on vast, unstructured text corpora where such semantic overlaps are ubiquitous. The model’s generative objective, which prioritizes informational completeness, likely compels it to report all valid relationships it identifies, even when instructed to select only one. Thus, the presence of multiple relations underscores the model’s capacity to capture the nuanced, multifaceted nature of word associations as they appear in real-world language use.

4.3. Examples of Pairs with Multiple Relations

The following examples illustrate source–target pairs in the network that have more than one semantic relationship:

360度カメラ 360-do kamera ‘360-degree camera’ → カメラ kamera ‘camera’

Number of Relations: 2

Relation Types: HYPERNYM, CO-HYPONYM

This pair highlights how “360-degree camera” is considered a specific type of “camera” (HYPERNYM), while also sharing categorical similarities with other types of cameras (CO-HYPONYM).

360度カメラ 360-do kamera ‘360-degree camera’ → デジカメ dejikame ‘digital camera’

Number of Relations: 2

Relation Types: HYPERNYM, CO-HYPONYM

Similarly, “360-degree camera” is viewed as a specific form of “digital camera,” establishing both hierarchical and categorical relationships.

CM shiiemu ‘commercial’ → 映像 eizoo ‘video’

Number of Relations: 2

Relation Types: HYPERNYM, CO-HYPONYM

This example highlights “commercial” as a specific type of “video” (HYPERNYM) and as part of the broader domain of visual media (CO-HYPONYM).

DNSサーバー diienuesusaabaa ‘DNS server’ → ホスト hosuto ‘host’

Number of Relations: 2

Relation Types: HYPERNYM, CO-HYPONYM

“DNS server” and “host” share a hierarchical relationship where “host” is a broader category. They also belong to the same domain of network infrastructure, forming a CO-HYPONYM relation.

4.4. Discussion of Multiple Relations

A significant finding of this study is the high prevalence of source–target pairs linked by multiple, distinct semantic relations. The occurrence of these multi-relational links, found across 155,233 unique pairs, is not an anomaly but a core feature that enriches the network’s semantic depth and offers valuable insights into the complexity of language. For instance, a lexeme pair can exhibit both hierarchical (HYPERNYM) and associative (CO-HYPONYM) connections, reflecting how language organizes concepts into both formal taxonomies and fluid, associative clusters. From an NLP perspective, this layered semantic information is highly valuable; understanding that “DNS server” and “host” share both hierarchical and categorical ties can improve the accuracy of classification and disambiguation tasks. However, this semantic richness also introduces notable challenges, as a high density of edges can increase complexity in network traversal and impact computational efficiency in real-time applications. This behavior stems not from a failure of the model, but from its success in mirroring the natural complexity of language, where strict relational exclusivity is often an artificial constraint. While the LLM was instructed to provide a single relation type, its tendency to return multiple valid connections reflects the intertwined and multifaceted nature of lexical semantics. Consequently, applications requiring a single canonical relation may necessitate additional post-processing or context-aware disambiguation steps. This observation opens promising avenues for future research into semantic role labeling, automated relation disambiguation, and the construction of more nuanced, domain-specific language models.

4.5. Interpreting the Alignment with WordNet: A Complementary Resource

The comparative analysis with WordNet yielded an exact match rate of 37.96% and a soft match rate of 58.70%. While these figures may seem moderate, they do not indicate a failure of our methodology. Instead, we argue that this level of alignment is a significant finding in itself, revealing the fundamental differences between a formally curated ontology and a dynamic, data-driven lexical network. The discrepancy is not a sign of weakness, but a reflection of the complementary strengths of our LLM-generated graph.

Several key factors explain this moderate alignment and highlight our network’s unique value. Firstly, these systems have different design philosophies: WordNet is an expert-curated, top-down resource. It is designed to be a clean, formal ontology with strict, mutually exclusive relationships based on lexicographical principles. In contrast, our network is a bottom-up, emergent structure derived from the statistical patterns in the vast text data on which the LLM was trained. It captures how words are actually used in context, including their associative, thematic, and sometimes ambiguous connections, which may not fit neatly into WordNet’s rigid structure.

The significant increase from an exact match (37.96%) to a soft match (58.70%) is arguably the most important result of the comparison. This 20-percentage-point jump demonstrates that in a large number of cases where the relation type did not match exactly, our graph still identified a valid semantic relationship between the lexemes. For example, a pair identified as CO-HYPONYM in our graph might be a HYPONYM in WordNet. This reflects the inherent complexity and context-dependency of language, where a single word pair can plausibly have multiple relationship types (as discussed in Section 4.2). Our network captures this semantic multiplicity, whereas WordNet often encodes a single, canonical relationship.

Large language models are trained on contemporary text, including the internet. Consequently, our network is likely to include modern, domain-specific, or colloquial terms that are absent from the version of Japanese WordNet used for our comparison. Discrepancies naturally arise when WordNet lacks an entry for a term or models its relationships differently due to its older or more formal scope.

Our goal was not to perfectly replicate WordNet, but to develop a scalable method for creating a new type of lexical resource. The results show that this goal was achieved. The generated graph should be viewed as a resource that augments, rather than replaces, traditional ontologies. WordNet provides a stable, highly structured hierarchical backbone, while our network offers a dynamic, densely connected, and context-sensitive web of associations. For NLP applications, the combination of both resources would likely yield performance superior to either one alone, with our graph providing the nuanced, real-world context that formal ontologies can sometimes miss.

Therefore, the “moderate” alignment is a positive indicator that our approach successfully captures a rich and different facet of lexical semantics—one that is more fluid, associative, and reflective of modern language use.

4.6. Comparison with Other Lexicon Building Approaches

In this section we mention differences between our methodology and other mature approaches to lexicon building, namely LexicoMatic and LexAN, addressing the unique points of our system.

LexicoMatic is a powerful framework designed to create multilingual lexical–semantic resources. Its core methodology is often seed-based; for instance, it may take a word and its definition from a well-resourced language (like English), use machine translation to port this information to a target language, and then identify related terms in that language. This approach is highly effective for its primary goal of multilingual resource creation. Our work differs in both its objective and technical approach. We focus on creating a deep, monolingual network for Japanese, and our methodology does not rely on translation or definition parsing. Instead, we directly query the emergent, latent knowledge of a generative LLM to extract a broad set of six predefined semantic relations. While LexicoMatic excels at multilingual bootstrapping, our method is designed for deep, multi-relational extraction within a single language at a massive scale.

On the other hand, LexAN (Lexical Association Networks) builds its networks primarily from a corpus-based, statistical perspective. It identifies “lexical functions” by analyzing statistically significant co-occurrences and other associations between words in large text corpora. This represents a classic and powerful methodology for uncovering data-driven relationships. Our approach is fundamentally different, as it is generative rather than statistical. We do not rely on co-occurrence patterns within a specific corpus. This allows us to extract hierarchical (hypernymy) and compositional (meronymy) relationships that are often part of latent world knowledge and may not be explicitly signaled by co-occurrence in text. Our network thus captures a different type of knowledge—the abstract, relational understanding learned by the LLM during its pre-training on a vast dataset.

4.7. Comparison with Recent Literature and State-of-the-Art

Our work enters a vibrant and rapidly evolving field of research focused on leveraging LLMs for automated knowledge base and lexical network construction. Recent surveys confirm that the dominant paradigm for knowledge extraction has shifted towards using pre-trained LLMs, largely replacing complex, multi-stage NLP pipelines [26,27,28]. The core challenges now revolve around effective prompt engineering, enforcing structured outputs (e.g., via JSON schema), and mitigating hallucinations—all of which our methodology explicitly addresses. Our evaluation approach, which benchmarks the generated network against a gold-standard resource such as WordNet, is also a standard and widely accepted practice for assessing the quality of automatically extracted relations.

While our work aligns with these general trends, it is differentiated by several key factors. A significant portion of recent research targets the extraction of knowledge graphs for narrow, specialized domains, such as the biomedical field, where LLMs are used to analyze complex sources like electronic medical records [29]. These efforts are critical for their respective applications, but our study addresses the distinct and arguably more complex challenge of building a large-scale, general-purpose lexical network for an entire language. Furthermore, our framework is explicitly multi-relational, designed to capture the nuanced and varied ways lexemes relate—from hierarchical to compositional and associative links.

Our pipeline was designed from the ground up for massive scalability. By implementing an iterative and resumable workflow, we have constructed a network of over 155,000 nodes and 700,000 edges. The sheer scale of this general-language resource is a primary contribution, demonstrating a practical and efficient pathway for developing foundational lexical tools for any language with a sufficiently powerful LLM. A key contribution that sets our work apart is the deep analysis of the network’s emergent topological properties. While many studies focus primarily on the precision and recall of the extraction process itself, we investigate the resulting structure. Our analysis of the “giant component,” the core–periphery topology, and the identification of abstract concepts as semantic hubs provides large-scale empirical evidence for theories of the mental lexicon. Finally, literature increasingly points toward the necessity of a human-in-the-loop (HiL) for final verification and quality assurance, noting that a hybrid approach often yields the best balance of precision and recall [30]. While our framework is fully automated to maximize scalability, we acknowledge that incorporating a HiL validation module is a promising direction for future work to further refine the network’s accuracy. Our current approach represents a significant advance in the rapid and scalable generation of a comprehensive first draft of a language’s lexical network.

4.8. Graph Visualization and Exploration

For this project, we developed an interactive 2D visualization tool, represented in Figure 4, to explore and analyze the Japanese lexical network, with a particular focus on synonym relationships. We created an interface that allows users to interact with the synonym network through several key features, including search functionality, JLPT level filtering, relationship type selection, and graph depth control. These tools enable users to filter and navigate the network based on their specific requirements, providing an adaptable view of the network’s structure.

In addition to the 2D visualization, a 3D representation was implemented using Vásturiano’s 3D library for graph representation [31], see Figure 5. This enhanced spatial representation offers an immersive way to explore synonym connections within the network. The 3D layout makes it easier to understand the network’s structure by placing lexemes in a multi-dimensional space.

While these tools have laid the groundwork for visualizing synonym networks effectively, we are still refining the representation to best accommodate the multi-layered nature of lexical relationships. The graph representation is available as a part of the web app at the https://jln.syntagent.com/ (accessed on 15 October 2025). Future efforts will focus on optimizing the display of various relation types, ensuring that hierarchical and associative connections are accurately and intuitively represented within the same network. This ongoing work aims to provide a comprehensive, user-friendly interface that supports deeper insights into multi-dimensional lexical data.

4.9. A Case Study: Enhancing Japanese Language Education

A tangible application of this multi-relational network lies in the domain of computer-assisted language learning (CALL) for Japanese. Traditional vocabulary acquisition often relies on rote memorization from flat lists, which fails to build the deep, interconnected understanding characteristic of native-like fluency. Our network provides the semantic scaffolding necessary to create a more sophisticated pedagogical tool that moves beyond simple definitions to foster a conceptual understanding of the lexicon.

Such a system could dynamically generate learning modules based on the graph’s structure. For instance, upon learning the word 乗り物 norimono ‘vehicle’, a learner would not only see its definition but also be presented with an interactive tree of its most common hyponyms, such as 車 kuruma ‘car’ and 自転車 jitensha ‘bicycle’. This hierarchical exploration helps learners organize vocabulary into meaningful categories. Furthermore, the system could leverage meronymic relations to create contextually rich exercises, prompting a student to identify that an エンジン enjin ‘engine’ is a part of a car. For advanced learners, the network’s quantitative synonymity_strength attribute could be used to explain the nuanced differences between close synonyms, clarifying which terms are near-perfect substitutes versus those appropriate only in specific contexts. By grounding vocabulary acquisition in a rich web of semantic relationships, this approach would facilitate a more robust and intuitive learning process, mirroring how conceptual knowledge is naturally organized.

5. Limitations and Future Work

While this study demonstrates a scalable methodology for constructing a large-scale lexical network, certain limitations must be acknowledged, which in turn define a clear trajectory for future research. The primary evaluation of the network’s quality is currently automated via comparison with WordNet, an indirect measure that may not fully capture the correctness of all relations from a human perspective. Furthermore, the model’s tendency to assign multiple plausible relations to a single lexeme pair, while reflective of natural language ambiguity, introduces potential semantic noise for applications requiring a single, canonical relationship. To address these limitations and establish a more rigorous measure of quality, our priority for future work is a formal human evaluation study. Such a study, involving the annotation of a sampled subset of relations by multiple native speakers to measure inter-annotator agreement, would provide a direct, empirical validation of the network’s precision. Complementing this, we plan to develop a context-aware ranking model to disambiguate instances of multiple relation assignments, thereby enhancing the network’s utility for downstream tasks. Beyond validation, we aim to expand the network’s scope and utility by integrating it with large-scale knowledge bases such as Wikidata and BabelNet. This would not only enrich the network with encyclopedic knowledge but also facilitate its cross-lingual expansion. Finally, to demonstrate the network’s practical value, we intend to move beyond the conceptual case study presented here to the implementation and empirical testing of a downstream application, such as an intelligent language learning tool, to quantitatively measure its real-world impact.

6. Conclusions

In this study, we have developed a scalable language-independent methodology for constructing and analyzing a richly interconnected lexical network, using advanced language model capabilities to automate relation extraction. The resulting specific Japanese network, with its extensive inclusion of hierarchical, associative, and domain-specific relations, demonstrates the versatility of large language models in creating comprehensive lexical resources. In this article, we have demonstrated its potential in the resulting Japanese network. Beyond capturing common relations like synonyms and hypernyms, the network incorporates nuanced categories such as co-hyponyms, meronyms, and holonyms.

The comparison with WordNet relations highlighted areas of alignment and divergence, indicating that while traditional lexical resources provide a robust foundation, LLM-based networks can capture additional, context-sensitive associations. This flexibility in modeling complex semantic relationships suggests potential for integrating these networks into applications that require layered understanding, from educational tools that address learners’ needs [32] to specialized NLP systems.

Our findings emphasize that integrating large language models in lexical network construction offers a practical approach to enhancing lexical resources across different languages and domains. Future research can further refine this methodology by incorporating other languages, other LLM’s, and additional linguistic contexts and exploring adaptive approaches to problems in lexical semantics. This study thus contributes a foundational framework for developing lexical networks that are both semantically rich and adaptable to diverse language technologies.

Supplementary Materials

The following supporting script for can be downloaded at: https://www.mdpi.com/article/10.3390/knowledge5040024/s1.

Author Contributions

Conceptualization, B.P. and D.Š.; methodology, B.P. and D.Š.; software, B.P.; validation, D.Š.; formal analysis, B.P. and D.Š.; resources, B.P.; data curation, B.P.; funding acquisition, B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by UNIRI projekti iskusnih znanstvenika 2023: Implementacija i evaluacija komunikacijskih AI asistenata za razvoj pametnog turizma u području Kvarnera (uniri-iskusni-human-23-264 3258), Matematičko modeliranje u obradi prirodnog jezika (uniri-iskusni-prirod-23-150), Formals-2 (Hrvatska zaklada za znanost IP-2024-05-3882). This research was funded in part by the Japan Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The scripts used for the construction of network along updated lexical networks datasets, is available in our project repository: https://github.com/bperak/japanese-lexical-graph (accessed on 15 October 2025). Researchers are encouraged to utilize and build upon this resource for further exploration in computational linguistics.

Acknowledgments

We would like to express our gratitude to Atsushi Mori from Mukogawa Women’s University for extensive help. This paper was prepared as part of the Institutional project Japanese language and society: basic concepts and empirical analysis (1 October 2023–30 September 2027), Juraj Dobrila University of Pula, Croatia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Martelli, F.; Procopio, L.; Barba, E.; Navigli, R. LexicoMatic: Automatic Creation of Multilingual Lexical-Semantic Dictionaries. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Nusa Dua, Indonesia, 1–4 November 2023; Volume 1, pp. 820–833. [Google Scholar]
Reyes-Magaña, J.; Sierra, G.; Bel-Enguix, G.; Gómez-Adorno, H. LexAN: Lexical Association Networks. Comput. Y Sist. 2023, 27, 955–963. [Google Scholar] [CrossRef]
Agustín-Llach, M.P.; Rubio, J. Navigating the Mental Lexicon: Network Structures, Lexical Search and Lexical Retrieval. J. Psycholinguist. Res. 2024, 53, 21. [Google Scholar] [CrossRef] [PubMed]
Dongsuk, O.; Kwon, S.; Kim, K.; Ko, Y. Word sense disambiguation based on word similarity calculation using word vector representation from a knowledge-based graph. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 2704–2714. [Google Scholar]
Perak, B.; Kirigin, T.B. Construction Grammar Conceptual Network: Coordination-based graph method for semantic association analysis. Nat. Lang. Eng. 2023, 29, 584–614. [Google Scholar] [CrossRef]
Ban Kirigin, T.; Bujačić Babić, S.; Perak, B. Lexical sense labeling and sentiment potential analysis using corpus-based dependency graph. Mathematics 2021, 9, 1449. [Google Scholar] [CrossRef]
Sreerakuvandana, S.; Princy, P.; Varsha, A. Understanding Large Language Models. In Challenges in Large Language Model Development and AI Ethics; Gupta, B., Ed.; IGI Global Scientific Publishing: Hershey, PA, USA, 2024; pp. 1–24. [Google Scholar] [CrossRef]
Špica, D.; Perak, B. Enhancing Japanese Lexical Networks Using Large Language Models: Extracting Synonyms and Antonyms with GPT-4o. In Lexicography and Semantics: Proceedings of the XXI EURALEX International Congress; Institute for the Croatian Language: Zagreb, Croatia, 2024; pp. 265–284. Available online: https://euralex.jezik.hr/wp-content/uploads/2021/09/Euralex-XXI-proceedings_1st.pdf (accessed on 15 October 2025).
NINJAL (National Institute for Japanese Language and Linguistics). Bunruigoihyoo—Zooho Kaiteiban [Word List by Semantic Principles—Revised and Enlarged Edition]; Dainippon Tosho: Tokyo, Japan, 2004. [Google Scholar]
Weblio. Available online: https://ejje.weblio.jp/ (accessed on 31 October 2024).
Goo Dictionary. Available online: https://dictionary.goo.ne.jp (accessed on 31 October 2024).
Bond, F.; Isahara, H.; Fujita, S.; Uchimoto, K.; Kuribayashi, T.; Kanzaki, K. Enhancing the Japanese WordNet. In Proceedings of the 7th Workshop on Asian Language Resources (ALR7), Suntec, Singapore, 6–7 August 2009. [Google Scholar]
Isahara, H.; Bond, F.; Uchimoto, K.; Utiyama, M.; Kanzaki, K. Development of the Japanese WordNet. In Proceedings of the Sixth International Conference on Language Resources and Evaluation, Marrakech, Morocco, 28–30 May 2008; pp. 2420–2423. [Google Scholar]
Openai Gpt-3. Available online: https://github.com/openai/gpt-3/blob/master/dataset_statistics/languages_by_word_count.csv (accessed on 31 October 2024).
LLM-jp: A Cross-Organizational Project for the Research and Development of Fully Open Japanese LLMs. Available online: https://arxiv.org/pdf/2407.03963 (accessed on 31 October 2024).
Kawahara, D.; Kuga, Y.; Kurohashi, S.; Suzuki, J.; Miyao, Y. LLM-jp: Inter-organizational Project for the Development of Japanese Large Language Models. J. NLP 2024, 31, 266–279. (In Japanese) [Google Scholar] [CrossRef]
Lee, J. Automatic Assignment of Semantic Categories by ChatGPT: Can Generative AI Understand the Abstract Attributes of Words? Math. Linguist. 2024, 34, 432–442. Available online: https://researchmap.jp/jhlee/published_papers/47599588 (accessed on 15 October 2025).
Sunakawa, Y.; Lee, J.; Takahara, M. The construction of a database to support the compilation of Japanese learners’ dictionaries. Acta Linguist. Asiat. 2012, 2, 97–115. [Google Scholar] [CrossRef]
Github Repository: Japanese Lexical Networks. Available online: https://github.com/bperak/lexical_net_jpn (accessed on 15 October 2025).
Matsushita, T. The Vocabulary Database for Learners of Japanese, Version 1.11 (In Japanese). 2011. Available online: http://www17408ui.sakura.ne.jp/tatsum/database.html#vdrj (accessed on 31 October 2024).
OpenAI API Reference. Available online: https://platform.openai.com/docs/api-reference (accessed on 31 May 2025).
Narayanan, P.K. Getting Started with Data Validation using Pydantic and Pandera. In Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms; Apress: Berkeley, CA, USA, 2024; pp. 163–196. [Google Scholar]
OpenAI Structured Outputs. Available online: https://platform.openai.com/docs/guides/structured-outputs (accessed on 31 May 2025).
Hagberg, A.; Swart, P.J.; Schult, D.A. Exploring Network Structure, Dynamics, and Function Using NetworkX; No. LA-UR-08-05495; LA-UR-08-5495; Los Alamos National Laboratory (LANL): Los Alamos, NM, USA, 2007.
Goodman, M.W.; Bond, F. Intrinsically interlingual: The Wn Python library for wordnets. In Proceedings of the 11th Global Wordnet Conference 2021, Pretoria, South Africa, 18–22 January 2021; Global Wordnet Association, University of South Africa (UNISA): Pretoria, South Africa, 2021; pp. 100–107. [Google Scholar]
Ren, X.; Tang, J.; Xia, L.; Yin, D.; Chawla, N.; Huang, C. A Survey of Large Language Models for Graphs. arXiv 2024, arXiv:2405.05943. [Google Scholar] [PubMed]
Zhu, Y.; Wang, X.; Chen, J.; Qiao, S.; Ou, Y.; Yao, Y.; Deng, S.; Chen, H.; Zhang, N. Large Language Models for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities. World Wide Web J. 2024, 7, 58. [Google Scholar] [CrossRef]
Li, Y.; Li, Z.; Wang, P.; Li, J.; Sun, X.; Cheng, H.; Yu, J.X. A survey of graph meets large language model: Progress and future directions. arXiv 2023, arXiv:2311.12399. [Google Scholar]
Arsenyan, V.; Bughdaryan, S.; Shaya, F.; Small, K.; Shahnazaryan, D. Large language models for biomedical knowledge graph construction: Information extraction from EMR notes. arXiv 2023, arXiv:2301.12473. [Google Scholar] [CrossRef]
Tsaneva, S.; Dessì, D.; Osborne, F.; Sabou, M. Knowledge graph validation by integrating LLMs and human-in-the-loop. Inf. Process. Manag. 2025, 62, 104145. [Google Scholar] [CrossRef]
Github Repository Vasturiano 3d-Force-Graph. Available online: https://github.com/vasturiano/3d-force-graph (accessed on 31 May 2025).
Srdanović, I.; Špica, D. Sudjelovanje studenata japanskog jezika u leksikografskim aktivnostima i planiranje bilingvalnog e-rječnika. Tabula: Časopis Filoz. Fak. Sveučilište Jurja Dobrile U Puli 2021, 18, 175–200. [Google Scholar] [CrossRef]

Figure 1. Process schema illustrating the system workflow.

Figure 2. (a) Visualization of the subgraph for the lexeme子供 kodomo ‘child’, (marked as the yellow ego-node) showing the contribution of different relation types. The figure displays the 45 direct (1-hop) connections to the central node. The structural layer, showing hierarchical (HYPERNYM, HYPONYM) and compositional (MERONYM) relations. (b) The associative layer, showing relations of semantic similarity (SYNONYM) and shared context (CO-HYPONYM). (c) The complete, integrated network, showing all relation types simultaneously to illustrate the true density and richness of the generated data.

Figure 3. Confusion Matrix Comparing Graph-Derived Relations with WordNet Relations. The heatmap visualizes the alignment between relation types from the graph and WordNet. Diagonal cells represent matches, while off-diagonal cells indicate mismatches, with color intensity corresponding to the frequency of each relation type’s occurrence. This matrix highlights areas of strong alignment, as well as discrepancies in classification across relation types.

Figure 4. Two-dimensional visualization tool to explore and analyze the Japanese lexical network.

Figure 5. Three-dimensional visualization tool for exploring Japanese lexical networks.

Table 1. Lexical networks summary.

Graph	Nodes	Edges
Synonym layered graph	60,752	127,960
Multi-layered relational graph	155,233	708,782

Table 2. Distribution of different relation types in the network, with the percentage of each relation type relative to the total connections.

Relation Type	Percentage (%)
HYPONYM	17.41
HYPERNYM	17.41
CO-HYPONYM	17.30
HAS_DOMAIN	11.18
DOMAIN_OF	11.18
SYNONYM	9.30
HOLONYM	8.93
MERONYM	7.28

Table 3. Statistical overview of the different relation types in the multi-relational lexical network.

Relation Type	Strength	Mean	StD	Range
SYNONYM	Synonymity	0.74	0.12	0.00–1.00
	Relation Type	0.75	0.13	0.00–1.00
HYPONYM	Synonymity	0.69	0.11	0.10–1.00
	Relation Type	0.75	0.12	0.10–0.95
CO-HYPONYM	Synonymity	0.65	0.13	0.00–1.00
	Relation Type	0.73	0.12	0.00–0.95
HYPERNYM	Synonymity	0.55	0.17	0.00–0.90
	Relation Type	0.75	0.12	0.10–0.95
MERONYM	Synonymity	0.63	0.13	0.10–0.95
	Relation Type	0.73	0.13	0.20–0.95
HOLONYM	Synonymity	0.60	0.15	0.10–0.95
	Relation Type	0.72	0.12	0.20–1.00
HAS_DOMAIN	Relation Type	0.77	0.12	0.00–1.00
DOMAIN_OF	Relation Type	0.77	0.12	0.00–1.00

Table 4. Comparison of Exact and Soft Matching Results between Graph and WordNet Relations.

Metric	Value
Total Source–Target Pairs Evaluated	1333
Exact Matches between Graph and WordNet Relations	506
Percentage of Exact Matches	37.96%
Soft Matching Approach ¹
Total Unique Pairs with Soft Matching	862
Pairs with Soft Matching	506
Percentage of Matches with Soft Matching	58.70%

¹ Soft matching refers to a relaxed approach for evaluating the alignment between graph-derived and WordNet relations. Unlike exact matching, which requires a direct one-to-one correspondence between a single relation type in the graph and in WordNet, soft matching accounts for source–target pairs with multiple relations in the graph. With soft matching, a source–target pair is considered a match if any of the multiple relations in the graph aligns with the relation found in WordNet. This approach recognizes the semantic complexity of language, where a pair of words may share several types of connections (e.g., both HYPERNYM and CO-HYPONYM) and improves alignment by acknowledging these layered relationships.

Table 5. Comparison of Graph and WordNet Relations for Selected Source–Target Pairs with Multiple Relation Types.

Source	Target	Graph Relations	WordNet Relation	Match
Apaato (apartment)	Kondominiamu (condominium)	HYPONYM, CO-HYPONYM	CO-HYPONYM	True
Apaato (apartment)	Sutajio (studio)	HYPONYM, CO-HYPONYM	HYPONYM	True
Apaato (apartment)	Taunhausu (townhouse)	HYPONYM, CO-HYPONYM	HYPERNYM	False
Apaato (apartment)	Mezonetto (maisonette)	HYPONYM, CO-HYPONYM	HYPONYM	True
Apaato (apartment)	Juukyo (dwelling)	HYPERNYM, CO-HYPONYM	CO-HYPONYM	True

Table 6. Precision, Recall, and WordNet Distribution by Graph-Derived Relation Type.

Relation Type	Precision	Recall	WordNet Distribution	Relation Type
CO-HYPONYM	38.33%	31.16%	CO-HYPONYM: 110 (38.3%), HYPONYM: 87 (30.3%), SYNONYM: 59 (20.6%), HYPERNYM: 27 (9.4%), MERONYM: 3 (1.0%), HOLONYM: 1 (0.3%)	CO-HYPONYM
HOLONYM	7.92%	32.00%	HYPERNYM: 27 (26.7%), SYNONYM: 25 (24.8%), CO-HYPONYM: 25 (24.8%), HYPONYM: 15 (14.9%), HOLONYM: 8 (7.9%), MERONYM: 1 (1.0%)	HOLONYM
HYPERNYM	27.27%	32.73%	CO-HYPONYM: 70 (35.4%), HYPERNYM: 54 (27.3%), SYNONYM: 50 (25.3%), HYPONYM: 13 (6.6%), HOLONYM: 8 (4.0%), MERONYM: 3 (1.5%)	HYPERNYM
HYPONYM	48.78%	43.36%	HYPONYM: 160 (48.8%), CO-HYPONYM: 78 (23.8%), SYNONYM: 56 (17.1%), MERONYM: 21 (6.4%), HYPERNYM: 13 (4.0%)	HYPONYM
MERONYM	31.68%	46.38%	MERONYM: 32 (31.7%), HYPONYM: 24 (23.8%), SYNONYM: 20 (19.8%), CO-HYPONYM: 18 (17.8%), HYPERNYM: 6 (5.9%), HOLONYM: 1 (1.0%)	MERONYM
SYNONYM	44.65%	40.34%	SYNONYM: 142 (44.7%), HYPONYM: 70 (22.0%), CO-HYPONYM: 52 (16.4%), HYPERNYM: 38 (11.9%), MERONYM: 9 (2.8%), HOLONYM: 7 (2.2%)	SYNONYM

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Perak, B.; Špica, D. Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks. Knowledge 2025, 5, 24. https://doi.org/10.3390/knowledge5040024

AMA Style

Perak B, Špica D. Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks. Knowledge. 2025; 5(4):24. https://doi.org/10.3390/knowledge5040024

Chicago/Turabian Style

Perak, Benedikt, and Dragana Špica. 2025. "Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks" Knowledge 5, no. 4: 24. https://doi.org/10.3390/knowledge5040024

APA Style

Perak, B., & Špica, D. (2025). Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks. Knowledge, 5(4), 24. https://doi.org/10.3390/knowledge5040024

Article Menu

Automating Lexical Graph Construction with Large Language Models: A Scalable Approach to Japanese Multi-Relation Lexical Networks

Abstract

1. Introduction

2. Methods

2.1. Data Sources and Preprocessing

2.2. Algorithmic Extraction of Lexical Data Large Using Language Model (LLM)

2.3. Output Data Structures

2.4. Graph Construction

2.5. Mutual Domain and Sense Construction

2.6. Quality Control and Data Integrity

2.7. Graph Enrichment and Lexical Attribute Integration

3. Results

3.1. Initial Lexical Network Construction

3.2. Comparative Analysis of Network Structure

3.3. Network Structure and Centrality Analysis

3.4. Comparison of Lexical Relations in Graph and WordNet

3.4.1. Matching Accuracy and Coverage

3.4.2. Discrepancies Between Graph and WordNet Relations

3.4.3. Confusion Matrix Analysis and Relation-Type Statistics

3.4.4. Distribution of WordNet Relations for Each Graph Relation Type

3.4.5. Implications for NLP Applications

4. Discussion

4.1. Structural Insights and Lexical Organization

4.1.1. The Giant Component as a Model of Lexical Cohesion

4.1.2. Network Topology: A Core–Periphery Lexical Structure

4.1.3. Centrality Analysis: Uncovering the Conceptual Backbone

4.2. Multiple Relationship Types for Same Source Target Pair

4.3. Examples of Pairs with Multiple Relations

4.4. Discussion of Multiple Relations

4.5. Interpreting the Alignment with WordNet: A Complementary Resource

4.6. Comparison with Other Lexicon Building Approaches

4.7. Comparison with Recent Literature and State-of-the-Art

4.8. Graph Visualization and Exploration

4.9. A Case Study: Enhancing Japanese Language Education

5. Limitations and Future Work

6. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI