Semi-Supervised Taxonomy Expansion and Completion in Dynamic Taxonomies

Butt, Sabur; Alatorre, Gustavo De los Ríos; González Gómez, Luis José; Ceballos, Hector G.

doi:10.3390/app15126517

by Sabur Butt *, Gustavo De los Ríos Alatorre, Luis José González Gómez and Hector G. Ceballos

Institute for the Future of Education, Tecnológico de Monterrey, Monterrey 64700, Mexico

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6517; https://doi.org/10.3390/app15126517

Submission received: 13 May 2025 / Revised: 5 June 2025 / Accepted: 5 June 2025 / Published: 10 June 2025


Abstract

Taxonomies play a critical role in structuring knowledge within rapidly evolving domains such as professional skills. Traditional manual taxonomy management faces challenges due to its labor-intensive nature and the rapid emergence of new concepts. To address these issues, we propose a novel semi-supervised approach leveraging Retrieval-Augmented Generation (RAG) for taxonomy expansion and completion, particularly tailored for dynamic skill-based taxonomies. Our contributions include the creation of a comprehensive dataset derived from automotive sector job postings, designed explicitly to evaluate taxonomy expansion and completion tasks. This methodology integrates the precision of retrieval-based mechanisms with the flexibility of generative models, enabling accurate and efficient updates to taxonomy structures. We evaluated our method using this dataset, demonstrating an overall accuracy of 78%. Although the model performed robustly in horizontal expansions, accurately recognizing variations of existing concepts, it revealed limitations in vertical expansions, especially in identifying entirely new categories. These findings underline the necessity for improved data representation strategies and the incorporation of contextual enrichment to enhance taxonomy robustness.

Keywords: taxonomy expansion; taxonomy completion; up-skilling; re-skilling

1. Introduction

A taxonomy is a structured classification system that organizes and represents concepts, entities, or knowledge hierarchically [1,2]. Each element in a taxonomy is defined by its relationships to other elements, commonly through “is-a” or “part-of” relationships. For instance, in skill-based taxonomies, “Programming Languages” is a category that includes specific programming skills like Python, JavaScript, and C++. Similarly, “Machine Learning Techniques” can encompass “Supervised Learning”, “Unsupervised Learning”, and “Reinforcement Learning”. Taxonomies are inherently dynamic, requiring continuous updates to accommodate the evolution of knowledge and the emergence of new technologies. Our work specifically focuses on highly volatile and dynamic skill-based taxonomies. Skill-based taxonomies are particularly volatile due to the rapidly shifting job market and technological advancements. For instance, emerging skills such as ‘Quantum Computing’ or ‘Prompt Engineering’ must be incorporated promptly to ensure relevance and usability. This rapid evolution underscores the difficulty of manual taxonomy curation, a process that is labor-intensive, time-consuming, and often incomplete. Skill-based taxonomies are pivotal in various domains, providing clarity and structure for applications such as curriculum design, skills assessments, and job matching in the professional sector.

Due to this, automated taxonomy management has received substantial attention, particularly for tasks such as taxonomy expansion [3] and completion [4,5]. Taxonomy expansion focuses on extracting and modeling hierarchical information from existing taxonomies, employing techniques such as local egonets [3], parent–query–child triplets [5], and mini-path encodings [6] to capture structural relationships. Taxonomy completion techniques, on the other hand, leverage external corpora to generate embeddings for new concepts, relying either on implicit relational semantics or seed-guided methods to construct limited taxonomies. Earlier approaches have explored combining semantic sentences and subgraph representations, often using methods like lightweight multi-layer perceptrons (MLPs) [7]. Additionally, frameworks such as TaxoExpan [3] and TaxoEnrich [4] have utilized structural and semantic features, incorporating position-aware encoders and pretrained language models to tackle taxonomy completion and expansion tasks.

Despite these advancements, applying automated methods to dynamic and rapidly evolving domains like professional skills introduces unique challenges. Skills are inherently ambiguous, context-dependent, and often represented by sparse data, making it difficult to determine their placement within a taxonomy. One significant challenge in managing skill-based taxonomies is the variability in how people express the same skills or knowledge. The same skill can be written in multiple ways (such as ‘Data Analysis,’ ‘Analyzing Data,’ or ‘Data Analytics’) depending on individual or organizational preferences. This variability becomes even more pronounced when skills are extracted by generative large language models, which can produce diverse representations of the same concept based on context or prompts. Conventional matching methods rely heavily on exact matches, predefined rules, or limited synonym lists, which are insufficient to capture the breadth and nuance of such variations.

Furthermore, skill taxonomies must handle both vertical expansion (introducing broader categories/hypernyms) and horizontal expansion (adding specific variations/hyponyms) within the categories while maintaining logical consistency and avoiding redundancy. Vertical expansion involves introducing broader categories to accommodate higher-level relationships, such as adding ‘Technology Skills’ as a hypernym for categories like ‘Programming’ or ‘Data Analysis.’ Horizontal expansion, on the other hand, requires adding more specific terms grouped within existing categories. For instance, adding specific hyponyms like ‘Python’ or ‘Rust’ under ‘Programming Languages’ expands the breadth of the category while ensuring consistency and logical alignment. Maintaining logical consistency in both vertical and horizontal expansions while avoiding redundancy is another significant challenge. For example, placing ‘Machine Learning’ under both ‘Artificial Intelligence’ and ‘Data Science’ might create overlaps, leading to confusion. Similarly, deciding whether ‘Deep Learning’ is a subcategory of ‘Machine Learning’ or a separate domain altogether requires careful consideration to ensure the taxonomy remains coherent and non-redundant.

To enhance the credibility and effectiveness of taxonomy expansion and completion, we propose a novel methodology leveraging a Retrieval-Augmented Generation (RAG) framework with a human-in-the-loop. RAG combines the precision of retrieval-based methods, which are adept at accessing structured knowledge, with the generative power of large language models, enabling context-aware and dynamic taxonomy updates. Our approach facilitates the seamless integration of new concepts into existing taxonomies while accurately matching them to related concepts already present, thereby avoiding redundancy and preserving structural consistency. By harnessing contextual embeddings and incorporating domain-specific knowledge, the methodology supports both vertical and horizontal expansion. Additionally, it addresses taxonomy completion by identifying and filling missing hierarchical relationships.

A key distinguishing feature of our work is the design of RAG-based baselines for taxonomy management. Unlike traditional supervised approaches, our baselines are designed to handle semi-supervised tasks, leveraging retrieval mechanisms to augment generative models with relevant knowledge and then relying on a human-in-the-loop to ensure taxonomy credibility while significantly reducing the workload. Additionally, we introduce a novel dataset derived from job postings in the automotive sector, tailored to evaluate taxonomy expansion and completion tasks in a real-world domain. This dataset reflects the complexities of dynamic skill landscapes, including ambiguities, sparse data, and domain-specific variations, making it a valuable resource for benchmarking future methods.

The main contributions of this work are as follows:

We introduce a comprehensive framework for taxonomy expansion and completion, integrating retrieval and generation in a unified approach tailored to dynamic and large-scale taxonomies.
Our work establishes baseline models that demonstrate the effectiveness of retrieval-augmented generation for handling real-world taxonomy management tasks.
We present a dataset derived from job postings in the automotive sector, reflecting real-world complexities and serving as a benchmark for evaluating taxonomy expansion and completion methodologies.
The proposed framework supports both vertical (hypernym addition) and horizontal (hyponym expansion) updates while addressing missing relationships to ensure logical consistency.

The remainder of this paper is organized as follows: Section 2 formally defines the taxonomy expansion and completion problem, detailing the objectives and expected outputs. Section 3 reviews related work and highlights current limitations in the literature. Section 4 outlines our methodology, including dataset construction, system design, evaluation protocols, and the experimental results, while Section 5 provides an in-depth discussion and interpretation of the findings. Finally, Section 6 concludes the study and outlines directions for future work.

2. Problem Definition

We define the problem as the automated expansion and completion of a taxonomy with the following objectives:

Dynamic Taxonomy Maintenance: Taxonomies are inherently dynamic, requiring the integration of new concepts, relationships, and subcategories without disrupting the existing structure.
Vertical Taxonomy Expansion (New Category/New Hypernym): This involves identifying and adding higher-level categories (hypernyms) to accommodate new concepts that introduce broader generalizations. A hypernym is a broader term that encompasses multiple specific instances. For example, “Programming Languages” is a hypernym that includes more specific skills like “Python” or “Rust”.
Horizontal Taxonomy Expansion (New Variation/New Hyponym): This involves adding new subcategories (hyponyms) or variations under existing categories to improve coverage and granularity. A hyponym is a term that represents a subset of a broader category. For instance, “Python” and “JavaScript” are hyponyms of the category “Programming Languages.”
Taxonomy Completion (Existing variation/Existing hyponym): This involves identifying the most likely skill hyponym relationships between existing concepts in the taxonomy to ensure structural completeness and consistency.

Given an existing taxonomy $T_{0} = (N_{0}, E_{0})$, where $N_{0}$ represents the set of nodes (concepts) and $E_{0}$ the directed edges representing “is-a” relationships, and a set $C = \{n_{1}, n_{2}, \dots, n_{m}\}$ of new concepts to be integrated into $T_{0}$, the task is to generate an updated taxonomy $T = (N, E)$, where $N = N_{0} \cup C$ and $E = E_{0} \cup E'$. The newly added edges $E'$ represent the appropriate hypernym and hyponym relationships for concepts in $C$.

Additionally, the system identifies concepts from $C$ that are already part of the taxonomy ($N_{0}$) and refers to them as matches. This ensures that the taxonomy is not redundantly expanded and that existing concepts are appropriately matched to their most relevant positions. Figure 1 visually represents the problem described.

The solution must:

Predict the most suitable parent(s) (hypernyms) for each new concept in C.
Identify potential subcategories (hyponyms) or relationships for each concept in C, if applicable.
Match concepts from C to existing nodes in $N_{0}$ if they are already part of the taxonomy.
Ensure the updated taxonomy T remains logically consistent and satisfies the requirements for vertical and horizontal expansion.
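The set-based definition above can be sketched in a few lines of Python. This is only an illustration of the notation, not the authors' implementation: the `Taxonomy` class, the `update_taxonomy` helper, and the toy concepts are hypothetical names introduced here, and the edges $E'$ are supplied by hand rather than predicted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """T = (N, E): concept nodes plus directed (child, parent) "is-a" edges."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)


def update_taxonomy(t0: Taxonomy, concepts: list[str],
                    new_edges: set[tuple[str, str]]) -> tuple[Taxonomy, set[str]]:
    """Return T = (N0 ∪ C, E0 ∪ E') and the matches (concepts of C already in N0)."""
    matches = {c for c in concepts if c in t0.nodes}
    nodes = t0.nodes | set(concepts)
    # Every new edge must connect two known concepts, otherwise T is inconsistent.
    for child, parent in new_edges:
        if child not in nodes or parent not in nodes:
            raise ValueError(f"edge ({child!r}, {parent!r}) uses an unknown concept")
    return Taxonomy(nodes, t0.edges | new_edges), matches


# Hypothetical toy data: "Rust" is a new hyponym, "Python" is a match.
t0 = Taxonomy({"Programming Languages", "Python"},
              {("Python", "Programming Languages")})
t, matches = update_taxonomy(t0, ["Rust", "Python"],
                             {("Rust", "Programming Languages")})
print(sorted(t.nodes), sorted(matches))
```

Returning the matches separately mirrors the requirement that concepts already in $N_{0}$ are matched rather than added a second time.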

3. Related Work

In many real-world applications, taxonomies are meticulously created by experts or through crowdsourcing efforts and are often integrated into online systems [8,9]. Rather than constructing taxonomies from scratch, these systems require dynamic methods to expand existing taxonomies and ensure they remain relevant. Previous research has tackled this by incorporating new concepts into resources like WordNet, using named entities from Wikipedia [10] or domain-specific concepts derived from various fields [11,12]. For example, Task 14 of the SemEval 2016 challenge [13] aimed to enrich WordNet with concepts from domains such as health, sports, and finance. However, these methods heavily rely on WordNet’s unique structure, making it difficult to generalize them to other taxonomies. To address this limitation, recent studies have proposed methods for expanding generic taxonomies. Wang et al. [14] introduced a model that uses search engine query logs to grow taxonomies, while Plachouras et al. [15] developed models that identify concept variations using external datasets. Vedula et al. [16] created a ranking model combining features, including data retrieved from external APIs, to identify the best positions for new concepts. Similarly, Aly et al. [17] employed hyperbolic embeddings to link new concepts to the most related nodes in a taxonomy.

Recent taxonomy expansion methods typically focus on constructing an entire taxonomy through two primary steps. The first one involves identifying hypernym/hyponym pairs using either pattern-based methods [18,19] with predefined patterns or distributional methods [20,21] that calculate term similarities based on embeddings. The second step organizes these relations into a hierarchical structure, such as a tree or Directed Acyclic Graph (DAG), using optimization techniques like maximum spanning trees [22,23], optimal branching [24], or minimum-cost flow [25]. Additionally, some methods utilize entity set expansion to incrementally grow taxonomies from a small seed [26]. Shen et al. [3] in their approach eliminated the need for external data by leveraging the existing taxonomy for self-supervision. This approach enabled its application across diverse domains and incorporated the local structural context of each candidate position. However, for our application of skills, this approach is not reliable as expert supervision is still required in the process.

In contrast, our approach simplifies this process by constructing a single-level tree as a knowledge base. For each query representing a skill, the system retrieves the most relevant branch based on the query’s hypernym, definition and related branch hyponyms. This streamlined methodology avoids building complex hierarchical structures, focusing instead on direct retrieval and representation for specific skills. This approach is particularly suited to dynamic and domain-specific requirements. This distinction is critical because, in dynamic and volatile taxonomies, entirely new categories frequently emerge. For instance, consider concepts like “quantum computing” or “carbon-neutral technology,” which represent terms previously absent from existing taxonomies but that should now be incorporated as new hypernyms. Without a systematic approach to identify and classify such emergent categories as new hypernyms, taxonomies risk becoming outdated and irrelevant. The ability to differentiate between new hypernyms, new hyponyms, and existing hyponyms is essential to maintain the relevance and accuracy of taxonomies. Current methods often fail in this regard by assuming all new concepts are simply hyponyms of existing categories, ignoring the possibility of entirely new parent categories. To address this gap, we reformulated the task into a classification problem with three labels: (1) new hypernym, (2) new hyponym, and (3) existing hyponym, as detailed in Section 4. This enhanced labeling framework ensures that taxonomies remain comprehensive and adaptive to the evolving landscape of knowledge domains.

4. Methodology

The following section describes the development of a RAG-based classifier for skill extraction from job postings, designed for both vertical and horizontal skill classification. This system aims to analyze how skills are identified across job postings and their alignment with an existing knowledge base accessible within the LLM’s RAG architecture. Additionally, it details the steps involved in data extraction and dataset creation, followed by the design of various prompts. The section also discusses the establishment of evaluation metrics and the subsequent analysis of results to assess performance and refine the model. Figure 2 presents a high-level overview of the methodology.

4.1. Data Extraction and Taxonomy Creation

We utilized the skills classification and extraction dataset available on Tecnologico de Monterrey’s Data Hub [27,28], which comprises job postings from the automotive sector. During the development of the skill classification dataset, 1137 job listings were collected from Indeed Mexico and processed using GPT-4o for skill extraction, resulting in 15,042 raw skills. These raw skills underwent preprocessing to remove corrupted data entries, including NaN values. Requirements labeled as skills or knowledge but not fitting traditional classifications were categorized under “Others”. Following the preprocessing stage, two human annotators classified the skills based on the “Knowledge, Skill, Abilities, and Others” (KSA-O) framework. Discrepancies in classification were resolved by a third annotator. Further details on the skill classification dataset can be found in [28] and the published report on the automotive skills taxonomy [29].

Using the skill classification dataset [27], we initiated taxonomy construction by identifying hypernyms and their associated hyponyms. To ensure accuracy, we hired two expert annotators with Master’s degrees in Computer Science and Engineering, both native Spanish speakers, for six months. Conflicting labels were reviewed by an industry expert, who finalized hypernym/hyponym relationships according to industry standards. During annotation, detailed guidelines were provided to help annotators determine appropriate hypernyms and assess whether a hyponym belonged to a particular category. Any hyponym that could serve as an independent hypernym without overlapping existing categories was separated. The annotators ensured that each keyword encapsulated the core skill represented by different expressions.

4.2. RAG Dataset Creation

This approach facilitated the development of a skill taxonomy reflective of the automotive industry’s requirements at the time of data extraction. The hypernyms served as umbrella terms to systematically organize a vast array of skill expressions. The final taxonomy comprised 220 hypernyms, derived from a total of 11,538 text variations obtained post preprocessing. Each hypernym was supplemented with a definition to contextualize its meaning and application. The distribution of the data is detailed in Table 1.

Once the taxonomy was completed, the data in the taxonomy was restructured for its use in a classifier system. The primary function of this classifier is to conduct vertical and horizontal classifications on incoming data sourced from future monthly extractions. The classifier utilized the skills listed under each knowledge category to delineate the technical aspects relevant to individual job postings, thus acting as a critical discriminative factor across distinct knowledge domains. Leveraging the 134 knowledge categories and their 5,120 skill variations as a foundational knowledge base, three specific data subsets were extracted from this collection. These subsets were created explicitly to assess the ability of a Retrieval-Augmented Generation (RAG) classifier to differentiate between three distinct categories: new hypernyms, new hyponyms, and existing hyponyms.

From a hierarchical perspective, classifying data into these three categories and analyzing the classifier’s outcomes enables the evaluation of the RAG model’s effectiveness. This evaluation extends beyond merely categorizing new data within established classes; it also considers the model’s potential for database expansion as new monthly data becomes available. The label “new hypernym” identifies incoming data that does not fit existing categories, thus serving to measure the model’s vertical classification performance, particularly its capability to recognize and integrate new knowledge domains. Conversely, the label “new hyponym” assesses the model’s proficiency in associating previously unseen variations with existing categories. Lastly, the “existing hyponym” label acts as a baseline indicator, ensuring the model accurately recognizes and maintains known variations already represented in the knowledge base.

Using the taxonomy dataframe as a baseline, we extracted the categories and variations for the three evaluation subsets. The data selected for the “new hypernym” (new key) and “new hyponym” (new variation) labels was extracted from the dataframe and then deleted from its records. Removing these instances from the model’s knowledge base was essential to obtain a realistic benchmark of the system’s classification capabilities. In contrast, the texts under the “existing hyponym” (existing variation) label were kept in the knowledge base to test the system’s recognition capabilities for horizontal classification, since these texts should coincide with information already in the base. To imitate the distribution expected in real-world scenarios, the new hypernym subset contains 20 samples consisting of underrepresented keys from the original knowledge base, and the new hyponym subset contains 150 samples consisting of variations from the remaining keys, prioritizing the classes with the least representation. Lastly, for the existing hyponym subset, 150 variations that appeared more than once were randomly selected without removing them from the knowledge base.
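The hold-out procedure described above could look roughly like the following sketch. It is a simplified reading of the text, not the authors' code: the function name, the dictionary layout of the knowledge base, the toy data, and the tie-breaking rules are all assumptions, and it further assumes that each variation text belongs to a single key.

```python
import random
from collections import Counter


def build_eval_split(kb, n_new_keys=20, n_new_variations=150, n_existing=150, seed=0):
    """Split a knowledge base {key: [variation, ...]} into (remaining_kb, samples).

    samples is a list of (text, label) pairs. Assumes each variation text
    belongs to a single key; duplicates within a key are allowed.
    """
    rng = random.Random(seed)
    kb = {key: list(texts) for key, texts in kb.items()}  # copy the caller's data
    samples = []

    # New hypernyms: the least represented keys are removed from the knowledge base.
    for key in sorted(kb, key=lambda k: len(kb[k]))[:n_new_keys]:
        del kb[key]
        samples.append((key, "new hypernym"))

    # New hyponyms: remove one variation at a time, visiting small keys first,
    # and always leave at least one distinct variation under every key.
    taken, progress = 0, True
    while taken < n_new_variations and progress:
        progress = False
        for key in sorted(kb, key=lambda k: len(kb[k])):
            distinct = sorted(set(kb[key]))
            if taken < n_new_variations and len(distinct) > 1:
                text = rng.choice(distinct)
                kb[key] = [t for t in kb[key] if t != text]
                samples.append((text, "new hyponym"))
                taken += 1
                progress = True

    # Existing hyponyms: variations seen more than once, left in the knowledge base.
    counts = Counter(t for texts in kb.values() for t in texts)
    repeated = sorted(t for t, c in counts.items() if c > 1)
    for text in rng.sample(repeated, min(n_existing, len(repeated))):
        samples.append((text, "existing hyponym"))

    return kb, samples


# Hypothetical toy knowledge base, far smaller than the real one.
toy_kb = {
    "welding": ["mig welding", "mig welding", "tig welding", "arc welding"],
    "cad": ["autocad", "autocad", "solidworks"],
    "lean": ["kaizen"],
    "plc": ["plc programming", "plc programming", "ladder logic", "siemens plc"],
}
remaining, samples = build_eval_split(toy_kb, n_new_keys=1, n_new_variations=2, n_existing=2)
print(Counter(label for _, label in samples))
```

The key property is that held-out keys and variations no longer appear in `remaining`, while existing-hyponym samples still do, so the benchmark tests recognition and expansion separately.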

4.3. Design of the System and Prompts

The RAG labeling system was built using GPT-4o as the base model and “text-embedding-3-small” for the embeddings. GPT-4o was selected as it represented the state-of-the-art model at the time of experimentation, offering improved reasoning capabilities, lower latency, and better handling of nuanced language tasks. The text-embedding-3-small model was chosen because it provides high-quality sentence-level embeddings while maintaining low computational overhead. We used 1536 dimensions for the embedding. The system was fed the data in the reference taxonomy that remained after the process described in Section 4.2. It had access to the embeddings of texts that combined each hypernym with its associated hyponyms and definition. Every time the system receives a new query, it embeds the query, computes the cosine similarity between the query embedding and the embeddings in the knowledge base, and returns the three most similar entries as a list. The system uses structured outputs to keep a consistent format across all responses. The structure is based on a knowledge map template that contains the query, the relevant knowledge about that query, the relevant definitions associated with it, and a key that maps the query to “new hypernym” when it does not match any of the existing keys. For response generation, the temperature of the model was kept at 0 to discourage hallucinations, and the system received explicit instructions to map the given query to one of the three labels based on its knowledge base. “New hypernym” was specified for queries unique enough to warrant their own category in the knowledge base. “New hyponym” was intended for queries that could be mapped to one of the existing categories in the knowledge base but whose text was not among the existing variations of that category. Lastly, “existing hyponym” was specified for queries that already existed in the knowledge base as a variation. For the user-level role, the content was specified in the structured output format, including the query, relevant knowledge, and definitions, with the instruction that the model should generate a response based on the available knowledge. The system returned the message content of the first choice as its response.
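The retrieval step (cosine similarity, top three matches) can be illustrated with a small self-contained sketch. The three-dimensional vectors below are made-up stand-ins for the 1536-dimensional embeddings, and the function names and knowledge-base entries are hypothetical. The sketch covers retrieval only and does not call any embedding or chat API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, kb_entries, k=3):
    """kb_entries is a list of (text, vector); returns k (text, score) pairs, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in kb_entries]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical entries: hypernym plus hyponyms, with toy vectors.
kb_entries = [
    ("Programming Languages: Python, Rust", [1.0, 0.0, 0.0]),
    ("Welding: MIG, TIG", [0.0, 1.0, 0.0]),
    ("Quality Control: inspection, audits", [0.0, 0.0, 1.0]),
    ("Data Analysis: analytics, reporting", [0.7, 0.0, 0.7]),
]
top = retrieve_top_k([0.9, 0.1, 0.0], kb_entries)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

In the described system, the three retrieved entries would then be passed to the language model together with the query and the labeling instructions.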

Each response in the structured format was then appended to a list, from which the responses were formatted and stored in an answers dataframe. Figure 3 illustrates the step-by-step RAG process for obtaining the desired categories.

Figure 4 shows the prompt used to map the query to the label.

4.4. Definition of the Metrics

The system was evaluated using accuracy, precision, and recall, as well as the micro, macro, and weighted F1 scores. All of these metrics are based on boolean comparisons between the label assigned by the RAG system and the ground-truth key. The equations for each of these metrics are defined below:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$\text{Macro-F1} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{F1}_{i}$$

$$\text{Weighted-F1} = \sum_{i=1}^{N} \frac{n_{i}}{\sum_{j=1}^{N} n_{j}} \cdot \mathrm{F1}_{i}$$

where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively, $N$ is the total number of classes, $\mathrm{F1}_{i}$ is the F1 score of class $i$, and $n_{i}$ is the number of samples in class $i$.
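As a minimal sketch, the metrics defined above can be computed from two lists of labels as shown below. Two assumptions are made here that the text does not spell out: true positives, false positives, and false negatives are counted one label at a time (one-vs-rest), and multi-class accuracy is taken as the fraction of queries whose predicted label equals the ground truth. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def summarize(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for two equal-length label lists."""
    scores = per_class_scores(y_true, y_pred)
    support = Counter(y_true)  # n_i: number of true samples per class
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
    weighted_f1 = sum(support[label] / total * f1 for label, (_, _, f1) in scores.items())
    return accuracy, macro_f1, weighted_f1


# Toy labels, not taken from the paper's data.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
accuracy, macro_f1, weighted_f1 = summarize(y_true, y_pred)
print(round(accuracy, 3), round(macro_f1, 3), round(weighted_f1, 3))
```

The macro average treats every class equally, while the weighted average scales each class by its support, which is why the two can diverge sharply on imbalanced label sets.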

4.5. Results

The label-specific metrics, shown in Table 2, give more information about the system performance. For the “existing hyponym” class, the model achieved a strong precision of 0.81 and a recall of 0.82, leading to a harmonized F1 score of 0.81. These numbers suggest that the model performs well in recognizing the different skill variations present in the taxonomy, benefiting from the robustness of its retrieval and embedding mechanisms.

A particularly interesting case is presented in the resulting values for the “new hypernym” label. It obtained a precision result of 1.00, but it suffered from a very low recall of 0.20. This results in a harmonized F1 value of 0.33. While the high precision indicates that the system is highly confident in the labels it assigns, the low recall reflects its struggle to identify all instances of truly novel concepts. This limitation can be attributed to the sparse representation of new keywords in the dataset (sometimes even new words altogether), which restricts the system’s ability to generalize effectively. This is particularly notable when trying to classify completely new words.
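To make these figures concrete, the underlying counts can be worked out under one assumption: that the 20 new hypernym samples described in Section 4.2 are all of the evaluated instances of this label. With that assumption, the reported precision and recall imply the following.

```latex
\begin{align*}
TP &= \mathrm{Recall} \times (TP + FN) = 0.20 \times 20 = 4, \qquad FN = 20 - 4 = 16, \\
\mathrm{Precision} &= \frac{TP}{TP + FP} = 1.00 \;\Rightarrow\; FP = 0, \\
\mathrm{F1} &= 2 \times \frac{1.00 \times 0.20}{1.00 + 0.20} = \frac{0.40}{1.20} \approx 0.33 .
\end{align*}
```

In other words, under this assumption the system proposed a new category only four times and was right each time, while the other sixteen new categories were assigned to one of the other two labels.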

For the “new hyponym” label, the model achieved an F1 score of 0.78, with a precision of 0.74 and a recall of 0.81. While there is still room for improvement in minimizing false positives, this performance demonstrates the system’s competence in identifying new variations of existing skills.

5. Discussion and Analysis

The results obtained from the RAG-based classifier provide valuable insights into its performance when classifying skills into hypernyms and hyponyms within a dynamic taxonomy. With an overall accuracy of 0.78, the classifier demonstrates strong general effectiveness, indicating that its predictions align closely with the ground truth in the majority of cases. Nonetheless, performance variations across different labels highlight both strengths and limitations. The classifier achieved a weighted precision of 0.79 and a weighted recall of 0.78, suggesting consistent and balanced performance across most labels, especially considering the underlying distribution. The balanced nature of these metrics is further confirmed by the weighted F1 score of 0.77, which combines precision and recall into a unified measure of reliability and robustness across the entire set of categories. However, a macro F1 score of 0.64 indicates notable disparities in the model’s performance among individual labels, particularly affecting less frequently occurring categories such as “new hypernym”. The lower performance observed in these minority classes highlights challenges associated with imbalanced datasets and underscores the complexity involved in correctly identifying and classifying new or evolving concepts.
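The aggregate F1 scores quoted above can be reproduced from the per-label F1 values in Section 4.5 and the subset sizes in Section 4.2. The short check below assumes that the supports are exactly 150, 20, and 150 samples; the variable names are illustrative.

```python
# Per-label F1 scores as reported in Section 4.5.
f1 = {"existing hyponym": 0.81, "new hypernym": 0.33, "new hyponym": 0.78}
# Subset sizes as described in Section 4.2 (assumed to be the evaluation supports).
support = {"existing hyponym": 150, "new hypernym": 20, "new hyponym": 150}

macro_f1 = sum(f1.values()) / len(f1)
total = sum(support.values())
weighted_f1 = sum(support[label] / total * f1[label] for label in f1)

print(round(macro_f1, 2), round(weighted_f1, 2))  # prints: 0.64 0.77
```

The gap between the two averages comes almost entirely from the new hypernym label, which counts for one third of the macro score but only 20 of 320 samples in the weighted score.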

The confusion matrix presented in Figure 5 provides further clarity on specific misclassification tendencies by illustrating the classification results across the three target labels: existing hyponym, new hyponym, and new hypernym. The matrix visually represents the distribution of correct and incorrect predictions, with the diagonal cells indicating correct classifications. The matrix illustrates a pronounced bias toward classifying concepts as “existing hyponyms” rather than correctly identifying “new hyponyms” and “new hypernym” categories. This inclination suggests that the model currently struggles to distinguish subtle differences between variations of existing terms and entirely new concepts. Such misclassifications underline the necessity for enhanced feature engineering or the incorporation of richer contextual information to boost differentiation capabilities. Overall, while the RAG-based classifier exhibits promising capabilities for horizontal taxonomy expansion tasks, its performance in vertical taxonomy classifications requires further development. The limitations, particularly for underrepresented classes like “new hypernym”, highlight the importance of addressing data imbalance and refining the model’s capacity to recognize novel or less represented terms effectively.

Beyond the technical results, this framework holds significant implications for education, especially in how academic institutions design and update their curricula. The ability to dynamically expand and complete taxonomies allows educational programs to remain closely aligned with industry needs, incorporating emerging skills such as “Prompt Engineering” or “Carbon-Neutral Technologies” into course offerings in a timely manner. This responsiveness is vital in rapidly evolving sectors, where traditional curriculum revision cycles often lag behind technological advancement. Learners benefit from exposure to current and market-relevant competencies, educators are supported in identifying skill gaps and content updates, and institutions enhance their capacity to deliver impactful upskilling and reskilling programs.

What distinguishes this approach from fully automated systems is its semi-supervised, human-in-the-loop design. While large language models offer flexibility and scalability, they can also introduce hallucinations or misclassifications, particularly in complex or ambiguous cases. By embedding expert validation into the loop, the proposed framework ensures that additions to the taxonomy are not only computationally accurate but also pedagogically and contextually sound. This human oversight is indispensable in educational settings, where quality assurance, domain relevance, and instructional clarity are non-negotiable. Ultimately, the framework not only strengthens the robustness of skill taxonomies but also provides a trustworthy and scalable pathway for integrating AI-driven knowledge structuring into education and workforce development.

6. Conclusions

Our study introduced a semi-supervised taxonomy expansion and completion framework using a Retrieval-Augmented Generation (RAG) classifier, specifically designed to manage dynamic skills-based taxonomies. We provided a carefully constructed dataset from automotive sector job postings to support evaluation. The system achieved an overall accuracy of 78% and performed well in horizontal classification tasks, accurately identifying variations of existing skills. However, the model encountered challenges in vertical taxonomy expansion, particularly in recognizing new hypernym categories, primarily due to data imbalance and sparse representation. Future research should address these limitations by enriching data representations, incorporating advanced contextual and feature engineering techniques, and developing mechanisms to manage imbalanced datasets. Such improvements would enhance the system’s adaptability, reliability, and effectiveness in dynamically evolving domains. Beyond the technical contributions, we also discussed the framework’s promise for educational applications. By enabling the automated identification and classification of emerging skills, the methodology supports institutions in keeping their curricula aligned with real-time labor market demands. This adaptability is crucial for effective upskilling and reskilling strategies, particularly in rapidly evolving sectors.

Author Contributions

S.B. led the conceptualization and design of the study, developed the methodology, and was primarily responsible for writing the original draft. G.D.l.R.A. implemented the system and contributed to data curation and technical integration. L.J.G.G. conducted the formal analysis and supported the validation and visualization of results. All authors contributed to the review and editing of the manuscript. H.G.C. supervised the overall project, provided administrative support, and secured funding. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study was derived from job postings in the automotive sector and is publicly available through the Tecnológico de Monterrey’s Data Hub at https://doi.org/10.57687/FK2/O7E66L. Further information on the taxonomy and classification data can be found in the associated dataset papers and reports cited within the manuscript.

Acknowledgments

We acknowledge the generous support of Santander Bank, whose donation to our institution has contributed to the environment and resources that made this research possible.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Vrandečić, D. Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 1063–1064.
Khaouja, I.; Mezzour, G.; Carley, K.M.; Kassou, I. Building a soft skill taxonomy from job openings. Soc. Netw. Anal. Min. 2019, 9, 43.
Shen, J.; Shen, Z.; Xiong, C.; Wang, C.; Wang, K.; Han, J. TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network. In Proceedings of the Web Conference 2020, New York, NY, USA, 20 April 2020; pp. 486–497.
Jiang, M.; Song, X.; Zhang, J.; Han, J. Taxoenrich: Self-supervised taxonomy completion via structure-semantic representations. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 925–934.
Zhang, J.; Song, X.; Zeng, Y.; Chen, J.; Shen, J.; Mao, Y.; Li, L. Taxonomy completion via triplet matching network. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 2–9 February 2021; Volume 35, pp. 4662–4670.
Yu, Y.; Li, Y.; Shen, J.; Feng, H.; Sun, J.; Zhang, C. Steam: Self-supervised taxonomy expansion with mini-paths. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1026–1035.
Zeng, Q.; Lin, J.; Yu, W.; Cleland-Huang, J.; Jiang, M. Enhancing taxonomy completion with concept generation via fusing relational representations. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, 14–18 August 2021; pp. 2104–2113.
Chilton, L.B.; Little, G.; Edge, D.; Weld, D.S.; Landay, J.A. Cascade: Crowdsourcing taxonomy creation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 27 April–2 May 2013; pp. 1999–2008.
Yang, D.; Powers, D.M. Measuring Semantic Similarity in the Taxonomy of WordNet; Australian Computer Society: Sydney, Australia, 2005.
Toral, A.; Munoz, R.; Monachini, M. Named Entity WordNet. In Proceedings of the LREC, Marrakech, Morocco, 26 May–1 June 2008.
Bentivogli, L.; Bocco, A.; Pianta, E. ArchiWordnet: Integrating Wordnet with domain-specific knowledge. In Proceedings of the 2nd International Global Wordnet Conference, Brno, Czech Republic, 20–23 January 2004; pp. 39–47.
Fellbaum, C.; Hahn, U.; Smith, B. Towards new information resources for public health—from WordNet to MedicalWordNet. J. Biomed. Inform. 2006, 39, 321–332.
Jurgens, D.; Pilehvar, M.T. Semeval-2016 task 14: Semantic taxonomy enrichment. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), San Diego, CA, USA, 16–17 June 2016; pp. 1092–1102.
Wang, J.; Kang, C.; Chang, Y.; Han, J. A hierarchical dirichlet model for taxonomy expansion for search engines. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 961–970.
Plachouras, V.; Petroni, F.; Nugent, T.; Leidner, J.L. A comparison of two paraphrase models for taxonomy augmentation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 315–320.
Vedula, N.; Nicholson, P.K.; Ajwani, D.; Dutta, S.; Sala, A.; Parthasarathy, S. Enriching taxonomies with functional domain knowledge. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 745–754.
Aly, R.; Acharya, S.; Ossa, A.; Köhn, A.; Biemann, C.; Panchenko, A. Every Child Should Have Parents: A Taxonomy Refinement Algorithm Based on Hyperbolic Term Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 4811–4817.
Jiang, M.; Shang, J.; Cassidy, T.; Ren, X.; Kaplan, L.M.; Hanratty, T.P.; Han, J. Metapad: Meta pattern discovery from massive text corpora. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 877–886.
Nakashole, N.; Weikum, G.; Suchanek, F. PATTY: A taxonomy of relational patterns with semantic types. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 1135–1145.
Tuan, L.A.; Tay, Y.; Hui, S.C.; Ng, S.K. Learning term embeddings for taxonomic relation identification using dynamic weighting neural network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 403–413.
Roller, S.; Erk, K.; Boleda, G. Inclusive yet selective: Supervised distributional hypernymy detection. In Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 1025–1036.
Navigli, R.; Velardi, P.; Faralli, S. A graph-based algorithm for inducing lexical taxonomies from scratch. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), Barcelona, Spain, 16–22 July 2011; Volume 11, pp. 1872–1877.
Bansal, M.; Burkett, D.; De Melo, G.; Klein, D. Structured learning for taxonomy induction with belief propagation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 1041–1051.
Velardi, P.; Faralli, S.; Navigli, R. Ontolearn reloaded: A graph-based algorithm for taxonomy induction. Comput. Linguist. 2013, 39, 665–707.
Gupta, A.; Lebret, R.; Harkous, H.; Aberer, K. Taxonomy induction using hypernym subsequences. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1329–1338.
Shen, J.; Wu, Z.; Lei, D.; Shang, J.; Ren, X.; Han, J. Setexpan: Corpus-based set expansion via context feature selection and rank ensemble. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2017, Skopje, Macedonia, 18–22 September 2017; Proceedings, Part I 10. pp. 288–304.
Butt, S.; De Los Ríos Alatorre, G.; González Gómez, L.J.; Ceballos, H.G. Automotive Industry Skills Taxonomy. 2025. Available online: https://datahub.tec.mx/dataset.xhtml?persistentId=doi:10.57687/FK2/O7E66L (accessed on 18 February 2025).
Butt, S.; Ceballos, H.G.; Madera, D.P. Tec-Habilidad: Skill Classification for Bridging Education and Employment. arXiv 2025, arXiv:2503.03932.
Ceballos Cancino, H.G.; Butt, S.; Ríos Alatorre, G.d.l.; Madera Espíndola, D.P. Skills Report on Automotive Industry in Mexico: Shaping Skills Framework. 2025. Available online: https://repositorio.tec.mx/items/194d7c28-f216-43d9-82ea-9fa668bafa29 (accessed on 10 January 2025).

Figure 1. Visual illustration of the taxonomy expansion and completion processes.

Figure 2. High-level diagram of the RAG-based system. Skills are first extracted from job postings and passed through the classifier, which labels each skill as a new hypernym, a new hyponym, or an existing hyponym. These predicted labels are then reviewed by a human expert. Text shown in yellow marks entries categorized as new in the taxonomy after classification and human verification. Once verified, the validated labels are used to update the taxonomy accordingly.

Figure 3. Step-by-step RAG process.

Figure 4. Prompt Template: Skill Taxonomy Classification.

Figure 5. Confusion matrix portraying the classification results.

Table 1. Distribution of the skills used in the creation of the December taxonomy.

KSA-O Category	Hypernyms	Hyponyms
Knowledge	134	5120
Skills	31	1228
Abilities	28	441
Others	27	426
Total	220	11,538

Table 2. Classification metrics for each label and overall system performance.

Class Label	Precision	Recall	F1-Score	Support
Existing hyponym	0.81	0.82	0.81	150
New hypernym	1.00	0.20	0.33	20
New hyponym	0.74	0.81	0.78	150
Macro Avg	0.85	0.61	0.64	320
Weighted Avg	0.79	0.78	0.77	320
Accuracy	0.78
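As a consistency check on Table 2, the macro and weighted averages can be recomputed from the three per-class rows. The sketch below is illustrative only: the function and variable names are assumptions rather than part of the study's code, and the inputs are the rounded values from the table, so the results agree only to two decimal places.

```python
# Per-class metrics copied from Table 2: (precision, recall, F1, support).
TABLE_2 = {
    "existing hyponym": (0.81, 0.82, 0.81, 150),
    "new hypernym": (1.00, 0.20, 0.33, 20),
    "new hyponym": (0.74, 0.81, 0.78, 150),
}
SUPPORT = 3  # index of the support column in each tuple


def macro_average(rows, column):
    """Unweighted mean over classes: every class counts equally."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)


def weighted_average(rows, column):
    """Mean over classes weighted by each class's support."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[column] * row[SUPPORT] for row in rows.values()) / total


for name, column in (("precision", 0), ("recall", 1), ("f1", 2)):
    macro = macro_average(TABLE_2, column)
    weighted = weighted_average(TABLE_2, column)
    print(f"{name:>9}: macro={macro:.2f}  weighted={weighted:.2f}")
```

The macro recall (0.61) is pulled down by the "new hypernym" class, whose recall of 0.20 counts as much as the two larger classes, whereas the weighted recall (0.78) is dominated by the two 150-sample classes. Support-weighted recall is the same quantity as overall accuracy, which is why the two values coincide in the table.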

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
