Eng, Volume 6, Issue 11 (November 2025) – 1 article

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click its "PDF Full-text" link and open the file with the free Adobe Reader.
27 pages, 1378 KB  
Article
Automated Taxonomy Construction Using Large Language Models: A Comparative Study of Fine-Tuning and Prompt Engineering
by Binh Vu, Rashmi Govindraju Naik, Bao Khanh Nguyen, Sina Mehraeen and Matthias Hemmje
Eng 2025, 6(11), 283; https://doi.org/10.3390/eng6110283 - 22 Oct 2025
Abstract
Taxonomies provide essential hierarchical structures for classifying information, enabling effective retrieval and knowledge organization in diverse domains such as e-commerce, academic research, and web search. Traditional taxonomy construction, heavily reliant on manual curation by domain experts, faces significant challenges in scalability, cost, and consistency when dealing with the exponential growth of digital data. Recent advancements in Large Language Models (LLMs) and Natural Language Processing (NLP) present powerful opportunities for automating this complex process. This paper explores the potential of LLMs for automated taxonomy generation, focusing on methodologies incorporating semantic embedding generation, keyword extraction, and machine learning clustering algorithms. We specifically investigate and conduct a comparative analysis of two primary LLM-based approaches using a dataset of eBay product descriptions. The first approach involves fine-tuning a pre-trained LLM using structured hierarchical data derived from chain-of-layer clustering outputs. The second employs prompt-engineering techniques to guide LLMs in generating context-aware hierarchical taxonomies based on clustered keywords without explicit model retraining. Both methodologies are evaluated for their efficacy in constructing organized multi-level hierarchical taxonomies. Evaluation using semantic similarity metrics (BERTScore and Cosine Similarity) against a ground truth reveals that the fine-tuning approach yields higher overall accuracy and consistency (BERTScore F1: 70.91%; Cosine Similarity: 66.40%) compared to the prompt-engineering approach (BERTScore F1: 61.66%; Cosine Similarity: 60.34%). We delve into the inherent trade-offs between these methods concerning semantic fidelity, computational resource requirements, result stability, and scalability. 
Finally, we outline potential directions for future research aimed at refining LLM-based taxonomy construction systems to handle large dynamic datasets with enhanced accuracy, robustness, and granularity.
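The abstract evaluates generated taxonomies against a ground truth using cosine similarity between embeddings. As an illustration of that metric only (the vectors below are hypothetical placeholders, not the paper's data; real use would embed taxonomy labels with a sentence-embedding model), a minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors
    # divided by the product of their Euclidean norms.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embedding vectors for a generated taxonomy label
# and its ground-truth counterpart.
generated = [0.20, 0.70, 0.10]
reference = [0.25, 0.65, 0.15]

score = cosine_similarity(generated, reference)  # close to 1.0 when labels align
```

Scores near 1.0 indicate semantically similar labels; averaging such scores over all taxonomy nodes gives an aggregate figure like the 66.40% reported for the fine-tuning approach.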