Definition-Anchored Unsupervised Word Sense Induction Using LLM-Generated Glosses
Abstract
1. Introduction
- RQ1 (Effectiveness): Does incorporating LLM-generated definitions as classification anchors improve WSI performance, particularly for minority senses that are prone to conflation in embedding space?
- RQ2 (Feature Contribution): How do discriminative features (SVM-derived keywords) and topical features (PMI-based co-occurrences) contribute to the quality of generated definitions and downstream sense classification?
- RQ3 (Robustness): To what extent does the proposed method maintain performance under varying degrees of sense distribution imbalance, compared to purely embedding-based baselines?
- (RQ1) We propose a definition-driven refinement framework for WSI: unsupervised clustering results are fed to an LLM to generate sense definitions, which are then used to refine cluster boundaries.
- (RQ2) We analyze the feature requirements for definition generation and show that explicit discriminative features extracted via SVM are essential for generating accurate and distinctive sense descriptions.
- (RQ3) We demonstrate that the proposed method improves robustness under skewed sense distributions, particularly enhancing minority sense detection while maintaining competitive overall performance.
2. Related Work
2.1. Word Sense Induction via Clustering of Contextualized Representations
2.2. Limitations of Existing WSI Under Skewed Sense Distributions
2.3. Gloss-Aware and Definition-Based Sense Modeling
2.4. LLM-Assisted Semantic Interpretation in WSI
3. Proposed Method
- Formal Problem Definition.
3.1. Initial Clustering
3.2. Sense Definition Generation with LLMs
- Inputs to the LLM. We extract discriminative and salient keywords from each cluster to guide definition generation. We train a one-vs-rest linear SVM with L1 regularization and balanced class weights, and select the top 10 keywords ranked by absolute SVM weight. In addition, we compute a Pointwise Mutual Information (PMI)-based salience score, defined as the ratio between the cluster-specific feature probability and the global feature probability, and select the top 10 keywords per cluster accordingly.

  SVM keywords emphasize inter-cluster discrimination. In contrast, PMI keywords capture salient conceptual features within each cluster. These signals are complementary and particularly helpful for identifying minority senses. The number of keywords is fixed across clusters to avoid bias toward larger clusters.

  Each cluster is represented by (i) SVM keywords, (ii) PMI keywords, and (iii) representative example sentences obtained from the initial clustering. For each cluster, we randomly sample up to 20 example sentences to capture diverse contextual realizations of the sense. This strategy mitigates the risk of overfitting the definition to a single highly frequent or central instance.

  Based on these inputs, the LLM infers the shared concept underlying the cluster and generates a concise one-sentence sense definition.
- Prompt for Definition Generation. The prompt is structured as follows:

  Sense: <sense ID>
  SVM keywords: <keyword list>
  PMI keywords: <keyword list>
  Representative examples: <sentences>
  Generate a concise, one-sentence English definition that best describes the main meaning of this cluster.
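The PMI-based salience score and the prompt template described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace-tokenized input format, the fixed random seed, and the way example sentences are listed in the prompt are all assumptions. The SVM keywords are taken as precomputed input, since the SVM step is not sketched here.

```python
import random
from collections import Counter


def pmi_keywords(cluster_docs, all_docs, top_k=10):
    """Rank words by P(word | cluster) / P(word | corpus).

    Assumes each document is a list of tokens and that cluster_docs
    is a subset of all_docs, so every cluster word has a global count.
    """
    cluster_counts = Counter(w for doc in cluster_docs for w in doc)
    global_counts = Counter(w for doc in all_docs for w in doc)
    cluster_total = sum(cluster_counts.values())
    global_total = sum(global_counts.values())
    scores = {
        w: (c / cluster_total) / (global_counts[w] / global_total)
        for w, c in cluster_counts.items()
    }
    # Highest ratio first; alphabetical order breaks ties deterministically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def sample_examples(sentences, max_examples=20, seed=0):
    """Randomly sample up to max_examples sentences (the seed is an assumption)."""
    if len(sentences) <= max_examples:
        return list(sentences)
    return random.Random(seed).sample(list(sentences), max_examples)


def build_definition_prompt(sense_id, svm_keywords, pmi_kw, examples):
    """Fill the prompt template; the per-example bullet layout is hypothetical."""
    lines = [
        f"Sense: {sense_id}",
        f"SVM keywords: {', '.join(svm_keywords)}",
        f"PMI keywords: {', '.join(pmi_kw)}",
        "Representative examples:",
        *[f"- {s}" for s in examples],
        "Generate a concise, one-sentence English definition that best "
        "describes the main meaning of this cluster.",
    ]
    return "\n".join(lines)
```

Because the score is a ratio of relative frequencies, a word that is equally common inside and outside the cluster scores 1, while words concentrated in the cluster score higher regardless of cluster size.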
Algorithm 1. Definition-Anchored WSI: overall pipeline including clustering, definition generation, and reclassification.
- Require: Instance set X
- Ensure: Predicted sense labels
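The three stages named in Algorithm 1 can be wired together roughly as below. This is only a sketch under stated assumptions: the steps of the algorithm are not reproduced in this text, so the function name and the three injected callables (`cluster_fn`, `define_fn`, `classify_fn`) are hypothetical placeholders for the initial clustering, the LLM definition generator, and the LLM reclassifier. It also assumes every instance is reclassified against the generated definitions.

```python
from collections import defaultdict


def definition_anchored_wsi(instances, cluster_fn, define_fn, classify_fn):
    """Cluster, generate one definition per cluster, then reclassify.

    cluster_fn(instances)            -> one initial cluster label per instance
    define_fn(cluster_members)       -> one definition string for a cluster
    classify_fn(instance, defs)      -> 0-based index of the chosen definition
    All three are hypothetical stand-ins supplied by the caller.
    """
    # Stage 1: initial clustering.
    initial_labels = cluster_fn(instances)
    clusters = defaultdict(list)
    for instance, label in zip(instances, initial_labels):
        clusters[label].append(instance)

    # Stage 2: one definition per cluster, in a fixed label order.
    sense_ids = sorted(clusters)
    definitions = [define_fn(clusters[sense_id]) for sense_id in sense_ids]

    # Stage 3: reassign each instance to the sense whose definition fits best.
    return [sense_ids[classify_fn(instance, definitions)] for instance in instances]
```

Passing the stages in as callables keeps the sketch independent of any particular clustering algorithm or LLM client.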
3.3. Definition-Guided Reclassification
You are classifying example sentences for the lemma <lemma_pos>.
Here are the possible senses:
1. <definition for sense 1>
2. <definition for sense 2>
3. <definition for sense 3>
…
For each example, respond ONLY with the number of the most appropriate sense.
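The reclassification prompt can be assembled, and its numeric replies parsed, along the following lines. This is a hedged sketch: the function names, the "Example j:" layout for appended sentences, and the one-answer-per-line reply format are assumptions, not details given in the text.

```python
import re


def build_classification_prompt(lemma_pos, definitions, examples):
    """Fill the reclassification template with numbered sense definitions."""
    lines = [
        f"You are classifying example sentences for the lemma {lemma_pos}.",
        "Here are the possible senses:",
    ]
    lines += [f"{i}. {d}" for i, d in enumerate(definitions, start=1)]
    lines.append(
        "For each example, respond ONLY with the number of the most appropriate sense."
    )
    # Assumption: examples are appended as numbered lines after the instruction.
    lines += [f"Example {j}: {s}" for j, s in enumerate(examples, start=1)]
    return "\n".join(lines)


def parse_sense_numbers(response, num_senses):
    """Return one 0-based sense index per non-empty reply line.

    A line yields None when it does not contain exactly one number
    or when that number is outside 1..num_senses.
    """
    labels = []
    for line in response.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.fullmatch(r"\D*(\d+)\D*", line)
        number = int(match.group(1)) if match else None
        if number is not None and 1 <= number <= num_senses:
            labels.append(number - 1)
        else:
            labels.append(None)
    return labels
```

Returning `None` for unparseable or out-of-range replies leaves the fallback policy (for example, keeping the initial cluster label) to the caller.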
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
- Normalized Mutual Information (NMI).
- V-measure.
- Paired F-Score.
- F-B3.
- Fuzzy-F-B3.
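As an illustration of one of the listed metrics, the standard (hard-assignment) B-cubed precision, recall, and F-score can be computed as below. This is a sketch of the textbook definition, not necessarily the exact scorer used in the experiments, and the function name is an assumption. It also shows why a one-cluster-per-lemma prediction can score well on F-B3 under skewed sense distributions: placing everything in one cluster always yields perfect B-cubed recall.

```python
from collections import Counter


def b_cubed(gold, pred):
    """Return (precision, recall, F1) for hard B-cubed.

    gold and pred are equal-length sequences of labels, one per instance.
    """
    n = len(gold)
    pair_sizes = Counter(zip(gold, pred))  # items sharing both gold and predicted label
    gold_sizes = Counter(gold)
    pred_sizes = Counter(pred)

    # Per-item precision: fraction of the item's predicted cluster with its gold label.
    precision = sum(pair_sizes[(g, p)] / pred_sizes[p] for g, p in zip(gold, pred)) / n
    # Per-item recall: fraction of the item's gold class found in its predicted cluster.
    recall = sum(pair_sizes[(g, p)] / gold_sizes[g] for g, p in zip(gold, pred)) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Skewed lemma: three instances of sense "a", one of sense "b",
# all placed in a single predicted cluster.
skewed_gold = ["a", "a", "a", "b"]
single_cluster = [0, 0, 0, 0]
precision, recall, f1 = b_cubed(skewed_gold, single_cluster)
```

In this toy case recall is 1.0 and precision is 0.625, so the F-score stays high even though the minority sense is never separated.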
4.3. Baselines
- k-means.
- HDP.
- WSI-NS.
- 1cpl.
- Discussion on LLM-only Clustering.
- Additional Baselines from Recent Work.
4.4. Implementation Details
- Clustering Configuration.
- Keyword Extraction.
- Representative Examples.
- LLM Settings.
- Computation Environment and Cost.
4.5. Data Partitioning
4.6. Main Results
4.7. Ablation Study
- SVM keywords + PMI keywords + instances
- SVM keywords + instances
- PMI keywords + instances
- instances only
5. Results
5.1. Comparison Across LLMs
5.2. Comparison with Baselines
- Analysis of 1cpl Baseline Behavior.
5.3. Ablation on Prompt Inputs
6. Discussion
6.1. Effect of Definition-Based Refinement
6.2. Model-Specific Behavior
- Performance on SemEval-2013.
6.3. Qualitative Comparison of Generated Definitions
6.4. Interpretation of Baseline Behavior
6.5. Statistical Significance Analysis
6.6. Role of Prompt Inputs
6.7. Minority-Sense Clustering
6.8. Implications
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Navigli, R. Word Sense Disambiguation: A Survey. ACM Comput. Surv. 2009, 41, 10.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186.
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 2227–2237.
- Amrami, A.; Goldberg, Y. Word Sense Induction with Neural biLM and Symmetric Patterns. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 4860–4867.
- Kilgarriff, A. How Dominant is the Commonest Sense of a Word? In Proceedings of the 5th International Conference on Text, Speech and Dialogue, Brno, Czech Republic, 8–11 September 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 103–111.
- Blevins, T.; Zettlemoyer, L. Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1006–1017.
- Su, Y.; Zhang, H.; Song, Y.; Zhang, T. Rare and Zero-shot Word Sense Disambiguation using Z-Reweighting. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 4713–4723.
- Schütze, H. Automatic Word Sense Discrimination. Comput. Linguist. 1998, 24, 97–123.
- Pantel, P.; Lin, D. Discovering Word Senses from Text. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; ACM: New York, NY, USA, 2002; pp. 613–619.
- Wiedemann, G.; Remus, S.; Chawla, A.; Biemann, C. Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany, 9–11 October 2019; German Society for Computational Linguistics & Language Technology: Erlangen, Germany, 2019; pp. 161–170.
- Hadiwinoto, C.; Ng, H.T.; Gan, W.C. Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Kerrville, TX, USA, 2019; pp. 5297–5306.
- Amrami, A.; Goldberg, Y. Towards Better Substitution-based Word Sense Induction. arXiv 2019, arXiv:1905.12598.
- Alagić, D.; Šnajder, J.; Padó, S. Leveraging Lexical Substitutes for Unsupervised Word Sense Induction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018; pp. 5004–5011.
- Zipf, G.K. Human Behavior and the Principle of Least Effort; Addison-Wesley Press: Boston, MA, USA, 1949.
- McCarthy, D.; Koeling, R.; Weeds, J.; Carroll, J. Finding Predominant Word Senses in Untagged Text. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 21–26 July 2004; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 279–286.
- Lau, J.H.; Cook, P.; McCarthy, D.; Newman, D.; Baldwin, T. Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, 23–27 April 2012; Association for Computational Linguistics: Stroudsburg, PA, USA, 2012; pp. 591–601.
- Mancini, M.; Camacho-Collados, J.; Iacobacci, I.; Navigli, R. Embedding Words and Senses Together via Joint Knowledge-Enhanced Training. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3–4 August 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 100–111.
- Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet Processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581.
- Brody, S.; Lapata, M. Bayesian Word Sense Induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Athens, Greece, 30 March–3 April 2009; Association for Computational Linguistics: Stroudsburg, PA, USA, 2009; pp. 103–111.
- Yao, X.; Van Durme, B. Nonparametric Bayesian Word Sense Induction. In Proceedings of the TextGraphs-6 Workshop on Graph-based Methods for Natural Language Processing, Portland, OR, USA, 23 June 2011; Association for Computational Linguistics: Stroudsburg, PA, USA, 2011; pp. 10–14.
- Lesk, M. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of the 5th Annual International Conference on Systems Documentation (SIGDOC ’86), Champaign, IL, USA, 13–15 October 1986; ACM: New York, NY, USA, 1986; pp. 24–26.
- Banerjee, S.; Pedersen, T. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the 3rd International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2002), Mexico City, Mexico, 17–23 February 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 136–145.
- Luo, F.; Liu, T.; Xia, Q.; Chang, B.; Sui, Z. Incorporating Glosses into Neural Word Sense Disambiguation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, VIC, Australia, 15–20 July 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 2473–2482.
- Huang, L.; Sun, C.; Qiu, X.; Huang, X. GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3509–3514.
- Raganato, A.; Camacho-Collados, J.; Navigli, R. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 3–7 April 2017; Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 99–110.
- Loureiro, D.; Jorge, A.M. Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 5682–5691.
- Bevilacqua, M.; Navigli, R. Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 2854–2864.
- Jurgens, D.; Klapaftis, I. SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval-2013), Atlanta, GA, USA, 14–15 June 2013; Association for Computational Linguistics: Stroudsburg, PA, USA, 2013; pp. 290–299.
- Chen, X.; Liu, Z.; Sun, M. A Unified Model for Word Sense Representation and Disambiguation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Stroudsburg, PA, USA, 2014; pp. 1025–1035.
- Rothe, S.; Schütze, H. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 1793–1803.
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Qin, C.; Zhang, A.; Zhang, Z.; Chen, J.; Yasunaga, M.; Yang, D. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 1339–1384.
- Kocoń, J.; Cichecki, I.; Kaszyca, O.; Kochanek, M.; Szydło, D.; Baran, J.; Bielaniewicz, J.; Gruza, M.; Janz, A.; Kanclerz, K.; et al. ChatGPT: Jack of All Trades, Master of None. Inf. Fusion 2023, 99, 101861.
- Hanna, M.; Mareček, D. Analyzing BERT’s Knowledge of Hypernymy via Prompting. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Punta Cana, Dominican Republic, 11 November 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 275–282.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774.
- Gemini Team; Anil, R.; Borgeaud, S.; Wu, Y.; Alayrac, J.B.; Yu, J.; Sorber, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; et al. Gemini: A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805.
- Manandhar, S.; Klapaftis, I.P.; Dligach, D.; Pradhan, S.S. SemEval-2010 Task 14: Word Sense Induction & Disambiguation. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval-2010), Uppsala, Sweden, 15–16 July 2010; Association for Computational Linguistics: Stroudsburg, PA, USA, 2010; pp. 63–68.
- Mosolova, A.; Loureiro, D.; Glavaš, G. Large Language Models Struggle to Outperform One Cluster per Lemma in Word Sense Induction. In Findings of the Association for Computational Linguistics; ACL: Stroudsburg, PA, USA, 2025.
- Liu, W.; Hu, J.; Lv, F.; Tang, Z. A new method for long-term temperature compensation of structural health monitoring by ultrasonic guided wave. Measurement 2025, 252, 117310.





| Clustering/Classify (GPT-5) | SemEval 2013 Fuzzy-NMI | SemEval 2013 Fuzzy-F-B3 | SemEval 2010 V-M | SemEval 2010 Paired F-S | SemEval 2010 NMI | SemEval 2010 F-B3 |
|---|---|---|---|---|---|---|
| 1/9 | 20.09 [±3.19] | 60.27 [±2.17] | 41.52 [±1.14] | 60.85 [±1.79] | 41.52 [±1.14] | 66.06 [±1.32] |
| 2/8 | 14.11 [±0.90] | 54.50 [±0.48] | 44.33 [±3.11] | 62.23 [±2.58] | 44.33 [±3.11] | 67.17 [±1.81] |
| 3/7 | 15.08 [±1.80] | 55.17 [±1.45] | 44.56 [±1.69] | 63.65 [±0.79] | 44.56 [±1.69] | 68.13 [±0.74] |
| 4/6 | 14.65 [±1.47] | 55.13 [±0.67] | 46.45 [±1.40] | 64.52 [±1.30] | 46.45 [±1.40] | 69.34 [±1.16] |
| 5/5 | 14.60 [±1.06] | 56.57 [±1.10] | 47.52 [±2.83] | 64.22 [±1.35] | 47.52 [±2.83] | 70.26 [±1.10] |
| 6/4 | 14.47 [±1.27] | 56.37 [±1.27] | 49.57 [±3.01] | 64.59 [±2.00] | 49.57 [±3.01] | 71.85 [±1.54] |
| 7/3 | 17.22 [±2.57] | 58.71 [±2.21] | 52.00 [±2.99] | 64.14 [±2.44] | 52.00 [±2.99] | 73.30 [±1.48] |
| 8/2 | 17.83 [±2.54] | 58.73 [±2.48] | 55.96 [±1.04] | 62.76 [±2.21] | 55.96 [±1.04] | 76.39 [±0.87] |
| 9/1 | 16.59 [±2.68] | 63.07 [±2.01] | 59.34 [±4.55] | 54.71 [±1.99] | 59.34 [±4.55] | 80.02 [±1.87] |

| Clustering/Classify (GPT-4o) | SemEval 2013 Fuzzy-NMI | SemEval 2013 Fuzzy-F-B3 | SemEval 2010 V-M | SemEval 2010 Paired F-S | SemEval 2010 NMI | SemEval 2010 F-B3 |
|---|---|---|---|---|---|---|
| 1/9 | 15.05 [±1.24] | 55.72 [±1.42] | 15.57 [±0.31] | 39.27 [±1.48] | 15.57 [±0.31] | 46.89 [±1.03] |
| 2/8 | 12.16 [±1.18] | 50.81 [±0.94] | 16.01 [±1.17] | 42.76 [±0.97] | 16.01 [±1.17] | 48.95 [±0.45] |
| 3/7 | 12.91 [±2.31] | 51.15 [±1.72] | 17.39 [±0.97] | 45.10 [±1.24] | 17.39 [±0.97] | 51.19 [±0.85] |
| 4/6 | 12.05 [±1.29] | 50.87 [±1.65] | 20.10 [±0.80] | 45.37 [±0.72] | 20.10 [±0.80] | 52.22 [±0.89] |
| 5/5 | 12.09 [±1.79] | 51.60 [±1.76] | 24.29 [±0.82] | 46.44 [±0.67] | 24.29 [±0.82] | 54.54 [±0.23] |
| 6/4 | 11.46 [±0.98] | 51.50 [±1.25] | 28.67 [±2.60] | 48.18 [±1.08] | 28.67 [±2.60] | 57.84 [±1.22] |
| 7/3 | 13.49 [±2.32] | 53.11 [±0.79] | 33.53 [±2.02] | 47.96 [±1.71] | 33.53 [±2.02] | 60.90 [±1.33] |
| 8/2 | 13.83 [±2.23] | 53.70 [±1.95] | 41.96 [±2.95] | 49.02 [±1.89] | 41.96 [±2.95] | 67.43 [±0.94] |
| 9/1 | 13.08 [±1.79] | 57.71 [±0.67] | 50.54 [±6.47] | 41.76 [±2.38] | 50.54 [±6.47] | 73.45 [±2.30] |

| Clustering/Classify (Gemini-2.5-Flash) | SemEval 2013 Fuzzy-NMI | SemEval 2013 Fuzzy-F-B3 | SemEval 2010 V-M | SemEval 2010 Paired F-S | SemEval 2010 NMI | SemEval 2010 F-B3 |
|---|---|---|---|---|---|---|
| 1/9 | 19.92 [±2.13] | 59.81 [±1.32] | 33.30 [±1.81] | 55.71 [±1.72] | 33.30 [±1.81] | 60.82 [±1.21] |
| 2/8 | 15.21 [±0.74] | 55.00 [±1.06] | 36.98 [±1.70] | 59.70 [±1.78] | 36.98 [±1.70] | 63.90 [±1.23] |
| 3/7 | 16.38 [±1.50] | 55.87 [±0.82] | 37.77 [±1.64] | 62.13 [±1.56] | 37.77 [±1.64] | 65.97 [±1.46] |
| 4/6 | 15.46 [±1.71] | 56.61 [±2.16] | 40.56 [±1.73] | 62.15 [±1.62] | 40.56 [±1.73] | 66.46 [±1.25] |
| 5/5 | 14.65 [±1.28] | 56.61 [±2.28] | 42.76 [±1.30] | 62.92 [±1.10] | 42.76 [±1.30] | 68.30 [±1.02] |
| 6/4 | 16.64 [±2.54] | 57.82 [±2.36] | 44.21 [±3.12] | 62.40 [±1.70] | 44.21 [±3.12] | 69.33 [±1.34] |
| 7/3 | 16.71 [±2.61] | 57.99 [±1.79] | 47.87 [±1.85] | 63.03 [±1.62] | 47.87 [±1.85] | 71.36 [±1.22] |
| 8/2 | 17.97 [±1.44] | 58.52 [±1.04] | 53.22 [±1.65] | 62.19 [±0.63] | 53.22 [±1.65] | 75.92 [±0.59] |
| 9/1 | 17.81 [±2.58] | 63.68 [±2.26] | 54.51 [±3.86] | 52.37 [±2.60] | 54.51 [±3.86] | 78.17 [±1.46] |

| Model (Clustering/Classify) | SemEval 2013 Fuzzy-NMI | SemEval 2013 Fuzzy-F-B3 | SemEval 2010 V-M | SemEval 2010 Paired F-S | SemEval 2010 NMI | SemEval 2010 F-B3 |
|---|---|---|---|---|---|---|
| Our experiments | | | | | | |
| k-means | 16.94 | 31.05 | 24.86 | 47.04 | 24.86 | 52.92 |
| HDP | 8.98 | 58.13 | 11.89 | 54.51 | 11.89 | 58.13 |
| 1cpl | 0 | 58.07 | 0 | 60.68 | 0 | 61.96 |
| GPT-5 (9/1) | 16.59 [±2.68] | 63.07 [±2.01] | 59.34 [±4.55] | 54.71 [±1.99] | 59.34 [±4.55] | 80.02 [±1.87] |
| GPT-4o (9/1) | 13.08 [±1.79] | 57.71 [±0.67] | 50.54 [±6.47] | 41.76 [±2.38] | 50.54 [±6.47] | 73.45 [±2.30] |
| Gemini-2.5-Flash (9/1) | 17.81 [±2.58] | 63.68 [±2.26] | 54.51 [±3.86] | 52.37 [±2.60] | 54.51 [±3.86] | 78.17 [±1.46] |
| WSI-NS | 18.11 [±0.32] | 54.96 [±0.37] | 12.58 [±0.34] | 61.81 [±2.33] | 82.30 [±0.41] | 55.02 [±0.29] |
| Reported in Mosolova et al. [38] | | | | | | |
| PolyLM-large | 23.7 | 66.7 | 43.6 | 67.5 | 6.2 | 49.2 |
| PolyLM-base | 23.0 | 65.4 | 41.8 | 66.4 | 6.2 | 49.1 |
| LSDP | 21.1 [±0.6] | 64.1 [±0.5] | 38.9 [±1.0] | 70.7 [±0.4] | 4.6 [±0.1] | 52.8 [±0.2] |
| GPT-4o (reported) | 16.9 [±0.5] | 58.6 [±1.6] | 36.3 [±2.0] | 63.9 [±2.0] | 7.1 [±0.3] | 47.7 [±1.9] |
| 1cpl (reported) | 0.0 | 61.23 | 0.0 | 63.5 | 0.0 | 64.1 |
| 1cpex (reported) | 6.9 | NA | 31.7 | 0 | 19.5 | 8.0 |

| Prompt keyword inputs | SemEval 2013 Fuzzy-NMI | SemEval 2013 Fuzzy-F-B3 | SemEval 2010 V-M | SemEval 2010 Paired F-S | SemEval 2010 NMI | SemEval 2010 F-B3 |
|---|---|---|---|---|---|---|
| none | 17.87 [±1.11] | 58.47 [±0.67] | 14.54 [±0.69] | 47.11 [±1.33] | 14.54 [±0.69] | 51.63 [±1.01] |
| svm | 18.95 [±2.49] | 59.23 [±1.62] | 13.82 [±1.40] | 47.56 [±1.55] | 13.82 [±1.40] | 52.01 [±1.22] |
| pmi | 17.88 [±1.11] | 59.46 [±0.67] | 14.92 [±2.17] | 47.37 [±0.95] | 14.92 [±2.17] | 51.92 [±0.92] |
| svm and pmi | 21.57 [±1.08] | 61.03 [±0.71] | 13.81 [±0.61] | 47.43 [±0.88] | 13.81 [±0.61] | 52.87 [±0.73] |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yoshikawa, S.; Sasaki, M. Definition-Anchored Unsupervised Word Sense Induction Using LLM-Generated Glosses. Appl. Sci. 2026, 16, 3797. https://doi.org/10.3390/app16083797

