Search Results (6)

Search Parameters:
Keywords = unsupervised word sense disambiguation

27 pages, 2599 KB  
Article
AdaGram in Python: An AI Framework for Multi-Sense Embedding in Text and Scientific Formulas
by Arun Josephraj Arokiaraj, Samah Ibrahim, André Then, Bashar Ibrahim and Stephan Peter
Mathematics 2025, 13(14), 2241; https://doi.org/10.3390/math13142241 - 10 Jul 2025
Viewed by 715
Abstract
The Adaptive Skip-gram (AdaGram) algorithm extends traditional word embeddings by learning multiple vector representations per word, enabling the capture of contextual meanings and polysemy. Originally implemented in Julia, AdaGram has seen limited adoption due to ecosystem fragmentation and the scarcity of Julia’s machine learning tooling compared to Python’s mature frameworks. In this work, we present a Python-based reimplementation of AdaGram that facilitates broader integration with modern machine learning tools. Our implementation expands the model’s applicability beyond natural language, enabling the analysis of scientific notation, particularly chemical and physical formulas encoded in LaTeX. We detail the algorithmic foundations, preprocessing pipeline, and hyperparameter configurations needed for interdisciplinary corpora. Evaluations on real-world texts and LaTeX-encoded formulas demonstrate AdaGram’s effectiveness in unsupervised word sense disambiguation. Comparative analyses highlight the importance of corpus design and parameter tuning. This implementation opens new applications in formula-aware literature search engines, ambiguity reduction in automated scientific summarization, and cross-disciplinary concept alignment.
(This article belongs to the Section E: Applied Mathematics)
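For readers who want a concrete picture of the disambiguation step an AdaGram-style model performs, the following is a minimal sketch in Python: it combines per-sense prior probabilities with context affinity to pick a sense. The sense vectors, priors, and toy context are hypothetical placeholders, not the paper’s actual API or data.

```python
import numpy as np

def disambiguate(sense_vectors, sense_priors, context_vectors):
    """Pick the most probable sense of a word given its context.

    sense_vectors   : (K, D) array, one embedding per sense of the target word
    sense_priors    : (K,) array, prior probability of each sense
    context_vectors : (N, D) array, embeddings of the surrounding context words
    """
    context = context_vectors.mean(axis=0)                    # average context vector
    scores = np.log(sense_priors + 1e-12) + sense_vectors @ context
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                       # softmax over senses
    return int(np.argmax(probs)), probs

# Hypothetical toy setup: a word with two senses, context drawn near sense 0
rng = np.random.default_rng(0)
sense_vectors = rng.normal(size=(2, 50))
context_vectors = sense_vectors[0] + 0.1 * rng.normal(size=(4, 50))
print(disambiguate(sense_vectors, np.array([0.6, 0.4]), context_vectors))
```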

22 pages, 577 KB  
Article
Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism
by Radu Ion, Vasile Păiș, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Valentin Badea and Dan Tufiș
Mach. Learn. Knowl. Extr. 2025, 7(1), 10; https://doi.org/10.3390/make7010010 - 18 Jan 2025
Cited by 2 | Viewed by 2657
Abstract
Transformer models produce advanced text representations that have been used to break through the hard challenge of natural language understanding. Using the Transformer’s attention mechanism, which acts as a language-learning memory trained on tens of billions of words, a word sense disambiguation (WSD) algorithm can now construct a more faithful vector representation of the context of the word to be disambiguated. Working with a set of 34 lemmas of nouns, verbs, adjectives, and adverbs selected from the National Reference Corpus of Romanian (CoRoLa), we show that, using BERT’s attention heads at all hidden layers, we can devise contextual vectors of the target lemma that produce better clusters of the lemma’s senses than those obtained with standard BERT embeddings. If we automatically translate the Romanian example sentences of the target lemma into English, we show that we can reliably infer the number of senses with which the target lemma appears in CoRoLa. We also describe an unsupervised WSD algorithm that, using a Romanian BERT model and a few example sentences of the target lemma’s senses, can label the induced Romanian sense clusters with the appropriate sense labels, with an average accuracy of 64%.
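As a rough illustration of the idea (not the authors’ exact procedure), the sketch below builds a contextual vector for a target word from the attention weights of every head at every hidden layer of a BERT model, then clusters the occurrences into induced senses. The model name, pooling scheme, and example sentences are assumptions made for the sketch.

```python
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # assumed stand-in; the paper uses a Romanian BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_attentions=True, output_hidden_states=True)
model.eval()

def attention_context_vector(sentence, target):
    """Contextual vector for `target`: at every layer, pool that layer's hidden
    states with the attention the target token pays to the sentence (averaged
    over heads), then average the per-layer vectors."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    # naive match on the first sub-token of the target word (illustration only)
    idx = next(i for i, t in enumerate(tokens) if t.lstrip("##").lower() == target.lower())
    layer_vecs = []
    for layer, att in enumerate(out.attentions):       # att: (1, heads, seq, seq)
        weights = att[0, :, idx, :].mean(dim=0)        # attention from the target token
        states = out.hidden_states[layer + 1][0]       # (seq, hidden) at this layer
        layer_vecs.append(weights @ states)            # attention-weighted pooling
    return torch.stack(layer_vecs).mean(dim=0).numpy()

sentences = [
    "She deposited the cheque at the bank on Monday.",
    "The bank approved the loan application.",
    "They had a picnic on the grassy bank of the river.",
    "Fish hid under the muddy bank of the stream.",
]
vectors = [attention_context_vector(s, "bank") for s in sentences]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # occurrences grouped into two induced senses
```
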
25 pages, 1910 KB  
Review
A Literature Survey on Word Sense Disambiguation for the Hindi Language
by Vinto Gujjar, Neeru Mago, Raj Kumari, Shrikant Patel, Nalini Chintalapudi and Gopi Battineni
Information 2023, 14(9), 495; https://doi.org/10.3390/info14090495 - 7 Sep 2023
Cited by 11 | Viewed by 3741
Abstract
Word sense disambiguation (WSD) is a process used to determine the most appropriate meaning of a word in a given contextual framework, particularly when the word is ambiguous. While WSD has been extensively studied for English, it remains a challenging problem for resource-scarce languages such as Hindi. Therefore, it is crucial to address ambiguity in Hindi to effectively and efficiently utilize it on the web for various applications such as machine translation, information retrieval, etc. The rich linguistic structure of Hindi, characterized by complex morphological variations and syntactic nuances, presents unique challenges in accurately determining the intended sense of a word within a given context. This review paper presents an overview of different approaches employed to resolve the ambiguity of Hindi words, including supervised, unsupervised, and knowledge-based methods. Additionally, the paper discusses applications, identifies open problems, presents conclusions, and suggests future research directions.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
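Among the knowledge-based methods such surveys cover, the simplified Lesk algorithm is the classic baseline: pick the sense whose dictionary gloss overlaps most with the context. The sketch below uses the English WordNet via NLTK purely as a stand-in; applying the same idea to Hindi would require a Hindi WordNet resource.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time download of the English WordNet data

def simplified_lesk(word, context_sentence):
    """Pick the sense whose gloss (plus usage examples) shares the most words
    with the context sentence."""
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        for example in sense.examples():
            gloss |= set(example.lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "he deposited money at the bank and asked for a loan")
print(sense, "->", sense.definition() if sense else "no sense found")
```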

21 pages, 4216 KB  
Article
Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet
by Minho Kim and Hyuk-Chul Kwon
Electronics 2021, 10(23), 2938; https://doi.org/10.3390/electronics10232938 - 26 Nov 2021
Cited by 5 | Viewed by 3038
Abstract
Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora, since doing so is costly and time-consuming. On the other hand, implementing unsupervised disambiguation is relatively easy, although most such efforts have not produced satisfactory results. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available. Hence, a data deficiency problem occurs while determining the dependency between words. This paper proposes an unsupervised disambiguation method using a prior probability estimation based on the Korean WordNet, which performs better than supervised disambiguation. In the Korean WordNet, all the words have similar semantic characteristics to their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem by determining the dependency between words through calculating the χ² statistic between related words. Moreover, in order to have the same effect as using the semantic occurrence probability as prior probability, as is done in supervised disambiguation, semantically related words of the ambiguous word are obtained and used as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of our proposed lexical disambiguation method. We found that our proposed method had better performance than supervised disambiguation methods even though our method is based on unsupervised disambiguation (using a knowledge-based approach).
(This article belongs to the Special Issue Electronic Solutions for Artificial Intelligence Healthcare Volume II)
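The dependency estimate the paper relies on is the χ² statistic computed between related words. Below is a minimal sketch of that statistic over a 2×2 co-occurrence contingency table; the counts are made up for illustration.

```python
import numpy as np

def chi_square(co, a_only, b_only, neither):
    """χ² statistic for a 2x2 co-occurrence table between two words A and B.

    co      : contexts containing both A and B
    a_only  : contexts containing A but not B
    b_only  : contexts containing B but not A
    neither : contexts containing neither word
    """
    table = np.array([[co, a_only], [b_only, neither]], dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

# Hypothetical counts for a related word of an ambiguous noun vs. a context word
print(chi_square(co=30, a_only=70, b_only=120, neither=9780))
```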

15 pages, 423 KB  
Article
Unsupervised Predominant Sense Detection and Its Application to Text Classification
by Attaporn Wangpoonsarp, Kazuya Shimura and Fumiyo Fukumoto
Appl. Sci. 2020, 10(17), 6052; https://doi.org/10.3390/app10176052 - 1 Sep 2020
Cited by 3 | Viewed by 2482
Abstract
This paper focuses on the domain-specific senses of words and proposes a method for detecting the predominant sense for each domain. Our Domain-Specific Senses (DSS) model works in an unsupervised manner and detects predominant senses in each domain. We apply a simple Markov Random Walk (MRW) model to rank senses for each domain. It determines the importance of a sense within a graph using the similarity between senses. The similarity of senses is obtained from distributional representations of words taken from gloss texts in the thesaurus. This captures a large semantic context and thus does not require manual annotation of sense-tagged data. We used the Reuters corpus and WordNet in the experiments. We applied the resulting domain-specific senses to text classification and examined how DSS affects the overall performance of the text classification task. We compared our DSS model with a word sense disambiguation (WSD) technique, Context2vec, and the results demonstrate that our domain-specific sense approach gains a 0.053 F1 improvement on average over the WSD approach.
(This article belongs to the Section Computing and Artificial Intelligence)
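A Markov random walk over a sense-similarity graph is essentially PageRank with similarity-weighted edges. The following sketch ranks senses that way; the gloss embeddings are random placeholders rather than the distributional representations used in the paper.

```python
import numpy as np
import networkx as nx

def rank_senses(gloss_vectors):
    """Rank senses with a Markov random walk (PageRank) over a graph whose
    edge weights are cosine similarities between sense-gloss vectors."""
    senses = list(gloss_vectors)
    graph = nx.Graph()
    graph.add_nodes_from(senses)
    for i, a in enumerate(senses):
        for b in senses[i + 1:]:
            va, vb = gloss_vectors[a], gloss_vectors[b]
            sim = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
            if sim > 0:                                  # keep only positively related senses
                graph.add_edge(a, b, weight=sim)
    return nx.pagerank(graph, weight="weight")           # stationary distribution = importance

# Hypothetical gloss embeddings for three senses of "stock" in a finance domain
rng = np.random.default_rng(1)
glosses = {
    "stock.share": rng.normal(size=50),
    "stock.inventory": rng.normal(size=50),
    "stock.broth": rng.normal(size=50),
}
print(rank_senses(glosses))  # sense -> importance score within the domain
```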

16 pages, 994 KB  
Article
Word Sense Disambiguation Using Cosine Similarity Collaborates with Word2vec and WordNet
by Korawit Orkphol and Wu Yang
Future Internet 2019, 11(5), 114; https://doi.org/10.3390/fi11050114 - 12 May 2019
Cited by 85 | Viewed by 12438
Abstract
Words have different meanings (i.e., senses) depending on the context. Disambiguating the correct sense is an important and challenging task for natural language processing. An intuitive way is to select the sense whose definition is most similar to the context, using the definitions provided by a large lexical database of English, WordNet. In this database, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms interlinked through conceptual semantics and lexical relations. Traditional unsupervised approaches compute similarity by counting overlapping words between the context and sense definitions, which must match exactly. Similarity should instead be computed based on how words are related rather than on exact overlap, by representing the context and sense definitions in a vector space model and analyzing the distributional semantic relationships among them using latent semantic analysis (LSA). However, as a corpus of text becomes more massive, LSA consumes much more memory and does not scale well to training on a huge corpus. A word-embedding approach has an advantage here. Word2vec is a popular word-embedding approach that represents words in a fixed-size vector space through either the skip-gram or continuous bag-of-words (CBOW) model, and it captures semantic and syntactic word similarities from a huge corpus of text more effectively than LSA. Our method uses Word2vec to construct a context sentence vector and sense definition vectors, then scores each word sense by the cosine similarity between those sentence vectors. The sense definitions are also expanded with sense relations retrieved from WordNet. If a score is not higher than a specific threshold, it is combined with the probability of that sense learned from a large sense-tagged corpus, SEMCOR. The most likely senses are those with the highest scores. Our method’s result (50.9%, or 48.7% without the sense distribution probability) is higher than the baselines (i.e., the original, simplified, adapted, and LSA Lesk) and outperforms many unsupervised systems participating in the SENSEVAL-3 English lexical sample task.
(This article belongs to the Special Issue Big Data Analytics and Artificial Intelligence)
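The scoring rule described above can be sketched in a few lines: average the Word2vec vectors of the context and of each sense gloss, take the cosine similarity, and fall back to a sense-frequency prior when the score is below a threshold. The tiny embeddings, glosses, and priors below are hypothetical stand-ins so the sketch runs without a trained model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sentence_vector(words, embeddings):
    """Average the vectors of the words we have embeddings for."""
    return np.mean([embeddings[w] for w in words if w in embeddings], axis=0)

def score_senses(context_words, sense_glosses, sense_priors, embeddings, threshold=0.3):
    """Score each sense by cosine(context, gloss); below `threshold`, blend in
    the sense's corpus frequency (a SEMCOR-style prior)."""
    ctx = sentence_vector(context_words, embeddings)
    scores = {}
    for sense, gloss in sense_glosses.items():
        s = cosine(ctx, sentence_vector(gloss, embeddings))
        if s < threshold:
            s = 0.5 * s + 0.5 * sense_priors.get(sense, 0.0)
        scores[sense] = s
    return max(scores, key=scores.get), scores

# Hypothetical 4-dimensional "Word2vec" vectors, just to make the sketch runnable
emb = {
    "money": np.array([1.0, 0.2, 0.0, 0.1]), "deposit": np.array([0.9, 0.1, 0.1, 0.0]),
    "institution": np.array([0.8, 0.0, 0.2, 0.2]), "river": np.array([0.0, 1.0, 0.3, 0.0]),
    "water": np.array([0.1, 0.9, 0.2, 0.1]), "slope": np.array([0.0, 0.8, 0.4, 0.1]),
}
glosses = {"bank.finance": ["money", "deposit", "institution"],
           "bank.river": ["river", "water", "slope"]}
priors = {"bank.finance": 0.8, "bank.river": 0.2}
print(score_senses(["deposit", "money"], glosses, priors, emb))
```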
