Search Results (34)

Search Parameters:
Keywords = word sense disambiguation

27 pages, 2599 KiB  
Article
AdaGram in Python: An AI Framework for Multi-Sense Embedding in Text and Scientific Formulas
by Arun Josephraj Arokiaraj, Samah Ibrahim, André Then, Bashar Ibrahim and Stephan Peter
Mathematics 2025, 13(14), 2241; https://doi.org/10.3390/math13142241 - 10 Jul 2025
Viewed by 360
Abstract
The Adaptive Skip-gram (AdaGram) algorithm extends traditional word embeddings by learning multiple vector representations per word, enabling the capture of contextual meanings and polysemy. Originally implemented in Julia, AdaGram has seen limited adoption due to ecosystem fragmentation and the scarcity of Julia's machine learning tooling relative to Python's mature frameworks. In this work, we present a Python-based reimplementation of AdaGram that facilitates broader integration with modern machine learning tools. Our implementation expands the model's applicability beyond natural language, enabling the analysis of scientific notation, particularly chemical and physical formulas encoded in LaTeX. We detail the algorithmic foundations, preprocessing pipeline, and hyperparameter configurations needed for interdisciplinary corpora. Evaluations on real-world texts and LaTeX-encoded formulas demonstrate AdaGram's effectiveness in unsupervised word sense disambiguation. Comparative analyses highlight the importance of corpus design and parameter tuning. This implementation opens new applications in formula-aware literature search engines, ambiguity reduction in automated scientific summarization, and cross-disciplinary concept alignment.
(This article belongs to the Section E: Applied Mathematics)
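
As a rough illustration of the sense-selection step AdaGram performs, the sketch below scores a target word's K sense vectors against its context embeddings and picks the highest-posterior sense. All vectors, the prior, and the dimensions are random toy values, and the scoring rule is the standard skip-gram log-sigmoid; this is not the package's actual API:

```python
# Toy sketch of AdaGram-style sense selection, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
K, dim = 3, 50
sense_vecs = rng.normal(size=(K, dim))       # K sense vectors for one word
sense_prior = np.array([0.5, 0.3, 0.2])      # learned prior over the senses
context_vecs = rng.normal(size=(4, dim))     # embeddings of 4 context words

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)            # numerically stable log sigmoid

# log p(k | context) ~ log prior_k + sum over context of log sigmoid(s_k . c)
log_post = (np.log(sense_prior)
            + log_sigmoid(context_vecs @ sense_vecs.T).sum(axis=0))
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("sense posterior:", np.round(post, 3), "-> sense", post.argmax())
```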

33 pages, 4035 KiB  
Article
Hybrid Transformer-Based Large Language Models for Word Sense Disambiguation in the Low-Resource Sesotho sa Leboa Language
by Hlaudi Daniel Masethe, Mosima Anna Masethe, Sunday O. Ojo, Pius A. Owolawi and Fausto Giunchiglia
Appl. Sci. 2025, 15(7), 3608; https://doi.org/10.3390/app15073608 - 25 Mar 2025
Cited by 1 | Viewed by 1016
Abstract
This study addresses a lexical ambiguity issue in Sesotho sa Leboa that arises from terms with various meanings, often known as homonyms or polysemous words. Compared with European languages, for instance, this lexical ambiguity in Sesotho sa Leboa causes computational semantic problems in NLP when trying to identify the lexicon of the language; that is, the ambiguity makes it challenging to determine the proper lexical category and sense of words. To address the issue of polysemy in the Sesotho sa Leboa language, this study set out to create a word sense discrimination (WSD) scheme using a corpus-based hybrid transformer-based architecture and deep learning models. Additionally, the performance of baseline and improved machine learning models for a sequence-based natural language processing (NLP) task was assessed and compared. The baseline models included RNN-LSTM, BiGRU, LSTMLM, DeBERTa, and DistilBERT, with accuracies of 61%, 79%, 74%, 70%, and 64%, respectively. Among these, BiGRU emerged as the strongest performer, leveraging its bidirectional architecture to achieve the highest baseline accuracy. Transformer-based models, such as DeBERTa and DistilBERT, demonstrated moderate performance, with the latter prioritizing efficiency at the cost of accuracy. The enhanced experiments explored optimization techniques and hybrid model architectures to improve performance. BiGRU optimized with ADAM achieved an accuracy of 84%, while BiGRU with attention mechanisms further improved to 85%, showcasing the effectiveness of these enhancements. Hybrid models integrating BiGRU with transformer architectures demonstrated varying results. BiGRU + DeBERTa and BiGRU + ALBERT achieved the highest accuracies of 85% and 84%, respectively, highlighting the complementary strengths of bidirectional context modeling and advanced transformer-based contextual understanding. Conversely, the hybrid BiGRU + RoBERTa model underperformed, with an accuracy of 70%, indicating potential mismatches in model synergy. These findings highlight how crucial hybridization and optimization are to reaching state-of-the-art performance on NLP tasks. According to this study's findings, the most promising approaches for combining accuracy and efficiency are attention-based BiGRU and BiGRU–transformer hybrids, especially those that incorporate DeBERTa and ALBERT. To further improve performance, future research should concentrate on task-specific optimizations and improved hybrid model integration.
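
For readers unfamiliar with the best-performing architecture class here, the following PyTorch sketch shows a BiGRU with an additive attention head feeding a sense classifier. Layer sizes, vocabulary, and sense count are illustrative assumptions, not the authors' configuration:

```python
# Illustrative BiGRU + attention sense classifier; hyperparameters are toy.
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=64, n_senses=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)      # additive attention scorer
        self.out = nn.Linear(2 * hidden, n_senses)

    def forward(self, token_ids):
        h, _ = self.gru(self.emb(token_ids))     # (B, T, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)    # attention over time steps
        return self.out((w * h).sum(dim=1))      # weighted pooling -> logits

model = BiGRUAttention(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 12)))  # two toy 12-token inputs
print(logits.shape)                                # torch.Size([2, 5])
```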

14 pages, 2439 KiB  
Article
A Context-Preserving Tokenization Mismatch Resolution Method for Korean Word Sense Disambiguation Based on the Sejong Corpus and BERT
by Hanjo Jeong
Mathematics 2025, 13(5), 864; https://doi.org/10.3390/math13050864 - 5 Mar 2025
Viewed by 901
Abstract
Word sense disambiguation (WSD) plays a crucial role in various natural language processing (NLP) tasks, such as machine translation, sentiment analysis, and information retrieval. Due to the complex morphological structure and polysemy of the Korean language, the meaning of words can change depending on the context, making the WSD problem challenging. Since a single word can have multiple meanings, accurately distinguishing between them is essential for improving the performance of NLP models. Recently, large-scale pre-trained models like BERT and GPT, based on transfer learning, have shown promising results in addressing this issue. However, for languages with complex morphological structures, like Korean, the tokenization mismatch between pre-trained models and fine-tuning data prevents the rich contextual and lexical information learned by the pre-trained models from being fully utilized in downstream tasks. This paper proposes a novel method to address the tokenization mismatch issue during the fine-tuning of Korean WSD, leveraging BERT-based pre-trained models and the Sejong corpus, which has been annotated by language experts. Experimental results using various BERT-based pre-trained models and datasets from the Sejong corpus demonstrate that the proposed method improves performance by approximately 3–5% compared to existing approaches.
(This article belongs to the Section E1: Mathematics and Computer Science)
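
One widely used remedy for this kind of subword/word mismatch, sketched below with an assumed multilingual BERT tokenizer, is to map each subword token back to its source word via the fast tokenizer's offset mapping, so word-level sense labels stay aligned; the paper's actual method may differ:

```python
# Align subword tokens to whitespace words via offset mapping (an assumption,
# not necessarily the paper's technique). The Korean example is our own.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
text = "먹는 배와 타는 배"        # "pear" vs. "boat": same surface form 배
enc = tok(text, return_offsets_mapping=True, add_special_tokens=False)

word_spans, start = [], 0       # character spans of whitespace-separated words
for word in text.split():
    s = text.index(word, start)
    word_spans.append((s, s + len(word)))
    start = s + len(word)

for tok_id, (s, e) in zip(enc["input_ids"], enc["offset_mapping"]):
    word_idx = next(i for i, (ws, we) in enumerate(word_spans)
                    if s >= ws and e <= we)
    print(tok.convert_ids_to_tokens(tok_id), "-> word", word_idx)
```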

22 pages, 577 KiB  
Article
Unsupervised Word Sense Disambiguation Using Transformer’s Attention Mechanism
by Radu Ion, Vasile Păiș, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Valentin Badea and Dan Tufiș
Mach. Learn. Knowl. Extr. 2025, 7(1), 10; https://doi.org/10.3390/make7010010 - 18 Jan 2025
Cited by 1 | Viewed by 1765
Abstract
Transformer models produce advanced text representations that have been used to break through the hard challenge of natural language understanding. Using the Transformer's attention mechanism, which acts as a language learning memory trained on tens of billions of words, a word sense disambiguation (WSD) algorithm can now construct a more faithful vectorial representation of the context of a word to be disambiguated. Working with a set of 34 lemmas of nouns, verbs, adjectives and adverbs selected from the National Reference Corpus of Romanian (CoRoLa), we show that using BERT's attention heads at all hidden layers, we can devise contextual vectors of the target lemma that produce better clusters of the lemma's senses than the ones obtained with standard BERT embeddings. If we automatically translate the Romanian example sentences of the target lemma into English, we show that we can reliably infer the number of senses with which the target lemma appears in the CoRoLa. We also describe an unsupervised WSD algorithm that, using a Romanian BERT model and a few example sentences of the target lemma's senses, can label the Romanian induced sense clusters with the appropriate sense labels, with an average accuracy of 64%.
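
A hedged sketch of the general recipe: build a context vector for the target word from attention weights pooled over all layers and heads, then cluster the occurrences. The English model name, toy sentences, and pooling choices below are our assumptions, not the paper's Romanian setup:

```python
# Context vectors from attention pooled over all layers/heads, then clustering.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans

name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

def context_vector(sentence, target):
    enc = tok(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(target))
    with torch.no_grad():
        out = model(**enc)
    # attentions: tuple of (1, heads, seq, seq); average the target's row
    att = torch.cat(out.attentions, dim=0)[:, :, idx, :].mean(dim=(0, 1))
    return (att.unsqueeze(1) * out.last_hidden_state[0]).sum(dim=0)

sentences = ["The bank raised interest rates.",
             "They walked along the river bank."]
X = torch.stack([context_vector(s, "bank") for s in sentences]).numpy()
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))  # one cluster per sense
```
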
19 pages, 528 KiB  
Article
Enhancing Word Embeddings for Improved Semantic Alignment
by Julian Szymański, Maksymilian Operlejn and Paweł Weichbroth
Appl. Sci. 2024, 14(24), 11519; https://doi.org/10.3390/app142411519 - 10 Dec 2024
Viewed by 3129
Abstract
This study introduces a method for improving word vectors, addressing the limitations of traditional approaches like Word2Vec or GloVe by introducing richer semantic properties into the embeddings. Our approach leverages supervised learning methods, with shifts of vectors in the representation space enhancing the quality of word embeddings. This ensures better alignment with semantic reference resources, such as WordNet. The effectiveness of the method has been demonstrated through the application of the modified embeddings to text classification and clustering. We also show how our method influences document class distributions, visualized through PCA projections. By comparing our results with state-of-the-art approaches and achieving better accuracy, we confirm the effectiveness of the proposed method. The results underscore the potential of adaptive embeddings to improve both the accuracy and efficiency of semantic analysis across a range of NLP tasks.
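
A minimal sketch of the general idea, assuming a toy embedding table and a WordNet-style neighbor lookup: each vector is shifted toward the centroid of its semantically related words. The update rule and learning rate are illustrative, not the paper's method:

```python
# Shift embeddings toward WordNet-neighbor centroids (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
E = {w: rng.normal(size=8) for w in ["car", "automobile", "vehicle", "banana"]}
neighbors = {"car": ["automobile", "vehicle"]}     # assumed WordNet relations

def refine(E, neighbors, lr=0.3):
    E = dict(E)
    for w, related in neighbors.items():
        centroid = np.mean([E[r] for r in related], axis=0)
        E[w] = (1 - lr) * E[w] + lr * centroid     # shift toward the centroid
    return E

E2 = refine(E, neighbors)   # "car" moves closer to its WordNet relatives
```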

26 pages, 6706 KiB  
Systematic Review
Word Sense Disambiguation for Morphologically Rich Low-Resourced Languages: A Systematic Literature Review and Meta-Analysis
by Hlaudi Daniel Masethe, Mosima Anna Masethe, Sunday Olusegun Ojo, Fausto Giunchiglia and Pius Adewale Owolawi
Information 2024, 15(9), 540; https://doi.org/10.3390/info15090540 - 4 Sep 2024
Cited by 4 | Viewed by 2440
Abstract
In natural language processing, word sense disambiguation (WSD) continues to be a major difficulty, especially for low-resource languages, where linguistic variation and a lack of data make model training and evaluation more difficult. The goal of this comprehensive review and meta-analysis of the literature is to summarize the body of knowledge regarding WSD techniques for low-resource languages, emphasizing the advantages and disadvantages of different strategies. A thorough search of several databases for relevant literature produced articles assessing WSD methods in low-resource languages. Effect sizes and performance measures were extracted from a subset of trials through analysis. Pooled effect estimates were computed by meta-analysis, and heterogeneity was evaluated. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to develop the process for choosing the relevant papers for extraction. The meta-analysis included 32 studies, encompassing a range of WSD methods and low-resourced languages. The overall pooled effect size indicated moderate effectiveness of WSD techniques. Heterogeneity among studies was high, with an I² value of 82.29%, suggesting substantial variability in WSD performance across different studies. The between-study variance (τ² = 5.819) further reflects this variability, which underscores the challenges in generalizing findings and highlights the influence of diverse factors such as language-specific characteristics, dataset quality, and methodological differences. The p-values from the meta-regression (0.454) and the meta-analysis (0.440) suggest that the variability in WSD performance is not statistically significantly associated with the investigated moderators, indicating that the performance differences may be influenced by factors not fully captured in the current analysis. The absence of significant p-values raises the possibility that the problems presented by low-resource situations are not yet well addressed by the models and techniques in use.
(This article belongs to the Special Issue Advances in Human-Centered Artificial Intelligence)
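
For readers unfamiliar with the reported statistics, the sketch below computes Cochran's Q, I², and τ² under a standard DerSimonian–Laird random-effects model; the effect sizes are made-up toy numbers, not the 32 studies analyzed here:

```python
# Toy computation of meta-analysis heterogeneity statistics (DerSimonian-Laird).
import numpy as np

effects = np.array([0.42, 0.55, 0.31, 0.70, 0.48])        # toy study effects
variances = np.array([0.020, 0.030, 0.025, 0.040, 0.015])

w = 1 / variances                                 # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
Q = np.sum(w * (effects - pooled) ** 2)           # Cochran's Q
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q) * 100                 # % variance from heterogeneity
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)                     # between-study variance
print(f"Q={Q:.2f}, I2={I2:.1f}%, tau2={tau2:.4f}")
```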

20 pages, 1291 KiB  
Article
Reversal of the Word Sense Disambiguation Task Using a Deep Learning Model
by Algirdas Laukaitis
Appl. Sci. 2024, 14(13), 5550; https://doi.org/10.3390/app14135550 - 26 Jun 2024
Viewed by 2197
Abstract
Word sense disambiguation (WSD) remains a persistent challenge in the natural language processing (NLP) community. While various NLP packages exist, the Lesk algorithm in the NLTK library demonstrates suboptimal accuracy. In this research article, we propose an innovative methodology and an open-source framework that effectively addresses the challenges of WSD by optimizing memory usage without compromising accuracy. Our system seamlessly integrates WSD into NLP tasks, offering functionality similar to that provided by the NLTK library. However, we go beyond the existing approaches by introducing a novel idea related to WSD. Specifically, we leverage deep neural networks and consider the language patterns learned by these models as the new gold standard. This approach suggests modifying existing semantic dictionaries, such as WordNet, to align with these patterns. Empirical validation through a series of experiments confirmed the effectiveness of our proposed method, achieving state-of-the-art performance across multiple WSD datasets. Notably, our system does not require the installation of additional software beyond the well-known Python libraries. The classification model is saved in a readily usable text format, and the entire framework (model and data) is publicly available on GitHub for the NLP research community.
(This article belongs to the Section Computing and Artificial Intelligence)
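
The NLTK Lesk baseline referenced above can be reproduced in a few lines; this is standard NLTK usage (the example sentence is our own, and the WordNet corpus must be downloaded once):

```python
# NLTK's Lesk baseline; requires nltk.download("wordnet") beforehand.
from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank")            # returns a WordNet Synset (or None)
print(sense, "-", sense.definition() if sense else "no sense found")
```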

9 pages, 1925 KiB  
Proceeding Paper
A New Approach for Carrying Out Sentiment Analysis of Social Media Comments Using Natural Language Processing
by Mritunjay Ranjan, Sanjay Tiwari, Arif Md Sattar and Nisha S. Tatkar
Eng. Proc. 2023, 59(1), 181; https://doi.org/10.3390/engproc2023059181 - 17 Jan 2024
Cited by 5 | Viewed by 6481
Abstract
Business and science use sentiment analysis to extract and assess subjective information from the web, social media, and other sources, drawing on NLP, computational linguistics, text analysis, image processing, audio processing, and video processing. Sentiment analysis models polarity, attitudes, and urgency from positive, negative, or neutral inputs, but unstructured data make emotion assessment difficult. Unstructured consumer data nonetheless allow businesses to market, engage, and connect with consumers on social media, since text data can be assessed instantly for user sentiment. Opinion mining identifies a text's positive, negative, or neutral opinions, attitudes, views, emotions, and sentiments, and text analytics uses machine learning to evaluate such unstructured natural language data, helping firms make decisions and generate revenue. Sentiment analysis reveals how individuals feel about products, services, organizations, people, events, themes, and qualities, and is applied to reviews, forums, blogs, social media, and other articles. Data-driven (DD) methods learn complex semantic representations of texts without feature engineering, and data-driven sentiment analysis operates at three levels: document-level analysis determines overall polarity and sentiment, aspect-based analysis assesses document segments for emotion and polarity, and word-level analysis recognizes word polarity and assigns positive, negative, or neutral sentiment. Our innovative method captures sentiments from text comments. The syntactic layer encompasses processes such as sentence-level normalisation, identification of ambiguities at paragraph boundaries, part-of-speech (POS) tagging, text chunking, and lemmatization; the semantic layer includes word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection; and the pragmatic layer includes personality recognition, sarcasm detection, metaphor comprehension, aspect extraction, and polarity detection.
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
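
A small sketch of the syntactic-layer steps listed above (normalisation, POS tagging, lemmatization) using stock NLTK; the toy comment text and the crude POS-to-WordNet mapping are our assumptions:

```python
# Syntactic-layer preprocessing with NLTK. Requires nltk.download("punkt"),
# nltk.download("averaged_perceptron_tagger"), nltk.download("wordnet").
import nltk
from nltk.stem import WordNetLemmatizer

comment = "The cameras were amazing but the battery died too quickly!!!"
tokens = nltk.word_tokenize(comment.lower())     # sentence-level normalisation
tagged = nltk.pos_tag(tokens)                    # part-of-speech tagging
lem = WordNetLemmatizer()
lemmas = [lem.lemmatize(w, pos="v" if t.startswith("VB") else "n")
          for w, t in tagged]                    # crude POS-aware lemmatization
print(lemmas)
```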

25 pages, 1910 KiB  
Review
A Literature Survey on Word Sense Disambiguation for the Hindi Language
by Vinto Gujjar, Neeru Mago, Raj Kumari, Shrikant Patel, Nalini Chintalapudi and Gopi Battineni
Information 2023, 14(9), 495; https://doi.org/10.3390/info14090495 - 7 Sep 2023
Cited by 8 | Viewed by 3363
Abstract
Word sense disambiguation (WSD) is a process used to determine the most appropriate meaning of a word in a given contextual framework, particularly when the word is ambiguous. While WSD has been extensively studied for English, it remains a challenging problem for resource-scarce languages such as Hindi. Therefore, it is crucial to address ambiguity in Hindi to effectively and efficiently utilize it on the web for various applications such as machine translation, information retrieval, etc. The rich linguistic structure of Hindi, characterized by complex morphological variations and syntactic nuances, presents unique challenges in accurately determining the intended sense of a word within a given context. This review paper presents an overview of different approaches employed to resolve the ambiguity of Hindi words, including supervised, unsupervised, and knowledge-based methods. Additionally, the paper discusses applications, identifies open problems, presents conclusions, and suggests future research directions.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

10 pages, 254 KiB  
Article
Findings on Ad Hoc Contractions
by Sing Choi and Kazem Taghva
Information 2023, 14(7), 391; https://doi.org/10.3390/info14070391 - 10 Jul 2023
Viewed by 1584
Abstract
Abbreviations are often overlooked, since their frequency and acceptance are almost second nature in everyday communication. Business names, handwritten notes, online messaging, professional domains, and different languages all have their own sets of abbreviations. The abundance and frequent introduction of new abbreviations cause multiple areas of overlap and ambiguity, which means documents often lose their clarity. We reverse engineered the process of creating these ad hoc abbreviations and revealed some preliminary statistics on what makes them easier or harder to define. In addition, we generated candidate definitions among which a word sense disambiguation model proved unable to reliably select the correct one.
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)
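
A toy sketch of the reverse-engineering idea: generate ad hoc contractions from a lexicon (here by a single vowel-dropping rule, our assumption) and observe how distinct words collide on the same abbreviation, which is precisely the ambiguity at issue:

```python
# Generate ad hoc contractions and surface collisions (illustrative rule only).
from collections import defaultdict

def contract(word):
    # keep the first letter, drop interior vowels
    return word[0] + "".join(c for c in word[1:] if c not in "aeiou")

lexicon = ["mount", "mint", "management", "mountain"]
candidates = defaultdict(list)
for w in lexicon:
    candidates[contract(w)].append(w)

print(dict(candidates))   # "mnt" maps to both "mount" and "mint" -> ambiguous
```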

13 pages, 415 KiB  
Article
Developing an Urdu Lemmatizer Using a Dictionary-Based Lookup Approach
by Saima Shaukat, Muhammad Asad and Asmara Akram
Appl. Sci. 2023, 13(8), 5103; https://doi.org/10.3390/app13085103 - 19 Apr 2023
Cited by 3 | Viewed by 3126
Abstract
Lemmatization aims at returning the root form of a word. The lemmatizer is envisioned as a vital instrument that can assist in many Natural Language Processing (NLP) tasks, including Information Retrieval, Word Sense Disambiguation, Machine Translation, Text Reuse, and Plagiarism Detection. Previous studies in the literature have focused on developing lemmatizers using rule-based approaches for English and other highly resourced languages. However, there have been no thorough efforts toward the development of a lemmatizer for most South Asian languages, specifically Urdu. Urdu is a morphologically rich language with many inflectional and derivational forms, which makes the development of an efficient Urdu lemmatizer a challenging task. A standardized lemmatizer would contribute towards establishing much-needed methodological resources for this low-resourced language, which are required to boost the performance of many Urdu NLP applications. This paper presents a lemmatization system for the Urdu language based on a novel dictionary lookup approach. The contributions made through this research are the following: (1) the development of a large benchmark corpus for the Urdu language, (2) the exploration of the relationship between part-of-speech tags and the lemmatizer, and (3) the development of standard approaches for an Urdu lemmatizer. Furthermore, we examined the impact of Part of Speech (PoS) tags on our proposed dictionary lookup approach. The empirical results showed that we achieved a best accuracy score of 76.44% with the proposed dictionary lookup approach.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
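
A minimal sketch of a PoS-aware dictionary-lookup lemmatizer in the spirit of the proposed approach; the transliterated toy entries are our assumptions, not the paper's benchmark corpus:

```python
# Dictionary lookup keyed on (surface form, PoS tag) -> lemma.
lemma_dict = {
    ("larkiyan", "NN"): "larki",    # "girls"  -> "girl"
    ("kitabein", "NN"): "kitab",    # "books"  -> "book"
    ("gaya", "VB"): "jana",         # "went"   -> "to go"
}

def lemmatize(word, pos):
    # fall back to the surface form when the (word, PoS) pair is unknown
    return lemma_dict.get((word, pos), word)

print(lemmatize("larkiyan", "NN"))   # -> larki
print(lemmatize("larkiyan", "VB"))   # unseen pair -> larkiyan (fallback)
```
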
15 pages, 1209 KiB  
Article
Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia
by Michael Angelos Simos and Christos Makris
Information 2022, 13(8), 367; https://doi.org/10.3390/info13080367 - 2 Aug 2022
Cited by 4 | Viewed by 3060
Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, the relative commonness measure, which is decisive in both methods for entity link selection and for quantifying the confidence of the produced entity links. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, relying on an offline pre-processing stage over the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or as a first context-free layer that vastly reduces the complexity of deep neural network approaches.
(This article belongs to the Special Issue Knowledge Management and Digital Humanities)
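
The classical commonness score from the Wikification literature, which the paper's relative commonness measure refines, can be sketched from anchor-text statistics; the link counts below are invented for illustration, and the paper's own measure differs:

```python
# Classical commonness: how often a mention's anchors link to each entity.
anchor_counts = {                       # mention -> {entity: link count}
    "java": {"Java_(programming_language)": 9200,
             "Java_(island)": 700,
             "Java_coffee": 100},
}

def commonness(mention, entity):
    counts = anchor_counts[mention]
    return counts[entity] / sum(counts.values())

best = max(anchor_counts["java"], key=lambda e: commonness("java", e))
print(best, round(commonness("java", best), 3))   # most common target of "java"
```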

11 pages, 427 KiB  
Article
Word Sense Disambiguation Using Clustered Sense Labels
by Jeong Yeon Park, Hyeong Jin Shin and Jae Sung Lee
Appl. Sci. 2022, 12(4), 1857; https://doi.org/10.3390/app12041857 - 11 Feb 2022
Cited by 8 | Viewed by 2754
Abstract
Sequence labeling models for word sense disambiguation have proven highly effective when the sense vocabulary is compressed based on the thesaurus hierarchy. In this paper, we propose a method for compressing the sense vocabulary without using a thesaurus. To this end, sense definitions in a dictionary are converted into sentence vectors and clustered into compressed senses. First, the very large set of sense vectors is partitioned to reduce computational complexity, and then it is clustered hierarchically with awareness of homographs. Experiments were conducted on the English Senseval and Semeval datasets and the Korean Sejong sense-annotated corpus. The results demonstrate that performance greatly increases compared to the uncompressed sense model and is comparable to that of the thesaurus-based model.
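
A sketch of the pipeline's shape, assuming TF-IDF as a stand-in for a proper sentence encoder: sense definitions become vectors, and hierarchical clustering then assigns compressed sense labels. The glosses are toy examples:

```python
# Sense definitions -> sentence vectors -> hierarchical clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

definitions = [
    "a financial institution that accepts deposits and makes loans",
    "an institution offering savings accounts and loans",
    "the sloping land beside a river",
    "raised land along the edge of a river",
]
X = TfidfVectorizer().fit_transform(definitions).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)   # definitions in one cluster share a compressed sense label
```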

28 pages, 1948 KiB  
Review
A Contemporary Review on Utilizing Semantic Web Technologies in Healthcare, Virtual Communities, and Ontology-Based Information Processing Systems
by Senthil Kumar Narayanasamy, Kathiravan Srinivasan, Yuh-Chung Hu, Satish Kumar Masilamani and Kuo-Yi Huang
Electronics 2022, 11(3), 453; https://doi.org/10.3390/electronics11030453 - 3 Feb 2022
Cited by 31 | Viewed by 9426
Abstract
The semantic web is an emerging technology that helps to connect different users to create their own content and facilitates representing information in a manner that computers can understand. As the world heads towards the fourth industrial revolution, the use of artificial-intelligence-enabled semantic web technologies paves the way for many real-time application developments. The fundamental building blocks for the widespread adoption of semantic web technologies are ontologies, which allow concepts to be shared and reused in a standardized way, so that data gathered from heterogeneous sources receive a common nomenclature and duplicates can be disambiguated easily. In this context, the right utilization of ontology capabilities would further strengthen its presence in many web-based applications such as e-learning, virtual communities, social media sites, healthcare, agriculture, etc. In this paper, we give a comprehensive review of the use of the semantic web in the domains of healthcare, virtual communities, and other information retrieval projects. As the role of the semantic web becomes pervasive in many domains, demand for it in healthcare, virtual communities, and information retrieval has gained huge momentum in recent years. To obtain the correct sense of the meaning of the words or terms in textual content, it is necessary to apply the right ontology to resolve ambiguity and avoid any deviations that persist in the concepts. In this review paper, we have highlighted all the necessary information for a good understanding of the semantic web and its ontological frameworks.
(This article belongs to the Special Issue New Trends in Deep Learning for Computer Vision)

21 pages, 4216 KiB  
Article
Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet
by Minho Kim and Hyuk-Chul Kwon
Electronics 2021, 10(23), 2938; https://doi.org/10.3390/electronics10232938 - 26 Nov 2021
Cited by 5 | Viewed by 2731
Abstract
Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora, since this requires high cost and time. On the other hand, implementing unsupervised disambiguation is relatively easy, although most efforts have not been satisfactory. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available; hence, a data deficiency problem occurs when determining the dependency between words. This paper proposes an unsupervised disambiguation method using prior probability estimation based on the Korean WordNet, which performs better than supervised disambiguation. In the Korean WordNet, all words have semantic characteristics similar to those of their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem by determining the dependency between words through the χ² statistic computed between related words. Moreover, in order to obtain the same effect as using the semantic occurrence probability as a prior probability, as in supervised disambiguation, semantically related words of the ambiguous vocabulary are obtained and used as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of the proposed lexical disambiguation method. We found that our method outperformed supervised disambiguation methods even though it is based on unsupervised, knowledge-based disambiguation.
(This article belongs to the Special Issue Electronic Solutions for Artificial Intelligence Healthcare Volume II)
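
The dependency test at the core of the method can be illustrated with a 2×2 co-occurrence table and a χ² statistic; the counts below are toy numbers, and the paper aggregates such statistics over WordNet-related words:

```python
# Chi-squared dependency between two words from a 2x2 co-occurrence table.
from scipy.stats import chi2_contingency

#              B present | B absent
table = [[120,   380],      # A present
         [ 60,  9440]]      # A absent
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.3g}")   # large chi2 -> strong word dependency
```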
