Optimising Contextual Embeddings for Meaning Conflation Deficiency Resolution in Low-Resourced Languages
Abstract
1. Introduction
2. Related Literature
2.1. XLNet (Extra Long Net)
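- $\mathbf{x}$ is the given input sequence of length $N$, with positions $[1, 2, \ldots, N]$;
- $\mathcal{Z}_N$ refers to the set of all possible permutations of the index sequence of length $N$;
- The $n$th element of the sequence is $x_n$;
- $z_n$ refers to the $n$th element of a permutation $\mathbf{z} \in \mathcal{Z}_N$.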
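The definitions in this list correspond to the notation commonly used for XLNet's permutation language modelling objective. The equation itself does not survive in this section, so the following is a hedged reconstruction of the standard form rather than the authors' exact typesetting. The symbols are assumptions: $\mathbf{x}$ is the input sequence of length $N$, $\mathcal{Z}_N$ is the set of permutations of its index sequence, $z_n$ is the $n$th element of a permutation $\mathbf{z}$, and $\mathbf{z}_{<n}$ denotes its first $n-1$ elements.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_N}
\left[ \sum_{n=1}^{N} \log p_{\theta}\!\left( x_{z_n} \mid \mathbf{x}_{\mathbf{z}_{<n}} \right) \right]
```

Each token is predicted from the tokens that precede it in the sampled factorisation order, which is how the model sees context from both sides without masking the input.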
2.2. Bidirectional and Auto-Regressive Transformers (BART)
2.3. Convolutional Neural Networks (CNNs)
2.4. CNN-LSTM
2.5. Large Language Model Attention Mechanism (LLaMa)
2.6. TextRank
- $d$ is the damping factor, a probability between 0 and 1 that models a user moving to a page at random;
- $w_{ij}$ is the weight of the edge between any two points $i$ and $j$;
- $In(V_i)$ represents the set of points that point to $V_i$;
- $WS(V_i)$ represents the score of the point $V_i$ obtained by the TextRank model.
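The quantities in this list are those of the standard weighted TextRank update, WS(V_i) = (1 − d) + d · Σ_{V_j ∈ In(V_i)} (w_ji / Σ_{V_k ∈ Out(V_j)} w_jk) · WS(V_j). The sketch below iterates that update on a small graph. It is a minimal illustration: the function name `textrank`, the default `d=0.85`, the fixed iteration count, and the toy graph are all assumptions, not details taken from the paper.

```python
def textrank(weights, d=0.85, iterations=50):
    """Iterate the weighted TextRank update for every node.

    weights[j][i] is the weight of the edge from node j to node i.
    d and iterations are illustrative defaults, not values from the paper.
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)

    scores = {node: 1.0 for node in nodes}
    # Total outgoing weight per node: the denominator in the update.
    out_sum = {node: sum(weights.get(node, {}).values()) for node in nodes}

    for _ in range(iterations):
        new_scores = {}
        for i in nodes:
            rank = 0.0
            for j in nodes:
                w_ji = weights.get(j, {}).get(i, 0.0)
                if w_ji > 0 and out_sum[j] > 0:
                    rank += w_ji / out_sum[j] * scores[j]
            new_scores[i] = (1 - d) + d * rank
        scores = new_scores
    return scores


# Hypothetical co-occurrence graph for an ambiguous word.
graph = {
    "bank": {"river": 1.0, "money": 2.0},
    "river": {"bank": 1.0},
    "money": {"bank": 2.0},
}
scores = textrank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes that receive more weighted votes from well-scored neighbours end up higher in `ranking`, which is the property keyword extraction relies on.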
2.7. Language-Agnostic BERT Sentence Embedding (LaBSE)
2.8. XLM-RoBERTa (XLM-R)
2.9. Distilled Universal Sentence Encoder (DistilUSE)
2.10. Deep Neural Network (DNN)
3. Research Methodology
- Phase 1: Corpus Development and Annotation
- Phase 2: Model Development, Training, and Optimisation
- Phase 3: Evaluation and Validation
3.1. Data Collection and Annotation
3.2. Optimised BART-Based Algorithm 1 for Meaning Conflation Deficiency
Algorithm 1 Optimised BART Training for Meaning Conflation Deficiency
 1: Input: Excel corpus D = {(si, wi, yi)} with sentence si, target word wi, and sense label yi
 2: Output: Trained BART model M, evaluation metrics, visualisations
Step 1: Preprocessing
 3: for each (si, wi, yi) ∈ D, construct the input as xi = [TGT] wi [/TGT] si
 4: Append xi to the training samples
Step 2: Label Encoding
 5: Encode sense labels yi into class indices via LabelEncoder
Step 3: Train-Test Split
 6: Split D into training (Dtrain), validation (Dval), and test (Dtest) sets using stratification
Step 4: Tokenisation
 7: Tokenise input texts xi with max length L, and convert to tensors
Step 5: Model Initialisation
 8: Load BART model with classification head: M ← BartForSequenceClassification with K output classes
Step 6: Optimisation Setup
 9: Configure the optimiser as AdamW with learning rate α, batch size B, and gradient accumulation G; use a cosine scheduler with warmup ratio ρ
Step 7: Loss Function
    if class imbalance is present, compute class weights using the Effective Number of Samples, or use Focal Loss
Step 8: Fine-tuning
10: Train model M for E epochs with early stopping (patience P), label smoothing ϵ, and optionally bf16, checkpointing, encoder layer freezing, and torch.compile
Step 9: Evaluation
11: Compute accuracy, macro-F1, precision, and recall on Dtest
Step 10: Visualisations
12: Generate:
13:   ROC Curves (micro, macro, per-class)
14:   Confusion Matrix
15:   Training/Validation Loss Curves
16:   Positional Embedding Norms
17:   Decoding Behaviour Samples
End of the Optimised BART Algorithm
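Steps 1 and 7 of Algorithm 1 can be illustrated without any deep learning library. The sketch below builds the marked input string and derives class weights from the Effective Number of Samples, E_n = (1 − β^n) / (1 − β), with each class weighted in proportion to 1 / E_n. The helper names, the value `beta=0.999`, the rescaling so that weights sum to the number of classes, and the example labels are assumptions made for illustration; the paper does not state them.

```python
from collections import Counter


def build_input(sentence, target):
    """Step 1: prefix the sentence with the marked target word."""
    return f"[TGT] {target} [/TGT] {sentence}"


def effective_number_weights(labels, beta=0.999):
    """Step 7: class weights proportional to (1 - beta) / (1 - beta ** n_c).

    beta is a hypothetical default. Weights are rescaled so that they sum
    to the number of classes, which keeps the loss scale comparable.
    """
    counts = Counter(labels)
    raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


# Hypothetical imbalanced sense labels: 90 examples of one sense, 10 of another.
labels = ["sense_a"] * 90 + ["sense_b"] * 10
weights = effective_number_weights(labels)
example = build_input("She sat by the bank of the river.", "bank")
```

The rarer sense receives the larger weight, so misclassifying it costs more in a weighted cross-entropy loss.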
3.3. Optimised ELMo Algorithm 2 for Meaning Conflation Deficiency
Algorithm 2 Optimised ELMo Fine-Tuning for WSD
 1: Data: From Excel, build the dataset D of inputs xi and labels yi. Split stratified into train/validation/test: Dtrain, Dval, Dtest.
 2: Model:
 3: Let Eθ : string → R^d be the ELMo encoder (d = 1024) with trainable parameters θ. Define a regularised classifier head with parameters (W1, b1, W2, b2) and dropout p that maps hi = Eθ(xi) to hidden activations zi, logits oi, and class probabilities p̂i.
 4: Label smoothing & loss.
 5: With smoothing ε over K classes, define the smoothed targets qi,k = (1 − ε)·1[k = yi] + ε/K. Cross-entropy with L2 regularisation: L = LCE + L2 penalty weighted by λ, where LCE is the cross-entropy between qi and p̂i.
 6: Learning rate schedule (staircase exponential decay).
 7: At step t: ηt = η0 · γ^⌊t/s⌋, with η0 the initial learning rate, γ the decay rate, and s the decay steps.
 8: Adam optimiser. Given gradients gt = ∇ΘL, update the moment estimates mt and vt, then update the parameters Θ with step size ηt.
 9: Early stopping.
10: Let e* be the best epoch on Dval.
11: Stop if the validation metric fails to improve by at least δ for P consecutive epochs; restore the parameters from epoch e*.
12: Input: Excel file with columns Sentence, word, sense; hyperparameters: epochs E, batch size B, LR η0, decay γ, steps s, label smoothing ε, dropout p, L2 λ, patience P, tolerance δ.
13: Load Excel and construct xi = [TGT] wi [/TGT] || Sentencei, labels yi.
14: Stratify-split D → Dtrain, Dval, Dtest.
15: Initialise ELMo Eθ (trainable) and head (W1, b1, W2, b2).
16: Initialise Adam states m0, v0 = 0.
17: for epoch e = 1 to E do
18:   Shuffle Dtrain and form mini-batches {B} of size B.
19:   for batch B do
20:     Encode hi ← Eθ(xi), compute zi, oi, p̂i (forward pass).
21:     Build smoothed targets qi,k; compute L = LCE + L2 penalty.
22:     Backprop to get gt = ∇ΘL.
23:     Update Adam moments mt, vt, compute ηt, and update Θ.
24:   end for
25:   Evaluate on Dval.
26:   if the validation metric improved by at least δ then
27:     Save checkpoint; reset patience counter.
28:   else
29:     Increment patience counter; if counter ≥ P then break.
30:   end if
31: end for
32: Restore best checkpoint.
33: Evaluate on Dtest: accuracy, confusion matrix, and compute ROC/AUC.
34: ROC and AUC (one-vs-rest).
35: For class c, take the predicted probabilities p̂i,c as scores. Vary the threshold τ ∈ [0, 1] to get the true-positive rate TPRc(τ) and false-positive rate FPRc(τ). The ROC curve is the set of points (FPRc(τ), TPRc(τ)), and AUCc is the area under it. Macro/micro averages are taken over the classes.
36: Typical hyperparameters (fine-tuning).
End of the Algorithm
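Two components of Algorithm 2, label smoothing and the staircase exponential learning-rate decay, are easy to check numerically. The sketch below assumes uniform label smoothing, q_k = (1 − ε)·1[k = y] + ε/K, and the staircase schedule η_t = η0 · γ^⌊t/s⌋; these are the standard definitions of the named techniques, while the function names and example values are illustrative assumptions.

```python
import math


def smooth_targets(label, num_classes, eps):
    """Uniform label smoothing: q_k = (1 - eps) * [k == label] + eps / K."""
    return [
        (1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
        for k in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(o - peak) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_cross_entropy(logits, label, eps):
    """Cross-entropy between the smoothed targets and the softmax output."""
    probs = softmax(logits)
    targets = smooth_targets(label, len(logits), eps)
    return -sum(q * math.log(p) for q, p in zip(targets, probs))


def staircase_lr(step, lr0, gamma, decay_steps):
    """Staircase exponential decay: eta_t = eta_0 * gamma ** floor(t / s)."""
    return lr0 * gamma ** (step // decay_steps)


# Hypothetical values: 4 senses, eps = 0.1, halve the rate every 100 steps.
targets = smooth_targets(label=1, num_classes=4, eps=0.1)
loss = smoothed_cross_entropy([0.2, 1.5, -0.3, 0.0], label=1, eps=0.1)
rates = [staircase_lr(t, 1e-3, 0.5, 100) for t in (0, 99, 100, 250)]
```

The learning rate stays constant within each block of `decay_steps` updates and drops by a factor of `gamma` at the block boundary, which is what distinguishes the staircase variant from continuous decay.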
3.4. Optimised XLNet Algorithm 3 for Meaning Conflation Deficiency
Algorithm 3 Optimised XLNet
 1: Input: hyperparameters: learning rate 2 × 10^−5, batch size 8, epochs 5, max length 128
 2: Output: Fine-tuned XLNet model and tokeniser; validation metrics and plots
Step 1: Data Prep
 3: Load dataset D from Excel; keep columns sentence, word, sense
 4: Create input text by marking the target word: replace word in sentence with <word>
 5: Encode labels y with a label encoder; set K ← |classes|
 6: Stratified split of (X, y) into train/validation (80/20) with a fixed random seed
Step 2: Tokenisation & Datasets
 7: Initialise tokeniser T ← XLNetTokenizer.from_pretrained("xlnet-base-cased")
 8: Define tokenise(batch): T(text, padding=True, truncation=True, max_length=128)
 9: Wrap train/val into HuggingFace Datasets and apply map(tokenise) with batched=True
Step 3: Model & Training Setup
10: Initialise model M ← XLNetForSequenceClassification with num_labels = K
11: Set training args: evaluation per epoch; save per epoch; lr = 2 × 10^−5; weight decay = 0.01; epochs = 5; load_best_model_at_end=True
12: Enable early stopping with patience = 2
13: Optional (imbalance): compute class weights from the training labels and use weighted cross-entropy in a custom compute_loss
14: Create Trainer with (M, args, train_dataset, eval_dataset, callbacks = {EarlyStopping})
Step 4: Training & Evaluation
15: Train: trainer.train()
16: Predict on validation: obtain logits Ẑ ← trainer.predict; compute probabilities P̂ = softmax(Ẑ); predictions ŷ = arg max P̂
17: Compute metrics: classification report; confusion matrix; one-vs-rest ROC and AUC per class
Step 5: Analysis & Persistence
18: Optional: project Ẑ with PCA and UMAP for 2D class-separable visualisations
19: Save artifacts: trainer.save_model() and tokeniser.save_pretrained()
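The data preparation in Step 1 of Algorithm 3 (marking the target word and making a stratified 80/20 split with a fixed seed) can be sketched with the standard library alone. The helper names, the whole-word matching rule, the seed value, and the toy labels are assumptions; the paper does not specify how occurrences are matched or which splitting utility is used.

```python
import random
import re
from collections import defaultdict


def mark_target(sentence, word):
    """Wrap whole-word occurrences of the target word in angle brackets."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: f"<{m.group(0)}>", sentence)


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) with per-class proportions kept.

    The seed is a hypothetical fixed value; each class contributes at
    least one validation example.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = max(1, round(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Hypothetical example data.
marked = mark_target("He went banking at the bank.", "bank")
labels = ["finance"] * 10 + ["river"] * 5
train_idx, val_idx = stratified_split(labels)
```

Stratifying by sense label keeps rare senses represented in the validation set, which matters when the per-class ROC curves in Step 4 are computed.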
3.5. Optimised CNN Using Dropout-Enhanced Variants Algorithm 4 for Meaning Conflation Deficiency
Algorithm 4 Counteracting Overfitting with Dropout, L2 Regularisation, and Early Stopping
 1: Input: Labelled dataset D, network Fθ, dropout rates pℓ, L2 coefficient λ, optimiser (Adam) learning rate η, max epochs E, patience P
 2: Output: Parameters θ⋆ at the best validation epoch
Step 1: Split
 3: Partition D → Dtrain, Dval, Dtest (stratified).
Step 2: Model
 4: Define Fθ with layers (e.g., Embedding/Conv/Dense) and insert Dropout layers with rates pℓ after capacity-heavy layers.
Step 3: Objective (training mode)
 5: L(θ) = (1/|B|) Σ_{(x, y) ∈ B} loss(Fθ(x; m), y) + λ Σℓ ‖Wℓ‖², where B is a mini-batch, m is the dropout mask, and Wℓ are the weight matrices subject to L2.
Step 4: Optimisation
 6: Initialise Adam with step size η.
Step 5: Training loop
 7: for e = 1 to E do
 8:   Train epoch:
 9:   for each mini-batch B ⊂ Dtrain do
10:     Sample dropout masks m (keep prob. qℓ = 1 − pℓ)
11:     Compute ŷ and L(θ);
12:     backpropagate gradients; update θ
13:   Validation (no dropout; inference scaling):
14:   Compute val_loss and metric (e.g., macro-F1) on Dval
      if val_loss < best_loss − δ then
15:     best_loss ← val_loss;
16:     θ⋆ ← θ;
17:     stall ← 0
      else
18:     stall ← stall + 1
19:     if stall ≥ P then break (early stopping)
Step 6: Return θ⋆ (best checkpoint). Evaluate once on Dtest.
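The control flow of Algorithm 4 can be sketched independently of any particular network. The example below shows a dropout mask applied to a list of activations and the patience-based stopping rule applied to a sequence of validation losses. It uses inverted dropout (rescaling kept units by 1 / keep_prob during training), which is an assumption: the algorithm's note on inference scaling suggests the original may rescale at inference time instead. Function names and the loss values are hypothetical.

```python
import random


def inverted_dropout(values, drop_rate, rng):
    """Zero each activation with probability drop_rate; rescale the rest.

    Kept activations are divided by keep_prob = 1 - drop_rate so that the
    expected activation is unchanged and no scaling is needed at inference.
    """
    keep_prob = 1.0 - drop_rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def early_stopping_epoch(val_losses, patience, delta=0.0):
    """Return (best_epoch, last_epoch_run), both 1-based.

    An epoch counts as an improvement only if its loss is below
    best_loss - delta; otherwise the stall counter is incremented.
    """
    best_loss = float("inf")
    best_epoch = 0
    stall = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_loss - delta:
            best_loss, best_epoch, stall = val_loss, epoch, 0
        else:
            stall += 1
            if stall >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improvement stalls after epoch 3.
history = [0.90, 0.70, 0.60, 0.65, 0.61, 0.62, 0.50]
best_epoch, stopped_at = early_stopping_epoch(history, patience=3)
dropped = inverted_dropout([1.0] * 1000, drop_rate=0.5, rng=random.Random(0))
```

With a patience of 3, training halts before the late improvement at epoch 7 is ever seen, which illustrates the trade-off the patience parameter controls.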
4. Experimental Results and Discussion
Analysis of Confusion Matrices
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Model Comparison | b (Baseline Correct, Opt Wrong) | c (Opt Correct, Baseline Wrong) | McNemar’s p-Value | Significance |
---|---|---|---|---|
ELMo vs. Optimised ELMo | 14 | 30 | 0.0226 | Significant |
BART vs. Optimised BART | 18 | 35 | 0.0270 | Significant |
XLNet vs. Optimised XLNet | 21 | 42 | 0.0111 | Significant |
CNN vs. Optimised CNN (dropout) | 19 | 36 | 0.0300 | Significant |
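McNemar's test depends only on the two discordant counts b and c in the table above. The sketch below computes both the exact binomial version and the continuity-corrected chi-square version. Which variant produced the reported p-values is not stated in this section, so both are shown as assumptions; the function names are illustrative.

```python
import math


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2_corrected(b, c):
    """Continuity-corrected chi-square statistic and its p-value (1 dof)."""
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # For one degree of freedom, the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Discordant counts (b, c) taken from the comparison table.
comparisons = {
    "ELMo": (14, 30),
    "BART": (18, 35),
    "XLNet": (21, 42),
    "CNN": (19, 36),
}
exact_p = {name: mcnemar_exact(b, c) for name, (b, c) in comparisons.items()}
```

Because only the discordant pairs enter the statistic, examples that both models classify identically have no effect on the outcome.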
Model | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
---|---|---|---|---|
BART with ablation studies | 97 | 97 | 97 | 97 |
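The four columns in this table follow from counts of true positives, false positives, and false negatives per class. As a minimal sketch, assuming macro averaging (the unweighted mean over classes, as named in the evaluation step of Algorithm 1), the metrics can be computed as follows. The function name and the toy labels are hypothetical and are not data from the paper.

```python
def macro_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, and F1, plus overall accuracy."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "accuracy": correct / len(y_true),
    }


# Hypothetical sense labels for six test sentences.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = macro_metrics(y_true, y_pred)
```

Macro averaging gives every sense equal weight regardless of frequency, so it is more sensitive to errors on rare senses than accuracy is.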
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Masethe, M.A.; Ojo, S.O.; Masethe, H.D. Optimising Contextual Embeddings for Meaning Conflation Deficiency Resolution in Low-Resourced Languages. Computers 2025, 14, 402. https://doi.org/10.3390/computers14090402