Search Results (27)

Search Parameters:
Keywords = multilingual representation learning

31 pages, 2985 KB  
Article
Heterogeneous Ensemble Sentiment Classification Model Integrating Multi-View Features and Dynamic Weighting
by Song Yang, Jiayao Xing, Zongran Dong and Zhaoxia Liu
Electronics 2025, 14(21), 4189; https://doi.org/10.3390/electronics14214189 - 27 Oct 2025
Abstract
With the continuous growth of user reviews, identifying underlying sentiment across multi-source texts efficiently and accurately has become a significant challenge in NLP. Traditional single models in cross-domain sentiment analysis often exhibit insufficient stability, limited generalization, and sensitivity to class imbalance. Existing ensemble methods predominantly rely on static weighting or voting strategies among homogeneous models, failing to fully leverage the complementary advantages between models. To address these issues, this study proposes a heterogeneous ensemble sentiment classification model integrating multi-view features and dynamic weighting. At the feature learning layer, the model constructs three complementary base learners: a RoBERTa-FC for extracting global semantic features, a BERT-BiGRU for capturing temporal dependencies, and a TextCNN-Attention for focusing on local semantic features, thereby achieving multi-level text representation. At the decision layer, a meta-learner fuses the multi-view features, and dynamic uncertainty weighting and attention weighting strategies adaptively adjust the outputs of the different base learners. Experimental results across multiple domains demonstrate that the proposed model consistently outperforms single learners and comparison methods in terms of Accuracy, Precision, Recall, F1 Score, and Macro-AUC. On average, the ensemble model achieves a Macro-AUC of 0.9582 ± 0.023 across five datasets, with an Accuracy of 0.9423, an F1 Score of 0.9590, and a Macro-AUC of 0.9797 on the AlY_ds dataset. Moreover, in a cross-dataset ranking evaluation based on equally weighted metrics, the model consistently ranks within the top two, confirming its superior cross-domain adaptability and robustness. These findings highlight the effectiveness of the proposed framework in enhancing sentiment classification performance and provide valuable insights for future research on lightweight dynamic ensembles and on multilingual and multimodal applications. Full article
(This article belongs to the Section Artificial Intelligence)
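
A minimal sketch of how the dynamic uncertainty weighting described above might be realized: it fuses three base learners' softmax outputs, weighting each model per sample by inverse predictive entropy. The function name and the entropy heuristic are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def uncertainty_weighted_fusion(prob_outputs):
    """Fuse per-model class probabilities, down-weighting uncertain models.

    prob_outputs: list of (n_samples, n_classes) softmax arrays, one per
    base learner. Per-sample weights come from inverse predictive entropy:
    the more confident a model is on a sample, the more it counts.
    """
    probs = np.stack(prob_outputs)                    # (n_models, n_samples, n_classes)
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(-1)  # (n_models, n_samples)
    inv = 1.0 / (entropy + eps)
    weights = inv / inv.sum(axis=0, keepdims=True)    # normalize across models
    return (weights[..., None] * probs).sum(axis=0)   # (n_samples, n_classes)

# Toy check: three base learners (think RoBERTa-FC, BERT-BiGRU,
# TextCNN-Attention) on four samples with two classes.
rng = np.random.default_rng(0)
outs = [rng.dirichlet(np.ones(2), size=4) for _ in range(3)]
print(uncertainty_weighted_fusion(outs).round(3))
```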

21 pages, 728 KB  
Article
Resolving Linguistic Asymmetry: Forging Symmetric Multilingual Embeddings Through Asymmetric Contrastive and Curriculum Learning
by Lei Meng, Yinlin Li, Wei Wei and Caipei Yang
Symmetry 2025, 17(9), 1386; https://doi.org/10.3390/sym17091386 - 25 Aug 2025
Viewed by 842
Abstract
The pursuit of universal, symmetric semantic representations within large language models (LLMs) faces a fundamental challenge: the inherent asymmetry of natural languages. Different languages exhibit vast disparities in syntactic structures, lexical choices, and cultural nuances, making the creation of a truly shared, symmetric embedding space a non-trivial task. This paper aims to address this critical problem by introducing a novel framework to forge robust and symmetric multilingual sentence embeddings. Our approach, named DACL (Dynamic Asymmetric Contrastive Learning), is anchored in two powerful asymmetric learning paradigms: Contrastive Learning and Dynamic Curriculum Learning (DCL). We extend Contrastive Learning to the multilingual context, where it asymmetrically treats semantically equivalent sentences from different languages (positive pairs) and sentences with distinct meanings (negative pairs) to enforce semantic symmetry in the target embedding space. To further refine this process, we incorporate Dynamic Curriculum Learning, which introduces a second layer of asymmetry by dynamically scheduling training instances from easy to hard. This dual-asymmetric strategy enables the model to progressively master complex cross-lingual relationships, starting with more obvious semantic equivalences and advancing to subtler ones. Our comprehensive experiments on benchmark cross-lingual tasks, including sentence retrieval and cross-lingual classification (XNLI, PAWS-X, MLDoc, MARC), demonstrate that DACL significantly outperforms a wide range of established baselines. The results validate our dual-asymmetric framework as a highly effective approach for forging robust multilingual embeddings, particularly excelling in tasks involving complex linguistic asymmetries. Ultimately, this work contributes a novel dual-asymmetric learning framework that effectively leverages linguistic asymmetry to achieve robust semantic symmetry across languages. It offers valuable insights for developing more capable, fair, and interpretable multilingual LLMs, emphasizing that deliberately leveraging asymmetry in the learning process is a highly effective strategy. Full article
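
A rough sketch of the two ingredients DACL combines, contrastive alignment of translation pairs and easy-to-hard scheduling, as they are commonly implemented; the InfoNCE loss form and the sort-based scheduler are standard assumptions here, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def multilingual_info_nce(src_emb, tgt_emb, temperature=0.05):
    """Treat aligned translation pairs as positives and all other in-batch
    sentences as negatives, pulling equivalents together across languages."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature                      # (batch, batch)
    labels = torch.arange(src.size(0), device=src.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)

def easy_to_hard(pairs, difficulty):
    """Curriculum scheduling: present training pairs in order of increasing
    difficulty (e.g., baseline embedding distance between the two sentences)."""
    return sorted(pairs, key=difficulty)
```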

102 pages, 17708 KB  
Review
From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing
by Zhandong Liu, Ruixia Song, Ke Li and Yong Li
Appl. Sci. 2025, 15(17), 9247; https://doi.org/10.3390/app15179247 - 22 Aug 2025
Viewed by 2036
Abstract
Scene text understanding, serving as a cornerstone technology for autonomous navigation, document digitization, and accessibility tools, has witnessed a paradigm shift from traditional methods relying on handcrafted features and multi-stage processing pipelines to contemporary deep learning frameworks capable of learning hierarchical representations directly from raw image inputs. This survey distinctly categorizes modern scene text recognition (STR) methodologies into three principal paradigms: two-stage detection frameworks that employ region proposal networks for precise text localization, single-stage detectors designed to optimize computational efficiency, and specialized architectures tailored to handle arbitrarily shaped text through geometric-aware modeling techniques. Concurrently, an in-depth analysis of text recognition paradigms elucidates the evolutionary trajectory from connectionist temporal classification (CTC) and sequence-to-sequence models to transformer-based architectures, which excel in contextual modeling and demonstrate superior performance. In contrast to prior surveys, this work uniquely emphasizes several key differences and contributions. Firstly, it provides a comprehensive and systematic taxonomy of STR methods, explicitly highlighting the trade-offs between detection accuracy, computational efficiency, and geometric adaptability across different paradigms. Secondly, it delves into the nuances of text recognition, illustrating how transformer-based models have revolutionized the field by capturing long-range dependencies and contextual information, thereby addressing challenges in recognizing complex text layouts and multilingual scripts. Furthermore, the survey pioneers the exploration of critical research frontiers, such as multilingual text adaptation, enhancing model robustness against environmental variations (e.g., lighting conditions, occlusions), and devising data-efficient learning strategies to mitigate the dependency on large-scale annotated datasets. By synthesizing insights from technical advancements across 28 benchmark datasets and standardized evaluation protocols, this study offers researchers a holistic perspective on the current state-of-the-art, persistent challenges, and promising avenues for future research, with the ultimate goal of achieving human-level scene text comprehension. Full article

20 pages, 983 KB  
Article
A Library-Oriented Large Language Model Approach to Cross-Lingual and Cross-Modal Document Retrieval
by Wang Yi, Xiahuan Cai, Hongtao Ma, Zhengjie Fu and Yan Zhan
Electronics 2025, 14(15), 3145; https://doi.org/10.3390/electronics14153145 - 7 Aug 2025
Viewed by 1052
Abstract
Amid the growing demand for processing multimodal and cross-lingual information, traditional retrieval systems have encountered substantial limitations when handling heterogeneous inputs such as images, textual layouts, and multilingual expressions. To address these challenges, a unified retrieval framework has been proposed, which integrates visual features from images, layout-aware optical character recognition (OCR) text, and bilingual semantic representations in Chinese and English. This framework aims to construct a shared semantic embedding space that mitigates semantic discrepancies across modalities and resolves inconsistencies in cross-lingual mappings. The architecture incorporates three main components: a visual encoder, a structure-aware OCR module, and a multilingual Transformer. Furthermore, a joint contrastive learning loss has been introduced to enhance alignment across both modalities and languages. The proposed method has been evaluated on three core tasks: a single-modality retrieval task from image → OCR, a cross-lingual retrieval task between Chinese and English, and a joint multimodal retrieval task involving image, OCR, and language inputs. Experimental results demonstrate that, in the joint multimodal setting, the proposed model achieved a Precision@10 of 0.693, Recall@10 of 0.684, nDCG@10 of 0.672, and F1@10 of 0.685, substantially outperforming established baselines such as CLIP, LayoutLMv3, and UNITER. Ablation studies revealed that removing either the structure-aware OCR module or the cross-lingual alignment mechanism resulted in a decrease in mean reciprocal rank (MRR) to 0.561, thereby confirming the critical role of these components in reinforcing semantic consistency across modalities. This study highlights the powerful potential of large language models in multimodal semantic fusion and retrieval tasks, providing robust solutions for large-scale semantic understanding and application scenarios in multilingual and multimodal contexts. Full article
(This article belongs to the Section Artificial Intelligence)
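
Since the evaluation reports Precision@10, Recall@10, and MRR, a small generic implementation of these retrieval metrics may be useful; this is a standard sketch, not code from the paper.

```python
import numpy as np

def precision_recall_at_k(sim, relevant, k=10):
    """sim: (n_queries, n_docs) similarity matrix; relevant: one set of
    relevant document indices per query. Returns mean P@k and mean R@k."""
    topk = np.argsort(-sim, axis=1)[:, :k]
    p, r = [], []
    for i, rel in enumerate(relevant):
        hits = len(set(topk[i].tolist()) & rel)
        p.append(hits / k)
        r.append(hits / max(len(rel), 1))
    return float(np.mean(p)), float(np.mean(r))

def mean_reciprocal_rank(sim, gold):
    """gold[i] is the index of query i's single correct document."""
    order = np.argsort(-sim, axis=1)
    ranks = [int(np.where(order[i] == g)[0][0]) + 1 for i, g in enumerate(gold)]
    return float(np.mean([1.0 / r for r in ranks]))
```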

23 pages, 3847 KB  
Article
Optimizing Sentiment Analysis in Multilingual Balanced Datasets: A New Comparative Approach to Enhancing Feature Extraction Performance with ML and DL Classifiers
by Hamza Jakha, Souad El Houssaini, Mohammed-Alamine El Houssaini, Souad Ajjaj and Abdelali Hadir
Appl. Syst. Innov. 2025, 8(4), 104; https://doi.org/10.3390/asi8040104 - 28 Jul 2025
Viewed by 3458
Abstract
Social network platforms have a significant impact on the development of companies by influencing clients’ behaviors and sentiments, which directly affect corporate reputations. Analyzing this feedback has become an essential component of business intelligence, supporting the improvement of long-term marketing strategies at scale. Implementing powerful sentiment analysis models requires a comprehensive, in-depth examination of each stage of the process. In this study, we present a new comparative approach for several feature extraction techniques, including TF-IDF, Word2Vec, FastText, and BERT embeddings. These methods are applied to three multilingual datasets collected from hotel review platforms in the tourism sector, in English, French, and Arabic. The datasets were preprocessed through cleaning, normalization, labeling, and balancing before being used to train various machine learning and deep learning algorithms. The effectiveness of each feature extraction method was evaluated using metrics such as accuracy, F1-score, precision, recall, and the ROC AUC curve, together with a new metric that measures the execution time for generating word representations. Our extensive experiments yield strong results, with accuracy rates of approximately 99% for the English dataset, 94% for the Arabic dataset, and 89% for the French dataset. These findings confirm the substantial impact of vectorization techniques on the performance of sentiment analysis models. They also highlight the close relationship between balanced datasets, effective feature extraction methods, and the choice of classification algorithms. This study thus aims to simplify the selection of feature extraction methods and appropriate classifiers for each language, thereby contributing to advancements in sentiment analysis. Full article
(This article belongs to the Topic Social Sciences and Intelligence Management, 2nd Volume)
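
A minimal sketch of the kind of execution-time comparison the new metric implies, timing two of the feature extraction methods (TF-IDF via scikit-learn and Word2Vec via gensim) on placeholder reviews; the corpus, parameters, and library choices are illustrative assumptions, not the authors' benchmark.

```python
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

# Placeholder hotel reviews; the study uses English, French, and Arabic sets.
corpus = ["the room was clean and the staff friendly",
          "terrible service and a noisy room"]

def timed(fn):
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

# TF-IDF: one sparse document-term matrix for the whole corpus.
_, t_tfidf = timed(lambda: TfidfVectorizer().fit_transform(corpus))

# Word2Vec: train small dense embeddings on tokenized reviews.
tokens = [s.split() for s in corpus]
_, t_w2v = timed(lambda: Word2Vec(tokens, vector_size=50, min_count=1, epochs=5))

print(f"TF-IDF: {t_tfidf:.4f}s  Word2Vec: {t_w2v:.4f}s")
```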

19 pages, 457 KB  
Article
Transinger: Cross-Lingual Singing Voice Synthesis via IPA-Based Phonetic Alignment
by Chen Shen, Lu Zhao, Cejin Fu, Bote Gan and Zhenlong Du
Sensors 2025, 25(13), 3973; https://doi.org/10.3390/s25133973 - 26 Jun 2025
Viewed by 1518
Abstract
Although Singing Voice Synthesis (SVS) has revolutionized audio content creation, global linguistic diversity remains challenging. Current SVS research shows scant exploration of cross-lingual generalization, as fragmented, language-specific phoneme encodings (e.g., Pinyin, ARPA) hinder unified phonetic modeling. To address this challenge, we built a four-language dataset based on GTSinger’s speech data, using the International Phonetic Alphabet (IPA) for consistent phonetic representation and applying precise segmentation and calibration for improved quality. In particular, we propose a novel method of decomposing IPA phonemes into letters and diacritics, enabling the model to deeply learn the underlying rules of pronunciation and achieve better generalization. A dynamic IPA adaptation strategy further enables the application of learned phonetic representations to unseen languages. Based on VISinger2, we introduce Transinger, an innovative cross-lingual synthesis framework. Transinger achieves breakthroughs in phoneme representation learning by precisely modeling pronunciation, which effectively enables compositional generalization to unseen languages. It also integrates Conformer and RVQ techniques to optimize information extraction and generation, achieving outstanding cross-lingual synthesis performance. Objective and subjective experiments have confirmed that Transinger significantly outperforms state-of-the-art singing synthesis methods in terms of cross-lingual generalization. These results demonstrate that multilingual aligned representations can markedly enhance model learning efficacy and robustness, even for languages not seen during training. Moreover, the integration of a strategy that splits IPA phonemes into letters and diacritics allows the model to learn pronunciation more effectively, resulting in a qualitative improvement in generalization. Full article
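
One plausible way to realize the letter/diacritic decomposition of IPA phonemes is via Unicode normalization and character categories, as sketched below; the paper's exact decomposition rules are not given here, so treat this as an assumed approximation.

```python
import unicodedata

def split_ipa(phoneme):
    """Split an IPA phoneme into base letters and diacritics/modifiers.

    Combining marks have Unicode category 'Mn'; spacing modifier letters
    (category 'Lm', e.g., aspiration) and modifier symbols ('Sk') are
    also treated as diacritics in this approximation.
    """
    letters, diacritics = [], []
    for ch in unicodedata.normalize("NFD", phoneme):
        cat = unicodedata.category(ch)
        (diacritics if cat in ("Mn", "Lm", "Sk") else letters).append(ch)
    return "".join(letters), diacritics

print(split_ipa("tʰ"))   # ('t', ['ʰ']) - aspirated t
print(split_ipa("ã"))    # ('a', ['̃'])  - nasalized a
```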

27 pages, 17331 KB  
Article
RTACompensator: Leveraging AraBERT and XGBoost for Automated Road Accident Compensation
by Taoufiq El Moussaoui, Awatif Karim, Chakir Loqman and Jaouad Boumhidi
Appl. Syst. Innov. 2025, 8(1), 19; https://doi.org/10.3390/asi8010019 - 24 Jan 2025
Viewed by 1598
Abstract
Road traffic accidents (RTAs) are a significant public health and safety concern, resulting in numerous injuries and fatalities. The growing number of cases referred to traffic accident rooms in courts has underscored the necessity for an automated solution to determine victim indemnifications, particularly given the limited number of specialized judges and the complexity of cases involving multiple victims. This paper introduces RTACompensator, an artificial intelligence (AI)-driven decision support system designed to automate indemnification calculations for road accident victims. The system comprises two main components: a calculation module that determines initial compensation based on factors such as age, salary, and medical assessments, and a machine learning (ML) model that assigns liability based on police accident reports. The model uses Arabic bidirectional encoder representations from transformer (AraBERT) embeddings to generate contextual vectors from the report, which are then processed by extreme gradient boosting (XGBoost) to determine responsibility. The model was trained on a purpose-built Arabic corpus derived from real-world legal judgments. To expand the dataset, two data augmentation techniques were employed: multilingual bidirectional encoder representations from transformers (BERT) and Gemini, developed by Google DeepMind. Experimental results demonstrate the model’s effectiveness, achieving accuracy scores of 97% for the BERT-augmented corpus and 97.3% for the Gemini-augmented corpus. These results underscore the system’s potential to improve decision-making in road accident indemnifications. Additionally, the constructed corpus provides a valuable resource for further research in this domain, laying the groundwork for future advancements in automating and refining the indemnification process. Full article
(This article belongs to the Section Artificial Intelligence)
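
A minimal sketch of the embeddings-plus-XGBoost pipeline described above, assuming the public aubmindlab/bert-base-arabertv2 checkpoint and mean pooling over the last hidden state; the pooling choice and the placeholder reports and labels are illustrative, not details from the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from xgboost import XGBClassifier

# Checkpoint is an assumption: a public AraBERT model on the Hugging Face Hub.
tok = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv2")
enc = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv2")

def embed(texts):
    """Mean-pool the last hidden state into one fixed-size vector per text."""
    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state       # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding positions
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Placeholder accident reports and liability labels, for illustration only.
train_reports = ["تقرير حادث سير رقم واحد", "تقرير حادث سير رقم اثنين"]
y_train = [0, 1]

clf = XGBClassifier(n_estimators=300, max_depth=6)
clf.fit(embed(train_reports), y_train)
```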

23 pages, 4988 KB  
Review
The Evolution of English Medium Instruction Research in Higher Education: A Bibliometric Study
by Akmaral Karabay and Naureen Durrani
Educ. Sci. 2024, 14(9), 982; https://doi.org/10.3390/educsci14090982 - 5 Sep 2024
Cited by 7 | Viewed by 7030
Abstract
The expansion of English medium instruction (EMI) in higher education has generated significant scholarly interest, resulting in an increasing body of research across different contexts. This bibliometric study examines 1522 publications in the Scopus database to explore the intellectual, conceptual, and social structure of the EMI literature in higher education. Findings revealed substantial growth in publications and citations between 1974 and 2024, showing a notable increase in productivity after 2018. Most cited authors focus on EMI within their affiliated country, but some affiliated with British universities have made global contributions. The field exhibits global coverage, albeit with strong dominance by China, Spain, the UK, Australia, and Hong Kong, as well as limited representation from African nations, barring South Africa. EMI networks are primarily driven by authors’ current and past institutional affiliations as well as geographical proximity, with the UK, Spain, and China emerging as leaders in these networks. The most productive journals focus on multilingualism, bilingualism, language policy, teaching, and learning while also encompassing higher education and multidisciplinary areas. Key topics signal a shift towards translanguaging and classroom interaction. Under-researched areas include (post)colonialism and EMI implementation. These findings provide a comprehensive insight into the evolving landscape of EMI research and potential future directions. Full article
(This article belongs to the Section Higher Education)

10 pages, 585 KB  
Technical Note
Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer
by Noé Tits, Prernna Bhatnagar and Thierry Dutoit
Acoustics 2024, 6(3), 772-781; https://doi.org/10.3390/acoustics6030042 - 29 Aug 2024
Cited by 1 | Viewed by 2258
Abstract
In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning, and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model, and a frame-level phoneme classifier trained with forced-alignment labels (from the Montreal Forced Aligner) to produce multilingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state of the art (charsiu) on statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work, but the system's design makes it readily adaptable to them. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
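
To make the frame-level setup concrete, here is a minimal sketch that extracts per-frame phoneme posteriors from a CTC-fine-tuned Wav2Vec2 model; the facebook/wav2vec2-lv-60-espeak-cv-ft checkpoint stands in for the authors' fine-tuned model, and the dimension reduction and frame-level classifier stages are omitted.

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Assumed stand-in checkpoint: a Wav2Vec2 model CTC-fine-tuned for phonemes.
name = "facebook/wav2vec2-lv-60-espeak-cv-ft"
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2ForCTC.from_pretrained(name)

def frame_phoneme_posteriors(waveform, sr=16000):
    """Per-frame phoneme posteriors; alignment then amounts to matching these
    frame-level distributions against the expected phone sequence."""
    inputs = processor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits   # (1, frames, n_labels)
    return logits.softmax(-1).squeeze(0)

posteriors = frame_phoneme_posteriors(np.zeros(16000, dtype=np.float32))
print(posteriors.shape)                              # (frames, n_labels)
```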

48 pages, 5649 KB  
Article
Multimodal Dictionaries for Traditional Craft Education
by Xenophon Zabulis, Nikolaos Partarakis, Valentina Bartalesi, Nicolo Pratelli, Carlo Meghini, Arnaud Dubois, Ines Moreno and Sotiris Manitsaris
Multimodal Technol. Interact. 2024, 8(7), 63; https://doi.org/10.3390/mti8070063 - 18 Jul 2024
Cited by 1 | Viewed by 2528
Abstract
We address the problem of systematizing the authoring of digital dictionaries for craft education from ethnographic studies and recordings. First, we present guidelines for the collection of ethnographic data using digital audio and video and identify terms that are central in the description of crafting actions, products, tools, and materials. Second, we present a classification scheme for craft terms and a way to semantically annotate them, using a multilingual and hierarchical thesaurus, which provides term definitions and a semantic hierarchy of these terms. Third, we link ethnographic resources and open-access data to the identified terms using an online platform for the representation of traditional crafts, associating their definition with illustrations, examples of use, and 3D models. We validate the efficacy of the approach by creating multimedia vocabularies for an online eLearning platform for introductory courses to nine traditional crafts. Full article

20 pages, 2098 KB  
Article
Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method
by Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng
Appl. Sci. 2024, 14(7), 2989; https://doi.org/10.3390/app14072989 - 2 Apr 2024
Cited by 3 | Viewed by 1659
Abstract
Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan adopts rule-based methods, statistical learning, or a combination of the two; these place high demands on the corpus and on the researchers' linguistic expertise, and are costly to annotate manually. In this study, we explore Tibetan SBD using deep learning. First, we analyze the characteristics of Tibetan and various subword techniques, selecting Byte Pair Encoding (BPE) and SentencePiece (SP) for text segmentation and training Bidirectional Encoder Representations from Transformers (BERT) pre-trained language models. Second, we study Tibetan SBD with different BERT pre-trained language models, which learn the ambiguity of the shad (“།”) in different positions in modern Tibetan texts; the model then determines whether a given shad (“།”) functions as a sentence boundary. This study also introduces four models based on BERT, namely BERT-CNN, BERT-RNN, BERT-RCNN, and BERT-DPCNN, for performance comparison. Finally, to verify the performance of the pre-trained language models on the SBD task, this study conducts SBD experiments on both the publicly available Tibetan pre-trained language model TiBERT and the multilingual pre-trained language model (Multi-BERT). The experimental results show that the F1 score of the BERT (BPE) model trained in this study reaches 95.32% on 465,669 Tibetan sentences, nearly five percentage points higher than BERT (SP) and Multi-BERT. The SBD method based on pre-trained language models in this study lays the foundation for building datasets for the later tasks of Tibetan pre-training, summary extraction, and machine translation. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
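
The core task can be framed as binary classification of each shad occurrence. Below is a small, assumed sketch of the data preparation step, extracting contexts around each “།” for a BERT-style classifier; it is illustrative rather than the paper's code.

```python
SHAD = "།"

def shad_contexts(text, window=15):
    """Yield (left_context, right_context, position) for every shad in the
    text; each context pair would then be encoded (e.g., by a BPE-based BERT)
    and classified as sentence boundary vs. non-boundary."""
    for i, ch in enumerate(text):
        if ch == SHAD:
            yield text[max(0, i - window):i], text[i + 1:i + 1 + window], i

for left, right, pos in shad_contexts("བཀྲ་ཤིས་བདེ་ལེགས། ཐུགས་རྗེ་ཆེ།"):
    print(pos, repr(left), repr(right))
```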

17 pages, 12932 KB  
Article
On Isotropy of Multimodal Embeddings
by Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko
Information 2023, 14(7), 392; https://doi.org/10.3390/info14070392 - 10 Jul 2023
Cited by 5 | Viewed by 6623
Abstract
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for improving performance in text processing tasks. This paper is the first comprehensive investigation of the distribution of multimodal embeddings using the example of OpenAI’s CLIP pretrained model. We aimed to deepen the understanding of the embedding space of multimodal embeddings, which has previously been unexplored in this respect, and study the impact on various end tasks. Our initial efforts were focused on measuring the alignment of image and text embedding distributions, with an emphasis on their isotropic properties. In addition, we evaluated several gradient-free approaches to enhance these properties, establishing their efficiency in improving the isotropy/alignment of the embeddings and, in certain cases, the zero-shot classification accuracy. Significantly, our analysis revealed that both CLIP and BERT models yielded embeddings situated within a cone immediately after initialization and preceding training. However, they were mostly isotropic in the local sense. We further extended our investigation to the structure of multilingual CLIP text embeddings, confirming that the observed characteristics were language-independent. By computing the few-shot classification accuracy and point-cloud metrics, we provide evidence of a strong correlation among multilingual embeddings. Transforming embeddings with the methods described in this article makes them easier to visualize. At the same time, multiple experiments that we conducted showed that downstream task performance on the transformed embeddings does not drop substantially (and sometimes even improves). This means that one can obtain an easily visualizable embedding space without substantially losing quality on downstream tasks. Full article
(This article belongs to the Special Issue Information Visualization Theory and Applications)
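
The cone-shaped anisotropy discussed above has a compact numerical illustration: estimate anisotropy as the mean cosine similarity of random embedding pairs, then apply mean-centering, one of the simplest gradient-free transforms. This is a generic sketch, not the paper's exact procedure.

```python
import numpy as np

def anisotropy(emb, n_pairs=10000, seed=0):
    """Mean cosine similarity of random embedding pairs; near 0 is isotropic,
    near 1 means the vectors crowd into a narrow cone."""
    rng = np.random.default_rng(seed)
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    i = rng.integers(0, len(e), n_pairs)
    j = rng.integers(0, len(e), n_pairs)
    return float(np.mean(np.sum(e[i] * e[j], axis=1)))

def center(emb):
    """Gradient-free transform: subtracting the mean vector removes the
    common component that produces the cone."""
    return emb - emb.mean(axis=0, keepdims=True)

emb = np.random.randn(1000, 512) + 5.0   # toy anisotropic cloud (shared offset)
print(anisotropy(emb))                   # high: cone-like
print(anisotropy(center(emb)))           # near zero after centering
```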

32 pages, 4788 KB  
Article
The Language Diamond: An Intercultural Model to Teach and Learn (through) Languages
by Nathalie Auger
Educ. Sci. 2023, 13(5), 520; https://doi.org/10.3390/educsci13050520 - 20 May 2023
Cited by 7 | Viewed by 4234
Abstract
The starting point (1) of our proposal is the observation of the lack of intercultural practices in schools in France, even in the crucial context of teaching French to migrant children (2). Building on previous studies, we then develop theoretical anchors (3) concerning learning territories and ways of recycling language and cultural experiences that can encompass all the parameters of the context (a pan-language approach), in order to elaborate an intercultural model for learning and teaching. The aim is to propose methodological reflections and offer a model that could help change the representations and practices of the educational community regarding multilingualism, so that students’ language and cultural experiences become an asset for academic success (4). This leads to the creation of the intercultural language diamond model to teach and learn (through) languages (5). Projects based on the model provide an opportunity to discuss this proposal: its interests and possible limitations (6). The conclusion (7) advocates the use of the language diamond to counterbalance the ideology that considers diversity a problem, and therefore to adopt a holistic, maximalist point of view: a pan-language and pan-cultural approach that encompasses the complexity of today’s educational challenges. Full article

15 pages, 540 KB  
Article
A Mixed Malay–English Language COVID-19 Twitter Dataset: A Sentiment Analysis
by Jeffery T. H. Kong, Filbert H. Juwono, Ik Ying Ngu, I. Gde Dharma Nugraha, Yan Maraden and W. K. Wong
Big Data Cogn. Comput. 2023, 7(2), 61; https://doi.org/10.3390/bdcc7020061 - 27 Mar 2023
Cited by 9 | Viewed by 7049
Abstract
Social media has evolved into a platform for the dissemination of information, including fake news. There is a lot of false information about the current situation of the Coronavirus Disease 2019 (COVID-19) pandemic, such as false information regarding vaccination. In this paper, we focus on sentiment analysis for Malaysian COVID-19-related news on social media such as Twitter. Tweets in Malaysia are often a combination of Malay, English, and Chinese with plenty of short forms, symbols, emojis, and emoticons within the maximum length of a tweet. The contributions of this paper are twofold. Firstly, we built a multilingual COVID-19 Twitter dataset, comprising tweets written from 1 September 2021 to 12 December 2021. In particular, we collected 108,246 tweets, with over 67% in Malay language, 27% in English, 2% in Chinese, and 4% in other languages. We then manually annotated and assigned the sentiment of 11,568 tweets into three-class sentiments (positive, negative, and neutral) to develop a Malay-language sentiment analysis tool. For this purpose, we applied a data compression method using Byte-Pair Encoding (BPE) on the texts and used two deep learning approaches, i.e., the Multilingual Bidirectional Encoder Representation for Transformer (M-BERT) and convolutional neural network (CNN). BPE tokenization is used to encode rare and unknown words into smaller meaningful subwords. With the CNN, we converted the labeled tweets into image files. Our experiments explored different BPE vocabulary sizes with our BPE-Text-to-Image-CNN and BPE-M-BERT models. The results show that the optimal vocabulary size for BPE is 12,000; any values beyond that would not contribute much to the F1-score. Overall, our results show that BPE-M-BERT slightly outperforms the CNN model, thereby showing that the pre-trained M-BERT network has the advantage for our multilingual dataset. Full article
(This article belongs to the Topic Social Computing and Social Network Analysis)
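
To illustrate the vocabulary-size experiment, here is a minimal sketch of training a BPE tokenizer at the reported optimum of 12,000 with the Hugging Face tokenizers library; the toy tweets and special tokens are placeholders, and the library choice is an assumption.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Placeholder tweets; the paper trains on mixed Malay/English COVID-19 tweets.
tweets = ["dah vaksin dos kedua, lega sangat",
          "queue at the vaccination centre was so long today"]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
# vocab_size=12000 mirrors the optimum reported in the abstract; on a real
# corpus the trainer would actually grow the vocabulary to that size.
trainer = BpeTrainer(vocab_size=12000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(tweets, trainer=trainer)

print(tokenizer.encode("lega dah vaksin").tokens)
```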

17 pages, 966 KB  
Article
Improving Many-to-Many Neural Machine Translation via Selective and Aligned Online Data Augmentation
by Weitai Zhang, Lirong Dai, Junhua Liu and Shijin Wang
Appl. Sci. 2023, 13(6), 3946; https://doi.org/10.3390/app13063946 - 20 Mar 2023
Cited by 3 | Viewed by 3068
Abstract
Multilingual neural machine translation (MNMT) models are theoretically attractive for low- and zero-resource language pairs because of cross-lingual knowledge transfer. Existing approaches mainly focus on English-centric directions and consistently underperform their pivot-based counterparts for non-English directions. In this work, we aim to build a many-to-many MNMT system with an emphasis on the quality of non-English directions by exploring selective and aligned online data augmentation algorithms. Based on our finding that augmented synthetic samples are not a case of “the more, the better”, we propose selective online back-translation (SOBT) and thoroughly study different selection criteria for picking suitable training samples. Furthermore, we boost SOBT with cross-lingual online substitution (CLOS) to align token representations and encourage transfer learning. Our intuition is based on the hypothesis that a universal cross-lingual representation leads to better multilingual translation performance, especially for non-English directions. Compared to previous state-of-the-art many-to-many MNMT models and conventional pivot-based methods, experiments on the IWSLT2014 and OPUS-100 translation benchmarks show that our approach achieves competitive or even better performance on English-centric directions and achieves up to ∼12 BLEU for non-English directions. All of our models and code are publicly available. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
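
A schematic sketch of the selection idea behind SOBT: back-translate monolingual target text, then keep only synthetic pairs that clear a selection threshold. The scoring heuristic and function names are placeholders, not the paper's criteria.

```python
def selective_online_back_translation(mono_tgt, back_translate, score, threshold=0.6):
    """Keep only synthetic pairs whose selection score clears the threshold,
    following the observation that more synthetic data is not always better.

    back_translate: target -> synthetic source (a reverse NMT model);
    score: quality heuristic in [0, 1], e.g., round-trip similarity or
    model confidence. Both are placeholders here.
    """
    kept = []
    for tgt in mono_tgt:
        src = back_translate(tgt)
        if score(src, tgt) >= threshold:
            kept.append((src, tgt))   # synthetic parallel pair for training
    return kept

# Toy demo: the 'model' reverses words, and the score favors short sentences.
pairs = selective_online_back_translation(
    ["ein kleiner Test", "noch ein etwas längerer Beispielsatz ohne Wert"],
    back_translate=lambda s: " ".join(reversed(s.split())),
    score=lambda src, tgt: 1.0 / len(tgt.split()),
    threshold=0.25,
)
print(pairs)
```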
