Symmetry/Asymmetry Studies in Data Mining & Machine Learning of Large Language Models

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: 31 January 2026 | Viewed by 7608

Special Issue Editors

Dr. Shaolin Zhu

Guest Editor

College of Intelligence and Computing, Tianjin University, Tianjin, China
Interests: natural language processing; machine translation

Dr. Lijie Wen

Guest Editor

School of Software, Tsinghua University, Beijing, China
Interests: named entity recognition; relation extraction; natural language inference; abstract meaning representation; text to SQL; robustness and watermark of LLMs, AI, ML, and NLP

Special Issue Information

Dear Colleagues,

The landscape of data mining and machine learning has been dramatically reshaped by the advent of large language models (LLMs). These powerful, data-hungry models have pushed the boundaries of what is possible, achieving unprecedented performance in tasks ranging from natural language understanding to image generation. However, as we delve deeper into the workings of LLMs, the intricate interplay between symmetry and asymmetry becomes increasingly apparent, presenting both opportunities and challenges for future research. Traditionally, symmetry played a central role in data mining and machine learning, with algorithms often seeking to identify recurring patterns and regularities. This approach, while effective in certain domains, has limitations when dealing with the vast and complex datasets that LLMs consume. Asymmetry, in contrast, offers a more nuanced perspective, acknowledging the inherent variability and irregularity within real-world data.

This Special Issue focuses on exploring the implications of symmetry and asymmetry in the context of large language models:

LLMs often rely on massive, diverse datasets that exhibit inherent asymmetry. How can we leverage this asymmetry to improve data representation and encoding within LLMs?
Can we develop new data representation techniques that specifically capture asymmetrical relationships?
How can we incorporate domain-specific knowledge to address asymmetries in the data?
The design of LLMs inherently involves balancing symmetric and asymmetric elements, from the architecture of neural networks to the training process. How can we leverage the interplay between symmetry and asymmetry to optimize model performance and efficiency?
Can we design new neural network architectures that are more adept at handling asymmetrical data?
How can we utilize asymmetric training strategies to enhance model performance and robustness?
Explainability and Interpretability: LLMs are often criticized for their lack of transparency. Can the principles of symmetry and asymmetry contribute to developing more explainable and interpretable LLMs, making their decisions more understandable?
How can we use symmetry and asymmetry to identify key features and relationships that drive LLM predictions?
Can we develop new visualization techniques that highlight the interplay between symmetric and asymmetric patterns in LLM decision-making?
Bias and Fairness: The vast datasets used to train LLMs can contain inherent biases, which may manifest as asymmetrical patterns. How can we use our understanding of symmetry and asymmetry to mitigate bias and promote fairness in LLMs?
How can we identify and mitigate asymmetrical biases that may be present in the training data?
Can we design new methods to measure and quantify the impact of symmetry and asymmetry on fairness in LLM outputs?
Beyond NLP and Vision: LLMs are increasingly being applied in diverse domains beyond natural language processing and computer vision. How do the concepts of symmetry and asymmetry manifest in these new applications, and how can they be leveraged to improve model performance?
How can we apply the principles of symmetry and asymmetry to develop LLMs for tasks like time-series analysis, scientific data analysis, or drug discovery?

Dr. Shaolin Zhu
Dr. Lijie Wen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

symmetry
asymmetry
data mining
machine learning
large language models
deep learning

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

24 pages, 2616 KB

Open Access Article

Symmetric Affix–Context Co-Attention: A Dual-Gating Framework for Robust POS Tagging in Low-Resource MRLs

by Yuan Qi, Samat Ali and Alim Murat

Symmetry 2025, 17(9), 1561; https://doi.org/10.3390/sym17091561 - 18 Sep 2025

Viewed by 493

Abstract

Part-of-speech (POS) tagging in low-resource, morphologically rich languages (LRLs/MRLs) remains challenging due to extensive affixation, high out-of-vocabulary (OOV) rates, and pervasive polysemy. We propose MRL-POS, a unified Transformer-CRF framework that dynamically selects informative affix features and integrates them with deep contextual embeddings via a novel dual-gating co-attention mechanism. First, a Dynamic Affix Selector adaptively adjusts n-gram ranges and frequency thresholds based on word length to ensure high-precision affix segmentation. Second, the Affix–Context Co-Attention Module employs two gating functions that conditionally amplify contextual dimensions with affix cues and vice versa, enabling robust disambiguation of complex and ambiguous forms. Third, Layer-Wise Attention Pooling aggregates multi-layer XLM-RoBERTa representations, emphasizing those most relevant for morphological and syntactic tagging. Evaluations on Uyghur, Kyrgyz, and Uzbek show that MRL-POS achieves an average F₁ of 84.10%, OOV accuracy of 84.24%, and Poly-F₁ of 72.14%, outperforming strong baselines by up to 8 F₁ points. By explicitly modeling the symmetry between morphological affix cues and sentence-level context through a dual-gating co-attention mechanism, MRL-POS achieves a balanced fusion that both preserves local structure and captures global dependencies. Interpretability analyses confirm that 89.1% of the selected affixes align with linguistic expectations. This symmetric design not only enhances robustness in low-resource and agglutinative settings but also offers a general paradigm for symmetry-aware sequence labeling tasks.

(This article belongs to the Special Issue Symmetry/Asymmetry Studies in Data Mining & Machine Learning of Large Language Models)

21 pages, 728 KB

Open Access Article

Resolving Linguistic Asymmetry: Forging Symmetric Multilingual Embeddings Through Asymmetric Contrastive and Curriculum Learning

by Lei Meng, Yinlin Li, Wei Wei and Caipei Yang

Symmetry 2025, 17(9), 1386; https://doi.org/10.3390/sym17091386 - 25 Aug 2025

Viewed by 836

Abstract

The pursuit of universal, symmetric semantic representations within large language models (LLMs) faces a fundamental challenge: the inherent asymmetry of natural languages. Different languages exhibit vast disparities in syntactic structures, lexical choices, and cultural nuances, making the creation of a truly shared, symmetric embedding space a non-trivial task. This paper aims to address this critical problem by introducing a novel framework to forge robust and symmetric multilingual sentence embeddings. Our approach, named DACL (Dynamic Asymmetric Contrastive Learning), is anchored in two powerful asymmetric learning paradigms: Contrastive Learning and Dynamic Curriculum Learning (DCL). We extend Contrastive Learning to the multilingual context, where it asymmetrically treats semantically equivalent sentences from different languages (positive pairs) and sentences with distinct meanings (negative pairs) to enforce semantic symmetry in the target embedding space. To further refine this process, we incorporate Dynamic Curriculum Learning, which introduces a second layer of asymmetry by dynamically scheduling training instances from easy to hard. This dual-asymmetric strategy enables the model to progressively master complex cross-lingual relationships, starting with more obvious semantic equivalences and advancing to subtler ones. Our comprehensive experiments on benchmark cross-lingual tasks, including sentence retrieval and cross-lingual classification (XNLI, PAWS-X, MLDoc, MARC), demonstrate that DACL significantly outperforms a wide range of established baselines. The results validate our dual-asymmetric framework as a highly effective approach for forging robust multilingual embeddings, particularly excelling in tasks involving complex linguistic asymmetries. Ultimately, this work contributes a novel dual-asymmetric learning framework that effectively leverages linguistic asymmetry to achieve robust semantic symmetry across languages. It offers valuable insights for developing more capable, fair, and interpretable multilingual LLMs, emphasizing that deliberately leveraging asymmetry in the learning process is a highly effective strategy.
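
As a rough illustration of the contrastive objective described in this abstract, the sketch below computes an InfoNCE-style loss for one anchor sentence embedding, one positive (for example, a translation) and a list of negatives. The abstract does not state the exact loss used by DACL, so the function names, the use of cosine similarity, and the temperature value are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: -log softmax probability of the positive pair.

    `temperature` is a hypothetical default, not a value from the paper.
    """
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

The loss is small when the anchor is closer to its positive than to any negative and grows as a negative becomes the nearest neighbour. Ordering training pairs by such a score would be one simple way to approximate an easy-to-hard schedule, although that rule is likewise an assumption rather than the paper's procedure.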

27 pages, 490 KB

Open Access Article

Dynamic Asymmetric Attention for Enhanced Reasoning and Interpretability in LLMs

by Feng Wen, Xiaoming Lu, Haikun Yu, Chunyang Lu, Huijie Li and Xiayang Shi

Symmetry 2025, 17(8), 1303; https://doi.org/10.3390/sym17081303 - 12 Aug 2025

Viewed by 826

Abstract

The remarkable success of autoregressive Large Language Models (LLMs) is predicated on the causal attention mechanism, which enforces a static and rigid form of informational asymmetry by permitting each token to attend only to its predecessors. While effective for sequential generation, this hard-coded unidirectional constraint fails to capture the more complex, dynamic, and nonlinear dependencies inherent in sophisticated reasoning, logical inference, and discourse. In this paper, we challenge this paradigm by introducing Dynamic Asymmetric Attention (DAA), a novel mechanism that replaces the static causal mask with a learnable context-aware guidance module. DAA dynamically generates a continuous-valued attention bias for each query–key pair, effectively learning a “soft” information flow policy that guides rather than merely restricts the model’s focus. Trained end-to-end, our DAA-augmented models demonstrate significant performance gains on a suite of benchmarks, including improvements in perplexity on language modeling and notable accuracy boosts on complex reasoning tasks such as code generation (HumanEval) and mathematical problem-solving (GSM8k). Crucially, DAA provides a new lens for model interpretability. By visualizing the learned asymmetric attention patterns, it is possible to uncover the implicit information flow graphs that the model constructs during inference. These visualizations reveal how the model dynamically prioritizes evidence and forges directed logical links in chain-of-thought reasoning, making its decision-making process more transparent. Our work demonstrates that transitioning from a static hard-wired asymmetry to a learned and dynamic one not only enhances model performance but also paves the way for a new class of more capable and profoundly more explainable LLMs.
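
The contrast drawn in this abstract between a static causal mask and a continuous-valued attention bias can be sketched in a few lines. In the paper the bias is produced by a learned, context-aware module; here it is simply passed in as a matrix, so `biased_attention` and any bias values fed to it are placeholders for illustration only. The abstract does not say how DAA handles autoregressive causality, and this sketch does not address that question.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(scores):
    """Static hard mask: query i may attend only to keys j <= i."""
    n = len(scores)
    masked = [
        [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        for i in range(n)
    ]
    return [softmax(row) for row in masked]


def biased_attention(scores, bias):
    """Soft guidance: add a real-valued bias to every query-key score.

    `bias` stands in for the output of a learned module (an assumption here).
    """
    n = len(scores)
    return [
        softmax([scores[i][j] + bias[i][j] for j in range(n)])
        for i in range(n)
    ]
```

With a hard mask, the weight on a blocked position is exactly zero. With a finite bias, every position keeps a non-zero weight that the bias merely raises or lowers, which is the sense in which the bias "guides rather than merely restricts".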

20 pages, 407 KB

Open Access Article

Leveraging Asymmetric Adaptation with Dynamic Sparse LoRA for Enhanced Nuance in LLM-Based Offensive Language Detection

by Yanzhe Wang, Bingquan Chen and Jingchao Sun

Symmetry 2025, 17(7), 1076; https://doi.org/10.3390/sym17071076 - 7 Jul 2025

Viewed by 1255

Abstract

The challenge of detecting nuanced, context-dependent offensive language highlights the need for Large Language Model (LLM) adaptation strategies that can effectively address inherent data and task asymmetries. Standard Parameter-Efficient Finetuning (PEFT) methods like Low-Rank Adaptation (LoRA), while efficient, often employ a more uniform, or symmetric, update mechanism that can be suboptimal for capturing such linguistic subtleties. In this paper, we propose Dynamic Sparse LoRA (DS-LoRA), a novel technique that leverages asymmetric adaptation to enhance LLM finetuning for nuanced offensive language detection. DS-LoRA achieves this by (1) incorporating input-dependent gating mechanisms, enabling the asymmetric modulation of LoRA module contributions based on instance-specific characteristics, and (2) promoting asymmetric sparsity within LoRA update matrices via L1 regularization. This dual asymmetric strategy empowers the model to selectively engage and refine only the most pertinent parameters for a given input, fostering a more parsimonious and contextually aware adaptation. Extensive experiments on benchmark datasets demonstrate DS-LoRA’s significant overperformance over standard LoRA and other strong baselines, particularly in identifying subtle and contextually ambiguous offensive content, underscoring the benefits of its asymmetric adaptive capabilities.
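
The two ingredients named in this abstract, an input-dependent gate on the LoRA update and an L1 penalty on the LoRA matrices, can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions: the scalar sigmoid gate, the parameter `gate_w`, and the penalty weight `lam` are hypothetical choices, since the abstract does not specify the gate's form or where the penalty is applied.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gated_lora_forward(W, A, B, gate_w, x):
    """Return (y, gate) with y = W x + gate(x) * B (A x).

    W is the frozen weight; A (r x d_in) and B (d_out x r) form the low-rank
    update. The gate is a scalar sigmoid of gate_w . x (assumed form).
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    gate = 1.0 / (1.0 + math.exp(-sum(g * xi for g, xi in zip(gate_w, x))))
    return [b + gate * d for b, d in zip(base, delta)], gate


def l1_penalty(A, B, lam):
    """L1 regularization term that pushes entries of A and B toward zero."""
    total = sum(abs(v) for row in A for v in row)
    total += sum(abs(v) for row in B for v in row)
    return lam * total
```

During fine-tuning, the penalty would be added to the task loss so that only the entries of `A` and `B` that help the task stay non-zero, while the gate scales the whole update up or down for each input.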

20 pages, 904 KB

Open Access Article

Addressing Structural Asymmetry: Unsupervised Joint Training of Bilingual Embeddings for Non-Isomorphic Spaces

by Lei Meng, Xiaona Yang, Shangfeng Chen and Xiaojun Zhao

Symmetry 2025, 17(7), 1005; https://doi.org/10.3390/sym17071005 - 26 Jun 2025

Viewed by 639

Abstract

Bilingual Word Embeddings (BWEs) are crucial for multilingual NLP tasks, enabling cross-lingual transfer. While traditional joint training methods require bilingual corpora, their applicability is limited for many language pairs, especially low-resource ones. Unsupervised methods, relying on the isomorphism assumption, suffer from performance degradation when dealing with non-isomorphic embedding spaces, which are common in distant language pairs. This structural asymmetry challenges conventional approaches. To address these limitations, we propose a novel unsupervised joint training method for BWEs. We leverage monolingual corpora and introduce a dynamic programming algorithm to extract bilingual text segments, facilitating concurrent BWE training without relying on explicit bilingual supervision. Our approach effectively mitigates the challenge posed by asymmetric, non-isomorphic spaces by jointly learning BWEs in a shared space. Extensive experiments demonstrate the superiority of our method compared to existing approaches, particularly for distant language pairs exhibiting significant structural asymmetry.

21 pages, 23794 KB

Open Access Article

Towards Faithful Local Explanations: Leveraging SVM to Interpret Black-Box Machine Learning Models

by Jiaxiang Xu, Zhanhao Zhang, Junfei Wang, Biao Ouyang, Benkuan Zhou, Jianxiong Zhao, Hanfang Ge and Bo Xu

Symmetry 2025, 17(6), 950; https://doi.org/10.3390/sym17060950 - 15 Jun 2025

Viewed by 674

Abstract

Although machine learning (ML) models are widely used in many fields, their prediction processes are often hard to understand. This lack of transparency makes it harder for people to trust them, especially in high-stakes fields like healthcare and finance. Human-interpretable explanations for model predictions are crucial in these contexts. While existing local interpretation methods have been proposed, many suffer from low local fidelity, instability, and limited effectiveness when applied to highly nonlinear models. This paper presents SVM-X, a model-agnostic local explanation approach designed to address these challenges. By leveraging the inherent symmetry of the SVM hyperplane, SVM-X precisely captures the local decision boundaries of complex nonlinear models, providing more accurate and stable explanations. Experimental evaluations on the UCI Adult dataset, the Bank Marketing dataset, and the Amazon Product Review dataset demonstrate that SVM-X consistently outperforms state-of-the-art methods like LIME and LEMNA. Notably, SVM-X achieves up to a 27.2% improvement in accuracy. Our work introduces a reliable and interpretable framework for understanding machine learning predictions, offering a promising new direction for future research.

17 pages, 1801 KB

Open Access Article

Addressing Asymmetry in Contrastive Learning: LLM-Driven Sentence Embeddings with Ranking and Label Smoothing

by Yan Huang, Shaoben Zhu, Wei Liu, Jiayi Wang and Xinheng Wei

Symmetry 2025, 17(5), 646; https://doi.org/10.3390/sym17050646 - 25 Apr 2025

Cited by 1 | Viewed by 1877

Abstract

Unsupervised sentence embedding, vital for numerous NLP tasks, struggles with the inherent asymmetry of semantic relationships within contrastive learning (CL). This paper proposes Label Smoothing-based Ranking Negative Sampling (LS-RNS), a novel framework that directly tackles the semantic asymmetry between anchor and negative samples in CL. LS-RNS utilizes a Large Language Model (LLM) to assess fine-grained asymmetric similarity scores between sentences, constructing a ranking-aware negative sampling strategy combined with adaptive label smoothing. This design encourages the model to learn more effectively from informative negatives that are semantically closer to the anchor, leading to asymmetry-aware sentence embeddings. Experiments on standard Semantic Textual Similarity (STS) benchmarks (STS12–STS16, STS-B, SICK-R) show that LS-RNS achieves state-of-the-art performance. We adopt Spearman’s rank correlation coefficient as the primary evaluation metric for semantic similarity tasks, and we use classification accuracy for downstream and transfer tasks. LS-RNS achieves 79.87 on STS tasks with BERT-base (vs. 76.25 for SimCSE, +3.62) and 80.41 with RoBERTa-base (vs. 79.18 for DiffCSE). On transfer tasks, it attains 88.82 (BERT) and 87.68 (RoBERTa), consistently outperforming PromptBERT and SNCSE. On STL-10, LS-RNS improves SimCLR top-one accuracy from 79.50% to 80.52% with ResNet-18 and from 68.91% to 72.19% with VGG-16, even enabling a shallow ResNet-18 to surpass a deeper ResNet-34 baseline. These results confirm the modality-agnostic effectiveness of LS-RNS and its potential to redefine contrastive learning objectives by modeling semantic asymmetry, rather than relying solely on encoder depth or pre-training objectives.
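
For readers unfamiliar with the evaluation metric named in this abstract, the sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, which is the standard definition. The function names are chosen for this example, and the code is not taken from the paper's evaluation scripts; it assumes both inputs contain at least two distinct values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

In an STS setting, `xs` would hold the model's cosine similarities for each sentence pair and `ys` the human similarity ratings. STS results of the kind quoted above are conventionally this coefficient multiplied by 100.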
