Symmetry/Asymmetry Studies in Data Mining & Machine Learning of Large Language Models

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: 31 July 2025

Special Issue Editors


Guest Editor
College of Intelligence and Computing, Tianjin University, Tianjin, China
Interests: natural language processing; machine translation

Guest Editor
School of Software, Tsinghua University, Beijing, China
Interests: named entity recognition; relation extraction; natural language inference; abstract meaning representation; text-to-SQL; robustness and watermarking of LLMs, AI, ML, and NLP

Special Issue Information

Dear Colleagues,

The landscape of data mining and machine learning has been dramatically reshaped by the advent of large language models (LLMs). These powerful, data-hungry models have pushed the boundaries of what is possible, achieving unprecedented performance in tasks ranging from natural language understanding to image generation. However, as we delve deeper into the workings of LLMs, the intricate interplay between symmetry and asymmetry becomes increasingly apparent, presenting both opportunities and challenges for future research.

Traditionally, symmetry has played a central role in data mining and machine learning, with algorithms often seeking to identify recurring patterns and regularities. This approach, while effective in certain domains, has limitations when dealing with the vast and complex datasets that LLMs consume. Asymmetry, in contrast, offers a more nuanced perspective, acknowledging the inherent variability and irregularity within real-world data.

This Special Issue focuses on exploring the implications of symmetry and asymmetry in the context of large language models:

  • LLMs often rely on massive, diverse datasets that exhibit inherent asymmetry. How can we leverage this asymmetry to improve data representation and encoding within LLMs?
      ◦ Can we develop new data representation techniques that specifically capture asymmetrical relationships?
      ◦ How can we incorporate domain-specific knowledge to address asymmetries in the data?
  • The design of LLMs inherently involves balancing symmetric and asymmetric elements, from the architecture of neural networks to the training process. How can we leverage the interplay between symmetry and asymmetry to optimize model performance and efficiency?
      ◦ Can we design new neural network architectures that are more adept at handling asymmetrical data?
      ◦ How can we utilize asymmetric training strategies to enhance model performance and robustness?
  • Explainability and Interpretability: LLMs are often criticized for their lack of transparency. Can the principles of symmetry and asymmetry contribute to developing more explainable and interpretable LLMs, making their decisions more understandable?
      ◦ How can we use symmetry and asymmetry to identify key features and relationships that drive LLM predictions?
      ◦ Can we develop new visualization techniques that highlight the interplay between symmetric and asymmetric patterns in LLM decision-making?
  • Bias and Fairness: The vast datasets used to train LLMs can contain inherent biases, which may manifest as asymmetrical patterns. How can we use our understanding of symmetry and asymmetry to mitigate bias and promote fairness in LLMs?
      ◦ How can we identify and mitigate asymmetrical biases that may be present in the training data?
      ◦ Can we design new methods to measure and quantify the impact of symmetry and asymmetry on fairness in LLM outputs?
  • Beyond NLP and Vision: LLMs are increasingly being applied in diverse domains beyond natural language processing and computer vision. How do the concepts of symmetry and asymmetry manifest in these new applications, and how can they be leveraged to improve model performance?
      ◦ How can we apply the principles of symmetry and asymmetry to develop LLMs for tasks like time-series analysis, scientific data analysis, or drug discovery?

Dr. Shaolin Zhu
Dr. Lijie Wen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • symmetry
  • asymmetry
  • data mining
  • machine learning
  • large language models
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (1 paper)

Research

17 pages, 1801 KiB  
Article
Addressing Asymmetry in Contrastive Learning: LLM-Driven Sentence Embeddings with Ranking and Label Smoothing
by Yan Huang, Shaoben Zhu, Wei Liu, Jiayi Wang and Xinheng Wei
Symmetry 2025, 17(5), 646; https://doi.org/10.3390/sym17050646 - 25 Apr 2025
Abstract
Unsupervised sentence embedding, vital for numerous NLP tasks, struggles with the inherent asymmetry of semantic relationships within contrastive learning (CL). This paper proposes Label Smoothing-based Ranking Negative Sampling (LS-RNS), a novel framework that directly tackles the semantic asymmetry between anchor and negative samples in CL. LS-RNS utilizes a large language model (LLM) to assess fine-grained asymmetric similarity scores between sentences, constructing a ranking-aware negative sampling strategy combined with adaptive label smoothing. This design encourages the model to learn more effectively from informative negatives that are semantically closer to the anchor, leading to asymmetry-aware sentence embeddings. Experiments on standard Semantic Textual Similarity (STS) benchmarks (STS12–STS16, STS-B, SICK-R) show that LS-RNS achieves state-of-the-art performance. We adopt Spearman’s rank correlation coefficient as the primary evaluation metric for semantic similarity tasks, and we use classification accuracy for downstream and transfer tasks. LS-RNS achieves 79.87 on STS tasks with BERT-base (vs. 76.25 for SimCSE, +3.62) and 80.41 with RoBERTa-base (vs. 79.18 for DiffCSE). On transfer tasks, it attains 88.82 (BERT) and 87.68 (RoBERTa), consistently outperforming PromptBERT and SNCSE. On STL-10, LS-RNS improves SimCLR top-1 accuracy from 79.50% to 80.52% with ResNet-18 and from 68.91% to 72.19% with VGG-16, even enabling a shallow ResNet-18 to surpass a deeper ResNet-34 baseline. These results confirm the modality-agnostic effectiveness of LS-RNS and its potential to redefine contrastive learning objectives by modeling semantic asymmetry, rather than relying solely on encoder depth or pre-training objectives.
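
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of Spearman's rank correlation coefficient: both score lists are converted to ranks (ties share the mean rank) and the Pearson correlation of the ranks is returned. The function names `average_ranks` and `spearman` and the sample scores are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every item tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: human similarity judgments vs. embedding cosine scores.
gold = [5.0, 3.2, 0.4, 4.1]
predicted = [0.91, 0.60, 0.15, 0.72]
rho = spearman(gold, predicted)  # the two lists are ordered identically
```

Because only the ordering of scores matters, a model is not penalized for producing similarities on a different scale from the human judgments, which is presumably why the metric suits STS benchmarks.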