Computer Vision, Pattern Recognition, Machine Learning, and Symmetry, 2nd Edition

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: 31 January 2026

Special Issue Editors


Dr. Longfei Zhang
Guest Editor
Key Laboratory of Digital Performance and Simulation Technology, Beijing Institute of Technology, Beijing 100081, China
Interests: multimedia retrieval; computer vision; machine learning; digital performance

Dr. William Steingartner
Guest Editor
Department of Computers and Informatics, Technical University of Košice, 040 01 Košice, Slovakia
Interests: semantics of programming languages; software engineering; formal methods in software engineering

Special Issue Information

Dear Colleagues,

We invite you to submit your work to this Special Issue, "Computer Vision, Pattern Recognition, Machine Learning, and Symmetry", on the topic of symmetry/asymmetry. This Special Issue seeks high-quality contributions on computer vision, pattern recognition, and machine learning in relation to symmetry, covering both theory and applications that solve practical problems.

This Special Issue of Symmetry will collect articles on solving real-world problems with data- and learning-centric technologies, including computer vision, pattern recognition, and the interplay between machine learning and symmetry. We are soliciting contributions covering all related topics, including, but not limited to, vision, multimedia, biometrics, behavior analysis, adversarial learning, simulation, network security, the Internet of Things, and performance. The main criterion for submission is an innovative, theoretically or application-centered method aimed at solving real-world problems. There is no limit on the number of pages, but submissions must demonstrate an understanding of the theme and a contribution to the topic.

Dr. Longfei Zhang
Dr. William Steingartner
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, you can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • applied statistics
  • pattern recognition
  • behavior analysis
  • artificial intelligence
  • machine learning
  • adversarial learning
  • reinforcement learning
  • deep learning
  • emerging technologies (telecommunications, blockchain, Internet of Things, cyber security, digital performance, smart creativity)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (8 papers)


Research

17 pages, 1697 KB  
Article
Football-YOLO: A Lightweight and Symmetry-Aware Football Detection Model with an Enlarged Receptive Field
by Jingjing Zhou, Hongyang Liu, Gang Zhao and Ying Gao
Symmetry 2025, 17(12), 2046; https://doi.org/10.3390/sym17122046 - 1 Dec 2025
Abstract
In modern elite football, accurate ball localization is increasingly vital for smooth match flow and reliable officiating. Yet mainstream detectors still struggle with small objects like footballs in cluttered scenes due to limited receptive fields, weak feature representations, and non-trivial computational cost. To address these issues and introduce structural symmetry, we propose a lightweight framework that balances model complexity and representational completeness. Concretely, we design a Dynamic clustering C3k2 module (DcC3k2) to enlarge the effective receptive field and preserve local–global symmetry, and a SegNeXt-based noise-attentive C3k2 module (SNAC3k2) to perform multi-scale suppression of background interference. For efficient feature extraction, we adopt GhostNetV2—a lightweight convolutional backbone—thereby maintaining computational symmetry and speed. Experiments on a Football dataset show that our approach improves mAP by 3.4% over strong baselines while reducing computation by 2.2%. These results validate symmetry-aware lightweight design as a promising direction for high-precision small-object detection in football analytics.
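
The enlarged-receptive-field idea lends itself to a short sketch. The block below is a hypothetical stand-in for the paper's DcC3k2 module (whose internals are not reproduced here): parallel dilated 3x3 convolutions widen the effective receptive field, and a dilated 3x3 kernel keeps nine weights while covering a wider window. The class name, dilation rates, and channel sizes are assumptions chosen for illustration.

```python
# Hypothetical sketch (not the authors' DcC3k2): enlarging the effective
# receptive field for small-object detection with parallel dilated 3x3
# convolutions fused by a 1x1 projection.
import torch
import torch.nn as nn

class DilatedRFBlock(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates; the 1x1
    fusion restores the original channel count."""
    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(out)) + x  # residual keeps training stable

x = torch.randn(1, 64, 80, 80)      # e.g., a P3-level detection feature map
print(DilatedRFBlock(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```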

18 pages, 12803 KB  
Article
AHLFNet: Adaptive High–Low Frequency Collaborative Auxiliary Feature Alignment Network
by Chunguang Yue and Jinbao Li
Symmetry 2025, 17(11), 1952; https://doi.org/10.3390/sym17111952 - 13 Nov 2025
Abstract
Dense image prediction tasks require both strong semantic category information and precise boundary delineation in order to be effectively applied to downstream applications. However, existing networks typically fuse deep coarse features with adjacent fine features directly through upsampling. Such a straightforward upsampling strategy not only blurs boundaries due to the loss of high-frequency information, but also amplifies intra-class conflicts caused by high-frequency interference within the same object. To address these issues, this paper proposes an Adaptive High–Low Frequency Collaborative Auxiliary Feature Alignment Network (AHLFNet), which consists of an Adaptive Low-Frequency Multi-Kernel Smoothing Unit (ALFU), a Gate-Controlled Selector (GCS), and an Adaptive High-Frequency Edge Enhancement Unit (AHFU). The ALFU suppresses high-frequency components within objects, mitigating interference during upsampling and thereby reducing intra-class conflicts. The GCS adaptively chooses suitable convolutional kernels based on the size of similar regions to ensure accurate upsampled features. The AHFU preserves high-frequency details from low-level features, enabling more refined boundary delineation. Extensive experiments demonstrate that the proposed network achieves state-of-the-art performance across various downstream tasks.
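
As a rough illustration of the high/low-frequency collaboration described above (not the authors' ALFU, GCS, or AHFU), the sketch below treats average pooling as a low-pass filter, treats the residual as high-frequency content, and blends the two with a learned per-pixel gate. The kernel size and gating layer are assumptions.

```python
# Hypothetical sketch of a high/low frequency split with gated fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreqSplitFusion(nn.Module):
    def __init__(self, channels: int, kernel: int = 5):
        super().__init__()
        self.kernel = kernel
        # per-pixel gate in (0, 1) trading smoothness against edge detail
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        pad = self.kernel // 2
        low = F.avg_pool2d(x, self.kernel, stride=1, padding=pad)  # low-pass
        high = x - low                                             # high-pass
        g = self.gate(x)
        # g -> 1 inside objects favours the smooth component (fewer
        # intra-class conflicts); g -> 0 at boundaries keeps the edges
        return low + (1.0 - g) * high

x = torch.randn(2, 32, 64, 64)
print(FreqSplitFusion(32)(x).shape)  # torch.Size([2, 32, 64, 64])
```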

23 pages, 11276 KB  
Article
EP-REx: Evidence-Preserving Receptive-Field Expansion for Efficient Crack Segmentation
by Sanghyuck Lee, Jeongwon Lee, Timur Khairulov, Daehyeon Kim and Jaesung Lee
Symmetry 2025, 17(10), 1653; https://doi.org/10.3390/sym17101653 - 4 Oct 2025
Abstract
Crack segmentation plays a vital role in ensuring structural safety, yet practical deployment on resource-limited platforms demands models that balance accuracy with efficiency. While high-accuracy models often rely on computationally heavy designs to expand their receptive fields, recent lightweight approaches typically delay this expansion to the deepest, low-resolution layers to maintain efficiency. This design choice leaves long-range context underutilized in the early, high-resolution stages, where fine-grained evidence is most intact. In this paper, we propose an evidence-preserving receptive-field expansion network, which integrates a multi-scale dilated block to efficiently capture long-range context from the earliest stages and an input-guided gate that leverages grayscale conversion, average pooling, and gradient extraction to highlight crack evidence directly from raw inputs. Experiments on six benchmark datasets demonstrate that the proposed network achieves consistently higher accuracy under lightweight constraints. Each of the three proposed variants—Base, Small, and Tiny—outperforms its corresponding baselines with larger parameter counts, surpassing a total of 13 models. For example, the Base variant reduces parameters by 66% compared to the second-best CrackFormer II and floating-point operations by 53% on the Ceramic dataset, while still delivering superior accuracy. Pareto analyses further confirm that the proposed model establishes a superior accuracy–efficiency trade-off across parameters and floating-point operations.
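
The input-guided gate can be sketched compactly. The function below is a hypothetical reading of the three ingredients the abstract names (grayscale conversion, average pooling, and gradient extraction), using a Sobel operator for the gradients; it is illustrative, not the published EP-REx code.

```python
# Hypothetical sketch of an input-guided gate built from raw-image evidence.
import torch
import torch.nn.functional as F

def input_guided_gate(image: torch.Tensor, feat_hw) -> torch.Tensor:
    """image: (B, 3, H, W) RGB in [0, 1]; returns a (B, 1, h, w) gate."""
    # luminance-style grayscale conversion
    r, g, b = image[:, 0:1], image[:, 1:2], image[:, 2:3]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # average-pool down to the resolution of the feature map being gated
    gray = F.adaptive_avg_pool2d(gray, feat_hw)
    # Sobel gradient magnitude as cheap, training-free crack evidence
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=image.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return torch.sigmoid(mag)  # gate in (0, 1), high near crack-like edges

gate = input_guided_gate(torch.rand(1, 3, 256, 256), (64, 64))
print(gate.shape)  # torch.Size([1, 1, 64, 64])
```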

16 pages, 4587 KB  
Article
FAMNet: A Lightweight Stereo Matching Network for Real-Time Depth Estimation in Autonomous Driving
by Jingyuan Zhang, Qiang Tong, Na Yan and Xiulei Liu
Symmetry 2025, 17(8), 1214; https://doi.org/10.3390/sym17081214 - 1 Aug 2025
Abstract
Accurate and efficient stereo matching is fundamental to real-time depth estimation from symmetric stereo cameras in autonomous driving systems. However, existing high-accuracy stereo matching networks typically rely on computationally expensive 3D convolutions, which limit their practicality in real-world environments. In contrast, real-time methods often sacrifice accuracy or generalization capability. To address these challenges, we propose FAMNet (Fusion Attention Multi-Scale Network), a lightweight and generalizable stereo matching framework tailored for real-time depth estimation in autonomous driving applications. FAMNet consists of two novel modules: Fusion Attention-based Cost Volume (FACV) and Multi-scale Attention Aggregation (MAA). FACV constructs a compact yet expressive cost volume by integrating multi-scale correlation, attention-guided feature fusion, and channel reweighting, thereby reducing reliance on heavy 3D convolutions. MAA further enhances disparity estimation by fusing multi-scale contextual cues through pyramid-based aggregation and dual-path attention mechanisms. Extensive experiments on the KITTI 2012 and KITTI 2015 benchmarks demonstrate that FAMNet achieves a favorable trade-off between accuracy, efficiency, and generalization. On KITTI 2015, with the incorporation of FACV and MAA, the prediction accuracy of the baseline model is improved by 37% and 38%, respectively, and a total improvement of 42% is achieved by our final model. These results highlight FAMNet’s potential for practical deployment in resource-constrained autonomous driving systems requiring real-time and reliable depth perception.
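
A correlation cost volume is the standard lightweight alternative to concatenation volumes processed by 3D convolutions. The sketch below shows only the generic construction; FAMNet's FACV additionally applies attention-guided fusion and channel reweighting, which are omitted here.

```python
# Generic correlation cost volume for stereo matching (not FAMNet's FACV).
import torch

def correlation_cost_volume(left: torch.Tensor, right: torch.Tensor,
                            max_disp: int) -> torch.Tensor:
    """left/right: (B, C, H, W) features from a shared (symmetric) encoder;
    returns a (B, max_disp, H, W) volume of per-pixel similarities."""
    b, c, h, w = left.shape
    volume = left.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (left * right).mean(dim=1)
        else:
            # compare the left pixel at x with the right pixel at x - d
            volume[:, d, :, d:] = (left[..., d:] * right[..., :-d]).mean(dim=1)
    return volume

left = torch.randn(1, 32, 60, 128)
right = torch.randn(1, 32, 60, 128)
print(correlation_cost_volume(left, right, 24).shape)  # (1, 24, 60, 128)
```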

27 pages, 14879 KB  
Article
Research on AI-Driven Classification Possibilities of Ball-Burnished Regular Relief Patterns Using Mixed Symmetrical 2D Image Datasets Derived from 3D-Scanned Topography and Photo Camera
by Stoyan Dimitrov Slavov, Lyubomir Si Bao Van, Marek Vozár, Peter Gogola and Diyan Minkov Dimitrov
Symmetry 2025, 17(7), 1131; https://doi.org/10.3390/sym17071131 - 15 Jul 2025
Abstract
The present research is related to the application of artificial intelligence (AI) approaches for classifying surface textures, specifically regular relief patterns formed by ball burnishing operations. A two-stage methodology is employed, starting with the creation of regular reliefs (RRs) on test parts by ball burnishing, followed by 3D topography scanning with an Alicona device and data preprocessing with Gwyddion and Blender software, where the acquired 3D topographies are converted into a set of 2D images using various virtual camera movements and lighting to simulate the symmetrical fluctuations around the tool-path of the real camera. Four pre-trained convolutional neural networks (DenseNet121, EfficientNetB0, MobileNetV2, and VGG16) are used as a base for transfer learning and tested for their generalization performance on different combinations of synthetic and real image datasets. The models were evaluated using confusion matrices and four additional metrics. The results show that the pretrained VGG16 model generalizes best on regular relief textures (96%), in comparison with the other models, when subjected to transfer learning via feature extraction using a mixed dataset consisting of 34,037 images in the following proportions: non-textured synthetic (87%), textured synthetic (8%), and real captured (5%) images of such a regular relief.
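
Transfer learning via feature extraction, the regime under which VGG16 performed best in this study, can be sketched as follows: freeze the pretrained convolutional backbone and train only a replacement classification head. The class count, learning rate, and layer choices below are assumptions, not values from the paper.

```python
# Sketch of transfer learning via feature extraction with a frozen VGG16.
import torch
import torch.nn as nn
from torchvision import models

NUM_RELIEF_CLASSES = 4  # assumed number of regular-relief pattern classes

model = models.vgg16(weights="IMAGENET1K_V1")
for p in model.features.parameters():
    p.requires_grad = False  # feature extraction: convolutional base is fixed

# swap the ImageNet output layer for a head over relief-pattern classes
model.classifier[6] = nn.Linear(4096, NUM_RELIEF_CLASSES)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)  # a batch of 2D texture images
labels = torch.randint(0, NUM_RELIEF_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```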

23 pages, 5304 KB  
Article
Improvement and Optimization of Underwater Image Target Detection Accuracy Based on YOLOv8
by Yisong Sun, Wei Chen, Qixin Wang, Tianzhong Fang and Xinyi Liu
Symmetry 2025, 17(7), 1102; https://doi.org/10.3390/sym17071102 - 9 Jul 2025
Abstract
The ocean encompasses the majority of the Earth’s surface and harbors substantial energy resources. Nevertheless, the intricate and asymmetrically distributed underwater environment renders existing target detection performance inadequate. This paper presents an enhanced YOLOv8s approach for underwater robot object detection to address issues of subpar image quality and low recognition accuracy. The specific measures are as follows: first, to reduce the model’s parameter count, we optimized the ninth convolutional layer by substituting certain conventional convolutions with the adaptive deformable convolution DCNv4. This modification aims to more effectively capture the deformation and intricate features of underwater targets while simultaneously decreasing the parameter count and enhancing the model’s ability to manage the deformation challenges presented by underwater images. Furthermore, the Triplet Attention module is implemented to augment the model’s capacity for detecting multi-scale targets. The integration of low-level superficial features with high-level semantic features enhances the feature expression capability. The original CIoU loss function was ultimately substituted with Shape-IoU, enhancing the model’s performance. In the underwater robot grasping experiment, the system shows particular robustness in handling radial symmetry in marine organisms and reflection symmetry in artificial structures. The enhanced algorithm attained a mean Average Precision (mAP) of 87.6%, surpassing the original YOLOv8s model by 3.4%, resulting in a marked enhancement of the object detection model’s performance and fulfilling the real-time detection criteria for underwater robots.
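
Triplet Attention is a previously published module (Misra et al.) rather than a contribution of this paper, and a simplified sketch conveys its shape: three symmetric branches gate the (H, W), (C, W), and (H, C) planes via tensor rotation, max-plus-mean pooling, and a 7x7 convolution. The implementation below is a compact reading for illustration, not the exact code used here.

```python
# Simplified sketch of Triplet Attention: three rotation branches, each with
# a max+mean "Z-pool" followed by a 7x7 convolution and a sigmoid gate.
import torch
import torch.nn as nn

class ZPoolGate(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x):
        pooled = torch.cat([x.max(dim=1, keepdim=True).values,
                            x.mean(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled))

class TripletAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.hw, self.cw, self.ch = ZPoolGate(), ZPoolGate(), ZPoolGate()

    def forward(self, x):                # x: (B, C, H, W)
        out_hw = x * self.hw(x)          # spatial gate over the (H, W) plane
        x_cw = x.permute(0, 2, 1, 3)     # (B, H, C, W): gate the (C, W) plane
        out_cw = (x_cw * self.cw(x_cw)).permute(0, 2, 1, 3)
        x_ch = x.permute(0, 3, 2, 1)     # (B, W, H, C): gate the (H, C) plane
        out_ch = (x_ch * self.ch(x_ch)).permute(0, 3, 2, 1)
        return (out_hw + out_cw + out_ch) / 3.0  # average the three branches

x = torch.randn(1, 64, 32, 32)
print(TripletAttention()(x).shape)  # torch.Size([1, 64, 32, 32])
```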

21 pages, 8372 KB  
Article
Audio-Visual Learning for Multimodal Emotion Recognition
by Siyu Fan, Jianan Jing and Chongwen Wang
Symmetry 2025, 17(3), 418; https://doi.org/10.3390/sym17030418 - 11 Mar 2025
Cited by 1
Abstract
Most current emotion recognition methods are limited to a single- or dual-modality approach, neglecting the rich information embedded in other modalities. This limitation hampers the accurate identification of complex or subtle emotional expressions. Additionally, to reduce the computational cost during inference, minimizing the model’s parameter size is essential. To address these challenges, we utilize the concept of symmetry to design a balanced multimodal architecture that integrates facial expressions, speech, and body posture information, aiming to enhance both recognition performance and computational efficiency. By leveraging the E-Branchformer network and using the F1-score as the primary performance evaluation metric, the experiments are mainly conducted on the CREMA-D corpus. The experimental results demonstrate that the proposed model outperforms baseline models on the CREMA-D dataset and an extended dataset incorporating eNTERFACE’05, achieving significant performance improvements while reducing the number of parameters. These findings demonstrate the effectiveness of the proposed approach and provide a new technical solution for the field of emotion recognition.
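
The balanced (symmetric) multimodal design can be pictured as a late-fusion head over same-sized per-modality embeddings. The sketch below stubs out the encoders as linear layers; the feature dimensions and the six CREMA-D emotion classes are assumptions, and the paper's actual model builds on E-Branchformer rather than these stand-ins.

```python
# Sketch of symmetric late fusion over face, speech, and posture embeddings.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, embed_dim: int = 256, num_emotions: int = 6):
        super().__init__()
        # stand-ins for the per-modality encoders; each maps its modality to
        # the same embedding size, keeping the three branches symmetric
        self.face = nn.Linear(512, embed_dim)
        self.speech = nn.Linear(128, embed_dim)
        self.posture = nn.Linear(64, embed_dim)
        self.head = nn.Sequential(
            nn.LayerNorm(3 * embed_dim),
            nn.Linear(3 * embed_dim, num_emotions),
        )

    def forward(self, face_feat, speech_feat, posture_feat):
        z = torch.cat([self.face(face_feat),
                       self.speech(speech_feat),
                       self.posture(posture_feat)], dim=-1)
        return self.head(z)  # emotion logits

model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 6])
```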

34 pages, 8852 KB  
Article
A Biologically Inspired Model for Detecting Object Motion Direction in Stereoscopic Vision
by Yuxiao Hua, Sichen Tao, Yuki Todo, Tianqi Chen, Zhiyu Qiu and Zheng Tang
Symmetry 2025, 17(2), 162; https://doi.org/10.3390/sym17020162 - 22 Jan 2025
Abstract
This paper presents a biologically inspired model, the Stereoscopic Direction Detection Mechanism (SDDM), designed to detect motion direction in three-dimensional space. The model addresses two key challenges: the lack of biological interpretability in current deep learning models and the limited exploration of binocular functionality in existing biologically inspired models. Rooted in the fundamental concept of ’disparity’, the SDDM is structurally divided into components representing the left and right eyes. Each component mimics the layered architecture of the human visual system, from the retinal layer to the primary visual cortex. By replicating the functions of various cells involved in stereoscopic motion direction detection, the SDDM offers enhanced biological plausibility and interpretability. Extensive experiments were conducted to evaluate the model’s detection accuracy for various objects and its robustness against different types of noise. Additionally, to ascertain whether the SDDM matches the performance of established deep learning models in the field of three-dimensional motion direction detection, its performance was benchmarked against EfficientNet and ResNet under identical conditions. The results demonstrate that the SDDM not only exhibits strong performance and robust biological interpretability but also requires significantly lower hardware and time costs compared to advanced deep learning models.
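
The ’disparity’ cue at the heart of stereoscopic processing can be demonstrated with a classical block-matching baseline; this is a textbook method included for orientation, not the SDDM's cell-level circuitry.

```python
# Classical block-matching disparity: for each left-image pixel, find the
# horizontal shift that minimises the patch difference in the right image.
import numpy as np

def block_matching_disparity(left: np.ndarray, right: np.ndarray,
                             max_disp: int = 16, patch: int = 5) -> np.ndarray:
    """left/right: (H, W) grayscale images; returns an (H, W) disparity map."""
    h, w = left.shape
    half = patch // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.abs(ref - cand).sum()  # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

rng = np.random.default_rng(0)
left = rng.random((32, 48)).astype(np.float32)
right = np.roll(left, -3, axis=1)  # synthetic scene with a 3-pixel disparity
print(block_matching_disparity(left, right).mean())  # near 3 in the interior
```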
