Representation Learning for Computer Vision and Pattern Recognition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (28 February 2024) | Viewed by 9590

Special Issue Editors


Guest Editor
Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
Interests: machine learning; pattern recognition; learning-based vision problems

Guest Editor
Center for Mathematical Artificial Intelligence (CMAI), Department of Mathematics, The Chinese University of Hong Kong, Hong Kong, China
Interests: artificial intelligence and its applications to computer vision

Guest Editor
School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
Interests: image processing; machine learning

Special Issue Information

Dear Colleagues,

Representation learning has long been an important research area in Computer Vision and Pattern Recognition. A good representation of practical data is critical to achieving satisfactory performance. Broadly speaking, such a representation can be an "intra-data representation" or an "inter-data representation". Intra-data representation focuses on extracting or refining the raw features of a data point itself. Representative methods range from early-stage hand-crafted feature design (e.g., SIFT, LBP, HoG), through the feature extraction (e.g., PCA, LDA, LLE) and feature selection (e.g., sparsity-based and submodularity-based) methods established over the past two decades, to the recent development of deep neural networks (e.g., CNN, RNN, GNN, GAN). Inter-data representation characterizes the relationships between different data points or the structure carried by the dataset as a whole. For example, metric learning, kernel learning and causality reasoning investigate the spatial or temporal relationships among different examples, while subspace learning, manifold learning and clustering discover the underlying structural properties inherent in the dataset.
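As a concrete instance of the intra-data feature extraction methods named above, PCA maps raw features onto the directions of maximal variance. A minimal NumPy sketch, for illustration only:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    # right singular vectors of the centered data = eigenvectors
    # of the covariance matrix, sorted by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # k-dimensional codes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                   # 100 points, 10 raw features
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The same "project onto a learned subspace" pattern underlies LDA and many subspace learning methods, differing only in the criterion that selects the projection.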

The above analysis reflects that representation learning covers a wide range of research topics related to pattern recognition. On the one hand, many new representation learning algorithms are put forward every year to meet the needs of processing and understanding various practical multimedia data. On the other hand, many problems in representation learning remain unsolved, especially for big data and noisy data. The objective of this Special Issue is therefore to provide a venue for researchers all over the world to publish their latest and original results on representation learning.

Topics include but are not limited to:

  • Metric learning and kernel learning;
  • Multi-view/Multi-modal learning;
  • Robust representation and coding;
  • Domain transfer learning;
  • Learning under low-quality media data;
  • Efficient vision Transformers;
  • Deep learning and its applications.

Dr. Guangwei Gao
Dr. Juncheng Li
Dr. Zhi Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • representation learning
  • computer vision
  • pattern recognition
  • metric learning and kernel learning
  • multi-view/multi-modal learning
  • robust representation and coding
  • domain transfer learning
  • learning under low-quality media data
  • efficient vision Transformer
  • deep learning and its applications

Published Papers (8 papers)


Research

Jump to: Review

20 pages, 3577 KiB  
Article
Auxcoformer: Auxiliary and Contrastive Transformer for Robust Crack Detection in Adverse Weather Conditions
by Jae Hyun Yoon, Jong Won Jung and Seok Bong Yoo
Mathematics 2024, 12(5), 690; https://doi.org/10.3390/math12050690 - 27 Feb 2024
Viewed by 425
Abstract
Crack detection is integral to civil infrastructure maintenance, with automated robots for detailed inspections and repairs becoming increasingly common. Ensuring fast and accurate crack detection for autonomous vehicles is crucial for safe road navigation. In these fields, existing detection models demonstrate impressive performance. However, they are primarily optimized for clear weather and struggle with occlusions and brightness variations in adverse weather conditions. These problems affect automated robots and autonomous vehicles that must operate reliably in diverse environmental conditions. To address this problem, we propose Auxcoformer, designed for robust crack detection in adverse weather conditions. Considering the image degradation caused by adverse weather, Auxcoformer incorporates an auxiliary restoration network. This network efficiently restores damaged crack details, ensuring that the primary detection network obtains better-quality features. The proposed approach uses a non-local patch-based 3D transform technique, emphasizing the characteristics of cracks and making them more distinguishable. Considering the connectivity of cracks, we also introduce a contrastive patch loss for precise localization. Finally, we demonstrate the performance of Auxcoformer, comparing it with other detection models through experiments.
(This article belongs to the Special Issue Representation Learning for Computer Vision and Pattern Recognition)

11 pages, 13153 KiB  
Article
Image Steganography and Style Transformation Based on Generative Adversarial Network
by Li Li, Xinpeng Zhang, Kejiang Chen, Guorui Feng, Deyang Wu and Weiming Zhang
Mathematics 2024, 12(4), 615; https://doi.org/10.3390/math12040615 - 19 Feb 2024
Viewed by 654
Abstract
Traditional image steganography conceals secret messages in unprocessed natural images by modifying pixel values, causing the resulting stego image to differ from the original in its statistical distribution; it can therefore be detected by a well-trained steganalysis classifier. To keep the steganography imperceptible, and in line with the growing popularity on social networks of art images produced by Artificial-Intelligence-Generated Content (AIGC), this paper proposes to embed hidden information throughout the generation of an art-style image by designing an image-style-transformation neural network with a steganography function. The proposed scheme takes a content image, an art-style image, and the messages to be embedded as inputs, processes them with an encoder–decoder model, and finally generates a styled image that contains the secret messages. An adversarial training technique is applied to make the generated art-style stego image indistinguishable from plain style-transferred images. The lack of an original cover image makes it difficult for an adversary's learning-based steganalyzer to identify the stego. According to the experimental results, the proposed approach successfully withstands existing steganalysis techniques and attains an embedding capacity of three bits per pixel for a color image.
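The "traditional" pixel-modifying steganography the paper improves upon can be illustrated by least-significant-bit (LSB) embedding; this is a generic textbook baseline, not the authors' generative scheme:

```python
import numpy as np

def lsb_embed(cover, bits):
    """Hide a bit list in the least significant bits of the first pixels."""
    stego = cover.copy().ravel()
    n = len(bits)
    stego[:n] = (stego[:n] & 0xFE) | np.asarray(bits, dtype=stego.dtype)
    return stego.reshape(cover.shape)

def lsb_extract(stego, n):
    """Read back the first n embedded bits."""
    return (stego.ravel()[:n] & 1).tolist()

cover = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 "image"
msg = [1, 0, 1, 1]
stego = lsb_embed(cover, msg)
print(lsb_extract(stego, 4))  # [1, 0, 1, 1]
```

Because only the lowest bit changes, each pixel moves by at most one gray level; it is exactly this subtle statistical footprint that steganalysis classifiers learn to detect, motivating cover-free generative approaches like the one above.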

16 pages, 1587 KiB  
Article
Quantized Graph Neural Networks for Image Classification
by Xinbiao Xu, Liyan Ma, Tieyong Zeng and Qinghua Huang
Mathematics 2023, 11(24), 4927; https://doi.org/10.3390/math11244927 - 11 Dec 2023
Viewed by 1201
Abstract
Researchers have resorted to model quantization to compress and accelerate graph neural networks (GNNs). Nevertheless, several challenges remain: (1) quantization functions overlook outliers in the distribution, leading to increased quantization errors; (2) the reliance on full-precision teacher models results in higher computational and memory overhead. To address these issues, this study introduces a novel framework called quantized graph neural networks for image classification (QGNN-IC), which incorporates a novel quantization function, Pauta quantization (PQ), and two innovative self-distillation methods, attention quantization distillation (AQD) and stochastic quantization distillation (SQD). Specifically, PQ utilizes the statistical characteristics of distribution to effectively eliminate outliers, thereby promoting fine-grained quantization and reducing quantization errors. AQD enhances the semantic information extraction capability by learning from beneficial channels via attention. SQD enhances the quantization robustness through stochastic quantization. AQD and SQD significantly improve the performance of the quantized model with minimal overhead. Extensive experiments show that QGNN-IC not only surpasses existing state-of-the-art quantization methods but also demonstrates robust generalizability.
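The Pauta criterion referenced above is the classical 3-sigma rule; one plausible sketch of outlier-clipped uniform quantization in that spirit (an illustrative reconstruction, not the paper's exact PQ function):

```python
import numpy as np

def pauta_quantize(x, n_bits=8):
    """Uniform quantization after clipping by the 3-sigma (Pauta) rule."""
    lo = x.mean() - 3 * x.std()
    hi = x.mean() + 3 * x.std()
    xc = np.clip(x, lo, hi)                    # outliers no longer stretch the grid
    scale = (hi - lo) / (2 ** n_bits - 1)      # step of the uniform grid
    codes = np.round((xc - lo) / scale)        # integer codes in [0, 2^n_bits - 1]
    return codes * scale + lo                  # de-quantized values

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(size=1000), [50.0]])  # bulk data plus one outlier
xq = pauta_quantize(x)
# the outlier is clipped, so the grid stays fine-grained on the bulk
```

Without the clipping step, the single outlier at 50 would inflate the quantization range roughly tenfold, wasting most of the 256 levels on empty space; this is the error mode the abstract describes.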

19 pages, 3179 KiB  
Article
Self-Organizing Memory Based on Adaptive Resonance Theory for Vision and Language Navigation
by Wansen Wu, Yue Hu, Kai Xu, Long Qin and Quanjun Yin
Mathematics 2023, 11(19), 4192; https://doi.org/10.3390/math11194192 - 7 Oct 2023
Viewed by 907
Abstract
Vision and Language Navigation (VLN) is a task in which an agent needs to understand natural language instructions to reach the target location in a real-scene environment. To improve models' long-horizon planning ability, emerging research focuses on extending them with different types of memory structures, mainly topological maps or a hidden state vector. However, a fixed-length hidden state vector is often insufficient to capture long-term temporal context. In comparison, topological maps have been shown to be beneficial for many robotic navigation tasks. Therefore, we focus on building a feasible and effective topological map representation and using it to improve navigation performance and generalization across seen and unseen environments. This paper presents a Self-organizing Memory based on Adaptive Resonance Theory (SMART) module for incremental topological mapping and a framework for utilizing the SMART module to guide navigation. Based on fusion adaptive resonance theory networks, the SMART module can extract salient scenes from historical observations and build a topological map of the environmental layout. It provides a compact spatial representation and supports the discovery of novel shortcuts through inference, while being explainable in terms of cognitive science. Furthermore, given a language instruction and on top of the topological map, we propose a vision–language alignment framework for navigational decision-making. Notably, the framework utilizes three off-the-shelf pre-trained models to perform landmark extraction, node–landmark matching, and low-level control, without any fine-tuning on human-annotated datasets. We validate our approach using the Habitat simulator on VLN-CE tasks, which provides a photo-realistic environment for the embodied agent in a continuous action space. The experimental results demonstrate that our approach achieves performance comparable to the supervised baseline.

11 pages, 3998 KiB  
Article
Representing Blurred Image without Deblurring
by Shuren Qi, Yushu Zhang, Chao Wang and Rushi Lan
Mathematics 2023, 11(10), 2239; https://doi.org/10.3390/math11102239 - 10 May 2023
Cited by 1 | Viewed by 1077
Abstract
The effective recognition of patterns from blurred images presents a fundamental difficulty for many practical vision tasks. In the era of deep learning, the main ways to cope with this difficulty are data augmentation and deblurring. However, both face issues such as inefficiency, instability, and lack of explainability. In this paper, we explore a simple but effective way to define invariants from blurred images, without data augmentation or deblurring. Here, the invariants are designed from Fractional Moments under Projection operators (FMP), where blur invariance and rotation invariance are guaranteed by the general theorem of blur invariants and Fourier-domain rotation equivariance, respectively. In general, the proposed FMP not only bears a simpler explicit definition, but also has useful representation properties, including orthogonality, statistical flexibility, and the combined invariance of blurring and rotation. Simulation experiments are provided to demonstrate these properties of our FMP, revealing its potential for small-scale robust vision problems.

16 pages, 3672 KiB  
Article
Two-Dimensional Exponential Sparse Discriminant Local Preserving Projections
by Minghua Wan, Yuxi Zhang, Guowei Yang and Hongjian Guo
Mathematics 2023, 11(7), 1722; https://doi.org/10.3390/math11071722 - 4 Apr 2023
Cited by 1 | Viewed by 839
Abstract
The two-dimensional discriminant locality preserving projections (2DDLPP) algorithm adds a between-class weighted matrix and a within-class weighted matrix to the objective function of the two-dimensional locality preserving projections (2DLPP) algorithm, which overcomes the disadvantage of 2DLPP that it cannot use discriminative information. However, the small sample size (SSS) problem still exists, and 2DDLPP processes the whole original image, whose retained features may contain a large amount of redundant information. Therefore, we propose a new algorithm, two-dimensional exponential sparse discriminant locality preserving projections (2DESDLPP), to address these problems. It integrates 2DDLPP, the matrix exponential function, and elastic net regression. Firstly, 2DESDLPP introduces the matrix exponential into the objective function of 2DDLPP, making it positive definite; this is an effective way to solve the SSS problem. Moreover, it uses distance diffusion mapping to convert the original image into a new subspace to further expand the margin between labels, so more feature information is retained for classification. In addition, elastic net regression is used to find the optimal sparse projection matrix and reduce redundant information. Finally, through experiments on the ORL, Yale and AR databases, the 2DESDLPP algorithm is shown to be superior to seven other mainstream feature extraction algorithms. In particular, its accuracy rate is 3.15%, 2.97% and 4.82% higher than that of 2DDLPP on the three databases, respectively.
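The matrix-exponential step rests on a general fact: the exponential of a symmetric matrix is always positive definite, even when the matrix itself is singular, which is exactly what the SSS regime produces. A small NumPy check of this fact (illustrative only, not the 2DESDLPP objective):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))        # 3 samples, 10 features: SSS regime
S = X.T @ X                         # 10x10 scatter matrix, rank <= 3 -> singular

# matrix exponential of the symmetric matrix S via its eigendecomposition
w, V = np.linalg.eigh(S)
E = (V * np.exp(w)) @ V.T           # exp(S); its eigenvalues are exp(w) > 0

print(np.linalg.matrix_rank(S))         # 3  (rank < 10: S is singular)
print(np.linalg.eigvalsh(E).min() > 0)  # True (exp(S) is positive definite)
```

Since every eigenvalue λ of S maps to exp(λ) ≥ 1 for λ ≥ 0, the zero eigenvalues that make S singular become ones, so generalized eigenproblems involving exp(S) are well posed without discarding samples or adding ad hoc regularization.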

14 pages, 781 KiB  
Article
Robust Exponential Graph Regularization Non-Negative Matrix Factorization Technology for Feature Extraction
by Minghua Wan, Mingxiu Cai and Guowei Yang
Mathematics 2023, 11(7), 1716; https://doi.org/10.3390/math11071716 - 3 Apr 2023
Cited by 1 | Viewed by 1219
Abstract
Graph regularized non-negative matrix factorization (GNMF) is widely used in feature extraction. In the process of dimensionality reduction, GNMF can retain the internal manifold structure of the data by adding a regularizer to non-negative matrix factorization (NMF). Because the GNMF regularizer is built on locality preserving projections (LPP), the small sample size (SSS) problem arises. In view of this problem, a new algorithm named robust exponential graph regularized non-negative matrix factorization (REGNMF) is proposed in this paper. By adding a matrix exponential to the regularizer of GNMF, a possibly singular matrix becomes non-singular, which resolves the problem above. For the optimization of REGNMF, we use a multiplicative non-negative updating rule to solve the method iteratively. Finally, the method is applied to the AR and COIL databases, the Yale noise set, and the AR occlusion dataset for performance testing, and the experimental results are compared with several existing methods. The results indicate that the proposed method performs significantly better.
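The multiplicative non-negative updating rule mentioned above is, in its plain-NMF form, the classic Lee–Seung scheme, which keeps both factors non-negative by construction because each update multiplies by a ratio of non-negative quantities. A bare-bones sketch without the exponential graph regularizer (illustrative, not the authors' full REGNMF):

```python
import numpy as np

def nmf(X, k, iters=300, eps=1e-9):
    """Factor X ~= W @ H with W, H >= 0 via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # ratio of nonnegatives keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # likewise for W
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(20, 15)))  # nonnegative data
W, H = nmf(X, 5)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
# factors stay nonnegative and the rank-5 reconstruction is reasonably close
```

Graph-regularized variants such as GNMF and REGNMF add a Laplacian-based term to the update for H; the multiplicative structure, and hence the non-negativity guarantee, is unchanged.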

Review

Jump to: Research

21 pages, 3285 KiB  
Review
Review of Quaternion-Based Color Image Processing Methods
by Chaoyan Huang, Juncheng Li and Guangwei Gao
Mathematics 2023, 11(9), 2056; https://doi.org/10.3390/math11092056 - 26 Apr 2023
Cited by 6 | Viewed by 2343
Abstract
Images are a convenient way for humans to obtain information and knowledge, but they are often degraded during collection or distribution. Image processing has therefore evolved as the need arises, and color image processing is a broad and active field. A color image comprises three distinct but closely related channels: red, green, and blue (RGB). Compared to directly expressing color images as vectors or matrices, the quaternion representation offers an effective alternative. There are many papers on this subject, with numerous definitions, hypotheses, and methodologies. Our observations indicate that the quaternion representation method is effective, and models and methods based on it have developed rapidly. Hence, the purpose of this paper is to review and categorize past methods, and to study their efficacy with computational examples. We hope that this survey will be helpful to academics interested in quaternion representation.
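The core idea reviewed here is to encode an RGB pixel as a pure quaternion r·i + g·j + b·k, so all three channels are transformed jointly rather than independently. A minimal sketch of the encoding and the Hamilton product (illustrative):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def rgb_to_quaternion(r, g, b):
    """Encode one RGB pixel as the pure quaternion r*i + g*j + b*k."""
    return np.array([0.0, r, g, b])

p = rgb_to_quaternion(0.8, 0.2, 0.4)
q = np.array([np.cos(0.3), np.sin(0.3), 0.0, 0.0])  # a unit quaternion
# the Hamilton product with a unit quaternion preserves the norm,
# so the "energy" of the color vector is conserved
print(np.isclose(np.linalg.norm(qmul(q, p)), np.linalg.norm(p)))  # True
```

Quaternion analogues of convolution, Fourier transforms, and matrix factorizations build on this product, which is what lets them mix the RGB channels in a single algebraic operation.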
