Representation Learning for Computer Vision and Pattern Recognition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (28 February 2024) | Viewed by 9590

Special Issue Editors


Guest Editor
Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China
Interests: machine learning; pattern recognition; learning-based vision problems

Guest Editor
Center for Mathematical Artificial Intelligence (CMAI), Department of Mathematics, The Chinese University of Hong Kong, Hong Kong, China
Interests: artificial intelligence and its applications to computer vision

Guest Editor
School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
Interests: image processing; machine learning

Special Issue Information

Dear Colleagues,

Representation learning has long been an important research area in Computer Vision and Pattern Recognition. A good representation of practical data is critical to achieving satisfactory performance. Broadly speaking, such a representation can be an "intra-data representation" or an "inter-data representation". Intra-data representation focuses on extracting or refining the raw features of a data point itself. Representative methods range from early-stage hand-crafted feature design (e.g., SIFT, LBP, HoG), through the feature extraction (e.g., PCA, LDA, LLE) and feature selection (e.g., sparsity-based and submodularity-based) methods established over the past two decades, to the recent development of deep neural networks (e.g., CNN, RNN, GNN, GAN). Inter-data representation characterizes the relationships between different data points or the structure carried by the dataset as a whole. For example, metric learning, kernel learning and causality reasoning investigate the spatial or temporal relationships among different examples, while subspace learning, manifold learning and clustering discover the underlying structural properties inherent in the dataset.
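As a concrete instance of the intra-data feature extraction methods named above, PCA maps raw features onto the directions of maximal variance. A minimal NumPy sketch, for illustration only:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center the data
    # right singular vectors of the centered data = eigenvectors
    # of the covariance matrix, sorted by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # k-dimensional codes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                   # 100 points, 10 raw features
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

The same "project onto a learned subspace" pattern underlies LDA and many subspace learning methods, differing only in the criterion that selects the projection.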

The above analysis reflects that representation learning covers a wide range of research topics related to pattern recognition. On the one hand, many new representation learning algorithms are put forward every year to meet the needs of processing and understanding various practical multimedia data. On the other hand, many problems in representation learning remain unsolved, especially for big data and noisy data. The objective of this Special Issue is therefore to provide a venue for researchers all over the world to publish their latest and original results on representation learning.

Topics include but are not limited to:

  • Metric learning and kernel learning;
  • Multi-view/Multi-modal learning;
  • Robust representation and coding;
  • Domain transfer learning;
  • Learning under low-quality media data;
  • Efficient vision Transformers;
  • Deep learning and its applications.

Dr. Guangwei Gao
Dr. Juncheng Li
Dr. Zhi Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • representation learning
  • computer vision
  • pattern recognition
  • metric learning and kernel learning
  • multi-view/multi-modal learning
  • robust representation and coding
  • domain transfer learning
  • learning under low-quality media data
  • efficient vision Transformer
  • deep learning and its applications

Published Papers (8 papers)


Research

Jump to: Review

20 pages, 3577 KiB  
Article
Auxcoformer: Auxiliary and Contrastive Transformer for Robust Crack Detection in Adverse Weather Conditions
by Jae Hyun Yoon, Jong Won Jung and Seok Bong Yoo
Mathematics 2024, 12(5), 690; https://doi.org/10.3390/math12050690 - 27 Feb 2024
Viewed by 425
Abstract
Crack detection is integral to civil infrastructure maintenance, with automated robots for detailed inspections and repairs becoming increasingly common. Ensuring fast and accurate crack detection for autonomous vehicles is crucial for safe road navigation. In these fields, existing detection models demonstrate impressive performance. However, they are primarily optimized for clear weather and struggle with occlusions and brightness variations in adverse weather conditions. These problems affect automated robots and autonomous vehicles that must operate reliably in diverse environmental conditions. To address this problem, we propose Auxcoformer, designed for robust crack detection in adverse weather conditions. Considering the image degradation caused by adverse weather, Auxcoformer incorporates an auxiliary restoration network. This network efficiently restores damaged crack details, ensuring that the primary detection network obtains better-quality features. The proposed approach uses a non-local patch-based 3D transform technique, emphasizing the characteristics of cracks and making them more distinguishable. Considering the connectivity of cracks, we also introduce a contrastive patch loss for precise localization. Finally, we demonstrate the performance of Auxcoformer, comparing it with other detection models through experiments.
(This article belongs to the Special Issue Representation Learning for Computer Vision and Pattern Recognition)

11 pages, 13153 KiB  
Article
Image Steganography and Style Transformation Based on Generative Adversarial Network
by Li Li, Xinpeng Zhang, Kejiang Chen, Guorui Feng, Deyang Wu and Weiming Zhang
Mathematics 2024, 12(4), 615; https://doi.org/10.3390/math12040615 - 19 Feb 2024
Viewed by 654
Abstract
Traditional image steganography conceals secret messages in unprocessed natural images by modifying pixel values, causing the resulting stego image to differ from the original in its statistical distribution; it can therefore be detected by a well-trained steganalysis classifier. To keep the steganography imperceptible, and in line with the growing popularity on social networks of art images produced by Artificial-Intelligence-Generated Content (AIGC), this paper proposes to embed hidden information throughout the generation of an art-style image by designing an image-style-transformation neural network with a steganography function. The proposed scheme takes a content image, an art-style image, and the messages to be embedded as inputs, processes them with an encoder–decoder model, and finally generates a styled image that contains the secret messages. An adversarial training technique is applied to make the generated art-style stego image indistinguishable from plain style-transferred images. The lack of an original cover image makes it difficult for an adversary's learning-based steganalyzer to identify the stego. According to the experimental results, the proposed approach successfully withstands existing steganalysis techniques and attains an embedding capacity of three bits per pixel for a color image.
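The "traditional" pixel-modifying steganography the paper improves upon can be illustrated by least-significant-bit (LSB) embedding; this is a generic textbook baseline, not the authors' generative scheme:

```python
import numpy as np

def lsb_embed(cover, bits):
    """Hide a bit list in the least significant bits of the first pixels."""
    stego = cover.copy().ravel()
    n = len(bits)
    stego[:n] = (stego[:n] & 0xFE) | np.asarray(bits, dtype=stego.dtype)
    return stego.reshape(cover.shape)

def lsb_extract(stego, n):
    """Read back the first n embedded bits."""
    return (stego.ravel()[:n] & 1).tolist()

cover = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 "image"
msg = [1, 0, 1, 1]
stego = lsb_embed(cover, msg)
print(lsb_extract(stego, 4))  # [1, 0, 1, 1]
```

Because only the lowest bit changes, each pixel moves by at most one gray level; it is exactly this subtle statistical footprint that steganalysis classifiers learn to detect, motivating cover-free generative approaches like the one above.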

16 pages, 1587 KiB  
Article
Quantized Graph Neural Networks for Image Classification
by Xinbiao Xu, Liyan Ma, Tieyong Zeng and Qinghua Huang
Mathematics 2023, 11(24), 4927; https://doi.org/10.3390/math11244927 - 11 Dec 2023
Viewed by 1201
Abstract
Researchers have resorted to model quantization to compress and accelerate graph neural networks (GNNs). Nevertheless, several challenges remain: (1) quantization functions overlook outliers in the distribution, leading to increased quantization errors; (2) the reliance on full-precision teacher models results in higher computational and memory overhead. To address these issues, this study introduces a novel framework called quantized graph neural networks for image classification (QGNN-IC), which incorporates a novel quantization function, Pauta quantization (PQ), and two innovative self-distillation methods, attention quantization distillation (AQD) and stochastic quantization distillation (SQD). Specifically, PQ utilizes the statistical characteristics of distribution to effectively eliminate outliers, thereby promoting fine-grained quantization and reducing quantization errors. AQD enhances the semantic information extraction capability by learning from beneficial channels via attention. SQD enhances the quantization robustness through stochastic quantization. AQD and SQD significantly improve the performance of the quantized model with minimal overhead. Extensive experiments show that QGNN-IC not only surpasses existing state-of-the-art quantization methods but also demonstrates robust generalizability.
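The Pauta criterion referenced above is the classical 3-sigma rule; one plausible sketch of outlier-clipped uniform quantization in that spirit (an illustrative reconstruction, not the paper's exact PQ function):

```python
import numpy as np

def pauta_quantize(x, n_bits=8):
    """Uniform quantization after clipping by the 3-sigma (Pauta) rule."""
    lo = x.mean() - 3 * x.std()
    hi = x.mean() + 3 * x.std()
    xc = np.clip(x, lo, hi)                    # outliers no longer stretch the grid
    scale = (hi - lo) / (2 ** n_bits - 1)      # step of the uniform grid
    codes = np.round((xc - lo) / scale)        # integer codes in [0, 2^n_bits - 1]
    return codes * scale + lo                  # de-quantized values

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(size=1000), [50.0]])  # bulk data plus one outlier
xq = pauta_quantize(x)
# the outlier is clipped, so the grid stays fine-grained on the bulk
```

Without the clipping step, the single outlier at 50 would inflate the quantization range roughly tenfold, wasting most of the 256 levels on empty space; this is the error mode the abstract describes.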

19 pages, 3179 KiB  
Article
Self-Organizing Memory Based on Adaptive Resonance Theory for Vision and Language Navigation
by Wansen Wu, Yue Hu, Kai Xu, Long Qin and Quanjun Yin
Mathematics 2023, 11(19), 4192; https://doi.org/10.3390/math11194192 - 7 Oct 2023
Viewed by 907
Abstract
Vision and Language Navigation (VLN) is a task in which an agent needs to understand natural language instructions to reach the target location in a real-scene environment. To improve models' long-horizon planning ability, emerging research focuses on extending them with different types of memory structures, mainly topological maps or a hidden state vector. However, a fixed-length hidden state vector is often insufficient to capture long-term temporal context. In comparison, topological maps have been shown to be beneficial for many robotic navigation tasks. Therefore, we focus on building a feasible and effective topological map representation and using it to improve navigation performance and generalization across seen and unseen environments. This paper presents a Self-organizing Memory based on Adaptive Resonance Theory (SMART) module for incremental topological mapping and a framework for utilizing the SMART module to guide navigation. Based on fusion adaptive resonance theory networks, the SMART module can extract salient scenes from historical observations and build a topological map of the environmental layout. It provides a compact spatial representation and supports the discovery of novel shortcuts through inference, while being explainable in terms of cognitive science. Furthermore, given a language instruction and on top of the topological map, we propose a vision–language alignment framework for navigational decision-making. Notably, the framework utilizes three off-the-shelf pre-trained models to perform landmark extraction, node–landmark matching, and low-level control, without any fine-tuning on human-annotated datasets. We validate our approach using the Habitat simulator on VLN-CE tasks, which provides a photo-realistic environment for the embodied agent in a continuous action space. The experimental results demonstrate that our approach achieves performance comparable to the supervised baseline.

11 pages, 3998 KiB  
Article
Representing Blurred Image without Deblurring
by Shuren Qi, Yushu Zhang, Chao Wang and Rushi Lan
Mathematics 2023, 11(10), 2239; https://doi.org/10.3390/math11102239 - 10 May 2023
Cited by 1 | Viewed by 1077
Abstract
The effective recognition of patterns from blurred images presents a fundamental difficulty for many practical vision tasks. In the era of deep learning, the main ways to cope with this difficulty are data augmentation and deblurring. However, both face issues such as inefficiency, instability, and lack of explainability. In this paper, we explore a simple but effective way to define invariants from blurred images, without data augmentation or deblurring. Here, the invariants are designed from Fractional Moments under Projection operators (FMP), where blur invariance and rotation invariance are guaranteed by the general theorem of blur invariants and Fourier-domain rotation equivariance, respectively. In general, the proposed FMP not only bears a simpler explicit definition, but also has useful representation properties, including orthogonality, statistical flexibility, and the combined invariance of blurring and rotation. Simulation experiments are provided to demonstrate these properties of our FMP, revealing its potential for small-scale robust vision problems.

16 pages, 3672 KiB  
Article
Two-Dimensional Exponential Sparse Discriminant Local Preserving Projections
by Minghua Wan, Yuxi Zhang, Guowei Yang and Hongjian Guo
Mathematics 2023, 11(7), 1722; https://doi.org/10.3390/math11071722 - 4 Apr 2023
Cited by 1 | Viewed by 839
Abstract
The two-dimensional discriminant locality preserving projections (2DDLPP) algorithm adds a between-class weighted matrix and a within-class weighted matrix to the objective function of the two-dimensional locality preserving projections (2DLPP) algorithm, which overcomes the disadvantage of 2DLPP that it cannot use discriminative information. However, the small sample size (SSS) problem still exists, and 2DDLPP processes the whole original image, whose retained features may contain a large amount of redundant information. Therefore, we propose a new algorithm, two-dimensional exponential sparse discriminant locality preserving projections (2DESDLPP), to address these problems. It integrates 2DDLPP, the matrix exponential function, and elastic net regression. Firstly, 2DESDLPP introduces the matrix exponential into the objective function of 2DDLPP, making it positive definite; this is an effective way to solve the SSS problem. Moreover, it uses distance diffusion mapping to convert the original image into a new subspace to further expand the margin between labels, so more feature information is retained for classification. In addition, elastic net regression is used to find the optimal sparse projection matrix and reduce redundant information. Finally, through experiments on the ORL, Yale and AR databases, the 2DESDLPP algorithm is shown to be superior to seven other mainstream feature extraction algorithms. In particular, its accuracy rate is 3.15%, 2.97% and 4.82% higher than that of 2DDLPP on the three databases, respectively.
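The matrix-exponential step rests on a general fact: the exponential of a symmetric matrix is always positive definite, even when the matrix itself is singular, which is exactly what the SSS regime produces. A small NumPy check of this fact (illustrative only, not the 2DESDLPP objective):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))        # 3 samples, 10 features: SSS regime
S = X.T @ X                         # 10x10 scatter matrix, rank <= 3 -> singular

# matrix exponential of the symmetric matrix S via its eigendecomposition
w, V = np.linalg.eigh(S)
E = (V * np.exp(w)) @ V.T           # exp(S); its eigenvalues are exp(w) > 0

print(np.linalg.matrix_rank(S))         # 3  (rank < 10: S is singular)
print(np.linalg.eigvalsh(E).min() > 0)  # True (exp(S) is positive definite)
```

Since every eigenvalue λ of S maps to exp(λ) ≥ 1 for λ ≥ 0, the zero eigenvalues that make S singular become ones, so generalized eigenproblems involving exp(S) are well posed without discarding samples or adding ad hoc regularization.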

14 pages, 781 KiB  
Article
Robust Exponential Graph Regularization Non-Negative Matrix Factorization Technology for Feature Extraction
by Minghua Wan, Mingxiu Cai and Guowei Yang
Mathematics 2023, 11(7), 1716; https://doi.org/10.3390/math11071716 - 3 Apr 2023
Cited by 1 | Viewed by 1219
Abstract
Graph regularized non-negative matrix factorization (GNMF) is widely used in feature extraction. In the process of dimensionality reduction, GNMF can retain the internal manifold structure of the data by adding a regularizer to non-negative matrix factorization (NMF). Because the GNMF regularizer is built on locality preserving projections (LPP), the small sample size (SSS) problem arises. In view of this problem, a new algorithm named robust exponential graph regularized non-negative matrix factorization (REGNMF) is proposed in this paper. By adding a matrix exponential to the regularizer of GNMF, a possibly singular matrix becomes non-singular, which resolves the problem above. For the optimization of REGNMF, we use a multiplicative non-negative updating rule to solve the method iteratively. Finally, the method is applied to the AR and COIL databases, the Yale noise set, and the AR occlusion dataset for performance testing, and the experimental results are compared with several existing methods. The results indicate that the proposed method performs significantly better.
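The multiplicative non-negative updating rule mentioned above is, in its plain-NMF form, the classic Lee–Seung scheme, which keeps both factors non-negative by construction because each update multiplies by a ratio of non-negative quantities. A bare-bones sketch without the exponential graph regularizer (illustrative, not the authors' full REGNMF):

```python
import numpy as np

def nmf(X, k, iters=300, eps=1e-9):
    """Factor X ~= W @ H with W, H >= 0 via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # ratio of nonnegatives keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # likewise for W
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(20, 15)))  # nonnegative data
W, H = nmf(X, 5)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
# factors stay nonnegative and the rank-5 reconstruction is reasonably close
```

Graph-regularized variants such as GNMF and REGNMF add a Laplacian-based term to the update for H; the multiplicative structure, and hence the non-negativity guarantee, is unchanged.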

Review

Jump to: Research

21 pages, 3285 KiB  
Review
Review of Quaternion-Based Color Image Processing Methods
by Chaoyan Huang, Juncheng Li and Guangwei Gao
Mathematics 2023, 11(9), 2056; https://doi.org/10.3390/math11092056 - 26 Apr 2023
Cited by 6 | Viewed by 2343
Abstract
Images are a convenient way for humans to obtain information and knowledge, but they are often degraded during collection or distribution. Image processing has therefore evolved as the need arises, and color image processing is a broad and active field. A color image comprises three distinct but closely related channels: red, green, and blue (RGB). Compared to directly expressing color images as vectors or matrices, the quaternion representation offers an effective alternative. There are many papers on this subject, with numerous definitions, hypotheses, and methodologies. Our observations indicate that the quaternion representation method is effective, and models and methods based on it have developed rapidly. Hence, the purpose of this paper is to review and categorize past methods, and to study their efficacy with computational examples. We hope that this survey will be helpful to academics interested in quaternion representation.
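The core idea reviewed here is to encode an RGB pixel as a pure quaternion r·i + g·j + b·k, so all three channels are transformed jointly rather than independently. A minimal sketch of the encoding and the Hamilton product (illustrative):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def rgb_to_quaternion(r, g, b):
    """Encode one RGB pixel as the pure quaternion r*i + g*j + b*k."""
    return np.array([0.0, r, g, b])

p = rgb_to_quaternion(0.8, 0.2, 0.4)
q = np.array([np.cos(0.3), np.sin(0.3), 0.0, 0.0])  # a unit quaternion
# the Hamilton product with a unit quaternion preserves the norm,
# so the "energy" of the color vector is conserved
print(np.isclose(np.linalg.norm(qmul(q, p)), np.linalg.norm(p)))  # True
```

Quaternion analogues of convolution, Fourier transforms, and matrix factorizations build on this product, which is what lets them mix the RGB channels in a single algebraic operation.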
