Advances in Computer Vision and Machine Learning

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (30 September 2023) | Viewed by 46582

Special Issue Editors


Guest Editor
Key Laboratory of Spectral Imaging Technology CAS, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
Interests: remote sensing scene classification; cross-domain scene classification

Guest Editor
School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Interests: information and communication engineering; satellite communication and satellite navigation; machine learning; pattern recognition

Special Issue Information

Dear Colleagues,

Computer vision focuses on the theories and practices that give rise to semantically meaningful interpretations of the visual world. Mathematical models and tools can provide enormous opportunities for developing intelligent algorithms that extract useful information from visual data, such as a single image, a video sequence, or even a multi-/hyper-spectral image cube. In recent years, a number of emerging machine learning techniques have been applied to visual perception tasks such as camera imaging geometry, camera calibration, image stabilization, multiview geometry, feature learning, image classification, and object recognition and tracking. However, it is still challenging to provide theoretical explanations of the underlying learning processes, especially when using deep neural networks, where a few questions remain to be answered, such as the design principles, the optimal architecture, the number of required layers, the sample complexity, and the optimization algorithms.

This Special Issue focuses on recent advances in computer vision and machine learning. The topics of interest include, but are not limited to, the following:

  • Pattern recognition and machine learning for computer vision;
  • Feature learning for computer vision;
  • Self-supervised/weakly supervised/unsupervised learning;
  • Image processing and analysis;
  • Deep neural networks in computer vision;
  • Graph neural networks;
  • Optimization methods for machine learning;
  • Evolutionary computation and optimization problems;
  • Emerging applications.

Dr. Xiangtao Zheng
Prof. Dr. Jinchang Ren
Prof. Dr. Ling Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • computer vision
  • pattern recognition
  • statistical learning
  • data mining
  • deep learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (21 papers)

Research

19 pages, 14351 KiB  
Article
A Deep Joint Network for Monocular Depth Estimation Based on Pseudo-Depth Supervision
by Jiahai Tan, Ming Gao, Tao Duan and Xiaomei Gao
Mathematics 2023, 11(22), 4645; https://doi.org/10.3390/math11224645 - 14 Nov 2023
Viewed by 1246
Abstract
Depth estimation from a single image is a significant task. Although deep learning methods hold great promise in this area, they still face a number of challenges, including the limited modeling of nonlocal dependencies, lack of effective loss function joint optimization models, and difficulty in accurately estimating object edges. In order to further increase the network’s prediction accuracy, a new structure and training method are proposed for single-image depth estimation in this research. A pseudo-depth network is first deployed for generating a single-image depth prior, and by constructing connecting paths between multi-scale local features using the proposed up-mapping and jumping modules, the network can integrate representations and recover fine details. A deep network is also designed to capture and convey global context by utilizing the Transformer Conv module and Unet Depth net to extract and refine global features. The two networks jointly provide meaningful coarse and fine features to predict high-quality depth images from single RGB images. In addition, multiple joint losses are utilized to enhance the training model. A series of experiments are carried out to confirm and demonstrate the efficacy of our method. The proposed method exceeds the advanced method DPT by 10% and 3.3% in terms of root mean square error (RMSE(log)) and 1.7% and 1.6% in terms of squared relative difference (SRD), respectively, according to experimental results on the NYU Depth V2 and KITTI depth estimation benchmarks. Full article
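
For readers unfamiliar with the two error measures quoted above, the short NumPy sketch below shows how RMSE(log) and the squared relative difference are conventionally computed on the NYU/KITTI benchmarks; it is an illustrative snippet with placeholder arrays, not code from the paper.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth-estimation metrics (illustrative only).

    pred, gt : arrays of predicted / ground-truth depths in metres,
               assumed to contain valid (positive) pixels only.
    """
    pred = np.clip(pred, eps, None)
    gt = np.clip(gt, eps, None)

    # RMSE(log): root mean squared error of the log depths.
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))

    # Squared relative difference (often reported as "Sq Rel").
    sq_rel = np.mean(((pred - gt) ** 2) / gt)

    return {"rmse_log": rmse_log, "sq_rel": sq_rel}

# Example usage with random placeholder depth maps.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0.5, 10.0, size=(480, 640))
    pred = gt * rng.normal(1.0, 0.05, size=gt.shape)
    print(depth_metrics(pred, gt))
```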

13 pages, 1136 KiB  
Article
Variational Disentangle Zero-Shot Learning
by Jie Su, Jinhao Wan, Taotao Li, Xiong Li and Yuheng Ye
Mathematics 2023, 11(16), 3578; https://doi.org/10.3390/math11163578 - 18 Aug 2023
Viewed by 1161
Abstract
Existing zero-shot learning (ZSL) methods typically focus on mapping from the feature space (e.g., visual space) to class-level attributes, often leading to a non-injective projection. Such a mapping may cause a significant loss of instance-level information. While an ideal projection to instance-level attributes would be desirable, it can also be prohibitively expensive and thus impractical in many scenarios. In this work, we propose a variational disentangle zero-shot learning (VDZSL) framework that addresses this problem by constructing variational instance-specific attributes from a class-specific semantic latent distribution. Specifically, our approach disentangles each instance into class-specific attributes and the corresponding variant features. Unlike transductive ZSL, which assumes that unseen classes’ attributions are known beforehand, our VDZSL method does not rely on this strong assumption, making it more applicable in real-world scenarios. Extensive experiments conducted on three popular ZSL benchmark datasets (i.e., AwA2, CUB, and FLO) validate the effectiveness of our approach. In the conventional ZSL setting, our method demonstrates an improvement of 12∼15% relative to the advanced approaches and achieves a classification accuracy of 70% on the AwA2 dataset. Furthermore, under the more challenging generalized ZSL setting, our approach can gain an improvement of 5∼15% compared with the advanced methods. Full article

19 pages, 2213 KiB  
Article
Omni-Domain Feature Extraction Method for Gait Recognition
by Jiwei Wan, Huimin Zhao, Rui Li, Rongjun Chen and Tuanjie Wei
Mathematics 2023, 11(12), 2612; https://doi.org/10.3390/math11122612 - 7 Jun 2023
Cited by 1 | Viewed by 1519
Abstract
Gait is a biological feature with strong spatio-temporal correlation, and the current difficulty of gait recognition lies in the interference of covariates (viewpoint, clothing, etc.) during feature extraction. In order to weaken the influence of extrinsic variable changes, we propose an interval frame sampling method to capture more information about joint dynamic changes, together with an Omni-Domain Feature Extraction Network. The Omni-Domain Feature Extraction Network consists of three main modules: (1) Temporal-Sensitive Feature Extractor: injects key gait temporal information into shallow spatial features to improve spatio-temporal correlation. (2) Dynamic Motion Capture: extracts temporal features of different motions and assigns weights adaptively. (3) Omni-Domain Feature Balance Module: balances fine-grained spatio-temporal features and highlights decisive spatio-temporal features. Extensive experiments were conducted on two commonly used public gait datasets, showing that our method has good performance and generalization ability. On CASIA-B, we achieved an average rank-1 accuracy of 94.2% under three walking conditions. On OU-MVLP, we achieved a rank-1 accuracy of 90.5%. Full article
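
As a rough illustration of the interval frame sampling idea (one frame drawn from each equal-length interval of the sequence rather than a consecutive clip), the sketch below shows a minimal implementation; the function name and the random-within-interval choice are assumptions, and the paper's exact sampling scheme may differ.

```python
import random

def interval_frame_sampling(num_frames, num_samples, seed=None):
    """Pick `num_samples` frame indices spread over a sequence of
    `num_frames` frames by sampling one frame per equal-length interval.

    Illustrative sketch only; the paper's exact scheme may differ.
    """
    rng = random.Random(seed)
    if num_frames <= num_samples:
        # Short sequence: repeat frames to reach the requested length.
        return [i % num_frames for i in range(num_samples)]

    interval = num_frames / num_samples
    indices = []
    for k in range(num_samples):
        start = int(k * interval)
        end = max(start + 1, int((k + 1) * interval))
        indices.append(rng.randrange(start, end))
    return indices

# Example: sample 8 frames from a 61-frame gait sequence.
print(interval_frame_sampling(61, 8, seed=0))
```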

10 pages, 1546 KiB  
Article
Performance Analysis of the CHAID Algorithm for Accuracy
by Yeling Yang, Feng Yi, Chuancheng Deng and Guang Sun
Mathematics 2023, 11(11), 2558; https://doi.org/10.3390/math11112558 - 2 Jun 2023
Cited by 8 | Viewed by 3124
Abstract
The chi-squared automatic interaction detector (CHAID) algorithm is considered to be one of the most used supervised learning methods, as it is adaptable to solving any kind of problem at hand. We are keenly aware of the non-linear relationships among CHAID maps, which can empower predictive models with stability. However, we do not precisely know how high its accuracy is. To determine the scope that the CHAID algorithm fits best, this paper presents an analysis of the accuracy of the CHAID algorithm. We introduce the causes, applicable conditions, and application scope of the CHAID algorithm, and then highlight the differences in the branching principles between the CHAID algorithm and several other common decision tree algorithms, which is the first step towards a basic analysis of the CHAID algorithm. We next employ an actual branching case to help us better understand the CHAID algorithm. Specifically, we use vehicle customer satisfaction data to compare multiple decision tree algorithms and cite some factors that affect the accuracy, along with corresponding countermeasures that are more conducive to obtaining accurate results. The results show that CHAID can analyze the data very well and reliably detect significantly correlated factors. This paper presents the information required to understand the CHAID algorithm, thereby enabling better choices when the use of decision tree algorithms is warranted. Full article
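
To make the branching principle concrete, the following sketch scores candidate categorical predictors with the chi-squared test of independence against the target, which is the core statistic CHAID uses when choosing a split variable. It is a simplified illustration only: full CHAID also merges categories and applies Bonferroni-adjusted p-values, both of which are omitted here, and the toy data are made up.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chaid_split_scores(df, target, predictors):
    """Rank candidate predictors by the p-value of a chi-squared test
    of independence with the target, as CHAID does when selecting a
    split variable.  Simplified: no category merging, no Bonferroni
    correction.
    """
    scores = {}
    for col in predictors:
        table = pd.crosstab(df[col], df[target])
        chi2, p_value, dof, _ = chi2_contingency(table)
        scores[col] = (p_value, chi2)
    # Smaller p-value => stronger association => better split candidate.
    return sorted(scores.items(), key=lambda kv: kv[1][0])

# Toy customer-satisfaction example with made-up data.
df = pd.DataFrame({
    "region":    ["N", "S", "N", "E", "S", "E", "N", "S"],
    "car_type":  ["suv", "sedan", "suv", "suv", "sedan", "sedan", "suv", "sedan"],
    "satisfied": ["yes", "no", "yes", "yes", "no", "no", "yes", "no"],
})
print(chaid_split_scores(df, "satisfied", ["region", "car_type"]))
```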

33 pages, 7191 KiB  
Article
Denoising in Representation Space via Data-Dependent Regularization for Better Representation
by Muyi Chen, Daling Wang, Shi Feng and Yifei Zhang
Mathematics 2023, 11(10), 2327; https://doi.org/10.3390/math11102327 - 16 May 2023
Viewed by 1205
Abstract
Despite the success of deep learning models, it remains challenging for the over-parameterized model to learn good representation under small-sample-size settings. In this paper, motivated by previous work on out-of-distribution (OoD) generalization, we study the representation learning problem from an OoD perspective to identify the fundamental factors affecting representation quality. We formulate a notion of “out-of-feature subspace (OoFS) noise” for the first time, and we link the OoFS noise in the feature extractor to the OoD performance of the model by proving two theorems that demonstrate that reducing OoFS noise in the feature extractor is beneficial in achieving better representation. Moreover, we identify two causes of OoFS noise and prove that the OoFS noise induced by random initialization can be filtered out via L2 regularization. Finally, we propose a novel data-dependent regularizer that acts on the weights of the fully connected layer to reduce noise in the representations, thus implicitly forcing the feature extractor to focus on informative features and to rely less on noise via back-propagation. Experiments on synthetic datasets show that our method can learn hard-to-learn features; can filter out noise effectively; and outperforms GD, AdaGrad, and KFAC. Furthermore, experiments on the benchmark datasets show that our method achieves the best performance for three tasks among four. Full article
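
The snippet below is a speculative sketch, inspired by (but not taken from) the paper: it penalizes the component of the fully connected layer's weights that lies outside the subspace spanned by a batch of representations, which is one way to read the idea of suppressing "out-of-feature subspace" noise. The function name, the SVD-based subspace estimate, and the energy threshold are all assumptions.

```python
import torch

def out_of_feature_subspace_penalty(features, fc_weight, energy=0.99):
    """Illustrative data-dependent penalty on fully connected weights.

    Inspired by, but not identical to, the paper's idea: estimate the
    subspace spanned by the batch features via SVD and penalize the
    part of the classifier weights lying outside that subspace.

    features  : (batch, dim) representations from the feature extractor.
    fc_weight : (num_classes, dim) weight of the final linear layer.
    """
    # Orthonormal basis of the feature subspace (keep `energy` of the variance).
    feats = features - features.mean(dim=0, keepdim=True)
    U, S, Vh = torch.linalg.svd(feats, full_matrices=False)
    var = S ** 2
    k = int(torch.searchsorted(torch.cumsum(var, 0) / var.sum(), energy)) + 1
    basis = Vh[:k]                          # (k, dim)

    # Component of the weights orthogonal to the feature subspace.
    proj = fc_weight @ basis.T @ basis      # projection onto the subspace
    residual = fc_weight - proj
    return (residual ** 2).sum()

# Usage sketch: add the penalty to the task loss during training, e.g.
# loss = criterion(logits, labels) + lam * out_of_feature_subspace_penalty(feats, model.fc.weight)
```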

19 pages, 1464 KiB  
Article
Task-Covariant Representations for Few-Shot Learning on Remote Sensing Images
by Liyi Zhang, Zengguang Tian, Yi Tang and Zuo Jiang
Mathematics 2023, 11(8), 1930; https://doi.org/10.3390/math11081930 - 19 Apr 2023
Cited by 2 | Viewed by 1320
Abstract
In the regression and classification of remotely sensed images through meta-learning, techniques exploit task-invariant information to quickly adapt to new tasks with fewer gradient updates. Despite its usefulness, task-invariant information alone may not effectively capture task-specific knowledge, leading to reduced model performance on new tasks. As a result, the concept of task-covariance has gained significant attention from researchers. We propose task-covariant representations for few-shot learning on remote sensing images that utilize capsule networks to effectively represent the covariance relationships among objects. This approach is motivated by the superior ability of capsule networks to capture such relationships. To capture and leverage the covariance relations between tasks, we employ vector capsules and adapt our model parameters based on the newly learned task covariance relations. Our proposed meta-learning algorithm offers a novel approach to effectively address the real task distribution by incorporating both general and specific task information. Based on the experimental results, our proposed meta-learning algorithm shows a significant improvement in both average accuracy and training efficiency compared to the best model in the experiments. On average, the algorithm increases the accuracy by approximately 4% and improves the training efficiency by approximately 8%. Full article
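
For readers unfamiliar with vector capsules, the snippet below shows the standard "squash" non-linearity that gives capsule outputs their vector form (orientation encodes pose-like properties, length encodes presence); it illustrates the generic capsule primitive only, not the paper's full routing or meta-learning architecture.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Standard capsule 'squash' non-linearity: keeps the orientation of
    the capsule vector and maps its length into [0, 1), so the length can
    be read as the probability that the entity the capsule represents is
    present.  Generic primitive only, shown to illustrate vector capsules.
    """
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

# Example: 32 primary capsules of dimension 8 for one image.
primary = torch.randn(32, 8)
v = squash(primary)
print(v.norm(dim=-1).max())   # all capsule lengths are < 1
```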

14 pages, 5302 KiB  
Article
Low-Light Image Enhancement by Combining Transformer and Convolutional Neural Network
by Nianzeng Yuan, Xingyun Zhao, Bangyong Sun, Wenjia Han, Jiahai Tan, Tao Duan and Xiaomei Gao
Mathematics 2023, 11(7), 1657; https://doi.org/10.3390/math11071657 - 30 Mar 2023
Cited by 5 | Viewed by 2713
Abstract
In a low-light imaging environment, the insufficient light reflected from objects often results in unsatisfactory images with degradations such as low contrast, noise artifacts, or color distortion. The captured low-light images usually lead to poor visual perception quality for color-deficient or normal observers. To address the above problems, we propose an end-to-end low-light image enhancement network that combines a transformer and a CNN (convolutional neural network) to restore normal-light images. Specifically, the proposed enhancement network is designed as a U-shaped structure with several functional fusion blocks. Each fusion block includes a transformer stem and a CNN stem, and the two stems collaborate to accurately extract local and global features. In this way, the transformer stem is responsible for efficiently learning global semantic information and capturing long-term dependencies, while the CNN stem is good at learning local features and focusing on detailed features. Thus, the proposed enhancement network can accurately capture the comprehensive semantic information of low-light images, which significantly contributes to recovering normal-light images. The proposed method is compared with current popular algorithms quantitatively and qualitatively. Subjectively, our method significantly improves the image brightness, suppresses the image noise, and maintains the texture details and color information. For objective metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), image perceptual similarity (LPIPS), DeltaE, and NIQE, our method improves the optimal values by 1.73 dB, 0.05, 0.043, 0.7939, and 0.6906, respectively, compared with other methods. The experimental results show that our proposed method can effectively solve the problems of underexposure, noise interference, and color inconsistency in micro-optical images, and it has certain application value. Full article
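
The sketch below illustrates the general shape of a fusion block with a CNN stem for local detail and a transformer (self-attention) stem for global context, merged by a 1x1 convolution with a residual connection. Layer sizes, the fusion rule, and the class name are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Illustrative fusion block with a CNN stem (local detail) and a
    transformer stem (global context), loosely following the idea in the
    abstract; sizes and the fusion rule are assumptions."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        # CNN stem: local features.
        self.cnn_stem = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Transformer stem: global self-attention over spatial tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Simple fusion of the two streams.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.cnn_stem(x)

        tokens = x.flatten(2).transpose(1, 2)          # (b, h*w, c)
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)

        return self.fuse(torch.cat([local, global_feat], dim=1)) + x

# Quick shape check on a dummy low-light feature map.
block = FusionBlock(32)
print(block(torch.randn(2, 32, 16, 16)).shape)   # torch.Size([2, 32, 16, 16])
```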

14 pages, 6537 KiB  
Article
Meta-Learning for Zero-Shot Remote Sensing Image Super-Resolution
by Zhangzhao Cha, Dongmei Xu, Yi Tang and Zuo Jiang
Mathematics 2023, 11(7), 1653; https://doi.org/10.3390/math11071653 - 29 Mar 2023
Cited by 4 | Viewed by 2717
Abstract
Zero-shot super-resolution (ZSSR) has generated a lot of interest due to its flexibility in various applications. However, the computational demands of ZSSR make it ineffective when dealing with large-scale low-resolution image sets. To address this issue, we propose a novel meta-learning model. We treat the set of low-resolution images as a collection of ZSSR tasks and learn meta-knowledge about ZSSR by leveraging these tasks. This approach reduces the computational burden of super-resolution for large-scale low-resolution images. Additionally, through multiple ZSSR task learning, we uncover a general super-resolution model that enhances the generalization capacity of ZSSR. Finally, using the learned meta-knowledge, our model achieves impressive results with just a few gradient updates when given a novel task. We evaluate our method using two remote sensing datasets with varying spatial resolutions. Our experimental results demonstrate that using multiple ZSSR tasks yields better outcomes than a single task, and our method outperforms other state-of-the-art super-resolution methods. Full article
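
To make the "few gradient updates on a novel task" idea concrete, the sketch below uses a Reptile-style first-order meta-learning loop over single-image super-resolution tasks; the paper's exact meta-learning algorithm, network, and losses may differ, and the tiny network and fake tasks here are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def reptile_meta_train(model, tasks, inner_steps=5, inner_lr=1e-3, meta_lr=0.1, epochs=10):
    """Minimal Reptile-style meta-training loop over ZSSR-like tasks.

    First-order sketch of the general idea (learn an initialization that
    adapts to a new super-resolution task in a few gradient steps); not
    necessarily the algorithm used in the paper.  Each task is assumed to
    be an (input, target) tensor pair built from a single image.
    """
    for _ in range(epochs):
        for lr_img, hr_img in tasks:
            task_model = copy.deepcopy(model)
            opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)

            # Inner loop: adapt to this single-image SR task.
            for _ in range(inner_steps):
                loss = F.l1_loss(task_model(lr_img), hr_img)
                opt.zero_grad()
                loss.backward()
                opt.step()

            # Outer (meta) update: move the shared weights toward the
            # task-adapted weights.
            with torch.no_grad():
                for p, p_task in zip(model.parameters(), task_model.parameters()):
                    p += meta_lr * (p_task - p)
    return model

# Usage sketch with a tiny placeholder SR network and fake tasks.
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 3, 3, padding=1),
)
fake_tasks = [(torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)) for _ in range(4)]
reptile_meta_train(net, fake_tasks, epochs=1)
```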

16 pages, 20655 KiB  
Article
A Robust Sphere Detection in a Realsense Point Cloud by USING Z-Score and RANSAC
by Luis-Rogelio Roman-Rivera, Jesus Carlos Pedraza-Ortega, Marco Antonio Aceves-Fernandez, Juan Manuel Ramos-Arreguín, Efrén Gorrostieta-Hurtado and Saúl Tovar-Arriaga
Mathematics 2023, 11(4), 1023; https://doi.org/10.3390/math11041023 - 17 Feb 2023
Cited by 2 | Viewed by 1850
Abstract
Three-dimensional vision cameras, such as RGB-D, use 3D point clouds to represent scenes. File formats such as XYZ and PLY are commonly used to store 3D point information as raw data; this information does not contain further details, such as metadata or segmentation, for the different objects in the scene. Moreover, objects in the scene can be recognized in a subsequent process and used for other purposes, such as camera calibration or scene segmentation. We propose a method to recognize a basketball in the scene, using its known dimensions to fit a sphere formula. In the proposed cost function, we search for three different points in the scene using RANSAC (Random Sample Consensus). Furthermore, taking into account the fixed basketball size, our method differentiates the sphere geometry from other objects in the scene, making it robust in complex scenes. In a posterior step, the sphere center is fitted using z-score values, eliminating outliers from the sphere. The results show that our methodology converges in finding the basketball in the scene and that the center precision improves when using the z-score; the proposed method obtains a significant improvement, reducing outliers in scenes with noise by 1.75 to 8.3 times compared with using RANSAC alone. Experiments show that our method has advantages when compared with a novel deep learning method. Full article
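
The sketch below illustrates the overall pipeline described above: RANSAC samples of three points plus the known ball radius yield candidate sphere centers, inliers are counted against the radius, and the best center is then refined after rejecting residuals with large z-scores. The thresholds, iteration counts, and refinement step are assumptions, not the authors' exact cost function.

```python
import numpy as np

def sphere_centers_from_three_points(p1, p2, p3, radius):
    """Candidate centers of a sphere of known radius through three points.
    Returns zero, one, or two candidate centers (illustrative sketch)."""
    a, b = p2 - p1, p3 - p1
    n = np.cross(a, b)
    n_sq = n.dot(n)
    if n_sq < 1e-12:                      # degenerate (collinear) sample
        return []
    # Circumcenter of the triangle (p1, p2, p3).
    cc = p1 + (a.dot(a) * np.cross(b, n) + b.dot(b) * np.cross(n, a)) / (2 * n_sq)
    h_sq = radius ** 2 - np.sum((cc - p1) ** 2)
    if h_sq < 0:                          # points too far apart for this radius
        return []
    n_hat = n / np.sqrt(n_sq)
    return [cc + np.sqrt(h_sq) * n_hat, cc - np.sqrt(h_sq) * n_hat]

def ransac_sphere(points, radius, iters=500, tol=0.01, seed=0):
    """RANSAC search for a sphere of known radius in a point cloud,
    followed by z-score outlier rejection on the inlier residuals."""
    rng = np.random.default_rng(seed)
    best_center, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        for center in sphere_centers_from_three_points(*sample, radius):
            residuals = np.abs(np.linalg.norm(points - center, axis=1) - radius)
            inliers = residuals < tol
            if inliers.sum() > best_inliers.sum():
                best_center, best_inliers = center, inliers
    if best_center is None:
        return None

    # Refinement: drop inliers with large residual z-scores, then re-estimate
    # the center with a simple fixed-point iteration for a known-radius sphere.
    res = np.linalg.norm(points[best_inliers] - best_center, axis=1) - radius
    z = (res - res.mean()) / (res.std() + 1e-12)
    kept = points[best_inliers][np.abs(z) < 2.5]
    for _ in range(10):
        d = kept - best_center
        dist = np.linalg.norm(d, axis=1, keepdims=True)
        best_center = np.mean(kept - radius * d / np.maximum(dist, 1e-12), axis=0)
    return best_center

# Synthetic test: noisy basketball (radius 0.12 m) plus background clutter.
rng = np.random.default_rng(1)
dirs = rng.normal(size=(800, 3)); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
ball = np.array([0.3, -0.1, 1.5]) + 0.12 * dirs + rng.normal(0, 0.002, (800, 3))
clutter = rng.uniform(-1, 2, size=(400, 3))
print(ransac_sphere(np.vstack([ball, clutter]), radius=0.12))
```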

22 pages, 5222 KiB  
Article
Topological Regularization for Representation Learning via Persistent Homology
by Muyi Chen, Daling Wang, Shi Feng and Yifei Zhang
Mathematics 2023, 11(4), 1008; https://doi.org/10.3390/math11041008 - 16 Feb 2023
Cited by 1 | Viewed by 1753
Abstract
Generalization is challenging in small-sample-size regimes with over-parameterized deep neural networks, and a better representation is generally beneficial for generalization. In this paper, we present a novel method for controlling the internal representation of deep neural networks from a topological perspective. Leveraging the power of topology data analysis (TDA), we study the push-forward probability measure induced by the feature extractor, and we formulate a notion of “separation” to characterize a property of this measure in terms of persistent homology for the first time. Moreover, we perform a theoretical analysis of this property and prove that enforcing this property leads to better generalization. To impose this property, we propose a novel weight function to extract topological information, and we introduce a new regularizer including three items to guide the representation learning in a topology-aware manner. Experimental results in the point cloud optimization task show that our method is effective and powerful. Furthermore, results in the image classification task show that our method outperforms the previous methods by a significant margin. Full article

17 pages, 4609 KiB  
Article
Compression Reconstruction Network with Coordinated Self-Attention and Adaptive Gaussian Filtering Module
by Zhen Wei, Qiurong Yan, Xiaoqiang Lu, Yongjian Zheng, Shida Sun and Jian Lin
Mathematics 2023, 11(4), 847; https://doi.org/10.3390/math11040847 - 7 Feb 2023
Cited by 5 | Viewed by 1859
Abstract
Although compressed sensing theory has many advantages in image reconstruction, its reconstruction and sampling time is very long. Fast reconstruction of high-quality images at low measurement rates is the direction of the effort. Compressed sensing based on deep learning provides an effective solution for this. In this study, we propose an attention-based compression reconstruction mechanism (ACRM). The coordinated self-attention module (CSAM) is designed to be embedded in the main network consisting of convolutional blocks and utilizes the global space and channels to focus on key information and ignore irrelevant information. An adaptive Gaussian filter is proposed to solve the loss of multi-frequency components caused by global average pooling in the CSAM, effectively supplementing the network with different frequency information at different measurement rates. Finally, inspired by the basic idea of the attention mechanism, an improved loss function with attention mechanism (AMLoss) is proposed. Extensive experiments show that the ACRM outperforms most compression reconstruction algorithms at low measurement rates. Full article

21 pages, 4948 KiB  
Article
MFTransNet: A Multi-Modal Fusion with CNN-Transformer Network for Semantic Segmentation of HSR Remote Sensing Images
by Shumeng He, Houqun Yang, Xiaoying Zhang and Xuanyu Li
Mathematics 2023, 11(3), 722; https://doi.org/10.3390/math11030722 - 1 Feb 2023
Cited by 7 | Viewed by 3672
Abstract
Due to the inherent inter-class similarity and class imbalance of remote sensing images, it is difficult to obtain effective results in single-source semantic segmentation. We consider applying multi-modal data to the task of the semantic segmentation of HSR (high spatial resolution) remote sensing images, and obtain richer semantic information by data fusion to improve the accuracy and efficiency of segmentation. However, it is still a great challenge to discover how to achieve efficient and useful information complementarity based on multi-modal remote sensing image semantic segmentation, so we have to seriously examine the numerous models. Transformer has made remarkable progress in decreasing model complexity and improving scalability and training efficiency in computer vision tasks. Therefore, we introduce Transformer into multi-modal semantic segmentation. In order to cope with the issue that the Transformer model requires a large amount of computing resources, we propose a model, MFTransNet, which combines a CNN (convolutional neural network) and Transformer to realize a lightweight multi-modal semantic segmentation structure. To do this, a small convolutional network is first used for performing preliminary feature extraction. Subsequently, these features are sent to the multi-head feature fusion module to achieve adaptive feature fusion. Finally, the features of different scales are integrated together through a multi-scale decoder. The experimental results demonstrate that MFTransNet achieves the best balance among segmentation accuracy, memory-usage efficiency and inference speed. Full article

13 pages, 3181 KiB  
Article
Iterative Dual CNNs for Image Deblurring
by Jinbin Wang, Ziqi Wang and Aiping Yang
Mathematics 2022, 10(20), 3891; https://doi.org/10.3390/math10203891 - 20 Oct 2022
Cited by 4 | Viewed by 2437
Abstract
Image deblurring attracts research attention in the field of image processing and computer vision. Traditional deblurring methods based on statistical prior largely depend on the selected prior type, which limits their restoring ability. Moreover, the constructed deblurring model is difficult to solve, and the operation is comparatively complicated. Meanwhile, deep learning has become a hotspot in various fields in recent years. End-to-end convolutional neural networks (CNNs) can learn the pixel mapping relationships between degraded images and clear images. In addition, they can also obtain the result of effectively eliminating spatial variable blurring. However, conventional CNNs have some disadvantages in generalization ability and details of the restored image. Therefore, this paper presents an iterative dual CNN called IDC for image deblurring, where the task of image deblurring is divided into two sub-networks: deblurring and detail restoration. The deblurring sub-network adopts a U-Net structure to learn the semantical and structural features of the image, and the detail restoration sub-network utilizes a shallow and wide structure without downsampling, where only the image texture features are extracted. Finally, to obtain the deblurred image, this paper presents a multiscale iterative strategy that effectively improves the robustness and precision of the model. The experimental results showed that the proposed method has an excellent effect of deblurring on a real blurred image dataset and is suitable for various real application scenes. Full article

15 pages, 34888 KiB  
Article
Cattle Number Estimation on Smart Pasture Based on Multi-Scale Information Fusion
by Minyue Zhong, Yao Tan, Jie Li, Hongming Zhang and Siyi Yu
Mathematics 2022, 10(20), 3856; https://doi.org/10.3390/math10203856 - 18 Oct 2022
Cited by 4 | Viewed by 1723
Abstract
In order to solve the problem of intelligent management of cattle numbers in the pasture, a dataset for cattle density estimation was established, and a multi-scale residual cattle density estimation network was proposed to address the uneven distribution of cattle and the large scale variations caused by perspective changes within the same image. Multi-scale features are extracted by multiple parallel dilated convolutions with different dilation rates. Meanwhile, aiming at the “grid effect” caused by the use of dilated convolution, the residual structure is combined with a small-dilation-rate convolution to eliminate its influence. Experiments were carried out on the cattle dataset and a dense population dataset, respectively. The experimental results show that the proposed multi-scale residual cattle density estimation network achieves the lowest mean absolute error (MAE) and root mean square error (RMSE) on the cattle dataset compared with other density estimation methods. On ShanghaiTech, a dense population dataset, the density estimation results of the multi-scale residual network are also optimal or suboptimal in terms of MAE and RMSE. Full article
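
A minimal sketch of the multi-scale idea follows, assuming a block of parallel 3x3 dilated convolutions with different dilation rates plus a small-dilation residual path intended to counteract the gridding effect; the specific rates, channel widths, and output head are illustrative choices, not the network proposed in the paper.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Illustrative block with parallel dilated convolutions of different
    dilation rates plus a small-dilation residual branch, loosely following
    the multi-scale idea described in the abstract."""

    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        self.merge = nn.Conv2d(channels * len(rates), channels, 1)
        # Small-dilation residual path, intended to counteract gridding.
        self.residual = nn.Conv2d(channels, channels, 3, padding=1, dilation=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return self.act(self.merge(multi) + self.residual(x))

# Density-map style usage: features in, single-channel density map out.
head = nn.Sequential(MultiScaleDilatedBlock(64), nn.Conv2d(64, 1, 1))
print(head(torch.randn(1, 64, 90, 160)).shape)   # torch.Size([1, 1, 90, 160])
```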

16 pages, 10103 KiB  
Article
Printed Texture Guided Color Feature Fusion for Impressionism Style Rendering of Oil Paintings
by Jing Geng, Li’e Ma, Xiaoquan Li, Xin Zhang and Yijun Yan
Mathematics 2022, 10(19), 3700; https://doi.org/10.3390/math10193700 - 9 Oct 2022
Cited by 1 | Viewed by 2245
Abstract
As a major branch of Non-Photorealistic Rendering (NPR), image stylization mainly uses computer algorithms to render a photo into an artistic painting. Recent work has shown that the extraction of style information, such as the stroke texture and color of the target style image, is the key to image stylization. Given its stroke texture and color characteristics, a new stroke rendering method is proposed. By fully considering the tonal characteristics and the representative color of the original oil painting, it can fit the tone of the original oil painting into a stylized image whilst keeping the artist’s creative effect. The experiments have validated the efficacy of the proposed model in comparison to three state-of-the-art methods. This method would be more suitable for the works of pointillism painters with a relatively uniform style, especially for natural scenes; otherwise, the results can be less satisfactory. Full article
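
As a concrete reference point for this kind of tone matching, the sketch below applies classic Reinhard-style color-statistics transfer (matching per-channel mean and standard deviation in Lab space); it is a standard baseline technique shown only to make the idea tangible, not the stroke rendering method proposed in the paper.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def reinhard_tone_transfer(content_rgb, style_rgb):
    """Reinhard-style color-statistics transfer in Lab space: shift and
    scale each Lab channel of the content image so its mean and standard
    deviation match those of the style (oil-painting) image.

    Standard baseline shown for illustration only, not the paper's method.
    Inputs are float RGB images in [0, 1].
    """
    content, style = rgb2lab(content_rgb), rgb2lab(style_rgb)
    out = np.empty_like(content)
    for ch in range(3):
        c_mu, c_std = content[..., ch].mean(), content[..., ch].std() + 1e-6
        s_mu, s_std = style[..., ch].mean(), style[..., ch].std()
        out[..., ch] = (content[..., ch] - c_mu) * (s_std / c_std) + s_mu
    return np.clip(lab2rgb(out), 0.0, 1.0)

# Usage with random placeholder images.
photo = np.random.rand(120, 160, 3)
painting = np.random.rand(120, 160, 3)
stylized = reinhard_tone_transfer(photo, painting)
print(stylized.shape, stylized.min(), stylized.max())
```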

19 pages, 4252 KiB  
Article
SaMfENet: Self-Attention Based Multi-Scale Feature Fusion Coding and Edge Information Constraint Network for 6D Pose Estimation
by Zhuoxiao Li, Xiaobing Li, Shihao Chen, Jialong Du and Yong Li
Mathematics 2022, 10(19), 3671; https://doi.org/10.3390/math10193671 - 7 Oct 2022
Cited by 1 | Viewed by 2098
Abstract
Accurate estimation of an object’s 6D pose is one of the crucial technologies for robotic manipulators. This is especially true when the lighting conditions change or the object is occluded, resulting in the loss of, or interference with, object information, which makes accurate 6D pose estimation more challenging. To estimate the 6D pose of an object accurately, a self-attention-based multi-scale feature fusion coding and edge information constraint 6D pose estimation network is proposed, which can achieve accurate 6D pose estimation by employing RGB-D images. The proposed algorithm first introduces an edge reconstruction module into the pose estimation network, which improves the attention of the feature extraction network to edge features. Furthermore, a self-attention multi-scale point cloud feature extraction module, i.e., MSPNet, is proposed to extract point cloud geometric features, which are reconstructed from depth maps. Finally, a clustering feature encoding module, i.e., SE-NetVLAD, is proposed to encode multi-modal dense feature sequences to construct more expressive global features. The proposed method is evaluated on the LineMOD and YCB-Video datasets, and the experimental results illustrate that the proposed method has outstanding performance that is close to the current state-of-the-art methods. Full article

13 pages, 950 KiB  
Article
Residual-Prototype Generating Network for Generalized Zero-Shot Learning
by Zeqing Zhang, Xiaofan Li, Tai Ma, Zuodong Gao, Cuihua Li and Weiwei Lin
Mathematics 2022, 10(19), 3587; https://doi.org/10.3390/math10193587 - 1 Oct 2022
Cited by 3 | Viewed by 1790
Abstract
Conventional zero-shot learning aims to train a classifier on a training set (seen classes) to recognize instances of novel classes (unseen classes) by class-level semantic attributes. In generalized zero-shot learning (GZSL), the classifier needs to recognize both seen and unseen classes, which is a problem of extreme data imbalance. To solve this problem, feature generative methods have been proposed to make up for the lack of unseen classes. Current generative methods use class semantic attributes as the cues for synthetic visual features, which can be considered mapping of the semantic attribute to visual features. However, this mapping cannot effectively transfer knowledge learned from seen classes to unseen classes because the information in the semantic attributes and the information in visual features are asymmetric: semantic attributes contain key category description information, while visual features consist of visual information that cannot be represented by semantics. To this end, we propose a residual-prototype-generating network (RPGN) for GZSL that extracts the residual visual features from original visual features by an encoder–decoder and synthesizes the prototype visual features associated with semantic attributes by a disentangle regressor. Experimental results show that the proposed method achieves competitive results on four GZSL benchmark datasets with significant gains. Full article

16 pages, 4001 KiB  
Article
Efficient Smoke Detection Based on YOLO v5s
by Hang Yin, Mingxuan Chen, Wenting Fan, Yuxuan Jin, Shahbaz Gul Hassan and Shuangyin Liu
Mathematics 2022, 10(19), 3493; https://doi.org/10.3390/math10193493 - 25 Sep 2022
Cited by 13 | Viewed by 3233
Abstract
Smoke detection based on video surveillance is important for early fire warning. Because smoke is often small and thin in the early stage of a fire, using the collected smoke images for the identification and early warning of fires is very difficult. Therefore, an improved lightweight network that combines an attention mechanism and an improved upsampling algorithm is proposed to solve the problem of small and thin smoke in the early fire stage. Firstly, the dataset consists of self-created small and thin smoke pictures and public smoke pictures. Secondly, an attention mechanism module combining channel and spatial attention, which are both attributes of pictures, is proposed to solve the small and thin smoke detection problem. Thirdly, to increase the receptive field of the smoke feature map in the feature fusion network and to solve the problem caused by different smoke scenes, the original upsampling has been replaced with an improved upsampling algorithm. Finally, extensive comparative experiments on the dataset show that the improved detection model achieves excellent results. Full article
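
The sketch below shows a generic module that combines channel and spatial attention in the spirit of CBAM-style blocks, which is the kind of component the abstract describes; the reduction ratio, kernel size, and class name are assumptions, and the module actually used in the paper may differ.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Illustrative attention module combining channel and spatial
    attention (CBAM-style); details may differ from the paper's module."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite channels.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial attention: convolve pooled channel maps.
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.channel_mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)

        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))

# Example: refine a 128-channel feature map from a detector neck.
attn = ChannelSpatialAttention(128)
print(attn(torch.randn(1, 128, 40, 40)).shape)   # torch.Size([1, 128, 40, 40])
```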

12 pages, 779 KiB  
Article
Region Collaborative Network for Detection-Based Vision-Language Understanding
by Linyan Li, Kaile Du, Minming Gu, Fuyuan Hu and Fan Lyu
Mathematics 2022, 10(17), 3110; https://doi.org/10.3390/math10173110 - 30 Aug 2022
Viewed by 1601
Abstract
Given a query language, a Detection-based Vision-Language Understanding (DVLU) system needs to respond based on the detected regions (i.e., bounding boxes). With the significant advancement in object detection, DVLU has witnessed great improvements in recent years, such as in Visual Question Answering (VQA) and Visual Grounding (VG). However, existing DVLU methods always process each detected image region separately but ignore that the regions form an integral whole. Without full consideration of each region’s context, the understanding of the image may be biased. In this paper, to solve this problem, a simple yet effective Region Collaborative Network (RCN) block is proposed to bridge the gap between independent regions and the integrative DVLU task. Specifically, the Intra-Region Relations (IntraRR) inside each detected region are computed by a position-wise and channel-wise joint non-local model. Then, the Inter-Region Relations (InterRR) across all the detected regions are computed by pooling and sharing parameters with IntraRR. The proposed RCN can enhance the features of each region by using information from all other regions and guarantees the dimension consistency between input and output. The RCN is evaluated on VQA and VG, and the experimental results show that our method can significantly improve the performance of existing DVLU models. Full article
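
For readers unfamiliar with non-local models, the sketch below implements a generic position-wise non-local block in which every spatial location of a region's feature map attends to every other location; the channel-wise variant and the inter-region pooling of the proposed RCN are omitted, so this is only an illustrative building block, not the paper's module.

```python
import torch
import torch.nn as nn

class PositionWiseNonLocal(nn.Module):
    """Generic position-wise non-local block: each spatial position of a
    region feature map attends to every other position.  Illustrative
    building block only; the paper's RCN adds channel-wise relations and
    inter-region sharing, which are not shown here."""

    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.theta = nn.Conv2d(channels, inner, 1)
        self.phi = nn.Conv2d(channels, inner, 1)
        self.g = nn.Conv2d(channels, inner, 1)
        self.out = nn.Conv2d(inner, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        k = self.phi(x).flatten(2)                          # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)            # (b, hw, c')
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                              # residual connection

# Example: enhance a 7x7 pooled feature map of one detected region.
block = PositionWiseNonLocal(256)
print(block(torch.randn(8, 256, 7, 7)).shape)   # torch.Size([8, 256, 7, 7])
```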

19 pages, 1512 KiB  
Article
Multimodal Image Aesthetic Prediction with Missing Modality
by Xiaodan Zhang, Qiao Song and Gang Liu
Mathematics 2022, 10(13), 2312; https://doi.org/10.3390/math10132312 - 1 Jul 2022
Cited by 2 | Viewed by 2281
Abstract
With the increasing growth of multimedia data on the Internet, multimodal image aesthetic assessment has attracted a great deal of attention in the image processing community. However, traditional multimodal methods often have the following two problems: (1) Existing multimodal image aesthetic methods are based on the assumption that full modalities are available in all samples, which is inapplicable in most cases since textual information is more difficult to obtain. (2) They only fuse multimodal information at a single level and ignore their interaction at different levels. To address these two challenges, we propose a novel framework termed Missing-Modality-Multimodal-BERT networks (MMMB). To achieve completeness, we first generate the missing textual modality conditioned on the available visual modality. We then project the image features into the token space of the text and use the transformer’s self-attention mechanism to make the two different modalities interact at different levels for earlier and more fine-grained fusion, rather than only at the final layer. A large number of experiments on two large benchmark datasets in the field of image aesthetic quality evaluation, AVA and Photo.net, demonstrate that the proposed model significantly improves image aesthetic assessment performance under both the textual missing-modality condition and the full-modality condition. Full article
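
A minimal sketch of the fusion idea described above follows, assuming image region features are linearly projected into the text token space and a transformer encoder attends over the mixed sequence; the dimensions, the aesthetic-score head, and the missing-text generator are placeholders rather than the MMMB architecture.

```python
import torch
import torch.nn as nn

class ImageToTokenFusion(nn.Module):
    """Illustrative sketch: project regional image features into the text
    token embedding space and let a transformer encoder attend over the
    mixed sequence.  All sizes and heads are placeholder assumptions."""

    def __init__(self, img_dim=2048, token_dim=768, num_layers=2, num_heads=8):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, token_dim)   # image -> token space
        layer = nn.TransformerEncoderLayer(token_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.score_head = nn.Linear(token_dim, 1)       # aesthetic score

    def forward(self, img_feats, text_embeds):
        # img_feats: (b, n_regions, img_dim); text_embeds: (b, n_tokens, token_dim).
        tokens = torch.cat([self.img_proj(img_feats), text_embeds], dim=1)
        fused = self.encoder(tokens)
        return self.score_head(fused.mean(dim=1)).squeeze(-1)

# Usage with placeholder features (e.g., detector regions + text embeddings).
model = ImageToTokenFusion()
score = model(torch.randn(4, 36, 2048), torch.randn(4, 20, 768))
print(score.shape)   # torch.Size([4])
```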

15 pages, 7865 KiB  
Article
Reduced Calibration Strategy Using a Basketball for RGB-D Cameras
by Luis-Rogelio Roman-Rivera, Israel Sotelo-Rodríguez, Jesus Carlos Pedraza-Ortega, Marco Antonio Aceves-Fernandez, Juan Manuel Ramos-Arreguín and Efrén Gorrostieta-Hurtado
Mathematics 2022, 10(12), 2085; https://doi.org/10.3390/math10122085 - 16 Jun 2022
Cited by 3 | Viewed by 1721
Abstract
RGB-D cameras produce depth and color information commonly used in the 3D reconstruction and computer vision areas. Different cameras of the same model usually produce images with different calibration errors. The color and depth layers usually require calibration to minimize alignment errors, adjust precision, and improve data quality in general. Standard calibration protocols for RGB-D cameras require a controlled environment that allows operators to take many RGB and depth image pairs as input for calibration frameworks, making the calibration protocol challenging to implement without ideal conditions and operator experience. In this work, we propose a novel strategy that simplifies the calibration protocol by requiring fewer images than other methods. Our strategy uses an ordinary object, a basketball of known size, as a ground-truth sphere geometry during the calibration. Our experiments show results comparable to a reference method for aligning the color and depth image layers, while requiring fewer images and tolerating non-ideal scene conditions. Full article
