New Insights into Computer Vision and Graphics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 October 2024 | Viewed by 2314

Special Issue Editor


Dr. Yuanyuan Liu
Guest Editor
Department of Information Engineering, China University of Geosciences, Wuhan 430075, China
Interests: computer vision; deep learning; image and video understanding

Special Issue Information

Dear Colleagues,

Application trends, device technologies, and the blurring of boundaries between disciplines are propelling information technology forward. This poses new challenges for the study of visual computing-based interactive graphics processing technology. Therefore, this Special Issue intends to present new ideas and experimental findings in the field of computer vision and graphics, from design, services, and theory to applications.

Computer vision and graphics focus on the computational processing and applications of visual data. Areas relevant to computer vision and graphics include, but are not limited to, robotics, medical imaging, security and surveillance, gaming and entertainment, education and training, art and design, and environmental monitoring. Topics of interest include high-speed processing techniques and real-time performance, the development and refinement of deep learning techniques for computer vision and graphics applications, and explainable AI techniques that improve the transparency and interpretability of AI models.

This Special Issue will publish high-quality, original research papers in overlapping fields, including the following:

  • Image processing/analysis;
  • Computer vision theory and application;
  • Video and audio encoding;
  • Motion detection and tracking;
  • Reconstruction and representation;
  • Facial and hand gesture recognition;
  • Rendering techniques;
  • Matching, inference, and recognition;
  • Geometric modeling;
  • 3D vision;
  • Graph-based learning and applications.

Dr. Yuanyuan Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing/analysis
  • computer vision theory and application
  • video and audio encoding
  • motion detection and tracking
  • reconstruction and representation
  • facial and hand gesture recognition
  • rendering techniques
  • matching, inference, and recognition
  • geometric modeling
  • 3D vision
  • graph-based learning and applications

Published Papers (3 papers)


Research

18 pages, 1493 KiB  
Article
Hypergraph Position Attention Convolution Networks for 3D Point Cloud Segmentation
by Yanpeng Rong, Liping Nong, Zichen Liang, Zhuocheng Huang, Jie Peng and Yiping Huang
Appl. Sci. 2024, 14(8), 3526; https://doi.org/10.3390/app14083526 - 22 Apr 2024
Viewed by 415
Abstract
Point cloud segmentation, as the basis for 3D scene understanding and analysis, has made significant progress in recent years. Graph-based modeling and learning methods have played an important role in point cloud segmentation. However, due to the inherent complexity of point cloud data, it is difficult to capture higher-order and complex features of 3D data using graph learning methods. In addition, how to quickly and efficiently extract important features from point clouds also poses a great challenge to current research. To address these challenges, we propose a new framework, called hypergraph position attention convolution networks (HGPAT), for point cloud segmentation. Firstly, we use a hypergraph to model the higher-order relationships within the point cloud. Secondly, in order to effectively learn the feature information of point cloud data, a hyperedge position attention convolution module is proposed, which utilizes the hyperedge–hyperedge propagation pattern to extract and aggregate more important features. Finally, we design a ResNet-like module to reduce the computational complexity of the network and improve its efficiency. We conducted point cloud segmentation experiments on the ShapeNet Part and S3DIS datasets, and the experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art ones.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
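
To make the hypergraph convolution idea above concrete, the following is a minimal PyTorch sketch of attention-weighted aggregation over hyperedges, given a point feature matrix and a 0/1 incidence matrix. It is an illustration only, not the authors' HGPAT code: the class name HyperedgeAttentionConv, the incidence-matrix interface, and the toy data are assumptions, and it performs a simple point-to-hyperedge attention step rather than the paper's hyperedge–hyperedge propagation pattern.

```python
# Illustrative sketch (not the authors' code): attention-weighted aggregation of
# point features over hyperedges, the basic operation behind hypergraph convolutions.
# Hyperedges are given as a 0/1 incidence matrix H (n_points x n_edges);
# all names here are hypothetical.
import torch
import torch.nn as nn

class HyperedgeAttentionConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)   # per-point feature transform
        self.attn = nn.Linear(2 * out_dim, 1)    # scores a (point, hyperedge) pair

    def forward(self, x, H):
        # x: (n_points, in_dim) point features; H: (n_points, n_edges) incidence
        x = self.proj(x)
        # hyperedge features = mean of member point features
        deg_e = H.sum(dim=0).clamp(min=1)                    # (n_edges,)
        edge_feat = (H.t() @ x) / deg_e.unsqueeze(1)         # (n_edges, out_dim)
        # attention between each point and each hyperedge it belongs to
        scores = self.attn(torch.cat([
            x.unsqueeze(1).expand(-1, H.shape[1], -1),
            edge_feat.unsqueeze(0).expand(H.shape[0], -1, -1)], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(H == 0, float('-inf'))
        weights = torch.softmax(scores, dim=1)
        weights = torch.nan_to_num(weights)                  # points in no hyperedge
        # aggregate hyperedge features back to points
        return weights @ edge_feat

# toy usage: 128 points with 16-dim features, 8 random hyperedges
x = torch.randn(128, 16)
H = (torch.rand(128, 8) > 0.7).float()
out = HyperedgeAttentionConv(16, 32)(x, H)
print(out.shape)  # torch.Size([128, 32])
```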

13 pages, 12039 KiB  
Article
Camera Path Generation for Triangular Mesh Using Toroidal Patches
by Jinyoung Choi, Kangmin Kim, Seongil Kim, Minseok Kim, Taekgwan Nam and Youngjin Park
Appl. Sci. 2024, 14(2), 490; https://doi.org/10.3390/app14020490 - 5 Jan 2024
Viewed by 637
Abstract
Triangular mesh data structures are fundamental in computer graphics, serving as the foundation for many 3D models. To effectively utilize these 3D models across diverse industries, it is important to understand a model's overall shape and geometric features thoroughly. In this work, we introduce a novel method for generating camera paths that emphasize the model's local geometric characteristics. The method uses a toroidal patch-based spatial data structure that approximates the mesh's faces within a predetermined tolerance ϵ, encapsulating their geometric intricacies. This facilitates the determination of the camera position and gaze path, ensuring the mesh's key characteristics are captured. During path construction, we create a bounding cylinder for the mesh, project the mesh's faces and associated toroidal patches onto the cylinder's lateral surface, and sequentially select the grid cells of the cylinder containing the highest number of toroidal patches as we traverse the lateral surface. The centers of the selected grid cells are used as control points for a periodic B-spline curve, which serves as our foundational path. After the initial curve is generated, we derive the camera position and gaze paths from it by applying scaling factors to ensure a uniform camera amplitude. We applied our method to ten triangular mesh models, demonstrating its effectiveness and adaptability across various mesh configurations.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
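
The path-construction steps summarized in the abstract (bounding cylinder, projection, grid voting, periodic B-spline) can be sketched roughly as follows. This is a simplified approximation under stated assumptions, not the authors' method: face centers stand in for the toroidal patches, the projection is a plain cylindrical unwrap, SciPy's splprep/splev provide the periodic B-spline, and the function name and grid resolutions are made up for illustration.

```python
# Rough sketch: vote on a cylindrical grid with projected face centers, then fit a
# periodic B-spline through the winning cell centers as a camera path.
import numpy as np
from scipy.interpolate import splprep, splev

def camera_path(face_centers, n_theta=36, n_z=8, n_samples=200):
    c = face_centers - face_centers.mean(axis=0)            # center the model
    theta = np.arctan2(c[:, 1], c[:, 0])                    # angle around the axis
    z = c[:, 2]
    z_range = np.ptp(z) + 1e-9
    radius = np.linalg.norm(c[:, :2], axis=1).max() * 1.5   # bounding cylinder radius
    t_bin = np.clip(((theta + np.pi) / (2 * np.pi) * n_theta).astype(int), 0, n_theta - 1)
    z_bin = np.clip(((z - z.min()) / z_range * n_z).astype(int), 0, n_z - 1)

    # for each angular slice, pick the height cell with the most projected faces
    control = []
    for t in range(n_theta):
        counts = np.bincount(z_bin[t_bin == t], minlength=n_z)
        zc = z.min() + (counts.argmax() + 0.5) / n_z * z_range
        ang = -np.pi + (t + 0.5) / n_theta * 2 * np.pi
        control.append([radius * np.cos(ang), radius * np.sin(ang), zc])
    control.append(control[0])                 # close the loop for a periodic spline
    control = np.array(control).T              # shape (3, n_theta + 1)

    tck, _ = splprep(control, s=0, per=1)      # periodic B-spline through the cell centers
    u = np.linspace(0, 1, n_samples, endpoint=False)
    return np.array(splev(u, tck)).T           # (n_samples, 3) camera positions

# toy usage: random "face centers" on a unit sphere
pts = np.random.randn(5000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
path = camera_path(pts)
print(path.shape)  # (200, 3)
```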

28 pages, 4448 KiB  
Article
ED2IF2-Net: Learning Disentangled Deformed Implicit Fields and Enhanced Displacement Fields from Single Images Using Pyramid Vision Transformer
by Xiaoqiang Zhu, Xinsheng Yao, Junjie Zhang, Mengyao Zhu, Lihua You, Xiaosong Yang, Jianjun Zhang, He Zhao and Dan Zeng
Appl. Sci. 2023, 13(13), 7577; https://doi.org/10.3390/app13137577 - 27 Jun 2023
Viewed by 857
Abstract
Substantial research has emerged on single-view 3D reconstruction, and the majority of state-of-the-art implicit methods employ CNNs as the backbone network. On the other hand, transformers have shown remarkable performance in many vision tasks. However, it is still unknown whether transformers are suitable for single-view implicit 3D reconstruction. In this paper, we propose the first end-to-end single-view 3D reconstruction network based on the Pyramid Vision Transformer (PVT), called ED2IF2-Net, which disentangles the reconstruction of an implicit field into the reconstruction of topological structures and the recovery of surface details to achieve high-fidelity shape reconstruction. ED2IF2-Net uses a Pyramid Vision Transformer encoder to extract multi-scale hierarchical local features and a global vector from the input single image, which are fed into three separate decoders. A coarse shape decoder reconstructs a coarse implicit field based on the global vector, a deformation decoder iteratively refines the coarse implicit field using the pixel-aligned local features to obtain a deformed implicit field through multiple implicit field deformation blocks (IFDBs), and a surface detail decoder predicts an enhanced displacement field using the local features with hybrid attention modules (HAMs). The final output is a fusion of the deformed implicit field and the enhanced displacement field, with four loss terms applied to reconstruct the coarse implicit field, structure details through a novel deformation loss, the overall shape after fusion, and surface details via a Laplacian loss. The quantitative results obtained on the ShapeNet dataset validate the exceptional performance of ED2IF2-Net. Notably, ED2IF2-Net-L stands out as the top-performing variant, exhibiting the best mean IoU, CD, EMD, ECD-3D, and ECD-2D scores, reaching values of 61.1, 7.26, 2.51, 6.08, and 1.84, respectively. The extensive experimental evaluations consistently demonstrate the state-of-the-art capabilities of ED2IF2-Net in reconstructing topological structures and recovering surface details, all while maintaining competitive inference time.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
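
A schematic of the encoder-decoder data flow described in the abstract is sketched below, purely to illustrate how a global code, pixel-aligned local features, and a fused implicit field fit together. All module names are hypothetical, a tiny convolutional encoder stands in for the Pyramid Vision Transformer backbone, each decoder is collapsed to a small MLP, the projection used for pixel-aligned features is assumed orthographic, and the fusion is modeled as simple addition of the deformed implicit field and the displacement field, which may differ from the paper's actual design.

```python
# Schematic sketch only: global code -> coarse field, local features -> deformation
# and displacement, final field = fused result at query points. Names are assumptions.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

class SingleViewImplicitNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                # placeholder for the PVT backbone
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        self.coarse = mlp(feat_dim + 3, 1)           # global code + query -> coarse field
        self.deform = mlp(feat_dim + 3, 1)           # local feature + query -> residual
        self.displace = mlp(feat_dim + 3, 1)         # local feature + query -> detail offset

    def forward(self, image, points):
        # image: (B, 3, H, W); points: (B, N, 3) query points in [-1, 1]^3
        fmap = self.encoder(image)                   # (B, C, h, w) local feature map
        g = fmap.mean(dim=(2, 3))                    # global vector
        # pixel-aligned features: sample the map at each point's projected (x, y)
        grid = points[:, :, None, :2]                # orthographic projection assumption
        local = nn.functional.grid_sample(fmap, grid, align_corners=True).squeeze(-1).transpose(1, 2)
        gq = torch.cat([g[:, None, :].expand(-1, points.shape[1], -1), points], dim=-1)
        lq = torch.cat([local, points], dim=-1)
        coarse = self.coarse(gq)                     # coarse implicit field
        deformed = coarse + self.deform(lq)          # deformed implicit field
        return deformed + self.displace(lq)          # fuse with the displacement field

net = SingleViewImplicitNet()
sdf = net(torch.randn(2, 3, 64, 64), torch.rand(2, 1024, 3) * 2 - 1)
print(sdf.shape)  # torch.Size([2, 1024, 1])
```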
