3D Image Processing: Progress and Challenges

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Image and Video Processing".

Deadline for manuscript submissions: closed (28 February 2026) | Viewed by 6685

Special Issue Editor


Dr. Chinthaka Dinesh
Guest Editor
Department of Mechanical and Industrial Engineering, College of Engineering, Northeastern University, Vancouver, BC V6B 1Z3, Canada
Interests: graph signal processing; graph neural networks; image processing; 3D point cloud processing

Special Issue Information

Dear Colleagues,

3D image processing is becoming increasingly transformative across a wide range of application domains. Although initially developed for computer graphics, 3D imaging is now indispensable in fields such as autonomous driving, medical diagnostics, virtual and augmented reality, robotics, geospatial analysis, and industrial inspection. The widespread availability of affordable 3D sensors—such as LiDAR, depth cameras, and photogrammetry-based systems—has created a growing demand for scalable and robust 3D data-processing pipelines.

However, the unique characteristics of 3D data—such as irregular sampling, high dimensionality, sparsity, and sensitivity to noise—pose significant challenges in acquisition, sampling, restoration, segmentation, compression, and semantic understanding. Two complementary paradigms are proving particularly effective in addressing these issues: the model-based framework of Graph Signal Processing (GSP) and the data-driven approach of Graph Neural Networks (GNNs). Both are well-suited to modeling non-Euclidean structures and capturing the underlying geometric relationships present in 3D data.
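
As a rough illustration of the model-based GSP view mentioned above, the following sketch denoises a point cloud with a graph-Laplacian regularizer. It is a minimal example under stated assumptions, not a method taken from any paper in this issue: the function names (`knn_laplacian`, `gsp_denoise`, `smoothness`), the unit edge weights, and the parameters `k` and `tau` are all hypothetical choices made for illustration.

```python
import numpy as np

def knn_laplacian(points: np.ndarray, k: int = 4) -> np.ndarray:
    """Combinatorial Laplacian L = D - W of a symmetrized k-NN graph.

    Assumption: unit edge weights; a real pipeline might use Gaussian weights.
    """
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff * diff, axis=-1)
    np.fill_diagonal(dist2, np.inf)           # exclude self-matches
    nbrs = np.argsort(dist2, axis=1)[:, :k]   # k nearest neighbours per point
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def gsp_denoise(points: np.ndarray, L: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Minimize ||X - Y||_F^2 + tau * tr(X^T L X); closed form (I + tau L) X = Y."""
    n = points.shape[0]
    return np.linalg.solve(np.eye(n) + tau * L, points)

def smoothness(points: np.ndarray, L: np.ndarray) -> float:
    """Graph total variation tr(X^T L X): small when neighbours have similar coordinates."""
    return float(np.trace(points.T @ L @ points))
```

Calling `gsp_denoise(noisy, knn_laplacian(noisy))` on an N x 3 array returns an array of the same shape. Because the Laplacian is positive semidefinite, the result is never less smooth on the graph than the input, and the centroid is preserved. A data-driven GNN would instead learn the filter from examples.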

To enhance interpretability while reducing reliance on large annotated datasets, researchers are increasingly exploring hybrid methods that integrate model-based priors with data-driven learning. These approaches offer efficient, interpretable, and generalizable solutions with fewer parameters. In parallel, advances in Large Language Models (LLMs) and multimodal foundation models are opening new avenues for cross-modal 3D understanding, including scene captioning, spatial reasoning, and task planning in 3D environments.

We are also witnessing rapid progress in novel 3D rendering techniques, such as 3D Gaussian Splatting, which enable photorealistic and efficient visualization of neural radiance fields and point clouds. Moreover, as autonomous agents and embodied AI systems become more prevalent, there is a pressing need for machine-optimized 3D coding schemes—beyond traditional human-centric visualization—to support real-time analytics, compression, and semantic understanding of 3D data by machines.

Notably, 3D imaging in the medical domain has seen rapid development, driven by advancements in modalities such as CT, MRI, and ultrasound. Recent research focuses on leveraging deep learning and graph-based methods for 3D tumor segmentation, organ reconstruction, surgical navigation, and cross-modal fusion of volumetric and surface data. These applications highlight the growing importance of accurate, efficient, and interpretable 3D models to support clinical workflows and decision-making.

This Special Issue welcomes high-quality contributions on topics including, but not limited to, 3D point cloud sampling and restoration, GSP- and GNN-based models, model-based deep learning, LLM-guided 3D analytics, advanced rendering methods like Gaussian splatting, and machine-optimized 3D coding. Interdisciplinary applications in healthcare, smart cities, robotics, and digital twins are particularly encouraged.

We invite submissions that address current challenges while proposing visionary concepts to shape the future roadmap of 3D image processing research.

Dr. Chinthaka Dinesh
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • 3D image processing
  • point cloud restoration
  • graph signal processing
  • graph neural networks
  • model-based deep learning
  • 3D Gaussian splatting
  • large language models for 3D vision
  • 3D scene understanding
  • machine-optimized 3D coding
  • multimodal AI in 3D vision
  • 3D medical image segmentation
  • volumetric medical imaging

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

16 pages, 5436 KB  
Article
Self-Supervised Text-Driven Point Cloud Upsampling via Semantic Text Guidance
by Zhiyong Zhang, Meiling Qiu, Shuo Chen, Ruyu Liu, Jianhua Zhang and Shengyong Chen
J. Imaging 2026, 12(5), 204; https://doi.org/10.3390/jimaging12050204 - 11 May 2026
Viewed by 276
Abstract
Point cloud upsampling is a fundamental task in 3D vision, yet most existing methods adopt a global and uniform strategy, which is computationally inefficient and fails to address the need for region-specific refinement. To address this challenge, we propose PartSPUNet, a novel self-supervised, text-driven point cloud upsampling framework designed to enhance robotic perception through task-oriented local refinement. Inspired by the human cognitive process where high-level language instructions guide visual attention to specific regions of interest, our method allows an operator to use intuitive natural language prompts to direct the upsampling process. Specifically, PartSPUNet leverages a pretrained vision–language model to zero-shot localize the user-specified semantic part within a sparse point cloud. It then performs geometry-aware densification exclusively on this target region, recovering rich geometric details while preserving the global structure. Experimental results demonstrate that our approach significantly outperforms existing methods in reconstructing specified areas, offering a powerful and intuitive tool for enhancing the 3D perception pipeline in intelligent robotic systems.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

26 pages, 1315 KB  
Article
SFD-ADNet: Spatial–Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation
by Jiacheng Bao, Lingjun Kong and Wenju Wang
J. Imaging 2026, 12(2), 58; https://doi.org/10.3390/jimaging12020058 - 26 Jan 2026
Cited by 1 | Viewed by 677
Abstract
Existing 3D point cloud enhancement methods typically rely on artificially designed geometric transformations or local blending strategies, which are prone to introducing illogical deformations, struggle to preserve global structure, and exhibit insufficient adaptability to diverse degradation patterns. To address these limitations, this paper proposes SFD-ADNet, an adaptive deformation framework based on a dual spatial–frequency domain. It achieves 3D point cloud augmentation by explicitly learning deformation parameters rather than applying predefined perturbations. By jointly modeling spatial structural dependencies and spectral features, SFD-ADNet generates augmented samples that are both structurally aware and task-relevant. In the spatial domain, a hierarchical sequence encoder coupled with a bidirectional Mamba-based deformation predictor captures long-range geometric dependencies and local structural variations, enabling adaptive position-aware deformation control. In the frequency domain, a multi-scale dual-channel mechanism based on adaptive Chebyshev polynomials separates low-frequency structural components from high-frequency details, allowing the model to suppress noise-sensitive distortions while preserving the global geometric skeleton. The two deformation predictions dynamically fuse to balance structural fidelity and sample diversity. Extensive experiments conducted on ModelNet40-C and ScanObjectNN-C involved synthetic CAD models and real-world scanned point clouds under diverse perturbation conditions. SFD-ADNet, as a universal augmentation module, reduces the mCE metrics of PointNet++ and different backbone networks by over 20%. Experiments demonstrate that SFD-ADNet achieves state-of-the-art robustness while preserving critical geometric structures. Furthermore, models enhanced by SFD-ADNet demonstrate consistently improved robustness against diverse point cloud attacks, validating the efficacy of adaptive space-frequency deformation in robust point cloud learning.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

14 pages, 1003 KB  
Article
Use of Patient-Specific 3D Models in Paediatric Surgery: Effect on Communication and Surgical Management
by Cécile O. Muller, Lydia Helbling, Theodoros Xydias, Jeanette Greiner, Valérie Oesch, Henrik Köhler, Tim Ohletz and Jatta Berberat
J. Imaging 2026, 12(2), 56; https://doi.org/10.3390/jimaging12020056 - 26 Jan 2026
Viewed by 842
Abstract
Children with rare tumours and malformations may benefit from innovative imaging, including patient-specific 3D models that can enhance communication and surgical planning. The primary aim was to evaluate the impact of patient-specific 3D models on communication with families. The secondary aims were to assess their influence on medical management and to establish an efficient post-processing workflow. From 2021 to 2024, we prospectively included patients aged 3 months to 18 years with rare tumours or malformations. Families completed questionnaires before and after the presentation of a 3D model generated from MRI sequences, including peripheral nerve tractography. Treating physicians completed a separate questionnaire before surgical planning. Analyses were performed in R. Among 21 patients, diagnoses included 11 tumours, 8 malformations, 1 trauma, and 1 pancreatic pseudo-cyst. Likert scale responses showed improved family understanding after viewing the 3D model (mean score 3.94 to 4.67) and a high overall evaluation (mean 4.61). Physicians also rated the models positively. An efficient image post-processing workflow was defined. Although manual 3D reconstruction remains time-consuming, these preliminary results show that colourful, patient-specific 3D models substantially improve family communication and support clinical decision-making. They also highlight the need to support the development of MRI-based automated segmentation software using deep neural networks that is clinically approved and usable in routine practice.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

16 pages, 5440 KB  
Article
Pov9D: Point Cloud-Based Open-Vocabulary 9D Object Pose Estimation
by Tianfu Wang and Hongguang Wang
J. Imaging 2025, 11(11), 380; https://doi.org/10.3390/jimaging11110380 - 28 Oct 2025
Cited by 1 | Viewed by 1240
Abstract
We propose a point cloud-based framework for open-vocabulary object pose estimation, called Pov9D. Existing approaches are predominantly RGB-based and often rely on texture or appearance cues, making them susceptible to pose ambiguities when objects are textureless or lack distinctive visual features. In contrast, Pov9D takes 3D point clouds as input, enabling direct access to geometric structures that are essential for accurate and robust pose estimation, especially in open-vocabulary settings. To bridge the gap between geometric observations and semantic understanding, Pov9D integrates category-level textual descriptions to guide the estimation process. To this end, we introduce a text-conditioned shape prior generator that predicts a normalized object shape from both the observed point cloud and the textual category description. This shape prior provides a consistent geometric reference, facilitating precise prediction of object translation, rotation, and size, even for unseen categories. Extensive experiments on the OO3D-9D benchmark demonstrate that Pov9D achieves state-of-the-art performance, improving Abs IoU@50 by 7.2% and Rel 10° 10 cm by 27.2% over OV9D.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

20 pages, 14512 KB  
Article
Dual-Attention-Based Block Matching for Dynamic Point Cloud Compression
by Longhua Sun, Yingrui Wang and Qing Zhu
J. Imaging 2025, 11(10), 332; https://doi.org/10.3390/jimaging11100332 - 25 Sep 2025
Cited by 1 | Viewed by 1045
Abstract
The irregular and highly non-uniform spatial distribution inherent to dynamic three-dimensional (3D) point clouds (DPCs) severely hampers the extraction of reliable temporal context, rendering inter-frame compression a formidable challenge. Inspired by two-dimensional (2D) image and video compression methods, existing approaches attempt to model the temporal dependence of DPCs through a motion estimation/motion compensation (ME/MC) framework. However, these approaches represent only preliminary applications of this framework; point consistency between adjacent frames is insufficiently explored, and temporal correlation requires further investigation. To address this limitation, we propose a hierarchical ME/MC framework that adaptively selects the granularity of the estimated motion field, thereby ensuring a fine-grained inter-frame prediction process. To further enhance motion estimation accuracy, we introduce a dual-attention-based KNN block-matching (DA-KBM) network. This network employs a bidirectional attention mechanism to more precisely measure the correlation between points, using closely correlated points to predict inter-frame motion vectors and thereby improve inter-frame prediction accuracy. Experimental results show that the proposed DPC compression method achieves a significant improvement (gain of 70%) in the BD-Rate metric on the 8iFVBv2 dataset compared with the standardized Video-based Point Cloud Compression (V-PCC) v13 method, and a 16% gain over the state-of-the-art deep learning-based inter-mode method.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)
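
For readers unfamiliar with ME/MC on point clouds, the sketch below shows the basic idea in its simplest form: estimate one translation per block from nearest-neighbour correspondences, then use it to predict the current block from the reference frame. This is a plain nearest-neighbour baseline written for illustration only. It is not the DA-KBM network described in the abstract, and the function names and the parameter `k` are hypothetical.

```python
import numpy as np

def knn_indices(query: np.ndarray, reference: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest reference points for every query point (brute force)."""
    d2 = np.sum((query[:, None, :] - reference[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def estimate_block_motion(block: np.ndarray, reference: np.ndarray, k: int = 1) -> np.ndarray:
    """Motion estimation: mean displacement from block points to their k nearest
    reference points. Assumption: the whole block moves by a single translation."""
    idx = knn_indices(block, reference, k)
    displacements = reference[idx] - block[:, None, :]   # shape (m, k, 3)
    return displacements.mean(axis=(0, 1))

def compensate(block: np.ndarray, reference: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Motion compensation: predict each block point from its nearest reference
    point shifted back by the motion vector. The residual (block - prediction)
    is what an encoder would go on to code."""
    nearest = reference[knn_indices(block, reference, 1)[:, 0]]
    return nearest - motion
```

When the reference frame is an exact small translation of the current block, the estimated vector recovers that translation and the residual is zero. Real sequences violate the single-translation assumption, which is one reason learned correlation measures are used instead.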

19 pages, 6571 KB  
Article
From Brain Lobes to Neurons: Navigating the Brain Using Advanced 3D Modeling and Visualization Tools
by Mohamed Rowaizak, Ahmad Farhat and Reem Khalil
J. Imaging 2025, 11(9), 298; https://doi.org/10.3390/jimaging11090298 - 1 Sep 2025
Cited by 2 | Viewed by 1796
Abstract
Neuroscience education must convey 3D structure with clarity and accuracy. Traditional 2D renderings are limited as they lose depth information and hinder spatial understanding. High-resolution resources now exist, yet many are difficult to use in the classroom. Therefore, we developed an educational brain video that moves from gross to microanatomy using MRI-based models and the published literature. The pipeline used Fiji for preprocessing, MeshLab for mesh cleanup, Rhino 6 for target fixes, Houdini FX for materials, lighting, and renders, and Cinema4D for final refinement of the video. We had our brain models validated by two neuroscientists for educational fidelity. We tested the video in a class with 96 undergraduates randomized to video and lecture or lecture only. Students completed the same pretest and posttest questions. Student feedback revealed that comprehension and motivation to learn increased significantly in the group that watched the video, suggesting its potential as a useful supplement to traditional lectures. A short, well-produced 3D video can supplement lectures and improve learning in this setting. We share software versions and key parameters to support reuse.
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)