
Digital Signal and Image Processing for Multimedia Technology

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 October 2025) | Viewed by 15486

Special Issue Editors


Guest Editor
Department of Information and Computer Engineering, Chung Yuan Christian University, Taoyuan City 320314, Taiwan
Interests: artificial intelligence; machine learning; deep learning; virtual reality

Special Issue Information

Dear Colleagues,

Determining how to employ deep learning technology has become a primary research topic in numerous fields, including image processing, computer vision, the Internet of Things, natural language processing, and multimedia processing. In addition, due to the increasing processing power of electronic devices and the expansion of network transmission bandwidth, deep learning models have begun to be embedded in various edge devices for application in fields such as automobiles, transportation, education, and manufacturing, among many others.

In this Special Issue, entitled "Deep Learning Applications in Image Processing and Edge Devices", we invite authors to submit original research articles and review articles related to the application of deep learning techniques in image processing and edge devices.

We are open to papers addressing a wide range of topics, including deep learning for image analysis problems, novel algorithms for applying deep learning to various computer vision domains, and innovative methods for porting deep learning models to edge devices.

Topics of interest in this Special Issue include, but are not limited to, the following:

  • Machine learning and deep learning for image processing and computer vision;
  • Deep learning algorithms for clustering and classification;
  • Deep learning algorithms for segmentation and data annotation;
  • Embedded multimedia applications for edge computing;
  • Novel applications in robotic vision and intelligent consumer electronics;
  • Application architecture of AI-based systems.

Dr. Chi-hung Chuang
Prof. Dr. Chih-Lung Lin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • deep learning
  • neural network
  • artificial intelligence
  • multimedia processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (9 papers)


Research

17 pages, 558 KB  
Article
FPGA-Accelerated Multi-Resolution Spline Reconstruction for Real-Time Multimedia Signal Processing
by Manuel J. C. S. Reis
Electronics 2026, 15(1), 173; https://doi.org/10.3390/electronics15010173 - 30 Dec 2025
Viewed by 848
Abstract
This paper presents an FPGA-based architecture for real-time spline-based signal reconstruction, targeted at multimedia signal processing applications. Leveraging the multi-resolution properties of B-splines, the proposed design enables efficient upsampling, denoising, and feature preservation for image and video signals. Implemented on a mid-range FPGA, the system supports parallel processing of multiple channels, with low-latency memory access and pipelined arithmetic units. The proposed pipeline achieves a throughput of up to 33.1 megasamples per second for 1D signals and 19.4 megapixels per second for 2D images, while maintaining average power consumption below 250 mW. Compared to CPU and embedded GPU implementations, the design delivers >15× improvement in energy efficiency and deterministic low-latency performance (8–12 clock cycles). A key novelty lies in combining multi-resolution B-spline reconstruction with fixed-point arithmetic and streaming-friendly pipelining, making the architecture modular, compact, and robust to varying input rates. Benchmarking results on synthetic and real multimedia datasets show significant improvements in throughput and energy efficiency compared to conventional CPU and GPU implementations. The architecture supports flexible resolution scaling, making it suitable for edge-computing scenarios in multimedia environments. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
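To make the core operation concrete, the following is a minimal Python/NumPy sketch of cubic B-spline reconstruction with quantized kernel weights, in the spirit of the fixed-point pipeline described above. The upsampling factor, the Q-format (14 fractional bits), and the border-clamping policy are illustrative assumptions rather than the paper's parameters, and the sketch treats samples directly as B-spline coefficients (smoothing rather than exact interpolation) for brevity.

```python
import numpy as np

def bspline3(x):
    """Cubic B-spline kernel B3(x), with support on [-2, 2]."""
    ax = abs(x)
    if ax < 1:
        return 2 / 3 - ax ** 2 + ax ** 3 / 2
    if ax < 2:
        return (2 - ax) ** 3 / 6
    return 0.0

def upsample_bspline_fixed_point(signal, factor=4, frac_bits=14):
    """Upsample a 1D signal by evaluating the cubic B-spline reconstruction
    with kernel weights quantized to fixed point (frac_bits fractional bits),
    mimicking what an integer multiply-accumulate pipeline would compute."""
    n = len(signal)
    out = np.zeros(n * factor)
    scale = 1 << frac_bits
    for i in range(n * factor):
        t = i / factor                        # continuous position in input-sample units
        k0 = int(np.floor(t))
        acc = 0
        for k in range(k0 - 1, k0 + 3):       # 4-tap support of the cubic kernel
            w = int(round(bspline3(t - k) * scale))   # quantized kernel weight
            s = signal[min(max(k, 0), n - 1)]          # clamp at the signal borders
            acc += w * int(s)
        out[i] = acc / scale                  # rescale back to sample units
    return out

# toy usage: upsample a short ramp-plus-step test signal by 4x
x = np.array([0, 1, 2, 3, 10, 10, 9, 8], dtype=np.int32)
y = upsample_bspline_fixed_point(x, factor=4)
print(y[:8])
```

On hardware, each output sample reduces to a short fixed-point multiply-accumulate chain over four neighbouring inputs, which is consistent with the deterministic per-sample latency reported above.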

21 pages, 424 KB  
Article
MultiHeadEEGModelCLS: Contextual Alignment and Spatio-Temporal Attention Model for EEG-Based SSVEP Classification
by Vangelis P. Oikonomou
Electronics 2025, 14(22), 4394; https://doi.org/10.3390/electronics14224394 - 11 Nov 2025
Cited by 2 | Viewed by 1074
Abstract
Steady-State Visual Evoked Potentials (SSVEPs) offer a robust basis for brain–computer interface (BCI) systems due to their high signal-to-noise ratio, minimal user training requirements, and suitability for real-time decoding. In this work, we propose MultiHeadEEGModelCLS, a novel Transformer-based architecture that integrates context-aware representation learning into SSVEP decoding. The model employs a dual-stream spatio-temporal encoder to process both the input EEG trial and a contextual signal (e.g., template or reference trial), enhanced by a learnable classification ([CLS]) token. Through self-attention and cross-attention mechanisms, the model aligns trial-level representations with contextual cues. The architecture supports multi-task learning via signal reconstruction and context-informed classification heads. Evaluation on benchmark datasets (Speller and BETA) demonstrates state-of-the-art performance, particularly under limited data and short time window scenarios, achieving higher classification accuracy and information transfer rates (ITR) compared to existing deep learning methods such as the multi-branch CNN (ConvDNN). Our method achieved ITRs of 283 bits/min and 222 bits/min on the Speller and BETA datasets, respectively, compared to 238 bits/min and 181 bits/min for the ConvDNN. These results highlight the effectiveness of contextual modeling in enhancing the robustness and efficiency of SSVEP-based BCIs. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
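As an illustration of the dual-stream, CLS-token design described in the abstract, the sketch below assembles a toy PyTorch model with self-attention over the EEG trial and cross-attention to a template signal. All layer sizes, the class count, and the module names are illustrative assumptions rather than the published architecture, which additionally includes a signal-reconstruction head.

```python
import torch
import torch.nn as nn

class DualStreamSSVEPClassifier(nn.Module):
    """Minimal sketch of a CLS-token, cross-attention SSVEP classifier
    in the spirit of MultiHeadEEGModelCLS; sizes are illustrative."""
    def __init__(self, n_channels=9, d_model=64, n_heads=4, n_classes=40):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)        # per-time-step spatial projection
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.self_attn = nn.TransformerEncoderLayer(d_model, n_heads,
                                                    dim_feedforward=128,
                                                    batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, trial, template):
        # trial, template: (batch, time, channels)
        x = self.embed(trial)
        ctx = self.embed(template)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)           # prepend the learnable [CLS] token
        x = self.self_attn(x)                    # self-attention over the trial
        x, _ = self.cross_attn(x, ctx, ctx)      # align trial tokens with the contextual signal
        return self.head(x[:, 0])                # classify from the [CLS] position

model = DualStreamSSVEPClassifier()
logits = model(torch.randn(2, 250, 9), torch.randn(2, 250, 9))
print(logits.shape)   # torch.Size([2, 40])
```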

18 pages, 6703 KB  
Article
Lightweight Attention-Based Architecture for Accurate Melanoma Recognition
by Mohammad J. Beirami, Fiona Gruzmark, Rayyan Manwar, Maria Tsoukas and Kamran Avanaki
Electronics 2025, 14(21), 4281; https://doi.org/10.3390/electronics14214281 - 31 Oct 2025
Cited by 1 | Viewed by 611
Abstract
Dermoscopy, a non-invasive imaging technique, has transformed dermatology by enabling early detection and differentiation of skin conditions. Integrating deep learning with dermoscopic images enhances diagnostic potential but raises computational challenges. This study introduces APNet, an attention-based architecture designed for melanoma detection, offering fewer parameters than conventional convolutional neural networks. Two baseline models are considered: HU-Net, a trimmed U-Net that uses only the encoding path for classification, and Pocket-Net, a lightweight U-Net variant that reduces parameters through fewer feature maps and efficient convolutions. While Pocket-Net is highly resource-efficient, its simplification can reduce performance. APNet extends Pocket-Net by incorporating squeeze-and-excitation (SE) attention blocks into the encoding path. These blocks adaptively highlight the most relevant dermoscopic features, such as subtle melanoma patterns, improving classification accuracy. The study evaluates APNet against Pocket-Net and HU-Net using four large, annotated dermoscopy datasets (ISIC 2017–2020), covering melanoma, benign nevi, and other lesions. Results show that APNet achieves faster processing than HU-Net while overcoming the performance loss observed in Pocket-Net. By reducing parameters without sacrificing accuracy, APNet provides a practical solution for computationally demanding dermoscopy, offering efficient and accurate melanoma detection where medical imaging resources are limited. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
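The squeeze-and-excitation mechanism that APNet adds to the Pocket-Net encoding path can be sketched in a few lines of PyTorch; the channel count and reduction ratio below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation block of the kind APNet inserts into the
    lightweight encoder; the reduction ratio is an assumption."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # excitation: per-channel gates
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # reweight the feature maps

# usage: gate a 32-channel dermoscopy feature map
feat = torch.randn(1, 32, 56, 56)
print(SEBlock(32)(feat).shape)   # torch.Size([1, 32, 56, 56])
```

The gate adds only two small fully connected layers per block, which is why the attention can be added to Pocket-Net without sacrificing its parameter budget.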

21 pages, 4464 KB  
Article
Chest X-Ray Medical Report Generation Using a CNN-Transformer Model with Maximum Attention
by Mei-Hua Hsih, Shih-Po Lin and Chen-Chiung Hsieh
Electronics 2025, 14(20), 4123; https://doi.org/10.3390/electronics14204123 - 21 Oct 2025
Viewed by 2271
Abstract
Medical imaging, particularly chest X-rays, plays a vital role in radiological diagnosis. However, interpreting these images and generating detailed diagnostic reports is a time-consuming task for clinicians. To address this challenge, this study proposes an automated image captioning framework for chest X-ray images, aiming to reduce clinical workload and enhance diagnostic efficiency. The proposed approach employs convolutional neural networks (CNNs) for visual feature extraction and a modified Transformer architecture—referred to as the Medical Transformer—for structured report generation. Three CNN models, namely InceptionV3, ResNet152V2, and Inception–ResNetV2, were evaluated as feature extractors. Three attention mechanisms (Bahdanau, Luong, and scaled dot-product), each activated by ReLU or Tanh functions, were evaluated to identify the optimal configuration, i.e., the maximum attention. Experiments were conducted using the Indiana University Chest X-ray dataset, which contains 7466 images paired with corresponding diagnostic reports. The proposed approach employs image augmentation to accommodate input variability, utilizes Inception–ResNetV2 for feature extraction, and integrates the Medical Transformer with maximum attention mechanisms to achieve optimal performance in medical report generation. Evaluation metrics include BLEU (BLEU-1 to BLEU-4 scores of 0.720, 0.669, 0.648, and 0.600, respectively), METEOR (0.741), and BERTScore (FBERT = 0.787), demonstrating superior performance compared to baseline models and the state of the art. These results validate the effectiveness of the proposed Medical Transformer framework in generating accurate and clinically relevant medical image captions. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
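For readers unfamiliar with the attention variants being compared, the following PyTorch sketch shows a Bahdanau-style additive attention module over CNN region features with a selectable Tanh or ReLU activation; the feature and hidden dimensions are illustrative assumptions, and the Luong and scaled dot-product variants differ only in how the scores are computed.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention over CNN feature vectors, one of the
    variants the paper compares; dimensions are illustrative."""
    def __init__(self, d_feat=1536, d_hidden=256, activation=nn.Tanh()):
        super().__init__()
        self.w_q = nn.Linear(d_hidden, d_hidden)
        self.w_k = nn.Linear(d_feat, d_hidden)
        self.v = nn.Linear(d_hidden, 1)
        self.act = activation                        # Tanh or ReLU, as in the study

    def forward(self, query, features):
        # query: (batch, d_hidden) decoder state; features: (batch, regions, d_feat)
        scores = self.v(self.act(self.w_q(query).unsqueeze(1) + self.w_k(features)))
        weights = torch.softmax(scores, dim=1)       # attention over image regions
        context = (weights * self.w_k(features)).sum(dim=1)
        return context, weights.squeeze(-1)

ctx, w = AdditiveAttention()(torch.randn(2, 256), torch.randn(2, 64, 1536))
print(ctx.shape, w.shape)   # torch.Size([2, 256]) torch.Size([2, 64])
```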

14 pages, 331 KB  
Article
Flow Matching for Simulation-Based Inference: Design Choices and Implications
by Massimiliano Giordano Orsini, Alessio Ferone, Laura Inno, Angelo Casolaro and Antonio Maratea
Electronics 2025, 14(19), 3833; https://doi.org/10.3390/electronics14193833 - 27 Sep 2025
Viewed by 2402
Abstract
Inverse problems are ubiquitous across many scientific fields and involve the determination of the causes or parameters of a system from observations of its effects or outputs. These problems have been studied extensively using simulated data, and thus fall under the lens of simulation-based inference. Recently, the natural combination of Continuous Normalizing Flows (CNFs) and Flow Matching Posterior Estimation (FMPE) has emerged as a novel, powerful, and scalable posterior estimator, capable of inferring the distribution of free parameters in a significantly reduced computational time compared to conventional techniques. While CNFs provide substantial flexibility in designing machine learning solutions, modeling decisions during their implementation can strongly influence predictive performance. To the best of our knowledge, no prior work has systematically analyzed how such modeling choices affect the robustness of posterior estimates in this framework. The aim of this work is to address this research gap by investigating the sensitivity of CNFs trained with FMPE under different modeling decisions, including data preprocessing, noise conditioning, and noisy observations. As a case study, we consider atmospheric retrieval of exoplanets and perform an extensive experimental campaign on the Ariel Data Challenge 2023 dataset. Through a comprehensive posterior evaluation framework, we demonstrate that (i) Z-score normalization outperforms min–max scaling across tasks; (ii) noise conditioning improves accuracy, coverage, and uncertainty estimation; and (iii) noisy observations significantly degrade predictive performance, thus underscoring reduced robustness under the assumed noise conditions. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
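A minimal sketch of flow-matching posterior estimation is given below: a small conditional vector field is trained with the standard conditional flow-matching objective over a Gaussian probability path. The network widths, parameter and observation dimensions, and sigma_min value are illustrative assumptions and do not reproduce the paper's setup.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Tiny conditional vector field v(t, theta_t, x) for flow-matching
    posterior estimation; widths and dimensions are illustrative."""
    def __init__(self, dim_theta=7, dim_x=52):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_theta + dim_x + 1, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, dim_theta),
        )

    def forward(self, t, theta_t, x):
        return self.net(torch.cat([t, theta_t, x], dim=-1))

def fmpe_loss(model, theta, x, sigma_min=1e-4):
    """Conditional flow-matching loss with the usual Gaussian probability path:
    theta_t = (1 - (1 - sigma_min) * t) * eps + t * theta,
    regression target = theta - (1 - sigma_min) * eps."""
    t = torch.rand(theta.size(0), 1)
    eps = torch.randn_like(theta)
    theta_t = (1 - (1 - sigma_min) * t) * eps + t * theta
    target = theta - (1 - sigma_min) * eps
    pred = model(t, theta_t, x)
    return ((pred - target) ** 2).mean()

# toy usage on Z-score normalized parameters and observed spectra
model = VectorField()
loss = fmpe_loss(model, torch.randn(16, 7), torch.randn(16, 52))
loss.backward()
```

Preprocessing choices such as Z-score versus min–max scaling only change how theta and x are normalized before entering this loss, which is why they can be compared without touching the estimator itself.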

29 pages, 5334 KB  
Article
A Novel Self-Recovery Fragile Watermarking Scheme Based on Convolutional Autoencoder
by Chin-Feng Lee, Tong-Ming Li, Iuon-Chang Lin and Anis Ur Rehman
Electronics 2025, 14(18), 3595; https://doi.org/10.3390/electronics14183595 - 10 Sep 2025
Viewed by 1306
Abstract
In the digital era where images are easily accessible, concerns about image authenticity and integrity are increasing. To address this, we propose a deep learning-based fragile watermarking method for secure image authentication and content recovery. The method utilizes bottleneck features extracted by the convolutional encoder to carry both authentication and recovery information and employs deconvolution at the decoder to reconstruct image content. Additionally, the Arnold Transform is applied to scramble feature information, effectively enhancing resistance to collage attacks. At the detection stage, block voting and morphological closing operations improve tamper localization accuracy and robustness. Experiments under varying tampering ratios demonstrate that the proposed method maintains high visual quality and achieves reliable tamper detection and recovery, even at 75% tampering; evaluation metrics including PSNR, SSIM, precision, recall, and F1-score confirm the effectiveness and practical applicability of the method. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
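The Arnold Transform used to scramble the embedded feature information is a simple area-preserving cat map; the NumPy sketch below shows forward and inverse scrambling of a square bit block. The block size and iteration count are illustrative assumptions (in practice the iteration count typically acts as part of the secret key).

```python
import numpy as np

def arnold_transform(block, iterations=5):
    """Arnold cat-map scrambling of a square block: (x, y) -> (x + y, x + 2y) mod n.
    Used here to scatter embedded recovery bits and resist collage attacks."""
    n = block.shape[0]
    assert block.shape[0] == block.shape[1], "Arnold map needs a square block"
    out = block.copy()
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scrambled
    return out

def arnold_inverse(block, iterations=5):
    """Invert the scrambling: (x, y) -> (2x - y, -x + y) mod n."""
    n = block.shape[0]
    out = block.copy()
    for _ in range(iterations):
        unscrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                unscrambled[(2 * x - y) % n, (-x + y) % n] = out[x, y]
        out = unscrambled
    return out

# round-trip check on an 8x8 block of watermark bits
bits = np.random.randint(0, 2, (8, 8))
assert np.array_equal(arnold_inverse(arnold_transform(bits)), bits)
```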

19 pages, 2082 KB  
Article
Multi-Scale Grid-Based Semantic Surface Point Generation for 3D Object Detection
by Xin-Fu Chen, Chun-Chieh Lee, Jung-Hua Lo, Chi-Hung Chuang and Kuo-Chin Fan
Electronics 2025, 14(17), 3492; https://doi.org/10.3390/electronics14173492 - 31 Aug 2025
Viewed by 1109
Abstract
3D object detection is a crucial technology in fields such as autonomous driving and robotics. As a direct representation of the 3D world, point cloud data plays a vital role in feature extraction and geometric representation. However, in real-world applications, point cloud data often suffers from occlusion, resulting in incomplete observations and degraded detection performance. Existing methods, such as PG-RCNN, generate semantic surface points within each Region of Interest (RoI) using a single grid size. However, a fixed grid scale cannot adequately capture multi-scale features. A grid that is too small may miss fine structures—especially problematic when dealing with small or sparse objects—while a grid that is too large may introduce excessive background noise, reducing the precision of feature representation. To address this issue, we propose an enhanced PG-RCNN architecture with a Multi-Scale Grid Attention Module as the core contribution. This module improves the expressiveness of point features by aggregating multi-scale information and dynamically weighting features from different grid resolutions. Using a simple linear transformation, we generate attention weights to guide the model to focus on regions that contribute more to object recognition, while effectively filtering out redundant noise. We evaluate our method on the KITTI 3D object detection validation set. Experimental results show that, compared to the original PG-RCNN, our approach improves performance on the Cyclist category by 2.66% and 2.54% in the Moderate and Hard settings, respectively. Additionally, our approach shows more stable performance on small object detection tasks, with an average improvement of 2.57%, validating the positive impact of the Multi-Scale Grid Attention Module on fine-grained geometric modeling, and highlighting the efficiency and generalizability of our model. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
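The following PyTorch sketch illustrates the idea of weighting RoI features pooled at several grid resolutions with attention scores produced by a simple linear transformation, as the abstract describes. The feature dimension, number of scales, and example grid sizes are illustrative assumptions rather than the implementation details of the proposed module.

```python
import torch
import torch.nn as nn

class MultiScaleGridAttention(nn.Module):
    """Sketch of fusing RoI grid features pooled at several grid sizes with
    attention weights from a linear projection; dimensions are assumptions."""
    def __init__(self, d_feat=96, n_scales=3):
        super().__init__()
        self.score = nn.Linear(d_feat, 1)        # one scalar score per scale feature
        self.fuse = nn.Linear(d_feat, d_feat)

    def forward(self, scale_feats):
        # scale_feats: (batch, n_scales, d_feat),
        # e.g. features pooled at 4x4x4, 6x6x6, and 8x8x8 grids for one RoI
        scores = self.score(scale_feats)                  # (batch, n_scales, 1)
        weights = torch.softmax(scores, dim=1)            # attend over grid resolutions
        fused = (weights * scale_feats).sum(dim=1)        # weighted multi-scale aggregation
        return self.fuse(fused)

roi_feats = torch.stack([torch.randn(4, 96) for _ in range(3)], dim=1)
print(MultiScaleGridAttention()(roi_feats).shape)   # torch.Size([4, 96])
```

The softmax lets the model lean on finer grids for small, sparse objects and coarser grids elsewhere, which matches the motivation given in the abstract.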

18 pages, 15722 KB  
Article
PANDA: A Polarized Attention Network for Enhanced Unsupervised Domain Adaptation in Semantic Segmentation
by Chiao-Wen Kao, Wei-Ling Chang, Chun-Chieh Lee and Kuo-Chin Fan
Electronics 2024, 13(21), 4302; https://doi.org/10.3390/electronics13214302 - 31 Oct 2024
Viewed by 2461
Abstract
Unsupervised domain adaptation (UDA) focuses on transferring knowledge from the labeled source domain to the unlabeled target domain, reducing the costs of manual data labeling. The main challenge in UDA is bridging the substantial feature distribution gap between the source and target domains. To address this, we propose Polarized Attention Network Domain Adaptation (PANDA), a novel approach that leverages Polarized Self-Attention (PSA) to capture the intricate relationships between the source and target domains, effectively mitigating domain discrepancies. PANDA integrates both channel and spatial information, allowing it to capture detailed features and overall structures simultaneously. Our proposed method significantly outperforms current state-of-the-art unsupervised domain adaptation (UDA) techniques for semantic segmentation tasks. Specifically, it achieves a notable improvement in mean intersection over union (mIoU), with a 0.2% increase for the GTA→Cityscapes benchmark and a substantial 1.4% gain for the SYNTHIA→Cityscapes benchmark. As a result, our method attains mIoU scores of 76.1% and 68.7%, respectively, which reflect meaningful advancements in model accuracy and domain adaptation performance. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
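To indicate what Polarized Self-Attention contributes, the sketch below implements a channel-only PSA-style branch in PyTorch (the spatial branch is analogous); the channel count and internal reduction are illustrative assumptions rather than PANDA's configuration.

```python
import torch
import torch.nn as nn

class PolarizedChannelAttention(nn.Module):
    """Channel-only branch of a Polarized Self-Attention (PSA)-style block;
    the spatial branch is analogous. Sizes are illustrative."""
    def __init__(self, channels=64):
        super().__init__()
        mid = channels // 2
        self.wv = nn.Conv2d(channels, mid, 1)           # value projection
        self.wq = nn.Conv2d(channels, 1, 1)             # query collapsed to one map
        self.up = nn.Conv2d(mid, channels, 1)           # restore the channel dimension
        self.softmax = nn.Softmax(dim=-1)
        self.ln = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        v = self.wv(x).reshape(b, c // 2, h * w)                # (b, c/2, hw)
        q = self.softmax(self.wq(x).reshape(b, 1, h * w))       # attention over positions
        z = torch.matmul(v, q.transpose(1, 2))                  # (b, c/2, 1) pooled context
        z = self.up(z.unsqueeze(-1))                            # (b, c, 1, 1)
        gate = torch.sigmoid(self.ln(z.reshape(b, c)).reshape(b, c, 1, 1))
        return x * gate                                         # channel-wise reweighting

feat = torch.randn(2, 64, 32, 32)
print(PolarizedChannelAttention()(feat).shape)   # torch.Size([2, 64, 32, 32])
```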

15 pages, 3509 KB  
Article
Dense Feature Pyramid Deep Completion Network
by Xiaoping Yang, Ping Ni, Zhenhua Li and Guanghui Liu
Electronics 2024, 13(17), 3490; https://doi.org/10.3390/electronics13173490 - 2 Sep 2024
Viewed by 1764
Abstract
Most current point cloud super-resolution reconstruction methods require heavy computation and achieve low accuracy in large outdoor scenes. A Dense Feature Pyramid Network (DenseFPNet) is proposed for the feature-level fusion of images with low-resolution point clouds to generate higher-resolution point clouds, recasting the super-resolution reconstruction of 3D point clouds as a 2D depth map completion problem and reducing the time and complexity of obtaining high-resolution point clouds from LiDAR alone. The network first utilizes an image-guided feature extraction network based on RGBD-DenseNet as an encoder to extract multi-scale features, followed by an upsampling block as a decoder to gradually recover the size and details of the feature map. Additionally, the network connects the corresponding layers of the encoder and decoder through pyramid connections. Finally, experiments are conducted on the KITTI depth completion dataset, and the network performs well in various metrics compared to other networks. It improves the RMSE by 17.71%, 16.60%, 7.11%, and 4.68% compared to CSPD, Spade-RGBsD, Sparse-to-Dense, and GAENET, respectively. Full article
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)
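The overall encoder-decoder-with-pyramid-connections pattern can be sketched compactly; the toy PyTorch model below fuses an RGB image with a sparse depth map and decodes a dense depth map through a skip connection between corresponding layers. It is far shallower than DenseFPNet, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyDepthCompletion(nn.Module):
    """Toy encoder-decoder with a pyramid skip connection that fuses an RGB
    image with a sparse LiDAR depth map into a dense depth map; much smaller
    than DenseFPNet but with the same input/output contract."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1)

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)    # early RGB-D fusion
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)
        d1 = self.dec1(torch.cat([d2, e1], dim=1))   # pyramid (skip) connection
        return d1                                    # dense depth prediction

rgb = torch.randn(1, 3, 64, 256)
depth = torch.randn(1, 1, 64, 256)
print(TinyDepthCompletion()(rgb, depth).shape)   # torch.Size([1, 1, 64, 256])
```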
