
Deep Learning Technology and Image Sensing: 2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 5 March 2026 | Viewed by 10199

Special Issue Editors


Prof. Dr. Sukho Lee
Guest Editor
Division of Computer Engineering, Dongseo University, 47 Jurye Road, Sasang-gu, Busan 47011, Republic of Korea
Interests: image deconvolution/restoration; color image compression; computer vision; deep learning

Prof. Dr. Dae-Ki Kang
Guest Editor
Machine Learning/Deep Learning Research Labs, Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea
Interests: multi-agent reinforcement learning; hyperparameter optimization and network architecture search; automated machine learning; adversarial machine learning; bankruptcy prediction models and financial ratio analysis; data mining-based intrusion detection

Special Issue Information

Dear Colleagues,

Deep learning-based computing technology is significantly improving the quality and reliability of image recognition today. In autonomous driving, for example, the performance of the sensors themselves is being enhanced through deep learning-based fusion of data from front camera sensors and radars. Other deep learning-based computer vision technologies improve smartphone camera applications such as face recognition, panorama photography, depth/geometry estimation, and high-quality magnification and detection. Still other computer vision technologies can now accurately recognize human behavior and posture, which allows human behavior to serve as a tool for human–computer interfaces (HCI) in applications such as the Metaverse. This Special Issue covers all topics related to applications using deep learning-based image and video sensing technologies.

Topics include, but are not limited to, the following:

  • Deep learning-based image sensing techniques;
  • Deep learning-based video sensing techniques;
  • Deep learning-based computer vision algorithms;
  • Deep learning-based signal processing techniques;
  • Deep learning-based computational photography.

Prof. Dr. Sukho Lee
Prof. Dr. Dae-Ki Kang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image sensing
  • video sensing
  • image sensor
  • video sensor
  • computer vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.


Published Papers (6 papers)


Research

22 pages, 5746 KB  
Article
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
by Qianglong Feng, Xiaofeng Wang, Zhenglin Lu, Haiyu Wang, Tingfeng Qi and Tianyi Zhang
Sensors 2025, 25(18), 5905; https://doi.org/10.3390/s25185905 - 21 Sep 2025
Viewed by 269
Abstract
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of the MLP in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViTs and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance nonlinear expressiveness in complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
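For readers who want a concrete picture of how an epipolar prior can be injected into self-attention, the sketch below penalizes attention scores by the vertical distance between tokens (under rectified stereo, correspondences lie on the same image row). This is a minimal illustration of the general idea only, not the paper's AG-MSA; the head count, penalty form, and token grid are assumptions, and SGR-KAN and DCGF are not reproduced.

```python
# Minimal sketch (not the authors' code): self-attention whose scores are
# penalized by cross-row distance between tokens, standing in for an epipolar
# prior. Shapes and the learnable per-head penalty are illustrative assumptions.
import torch
import torch.nn as nn


class EpipolarBiasedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable strength of the epipolar penalty (one per head).
        self.penalty = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens from an h x w feature map, N = h * w.
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5

        # Row index of every token; under rectified stereo, matches share a row,
        # so attention across distant rows is discouraged.
        rows = torch.arange(h, device=x.device).repeat_interleave(w)   # (N,)
        row_dist = (rows[None, :] - rows[:, None]).abs().float()       # (N, N)
        attn = attn - self.penalty.view(1, -1, 1, 1) * row_dist

        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


if __name__ == "__main__":
    h, w, dim = 8, 16, 64
    tokens = torch.randn(2, h * w, dim)
    print(EpipolarBiasedAttention(dim)(tokens, h, w).shape)  # torch.Size([2, 128, 64])
```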

20 pages, 55265 KB  
Article
Learning Precise Mask Representation for Siamese Visual Tracking
by Peng Yang, Fen Hu, Qinghui Wang and Lei Dou
Sensors 2025, 25(18), 5743; https://doi.org/10.3390/s25185743 - 15 Sep 2025
Viewed by 387
Abstract
Siamese network trackers are a prominent paradigm in visual object tracking due to their efficient similarity learning. However, most Siamese trackers are restricted to the bounding box tracking format, which often fails to accurately describe the appearance of non-rigid targets with complex deformations. Additionally, since the bounding box frequently includes excessive background pixels, trackers are sensitive to similar distractors. To address these issues, we propose a novel segmentation-assisted model that learns binary mask representations of targets. This model is generic and can be seamlessly integrated into various Siamese frameworks, enabling pixel-wise segmentation tracking instead of the suboptimal bounding box tracking. Specifically, our model features two core components: (i) a multi-stage precise mask representation module composed of cascaded U-Net decoders, designed to predict segmentation masks of targets, and (ii) a saliency localization head based on the Euclidean model, which extracts spatial position constraints to boost the decoder’s discriminative capability. Extensive experiments on five tracking benchmarks demonstrate that our method effectively improves the performance of both anchor-based and anchor-free Siamese trackers. Notably, on GOT-10k, our method increases the AO scores of the baseline trackers SiamRPN++ (anchor-based) and SiamBAN (anchor-free) by 5.2% and 7.5%, respectively, while maintaining speeds exceeding 60 FPS. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
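As a rough illustration of the cascaded-decoder idea (not the paper's architecture), the sketch below stacks two U-Net-style upsampling stages that turn a correlation feature map into progressively finer mask logits; the channel widths, depth, and input size are assumptions, and the saliency localization head is omitted.

```python
# Minimal sketch (illustrative): a two-stage cascade of upsampling decoders
# producing coarse and refined mask logits from a correlation feature map.
import torch
import torch.nn as nn


def up_block(cin: int, cout: int) -> nn.Sequential:
    # Upsample by 2x, then refine with a small conv block.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1),
        nn.ReLU(inplace=True),
    )


class CascadedMaskDecoder(nn.Module):
    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.stage1 = up_block(in_ch, 128)   # coarse mask features
        self.stage2 = up_block(128, 64)      # refined mask features
        self.head1 = nn.Conv2d(128, 1, 1)    # coarse mask logits
        self.head2 = nn.Conv2d(64, 1, 1)     # refined mask logits

    def forward(self, corr_feat: torch.Tensor):
        f1 = self.stage1(corr_feat)
        f2 = self.stage2(f1)
        # Both outputs can be supervised; the last one is used as the tracking mask.
        return self.head1(f1), self.head2(f2)


if __name__ == "__main__":
    coarse, fine = CascadedMaskDecoder()(torch.randn(1, 256, 31, 31))
    print(coarse.shape, fine.shape)  # (1, 1, 62, 62) (1, 1, 124, 124)
```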

22 pages, 1906 KB  
Article
A Style Transfer-Based Fast Image Quality Assessment Method for Image Sensors
by Weizhi Xian, Bin Chen, Jielu Yan, Xuekai Wei, Kunyin Guo, Bin Fang and Mingliang Zhou
Sensors 2025, 25(16), 5121; https://doi.org/10.3390/s25165121 - 18 Aug 2025
Viewed by 682
Abstract
Accurate image quality evaluation is essential for optimizing sensor performance and enhancing the fidelity of visual data. The concept of “image style” encompasses the overall visual characteristics of an image, including elements such as colors, textures, shapes, lines, strokes, and other visual components. In this paper, we propose a novel full-reference image quality assessment (FR-IQA) method that leverages the principles of style transfer, which we call style- and content-based IQA (SCIQA). Our approach consists of three main steps. First, we employ a deep convolutional neural network (CNN) to decompose and represent images in the deep domain, capturing both low-level and high-level features. Second, we define a comprehensive deep perceptual distance metric between two images, taking into account both image content and style. This metric combines traditional content-based measures with style-based measures inspired by recent advances in neural style transfer. Finally, we formulate a perceptual optimization problem to determine the optimal parameters for the SCIQA model, which we solve via a convex optimization approach. Experimental results across multiple benchmark datasets (LIVE, CSIQ, TID2013, KADID-10k, and PIPAL) demonstrate that SCIQA outperforms state-of-the-art FR-IQA methods. Specifically, SCIQA achieves Pearson linear correlation coefficients (PLCC) of 0.956, 0.941, and 0.895 on the LIVE, CSIQ, and TID2013 datasets, respectively, outperforming traditional methods such as SSIM (PLCC: 0.847, 0.852, 0.665) and deep learning-based methods such as DISTS (PLCC: 0.924, 0.919, 0.855). The proposed method also demonstrates robust generalizability on the large-scale PIPAL dataset, achieving an SROCC of 0.702. Furthermore, SCIQA exhibits strong interpretability, exceptional prediction accuracy, and low computational complexity, making it a practical tool for real-world applications. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
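The combination of content and style terms can be illustrated with standard style-transfer machinery: feature differences for content and Gram-matrix differences for style, summed over a few VGG16 layers. The layer selection and the fixed weights below are assumptions for illustration; SCIQA itself learns its weighting through convex optimization, which is not reproduced here.

```python
# Minimal sketch of a style-plus-content full-reference distance.
# Layer indices correspond to relu1_2, relu2_2, relu3_3, relu4_3 of VGG16;
# w_c and w_s are illustrative constants, not the paper's learned parameters.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
LAYERS = {3, 8, 15, 22}


def features(x: torch.Tensor):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats


def gram(f: torch.Tensor) -> torch.Tensor:
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


@torch.no_grad()
def style_content_distance(ref: torch.Tensor, dist: torch.Tensor,
                           w_c: float = 1.0, w_s: float = 10.0) -> torch.Tensor:
    d = torch.zeros(())
    for fr, fd in zip(features(ref), features(dist)):
        d = d + w_c * (fr - fd).pow(2).mean()              # content term
        d = d + w_s * (gram(fr) - gram(fd)).pow(2).sum()   # style term
    return d


if __name__ == "__main__":
    ref, dist = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
    print(float(style_content_distance(ref, dist)))
```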

37 pages, 16392 KB  
Article
Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
by Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee and Dae-Ki Kang
Sensors 2025, 25(2), 449; https://doi.org/10.3390/s25020449 - 14 Jan 2025
Viewed by 2957
Abstract
Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
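The retrieval step of such a pipeline can be sketched independently of the vision-language model: embed the ingredient list detected from a photo, embed each recipe, and rank by similarity. The embed() function below is a placeholder for any sentence-embedding model, and the detected ingredients and recipe corpus are made-up examples, not data from the paper.

```python
# Minimal sketch of retrieval-augmented recipe ranking. embed() is a stand-in
# for a real text-embedding model; replace it before using this for anything.
import numpy as np


def embed(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic within a run, but meaningless.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)


def retrieve_recipes(ingredients: list[str], corpus: list[str], k: int = 3):
    query = embed(", ".join(ingredients))
    scored = [(float(query @ embed(doc)), doc) for doc in corpus]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    detected = ["egg", "spinach", "feta"]   # would come from the VLM in practice
    recipes = ["Spinach and feta omelette", "Beef stew", "Greek salad"]
    for score, recipe in retrieve_recipes(detected, recipes):
        print(f"{score:+.3f}  {recipe}")
```

The top-ranked recipes would then be passed, together with the user's dietary preferences, to a language model for the final personalized recommendation.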

17 pages, 4076 KB  
Article
Deep Ensemble Learning-Based Sensor for Flotation Froth Image Recognition
by Xiaojun Zhou and Yiping He
Sensors 2024, 24(15), 5048; https://doi.org/10.3390/s24155048 - 4 Aug 2024
Cited by 5 | Viewed by 2191
Abstract
Froth flotation is a widespread and important method for mineral separation, significantly influencing the purity and quality of extracted minerals. Traditionally, workers control chemical dosages by observing the visual characteristics of flotation froth, which requires considerable experience and operational skill. This paper designs a deep ensemble learning-based sensor for flotation froth image recognition that monitors actual flotation froth working conditions, assisting operators with chemical dosage adjustments and serving the industrial goals of improving concentrate grade and mineral recovery. In our approach, training and validation data on flotation froth images are partitioned via K-fold cross-validation, and deep neural network (DNN)-based learners are generated from pre-trained DNN models on image-enhanced training data to improve their generalization and robustness. Then, a membership function utilizing the performance information of the DNN-based learners during validation is proposed to improve the recognition accuracy of the DNN-based learners. Subsequently, a technique for order preference by similarity to an ideal solution (TOPSIS) based on the F1 score is proposed to select the most probable working condition of flotation froth images through a decision matrix composed of the DNN-based learners’ predictions via the membership function, thereby optimizing the combination process of deep ensemble learning. The effectiveness and superiority of the designed sensor are verified in a real industrial gold–antimony froth flotation application. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
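The TOPSIS step can be illustrated with the textbook formulation: normalize a decision matrix, weight its columns (here by each ensemble member's validation F1 score), and rank the candidate working conditions by closeness to the ideal solution. The matrix values and weights below are illustrative, and the paper's membership-function construction is not reproduced.

```python
# Minimal sketch of standard TOPSIS over a conditions-by-learners decision
# matrix; all numbers are made up for illustration.
import numpy as np


def topsis(decision: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # Vector-normalize each column, then apply the criterion weights.
    norm = decision / np.linalg.norm(decision, axis=0, keepdims=True)
    weighted = norm * weights
    # All criteria are "benefit" criteria (higher membership is better).
    ideal, anti = weighted.max(axis=0), weighted.min(axis=0)
    d_plus = np.linalg.norm(weighted - ideal, axis=1)
    d_minus = np.linalg.norm(weighted - anti, axis=1)
    return d_minus / (d_plus + d_minus)   # closeness in [0, 1], higher is better


if __name__ == "__main__":
    # 4 candidate froth working conditions x 3 DNN-based learners (memberships).
    decision = np.array([[0.7, 0.6, 0.8],
                         [0.2, 0.3, 0.1],
                         [0.6, 0.7, 0.5],
                         [0.1, 0.2, 0.2]])
    f1 = np.array([0.90, 0.85, 0.80])
    weights = f1 / f1.sum()
    closeness = topsis(decision, weights)
    print("predicted condition:", int(closeness.argmax()), closeness.round(3))
```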

18 pages, 5484 KB  
Article
ELA-Net: An Efficient Lightweight Attention Network for Skin Lesion Segmentation
by Tianyu Nie, Yishi Zhao and Shihong Yao
Sensors 2024, 24(13), 4302; https://doi.org/10.3390/s24134302 - 2 Jul 2024
Cited by 6 | Viewed by 2467
Abstract
In clinical settings limited by equipment, attaining lightweight skin lesion segmentation is pivotal, as it facilitates the integration of the model into diverse medical devices and thereby enhances operational efficiency. However, a lightweight model design may suffer accuracy degradation, especially when dealing with complex images such as skin lesion images with irregular regions and blurred or oversized boundaries. To address these challenges, we propose an efficient lightweight attention network (ELANet) for the skin lesion segmentation task. In ELANet, the two different attention mechanisms of the bilateral residual module (BRM) provide complementary information, enhancing sensitivity to features in the spatial and channel dimensions, respectively, and multiple BRMs are then stacked for efficient feature extraction from the input. In addition, the network acquires global information and improves segmentation accuracy by passing feature maps of different scales through multi-scale attention fusion (MAF) operations. Finally, we evaluate the performance of ELANet on three publicly available datasets, ISIC2016, ISIC2017, and ISIC2018; the experimental results show that our algorithm achieves mIoU scores of 89.87%, 81.85%, and 82.87% on the three datasets with a parameter count of 0.459 M, striking an excellent balance between accuracy and lightness and outperforming many existing segmentation methods. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
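A rough sketch of the complementary-attention idea behind the BRM is given below: one branch gates channels, the other gates spatial locations, and the two are merged with a residual connection. Channel sizes, kernel sizes, and the squeeze ratio are assumptions, not the published ELANet configuration.

```python
# Minimal sketch (illustrative) of a lightweight residual block with parallel
# channel-attention and spatial-attention branches.
import torch
import torch.nn as nn


class BilateralResidualBlock(nn.Module):
    def __init__(self, ch: int, ratio: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(            # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // ratio, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // ratio, ch, 1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(             # single-channel spatial attention
            nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # depthwise, cheap

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branch_c = x * self.channel_gate(x)   # emphasize informative channels
        branch_s = x * self.spatial_gate(x)   # emphasize lesion-like locations
        return x + self.fuse(branch_c + branch_s)


if __name__ == "__main__":
    y = BilateralResidualBlock(32)(torch.randn(1, 32, 64, 64))
    print(y.shape)  # torch.Size([1, 32, 64, 64])
```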
