
Deep Learning Technology and Image Sensing: 2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 5 March 2026 | Viewed by 10199

Special Issue Editors


Prof. Dr. Sukho Lee
Guest Editor
Division of Computer Engineering, Dongseo University, 47 Jurye Road, Sasang-gu, Busan 47011, Republic of Korea
Interests: image deconvolution/restoration; color image compression; computer vision; deep learning

Prof. Dr. Dae-Ki Kang
Guest Editor
Machine Learning/Deep Learning Research Labs, Department of Computer Engineering, Dongseo University, Busan 47011, Republic of Korea
Interests: multi-agent reinforcement learning; hyperparameter optimization and network architecture search; automated machine learning; adversarial machine learning; bankruptcy prediction models and financial ratio analysis; data mining-based intrusion detection

Special Issue Information

Dear Colleagues,

Deep learning-based computing technology is significantly improving the quality and reliability of image recognition today. In autonomous driving, for example, the performance of the sensors themselves is being enhanced through deep learning-based fusion of data from front camera sensors and radars. Other deep learning-based computer vision technologies improve smartphone camera applications such as face recognition, panorama photography, depth/geometry estimation, and high-quality magnification and detection. Still other computer vision technologies can now accurately recognize human behavior and posture, which allows human behavior to serve as a tool for human–computer interfaces (HCI) in applications such as the Metaverse. This Special Issue covers all topics related to applications using deep learning-based image and video sensing technologies.

Topics include, but are not limited to, the following:

  • Deep learning-based image sensing techniques;
  • Deep learning-based video sensing techniques;
  • Deep learning-based computer vision algorithms;
  • Deep learning-based signal processing techniques;
  • Deep learning-based computational photography.

Prof. Dr. Sukho Lee
Prof. Dr. Dae-Ki Kang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • image sensing
  • video sensing
  • image sensor
  • video sensor
  • computer vision

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.


Published Papers (6 papers)


Research

22 pages, 5746 KB  
Article
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
by Qianglong Feng, Xiaofeng Wang, Zhenglin Lu, Haiyu Wang, Tingfeng Qi and Tianyi Zhang
Sensors 2025, 25(18), 5905; https://doi.org/10.3390/s25185905 - 21 Sep 2025
Viewed by 269
Abstract
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of the MLP in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViTs and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance nonlinear expressiveness in complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
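For readers who want a concrete picture of how an epipolar prior can be injected into self-attention, the sketch below penalizes attention scores by the vertical distance between tokens (under rectified stereo, correspondences lie on the same image row). This is a minimal illustration of the general idea only, not the paper's AG-MSA; the head count, penalty form, and token grid are assumptions, and SGR-KAN and DCGF are not reproduced.

```python
# Minimal sketch (not the authors' code): self-attention whose scores are
# penalized by cross-row distance between tokens, standing in for an epipolar
# prior. Shapes and the learnable per-head penalty are illustrative assumptions.
import torch
import torch.nn as nn


class EpipolarBiasedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable strength of the epipolar penalty (one per head).
        self.penalty = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens from an h x w feature map, N = h * w.
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5

        # Row index of every token; under rectified stereo, matches share a row,
        # so attention across distant rows is discouraged.
        rows = torch.arange(h, device=x.device).repeat_interleave(w)   # (N,)
        row_dist = (rows[None, :] - rows[:, None]).abs().float()       # (N, N)
        attn = attn - self.penalty.view(1, -1, 1, 1) * row_dist

        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


if __name__ == "__main__":
    h, w, dim = 8, 16, 64
    tokens = torch.randn(2, h * w, dim)
    print(EpipolarBiasedAttention(dim)(tokens, h, w).shape)  # torch.Size([2, 128, 64])
```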

20 pages, 55265 KB  
Article
Learning Precise Mask Representation for Siamese Visual Tracking
by Peng Yang, Fen Hu, Qinghui Wang and Lei Dou
Sensors 2025, 25(18), 5743; https://doi.org/10.3390/s25185743 - 15 Sep 2025
Viewed by 387
Abstract
Siamese network trackers are a prominent paradigm in visual object tracking due to their efficient similarity learning. However, most Siamese trackers are restricted to the bounding box tracking format, which often fails to accurately describe the appearance of non-rigid targets with complex deformations. Additionally, since the bounding box frequently includes excessive background pixels, trackers are sensitive to similar distractors. To address these issues, we propose a novel segmentation-assisted model that learns binary mask representations of targets. This model is generic and can be seamlessly integrated into various Siamese frameworks, enabling pixel-wise segmentation tracking instead of the suboptimal bounding box tracking. Specifically, our model features two core components: (i) a multi-stage precise mask representation module composed of cascaded U-Net decoders, designed to predict segmentation masks of targets, and (ii) a saliency localization head based on the Euclidean model, which extracts spatial position constraints to boost the decoder’s discriminative capability. Extensive experiments on five tracking benchmarks demonstrate that our method effectively improves the performance of both anchor-based and anchor-free Siamese trackers. Notably, on GOT-10k, our method increases the AO scores of the baseline trackers SiamRPN++ (anchor-based) and SiamBAN (anchor-free) by 5.2% and 7.5%, respectively, while maintaining speeds exceeding 60 FPS. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
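As a rough illustration of the cascaded-decoder idea (not the paper's architecture), the sketch below stacks two U-Net-style upsampling stages that turn a correlation feature map into progressively finer mask logits; the channel widths, depth, and input size are assumptions, and the saliency localization head is omitted.

```python
# Minimal sketch (illustrative): a two-stage cascade of upsampling decoders
# producing coarse and refined mask logits from a correlation feature map.
import torch
import torch.nn as nn


def up_block(cin: int, cout: int) -> nn.Sequential:
    # Upsample by 2x, then refine with a small conv block.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1),
        nn.ReLU(inplace=True),
    )


class CascadedMaskDecoder(nn.Module):
    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.stage1 = up_block(in_ch, 128)   # coarse mask features
        self.stage2 = up_block(128, 64)      # refined mask features
        self.head1 = nn.Conv2d(128, 1, 1)    # coarse mask logits
        self.head2 = nn.Conv2d(64, 1, 1)     # refined mask logits

    def forward(self, corr_feat: torch.Tensor):
        f1 = self.stage1(corr_feat)
        f2 = self.stage2(f1)
        # Both outputs can be supervised; the last one is used as the tracking mask.
        return self.head1(f1), self.head2(f2)


if __name__ == "__main__":
    coarse, fine = CascadedMaskDecoder()(torch.randn(1, 256, 31, 31))
    print(coarse.shape, fine.shape)  # (1, 1, 62, 62) (1, 1, 124, 124)
```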

22 pages, 1906 KB  
Article
A Style Transfer-Based Fast Image Quality Assessment Method for Image Sensors
by Weizhi Xian, Bin Chen, Jielu Yan, Xuekai Wei, Kunyin Guo, Bin Fang and Mingliang Zhou
Sensors 2025, 25(16), 5121; https://doi.org/10.3390/s25165121 - 18 Aug 2025
Viewed by 682
Abstract
Accurate image quality evaluation is essential for optimizing sensor performance and enhancing the fidelity of visual data. The concept of “image style” encompasses the overall visual characteristics of an image, including elements such as colors, textures, shapes, lines, strokes, and other visual components. In this paper, we propose a novel full-reference image quality assessment (FR-IQA) method that leverages the principles of style transfer, which we call style- and content-based IQA (SCIQA). Our approach consists of three main steps. First, we employ a deep convolutional neural network (CNN) to decompose and represent images in the deep domain, capturing both low-level and high-level features. Second, we define a comprehensive deep perceptual distance metric between two images, taking into account both image content and style. This metric combines traditional content-based measures with style-based measures inspired by recent advances in neural style transfer. Finally, we formulate a perceptual optimization problem to determine the optimal parameters for the SCIQA model, which we solve via a convex optimization approach. Experimental results across multiple benchmark datasets (LIVE, CSIQ, TID2013, KADID-10k, and PIPAL) demonstrate that SCIQA outperforms state-of-the-art FR-IQA methods. Specifically, SCIQA achieves Pearson linear correlation coefficients (PLCC) of 0.956, 0.941, and 0.895 on the LIVE, CSIQ, and TID2013 datasets, respectively, outperforming traditional methods such as SSIM (PLCC: 0.847, 0.852, 0.665) and deep learning-based methods such as DISTS (PLCC: 0.924, 0.919, 0.855). The proposed method also demonstrates robust generalizability on the large-scale PIPAL dataset, achieving an SROCC of 0.702. Furthermore, SCIQA exhibits strong interpretability, exceptional prediction accuracy, and low computational complexity, making it a practical tool for real-world applications. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
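The combination of content and style terms can be illustrated with standard style-transfer machinery: feature differences for content and Gram-matrix differences for style, summed over a few VGG16 layers. The layer selection and the fixed weights below are assumptions for illustration; SCIQA itself learns its weighting through convex optimization, which is not reproduced here.

```python
# Minimal sketch of a style-plus-content full-reference distance.
# Layer indices correspond to relu1_2, relu2_2, relu3_3, relu4_3 of VGG16;
# w_c and w_s are illustrative constants, not the paper's learned parameters.
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
LAYERS = {3, 8, 15, 22}


def features(x: torch.Tensor):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats


def gram(f: torch.Tensor) -> torch.Tensor:
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


@torch.no_grad()
def style_content_distance(ref: torch.Tensor, dist: torch.Tensor,
                           w_c: float = 1.0, w_s: float = 10.0) -> torch.Tensor:
    d = torch.zeros(())
    for fr, fd in zip(features(ref), features(dist)):
        d = d + w_c * (fr - fd).pow(2).mean()              # content term
        d = d + w_s * (gram(fr) - gram(fd)).pow(2).sum()   # style term
    return d


if __name__ == "__main__":
    ref, dist = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
    print(float(style_content_distance(ref, dist)))
```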

37 pages, 16392 KB  
Article
Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
by Yosua Setyawan Soekamto, Andreas Lim, Leonard Christopher Limanjaya, Yoshua Kaleb Purwanto, Suk-Ho Lee and Dae-Ki Kang
Sensors 2025, 25(2), 449; https://doi.org/10.3390/s25020449 - 14 Jan 2025
Viewed by 2957
Abstract
Choosing nutritious foods is essential for daily health, but finding recipes that match available ingredients and dietary preferences can be challenging. Traditional recommendation methods often lack personalization and accurate ingredient recognition. Personalized systems address this by integrating user preferences, dietary needs, and ingredient availability. This study presents Pic2Plate, a framework combining Vision-Language Models (VLMs) and Retrieval-Augmented Generation (RAG) to overcome these challenges. Pic2Plate uses advanced image recognition to extract ingredient lists from user images and RAG to retrieve and personalize recipe recommendations. Leveraging smartphone camera sensors ensures accessibility and portability. Pic2Plate’s performance was evaluated in two areas: ingredient detection accuracy and recipe relevance. The ingredient detection module, powered by GPT-4o, achieved strong results with precision (0.83), recall (0.91), accuracy (0.77), and F1-score (0.86), demonstrating effectiveness in recognizing diverse food items. A survey of 120 participants assessed recipe relevance, with model rankings calculated using the Bradley–Terry method. Pic2Plate’s VLM and RAG integration consistently outperformed other models. These results highlight Pic2Plate’s ability to deliver context-aware, reliable, and diverse recipe suggestions. The study underscores its potential to transform recipe recommendation systems with a scalable, user-centric approach to personalized cooking. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
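The retrieval step of such a pipeline can be sketched independently of the vision-language model: embed the ingredient list detected from a photo, embed each recipe, and rank by similarity. The embed() function below is a placeholder for any sentence-embedding model, and the detected ingredients and recipe corpus are made-up examples, not data from the paper.

```python
# Minimal sketch of retrieval-augmented recipe ranking. embed() is a stand-in
# for a real text-embedding model; replace it before using this for anything.
import numpy as np


def embed(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic within a run, but meaningless.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)


def retrieve_recipes(ingredients: list[str], corpus: list[str], k: int = 3):
    query = embed(", ".join(ingredients))
    scored = [(float(query @ embed(doc)), doc) for doc in corpus]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    detected = ["egg", "spinach", "feta"]   # would come from the VLM in practice
    recipes = ["Spinach and feta omelette", "Beef stew", "Greek salad"]
    for score, recipe in retrieve_recipes(detected, recipes):
        print(f"{score:+.3f}  {recipe}")
```

The top-ranked recipes would then be passed, together with the user's dietary preferences, to a language model for the final personalized recommendation.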

17 pages, 4076 KB  
Article
Deep Ensemble Learning-Based Sensor for Flotation Froth Image Recognition
by Xiaojun Zhou and Yiping He
Sensors 2024, 24(15), 5048; https://doi.org/10.3390/s24155048 - 4 Aug 2024
Cited by 5 | Viewed by 2191
Abstract
Froth flotation is a widespread and important method for mineral separation, significantly influencing the purity and quality of extracted minerals. Traditionally, workers control chemical dosages by observing the visual characteristics of flotation froth, which requires considerable experience and operational skill. This paper designs a deep ensemble learning-based sensor for flotation froth image recognition that monitors actual flotation froth working conditions, assisting operators with chemical dosage adjustments and serving the industrial goals of improving concentrate grade and mineral recovery. In our approach, training and validation data on flotation froth images are partitioned via K-fold cross-validation, and deep neural network (DNN)-based learners are generated from pre-trained DNN models on image-enhanced training data to improve their generalization and robustness. Then, a membership function utilizing the performance information of the DNN-based learners during validation is proposed to improve the recognition accuracy of the DNN-based learners. Subsequently, a technique for order preference by similarity to an ideal solution (TOPSIS) based on the F1 score is proposed to select the most probable working condition of flotation froth images through a decision matrix composed of the DNN-based learners’ predictions via the membership function, thereby optimizing the combination process of deep ensemble learning. The effectiveness and superiority of the designed sensor are verified in a real industrial gold–antimony froth flotation application. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
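The TOPSIS step can be illustrated with the textbook formulation: normalize a decision matrix, weight its columns (here by each ensemble member's validation F1 score), and rank the candidate working conditions by closeness to the ideal solution. The matrix values and weights below are illustrative, and the paper's membership-function construction is not reproduced.

```python
# Minimal sketch of standard TOPSIS over a conditions-by-learners decision
# matrix; all numbers are made up for illustration.
import numpy as np


def topsis(decision: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # Vector-normalize each column, then apply the criterion weights.
    norm = decision / np.linalg.norm(decision, axis=0, keepdims=True)
    weighted = norm * weights
    # All criteria are "benefit" criteria (higher membership is better).
    ideal, anti = weighted.max(axis=0), weighted.min(axis=0)
    d_plus = np.linalg.norm(weighted - ideal, axis=1)
    d_minus = np.linalg.norm(weighted - anti, axis=1)
    return d_minus / (d_plus + d_minus)   # closeness in [0, 1], higher is better


if __name__ == "__main__":
    # 4 candidate froth working conditions x 3 DNN-based learners (memberships).
    decision = np.array([[0.7, 0.6, 0.8],
                         [0.2, 0.3, 0.1],
                         [0.6, 0.7, 0.5],
                         [0.1, 0.2, 0.2]])
    f1 = np.array([0.90, 0.85, 0.80])
    weights = f1 / f1.sum()
    closeness = topsis(decision, weights)
    print("predicted condition:", int(closeness.argmax()), closeness.round(3))
```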

18 pages, 5484 KB  
Article
ELA-Net: An Efficient Lightweight Attention Network for Skin Lesion Segmentation
by Tianyu Nie, Yishi Zhao and Shihong Yao
Sensors 2024, 24(13), 4302; https://doi.org/10.3390/s24134302 - 2 Jul 2024
Cited by 6 | Viewed by 2467
Abstract
In clinical settings limited by equipment, attaining lightweight skin lesion segmentation is pivotal, as it facilitates the integration of the model into diverse medical devices and thereby enhances operational efficiency. However, a lightweight model design may suffer accuracy degradation, especially when dealing with complex images such as skin lesion images with irregular regions and blurred or oversized boundaries. To address these challenges, we propose an efficient lightweight attention network (ELANet) for the skin lesion segmentation task. In ELANet, the two different attention mechanisms of the bilateral residual module (BRM) provide complementary information, enhancing sensitivity to features in the spatial and channel dimensions, respectively, and multiple BRMs are then stacked for efficient feature extraction from the input. In addition, the network acquires global information and improves segmentation accuracy by passing feature maps of different scales through multi-scale attention fusion (MAF) operations. Finally, we evaluate the performance of ELANet on three publicly available datasets, ISIC2016, ISIC2017, and ISIC2018; the experimental results show that our algorithm achieves mIoU scores of 89.87%, 81.85%, and 82.87% on the three datasets with a parameter count of 0.459 M, striking an excellent balance between accuracy and lightness and outperforming many existing segmentation methods. Full article
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
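A rough sketch of the complementary-attention idea behind the BRM is given below: one branch gates channels, the other gates spatial locations, and the two are merged with a residual connection. Channel sizes, kernel sizes, and the squeeze ratio are assumptions, not the published ELANet configuration.

```python
# Minimal sketch (illustrative) of a lightweight residual block with parallel
# channel-attention and spatial-attention branches.
import torch
import torch.nn as nn


class BilateralResidualBlock(nn.Module):
    def __init__(self, ch: int, ratio: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(            # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // ratio, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // ratio, ch, 1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(             # single-channel spatial attention
            nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # depthwise, cheap

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branch_c = x * self.channel_gate(x)   # emphasize informative channels
        branch_s = x * self.spatial_gate(x)   # emphasize lesion-like locations
        return x + self.fuse(branch_c + branch_s)


if __name__ == "__main__":
    y = BilateralResidualBlock(32)(torch.randn(1, 32, 64, 64))
    print(y.shape)  # torch.Size([1, 32, 64, 64])
```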
