MDPI - Publisher of Open Access Journals

31 pages, 2539 KB

Open Access Article

Design and Evaluation of an AI-Based Conversational Agent for Travel Agencies: Enhancing Training, Assistance, and Operational Efficiency

by Pablo Vicente-Martínez, Emilio Soria-Olivas, Inés Esteve-Mompó, Manuel Sánchez-Montañés, María Ángeles García Escrivà and Edu William-Secin

AI 2026, 7(4), 123; https://doi.org/10.3390/ai7040123 - 1 Apr 2026

Viewed by 509

Abstract

The tourism industry faces increasing pressure for agile, personalized services, yet travel agencies struggle with fragmented knowledge scattered across isolated systems and legacy formats. While Large Language Models (LLMs) are widely applied in customer-facing roles, their potential to enhance internal operational efficiency remains largely underexplored. This study presents the design and evaluation of an intelligent assistant built specifically for travel agency operations, based on a Retrieval-Augmented Generation (RAG) architecture using Gemini 2.0 Flash. The system integrates heterogeneous data sources, including structured product catalogs and unstructured documentation processed via Optical Character Recognition (OCR), into a unified interface comprising work assistance, interactive training, and evaluation modules. Results show information retrieval times of no more than 45 s, which keeps the system usable in daily work, while maintaining 95% accuracy. Furthermore, the system democratizes tacit senior expertise and accelerates new employee onboarding. This research validates RAG architectures as a powerful solution to knowledge fragmentation, shifting the strategic AI focus from customer automation to employee empowerment and operational optimization.

32 pages, 3916 KB

Open Access Article

An Automated Detection Method for Motor Vehicles Encroaching on Non-Motorized Lanes Based on Unmanned Aerial Vehicle Imagery and Civilized Behavior Monitoring

by Zichan Tan, Yin Tan, Peijing Lin, Wenjie Su, Tian He and Weishen Wu

Sensors 2026, 26(7), 2027; https://doi.org/10.3390/s26072027 - 24 Mar 2026

Viewed by 220

Abstract

Motor vehicle encroachment into non-motorized lanes is a common but hard-to-verify violation in urban intersections, especially when monitored from unmanned aerial vehicles (UAVs) or high-mounted overhead views. Existing rule-based solutions built on horizontal bounding boxes and center-point/line-crossing criteria are sensitive to perspective distortion, occlusion, and frame-to-frame jitter, resulting in unstable decisions and low evidential value. This paper presents a cascaded UAV-view system that closes the loop from perception to evidence output through detection–segmentation–recognition–decision. First, we adopt a two-stage detection cascade: a lightweight vehicle detector localizes vehicles using axis-aligned bounding boxes, and a dedicated YOLOv5n-based oriented bounding box (OBB) license plate detector, constructed via architecture grafting and weight transfer, is then applied within each vehicle region of interest (ROI) to localize rotated license plates under large pose variation and small-target conditions. Second, a U-Net lane region segmentation module provides pixel-level spatial constraints to define an enforceable lane occupancy region. Third, a perspective rectification step is integrated with the PP-OCRv4 optical character recognition (OCR) framework to improve license plate recognition reliability for tilted plates. Finally, an area ratio criterion and an N-frame temporal counter are used to suppress transient misdetections and stabilize alarms. On a representative 100-sample controlled encroachment benchmark, the proposed system improves detection accuracy from 67.0% to 92.0% and reduces the false positive rate from 32.35% to 5.88% compared with a baseline horizontal bounding box (HBB)-based rule. The system outputs both violation alarms and license plate evidence, supporting practical deployment for multi-view traffic governance.
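
The decision step described in this abstract (an area ratio criterion followed by an N-frame temporal counter) can be illustrated with a minimal sketch. It assumes one plausible reading, namely that the ratio is the share of a vehicle's pixels lying inside the segmented lane region. The mask format, the names `occupancy_ratio` and `TemporalCounter`, and both default thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def occupancy_ratio(vehicle_mask, lane_mask):
    """Return the fraction of vehicle pixels that lie inside the lane region.

    Both masks are equal-sized 2D lists of 0/1 values (assumed format).
    """
    vehicle_pixels = 0
    overlap_pixels = 0
    for vehicle_row, lane_row in zip(vehicle_mask, lane_mask):
        for v, l in zip(vehicle_row, lane_row):
            if v:
                vehicle_pixels += 1
                if l:
                    overlap_pixels += 1
    # A frame without any vehicle pixels cannot encroach.
    return overlap_pixels / vehicle_pixels if vehicle_pixels else 0.0


@dataclass
class TemporalCounter:
    """Raise an alarm only after n_frames consecutive positive frames."""

    n_frames: int = 5             # assumed value, not from the paper
    ratio_threshold: float = 0.3  # assumed value, not from the paper
    streak: int = 0

    def update(self, ratio):
        # Any frame below the threshold resets the streak, which
        # suppresses isolated single-frame misdetections.
        if ratio >= self.ratio_threshold:
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.n_frames
```

Requiring consecutive positive frames trades a short alarm delay for stability; a real system would tune both values on validation footage.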

(This article belongs to the Section Vehicular Sensing)

16 pages, 4714 KB

Open Access Article

Metasurface-Enabled Dual-Channel Optical Image Authentication Based on Polarization Multiplexing

by Yanfeng Su, Biao Zhu, Wenming Chen, Ruijie Xue, Zijing Li, Zhijian Cai, Qibin Feng and Guoqiang Lv

Photonics 2026, 13(3), 280; https://doi.org/10.3390/photonics13030280 - 15 Mar 2026

Viewed by 243

Abstract

In this paper, a metasurface-enabled dual-channel optical image authentication method based on polarization multiplexing is proposed. During encryption, the authentication phases corresponding to the dual-channel plaintext images are first calculated using a sparse-constraint-driven authentication-holography (SCDAH) algorithm. Then, the target transmission phase and geometric phase of the metasurface to be designed are obtained by the composite phase modulation (CPM) principle. Next, a parameter scan of the nanopillar-type metasurface unit is performed to establish the transmission and geometric phase databases. Finally, the structural parameters of each nanopillar are determined on a pixel-by-pixel basis to complete the construction of the polarization-multiplexing authentication metasurface (PMAM). During authentication, the PMAM is illuminated separately by left-handed circularly polarized (LCP) and right-handed circularly polarized (RCP) light to obtain pseudo-random images produced by far-field diffraction. The nonlinear correlation distribution between each diffraction image and the corresponding channel plaintext image is then calculated, and the final authentication result of each channel is determined by whether the signal-to-noise ratio of the nonlinear correlation distribution meets the standard. This forms a new physical-characteristic-driven dual-channel optical image authentication technology in which two identities of the user holding the PMAM can be verified simultaneously, breaking through the rigid one-metasurface-to-one-image constraint of conventional schemes while improving the capacity and efficiency of authentication metasurfaces at the level of the physical mechanism. Numerical simulations demonstrate the feasibility of the proposed method, and the simulation results show that it exhibits high feasibility and security as well as strong robustness against cropping attacks, indicating promising application potential in identity recognition and authentication.

19 pages, 764 KB

Open Access Article

FeOCR: Domain-Adaptive Chinese OCR with Visual Character Disambiguation and LLM-Based Correction for Metallurgical Documents

by Qiang Zheng, Yaxuan Sun, Lin Wang, Haoning Zhang, Fanjie Meng and Minghui Li

Electronics 2026, 15(6), 1144; https://doi.org/10.3390/electronics15061144 - 10 Mar 2026

Viewed by 359

Abstract

High-quality text corpora are essential for knowledge graph construction and domain-specific large model pre-training in technology-intensive industries, with the steel metallurgy sector serving as a representative case. However, many industrial documents remain in scanned or PDF formats, where general-purpose Optical Character Recognition (OCR) systems exhibit systematic errors when recognizing Chinese metallurgical documents. In particular, visually similar Chinese characters that differ by only minor strokes are frequently confused, leading to severe degradation of text reliability and cascading errors in downstream knowledge extraction. This paper proposes FeOCR, a general-purpose domain-adaptive framework for machine-printed Chinese characters, which is specifically evaluated within the context of the steel metallurgy industry. The framework integrates visual character disambiguation with context-aware semantic correction. We first construct a metallurgy-specific OCR dataset emphasizing high-frequency confusable Chinese word pairs and enhance data diversity through font perturbation and noise synthesis. Parameter-efficient fine-tuning (LoRA) is then applied to adapt a general OCR model to domain-specific visual patterns. Furthermore, a Large Language Model-based correction module performs semantic refinement of residual errors under domain lexical constraints. Experiments demonstrate significant reductions in character and word error rates, especially for confusable technical terms, providing a reliable foundation for industrial Chinese document digitization.
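
The character and word error rates mentioned in this abstract are conventionally computed as the Levenshtein edit distance between the OCR output and the reference, divided by the reference length. The sketch below shows that standard definition only; the function names are hypothetical, and the paper's exact evaluation protocol (normalization, tokenization) is not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length.

    Pass strings for a character error rate, or lists of words
    for a word error rate.
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

As an illustration chosen here rather than taken from the paper, `error_rate("转炉炼钢", "转炉练钢")` counts one substitution over four reference characters, the kind of single-stroke confusion the abstract describes.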

18 pages, 7090 KB

Open Access Article

SAW-Based Active Cleaning Cover Lens for Physical AI Optical Sensors

by Jiwoon Jeon, Jungwoo Yoon, Woochan Kim, Youngkwang Kim and Sangkug Chung

Symmetry 2026, 18(2), 347; https://doi.org/10.3390/sym18020347 - 13 Feb 2026

Viewed by 421

Abstract

This paper presents a cover lens concept for camera modules based on surface acoustic waves (SAW) to mitigate the degradation of physical AI optical sensor field-of-view performance caused by surface contamination. The proposed approach utilizes a single-phase unidirectional transducer (SPUDT) that intentionally breaks left–right symmetry through a geometrically asymmetric electrode array to generate SAW, thereby removing droplet contamination. First, the acoustic streaming induced inside a single sessile droplet by the SAW was visualized, and the dynamic behavior of the droplet upon SAW actuation was observed using a high-speed camera. The internal flow developed into a recirculating vortex structure with directional deflection relative to the SAW propagation direction, indicating a symmetry-broken streaming pattern rather than a purely symmetric circulation. Upon the application of the SAW, the droplet was confirmed to move a total of 7.2 mm along the SAW propagation direction, accompanied by interfacial deformation and oscillation. Next, an analysis of transport trajectories for five sessile droplets dispensed at different y-coordinates ($y_{1}$–$y_{5}$) revealed that all droplets were transported along the x-axis regardless of their initial positions. Furthermore, the analysis of transport velocity as a function of droplet viscosity (1 cP and 10 cP) and volume (2 μL, 4 μL, and 6 μL) demonstrated that the transport velocity gradually increased with driving voltage but decreased as viscosity increased under identical actuation conditions. Finally, the proposed cover lens was applied to an automotive front camera module to verify its effectiveness in improving object recognition performance by removing surface contamination. Based on its simple structure and driving principle, the proposed technology is deemed to be expandable as a surface contamination cleaning technology for various physical AI perception systems, including intelligent security cameras and drone camera lenses.

(This article belongs to the Special Issue Applications of Symmetry/Asymmetry in Artificial Intelligence and Deep Metaheuristics)

19 pages, 5853 KB

Open Access Article

Design of a Three-Channel Common-Aperture Optical System Based on Modular Layout

by Lingling Wu, Yichun Wang, Fang Wang, Jinsong Lv, Qian Wang, Baoyi Yue and Xiaoxia Ruan

Photonics 2026, 13(2), 161; https://doi.org/10.3390/photonics13020161 - 6 Feb 2026

Viewed by 502

Abstract

Multi-channel common-aperture optical systems, which excel at simultaneous multi-spectral information acquisition, are widely used for image fusion. However, complex systems for long-distance multi-band detection suffer from light vignetting and from difficulties in assembly and adjustment. To resolve this, the paper proposes a modular design method that splits the optical path into independent modules: the common-aperture optical path adopts an off-axis reflective beam-shrinking structure to extend the focal length and ensure 100% light input, in contrast to coaxial multi-channel common-aperture systems. The relay optical path of each spectral channel uses a continuous zoom design for smooth detection–recognition switching. Based on this method, a three-channel common-aperture system is developed that integrates visible light (VIS), short-wave infrared (SWIR), and mid-wave infrared (MWIR). The modulation transfer function (MTF) and wavefront distribution of the common-aperture optical path approach the diffraction limit. After integration with the relay optical paths, the system, without global optimization, achieves the following performance: the root mean square (RMS) across the full field of view (FOV) at different focal lengths for each channel is smaller than the detector pixel size (3.45 μm for VIS, 15 μm for SWIR/MWIR), and the MTF exceeds 0.2 at the cutoff frequency. Subsequently, the results of the tolerance analysis verify the feasibility of the design for each module and the advantage of the modular layout in the assembly and adjustment of the system. Finally, the paper discusses the influence of parallel plates on the wavefront distortion of the system and proposes an optimization approach using freeform surfaces. The design results of the study validate the feasibility of the modular layout in simplifying the design and assembly of multi-channel common-aperture optical systems.

23 pages, 2302 KB

Open Access Article

Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition

by Yu Qian, Shucheng Huang and Kai Qu

Entropy 2026, 28(2), 180; https://doi.org/10.3390/e28020180 - 4 Feb 2026

Viewed by 447

Abstract

Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations, which highlight subtle facial muscle activations, with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)³ and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization.

(This article belongs to the Special Issue Application of Information Theory to Computer Vision and Image Processing, 3rd Edition)

18 pages, 3547 KB

Open Access Review

DNA Nanostructure-Assembled Metallic Nanoparticles for Biosensing Applications

by Shaokang Ren, Kai He, Canlin Cui, Haoyu Fan, Hongzhen Peng, Kai Jiao and Lihua Wang

Molecules 2026, 31(3), 513; https://doi.org/10.3390/molecules31030513 - 2 Feb 2026

Cited by 2 | Viewed by 701

Abstract

DNA nanotechnology offers an unprecedented level of structural programmability for organizing metallic nanoparticles into precisely defined architectures, providing a powerful platform for plasmonic biosensing. In particular, gold and silver nanoparticles assembled on DNA nanostructures enable nanometer-scale control over interparticle distance, orientation, and spatial symmetry, which directly govern collective plasmonic behaviors and optical signal transduction. This review summarizes recent advances in DNA nanostructure-mediated assembly of metal nanoparticles, with an emphasis on design principles and assembly strategies that enable static and dynamic control of nanoparticle organization. Representative examples are discussed to illustrate how well-defined plasmonic assemblies give rise to tunable optical responses, including localized surface plasmon resonance modulation, chiroptical signals, fluorescence enhancement or quenching, and surface-enhanced Raman scattering. The role of structural programmability and stimulus-responsive reconfiguration in translating molecular recognition events into amplified optical outputs is highlighted in the context of biosensing. Finally, current challenges and future perspectives are outlined, focusing on structural robustness, signal reproducibility, and integration toward practical and multiplexed biosensing platforms.

(This article belongs to the Special Issue Functional Nanomaterials for Biosensors and Biomedicine Application)

19 pages, 3809 KB

Open Access Article

Theoretical Modeling and Experimental Study on Low-Altitude Slow-Small Target (LSS) Detection Based on Broadband Spectral Modulation Imaging

by Dongliang Li, Yangyang Hua, Siyuan Song, Jianguo Liu and Hongxing Cai

Sensors 2026, 26(3), 909; https://doi.org/10.3390/s26030909 - 30 Jan 2026

Viewed by 440

Abstract

The detection of low-altitude slow-small (LSS) targets, such as drones, is challenged by their small radar cross-section (RCS) and low signal-to-clutter ratio (SCR), resulting in short effective range and susceptibility to background clutter in complex environments. To overcome the limitations of conventional radar and electro-optical methods, this paper proposes a novel detection theory based on broadband spectral modulation imaging (BSMI). We analyze the recognition accuracy for drone targets across different zenith angles and detection ranges through numerical simulations. A snapshot-based BSMI detection system was designed and implemented, with experiments conducted under consistent conditions for validation. Results demonstrate that the system achieves over 90% classification accuracy, confirming the theory’s effectiveness. This study significantly enhances detection probability and suppresses false alarms for low-altitude drones, providing a viable technical solution for monitoring unauthorized aerial activities.

(This article belongs to the Section Optical Sensors)

12 pages, 874 KB

Open Access Proceeding Paper

Smart Pavement Systems with Embedded Sensors for Traffic and Environmental Monitoring

by Wai Yie Leong

Eng. Proc. 2025, 120(1), 12; https://doi.org/10.3390/engproc2025120012 - 29 Jan 2026

Viewed by 1427

Abstract

The evolution of next-generation urban infrastructure necessitates the deployment of intelligent pavement systems capable of real-time data acquisition, adaptive response, and predictive analytics. This article presents the design, implementation, and performance evaluation of a smart pavement system (SPS) incorporating multimodal embedded sensors for traffic density analysis, structural health monitoring, and environmental surveillance. The SPS integrates piezoelectric transducers, micro-electro-mechanical system accelerometers, inductive loop coils, fiber Bragg grating (FBG) sensors, and capacitive moisture and temperature sensors within the asphalt and sub-base layers, forming a distributed sensor network that interfaces with an edge-AI-enabled data acquisition and control module. Each sensor node performs localized pre-processing using low-power microcontrollers and transmits spatiotemporal data to a centralized IoT gateway over an adaptive mesh topology via long-range wide-area network or 5G-Vehicle-to-Everything protocols. Data fusion algorithms employing Kalman filters, sensor drift compensation models, and deep convolutional recurrent neural networks enable accurate classification of vehicular loads and traffic as well as anomaly detection. Additionally, the system supports real-time air pollutant detection (e.g., NO₂, CO, and PM2.5) using embedded electrochemical and optical gas sensors linked to mobile roadside units. Field deployments on a 1.2 km highway testbed demonstrate the system’s capability to achieve 95.7% classification accuracy for vehicle type recognition, ±1.5 mm resolution in rut depth measurement, and ±0.2 °C thermal sensitivity across dynamic weather conditions. Predictive analytics driven by long short-term memory networks yield a 21.4% improvement in maintenance planning accuracy, significantly reducing unplanned downtimes and repair costs. The architecture also supports vehicle-to-infrastructure feedback loops for adaptive traffic signal control and incident response. The proposed SPS architecture demonstrates a scalable and resilient framework for cyber-physical infrastructure, paving the way for smart cities that are responsive, efficient, and sustainable.
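
The abstract names Kalman filters among its data fusion algorithms without giving details. As a rough illustration of the underlying idea only, the sketch below applies a scalar Kalman filter with a constant-state model to one noisy sensor channel. The function name `kalman_1d`, the model, and all parameter values are assumptions made for this example, not the system's actual filter design.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Smooth one noisy sensor channel with a scalar Kalman filter.

    Assumes a constant-state model: the true value drifts slowly
    (process_var) and each reading carries independent noise
    (measurement_var). x0 and p0 are the initial estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is unchanged, but uncertainty grows.
        p = p + process_var
        # Update: the Kalman gain weighs prediction against measurement.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

A small `process_var` relative to `measurement_var` makes the filter trust its running estimate and smooth heavily; a larger one lets it follow fast changes, such as a passing axle load, at the cost of more noise.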

(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

20 pages, 17064 KB

Open Access Article

PriorSAM-DBNet: A SAM-Prior-Enhanced Dual-Branch Network for Efficient Semantic Segmentation of High-Resolution Remote Sensing Images

by Qiwei Zhang, Yisong Wang, Ning Li, Quanwen Jiang and Yong He

Sensors 2026, 26(2), 749; https://doi.org/10.3390/s26020749 - 22 Jan 2026

Cited by 1 | Viewed by 481

Abstract

Semantic segmentation of high-resolution remote sensing imagery is a critical technology for the intelligent interpretation of sensor data, supporting automated environmental monitoring and urban sensing systems. However, processing data from dense urban scenarios remains challenging due to sensor signal occlusions (e.g., shadows) and the complexity of parsing multi-scale targets from optical sensors. Existing approaches often exhibit a trade-off between the accuracy of global semantic modeling and the precision of complex boundary recognition. While the Segment Anything Model (SAM) offers powerful zero-shot structural priors, its direct application to remote sensing is hindered by domain gaps and the lack of inherent semantic categorization. To address these limitations, we propose a dual-branch cooperative network, PriorSAM-DBNet. The main branch employs a Densely Connected Swin (DC-Swin) Transformer to capture cross-scale global features via a hierarchical shifted window attention mechanism. The auxiliary branch leverages SAM’s zero-shot capability to exploit structural universality, generating object-boundary masks as robust signal priors while bypassing semantic domain shifts. Crucially, we introduce a parameter-efficient Scaled Subsampling Projection (SSP) module that employs a weight-sharing mechanism to align cross-modal features, freezing the massive SAM backbone to ensure computational viability for practical sensor applications. Furthermore, a novel Attentive Cross-Modal Fusion (ACMF) module is designed to dynamically resolve semantic ambiguities by calibrating the global context with local structural priors. Extensive experiments on the ISPRS Vaihingen, Potsdam, and LoveDA-Urban datasets demonstrate that PriorSAM-DBNet outperforms state-of-the-art approaches. By fine-tuning only 0.91 million parameters in the auxiliary branch, our method achieves mIoU scores of 82.50%, 85.59%, and 53.36%, respectively. The proposed framework offers a scalable, high-precision solution for remote sensing semantic segmentation, particularly effective for disaster emergency response where rapid feature recognition from sensor streams is paramount.
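
The mIoU scores reported in this abstract refer to mean intersection over union, the per-class ratio of correctly labeled pixels to the union of predicted and ground-truth pixels, averaged over classes. The sketch below shows this standard definition on flat label sequences. The function name is hypothetical, and benchmark protocols usually accumulate counts over the whole test set and may exclude some classes, which this toy version does not model.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for flat sequences of class indices."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            # Classes absent from both prediction and ground truth are
            # skipped so they do not distort the average.
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.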

(This article belongs to the Special Issue Artificial Intelligence-Based Target Recognition and Remote Sensing Data Processing)

27 pages, 3763 KB

Open Access Feature Paper Article

GO-PILL: A Geometry-Aware OCR Pipeline for Reliable Recognition of Debossed and Curved Pill Imprints

by Jaehyeon Jo, Sungan Yoon and Jeongho Cho

Mathematics 2026, 14(2), 356; https://doi.org/10.3390/math14020356 - 21 Jan 2026

Viewed by 716

Abstract

Manual pill identification is often inefficient and error-prone due to the large variety of medications and frequent visual similarity among pills, leading to misuse or dispensing errors. These challenges are exacerbated when pill imprints are engraved, curved, or irregularly arranged, conditions under which conventional optical character recognition (OCR)-based methods degrade significantly. To address this problem, we propose GO-PILL, a geometry-aware OCR pipeline for robust pill imprint recognition. The framework extracts text centerlines and imprint regions using the TextSnake algorithm. During imprint refinement, background noise is suppressed and contrast is enhanced to improve the visibility of embossed and debossed imprints. The imprint localization and alignment stage then rectifies curved or obliquely oriented text into a linear representation, producing geometrically normalized inputs suitable for OCR decoding. The refined imprints are processed by a multimodal OCR module that integrates a non-autoregressive language–vision fusion architecture for accurate character-level recognition. Experiments on a pill image dataset from the U.S. National Library of Medicine show that GO-PILL achieves an F1-score of 81.83% under set-based evaluation and a Top-10 pill identification accuracy of 76.52% in a simulated clinical scenario. GO-PILL consistently outperforms existing methods under challenging imprint conditions, demonstrating strong robustness and practical feasibility.

(This article belongs to the Special Issue Applications of Deep Learning and Convolutional Neural Network)

21 pages, 5194 KB

Open Access Article

Integrated Polarimetric Spectral Imaging Sensor Combining Spectral Imaging and Polarization Modulation Techniques

by Zihao Liu, Zhiping Song, Zhengqiang Li and Li Li

Sensors 2026, 26(1), 144; https://doi.org/10.3390/s26010144 - 25 Dec 2025

Viewed by 712

Abstract

Polarimetric spectral imaging systems have unique application advantages in environmental remote sensing, military target recognition, astronomy, medicine, etc., because of their ability to acquire multidimensional information. However, traditional systems are constrained by complex structures and low spectral resolution, making them unlikely to achieve their full potential. This study proposes a novel polarimetric spectral imaging method for information acquisition to address these shortcomings. The method integrates a polarization modulator (composed of two retarders and one polarizer) into the incident optical path of a push-broom imaging spectrometer for hardware integration. The modulator statically encodes the full polarization spectral information of the measured light into output power spectra, which the spectrometer records as raw spectral image data. Target polarimetric spectral imaging information is then reconstructed from the raw data to realize sensor functions. The system structure, data reconstruction principles, laboratory experiments with typical polarized light sources, and preliminary outdoor experiments verified the system’s correctness and reliability. The results facilitate further expansion of the application scope of polarimetric spectral imaging systems.

(This article belongs to the Section Optical Sensors)

21 pages, 3813 KB

Open Access Article

HMRM: A Hybrid Motion and Region-Fused Mamba Network for Micro-Expression Recognition

by Zhe Guo, Yi Liu, Rui Luo, Jiayi Liu and Lan Wei

Sensors 2025, 25(24), 7672; https://doi.org/10.3390/s25247672 - 18 Dec 2025

Viewed by 609

Abstract

Micro-expression recognition (MER), as an important branch of intelligent visual sensing, enables the analysis of subtle facial movements for applications in emotion understanding, human–computer interaction and security monitoring. However, existing methods struggle to capture fine-grained spatiotemporal dynamics under limited data and computational resources, making them difficult to deploy in real-world sensing systems. To address this limitation, we propose HMRM, a hybrid motion and region-fused Mamba network designed for efficient and accurate MER. HMRM enhances motion representation through a hybrid feature augmentation module that integrates gated recurrent unit (GRU)-attention optical flow estimation with a regional MotionMix enhancement strategy to increase motion diversity. Furthermore, it employs a grained Mamba encoder to achieve lightweight and effective long-range temporal modeling. Additionally, a regions feature fusion strategy is introduced to strengthen the representation of localized expression dynamics. Experiments on multiple MER benchmark datasets demonstrate that HMRM achieves state-of-the-art performance with strong generalization and low computational cost, highlighting its potential for integration into compact, real-time visual sensing and emotion analysis systems.

(This article belongs to the Special Issue Emotion Recognition and Cognitive Behavior Analysis Based on Sensors)

18 pages, 5536 KB

Open Access Article (Editor’s Choice)

Automated Particle Size Analysis of Supported Nanoparticle TEM Images Using a Pre-Trained SAM Model

by Xiukun Zhong, Guohong Liang, Lingbei Meng, Wei Xi, Lin Gu, Nana Tian, Yong Zhai, Yutong He, Yuqiong Huang, Fengmin Jin and Hong Gao

Nanomaterials 2025, 15(24), 1886; https://doi.org/10.3390/nano15241886 - 16 Dec 2025

Cited by 2 | Viewed by 1114

Abstract

This study addresses the challenges associated with transmission electron microscopy (TEM) image analysis of supported nanoparticles, including low signal-to-noise ratio, poor contrast, and interference from complex substrate backgrounds. It proposes an automated segmentation and particle size analysis method based on a large-scale deep learning model, the Segment Anything Model (SAM). Using Ru/TiO₂ and related materials as representative systems, the pretrained SAM is employed for zero-shot segmentation of nanoparticles and is integrated with a custom image processing pipeline, comprising an optical character recognition (OCR) module, morphological optimization, and connected component analysis, to achieve high-precision particle size quantification. Experimental results demonstrate that the method retains robust performance under challenging imaging conditions, with a size estimation error between 3% and 5% and a per-image processing time under 1 min, significantly outperforming traditional manual annotation and threshold-based segmentation approaches. This framework provides an efficient and reliable analytical tool for morphological characterization and structure–performance correlation studies in supported nanocatalysts.

(This article belongs to the Section Theory and Simulation of Nanostructures)

Search Results (134)
