MDPI - Publisher of Open Access Journals

32 pages, 6323 KiB

Open Access Article

Design, Implementation and Evaluation of an Immersive Teleoperation Interface for Human-Centered Autonomous Driving

by Irene Bouzón, Jimena Pascual, Cayetana Costales, Aser Crespo, Covadonga Cima and David Melendi

Sensors 2025, 25(15), 4679; https://doi.org/10.3390/s25154679 - 29 Jul 2025

Viewed by 252

Abstract

As autonomous driving technologies advance, the need for human-in-the-loop systems becomes increasingly critical to ensure safety, adaptability, and public confidence. This paper presents the design and evaluation of a context-aware immersive teleoperation interface that integrates real-time simulation, virtual reality, and multimodal feedback to support remote interventions in emergency scenarios. Built on a modular ROS2 architecture, the system allows seamless transition between simulated and physical platforms, enabling safe and reproducible testing. The experimental results show a high task success rate and user satisfaction, highlighting the importance of intuitive controls, gesture recognition accuracy, and low-latency feedback. Our findings contribute to the understanding of human-robot interaction (HRI) in immersive teleoperation contexts and provide insights into the role of multisensory feedback and control modalities in building trust and situational awareness for remote operators. Ultimately, this approach is intended to support the broader acceptability of autonomous driving technologies by enhancing human supervision, control, and confidence.

(This article belongs to the Special Issue Human-Centred Smart Manufacturing - Industry 5.0)

27 pages, 1128 KiB

Open Access Article

Adaptive Multi-Hop P2P Video Communication: A Super Node-Based Architecture for Conversation-Aware Streaming

by Jiajing Chen and Satoshi Fujita

Information 2025, 16(8), 643; https://doi.org/10.3390/info16080643 - 28 Jul 2025

Viewed by 245

Abstract

This paper proposes a multi-hop peer-to-peer (P2P) video streaming architecture designed to support dynamic, conversation-aware communication. The primary contribution is a decentralized system built on WebRTC that eliminates reliance on a central media server by employing super node aggregation. In this architecture, video streams from multiple peer nodes are dynamically routed through a group of super nodes, enabling real-time reconfiguration of the network topology in response to conversational changes. To support this dynamic behavior, the system leverages WebRTC data channels for control signaling and overlay restructuring, allowing efficient dissemination of topology updates and coordination messages among peers. A key focus of this study is the rapid and efficient reallocation of network resources immediately following conversational events, ensuring that the streaming overlay remains aligned with ongoing interaction patterns. While the automatic detection of such events is beyond the scope of this work, we assume that external triggers are available to initiate topology updates. To validate the effectiveness of the proposed system, we construct a simulation environment using Docker containers and evaluate its streaming performance under dynamic network conditions. The results demonstrate the system’s applicability to adaptive, naturalistic communication scenarios. Finally, we discuss future directions, including the seamless integration of external trigger sources and enhanced support for flexible, context-sensitive interaction frameworks.

(This article belongs to the Special Issue Second Edition of Advances in Wireless Communications Systems)

25 pages, 2518 KiB

Open Access Article

An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving

by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li

Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025

Viewed by 274

Abstract

In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size.
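
The mIoU figures quoted in this abstract rest on a per-class intersection-over-union score averaged across classes. The following is a minimal sketch of that metric on flattened label lists, not the authors' evaluation code; the function name `mean_iou` is hypothetical, and the choice to skip classes absent from both prediction and ground truth is an assumption (benchmark toolkits typically accumulate counts over the whole dataset before averaging).

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target (assumed convention)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Count per-class pixels in the prediction, the ground truth, and their overlap.
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0
```

For example, with predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the mean is 7/12.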

22 pages, 573 KiB

Open Access Article

Towards an Extensible and Text-Oriented Analytical Semantic Trajectory Framework

by Damião Ribeiro de Almeida, Cláudio de Souza Baptista, Fabio Gomes de Andrade and Anselmo Cardoso de Paiva

ISPRS Int. J. Geo-Inf. 2025, 14(8), 292; https://doi.org/10.3390/ijgi14080292 - 28 Jul 2025

Viewed by 189

Abstract

Semantically enriched trajectories have attracted growing interest in recent research, driven by the need for more expressive and context-aware movement data analysis. Two primary approaches have emerged for the storage and management of such data: moving object databases, which operate at the transactional or operational level, and trajectory data warehouses (TDWs), which support analytical processing within decision support systems. Conventional TDW methodologies typically model semantic aspects of trajectories by introducing new dimensions into the data warehouse schema. However, this approach often requires structural modifications to the schema in order to accommodate additional semantic attributes, potentially resulting in significant disruptions to the architecture and maintenance of the underlying decision support systems. To overcome this limitation, we propose a novel TDW model that supports dynamic and extensible integration of semantic aspects, without necessitating changes to the schema. This design enhances flexibility and promotes seamless adaptability to domain-specific requirements. To enable such extensibility, we propose an innovative approach to representing semantic trajectories by leveraging natural language processing (NLP) techniques, without relying on traditional spatiotemporal features. This enables the analysis of semantic movement patterns purely through textual context. Finally, we present a comprehensive framework that implements the proposed model in real-world application scenarios, demonstrating its practical extensibility.

20 pages, 7280 KiB

Open Access Article

UAV-DETR: An Enhanced RT-DETR Architecture for Efficient Small Object Detection in UAV Imagery

by Yu Zhou and Yan Wei

Sensors 2025, 25(15), 4582; https://doi.org/10.3390/s25154582 - 24 Jul 2025

Viewed by 470

Abstract

To mitigate the technical challenges associated with small-object detection, feature degradation, and spatial-contextual misalignment in UAV-acquired imagery, this paper proposes UAV-DETR, an enhanced Transformer-based object detection model designed for aerial scenarios. Specifically, UAV imagery often suffers from feature degradation due to low resolution and complex backgrounds and from semantic-spatial misalignment caused by dynamic shooting conditions. This work addresses these challenges by enhancing feature perception, semantic representation, and spatial alignment. Architecturally extending the RT-DETR framework, UAV-DETR incorporates three novel modules: the Channel-Aware Sensing Module (CAS), the Scale-Optimized Enhancement Pyramid Module (SOEP), and the newly designed Context-Spatial Alignment Module (CSAM), which integrates the functionalities of contextual and spatial calibration. These components collaboratively strengthen multi-scale feature extraction, semantic representation, and spatial-contextual alignment. The CAS module refines the backbone to improve multi-scale feature perception, while SOEP enhances semantic richness in shallow layers through lightweight channel-weighted fusion. CSAM further optimizes the hybrid encoder by simultaneously correcting contextual inconsistencies and spatial misalignments during feature fusion, enabling more precise cross-scale integration. Comprehensive comparisons with mainstream detectors, including Faster R-CNN and YOLOv5, demonstrate that UAV-DETR achieves superior small-object detection performance in complex aerial scenarios. The performance is thoroughly evaluated in terms of mAP@0.5, parameter count, and computational complexity (GFLOPs). Experiments on the VisDrone2019 benchmark demonstrate that UAV-DETR achieves an mAP@0.5 of 51.6%, surpassing RT-DETR by 3.5% while reducing the number of model parameters from 19.8 million to 16.8 million.
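
The "@0.5" in mAP@0.5 refers to the overlap threshold at which a predicted box counts as a correct detection. As a minimal sketch of that matching criterion only (not the full average-precision computation, and not the authors' evaluation code), the hypothetical helpers `box_iou` and `is_true_positive` below assume axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return box_iou(pred, gt) >= threshold
```

Small objects are hard under this criterion because a shift of a few pixels removes a large fraction of the overlap, which is one reason small-object benchmarks are sensitive to spatial misalignment.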

(This article belongs to the Section Remote Sensors)

19 pages, 1040 KiB

Open Access Systematic Review

A Systematic Review on Risk Management and Enhancing Reliability in Autonomous Vehicles

by Ali Mahmood and Róbert Szabolcsi

Machines 2025, 13(8), 646; https://doi.org/10.3390/machines13080646 - 24 Jul 2025

Viewed by 297

Abstract

Autonomous vehicles (AVs) hold the potential to revolutionize transportation by improving safety, operational efficiency, and environmental impact. However, ensuring reliability and safety in real-world conditions remains a major challenge. Based on an in-depth examination of 33 peer-reviewed studies (2015–2025), this systematic review organizes advancements across five key domains: fault detection and diagnosis (FDD), collision avoidance and decision making, system reliability and resilience, validation and verification (V&V), and safety evaluation. It integrates both hardware- and software-level perspectives, with a focus on emerging techniques such as Bayesian behavior prediction, uncertainty-aware control, and set-based fault detection to enhance operational robustness. Despite these advances, this review identifies persistent challenges, including limited cross-layer fault modeling, lack of formal verification for learning-based components, and the scarcity of scenario-driven validation datasets. To address these gaps, this paper proposes future directions such as verifiable machine learning, unified fault propagation models, digital twin-based reliability frameworks, and cyber-physical threat modeling. This review offers a comprehensive reference for developing certifiable, context-aware, and fail-operational autonomous driving systems, contributing to the broader goal of ensuring safe and trustworthy AV deployment.

(This article belongs to the Special Issue Innovative Applications and Challenges of Intelligent Automation and Control in Smart Machines)

21 pages, 2941 KiB

Open Access Article

Dynamic Proxemic Model for Human–Robot Interactions Using the Golden Ratio

by Tomáš Spurný, Ján Babjak, Zdenko Bobovský and Aleš Vysocký

Appl. Sci. 2025, 15(15), 8130; https://doi.org/10.3390/app15158130 - 22 Jul 2025

Viewed by 227

Abstract

This paper presents a novel approach to determine dynamic safety and comfort zones in human–robot interactions (HRIs), with a focus on service robots operating in dynamic environments with people. The proposed proxemic model leverages the golden ratio-based comfort zone distribution and ISO safety standards to define adaptive proxemic boundaries for robots around humans. Unlike traditional fixed-threshold approaches, this novel method proposes a gradual and context-sensitive modulation of robot behaviour based on human position, orientation, and relative velocity. The system was implemented on an NVIDIA Jetson Xavier NX platform using a ZED 2i stereo depth camera (Stereolabs, New York, USA) and tested on two mobile robotic platforms: Go1 (Unitree, Hangzhou, China; quadruped) and Scout Mini (Agilex, Dongguan, China; wheeled). The initial verification of the proposed proxemic model through experimental comfort validation was conducted using two simple interaction scenarios, and subjective feedback was collected from participants using a modified Godspeed Questionnaire Series. The results show that the participants felt comfortable during the experiments with robots. This acceptance provides initial support for further research on the proposed methodology. The proposed solution also facilitates integration into existing navigation frameworks and opens pathways towards socially aware robotic systems.

(This article belongs to the Special Issue Intelligent Robotics: Design and Applications)

27 pages, 1868 KiB

Open Access Article

SAM2-DFBCNet: A Camouflaged Object Detection Network Based on the Heira Architecture of SAM2

by Cao Yuan, Libang Liu, Yaqin Li and Jianxiang Li

Sensors 2025, 25(14), 4509; https://doi.org/10.3390/s25144509 - 21 Jul 2025

Viewed by 341

Abstract

Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with their background, presenting significant challenges such as low contrast, complex textures, and blurred boundaries. Existing deep learning methods often struggle to achieve robust segmentation under these conditions. To address these limitations, this paper proposes a novel COD network, SAM2-DFBCNet, built upon the SAM2 Hiera architecture. Our network incorporates three key modules: (1) the Camouflage-Aware Context Enhancement Module (CACEM), which fuses local and global features through an attention mechanism to enhance contextual awareness in low-contrast scenes; (2) the Cross-Scale Feature Interaction Bridge (CSFIB), which employs a bidirectional convolutional GRU for the dynamic fusion of multi-scale features, effectively mitigating representation inconsistencies caused by complex textures and deformations; and (3) the Dynamic Boundary Refinement Module (DBRM), which combines channel and spatial attention mechanisms to optimize boundary localization accuracy and enhance segmentation details. Extensive experiments on three public datasets (CAMO, COD10K, and NC4K) demonstrate that SAM2-DFBCNet outperforms twenty state-of-the-art methods, achieving maximum improvements of 7.4%, 5.78%, and 4.78% in key metrics such as S-measure ($S_{\alpha}$), F-measure ($F_{\beta}$), and mean E-measure ($E_{\phi}$), respectively, while reducing the Mean Absolute Error (M) by 37.8%. These results validate the superior performance and robustness of our approach in complex camouflage scenarios.

(This article belongs to the Special Issue Transformer Applications in Target Tracking)

22 pages, 2129 KiB

Open Access Article

Reinforcement Learning Methods for Emulating Personality in a Game Environment

by Georgios Liapis, Anna Vordou, Stavros Nikolaidis and Ioannis Vlahavas

Appl. Sci. 2025, 15(14), 7894; https://doi.org/10.3390/app15147894 - 15 Jul 2025

Viewed by 378

Abstract

Reinforcement learning (RL), a branch of artificial intelligence (AI), is becoming more popular in a variety of application fields such as games, workplaces, and behavioral analysis, due to its ability to model complex decision-making through interaction and feedback. Traditional systems for personality and behavior assessment often rely on self-reported questionnaires, which are prone to bias and manipulation. RL offers a compelling alternative by generating diverse, objective behavioral data through agent–environment interactions. In this paper, we propose a Reinforcement Learning-based framework in a game environment, where agents simulate personality-driven behavior using context-aware policies and exhibit a wide range of realistic actions. Our method, which is based on the OCEAN Five personality model (openness, conscientiousness, extroversion, agreeableness, and neuroticism), relates psychological profiles to in-game decision-making patterns. The agents are allowed to operate in numerous environments, observe behaviors that were modeled using another simulation system (HiDAC) and develop the skills needed to navigate and complete tasks. As a result, we are able to identify the personality types and team configurations that have the greatest effects on task performance and collaboration effectiveness. Using interaction data derived from self-play, we investigate the relationships between behaviors motivated by the personalities of the agents, communication styles, and team outcomes. The results demonstrate that in addition to having an effect on performance, personality-aware agents provide a solid methodology for producing realistic behavioral data, developing adaptive NPCs, and evaluating team-based scenarios in challenging settings.

(This article belongs to the Special Issue Innovative Artificial Intelligence Methods, Tools and Methodologies to Address Challenging Real-World Problems)

20 pages, 3414 KiB

Open Access Article

Improvement in the Interception Vulnerability Level of Encryption Mechanism in GSM

by Fawad Ahmad, Reshail Khan and Armel Asongu Nkembi

Inventions 2025, 10(4), 56; https://doi.org/10.3390/inventions10040056 - 14 Jul 2025

Viewed by 270

Abstract

Data security is of the utmost importance in the domain of real-time environmental monitoring systems, particularly when employing advanced context-aware intelligent visual analytics. This paper addresses a significant deficiency in the Global System for Mobile Communications (GSM), a widely employed wireless communication system for environmental monitoring. The extensively used A5/1 encryption technique secures user data with a 64-bit session key that is divided among three linear feedback shift registers (LFSRs). Despite its demonstrated efficacy, no probabilistic model has yet been developed for assessing the vulnerability of the session key (Kc) to breaking or interception. To bridge this knowledge gap, this study proposes a probabilistic model that evaluates the security of encrypted data within the GSM framework. The proposed model alters the current GSM encryption process by increasing the number of LFSRs, thereby improving security. The methodology entails increasing the number of registers while preserving the session key’s length, ensuring that the key length specified by GSM standards remains unaltered. This is especially important for environmental monitoring systems that depend on real-time data analysis and decision-making. To illustrate the idea, this analysis considers three distinct scenarios: encryption using a set of five, seven, and nine registers. The majority function is employed to determine the registers that will undergo perturbation, hence increasing the complexity of the bit arrangement and enhancing the security against prospective attackers. This paper provides empirical evidence from simulations that increasing the number of registers decreases the vulnerability to data interception, hence boosting data security in GSM communication. Simulation results demonstrate that our method substantially reduces the risk of data interception, thereby improving the integrity of context-aware intelligent visual analytics in real-time environmental monitoring systems.
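
The majority-function clocking mentioned in this abstract can be sketched as follows. This is an illustrative model under stated assumptions, not the authors' modified cipher: the register lengths, feedback taps, and clocking bits in `standard_a51_registers` follow the commonly published description of standard three-register A5/1, key and frame loading is omitted, and the names `LFSR`, `majority`, and `clock_once` are hypothetical. The `majority` helper accepts any odd number of registers, but the abstract does not give the register parameters used for the five-, seven-, and nine-register variants, so none are assumed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LFSR:
    length: int        # register width in bits
    taps: List[int]    # bit positions XORed to form the feedback bit
    clock_bit: int     # bit position that votes in the majority function
    state: int = 0

    def bit(self, i: int) -> int:
        return (self.state >> i) & 1

    def step(self) -> None:
        """Shift left by one and insert the feedback bit at position 0."""
        feedback = 0
        for t in self.taps:
            feedback ^= self.bit(t)
        self.state = ((self.state << 1) | feedback) & ((1 << self.length) - 1)

    def out(self) -> int:
        return self.bit(self.length - 1)


def standard_a51_registers() -> List[LFSR]:
    # Parameters per the widely published A5/1 description (assumption, not from the abstract).
    return [
        LFSR(length=19, taps=[13, 16, 17, 18], clock_bit=8),
        LFSR(length=22, taps=[20, 21], clock_bit=10),
        LFSR(length=23, taps=[7, 20, 21, 22], clock_bit=10),
    ]


def majority(bits: List[int]) -> int:
    """Return the value held by more than half of the bits (odd-length input)."""
    return 1 if sum(bits) * 2 > len(bits) else 0


def clock_once(regs: List[LFSR]) -> int:
    """Step only the registers whose clocking bit agrees with the majority, then output one bit."""
    m = majority([r.bit(r.clock_bit) for r in regs])
    for r in regs:
        if r.bit(r.clock_bit) == m:
            r.step()
    out = 0
    for r in regs:
        out ^= r.out()
    return out
```

With three registers, at least two always step on each cycle; with more registers the set of possible stepping patterns grows, which is the kind of irregularity the abstract attributes to the added registers.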

(This article belongs to the Section Inventions and Innovation in Electrical Engineering/Energy/Communications)

25 pages, 1429 KiB

Open Access Article

A Contrastive Semantic Watermarking Framework for Large Language Models

by Jianxin Wang, Xiangze Chang, Chaoen Xiao and Lei Zhang

Symmetry 2025, 17(7), 1124; https://doi.org/10.3390/sym17071124 - 14 Jul 2025

Viewed by 402

Abstract

The widespread deployment of large language models (LLMs) has raised urgent demands for verifiable content attribution and misuse mitigation. Existing text watermarking techniques often struggle in black-box or sampling-based scenarios due to limitations in robustness, imperceptibility, and detection generality. These challenges are particularly critical in open-access settings, where model internals and generation logits are unavailable for attribution. To address these limitations, we propose CWS (Contrastive Watermarking with Semantic Modeling), a novel keyless watermarking framework that integrates contrastive semantic token selection and shared embedding space alignment. CWS enables context-aware, fluent watermark embedding while supporting robust detection via a dual-branch mechanism: a lightweight z-score statistical test for public verification and a GRU-based semantic decoder for black-box adversarial robustness. Experiments on GPT-2, OPT-1.3B, and LLaMA-7B over C4 and DBpedia datasets demonstrate that CWS achieves F1 scores up to 99.9% and maintains F1 ≥ 93% under semantic rewriting, token substitution, and lossy compression (ε ≤ 0.25, δ ≤ 0.2). The GRU-based detector offers a superior speed–accuracy trade-off (0.42 s/sample) over LSTM and Transformer baselines. These results highlight CWS as a lightweight, black-box-compatible, and semantically robust watermarking method suitable for practical content attribution across LLM architectures and decoding strategies. Furthermore, CWS maintains a symmetrical architecture between embedding and detection stages via shared semantic representations, ensuring structural consistency and robustness. This semantic symmetry helps preserve detection reliability across diverse decoding strategies and adversarial conditions.
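
The abstract mentions a z-score statistical test but does not state the statistic. A common form in token-level watermark detection is a one-proportion z-test on how many tokens fall in a favoured set, and the sketch below assumes that form. The function name `watermark_z_score`, the parameter `gamma` (expected fraction of favoured tokens in unwatermarked text), and the threshold of 4.0 in the usage note are all hypothetical and are not taken from the paper.

```python
import math


def watermark_z_score(hits: int, total: int, gamma: float) -> float:
    """One-proportion z-score: how far the observed hit count sits above chance.

    hits  -- number of tokens that fall in the favoured set
    total -- number of tokens scored
    gamma -- assumed probability that an unwatermarked token is a hit
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not 0 <= hits <= total:
        raise ValueError("hits must lie between 0 and total")

    expected = gamma * total
    std_dev = math.sqrt(total * gamma * (1.0 - gamma))
    return (hits - expected) / std_dev
```

Under these assumptions, 75 hits in 100 tokens with `gamma = 0.5` gives a z-score of 5.0, which would exceed an illustrative decision threshold of 4.0, while 50 hits gives 0.0.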

(This article belongs to the Section Computer)

19 pages, 1186 KiB

Open Access Article

Synthetic Patient–Physician Conversations Simulated by Large Language Models: A Multi-Dimensional Evaluation

by Syed Ali Haider, Srinivasagam Prabha, Cesar Abraham Gomez-Cabello, Sahar Borna, Ariana Genovese, Maissa Trabilsy, Bernardo G. Collaco, Nadia G. Wood, Sanjay Bagaria, Cui Tao and Antonio Jorge Forte

Sensors 2025, 25(14), 4305; https://doi.org/10.3390/s25144305 - 10 Jul 2025

Viewed by 565

Abstract

Background: Data accessibility remains a significant barrier in healthcare AI due to privacy constraints and logistical challenges. Synthetic data, which mimics real patient information while remaining both realistic and non-identifiable, offers a promising solution. Large Language Models (LLMs) create new opportunities to generate high-fidelity clinical conversations between patients and physicians. However, the value of this synthetic data depends on careful evaluation of its realism, accuracy, and practical relevance. Objective: To assess the performance of four leading LLMs (ChatGPT 4.5, ChatGPT 4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro) in generating synthetic transcripts of patient–physician interactions in plastic surgery scenarios. Methods: Each model generated transcripts for ten plastic surgery scenarios. Transcripts were independently evaluated by three clinically trained raters using a seven-criterion rubric: Medical Accuracy, Realism, Persona Consistency, Fidelity, Empathy, Relevancy, and Usability. Raters were blinded to the model identity to reduce bias. Each criterion was rated on a 5-point Likert scale, yielding 840 total evaluations. Descriptive statistics were computed, and a two-way repeated measures ANOVA was used to test for differences across models and metrics. In addition, transcripts were analyzed using automated linguistic and content-based metrics. Results: All models achieved strong performance, with mean ratings exceeding 4.5 across all criteria. Gemini 2.5 Pro received perfect mean scores (5.00 ± 0.00) in Medical Accuracy, Realism, Persona Consistency, Relevancy, and Usability. Claude 3.7 Sonnet matched the scores in Persona Consistency and Relevancy and led in Empathy (4.96 ± 0.18). ChatGPT 4.5 also achieved perfect scores in Relevancy, with high scores in Empathy (4.93 ± 0.25) and Usability (4.96 ± 0.18). ChatGPT 4o demonstrated consistently strong but slightly lower performance across most dimensions. ANOVA revealed no statistically significant differences across models (F(3, 6) = 0.85, p = 0.52). Automated analysis showed substantial variation in transcript length, style, and content richness: Gemini 2.5 Pro generated the longest and most emotionally expressive dialogues, while ChatGPT 4o produced the shortest and most concise outputs. Conclusions: Leading LLMs can generate medically accurate, emotionally appropriate synthetic dialogues suitable for educational and research use. Despite high performance, demographic homogeneity in generated patients highlights the need for improved diversity and bias mitigation in model outputs. These findings support the cautious, context-aware integration of LLM-generated dialogues into medical training, simulation, and research.
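
The total of 840 evaluations follows directly from the study design described in the Methods, assuming every rater scored every transcript on every criterion:

```latex
\[
\underbrace{4}_{\text{models}} \times
\underbrace{10}_{\text{scenarios}} \times
\underbrace{3}_{\text{raters}} \times
\underbrace{7}_{\text{criteria}} = 840 \ \text{ratings}
\]
```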

(This article belongs to the Special Issue Feature Papers in Smart Sensing and Intelligent Sensors 2025)

25 pages, 8372 KiB

Open Access Article

CSDNet: Context-Aware Segmentation of Disaster Aerial Imagery Using Detection-Guided Features and Lightweight Transformers

by Ahcene Zetout and Mohand Saïd Allili

Remote Sens. 2025, 17(14), 2337; https://doi.org/10.3390/rs17142337 - 8 Jul 2025

Viewed by 338

Abstract

Accurate multi-class semantic segmentation of disaster-affected areas is essential for rapid response and effective recovery planning. We present CSDNet, a context-aware segmentation model tailored to disaster scenarios, designed to improve segmentation of both large-scale disaster zones and small, underrepresented classes. The architecture combines a lightweight transformer module for global context modeling with depthwise separable convolutions (DWSCs) to enhance efficiency without compromising representational capacity. Additionally, we introduce a detection-guided feature fusion mechanism that integrates outputs from auxiliary detection tasks to mitigate class imbalance and improve discrimination of visually similar categories. Extensive experiments on several public datasets demonstrate that our model significantly improves segmentation of both man-made infrastructure and natural damage-related features, offering a robust and efficient solution for post-disaster analysis.

21 pages, 4859 KiB

Open AccessArticle

Improvement of SAM2 Algorithm Based on Kalman Filtering for Long-Term Video Object Segmentation

by Jun Yin, Fei Wu, Hao Su, Peng Huang and Yuetong Qixuan

Sensors 2025, 25(13), 4199; https://doi.org/10.3390/s25134199 - 5 Jul 2025

Viewed by 516

Abstract

The Segment Anything Model 2 (SAM2) has achieved state-of-the-art performance in pixel-level object segmentation for both static and dynamic visual content. Its streaming memory architecture maintains spatial context across video sequences, yet struggles with long-term tracking due to its static inference framework. SAM 2’s fixed temporal window approach indiscriminately retains historical frames, failing to account for frame quality or dynamic motion patterns. This leads to error propagation and tracking instability in challenging scenarios involving fast-moving objects, partial occlusions, or crowded environments. To overcome these limitations, this paper proposes SAM2Plus, a zero-shot enhancement framework that integrates Kalman filter prediction, dynamic quality thresholds, and adaptive memory management. The Kalman filter models object motion using physical constraints to predict trajectories and dynamically refine segmentation states, mitigating positional drift during occlusions or velocity changes. Dynamic thresholds, combined with multi-criteria evaluation metrics (e.g., motion coherence, appearance consistency), prioritize high-quality frames while adaptively balancing confidence scores and temporal smoothness. This reduces ambiguities among similar objects in complex scenes. SAM2Plus further employs an optimized memory system that prunes outdated or low-confidence entries and retains temporally coherent context, ensuring constant computational resources even for infinitely long videos. Extensive experiments on two video object segmentation (VOS) benchmarks demonstrate SAM2Plus’s superiority over SAM 2. It achieves an average improvement of 1.0 in J&F metrics across all 24 direct comparisons, with gains exceeding 2.3 points on SA-V and LVOS datasets for long-term tracking. The method delivers real-time performance and strong generalization without fine-tuning or additional parameters, effectively addressing occlusion recovery and viewpoint changes. By unifying motion-aware physics-based prediction with spatial segmentation, SAM2Plus bridges the gap between static and dynamic reasoning, offering a scalable solution for real-world applications such as autonomous driving and surveillance systems. Full article
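
The role a Kalman filter plays during occlusion can be illustrated with a minimal one-dimensional constant-velocity filter. This is a generic textbook sketch, not the SAM2Plus implementation: the class name, the noise variances, and the initial covariance are all assumptions, and a real tracker would typically filter box coordinates rather than a single position.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity along one axis (hypothetical helper)."""

    def __init__(self, dt=1.0, process_var=1e-4, meas_var=1.0):
        self.dt = dt
        self.q = process_var   # assumed process noise variance
        self.r = meas_var      # assumed measurement noise variance
        self.position = 0.0
        self.velocity = 0.0
        # 2 x 2 state covariance, stored entry by entry; large values mean
        # the initial state is only weakly trusted.
        self.p00, self.p01, self.p10, self.p11 = 100.0, 0.0, 0.0, 100.0

    def predict(self):
        """Propagate the state one frame ahead: x = F x, P = F P F^T + Q."""
        dt = self.dt
        self.position += dt * self.velocity
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, measured_position):
        """Correct the prediction with a position measurement (H = [1, 0])."""
        innovation = measured_position - self.position
        s = self.p00 + self.r          # innovation variance
        k0 = self.p00 / s              # Kalman gain for position
        k1 = self.p10 / s              # Kalman gain for velocity
        self.position += k0 * innovation
        self.velocity += k1 * innovation
        # P = (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


# An object moving 2 units per frame is observed for 20 frames ...
kf = ConstantVelocityKalman1D()
for frame in range(20):
    kf.predict()
    kf.update(2.0 * frame)

# ... then occluded for 5 frames: only predict() runs, no measurements.
for _ in range(5):
    kf.predict()

print(f"estimated position after occlusion: {kf.position:.2f}")
```

During the occluded frames the filter keeps extrapolating along the learned velocity, which is the behavior the abstract credits with reducing positional drift.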

(This article belongs to the Special Issue Image and Video Processing and Recognition Based on Artificial Intelligence: 3rd Edition)

27 pages, 3702 KiB

Open AccessArticle

Domain Knowledge-Enhanced Process Mining for Anomaly Detection in Commercial Bank Business Processes

by Yanying Li, Zaiwen Ni and Binqing Xiao

Systems 2025, 13(7), 545; https://doi.org/10.3390/systems13070545 - 4 Jul 2025

Viewed by 268

Abstract

Process anomaly detection in financial services systems is crucial for operational compliance and risk management. However, traditional process mining techniques frequently neglect the detection of significant low-frequency abnormalities due to their dependence on frequency and the inadequate incorporation of domain-specific knowledge. Therefore, we develop an enhanced process mining algorithm by incorporating a domain-specific follow-relationship matrix derived from standard operating procedures (SOPs). We empirically evaluated the effectiveness of the proposed algorithm based on real-world event logs from a corporate account-opening process conducted from January to December 2022 in a Chinese commercial bank. Additionally, we employed large language models (LLMs) for root cause analysis and process optimization recommendations. The empirical results demonstrate that the E-Heuristic Miner significantly outperforms traditional machine learning methods and process mining algorithms in process anomaly detection. Furthermore, the integration of LLMs provides promising capabilities in semantic reasoning and offers explainable optimization suggestions, enhancing decision-making support in complex financial scenarios. Our study significantly improves the precision of process anomaly detection in financial contexts by incorporating banking-specific domain knowledge into process mining algorithms. Meanwhile, it extends theoretical boundaries and the practical applicability of process mining in intelligent, semantic-aware financial service management. Full article
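
Why a frequency-based view can miss a rare but serious deviation, and how an SOP-derived follow relation helps, can be sketched in a few lines. This is not the E-Heuristic Miner described in the article: the activity names, the `allowed_follows` set, and both helper functions are hypothetical, and the event log is invented for illustration.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def flag_sop_violations(traces, allowed_follows):
    """Return (trace index, disallowed transitions) for every violating trace."""
    flagged = []
    for index, trace in enumerate(traces):
        violations = [
            (a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed_follows
        ]
        if violations:
            flagged.append((index, violations))
    return flagged


# Hypothetical SOP for an account-opening process.
allowed_follows = {
    ("submit", "verify_id"),
    ("verify_id", "approve"),
    ("approve", "open_account"),
}

# Invented log: nine compliant traces and one that skips identity verification.
normal = ["submit", "verify_id", "approve", "open_account"]
traces = [list(normal) for _ in range(9)] + [["submit", "approve", "open_account"]]

counts = directly_follows_counts(traces)
flagged = flag_sop_violations(traces, allowed_follows)
print(counts[("submit", "approve")], flagged)
```

The skipped-verification transition occurs only once in ten traces, so a miner that filters by frequency could discard it as noise, whereas the SOP check flags it regardless of how rare it is.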

(This article belongs to the Special Issue Business Process Management Based on Big Data Analytics)

Search Results (285)
