Search Results (1,066)

Search Parameters:
Keywords = video adaptation

20 pages, 1265 KiB  
Article
Validation of the Player Personality and Dynamics Scale
by Ayose Lomba Perez, Juan Carlos Martín-Quintana, Jesus B. Alonso-Hernandez and Iván Martín-Rodríguez
Appl. Sci. 2025, 15(15), 8714; https://doi.org/10.3390/app15158714 - 6 Aug 2025
Abstract
This study presents the validation of the Player Personality and Dynamics Scale (PPDS), designed to identify player profiles in educational gamification contexts with narrative elements. A questionnaire was developed and administered to a sample of 635 participants, covering sociodemographic data, lifestyle habits, gaming practices, and a classification system of 40 items on a six-point Likert scale. The results of the factor analysis confirm a five-factor structure: Toxic Profile, Joker Profile, Tryhard Profile, Aesthetic Profile, and Coacher Profile, with high fit and reliability indices (RMSEA = 0.06; CFI = 0.95; TLI = 0.91). The resulting classification enables the design of personalized gamified experiences that enhance learning and interaction in the classroom, highlighting the importance of understanding players’ motivations to better adapt educational dynamics. Applying this scale fosters meaningful learning through the creation of narratives tailored to students’ individual preferences.
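To make the reliability side of such a scale validation concrete, here is a minimal sketch of Cronbach's alpha for one factor's items. The item data are simulated, and the eight-items-per-factor split is an assumption; the RMSEA/CFI/TLI indices quoted above come from a confirmatory factor model, which is beyond this sketch.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of scale totals
    return k / (k - 1) * (1 - item_vars / total_var)

# Simulated six-point Likert responses for one hypothetical 8-item factor
# (635 respondents as in the study; the item count is an assumption).
rng = np.random.default_rng(0)
trait = rng.normal(size=(635, 1))                 # shared latent trait
noise = rng.normal(size=(635, 8))
scores = np.clip(np.round(3.5 + 1.2 * trait + noise), 1, 6)

print(f"alpha = {cronbach_alpha(scores):.3f}")
```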

19 pages, 1109 KiB  
Article
User Preference-Based Dynamic Optimization of Quality of Experience for Adaptive Video Streaming
by Zixuan Feng, Yazhi Liu and Hao Zhang
Electronics 2025, 14(15), 3103; https://doi.org/10.3390/electronics14153103 - 4 Aug 2025
Abstract
With the rapid development of video streaming services, adaptive bitrate (ABR) algorithms have become a core technology for ensuring optimal viewing experiences. Traditional ABR strategies, predominantly rule-based or reinforcement learning-driven, typically employ uniform quality assessment metrics that overlook users’ subjective preference differences regarding factors such as video quality and stalling. To address this limitation, this paper proposes an adaptive video bitrate selection system that integrates preference modeling with reinforcement learning. By incorporating a preference learning module, the system models and scores user viewing trajectories, using these scores to replace conventional rewards and guide the training of the Proximal Policy Optimization (PPO) algorithm, thereby achieving policy optimization that better aligns with users’ perceived experiences. Simulation results on DASH network bandwidth traces demonstrate that the proposed optimization method improves overall Quality of Experience (QoE) by over 9% compared to other mainstream algorithms.
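As a rough illustration of how a preference signal can reshape an ABR reward, the sketch below shows a conventional linear QoE reward and a preference-weighted variant. This is not the paper's learned preference model: the weights, the `user_pref` knob, and the chunk traces are all assumptions.

```python
import numpy as np

def linear_qoe(bitrates, rebuffer_s, w_quality=1.0, w_rebuf=4.3, w_smooth=1.0):
    """Conventional per-chunk QoE reward used by many ABR baselines:
    a quality term minus stalling and bitrate-switching penalties."""
    bitrates = np.asarray(bitrates, dtype=float)
    rebuffer_s = np.asarray(rebuffer_s, dtype=float)
    quality = w_quality * bitrates
    stall = w_rebuf * rebuffer_s
    smooth = w_smooth * np.abs(np.diff(bitrates, prepend=bitrates[0]))
    return quality - stall - smooth

def preference_reward(bitrates, rebuffer_s, user_pref):
    """Sketch of a preference-shaped reward: user_pref in [0, 1] trades
    quality sensitivity against stalling sensitivity. In the paper a learned
    preference model scores whole viewing trajectories instead."""
    return linear_qoe(bitrates, rebuffer_s,
                      w_quality=user_pref, w_rebuf=4.3 * (1 - user_pref))

chunks_kbps = [300, 750, 1200, 1200, 750]   # hypothetical chosen bitrates
stalls = [0.0, 0.2, 0.0, 0.0, 0.5]          # rebuffering per chunk (seconds)
print("uniform QoE :", linear_qoe(chunks_kbps, stalls).sum())
print("stall-averse:", preference_reward(chunks_kbps, stalls, user_pref=0.3).sum())
```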

24 pages, 1751 KiB  
Article
Robust JND-Guided Video Watermarking via Adaptive Block Selection and Temporal Redundancy
by Antonio Cedillo-Hernandez, Lydia Velazquez-Garcia, Manuel Cedillo-Hernandez, Ismael Dominguez-Jimenez and David Conchouso-Gonzalez
Mathematics 2025, 13(15), 2493; https://doi.org/10.3390/math13152493 - 3 Aug 2025
Abstract
This paper introduces a robust and imperceptible video watermarking framework designed for blind extraction in dynamic video environments. The proposed method operates in the spatial domain and combines multiscale perceptual analysis, adaptive Just Noticeable Difference (JND)-based quantization, and temporal redundancy via multiframe embedding. Watermark bits are embedded selectively in blocks with high perceptual masking using a QIM strategy, and the corresponding DCT coefficients are estimated directly from the spatial domain to reduce complexity. To enhance resilience, each bit is redundantly inserted across multiple keyframes selected based on scene transitions. Extensive simulations over 21 benchmark videos (CIF, 4CIF, HD) validate that the method achieves superior performance in robustness and perceptual quality, with an average Bit Error Rate (BER) of 1.03%, PSNR of 50.1 dB, SSIM of 0.996, and VMAF of 97.3 under compression, noise, cropping, and temporal desynchronization. The system outperforms several recent state-of-the-art techniques in both quality and speed, requiring no access to the original video during extraction. These results confirm the method’s viability for practical applications such as copyright protection and secure video streaming.
(This article belongs to the Section E: Applied Mathematics)
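The QIM embedding mentioned in the abstract can be illustrated with scalar quantization. Below is a minimal sketch: the step `DELTA`, the stand-in coefficients, and the noise level are placeholders (in the paper the step would be modulated by the JND map, and each bit repeated across keyframes).

```python
import numpy as np

DELTA = 8.0  # quantization step; a JND-adaptive scheme would vary this per block

def qim_embed(coeff: float, bit: int) -> float:
    """Quantization Index Modulation: snap the coefficient onto the lattice
    for bit 0 (multiples of DELTA) or bit 1 (offset by DELTA / 2)."""
    offset = DELTA / 2 if bit else 0.0
    return np.round((coeff - offset) / DELTA) * DELTA + offset

def qim_extract(coeff: float) -> int:
    """Blind extraction: choose the lattice whose nearest point is closer."""
    d0 = abs(coeff - qim_embed(coeff, 0))
    d1 = abs(coeff - qim_embed(coeff, 1))
    return int(d1 < d0)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=16)
coeffs = rng.normal(0, 20, size=16)              # stand-in DCT coefficients
marked = np.array([qim_embed(c, b) for c, b in zip(coeffs, bits)])
noisy = marked + rng.normal(0, 1.0, size=16)     # mild compression/attack noise
recovered = np.array([qim_extract(c) for c in noisy])
print("BER:", np.mean(recovered != bits))
```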

24 pages, 23817 KiB  
Article
Dual-Path Adversarial Denoising Network Based on UNet
by Jinchi Yu, Yu Zhou, Mingchen Sun and Dadong Wang
Sensors 2025, 25(15), 4751; https://doi.org/10.3390/s25154751 - 1 Aug 2025
Abstract
Digital image quality is crucial for reliable analysis in applications such as medical imaging, satellite remote sensing, and video surveillance. However, traditional denoising methods struggle to balance noise removal with detail preservation and lack adaptability to various types of noise. We propose a novel three-module architecture for image denoising, comprising a generator, a dual-path-UNet-based denoiser, and a discriminator. The generator creates synthetic noise patterns to augment training data, while the dual-path-UNet denoiser uses multiple receptive field modules to preserve fine details and dense feature fusion to maintain global structural integrity. The discriminator provides adversarial feedback to enhance denoising performance. This dual-path adversarial training mechanism addresses the limitations of traditional methods by simultaneously capturing both local details and global structures. Experiments on the SIDD, DND, and PolyU datasets demonstrate superior performance. We compare our architecture with the latest state-of-the-art GAN variants through comprehensive qualitative and quantitative evaluations. These results confirm the effectiveness of noise removal with minimal loss of critical image details. The proposed architecture enhances image denoising capabilities in complex noise scenarios, providing a robust solution for applications that require high image fidelity. By enhancing adaptability to various types of noise while maintaining structural integrity, this method provides a versatile tool for image processing tasks that require preserving detail.
(This article belongs to the Section Sensing and Imaging)
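A hypothetical PyTorch rendering of the dual-path idea follows: a small-kernel path for local detail and a dilated path for wider context, fused by a 1x1 convolution with a residual connection. Channel counts and kernel choices are assumptions, not the paper's exact denoiser.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Sketch of a dual-path denoising block: a local-detail path and a
    dilated wide-context path, fused by a 1x1 convolution."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.detail = nn.Sequential(                  # local-detail path
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True))
        self.context = nn.Sequential(                 # wide-context path
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([self.detail(x), self.context(x)], dim=1))
        return x + fused                              # residual connection

x = torch.randn(1, 64, 32, 32)
print(DualPathBlock()(x).shape)   # torch.Size([1, 64, 32, 32])
```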

18 pages, 8744 KiB  
Article
A User-Centered Teleoperation GUI for Automated Vehicles: Identifying and Evaluating Information Requirements for Remote Driving and Assistance
by Maria-Magdalena Wolf, Henrik Schmidt, Michael Christl, Jana Fank and Frank Diermeyer
Multimodal Technol. Interact. 2025, 9(8), 78; https://doi.org/10.3390/mti9080078 - 31 Jul 2025
Abstract
Teleoperation has emerged as a promising fallback for situations beyond the capabilities of automated vehicles. Nevertheless, teleoperation still faces challenges, such as reduced situational awareness. Since situational awareness is primarily built through the remote operator’s visual perception, the graphical user interface (GUI) design is critical. In addition to the video feed, supplemental informational elements are crucial—not only for the predominantly studied remote driving, but also for emerging desk-based remote assistance concepts. This work develops a GUI for different teleoperation concepts by identifying key informational elements during the teleoperation process through expert interviews (N = 9). Following this, static and dynamic GUI prototypes were developed and evaluated in a click dummy study (N = 36), where the dynamic GUI adapts the number of displayed elements to the teleoperation phase. Results show that both GUIs achieve good System Usability Scale (SUS) ratings, with the dynamic GUI significantly outperforming the static version in both usability and task completion time. However, the results might be attributable to a learning effect due to the lack of randomization. The User Experience Questionnaire (UEQ) score shows potential for improvement. To enhance the user experience, the GUI should be evaluated in a follow-up study that includes interaction with a real vehicle.
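For reference, SUS ratings like those reported above come from the standard ten-item scoring rule, sketched below with a single hypothetical participant's responses.

```python
def sus_score(responses):
    """System Usability Scale: ten items rated 1-5; odd-numbered items
    contribute (rating - 1), even-numbered items (5 - rating); the sum is
    scaled by 2.5 to yield a 0-100 score."""
    assert len(responses) == 10
    total = sum(r - 1 if i % 2 == 0 else 5 - r   # i=0 is item 1 (odd-numbered)
                for i, r in enumerate(responses))
    return total * 2.5

# One hypothetical participant's ratings for the dynamic GUI.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))   # -> 82.5
```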

19 pages, 3130 KiB  
Article
Deep Learning-Based Instance Segmentation of Galloping High-Speed Railway Overhead Contact System Conductors in Video Images
by Xiaotong Yao, Huayu Yuan, Shanpeng Zhao, Wei Tian, Dongzhao Han, Xiaoping Li, Feng Wang and Sihua Wang
Sensors 2025, 25(15), 4714; https://doi.org/10.3390/s25154714 - 30 Jul 2025
Abstract
The conductors of high-speed railway OCSs (Overhead Contact Systems) are susceptible to conductor galloping due to the impact of natural elements such as strong winds, rain, and snow, resulting in conductor fatigue damage and significantly compromising train operational safety. Consequently, monitoring the galloping status of conductors is crucial, and instance segmentation techniques, by delineating the pixel-level contours of each conductor, can significantly aid in the identification and study of galloping phenomena. This work expands upon the YOLO11-seg model and introduces an instance segmentation approach for galloping video and image sensor data of OCS conductors. The algorithm, designed for the stripe-like distribution of OCS conductors in the data, employs four-direction Sobel filters to extract edge features in horizontal, vertical, and diagonal orientations. These features are subsequently integrated with the original convolutional branch to form the FDSE (Four Direction Sobel Enhancement) module. It integrates the ECA (Efficient Channel Attention) mechanism for the adaptive augmentation of conductor characteristics and utilizes the FL (Focal Loss) function to mitigate the class-imbalance issue between positive and negative samples, hence enhancing the model’s sensitivity to conductors. Consequently, segmentation outcomes from neighboring frames are utilized, and mask-difference analysis is performed to autonomously detect conductor galloping locations, emphasizing their contours for the clear depiction of galloping characteristics. Experimental results demonstrate that the enhanced YOLO11-seg model achieves 85.38% precision, 77.30% recall, 84.25% AP@0.5, 81.14% F1-score, and a real-time processing speed of 44.78 FPS. When combined with the galloping visualization module, it can issue real-time alerts of conductor galloping anomalies, providing robust technical support for railway OCS safety monitoring.
(This article belongs to the Section Industrial Sensors)
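The four-direction edge extraction that the FDSE module fuses with a learned branch can be approximated with fixed Sobel kernels applied as a convolution. The sketch below shows only that fixed branch; the kernel set and shapes are assumptions rather than the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sobel kernels for horizontal, vertical, and the two diagonal directions.
SOBEL = torch.tensor([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # vertical edges (d/dx)
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # horizontal edges (d/dy)
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],   # 45-degree diagonal
    [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],   # 135-degree diagonal
], dtype=torch.float32).unsqueeze(1)         # weight shape (4, 1, 3, 3)

def four_direction_sobel(gray: torch.Tensor) -> torch.Tensor:
    """Apply all four fixed Sobel filters to an (N, 1, H, W) grayscale batch,
    returning an (N, 4, H, W) stack of directional edge responses."""
    return F.conv2d(gray, SOBEL, padding=1)

edges = four_direction_sobel(torch.randn(1, 1, 64, 64))
print(edges.shape)   # torch.Size([1, 4, 64, 64])
```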

21 pages, 6892 KiB  
Article
Enhanced Temporal Action Localization with Separated Bidirectional Mamba and Boundary Correction Strategy
by Xiangbin Liu and Qian Peng
Mathematics 2025, 13(15), 2458; https://doi.org/10.3390/math13152458 - 30 Jul 2025
Abstract
Temporal action localization (TAL) is a research hotspot in video understanding, which aims to locate and classify actions in videos. However, existing methods have difficulty capturing long-term actions because they focus on local temporal information, which leads to poor performance in localizing long temporal sequences. In addition, most methods ignore the importance of boundaries for action instances, resulting in inaccurately localized boundaries. To address these issues, this paper proposes a state space model for temporal action localization, called Separated Bidirectional Mamba (SBM), which innovatively understands frame changes from the perspective of state transformation. It adapts to different sequence lengths and incorporates state information from the forward and backward directions for each frame through a forward Mamba and a backward Mamba, obtaining more comprehensive action representations and enhancing modeling capabilities for long temporal sequences. Moreover, this paper designs a Boundary Correction Strategy (BCS). It calculates the contribution of each frame to action instances based on the pre-localized results, then adjusts the weights of frames in boundary regression so that boundaries are shifted towards the frames with higher contributions, leading to more accurate boundaries. To demonstrate the effectiveness of the proposed method, this paper reports mean Average Precision (mAP) under temporal Intersection over Union (tIoU) thresholds on four challenging benchmarks: THUMOS14, ActivityNet-1.3, HACS, and FineAction, where the proposed method achieves mAPs of 73.7%, 42.0%, 45.2%, and 29.1%, respectively, surpassing state-of-the-art approaches.
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
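A real Mamba block relies on an input-dependent selective scan (e.g., via the mamba_ssm package), but the separated-bidirectional idea itself can be sketched with a toy linear state-space recurrence run in both directions and concatenated, as below. The dimensions and the decay constant are assumptions.

```python
import torch

def ssm_scan(x: torch.Tensor, a: float = 0.9) -> torch.Tensor:
    """Toy diagonal state-space recurrence h_t = a * h_{t-1} + x_t over a
    (T, D) sequence; a real Mamba block uses input-dependent parameters."""
    h = torch.zeros(x.shape[1])
    out = []
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out.append(h)
    return torch.stack(out)

def separated_bidirectional(x: torch.Tensor) -> torch.Tensor:
    """Run the scan forward and backward in separate branches and concatenate,
    so each frame carries state from both temporal directions."""
    fwd = ssm_scan(x)
    bwd = ssm_scan(x.flip(0)).flip(0)
    return torch.cat([fwd, bwd], dim=-1)     # (T, 2D) per-frame features

feats = torch.randn(100, 32)                 # 100 video frames, 32-dim features
print(separated_bidirectional(feats).shape)  # torch.Size([100, 64])
```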

18 pages, 1127 KiB  
Article
Deep Reinforcement Learning Method for Wireless Video Transmission Based on Large Deviations
by Yongxiao Xie and Shian Song
Mathematics 2025, 13(15), 2434; https://doi.org/10.3390/math13152434 - 28 Jul 2025
Abstract
In scalable video transmission research, the video transmission process is commonly modeled as a Markov decision process, where deep reinforcement learning (DRL) methods are employed to optimize the wireless transmission of scalable videos. Furthermore, the adaptive DRL algorithm can address the energy shortage problem caused by the uncertainty of energy capture and accumulated storage, thereby reducing video interruptions and enhancing user experience. To further optimize resources in wireless energy transmission and tackle the challenge of balancing exploration and exploitation in the DRL algorithm, this paper develops an adaptive DRL algorithm that extends classical DRL frameworks by integrating dropout techniques during both the training and prediction processes. Moreover, to address the issue of continuous negative rewards, which are often attributed to incomplete training in the wireless video transmission DRL algorithm, this paper introduces the Cramér large deviation principle for specific discrimination. It identifies the optimal negative reward frequency boundary and minimizes the probability of misjudgment regarding continuous negative rewards. Finally, experimental validation is performed using the 2048-game environment that simulates wireless scalable video transmission conditions. The results demonstrate that the adaptive DRL algorithm described in this paper achieves superior convergence speed and higher cumulative rewards compared to the classical DRL approaches.
(This article belongs to the Special Issue Optimization Theory, Method and Application, 2nd Edition)
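The Cramér-style discrimination described above can be illustrated for a Bernoulli model of negative rewards: the empirical frequency's tail probability decays at the Kullback–Leibler rate, giving a Chernoff bound on misjudging a run of negative rewards. The baseline probability, window length, and threshold below are placeholders, not values from the paper.

```python
import math

def kl_bernoulli(x: float, p: float) -> float:
    """Cramér rate function for a Bernoulli(p) sample mean: the probability
    that the empirical negative-reward frequency over n steps exceeds x
    decays like exp(-n * I(x)) for x > p (Chernoff bound)."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p = 0.3          # assumed chance of a negative reward under normal training
n = 200          # observation window length in steps
threshold = 0.5  # flag the run if half the window's rewards are negative
bound = math.exp(-n * kl_bernoulli(threshold, p))
print(f"misjudgment bound: P(freq >= {threshold}) <= {bound:.2e}")
```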

27 pages, 1128 KiB  
Article
Adaptive Multi-Hop P2P Video Communication: A Super Node-Based Architecture for Conversation-Aware Streaming
by Jiajing Chen and Satoshi Fujita
Information 2025, 16(8), 643; https://doi.org/10.3390/info16080643 - 28 Jul 2025
Abstract
This paper proposes a multi-hop peer-to-peer (P2P) video streaming architecture designed to support dynamic, conversation-aware communication. The primary contribution is a decentralized system built on WebRTC that eliminates reliance on a central media server by employing super node aggregation. In this architecture, video streams from multiple peer nodes are dynamically routed through a group of super nodes, enabling real-time reconfiguration of the network topology in response to conversational changes. To support this dynamic behavior, the system leverages WebRTC data channels for control signaling and overlay restructuring, allowing efficient dissemination of topology updates and coordination messages among peers. A key focus of this study is the rapid and efficient reallocation of network resources immediately following conversational events, ensuring that the streaming overlay remains aligned with ongoing interaction patterns. While the automatic detection of such events is beyond the scope of this work, we assume that external triggers are available to initiate topology updates. To validate the effectiveness of the proposed system, we construct a simulation environment using Docker containers and evaluate its streaming performance under dynamic network conditions. The results demonstrate the system’s applicability to adaptive, naturalistic communication scenarios. Finally, we discuss future directions, including the seamless integration of external trigger sources and enhanced support for flexible, context-sensitive interaction frameworks.
(This article belongs to the Special Issue Second Edition of Advances in Wireless Communications Systems)
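Here is a minimal sketch of the super-node reallocation step, assuming an external trigger names the new active speaker: the speaker's stream is re-routed through the least-loaded super node with spare capacity. All class names and fields are invented for illustration; real control messages would travel over WebRTC data channels.

```python
from dataclasses import dataclass, field

@dataclass
class SuperNode:
    node_id: str
    capacity: int                    # max simultaneous upstream relays
    relays: set = field(default_factory=set)

def reassign_streams(peers, super_nodes, active_speaker):
    """On a conversational event, route the active speaker's stream through
    the least-loaded super node with spare capacity, dropping stale relays."""
    for sn in super_nodes:
        sn.relays.discard(active_speaker)            # clear old routing
    candidates = [sn for sn in super_nodes if len(sn.relays) < sn.capacity]
    chosen = min(candidates, key=lambda sn: len(sn.relays))
    chosen.relays.add(active_speaker)
    # Each listening peer pulls the speaker's stream from the chosen node.
    return {peer: chosen.node_id for peer in peers if peer != active_speaker}

sns = [SuperNode("sn-1", capacity=4), SuperNode("sn-2", capacity=4)]
routing = reassign_streams(["alice", "bob", "carol"], sns, active_speaker="alice")
print(routing)
```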

17 pages, 1486 KiB  
Article
Use of Instagram as an Educational Strategy for Learning Animal Reproduction
by Carlos C. Pérez-Marín
Vet. Sci. 2025, 12(8), 698; https://doi.org/10.3390/vetsci12080698 - 25 Jul 2025
Abstract
The present study explores the use of Instagram as an innovative strategy in the teaching–learning process in the context of animal reproduction topics. In the current era, with digital technology and social media transforming how information is accessed and consumed, it is essential for teachers to adapt and harness the potential of these tools for educational purposes. This article delves into the need for teachers to stay updated with current trends and the importance of promoting digital competences among teachers. This research aims to provide insights into the benefits of integrating social media into the educational landscape. Students from Veterinary Science degrees, Master’s degrees in Equine Sport Medicine, and vocational education and training (VET) programmes were involved in this study. An Instagram account named “UCOREPRO” was created for educational use, and it was openly available to all users. Instagram usage metrics were consistently tracked. A voluntary survey comprising 35 questions was conducted to collect feedback regarding the educational use of smartphone technology, social media habits and the UCOREPRO Instagram account. The integration of Instagram as an educational tool was positively received by veterinary students. Survey data revealed that 92.3% of respondents found the content engaging, with 79.5% reporting improved understanding of the subject and 71.8% acquiring new knowledge. Students suggested improvements such as more frequent posting and the inclusion of academic incentives. Concerns about privacy and digital distraction were present but did not outweigh the perceived benefits. The use of short videos and microlearning strategies proved particularly effective in capturing students’ attention. Overall, Instagram was found to be a promising platform to enhance motivation, engagement, and informal learning in veterinary education, provided that thoughtful integration and clear educational objectives are maintained.

15 pages, 4180 KiB  
Article
Quantitative and Correlation Analysis of Pear Leaf Dynamics Under Wind Field Disturbances
by Yunfei Wang, Xiang Dong, Weidong Jia, Mingxiong Ou, Shiqun Dai, Zhenlei Zhang and Ruohan Shi
Agriculture 2025, 15(15), 1597; https://doi.org/10.3390/agriculture15151597 - 24 Jul 2025
Abstract
In wind-assisted orchard spraying operations, the dynamic response of leaves—manifested through changes in their posture—critically influences droplet deposition on both sides of the leaf surface and the penetration depth into the canopy. These factors are pivotal in determining spray coverage and the spatial distribution of pesticide efficacy. However, current research lacks comprehensive quantification and correlation analysis of the temporal response characteristics of leaves under wind disturbances. To address this gap, a systematic analytical framework was proposed, integrating real-time leaf segmentation and tracking, geometric feature quantification, and statistical correlation modeling. High-frame-rate videos of fluttering leaves were acquired under controlled wind conditions, and background segmentation was performed using principal component analysis (PCA) followed by clustering in the reduced feature space. A fine-tuned Segment Anything Model 2 (SAM2-FT) was employed to extract dynamic leaf masks and enable frame-by-frame tracking. Based on the extracted masks, time series of leaf area and inclination angle were constructed. Subsequently, regression analysis, cross-correlation functions, and Granger causality tests were applied to investigate cooperative responses and potential driving relationships among leaves. Results showed that the SAM2-FT model significantly outperformed the YOLO series in segmentation accuracy, achieving a precision of 98.7% and recall of 97.48%. Leaf area exhibited strong linear coupling and directional causality, while angular responses showed weaker correlations but demonstrated localized synchronization. This study offers a methodological foundation for quantifying temporal dynamics in wind–leaf systems and provides theoretical insights for the adaptive control and optimization of intelligent spraying strategies.
(This article belongs to the Section Agricultural Technology)
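The lead/lag analysis between leaves can be illustrated with a normalized cross-correlation over area time series (Granger causality testing could then use, e.g., statsmodels). The series below are synthetic, with a five-frame lag planted for the demonstration.

```python
import numpy as np

def best_lag(x: np.ndarray, y: np.ndarray, max_lag: int = 30):
    """Lag (in frames) at which the normalized cross-correlation between two
    leaf-area series peaks; a positive lag means x leads y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = range(-max_lag, max_lag + 1)
    corrs = [np.corrcoef(x[max(0, -k):len(x) - max(0, k)],
                         y[max(0, k):len(y) - max(0, -k)])[0, 1] for k in lags]
    i = int(np.argmax(corrs))
    return list(lags)[i], corrs[i]

# Simulated high-frame-rate area signals: leaf B follows leaf A by 5 frames.
rng = np.random.default_rng(2)
t = np.arange(600)
leaf_a = np.sin(2 * np.pi * t / 60) + 0.1 * rng.normal(size=600)
leaf_b = np.roll(leaf_a, 5) + 0.1 * rng.normal(size=600)
print(best_lag(leaf_a, leaf_b))   # lag near +5 with correlation near 1
```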

40 pages, 1540 KiB  
Review
A Survey on Video Big Data Analytics: Architecture, Technologies, and Open Research Challenges
by Thi-Thu-Trang Do, Quyet-Thang Huynh, Kyungbaek Kim and Van-Quyet Nguyen
Appl. Sci. 2025, 15(14), 8089; https://doi.org/10.3390/app15148089 - 21 Jul 2025
Abstract
The exponential growth of video data across domains such as surveillance, transportation, and healthcare has raised critical challenges in scalability, real-time processing, and privacy preservation. While existing studies have addressed individual aspects of Video Big Data Analytics (VBDA), an integrated, up-to-date perspective remains limited. This paper presents a comprehensive survey of system architectures and enabling technologies in VBDA. It categorizes system architectures into four primary types: centralized, cloud-based, edge computing, and hybrid cloud–edge. It also analyzes key enabling technologies, including real-time streaming, scalable distributed processing, intelligent AI models, and advanced storage for managing large-scale multimodal video data. In addition, the study provides a functional taxonomy of core video processing tasks, including object detection, anomaly recognition, and semantic retrieval, and maps these tasks to real-world applications. Based on the survey findings, the paper proposes ViMindXAI, a hybrid AI-driven platform that combines edge and cloud orchestration, adaptive storage, and privacy-aware learning to support scalable and trustworthy video analytics. The analysis highlights emerging trends such as the shift toward hybrid cloud–edge architectures, the growing importance of explainable AI and federated learning, and the urgent need for secure and efficient video data management, pointing to key directions for next-generation VBDA platforms that enhance real-time, data-driven decision-making in domains such as public safety, transportation, and healthcare. Such platforms facilitate timely insights, rapid response, and regulatory alignment through scalable and explainable analytics. This work provides a robust conceptual foundation for future research on adaptive and efficient decision-support systems in video-intensive environments.

30 pages, 2282 KiB  
Article
User Experience of Navigating Work Zones with Automated Vehicles: Insights from YouTube on Challenges and Strengths
by Melika Ansarinejad, Kian Ansarinejad, Pan Lu and Ying Huang
Smart Cities 2025, 8(4), 120; https://doi.org/10.3390/smartcities8040120 - 19 Jul 2025
Abstract
Understanding automated vehicle (AV) behavior in complex road environments and user attitudes in such contexts is critical for their safe and effective integration into smart cities. Despite growing deployment, limited public data exist on AV performance in construction zones: highly dynamic settings marked by irregular lane markings, shifting detours, and unpredictable human presence. This study investigates AV behavior in these conditions through qualitative, video-based analysis of user-documented experiences on YouTube, focusing on Tesla’s supervised Full Self-Driving (FSD) and Waymo systems. Spoken narration, captions, and subtitles were examined to evaluate AV perception, decision-making, control, and interaction with humans. Findings reveal that while AVs excel in structured tasks such as obstacle detection, lane tracking, and cautious speed control, they face challenges in interpreting temporary infrastructure, responding to unpredictable human actions, and navigating low-visibility environments. These limitations not only impact performance but also influence user trust and acceptance. The study underscores the need for continued technological refinement, improved infrastructure design, and user-informed deployment strategies. By addressing current shortcomings, this research offers critical insights into AV readiness for real-world conditions and contributes to safer, more adaptive urban mobility systems.

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
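An illustrative PyTorch skeleton of the per-modality self-attention plus late-fusion design follows: one attention block per stream, mean-pooled and concatenated before a five-trait head. Feature dimensions, head counts, and sequence lengths are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Self-attention over one modality's time series, summarized by mean pooling."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, time, dim)
        h, _ = self.attn(x, x, x)
        return self.norm(x + h).mean(dim=1)      # (batch, dim)

class LateFusionBig5(nn.Module):
    """Encode audio, skeleton, and text streams separately, concatenate the
    summaries, and classify the five personality traits."""
    def __init__(self, dims=(64, 64, 64)):
        super().__init__()
        self.encoders = nn.ModuleList(ModalityEncoder(d) for d in dims)
        self.head = nn.Linear(sum(dims), 5)

    def forward(self, audio, skeleton, text):
        parts = [enc(x) for enc, x in zip(self.encoders, (audio, skeleton, text))]
        return self.head(torch.cat(parts, dim=-1))   # logits per trait

model = LateFusionBig5()
logits = model(torch.randn(2, 50, 64), torch.randn(2, 30, 64), torch.randn(2, 20, 64))
print(logits.shape)   # torch.Size([2, 5])
```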

24 pages, 19550 KiB  
Article
TMTS: A Physics-Based Turbulence Mitigation Network Guided by Turbulence Signatures for Satellite Video
by Jie Yin, Tao Sun, Xiao Zhang, Guorong Zhang, Xue Wan and Jianjun He
Remote Sens. 2025, 17(14), 2422; https://doi.org/10.3390/rs17142422 - 12 Jul 2025
Abstract
Atmospheric turbulence severely degrades high-resolution satellite videos through spatiotemporally coupled distortions, including temporal jitter, spatial-variant blur, deformation, and scintillation, thereby constraining downstream analytical capabilities. Restoring turbulence-corrupted videos poses a challenging ill-posed inverse problem due to the inherent randomness of turbulent fluctuations. While existing turbulence mitigation methods for long-range imaging demonstrate partial success, they exhibit limited generalizability and interpretability in large-scale satellite scenarios. Inspired by refractive-index structure constant (Cn2) estimation from degraded sequences, we propose a physics-informed turbulence signature (TS) prior that explicitly captures spatiotemporal distortion patterns to enhance model transparency. Integrating this prior into a lucky imaging framework, we develop a Physics-Based Turbulence Mitigation Network guided by Turbulence Signature (TMTS) to disentangle atmospheric disturbances from satellite videos. The framework employs deformable attention modules guided by turbulence signatures to correct geometric distortions, iterative gated mechanisms for temporal alignment stability, and adaptive multi-frame aggregation to address spatially varying blur. Comprehensive experiments on synthetic and real-world turbulence-degraded satellite videos demonstrate TMTS’s superiority, achieving 0.27 dB PSNR and 0.0015 SSIM improvements over the DATUM baseline while maintaining practical computational efficiency. By bridging turbulence physics with deep learning, our approach provides both performance enhancements and interpretable restoration mechanisms, offering a viable solution for operational satellite video processing under atmospheric disturbances.
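The lucky-imaging baseline that the framework builds on can be sketched in a few lines: rank frames by a sharpness proxy (variance of the Laplacian) and average the sharpest fraction. The synthetic frame stack and the keep ratio below are assumptions; TMTS replaces this naive aggregation with turbulence-signature-guided alignment.

```python
import numpy as np

def box_blur(f: np.ndarray) -> np.ndarray:
    """Cheap 5-point blur used only to synthesize turbulence-degraded frames."""
    return (f + np.roll(f, 1, 0) + np.roll(f, -1, 0)
            + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 5.0

def laplacian_var(frame: np.ndarray) -> float:
    """Sharpness proxy: variance of a discrete Laplacian response."""
    lap = (-4 * frame
           + np.roll(frame, 1, 0) + np.roll(frame, -1, 0)
           + np.roll(frame, 1, 1) + np.roll(frame, -1, 1))
    return float(lap.var())

def lucky_average(frames: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """Rank frames by sharpness and average the best fraction: the baseline
    lucky-imaging aggregation that TMTS refines with learned alignment."""
    scores = np.array([laplacian_var(f) for f in frames])
    k = max(1, int(len(frames) * keep))
    best = np.argsort(scores)[-k:]           # indices of the sharpest frames
    return frames[best].mean(axis=0)

# Synthetic stack: every fourth frame is sharp, the rest are blurred.
rng = np.random.default_rng(3)
sharp = rng.random((64, 64))
stack = np.stack([sharp if i % 4 == 0 else box_blur(box_blur(sharp))
                  for i in range(40)])
print(lucky_average(stack).shape)   # (64, 64), built mostly from sharp frames
```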