Search Results (851)

Search Parameters:
Keywords = face videos

13 pages, 3044 KiB  
Article
Improving Event Data in Football Matches: A Case Study Model for Synchronizing Passing Events with Positional Data
by Alberto Cortez, Bruno Gonçalves, João Brito and Hugo Folgado
Appl. Sci. 2025, 15(15), 8694; https://doi.org/10.3390/app15158694 - 6 Aug 2025
Abstract
In football, accurately pinpointing key events like passes is vital for analyzing player and team performance. Despite continuous technological advancements, existing tracking systems still face challenges in accurately synchronizing event and positional data. This case study proposes a new method to synchronize events and positional data collected during football matches. Three datasets were used to perform this study: a dataset created by applying a custom algorithm that synchronizes positional and event data, referred to as the optimized synchronization dataset (OSD); a simple temporal alignment between positional and event data, referred to as the raw synchronization dataset (RSD); and manual notational data (MND) from the match video footage, considered the ground truth observations. The timestamp of the pass in both synchronized datasets was compared to the ground truth observations (MND). Spatial differences in the OSD were also compared to the RSD data and to the original data from the provider. Root mean square error (RMSE) and mean absolute error (MAE) were used to assess the accuracy of both procedures. More accurate results were observed for the optimized dataset, with RMSE values of RSD = 75.16 ms (milliseconds) and OSD = 72.7 ms, and MAE values of RSD = 60.50 ms and OSD = 59.73 ms. Spatial accuracy also improved, with the OSD showing reduced deviation from the RSD compared to the original event data. The mean positional deviation was reduced from 1.59 ± 0.82 m in the original event data to 0.41 ± 0.75 m in the RSD. In conclusion, the model offers a more accurate method for synchronizing independent event and positional datasets. This is particularly beneficial for applications where precise timing and spatial location of actions are critical. In contrast to previous synchronization methods, this approach simplifies the process by using an automated technique based on patterns of ball velocity. This streamlines synchronization across datasets, reduces the need for manual intervention, and makes the method more practical for routine use in applied settings. Full article
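
The accuracy comparison in this entry reduces to computing RMSE and MAE over paired pass timestamps. A minimal sketch of that evaluation step, using hypothetical timestamp arrays in place of the MND and OSD data:

```python
import numpy as np

# Hypothetical pass timestamps in milliseconds: ground truth (MND) vs. a synchronized dataset (OSD)
mnd_ts = np.array([1200.0, 4850.0, 9300.0, 15210.0])   # manual notational data
osd_ts = np.array([1260.0, 4790.0, 9365.0, 15150.0])   # optimized synchronization dataset

errors = osd_ts - mnd_ts
rmse = np.sqrt(np.mean(errors ** 2))   # root mean square error
mae = np.mean(np.abs(errors))          # mean absolute error
print(f"RMSE = {rmse:.2f} ms, MAE = {mae:.2f} ms")
```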

18 pages, 797 KiB  
Article
On Becoming a Senior Staff Nurse in Taiwan: A Narrative Study
by Yu-Jen Hsieh and Yu-Tzu Dai
Healthcare 2025, 13(15), 1896; https://doi.org/10.3390/healthcare13151896 - 4 Aug 2025
Viewed by 213
Abstract
Background/Objectives: Senior nurses in Taiwan shoulder layered responsibilities shaped by professional roles, gendered expectations, and family duty. Although Taiwan faces a persistent shortage of experienced clinical nurses, limited research has explored how long-serving nurses sustain identity and commitment across decades of caregiving. This study examines how senior staff nurses understand their journeys of becoming—and remaining—nurses within a culturally and emotionally complex landscape. Methods: Interviews were conducted between May 2019 and September 2023 in locations chosen by participants, with most sessions face-to-face and others undertaken via video conferencing during COVID-19. This narrative inquiry involved in-depth, multi-session interviews with five female senior staff nurses born in the 1970s to early 1980s. Each participant reflected on her life and career, supported by co-constructed “nursing life lines.” Thematic narrative analysis was conducted using McCormack’s five-lens framework and Riessman’s model, with ethical rigor ensured through reflexive journaling and participant validation. Results: Three overarching themes emerged: (1) inner strength and endurance, highlighting silent resilience and the ethical weight of caregiving; (2) support and responsibility in relationships, revealing the influence of family, faith, and relational duty; and (3) role navigation and professional identity, showing how nurses revisit meaning, self-understanding, and tensions across time. Participants described emotionally powerful moments, identity re-connection, and cultural values that shaped their paths. Conclusions: These narratives offer a relational and culturally embedded understanding of what it means to sustain a career in nursing. Narrative inquiry created space for reflection, meaning-making, and voice in a system where such voices are often unheard. Identity was not static—it was lived, reshaped, and held in story. Full article

18 pages, 8744 KiB  
Article
A User-Centered Teleoperation GUI for Automated Vehicles: Identifying and Evaluating Information Requirements for Remote Driving and Assistance
by Maria-Magdalena Wolf, Henrik Schmidt, Michael Christl, Jana Fank and Frank Diermeyer
Multimodal Technol. Interact. 2025, 9(8), 78; https://doi.org/10.3390/mti9080078 - 31 Jul 2025
Viewed by 207
Abstract
Teleoperation emerged as a promising fallback for situations beyond the capabilities of automated vehicles. Nevertheless, teleoperation still faces challenges, such as reduced situational awareness. Since situational awareness is primarily built through the remote operator’s visual perception, the graphical user interface (GUI) design is critical. In addition to video feed, supplemental informational elements are crucial—not only for the predominantly studied remote driving, but also for emerging desk-based remote assistance concepts. This work develops a GUI for different teleoperation concepts by identifying key informational elements during the teleoperation process through expert interviews (N = 9). Following this, a static and dynamic GUI prototype was developed and evaluated in a click dummy study (N = 36). Thereby, the dynamic GUI adapts the number of displayed elements according to the teleoperation phase. Results show that both GUIs achieve good system usability scale (SUS) ratings, with the dynamic GUI significantly outperforming the static version in both usability and task completion time. However, the results might be attributable to a learning effect due to the lack of randomization. The user experience questionnaire (UEQ) score shows potential for improvement. To enhance the user experience, the GUI should be evaluated in a follow-up study that includes interaction with a real vehicle. Full article
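
The usability comparison above is reported as SUS ratings; the standard SUS scoring rule (general knowledge about the scale, not code from the paper) can be sketched as follows for one hypothetical participant:

```python
def sus_score(responses):
    """System Usability Scale score from ten 1-5 Likert responses."""
    assert len(responses) == 10
    odd = sum(r - 1 for r in responses[0::2])   # items 1,3,5,7,9 contribute (response - 1)
    even = sum(5 - r for r in responses[1::2])  # items 2,4,6,8,10 contribute (5 - response)
    return (odd + even) * 2.5                   # scale the 0-40 sum to 0-100

# Hypothetical single-participant rating of the dynamic GUI
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```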

30 pages, 37977 KiB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Viewed by 266
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics. Full article
(This article belongs to the Section Sensing and Imaging)
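
A hedged sketch of the kind of temporal triplet objective the TVRO module is described as using: a cosine-similarity margin between a text query embedding, text-relevant (positive) frame embeddings, and text-irrelevant (negative) ones. The embedding size, margin value, and batch shapes are assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def temporal_triplet_loss(text_emb, pos_frames, neg_frames, margin=0.2):
    """Pull text-relevant frame embeddings toward the query and push
    text-irrelevant frames away by at least `margin` in cosine similarity."""
    pos_sim = F.cosine_similarity(text_emb, pos_frames)  # (N,)
    neg_sim = F.cosine_similarity(text_emb, neg_frames)  # (N,)
    return F.relu(neg_sim - pos_sim + margin).mean()

# Hypothetical CLIP-sized embeddings (batch of 8, dimension 512)
text = torch.randn(8, 512)
loss = temporal_triplet_loss(text, torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```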

33 pages, 11684 KiB  
Article
Face Spoofing Detection with Stacking Ensembles in Work Time Registration System
by Rafał Klinowski and Mirosław Kordos
Appl. Sci. 2025, 15(15), 8402; https://doi.org/10.3390/app15158402 - 29 Jul 2025
Viewed by 139
Abstract
This paper introduces a passive face-authenticity detection system, designed for integration into an employee work time registration platform. The system is implemented as a stacking ensemble of multiple models. Each model independently assesses whether a camera is capturing a live human face or a spoofed representation, such as a photo or video. The ensemble comprises a convolutional neural network (CNN), a smartphone bezel-detection algorithm to identify faces displayed on electronic devices, a face context analysis module, and additional CNNs for image processing. The outputs of these models are aggregated by a neural network that delivers the final classification decision. We examined various combinations of models within the ensemble and compared the performance of our approach against existing methods through experimental evaluation. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
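
A minimal sketch of the stacking idea described above: per-model liveness scores are fed to a small meta-network that issues the final live/spoof decision. The base-model columns, score values, and meta-network size are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical per-frame liveness scores from the base models:
# columns = [CNN, bezel detector, face-context module, extra image-processing CNN]
base_scores = np.array([
    [0.92, 0.10, 0.88, 0.90],   # live face
    [0.35, 0.95, 0.20, 0.30],   # phone-screen replay
    [0.15, 0.05, 0.25, 0.18],   # printed photo
    [0.88, 0.08, 0.91, 0.85],
    [0.40, 0.90, 0.33, 0.28],
    [0.10, 0.02, 0.30, 0.22],
])
labels = np.array([1, 0, 0, 1, 0, 0])  # 1 = live, 0 = spoof

# Meta-network that aggregates the base-model outputs into the final decision
meta = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
meta.fit(base_scores, labels)
print(meta.predict([[0.90, 0.12, 0.85, 0.87]]))  # predicted class for a new frame's scores
```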

14 pages, 691 KiB  
Article
Three-Dimensional-Printed Models: A Novel Approach to Ultrasound Education of the Placental Cord Insertion Site
by Samantha Ward, Sharon Maresse and Zhonghua Sun
Appl. Sci. 2025, 15(15), 8221; https://doi.org/10.3390/app15158221 - 24 Jul 2025
Viewed by 291
Abstract
Assessment of the placental cord insertion (PCI) is a vital component of antenatal ultrasound examinations. PCI can be complex, particularly in cases of abnormal PCI, and requires proficient sonographer spatial perception. The current literature describes the increasing potential of three-dimensional (3D) modelling to enhance spatial awareness and understanding of complex anatomical structures. This study aimed to evaluate sonographers’ confidence in ultrasound assessment of the PCI and the potential benefit of novel 3D-printed models (3DPMs) of the PCI in ultrasound education. Sonographers employed at a large private medical imaging practice in Western Australia were invited to participate in a face-to-face presentation of two-dimensional (2D) ultrasound images, ultrasound videos, and 3DPMs of normal cord insertion (NCI), marginal cord insertion (MCI), and velamentous cord insertion (VCI). Our objective was to determine the benefit of 3DPMs in improving sonographers’ confidence and ability to spatially visualise the PCI. Thirty-three participants completed questionnaires designed to compare their confidence in assessing the PCI and their ability to spatially visualise the anatomical relationship between the placenta and PCI, before and after the presentation. There was a significant association between a participant’s years of experience and their confidence levels and spatial awareness of the PCI prior to the demonstration. The results showed that the 3DPMs increased participant confidence and their spatial awareness of the PCI, with no significant association with years of experience. Additionally, participating sonographers were asked to rate the 3DPMs as an educational device. The 3DPMs were ranked as being a more useful educational tool for spatially visualising the NCI, MCI, and VCI than 2D ultrasound images and videos. Most participants responded favourably when asked whether the 3DPMs would be useful in ultrasound education, with 75.8%, 84.8%, and 97% indicating the models of NCI, MCI, and VCI, respectively, would be extremely useful. Our study has demonstrated a potential role for 3DPMs of the PCI in ultrasound education, supplementing traditional 2D educational resources. Full article

30 pages, 2282 KiB  
Article
User Experience of Navigating Work Zones with Automated Vehicles: Insights from YouTube on Challenges and Strengths
by Melika Ansarinejad, Kian Ansarinejad, Pan Lu and Ying Huang
Smart Cities 2025, 8(4), 120; https://doi.org/10.3390/smartcities8040120 - 19 Jul 2025
Viewed by 426
Abstract
Understanding automated vehicle (AV) behavior in complex road environments and user attitudes in such contexts is critical for their safe and effective integration into smart cities. Despite growing deployment, limited public data exist on AV performance in construction zones: highly dynamic settings marked by irregular lane markings, shifting detours, and unpredictable human presence. This study investigates AV behavior in these conditions through qualitative, video-based analysis of user-documented experiences on YouTube, focusing on Tesla’s supervised Full Self-Driving (FSD) and Waymo systems. Spoken narration, captions, and subtitles were examined to evaluate AV perception, decision-making, control, and interaction with humans. Findings reveal that while AVs excel in structured tasks such as obstacle detection, lane tracking, and cautious speed control, they face challenges in interpreting temporary infrastructure, responding to unpredictable human actions, and navigating low-visibility environments. These limitations not only impact performance but also influence user trust and acceptance. The study underscores the need for continued technological refinement, improved infrastructure design, and user-informed deployment strategies. By addressing current shortcomings, this research offers critical insights into AV readiness for real-world conditions and contributes to safer, more adaptive urban mobility systems. Full article

18 pages, 7391 KiB  
Article
Reliable QoE Prediction in IMVCAs Using an LMM-Based Agent
by Michael Sidorov, Tamir Berger, Jonathan Sterenson, Raz Birman and Ofer Hadar
Sensors 2025, 25(14), 4450; https://doi.org/10.3390/s25144450 - 17 Jul 2025
Viewed by 291
Abstract
Face-to-face interaction is one of the most natural forms of human communication. Unsurprisingly, Video Conferencing (VC) Applications have experienced a significant rise in demand over the past decade. With the widespread availability of cellular devices equipped with high-resolution cameras, Instant Messaging Video Call Applications (IMVCAs) now constitute a substantial portion of VC communications. Given the multitude of IMVCA options, maintaining a high Quality of Experience (QoE) is critical. While content providers can measure QoE directly through end-to-end connections, Internet Service Providers (ISPs) must infer QoE indirectly from network traffic—a non-trivial task, especially when most traffic is encrypted. In this paper, we analyze a large dataset collected from WhatsApp IMVCA, comprising over 25,000 s of VC sessions. We apply four Machine Learning (ML) algorithms and a Large Multimodal Model (LMM)-based agent, achieving mean errors of 4.61%, 5.36%, and 13.24% for three popular QoE metrics: BRISQUE, PIQE, and FPS, respectively. Full article
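
A hedged illustration of the ISP-side task described above: regressing a QoE metric (here FPS) from encrypted-traffic features with a standard ML model. The feature set, values, and model choice are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical per-second traffic features: [throughput_kbps, mean_packet_size_bytes, packets_per_s]
X = np.array([[850, 1050, 95], [420, 900, 60], [1200, 1150, 130],
              [300, 780, 45], [980, 1100, 110], [650, 1000, 80]])
y = np.array([30.0, 18.0, 30.0, 12.0, 29.0, 24.0])  # FPS labels measured on the client side

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict(X)
print(f"MAPE: {mean_absolute_percentage_error(y, pred) * 100:.2f}%")
```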

21 pages, 7297 KiB  
Article
FGS-YOLOv8s-seg: A Lightweight and Efficient Instance Segmentation Model for Detecting Tomato Maturity Levels in Greenhouse Environments
by Dongfang Song, Ping Liu, Yanjun Zhu, Tianyuan Li and Kun Zhang
Agronomy 2025, 15(7), 1687; https://doi.org/10.3390/agronomy15071687 - 12 Jul 2025
Viewed by 390
Abstract
In a greenhouse environment, the application of artificial intelligence technology for selective tomato harvesting still faces numerous challenges, including varying lighting, background interference, and indistinct fruit surface features. This study proposes an improved instance segmentation model called FGS-YOLOv8s-seg, which achieves accurate detection and maturity grading of tomatoes in greenhouse environments. The model incorporates a novel SegNext_Attention mechanism at the end of the backbone, while simultaneously replacing Bottleneck structures in the neck layer with FasterNet blocks and integrating Gaussian Context Transformer modules to form a lightweight C2f_FasterNet_GCT structure. Experiments show that this model performs significantly better than mainstream segmentation models in core indicators such as precision (86.9%), recall (76.3%), average precision (mAP@0.5 84.8%), F1-score (81.3%), and GFLOPs (35.6 M). Compared with the YOLOv8s-seg baseline model, these metrics show improvements of 2.6%, 3.8%, 5.1%, 3.3%, and 6.8 M, respectively. Ablation experiments demonstrate that the improved architecture contributes significantly to performance gains, with combined improvements yielding optimal results. The analysis of detection performance videos under different cultivation patterns demonstrates the generalizability of the improved model in complex environments, achieving an optimal balance between detection accuracy (86.9%) and inference speed (53.2 fps). This study provides a reliable technical solution for the selective harvesting of greenhouse tomatoes. Full article
(This article belongs to the Section Precision and Digital Agriculture)
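
The paper's FGS-YOLOv8s-seg weights are not assumed to be available; the sketch below only shows baseline YOLOv8s-seg instance-segmentation inference with the ultralytics API, using a placeholder image path, to indicate where such a model plugs in.

```python
from ultralytics import YOLO

# Baseline YOLOv8s-seg inference; "greenhouse_tomatoes.jpg" is a placeholder image path,
# and the predicted class names depend on training data this sketch does not reproduce.
model = YOLO("yolov8s-seg.pt")
results = model("greenhouse_tomatoes.jpg", conf=0.5)

for r in results:
    for box, cls, score in zip(r.boxes.xyxy, r.boxes.cls, r.boxes.conf):
        print(f"class={model.names[int(cls)]} conf={float(score):.2f} bbox={box.tolist()}")
    if r.masks is not None:
        print(f"{len(r.masks)} instance masks predicted")
```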

12 pages, 4368 KiB  
Article
A Dual-Branch Fusion Model for Deepfake Detection Using Video Frames and Microexpression Features
by Georgios Petmezas, Vazgken Vanian, Manuel Pastor Rufete, Eleana E. I. Almaloglou and Dimitris Zarpalas
J. Imaging 2025, 11(7), 231; https://doi.org/10.3390/jimaging11070231 - 11 Jul 2025
Viewed by 471
Abstract
Deepfake detection has become a critical issue due to the rise of synthetic media and its potential for misuse. In this paper, we propose a novel approach to deepfake detection by combining video frame analysis with facial microexpression features. The dual-branch fusion model utilizes a 3D ResNet18 for spatiotemporal feature extraction and a transformer model to capture microexpression patterns, which are difficult to replicate in manipulated content. We evaluate the model on the widely used FaceForensics++ (FF++) dataset and demonstrate that our approach outperforms existing state-of-the-art methods, achieving 99.81% accuracy and a perfect ROC-AUC score of 100%. The proposed method highlights the importance of integrating diverse data sources for deepfake detection, addressing some of the current limitations of existing systems. Full article
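
A rough sketch of a dual-branch detector along the lines described: a torchvision 3D ResNet18 over frame clips plus a transformer encoder over per-frame microexpression features, fused by concatenation. Feature dimensions, the fusion strategy, and the classifier head are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class DualBranchDetector(nn.Module):
    """Sketch: 3D ResNet18 over frame clips + transformer over microexpression features."""
    def __init__(self, micro_dim=68, num_classes=2):
        super().__init__()
        self.video_branch = r3d_18(weights=None)
        self.video_branch.fc = nn.Identity()              # expose the 512-d clip embedding
        layer = nn.TransformerEncoderLayer(d_model=micro_dim, nhead=4, batch_first=True)
        self.micro_branch = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(512 + micro_dim, num_classes)

    def forward(self, clips, micro_feats):
        v = self.video_branch(clips)                      # (B, 512)
        m = self.micro_branch(micro_feats).mean(dim=1)    # (B, micro_dim)
        return self.classifier(torch.cat([v, m], dim=1))  # real/fake logits

model = DualBranchDetector()
logits = model(torch.randn(2, 3, 16, 112, 112), torch.randn(2, 16, 68))
print(logits.shape)  # torch.Size([2, 2])
```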

29 pages, 1184 KiB  
Article
Perception-Based H.264/AVC Video Coding for Resource-Constrained and Low-Bit-Rate Applications
by Lih-Jen Kau, Chin-Kun Tseng and Ming-Xian Lee
Sensors 2025, 25(14), 4259; https://doi.org/10.3390/s25144259 - 8 Jul 2025
Viewed by 397
Abstract
With the rapid expansion of Internet of Things (IoT) and edge computing applications, efficient video transmission under constrained bandwidth and limited computational resources has become increasingly critical. In such environments, perception-based video coding plays a vital role in maintaining acceptable visual quality while minimizing bit rate and processing overhead. Although newer video coding standards have emerged, H.264/AVC remains the dominant compression format in many deployed systems, particularly in commercial CCTV surveillance, due to its compatibility, stability, and widespread hardware support. Motivated by these practical demands, this paper proposes a perception-based video coding algorithm specifically tailored for low-bit-rate H.264/AVC applications. By targeting regions most relevant to the human visual system, the proposed method enhances perceptual quality while optimizing resource usage, making it particularly suitable for embedded systems and bandwidth-limited communication channels. In general, regions containing human faces and those exhibiting significant motion are of primary importance for human perception and should receive higher bit allocation to preserve visual quality. To this end, macroblocks (MBs) containing human faces are detected using the Viola–Jones algorithm, which leverages AdaBoost for feature selection and a cascade of classifiers for fast and accurate detection. This approach is favored over deep learning-based models due to its low computational complexity and real-time capability, making it ideal for latency- and resource-constrained IoT and edge environments. Motion-intensive macroblocks were identified by comparing their motion intensity against the average motion level of preceding reference frames. Based on these criteria, a dynamic quantization parameter (QP) adjustment strategy was applied to assign finer quantization to perceptually important regions of interest (ROIs) in low-bit-rate scenarios. The experimental results show that the proposed method achieves superior subjective visual quality and objective Peak Signal-to-Noise Ratio (PSNR) compared to the standard JM software and other state-of-the-art algorithms under the same bit rate constraints. Moreover, the approach introduces only a marginal increase in computational complexity, highlighting its efficiency. Overall, the proposed algorithm offers an effective balance between visual quality and computational performance, making it well suited for video transmission in bandwidth-constrained, resource-limited IoT and edge computing environments. Full article
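
A minimal sketch of the face-ROI step: OpenCV's Haar-cascade (Viola-Jones) detector marks the macroblocks covering detected faces so they can be assigned a finer quantization parameter. The QP values and frame path are hypothetical, and the actual QP adjustment happens inside the H.264 encoder rather than in this script.

```python
import cv2

# Haar-cascade (Viola-Jones) face detector; the cascade file ships with OpenCV
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")                     # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

MB = 16                                             # H.264 macroblock size in pixels
base_qp, roi_qp_offset = 36, -6                     # hypothetical low-bit-rate QP settings
roi_mbs = set()
for (x, y, w, h) in faces:
    for my in range(y // MB, (y + h) // MB + 1):
        for mx in range(x // MB, (x + w) // MB + 1):
            roi_mbs.add((mx, my))                   # macroblocks covering a face get finer quantization

print(f"{len(faces)} face(s); {len(roi_mbs)} ROI macroblocks would be encoded at QP {base_qp + roi_qp_offset}")
```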

17 pages, 7786 KiB  
Article
Video Coding Based on Ladder Subband Recovery and ResGroup Module
by Libo Wei, Aolin Zhang, Lei Liu, Jun Wang and Shuai Wang
Entropy 2025, 27(7), 734; https://doi.org/10.3390/e27070734 - 8 Jul 2025
Viewed by 341
Abstract
With the rapid development of video encoding technology in the field of computer vision, the demand for tasks such as video frame reconstruction, denoising, and super-resolution has been continuously increasing. However, traditional video encoding methods typically focus on extracting spatial or temporal domain information, often facing challenges of insufficient accuracy and information loss when reconstructing high-frequency details, edges, and textures of images. To address this issue, this paper proposes an innovative LadderConv framework, which combines discrete wavelet transform (DWT) with spatial and channel attention mechanisms. By progressively recovering wavelet subbands, it effectively enhances the video frame encoding quality. Specifically, the LadderConv framework adopts a stepwise recovery approach for wavelet subbands, first processing high-frequency detail subbands with relatively less information, then enhancing the interaction between these subbands, and ultimately synthesizing a high-quality reconstructed image through inverse wavelet transform. Moreover, the framework introduces spatial and channel attention mechanisms, which further strengthen the focus on key regions and channel features, leading to notable improvements in detail restoration and image reconstruction accuracy. To optimize the performance of the LadderConv framework, particularly in detail recovery and high-frequency information extraction tasks, this paper designs an innovative ResGroup module. By using multi-layer convolution operations along with feature map compression and recovery, the ResGroup module enhances the network’s expressive capability and effectively reduces computational complexity. The ResGroup module captures multi-level features from low level to high level and retains rich feature information through residual connections, thus improving the overall reconstruction performance of the model. In experiments, the combination of the LadderConv framework and the ResGroup module demonstrates superior performance in video frame reconstruction tasks, particularly in recovering high-frequency information, image clarity, and detail representation. Full article
(This article belongs to the Special Issue Rethinking Representation Learning in the Age of Large Models)
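
A small sketch of the wavelet subband split/recover step that the LadderConv framework builds on, using PyWavelets on a random grayscale frame; the learned refinement of the detail subbands is only indicated as a comment.

```python
import numpy as np
import pywt

# Hypothetical grayscale frame; LadderConv operates on learned features, this only
# illustrates the wavelet subband decomposition and recovery it is built around.
frame = np.random.rand(128, 128).astype(np.float32)

cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")     # low-frequency + three high-frequency detail subbands
# ... a ladder-style model would refine cH, cV, cD first, then fuse them with cA ...
recon = pywt.idwt2((cA, (cH, cV, cD)), "haar")  # inverse transform back to a frame

print(cA.shape, recon.shape, np.allclose(frame, recon, atol=1e-5))
```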

21 pages, 2816 KiB  
Article
AutoStageMix: Fully Automated Stage Cross-Editing System Utilizing Facial Features
by Minjun Oh, Howon Jang and Daeho Lee
Appl. Sci. 2025, 15(13), 7613; https://doi.org/10.3390/app15137613 - 7 Jul 2025
Viewed by 315
Abstract
StageMix is a video compilation of multiple stage performances of the same song, edited seamlessly together using appropriate editing points. However, generating a StageMix requires specialized editing techniques and is a considerably time-consuming process. To address this challenge, we introduce AutoStageMix, an automated StageMix generation system designed to perform all processes automatically. The system is structured into five principal stages: preprocessing, feature extraction, transition point identification, editing path determination, and StageMix generation. The initial stage of the process involves audio analysis to synchronize the sequences across all input videos, followed by frame extraction. After that, the facial features are extracted from each video frame. Next, transition points are identified, which form the basis for face-based transitions, inter-stage cuts, and intra-stage cuts. Subsequently, a cost function is defined to facilitate the creation of cross-edited sequences. The optimal editing path is computed using Dijkstra’s algorithm to minimize the total cost of editing. Finally, the StageMix is generated by applying appropriate editing effects tailored to each transition type, aiming to maximize visual appeal. Experimental results suggest that our method generally achieves lower NME scores than existing StageMix generation approaches across multiple test songs. In a user study with 21 participants, AutoStageMix achieved viewer satisfaction comparable to that of professionally edited StageMixes, with no statistically significant difference between the two. AutoStageMix enables users to produce StageMixes effortlessly and efficiently by eliminating the need for manual editing. Full article
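
A minimal sketch of the editing-path step: Dijkstra's algorithm over a hypothetical transition graph whose nodes are (stage video, segment) pairs and whose edge costs stand in for the paper's cost function.

```python
import heapq

def dijkstra(graph, start, goal):
    """Minimal Dijkstra over a transition graph: graph[node] -> list of (neighbor, cost)."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[goal]

# Hypothetical nodes: (stage_video_id, segment_index); edge costs mix face-similarity and cut penalties
graph = {
    ("A", 0): [(("A", 1), 0.2), (("B", 1), 0.5)],
    ("A", 1): [(("B", 2), 0.3), (("A", 2), 0.4)],
    ("B", 1): [(("B", 2), 0.1)],
    ("A", 2): [(("B", 3), 0.2)],
    ("B", 2): [(("B", 3), 0.25)],
}
print(dijkstra(graph, ("A", 0), ("B", 3)))  # lowest-cost editing path and its total cost
```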

25 pages, 775 KiB  
Article
The Effects of Loving-Kindness Meditation Guided by Short Video Apps on Policemen’s Mindfulness, Public Service Motivation, Conflict Resolution Skills, and Communication Skills
by Chao Liu, Li-Jen Lin, Kang-Jie Zhang and Wen-Ko Chiou
Behav. Sci. 2025, 15(7), 909; https://doi.org/10.3390/bs15070909 - 4 Jul 2025
Cited by 1 | Viewed by 517
Abstract
Police officers work in high-stress environments that demand emotional resilience, interpersonal skills, and effective communication. Occupational stress can negatively impact their motivation, conflict resolution abilities, and professional effectiveness. Loving-Kindness Meditation (LKM), a mindfulness-based intervention focused on cultivating compassion and empathy, has shown promise in enhancing prosocial attitudes and emotional regulation. With the rise of short video platforms, digital interventions like video-guided LKM may offer accessible mental health support for law enforcement. This study examines the effects of short video app-guided LKM on police officers’ mindfulness, public service motivation (PSM), conflict resolution skills (CRSs), and communication skills (CSSs). It aims to determine whether LKM can enhance these psychological and professional competencies. A randomized controlled trial (RCT) was conducted with 110 active-duty police officers from a metropolitan police department in China, with 92 completing the study. Participants were randomly assigned to either the LKM group (n = 46) or the waitlist control group (n = 46). The intervention consisted of a 6-week short video app-guided LKM program with daily 10 min meditation sessions. Pre- and post-intervention assessments were conducted using several validated scales: the Mindfulness Attention Awareness Scale (MAAS), the Public Service Motivation Scale (PSM), the Conflict Resolution Styles Inventory (CRSI), and the Communication Competence Scale (CCS). A 2 (Group: LKM vs. Control) × 2 (Time: Pre vs. Post) mixed-design MANOVA was conducted to analyze the effects. Statistical analyses revealed significant group-by-time interaction effects for PSM (F(4,177) = 21.793, p < 0.001, η2 = 0.108), CRS (F(4,177) = 20.920, p < 0.001, η2 = 0.104), and CSS (F(4,177) = 49.095, p < 0.001, η2 = 0.214), indicating improvements in these areas for LKM participants. However, no significant improvement was observed for mindfulness (F(4,177) = 2.850, p = 0.930, η2 = 0.016). Short video app-guided LKM improves public service motivation, conflict resolution skills, and communication skills among police officers but does not significantly enhance mindfulness. These findings suggest that brief, digitally delivered compassion-focused programs can be seamlessly incorporated into routine in-service training to strengthen officers’ prosocial motivation, de-escalation competence, and public-facing communication, thereby fostering more constructive police–community interactions. Full article

21 pages, 2869 KiB  
Article
Multimodal Feature-Guided Audio-Driven Emotional Talking Face Generation
by Xueping Wang, Yuemeng Huo, Yanan Liu, Xueni Guo, Feihu Yan and Guangzhe Zhao
Electronics 2025, 14(13), 2684; https://doi.org/10.3390/electronics14132684 - 2 Jul 2025
Viewed by 634
Abstract
Audio-driven emotional talking face generation aims to generate talking face videos with rich facial expressions and temporal coherence. Current diffusion model-based approaches predominantly depend on either single-label emotion annotations or external video references, which often struggle to capture the complex relationships between modalities, resulting in less natural emotional expressions. To address these issues, we propose MF-ETalk, a multimodal feature-guided method for emotional talking face generation. Specifically, we design an emotion-aware multimodal feature disentanglement and fusion framework that leverages Action Units (AUs) to disentangle facial expressions and models the nonlinear relationships among AU features using a residual encoder. Furthermore, we introduce a hierarchical multimodal feature fusion module that enables dynamic interactions among audio, visual cues, AUs, and motion dynamics. This module is optimized through global motion modeling, lip synchronization, and expression subspace learning, enabling full-face dynamic generation. Finally, an emotion-consistency constraint module is employed to refine the generated results and ensure the naturalness of expressions. Extensive experiments on the MEAD and HDTF datasets demonstrate that MF-ETalk outperforms state-of-the-art methods in both expression naturalness and lip-sync accuracy. For example, it achieves an FID of 43.052 and E-FID of 2.403 on MEAD, along with strong synchronization performance (LSE-C of 6.781, LSE-D of 7.962), confirming the effectiveness of our approach in producing realistic and emotionally expressive talking face videos. Full article
