Search Results (163)

Search Parameters:
Keywords = video translation

17 pages, 1603 KiB  
Perspective
A Perspective on Quality Evaluation for AI-Generated Videos
by Zhichao Zhang, Wei Sun and Guangtao Zhai
Sensors 2025, 25(15), 4668; https://doi.org/10.3390/s25154668 - 28 Jul 2025
Viewed by 343
Abstract
Recent breakthroughs in AI-generated content (AIGC) have transformed video creation, empowering systems to translate text, images, or audio into visually compelling stories. Yet reliable evaluation of these machine-crafted videos remains elusive because quality is governed not only by spatial fidelity within individual frames but also by temporal coherence across frames and precise semantic alignment with the intended message. The foundational role of sensor technologies is critical, as they determine the physical plausibility of AIGC outputs. In this perspective, we argue that multimodal large language models (MLLMs) are poised to become the cornerstone of next-generation video quality assessment (VQA). By jointly encoding cues from multiple modalities such as vision, language, sound, and even depth, the MLLM can leverage its powerful language understanding capabilities to assess the quality of scene composition, motion dynamics, and narrative consistency, overcoming the fragmentation of hand-engineered metrics and the poor generalization ability of CNN-based methods. Furthermore, we provide a comprehensive analysis of current methodologies for assessing AIGC video quality, including the evolution of generation models, dataset design, quality dimensions, and evaluation frameworks. We argue that advances in sensor fusion enable MLLMs to combine low-level physical constraints with high-level semantic interpretations, further enhancing the accuracy of visual quality assessment. Full article
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)

39 pages, 2628 KiB  
Article
A Decentralized Multi-Venue Real-Time Video Broadcasting System Integrating Chain Topology and Intelligent Self-Healing Mechanisms
by Tianpei Guo, Ziwen Song, Haotian Xin and Guoyang Liu
Appl. Sci. 2025, 15(14), 8043; https://doi.org/10.3390/app15148043 - 19 Jul 2025
Viewed by 478
Abstract
The rapid growth in large-scale distributed video conferencing, remote education, and real-time broadcasting poses significant challenges to traditional centralized streaming systems, particularly regarding scalability, cost, and reliability under high concurrency. Centralized approaches often encounter bottlenecks, increased bandwidth expenses, and diminished fault tolerance. This paper proposes a novel decentralized real-time broadcasting system employing a peer-to-peer (P2P) chain topology based on IPv6 networking and the Secure Reliable Transport (SRT) protocol. By exploiting the global addressing capability of IPv6, our solution simplifies direct node interconnections, effectively eliminating complexities associated with Network Address Translation (NAT). Furthermore, we introduce an innovative chain-relay transmission method combined with distributed node management strategies, substantially reducing reliance on central servers and minimizing deployment complexity. Leveraging SRT’s low-latency UDP transmission, packet retransmission, congestion control, and AES-128/256 encryption, the proposed system ensures robust security and high video stream quality across wide-area networks. Additionally, a WebSocket-based real-time fault detection algorithm coupled with a rapid fallback self-healing mechanism is developed, enabling millisecond-level fault detection and swift restoration of disrupted links. Extensive performance evaluations using Video Multi-Resolution Fidelity (VMRF) metrics across geographically diverse and heterogeneous environments confirm significant performance gains. Specifically, our approach achieves substantial improvements in latency, video quality stability, and fault tolerance over existing P2P methods, along with over tenfold enhancements in frame rates compared with conventional RTMP-based solutions, thereby demonstrating its efficacy, scalability, and cost-effectiveness for real-time video streaming applications. Full article

19 pages, 1026 KiB  
Article
Development of the Psychosocial Rehabilitation Web Application (Psychosocial Rehab App)
by Fagner Alfredo Ardisson Cirino Campos, José Carlos Sánches García, Gabriel Lamarca Galdino da Silva, João Antônio Lemos Araújo, Ines Farfán Ulloa, Edilson Carlos Caritá, Fabio Biasotto Feitosa, Marciana Fernandes Moll, Tomás Daniel Menendez Rodriguez and Carla Aparecida Arena Ventura
Nurs. Rep. 2025, 15(7), 228; https://doi.org/10.3390/nursrep15070228 - 25 Jun 2025
Viewed by 507
Abstract
Introduction: Few applications worldwide focus on psychosocial rehabilitation, and none specifically address psychosocial rehabilitation projects. This justifies the need for an application to assist mental health professionals in constructing and managing such projects in the Brazilian mental health scenario. Objective: This study aimed to present a web application, the “Psychosocial Rehabilitation Application” (Psychosocial Rehab App), and describe its development in detail through a technological survey conducted between May 2024 and February 2025. Method: The development process of the web app was carried out in the following four stages, adapted from the Novak method: theoretical basis, requirements survey, prototyping, and development with alpha testing. The active and collaborative participation of the main researcher (a psychiatric nurse) and two undergraduate software engineers, supervised by a software engineer and a professor of nursing and psychology, was essential for producing a suitable operational product available to mental health professionals. Interactions were conducted via video calls, WhatsApp, and email. These interactions were transcribed using the Transkriptor software and inserted into the ATLAS.ti software for thematic analysis. Results: The web app “Psychosocial Rehabilitation Application” displays a home screen for registration and other screens structured into the stages of the psychosocial rehabilitation project (assessment, diagnosis, goals, intervention, agreements, and re-assessment). It also has a home screen, a resource screen, and a function screen with options to add a new project, search for a project, or search for mental health support services. These features facilitate the operation and streamline psychosocial rehabilitation projects by mental health professionals. 
Thematic analysis revealed three themes and seven codes describing the entire development process and interactions among participants in collaborative, interrelational work. A collaborative approach between researchers and developers was essential for translating the complexity of the psychosocial rehabilitation project into practical and usable functionalities for future users, who will be mental health professionals. Discussion: The Psychosocial Rehab App was developed collaboratively by mental health professionals and developers. It supports the creation of structured rehabilitation projects, improving decision-making and documentation. Designed for clinical use, the app promotes autonomy and recovery by aligning technology with psychosocial rehabilitation theory and the actual needs of mental health services. Conclusions: The Psychosocial Rehab App was developed through collaborative work between mental health and technology professionals. The lead researcher mediated this process to ensure that the app’s functionalities reflected both technical feasibility and therapeutic goals. Empathy and dialog were key to translating complex clinical needs into usable and context-appropriate technological solutions. Full article

26 pages, 10901 KiB  
Article
Video-Assisted Rockfall Kinematics Analysis (VARKA): Analyzing Shape and Release Angle Effects on Motion and Energy Dissipation
by Milad Ghahramanieisalou, Javad Sattarvand and Amin Moniri-Morad
Geotechnics 2025, 5(3), 42; https://doi.org/10.3390/geotechnics5030042 - 21 Jun 2025
Viewed by 248
Abstract
Understanding rockfall behavior is essential for accurately predicting hazards in both natural and engineered environments, yet prior research has predominantly focused on spherical rocks or single-impact scenarios, leaving critical gaps in highlighting the dynamics of non-spherical rocks and multiple impacts. This study addresses these shortcomings by investigating the influence of rock shape and release angle on motion, energy dissipation, and impact behavior. To achieve this, an innovative approach rooted in the Video-Assisted Rockfall Kinematics Analysis (VARKA) procedure was introduced, integrating a custom-designed apparatus, controlled experimental setups, and sophisticated data analysis techniques. Experiments utilizing a pendulum-based release system analyzed various scenarios involving different rock shapes and release angles. These tests provided comprehensive motion data for multiple impacts, including trajectories, translational and angular velocities, and the coefficient of restitution (COR). Results revealed that non-spherical rocks exhibited significantly more erratic trajectories and greater variability in COR values compared to spherical rocks. The experiments demonstrated that ellipsoidal and octahedral shapes had substantially higher variability in runout distances than spherical rocks. COR values for ellipsoidal shapes spanned a wide range, in contrast to the tighter clustering observed for spherical rocks. These findings highlight the pivotal influence of rock shape on lateral dispersion and energy dissipation, reinforcing the need for data-driven approaches to enhance and complement traditional physics-based predictive models. Full article

27 pages, 612 KiB  
Systematic Review
Cocaine Cues Used in Experimental Research: A Systematic Review
by Eileen Brobbin, Natalie Lowry, Matteo Cella, Alex Copello, Simon Coulton, Jerome Di Pietro, Colin Drummond, Steven Glautier, Ceyda Kiyak, Thomas Phillips, Daniel Stahl, Shelley Starr, Lucia Valmaggia, Colin Williams and Paolo Deluca
Brain Sci. 2025, 15(6), 626; https://doi.org/10.3390/brainsci15060626 - 10 Jun 2025
Viewed by 1289
Abstract
Aims: Cue exposure therapy (CET) is a promising treatment approach for cocaine substance use disorder (SUD). CET specifically targets the psychological and physiological responses elicited by drug-related cues, aiming to reduce their motivational impact. To advance understanding of CET for cocaine treatment, this systematic review aims to categorise the range of cocaine cues used in research. Methods: A systematic review of the existing literature with searches conducted on PubMed and Web of Science bibliographic databases with no time constraints in August 2024 (PROSPERO: CRD42024554361). Three reviewers were independently involved in the screening, review and data extraction process, in line with PRISMA guidelines. Data extracted included participant demographics, study design, data on the cocaine cue task, and examples (if provided). Each study was appraised and received a quality score. The secondary outcome was to summarise examples for each category type identified. The data are presented as a narrative synthesis. Results: 3600 articles were identified and screened. 235 articles were included in the analysis. Cues identified included images, paraphernalia, drug-related words, cocaine smell, auditory stimuli presented via audiotapes, video recordings, scripts, and virtual reality environments, often combining multiple modalities. Included studies recruited cocaine-dependent individuals, recreational users, polydrug users, and non-cocaine-using controls. The sample sizes of the studies ranged from a single case study to a study including 1974 participants. Conclusions: This review found that studies employed a wide range of cue categories, but detailed examples were often lacking, limiting replication. The number and combination of cues varied: some studies used only cocaine-related images, while others included images, videos, physical items, and audiotapes. The level of immersion and personalisation also differed considerably. 
All studies used cocaine-specific cues, most commonly images or representations of the cocaine substance, cocaine use or drug paraphernalia, drug preparation items, or conversations about cocaine use and its effects. The overall quality of the included studies was deemed good, with all adhering to standard research norms. While this review highlights the breadth of cue types used in the literature, further research should focus on enhancing cue exposure techniques by incorporating more immersive and personalised stimuli, and by providing clearer documentation of cue characteristics to support replication and clinical translation. Full article
(This article belongs to the Special Issue Psychiatry and Addiction: A Multi-Faceted Issue)

24 pages, 6881 KiB  
Article
Sign Language Anonymization: Face Swapping Versus Avatars
by Marina Perea-Trigo, Manuel Vázquez-Enríquez, Jose C. Benjumea-Bellot, Jose L. Alba-Castro and Juan A. Álvarez-García
Electronics 2025, 14(12), 2360; https://doi.org/10.3390/electronics14122360 - 9 Jun 2025
Viewed by 556
Abstract
The visual nature of Sign Language datasets raises privacy concerns that hinder data sharing, which is essential for advancing deep learning (DL) models in Sign Language recognition and translation. This study evaluated two anonymization techniques, realistic avatar synthesis and face swapping (FS), designed to anonymize the identities of signers, while preserving the semantic integrity of signed content. A novel metric, Identity Anonymization with Expressivity Preservation (IAEP), is introduced to assess the balance between effective anonymization and the preservation of facial expressivity crucial for Sign Language communication. In addition, the quality evaluation included the LPIPS and FID metrics, which measure perceptual similarity and visual quality. A survey with deaf participants further complemented the analysis, providing valuable insight into the practical usability and comprehension of anonymized videos. The results show that while face swapping achieved acceptable anonymization and preserved semantic clarity, avatar-based anonymization struggled with comprehension. These findings highlight the need for further research efforts on securing privacy while preserving Sign Language understandability, both for dataset accessibility and the anonymous participation of deaf people in digital content. Full article
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)

21 pages, 429 KiB  
Review
A Systematic Review of Bicycle Motocross: Influence of Physiological, Biomechanical, Physical, and Psychological Indicators on Sport Performance
by Boryi A. Becerra-Patiño, Aura Daniela Montenegro-Bonilla, Jorge Olivares-Arancibia, Sam Hernández-Jaña, Rodrigo Yáñez-Sepúlveda, Daniel Rojas-Valverde, Víctor Hernández-Beltrán, José M. Gamonales, José Pino-Ortega and José Francisco López-Gil
J. Funct. Morphol. Kinesiol. 2025, 10(2), 205; https://doi.org/10.3390/jfmk10020205 - 2 Jun 2025
Viewed by 756
Abstract
Background: Bicycle Motocross (BMX) involves the integration of various capabilities and mechanisms, including cognitive, physiological, and biomechanical components, that allow the athlete to perform in competition. However, to date, no systematic review has analyzed the indicators that are decisive for sports performance in BMX. The objective of this work was to carry out a systematic review of the performance variables in BMX and establish recommendations for researchers and trainers. Materials and Methods: The following databases were consulted: PubMed, Scopus, and Web of Science. This systematic review follows the guidelines of the PRISMA statement and the guidelines for performing systematic reviews in sports sciences. The search approach, along with the selection criteria and additional details, was previously recorded in the prospective registry (INPLASY202480036). The quality of the evidence was evaluated via the PEDro scale. Results: The 21 studies included in this systematic review comprise a total sample of 287 athletes. In the studies analyzed, there are five main categories for the study of performance in BMX: (i) physiological profile and bicarbonate; (ii) physical characteristics (power, speed, and sprint); (iii) translational and rotational acceleration, and systems and implements; (iv) psychological variables; and (v) skills and techniques. Conclusions: This systematic review provides convincing evidence regarding the influence of several factors that can determine performance in BMX, including Pmax, cadence, neuromuscular capacity, feedback and cognitive training, accelerometry and video analysis, anaerobic–aerobic relationships, physical conditioning, strength, and speed. Full article
(This article belongs to the Special Issue Optimizing Post-activation Performance Enhancement)

20 pages, 7472 KiB  
Article
Uncertain Shape and Deformation Recognition Using Wavelet-Based Spatiotemporal Features
by Haruka Matoba, Takashi Kusaka, Koji Shimatani and Takayuki Tanaka
Electronics 2025, 14(11), 2131; https://doi.org/10.3390/electronics14112131 - 23 May 2025
Viewed by 366
Abstract
This paper proposes a wavelet-based spatiotemporal feature extraction method for recognizing uncertain shapes and their deformations. Uncertain shapes, such as hand gestures and fetal movements, exhibit individual and trial-dependent variations, making their accurate recognition challenging. Our approach constructs shape feature vectors by integrating wavelet coefficients across multiple scales, ensuring robustness to rotation and translation. By analyzing the temporal evolution of these features, we can detect and quantify deformations effectively. Experimental evaluations demonstrate that the proposed method accurately identifies shape differences and tracks deformations, outperforming conventional approaches such as template matching and neural networks in adaptability and generalization. We further validate its applicability in tasks such as hand gesture recognition and fetal movement analysis from ultrasound videos. These results suggest that the proposed wavelet-based spatiotemporal feature extraction technique provides a reliable and computationally efficient solution for recognizing and tracking uncertain shapes in dynamic environments. Full article

13 pages, 2240 KiB  
Article
Monocular 3D Tooltip Tracking in Robotic Surgery—Building a Multi-Stage Pipeline
by Sanjeev Narasimhan, Mehmet Kerem Turkcan, Mattia Ballo, Sarah Choksi, Filippo Filicori and Zoran Kostic
Electronics 2025, 14(10), 2075; https://doi.org/10.3390/electronics14102075 - 20 May 2025
Cited by 1 | Viewed by 1139
Abstract
Tracking the precise movement of surgical tools is essential for enabling automated analysis, providing feedback, and enhancing safety in robotic-assisted surgery. Accurate 3D tracking of surgical tooltips is challenging to implement when using monocular videos due to the complexity of extracting depth information. We propose a pipeline that combines state-of-the-art foundation models—Florence2 and Segment Anything 2 (SAM2)—for zero-shot 2D localization of tooltip coordinates using a monocular video input. Localization predictions are refined through supervised training of the YOLOv11 segmentation model to enable real-time applications. The depth estimation model Metric3D computes the relative depth and provides tooltip camera coordinates, which are subsequently transformed into world coordinates via a linear model estimating rotation and translation parameters. An experimental evaluation on the JIGSAWS Suturing Kinematic dataset achieves a 3D Average Jaccard score on tooltip tracking of 84.5 and 91.2 for the zero-shot and supervised approaches, respectively. The results validate the effectiveness of our approach and its potential to enhance real-time guidance and assessment in robotic-assisted surgical procedures. Full article

20 pages, 750 KiB  
Article
Physical Training Considerations for Futsal Players According to Strength and Conditioning Coaches: A Qualitative Study
by Rafael Albalad-Aiguabella, David Navarrete-Villanueva, Elena Mainer-Pardos, Oscar Villanueva-Guerrero, Borja Muniz-Pardos and Germán Vicente-Rodríguez
Sports 2025, 13(4), 126; https://doi.org/10.3390/sports13040126 - 18 Apr 2025
Viewed by 1390
Abstract
The professionalization of futsal requires greater physical demands on players, requiring strength and conditioning coaches to manage loads, optimize performance, and prevent injuries. This study aimed to describe the current practices of high-level strength and conditioning coaches and determine the elements needed to optimize their performance. Two video-recorded focus groups consisting of eight strength and conditioning coaches from the Spanish futsal league’s first and second divisions were transcribed, translated, and analyzed using a content analysis approach with open-ended questions on physical preparation and current practices. Results showed that strength and conditioning coaches prioritized five main areas: (1) competitive demands, (2) training load control and monitoring, (3) injury risk mitigation strategies, (4) contextual factors and interpersonal relationships, and (5) training methodologies to optimize performance. However, they also claim to deal with several limitations such as lack of time, limited resources and access to facilities, insufficient staff, problems related to combining sport with other activities (e.g., work), or the difficulty to individualize, which limits the optimization of their practices. Based on these findings, practical applications include implementing neuromuscular and strength training sessions at least twice a week, using cost-effective load monitoring tools (e.g., RPE and wellness questionnaires) to manage workloads, individualizing training programs to address the specific demands and characteristics of each player, and fostering close multidisciplinary collaboration to optimize performance and reduce injury risks. These insights can guide current and aspiring strength and conditioning coaches toward optimized practices. 
This study can assist novice strength and conditioning coaches in identifying the key focus areas of elite physical trainers and understanding their challenges and limitations, fostering collaboration among sports professionals to create a more optimized environment. Full article
(This article belongs to the Special Issue Strategies to Improve Modifiable Factors of Athletic Success)

22 pages, 10173 KiB  
Article
Tech-Enhanced Vocabulary Acquisition: Exploring the Use of Student-Created Video Learning Materials in the Tertiary-Level EFL (English as a Foreign Language) Flipped Classroom
by Jelena Bobkina, Svetlana Baluyan and Elena Dominguez Romero
Educ. Sci. 2025, 15(4), 450; https://doi.org/10.3390/educsci15040450 - 5 Apr 2025
Cited by 1 | Viewed by 1861
Abstract
This study explores the effectiveness of Technology-Assisted Vocabulary Learning (TAVL) using student-created video learning materials within a tertiary-level English as a Foreign Language (EFL) flipped classroom. By leveraging the flipped classroom model, which allocates classroom time for interactive activities and shifts instructional content delivery outside of class, the research investigates how student-produced videos can enhance vocabulary acquisition and retention. Conducted with 47 university students from a Translation and Translation Studies course, the study aims to fill a gap in empirical evidence regarding this innovative approach. Quantitative analysis revealed that students who created and utilized videos (Group 1) showed the highest improvement in vocabulary scores, followed by those who only used the videos (Group 2), with the control group relying on traditional teacher-led methods showing the least improvement. Qualitative feedback highlighted that video creators experienced deeper engagement and better vocabulary retention, while users appreciated the videos’ visual and auditory elements but faced challenges with vocabulary overload. The findings suggest that incorporating student-created videos into the curriculum fosters a dynamic and collaborative learning environment, offering practical implications for enhancing vocabulary instruction through technology-enhanced pedagogical practices. Future research should focus on optimizing video production processes and integrating these methods with traditional teaching for comprehensive vocabulary learning. Full article
(This article belongs to the Section Language and Literacy Education)

15 pages, 6945 KiB  
Article
Gaze Error Estimation and Linear Transformation to Improve Accuracy of Video-Based Eye Trackers
by Varun Padikal, Alex Plonkowski, Penelope F. Lawton, Laura K. Young and Jenny C. A. Read
Vision 2025, 9(2), 29; https://doi.org/10.3390/vision9020029 - 3 Apr 2025
Viewed by 1043
Abstract
Eye tracking technology plays a crucial role in various fields such as psychology, medical training, marketing, and human–computer interaction. However, achieving high accuracy over a larger field of view in eye tracking systems remains a significant challenge, both in free viewing and in a head-stabilized condition. In this paper, we propose a simple approach to improve the accuracy of video-based eye trackers through the implementation of linear coordinate transformations. This method involves applying stretching, shearing, translation, or their combinations to correct gaze accuracy errors. Our investigation shows that re-calibrating the eye tracker via linear transformations significantly improves the accuracy of video-based tracker over a large field of view. Full article
(This article belongs to the Section Visual Neuroscience)

23 pages, 1716 KiB  
Article
Knowledge Translator: Cross-Lingual Course Video Text Style Transform via Imposed Sequential Attention Networks
by Jingyi Zhang, Bocheng Zhao, Wenxing Zhang and Qiguang Miao
Electronics 2025, 14(6), 1213; https://doi.org/10.3390/electronics14061213 - 19 Mar 2025
Cited by 1 | Viewed by 485
Abstract
Massive Online Open Courses (MOOCs) have been growing rapidly in the past few years. Video content is an important carrier for cultural exchange and education popularization, and needs to be translated into multiple language versions to meet the needs of learners from different countries and regions. However, current MOOC video processing solutions rely excessively on manual operations, resulting in low efficiency and difficulty in meeting the urgent requirement for large-scale content translation. Key technical challenges include the accurate localization of embedded text in complex video frames, maintaining style consistency across languages, and preserving text readability and visual quality during translation. Existing methods often struggle with handling diverse text styles, background interference, and language-specific typographic variations. In view of this, this paper proposes an innovative cross-language style transfer algorithm that integrates advanced techniques such as attention mechanisms, latent space mapping, and adaptive instance normalization. Specifically, the algorithm first utilizes attention mechanisms to accurately locate the position of each text in the image, ensuring that subsequent processing can be targeted at specific text areas. Subsequently, by extracting features corresponding to this location information, the algorithm can ensure accurate matching of styles and text features, achieving an effective style transfer. Additionally, this paper introduces a new color loss function aimed at ensuring the consistency of text colors before and after style transfer, further enhancing the visual quality of edited images. Through extensive experimental verification, the algorithm proposed in this paper demonstrated excellent performance on both synthetic and real-world datasets. 
Compared with existing methods, the algorithm exhibited significant advantages across multiple image evaluation metrics, achieving a 2% improvement in FID and a 20% improvement in IS over SOTA methods on the relevant datasets. Both the proposed method and the introduced dataset, PTTEXT, will be made publicly available at the project URL upon acceptance of the paper. Full article
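As an illustrative sketch only (not the paper's implementation), the two ingredients the abstract names can be written down concretely: adaptive instance normalization re-statistics content features to match a style reference, and a color consistency loss can be approximated as the distance between per-channel mean colors before and after editing. The function names, feature shapes, and the exact loss form below are assumptions:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: normalize the content feature map
    per channel, then rescale/shift it to the style map's channel statistics.
    content, style: float arrays of shape (C, H, W)."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu

def color_loss(original, edited):
    """A simple color-consistency penalty: mean absolute difference
    between per-channel mean colors of the two images (C, H, W)."""
    return float(np.abs(original.mean(axis=(1, 2))
                        - edited.mean(axis=(1, 2))).mean())
```

By construction, the AdaIN output carries the style map's per-channel mean and (approximately) its standard deviation, which is why a separate color loss is still useful: it constrains the rendered text color itself, not just feature statistics.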
(This article belongs to the Special Issue Applications of Computational Intelligence, 3rd Edition)
23 pages, 2409 KiB  
Article
Generative AI in Higher Education Constituent Relationship Management (CRM): Opportunities, Challenges, and Implementation Strategies
by Carrie Marcinkevage and Akhil Kumar
Computers 2025, 14(3), 101; https://doi.org/10.3390/computers14030101 - 12 Mar 2025
Viewed by 2179
Abstract
This research explores opportunities for generative artificial intelligence (GenAI) in higher education constituent (customer) relationship management (CRM) to address the industry’s need for digital transformation driven by demographic shifts, economic challenges, and technological advancements. Using a qualitative approach grounded in the principles of grounded theory, we conducted semi-structured interviews and administered an open-ended qualitative data collection instrument with technology vendors, implementation consultants, and higher education institution (HEI) professionals who are actively exploring GenAI applications. Our findings highlight six primary types of GenAI (textual analysis and synthesis, data summarization, next-best-action recommendations, speech synthesis and translation, code development, and image and video creation), each with applications across student recruitment, advising, alumni engagement, and administrative processes. We propose an evaluative framework with eight readiness criteria to assess institutional preparedness for GenAI adoption. While GenAI offers potential benefits, such as increased efficiency, reduced costs, and improved student engagement, its success depends on data readiness, ethical safeguards, and institutional leadership. By integrating GenAI as a co-intelligence alongside human expertise, HEIs can enhance CRM ecosystems and better support their constituents. Full article
(This article belongs to the Special Issue Smart Learning Environments)
23 pages, 1774 KiB  
Article
Adaptive Transformer-Based Deep Learning Framework for Continuous Sign Language Recognition and Translation
by Yahia Said, Sahbi Boubaker, Saleh M. Altowaijri, Ahmed A. Alsheikhy and Mohamed Atri
Mathematics 2025, 13(6), 909; https://doi.org/10.3390/math13060909 - 8 Mar 2025
Cited by 1 | Viewed by 1842
Abstract
Sign language recognition and translation remain pivotal for facilitating communication between the deaf and hearing communities. However, end-to-end sign language translation (SLT) faces major challenges, including weak temporal correspondence between sign language (SL) video frames and gloss annotations, and the complexity of aligning long SL videos with natural-language sentences. In this paper, we propose an Adaptive Transformer (ADTR)-based deep learning framework that enhances SL video processing for robust and efficient SLT. The proposed model incorporates three novel modules: Adaptive Masking (AM), Local Clip Self-Attention (LCSA), and Adaptive Fusion (AF) to optimize feature representation. The AM module dynamically removes redundant video frame representations, improving temporal alignment, while the LCSA module learns hierarchical representations at both the local clip and full-video levels using a refined self-attention mechanism. Additionally, the AF module fuses multi-scale temporal and spatial features to enhance model robustness. Unlike conventional SLT models, our framework eliminates the reliance on gloss annotations, enabling direct translation from SL video sequences to spoken-language text. The proposed method was evaluated on the ArabSign dataset, demonstrating state-of-the-art performance in translation accuracy, processing efficiency, and real-time applicability. These results confirm that ADTR is a highly effective and scalable deep learning solution for continuous sign language recognition, positioning it as a promising AI-driven approach for real-world assistive applications. Full article
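The Adaptive Masking idea, dropping redundant frame representations to improve temporal alignment, can be sketched as a simple similarity filter: keep a frame only when its feature vector differs enough from the last frame kept. This is an illustrative interpretation, not the paper's AM module; the cosine criterion, the threshold value, and the flat feature-vector format are all assumptions:

```python
import numpy as np

def adaptive_mask(frame_feats, sim_threshold=0.95):
    """Greedy redundancy filter over a sequence of per-frame feature
    vectors (array of shape (T, D)). A frame is kept only if its cosine
    similarity to the most recently kept frame falls below the threshold.
    Returns the indices of the retained frames (the first is always kept)."""
    kept = [0]
    for i in range(1, len(frame_feats)):
        a, b = frame_feats[kept[-1]], frame_feats[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos < sim_threshold:
            kept.append(i)
    return kept
```

Filtering near-duplicate frames before self-attention shortens the sequence the transformer must align with the output sentence, which is the motivation the abstract gives for the AM module.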
(This article belongs to the Special Issue Artificial Intelligence: Deep Learning and Computer Vision)