Search Results (857)

Search Parameters:
Keywords = video transformer

26 pages, 18583 KiB  
Article
Transforming Pedagogical Practices and Teacher Identity Through Multimodal (Inter)action Analysis: A Case Study of Novice EFL Teachers in China
by Jing Zhou, Chengfei Li and Yan Cheng
Behav. Sci. 2025, 15(8), 1050; https://doi.org/10.3390/bs15081050 - 3 Aug 2025
Viewed by 61
Abstract
This study investigates the evolving pedagogical strategies and professional identity development of two novice college English teachers in China through a semester-long classroom-based inquiry. Drawing on Norris’s Multimodal (Inter)action Analysis (MIA), it analyzes 270 min of video-recorded lessons across three instructional stages, supported by visual transcripts and pitch-intensity spectrograms. The analysis reveals each teacher’s transformation from textbook-reliant instruction to student-centered pedagogy, facilitated by multimodal strategies such as gaze, vocal pitch, gesture, and head movement. These shifts unfold across the following three evolving identity configurations: compliance, experimentation, and dialogic enactment. Rather than following a linear path, identity development is shown as a negotiated process shaped by institutional demands and classroom interactional realities. By foregrounding the multimodal enactment of self in a non-Western educational context, this study offers insights into how novice EFL teachers navigate tensions between traditional discourse norms and reform-driven pedagogical expectations, contributing to broader understandings of identity formation in global higher education.
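
For readers curious about the pitch-intensity spectrograms mentioned above, the sketch below shows one plausible way to overlay a pitch track on a log-power spectrogram with librosa; the file name and all parameter values are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: pitch and intensity extraction for classroom audio analysis,
# in the spirit of the pitch-intensity spectrograms mentioned above.
# "lesson_clip.wav" and all parameters are illustrative placeholders.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("lesson_clip.wav", sr=16000)

# Log-power spectrogram (intensity over time and frequency).
S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# Fundamental frequency (pitch) track via probabilistic YIN.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
times = librosa.times_like(f0, sr=sr)

fig, ax = plt.subplots()
librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="log", ax=ax)
ax.plot(times, f0, color="w", linewidth=1.5, label="f0 (pitch)")
ax.legend()
fig.savefig("pitch_intensity.png")
```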

20 pages, 4569 KiB  
Article
Lightweight Vision Transformer for Frame-Level Ergonomic Posture Classification in Industrial Workflows
by Luca Cruciata, Salvatore Contino, Marianna Ciccarelli, Roberto Pirrone, Leonardo Mostarda, Alessandra Papetti and Marco Piangerelli
Sensors 2025, 25(15), 4750; https://doi.org/10.3390/s25154750 - 1 Aug 2025
Viewed by 205
Abstract
Work-related musculoskeletal disorders (WMSDs) are a leading concern in industrial ergonomics, often stemming from sustained non-neutral postures and repetitive tasks. This paper presents a vision-based framework for real-time, frame-level ergonomic risk classification using a lightweight Vision Transformer (ViT). The proposed system operates directly on raw RGB images without requiring skeleton reconstruction, joint angle estimation, or image segmentation. A single ViT model simultaneously classifies eight anatomical regions, enabling efficient multi-label posture assessment. Training is supervised using a multimodal dataset acquired from synchronized RGB video and full-body inertial motion capture, with ergonomic risk labels derived from RULA scores computed on joint kinematics. The system is validated on realistic, simulated industrial tasks that include common challenges such as occlusion and posture variability. Experimental results show that the ViT model achieves state-of-the-art performance, with F1-scores exceeding 0.99 and AUC values above 0.996 across all regions. Compared to a previous CNN-based system, the proposed model improves classification accuracy and generalizability while reducing complexity and enabling real-time inference on edge devices. These findings demonstrate the model’s potential for unobtrusive, scalable ergonomic risk monitoring in real-world manufacturing environments.
(This article belongs to the Special Issue Secure and Decentralised IoT Systems)
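
The abstract describes a single ViT backbone feeding eight per-region classifiers. A minimal sketch of that multi-head arrangement follows, assuming a timm backbone; the model name, risk-level count, and head design are illustrative, not the paper's implementation.

```python
# Sketch: one lightweight ViT backbone with eight per-region heads for
# frame-level ergonomic risk classification, as described in the abstract.
# The timm backbone name and the number of risk levels are assumptions.
import timm
import torch
import torch.nn as nn

class MultiRegionViT(nn.Module):
    def __init__(self, n_regions=8, n_risk_levels=3):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits.
        self.backbone = timm.create_model("vit_tiny_patch16_224",
                                          pretrained=True, num_classes=0)
        dim = self.backbone.num_features
        # One classification head per anatomical region.
        self.heads = nn.ModuleList(
            nn.Linear(dim, n_risk_levels) for _ in range(n_regions))

    def forward(self, x):                      # x: (B, 3, 224, 224)
        feats = self.backbone(x)               # (B, dim)
        return torch.stack([h(feats) for h in self.heads], dim=1)

model = MultiRegionViT()
logits = model(torch.randn(2, 3, 224, 224))    # (2, 8, 3)
```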

21 pages, 6892 KiB  
Article
Enhanced Temporal Action Localization with Separated Bidirectional Mamba and Boundary Correction Strategy
by Xiangbin Liu and Qian Peng
Mathematics 2025, 13(15), 2458; https://doi.org/10.3390/math13152458 - 30 Jul 2025
Viewed by 232
Abstract
Temporal action localization (TAL) is a research hotspot in video understanding, which aims to locate and classify actions in videos. However, existing methods have difficulties in capturing long-term actions due to focusing on local temporal information, which leads to poor performance in localizing long-term temporal sequences. In addition, most methods ignore the importance of boundaries for action instances, resulting in inaccurately localized boundaries. To address these issues, this paper proposes a state space model for temporal action localization, called Separated Bidirectional Mamba (SBM), which models frame changes from the perspective of state transitions. It adapts to different sequence lengths and incorporates forward and backward state information for each frame through a forward Mamba and a backward Mamba, obtaining more comprehensive action representations and enhancing modeling capabilities for long-term temporal sequences. Moreover, this paper designs a Boundary Correction Strategy (BCS). It calculates the contribution of each frame to action instances based on the pre-localized results, then adjusts the weights of frames in boundary regression to ensure the boundaries are shifted towards the frames with higher contributions, leading to more accurate boundaries. To demonstrate the effectiveness of the proposed method, this paper reports mean Average Precision (mAP) under temporal Intersection over Union (tIoU) thresholds on four challenging benchmarks: THUMOS14, ActivityNet-1.3, HACS, and FineAction, where the proposed method achieves mAPs of 73.7%, 42.0%, 45.2%, and 29.1%, respectively, surpassing state-of-the-art approaches.
(This article belongs to the Special Issue Advances in Applied Mathematics in Computer Vision)
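
A minimal sketch of the separated bidirectional idea, assuming the mamba_ssm package (which requires a CUDA device): one Mamba runs over the frame sequence, a second over the reversed sequence, and the two state views are fused. The concatenation-plus-projection fusion is an assumption, not the paper's SBM.

```python
# Sketch: a separated bidirectional Mamba block in the spirit of SBM.
# One Mamba processes the sequence forward, another processes the reversed
# sequence, and both views are fused per frame. Requires mamba_ssm + CUDA;
# fusion by concatenation and projection is an assumption.
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class SeparatedBiMamba(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.fwd = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.bwd = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                 # x: (B, T, D) frame features
        f = self.fwd(x)                   # forward state for each frame
        b = self.bwd(x.flip(1)).flip(1)   # backward state, re-aligned
        return self.proj(torch.cat([f, b], dim=-1))

block = SeparatedBiMamba().cuda()
out = block(torch.randn(2, 512, 256, device="cuda"))  # (2, 512, 256)
```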

18 pages, 2688 KiB  
Article
Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking
by Jie Zhao, Ying Gao, Chunjuan Bo and Dong Wang
Sensors 2025, 25(15), 4691; https://doi.org/10.3390/s25154691 - 29 Jul 2025
Viewed by 124
Abstract
Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenging benchmarks. However, a large number of videos with precise, dense annotations is required for fully supervised training of their models. Considering that annotating videos frame-by-frame is a labor- and time-consuming workload, reducing the reliance on manual annotations during the training of tracking models is an important problem to resolve. To balance annotation costs against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and calculate co-salient attention maps based on multiple frames, it obtains competitive performance compared to fully supervised baseline trackers using only 3.33% of manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate the effectiveness of our method. Furthermore, we also demonstrate the advantages of our method on the egocentric tracking task; our weakly supervised method obtains a 0.538 success score on TREK-150, surpassing the prior state-of-the-art fully supervised tracker by 7.7%.
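
Co-salient attention maps can be illustrated with plain feature correlation across frames. The sketch below is a generic stand-in under that assumption, not the paper's module: each location in one frame is scored by its best match in the other frames.

```python
# Sketch: co-salient attention maps across multiple frames, computed as
# cross-frame feature correlation. This is an illustrative stand-in for
# the paper's co-saliency learning module, not its actual implementation.
import torch
import torch.nn.functional as F

def cosaliency_maps(feats):
    """feats: (N, C, H, W) backbone features from N frames of one video."""
    n, c, h, w = feats.shape
    x = F.normalize(feats.flatten(2), dim=1)          # (N, C, HW), unit norm
    maps = []
    for i in range(n):
        others = torch.cat([x[:i], x[i + 1:]])        # (N-1, C, HW)
        # Correlate every location in frame i with all locations elsewhere.
        corr = torch.einsum("cp,ncq->npq", x[i], others)
        sal = corr.amax(dim=-1).mean(dim=0)           # (HW,) consensus score
        maps.append(sal.view(h, w))
    return torch.stack(maps)                          # (N, H, W)

att = cosaliency_maps(torch.randn(4, 128, 16, 16))    # maps for 4 frames
```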

19 pages, 305 KiB  
Article
Gender Inequalities and Precarious Work–Life Balance in Italian Academia: Emergency Remote Work and Organizational Change During the COVID-19 Lockdown
by Annalisa Dordoni
Soc. Sci. 2025, 14(8), 471; https://doi.org/10.3390/socsci14080471 - 29 Jul 2025
Viewed by 294
Abstract
The COVID-19 pandemic has exposed and intensified structural tensions surrounding work–life balance, precarity, and gender inequalities in academia. This paper examines the spatial, temporal, and emotional disruptions experienced by early-career and precarious researchers in Italy during the first national lockdown (March–April 2020) and their engagement in remote academic work. Adopting an exploratory and qualitative approach, the study draws on ten narrative video interviews and thirty participant-generated images to investigate how structural dimensions—such as gender, class, caregiving responsibilities, and the organizational culture of the neoliberal university—shaped these lived experiences. The findings highlight the implosion of boundaries between paid work, care, family life, and personal space and how this disarticulation exacerbated existing inequalities, particularly for women and caregivers. By interpreting both visual and narrative data through a sociological lens on gender, work, and organizations, the paper contributes to current debates on the transformation of academic labor and the reshaping of temporal work regimes through the everyday use of digital technologies in contemporary neoliberal capitalism. It challenges the individualization of discourses on productivity and flexibility and calls for gender-sensitive, structurally informed policies that support equitable and sustainable transitions in work and family life, in line with European policy frameworks.
17 pages, 1603 KiB  
Perspective
A Perspective on Quality Evaluation for AI-Generated Videos
by Zhichao Zhang, Wei Sun and Guangtao Zhai
Sensors 2025, 25(15), 4668; https://doi.org/10.3390/s25154668 - 28 Jul 2025
Viewed by 285
Abstract
Recent breakthroughs in AI-generated content (AIGC) have transformed video creation, empowering systems to translate text, images, or audio into visually compelling stories. Yet reliable evaluation of these machine-crafted videos remains elusive because quality is governed not only by spatial fidelity within individual frames but also by temporal coherence across frames and precise semantic alignment with the intended message. Sensor technologies play a foundational role here, as they determine the physical plausibility of AIGC outputs. In this perspective, we argue that multimodal large language models (MLLMs) are poised to become the cornerstone of next-generation video quality assessment (VQA). By jointly encoding cues from multiple modalities such as vision, language, sound, and even depth, an MLLM can leverage its powerful language understanding capabilities to assess the quality of scene composition, motion dynamics, and narrative consistency, overcoming the fragmentation of hand-engineered metrics and the poor generalization ability of CNN-based methods. Furthermore, we provide a comprehensive analysis of current methodologies for assessing AIGC video quality, including the evolution of generation models, dataset design, quality dimensions, and evaluation frameworks. We argue that advances in sensor fusion enable MLLMs to combine low-level physical constraints with high-level semantic interpretations, further enhancing the accuracy of visual quality assessment.
(This article belongs to the Special Issue Perspectives in Intelligent Sensors and Sensing Systems)
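
As a concrete, much simpler stand-in for the MLLM-based judging the authors envision, two of the quality axes discussed above (semantic alignment and temporal coherence) can be approximated with CLIP embeddings; the model choice, placeholder frames, and prompt are assumptions.

```python
# Sketch: scoring text-video semantic alignment and frame-to-frame temporal
# coherence with CLIP embeddings, as a lightweight stand-in for a full MLLM
# judge. The frames and prompt here are dummy placeholders.
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.new("RGB", (224, 224)) for _ in range(8)]  # placeholder frames
prompt = "a red car driving through snow"

inputs = proc(text=[prompt], images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])

img = img / img.norm(dim=-1, keepdim=True)
txt = txt / txt.norm(dim=-1, keepdim=True)

alignment = (img @ txt.T).mean().item()                 # prompt adherence
coherence = (img[:-1] * img[1:]).sum(-1).mean().item()  # frame-to-frame drift
print(f"alignment={alignment:.3f}, coherence={coherence:.3f}")
```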

21 pages, 9651 KiB  
Article
Self-Supervised Visual Tracking via Image Synthesis and Domain Adversarial Learning
by Gu Geng, Sida Zhou, Jianing Tang, Xinming Zhang, Qiao Liu and Di Yuan
Sensors 2025, 25(15), 4621; https://doi.org/10.3390/s25154621 - 25 Jul 2025
Viewed by 198
Abstract
With the widespread use of sensors in applications such as autonomous driving and intelligent security, stable and efficient target tracking from diverse sensor data has become increasingly important. Self-supervised visual tracking has attracted increasing attention due to its potential to eliminate reliance on costly manual annotations; however, existing methods often train on incomplete object representations, resulting in inaccurate localization during inference. In addition, current methods typically struggle when applied to deep networks. To address these limitations, we propose a novel self-supervised tracking framework based on image synthesis and domain adversarial learning. We first construct a large-scale database of real-world target objects, then synthesize training video pairs by randomly inserting these targets into background frames while applying geometric and appearance transformations to simulate realistic variations. To reduce domain shift introduced by synthetic content, we incorporate a domain classification branch after feature extraction and adopt domain adversarial training to encourage feature alignment between real and synthetic domains. Experimental results on five standard tracking benchmarks demonstrate that our method significantly enhances tracking accuracy compared to existing self-supervised approaches without introducing any additional labeling cost. The proposed framework not only ensures complete target coverage during training but also shows strong scalability to deeper network architectures, offering a practical and effective solution for real-world tracking applications.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
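
Domain adversarial training of the kind described typically hinges on a gradient reversal layer between the feature extractor and the domain classifier. Below is a generic sketch of that machinery; the branch width, the two-domain setup, and the lambda value are assumptions rather than the paper's exact design.

```python
# Sketch: a gradient reversal layer (GRL) plus a tiny domain classifier,
# the standard machinery behind domain adversarial training. The paper's
# exact branch architecture is not specified here; this is generic.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Identity in the forward pass, negated (scaled) gradient backward.
        return -ctx.lam * grad_out, None

class DomainHead(nn.Module):
    def __init__(self, dim=256, lam=1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2))   # real vs. synthetic

    def forward(self, feats):
        return self.net(GradReverse.apply(feats, self.lam))

head = DomainHead()
logits = head(torch.randn(8, 256))  # domain logits for a feature batch
```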

17 pages, 1486 KiB  
Article
Use of Instagram as an Educational Strategy for Learning Animal Reproduction
by Carlos C. Pérez-Marín
Vet. Sci. 2025, 12(8), 698; https://doi.org/10.3390/vetsci12080698 - 25 Jul 2025
Viewed by 275
Abstract
The present study explores the use of Instagram as an innovative strategy in the teaching–learning process in the context of animal reproduction topics. In the current era, with digital technology and social media transforming how information is accessed and consumed, it is essential for teachers to adapt and harness the potential of these tools for educational purposes. This article delves into the need for teachers to stay updated with current trends and the importance of promoting digital competences among teachers. This research aims to provide insights into the benefits of integrating social media into the educational landscape. Students of Veterinary Science degrees, Master’s degrees in Equine Sport Medicine, and vocational education and training (VET) programmes were involved in this study. An Instagram account named “UCOREPRO” was created for educational use, and it was openly available to all users. Instagram usage metrics were consistently tracked. A voluntary survey comprising 35 questions was conducted to collect feedback regarding the educational use of smartphone technology, social media habits, and the UCOREPRO Instagram account. The integration of Instagram as an educational tool was positively received by veterinary students. Survey data revealed that 92.3% of respondents found the content engaging, with 79.5% reporting improved understanding of the subject and 71.8% acquiring new knowledge. Students suggested improvements such as more frequent posting and the inclusion of academic incentives. Concerns about privacy and digital distraction were present but did not outweigh the perceived benefits. The use of short videos and microlearning strategies proved particularly effective in capturing students’ attention. Overall, Instagram was found to be a promising platform to enhance motivation, engagement, and informal learning in veterinary education, provided that thoughtful integration and clear educational objectives are maintained.

21 pages, 1231 KiB  
Article
Emotional Responses to Bed Bug Encounters: Effects of Sex, Proximity, and Educational Intervention on Fear and Disgust Perceptions
by Corraine A. McNeill and Rose H. Danek
Insects 2025, 16(8), 759; https://doi.org/10.3390/insects16080759 - 24 Jul 2025
Viewed by 490
Abstract
This study investigated individuals’ emotional responses to bed bugs and how these were influenced by sex, proximity, and educational intervention. Using a pre-post experimental design, participants (n = 157) completed emotional assessments before and after viewing an educational video about bed bugs. Contrary to our initial hypothesis that only fear and disgust would be observed, participants also exhibited high levels of anxiety and anger. Following the educational intervention, disgust, fear, and anger toward bed bugs increased significantly. Participants experienced greater disgust and fear when imagining encounters with bed bugs in closer proximity, with home infestations eliciting stronger responses than workplace scenarios. The educational video reduced disgust toward bed bugs in the home but increased fear of them in public spaces, potentially promoting vigilance that could limit bed bug spread. Females reported higher levels of disgust and fear than males across all proximity conditions, supporting evolutionary theories regarding sex-specific disgust sensitivity. The educational video successfully increased participants’ knowledge about bed bugs while simultaneously shifting emotional responses from contamination-based disgust to threat-specific fear. These findings suggest that educational interventions can effectively modify emotional responses to bed bugs, potentially leading to more rational management behaviors by transforming vague anxiety into actionable awareness of specific threats.
(This article belongs to the Collection Cultural Entomology: Our Love-hate Relationship with Insects)

27 pages, 705 KiB  
Article
A Novel Wavelet Transform and Deep Learning-Based Algorithm for Low-Latency Internet Traffic Classification
by Ramazan Enisoglu and Veselin Rakocevic
Algorithms 2025, 18(8), 457; https://doi.org/10.3390/a18080457 - 23 Jul 2025
Viewed by 326
Abstract
Accurate and real-time classification of low-latency Internet traffic is critical for applications such as video conferencing, online gaming, financial trading, and autonomous systems, where millisecond-level delays can degrade user experience. Existing methods for low-latency traffic classification, reliant on raw temporal features or static statistical analyses, fail to capture the dynamic frequency patterns inherent to real-time applications. These limitations hinder accurate resource allocation in heterogeneous networks. This paper proposes a novel framework integrating the wavelet transform (WT) and artificial neural networks (ANNs) to address this gap. Unlike prior works, we systematically apply the WT to commonly used temporal features—such as throughput, slope, ratio, and moving averages—transforming them into frequency-domain representations. This approach reveals hidden multi-scale patterns in low-latency traffic, akin to structured noise in signal processing, which traditional time-domain analyses often overlook. These wavelet-enhanced features train a multilayer perceptron (MLP) ANN, enabling dual-domain (time–frequency) analysis. We evaluate our approach on a dataset comprising FTP, video streaming, and low-latency traffic, including mixed scenarios with up to four concurrent traffic types. Experiments demonstrate 99.56% accuracy in distinguishing low-latency traffic (e.g., video conferencing) from FTP and streaming, outperforming k-NN, CNNs, and LSTMs; in mixed-traffic scenarios, the model achieves 74.2–92.8% accuracy. Notably, the method eliminates reliance on deep packet inspection (DPI), offering ISPs a privacy-preserving and scalable solution for prioritizing time-sensitive flows. By bridging signal processing and deep learning, this work advances efficient bandwidth allocation and improves quality of service in heterogeneous network environments.
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
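
A minimal sketch of the WT-plus-ANN pipeline, assuming PyWavelets and scikit-learn: temporal features are decomposed into wavelet sub-bands, per-band energies form the feature vector, and an MLP classifies the flow. The wavelet family, decomposition level, and toy data are assumptions.

```python
# Sketch: wavelet-domain features from a throughput time series feeding an
# MLP classifier, mirroring the WT+ANN pipeline described above. Wavelet
# family, decomposition level, and the toy data are all assumptions.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_features(series, wavelet="db4", level=4):
    coeffs = pywt.wavedec(series, wavelet, level=level)
    # Energy per sub-band summarizes the multi-scale frequency content.
    return np.array([np.sum(c ** 2) for c in coeffs])

rng = np.random.default_rng(0)
# Toy dataset: class 0 = smooth FTP-like throughput, class 1 = bursty.
X = np.stack([wavelet_features(rng.normal(10, s, 256))
              for s in (1, 1, 5, 5) for _ in range(50)])
y = np.repeat([0, 0, 1, 1], 50)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                    random_state=0)
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
```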

21 pages, 1115 KiB  
Article
Non-Contact Oxygen Saturation Estimation Using Deep Learning Ensemble Models and Bayesian Optimization
by Andrés Escobedo-Gordillo, Jorge Brieva and Ernesto Moya-Albor
Technologies 2025, 13(7), 309; https://doi.org/10.3390/technologies13070309 - 19 Jul 2025
Viewed by 373
Abstract
Monitoring Peripheral Oxygen Saturation (SpO2) is an important vital sign in Intensive Care Units (ICUs), during surgery and convalescence, and as part of remote medical consultations after the COVID-19 pandemic. This has made the development of new SpO2-measurement tools an area of active research and opportunity. In this paper, we present a new Deep Learning (DL) combined strategy to estimate SpO2 without contact, using pre-magnified facial videos to reveal subtle color changes related to blood flow, with no per-subject calibration required. We applied the Eulerian Video Magnification technique using the Hermite Transform (EVM-HT) as a feature detector to feed a Three-Dimensional Convolutional Neural Network (3D-CNN). Additionally, Bayesian optimization of parameters and hyperparameters and an ensemble technique were applied over the magnified dataset. We tested the method on 18 healthy subjects, acquiring facial videos together with the automatically detected reference from a contact pulse oximeter. As performance metrics for the SpO2-estimation proposal, we calculated the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and other parameters from the Bland–Altman (BA) analysis with respect to the reference. A significant improvement was observed by adding the ensemble technique with respect to optimization alone: a 14.32% reduction in RMSE (from 0.6204 to 0.5315) and a 13.23% reduction in MAE (from 0.4323 to 0.3751). Regarding the Bland–Altman analysis, the upper and lower limits of agreement for the Mean of Differences (MOD) between the estimation and the ground truth were 1.04 and −1.05, with an MOD (bias) of −0.00175; therefore, MOD ± 1.96σ = −0.00175 ± 1.04. Thus, by leveraging Bayesian optimization for hyperparameter tuning and integrating a Bagging Ensemble, we achieved a significant reduction in the training error (bias), better generalization over the test set, and reduced variance in comparison with the baseline model for SpO2 estimation.
(This article belongs to the Section Assistive Technologies)
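
The core Eulerian magnification step can be illustrated with a simple FFT-based temporal bandpass on a facial color trace. The paper uses a Hermite-transform pyramid (EVM-HT); the sketch below shows only the generic idea, and the frame rate, pulse band, and amplification factor are assumptions.

```python
# Sketch: the generic Eulerian magnification step -- temporally bandpass a
# facial color signal around heart-rate frequencies and amplify it. The
# paper uses a Hermite-transform pyramid (EVM-HT); this FFT version is only
# the underlying idea, and fps/band/alpha values are assumptions.
import numpy as np

def magnify_temporal(signal, fps=30.0, low=0.7, high=3.0, alpha=20.0):
    """signal: (T,) mean green-channel intensity of a facial region."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spec = np.fft.rfft(signal - signal.mean())
    band = (freqs >= low) & (freqs <= high)      # ~42-180 bpm pulse band
    spec[~band] = 0.0
    pulse = np.fft.irfft(spec, n=len(signal))
    return signal + alpha * pulse                # amplified color variation

t = np.arange(300) / 30.0                        # 10 s at 30 fps
trace = 100 + 0.05 * np.sin(2 * np.pi * 1.2 * t) \
        + np.random.normal(0, 0.2, 300)          # faint 72 bpm pulse + noise
magnified = magnify_temporal(trace)
```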

16 pages, 944 KiB  
Article
Artificial Intelligence in the Oil and Gas Industry: Applications, Challenges, and Future Directions
by Marcelo dos Santos Póvoas, Jéssica Freire Moreira, Severino Virgínio Martins Neto, Carlos Antonio da Silva Carvalho, Bruno Santos Cezario, André Luís Azevedo Guedes and Gilson Brito Alves Lima
Appl. Sci. 2025, 15(14), 7918; https://doi.org/10.3390/app15147918 - 16 Jul 2025
Viewed by 1129
Abstract
This study aims to provide a comprehensive overview of the application of artificial intelligence (AI) methods to solve real-world problems in the oil and gas sector. The methodology involved a two-step process for analyzing AI applications. In the first step, an initial exploration of scientific articles in the Scopus database was conducted using keywords related to AI and computational intelligence, resulting in a total of 11,296 articles. The bibliometric analysis conducted using VOSviewer version 1.6.15 revealed an average annual growth of approximately 15% in the number of publications related to AI in the sector between 2015 and 2024, indicating the growing importance of this technology. In the second step, the research focused on the OnePetro database, widely used by the oil industry, selecting articles with terms associated with production and drilling, such as “production system”, “hydrate formation”, “machine learning”, “real-time”, and “neural network”. The results highlight the transformative impact of AI on production operations, with key applications including optimizing operations through real-time data analysis, predictive maintenance to anticipate failures, advanced reservoir management through improved modeling, image and video analysis for continuous equipment monitoring, and enhanced safety through immediate risk detection. The bibliometric analysis identified a significant concentration of publications at Society of Petroleum Engineers (SPE) events, which accounted for approximately 40% of the selected articles. Overall, the integration of AI into production operations has driven significant improvements in efficiency and safety, and its continued evolution is expected to further advance industry practices and address emerging challenges.

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 458
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has gained increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as skeleton landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score while maintaining high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
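
A minimal sketch of the per-modality self-attention blocks with late fusion described above; the feature dimensions, temporal pooling, and fusion head are assumptions, and the Big Five traits are framed as five binary logits.

```python
# Sketch: one self-attention block per modality with late fusion, echoing
# the architecture described above. Feature dimensions, mean pooling, and
# the fusion head are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                    # x: (B, T, dim) sequence
        a, _ = self.attn(x, x, x)            # self-attention over time
        return self.norm(x + a).mean(dim=1)  # temporal pooling -> (B, dim)

class LateFusionAPR(nn.Module):
    def __init__(self, dims=None):
        super().__init__()
        dims = dims or {"audio": 768, "video": 128, "text": 768}
        self.blocks = nn.ModuleDict({m: ModalityBlock(d)
                                     for m, d in dims.items()})
        self.head = nn.Linear(sum(dims.values()), 5)  # Big Five logits

    def forward(self, inputs):               # dict of (B, T, dim) tensors
        pooled = [self.blocks[m](inputs[m]) for m in self.blocks]
        return self.head(torch.cat(pooled, dim=-1))

model = LateFusionAPR()
out = model({"audio": torch.randn(2, 100, 768),
             "video": torch.randn(2, 100, 128),
             "text": torch.randn(2, 60, 768)})   # (2, 5)
```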

21 pages, 7297 KiB  
Article
FGS-YOLOv8s-seg: A Lightweight and Efficient Instance Segmentation Model for Detecting Tomato Maturity Levels in Greenhouse Environments
by Dongfang Song, Ping Liu, Yanjun Zhu, Tianyuan Li and Kun Zhang
Agronomy 2025, 15(7), 1687; https://doi.org/10.3390/agronomy15071687 - 12 Jul 2025
Viewed by 383
Abstract
In a greenhouse environment, the application of artificial intelligence technology for selective tomato harvesting still faces numerous challenges, including varying lighting, background interference, and indistinct fruit surface features. This study proposes an improved instance segmentation model called FGS-YOLOv8s-seg, which achieves accurate detection and maturity grading of tomatoes in greenhouse environments. The model incorporates a novel SegNext_Attention mechanism at the end of the backbone, while simultaneously replacing Bottleneck structures in the neck layer with FasterNet blocks and integrating Gaussian Context Transformer modules to form a lightweight C2f_FasterNet_GCT structure. Experiments show that this model performs significantly better than mainstream segmentation models in core indicators such as precision (86.9%), recall (76.3%), average precision (mAP@0.5, 84.8%), F1-score (81.3%), and computational cost (35.6 GFLOPs). Compared with the YOLOv8s-seg baseline model, these metrics show improvements of 2.6%, 3.8%, 5.1%, and 3.3%, respectively, while computation is reduced by 6.8 GFLOPs. Ablation experiments demonstrate that the improved architecture contributes significantly to the performance gains, with the combined improvements yielding optimal results. Analysis of detection videos under different cultivation patterns demonstrates the generalizability of the improved model in complex environments, achieving an optimal balance between detection accuracy (86.9%) and inference speed (53.2 fps). This study provides a reliable technical solution for the selective harvesting of greenhouse tomatoes.
(This article belongs to the Section Precision and Digital Agriculture)
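
For orientation, this is how a YOLOv8-seg style model is typically run with the ultralytics API. The fine-tuned checkpoint name and the input image are hypothetical placeholders; the paper's FGS-YOLOv8s-seg weights are not assumed to be public.

```python
# Sketch: running a YOLOv8-seg style model with the ultralytics API. The
# checkpoint "fgs_yolov8s_seg.pt" is a hypothetical fine-tuned weights file
# and "greenhouse.jpg" is a placeholder image.
from ultralytics import YOLO

model = YOLO("fgs_yolov8s_seg.pt")            # custom-trained checkpoint
results = model.predict("greenhouse.jpg", conf=0.5)

for r in results:
    for box, cls in zip(r.boxes.xyxy, r.boxes.cls):
        name = r.names[int(cls)]              # e.g., a maturity grade
        print(name, [round(v) for v in box.tolist()])
    if r.masks is not None:
        print("instance masks:", r.masks.data.shape)  # (N, H, W)
```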

12 pages, 4368 KiB  
Article
A Dual-Branch Fusion Model for Deepfake Detection Using Video Frames and Microexpression Features
by Georgios Petmezas, Vazgken Vanian, Manuel Pastor Rufete, Eleana E. I. Almaloglou and Dimitris Zarpalas
J. Imaging 2025, 11(7), 231; https://doi.org/10.3390/jimaging11070231 - 11 Jul 2025
Viewed by 458
Abstract
Deepfake detection has become a critical issue due to the rise of synthetic media and its potential for misuse. In this paper, we propose a novel approach to deepfake detection by combining video frame analysis with facial microexpression features. The dual-branch fusion model utilizes a 3D ResNet18 for spatiotemporal feature extraction and a transformer model to capture microexpression patterns, which are difficult to replicate in manipulated content. We evaluate the model on the widely used FaceForensics++ (FF++) dataset and demonstrate that our approach outperforms existing state-of-the-art methods, achieving 99.81% accuracy and a perfect ROC-AUC score of 100%. The proposed method highlights the importance of integrating diverse data sources for deepfake detection, addressing some of the current limitations of existing systems.
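
A minimal sketch of the dual-branch idea under stated assumptions: torchvision's r3d_18 supplies the spatiotemporal clip embedding, and a small transformer encoder summarizes per-frame microexpression features, with a linear head over the concatenation. The microexpression feature dimension and fusion head are assumptions, and the feature extraction itself is not shown.

```python
# Sketch: dual-branch deepfake detector -- a 3D ResNet18 over frame clips
# plus a transformer encoder over per-frame microexpression features, fused
# into one real/fake logit. Dimensions and the fusion head are assumptions.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class DualBranchDeepfake(nn.Module):
    def __init__(self, micro_dim=128):
        super().__init__()
        self.video = r3d_18(weights=None)
        self.video.fc = nn.Identity()          # expose 512-d clip embedding
        layer = nn.TransformerEncoderLayer(d_model=micro_dim, nhead=4,
                                           batch_first=True)
        self.micro = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(512 + micro_dim, 1)

    def forward(self, clip, micro):
        # clip: (B, 3, T, H, W); micro: (B, T, micro_dim)
        v = self.video(clip)                   # (B, 512)
        m = self.micro(micro).mean(dim=1)      # (B, micro_dim)
        return self.head(torch.cat([v, m], dim=-1))  # real/fake logit

model = DualBranchDeepfake()
logit = model(torch.randn(2, 3, 16, 112, 112), torch.randn(2, 16, 128))
```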
