Search Results (1,212)

Search Parameters:
Keywords = video prediction

23 pages, 3055 KiB  
Article
A Markerless Approach for Full-Body Biomechanics of Horses
by Sarah K. Shaffer, Omar Medjaouri, Brian Swenson, Travis Eliason and Daniel P. Nicolella
Animals 2025, 15(15), 2281; https://doi.org/10.3390/ani15152281 - 5 Aug 2025
Abstract
The ability to quantify equine kinematics is essential for clinical evaluation, research, and performance feedback. However, current methods are challenging to implement. This study presents a motion capture methodology for horses, where three-dimensional, full-body kinematics are calculated without instrumentation on the animal, offering a more scalable and labor-efficient approach when compared with traditional techniques. Kinematic trajectories are calculated from multi-camera video data. First, a neural network identifies skeletal landmarks (markers) in each camera view and the 3D location of each marker is triangulated. An equine biomechanics model is scaled to match the subject’s shape, using segment lengths defined by markers. Finally, inverse kinematics (IK) produces full kinematic trajectories. We test this methodology on a horse at three gaits. Multiple neural networks (NNs), trained on different equine datasets, were evaluated. All networks predicted over 78% of the markers within 25% of the length of the radius bone on test data. Root-mean-square error (RMSE) between joint angles predicted via IK using ground-truth marker-based motion capture data and network-predicted data was less than 10 degrees for 25 to 32 of 35 degrees of freedom, depending on the gait and data used for network training. NNs trained over a larger variety of data improved joint angle RMSE and curve similarity. Marker prediction error, the average distance between ground-truth and predicted marker locations, and IK marker error, the distance between experimental and model markers, were used to assess network, scaling, and registration errors. The results demonstrate the potential of markerless motion capture for full-body equine kinematic analysis. Full article
(This article belongs to the Special Issue Advances in Equine Sports Medicine, Therapy and Rehabilitation)
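For readers unfamiliar with the geometry behind such pipelines, the triangulation step (recovering a 3D marker from its 2D detections in multiple calibrated views) reduces to a small linear least-squares problem. The sketch below is a generic direct linear transform (DLT), not the authors' implementation; the projection matrices and pixel detections are assumed inputs.

```python
import numpy as np

def triangulate_marker(proj_mats, points_2d):
    """Triangulate one 3D marker from two or more calibrated views (DLT).

    proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of (u, v) pixel detections, one per view.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize to (x, y, z)
```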

16 pages, 353 KiB  
Article
Surgical Assessment and Post-Operative Complications Following Video-Assisted Thoracoscopic Surgery (VATS) of Horses with Severe Equine Pasture Asthma During Asthma Exacerbation and Remission
by Caitlin J. Wenzel, Cathleen A. Mochal-King, Alison L. Eddy, Jacquelyn E. Bowser, Robert W. Wills, W. Isaac Jumper, Andrew Claude and Cyprianna E. Swiderski
Animals 2025, 15(15), 2276; https://doi.org/10.3390/ani15152276 - 4 Aug 2025
Abstract
The aim of this retrospective clinical study was to assess surgical duration and surgical and post-operative complications associated with Video-Assisted Thoracoscopic Surgery (VATS) and lung biopsy in horses with severe Equine Pasture Asthma (EPA) and paired control horses. Twelve horses (6 EPA-affected, 6 control) were sex-, age-, and breed-matched. Twenty-four thoracic surgeries were performed. Surgery of each matched pair (EPA-affected and healthy) was performed during asthma exacerbation (summer) and remission (winter). Surgical times were shorter with uncomplicated thoracoscopy (85 min) and significantly longer (p < 0.001) when intra-operative complications necessitated conversion to thoracotomy (156 min). The overall surgical time of EPA-affected horses during asthma exacerbation was significantly longer than that of control horses at any time point, with a predicted mean difference of 78 min (p < 0.05). When comparing EPA-affected horses to themselves during asthma exacerbation and remission, surgical times were significantly longer (p < 0.01) with a predicted mean difference of 98 min; this effect of seasonality did not occur amongst control horses. Intra-operative surgical complications (6/24) were evenly divided between EPA and control horses; however, only severe EPA horses in exacerbation were noted to have lung hyperinflation. Post-operative complications (fever, colic, hemothorax, pneumothorax, subcutaneous emphysema, surgical site infection, and/or laminitis) occurred in 13/24 surgical procedures (54%). No fatalities resulted from these procedures. Full article
(This article belongs to the Special Issue Surgical Procedures and Postoperative Complications in Animals)

21 pages, 3755 KiB  
Article
Thermal and Expansion Analysis of the Lebanese Flatbread Baking Process Using a High-Temperature Tunnel Oven
by Yves Mansour, Pierre Rahmé, Nemr El Hajj and Olivier Rouaud
Appl. Sci. 2025, 15(15), 8611; https://doi.org/10.3390/app15158611 - 4 Aug 2025
Abstract
This study investigates the thermal dynamics and material behavior involved in the baking process for Lebanese flatbread, focusing on the heat transfer mechanisms, water loss, and dough expansion under high-temperature conditions. Despite previous studies on flatbread baking using impingement or conventional ovens, this work presents the first experimental investigation of the traditional Lebanese flatbread baking process under realistic industrial conditions, specifically using a high-temperature tunnel oven with direct flame heating, extremely short baking times (~10–12 s), and peak temperatures reaching ~650 °C, which are essential to achieving the characteristic pocket formation and texture of Lebanese bread. This experimental study characterizes the baking kinetics of traditional Lebanese flatbread, recording mass loss pre- and post-baking, thermal profiles, and dough expansion through real-time temperature measurements and video recordings, providing insights into the dough’s thermal response and expansion behavior under high-temperature conditions. A custom-designed instrumented oven with a steel conveyor and a direct flame burner was employed. The dough, prepared following a traditional recipe, was analyzed during the baking process using K-type thermocouples and visual monitoring. Results revealed that Lebanese bread undergoes significant water loss due to high baking temperatures (~650 °C), leading to rapid crust formation and pocket development. Empirical equations modeling the relationship between baking time, temperature, and expansion were developed with high predictive accuracy. Additionally, an energy analysis revealed that the total energy required to bake Lebanese bread is approximately 667 kJ/kg, with an overall thermal efficiency of only 21%, dropping to 16% when preheating is included. According to previous CFD (Computational Fluid Dynamics) simulations, most heat loss in similar tunnel ovens occurs via the chimney (50%) and oven walls (29%). These findings contribute to understanding the broader thermophysical principles that can be applied to the development of more efficient baking processes for various types of bread. The empirical models developed in this study can be applied to automating and refining the industrial production of Lebanese flatbread, ensuring consistent product quality across different baking environments. Future studies will extend this work to alternative oven designs and dough formulations. Full article
(This article belongs to the Special Issue Chemical and Physical Properties in Food Processing: Second Edition)
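As a quick sanity check on the reported efficiency, the ratio of the energy absorbed by the bread to the burner input can be computed directly. The 667 kJ/kg figure comes from the abstract; the fuel input per kilogram below is a hypothetical value chosen only to reproduce the stated 21%.

```python
# Reported: ~667 kJ/kg absorbed by the bread, ~21% overall thermal efficiency.
useful_energy_kj_per_kg = 667.0    # from the study
fuel_input_kj_per_kg = 3176.0      # hypothetical burner input implied by 21%

efficiency = useful_energy_kj_per_kg / fuel_input_kj_per_kg
print(f"thermal efficiency = {efficiency:.0%}")  # prints 21%
```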

12 pages, 480 KiB  
Article
A Novel Deep Learning Model for Predicting Colorectal Anastomotic Leakage: A Pioneer Multicenter Transatlantic Study
by Miguel Mascarenhas, Francisco Mendes, Filipa Fonseca, Eduardo Carvalho, Andre Santos, Daniela Cavadas, Guilherme Barbosa, Antonio Pinto da Costa, Miguel Martins, Abdullah Bunaiyan, Maísa Vasconcelos, Marley Ribeiro Feitosa, Shay Willoughby, Shakil Ahmed, Muhammad Ahsan Javed, Nilza Ramião, Guilherme Macedo and Manuel Limbert
J. Clin. Med. 2025, 14(15), 5462; https://doi.org/10.3390/jcm14155462 - 3 Aug 2025
Abstract
Background/Objectives: Colorectal anastomotic leak (CAL) is one of the most severe postoperative complications in colorectal surgery, impacting patient morbidity and mortality. Current risk assessment methods rely on clinical and intraoperative factors, but no real-time predictive tool exists. This study aimed to develop an artificial intelligence model based on intraoperative laparoscopic recording of the anastomosis for CAL prediction. Methods: A convolutional neural network (CNN) was trained with annotated frames from colorectal surgery videos across three international high-volume centers (Instituto Português de Oncologia de Lisboa, Hospital das Clínicas de Ribeirão Preto, and Royal Liverpool University Hospital). The dataset included a total of 5356 frames from 26 patients, 2007 with CAL and 3349 showing normal anastomosis. Four CNN architectures (EfficientNetB0, EfficientNetB7, ResNet50, and MobileNetV2) were tested. The models’ performance was evaluated using their sensitivity, specificity, accuracy, and area under the receiver operating characteristic (AUROC) curve. Heatmaps were generated to identify key image regions influencing predictions. Results: The best-performing model achieved an accuracy of 99.6%, AUROC of 99.6%, sensitivity of 99.2%, specificity of 100.0%, PPV of 100.0%, and NPV of 98.9%. The model reliably identified CAL-positive frames and provided visual explanations through heatmaps. Conclusions: To our knowledge, this is the first AI model developed to predict CAL using intraoperative video analysis. Its accuracy suggests the potential to redefine surgical decision-making by providing real-time risk assessment. Further refinement with a larger dataset and diverse surgical techniques could enable intraoperative interventions to prevent CAL before it occurs, marking a paradigm shift in colorectal surgery. Full article
(This article belongs to the Special Issue Updates in Digestive Diseases and Endoscopy)
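A minimal sketch of this kind of frame classifier, an ImageNet-pretrained EfficientNetB0 backbone with a binary head for CAL vs. normal anastomosis, is shown below in Keras. Input size, dropout, and learning rate are illustrative assumptions, not the study's settings.

```python
import tensorflow as tf

# EfficientNetB0 backbone, frozen at first; only the head is trained.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(CAL) per frame
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auroc"), "accuracy"],
)
```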

18 pages, 1127 KiB  
Article
Deep Reinforcement Learning Method for Wireless Video Transmission Based on Large Deviations
by Yongxiao Xie and Shian Song
Mathematics 2025, 13(15), 2434; https://doi.org/10.3390/math13152434 - 28 Jul 2025
Abstract
In scalable video transmission research, the transmission process is commonly modeled as a Markov decision process, and deep reinforcement learning (DRL) methods are employed to optimize the wireless transmission of scalable videos. An adaptive DRL algorithm can also address the energy shortages caused by uncertainty in energy capture and accumulated storage, thereby reducing video interruptions and enhancing user experience. To further optimize resources in wireless energy transmission and to balance exploration and exploitation in the DRL algorithm, this paper develops an adaptive DRL algorithm that extends classical DRL frameworks by integrating dropout techniques into both the training and prediction processes. Moreover, to address runs of consecutive negative rewards, which are often attributable to incomplete training in wireless video transmission DRL, this paper applies the Cramér large-deviation principle to discriminate such runs: it identifies the optimal negative-reward frequency boundary and minimizes the probability of misjudging consecutive negative rewards. Finally, experimental validation is performed in a 2048-game environment that simulates wireless scalable video transmission conditions. The results demonstrate that the proposed adaptive DRL algorithm achieves faster convergence and higher cumulative rewards than classical DRL approaches. Full article
(This article belongs to the Special Issue Optimization Theory, Method and Application, 2nd Edition)
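One way to read the large-deviations step: if a well-trained policy produces a negative reward with probability p per step, the chance that the empirical negative-reward frequency over n steps reaches a boundary q > p decays exponentially at the Cramér rate, the Bernoulli KL divergence. The sketch below computes that Chernoff bound with hypothetical numbers; it does not reproduce the paper's boundary selection.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence KL(Bernoulli(q) || Bernoulli(p))."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def negative_run_bound(n, q, p):
    """Chernoff/Cramér upper bound on P(empirical negative-reward
    frequency >= q) over n i.i.d. steps with true probability p < q."""
    return math.exp(-n * kl_bernoulli(q, p))

# Hypothetical numbers: a policy that yields negative rewards 10% of the
# time would reach a 30% empirical rate over 100 steps with probability
# below ~2e-7, so such a run signals incomplete training rather than luck.
print(negative_run_bound(n=100, q=0.30, p=0.10))
```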

23 pages, 3864 KiB  
Article
Seeing Is Craving: Neural Dynamics of Appetitive Processing During Food-Cue Video Watching and Its Impact on Obesity
by Jinfeng Han, Kaixiang Zhuang, Debo Dong, Shaorui Wang, Feng Zhou, Yan Jiang and Hong Chen
Nutrients 2025, 17(15), 2449; https://doi.org/10.3390/nu17152449 - 27 Jul 2025
Abstract
Background/Objectives: Digital food-related videos significantly influence cravings, appetite, and weight outcomes; however, the dynamic neural mechanisms underlying appetite fluctuations during naturalistic viewing remain unclear. This study aimed to identify neural activity patterns associated with moment-to-moment appetite changes during naturalistic food-cue video viewing and to examine their relationships with cravings and weight-related outcomes. Methods: Functional magnetic resonance imaging (fMRI) data were collected from 58 healthy female participants as they viewed naturalistic food-cue videos. Participants concurrently provided continuous ratings of their appetite levels throughout video viewing. Hidden Markov Modeling (HMM), combined with machine learning regression techniques, was employed to identify distinct neural states reflecting dynamic appetite fluctuations. Findings were independently validated using a shorter-duration food-cue video viewing task. Results: Distinct neural states characterized by heightened activation in default mode and frontoparietal networks consistently corresponded with increases in appetite ratings. Importantly, the higher expression of these appetite-related neural states correlated positively with participants’ Body Mass Index (BMI) and post-viewing food cravings. Furthermore, these neural states mediated the relationship between BMI and food craving levels. Longitudinal analyses revealed that the expression levels of appetite-related neural states predicted participants’ BMI trajectories over a subsequent six-month period. Participants experiencing BMI increases exhibited a significantly greater expression of these neural states compared to those whose BMI remained stable. Conclusions: Our findings elucidate how digital food cues dynamically modulate neural processes associated with appetite. These neural markers may serve as early indicators of obesity risk, offering valuable insights into the psychological and neurobiological mechanisms linking everyday media exposure to food cravings and weight management. Full article
(This article belongs to the Section Nutrition and Obesity)
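The state-inference step can be illustrated with an off-the-shelf Gaussian HMM: fit it to a timepoints-by-regions activity matrix, decode a state label per timepoint, and summarize each state's occupancy. The library, shapes, state count, and random data below are placeholders, not the study's configuration (which also couples HMM states to machine-learning regression on appetite ratings).

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
activity = rng.standard_normal((600, 50))  # stand-in for parcellated fMRI

model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(activity)
states = model.predict(activity)           # one state label per timepoint

# A state's "expression" can be summarized as its occupancy rate, which the
# paper relates to appetite ratings and BMI.
occupancy = np.bincount(states, minlength=5) / len(states)
print(occupancy)
```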

21 pages, 1622 KiB  
Article
Enhancing Wearable Fall Detection System via Synthetic Data
by Minakshi Debnath, Sana Alamgeer, Md Shahriar Kabir and Anne H. Ngu
Sensors 2025, 25(15), 4639; https://doi.org/10.3390/s25154639 - 26 Jul 2025
Abstract
Deep learning models rely heavily on extensive training data, but obtaining sufficient real-world data remains a major challenge in clinical fields. To address this, we explore methods for generating realistic synthetic multivariate fall data to supplement limited real-world samples collected from three fall-related datasets: SmartFallMM, UniMib, and K-Fall. We apply three conventional time-series augmentation techniques, a Diffusion-based generative AI method, and a novel approach that extracts fall segments from public video footage of older adults. A key innovation of our work is the exploration of two distinct approaches: video-based pose estimation to extract fall segments from public footage, and Diffusion models to generate synthetic fall signals. Both methods independently enable the creation of highly realistic and diverse synthetic data tailored to specific sensor placements. To our knowledge, these approaches and especially their application in fall detection represent rarely explored directions in this research area. To assess the quality of the synthetic data, we use quantitative metrics, including the Fréchet Inception Distance (FID), Discriminative Score, Predictive Score, Jensen–Shannon Divergence (JSD), and Kolmogorov–Smirnov (KS) test, and visually inspect temporal patterns for structural realism. We observe that Diffusion-based synthesis produces the most realistic and distributionally aligned fall data. To further evaluate the impact of synthetic data, we train a long short-term memory (LSTM) model offline and test it in real time using the SmartFall App. Incorporating Diffusion-based synthetic data improves the offline F1-score by 7–10% and boosts real-time fall detection performance by 24%, confirming its value in enhancing model robustness and applicability in real-world settings. Full article
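The abstract does not name its three conventional time-series augmentations; jittering, amplitude scaling, and segment permutation below are standard choices for wearable accelerometer windows, offered as a plausible sketch rather than the paper's exact methods.

```python
import numpy as np

def jitter(x, sigma=0.05, rng=None):
    """Add Gaussian noise to a (timesteps, channels) accelerometer window."""
    rng = rng or np.random.default_rng()
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1, rng=None):
    """Randomly rescale each channel's amplitude."""
    rng = rng or np.random.default_rng()
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def permute_segments(x, n_segments=4, rng=None):
    """Shuffle contiguous segments in time."""
    rng = rng or np.random.default_rng()
    parts = np.array_split(x, n_segments)
    rng.shuffle(parts)
    return np.concatenate(parts)
```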

25 pages, 5652 KiB  
Article
Modeling and Optimization of the Vacuum Degassing Process in Electric Steelmaking Route
by Bikram Konar, Noah Quintana and Mukesh Sharma
Processes 2025, 13(8), 2368; https://doi.org/10.3390/pr13082368 - 25 Jul 2025
Abstract
Vacuum degassing (VD) is a critical refining step in electric arc furnace (EAF) steelmaking for producing clean steel with reduced nitrogen and hydrogen content. This study develops an Effective Equilibrium Reaction Zone (EERZ) model focused on denitrogenation (de-N) by simulating interfacial reactions at the bubble–steel interface (Z1). The model incorporates key process parameters such as argon flow rate, vacuum pressure, and initial nitrogen and sulfur concentrations. A robust empirical correlation was established between de-N efficiency and the mass of Z1, reducing prediction time from a day to under a minute. Additionally, the model was further improved by incorporating a dynamic surface exposure zone (Z_eye) to account for transient ladle eye effects on nitrogen removal under deep vacuum (<10 torr), validated using synchronized plant trials and Python-based video analysis. The integrated approach—combining thermodynamic-kinetic modeling, plant validation, and image-based diagnostics—provides a robust framework for optimizing VD control and enhancing nitrogen removal control in EAF-based steelmaking. Full article
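The abstract summarizes the EERZ model rather than giving its equations; as a heavily simplified stand-in, nitrogen removal toward a vacuum equilibrium can be sketched as a first-order rate law integrated in time. The rate constant and concentrations below are hypothetical; the actual model resolves zone masses, thermodynamics, and the dynamic ladle eye.

```python
def simulate_de_n(n0_ppm, n_eq_ppm, k_per_min, minutes, dt=0.1):
    """Toy first-order approach of dissolved nitrogen toward equilibrium.

    n0_ppm: initial nitrogen, n_eq_ppm: equilibrium under vacuum,
    k_per_min: lumped mass-transfer rate constant (hypothetical).
    """
    n, t = n0_ppm, 0.0
    while t < minutes:
        n -= k_per_min * (n - n_eq_ppm) * dt  # explicit Euler step
        t += dt
    return n

# Hypothetical numbers: 60 ppm start, 20 ppm equilibrium, 30 min treatment.
print(f"{simulate_de_n(60.0, 20.0, 0.05, 30):.1f} ppm N remaining")
```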

15 pages, 1943 KiB  
Article
Multimodal Latent Representation Learning for Video Moment Retrieval
by Jinkwon Hwang, Mingyu Jeon and Junyeong Kim
Sensors 2025, 25(14), 4528; https://doi.org/10.3390/s25144528 - 21 Jul 2025
Abstract
The rise of artificial intelligence (AI) has revolutionized the processing and analysis of video sensor data, driving advancements in areas such as surveillance, autonomous driving, and personalized content recommendations. However, leveraging video data presents unique challenges, particularly in the time-intensive feature extraction process required for model training. This challenge is intensified in research environments lacking advanced hardware resources like GPUs. We propose a new method called the multimodal latent representation learning framework (MLRL) to address these limitations. MLRL enhances the performance of downstream tasks by conducting additional representation learning on pre-extracted features. By integrating and augmenting multimodal data, our method effectively predicts latent representations, leveraging pre-extracted features to reduce model training time and improve task performance. We validate the efficacy of MLRL on the video moment retrieval task using the QVHighlight dataset, benchmarking against the QD-DETR model. Our results demonstrate significant improvements, highlighting the potential of MLRL to streamline video data processing by leveraging pre-extracted features to bypass the time-consuming extraction process of raw sensor data and enhance model accuracy in various sensor-based applications. Full article

21 pages, 9571 KiB  
Article
Performance Evaluation of Real-Time Image-Based Heat Release Rate Prediction Model Using Deep Learning and Image Processing Methods
by Joohyung Roh, Sehong Min and Minsuk Kong
Fire 2025, 8(7), 283; https://doi.org/10.3390/fire8070283 - 18 Jul 2025
Abstract
Heat release rate (HRR) is a key indicator for characterizing fire behavior, and it is conventionally measured under laboratory conditions. However, this measurement is limited in its widespread application to various fire conditions, due to its high cost, operational complexity, and lack of real-time predictive capability. Therefore, this study proposes an image-based HRR prediction model that uses deep learning and image processing techniques. The flame region in a fire video was segmented using the YOLO-YCbCr model, which integrates YCbCr color-space-based segmentation with YOLO object detection. For comparative analysis, the YOLO segmentation model was used. Furthermore, the fire diameter and flame height were determined from the spatial information of the segmented flame, and the HRR was predicted based on the correlation between flame size and HRR. The proposed models were applied to various experimental fire videos, and their prediction performances were quantitatively assessed. The results indicated that the proposed models accurately captured the HRR variations over time, and applying the average flame height calculation enhanced the prediction performance by reducing fluctuations in the predicted HRR. These findings demonstrate that the image-based HRR prediction model can be used to estimate real-time HRR values in diverse fire environments. Full article
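The standard way to relate mean flame height and fire diameter to HRR is Heskestad's correlation, L = 0.235 Q^(2/5) - 1.02 D (L and D in m, Q in kW), which inverts cleanly for Q. Whether the paper uses this exact correlation is an assumption; the sketch below shows the inversion.

```python
def hrr_from_flame(height_m, diameter_m):
    """Invert Heskestad's flame-height correlation for HRR (kW):
    L = 0.235 * Q**0.4 - 1.02 * D  =>  Q = ((L + 1.02 * D) / 0.235)**2.5
    """
    return ((height_m + 1.02 * diameter_m) / 0.235) ** 2.5

# Example: a 1.5 m mean flame over a 0.6 m pan fire.
print(f"{hrr_from_flame(1.5, 0.6):.0f} kW")  # roughly 240 kW
```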

22 pages, 4033 KiB  
Article
Masked Feature Residual Coding for Neural Video Compression
by Chajin Shin, Yonghwan Kim, KwangPyo Choi and Sangyoun Lee
Sensors 2025, 25(14), 4460; https://doi.org/10.3390/s25144460 - 17 Jul 2025
Abstract
In neural video compression, an approximation of the target frame is predicted, and a mask is subsequently applied to it. Then, the masked predicted frame is subtracted from the target frame and fed into the encoder along with the conditional information. However, this structure has two limitations. First, in the pixel domain, even if the mask is perfectly predicted, the residuals cannot be significantly reduced. Second, reconstructed features with abundant temporal context information cannot be used as references for compressing the next frame. To address these problems, we propose Conditional Masked Feature Residual (CMFR) Coding. We extract features from the target frame and the predicted features using neural networks. Then, we predict the mask and subtract the masked predicted features from the target features. Thereafter, the difference is fed into the encoder with the conditional information. Moreover, to more effectively remove conditional information from the target frame, we introduce a Scaled Feature Fusion (SFF) module. In addition, we introduce a Motion Refiner to enhance the quality of the decoded optical flow. Experimental results show that our model achieves an 11.76% bit saving over the model without the proposed methods, averaged over all HEVC test sequences, demonstrating the effectiveness of the proposed methods. Full article
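The core operation, subtracting a masked prediction from the target in feature space rather than pixel space, can be sketched in a few lines of PyTorch. The layer sizes and the mask head below are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class MaskedFeatureResidual(nn.Module):
    """Minimal sketch: code the difference between target features and a
    masked version of the predicted (temporal-context) features."""

    def __init__(self, channels=64):
        super().__init__()
        self.target_feat = nn.Conv2d(3, channels, 3, padding=1)
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, target_frame, predicted_feat):
        f_target = self.target_feat(target_frame)    # features of frame to code
        mask = self.mask_head(predicted_feat)        # where prediction is trusted
        residual = f_target - mask * predicted_feat  # masked feature residual
        return residual, mask                        # residual goes to the encoder
```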

16 pages, 944 KiB  
Article
Artificial Intelligence in the Oil and Gas Industry: Applications, Challenges, and Future Directions
by Marcelo dos Santos Póvoas, Jéssica Freire Moreira, Severino Virgínio Martins Neto, Carlos Antonio da Silva Carvalho, Bruno Santos Cezario, André Luís Azevedo Guedes and Gilson Brito Alves Lima
Appl. Sci. 2025, 15(14), 7918; https://doi.org/10.3390/app15147918 - 16 Jul 2025
Abstract
This study aims to provide a comprehensive overview of the application of artificial intelligence (AI) methods to solve real-world problems in the oil and gas sector. The methodology involved a two-step process for analyzing AI applications. In the first step, an initial exploration of scientific articles in the Scopus database was conducted using keywords related to AI and computational intelligence, resulting in a total of 11,296 articles. The bibliometric analysis conducted using VOS Viewer version 1.6.15 software revealed an average annual growth of approximately 15% in the number of publications related to AI in the sector between 2015 and 2024, indicating the growing importance of this technology. In the second step, the research focused on the OnePetro database, widely used by the oil industry, selecting articles with terms associated with production and drilling, such as “production system”, “hydrate formation”, “machine learning”, “real-time”, and “neural network”. The results highlight the transformative impact of AI on production operations, with key applications including optimizing operations through real-time data analysis, predictive maintenance to anticipate failures, advanced reservoir management through improved modeling, image and video analysis for continuous equipment monitoring, and enhanced safety through immediate risk detection. The bibliometric analysis identified a significant concentration of publications at Society of Petroleum Engineers (SPE) events, which accounted for approximately 40% of the selected articles. Overall, the integration of AI into production operations has driven significant improvements in efficiency and safety, and its continued evolution is expected to advance industry practices further and address emerging challenges. Full article

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
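The fusion pattern described (one Self-Attention block per modality, temporal summarization, then late fusion into a Big Five classifier) can be sketched compactly in PyTorch. Feature dimensions, head counts, and the linear classifier are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LateFusionAPR(nn.Module):
    def __init__(self, dims=None, n_traits=5):
        super().__init__()
        # Per-modality feature sizes are placeholders (e.g., Wav2Vec2/BERT = 768).
        dims = dims or {"audio": 768, "video": 132, "text": 768}
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            for m, d in dims.items()
        })
        self.head = nn.Linear(sum(dims.values()), n_traits)

    def forward(self, feats):  # feats[m]: (batch, time, dim), same keys as dims
        pooled = []
        for m, x in feats.items():
            y, _ = self.attn[m](x, x, x)   # per-modality self-attention
            pooled.append(y.mean(dim=1))   # summarize over time
        fused = torch.cat(pooled, dim=-1)  # late fusion by concatenation
        return self.head(fused)            # one logit per Big Five trait
```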

25 pages, 4232 KiB  
Article
Multimodal Fusion Image Stabilization Algorithm for Bio-Inspired Flapping-Wing Aircraft
by Zhikai Wang, Sen Wang, Yiwen Hu, Yangfan Zhou, Na Li and Xiaofeng Zhang
Biomimetics 2025, 10(7), 448; https://doi.org/10.3390/biomimetics10070448 - 7 Jul 2025
Abstract
This paper presents FWStab, a specialized video stabilization dataset tailored for flapping-wing platforms. The dataset encompasses five typical flight scenarios, featuring 48 video clips with intense dynamic jitter. The corresponding Inertial Measurement Unit (IMU) sensor data are synchronously collected, which jointly provide reliable support for multimodal modeling. Based on this, to address the issue of poor image acquisition quality due to severe vibrations in aerial vehicles, this paper proposes a multi-modal signal fusion video stabilization framework. This framework effectively integrates image features and inertial sensor features to predict smooth and stable camera poses. During the video stabilization process, the true camera motion originally estimated based on sensors is warped to the smooth trajectory predicted by the network, thereby optimizing the inter-frame stability. This approach maintains the global rigidity of scene motion, avoids visual artifacts caused by traditional dense optical flow-based spatiotemporal warping, and rectifies rolling shutter-induced distortions. Furthermore, the network is trained in an unsupervised manner by leveraging a joint loss function that integrates camera pose smoothness and optical flow residuals. When coupled with a multi-stage training strategy, this framework demonstrates remarkable stabilization adaptability across a wide range of scenarios. The entire framework employs Long Short-Term Memory (LSTM) to model the temporal characteristics of camera trajectories, enabling high-precision prediction of smooth trajectories. Full article
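The trajectory-smoothing core, an LSTM that consumes fused image/IMU motion features and regresses a smooth camera pose per frame, can be sketched as below. Feature and pose dimensions are illustrative assumptions; the warping, loss functions, and multi-stage training are not reproduced here.

```python
import torch
import torch.nn as nn

class SmoothPosePredictor(nn.Module):
    def __init__(self, feat_dim=128, pose_dim=6, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)  # e.g., 3 rotation + 3 translation

    def forward(self, motion_feats):       # (batch, frames, feat_dim)
        h, _ = self.lstm(motion_feats)
        return self.to_pose(h)             # smooth pose per frame
```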

21 pages, 4859 KiB  
Article
Improvement of SAM2 Algorithm Based on Kalman Filtering for Long-Term Video Object Segmentation
by Jun Yin, Fei Wu, Hao Su, Peng Huang and Yuetong Qixuan
Sensors 2025, 25(13), 4199; https://doi.org/10.3390/s25134199 - 5 Jul 2025
Abstract
The Segment Anything Model 2 (SAM2) has achieved state-of-the-art performance in pixel-level object segmentation for both static and dynamic visual content. Its streaming memory architecture maintains spatial context across video sequences, yet struggles with long-term tracking due to its static inference framework. SAM 2’s fixed temporal window approach indiscriminately retains historical frames, failing to account for frame quality or dynamic motion patterns. This leads to error propagation and tracking instability in challenging scenarios involving fast-moving objects, partial occlusions, or crowded environments. To overcome these limitations, this paper proposes SAM2Plus, a zero-shot enhancement framework that integrates Kalman filter prediction, dynamic quality thresholds, and adaptive memory management. The Kalman filter models object motion using physical constraints to predict trajectories and dynamically refine segmentation states, mitigating positional drift during occlusions or velocity changes. Dynamic thresholds, combined with multi-criteria evaluation metrics (e.g., motion coherence, appearance consistency), prioritize high-quality frames while adaptively balancing confidence scores and temporal smoothness. This reduces ambiguities among similar objects in complex scenes. SAM2Plus further employs an optimized memory system that prunes outdated or low-confidence entries and retains temporally coherent context, ensuring constant computational resources even for infinitely long videos. Extensive experiments on two video object segmentation (VOS) benchmarks demonstrate SAM2Plus’s superiority over SAM 2. It achieves an average improvement of 1.0 in J&F metrics across all 24 direct comparisons, with gains exceeding 2.3 points on SA-V and LVOS datasets for long-term tracking. The method delivers real-time performance and strong generalization without fine-tuning or additional parameters, effectively addressing occlusion recovery and viewpoint changes. By unifying motion-aware physics-based prediction with spatial segmentation, SAM2Plus bridges the gap between static and dynamic reasoning, offering a scalable solution for real-world applications such as autonomous driving and surveillance systems. Full article
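A constant-velocity Kalman filter over an object's center is a minimal sketch of the motion prior SAM2Plus adds; the real system also gates memory entries by quality and refines segmentation states. Noise levels and the state layout below are placeholders.

```python
import numpy as np

class CVKalman:
    """Constant-velocity Kalman filter over an object's (cx, cy) center."""

    def __init__(self, dt=1.0):
        self.x = np.zeros(4)                       # state: [cx, cy, vx, vy]
        self.P = np.eye(4) * 10.0
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt           # position += velocity * dt
        self.H = np.eye(2, 4)                      # we only observe position
        self.Q = np.eye(4) * 0.01                  # process noise (placeholder)
        self.R = np.eye(2) * 1.0                   # measurement noise (placeholder)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                          # predicted center, usable during occlusion

    def update(self, z):                           # z: measured (cx, cy)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```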
