Search Results (96)

Search Parameters:
Keywords = time perceptual prediction

28 pages, 3284 KB  
Article
Diffusion-Enhanced Underwater Debris Detection via Improved YOLOv12n Framework
by Jianghan Tao, Fan Zhao, Yijia Chen, Yongying Liu, Feng Xue, Jian Song, Hao Wu, Jundong Chen, Peiran Li and Nan Xu
Remote Sens. 2025, 17(23), 3910; https://doi.org/10.3390/rs17233910 - 2 Dec 2025
Viewed by 436
Abstract
Detecting underwater debris is important for monitoring the marine environment but remains challenging due to poor image quality, visual noise, object occlusions, and diverse debris appearances in underwater scenes. This study proposes UDD-YOLO, a novel detection framework that, for the first time, applies a diffusion-based model to underwater image enhancement, introducing a new paradigm for improving perceptual quality in marine vision tasks. Specifically, the proposed framework integrates three key components: (1) a Cold Diffusion module that acts as a pre-processing stage to restore image clarity and contrast by reversing deterministic degradation such as blur and occlusion—without injecting stochastic noise—making it the first diffusion-based enhancement applied to underwater object detection; (2) an AMC2f feature extraction module that combines multi-scale separable convolutions and learnable normalization to improve representation for targets with complex morphology and scale variation; and (3) a Unified-IoU (UIoU) loss function designed to dynamically balance localization learning between high- and low-quality predictions, thereby reducing errors caused by occlusion or boundary ambiguity. Extensive experiments are conducted on the public underwater plastic pollution detection dataset, which includes 15 categories of underwater debris. The proposed method achieves a mAP50 of 81.8%, with 87.3% precision and 75.1% recall, surpassing eleven advanced detection models such as Faster R-CNN, RT-DETR-L, YOLOv8n, and YOLOv12n. Ablation studies verify the function of every module. These findings show that diffusion-driven enhancement, when coupled with feature extraction and localization optimization, offers a promising direction for accurate, robust underwater perception, opening new opportunities for environmental monitoring and autonomous marine systems. Full article
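The Unified-IoU loss described above builds on the standard Intersection-over-Union between a predicted and a ground-truth box. A minimal sketch of plain IoU for axis-aligned boxes (not the paper's UIoU weighting, whose exact form is not given here):

```python
def iou(box_a, box_b):
    """Intersection-over-Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A perfect match scores 1.0; disjoint boxes score 0.0.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1x1 overlap over union 7 ≈ 0.143
```

Losses in this family penalize 1 − IoU (plus extra terms); UIoU, per the abstract, additionally reweights high- versus low-quality predictions.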

22 pages, 1821 KB  
Article
Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing
by Amirkia Rafiei Oskooei, Eren Caglar, Ibrahim Şahin, Ayse Kayabay and Mehmet S. Aktas
Appl. Sci. 2025, 15(23), 12691; https://doi.org/10.3390/app152312691 - 30 Nov 2025
Viewed by 426
Abstract
The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative latency of sequential model inference and the quadratic (O(N2)) computational complexity that renders multi-user video conferencing applications unscalable. This paper proposes and evaluates a practical system-level framework designed to mitigate these critical bottlenecks. The proposed architecture incorporates a turn-taking mechanism to reduce computational complexity from quadratic to linear in multi-user scenarios, and a segmented processing protocol to manage inference latency for a perceptually real-time experience. We implement a proof-of-concept pipeline and conduct a rigorous performance analysis across a multi-tiered hardware setup, including commodity (NVIDIA RTX 4060), cloud (NVIDIA T4), and enterprise (NVIDIA A100) GPUs. Our objective evaluation demonstrates that the system achieves real-time throughput (τ<1.0) on modern hardware. A subjective user study further validates the approach, showing that a predictable, initial processing delay is highly acceptable to users in exchange for a smooth, uninterrupted playback experience. The work presents a validated, end-to-end system design that offers a practical roadmap for deploying scalable, real-time generative AI applications in multilingual communication platforms. Full article
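The complexity reduction the abstract describes, quadratic all-pairs translation versus linear turn-taking, can be illustrated by counting active translation streams. The functions below are a toy model, not the paper's system:

```python
def streams_all_pairs(n):
    # Naive conferencing: every participant's speech is translated
    # separately for every other participant -> n * (n - 1) streams.
    return n * (n - 1)

def streams_turn_taking(n):
    # Turn-taking: only the single active speaker is translated,
    # once per listener -> n - 1 streams (linear in n).
    return n - 1

for n in (2, 4, 8, 16):
    print(n, streams_all_pairs(n), streams_turn_taking(n))
```

At 16 participants the naive scheme needs 240 concurrent streams versus 15 with turn-taking, which is why the quadratic variant is unscalable.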

14 pages, 1737 KB  
Article
Classification of Speech and Associated EEG Responses from Normal-Hearing and Cochlear Implant Talkers Using Support Vector Machines
by Shruthi Raghavendra, Sungmin Lee and Chin-Tuan Tan
Audiol. Res. 2025, 15(6), 158; https://doi.org/10.3390/audiolres15060158 - 18 Nov 2025
Viewed by 385
Abstract
Background/Objectives: Speech produced by individuals with hearing loss differs notably from that of normal-hearing (NH) individuals. Although cochlear implants (CIs) provide sufficient auditory input to support speech acquisition and control, there remains considerable variability in speech intelligibility among CI users. As a result, speech produced by CI talkers often exhibits distinct acoustic characteristics compared to that of NH individuals. Methods: Speech data were obtained from eight cochlear-implant (CI) and eight normal-hearing (NH) talkers, while electroencephalogram (EEG) responses were recorded from 11 NH listeners exposed to the same speech stimuli. Support Vector Machine (SVM) classifiers employing 3-fold cross-validation were evaluated using classification accuracy as the performance metric. This study evaluated the efficacy of Support Vector Machine (SVM) algorithms using four kernel functions (Linear, Polynomial, Gaussian, and Radial Basis Function) to classify speech produced by NH and CI talkers. Six acoustic features—Log Energy, Zero-Crossing Rate (ZCR), Pitch, Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Predictive Cepstral Coefficients (PLP-CC)—were extracted. These same features were also extracted from electroencephalogram (EEG) recordings of NH listeners who were exposed to the speech stimuli. The EEG analysis leveraged the assumption of quasi-stationarity over short time windows. Results: Classification of speech signals using SVMs yielded the highest accuracies of 100% and 94% for the Energy and MFCC features, respectively, using Gaussian and RBF kernels. EEG responses to speech achieved classification accuracies exceeding 70% for ZCR and Pitch features using the same kernels. Other features such as LPC and PLP-CC yielded moderate to low classification performance. 
Conclusions: The results indicate that both speech-derived and EEG-derived features can effectively differentiate between CI and NH talkers. Among the tested kernels, Gaussian and RBF provided superior performance, particularly when using Energy and MFCC features. These findings support the application of SVMs for multimodal classification in hearing research, with potential applications in improving CI speech processing and auditory rehabilitation. Full article
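A hedged sketch of the classification setup: an SVM with the kernels the abstract names, evaluated with 3-fold cross-validation. The data here are synthetic stand-ins for the speech/EEG features, and note that in scikit-learn the Gaussian and RBF kernels are the same `rbf` option:

```python
# Illustrative only: synthetic two-class "acoustic feature" data stands in
# for the paper's speech/EEG features; kernel names follow scikit-learn.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 6)),   # e.g. NH talkers
               rng.normal(1.5, 1.0, (60, 6))])  # e.g. CI talkers
y = np.array([0] * 60 + [1] * 60)

for kernel in ("linear", "poly", "rbf"):  # 'rbf' is the Gaussian kernel
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=3)
    print(kernel, round(scores.mean(), 3))
```

With well-separated clusters like these, all three kernels score highly; the paper's interesting result is how the kernels rank on real, noisier features.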
(This article belongs to the Section Hearing)

13 pages, 1410 KB  
Article
The Effect and Time Course of Prediction and Perceptual Load on Category-Based Attentional Orienting Across Color and Shape Dimensions
by Yunpeng Jiang, Tianyu Chen, Fangyuan Ou, Yun Wang, Ruixi Feng, Xia Wu and Lin Lin
Brain Sci. 2025, 15(11), 1210; https://doi.org/10.3390/brainsci15111210 - 9 Nov 2025
Viewed by 470
Abstract
Objectives: This study investigated the temporal dynamics of category-based attentional orienting (CAO) under the influences of prediction (top-down) and perceptual load (bottom-up) across color and shape dimensions, combining behavioral and event-related potential (ERP) measures. Methods: Across two experiments, we manipulated predictive validity and perceptual load during a visual search for category-defined targets. Results: The results revealed a critical dimension-specific effect of prediction: invalid predictions elicited a larger N2pc component (indexing attentional selection) for shape-defined targets, but not color-defined targets, indicating that shape CAO relies more heavily on predictive information during early processing. At the behavioral level, a combined analysis of the two experiments revealed an interaction between prediction and perceptual load on accuracy, suggesting their integration can occur at later stages. Conclusions: These findings demonstrate that prediction and perceptual load exhibit distinct temporal profiles, primarily independently modulating early attentional orienting, with their interactive effects on behavior being more nuanced and dimension-dependent. This study elucidates the distinct temporal and dimensional mechanisms through which top-down and bottom-up sources of uncertainty shape attentional orienting to categories. Full article
(This article belongs to the Section Neuropsychology)

15 pages, 1029 KB  
Article
Climate-Crisis Landscapes in VR: Effects on Distance and Time Estimation
by Tina Iachini, Alessandro Troise, Angela Sole Rega, Angelo Lucio Silvino, Mariachiara Rapuano and Francesco Ruotolo
Sustainability 2025, 17(21), 9778; https://doi.org/10.3390/su17219778 - 3 Nov 2025
Viewed by 464
Abstract
The Climate Crisis is reshaping not only ecosystems but also human cognition. While its psychological impact is increasingly acknowledged, little is known about how environmental degradation influences basic cognitive functions. Since spatial and temporal cognition provide the perceptual scaffolding for orientation and various decision-making processes, distortions in these dimensions may hinder adaptive responses to ecological change. This study examined whether simulated climate-related degradation affects spatial-temporal cognition and whether interoceptive awareness predicts variability in these effects. Using immersive Virtual Reality combined with an omnidirectional treadmill, participants walked along paths in verdant and arid landscapes and then estimated the duration and distance travelled on each path. The results showed that arid environments led to longer time and distance estimates than verdant ones, although there were no objective differences in path length or actual walking time. Furthermore, temporal judgements, but not spatial ones, were predicted by interoceptive attention regulation: participants with a higher capacity to regulate attention towards bodily sensations consistently provided shorter temporal estimates across all contexts. These findings demonstrate that spatial-temporal representations are sensitive to ecological quality and that interoceptive processes contribute to individual differences in temporal perception. This highlights the value of integrating cognitive processes and interoception into sustainability science, suggesting that environmental preservation supports not only ecological well-being but also the cognitive foundations through which humans perceive and adapt to their surroundings. Full article

29 pages, 23797 KB  
Article
Tone Mapping of HDR Images via Meta-Guided Bayesian Optimization and Virtual Diffraction Modeling
by Deju Huang, Xifeng Zheng, Jingxu Li, Ran Zhan, Jiachang Dong, Yuanyi Wen, Xinyue Mao, Yufeng Chen and Yu Chen
Sensors 2025, 25(21), 6577; https://doi.org/10.3390/s25216577 - 25 Oct 2025
Cited by 1 | Viewed by 812
Abstract
This paper proposes a novel image tone-mapping framework that incorporates meta-learning, a psychophysical model, Bayesian optimization, and light-field virtual diffraction. First, we formalize the virtual diffraction process as a mathematical operator defined in the frequency domain to reconstruct high-dynamic-range (HDR) images through phase modulation, enabling the precise control of image details and contrast. In parallel, we apply the Stevens power law to simulate the nonlinear luminance perception of the human visual system, thereby adjusting the overall brightness distribution of the HDR image and improving the visual experience. Unlike existing methods that primarily emphasize structural fidelity, the proposed method strikes a balance between perceptual fidelity and visual naturalness. Secondly, an adaptive parameter tuning system based on Bayesian optimization is developed to conduct optimization of the Tone Mapping Quality Index (TMQI), quantifying uncertainty using probabilistic models to approximate the global optimum with fewer evaluations. Furthermore, we propose a task-distribution-oriented meta-learning framework: a meta-feature space based on image statistics is constructed, and task clustering is combined with a gated meta-learner to rapidly predict initial parameters. This approach significantly enhances the robustness of the algorithm in generalizing to diverse HDR content and effectively mitigates the cold-start problem in the early stage of Bayesian optimization, thereby accelerating the convergence of the overall optimization process. Experimental results demonstrate that the proposed method substantially outperforms state-of-the-art tone-mapping algorithms across multiple benchmark datasets, with an average improvement of up to 27% in naturalness. Furthermore, the meta-learning-guided Bayesian optimization achieves two- to five-fold faster convergence. 
In the trade-off between computational time and performance, the proposed method consistently dominates the Pareto frontier, achieving high-quality results and efficient convergence with a low computational cost. Full article
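The Stevens power law the abstract applies can be sketched as a global power-law tone curve: perceived magnitude grows as luminance raised to a compressive exponent. This is an illustrative fragment, not the paper's full meta-guided pipeline:

```python
import numpy as np

def stevens_tonemap(luminance, alpha=0.33):
    """Map non-negative HDR luminance into [0, 1] with a Stevens-style
    power-law compression (alpha < 1 lifts shadows, compresses highlights)."""
    l = np.asarray(luminance, dtype=float)
    l = l / l.max()      # normalize to [0, 1]
    return l ** alpha    # compressive nonlinearity

hdr = np.array([0.001, 0.01, 0.1, 1.0, 10.0, 100.0])
tone = stevens_tonemap(hdr)
print(np.round(tone, 3))  # dark values lifted, highlights compressed
```

The exponent 0.33 is a commonly cited brightness exponent for Stevens' law; the paper tunes its parameters via Bayesian optimization rather than fixing them.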
(This article belongs to the Section Sensing and Imaging)

29 pages, 7838 KB  
Article
MSLNet and Perceptual Grouping for Guidewire Segmentation and Localization
by Adrian Barbu
Sensors 2025, 25(20), 6426; https://doi.org/10.3390/s25206426 - 17 Oct 2025
Viewed by 436
Abstract
Fluoroscopy (real-time X-ray) images are used for monitoring minimally invasive coronary angioplasty operations such as stent placement. During these operations, a thin wire called a guidewire is used to guide different tools, such as a stent or a balloon, in order to repair the vessels. However, fluoroscopy images are noisy, and the guidewire is very thin, practically invisible in many places, making its localization very difficult. Guidewire segmentation is the task of finding the guidewire pixels, while guidewire localization is the higher-level task aimed at finding a parameterized curve describing the guidewire points. This paper presents a method for guidewire localization that starts from a guidewire segmentation, from which it extracts a number of initial curves as pixel chains and uses a novel perceptual grouping method to merge these initial curves into a small number of curves. The paper also introduces a novel guidewire segmentation method that uses a residual network (ResNet) as a feature extractor and predicts a coarse segmentation that is refined only in promising locations to a fine segmentation. Experiments on two challenging datasets, one with 871 frames and one with 23,449 frames, show that the method obtains results competitive with existing segmentation methods such as Res-UNet and nnU-Net, while having no skip connections and a faster inference time. Full article
(This article belongs to the Special Issue Advanced Deep Learning for Biomedical Sensing and Imaging)

36 pages, 2468 KB  
Systematic Review
Virtual Reality Application in Evaluating the Soundscape in Urban Environment: A Systematic Review
by Özlem Gök Tokgöz, Margret Sibylle Engel, Cherif Othmani and M. Ercan Altinsoy
Acoustics 2025, 7(4), 68; https://doi.org/10.3390/acoustics7040068 - 17 Oct 2025
Viewed by 1698
Abstract
Urban soundscapes are complex due to the interaction of different sound sources and the influence of structures on sound propagation. Moreover, the dynamic nature of sounds over time and space adds to this complexity. Virtual reality (VR) has emerged as a powerful tool to simulate acoustic and visual environments, offering users an immersive sense of presence in controlled settings. This technology facilitates more accurate and predictive assessment of urban environments. It serves as a flexible tool for exploring, analyzing, and interpreting them under repeatable conditions. This study presents a systematic literature review focusing on research that integrates VR technology for the audiovisual reconstruction of urban environments. This topic remains relatively underrepresented in the existing literature. A total of 69 peer-reviewed studies were analyzed in this systematic review. The studies were classified according to research goals, selected urban environments, VR technologies used, technical equipment, and experimental setups. In this study, the relationship between the tools used in urban VR representations is examined, and experimental setups are discussed from both technical and perceptual perspectives. This paper highlights existing challenges and opportunities in using VR to assess soundscapes and offers practical insights for future applications of VR in urban environments. Full article

36 pages, 6685 KB  
Article
From Predictive Coding to EBPM: A Novel DIME Integrative Model for Recognition and Cognition
by Ionel Cristian Vladu, Nicu George Bîzdoacă, Ionica Pirici and Bogdan Cătălin
Appl. Sci. 2025, 15(20), 10904; https://doi.org/10.3390/app152010904 - 10 Oct 2025
Viewed by 949
Abstract
Predictive Coding (PC) frameworks claim to model recognition via prediction–error loops, but they often lack explicit biological implementation of fast familiar recognition and impose latency that limits real-time robotic control. We begin with Experience-Based Pattern Matching (EBPM), a biologically grounded mechanism inspired by neural engram reactivation, enabling near-instantaneous recognition of familiar stimuli without iterative inference. Building upon this, we propose Dynamic Integrative Matching and Encoding (DIME), a hybrid system that relies on EBPM under familiar and low-uncertainty conditions and dynamically engages PC when confronted with novelty or high uncertainty. We evaluate EBPM, PC, and DIME across multiple image datasets (MNIST, Fashion-MNIST, CIFAR-10) and on a robotic obstacle-course simulation. Results from multi-seed experiments with ablation and complexity analyses show that EBPM achieves minimal latency (e.g., ~0.03 ms/ex in MNIST, ~0.026 ms/step in robotics) but poor performance in novel or noisy cases; PC exhibits robustness at a high cost; DIME delivers strong trade-offs—boosted accuracy in familiar clean situations (+4–5% over EBPM on CIFAR-10), while cutting PC invocations by ~50% relative to pure PC. Our contributions: (i) formalizing EBPM as a neurocomputational algorithm built from biologically plausible principles, (ii) developing DIME as a dynamic EBPM–PC integrator, (iii) providing ablation and complexity analyses illuminating component roles, and (iv) offering empirical validation in both perceptual and embodied robotic scenarios—paving the way for low-latency recognition systems. Full article
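The EBPM/PC gating idea behind DIME can be sketched as a similarity-thresholded dispatcher: familiar inputs take the fast pattern-match path, uncertain ones fall back to iterative inference. All names and the toy "engram" store below are illustrative, not the paper's implementation:

```python
import math

MEMORY = {"circle": [1.0, 0.0], "square": [0.0, 1.0]}  # toy stored engrams

def similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def refine_with_pc(features):
    # Stand-in for a predictive-coding loop: iteratively reduce prediction
    # error before committing to a label (omitted here).
    return "unknown", "PC"

def dime(features, threshold=0.9):
    label, sim = max(((k, similarity(features, v)) for k, v in MEMORY.items()),
                     key=lambda kv: kv[1])
    if sim >= threshold:
        return label, "EBPM"          # fast path: near-instant pattern match
    return refine_with_pc(features)   # slow path: iterative inference

print(dime([0.99, 0.05]))  # → ('circle', 'EBPM')
print(dime([0.6, 0.6]))    # ambiguous → falls back to PC
```

The ~50% reduction in PC invocations reported above corresponds to how often the threshold test succeeds on familiar, clean inputs.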
(This article belongs to the Section Robotics and Automation)

27 pages, 32995 KB  
Article
Recognition of Wood-Boring Insect Creeping Signals Based on Residual Denoising Vision Network
by Henglong Lin, Huajie Xue, Jingru Gong, Cong Huang, Xi Qiao, Liping Yin and Yiqi Huang
Sensors 2025, 25(19), 6176; https://doi.org/10.3390/s25196176 - 5 Oct 2025
Viewed by 720
Abstract
Currently, the customs inspection of wood-boring pests in timber still primarily relies on manual visual inspection, which involves observing insect holes on the timber surface and splitting the timber for confirmation. However, this method has significant drawbacks such as long detection time, high labor cost, and accuracy relying on human experience, making it difficult to meet the practical needs of efficient and intelligent customs quarantine. To address this issue, this paper develops a rapid identification system based on the peristaltic signals of wood-boring pests through the PyQt framework. The system employs a deep learning model with multi-attention mechanisms, namely the Residual Denoising Vision Network (RDVNet). Firstly, a LabVIEW-based hardware–software system is used to collect pest peristaltic signals in an environment free of vibration interference. Subsequently, the original signals are clipped, converted to audio format, and mixed with external noise. Then signal features are extracted through three cepstral feature extraction methods Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and RelAtive SpecTrAl-Perceptual Linear Prediction (RASTA-PLP) and input into the model. In the experimental stage, this paper compares the denoising module of RDVNet (de-RDVNet) with four classic denoising models under five noise intensity conditions. Finally, it evaluates the performance of RDVNet and four other noise reduction classification models in classification tasks. The results show that PNCC has the most comprehensive feature extraction capability. When PNCC is used as the model input, de-RDVNet achieves an average peak signal-to-noise ratio (PSNR) of 29.8 and a Structural Similarity Index Measure (SSIM) of 0.820 in denoising experiments, both being the best among the comparative models. 
In classification experiments, RDVNet has an average F1 score of 0.878 and an accuracy of 92.8%, demonstrating the most excellent performance. Overall, the application of this system in customs timber quarantine can effectively improve detection efficiency and reduce labor costs and has significant practical value and promotion prospects. Full article
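PSNR, the denoising metric reported above, has a standard definition; a minimal version assuming signals scaled to a peak value of 1.0:

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test signal."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical signals
    return 10 * np.log10(peak ** 2 / mse)

clean = np.zeros((8, 8))
noisy = clean + 0.01           # uniform error of 0.01 -> MSE = 1e-4
print(round(psnr(clean, noisy), 1))  # → 40.0 dB
```

The paper's de-RDVNet average of 29.8 dB is measured this way over its test set (with SSIM as the complementary structural metric).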
(This article belongs to the Section Smart Agriculture)

20 pages, 803 KB  
Article
The Effective Highlight-Detection Model for Video Clips Using Spatial–Perceptual
by Sungshin Kwak, Jaedong Lee and Sohyun Park
Electronics 2025, 14(18), 3640; https://doi.org/10.3390/electronics14183640 - 15 Sep 2025
Viewed by 2736
Abstract
With the rapid growth of video platforms such as YouTube, Bilibili, and Dailymotion, an enormous amount of video content is being shared worldwide. In this environment, content providers are increasingly adopting methods that restructure videos around highlight scenes and distribute them in short-form formats to encourage more efficient content consumption by viewers. As a result of this trend, the importance of highlight extraction technologies capable of automatically identifying key scenes from large-scale video datasets has been steadily increasing. To address this need, this study proposes SPOT (Spatial Perceptual Optimized TimeSformer), a highlight extraction model. The proposed model enhances spatial perceptual capability by integrating a CNN encoder into the internal structure of the existing Transformer-based TimeSformer, enabling simultaneous learning of both the local and global features of a video. The experiments were conducted using Google’s YT-8M video dataset along with the MR.Hisum dataset, which provides organized highlight information. The SPOT model adopts a regression-based highlight prediction framework. Experimental results on video datasets of varying complexity showed that, in the high-complexity group, the SPOT model achieved a reduction in mean squared error (MSE) of approximately 0.01 (from 0.090 to 0.080) compared to the original TimeSformer. Furthermore, the model outperformed the baseline across all complexity groups in terms of mAP, Coverage, and F1-Score metrics. These results suggest that the proposed model holds strong potential for diverse multimodal applications such as video summarization, content recommendation, and automated video editing. Moreover, it is expected to serve as a foundational technology for advancing video-based artificial intelligence systems in the future. Full article
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)

23 pages, 3668 KB  
Article
Graph-Driven Micro-Expression Rendering with Emotionally Diverse Expressions for Lifelike Digital Humans
by Lei Fang, Fan Yang, Yichen Lin, Jing Zhang and Mincheol Whang
Biomimetics 2025, 10(9), 587; https://doi.org/10.3390/biomimetics10090587 - 3 Sep 2025
Viewed by 1055
Abstract
Micro-expressions, characterized by brief and subtle facial muscle movements, are essential for conveying nuanced emotions in digital humans, yet existing rendering techniques often produce rigid or emotionally monotonous animations due to the inadequate modeling of temporal dynamics and action unit interdependencies. This paper proposes a graph-driven framework for micro-expression rendering that generates emotionally diverse and lifelike expressions. We employ a 3D-ResNet-18 backbone network to perform joint spatio-temporal feature extraction from facial video sequences, enhancing sensitivity to transient motion cues. Action units (AUs) are modeled as nodes in a symmetric graph, with edge weights derived from empirical co-occurrence probabilities and processed via a graph convolutional network to capture structural dependencies and symmetric interactions. This symmetry is justified by the inherent bilateral nature of human facial anatomy, where AU relationships are based on co-occurrence and facial anatomy analysis (as per the FACS), which are typically undirected and symmetric. Human faces are symmetric, and such relationships align with the design of classic spectral GCNs for undirected graphs, assuming that adjacency matrices are symmetric to model non-directional co-occurrences effectively. Predicted AU activations and timestamps are interpolated into continuous motion curves using B-spline functions and mapped to skeletal controls within a real-time animation pipeline (Unreal Engine). Experiments on the CASME II dataset demonstrate superior performance, achieving an F1-score of 77.93% and an accuracy of 84.80% (k-fold cross-validation, k = 5), outperforming baselines in temporal segmentation. Subjective evaluations confirm that the rendered digital human exhibits improvements in perceptual clarity, naturalness, and realism. 
This approach bridges micro-expression recognition and high-fidelity facial animation, enabling more expressive virtual interactions through curve extraction from AU values and timestamps. Full article
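The interpolation step above, turning sparse AU activations at keyframe timestamps into a continuous motion curve, can be sketched with SciPy's cubic B-spline interpolant; the keyframe values below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

timestamps = np.array([0.0, 0.1, 0.2, 0.3, 0.4])     # seconds
au_activation = np.array([0.0, 0.6, 1.0, 0.4, 0.0])  # e.g. one AU's intensity

# Cubic interpolating B-spline: passes through every keyframe and gives
# smooth values at any intermediate time for the animation controls.
curve = make_interp_spline(timestamps, au_activation, k=3)

dense_t = np.linspace(0.0, 0.4, 9)
print(np.round(curve(dense_t), 3))  # smooth samples at 50 ms steps
```

An interpolating spline reproduces the keyframes exactly while smoothing between them, which is what makes the rendered onset/offset of a micro-expression look continuous rather than stepped.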
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)

14 pages, 685 KB  
Proceeding Paper
Predictive Analysis of Voice Pathology Using Logistic Regression: Insights and Challenges
by Divya Mathews Olakkengil and Sagaya Aurelia P
Eng. Proc. 2025, 107(1), 28; https://doi.org/10.3390/engproc2025107028 - 27 Aug 2025
Viewed by 932
Abstract
Voice pathology diagnosis is essential for the timely detection and management of voice disorders, which can significantly impact an individual’s quality of life. This study employed logistic regression to evaluate the predictive power of variables that include age, severity, loudness, breathiness, pitch, roughness, strain, and gender on a binary diagnosis outcome (Yes/No). The analysis was performed on the Perceptual Voice Qualities Database (PVQD), a comprehensive dataset containing voice samples with perceptual ratings. Two widely used voice quality assessment tools, CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) and GRBAS (Grade, Roughness, Breathiness, Asthenia, Strain), were employed to annotate voice qualities, ensuring systematic and clinically relevant perceptual evaluations. The model revealed that age (odds ratio: 1.033, p < 0.001), loudness (odds ratio: 1.071, p = 0.005), and gender (male) (odds ratio: 1.904, p = 0.043) were statistically significant predictors of voice pathology. In contrast, severity and voice quality-related features like breathiness, pitch, roughness, and strain did not show statistical significance, suggesting their limited predictive contributions within this model. While the results provide valuable insights, the study underscores notable limitations of logistic regression. The model assumes a linear relationship between the independent variables and the log odds of the outcome, which restricts its ability to capture complex, non-linear patterns within the data. Additionally, logistic regression does not inherently account for interactions between predictors or feature dependencies, potentially limiting its performance in more intricate datasets. Furthermore, a fixed classification threshold (0.5) may lead to misclassification, particularly in datasets with imbalanced classes or skewed predictor distributions. 
These findings highlight that although logistic regression serves as a useful tool for identifying significant predictors, its results are dataset-dependent and cannot be generalized across diverse populations. Future research should validate these findings using heterogeneous datasets and employ advanced machine learning techniques to address the limitations of logistic regression. Integrating non-linear models or feature interaction analyses may enhance diagnostic accuracy, ensuring more reliable and robust voice pathology predictions. Full article
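The workflow described in the abstract above, fitting a binary-diagnosis logistic regression and reading odds ratios off the exponentiated coefficients, can be sketched on synthetic data. A minimal sketch, assuming illustrative variable names, effect sizes, and sample size rather than the actual PVQD data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors loosely mirroring the study's variables
age = rng.uniform(18, 80, n)
loudness = rng.uniform(0, 100, n)
male = rng.integers(0, 2, n)

# Simulate a binary diagnosis whose log odds rise with age, loudness, and male gender
logit = -6.0 + 0.033 * age + 0.07 * loudness + 0.64 * male
y = rng.random(n) < 1 / (1 + np.exp(-logit))

# Large C approximates an unpenalized fit, matching classical logistic regression
X = np.column_stack([age, loudness, male])
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

# Odds ratios are the exponentiated coefficients: OR > 1 means the
# predictor raises the odds of a positive diagnosis
odds_ratios = np.exp(model.coef_[0])
for name, orr in zip(["age", "loudness", "gender(male)"], odds_ratios):
    print(f"{name}: OR = {orr:.3f}")
```

Note that this sketch reports only point estimates; the p-values quoted in the abstract require standard errors from the fitted model, which a statistics-oriented package would provide alongside the coefficients.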
19 pages, 1221 KB  
Article
Comparative Analysis of Standard Operating Procedures Across Safety-Critical Domains: Lessons for Human Performance and Safety Engineering
by Jomana A. Bashatah and Lance Sherry
Systems 2025, 13(8), 717; https://doi.org/10.3390/systems13080717 - 20 Aug 2025
Viewed by 1252
Abstract
Standard Operating Procedures (SOPs) serve a critical role in complex systems operations, guiding operator response during normal and emergency scenarios. This study compares 29 SOPs (517 steps) across three domains with varying operator selection rigor: airline operations, Habitable Airlock (HAL) operations, and semi-autonomous vehicles. Using the extended Procedure Representation Language (e-PRL) framework, each step was decomposed into perceptual, cognitive, and motor components, enabling quantitative analysis of step types, memory demands, and training requirements. Monte Carlo simulations compared Time on Procedure against the Allowable Operational Time Window to predict failure rates. The analysis revealed three universal vulnerabilities: missing verification steps following waiting requirements (70% in airline operations, 58% in HAL operations, and 25% in autonomous vehicle procedures), ambiguous perceptual cues (15–48% of steps), and excessive memory demands (highest in HAL procedures at 71% average recall score). Procedure failure probabilities varied significantly (5.72% to 63.47% across domains), with autonomous vehicle procedures showing the greatest variability despite minimal operator selection. Counterintuitively, Habitable Airlock procedures requiring the most selective operators had the highest memory demands, suggesting that rigorous operator selection may compensate for procedure design deficiencies. These findings establish that procedure design approaches vary by domain based on assumptions about operator capabilities rather than universal human factors principles. Full article
(This article belongs to the Section Systems Engineering)
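The Monte Carlo comparison of Time on Procedure against the Allowable Operational Time Window can be sketched as follows. The step durations, variabilities, and 60-second window are hypothetical placeholders, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(42)

def procedure_failure_rate(step_means, step_sds, aotw, trials=100_000):
    """Monte Carlo sketch: sample each step's completion time from a
    normal distribution (clipped at zero), sum the steps to get Time on
    Procedure, and count runs that exceed the Allowable Operational
    Time Window."""
    n_steps = len(step_means)
    times = rng.normal(step_means, step_sds, size=(trials, n_steps))
    top = np.clip(times, 0.0, None).sum(axis=1)  # Time on Procedure per trial
    return float(np.mean(top > aotw))

# Hypothetical 5-step procedure (seconds) against a 60 s window
means = [8, 12, 10, 15, 9]   # mean step durations
sds   = [2, 4, 3, 5, 2]      # step-to-step variability
print(f"Estimated failure probability: "
      f"{procedure_failure_rate(means, sds, aotw=60.0):.3f}")
```

Per-step timing in the e-PRL framework would additionally distinguish perceptual, cognitive, and motor components; this sketch collapses each step into a single duration for brevity.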

27 pages, 2560 KB  
Article
Predicting Wine Quality Under Changing Climate: An Integrated Approach Combining Machine Learning, Statistical Analysis, and Systems Thinking
by Maja Borlinič Gačnik, Andrej Škraba, Karmen Pažek and Črtomir Rozman
Beverages 2025, 11(4), 116; https://doi.org/10.3390/beverages11040116 - 11 Aug 2025
Cited by 1 | Viewed by 2296
Abstract
Climate change poses significant challenges for viticulture, particularly in regions known for producing high-quality wines. Wine quality results from a complex interaction between climatic factors, regional characteristics, and viticultural practices. Methods: This study integrates statistical analysis, machine learning (ML) algorithms, and systems thinking to assess the extent to which wine quality can be predicted using monthly weather data and regional classification. The dataset includes average wine scores, monthly temperatures and precipitation, and categorical region data for Slovenia between 2011 and 2021. Predictive models tested include Random Forest, Support Vector Machine, Decision Tree, and linear regression. In addition, Causal Loop Diagrams (CLDs) were constructed to explore feedback mechanisms and systemic dynamics. Results: The Random Forest model showed the highest prediction accuracy (R² = 0.779). Regional classification emerged as the most influential variable, followed by temperatures in September and April. Precipitation did not have a statistically significant effect on wine ratings. CLD models revealed time delays in the effects of adaptation measures and highlighted the role of perceptual lags in growers’ responses to climate signals. Conclusions: The combined use of ML, statistical methods, and CLDs enhances understanding of how climate variability influences wine quality. This integrated approach offers practical insights for winegrowers, policymakers, and regional planners aiming to develop climate-resilient viticultural strategies. Future research should include phenological phase modeling and dynamic simulation to further improve predictive accuracy and system-level understanding. Full article
(This article belongs to the Section Sensory Analysis of Beverages)
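The Random Forest step, predicting a wine score from region and monthly weather and then ranking predictors by feature importance, can be sketched on simulated data. All feature names, effect sizes, and the simulated score below are illustrative assumptions, not the Slovenian dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 400

# Hypothetical features loosely mirroring the study: integer-coded region,
# September and April mean temperatures, and monthly precipitation
region = rng.integers(0, 3, n)      # categorical region, integer-coded
t_sep = rng.normal(18, 2, n)        # September mean temperature (degrees C)
t_apr = rng.normal(11, 2, n)        # April mean temperature (degrees C)
precip = rng.normal(90, 25, n)      # monthly precipitation (mm)

# Simulated wine score: driven by region and the two temperatures,
# with precipitation contributing no signal (mirroring the reported result)
score = 80 + 2.5 * region + 0.8 * t_sep + 0.4 * t_apr + rng.normal(0, 1.5, n)

X = np.column_stack([region, t_sep, t_apr, precip])
X_tr, X_te, y_tr, y_te = train_test_split(X, score, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Held-out R2 and impurity-based feature importances
r2 = r2_score(y_te, rf.predict(X_te))
importances = dict(zip(["region", "t_sep", "t_apr", "precip"],
                       rf.feature_importances_))
print(f"R2 = {r2:.3f}; importances = {importances}")
```

In this simulation the region variable dominates the importance ranking, matching the qualitative pattern the abstract reports, though impurity-based importances can be biased toward high-cardinality features and are best cross-checked with permutation importance.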
