Search Results (382)

Search Parameters:
Keywords = conditional video generation

23 pages, 3477 KB  
Article
Characteristics of Eye Movements and Correlation to Cognitive Functions in Relation to the Location of Guide Signs and Driving Speed
by Takaya Maeyama, Hiroki Okada and Daisuke Sawamura
J. Eye Mov. Res. 2026, 19(2), 25; https://doi.org/10.3390/jemr19020025 - 2 Mar 2026
Viewed by 171
Abstract
Driving safety critically depends on the ability of drivers to efficiently recognize and process guide sign information under varying traffic conditions. This study examined how driving speed (slow/fast) and guide sign location (front/left) influence eye-movement behavior during guide sign recognition, and how these effects relate to drivers’ cognitive functions and basic demographics. Twenty-four licensed drivers performed a guide sign recognition task using onboard video stimuli, and eye movements based on fixations and saccades were recorded. Generalized linear mixed models with participants as random effects were used to analyze the interactions between driving conditions, cognitive functions, demographics, and eye movement measures. Under low-load conditions, such as slow driving and front-positioned signs, individual differences in cognitive functions, including verbal memory and useful field of view, were strongly reflected in eye-movement behavior. Under high-load conditions characterized by fast driving and left-positioned signs, the influence of cognitive function was reduced, and eye movements were more strongly associated with driving experience. Increasing driving speed was associated with fewer eye movements, whereas the saccade amplitude remained unchanged, indicating the suppression of exploratory eye movements. For left-positioned signs, the fixation duration on the target was maintained, whereas gaze shifts between the forward environment and the sign were reduced. Full article
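
For readers reproducing this kind of analysis, a mixed model with per-participant random effects can be fitted along the following lines in Python. This is a minimal sketch with hypothetical column names (fixation_count, speed, sign_location, participant), not the authors' analysis code; for count outcomes, a Poisson GLMM would match the paper's generalized linear mixed models more closely than the linear mixed model shown here.

```python
# Minimal sketch: linear mixed model with participant as random effect.
# All column names are hypothetical stand-ins for the study's variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("eye_movements.csv")  # hypothetical per-trial data

# Condition effects (speed, sign location, interaction) as fixed effects;
# a random intercept per participant absorbs individual differences.
model = smf.mixedlm(
    "fixation_count ~ speed * sign_location",
    data=df,
    groups=df["participant"],
)
print(model.fit().summary())
```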

31 pages, 4772 KB  
Article
Benchmark Operational Condition Multimodal Dataset Construction for the Municipal Solid Waste Incineration Process
by Yapeng Hua, Jian Tang and Hao Tian
Sustainability 2026, 18(5), 2282; https://doi.org/10.3390/su18052282 - 27 Feb 2026
Viewed by 103
Abstract
Municipal solid waste incineration (MSWI) is a typical complex industrial process for achieving sustainable development of the global environment. It implements a “perception-prediction-control” mode, in which domain experts rely on multimodal information. To harness the complementary value of different modal data, prevent information conflicts or fusion failures caused by misalignment, and ensure the availability of multimodal datasets and the reliability of analytical conclusions, constructing a benchmark operational condition multimodal dataset is essential. The objective of this work was to create a multimodal reference database for the operational status of MSWI processes. Based on a description of the MSWI process and an analysis of the characteristics of the multimodal data, the process data are first preprocessed under different missing-data scenarios, covering missing-value processing and outlier processing. Then, single-frame images of the flame video are captured on a minute scale, and the combustion lines, including missing ones, are quantified using machine vision techniques. Finally, the combustion line quantization (CLQ) values are aligned with the minute time scale of the process data through a multimodal time synchronization module. Taking an MSWI power plant in Beijing as the research object, combustion flame video and process data under benchmark operating conditions were collected. A hybrid missing-value management strategy combining linear interpolation with the LRDT model improved data integrity, and a spatiotemporally aligned multimodal dataset was constructed. The resulting standardized benchmark operating condition multimodal data support combustion state analysis during the incineration process, pollutant generation prediction, and process optimization. In this way, the ‘reduction, harmlessness, and resource utilization’ objectives for municipal solid waste can be supported, along with addressing land resource shortages, protecting the ecological environment, and promoting the dual carbon goal. Additionally, data and technical support for environmental and urban sustainable development are provided. Full article
(This article belongs to the Section Waste and Recycling)
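
For context, the interpolation-plus-outlier preprocessing step described above can be sketched in pandas as follows. File and column handling are hypothetical, and the model-based imputer used for longer gaps (the paper's LRDT model) is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-scale process data indexed by timestamp.
df = pd.read_csv("mswi_process_data.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Flag outliers with a simple 3-sigma rule, then treat them as missing.
for col in df.select_dtypes("number"):
    z = (df[col] - df[col].mean()) / df[col].std()
    df.loc[z.abs() > 3, col] = np.nan

# Fill short gaps by linear interpolation over time; longer gaps would be
# handed to a model-based imputer instead of being interpolated blindly.
df = df.interpolate(method="time", limit=5)
```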

21 pages, 2964 KB  
Article
MEMA: Multimodal Aesthetic Evaluation of Music in Visual Contexts
by Huaye Zhang, Chenglizhao Chen, Mengke Song, Tingting Chen, Diqiong Jiang, Lichun Liu and Xinyu Liu
Sensors 2026, 26(4), 1395; https://doi.org/10.3390/s26041395 - 23 Feb 2026
Viewed by 366
Abstract
Recent technologies such as music retrieval, soundtrack generation, and video understanding have developed rapidly. Consequently, the aesthetic evaluation of video soundtracks has become an important research topic in academia. Soundtracks are key elements in shaping the emotional atmosphere and driving the narrative rhythm. Therefore, they require systematic methods to assess their artistic coordination with visual content. However, existing approaches mostly focus on evaluating the quality of the music itself. They often lack the ability to model the deeper aesthetic synergy between audio and visuals. To address this gap, we propose MEMA, a new soundtrack aesthetic evaluation model. MEMA employs a two-stage training strategy. The first stage builds a crossmodal imagination mechanism using a Conditional Variational Autoencoder. This method achieves bidirectional semantic reconstruction between audio and visuals. The second stage introduces a Guided Cross-Attention Alignment Module. This module enhances the model’s focus on key narrative moments in video. To facilitate this research, we also construct VMAE-Sets. It is the first large-scale dataset dedicated to soundtrack aesthetic evaluation. Finally, MEMA performs scoring and textual evaluation along three core aesthetic dimensions. Experimental results demonstrate that MEMA outperforms existing methods, achieving average improvements of 18.137% in LCC and 17.866% in SRCC compared to the strongest baseline. These findings confirm its superior audio–visual narrative alignment, demonstrating high consistency with human judgments. Full article
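
For reference, LCC and SRCC are the Pearson linear correlation and Spearman rank correlation between predicted and human aesthetic scores, computable directly with SciPy; the score arrays below are placeholders, not data from the paper.

```python
from scipy.stats import pearsonr, spearmanr

predicted = [3.2, 4.1, 2.5, 4.8, 3.9]  # model scores (placeholder values)
human = [3.0, 4.3, 2.2, 4.6, 4.1]      # human ratings (placeholder values)

lcc, _ = pearsonr(predicted, human)    # linear correlation coefficient
srcc, _ = spearmanr(predicted, human)  # Spearman rank correlation
print(f"LCC = {lcc:.3f}, SRCC = {srcc:.3f}")
```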

26 pages, 6282 KB  
Article
Biomechanical Evaluation of Head Acceleration and Kinematics in Boxing: The Role of Gloves and Helmets—A Pilot Study
by Monika Ratajczak, Dariusz Leśnik, Rafał Kubacki, Claudia Sbriglio and Mariusz Ptak
Appl. Sci. 2026, 16(4), 1999; https://doi.org/10.3390/app16041999 - 17 Feb 2026
Viewed by 353
Abstract
Head injuries remain one of the major health concerns in contact sports such as boxing. Despite the widespread use of protective gloves and helmets, their biomechanical effectiveness in mitigating head acceleration and reducing brain injury risk remains uncertain. This study aims to biomechanically assess available boxing equipment solutions and identify the brain–skull system’s response to physical forces from a boxing punch. A dedicated experimental setup was developed using mini triaxial accelerometers and a high-speed camera to measure head accelerations in a Primus unbreakable dummy. Tests were performed using gloves of different masses (0 oz, 10 oz, and 16 oz) and three head protection configurations: no helmet, rugby helmet, and boxing helmet. The resultant accelerations were analyzed and compared across test conditions. Peak wrist accelerations ranged from 195.00 to 271.77 m/s², while head accelerations did not exceed biomechanical injury thresholds. The boxing helmet, composed of multilayer polyurethane foam, did not consistently decrease acceleration; in some cases, it produced higher overloads due to increased head mass and moment of inertia. A rugby helmet made of open-cell EVA (ethylene vinyl acetate) foam with lower density exhibited more favorable energy-dissipation characteristics under low-impact conditions. Glove mass also influenced acceleration differently between male and female participants, likely due to variations in punch velocity and force generation. This work is a pilot study using two trained adult volunteers to validate the combined IMU–video measurement framework. The results serve as hypothesis-generating mechanistic observations rather than population-level effect estimates. Protective effectiveness in boxing depends on a complex interaction between material properties, geometry, and user biomechanics. Optimal equipment design should balance energy absorption and mass to minimize both linear and rotational accelerations. Future studies should integrate advanced material modeling and finite element simulations to support the development of adaptive, lightweight protective systems. Full article
(This article belongs to the Special Issue Physiology and Biomechanical Monitoring in Sport)
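
For reference, the resultant acceleration reported from the triaxial accelerometers is the Euclidean norm of the three axis signals; a minimal NumPy sketch, with a hypothetical input file, is:

```python
import numpy as np

# acc: shape (n_samples, 3) = (ax, ay, az) in m/s² from a triaxial sensor.
acc = np.loadtxt("punch_trial.csv", delimiter=",")  # hypothetical file

resultant = np.linalg.norm(acc, axis=1)  # sqrt(ax² + ay² + az²) per sample
print(f"peak resultant acceleration: {resultant.max():.2f} m/s²")
```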

25 pages, 1558 KB  
Article
Towards Scalable Monitoring: An Interpretable Multimodal Framework for Migration Content Detection on TikTok Under Data Scarcity
by Dimitrios Taranis, Gerasimos Razis and Ioannis Anagnostopoulos
Electronics 2026, 15(4), 850; https://doi.org/10.3390/electronics15040850 - 17 Feb 2026
Viewed by 279
Abstract
Short-form video platforms such as TikTok (TikTok Pte. Ltd., Singapore) host large volumes of user-generated, often ephemeral, content related to irregular migration, where relevant cues are distributed across visual scenes, on-screen text, and multilingual captions. Automatically identifying migration-related videos is challenging due to this multimodal complexity and the scarcity of labeled data in sensitive domains. This paper presents an interpretable multimodal classification framework designed for deployment under data-scarce conditions. We extract features from platform metadata, automated video analysis (Google Cloud Video Intelligence), and Optical Character Recognition (OCR) text, and compare text-only, OCR-only, and vision-only baselines against a multimodal fusion approach using Logistic Regression, Random Forest, and XGBoost. In this pilot study, multimodal fusion consistently improves class separation over single-modality models, achieving an F1-score of 0.92 for the migration-related class under stratified cross-validation. Given the limited sample size, these results are interpreted as evidence of feature separability rather than definitive generalization. Feature importance and SHAP analyses identify OCR-derived keywords, maritime cues, and regional indicators as the most influential predictors. To assess robustness under data scarcity, we apply SMOTE to synthetically expand the training set to 500 samples and evaluate performance on a small held-out set of real videos, observing stable results that further support feature-level robustness. Finally, we demonstrate scalability by constructing a weakly labeled corpus of 600 videos using the identified multimodal cues, highlighting the suitability of the proposed feature set for weakly supervised monitoring at scale. Overall, this work serves as a methodological blueprint for building interpretable multimodal monitoring pipelines in sensitive, low-resource settings. Full article
(This article belongs to the Special Issue Multimodal Learning for Multimedia Content Analysis and Understanding)
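
As a rough sketch of the evaluation recipe described above: multimodal features are concatenated, SMOTE is applied inside the cross-validation pipeline (so oversampling touches only training folds), and F1 is scored per fold. All names, shapes, and values below are illustrative placeholders, not the authors' feature set.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
meta = rng.normal(size=(120, 8))    # platform metadata features (placeholder)
video = rng.normal(size=(120, 16))  # video-analysis features (placeholder)
ocr = rng.normal(size=(120, 32))    # OCR keyword features (placeholder)
X = np.hstack([meta, video, ocr])   # simple feature-level fusion
y = rng.integers(0, 2, size=120)    # 1 = migration-related (placeholder)

# SMOTE lives inside the pipeline so it is refit on each training fold only.
pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(
    pipe, X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",
)
print(f"mean F1: {scores.mean():.2f}")
```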

24 pages, 6937 KB  
Article
Cost-Effective Fish Volume Estimation in Aquaculture Using Infrared Imaging and Multi-Modal Deep Learning
by Like Zhang, Yanling Han, Ge Song, Jing Wang and Ping Ma
Sensors 2026, 26(4), 1221; https://doi.org/10.3390/s26041221 - 13 Feb 2026
Viewed by 231
Abstract
Accurate fish volume estimation is essential for sustainable aquaculture management, yet traditional methods are invasive and costly, while existing non-invasive approaches rely on expensive multi-sensor setups. This study proposes a cost-effective infrared (IR)-only pipeline that reconstructs depth and Red Green Blue (RGB) from low-cost infrared videos (<USD 100 per camera), enabling scalable biomass monitoring in dense tanks. The pipeline integrates five modules: IR-to-depth estimation with contour-guided attention and smoothing loss; IR-to-RGB generation via texture-conditioned injection and water-adaptive loss; detection and tracking using cross-modal fusion and behavior-constrained Kalman filtering; instance segmentation with depth-guided branches and deformation-adaptive loss; and volume estimation through trajectory–depth Transformer fusion with refraction correction. Trained on a curated dataset of 166 goldfish across 124 videos (8–16 fish/tank), the system achieves a Mean Absolute Error (MAE) of 0.85 cm³ and a coefficient of determination (R²) of 0.961 for volume estimation, outperforming state-of-the-art methods by 19–41% while reducing hardware costs by 80%. This work advances precision aquaculture by providing robust, deployable tools for feed optimization and health monitoring, promoting environmental sustainability amid rising global seafood demand. Full article
(This article belongs to the Section Sensing and Imaging)
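
For reference, the reported MAE and R² are standard regression metrics, reproducible with scikit-learn; the arrays below are placeholder volumes in cm³, not data from the paper.

```python
from sklearn.metrics import mean_absolute_error, r2_score

measured = [12.4, 9.8, 15.1, 11.0, 13.6]    # ground-truth volumes, cm³ (placeholder)
predicted = [12.0, 10.3, 14.6, 11.4, 13.1]  # model estimates, cm³ (placeholder)

print(f"MAE = {mean_absolute_error(measured, predicted):.2f} cm³")
print(f"R²  = {r2_score(measured, predicted):.3f}")
```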

18 pages, 6606 KB  
Data Descriptor
Annotated IoT Dataset of Waste Collection Events
by Peter Tarábek, Andrej Michalek, Roman Hriník, Ľubomír Králik and Karol Decsi
Data 2026, 11(2), 38; https://doi.org/10.3390/data11020038 - 11 Feb 2026
Viewed by 238
Abstract
This work presents a curated dataset of multimodal sensor measurements from Internet of Things (IoT) units mounted on waste collection vehicles. Each unit records multiple data streams including GPS position, vehicle velocity, radar-based container presence, accelerometer readings of the lifting arm, and RFID tag identifiers of the bins. The dataset provides two complementary forms of annotation: (1) algorithmically generated events that were manually cleaned through visual inspection of sensor signals, offering large-scale coverage across 5 vehicles over a total of 25 collection days, and (2) manually validated events derived from synchronized video recordings, representing ground truth for 3 vehicles over 8 collection days. In total, the dataset contains 12,391 annotated waste collection events. The dataset spans diverse operational conditions with varying container sizes and includes both RFID-equipped and non-RFID bins. It can be used to train and evaluate machine learning models for event detection, anomaly recognition, or explainability studies, and to support practical applications such as Pay-as-you-throw (PAYT) waste management schemes. By combining multimodal sensor signals with reliable annotations, the dataset represents a unique resource for advancing research in smart waste collection and the broader field of IoT-enabled urban services. Full article
(This article belongs to the Section Information Systems and Data Management)
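
When working with multimodal streams like these, a common first step is timestamp alignment across sensors; a minimal pandas sketch using nearest-timestamp joins is shown below. File names, column names, and the 500 ms tolerance are hypothetical, not part of the dataset's documentation.

```python
import pandas as pd

# Hypothetical per-sensor streams, each with a "ts" timestamp column.
gps = pd.read_csv("gps.csv", parse_dates=["ts"]).sort_values("ts")
accel = pd.read_csv("accel.csv", parse_dates=["ts"]).sort_values("ts")

# Attach the nearest accelerometer reading (within 500 ms) to each GPS fix.
aligned = pd.merge_asof(
    gps, accel, on="ts",
    direction="nearest",
    tolerance=pd.Timedelta("500ms"),
)
```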

24 pages, 933 KB  
Article
SDG-Driven Entrepreneurship Through Technology Solutions in Higher Education Enhanced by Problem-Based Learning: An Active Learning Approach in a Smart Classroom Environment
by Josep Petchamé, Dubravka Novkovic, Paul Fox, Lisa Kinnear and Ricardo Torres-Kompen
Sustainability 2026, 18(4), 1849; https://doi.org/10.3390/su18041849 - 11 Feb 2026
Viewed by 320
Abstract
This article describes a problem-based learning (PBL) intervention enhanced by a smart classroom environment, which supported online interactions and class activities. The academic experience was centered on the United Nations Sustainable Development Goals (SDGs). Multidisciplinary teams of first-year students worked with private companies on briefs explicitly mapped to the SDGs, where instruction combined coaching sessions, peer feedback, and short videos that scaffolded problem analysis, value proposition design, business-model development, and Minimum Viable Product (MVP) prototyping. Once the student teams completed the activity, a qualitative survey using the Bipolar Laddering (BLA) tool was administered to analyze the suitability of the PBL methodology for the activity. BLA elicits respondent-generated positive and negative poles and associated justifications through open questions; unlike structured questionnaires, it does not condition answers and foregrounds the students’ own categories of meaning. Findings are reported as observed patterns across teams and briefs rather than as claims of impact. The analysis attends to the role of technological scaffolds for first-year university students. The contribution of this research is twofold: (1) providing a replicable course design that situates sustainability and the SDGs in a real-world context, positioning early-stage undergraduates to practice design thinking and entrepreneurial action within an active learning approach; and (2) preserving students’ voices through the BLA tool in an activity that links PBL implementation to SDG-oriented outcomes. Full article
(This article belongs to the Special Issue Creating an Innovative Learning Environment)

15 pages, 16945 KB  
Article
TLDD-YOLO: An Improved YOLO for Transmission Line Component and Defect Detection
by Kuihao Wang, Yan Huang and Yincheng Qi
Electronics 2026, 15(4), 757; https://doi.org/10.3390/electronics15040757 - 11 Feb 2026
Viewed by 237
Abstract
Unmanned Aerial Vehicle (UAV) inspection of transmission lines faces two primary challenges when detecting and analyzing components or defects in videos or images: poor performance in detecting small objects, and interference from complex backgrounds. To enhance defect detection under such cluttered conditions, this paper introduces an improved YOLO-based model, termed Transmission Line Defect Detection–YOLO (TLDD-YOLO), which jointly optimizes feature representation via a Dual-Branch Guided Attention (DBGA) mechanism and a Spatial Offset Attention Module (SOAM). DBGA employs a dual-branch structure to extract high-frequency spatial details and channel-wise semantic information, thereby guiding the backbone network to preserve the critical edge and texture features of small objects, mitigating detail loss during downsampling. SOAM utilizes a lightweight offset generation network to produce spatial offset matrices, and dynamically adjusts feature distributions through offset-guided spatial alignment, enabling feature contours to better conform to object shapes while reducing interference from complex backgrounds. The experimental results on a self-constructed transmission line inspection dataset demonstrate that TLDD-YOLO achieves 57.1% mAP, 83.8% mAP50, and 36.1% mAPs. Compared with the baseline model, the proposed method improves mAP, mAP50, and mAPs by 1.8%, 1.8%, and 7.7%, respectively, confirming its effectiveness for small object detection in UAV-based transmission line inspection. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Electric Power Systems)
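
To illustrate the dual-branch pattern (one branch weighting spatial detail, one weighting channel semantics), a generic attention block can be sketched in PyTorch as below. This is a minimal stand-in showing the structure, not the paper's DBGA or SOAM implementation.

```python
import torch
import torch.nn as nn

class DualBranchAttention(nn.Module):
    """Generic sketch: a spatial branch and a channel branch whose
    attention maps both reweight the input feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Spatial branch: per-pixel weights from local structure.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Channel branch: per-channel weights from global context.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.spatial(x) * self.channel(x)

x = torch.randn(1, 64, 80, 80)           # e.g. a backbone feature map
print(DualBranchAttention(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```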

7 pages, 568 KB  
Case Report
Temporal Lobe Epilepsy Masquerading as Panic Attacks: A Case Report
by Samuel Cholette-Tétrault, Philippe Leclerc, Thomas Barabé-Tremblay and Michaela Barbarosie
Healthcare 2026, 14(4), 445; https://doi.org/10.3390/healthcare14040445 - 10 Feb 2026
Viewed by 328
Abstract
Background: The clinical presentation of temporal lobe epilepsy (TLE) and panic disorder can sometimes overlap, particularly when the seizure symptoms include paroxysmal episodes of intense fear and autonomic symptoms. As a result, patients with TLE can be misdiagnosed with a primary psychiatric illness, which leads to inappropriate treatment, worsening of the underlying condition, and decreased function and quality of life. Clinical case: We present the case of a 46-year-old woman with a 20-year history of generalized epilepsy and major depressive disorder with panic attacks that were refractory and persistent despite trials of SSRIs, benzodiazepines, and cognitive behavioral therapy (CBT). While hospitalized for video-EEG monitoring in the context of worsening epilepsy, she was found to have TLE seizures that presented as what she had described as panic attacks and sometimes progressed to secondarily generalized seizures. Following a transition from a medication regimen targeting generalized epilepsy to one more appropriate for focal seizures, the patient experienced clinical improvement, with a decrease in the magnitude and frequency of panic symptoms. Conclusions: This case, in combination with other case reports in the literature, demonstrates the need for clinical suspicion of TLE in patients presenting with atypical panic-like episodes or a refractory panic disorder, especially in patients with known epilepsy or risk factors for seizure disorders. It also highlights the importance of comprehensive diagnostic evaluation in neuropsychiatric presentations, including EEG and brain imaging, to ensure accurate diagnosis and appropriate management. Full article
(This article belongs to the Special Issue Substance Abuse, Mental Health Disorders, and Intervention Strategies)

23 pages, 6932 KB  
Article
RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems
by Jaro Meyer, Frédéric Giraud, Joschua Wüthrich, Marc Pollefeys, Philipp Fürnstahl and Lilian Calvet
Sensors 2026, 26(3), 1036; https://doi.org/10.3390/s26031036 - 5 Feb 2026
Viewed by 339
Abstract
Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional- and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications. Full article
(This article belongs to the Section Sensing and Imaging)
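
Conceptually, the LED Clock encodes a timestamp as a binary pattern of LEDs that any camera can photograph, and a decoder thresholds each LED region and reads the bits back as a counter. The toy sketch below assumes a hypothetical 16-LED, MSB-first millisecond counter; the actual RocSync encoding and hardware layout differ.

```python
import numpy as np

def decode_led_timestamp(frame, rois, threshold=128.0):
    """Toy decoder: each ROI covers one LED; bright means bit 1.
    The MSB-first bit order and ms counter are hypothetical choices."""
    bits = []
    for (y0, y1, x0, x1) in rois:
        bits.append(1 if frame[y0:y1, x0:x1].mean() > threshold else 0)
    return int("".join(map(str, bits)), 2)  # counter value, e.g. milliseconds

# Synthetic 16-LED frame encoding t = 1337 ms, 10x10 px per LED.
frame = np.zeros((10, 160), dtype=np.uint8)
rois = [(0, 10, 10 * i, 10 * (i + 1)) for i in range(16)]
for i, bit in enumerate(f"{1337:016b}"):
    if bit == "1":
        frame[:, 10 * i:10 * (i + 1)] = 255
print(decode_led_timestamp(frame, rois))  # 1337
```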

31 pages, 8257 KB  
Article
Analytical Assessment of Pre-Trained Prompt-Based Multimodal Deep Learning Models for UAV-Based Object Detection Supporting Environmental Crimes Monitoring
by Andrea Demartis, Fabio Giulio Tonolo, Francesco Barchi, Samuel Zanella and Andrea Acquaviva
Geomatics 2026, 6(1), 14; https://doi.org/10.3390/geomatics6010014 - 3 Feb 2026
Viewed by 1080
Abstract
Illegal dumping poses serious risks to ecosystems and human health, requiring effective and timely monitoring strategies. Advances in uncrewed aerial vehicles (UAVs), photogrammetry, and deep learning (DL) have created new opportunities for detecting and characterizing waste objects over large areas. Within the framework of the EMERITUS Project, an EU Horizon Europe initiative supporting the fight against environmental crimes, this study evaluates the performance of pre-trained prompt-based multimodal (PBM) DL models integrated into ArcGIS Pro for object detection and segmentation. To test these models, UAV surveys were specifically conducted at a semi-controlled test site in northern Italy, producing very high-resolution orthoimages and video frames populated with simulated waste objects such as tyres, barrels, and sand piles. Three PBM models (CLIPSeg, GroundingDINO, and TextSAM) were tested under varying hyperparameters and input conditions, including orthophotos at multiple resolutions and frames extracted from UAV-acquired videos. Results show that model performance is highly dependent on object type and imagery resolution. In contrast, within the limited ranges tested, hyperparameter tuning rarely produced significant improvements. The models were evaluated using a low IoU threshold in order to generalize across different types of detection models and to focus on the ability to detect objects. When evaluated on orthoimagery, CLIPSeg achieved the highest accuracy, with F1 scores up to 0.88 for tyres, whereas barrels and ambiguous classes consistently underperformed. Video-derived (oblique) frames generally outperformed orthophotos, reflecting a closer match to the perspectives seen during model training. Despite the performance limitations highlighted by the tests, PBM models demonstrate strong potential for democratizing GeoAI (Geospatial Artificial Intelligence): these tools enable non-expert users to employ zero-shot classification in UAV-based monitoring workflows targeting environmental crime. Full article
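
Evaluating with a low IoU threshold means a prediction counts as a true positive if it even loosely overlaps a ground-truth box; a minimal sketch of F1 under greedy IoU matching follows (the 0.1 threshold is illustrative, not the paper's exact setting).

```python
def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def f1_at_iou(preds, gts, thr=0.1):
    """Greedy matching: each ground-truth box matches at most one prediction."""
    unmatched, tp = list(gts), 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            tp += 1
            unmatched.remove(best)
    fp, fn = len(preds) - tp, len(unmatched)
    return 2 * tp / (2 * tp + fp + fn + 1e-9)

print(f1_at_iou([(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 9, 9)]))  # ~0.667
```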

17 pages, 4838 KB  
Article
Unseen Hazard Recognition in Autonomous Driving Using Vision–Language and Sensor-Based Temporal Models
by Faisal Mehmood, Sajid Ur Rehman, Asif Mehmood and Young-Jin Kim
Appl. Sci. 2026, 16(3), 1503; https://doi.org/10.3390/app16031503 - 2 Feb 2026
Viewed by 399
Abstract
Autonomous driving (AD) systems remain vulnerable to rare, ambiguous, and out-of-label (OOL) hazards that are insufficiently represented in conventional training datasets. This work investigates perception robustness under such conditions using the Challenge of Out-Of-Label (COOOL) benchmark dataset, which consists of 200 dashcam video sequences annotated with both common and uncommon traffic hazards. We analyze the behavior of widely used perception components and present a multimodal pipeline that integrates YOLO11x for object detection, the Hough Transform for lane estimation, GPT-4o for scene description, and Long Short-Term Memory (LSTM) networks for temporal modeling. On the COOOL benchmark, YOLO11x achieves an mAP@0.5 of 54.1% on the common object categories, whereas the detection of rare and OOL hazards remains challenging, with a recall of 72.6%. Incorporating temporal risk modeling improves hazard recall to 71.8%, indicating a modest but consistent gain in recognizing uncommon events. The Hough Transform shows stable lane-estimation behavior in standard conditions, with a mean lateral deviation of 8.9 pixels in daylight scenes and 13.4 pixels under low-light conditions. The temporal anomaly detection module attains an AUROC of 0.65, reflecting limited but meaningful discrimination between nominal and anomalous driving situations. For interpretability, the GPT-4o scene description module generates context-aware textual explanations with an object coverage score of 0.72 and a factual consistency rate of 78%, as assessed through manual inspection. The end-to-end pipeline operates at approximately 10–12 frames per second on a single GPU, supporting near-real-time analysis and optimization. Our results confirm that state-of-the-art perception models struggle with OOL hazards and that multimodal vision–language–temporal integration provides incremental improvements in robustness and interpretability under standardized out-of-distribution conditions. Full article
(This article belongs to the Special Issue Autonomous Vehicles and Robotics—2nd Edition)
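
For illustration, the lane-estimation step pairs edge detection with a probabilistic Hough transform; a minimal OpenCV sketch is shown below, with parameter values that are illustrative rather than the paper's settings.

```python
import cv2
import numpy as np

frame = cv2.imread("dashcam_frame.jpg")  # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform: returns line segments (x1, y1, x2, y2).
lines = cv2.HoughLinesP(
    edges, rho=1, theta=np.pi / 180, threshold=50,
    minLineLength=40, maxLineGap=20,
)
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)  # overlay lanes
```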

20 pages, 3361 KB  
Article
Applied Dynamic System Theory for Coordination Assessment of Whole-Body Center of Mass During Different Countermovements
by Carlos Rodrigues, Miguel Velhote Correia, João M. C. S. Abrantes, Marco Aurélio Benedetti Rodrigues and Jurandir Nadal
Sensors 2026, 26(3), 957; https://doi.org/10.3390/s26030957 - 2 Feb 2026
Viewed by 406
Abstract
This study applies phase plane analysis of the medio-lateral, anteroposterior, and vertical directions for the coordination assessment of whole-body (WB) center of mass (COM) movement during the impulse phase of a standard maximum vertical jump (MVJ) with long, short, and no countermovement (CM). A video system and force platform were used, with the amplitudes of WB COM excursion obtained from image-based motion capture in each anatomical direction, and the 2D and 3D mean radial distances were compared under long, short, and no CM conditions. The estimated population mean resultant length was used as a measure of distribution concentration, and the Rayleigh statistical test for circular data was applied with the sample distribution critical value. Watson’s U² goodness-of-fit test for the von Mises distribution was used with the mean direction and concentration factor. The applied metrics led to the detection of shared and specific features in the global and phase plane analysis of WB COM movement coordination in the medio-lateral, anteroposterior, and vertical directions during long, short, and no CM conditions in relation to MVJ performance assessed from ground reaction force (GRF) through the force platform. Long, short, and no CM impulses thus share lower amplitudes of WB COM excursion in the medio-lateral direction and mean radial distance to its mean, whereas the anteroposterior and vertical excursions of WB COM, along with the 2D transversal and 3D spatial length of the WB COM path, present as potential predictors of MVJ performance, with distinct behavior in long CM compared to short and no CM. Additionally, the applied workflow for generalized phase plane analysis detected, through complementary metrics, the anatomical WB COM movement directions with higher coordination based on phase concentration tests at 5% significance, in line with MVJ performance under different CM conditions. Full article
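
For reference, the Rayleigh test checks whether phase angles concentrate around a mean direction rather than spreading uniformly; a minimal NumPy sketch using the standard large-sample p-value approximation:

```python
import numpy as np

def rayleigh_test(angles_rad):
    """Rayleigh test for non-uniformity of circular data.
    Returns the mean resultant length R and an approximate p-value."""
    n = len(angles_rad)
    c, s = np.cos(angles_rad).sum(), np.sin(angles_rad).sum()
    R = np.hypot(c, s) / n  # mean resultant length in [0, 1]
    z = n * R**2
    p = np.exp(-z)  # large-sample approximation; finite-n corrections exist
    return R, p

rng = np.random.default_rng(0)
angles = rng.normal(loc=0.3, scale=0.4, size=50)  # concentrated sample
R, p = rayleigh_test(angles)
print(f"R = {R:.3f}, p ≈ {p:.2e}")  # small p: reject uniformity
```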

19 pages, 1248 KB  
Article
Round-Trip Time Estimation Using Enhanced Regularized Extreme Learning Machine
by Hassan Rizky Putra Sailellah, Hilal Hudan Nuha and Aji Gautama Putrada
Network 2026, 6(1), 10; https://doi.org/10.3390/network6010010 - 29 Jan 2026
Viewed by 380
Abstract
Reliable Internet connectivity is essential for latency-sensitive services such as video conferencing, media streaming, and online gaming. Round-trip time (RTT) is a key indicator of network performance and is central to setting retransmission timeout (RTO); inaccurate RTT estimates may trigger unnecessary retransmissions or slow loss recovery. This paper proposes an Enhanced Regularized Extreme Learning Machine (RELM) for RTT estimation that improves generalization and efficiency by interleaving a bidirectional log-step heuristic to select the regularization constant C. Unlike manual tuning or fixed-range grid search, the proposed heuristic explores C on a logarithmic scale in both directions (×10 and /10) within a single loop and terminates using a tolerance–patience criterion, reducing redundant evaluations without requiring predefined bounds. A custom RTT dataset is generated using Mininet with a dumbbell topology under controlled delay injections (1–1000 ms), yielding 1000 supervised samples derived from 100,000 raw RTT measurements. Experiments follow a strict train/validation/test split (6:1:3) with training-only standardization/normalization and validation-only hyperparameter selection. On the controlled Mininet dataset, the best configuration (ReLU, 150 hidden neurons, C = 10²) achieves R² = 0.9999, MAPE = 0.0018, MAE = 966.04, and RMSE = 1589.64 on the test set, while maintaining millisecond-level runtime. Under the same evaluation pipeline, the proposed method demonstrates competitive performance compared to common regression baselines (SVR, GAM, Decision Tree, KNN, Random Forest, GBDT, and ELM), while maintaining lower computational overhead within the controlled simulation setting. To assess practical robustness, an additional evaluation on a public real-world WiFi RSS–RTT dataset shows near-meter accuracy in LOS and mixed LOS/NLOS scenarios, while performance degrades markedly under dominant NLOS conditions, reflecting physical-channel limitations rather than model instability. These results demonstrate the feasibility of the Enhanced RELM and motivate further validation on operational networks with packet loss, jitter, and path variability. Full article
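
The selection heuristic described above explores C in both log directions from a starting value and stops once improvement stalls; a minimal sketch of the idea follows, where val_error is assumed to train and score the RELM for a given C, and the tolerance and patience values are illustrative.

```python
import math

def log_step_search(val_error, c0=1.0, tol=1e-4, patience=3, max_steps=20):
    """Bidirectional log-scale search for the regularization constant C.
    Each step tries C*10 and C/10 and moves to whichever lowers the
    validation error; stops after `patience` steps without a gain > tol."""
    best_c, best_err = c0, val_error(c0)
    stale = 0
    for _ in range(max_steps):
        candidates = {best_c * 10: val_error(best_c * 10),
                      best_c / 10: val_error(best_c / 10)}
        c, err = min(candidates.items(), key=lambda kv: kv[1])
        if best_err - err > tol:
            best_c, best_err, stale = c, err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_c, best_err

# Toy validation-error callback whose minimum sits near C = 10².
best_c, err = log_step_search(lambda c: (math.log10(c) - 2) ** 2)
print(best_c)  # 100.0
```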