Search Results (356)

Search Parameters:
Keywords = conditional video generation

35 pages, 3912 KB  
Article
Integration of Road Data Collected Using LSB Audio Steganography
by Adam Stančić, Ivan Grgurević, Marko Matulin and Marko Periša
Technologies 2025, 13(12), 597; https://doi.org/10.3390/technologies13120597 - 18 Dec 2025
Abstract
Modern traffic-monitoring systems increasingly rely on supplemental analytical data to complement video recordings, yet such data are rarely integrated into video containers without altering the original footage. This paper proposes a lightweight audio-based approach for embedding road-condition information using a Least Significant Bit (LSB) steganography framework. The method operates by serializing sensor data, encoding it into the LSB positions of synthetically generated audio, and subsequently compressing the audio track while preserving imperceptibility and video integrity. A series of controlled experiments evaluates how waveform type, sampling rate, amplitude, and frequency influence the storage efficiency and quality of WAV and FLAC stego-audio files. Additional tests examine the impact of embedding capacity and output-quality settings on compression behavior. Results reveal clear trade-offs between audio quality, data capacity, and file size, demonstrating that the proposed framework enables efficient, secure, and scalable integration of metadata into surveillance recordings. The findings establish practical guidelines for deploying LSB-based audio embedding in real traffic-monitoring environments.
(This article belongs to the Special Issue IoT-Enabling Technologies and Applications—2nd Edition)
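
The embedding step lends itself to a compact illustration. Below is a minimal Python sketch (not the authors' implementation) that serializes a small road-condition record and writes its bits into the least significant bits of a synthetic 16-bit sine carrier; the record fields and carrier settings are invented for the example.

```python
import json
import numpy as np

def embed_lsb(audio: np.ndarray, payload: bytes) -> np.ndarray:
    """Write payload bits into the LSBs of a 16-bit PCM buffer."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if bits.size > audio.size:
        raise ValueError("payload exceeds embedding capacity")
    stego = audio.copy()
    stego[: bits.size] = (stego[: bits.size] & ~1) | bits
    return stego

def extract_lsb(stego: np.ndarray, n_bytes: int) -> bytes:
    """Read n_bytes back out of the LSBs."""
    bits = (stego[: n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

# Synthetic carrier: 1 s of a 440 Hz sine at 44.1 kHz, 16-bit PCM.
t = np.arange(44_100) / 44_100
carrier = (0.5 * np.sin(2 * np.pi * 440 * t) * 32_767).astype(np.int16)

# Hypothetical road-condition record, serialized to bytes.
record = json.dumps({"road": "A3", "temp_c": 4.2, "friction": 0.61}).encode()
stego = embed_lsb(carrier, record)
assert extract_lsb(stego, len(record)) == record
```
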
23 pages, 3710 KB  
Article
Multi-Domain Intelligent State Estimation Network for Highly Maneuvering Target Tracking with Non-Gaussian Noise
by Zhenzhen Ma, Xueying Wang, Yuan Huang, Qingyu Xu, Wei An and Weidong Sheng
Remote Sens. 2025, 17(24), 4016; https://doi.org/10.3390/rs17244016 - 12 Dec 2025
Abstract
In the field of remote sensing, tracking highly maneuvering targets is challenging due to their rapidly changing patterns and uncertainties, particularly under non-Gaussian noise conditions. In this paper, we consider the problem of tracking highly maneuvering targets without using preset parameters in non-Gaussian noise. We propose a multi-domain intelligent state estimation network (MIENet). It consists of two main models that estimate the key parameters for the Unscented Kalman Filter, enabling robust tracking of highly maneuvering targets under various intensities and distributions of observation noise. The first model, called a fusion denoising model (FDM), is designed to eliminate observation noise by enhancing multi-domain feature fusion. The second model, called a parameter estimation model (PEM), is designed to estimate key parameters of target motion by learning both global and local motion information. Additionally, we design a physically constrained loss function (PCLoss) that incorporates physics-informed constraints and prior knowledge. We evaluate our method on radar trajectory simulations and real remote sensing video datasets. Simulation results on the LAST dataset demonstrate that the proposed FDM can reduce the root mean square error (RMSE) of observation noise by more than 60%. Moreover, the proposed MIENet consistently outperforms state-of-the-art state estimation algorithms across various highly maneuvering scenes, achieving this performance without requiring adjustment of noise parameters under non-Gaussian noise. Furthermore, experiments conducted on the real-world SV248S dataset confirm that MIENet generalizes effectively to satellite video object tracking tasks.
(This article belongs to the Section AI Remote Sensing)
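
The abstract builds on the Unscented Kalman Filter, whose core operation is the unscented transform. The numpy sketch below shows that generic building block only (sigma-point generation and propagation through a nonlinear measurement function); it is not MIENet, and the toy state and sensor model are assumptions.

```python
import numpy as np

def sigma_points(x, P, alpha=0.5, beta=2.0, kappa=0.0):
    """Merwe scaled sigma points for an unscented transform."""
    n = x.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)
    pts = np.vstack([x, x + S.T, x - S.T])          # (2n+1, n)
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + 1.0 - alpha**2 + beta
    return pts, wm, wc

def unscented_transform(f, x, P):
    """Propagate the mean/covariance of x ~ (x, P) through nonlinear f."""
    pts, wm, wc = sigma_points(x, P)
    y = np.array([f(p) for p in pts])
    mean = wm @ y
    d = y - mean
    cov = (wc[:, None] * d).T @ d
    return mean, cov

# Toy example: position/velocity state pushed through a range sensor.
x = np.array([100.0, 10.0])
P = np.diag([4.0, 1.0])
h = lambda s: np.array([np.hypot(s[0], 50.0)])   # range to an offset sensor
z_mean, z_cov = unscented_transform(h, x, P)
```
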
16 pages, 2755 KB  
Article
Outdoor Rearing and Behavioural Patterns in Diverse Rabbit Breeds: An Exploratory Study
by Luigia Bosa, Gloria Bernabucci, Francesca Di Federico, Lorenzo Nompleggio, Marta Vispi, Laura Menchetti, Alessandro Dal Bosco, Simona Mattioli, Riccardo Primi, Pedro Girotti and Cesare Castellini
Animals 2025, 15(24), 3562; https://doi.org/10.3390/ani15243562 - 11 Dec 2025
Abstract
EU regulations on organic rabbit farming are relatively recent, and scientific evidence on key technical aspects remains limited. Outdoor systems may improve health and welfare by allowing natural behaviours, but their effectiveness depends on management practices, environmental conditions, and breed. The objective of this study was to explore breed-related differences in rabbit behaviour under outdoor rearing conditions. A total of 15 Leprino di Viterbo (LV) and 15 New Zealand White (NZW) rabbits were weaned at 29 days of age and reared under outdoor conditions until 84 days of age, during October and November. All animals had ad libitum access to a commercial pelleted diet and pasture, and the intake of both was measured weekly. Animal behaviour was monitored by video recording, and eight one-minute sample intervals per day were analysed using focal sampling and continuous recording methods. Grass intake and estimated digestible energy (DE) were assessed on a weekly basis. Data were analysed using Generalized Estimating Equations to evaluate the effects of time, time of day, and breed. Behavioural patterns varied depending on genetic strain and time of day. Notably, LV rabbits exhibited a higher frequency of grazing and active behaviours compared to NZW rabbits, whereas NZW rabbits showed a higher frequency of resting behaviours and social contact. Breed-related differences in other behaviours and in grass intake patterns were not statistically robust. In conclusion, LV rabbits appear to be better adapted to outdoor conditions, exhibiting greater pasture utilization, but further studies are recommended to confirm these findings and to evaluate their robustness across different seasonal and environmental conditions.
(This article belongs to the Special Issue Livestock Welfare in Extensive Production System)
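
For readers unfamiliar with the analysis, a minimal Generalized Estimating Equations fit on synthetic repeated-measures data might look like the statsmodels sketch below; the variable names, Poisson family, and exchangeable correlation structure are illustrative choices, not the paper's exact specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy longitudinal data: repeated behaviour counts per rabbit.
rng = np.random.default_rng(0)
n_rabbits, n_obs = 30, 8
df = pd.DataFrame({
    "rabbit": np.repeat(np.arange(n_rabbits), n_obs),
    "breed": np.repeat(rng.choice(["LV", "NZW"], n_rabbits), n_obs),
    "week": np.tile(np.arange(n_obs), n_rabbits),
})
df["grazing"] = rng.poisson(lam=np.where(df["breed"] == "LV", 4.0, 2.5))

# GEE with an exchangeable working correlation to account for
# repeated measures within the same animal.
model = sm.GEE.from_formula(
    "grazing ~ breed + week",
    groups="rabbit",
    data=df,
    family=sm.families.Poisson(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```
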
28 pages, 1538 KB  
Article
Video Satellite Visual Tracking of Space Targets with Uncertainties in Camera Parameters and Target Position
by Zikai Zhong, Caizhi Fan and Haibo Song
Remote Sens. 2025, 17(24), 3978; https://doi.org/10.3390/rs17243978 - 9 Dec 2025
Abstract
Video satellites feature agile attitude maneuverability and the capability for continuous target imaging, making them an effective complement to ground-based remote sensing technologies. Existing research on video satellite tracking methods generally assumes either accurately calibrated camera parameters or precisely known target positions. However, deviations in camera parameters and errors in target localization can significantly degrade the performance of current tracking approaches. This paper proposes a novel adaptive visual tracking method for video satellites to track near-circular space targets in the presence of simultaneous uncertainties in both camera parameters and target position. First, the parameters representing these two types of uncertainties are separated through linearization. Then, based on the real-time image tracking error and the current parameter estimates, an update law for the uncertain parameters and a visual tracking law are designed. The stability of the closed-loop system and the convergence of the tracking error are rigorously proven. Finally, quantitative comparisons are conducted using a defined image stability index against two conventional tracking methods. Simulation results demonstrate that under coexisting uncertainties, traditional control methods either fail to track the target or exhibit significant tracking precision degradation. In contrast, the average image error during the steady-state phase exhibits a reduction of approximately one order of magnitude with the proposed method compared to the traditional image-based approach, demonstrating its superior tracking precision under complex uncertainty conditions.
(This article belongs to the Section Satellite Missions for Earth and Planetary Exploration)
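
The control structure described (certainty-equivalence tracking plus an update law driven by the image error) can be illustrated with a one-dimensional toy. The sketch below is a generic adaptive scheme with invented dynamics, not the paper's law or its stability proof.

```python
import numpy as np

# Toy 1-D analogue of adaptive image-error tracking: the pixel error e
# evolves as e' = a*u + b with unknown gain a (a camera-scale parameter)
# and drift b (target motion). A certainty-equivalence controller uses
# the current estimates, which are updated from the prediction error.
a_true, b_true = 2.0, 0.5         # unknown plant parameters
a_hat, b_hat = 1.0, 0.0           # initial estimates
e, dt, kp, gamma = 5.0, 0.01, 2.0, 5.0

for k in range(3000):
    u = -(kp * e + b_hat) / a_hat            # drive estimated error to zero
    e_next = e + dt * (a_true * u + b_true)  # true (unknown) dynamics
    e_pred = e + dt * (a_hat * u + b_hat)    # model prediction
    innov = e_next - e_pred                  # one-step prediction error
    a_hat += gamma * innov * dt * u          # gradient-style update law
    b_hat += gamma * innov * dt
    e = e_next

# The error converges even though the individual parameters need
# persistent excitation to be identified exactly.
print(f"e={e:.4f}, a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```
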
41 pages, 2890 KB  
Article
STREAM: A Semantic Transformation and Real-Time Educational Adaptation Multimodal Framework in Personalized Virtual Classrooms
by Leyli Nouraei Yeganeh, Yu Chen, Nicole Scarlett Fenty, Amber Simpson and Mohsen Hatami
Future Internet 2025, 17(12), 564; https://doi.org/10.3390/fi17120564 - 5 Dec 2025
Abstract
Most adaptive learning systems personalize around content sequencing and difficulty adjustment rather than transforming instructional material within the lesson itself. This paper presents the STREAM (Semantic Transformation and Real-Time Educational Adaptation Multimodal) framework. This modular pipeline decomposes multimodal educational content into semantically tagged, pedagogically annotated units for regeneration into alternative formats while preserving source traceability. STREAM is designed to integrate automatic speech recognition, transformer-based natural language processing, and planned computer vision components to extract instructional elements from teacher explanations, slides, and embedded media. Each unit receives metadata, including time codes, instructional type, cognitive demand, and prerequisite concepts, designed to enable format-specific regeneration with explicit provenance links. For a predefined visual-learner profile, the system generates annotated path diagrams, two-panel instructional guides, and entity pictograms with complete back-link coverage. Ablation studies confirm that individual components contribute measurably to output completeness without compromising traceability. This paper reports results from a tightly scoped feasibility pilot that processes a single five-minute elementary STEM video offline under clean audio–visual conditions. We position the pilot’s limitations as testable hypotheses that require validation across diverse content domains, authentic deployments with ambient noise and bandwidth constraints, multiple learner profiles, including multilingual students and learners with disabilities, and controlled comprehension studies. The contribution is a transparent technical demonstration of feasibility and a methodological scaffold for investigating whether within-lesson content transformation can support personalized learning at scale.
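
A data structure such as the hypothetical ContentUnit below conveys what a "semantically tagged, pedagogically annotated unit" with provenance metadata could look like; every field name is a guess inferred from the abstract, not the framework's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ContentUnit:
    """One annotated unit of a lesson, carrying the kind of provenance
    metadata the framework attaches before regenerating content into
    alternative formats (all fields hypothetical)."""
    unit_id: str
    source_span: tuple[float, float]       # start/end time codes (seconds)
    modality: str                          # "speech", "slide", "embedded_media"
    instructional_type: str                # e.g. "definition", "worked_example"
    cognitive_demand: str                  # e.g. "recall", "apply", "analyze"
    prerequisites: list[str] = field(default_factory=list)
    text: str = ""

    def back_link(self) -> str:
        """Provenance reference tracing regenerated output to its source."""
        s, e = self.source_span
        return f"{self.unit_id}@{s:.1f}-{e:.1f}s"

unit = ContentUnit(
    unit_id="lesson1-007",
    source_span=(92.0, 118.5),
    modality="speech",
    instructional_type="worked_example",
    cognitive_demand="apply",
    prerequisites=["fractions-basics"],
    text="To add 1/2 and 1/4, rewrite both with denominator 4...",
)
print(unit.back_link())
```
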
14 pages, 5237 KB  
Article
Automated Detection of Kinky Back in Broiler Chickens Using Optimized Deep Learning Techniques
by Ramesh Bahadur Bist, Andi Asnayanti, Anh Dang Trieu Do, Yang Tian, Chaitanya Pallerla, Dongyi Wang and Adnan A. K. Alrubaye
AgriEngineering 2025, 7(12), 415; https://doi.org/10.3390/agriengineering7120415 - 4 Dec 2025
Abstract
The global poultry industry faces growing challenges from skeletal disorders, with Kinky Back (KB) significantly impacting broiler welfare and production. KB causes spinal deformities that reduce mobility and feed access and increase mortality. It often remains undetected in early subclinical stages. Traditional KB diagnosis methods are slow and subjective, highlighting the need for automated, objective detection. This study develops a machine learning approach for detecting KB in broilers using image data. Male Cobb 500 broilers were raised under controlled conditions and monitored over 7 weeks using overhead 4K video cameras. Behavioral and posture data related to KB were collected and annotated from images extracted from the videos. First, various optimizers (SGD, Adam, AdamW), image sizes, and data augmentation techniques were compared, and the best-performing optimizer, image size, and data augmentation technique were identified. These findings were then used to train and compare different lightweight YOLO models and to identify the best model, with further modifications to these configurations aimed at improving detection accuracy. Different machine vision models were evaluated using precision, recall, F1-score, and mean average precision metrics to identify the best-performing approach. Among the tested optimizers, SGD achieved the highest precision (100%) and mAP_0.50–0.95 (74.7%), indicating superior localization and lower false-positive rates, while AdamW produced the highest recall (98.9%) with slightly lower precision. An image input size of 960 × 960 pixels yielded the best balance of precision (99.0%), recall (99.4%), and F1-score (99.2%). Data augmentation improved recall and reduced false negatives, confirming its value in enhancing model generalization. Among YOLO architectures, YOLOv9 performed best. Furthermore, the optimized YOLOv9 model, combined with augmentation and 960-sized images, achieved the highest performance, with a precision of 99.1%, recall of 100%, F1-score of 99.5%, and mAP of 78.0%. Overall, the proposed optimized YOLOv9-based system provides a reliable and scalable framework for automated detection of Kinky Back, supporting data-driven welfare management in modern poultry production.
(This article belongs to the Special Issue Precision Farming Technologies for Monitoring Livestock and Poultry)
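
A training sweep in the spirit of the study could be scripted with the ultralytics package roughly as follows; the dataset YAML, epoch budget, and augmentation settings are placeholders, and this is not the authors' training code.

```python
from ultralytics import YOLO

# Sketch of the factor sweep the study describes: optimizer choice at a
# fixed 960 px input size with augmentation enabled.
results = {}
for opt in ("SGD", "Adam", "AdamW"):
    model = YOLO("yolov9c.pt")            # a lightweight YOLOv9 checkpoint
    model.train(
        data="kinky_back.yaml",           # hypothetical dataset config
        imgsz=960,                        # input size the study found best
        optimizer=opt,
        epochs=100,
        mosaic=1.0,                       # example augmentation settings
        fliplr=0.5,
    )
    results[opt] = model.val().box.map    # mAP_0.50-0.95 on the val split
print(results)
```
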
24 pages, 1857 KB  
Article
Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder
by Marie Amale Huynh, Aaron Kline, Saimourya Surabhi, Kaitlyn Dunlap, Onur Cezmi Mutlu, Mohammadmahdi Honarmand, Parnian Azizian, Peter Washington and Dennis P. Wall
Algorithms 2025, 18(12), 764; https://doi.org/10.3390/a18120764 - 2 Dec 2025
Abstract
Early detection of Autism Spectrum Disorder (ASD), a neurodevelopmental condition characterized by social communication challenges, is essential for timely intervention. Naturalistic home videos collected via mobile applications offer scalable opportunities for digital diagnostics. We leveraged GuessWhat, a mobile game designed to engage parents and children, which has generated over 3000 structured videos from 382 children. From this collection, we curated a final analytic sample of 688 feature-rich videos centered on a single dyad, enabling more consistent modeling. We developed a two-step pipeline: (1) filtering to isolate high-quality videos, and (2) feature engineering to extract interpretable behavioral signals. Unimodal LSTM-based models trained on eye gaze, head position, and facial expression achieved test AUCs of 86% (95% CI: 0.79–0.92), 78% (95% CI: 0.69–0.86), and 67% (95% CI: 0.55–0.78), respectively. Late-stage fusion of unimodal outputs significantly improved predictive performance, yielding a test AUC of 90% (95% CI: 0.84–0.95). Our findings demonstrate the complementary value of distinct behavioral channels and support the feasibility of using mobile-captured videos for detecting clinically relevant signals. While further work is needed to improve generalizability and inclusivity, this study highlights the promise of real-time, scalable autism phenotyping for early interventions.
(This article belongs to the Special Issue Algorithms for Computer Aided Diagnosis: 2nd Edition)
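
The late-stage fusion step can be sketched with synthetic unimodal probabilities: train a small meta-classifier on the per-modality outputs and score a held-out AUC. All data below are fabricated for illustration; the real pipeline fuses trained LSTM outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Late-stage fusion: each unimodal model (eye gaze, head position,
# facial expression) emits one probability per video; a meta-classifier
# learns how to weight them.
rng = np.random.default_rng(42)
n = 688                                   # size of the analytic sample
y = rng.integers(0, 2, n)                 # synthetic labels

def fake_unimodal(strength):              # crude stand-in for a trained LSTM
    return np.clip(y * strength + rng.normal(0.3, 0.2, n), 0, 1)

gaze, head, face = fake_unimodal(0.5), fake_unimodal(0.4), fake_unimodal(0.25)
X = np.column_stack([gaze, head, face])

fuser = LogisticRegression().fit(X[:500], y[:500])
fused = fuser.predict_proba(X[500:])[:, 1]
print("fused test AUC:", roc_auc_score(y[500:], fused))
```
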
39 pages, 1170 KB  
Review
Bridging Distance, Delivering Care: Pediatric Tele-Nutrition in the Digital Health Era—A Narrative Review
by Motti Haimi and Liron Inchi
Healthcare 2025, 13(23), 3107; https://doi.org/10.3390/healthcare13233107 - 28 Nov 2025
Abstract
Background: The emergence of telehealth has transformed healthcare delivery across multiple disciplines, with tele-nutrition representing a rapidly evolving field that addresses nutritional assessment, counseling, and management through digital platforms. Objective: This narrative review examines the current landscape of pediatric tele-nutrition services, exploring technological platforms, clinical applications, evidence for effectiveness, implementation considerations, and future directions. Methods: A comprehensive literature search was conducted across PubMed, CINAHL, Embase, and Web of Science databases from January 2010 to October 2025. A total of 114 relevant sources were selected, encompassing randomized controlled trials, observational studies, systematic reviews, implementation studies, clinical guidelines, and policy documents. Results: This review synthesized 114 sources, predominantly from the United States (54%) and European nations (21%), with evidence expansion accelerating post-COVID-19 pandemic. Evidence suggests pediatric tele-nutrition demonstrates clinical outcomes comparable to traditional in-person care across diverse populations including obesity management, diabetes, gastrointestinal disorders, feeding difficulties, metabolic conditions, and preventive nutrition services. Multiple technology platforms are utilized, with synchronous video consultations most common (60–85% of encounters). Benefits include enhanced access to specialized care, increased frequency of contact, reduced family burden, and high satisfaction rates (>80% across most studies). Challenges include limitations in physical assessment, digital equity concerns affecting vulnerable populations, variable reimbursement policies, and the need for provider training. Hybrid models combining virtual and in-person care appear optimal for many conditions. Conclusions: Pediatric tele-nutrition represents a viable and effective care delivery model with particular advantages for families facing geographic, logistic, or access barriers. Continued attention to digital equity, provider training, regulatory frameworks, sustainable reimbursement policies, and rigorous evidence generation will optimize implementation and outcomes. Future directions include artificial intelligence applications, precision nutrition approaches, and expanded global health applications.
(This article belongs to the Special Issue Telemedicine and eHealth Applications in the Pediatric Population)
17 pages, 1552 KB  
Article
Adaptive Pseudo Text Augmentation for Noise-Robust Text-to-Image Person Re-Identification
by Lian Xiong, Wangdong Li, Huaixin Chen and Yuxi Feng
Sensors 2025, 25(23), 7157; https://doi.org/10.3390/s25237157 - 24 Nov 2025
Abstract
Text-to-image person re-identification (T2I-ReID) aims to retrieve pedestrians from images/videos based on textual descriptions. However, most methods implicitly assume that training image–text pairs are correctly aligned, while in practice, issues such as under-correlated and falsely correlated image–text pairs arise due to coarse-grained text annotations and erroneous textual descriptions. To address this problem, we propose a T2I-ReID method based on noise identification and pseudo-text generation. We first extract image–text features using the Contrastive Language–Image Pre-Training (CLIP) model, then employ a token fusion model to select and fuse informative local token features, resulting in a token fusion embedding (TFE) for fine-grained representations. To identify noisy image–text pairs, we apply a two-component Gaussian mixture model (GMM) to fit the per-sample loss distributions computed from the predictions of the basic feature embedding (BFE) and TFE. Finally, when the noise identification stabilizes, we employ a multimodal large language model (MLLM) to generate pseudo-texts that replace the noisy text, facilitating the learning of more reliable visual–semantic associations and cross-modal alignment under noisy conditions. Extensive experiments on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets demonstrate the effectiveness of our proposed model and its good compatibility with other baselines.
(This article belongs to the Section Sensing and Imaging)
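
The noise-identification step has a simple generic form: fit a two-component GMM to per-sample losses and flag the high-mean component. The sklearn sketch below uses synthetic losses and is not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a two-component GMM to per-sample training losses; the component
# with the higher mean is treated as the noisy-pair population.
rng = np.random.default_rng(0)
losses = np.concatenate([
    rng.normal(0.8, 0.2, 900),    # well-aligned pairs: low loss
    rng.normal(2.5, 0.4, 100),    # noisy pairs: high loss
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
noisy_comp = int(np.argmax(gmm.means_.ravel()))
p_noisy = gmm.predict_proba(losses)[:, noisy_comp]
flagged = p_noisy > 0.5          # candidates for pseudo-text replacement
print(f"flagged {flagged.sum()} of {len(losses)} pairs as noisy")
```
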
25 pages, 3837 KB  
Article
Swimming Performance and Behavior of High-Altitude Fish in High-Flow Velocity Environments
by Kaixiao Chen, Guanxi Ding, Yun Li, Gangwei He, Yanteng Zhou and Xiaogang Wang
Animals 2025, 15(22), 3327; https://doi.org/10.3390/ani15223327 - 18 Nov 2025
Abstract
The optimization of fishway design relies on a deep understanding of fish swimming performance and behavioral traits. Traditional methods often underestimate fish swimming performance and overlook their behavior under high-flow conditions, particularly in the context of high-altitude species. This study, based on an open-channel flume system combined with high-speed video tracking and Acoustic Doppler Velocimetry (ADV) measurements, constructs Resource Selection Function–Generalized Additive Mixed Models (RSF-GAMMs) to quantify the swimming performance and behavioral mechanisms of the high-altitude species Schizothorax oconnori Lloyd, 1908 (S. oconnori) in high-velocity environments. The results show that the swimming performance of S. oconnori significantly exceeds estimates from traditional swimming tests and depends strongly on movement mode. Endurance analysis reveals breakpoints in the endurance models, indicating the species' high sensitivity to variations in exercise intensity and showcasing the unique physiological and behavioral characteristics of high-altitude fish. In high-velocity conditions, adult S. oconnori primarily optimizes for energy conservation and stability, selectively choosing water bodies with varying disturbance levels depending on its movement mode and endurance state, thus optimizing path selection. This study presents a systematic method for quantifying the extreme swimming abilities and nonlinear behavioral responses of adult S. oconnori under complex flow conditions, providing scientific guidance for setting hydraulic thresholds and developing protection strategies for fishways.
(This article belongs to the Special Issue Fish Cognition and Behaviour)
29 pages, 4304 KB  
Review
From Pixels to Motion: A Systematic Analysis of Translation-Based Video Synthesis Techniques
by Pratim Saha and Chengcui Zhang
Information 2025, 16(11), 990; https://doi.org/10.3390/info16110990 - 16 Nov 2025
Abstract
Translation-based Video Synthesis (TVS) has emerged as a transformative technology that enables sophisticated manipulation and generation of dynamic visual content. This comprehensive survey systematically examines the evolution of TVS methodologies, encompassing both image-to-video (I2V) and video-to-video (V2V) translation approaches. We analyze the progression from domain-specific facial animation techniques to generalizable diffusion-based frameworks, investigating architectural innovations that address fundamental challenges in temporal consistency and cross-domain adaptation. Our investigation categorizes V2V methods into paired approaches, including conditional GAN-based frameworks and world-consistent synthesis, and unpaired approaches organized into five distinct paradigms: 3D GAN-based processing, temporal constraint mechanisms, optical flow integration, content-motion disentanglement learning, and extended image-to-image frameworks. Through comprehensive evaluation across diverse datasets, we analyze the performance using spatial quality metrics, temporal consistency measures, and semantic preservation indicators. We present a qualitative analysis comparing methods evaluated on identical benchmarks, revealing critical trade-offs between visual quality, temporal coherence, and computational efficiency. Current challenges persist in long-term temporal coherence, with future research directions identified in long-range video generation, audio-visual synthesis for enhanced realism, and development of comprehensive evaluation metrics that better capture human perceptual quality. This survey provides a structured understanding of methodological foundations, evaluation frameworks, and future research opportunities in TVS. We identify pathways for advancing cross-domain generalization, improving computational efficiency, and developing enhanced evaluation metrics for practical deployment, contributing to the broader understanding of temporal video synthesis technologies.
(This article belongs to the Special Issue Computer and Multimedia Technology)
24 pages, 1982 KB  
Article
AI-Augmented Water Quality Event Response: The Role of Generative Models for Decision Support
by Stephen Mounce, Richard Mounce and Joby Boxall
Water 2025, 17(22), 3260; https://doi.org/10.3390/w17223260 - 14 Nov 2025
Abstract
The global water sector faces unprecedented challenges from climate change, rapid urbanisation, and ageing infrastructure, necessitating a shift towards proactive, digital strategies. Historically characterised as “data rich but information poor,” the sector struggles with underutilised and siloed operational data. Traditional machine learning (ML) models have provided a foundation for smart water management, and subsequently deep learning (DL) approaches utilising algorithmic breakthroughs and big data have proved to be even more powerful under the right conditions. This paper explores and reviews the transformative potential of Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs), enabling a paradigm shift towards data-centric thinking. GenAI, particularly when augmented with Retrieval-Augmented Generation (RAG) and agentic AI, can create new content, facilitate natural language interaction, synthesise insights from vast unstructured data (of all types including text, images and video) and automate complex, multi-step workflows. Focusing on the critical area of drinking water quality, we demonstrate how these intelligent tools can move beyond reactive systems. A case study is presented which utilises regulatory reports to mine knowledge, providing GenAI-powered chatbots for accessible insights and improved water quality event management. This approach empowers water professionals with dynamic, trustworthy decision support, enhancing the safety and resilience of drinking water supplies by recalling past actions, generating novel insights and simulating response scenarios.
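
The retrieval half of such a RAG workflow can be shown in miniature: rank report snippets against an operator question and splice the top hits into a prompt. The snippets below are invented, and a production system would use learned embeddings rather than TF-IDF.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Minimal retrieval step of a RAG pipeline over (fictional) report text.
snippets = [
    "Discoloured water events in 2021 were traced to upstream flushing.",
    "Chlorine residual dropped below 0.2 mg/L at DMA-7 during the event.",
    "Customer contacts peaked 6 hours after the pressure transient.",
]
question = "What caused the discoloured water complaints?"

vec = TfidfVectorizer().fit(snippets + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(snippets))[0]
top = [snippets[i] for i in sims.argsort()[::-1][:2]]

# Splice retrieved context into the prompt for the generative model.
prompt = "Context:\n- " + "\n- ".join(top) + f"\n\nQuestion: {question}"
print(prompt)
```
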
19 pages, 4107 KB  
Article
Structured Prompting and Collaborative Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference
by Yunxiang Yang, Ningning Xu and Jidong J. Yang
Computers 2025, 14(11), 490; https://doi.org/10.3390/computers14110490 - 9 Nov 2025
Abstract
Comprehensive highway scene understanding and robust traffic risk inference are vital for advancing Intelligent Transportation Systems (ITS) and autonomous driving. Traditional approaches often struggle with scalability and generalization, particularly under the complex and dynamic conditions of real-world environments. To address these challenges, we introduce a novel structured prompting and multi-agent collaborative knowledge distillation framework that enables automatic generation of high-quality traffic scene annotations and contextual risk assessments. Our framework orchestrates two large vision–language models (VLMs): GPT-4o and o3-mini, using a structured Chain-of-Thought (CoT) strategy to produce rich, multiperspective outputs. These outputs serve as knowledge-enriched pseudo-annotations for supervised fine-tuning of a much smaller student VLM. The resulting compact 3B-scale model, named VISTA (Vision for Intelligent Scene and Traffic Analysis), is capable of understanding low-resolution traffic videos and generating semantically faithful, risk-aware captions. Despite its significantly reduced parameter count, VISTA achieves strong performance across established captioning metrics (BLEU-4, METEOR, ROUGE-L, and CIDEr) when benchmarked against its teacher models. This demonstrates that effective knowledge distillation and structured role-aware supervision can empower lightweight VLMs to capture complex reasoning capabilities. The compact architecture of VISTA facilitates efficient deployment on edge devices, enabling real-time risk monitoring without requiring extensive infrastructure upgrades.
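
Classic logit distillation gives a schematic sense of teacher-student supervision, though the paper distills free-form captions from VLM outputs rather than class logits; the PyTorch sketch below is that generic analogue only.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence from the
    teacher's temperature-softened distribution (classic logit
    distillation; a schematic analogue of caption-level distillation)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 samples, 10-way classification.
s = torch.randn(4, 10, requires_grad=True)
t = torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
distillation_loss(s, t, y).backward()
```
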
21 pages, 36892 KB  
Article
Self-Supervised Depth and Ego-Motion Learning from Multi-Frame Thermal Images with Motion Enhancement
by Rui Yu, Guoliang Ma, Jian Guo and Lisong Xu
Appl. Sci. 2025, 15(22), 11890; https://doi.org/10.3390/app152211890 - 8 Nov 2025
Abstract
Thermal cameras are known for their ability to overcome lighting constraints and provide reliable thermal radiation images. This capability facilitates methods for depth and ego-motion estimation, enabling efficient learning of poses and scene structures under all-day conditions. However, existing studies on depth prediction from thermal images are limited. In practical applications, thermal cameras capture sequential frames. Unfortunately, the potential of this multi-frame aspect is underutilized by previous methods, limiting the depth prediction accuracy achievable on thermal videos. To leverage the multi-frame advantages of thermal videos and to improve the accuracy of monocular depth estimation from thermal images, we propose a framework for self-supervised depth and ego-motion learning from multi-frame thermal images. We construct a multi-view stereo (MVS) cost volume from temporally adjacent thermal frames. The construction process is adjusted based on the estimated pose, which serves as a motion hint. To stabilize the motion hint and improve pose estimation accuracy, we design a motion enhancement module that utilizes self-generated poses for additional supervisory signals. Additionally, we introduce RGB images in the training phase to form a multi-spectral loss, thereby augmenting the performance of the thermal model. Experiments conducted on a public dataset demonstrate that the proposed method accurately estimates depth and ego-motion across varying light conditions, surpassing the performance of the self-supervised baseline.
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
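
A cost volume can be illustrated in a heavily simplified form: sweep a set of disparity hypotheses by shifting the source features and correlating with the reference. Real MVS construction warps with pose and intrinsics (the paper's motion hint); this toy assumes rectified, purely horizontal camera motion.

```python
import torch

def sweep_cost_volume(ref, src, max_disp=8):
    """Toy plane sweep: score each disparity hypothesis by correlating
    the reference features with a horizontally shifted source frame.
    Inputs are feature maps of shape (B, C, H, W)."""
    b, c, h, w = src.shape
    costs = []
    for d in range(max_disp):
        shifted = torch.zeros_like(src)
        shifted[..., : w - d] = src[..., d:]        # shift left by d pixels
        costs.append((ref * shifted).mean(dim=1))   # per-pixel correlation
    return torch.stack(costs, dim=1)                # (B, D, H, W)

ref = torch.randn(1, 16, 32, 64)
src = torch.randn(1, 16, 32, 64)
vol = sweep_cost_volume(ref, src)
disparity = vol.argmax(dim=1)   # winner-take-all hypothesis per pixel
```
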
24 pages, 2447 KB  
Article
Augmented Gait Classification: Integrating YOLO, CNN–SNN Hybridization, and GAN Synthesis for Knee Osteoarthritis and Parkinson’s Disease
by Houmem Slimi, Ala Balti, Mounir Sayadi and Mohamed Moncef Ben Khelifa
Signals 2025, 6(4), 64; https://doi.org/10.3390/signals6040064 - 7 Nov 2025
Abstract
We propose a novel hybrid deep learning framework that synergistically integrates Convolutional Neural Networks (CNNs), Spiking Neural Networks (SNNs), and Generative Adversarial Networks (GANs) for robust and accurate classification of high-resolution frontal and sagittal human gait video sequences—capturing both lower-limb kinematics and upper-body posture—from subjects with Knee Osteoarthritis (KOA), Parkinson’s Disease (PD), and healthy Normal (NM) controls, classified into three disease-type categories. Our approach first employs a tailored CNN backbone to extract rich spatial features from fixed-length clips (e.g., 16 frames resized to 128 × 128 px), which are then temporally encoded and processed by an SNN layer to capture dynamic gait patterns. To address class imbalance and enhance generalization, a conditional GAN augments rare severity classes with realistic synthetic gait sequences. Evaluated on the controlled, marker-based KOA-PD-NM laboratory public dataset, our model achieves an overall accuracy of 99.47%, a sensitivity of 98.4%, a specificity of 99.0%, and an F1-score of 98.6%, outperforming baseline CNN, SNN, and CNN–SNN configurations by over 2.5% in accuracy and 3.1% in F1-score. Ablation studies confirm that GAN-based augmentation yields a 1.9% accuracy gain, while the SNN layer provides critical temporal robustness. Our findings demonstrate that this CNN–SNN–GAN paradigm offers a powerful, computationally efficient solution for high-precision, gait-based disease classification, achieving a 48.4% reduction in FLOPs (1.82 GFLOPs to 0.94 GFLOPs) and 9.2% lower average power consumption (68.4 W to 62.1 W) on a Kaggle P100 GPU compared to CNN-only baselines. The hybrid model demonstrates significant potential for energy savings on neuromorphic hardware, with an estimated 13.2% reduction in energy per inference based on FLOP-based analysis, positioning it favorably for deployment in resource-constrained clinical environments and edge computing scenarios.
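
The SNN component can be approximated by a minimal leaky integrate-and-fire layer over per-frame CNN features; the pure-PyTorch sketch below omits surrogate-gradient training and is a stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Minimal leaky integrate-and-fire layer: integrates a sequence of
    frame features into spike trains (an SNN-library stand-in)."""
    def __init__(self, beta=0.9, threshold=1.0):
        super().__init__()
        self.beta, self.threshold = beta, threshold

    def forward(self, x_seq):                 # (T, B, F) input currents
        mem = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:                       # one gait frame per step
            mem = self.beta * mem + x         # leaky integration
            spk = (mem >= self.threshold).float()
            mem = mem - spk * self.threshold  # soft reset after firing
            spikes.append(spk)
        return torch.stack(spikes)            # (T, B, F) spike trains

# 16-frame clip -> stubbed per-frame CNN features -> spikes -> class head.
frames = torch.randn(16, 2, 128)              # (T, B, feature_dim)
spikes = LIFLayer()(frames)
logits = nn.Linear(128, 3)(spikes.mean(dim=0))  # KOA / PD / NM
```
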