MDPI - Publisher of Open Access Journals

21 pages, 12849 KB

Open Access Article

VETA-CLIP: Lightweight Video Adaptation with Efficient Spatio-Temporal Attention and Variation Loss

by Jing Huang and Jiaxin Liao

Electronics 2026, 15(8), 1701; https://doi.org/10.3390/electronics15081701 - 17 Apr 2026

Viewed by 152

Full fine-tuning of large-scale vision-language models for video action recognition incurs prohibitive computational cost and often degrades pre-trained spatial representations. To address this, we propose VETA-CLIP, a Video Efficient Temporal Adaptation framework that enhances temporal modeling while preserving cross-modal alignment. By incorporating lightweight adapters into a frozen backbone, VETA-CLIP introduces only 3.55M trainable parameters (a 98% reduction compared to full fine-tuning). Our approach features two key innovations: (1) an Efficient Spatio-Temporal Attention (ESTA) mechanism with a parameter-free boundary replication temporal shift (BRTS) module, which explicitly decouples spatial and temporal attention heads to capture inter-frame dynamics while minimizing disruption to the pre-trained spatial representations; and (2) a novel Variation Loss that maximizes both local inter-frame differences and global temporal variance, encouraging the model to focus on action-related changes rather than static backgrounds. Extensive experiments on HMDB-51, UCF-101, and Something-Something v2 demonstrate that VETA-CLIP achieves competitive performance across zero-shot, base-to-novel, and few-shot protocols, and remains competitive on the Kinetics-400 dataset. Notably, our eight-frame variant requires only 4.7 GB of peak GPU memory and 2.47 ms of inference per video, demonstrating exceptional computational efficiency alongside consistent accuracy gains.
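As a rough illustration of what a parameter-free temporal shift with boundary replication could look like, the sketch below shifts one group of channels from the previous frame and another group from the next frame, and reuses the first or last frame at the clip boundaries instead of padding with zeros. The abstract does not specify the mechanics of BRTS, so the channel grouping, the `fold` parameter, and the function name are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]  # one feature vector (channels) per frame


def boundary_replicated_shift(frames: List[Frame], fold: int) -> List[Frame]:
    """Hypothetical temporal shift with boundary replication.

    Channels [0, fold) are taken from the previous frame, channels
    [fold, 2*fold) from the next frame, and the rest stay in place.
    At the first/last frame the missing neighbour is replaced by the
    boundary frame itself (replication), not by zeros.
    """
    num_frames = len(frames)
    shifted = []
    for t, frame in enumerate(frames):
        prev_frame = frames[max(t - 1, 0)]               # replicate first frame
        next_frame = frames[min(t + 1, num_frames - 1)]  # replicate last frame
        shifted.append(
            prev_frame[:fold] + next_frame[fold:2 * fold] + frame[2 * fold:]
        )
    return shifted


clip = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
shifted_clip = boundary_replicated_shift(clip, fold=1)
```

The operation has no learnable weights, which is consistent with the "parameter-free" description; whether the published module groups channels this way is not stated in the abstract.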

(This article belongs to the Section Artificial Intelligence)

40 pages, 12177 KB

Open Access Article

Dynamic Multi-Relation Learning with Multi-Scale Hypergraph Transformer for Multi-Modal Traffic Forecasting

by Juan Chen and Meiqing Shan

Future Transp. 2026, 6(1), 51; https://doi.org/10.3390/futuretransp6010051 - 22 Feb 2026

Viewed by 482

Abstract

Accurate multi-modal traffic demand forecasting is key to optimizing intelligent transportation systems (ITSs). To overcome the shortcomings of existing methods in capturing dynamic high-order correlations between heterogeneous spatial units and decoupling intra- and inter-mode dependencies at multiple time scales, this paper proposes a Dynamic Multi-Relation Learning with Multi-Scale Hypergraph Transformer method (MST-Hyper Trans). The model integrates three novel modules. Firstly, the Multi-Scale Temporal Hypergraph Convolutional Network (MSTHCN) achieves collaborative decoupling and captures periodic and cross-modal temporal interactions of transportation demand at multiple granularities, such as time, day, and week, by constructing a multi-scale temporal hypergraph. Secondly, the Dynamic Multi-Relationship Spatial Hypergraph Network (DMRSHN) integrates geographic proximity, passenger flow similarity, and transportation connectivity to construct structural hyperedges and combines KNN and K-means algorithms to generate dynamic hyperedges, thereby modeling the dynamically evolving high-order spatial correlations between heterogeneous nodes. Finally, the Conditional Meta Attention Gated Fusion Network (CMAGFN), a lightweight meta network, introduces a gating mechanism based on multi-head cross-attention. It dynamically generates node features from the real-time traffic context and adaptively calibrates the fusion weights of multi-source information, yielding scene-aware prediction decisions. Experiments on three real-world datasets (NYC-Taxi, -Bike, and -Subway) demonstrate that MST-Hyper Trans achieves an average reduction of 7.6% in RMSE and 9.2% in MAE across all modes compared to the strongest baseline, while maintaining interpretability of spatiotemporal interactions. This study not only provides good model interpretability but also offers a reliable solution for multi-modal traffic collaborative management.

(This article belongs to the Special Issue Recent Advances in Artificial Intelligence and Big Data for Intelligent Transportation Systems)

19 pages, 358 KB

Open Access Article

Edge-Level Forest Fire Prediction with Selective Communication in Hierarchical Wireless Sensor Networks

by Ahshanul Haque and Hamdy Soliman

Electronics 2026, 15(4), 881; https://doi.org/10.3390/electronics15040881 - 20 Feb 2026

Viewed by 421

Abstract

Wildfire events are increasing in frequency and severity, creating an urgent need for early, accurate, and energy-efficient forest fire prediction systems that can operate at a large scale. A fundamental challenge in edge-level forest fire prediction lies in jointly achieving high detection accuracy while minimizing wireless transmissions and communication-related energy consumption. This paper proposes a communication-aware hierarchical wireless sensor network (WSN) framework that performs fire versus normal environmental state classification directly at the network edge. Multi-modal physical and constrained virtual sensor readings are fused into short-term temporal supervectors and processed locally using lightweight random forest classifiers deployed on sensor nodes and cluster heads. A temporal 2-of-3 voting mechanism is applied at the edge to suppress transient noise and improve prediction reliability before triggering communication. The proposed design enables selective, event-driven transmission, where only temporally validated abnormal states are forwarded through the hierarchy, thereby decoupling detection accuracy from continuous data reporting. Extensive experiments using real multi-modal environmental sensor data and statistically rigorous 5-fold GroupKFold cross-validation (ensuring strict node-level separation between training and testing) demonstrate the effectiveness of the approach. The proposed framework achieves a node-level accuracy of 98.82 ± 1.75% and a scenario-level detection accuracy of 96.52 ± 0.89%. Compared to periodic reporting and the LEACH protocol, the system reduces both wireless transmissions and communication-related energy consumption by more than 66% across network sizes ranging from 100 to 1000 nodes.
The main contributions of this work are summarized as follows: (1) a communication-aware hierarchical Edge-AI framework for early forest fire prediction that performs local inference and temporal validation directly at sensor nodes; (2) a constrained virtual sensing strategy integrated with temporal supervector modeling to enhance spatial coverage while preserving reliability; and (3) a statistically rigorous large-scale evaluation demonstrating joint optimization of prediction accuracy, transmission reduction, and communication energy efficiency across network sizes ranging from 100 to 1000 nodes. These results show that accurate early forest fire prediction can be achieved through edge-level inference and selective communication, substantially extending network lifetime while maintaining statistically reliable detection performance.
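The temporal 2-of-3 voting step can be sketched in a few lines. The version below is a minimal illustration, assuming a sliding window over the three most recent per-sample classifier outputs (1 for fire, 0 for normal) and a transmission whenever at least two of them are 1; the window policy and the function names are assumptions, not details taken from the paper.

```python
from collections import deque
from typing import Iterable, Iterator, List


def two_of_three_vote(predictions: Iterable[int]) -> Iterator[bool]:
    """Yield True when at least 2 of the last 3 predictions flag fire."""
    window = deque(maxlen=3)  # keeps only the three most recent outputs
    for prediction in predictions:
        window.append(1 if prediction else 0)
        yield sum(window) >= 2


def transmission_steps(predictions: Iterable[int]) -> List[int]:
    """Indices of time steps at which the node would transmit an alarm."""
    return [t for t, alarm in enumerate(two_of_three_vote(predictions)) if alarm]


# An isolated spike at step 1 is suppressed; the cluster at steps 4-7 is not.
raw_outputs = [0, 1, 0, 0, 1, 1, 0, 1, 0]
alarm_steps = transmission_steps(raw_outputs)
```

With this policy a single noisy positive never triggers a transmission, which is how the voting decouples communication from raw per-sample predictions.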

(This article belongs to the Special Issue AI and Machine Learning in Recommender Systems and Customer Behavior)

29 pages, 3243 KB

Open Access Article

A Platform-Agnostic Publish–Subscribe Architecture with Dynamic Optimization

by Ahmed Twabi, Yepeng Ding and Tohru Kondo

Future Internet 2025, 17(9), 426; https://doi.org/10.3390/fi17090426 - 19 Sep 2025

Cited by 1 | Viewed by 989

Abstract

Real-time media streaming over publish–subscribe platforms is increasingly vital in scenarios that demand the scalability of event-driven architectures while ensuring timely media delivery. This is especially true in multi-modal and resource-constrained environments, such as IoT, Physical Activity Recognition and Measure (PARM), and Internet of Video Things (IoVT), where integrating sensor data with media streams often leads to complex hybrid setups that compromise consistency and maintainability. Publish–subscribe (pub/sub) platforms like Kafka and MQTT offer scalability and decoupled communication but fall short in supporting real-time video streaming due to platform-dependent design, rigid optimization, and poor sub-second media handling. This paper presents FrameMQ, a layered, platform-agnostic architecture designed to overcome these limitations by decoupling application logic from platform-specific configurations and enabling dynamic real-time optimization. FrameMQ exposes tunable parameters such as compression and segmentation, allowing integration with external optimizers. Using Particle Swarm Optimization (PSO) as an exemplary optimizer, FrameMQ reduces total latency from over 2300 ms to below 400 ms under stable conditions (over an 80% improvement) and maintains up to a 52% reduction under adverse network conditions. These results demonstrate FrameMQ’s ability to meet the demands of latency-sensitive applications, such as real-time streaming, IoT, and surveillance, while offering portability, extensibility, and platform independence without modifying the core application logic.
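The quoted improvement can be checked with simple arithmetic. The snippet below is only a worked calculation on the two round figures given in the abstract (2300 ms and 400 ms); the helper name is hypothetical, and the actual measured latencies may differ from these bounds.

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative latency reduction, in percent of the original latency."""
    return 100.0 * (before_ms - after_ms) / before_ms


# (2300 - 400) / 2300 = 1900 / 2300, i.e. roughly 82.6%
stable_reduction = percent_reduction(2300.0, 400.0)
```

Since the abstract states "over 2300 ms" and "below 400 ms", this figure is a lower bound on the reduction, consistent with the stated "over 80%".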

16 pages, 3175 KB

Open Access Article

Research and Optimization of Key Technologies for Manure Cleaning Equipment Based on a Profiling Wheel Mechanism

by Fengxin Yan, Can Gao, Lishuang Ren, Jiahao Li and Yuanda Gao

AgriEngineering 2025, 7(9), 287; https://doi.org/10.3390/agriengineering7090287 - 3 Sep 2025

Viewed by 1256

Abstract

This study addresses the problems of poor dynamic stability, high vibration coupling, and inefficient energy use in large-farm manure handling machines. A multi-disciplinary approach based on a profiling wheel is proposed. Using a rocker arm prototype, double-ball heads, and a hydraulic damping system, a parametric design is built that accounts for vibration and energy consumption. Simulation results in EDEM 2022 and ANSYS 2022 confirm the structure's viability and motion compensation capability, while NSGA-II optimizes the damping parameters (k₁ = 380 kN/m, C = 1200 Ns/m). The results show a 14.7% reduction in σ_Fc, a 14.3% decrease in α_RMS, resonance avoidance (14–18 Hz), Δx (horizontal offset of the frame) < 5 mm, a reduction in power loss from 18% to 12.5%, and a 62% stability improvement. The novel contributions include a dynamic model that combines Hertz contact theory with the modal decoupling method, together with an adaptive damping algorithm and a mechanical-hydraulic-control-oriented optimization platform. Future work could integrate lightweight materials and multi-machine collaboration for smarter, greener manure cleaning.

(This article belongs to the Section Agricultural Mechanization and Machinery)

25 pages, 732 KB

Open Access Article

Accuracy-Aware MLLM Task Offloading and Resource Allocation in UAV-Assisted Satellite Edge Computing

by Huabing Yan, Hualong Huang, Zijia Zhao, Zhi Wang and Zitian Zhao

Drones 2025, 9(7), 500; https://doi.org/10.3390/drones9070500 - 16 Jul 2025

Cited by 2 | Viewed by 3233

Abstract

This paper presents a novel framework for optimizing multimodal large language model (MLLM) inference through task offloading and resource allocation in UAV-assisted satellite edge computing (SEC) networks. MLLMs leverage transformer architectures to integrate heterogeneous data modalities for IoT applications, particularly real-time monitoring in remote areas. However, cloud computing dependency introduces latency, bandwidth, and privacy challenges, while IoT device limitations require efficient distributed computing solutions. SEC, utilizing low Earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), extends mobile edge computing to provide ubiquitous computational resources for remote IoT devices (IoTDs). We formulate the joint optimization of MLLM task offloading and resource allocation as a mixed-integer nonlinear programming (MINLP) problem, minimizing latency and energy consumption while optimizing offloading decisions, power allocation, and UAV trajectories. To address the dynamic SEC environment characterized by satellite mobility, we propose an action-decoupled soft actor–critic (AD-SAC) algorithm with discrete–continuous hybrid action spaces. The simulation results demonstrate that our approach significantly outperforms conventional deep reinforcement learning baselines in both convergence and system cost reduction.

16 pages, 1863 KB

Open Access Article

Parameter-Matching Multi-Objective Optimization for Diesel Engine Torsional Dampers

by Zhongxu Tian and Zhongda Ge

Appl. Sci. 2025, 15(10), 5639; https://doi.org/10.3390/app15105639 - 18 May 2025

Viewed by 1103

Abstract

Torsional vibration dampers effectively mitigate torsional oscillations and additional stresses in diesel engine crankshaft systems, ensuring operational safety and reliability. Traditional damper selection principles, grounded in dual-pendulum dynamic models, focus on minimizing maximum torsional angles but fail to accurately characterize vibration behaviors in multi-cylinder engines. This study addresses this limitation by investigating dynamic modeling and numerical methods for an eight-cylinder diesel crankshaft system. A torsional vibration model was developed using Cholesky decomposition and the Jacobi sweep method for free vibration analysis, followed by dynamic response calculations through model decoupling and modal superposition. Parameter optimization of the damper was achieved via the NSGA-II multi-objective algorithm combined with a Bayesian-hyperparameter-optimized BP neural network. The results show that high-inertia-ratio dampers effectively suppress vibration and stress, while low-inertia-ratio configurations require approximately 20% elevated tuning ratios beyond theoretical parameters to achieve an additional 5% stress reduction, albeit with amplified torsional oscillations. Additionally, the study critically evaluates the numerical reliability of conventional dual-pendulum-based tuning ratio selection methods. This integrated approach enhances the precision of damper parameter matching for multi-cylinder engine applications.
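The free-vibration and modal-superposition steps mentioned above follow a standard pattern, which can be written out as a sketch. The symbols are assumptions, since the abstract gives none: J is a symmetric positive-definite inertia matrix, K the torsional stiffness matrix, T(t) the excitation torque vector, θ(t) the vector of twist angles, and ζ_i an assumed modal damping ratio; the modes φ_i are taken to be inertia-normalized (φ_iᵀ J φ_i = 1).

```latex
\begin{align}
  \mathbf{K}\,\boldsymbol{\phi} &= \omega^{2}\,\mathbf{J}\,\boldsymbol{\phi},
  & \mathbf{J} &= \mathbf{L}\mathbf{L}^{\mathsf T}
  \quad \text{(Cholesky factorization)} \\
  \left(\mathbf{L}^{-1}\mathbf{K}\,\mathbf{L}^{-\mathsf T}\right)\mathbf{y} &= \omega^{2}\,\mathbf{y},
  & \mathbf{y} &= \mathbf{L}^{\mathsf T}\boldsymbol{\phi} \\
  \boldsymbol{\theta}(t) &= \sum_{i} \boldsymbol{\phi}_i\, q_i(t),
  & \ddot q_i + 2\zeta_i\omega_i\,\dot q_i + \omega_i^{2}\, q_i
  &= \boldsymbol{\phi}_i^{\mathsf T}\,\mathbf{T}(t)
\end{align}
```

The Cholesky factor turns the generalized eigenproblem into a standard symmetric one, to which a Jacobi sweep can be applied; the resulting modes then decouple the response into independent single-degree-of-freedom equations, provided the damping is diagonalized by the same modes.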

(This article belongs to the Section Acoustics and Vibrations)

13 pages, 1163 KB

Open Access Article

A Decoupled Modal Reduction Method for the Steady-State Vibration Analysis of Vibro-Acoustic Systems with Non-Classical Damping

by Ruxin Gao and Shanshan Fan

Acoustics 2024, 6(3), 792-804; https://doi.org/10.3390/acoustics6030044 - 23 Sep 2024

Viewed by 2060

Abstract

This paper presents a decoupled modal reduction method for the steady-state vibration analysis of vibro-acoustic systems characterized by non-classical damping. The proposed approach initially reduces the order of the coupled governing equations of the vibro-acoustic system through the utilization of non-coupled modes, subsequently employing the complex mode superposition technique to address non-classical damping effects. By leveraging non-coupled modes, this method circumvents the need to solve for coupled modes as required in traditional modal reduction techniques, thereby diminishing both computational complexity and cost. Furthermore, the complex mode superposition method facilitates the decoupling of coupled governing equations with non-classical damping, enhancing computational efficiency. Numerical examples validate both the accuracy and effectiveness of this methodology. Given that modal decomposition is independent of frequency, an analysis of computational efficiency across various stages further substantiates that this method offers significant advantages in terms of efficiency for computational challenges encountered over a broad frequency range.

(This article belongs to the Special Issue Vibration and Noise (2nd Edition))

Search Results (8)