Search Results (2,433)

Search Parameters:
Keywords = multimodal dataset

27 pages, 1739 KB  
Article
Optimization of Soil Steam Sterilization for Panax notoginseng Based on SVR Multi-Output Prediction and Multi-Decision Mode
by Liangsheng Jia, Bohao Min, Liang Yang, Yanning Yang, Hao Zhang and Xiangxiang He
Agronomy 2026, 16(9), 877; https://doi.org/10.3390/agronomy16090877 (registering DOI) - 26 Apr 2026
Abstract
Empirical parameter settings in steam-based soil disinfestation for Panax notoginseng (a valuable medicinal plant) often hinder the simultaneous optimization of pathogen control and energy efficiency. To address this limitation, this study aims to develop a parameter regulation framework that integrates multi-output regression with scenario-oriented intelligent decision-making. Initially, a comprehensive dataset comprising critical parameters—steam pressure (Psteam), soil compaction (Csoil), and heating time (theat)—was established. A random search (RS) hyperparameter optimization scheme was employed to comparatively evaluate the multi-output predictive performance of Random Forest (RF), Support Vector Regression (SVR), and Multilayer Perceptron (MLP) for the joint estimation of soil temperature (Tsoil) and root-rot pathogen kill rate (Killrate). Subsequently, by integrating total energy consumption (Etotal) and operating electricity cost models, a constrained search algorithm was implemented to develop three objective-oriented decision modes: “maximize Killrate”, “minimize Celectricity”, and “maximize Efficiency”. Results demonstrate that the RS-optimized SVR yielded superior multi-output performance, achieving R2 of 0.968 for Tsoil (MAE = 2.44 °C) and 0.808 for Killrate (MAE = 7.85%). Compared to conventional empirical configurations, the proposed decision modes exhibited significant advantages across diverse scenarios. In the “maximize Killrate” mode, dynamic extensions of theat facilitated theoretical complete inactivation even under challenging heating conditions, effectively eliminating disinfection “blind spots” inherent in fixed-duration strategies. Under the “minimize Celectricity” mode, precise regulation of Psteam reduced operational electricity costs by 18.2% while satisfying the constraint of Killrate ≥ 95%. 
Furthermore, the “maximize Efficiency” mode identified an optimal operating point at Csoil = 64 kPa (Psteam = 0.4 MPa, theat = 13 min), thereby mitigating performance degradation associated with excessive tillage or high media rigidity and achieving an optimized cost–benefit ratio. By synthesizing high-fidelity multi-output regression with a flexible multi-mode decision-making framework, this study provides an intelligent solution for soil disinfestation in protected agriculture, facilitating the coordinated optimization of phytosanitary efficacy, energy expenditure, and economic viability. Full article
(This article belongs to the Section Soil and Plant Nutrition)
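The three decision modes described in this abstract amount to a constrained search over a surrogate model's predictions. A minimal sketch, assuming a hypothetical stand-in `predict(p_steam, t_heat) -> (kill_rate %, cost)` in place of the paper's trained SVR; the grid ranges, mode names, and kill-rate floor below are illustrative, not the authors' exact settings:

```python
def decide(predict, mode, kill_floor=95.0):
    """Constrained grid search over steam pressure (MPa) and heating time (min).

    `predict` stands in for a trained surrogate such as the paper's SVR:
    (p_steam, t_heat) -> (kill_rate %, electricity cost).
    """
    candidates = [(p, t, *predict(p, t))
                  for p in (0.1, 0.2, 0.3, 0.4, 0.5)
                  for t in range(5, 31)]
    if mode == "max_kill":                      # "maximize Killrate"
        return max(candidates, key=lambda c: c[2])
    feasible = [c for c in candidates if c[2] >= kill_floor]
    if not feasible:
        return None                             # constraint unsatisfiable on this grid
    if mode == "min_cost":                      # "minimize Celectricity" s.t. Killrate >= floor
        return min(feasible, key=lambda c: c[3])
    if mode == "max_efficiency":                # kill rate per unit cost
        return max(feasible, key=lambda c: c[2] / c[3])
    raise ValueError(f"unknown mode: {mode}")
```

Each mode returns a `(p_steam, t_heat, kill_rate, cost)` tuple; swapping in a real regressor only changes `predict`.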
20 pages, 1844 KB  
Article
AI-Enhanced Prognostic Model for Predicting Polyp Recurrence and Guiding Post-Polypectomy Surveillance Intervals Using the ERCPMP-V5 Dataset
by Sri Harsha Boppana, Sachin Sravan Kumar Komati, Ritwik Raj, Gautam Maddineni, Raja Chandra Chakinala, Pradeep Yarra, Venkata C. K. Sunkesula and Cyrus David Mintz
J. Clin. Med. 2026, 15(9), 3303; https://doi.org/10.3390/jcm15093303 (registering DOI) - 26 Apr 2026
Abstract
Introduction: Colorectal cancer remains a leading cause of cancer-related morbidity and mortality, with adenomatous polyps representing a common precursor. Post-polypectomy polyp recurrence represents a significant risk of colorectal cancer, driving periodic colonoscopy surveillance and polypectomy as needed. In this study, we explore a multimodal machine learning approach that integrates endoscopic imaging with clinical and pathology data to improve recurrence risk prediction and support individualized surveillance planning. Methods: We developed and evaluated a multimodal artificial intelligence (AI) model to predict post-polypectomy colorectal polyp recurrence using the ERCPMP-v5 dataset. The cohort included 217 patients with 796 high-resolution endoscopic RGB images and 21 endoscopic videos; video data were converted to still frames at 2 frames per second. Images and frames were resized to 224 × 224 pixels and normalized. Patient-level demographic, morphological (Paris, Kudo Pit, JNET), anatomical, and pathological variables were encoded using standard scaling for continuous features and one-hot encoding for categorical features. Visual representations were extracted using a pretrained Vision Transformer backbone (ViT-Base-Patch16-224) with frozen weights. Structured metadata (79 variables) was encoded using a multilayer perceptron. A late fusion framework used image and metadata representations to generate a recurrence probability via a sigmoid classifier; probabilities were thresholded at 0.5 for binary prediction. Model performance was evaluated on a held-out test set using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). We additionally compared fusion performance with image-only and metadata-only baselines. Predicted probabilities were translated to surveillance recommendations using risk tiers: low risk (0.00 ≤ p < 0.20), moderate risk (0.20 ≤ p < 0.50), and high risk (p ≥ 0.50). 
Results: On the test set, the multimodal fusion model achieved 90.4% accuracy, 86.7% precision, 83.1% recall, 84.9% F1-score, and an AUC of 0.920. The image-only model achieved 84.6% accuracy (AUC 0.880), and the metadata-only model achieved 81.9% accuracy (AUC 0.850), indicating improved performance with multimodal fusion. Risk stratification enabled surveillance recommendations of 1–3 years for low risk, 6–12 months for moderate risk, and 3–6 months for high risk. Conclusions: A late-fusion multimodal model integrating endoscopic imaging with structured clinical and pathology variables demonstrated excellent performance for predicting post-polypectomy recurrence and generated actionable risk-based surveillance intervals. This approach may support individualized follow-up planning and more efficient allocation of surveillance resources, while prioritizing timely evaluation for patients at higher predicted risk. Full article
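The risk tiers quoted in this abstract reduce to a thresholding step on the fused model's output probability. A minimal sketch of that final mapping, using the paper's stated cut-points and intervals (the function name is ours):

```python
def surveillance_interval(p):
    """Map a predicted recurrence probability to the paper's risk tiers:
    low (0.00 <= p < 0.20), moderate (0.20 <= p < 0.50), high (p >= 0.50)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p < 0.20:
        return "low", "1-3 years"
    if p < 0.50:
        return "moderate", "6-12 months"
    return "high", "3-6 months"
```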
22 pages, 3386 KB  
Article
UAV Visual Localization via Multimodal Fusion and Multi-Scale Attention Enhancement
by Yiheng Wang, Yushuai Zhang, Zhenyu Wang, Jianxin Guo, Feng Wang, Rui Zhu and Dejing Lin
Sustainability 2026, 18(9), 4277; https://doi.org/10.3390/su18094277 (registering DOI) - 25 Apr 2026
Abstract
For power-grid applications such as transmission corridor inspection, substation asset inspection, and post-disaster emergency repair, reliable UAV self-localization under GNSS-degraded or GNSS-denied conditions is critical to ensuring operational safety and accurate defect geotagging. Due to substantial discrepancies in viewpoint, scale, and geometric structure between oblique UAV images and nadir satellite images, conventional RGB-based cross-view retrieval methods often suffer from unstable alignment and insufficient geometric modeling, particularly in scenarios with repetitive textures and partial overlap. To address these challenges, we propose a cross-view visual geo-localization model that integrates RGBD multimodal inputs with multi-scale attention enhancement. Specifically, MiDaS is used to estimate relative depth from UAV imagery, which is concatenated with RGB to form a four-channel input, while satellite images are padded with an additional zero channel to maintain dimensional consistency. A shared-weight ViTAdapter is adopted to learn joint semantic–geometric representations, and a lightweight Efficient Multi-scale Attention (EMA) module is adopted on spatial feature maps to strengthen multi-scale spatial consistency. In addition, an IoU-weighted InfoNCE loss is employed to accommodate partial matching during training, thereby improving the robustness of feature alignment. Experiments on the GTA-UAV dataset under the cross-area protocol show stable performance across both retrieval and localization metrics. Specifically, Recall@1, Recall@5, and Recall@10 reach 18.12%, 38.83%, and 49.47%, respectively; AP is 28.01 and SDM@3 is 0.53; meanwhile, the top-1 geodesic distance error Dis@1 is 1052.73 m. These results indicate that explicit geometric priors combined with multi-scale spatial enhancement can effectively improve cross-view feature alignment, leading to enhanced robustness and accuracy for localization in challenging power inspection scenarios. 
Full article
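The IoU-weighted InfoNCE loss mentioned in this abstract can be sketched compactly: each query's contrastive term is weighted by the ground-overlap IoU of its pair, so partially matching pairs contribute proportionally less. This is our reading of the abstract, not the authors' implementation:

```python
import math

def iou_weighted_infonce(sims, pos_idx, ious, tau=0.1):
    """sims[i][j]: similarity between query i and reference j;
    pos_idx[i]: index of query i's paired reference;
    ious[i]: ground-overlap IoU of that pair, used as the term weight."""
    total, weight = 0.0, 0.0
    for i, row in enumerate(sims):
        logits = [s / tau for s in row]
        m = max(logits)                     # stabilise the softmax
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        log_p = logits[pos_idx[i]] - log_z  # log-softmax of the positive
        total += ious[i] * -log_p
        weight += ious[i]
    return total / weight
```

Pairs with zero overlap drop out of both the numerator and the normaliser, which is one natural way to accommodate partial matching during training.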
22 pages, 2316 KB  
Article
MVDFusion: Multimodal Vehicle Detection in Foggy Weather Using LiDAR and Radar Fusion
by Jiake Tian, Yan Gao, Xin Xia, Guoliang Ju, Peijun Ye, Sijie Tang, Hong Wang and Xucong Wang
Sensors 2026, 26(9), 2663; https://doi.org/10.3390/s26092663 (registering DOI) - 25 Apr 2026
Abstract
Millimeter-wave (mmWave) radar is widely used for vehicle detection in adverse weather conditions due to its robustness against environmental interference. However, the sparsity of mmWave radar data and the lack of height information significantly limit its broader applicability. To address these challenges, we propose MVDFusion, a multi-modal vehicle detection framework that integrates LiDAR and radar data for robust perception in foggy environments. The proposed framework is designed to fully exploit LiDAR information to compensate for the limitations of sparse radar data. Specifically, two key modules are developed: a radar height query module to enhance height estimation, and a radar–LiDAR query fusion module to improve feature representation. This design enables deep feature-level integration of mmWave radar and LiDAR data. Extensive experiments on the Oxford Radar RobotCar dataset demonstrate that MVDFusion achieves superior performance and robustness under foggy conditions. In particular, it outperforms existing state-of-the-art methods at intersection-over-union thresholds of 0.5, 0.65, and 0.8, achieving detection accuracies of 95.8%, 94.2%, and 81.5%. Full article
(This article belongs to the Section Sensing and Imaging)
19 pages, 1479 KB  
Article
Reward-Guided Dynamic Fusion and Modality Decoupling for Enhanced Multimodal Sentiment Analysis
by He Zhang, Zichen Gao, Qi Yan, Yu Gu, Shuang Wang, Linsong Liu and Dequan An
Electronics 2026, 15(9), 1813; https://doi.org/10.3390/electronics15091813 - 24 Apr 2026
Abstract
Multimodal Sentiment Analysis (MSA) integrates multiple modalities to better understand human emotions. However, existing methods often neglect heterogeneity among modal features, causing redundancy and inconsistencies. Additionally, the dynamic interplay between modalities is frequently ignored during fusion, limiting performance. To address these issues, we propose Reward-Guided Dynamic Fusion and Modality Decoupling (RDFD). RDFD includes two key components: (1) a feature decoupling module that separates modality-specific and modality-shared features, reducing redundancy and conflicts; (2) a Reward-Guided Dynamic Fusion module that adaptively selects guiding modalities to enhance modality-specific representations and enable flexible fusion. Experiments on the CMU-MOSI and CMU-MOSEI datasets show that RDFD achieves state-of-the-art performance, demonstrating its effectiveness in advancing Multimodal Sentiment Analysis. Full article
(This article belongs to the Section Computer Science & Engineering)
20 pages, 1886 KB  
Article
Modeling Count Distributions via Skewness–Kurtosis Orthogonal Expansions
by Won-Woo Lee, Ji-Hun Lee, Jong-Seung Lee and Hyung-Tae Ha
Mathematics 2026, 14(9), 1422; https://doi.org/10.3390/math14091422 - 23 Apr 2026
Abstract
We develop a semi-parametric framework for representing discrete probability mass functions through orthogonal polynomial representations. Classical count models, such as the Poisson and negative binomial distributions, impose restrictive structural assumptions that often fail to accommodate empirical features including heavy overdispersion, multimodality, and nonstandard tail behavior. To address these limitations, we introduce a linear-tilt model constructed from orthonormal polynomial systems associated with Poisson and negative binomial baselines, namely the Charlier and Meixner families. The proposed representation improves the baseline distribution using additional information from empirical moments. This allows the distribution to flexibly adjust its shape, capturing differences in skewness and kurtosis. We establish theoretical properties of the expansion within a weighted Hilbert space formulation, where the coefficients arise as orthogonal projections that can be expressed as expectations of the corresponding polynomial basis functions. In addition, we analyze approximation behavior and provide numerical bounds on the resulting numerical error and convergence properties of truncated approximations. The practical relevance of the proposed methodology is illustrated through applications to several empirical datasets, demonstrating its ability to capture complex distributional structures while preserving a tractable semi-parametric form. Full article
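The Charlier family underlying the Poisson-baseline expansion satisfies a three-term recurrence and is orthogonal under the Poisson weight, which is what makes the tilt coefficients computable as projections (expectations of the basis polynomials). A minimal numerical check; the normalization below follows the common convention <C_n, C_n> = n!/a^n, which may differ from the paper's exact convention:

```python
import math

def charlier(n, x, a):
    """Charlier polynomial C_n(x; a) via the three-term recurrence
    a*C_{n+1} = (a + n - x)*C_n - n*C_{n-1},  C_0 = 1,  C_1 = (a - x)/a."""
    c_prev, c = 1.0, (a - x) / a
    if n == 0:
        return c_prev
    for k in range(1, n):
        c_prev, c = c, ((a + k - x) * c - k * c_prev) / a
    return c

def poisson_inner(m, n, a, x_max=60):
    """<C_m, C_n> under the Poisson(a) pmf, truncated at x_max;
    equals n!/a^n when m == n and 0 otherwise."""
    w = math.exp(-a)          # Poisson weight, updated multiplicatively
    total = 0.0
    for x in range(x_max + 1):
        if x > 0:
            w *= a / x
        total += w * charlier(m, x, a) * charlier(n, x, a)
    return total
```

With orthogonality verified, a linear-tilt pmf of the form p(x) * (1 + sum_k c_k C_k(x; a)) has coefficients c_k recoverable as moment-based expectations, as the abstract describes.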
25 pages, 3924 KB  
Article
SemAlign3D: Multi-Dataset Point Cloud Segmentation with Learnable Class Prompts and KNN Multi-Scale Attention
by Xuanhong Bao and Hao Zhang
Remote Sens. 2026, 18(9), 1284; https://doi.org/10.3390/rs18091284 - 23 Apr 2026
Abstract
Point cloud segmentation is a core technology in remote sensing, enabling the extraction of rich semantic information from complex scenes. Existing methods struggle with semantic inconsistency across multiple heterogeneous datasets in complex urban environments. To address semantic inconsistencies, we propose SemAlign3D, a novel multimodal framework for point cloud segmentation that combines learnable class prompts with a multi-scale feature attention module. We integrate five large-scale datasets (SensatUrban, STPLS3D, WHU3D, SemanticKITTI, Semantic3D) to construct a unified training framework, ensuring label consistency by recalibrating semantic labels. The learnable class prompt mechanism dynamically adapts to dataset-specific semantics, enhancing semantic consistency across multiple point cloud segmentation datasets. Additionally, the Multi-scale K-Nearest Neighbor Feature Attention Enhancement module integrates local and global features, improving semantic discriminability in complex scenes. Within a single unified training framework, our method effectively aligns semantic labels from multiple heterogeneous datasets, achieving gains of +1.61% mIoU on WHU3D and +0.98% mIoU on SemanticKITTI. These results demonstrate the effectiveness of our framework in improving semantic consistency and robustness across heterogeneous point cloud datasets. Full article
25 pages, 1701 KB  
Article
Concrete Crack Detection in Extremely Dark Environments Based on Infrared-Visible Multi-Level Registration Fusion and Frequency Decoupling
by Zixiang Li, Weishuai Xie and Bingquan Xiang
Sensors 2026, 26(9), 2612; https://doi.org/10.3390/s26092612 - 23 Apr 2026
Abstract
To address the difficulty of heterogeneous image registration and the low segmentation accuracy caused by severe lack of illumination and significant modal differences when imaging concrete cracks in extremely dark environments, this paper proposes a two-stage processing framework: registration and fusion first, followed by frequency decoupling and segmentation. In the registration and fusion stage, a registration algorithm based on morphological priors and multi-level quadtree spatial constraints is designed. This approach transforms the problem from pixel grayscale matching to spatial topological matching, achieving a feature fusion of high infrared saliency and high visible light sharpness. In the segmentation stage, a Latent Frequency-Decoupled Topological Network (LFDT-Net) is proposed. It utilizes the Discrete Wavelet Transform (DWT) to achieve high-fidelity frequency decoupling of the low-frequency infrared backbone and the high-frequency visible light edges. Furthermore, a Cross-Frequency Guidance Module is utilized to eliminate double-edged artifacts, and a skeleton-aware topological loss function is introduced to constrain the topological integrity of the cracks. Experimental results on a self-built heterogeneous multi-modal crack dataset demonstrate that the proposed method significantly outperforms existing mainstream methods in registration accuracy, fusion quality, and segmentation accuracy. The method achieves a mean Intersection over Union (mIoU) of 81.7%, effectively suppresses background noise in dark environments, and precisely restores the microscopic edges and continuous topological structures of faint cracks. Full article
(This article belongs to the Special Issue AI-Based Visual Sensing for Object Detection)
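The frequency decoupling in LFDT-Net rests on the standard DWT band split. A minimal one-level 1D Haar sketch of the idea (the paper applies a 2D transform to images; this version only illustrates the low/high-band separation and perfect reconstruction):

```python
def haar_dwt1d(signal):
    """One-level 1D Haar DWT: split a signal (even length) into a
    low-frequency approximation band and a high-frequency detail band."""
    s = 2 ** -0.5  # orthonormal scaling factor
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt1d(approx, detail):
    """Inverse transform: interleave reconstructed sample pairs."""
    s = 2 ** -0.5
    out = []
    for a, d in zip(approx, detail):
        out += [s * (a + d), s * (a - d)]
    return out
```

In the 2D image case the approximation band would carry the smooth infrared backbone and the detail bands the sharp visible-light edges, which is the decoupling the network exploits.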
25 pages, 7920 KB  
Article
MBA-Former: A Boundary-Aware Transformer for Synergistic Multi-Modal Representation in Pine Wilt Disease Detection from High-Resolution Satellite Imagery
by Rui Hou, Yantao Zhou, Ying Wang, Zhiquan Huang, Jing Yao, Quanjun Jiao, Wenjiang Huang and Biyao Zhang
Forests 2026, 17(5), 517; https://doi.org/10.3390/f17050517 (registering DOI) - 23 Apr 2026
Abstract
Pine wilt disease (PWD) is a devastating biological forest disturbance, making its large-scale and high-precision remote sensing monitoring crucial for epidemic prevention and control. However, the performance of existing deep learning methods in high-resolution imagery is often limited by the confusion of spectral features among disparate ground objects and the complexity of forest boundaries. To address these challenges, this study proposes an innovative, end-to-end deep learning architecture termed MBA-Former. Built upon the robust Swin Transformer V2 backbone, the model systematically integrates two highly adaptable functional modules: (1) a front-end intelligent fusion module designed to adaptively fuse heterogeneous features, and (2) a back-end boundary refinement module that refines segmentation contours via dual-task learning. To train and evaluate the model, fine-grained manual annotations were first performed on Gaofen-2 satellite imagery acquired from multiple typical epidemic areas across northern and southern China. Information-enhanced datasets were constructed by fusing the original spectral bands, typical vegetation indices, and texture features. A comprehensive performance evaluation was then conducted, specifically targeting typical challenging scenarios characterized by complex ground object boundaries. The experimental results demonstrate that the Multi-modal Boundary-Aware Transformer (MBA-Former) significantly outperforms current state-of-the-art models. It achieved a mean Intersection over Union (mIoU) of 81.74%, an IoU of 77.58% for the most critical infected tree category, and a Boundary F1-Score of 78.62%. Compared to the best-performing baseline model, Swin-Unet, these three metrics exhibited notable improvements of 2.88%, 3.55%, and 4.46%, respectively. 
These findings convincingly demonstrate that MBA-Former provides a highly accurate and robust solution for the large-scale, automated remote sensing monitoring of forest diseases, offering immense value in preventing significant economic losses and preserving forest ecosystem integrity. Full article
23 pages, 1876 KB  
Article
Retrieval-Augmented Few-Shot Malware Detection via Binary Visualization and Vision–Language Embeddings
by Woo Jin Jung, Nae-Joung Kwak and Byoung-Yup Lee
Appl. Sci. 2026, 16(9), 4100; https://doi.org/10.3390/app16094100 - 22 Apr 2026
Abstract
The rapid evolution of malware families poses significant challenges for cybersecurity systems, particularly when newly emerging threats lack sufficient labeled data. Although image-based deep learning approaches have achieved strong performance under fully supervised conditions, their dependence on retraining limits adaptability in dynamic environments. To address this issue, we propose a Retrieval-Augmented Few-Shot Malware Detection Framework that integrates binary-to-image visualization, multimodal embedding using a frozen Vision–Language Model (Qwen2.5-VL), and similarity-based external memory retrieval. Malware binaries are converted into grayscale images and embedded into a semantic vector space without task-specific fine-tuning. During inference, query samples retrieve similar support embeddings from a vector database, and predictions are generated through similarity-weighted aggregation, enabling adaptation without parameter updates. Evaluated on the MalImg dataset with 25 malware families under 1-shot to 10-shot settings, the framework achieves 0.886 accuracy in the 10-shot configuration. Ablation results demonstrate that combining VLM embeddings with retrieval mechanisms provides consistent improvements over individual components. These findings highlight the effectiveness of decoupling representation learning from adaptation for scalable few-shot malware detection. Full article
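The similarity-weighted aggregation described in this abstract is essentially a soft k-nearest-neighbour vote over an external embedding memory, with no parameter updates at adaptation time. A minimal sketch, with a hypothetical plain list standing in for the vector database:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve_predict(query, memory, k=3):
    """memory: list of (embedding, label) support entries. Retrieve the top-k
    most similar entries and aggregate their labels with similarity weights."""
    scored = sorted(((cosine(query, emb), lbl) for emb, lbl in memory),
                    reverse=True)[:k]
    votes = {}
    for sim, lbl in scored:
        votes[lbl] = votes.get(lbl, 0.0) + sim
    return max(votes, key=votes.get)
```

Adding a new malware family then means appending support embeddings to `memory`, which is the retraining-free adaptation the framework aims at.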
29 pages, 11353 KB  
Article
Real-Field-Ready and Digitally Sustainable Plant Disease Recognition via Federated Multimodal Edge Learning and Few-Shot Domain Adaptation
by Muhammad Irfan Sharif, Yong Zhong, Muhammad Zaheer Sajid and Francesco Marinello
Agriculture 2026, 16(9), 918; https://doi.org/10.3390/agriculture16090918 (registering DOI) - 22 Apr 2026
Abstract
Plant disease diagnosis in real-world agricultural environments is challenged by data scarcity, domain shift, privacy constraints, and limited edge-device resources. This paper proposes FMEL-FSDA, a Federated Multimodal Edge Learning framework with Few-Shot Domain Adaptation for robust field-based plant disease recognition. The framework integrates attention-based RGB–text feature fusion, privacy-preserving federated learning, rapid few-shot personalization, and uncertainty-aware inference within an edge-efficient architecture. Federated training enables collaborative learning across distributed farms without sharing raw data, while few-shot adaptation allows fast deployment to new regions using only 1–10 labeled samples per class. Experiments on the PlantWild in-the-wild dataset show that FMEL-FSDA outperforms centralized, federated, and few-shot baselines, achieving 93.78% accuracy, 93.33% F1-score, and 0.97 AUC. The model maintains strong performance under privacy mechanisms such as gradient perturbation and secure aggregation, reduces communication overhead by up to 4×, and supports low-latency edge inference. Uncertainty estimation and Grad-CAM-based explainability further enhance reliability by identifying low-confidence cases and highlighting disease-relevant regions. Overall, FMEL-FSDA offers a scalable, privacy-aware, and field-ready solution for intelligent plant disease diagnosis in precision agriculture. Full article
23 pages, 3022 KB  
Article
Pedestrian Physiological Response Map Prediction Model for Street Audiovisual Environments Using LSTM Networks
by Jingwen Xing, Xuyuan He, Xinxin Li, Tianci Wang, Siqing Mao and Luyao Li
Buildings 2026, 16(9), 1648; https://doi.org/10.3390/buildings16091648 - 22 Apr 2026
Abstract
Existing studies of street-related emotional perception mainly rely on static scene evaluations, which cannot capture the cumulative effects of environmental exposure during continuous walking. To address this limitation, this study proposes a method for predicting pedestrian physiological responses in sequential audiovisual street environments. Four real-world walking routes were selected, with outbound and return directions treated as independent paths, yielding eight paths and 32 valid samples. EEG, ECG, sound pressure level, first-person video, and GPS data were synchronously collected to construct a 1 s multimodal time-series dataset. Pearson correlation, Kendall correlation, and mutual information analyses were used to examine linear, monotonic, and nonlinear relationships between environmental variables and physiological indicators, and the resulting weights were incorporated into a Long Short-Term Memory (LSTM) model for multi-step prediction. Visual elements and noise exposure were the main factors influencing physiological responses. Among the models, the mutual-information-weighted LSTM performed best, achieving an R2 of 0.77 for heart rate variability (RMSSD), whereas prediction of the EEG ratio (β/α and θ/β) remained limited. An additional independent street sample outside the training set was then used to generate a dual-dimensional EEG-ECG physiological response map, demonstrating the model’s potential for identifying emotional risk segments and supporting street-level micro-renewal. Full article
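The mutual-information weights feeding the best-performing LSTM variant come from a standard discrete MI estimate between each environmental variable and a physiological indicator. A minimal sketch over already-discretized series (the binning of continuous signals is assumed and not specified here):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of MI (in nats) between two discrete series of equal
    length: sum over joint cells of p(x,y) * log(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

Normalizing these scores across environmental variables yields one plausible set of input weights of the kind the mutual-information-weighted LSTM uses.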
23 pages, 2737 KB  
Article
Multimodal and Explainable Deep Learning for Occupational Accident Classification Using Transformer-LSTM Architectures
by Esin Ayşe Zaimoğlu
Buildings 2026, 16(9), 1642; https://doi.org/10.3390/buildings16091642 - 22 Apr 2026
Abstract
Occupational safety analytics is increasingly moving toward data-driven methodologies; however, existing models often struggle to capture the multidimensional nature of accident causation. This study presents a multimodal Hybrid Transformer-LSTM framework for classifying occupational fatalities by jointly modeling unstructured narratives, cyclical temporal features, and regional spatial indicators. Utilizing a large-scale dataset of 14,914 OSHA fatality records, the proposed architecture leverages BERT-based embeddings for semantic extraction and Bidirectional LSTMs as non-linear pattern encoders for spatiotemporal context. Conceptually grounded in the Swiss Cheese Model, the framework treats different data modalities as proxies for distinct layers of system risk, ranging from proximal unsafe acts to environmental preconditions. Experimental results show that the multimodal architecture achieves an accuracy of 84.56%, representing a 5.33% gain over unimodal BERT baselines. To address the inherent “black-box” nature of deep learning, a SHAP-based explainability framework is incorporated to quantify the contributions of both textual tokens and environmental features to the model’s decision-making process. The results indicate that integrating narrative semantics with temporal and spatial context enhances discriminative performance and enables context-aware classification within a weakly supervised setting. By providing a scalable and interpretable classification framework, this study offers a data-driven decision-support approach for safety professionals and regulatory bodies seeking to implement evidence-based risk management strategies in high-risk industrial sectors. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)
37 pages, 2158 KB  
Review
AI-Powered Animal-Vehicle Collision Prevention Systems: A Comprehensive Review
by Kaaviyashri Saraboji, Dipankar Mitra and Savisesh Malampallayil
Electronics 2026, 15(8), 1767; https://doi.org/10.3390/electronics15081767 - 21 Apr 2026
Abstract
Animal-vehicle collisions (AVCs) pose a significant threat to road safety, wildlife conservation, and transportation systems worldwide. Advances in artificial intelligence (AI) and computer vision have enabled intelligent detection and mitigation systems aimed at reducing such collisions. This review synthesizes the current state of AI-powered AVC prevention systems, examining deep learning architectures, multimodal sensor technologies, real-time processing frameworks, and system-level integration strategies. We analyze the transition from traditional computer vision methods to modern deep neural networks, evaluate sensor fusion approaches, and assess existing wildlife detection datasets and benchmarking practices. Key technical challenges are identified, including environmental variability, long-range detection constraints, dataset scarcity, cross-species generalization limitations, and real-time safety requirements. Rather than framing AVC prevention solely as an object detection task, this review conceptualizes it as a safety-critical perception and risk assessment pipeline operating under strict latency and deployment constraints. Persistent gaps in wildlife-specific detection, standardized evaluation protocols, and scalable edge deployment are discussed. To organize these insights, we present WildSafe-Edge as a conceptual reference architecture derived from the literature, synthesizing system-level design considerations and highlighting open research directions. Future research directions include transfer learning, synthetic data augmentation, vehicle-to-everything (V2X) integration, and edge-centric architectures to enable robust, real-world collision mitigation systems.
23 pages, 5106 KB  
Article
A Multidimensional Framework for Analyzing Image–Text Consistency in Social Media
by Hongqi Xia, Zhijie Zhao, Binbin Zhao, Hong Lan, Han Wu, Xujing Jing and Yanrong Zhang
Appl. Sci. 2026, 16(8), 4044; https://doi.org/10.3390/app16084044 - 21 Apr 2026
Abstract
As image–text posts have become a dominant form of social media communication, understanding how the two modalities jointly convey meaning remains a key challenge in multimodal analysis. This study aims to examine whether image–text consistency is inherently multidimensional rather than reducible to a single similarity metric. Existing studies often reduce consistency to a single relevance score, which cannot capture semantic, emotional, and functional interactions. We construct a dataset of 28,650 multimodal posts and model image–text relationships along three dimensions: semantic consistency (CSC), emotional consistency (CEC), and informational matching consistency (IMC). Semantic and emotional alignment are measured using cross-modal representation and similarity computation, while IMC is defined through rule-based classification of informational roles. Results show that emotional consistency (CEC = 0.621) is higher than semantic consistency (CSC = 0.549, p < 0.001), while 61.0% of posts maintain consistent informational orientation. These findings demonstrate that image–text consistency exhibits distinct cross-dimensional patterns that cannot be captured by single-metric approaches.
(This article belongs to the Section Computing and Artificial Intelligence)
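The "cross-modal representation and similarity computation" described in the abstract above typically reduces to cosine similarity between an image embedding and a text embedding produced by a shared encoder. A minimal sketch of that similarity step (the embedding values below are hypothetical placeholders; real pipelines would obtain them from a cross-modal encoder such as CLIP):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors; result lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical image and caption embeddings for one post.
image_vec = [0.2, 0.8, 0.1]
text_vec = [0.25, 0.7, 0.05]
score = cosine_similarity(image_vec, text_vec)
```

A dataset-level consistency score like the paper's CSC or CEC would then aggregate such per-post similarities; the exact aggregation and thresholds used in the cited study are not specified in the abstract.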