
Search Results (342)

Search Parameters:
Keywords = zero-shot learning

32 pages, 4314 KB  
Article
A Hardware-Aware Federated Meta-Learning Framework for Intraday Return Prediction Under Data Scarcity and Edge Constraints
by Zhe Wen, Xin Cheng, Ruixin Xue, Jinao Ye, Zhongfeng Wang and Meiqi Wang
Appl. Sci. 2026, 16(5), 2319; https://doi.org/10.3390/app16052319 - 27 Feb 2026
Abstract
Although deep learning has achieved remarkable success in time-series prediction, intraday algorithmic trading is characterized by frequent regime shifts (concept drift), which can rapidly render models trained on historical data obsolete in real applications. This motivates on-device adaptation at edge trading terminals. However, practical deployment is constrained by a tripartite bottleneck: real-time samples are scarce, hardware resources on edge are limited, and communication overhead between cloud and edge must be kept low to satisfy stringent latency requirements. To address these challenges, we develop a hardware-aware edge learning framework that combines federated learning (FL) and meta-learning to enable rapid few-shot personalization without exposing local data. Importantly, the framework incorporates our proposed Sleep Node Algorithm (SNA), which turns the “FL + meta-learning” combination into a practical and efficient edge solution. Specifically, SNA dynamically deactivates “inertial” (insensitive) network components during adaptation: it provides a structural regularizer that stabilizes few-shot updates and mitigates overfitting under concept drift, while inducing sparsity that reduces both on-device computation and cloud-edge communication. To efficiently leverage these unstructured zero nodes introduced by SNA, we further design a dedicated accelerator, EPAST (Energy-efficient Pipelined Accelerator for Sparse Training). EPAST adopts a heterogeneous architecture and introduces a dedicated Backward Pipeline (BPIP) dataflow that overlaps backpropagation stages, thereby improving hardware utilization under irregular sparse workloads. Experimental results demonstrate that our system consistently outperforms strong baselines, including DQN, GARCH-XGBoost, and LRU, in terms of Pearson IC. 
A 55 nm CMOS ASIC implementation further validates robust learning under an extreme 5-shot setting (IC = 0.1176), achieving an end-to-end training speed-up of 11.35× and an energy efficiency of 45.78 TOPS/W. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Industrial Engineering)
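The abstract describes the Sleep Node Algorithm only at a high level. As a rough sketch of the underlying idea of deactivating the least sensitive network components during few-shot adaptation, a magnitude-based mask might look like this (the `sleep_mask` helper and its `keep_ratio` parameter are illustrative assumptions, not the paper's actual algorithm):

```python
import numpy as np

def sleep_mask(grads, keep_ratio=0.5):
    """Keep only the most gradient-sensitive fraction of parameters
    active and put the rest 'to sleep' (mask value 0), inducing
    sparsity during adaptation. Hypothetical sketch, not the SNA."""
    flat = np.abs(grads).ravel()
    k = max(1, int(len(flat) * keep_ratio))
    threshold = np.partition(flat, -k)[-k]   # k-th largest magnitude
    return (np.abs(grads) >= threshold).astype(grads.dtype)

grads = np.array([[0.9, 0.01], [0.5, 0.02]])
mask = sleep_mask(grads, keep_ratio=0.5)     # keeps the 0.9 and 0.5 entries
```

Multiplying updates by such a mask zeroes both computation and communication for the sleeping entries, which is the kind of cost saving the abstract attributes to SNA.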

18 pages, 1079 KB  
Article
Feasibility of Using Large Language Models for Structured Medication Extraction from Clinical Text: A Comparative Analysis of Zero-Shot and Few-Shot Paradigms
by Evan Schulte, Mohamed Abusharkh, Kushal Dahal, Michael Klepser and Minji Sohn
Appl. Sci. 2026, 16(5), 2300; https://doi.org/10.3390/app16052300 - 27 Feb 2026
Abstract
The digitization of healthcare has been accompanied by a rapid expansion of electronic health records (EHRs); however, a significant proportion of critical patient data, specifically medication regimens, remains entrapped within unstructured clinical narratives. The inability to seamlessly compute this data hinders advancements in pharmacovigilance, clinical decision support, and population health management. This study presents a comprehensive, rigorous evaluation of the feasibility of deploying Large Language Models (LLMs) to automate the extraction of structured dosage information (Dose, Daily Frequency, Duration) from outpatient antimicrobial clinical notes sourced from the Collaboration to Harmonize Antimicrobial Registry Measures (CHARM) registry. We scrutinized the performance of five distinct open-weight architectures, namely GPT-OSS:20B, Gemma 2:9B, Mistral 7B, Qwen3:14B and Llama 3.2, across both Zero-Shot and Retrieval Augmented Generation (RAG)-based Few-Shot prompting paradigms. Our analysis reveals a fundamental architectural trade-off: the reasoning-optimized GPT-OSS:20B dominates the zero-shot landscape (F1 > 0.90) by leveraging abstract schema understanding, whereas the instruction-tuned Gemma 2:9B excels in the few-shot setting (F1 ~ 0.99), effectively utilizing examples as guardrails to surpass larger models. Conversely, smaller models (Mistral, Llama) exhibit a prohibitive “hallucination barrier,” rendering them unsafe for unsupervised clinical application. Furthermore, we identify “Inconsistent Unit Handling” and “Complex Temporal Logic” as persistent failure modes that resist simple scaling laws. This report provides a definitive framework for selecting model architectures based on the availability of few-shot examples and highlights the necessity of dynamic RAG strategies to achieve production-grade reliability in medical informatics. Full article
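The zero-shot and few-shot prompting paradigms compared in this abstract differ only in whether retrieved worked examples are prepended to the query. A minimal sketch of the two prompt shapes (the schema wording and the `build_prompt` helper are hypothetical, not taken from the study):

```python
def build_prompt(note, examples=None):
    """Assemble an extraction prompt. With `examples` (retrieved
    note/answer pairs, as in RAG-based few-shot prompting), worked
    examples precede the query; without them, the zero-shot model
    must rely on the schema description alone."""
    schema = ("Extract Dose, Daily Frequency, and Duration from the "
              "clinical note below. Answer as JSON with those three keys.")
    shots = ""
    if examples:
        shots = "\n\n" + "\n\n".join(
            f"Note: {ex_note}\nAnswer: {ex_answer}"
            for ex_note, ex_answer in examples)
    return f"{schema}{shots}\n\nNote: {note}\nAnswer:"

zero_shot = build_prompt("amoxicillin 500 mg three times daily for 7 days")
few_shot = build_prompt(
    "cephalexin 250 mg twice daily",
    examples=[("doxycycline 100 mg once daily for 10 days",
               '{"Dose": "100 mg", "Daily Frequency": 1, "Duration": "10 days"}')])
```

The abstract's finding that instruction-tuned models use such examples "as guardrails" corresponds to the few-shot branch, where the answer format is demonstrated rather than merely described.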

43 pages, 1324 KB  
Article
Explainable Kolmogorov–Arnold Networks for Zero-Shot Human Activity Recognition on TinyML Edge Devices
by Ismail Lamaakal, Chaymae Yahyati, Yassine Maleh, Khalid El Makkaoui and Ibrahim Ouahbi
Mach. Learn. Knowl. Extr. 2026, 8(3), 55; https://doi.org/10.3390/make8030055 - 26 Feb 2026
Abstract
Human Activity Recognition (HAR) on wearable and IoT devices must jointly satisfy four requirements: high accuracy, the ability to recognize previously unseen activities, strict memory and latency constraints, and interpretable decisions. In this work, we address all four by introducing an explainable Kolmogorov–Arnold Network for Human Activity Recognition (TinyKAN-HAR) with a zero-shot learning (ZSL) module, designed specifically for TinyML edge devices. The proposed KAN replaces fixed activation functions by learnable one-dimensional spline operators applied after linear mixing, yielding compact yet expressive feature extractors whose internal nonlinearities can be directly visualized. On top of the KAN latent space, we learn a semantic projection and cosine-based compatibility function that align sensor features with class-level semantic embeddings, enabling both pure and generalized zero-shot recognition of unseen activities. We evaluate our method on three benchmark datasets (UCI HAR, WISDM, PAMAP2) under subject-disjoint and zero-shot splits. TinyKAN-HAR consistently achieves over 97% macro-F1 on seen classes and over 96% accuracy on unseen activities, with harmonic mean above 96% in the generalized ZSL setting, outperforming CNN, LSTM and Transformer-based ZSL baselines. For explainability, we combine gradient-based attributions, SHAP-style global relevance scores and inspection of the learned spline functions to provide sensor-level, temporal and neuron-level insights into each prediction. After 8-bit quantization and TinyML-oriented optimizations, the deployed model occupies only 145 kB of flash and 26 kB of RAM, and achieves an average inference latency of 4.1 ms (about 0.32 mJ per window) on a Cortex-M4F-class microcontroller, while preserving accuracy within 0.2% of the full-precision model. These results demonstrate that explainable, zero-shot HAR with near state-of-the-art accuracy is feasible on severely resource-constrained TinyML edge devices. 
Full article
(This article belongs to the Section Learning)
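The cosine-based compatibility function described above is standard in zero-shot recognition: sensor features and class-level semantic embeddings are compared in a shared latent space. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def zero_shot_predict(feature, class_embeddings):
    """Score a latent feature vector against class-level semantic
    embeddings by cosine similarity; the highest-scoring class wins,
    whether or not it was seen during training.

    feature: (d,) vector; class_embeddings: (num_classes, d)."""
    f = feature / np.linalg.norm(feature)
    c = class_embeddings / np.linalg.norm(class_embeddings, axis=1,
                                          keepdims=True)
    scores = c @ f                    # cosine similarity per class
    return int(np.argmax(scores)), scores
```

In the generalized ZSL setting the abstract evaluates, the same scoring would simply run over the union of seen and unseen class embeddings.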

18 pages, 4500 KB  
Article
Localizing Perceptual Artifacts in Synthetic Images for Image Quality Assessment via Deep-Learning-Based Anomaly Detection
by Zijin Yin
Electronics 2026, 15(5), 916; https://doi.org/10.3390/electronics15050916 - 24 Feb 2026
Abstract
While deep generative models, such as text-to-image diffusion, demonstrate strong capabilities in synthesizing photorealistic images, they frequently produce perceptual artifacts (e.g., distorted structures or unnatural textures) that require manual correction. Existing artifact localization methods typically rely on fully supervised training with large-scale pixel-level annotations, which suffer from high labeling costs. To address these challenges, we propose a novel framework based on the core insight that perceptual artifacts can be fundamentally modeled as “semantic outliers”—regions that inherently fail to match any pre-defined semantic categories. Instead of learning specific artifact features, we introduce a Mask-based Semantic Rejection (MSR) mechanism within a semantic segmentation architecture. This mechanism leverages the “one-vs-all” property of object queries to identify regions that are consistently rejected by all pre-trained semantic categories. Furthermore, we design a flexible adaptation strategy that supports both zero-shot inference using pre-trained semantic knowledge and fine-tuning with a margin-based suppression objective to explicitly optimize the rejection boundary using minimal supervision. Comprehensive experiments across 11 synthesis tasks demonstrate that MSR significantly outperforms state-of-the-art methods, particularly in data-efficient scenarios. Specifically, the framework achieves mIoU improvements of 6.52% and 13.06% on the text-to-image task using only 10% and 50% of labeled samples, respectively, underscoring its superior capability. Full article
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)
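As a toy illustration of the "semantic outlier" idea, regions that no known category claims are flagged as artifacts. The thresholded helper below is an assumption for illustration only, not the paper's mask-based MSR mechanism:

```python
import numpy as np

def reject_outliers(class_scores, threshold=0.5):
    """class_scores: per-pixel scores for each known semantic
    category, shape (num_classes, H, W). A pixel whose best score
    stays below `threshold` is claimed by no category and is flagged
    as a candidate perceptual artifact. Illustrative sketch only."""
    best = class_scores.max(axis=0)   # (H, W): strongest category per pixel
    return best < threshold           # boolean artifact mask

scores = np.array([[[0.9, 0.2]],      # class 0 scores for a 1x2 image
                   [[0.1, 0.3]]])     # class 1 scores
artifacts = reject_outliers(scores)   # only the second pixel is rejected
```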

28 pages, 5823 KB  
Article
Automated Multi-Modal MRI Segmentation of Stroke Lesions and Corticospinal Tract Integrity for Functional Outcome Prediction
by Daniyal Iqbal, Domenec Puig, Muhammad Mursil and Hatem A. Rashwan
Tomography 2026, 12(3), 29; https://doi.org/10.3390/tomography12030029 - 24 Feb 2026
Abstract
Background/Objectives: Stroke is a leading cause of long-term disability, and predicting functional outcome at discharge, such as the modified Rankin Scale (mRS), is important for guiding treatment and rehabilitation. Many existing approaches depend on advanced imaging or complex corticospinal tract (CST) segmentation from multi-shell diffusion MRI, limiting clinical feasibility. Automated lesion segmentation is also challenging due to lesion heterogeneity and MRI variability. This study proposes a clinically feasible multimodal MRI pipeline based on routine imaging. Methods: Lesion segmentation models were trained and evaluated on the ISLES 2022 dataset (250 training, 150 test cases). Zero-shot external evaluation was performed on 149 cases from ISLES 2024 using standard MRI sequences only. An ensemble of deep learning models (SEALS, NVAUTO, FACTORIZER) was evaluated on ISLES 2022, while SEALS alone was used for external testing. CST segmentation was performed using TractSeg on single-shell diffusion-weighted imaging. Imaging biomarkers included lesion volume, shape, ADC-based texture features, CST integrity, and lesion–CST overlap. These features were used to train machine learning models for binary mRS prediction at discharge. Results: The ensemble achieved a Dice score of 0.82 on ISLES 2022, while zero-shot evaluation on ISLES 2024 achieved 0.57. In exploratory analysis, CatBoost achieved the highest point estimates (accuracy 0.88, F1-score 0.87, ROC-AUC 0.83). Key predictors included lesion–CST overlap, lesion volume, surface area, dissimilarity, and contrast. Conclusions: This exploratory study demonstrates the feasibility of combining automated lesion segmentation with anatomically informed biomarkers using routine clinical MRI, supporting interpretable stroke outcome modelling and motivating future large-scale validation. Full article

43 pages, 1927 KB  
Article
A Large-Scale Empirical Study of LLM Orchestration and Ensemble Strategies for Sentiment Analysis in Recommender Systems
by Konstantinos I. Roumeliotis, Dionisis Margaris, Dimitris Spiliotopoulos and Costas Vassilakis
Future Internet 2026, 18(2), 112; https://doi.org/10.3390/fi18020112 - 20 Feb 2026
Abstract
This paper presents a comprehensive empirical evaluation comparing meta-model aggregation strategies with traditional ensemble methods and standalone models for sentiment analysis in recommender systems beyond standalone large language model (LLM) performance. We investigate whether aggregating multiple LLMs through a reasoning-based meta-model provides measurable performance advantages over individual models and standard statistical aggregation approaches in zero-shot sentiment classification. Using a balanced dataset of 5000 verified Amazon purchase reviews (1000 reviews per rating category from 1 to 5 stars, sampled via two-stage stratified sampling across five product categories), we evaluate 12 different leading pre-trained LLMs from four major providers (OpenAI, Anthropic, Google, and DeepSeek) in both standalone and meta-model configurations. Our experimental design systematically compares individual model performance against GPT-based meta-model aggregation and traditional ensemble baselines (majority voting, mean aggregation). Results show statistically significant improvements (McNemar’s test, p < 0.001): the GPT-5 meta-model achieves 71.40% accuracy (10.15 percentage point improvement over the 61.25% individual model average), while the GPT-5 mini meta-model reaches 70.32% (9.07 percentage point improvement). These observed improvements surpass traditional ensemble methods (majority voting: 62.64%; mean aggregation: 62.96%), suggesting potential value in meta-model aggregation for sentiment analysis tasks. Our analysis reveals empirical patterns including neutral sentiment classification challenges (3-star ratings show 64.83% failure rates across models), model influence hierarchies, and cost-accuracy trade-offs ($130.45 aggregation cost vs. $0.24–$43.97 for individual models per 5000 predictions). 
This work provides evidence-based insights into the comparative effectiveness of LLM aggregation strategies in recommender systems, demonstrating that meta-model aggregation with natural language reasoning capabilities achieves measurable performance gains beyond statistical aggregation alone. Full article
(This article belongs to the Special Issue Intelligent Agents and Their Application)
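The two traditional ensemble baselines named in the abstract, majority voting and mean aggregation, can be sketched in a few lines (helper names are ours, not the paper's code):

```python
from collections import Counter
from statistics import mean

def majority_vote(ratings):
    """Most common star rating among the individual models' outputs
    (ties resolved in favor of the first rating encountered)."""
    return Counter(ratings).most_common(1)[0][0]

def mean_aggregate(ratings):
    """Average the numeric star ratings and round to the nearest star."""
    return round(mean(ratings))

per_model = [4, 5, 4, 3, 4]      # hypothetical ratings from five models
majority_vote(per_model)          # -> 4
mean_aggregate(per_model)         # -> 4
```

The meta-model approach the paper favors differs in kind: instead of aggregating the numeric labels statistically, a reasoning LLM reads the individual models' outputs and produces the final label.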

17 pages, 2000 KB  
Article
Probabilistic Bird Trajectory Forecasting with Heavy-Tailed Uncertainty Modeling for Low-Altitude Airspace Monitoring
by Feiyang Song, Zhonghe Liu, Yuyang Zhao and Jingguo Zhu
Sensors 2026, 26(4), 1270; https://doi.org/10.3390/s26041270 - 15 Feb 2026
Abstract
Low-altitude airspace is increasingly shared by bird flocks and unmanned aerial vehicles (UAVs), posing safety risks that necessitate accurate trajectory forecasting. However, existing vision-based methods often treat trajectory prediction and UAV detection as separate tasks, assume light-tailed Gaussian noise, and rely on heavy backbones. Applied to bird trajectory forecasting, these limitations hamper uncertainty calibration and embedded deployment in ground-based monocular surveillance. In this work, we propose a unified framework for low-altitude monitoring. Its core, Mini-BirdFormer, combines a lightweight Transformer encoder with a Student-t mixture density head to model heavy-tailed flight dynamics and produce calibrated uncertainty. Experiments on a real-world dataset show the model achieves strong long-horizon performance with only 1.05 million parameters, attaining a minADE of 0.785 m and reducing negative log-likelihood from 1.25 to −2.01 (lower is better) compared with a Gaussian Long Short-Term Memory (LSTM) baseline. Crucially, it enables low-latency inference on resource-constrained platforms at 616 FPS. Additionally, a system-level extension supports zero-shot UAV detection via open-vocabulary learning, attaining 92% recall without false alarms. Results demonstrate that combining heavy-tailed probabilistic modeling with a compact backbone provides a practical, deployable approach for monitoring shared airspace. Full article
(This article belongs to the Section Intelligent Sensors)
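The reported negative log-likelihood gain comes from replacing a Gaussian output head with a Student-t mixture, whose heavier tails penalize outlying observations far less. A self-contained sketch of the mixture NLL (function names are illustrative, not the paper's code):

```python
import math

def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student-t; heavier-tailed than
    a Gaussian for small nu, approaching the Gaussian as nu grows."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def mixture_nll(x, components):
    """Negative log-likelihood of x under a Student-t mixture;
    `components` is a list of (weight, mu, sigma, nu) tuples."""
    density = sum(w * math.exp(student_t_logpdf(x, m, s, n))
                  for w, m, s, n in components)
    return -math.log(density)
```

For an observation five scale units from the mean, a single t component with nu = 3 already yields a much lower NLL than a near-Gaussian component with nu = 1000, which is the calibration effect the abstract reports.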

29 pages, 2340 KB  
Article
Target-Aware Bilingual Stance Detection in Social Media Using Transformer Architecture
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Electronics 2026, 15(4), 830; https://doi.org/10.3390/electronics15040830 - 14 Feb 2026
Abstract
Stance detection has emerged as an essential tool in natural language processing for understanding how individuals express agreement, disagreement, or neutrality toward specific targets in social and online discourse. It plays a crucial role in bilingual and multilingual environments, including English-Arabic social media ecosystems, where differences in language structure, discourse style, and data availability pose significant challenges for reliable stance modelling. Existing approaches often struggle with target awareness, cross-lingual generalization, robustness to noisy user-generated text, and the interpretability of model decisions. This study aims to build a reliable, explainable target-aware bilingual stance-detection framework that generalizes across heterogeneous stance formats and languages without retraining on a dataset specific to the target language. Thus, a unified dual-encoder architecture based on mDeBERTa-v3 is proposed. Cross-language contrastive learning offers an auxiliary training objective to align English and Arabic stance representations in a common semantic space. Robustness-oriented regularization is used to mitigate the effects of informal language, vocabulary variation, and adversarial noise. To promote transparency and trustworthiness, the framework incorporates token-level rationale extraction, enables fine-grained interpretability, and supports analysis of hallucination. The proposed model is tested on a combined bilingual test set and two structurally distinct zero-shot benchmarks: MT-CSD and AraStance. Experimental results show consistent performance, with accuracies of 85.0% and 86.8% and F1-scores of 84.7% and 86.8% on the zero-shot benchmarks, confirming stable performance and realistic generalization. Ultimately, these findings reveal that effective bilingual stance detection can be achieved via explicit target conditioning, cross-lingual alignment, and explainability-driven design. Full article

23 pages, 2573 KB  
Article
Development of an Unattended Ionosphere–Geomagnetism Monitoring System with Dual-Adversarial AI for Remote Mid–High-Latitude Regions
by Cheng Cui, Zhengxiang Xu, Zefeng Liu, Zejun Hu, Fuqiang Li, Yinke Dou and Yuchen Wang
Aerospace 2026, 13(2), 179; https://doi.org/10.3390/aerospace13020179 - 13 Feb 2026
Abstract
To address coverage gaps in high-latitude space weather monitoring caused by constraints in energy, bandwidth, and labeled samples, this study presents a systematic solution deployed in Hailar, China. We constructed a Cloud–Edge–Terminal system featuring wind–solar hybrid energy and RK3588-based edge computing, achieving six months of stable ionospheric–geomagnetic observation under −40 °C. Furthermore, we propose a Dual-Adversarial Recurrent Autoencoder (DA-RAE) for anomaly detection. Utilizing a single-source domain strategy, the model learns physical manifolds from quiet-day data, enabling zero-shot anomaly perception in the unsupervised target domain. Field tests in March 2025 demonstrated superior generalized anomaly detection capabilities, successfully identifying both transient space weather events and environmental equipment faults (baseline drifts). This work validates the value of edge intelligence for autonomous operations in extreme environments, providing a reproducible paradigm for global ground-based networks. Full article
(This article belongs to the Special Issue Situational Awareness Using Space-Based Sensor Networks)

18 pages, 4326 KB  
Article
DCS: A Zero-Shot Anomaly Detection Framework with DINO-CLIP-SAM Integration
by Yan Wan, Yingqi Lang and Li Yao
Appl. Sci. 2026, 16(4), 1836; https://doi.org/10.3390/app16041836 - 12 Feb 2026
Abstract
Recent progress in foundation models such as CLIP and SAM has shown great potential for zero-shot anomaly detection. However, existing methods usually rely on generic descriptions such as “abnormal”, whose semantic coverage is too narrow to express fine-grained anomaly semantics. In addition, CLIP primarily performs global-level alignment, making it difficult to accurately localize minor defects, while the segmentation quality of SAM depends heavily on prompt constraints. To address these problems, we propose DCS, a unified framework that integrates Grounding DINO, CLIP, and SAM through three key innovations. First, we introduce FinePrompt for adaptive learning, which significantly strengthens anomaly-semantics modeling by building a fine-grained anomaly description library and adopting learnable text embeddings. Second, we design an Adaptive Dual-path Cross-modal Interaction (ADCI) module that achieves more effective cross-modal information exchange through dual-path fusion. Finally, we propose a Box-Point Prompt Combiner (BPPC) that fuses the box priors provided by DINO with the point prompts generated by CLIP to guide SAM toward finer, more complete segmentation results. Extensive experiments demonstrate the effectiveness of our method: on the MVTec-AD and VisA datasets, DCS achieves state-of-the-art zero-shot anomaly detection results. Full article

24 pages, 19000 KB  
Article
Scaling Functional Electrical Stimulation Control for Diverse Users Through Offline Distributional Reinforcement Learning
by Nat Wannawas, Jyotindra Narayan, Warakom Nerdnoi and Arsanchai Sukkuea
Robotics 2026, 15(2), 38; https://doi.org/10.3390/robotics15020038 - 8 Feb 2026
Abstract
Functional Electrical Stimulation (FES) can restore motor function; however, achieving precise multi-joint control remains challenging due to nonlinear muscle dynamics and fatigue. Reinforcement Learning (RL) offers a promising solution, but practical deployment is hindered by the need for patient-specific calibration. This study investigates offline RL approaches for controlling planar arm movements using heterogeneous datasets, aiming to enable zero-shot transfer to new users. We develop a biomechanical arm model in MuJoCo and evaluate four RL algorithms coupled with three offline techniques: conservative Q learning (SAC-CQL and QBR-CQL), Randomized Ensemble (QBR-REM), and distributional RL (IQNBR). Across all conditions, IQNBR demonstrates robust learning and superior control performance, achieving an average RMSE of 3.8±0.6 cm, even when trained on mixed-quality data. These results highlight the potential of distributional RL as a base learning method to build generic FES controllers that can operate without exhaustive calibration, with broader implications for controlling robots with human-like actuation systems. Full article
(This article belongs to the Special Issue AI-Powered Robotic Systems: Learning, Perception and Decision-Making)

19 pages, 3856 KB  
Article
Towards Sustainable Wildlife Conservation: Automatic Recognition of Endangered Animal Behavior Using a Multimodal Contrastive Learning Framework
by Shuyi Liu, Ao Xu and Zhenjie Hou
Sustainability 2026, 18(3), 1612; https://doi.org/10.3390/su18031612 - 5 Feb 2026
Abstract
Automatic recognition of endangered animal behavior is crucial for biodiversity conservation and improving animal welfare, yet traditional manual observation remains inefficient and invasive. This work contributes directly to sustainable wildlife management by enabling non-invasive, scalable, and efficient monitoring, which supports long-term ecological balance and aligns with several United Nations Sustainable Development Goals (SDGs), particularly SDG 15 (Life on Land) and SDG 12 (Responsible Consumption and Production). The current deep learning approaches often struggle with the scarcity of behavioral data and complex environments, leading to poor model generalization. To address these challenges, this study focuses on endangered animal behavior monitoring and proposes a multimodal learning framework termed ABCLIP. This model leverages multimodal contrastive learning between video-and-text pairs, utilizing natural language supervision to enhance representation ability. The framework integrates pre-training, prompt learning, and fine-tuning to optimize performance specifically for small-scale animal behavior datasets, with a focus on the specific social and ecological behaviors of giant pandas. The experimental results demonstrate that ABCLIP achieves remarkable accuracy and robustness in recognizing endangered animal behaviors, attaining Top-1 and Top-5 accuracy of 82.50% and 99.25%, respectively, on the LoTE-Animal dataset, which outperforms strong baseline methods such as SlowFast (78.54%/97.55%). Furthermore, in zero-shot recognition scenarios for unseen behaviors, ABCLIP achieves an accuracy of 58.00%. This study highlights the potential of multimodal contrastive learning in wildlife monitoring and provides efficient technical support for precise protection measures and scientific management of endangered species. Full article
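ABCLIP's video-and-text contrastive objective is CLIP-style. A compact NumPy sketch of the symmetric InfoNCE loss over a batch of paired embeddings (a generic formulation, not the authors' implementation):

```python
import numpy as np

def clip_style_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings: matched
    video/text pairs sit on the diagonal of the similarity matrix
    and are pushed above every mismatched pair."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature              # (batch, batch) similarities
    labels = np.arange(logits.shape[0])

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Because class names enter through the text encoder, the same trained compatibility score can rank prompts for behaviors never seen in training, which is what enables the 58.00% zero-shot accuracy the abstract reports.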

31 pages, 8257 KB  
Article
Analytical Assessment of Pre-Trained Prompt-Based Multimodal Deep Learning Models for UAV-Based Object Detection Supporting Environmental Crimes Monitoring
by Andrea Demartis, Fabio Giulio Tonolo, Francesco Barchi, Samuel Zanella and Andrea Acquaviva
Geomatics 2026, 6(1), 14; https://doi.org/10.3390/geomatics6010014 - 3 Feb 2026
Abstract
Illegal dumping poses serious risks to ecosystems and human health, requiring effective and timely monitoring strategies. Advances in uncrewed aerial vehicles (UAVs), photogrammetry, and deep learning (DL) have created new opportunities for detecting and characterizing waste objects over large areas. Within the framework of the EMERITUS Project, an EU Horizon Europe initiative supporting the fight against environmental crimes, this study evaluates the performance of pre-trained prompt-based multimodal (PBM) DL models integrated into ArcGIS Pro for object detection and segmentation. To test these models, UAV surveys were conducted specifically at a semi-controlled test site in northern Italy, producing very high-resolution orthoimages and video frames populated with simulated waste objects such as tyres, barrels, and sand piles. Three PBM models (CLIPSeg, GroundingDINO, and TextSAM) were tested under varying hyperparameters and input conditions, including orthophotos at multiple resolutions and frames extracted from UAV-acquired videos. Results show that model performance is highly dependent on object type and imagery resolution. In contrast, within the limited ranges tested, hyperparameter tuning rarely produced significant improvements. The models were evaluated at a low IoU threshold to generalize across different types of detection models and to focus on the ability to detect objects at all. When evaluated on orthoimagery, CLIPSeg achieved the highest accuracy, with F1 scores up to 0.88 for tyres, whereas barrels and ambiguous classes consistently underperformed. Video-derived (oblique) frames generally outperformed orthophotos, reflecting a closer match to the models' training perspectives. Despite the performance limitations highlighted by these tests, PBM models demonstrate strong potential for democratizing GeoAI (Geospatial Artificial Intelligence).
These tools effectively enable non-expert users to employ zero-shot classification in UAV-based monitoring workflows targeting environmental crime. Full article
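The low-IoU evaluation described in this abstract can be illustrated with a minimal sketch. This is not the study's own protocol: it assumes axis-aligned bounding boxes, greedy one-to-one matching, and a hypothetical threshold of 0.1.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def f1_at_iou(preds, gts, thr=0.1):
    """F1 with greedy matching: each ground-truth box is matched to at most
    one prediction with IoU >= thr; a low thr rewards detection over
    precise localisation."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i); tp += 1; break
    fp, fn = len(preds) - tp, len(gts) - tp
    prec = tp / (tp + fp) if preds else 0.0
    rec = tp / (tp + fn) if gts else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Lowering `thr` counts loosely localized detections as true positives, which is what makes the metric comparable across segmentation-style and box-style models.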

29 pages, 7612 KB  
Article
A Method for 3D Building Individualization Integrating SAMPolyBuild and Multiple Spatial-Geometric Features
by Lianshuai Cao, Yi Cheng, Zheng Zhang, Ge Zhu, Kunyang Ma and Xinyue Xu
Sensors 2026, 26(3), 999; https://doi.org/10.3390/s26030999 - 3 Feb 2026
Viewed by 269
Abstract
Individualization of buildings is one of the key issues in the establishment of three-dimensional (3D) building models. Most existing individualization methods rely on inefficient manual separation, while deep learning approaches require extensive pre-training and are highly influenced by the spatial structure of the models. To address these issues, this paper proposes a novel method for 3D building individualization that integrates SAMPolyBuild with multiple spatial-geometric features. Leveraging the zero-shot learning capability of SAMPolyBuild, the method first performs coarse extraction of individual buildings, then refines the extraction accuracy using multiple spatial-geometric features. Innovatively, two statistical parameters—Jensen-Shannon Divergence and Earth Mover’s Distance—are introduced into the building identification process. To validate the feasibility and effectiveness of the proposed method, experiments were conducted on the Semantic Urban Meshes (SUM) dataset. The results demonstrate that the method can effectively extract individual building models from urban oblique photogrammetric 3D models, achieving an F1-score of approximately 0.83 for buildings with typical spatial structures. Full article
(This article belongs to the Special Issue Remote Sensing, Geophysics and GIS)
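The two statistical parameters named in this abstract, Jensen-Shannon Divergence and Earth Mover's Distance, can be sketched in pure Python. This is an illustrative implementation for discrete histograms over identical bins, not the paper's code; which feature distributions the paper compares is not reproduced here.

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence (base 2), skipping zero-probability bins of p
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon Divergence: symmetric, bounded in [0, 1] for base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def earth_movers_1d(p, q):
    """1-D Earth Mover's Distance between histograms on identical bins:
    the sum of absolute differences of the cumulative distributions."""
    emd, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        emd += abs(cum)
    return emd
```

Both measures compare whole distributions rather than point estimates, which is presumably why they help discriminate buildings from visually similar non-building geometry.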

26 pages, 403 KB  
Article
How the Representation of Retrieved Context Affects In-Context Prompting for Commit Message Generation
by Dokyeong An and Geunseok Yang
Electronics 2026, 15(3), 652; https://doi.org/10.3390/electronics15030652 - 2 Feb 2026
Viewed by 153
Abstract
High-quality commit messages are essential software artifacts because they succinctly communicate the intent and scope of code changes, yet large language models (LLMs) often fail to reflect project-specific writing conventions when used in a zero-shot setting without contextual signals. This study investigates not whether retrieval helps, but how the same retrieved example, when represented differently in the prompt, quantitatively changes generation outcomes. We implement a retrieve-then-generate framework where the target commit’s diff is used as a query for BM25 (Best Matching 25)-based sparse retrieval over a commit-level database, and the top-1 similar commit is optionally injected as an example context. We compare a no-context condition (K = 0) against a minimal-context condition (K = 1) under three context representations: Diff-only, Message-only, and Diff + Message pair. Using Qwen-7B on 8000 evaluation samples with a fixed prompt skeleton, deterministic decoding, and identical post-processing across conditions, we observe negligible differences at K = 0 (BLEU-4 1.14, ROUGE-L 7.47–7.48, METEOR 4.88–4.91), establishing a stable baseline. At K = 1, the same top-1 retrieved case yields systematically different metric responses depending on how it is represented (Diff-only, Message-only, or Diff + Message), even under an identical prompt skeleton, deterministic decoding, and identical post-processing. This indicates that “context representation” is not a cosmetic formatting choice but a first-class prompt-design variable in retrieval-augmented in-context learning for commit message generation. Accordingly, practitioners should select the representation based on the intended objective (e.g., lexical/style alignment vs. change-intent grounding), rather than assuming a universally optimal format. Full article
(This article belongs to the Special Issue AI-Powered Natural Language Processing Applications)
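The BM25-based sparse retrieval step described in this abstract can be sketched as follows. This is a minimal Okapi BM25 scorer over pre-tokenized documents with the common defaults k1 = 1.2 and b = 0.75, not the authors' pipeline; in the paper's setting the query would be the target commit's diff and the documents a commit-level database.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_tokens)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if df.get(t, 0) == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # term frequency saturation (k1) and length normalization (b)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Top-1 retrieval: the highest-scoring commit becomes the in-context example.
docs = [["fix", "null", "pointer", "in", "parser"],
        ["update", "readme"],
        ["fix", "parser", "bug"]]
top1 = max(range(len(docs)), key=bm25_scores(["fix", "parser"], docs).__getitem__)
```

How that retrieved commit is then rendered in the prompt (diff only, message only, or both) is exactly the representation variable the study manipulates.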
