Although many researchers use pre-trained models to better solve downstream tasks, more effective pre-training methods still need to be explored, especially for multi-modal pre-training, where high-quality training data is harder to obtain. This work aims to improve knowledge learning in multi-modal pre-training. Some researchers inject entity knowledge into pre-trained language models through masked entity model (MEM) training, which masks entities at random and lets the model recover them. These methods cannot guarantee good performance because they do not consider which entities are more valuable to learn. Moreover, in multi-modal training data, some entities may be unrelated to the visual content. In this work, we propose ActiveMEM, a masked entity model pre-training method based on active learning for vision-language pre-trained models. It actively masks entities that are both informative and uncertain for the model to recover, thereby encouraging the model to extract more valuable knowledge from the data. The proposed method is evaluated using three pre-training datasets and four downstream datasets, and the experimental results demonstrate its effectiveness.

21 pages, 1456 KB

Open Access Article

A Camera-Based Multimodal Defect Sensing Framework for Substation Equipment Monitoring via Cross-Modal Feature Mapping

by Ziquan Liu, Hai Xue, Chengbo Hu, Chao Wei and Can Zhang

Sensors 2026, 26(12), 3935; https://doi.org/10.3390/s26123935 (registering DOI) - 21 Jun 2026

Abstract

To address the limitations of vision-only defect detection, image–semantic misalignment, and spatial-logic conflicts in complex substation inspection scenarios, this paper proposes a camera-sensor-based multimodal defect sensing framework with cross-modal feature mapping for substation equipment monitoring. The proposed framework integrates field inspection images acquired by camera sensors, defect textual descriptions, and equipment topology knowledge and establishes a unified domain-adaptive pre-training–bidirectional cross-modal mapping–hierarchical reasoning workflow. First, a Contrastive Language–Image Pre-training (CLIP)-based domain-adaptive pre-training strategy is developed to enhance the representation of equipment categories, defect attributes, and inspection-scene semantics. Second, a bidirectional cross-modal feature mapping network is constructed to model fine-grained interactions between candidate visual regions and textual semantics, where uncertainty-aware fusion and prototype constraints are introduced to improve semantic alignment and defect discrimination. Third, a hierarchical neuro-symbolic reasoning module incorporates equipment topology and spatial rules for posterior verification, logical consistency checking, and false-positive suppression. Experiments on a substation inspection image dataset demonstrate that the proposed method achieves 90.8% mAP@0.5, 68.7% mAP@0.5:0.95, and 89.4% F1-score, outperforming mainstream and recent detection models. Full article

(This article belongs to the Special Issue Advanced Sensing Technologies for Grid Monitoring, Protection, and Control)

34 pages, 11535 KB

Open Access Article

EASE-PVNet: Robust Periocular Identity Verification Across Pre- and Post-Operative Facial Images

by Ziyad Azzaz, Omar Khaled, Esraa Khatab, Hany Said and Omar Shalash

Mach. Learn. Knowl. Extr. 2026, 8(6), 169; https://doi.org/10.3390/make8060169 (registering DOI) - 21 Jun 2026

Abstract

Identity verification across pre-operative and post-operative facial images remains a challenging task, particularly following eyelid surgery, where localized periocular changes can disrupt conventional face recognition systems. This research introduces a novel verification framework using an ensemble-based autoencoder-initialized Siamese eye-region periocular verification network designed to remain resilient to surgically induced appearance variation. The proposed approach integrates anatomy-guided periocular normalization with a Siamese deep metric learning architecture, initialized via unsupervised autoencoder pretraining, enabling the model to acquire periocular-specific representations before supervised learning. Robustness in this data-limited clinical setting is enhanced through a combination of constrained periocular augmentation, dropout-based regularization, L2 weight decay, validation-guided checkpoint selection, staged hard-negative mining, validation-weighted multi-seed ensemble learning, and bootstrap-based threshold calibration. Experimental evaluation demonstrates recognition rates of 96.08% on the test set. These results indicate that the proposed framework maintains discriminative periocular identity representations under post-surgical appearance variation while remaining robust in a limited-data clinical setting. Full article

(This article belongs to the Special Issue Artificial Intelligence for Signal, Image, and Multimodal Data Processing: Algorithms, Models, and Knowledge Extraction)

30 pages, 6607 KB

Open Access Article

Beta Normalization Aggregation-Based Ensemble Learning for Lung Cancer Classification: Evaluation on CT and Histopathological Images

by Mobarak Abumohsen, Enrique Costa-Montenegro, Silvia García-Méndez, Amani Yousef Owda and Majdi Owda

Appl. Sci. 2026, 16(12), 6224; https://doi.org/10.3390/app16126224 (registering DOI) - 20 Jun 2026

Abstract

The early and accurate detection of lung cancer (LC) is one of the primary challenges in clinical diagnostics and plays a vital role in the treatment of the disease. Although various deep learning (DL) techniques have been presented, existing DL methods mainly focus on single-modal images, either computed tomography (CT) or histopathological images, which is associated with poor generalization, diversity, and applicability. To mitigate these issues, the present work develops a modality-independent ensemble DL framework that is independently evaluated on CT and histopathological image datasets for LC classification. The proposed framework was developed using the Beta Normalization Aggregation (BNA) technique, and the performance of three state-of-the-art pre-trained convolutional neural network (CNN) architectures was compared on the two imaging modalities. Based on the comparative analysis of the performance metrics, Xception, DenseNet121, and MobileNetV2 were chosen to build the ensemble model. Predictions generated by the selected CNN models are aggregated using the proposed BNA strategy, which improves classification robustness, prediction confidence, and discriminative capability. Experiments on public datasets confirmed the strong performance of the model. On the CT dataset, the proposed BNA Ensemble achieved a testing accuracy of 97.45%, with a precision of 97.88%, recall of 97.45%, F1-score of 97.45%, and an AUC of 0.9986. On the histopathological dataset, the framework achieved an accuracy of 99.80%, with precision, recall, and F1-score all reaching 99.80%, and an AUC of 1.0000. These results demonstrate the effectiveness, robustness, and generalizability of the proposed BNA framework. The analysis of the results using t-SNE plots, confusion matrices, ROC curves, and confidence distributions provided additional insights into feature separability, classification performance, and prediction confidence of the proposed framework.

(This article belongs to the Special Issue Applications of Modern Medical Technologies Combined with Artificial Intelligence)

26 pages, 1846 KB

Open Access Article

Cross-Sensor and Cross-Population Generalization of Deep Learning Models for Digital Mammography: A Controlled Four-Country Benchmark of Five Backbone Architectures with Statistical Significance Testing

by Somprasonk Gabbualoy, Pattarapong Phasukkit and Supan Tungjitkusolmun

Sensors 2026, 26(12), 3911; https://doi.org/10.3390/s26123911 (registering DOI) - 19 Jun 2026

Viewed by 79

Abstract

Background/Objectives: Deep learning models for digital mammography sensor data are increasingly deployed across hospitals using different X-ray detector technologies and patient populations. Whether models trained on one sensor platform and population maintain accuracy when transferred to another has not been tested for the latest generation of mammography-specific foundation models under one controlled protocol. Methods: We fine-tuned five backbone architectures (ResNet-50, DINOv2-B14, Rad-DINO, Mammo-CLIP B5, and Mammo-FM) on CBIS-DDSM (film-digitized, USA, n = 714 validation) with three seeds, ablated a density-aware focal loss across three auxiliary weights, and evaluated transfer to three external sensor cohorts: CMMD (full-field digital, China, n = 1032), DMID (mixed digital, India, n = 509), and MIAS (film-digitized, UK, n = 322). Significance used paired DeLong z-tests with Benjamini–Hochberg FDR correction; temperature scaling tested post hoc recalibration at all transfer targets. Results: Within this single-source three-seed evaluation, ResNet-50 outperformed all four foundation models on CBIS-DDSM (AUC 0.867 vs. 0.847, 0.846, 0.813, and 0.703; all gaps p_adj < 0.05). The density-aware focal loss degraded both AUC and calibration at every weight tested. At transfer, every model lost 0.165 to 0.320 AUC points relative to in-distribution performance, with sensitivity at 95% specificity collapsing from 0.31 to 0.47 in-distribution to 0.11 to 0.22 across the three external targets. A per-seed Stouffer meta-analysis confirms that Mammo-CLIP B5 and Mammo-FM significantly outperformed ResNet-50 on DMID and Mammo-CLIP on CMMD, after BH-FDR; MIAS comparisons remained directional only. In the extremely dense subgroup (BI-RADS D4), Mammo-FM reached AUC 0.870 versus ResNet-50 at 0.842, a directional observation whose 95% CIs overlap heavily at the n = 140 sample size and which we do not interpret as a statistically supported advantage. 
Conclusions: In this single training-source, three-seed protocol, mammography-specific pretraining did not deliver the in-distribution AUC premium reported in the originating papers, and no architecture reached a level at which transfer deployment without local validation would be defensible. We frame these as observations specific to the present protocol rather than as broader conclusions about foundation models for mammography classification. The findings argue for sensor-stratified and population-stratified external validation and for local recalibration as practical prerequisites before clinical use. Code and weights are released under MIT license. Full article
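The abstract above reports significance after Benjamini–Hochberg FDR correction. As a rough illustration of what that correction does to a set of raw p-values, here is a minimal sketch in plain Python. The function name `benjamini_hochberg` and the example p-values are assumptions made for illustration; they are not taken from the article or its released code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative helper (hypothetical name): each p-value is scaled by
    m / rank, then a running minimum is taken from the largest rank down
    so that the adjusted values stay monotone.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up raw p-values for four hypothetical pairwise comparisons.
    raw = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw p = {p:.3f}  adjusted p = {q:.4f}")
```

A comparison would then be called significant at a 5% false discovery rate when its adjusted value falls below 0.05.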

(This article belongs to the Special Issue Medical Imaging: Artificial Intelligence, Image Recognition, and Machine Learning Techniques (2nd Edition))

28 pages, 2199 KB

Open Access Article

Deep Learning Models for Defect Identification in Oryza sativa Rice Grains: A Comparative Study

by Yasiel Pérez Vera, Melissa Kristel Chambi Flores, Santiago Alonso Avilés Córdova, Irvin Estuardo Cazorla Macedo, Percy Aarón Luján Biamonte and Edgardo Alfredo Rivero Callohuanca

AgriEngineering 2026, 8(6), 252; https://doi.org/10.3390/agriengineering8060252 (registering DOI) - 19 Jun 2026

Viewed by 54

Abstract

Manual classification of rice grain defects remains a persistent challenge in the Peruvian rice industry, as it relies heavily on human inspection, leading to variability, inconsistency, and reduced efficiency when processing large volumes of product. This study evaluates the effectiveness of transfer learning and convolutional neural networks (CNNs) for the automatic classification of four rice grain categories relevant to quality assessment: Whole, Stained, Broken, and Chalky. A dataset comprising 6599 RGB images was employed. To ensure a reliable evaluation protocol, the dataset was first partitioned into training (70%), validation (15%), and test (15%) subsets, after which data augmentation was independently applied within each partition to balance class distributions. Five pretrained CNN architectures were evaluated: MobileNetV2, EfficientNetB0, ResNet50, DenseNet121, and InceptionV3, all of which share a common classification head. Models were trained using transfer learning and early stopping based on validation loss. Performance was assessed using accuracy, precision, recall, F1-score, confusion matrices, 95% confidence intervals, and pairwise McNemar statistical tests. The results showed that ResNet50 achieved the highest classification accuracy (84.71%), followed by EfficientNetB0 (83.60%) and DenseNet121 (83.20%). Statistical analysis indicated that performance differences among the top-performing architectures were relatively small, with significant differences observed only for selected model pairs. Across all evaluated models, the discrimination between Whole and Chalky grains remained the most challenging classification task due to their high visual similarity. Overall, the findings demonstrate that transfer learning-based CNNs provide an effective and scalable approach for automated rice grain defect identification and quality assessment in agricultural environments. Full article

(This article belongs to the Special Issue Computer Vision for Smart Agriculture)

22 pages, 3060 KB

Open Access Systematic Review

Dose-Response Effect of Oral Caffeine Use on Aerobic Exercise Performance: A Systematic Review and Meta-Analysis

by Gabriel L. Martins, Juliana M. Aparecido, Marcelo L. Marquezi, Caroline S. Frientes, Leonardo R. Miedes, Matheus S. Fornel, Tiago Fernandes and Antônio Herbert Lancha

Nutrients 2026, 18(12), 1989; https://doi.org/10.3390/nu18121989 - 19 Jun 2026

Viewed by 261

Abstract

Background/Objective: Caffeine has demonstrated ergogenic effects across various doses (2–9 mg·kg⁻¹). However, aerobic responses to caffeine vary substantially, with time-trial performance ranging from ~–3% to +16%. Given that higher doses may increase adverse effects without clear additional benefits, this review examined the effects of low (≤3 mg·kg⁻¹), moderate (4–6 mg·kg⁻¹), and high (>6 mg·kg⁻¹) caffeine doses on time-trial performance. Methods: A systematic review and meta-analysis of randomized, placebo-controlled trials was conducted using PubMed, Embase, and Virtual Health Library databases. Eligible studies included healthy adults (18–59 years) acutely ingesting oral anhydrous caffeine before aerobic time-trial tests, with performance outcomes measured exclusively as time-to-completion variables. Data were pooled using standardized mean differences (SMDs) and 95% confidence intervals under random-effects models, and risk of bias was assessed using the Cochrane Risk of Bias tool. Results: Forty-eight studies (689 participants) met the inclusion criteria. Both low and moderate caffeine doses significantly reduced time-trial completion time relative to placebo. Low doses produced a standardized mean difference of −0.27 (95% CI: −0.44 to −0.11; p = 0.001), whereas moderate doses resulted in an SMD of −0.52 (95% CI: −0.77 to −0.28; p < 0.0001). No studies evaluating high caffeine doses (>6 mg·kg⁻¹) and reporting time-to-completion outcomes met the inclusion criteria. Subgroup analyses demonstrated similar ergogenic effects in both trained and highly trained individuals consuming moderate caffeine doses. Conclusions: This is the first meta-analysis specifically focused on aerobic time-trial performance to suggest that pre-exercise ingestion of low caffeine doses (1.3–3 mg·kg⁻¹) may enhance endurance performance by reducing time-trial completion time. Notably, the use of moderate caffeine doses (4–6 mg·kg⁻¹) appears to produce a more consistent ergogenic effect. 

(This article belongs to the Special Issue Individualised Caffeine Use in Sport and Exercise)

15 pages, 998 KB

Open Access Article

Perceived Exertion Is Associated with Cardiovascular Strain but Not Glycemic Response to Gym-Based Exercise in Adults with Type 1 Diabetes: An Exploratory Randomized Crossover Trial

by José Adevalton Feitosa Gomes, Anthony Rodrigues de Vasconcelos, José Roberto Andrade do Nascimento Júnior, Ysadora Verena Ribeiro de Souza, Fabiana Oliveira dos Santos Camatari, Bruno Bavaresco Gambassi, Manoel da Cunha Costa, Paulo Adriano Schwingel and Jorge Luiz de Brito Gomes

Int. J. Environ. Res. Public Health 2026, 23(6), 814; https://doi.org/10.3390/ijerph23060814 (registering DOI) - 19 Jun 2026

Viewed by 67

Abstract

Adults with type 1 diabetes mellitus (T1DM) face elevated cardiovascular risk, and regular exercise is a key non-pharmacological mitigation strategy. However, safe prescription requires cardiovascular and glycemic monitoring, which is often unfeasible in real-world gyms. Low-cost psychophysiological tools (ratings of perceived exertion, RPE, and enjoyment) may offer practical alternatives. This exploratory randomized crossover trial examined whether post-session RPE and enjoyment are associated with acute heart rate (HR) and capillary blood glucose (BG) responses to gym-based aerobic and resistance training. Twelve adults with T1DM (29.8 ± 7.8 years; HbA1c 7.7 ± 1.6%; LDL-c 119.5 ± 24.4 mg/dL) completed three ~30 min sessions: aerobic interval training (AE) and two resistance protocols (STA, STB). HR and BG were measured pre-, immediately post-, and 20 min post-exercise; RPE and enjoyment were measured post-session. Multiple linear regression, controlling for exercise session type, examined associations of RPE and enjoyment with resting HR, BG, and percentage of heart rate reserve (%HR). RPE was higher after STA and STB than AE (p < 0.001; η²p = 0.529), while enjoyment and %HR were similar across sessions. Neither variable was associated with resting HR or BG (all adjusted R² < 0; all p > 0.05). Controlling for exercise session type, RPE was a significant positive predictor of %HR (β = 0.44, p = 0.044), whereas enjoyment was not (β = −0.06, p = 0.719); however, the overall %HR model did not reach statistical significance (adjusted R² = 0.119; F(4,31) = 2.183; p = 0.094). These exploratory findings suggest that RPE, but not enjoyment, may serve as a low-cost adjunct intensity marker to inform exercise prescription in adults with T1DM at elevated cardiovascular risk; however, replication in larger samples is needed before clinical recommendations can be drawn. Direct BG monitoring remains essential for safety.

(This article belongs to the Special Issue Health Effects of Physical Activity and Exercise in People at Risk for Cardiovascular Diseases)

29 pages, 2144 KB

Open Access Article

A Lightweight Temporal Convolutional Network for Contactless SPPB-Aligned Functional Fall-Risk Stratification in Older Adults Using Monocular RGB Video

by Kai-Chih Lin, Rong-Jong Wai and Hung-Yu Chang Chien

Sensors 2026, 26(12), 3894; https://doi.org/10.3390/s26123894 (registering DOI) - 18 Jun 2026

Viewed by 201

Abstract

Falls among older adults remain a major public health concern, yet scalable and interpretable sensing approaches for functional fall-risk stratification remain limited. This study presents a lightweight contactless framework for five-level Short Physical Performance Battery (SPPB)-aligned functional fall-risk stratification using monocular RGB video. A total of 688 community-dwelling older adults completed SPPB-aligned assessments, including balance, five-times sit-to-stand, and 3 m gait tasks. Because prospective fall-event outcomes were unavailable, supervised labels were constructed from a pre-specified SPPB-aligned functional risk index rather than observed future falls. BlazePose-based two-dimensional keypoints were extracted, normalized using pelvis-centered and height-scaled transformations, and represented as temporal skeletal trajectories. Biomechanical descriptors were fused with embeddings from the proposed Temporal Convolutional Artificial Intelligence Fall-Risk Network (TCAI-FallNet). Participant-level data partitioning was used to reduce data leakage. TCAI-FallNet achieved a macro-averaged area under the curve of 0.91 and an overall accuracy of 81.3%. The trained model had a footprint under 3 MB, and TCN inference latency was below 20 ms per sequence under workstation-based evaluation. These findings suggest that TCAI-FallNet may support contactless SPPB-aligned functional mobility risk stratification, while prospective fall-event validation remains necessary. Full article
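The pelvis-centered, height-scaled keypoint normalization mentioned in the abstract above can be pictured with a small sketch. This is only one plausible reading of that step, not the authors' implementation: the pelvis is assumed to be the midpoint of the two hip keypoints, the body height in pixels is assumed to be supplied by the caller, and the function name `normalize_pose` and the keypoint indices are hypothetical.

```python
def normalize_pose(keypoints, left_hip, right_hip, height):
    """Center 2D keypoints on the pelvis and scale them by body height.

    keypoints : list of (x, y) pixel coordinates for one video frame
    left_hip, right_hip : indices of the hip keypoints (assumed layout)
    height : estimated body height in pixels (assumed to be given)
    """
    if height <= 0:
        raise ValueError("height must be positive")
    # Pelvis is taken as the midpoint between the two hips (assumption).
    cx = (keypoints[left_hip][0] + keypoints[right_hip][0]) / 2
    cy = (keypoints[left_hip][1] + keypoints[right_hip][1]) / 2
    # Translate so the pelvis is the origin, then divide by height so that
    # people at different distances from the camera become comparable.
    return [((x - cx) / height, (y - cy) / height) for x, y in keypoints]


if __name__ == "__main__":
    # Toy frame: head, left hip, right hip, feet (made-up pixel values).
    frame = [(100, 50), (90, 150), (110, 150), (100, 250)]
    print(normalize_pose(frame, left_hip=1, right_hip=2, height=200))
```

Applying the same transformation to every frame yields trajectories that do not depend on where the person stands in the image or how large they appear, which is the usual motivation for this kind of normalization.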

(This article belongs to the Topic Innovation, Communication and Engineering, 2nd Edition)

21 pages, 698 KB

Open Access Article

Automatic Diacritization Models for a High-Population Low-Resource African Language (Yorùbá)

by Joshua I. Ayoola and Peter O. Olukanmi

Appl. Sci. 2026, 16(12), 6195; https://doi.org/10.3390/app16126195 (registering DOI) - 18 Jun 2026

Viewed by 91

Abstract

Diacritization is an essential part of reading and writing text in Yorùbá, a widely spoken tonal language in West Africa and some parts of the American continent. Unfortunately, typical computer-typed texts are not diacritized. Automatic diacritization is therefore a critical issue in Yorùbá natural language processing (NLP), since missing tone marks and underdots affect text comprehension, translation, and speech technology. This paper begins by reviewing the state of the art. Although Yorùbá diacritization models are scarce, the four models found were evaluated on the standardised Yorùbá Automatic Diacritization Dataset: the 2018 Volta Baseline, mT5_base_yoruba_adr, GPT-5.2, and Gemini 3.1 Pro. We measured performance with a set of metrics: Word Error Rate (WER), Character Error Rate (CER), Diacritization Error Rate (DER), Word Diacritization Error Rate (WDER), BLEU, and ChrF, using the complete diacritic removal condition of the YAD test set. To ensure reproducibility, the LLM evaluations were conducted via the respective official APIs and AI Studio with pinned snapshots and deterministic settings, with each model evaluated across three independent full-dataset runs. The findings showed that the specialised mT5_base_yoruba_adr model slightly outperformed the LLMs, achieving the lowest error rates of 34.85% CER, 18.34% WER, 43.37% DER, and 18.33% WDER, as well as a BLEU of 0.6872 and ChrF of 0.8436. Gemini 3.1 Pro ranked second across all error rate metrics with 35.68% CER, 18.96% WER, and 44.84% DER but outperformed mT5 by a small margin on ChrF (0.8469), followed by GPT-5.2 with 54.01% CER, 38.05% WER, and 62.64% DER. The Volta Baseline, built on an early seq2seq architecture, showed the weakest performance with 92.37% CER and 94.42% DER. These results challenge the assumption that a large parameter count and massive pre-training guarantee superior performance in low-resource language tasks and show that targeted fine-tuning on Yorùbá-specific data remains important. Our work serves as a reference for researchers seeking an overview of the state of the art, as well as a detailed and reproducible evaluation of existing models. The results highlight methodological progress and gaps in current systems. Addressing these gaps will require domain-adaptive fine-tuning, improved algorithms, and robust datasets to advance the state of the art in African-language automatic diacritization research.
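For readers unfamiliar with the error-rate metrics listed above, the following sketch shows the textbook edit-distance definitions of WER and CER. The abstract does not say exactly how the paper computes its metrics, so the reported figures may not be reproducible with this code; the function names and the example sentence are illustrative assumptions. Text is normalized to Unicode NFC first so that a letter plus combining mark and its precomposed form count as the same character.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = unicodedata.normalize("NFC", reference).split()
    hyp = unicodedata.normalize("NFC", hypothesis).split()
    return edit_distance(ref, hyp) / len(ref)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only: a diacritized reference and an
    # undiacritized system output.
    reference = "\u1ecdm\u1ecd mi d\u00e1ra"
    hypothesis = "omo mi dara"
    print("WER:", word_error_rate(reference, hypothesis))
    print("CER:", char_error_rate(reference, hypothesis))
```

In this toy case two of the three words are wrong while only three of eleven characters differ, which illustrates why a single missing tone mark costs a full word under WER but only one character under CER.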

(This article belongs to the Special Issue Natural Language Processing (NLP): Technologies and Applications)

46 pages, 5318 KB

Open Access Article

Towards a Better Characterization of Adversarial Attacks in Geospatial Imagery

by Veet Zaveri and Arun S. Maiya

Remote Sens. 2026, 18(12), 2041; https://doi.org/10.3390/rs18122041 - 18 Jun 2026

Viewed by 104

Abstract

Manipulated satellite imagery threatens analytic workflows, policy decisions, and trust in geospatial intelligence. Operational systems increasingly benefit from capabilities for both manipulation detection and manipulation-family attribution to support verification, triage, and downstream analysis. We present a unified benchmark for characterizing three representative manipulation families in geospatial imagery—generative manipulations, pixel-level perturbations, and adversarial patches—using a controlled, class-balanced design and 20 modern vision architectures spanning conventional, Earth-observation-pretrained, and vision-language models. Across architectures, the dominant failure boundary is between authentic imagery and subtle pixel-level perturbations, whereas generative manipulations and adversarial patches are generally more separable under matched in-domain conditions. Additional analyses reveal important generalization limitations under unseen manipulation variants and external-domain transfer, demonstrating that strong benchmark performance does not necessarily translate to reliable operational screening. The framework also enables systematic comparison of unified multi-attack and specialized detection strategies, providing insight into their relative strengths and limitations. Rather than proposing a new defense, this work provides a reproducible methodology for characterizing manipulation artifacts, model failure modes, and deployment-relevant screening behavior in geospatial imagery, with applications to analyst triage, verification workflows, and trustworthy use of satellite data. Full article

22 pages, 5647 KB

Open Access Article

LiquidGAN for Handwriting-Based Detection and Severity Classification of Extrapyramidal Symptoms

by Erandhi M. Liyanage, Chun-Hung Lee, Wen-Yen Chang, Andrew An-Zhe Lee, Guan-Hsiung Liaw, Wu-Chuan Yang, Yu-Hsin Liu, Kun-Chan Lan and Sai Ho Ling

Sensors 2026, 26(12), 3890; https://doi.org/10.3390/s26123890 (registering DOI) - 18 Jun 2026

Viewed by 254

Abstract

Extrapyramidal symptoms (EPS) are motor side effects commonly induced by antipsychotic medications and can lead to measurable changes in handwriting patterns. These symptoms affect both the spatial and temporal characteristics of writing, including stroke thickness, direction, and the rate of directional change. To model these complex variations, we propose a novel Liquid Generative Adversarial Network (LiquidGAN), which combines the adaptive dynamics of liquid neural networks with the data generation capability of GANs. Handwriting data were collected from 94 patients with confirmed EPS and 30 healthy controls using Archimedean spiral patterns drawn with both hands. A total of 211 images were processed for both binary and multiclass classification using a pretrained ResNet50 model. The pretrained ResNet50 achieved 92% accuracy and 97% precision in the binary classification task; however, its performance dropped significantly to 57% accuracy in multiclass classification, indicating limited capability in capturing fine-grained EPS severity variations. In contrast, the proposed LiquidGAN demonstrated excellent performance in the binary classification task, achieving 97% accuracy and 98% precision. More importantly, LiquidGAN substantially outperformed the baseline in the more challenging multiclass setting, achieving 70% accuracy and precision across four classes (mild, moderate, severe, and control). This shows that the diverse dataset generated by LiquidGAN significantly improves HOG-ANN classification and that the approach effectively captures complex and subtle handwriting variations associated with different EPS severity levels that conventional models such as ResNet50 fail to distinguish. In addition, LiquidGAN generated diverse and realistic synthetic handwriting samples, yielding improved Fréchet Inception Distance (FID), precision, and recall compared with StyleGAN. These findings demonstrate that handwriting biomarkers, when analyzed through dynamic generative learning, offer an effective and non-invasive approach for monitoring extrapyramidal side effects in clinical settings.

(This article belongs to the Special Issue Sensors for Human Health Monitoring Based on Biomedical Signals: From New Perception to Intelligent Diagnosis)

14 pages, 274 KB

Open Access Article

Image-Based Classification of Ship Hull Cleanliness Based on Transfer Learning

by Piotr Ściegienka, Łukasz Wróbel, Daniel Dąbrowski, Marcin Michalak, Dawid Macha, Marek Sikora, Tomasz Borowik and Tomasz Hartwig

Appl. Syst. Innov. 2026, 9(6), 130; https://doi.org/10.3390/asi9060130 - 18 Jun 2026

Viewed by 96

Abstract

Fouling on ship hulls increases hydrodynamic drag, fuel consumption, and emissions. This, in turn, necessitates the development of efficient methods for side cleaning and inspection. This work focuses on the application of image-based classification to assess the cleanliness of the surface of the hull in robotic cleaning systems, with respect to the ISO 8501-4 standard. Due to limited data availability, transfer learning techniques using pre-trained convolutional neural networks (ResNet50, EfficientNetB0 and MobileNetV2) were used. Both end-to-end models and hybrid approaches that combine deep feature extraction with XGBoost classification were evaluated. Experiments were carried out on binary classification (cleaned vs. uncleaned surfaces) and multi-class classification of cleanliness levels (WA1, WA2, WA2.5). The results show that transfer learning enables effective recognition of cleaning status, achieving high performance for binary classification despite a small dataset. However, multi-class classification remains challenging due to subtle differences between classes and data limitations. The proposed approach supports automated visual inspection of underwater robotic platforms and represents a step toward objective standards-based assessment of hull cleaning processes. Full article

(This article belongs to the Special Issue Autonomous Robotics and Hybrid Intelligent Systems)

43 pages, 4497 KB

Open Access Article

OATS-RS: Ontology-Aware Adaptive and Selective Zero-Shot Scene Classification for Remote Sensing

by János Horváth

Remote Sens. 2026, 18(12), 2038; https://doi.org/10.3390/rs18122038 - 18 Jun 2026

Viewed by 255

Abstract

Zero-shot remote sensing is attractive for scene classification because new regions, sensors, and label taxonomies often appear before sufficient annotated data are available for supervised adaptation. We present OATS-RS, an inference-centric framework that keeps a remote sensing vision–language model (VLM) backbone frozen and improves zero-shot decisions through ontology-aware prompt construction, hierarchical and contrastive scoring, adaptive multi-view aggregation, unlabeled transductive refinement, ambiguity-aware local re-ranking, and selective prediction. The method targets the common remote sensing regime in which neighboring classes such as annual crop, permanent crop, forest, pasture, herbaceous vegetation, river, and sea or lake overlap strongly in red–green–blue (RGB) appearance, meaning that they require more than a single class-name prompt. On the supplied final EuroSAT RGB evaluation with a GeoRSCLIP Contrastive Language–Image Pre-training (CLIP)-family Vision Transformer Base with 32 × 32-pixel patches (ViT-B-32) backbone, the complete pipeline obtains top-1 accuracy of 0.522, balanced accuracy of 0.522, macro-averaged F1 score (macro-F1) of 0.535, and top-3 accuracy of 0.887. The strongest classes are industrial area, residential area, river, highway, and pasture, whereas the weakest classes remain herbaceous vegetation and several fine-grained vegetation categories. Selective prediction increases accepted-example accuracy to 0.538 at 0.934 coverage, but the expected calibration error (ECE) remains high at 0.384. These results support a qualified conclusion: ontology-guided zero-shot inference can already recover useful semantic shortlists for structured remote-sensing scenes, but fine-grained natural-class disambiguation, calibrated confidence, multi-dataset transfer, component-level ablations, and measured runtime remain essential before dependable deployment claims can be made. Full article
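The abstract above reports an expected calibration error (ECE) together with accuracy at a given coverage under selective prediction. The sketch below shows the common equal-width-bin definition of ECE and a simple confidence-threshold form of selective prediction. Both are generic textbook versions written under assumptions: the article's own binning, acceptance rule, and thresholds are not given in the abstract, and the function names and toy data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1].

    confidences : predicted top-1 probabilities
    correct     : booleans, True where the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def selective_accuracy(confidences, correct, threshold):
    """Return (coverage, accuracy) when predictions below threshold are rejected."""
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(accepted) / len(confidences)
    accuracy = sum(accepted) / len(accepted) if accepted else float("nan")
    return coverage, accuracy


if __name__ == "__main__":
    # Toy predictions, not data from the article.
    conf = [0.95, 0.95, 0.65, 0.65]
    hit = [True, False, True, True]
    print("ECE:", expected_calibration_error(conf, hit))
    print("coverage, accuracy:", selective_accuracy(conf, hit, threshold=0.9))
```

A high ECE alongside a modest gain from selective prediction, as the abstract describes, means the confidence scores separate right from wrong answers only weakly, so rejecting low-confidence samples removes few of the errors.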

27 pages, 15972 KB

Open Access Article

A Dual-Branch Detector Based on the Multi-Granularity Dynamic Selection Mechanism for Remote Sensing Incremental Detection

by Shixi Li, Weiji Wang, Yousheng Xu, Wei Yao and Shengzhou Xu

Remote Sens. 2026, 18(12), 2032; https://doi.org/10.3390/rs18122032 - 18 Jun 2026

Viewed by 161

Abstract

In practical remote sensing object detection tasks, the application of deep learning approaches often takes the form of incremental learning: when the application includes new target types that were not encountered during training, a pre-trained model must acquire new knowledge without suffering catastrophic forgetting. Among the various techniques proposed, knowledge distillation (KD)-based regularization has proven to be one of the most effective methods. Current KD-based approaches primarily focus on addressing inter-task confusion and optimizing feature selection during distillation processes. In this paper, we propose a dual-branch detector-independent learning framework and a multi-granularity dynamic selection strategy. The former decouples detection tasks for old and new classes to mitigate inter-class confusion, while the latter is a novel, exquisitely designed distillation mechanism that ensures precise transfer of critical old-class information. Moreover, we apply a DIST loss that aligns both inter-class and intra-class relations, further enhancing the fidelity of old-class knowledge transfer. Experiments on the DIOR and DOTA datasets demonstrate that our method significantly outperforms state-of-the-art incremental-learning approaches for remote-sensing object detection and exhibits good robustness under different remote-sensing scenarios. Full article

(This article belongs to the Special Issue Object Detection in Remote Sensing Images Based on Artificial Intelligence)
