Search Results (6,389)

Search Parameters:
Keywords = pre-training models

14 pages, 6425 KB  
Article
Improving Entity Understanding for Vision-Language Pre-Training via Active Learning
by Qunbo Wang, Sen Zhang, Boxuan Shao, Xize Guo, Jiayong An, Chao Fan, Yuanjun Jing, Junxian Li and Wenjun Wu
Big Data Cogn. Comput. 2026, 10(6), 198; https://doi.org/10.3390/bdcc10060198 (registering DOI) - 22 Jun 2026
Abstract
Although many researchers use pre-trained models to better solve downstream tasks, further exploration of more effective pre-training methods remains necessary, especially for multi-modal pre-training where high-quality training data is more difficult to obtain. This work aims to improve the knowledge-learning performance in multi-modal pre-training. Some researchers focus on injecting entity knowledge into language pre-trained models based on masked entity model (MEM) training, which masks entities randomly and lets the model recover them. These methods cannot guarantee good performance because they do not consider which entities are more valuable for learning. Moreover, in multi-modal training data, some entities may be unrelated to visual content. In this work, for the vision-language pre-trained model, we propose a Masked Entity Model pre-training method based on Active learning (ActiveMEM). It is designed to actively mask important entities (those that are both informative and uncertain) for the model to recover, thereby encouraging it to extract more valuable knowledge from the data. The proposed method is evaluated using three pre-training datasets and four downstream datasets, and the experimental results demonstrate the effectiveness of our method.
21 pages, 1456 KB  
Article
A Camera-Based Multimodal Defect Sensing Framework for Substation Equipment Monitoring via Cross-Modal Feature Mapping
by Ziquan Liu, Hai Xue, Chengbo Hu, Chao Wei and Can Zhang
Sensors 2026, 26(12), 3935; https://doi.org/10.3390/s26123935 (registering DOI) - 21 Jun 2026
Abstract
To address the limitations of vision-only defect detection, image–semantic misalignment, and spatial-logic conflicts in complex substation inspection scenarios, this paper proposes a camera-sensor-based multimodal defect sensing framework with cross-modal feature mapping for substation equipment monitoring. The proposed framework integrates field inspection images acquired by camera sensors, defect textual descriptions, and equipment topology knowledge and establishes a unified domain-adaptive pre-training–bidirectional cross-modal mapping–hierarchical reasoning workflow. First, a Contrastive Language–Image Pre-training (CLIP)-based domain-adaptive pre-training strategy is developed to enhance the representation of equipment categories, defect attributes, and inspection-scene semantics. Second, a bidirectional cross-modal feature mapping network is constructed to model fine-grained interactions between candidate visual regions and textual semantics, where uncertainty-aware fusion and prototype constraints are introduced to improve semantic alignment and defect discrimination. Third, a hierarchical neuro-symbolic reasoning module incorporates equipment topology and spatial rules for posterior verification, logical consistency checking, and false-positive suppression. Experiments on a substation inspection image dataset demonstrate that the proposed method achieves 90.8% mAP@0.5, 68.7% mAP@0.5:0.95, and 89.4% F1-score, outperforming mainstream and recent detection models.
34 pages, 11535 KB  
Article
EASE-PVNet: Robust Periocular Identity Verification Across Pre- and Post-Operative Facial Images
by Ziyad Azzaz, Omar Khaled, Esraa Khatab, Hany Said and Omar Shalash
Mach. Learn. Knowl. Extr. 2026, 8(6), 169; https://doi.org/10.3390/make8060169 (registering DOI) - 21 Jun 2026
Abstract
Identity verification across pre-operative and post-operative facial images remains a challenging task, particularly following eyelid surgery, where localized periocular changes can disrupt conventional face recognition systems. This research introduces a novel verification framework using an ensemble-based autoencoder-initialized Siamese eye-region periocular verification network designed to remain resilient to surgically induced appearance variation. The proposed approach integrates anatomy-guided periocular normalization with a Siamese deep metric learning architecture, initialized via unsupervised autoencoder pretraining, enabling the model to acquire periocular-specific representations before supervised learning. Robustness in this data-limited clinical setting is enhanced through a combination of constrained periocular augmentation, dropout-based regularization, L2 weight decay, validation-guided checkpoint selection, staged hard-negative mining, validation-weighted multi-seed ensemble learning, and bootstrap-based threshold calibration. Experimental evaluation demonstrates recognition rates of 96.08% on the test set. These results indicate that the proposed framework maintains discriminative periocular identity representations under post-surgical appearance variation while remaining robust in a limited-data clinical setting.
30 pages, 6607 KB  
Article
Beta Normalization Aggregation-Based Ensemble Learning for Lung Cancer Classification: Evaluation on CT and Histopathological Images
by Mobarak Abumohsen, Enrique Costa-Montenegro, Silvia García-Méndez, Amani Yousef Owda and Majdi Owda
Appl. Sci. 2026, 16(12), 6224; https://doi.org/10.3390/app16126224 (registering DOI) - 20 Jun 2026
Abstract
The early and accurate detection of lung cancer (LC) is one of the primary challenges in the clinical diagnostics process and plays a vital role in the treatment of the disease. Although various deep learning (DL) techniques have been presented, the existing DL methods mainly focus on single-modal images, either computed tomography (CT) or histopathological images, and are associated with poor generalization, diversity, and applicability. To mitigate these issues, the present work aims to develop a modality-independent ensemble DL framework that is independently evaluated on CT and histopathological image datasets for LC classification. The proposed framework was developed using the Beta Normalization Aggregation (BNA) technique, where the performance of three state-of-the-art pre-trained convolutional neural network (CNN) architectures was compared on two distinct imaging modalities. Based on the comparative analysis of the performance metrics, Xception, DenseNet121, and MobileNetV2 were chosen to build the ensemble model. Predictions generated by the selected CNN models are aggregated using the proposed BNA strategy to improve classification robustness, prediction confidence, and discriminative capability. Experiments on public datasets confirmed the excellent performance of the model. On the CT dataset, the proposed BNA Ensemble achieved a testing accuracy of 97.45%, with a precision of 97.88%, recall of 97.45%, F1-score of 97.45%, and an AUC of 0.9986. On the histopathological dataset, the framework achieved an accuracy of 99.80%, with precision, recall, and F1-score all reaching 99.80%, and an AUC of 1.0000. These results demonstrate the effectiveness, robustness, and generalizability of the proposed BNA framework. The analysis of the results using t-SNE plots, confusion matrices, ROC curves, and confidence distributions provided additional insights into feature separability, classification performance, and prediction confidence of the proposed framework.
26 pages, 1846 KB  
Article
Cross-Sensor and Cross-Population Generalization of Deep Learning Models for Digital Mammography: A Controlled Four-Country Benchmark of Five Backbone Architectures with Statistical Significance Testing
by Somprasonk Gabbualoy, Pattarapong Phasukkit and Supan Tungjitkusolmun
Sensors 2026, 26(12), 3911; https://doi.org/10.3390/s26123911 (registering DOI) - 19 Jun 2026
Viewed by 79
Abstract
Background/Objectives: Deep learning models for digital mammography sensor data are increasingly deployed across hospitals using different X-ray detector technologies and patient populations. Whether models trained on one sensor platform and population maintain accuracy when transferred to another has not been tested for the latest generation of mammography-specific foundation models under one controlled protocol. Methods: We fine-tuned five backbone architectures (ResNet-50, DINOv2-B14, Rad-DINO, Mammo-CLIP B5, and Mammo-FM) on CBIS-DDSM (film-digitized, USA, n = 714 validation) with three seeds, ablated a density-aware focal loss across three auxiliary weights, and evaluated transfer to three external sensor cohorts: CMMD (full-field digital, China, n = 1032), DMID (mixed digital, India, n = 509), and MIAS (film-digitized, UK, n = 322). Significance used paired DeLong z-tests with Benjamini–Hochberg FDR correction; temperature scaling tested post hoc recalibration at all transfer targets. Results: Within this single-source three-seed evaluation, ResNet-50 outperformed all four foundation models on CBIS-DDSM (AUC 0.867 vs. 0.847, 0.846, 0.813, and 0.703; all gaps p_adj < 0.05). The density-aware focal loss degraded both AUC and calibration at every weight tested. At transfer, every model lost 0.165 to 0.320 AUC points relative to in-distribution performance, with sensitivity at 95% specificity collapsing from 0.31–0.47 in-distribution to 0.11–0.22 across the three external targets. A per-seed Stouffer meta-analysis confirms that Mammo-CLIP B5 and Mammo-FM significantly outperformed ResNet-50 on DMID and Mammo-CLIP on CMMD, after BH-FDR; MIAS comparisons remained directional only. In the extremely dense subgroup (BI-RADS D4), Mammo-FM reached AUC 0.870 versus ResNet-50 at 0.842, a directional observation whose 95% CIs overlap heavily at the n = 140 sample size and which we do not interpret as a statistically supported advantage. Conclusions: In this single training-source, three-seed protocol, mammography-specific pretraining did not deliver the in-distribution AUC premium reported in the originating papers, and no architecture reached a level at which transfer deployment without local validation would be defensible. We frame these as observations specific to the present protocol rather than as broader conclusions about foundation models for mammography classification. The findings argue for sensor-stratified and population-stratified external validation and for local recalibration as practical prerequisites before clinical use. Code and weights are released under MIT license.
28 pages, 2199 KB  
Article
Deep Learning Models for Defect Identification in Oryza sativa Rice Grains: A Comparative Study
by Yasiel Pérez Vera, Melissa Kristel Chambi Flores, Santiago Alonso Avilés Córdova, Irvin Estuardo Cazorla Macedo, Percy Aarón Luján Biamonte and Edgardo Alfredo Rivero Callohuanca
AgriEngineering 2026, 8(6), 252; https://doi.org/10.3390/agriengineering8060252 (registering DOI) - 19 Jun 2026
Viewed by 54
Abstract
Manual classification of rice grain defects remains a persistent challenge in the Peruvian rice industry, as it relies heavily on human inspection, leading to variability, inconsistency, and reduced efficiency when processing large volumes of product. This study evaluates the effectiveness of transfer learning and convolutional neural networks (CNNs) for the automatic classification of four rice grain categories relevant to quality assessment: Whole, Stained, Broken, and Chalky. A dataset comprising 6599 RGB images was employed. To ensure a reliable evaluation protocol, the dataset was first partitioned into training (70%), validation (15%), and test (15%) subsets, after which data augmentation was independently applied within each partition to balance class distributions. Five pretrained CNN architectures were evaluated: MobileNetV2, EfficientNetB0, ResNet50, DenseNet121, and InceptionV3, all of which share a common classification head. Models were trained using transfer learning and early stopping based on validation loss. Performance was assessed using accuracy, precision, recall, F1-score, confusion matrices, 95% confidence intervals, and pairwise McNemar statistical tests. The results showed that ResNet50 achieved the highest classification accuracy (84.71%), followed by EfficientNetB0 (83.60%) and DenseNet121 (83.20%). Statistical analysis indicated that performance differences among the top-performing architectures were relatively small, with significant differences observed only for selected model pairs. Across all evaluated models, the discrimination between Whole and Chalky grains remained the most challenging classification task due to their high visual similarity. Overall, the findings demonstrate that transfer learning-based CNNs provide an effective and scalable approach for automated rice grain defect identification and quality assessment in agricultural environments.
(This article belongs to the Special Issue Computer Vision for Smart Agriculture)
22 pages, 3060 KB  
Systematic Review
Dose-Response Effect of Oral Caffeine Use on Aerobic Exercise Performance: A Systematic Review and Meta-Analysis
by Gabriel L. Martins, Juliana M. Aparecido, Marcelo L. Marquezi, Caroline S. Frientes, Leonardo R. Miedes, Matheus S. Fornel, Tiago Fernandes and Antônio Herbert Lancha
Nutrients 2026, 18(12), 1989; https://doi.org/10.3390/nu18121989 - 19 Jun 2026
Viewed by 261
Abstract
Background/Objective: Caffeine has demonstrated ergogenic effects across various doses (2–9 mg·kg⁻¹). However, aerobic responses to caffeine vary substantially, with time-trial performance ranging from approximately −3% to +16%. Given that higher doses may increase adverse effects without clear additional benefits, this review examined the effects of low (≤3 mg·kg⁻¹), moderate (4–6 mg·kg⁻¹), and high (>6 mg·kg⁻¹) caffeine doses on time-trial performance. Methods: A systematic review and meta-analysis of randomized, placebo-controlled trials was conducted using PubMed, Embase, and Virtual Health Library databases. Eligible studies included healthy adults (18–59 years) acutely ingesting oral anhydrous caffeine before aerobic time-trial tests, with performance outcomes measured exclusively as time-to-completion variables. Data were pooled using standardized mean differences (SMDs) and 95% confidence intervals under random-effects models, and risk of bias was assessed using the Cochrane Risk of Bias tool. Results: Forty-eight studies (689 participants) met the inclusion criteria. Both low and moderate caffeine doses significantly reduced time-trial completion time relative to placebo. Low doses produced a standardized mean difference of −0.27 (95% CI: −0.44 to −0.11; p = 0.001), whereas moderate doses resulted in an SMD of −0.52 (95% CI: −0.77 to −0.28; p < 0.0001). No studies evaluating high caffeine doses (>6 mg·kg⁻¹) and reporting time-to-completion outcomes met the inclusion criteria. Subgroup analyses demonstrated similar ergogenic effects in both trained and highly trained individuals consuming moderate caffeine doses. Conclusions: This is the first meta-analysis specifically focused on aerobic time-trial performance to suggest that pre-exercise ingestion of low caffeine doses (1.3–3 mg·kg⁻¹) may enhance endurance performance by reducing time-trial completion time. Notably, the use of moderate caffeine doses (4–6 mg·kg⁻¹) appears to produce a more consistent ergogenic effect.
(This article belongs to the Special Issue Individualised Caffeine Use in Sport and Exercise)
15 pages, 998 KB  
Article
Perceived Exertion Is Associated with Cardiovascular Strain but Not Glycemic Response to Gym-Based Exercise in Adults with Type 1 Diabetes: An Exploratory Randomized Crossover Trial
by José Adevalton Feitosa Gomes, Anthony Rodrigues de Vasconcelos, José Roberto Andrade do Nascimento Júnior, Ysadora Verena Ribeiro de Souza, Fabiana Oliveira dos Santos Camatari, Bruno Bavaresco Gambassi, Manoel da Cunha Costa, Paulo Adriano Schwingel and Jorge Luiz de Brito Gomes
Int. J. Environ. Res. Public Health 2026, 23(6), 814; https://doi.org/10.3390/ijerph23060814 (registering DOI) - 19 Jun 2026
Viewed by 67
Abstract
Adults with type 1 diabetes mellitus (T1DM) face elevated cardiovascular risk, and regular exercise is a key non-pharmacological mitigation strategy. However, safe prescription requires cardiovascular and glycemic monitoring, often unfeasible in real-world gyms. Low-cost psychophysiological tools (ratings of perceived exertion, RPE, and enjoyment) may offer practical alternatives. This exploratory randomized crossover trial examined whether post-session RPE and enjoyment are associated with acute heart rate (HR) and capillary blood glucose (BG) responses to gym-based aerobic and resistance training. Twelve adults with T1DM (29.8 ± 7.8 years; HbA1c 7.7 ± 1.6%; LDL-c 119.5 ± 24.4 mg/dL) completed three ~30 min sessions: aerobic interval training (AE) and two resistance protocols (STA, STB). HR and BG were measured pre-, immediately post-, and 20 min post-exercise; RPE and enjoyment, post-session. Multiple linear regression, controlling for exercise session type, examined associations of RPE and enjoyment with resting HR, BG, and percentage of heart rate reserve (%HR). RPE was higher after STA and STB than AE (p < 0.001; partial η² = 0.529), while enjoyment and %HR were similar across sessions. Neither variable was associated with resting HR or BG (all adjusted R² < 0; all p > 0.05). Controlling for exercise session type, RPE was a significant positive predictor of %HR (β = 0.44, p = 0.044), whereas enjoyment was not (β = −0.06, p = 0.719); however, the overall %HR model did not reach statistical significance (adjusted R² = 0.119; F(4,31) = 2.183; p = 0.094). These exploratory findings suggest that RPE, but not enjoyment, may serve as a low-cost adjunct intensity marker to inform exercise prescription in adults with T1DM at elevated cardiovascular risk; however, replication in larger samples is needed before clinical recommendations can be drawn. Direct BG monitoring remains essential for safety.
29 pages, 2144 KB  
Article
A Lightweight Temporal Convolutional Network for Contactless SPPB-Aligned Functional Fall-Risk Stratification in Older Adults Using Monocular RGB Video
by Kai-Chih Lin, Rong-Jong Wai and Hung-Yu Chang Chien
Sensors 2026, 26(12), 3894; https://doi.org/10.3390/s26123894 (registering DOI) - 18 Jun 2026
Viewed by 201
Abstract
Falls among older adults remain a major public health concern, yet scalable and interpretable sensing approaches for functional fall-risk stratification remain limited. This study presents a lightweight contactless framework for five-level Short Physical Performance Battery (SPPB)-aligned functional fall-risk stratification using monocular RGB video. A total of 688 community-dwelling older adults completed SPPB-aligned assessments, including balance, five-times sit-to-stand, and 3 m gait tasks. Because prospective fall-event outcomes were unavailable, supervised labels were constructed from a pre-specified SPPB-aligned functional risk index rather than observed future falls. BlazePose-based two-dimensional keypoints were extracted, normalized using pelvis-centered and height-scaled transformations, and represented as temporal skeletal trajectories. Biomechanical descriptors were fused with embeddings from the proposed Temporal Convolutional Artificial Intelligence Fall-Risk Network (TCAI-FallNet). Participant-level data partitioning was used to reduce data leakage. TCAI-FallNet achieved a macro-averaged area under the curve of 0.91 and an overall accuracy of 81.3%. The trained model had a footprint under 3 MB, and TCN inference latency was below 20 ms per sequence under workstation-based evaluation. These findings suggest that TCAI-FallNet may support contactless SPPB-aligned functional mobility risk stratification, while prospective fall-event validation remains necessary.
(This article belongs to the Topic Innovation, Communication and Engineering, 2nd Edition)
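As a rough illustration of the pelvis-centered, height-scaled keypoint normalization mentioned in the abstract above, here is a minimal Python sketch. It is not the authors' code: the keypoint names, the hip-midpoint definition of the pelvis, and the nose-to-ankle proxy for body height are all assumptions made for illustration.

```python
import math


def normalize_pose(keypoints):
    """Pelvis-center and height-scale one frame of 2D keypoints.

    keypoints: dict mapping a (hypothetical) landmark name to (x, y).
    Assumed landmarks: left_hip, right_hip, nose, left_ankle, right_ankle.
    """
    lhx, lhy = keypoints["left_hip"]
    rhx, rhy = keypoints["right_hip"]
    # Assumption: the pelvis is the midpoint between the two hips.
    pelvis_x, pelvis_y = (lhx + rhx) / 2.0, (lhy + rhy) / 2.0

    lax, lay = keypoints["left_ankle"]
    rax, ray = keypoints["right_ankle"]
    ankle_x, ankle_y = (lax + rax) / 2.0, (lay + ray) / 2.0

    # Assumption: body height is approximated by the nose-to-ankle distance.
    nose_x, nose_y = keypoints["nose"]
    height = math.hypot(nose_x - ankle_x, nose_y - ankle_y)
    if height == 0:
        raise ValueError("degenerate pose: estimated height is zero")

    # Translate so the pelvis is the origin, then divide by the height proxy.
    return {
        name: ((x - pelvis_x) / height, (y - pelvis_y) / height)
        for name, (x, y) in keypoints.items()
    }
```

Applying the function frame by frame gives trajectories that do not depend on where the person stands in the image or how large they appear, which is the usual motivation for this kind of normalization.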
21 pages, 698 KB  
Article
Automatic Diacritization Models for a High-Population Low-Resource African Language (Yorùbá)
by Joshua I. Ayoola and Peter O. Olukanmi
Appl. Sci. 2026, 16(12), 6195; https://doi.org/10.3390/app16126195 (registering DOI) - 18 Jun 2026
Viewed by 91
Abstract
Diacritization is an essential part of reading and writing text in Yorùbá, a widely spoken tonal language in West Africa and some parts of the American continent. Unfortunately, typical computer-typed texts are not diacritized. Thus, automatic diacritization is a critical issue in Yorùbá natural language processing (NLP), since missing tone marks and underdots affect text comprehension, translation and speech technology. This paper begins by reviewing the state of the art. Although Yorùbá diacritization models are scarce, the four models found were evaluated on the standardised Yorùbá Automatic Diacritization Dataset (YAD): the 2018 Volta Baseline, the mT5_base_yoruba_adr, GPT-5.2 and Gemini 3.1 Pro. We measured performance with a set of metrics: Word Error Rate (WER), Character Error Rate (CER), Diacritization Error Rate (DER), Word Diacritization Error Rate (WDER), BLEU and ChrF, using the complete diacritic removal condition of the YAD test set. To ensure reproducibility, the LLM evaluations were conducted via the respective official APIs and AI Studio with pinned snapshots and deterministic settings, with each model evaluated across three independent full-dataset runs. The findings showed that the specialised mT5_base_yoruba_adr model slightly outperforms the LLMs, achieving the lowest error rates of 34.85% CER, 18.34% WER, 43.37% DER and 18.33% WDER, as well as a BLEU of 0.6872 and ChrF of 0.8436. Gemini 3.1 Pro ranked second across all error rate metrics with 35.68% CER, 18.96% WER, and 44.84% DER but outperformed mT5 by a small margin on ChrF (0.8469), followed by GPT-5.2 with 54.01% CER, 38.05% WER, and 62.64% DER. The Volta Baseline, built on an early seq2seq approach, showed the weakest performance with 92.37% CER and 94.42% DER. These results challenge the assumption that large parameter count and massive pre-training guarantee superior performance in low-resource language tasks and show that targeted fine-tuning on Yorùbá-specific data remains important. Our work serves as a reference for researchers seeking an overview of the state of the art, as well as a detailed and reproducible evaluation of existing models. The results highlight methodological progress and gaps in current systems. Addressing these gaps will require domain-adaptive fine-tuning, improved algorithms, and robust datasets to advance the state of the art in African-language automatic diacritization research.
(This article belongs to the Special Issue Natural Language Processing (NLP): Technologies and Applications)
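For readers unfamiliar with the error-rate metrics quoted in the abstract above, here is a minimal Python sketch of the usual edit-distance definitions of CER and WER. It is an illustration under stated assumptions, not the paper's evaluation script: the function names are hypothetical, and the NFC normalization step is an assumption added here so that precomposed and combining diacritics compare as equal.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text):
    # Assumption: compare text in NFC so "o" + combining dot below
    # is treated the same as the precomposed letter.
    return unicodedata.normalize("NFC", text)


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = _nfc(reference).split()
    hyp_words = _nfc(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Because a single wrong tone mark changes one character but invalidates a whole word, CER and WER can diverge noticeably on diacritization tasks, which is one reason such evaluations commonly report both.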
46 pages, 5318 KB  
Article
Towards a Better Characterization of Adversarial Attacks in Geospatial Imagery
by Veet Zaveri and Arun S. Maiya
Remote Sens. 2026, 18(12), 2041; https://doi.org/10.3390/rs18122041 - 18 Jun 2026
Viewed by 104
Abstract
Manipulated satellite imagery threatens analytic workflows, policy decisions, and trust in geospatial intelligence. Operational systems increasingly benefit from capabilities for both manipulation detection and manipulation-family attribution to support verification, triage, and downstream analysis. We present a unified benchmark for characterizing three representative manipulation families in geospatial imagery (generative manipulations, pixel-level perturbations, and adversarial patches) using a controlled, class-balanced design and 20 modern vision architectures spanning conventional, Earth-observation-pretrained, and vision-language models. Across architectures, the dominant failure boundary is between authentic imagery and subtle pixel-level perturbations, whereas generative manipulations and adversarial patches are generally more separable under matched in-domain conditions. Additional analyses reveal important generalization limitations under unseen manipulation variants and external-domain transfer, demonstrating that strong benchmark performance does not necessarily translate to reliable operational screening. The framework also enables systematic comparison of unified multi-attack and specialized detection strategies, providing insight into their relative strengths and limitations. Rather than proposing a new defense, this work provides a reproducible methodology for characterizing manipulation artifacts, model failure modes, and deployment-relevant screening behavior in geospatial imagery, with applications to analyst triage, verification workflows, and trustworthy use of satellite data.
22 pages, 5647 KB  
Article
LiquidGAN for Handwriting-Based Detection and Severity Classification of Extrapyramidal Symptoms
by Erandhi M. Liyanage, Chun-Hung Lee, Wen-Yen Chang, Andrew An-Zhe Lee, Guan-Hsiung Liaw, Wu-Chuan Yang, Yu-Hsin Liu, Kun-Chan Lan and Sai Ho Ling
Sensors 2026, 26(12), 3890; https://doi.org/10.3390/s26123890 (registering DOI) - 18 Jun 2026
Viewed by 254
Abstract
Extrapyramidal symptoms (EPS) are motor side effects commonly induced by antipsychotic medications and can lead to measurable changes in handwriting patterns. These symptoms affect both the spatial and temporal characteristics of writing, including stroke thickness, direction and the rate of directional change. To model these complex variations, we propose a novel Liquid Generative Adversarial Network (LiquidGAN), which combines the adaptive dynamics of liquid neural networks with the data generation capability of GANs. Handwriting data were collected from 94 patients with confirmed EPS and 30 healthy controls using Archimedean spiral patterns drawn with both hands. A total of 211 images were processed for both binary and multiclass classification using a pretrained ResNet50 model. The pretrained ResNet50 achieved 92% accuracy and 97% precision in the binary classification task; however, its performance dropped significantly to 57% accuracy in multiclass classification, indicating limited capability in capturing fine-grained EPS severity variations. In contrast, the proposed LiquidGAN demonstrated excellent performance in the binary classification task, achieving 97% accuracy and 98% precision. More importantly, LiquidGAN substantially outperformed the baseline in the more challenging multiclass setting, achieving 70% accuracy and precision across four classes (mild, moderate, severe, and control). This shows that the diverse dataset generated by LiquidGAN significantly improves HOG-ANN classification and effectively captures complex and subtle handwriting variations associated with different EPS severity levels that conventional models such as ResNet50 fail to distinguish. In addition, LiquidGAN generated diverse and realistic synthetic handwriting samples, yielding improved Fréchet Inception Distance (FID), precision, and recall compared with StyleGAN. These findings demonstrate that handwriting biomarkers, when analyzed through dynamic generative learning, offer an effective and non-invasive approach for monitoring extrapyramidal side effects in clinical settings.
14 pages, 274 KB  
Article
Image-Based Classification of Ship Hull Cleanliness Based on Transfer Learning
by Piotr Ściegienka, Łukasz Wróbel, Daniel Dąbrowski, Marcin Michalak, Dawid Macha, Marek Sikora, Tomasz Borowik and Tomasz Hartwig
Appl. Syst. Innov. 2026, 9(6), 130; https://doi.org/10.3390/asi9060130 - 18 Jun 2026
Viewed by 96
Abstract
Fouling on ship hulls increases hydrodynamic drag, fuel consumption, and emissions. This, in turn, necessitates the development of efficient methods for side cleaning and inspection. This work focuses on the application of image-based classification to assess the cleanliness of the surface of the hull in robotic cleaning systems, with respect to the ISO 8501-4 standard. Due to limited data availability, transfer learning techniques using pre-trained convolutional neural networks (ResNet50, EfficientNetB0 and MobileNetV2) were used. Both end-to-end models and hybrid approaches that combine deep feature extraction with XGBoost classification were evaluated. Experiments were carried out on binary classification (cleaned vs. uncleaned surfaces) and multi-class classification of cleanliness levels (WA1, WA2, WA2.5). The results show that transfer learning enables effective recognition of cleaning status, achieving high performance for binary classification despite a small dataset. However, multi-class classification remains challenging due to subtle differences between classes and data limitations. The proposed approach supports automated visual inspection of underwater robotic platforms and represents a step toward objective standards-based assessment of hull cleaning processes. Full article
(This article belongs to the Special Issue Autonomous Robotics and Hybrid Intelligent Systems)
43 pages, 4497 KB  
Article
OATS-RS: Ontology-Aware Adaptive and Selective Zero-Shot Scene Classification for Remote Sensing
by János Horváth
Remote Sens. 2026, 18(12), 2038; https://doi.org/10.3390/rs18122038 - 18 Jun 2026
Viewed by 255
Abstract
Zero-shot remote sensing is attractive for scene classification because new regions, sensors, and label taxonomies often appear before sufficient annotated data are available for supervised adaptation. We present OATS-RS, an inference-centric framework that keeps a remote sensing vision–language model (VLM) backbone frozen and improves zero-shot decisions through ontology-aware prompt construction, hierarchical and contrastive scoring, adaptive multi-view aggregation, unlabeled transductive refinement, ambiguity-aware local re-ranking, and selective prediction. The method targets the common remote sensing regime in which neighboring classes such as annual crop, permanent crop, forest, pasture, herbaceous vegetation, river, and sea or lake overlap strongly in red–green–blue (RGB) appearance, meaning that they require more than a single class-name prompt. On the supplied final EuroSAT RGB evaluation with a GeoRSCLIP Contrastive Language–Image Pre-training (CLIP)-family Vision Transformer Base with 32 × 32-pixel patches (ViT-B-32) backbone, the complete pipeline obtains top-1 accuracy of 0.522, balanced accuracy of 0.522, macro-averaged F1 score (macro-F1) of 0.535, and top-3 accuracy of 0.887. The strongest classes are industrial area, residential area, river, highway, and pasture, whereas the weakest classes remain herbaceous vegetation and several fine-grained vegetation categories. Selective prediction increases accepted-example accuracy to 0.538 at 0.934 coverage, but the expected calibration error (ECE) remains high at 0.384. These results support a qualified conclusion: ontology-guided zero-shot inference can already recover useful semantic shortlists for structured remote-sensing scenes, but fine-grained natural-class disambiguation, calibrated confidence, multi-dataset transfer, component-level ablations, and measured runtime remain essential before dependable deployment claims can be made. Full article
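The abstract above reports accepted-example accuracy at a given coverage and an expected calibration error (ECE). As a rough illustration of how such figures are commonly computed, here is a minimal Python sketch. The function names, the use of 10 equal-width confidence bins, and the simple confidence-threshold abstention rule are assumptions made for this example; the abstract does not state how the authors computed these metrics.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - mean confidence|.

    Assumption: confidences lie in [0, 1] and are binned into n_bins equal bins.
    """
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))

    n = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        acc = sum(ok for _, ok in b) / len(b)
        avg_conf = sum(c for c, _ in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


def selective_accuracy(
    conf: Sequence[float], correct: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (coverage, accuracy on accepted examples).

    Assumption: the model abstains whenever its confidence is below threshold.
    """
    accepted = [ok for c, ok in zip(conf, correct) if c >= threshold]
    coverage = len(accepted) / len(conf)
    accuracy = sum(accepted) / len(accepted) if accepted else float("nan")
    return coverage, accuracy
```

Raising the threshold typically trades coverage for accuracy on the accepted examples, while ECE measures separately how far the stated confidences are from the observed accuracy, so a high ECE can persist even when selective accuracy improves.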
27 pages, 15972 KB  
Article
A Dual-Branch Detector Based on the Multi-Granularity Dynamic Selection Mechanism for Remote Sensing Incremental Detection
by Shixi Li, Weiji Wang, Yousheng Xu, Wei Yao and Shengzhou Xu
Remote Sens. 2026, 18(12), 2032; https://doi.org/10.3390/rs18122032 - 18 Jun 2026
Viewed by 161
Abstract
In practical remote sensing object detection tasks, the application of deep learning approaches often takes the form of incremental learning: when the application includes new target types that were not encountered during training, a pre-trained model must acquire new knowledge without suffering catastrophic forgetting. Among the various techniques proposed, knowledge distillation (KD)-based regularization has proven to be one of the most effective methods. Current KD-based approaches primarily focus on addressing inter-task confusion and optimizing feature selection during distillation processes. In this paper, we propose a dual-branch detector-independent learning framework and a multi-granularity dynamic selection strategy. The former decouples detection tasks for old and new classes to mitigate inter-class confusion, while the latter is a novel, exquisitely designed distillation mechanism that ensures precise transfer of critical old-class information. Moreover, we apply a DIST loss that aligns both inter-class and intra-class relations, further enhancing the fidelity of old-class knowledge transfer. Experiments on the DIOR and DOTA datasets demonstrate that our method significantly outperforms state-of-the-art incremental-learning approaches for remote-sensing object detection and exhibits good robustness under different remote-sensing scenarios. Full article