Audio deepfake and vocoder fingerprint detectors are increasingly used to identify synthetic speech and attribute it to its generating model. However, their robustness against adversarial perturbations remains unclear across attack algorithms, perturbation domains, detector representations, and vocoder types. This paper presents a focused, quality-aware evaluation of four representative adversarial attacks, namely the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Carlini–Wagner (CW) attack, against audio deepfake and vocoder fingerprint detectors. Each attack is implemented in both the waveform domain and the short-time Fourier transform (STFT) magnitude domain. All attacks are optimized against Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks (AASIST) under a targeted fake-to-real objective and are evaluated on synthetic speech generated by HiFi-GAN, Fullband MelGAN, StyleMelGAN, and Parallel WaveGAN. Attack performance is first measured on the source AASIST detector, after which black-box transferability is assessed on three target detector families: ResNet with Linear Frequency Cepstral Coefficient (LFCC) features, LCNN with Constant-Q Cepstral Coefficient (CQCC) features, and a bidirectional long short-term memory (BiLSTM) detector. The results show that adversarial effectiveness depends strongly on perturbation domain and detector representation. STFT-magnitude PGD transfers strongly to LFCC-based ResNet detectors but has limited effect on CQCC-based and recurrent detectors. In contrast, waveform-domain attacks produce broader transferability across feature-based detectors, with different attacks showing distinct ASR–quality trade-offs. 
Under the chosen waveform-domain budget, FGSM and BIM preserve transcription-level intelligibility while retaining meaningful black-box transferability, whereas CW provides the strongest overall source-detector and black-box attack performance. To distinguish effective adversarial perturbations from destructive signal degradation, we evaluate audio quality and intelligibility using word error rate (WER) and signal-to-noise ratio (SNR). Overall, the findings show that robustness claims in audio deepfake and vocoder fingerprint detection are limited when adversarial perturbations, black-box transferability, and audio quality are jointly considered.
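As a rough illustration of the quality-aware framing above, the following is a minimal sketch (not the authors' implementation) of one targeted FGSM step in the waveform domain and of the SNR used to quantify the resulting distortion. The function names, the `eps` budget, the test tone, and the hand-supplied gradient are all assumptions for illustration; in a real attack the gradient would come from backpropagating the detector's loss toward the "real" label.

```python
import numpy as np

def targeted_fgsm_step(x, grad, eps):
    """One targeted FGSM step: move against the gradient of the loss
    computed with respect to the target label (here, "real")."""
    x_adv = x - eps * np.sign(grad)
    # Keep the waveform in the usual normalized amplitude range.
    return np.clip(x_adv, -1.0, 1.0)

def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB, treating the perturbation as noise."""
    noise = perturbed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Hypothetical 1-second, 16 kHz tone and a stand-in gradient.
t = np.arange(16000) / 16000.0
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)
grad = np.cos(2 * np.pi * 3.0 * t)  # placeholder, not a real detector gradient
x_adv = targeted_fgsm_step(x, grad, eps=0.002)
print(f"SNR: {snr_db(x, x_adv):.1f} dB")
```

Because the step is bounded by `eps` per sample, the SNR falls as the budget grows, which is why an attack's success rate is only meaningful when reported alongside a quality measure.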

(This article belongs to the Special Issue Adversarial Attacks and Cyber Security)

29 pages, 42377 KB

Open Access Article

PG-SalDETR: A Method for Detecting Small Defects in Steel Plates Based on Physically Guided Saliency and Asymmetric Perception Network

by Xiaodong Zhang, Cuiyun Li and Shengye Zhao

Symmetry 2026, 18(7), 1104; https://doi.org/10.3390/sym18071104 (registering DOI) - 29 Jun 2026

Abstract

Steel plate defect detection faces three problems: weak features of small defects, a disconnect between physical priors and the detection task, and semantic inconsistency in multi-scale fusion, all of which can easily lead to the misdetection of small defects. To solve these problems, this paper proposes a detection method named PG-SalDETR. First, a physics-guided saliency perception mechanism (PGSPM) transforms physical priors into learnable guidance signals and embeds them directly into the detection network for joint optimization. Second, the token sequence saliency perception network (TSSP-Net) improves the perception and representation of small defect features through an asymmetric dual-branch architecture, adaptive fusion, and residual fusion. Third, a two-stage query refinement mechanism (TSQRM) optimizes the queries through physically guided offset correction and adaptive multi-scale feature aggregation while preserving fine-grained defect details. Finally, the dynamic cross-scale fusion module (LCASF) alleviates the semantic inconsistency of small defect features in multi-scale fusion. Compared with Salience DETR, PG-SalDETR increases AP by 3.8% and 2.6%, and AP_S by 2.8% and 3.9%, on the NEU-DET and GC10-DET datasets, respectively. These results indicate the effectiveness of the proposed method for small defect detection on steel plate surfaces.

(This article belongs to the Section Physics)

32 pages, 270887 KB

Open Access Article

DCFP-YOLO: A Dual-Backbone Feature Fusion Network for Multi-Pose Chili Flower Recognition and Edge Deployment

by Minqiu Kuang, Xiaojian Li, Fangping Xie, Shang Chen, Dawei Liu, Yang Xiang, Bei Wu, Feng Liu, Yuxuan Zhang and Xu Li

Agriculture 2026, 16(13), 1422; https://doi.org/10.3390/agriculture16131422 (registering DOI) - 29 Jun 2026

Abstract

To address the challenges of difficult feature extraction and insufficient recognition accuracy caused by the small size of chili flowers, occlusion by branches and leaves, and illumination variations in complex field environments, a dual-backbone-based chili flower pose estimation algorithm, termed DCFP-YOLO, is proposed. Built upon the YOLO11n framework, the proposed method performs classification and recognition of five typical upward-oriented chili flower poses. To alleviate the loss of local detail features of small chili flowers under complex backgrounds, a dual-backbone feature extraction network composed of StarNet and ShuffleNetV2 is constructed. Specifically, the StarNet backbone enhances the extraction of fine-grained local features from key floral regions, while the ShuffleNetV2 backbone improves the perception of global spatial structural information. The complementary fusion of dual-backbone features strengthens the representation capability of chili flower pose features in complex environments. To mitigate the attenuation of shallow detail information during multi-scale feature transmission, a Bidirectional Multi-branch Auxiliary Feature Pyramid Network (BiMAFPN) is designed to enhance feature propagation through cross-scale feature interaction, thereby improving pose recognition performance under occlusion and overlapping conditions. Furthermore, a Programmable Gradient Information (PGI)-assisted training mechanism is introduced to optimize gradient propagation paths and alleviate information bottlenecks in deep networks, thereby enhancing the robustness of multi-pose feature extraction under occlusion, blur, and complex illumination conditions. Experimental results demonstrate that DCFP-YOLO achieves recall, mAP50, and mAP50-95 values of 87.4%, 92.0%, and 66.9%, respectively, representing improvements of 1.7, 1.3, and 3.5 percentage points over the baseline model. Overall performance surpasses that of current mainstream object detection algorithms.
After deployment on the NVIDIA Jetson AGX Orin platform, the model achieves an inference speed of 20.9 frames/s, which largely satisfies the real-time perception requirements of chili flower pose recognition in complex agricultural environments. The proposed method provides an effective visual perception framework for chili flower pose recognition in complex agricultural environments. Rather than constituting a complete robotic pollination solution, the developed model serves as a potential perception component for future intelligent pollination robotic systems, providing reliable flower pose information for subsequent research on target localization, end-effector alignment, and robotic pollination in unstructured greenhouse environments.

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

43 pages, 2827 KB

Open Access Article

MS-SENet: A Multi-Scale Squeeze–Excitation Network for Deep-Learning-Based Automatic Modulation Classification in Cognitive Radio Systems

by Evelio Astaiza Hoyos, Héctor Fabio Bermúdez-Orozco and Nasly Cristina Rodriguez-Idrobo

Future Internet 2026, 18(7), 343; https://doi.org/10.3390/fi18070343 (registering DOI) - 29 Jun 2026

Abstract

Automatic modulation classification (AMC) is a critical enabler of cognitive radio (CR) systems, allowing secondary users to identify primary user modulation schemes and adapt transmission parameters in real time. Traditional AMC approaches, based on likelihood functions or hand-crafted features, suffer from degraded performance under low signal-to-noise ratio (SNR) conditions and realistic channel impairments. In this paper, we propose MS-SENet (Multi-Scale Squeeze–Excitation Network), a novel deep-learning architecture that integrates multi-scale convolutional feature extraction, squeeze-and-excitation channel attention, residual learning, bidirectional long short-term memory (BiLSTM) temporal modelling, and global attention pooling into a unified framework for robust AMC. The multi-scale convolution module employs parallel branches with kernel sizes of 3, 5, and 7 to capture both fine-grained phase transitions and coarse envelope patterns from raw in-phase/quadrature (I/Q) signal samples. Squeeze–excitation residual blocks perform channel-wise feature recalibration, enabling the network to emphasize informative feature maps while suppressing less relevant ones. A bidirectional LSTM layer models temporal dependencies across the signal sequence, and a global attention pooling mechanism performs weighted temporal aggregation prior to classification. We present a comprehensive taxonomy of deep-learning architectures for AMC organised along five axes—input representation, feature extraction, temporal modelling, regularization strategy, and architectural complexity—and conduct a rigorous comparative evaluation against ten baseline architectures on a RadioML-style synthetic dataset (110,000 samples, 11 modulation classes, and 20 SNR levels from −20 to +18 dB). 
The experimental results demonstrate that MS-SENet achieves a mean classification accuracy of 87.9% at SNR ≥ 0 dB (the average of the medium and high SNR regime averages: 86.06% for 0 ≤ SNR < 10 dB and 89.68% for SNR ≥ 10 dB) while maintaining a compact footprint of approximately 406 K parameters, making it suitable for deployment on resource-constrained edge devices. We further analyze the robustness of the proposed architecture to multipath fading, carrier frequency offset, and sample rate offset, confirming its resilience under practical operating conditions. MS-SENet is an architecture designed for automatic modulation classification of I/Q signals and is not related to the homonymous architecture for speech emotion recognition.
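The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check under two stated assumptions that the abstract does not confirm: the 20 SNR levels are evenly spaced (which, given the endpoints, implies a 2 dB step), and the samples are spread evenly over every class and SNR pair.

```python
# SNR grid stated in the abstract: 20 levels from -20 dB to +18 dB.
# Even spacing is an assumption; with these endpoints it implies a 2 dB step.
snr_levels = list(range(-20, 20, 2))

num_samples = 110_000
num_classes = 11
# Assumes samples are spread evenly over every (class, SNR) pair.
samples_per_cell = num_samples // (num_classes * len(snr_levels))

# Mean accuracy at SNR >= 0 dB as the average of the two regime averages.
medium_snr_acc = 86.06  # 0 <= SNR < 10 dB
high_snr_acc = 89.68    # SNR >= 10 dB
mean_acc = (medium_snr_acc + high_snr_acc) / 2

print(len(snr_levels), snr_levels[0], snr_levels[-1])
print(samples_per_cell)
print(round(mean_acc, 2))
```

The average of the two regime figures is 87.87%, consistent with the reported 87.9% after rounding, and the even-split assumption gives 500 samples per class and SNR pair.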

33 pages, 1264 KB

Open Access Article

Symmetry-Aware Discrepancy Representation and Collaborative Optimization for Multi-Class Defect Image Generation

by Beibei Jia, Haijian Shao, Dengbiao Jiang, Nian Tao and Guoquan Yao

Symmetry 2026, 18(7), 1101; https://doi.org/10.3390/sym18071101 (registering DOI) - 29 Jun 2026

Abstract

Industrial defect image generation is an effective way to alleviate data scarcity and class imbalance in visual inspection. In industrial images, defects usually appear as local asymmetric perturbations on globally regular background structures, which makes defect synthesis dependent on both background consistency and local anomaly fidelity. Existing generative methods still face difficulties when only limited anomalous samples are available, especially in representing fine-grained discrepancies among defect categories, coordinating global and local branches across diffusion stages, and constraining small defect regions and their boundary transitions. To address these issues, this paper develops a symmetry-aware multi-constraint diffusion framework based on the dual-branch architecture of DualAnoDiff. The framework treats multi-class industrial defect generation as a joint optimization problem involving class-conditioned discrepancy representation, diffusion-stage-aware branch coordination, and saliency-guided regional supervision. First, Class-Conditioned Shared-Basis LoRA (CSB-LoRA) models category-specific defect characteristics by combining cross-class shared low-rank bases with class-dependent coefficients, allowing common structural priors and class-specific asymmetric patterns to be represented simultaneously. Second, Temporal Dual-branch Attention Modulation (TDAM) adjusts branch interaction, background information injection, and residual feature fusion according to the denoising stage, so that the generation process can gradually shift from global structure restoration to local defect refinement. Third, Saliency-Guided Reconstruction Loss (SGRL) applies stronger spatial constraints to defect regions and boundary neighborhoods, improving local detail preservation and defect-background continuity. Experiments on the MVTec AD dataset show that the proposed method improves both generation quality and perceptual diversity compared with DualAnoDiff. 
The average IS increases from 1.93 to 2.07, and IC-LPIPS increases from 0.38 to 0.41. When the generated samples are used for downstream defect segmentation, AP-P improves from 84.5% to 85.7%, and F1-P improves from 78.8% to 79.3%. These results indicate that the generated samples can serve as useful synthetic training data for few-shot and class-imbalanced industrial inspection.

(This article belongs to the Section Computer)

29 pages, 2905 KB

Open Access Article

Temporal Attribution Matrix for Tracking XAI Feature Importance Evolution in Wind Turbine Gearbox Degradation Detection Using SCADA Data

by Jhamil Gutierrez, Ace Beneth Jacinto, Jamil Allen Fortaleza, Amor Lacara, Riah Ann Fermin-Cayanan and Arjay Alba

Energies 2026, 19(13), 3072; https://doi.org/10.3390/en19133072 (registering DOI) - 29 Jun 2026

Abstract

Wind turbine gearbox condition monitoring increasingly combines Supervisory Control and Data Acquisition (SCADA) data with Explainable Artificial Intelligence (XAI) for predictive maintenance. However, current XAI applications report attributions as static or globally aggregated feature-importance results. Such representations do not reveal when fault-related variables emerge, how dominance shifts between features, or how the explanatory structure evolves as degradation progresses. This limits their value for time-resolved diagnostic interpretation. To address this gap, this study proposes the Temporal Attribution Matrix (TAM), a temporal interpretability framework that tracks the evolution of XAI-derived feature importance across degradation periods. The central hypothesis is that temporal attribution patterns contain diagnostic information not captured by static feature-importance summaries. TAM was applied to a three-year SCADA dataset from Fuhrländer FL2500 wind turbines using XGBoost-SHAP and 1D-CNN Grad-CAM within sliding weekly windows. Four temporal measures were derived: feature onset time, dominance transition, attribution entropy, and cross-model consistency. Both XAI methods independently identified gearbox bearing temperatures 451 and 152 as the most influential features. TAM further revealed a synchronized thermal-feature onset on 23 October 2012, 14 SHAP dominance transitions compared with 70 Grad-CAM transitions, and a moderate cross-model Spearman correlation of 0.488. Secondary validation using WT82 confirmed TAM’s applicability beyond a single turbine. These results demonstrate that TAM extends static XAI by producing time-resolved degradation narratives for SCADA-based wind turbine predictive maintenance.
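Two of the temporal measures named above, attribution entropy and dominance transition, can be sketched on a toy window-by-feature matrix. The definitions below are assumptions chosen for illustration (Shannon entropy of the normalized absolute attributions, and a count of changes in the top-ranked feature); the abstract does not give the paper's exact formulas, and the matrix values are made up.

```python
import numpy as np

def attribution_entropy(attr):
    """Shannon entropy (bits) of one window's normalized |attribution| vector."""
    p = np.abs(attr) / np.sum(np.abs(attr))
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

def dominance_transitions(attr_matrix):
    """Count how often the top-ranked feature changes between consecutive windows."""
    top = np.argmax(np.abs(attr_matrix), axis=1)
    return int(np.sum(top[1:] != top[:-1]))

# Hypothetical matrix: rows are weekly windows, columns are SCADA features.
tam = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.00, 1.00, 0.00, 0.00],
])
entropies = [attribution_entropy(row) for row in tam]
transitions = dominance_transitions(tam)
```

Under these definitions, entropy is highest when importance is spread evenly across features and drops to zero when a single feature carries all of it, so a falling entropy curve would indicate attributions concentrating on a few variables.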

(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

15 pages, 2264 KB

Open Access Article

Self-Supervised Bidirectional State Space Modeling for Voiceprint Feature Representation and Recognition

by Junju Lai, Wei Wang, Guangyao Li, Zhichong Kong, Chao Yuan and Qian Zhou

Electronics 2026, 15(13), 2838; https://doi.org/10.3390/electronics15132838 (registering DOI) - 29 Jun 2026

Abstract

As substation equipment continues to evolve toward higher voltage levels, larger capacities, and more complex operating conditions, voiceprint signals exhibit greater sensitivity and observability during the early stages of faults. However, traditional modeling approaches still suffer from limitations in capturing long-range temporal dependencies, suppressing noise interference, and adapting to unlabeled data. To address these issues, a state space model-based Mamba self-supervised voiceprint framework, termed MSANet, is proposed. A bidirectional state space scanning mechanism is introduced into the network architecture to avoid the high computational complexity of attention mechanisms while simultaneously preserving both global contextual correlations and local detail representations of voiceprint signals. In addition, a spectrum block masking-based self-supervised learning strategy is incorporated, enabling the model to extract stable time–frequency structural features even under unlabeled or limited labeled samples. Experimental results demonstrate that MSANet achieves high accuracy in voiceprint-related tasks. Furthermore, the lightweight version of the model maintains competitive performance while significantly reducing computational and storage overhead, indicating its feasibility for deployment on edge devices in resource-constrained scenarios such as substation environments. The proposed method provides a potential methodological basis for enhancing fault-related voiceprint feature extraction, representation learning, and future practical engineering deployment.
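The spectrum block masking idea can be illustrated with a short sketch: rectangular time-frequency blocks are hidden, and a model would then be trained to reconstruct them. This is a generic illustration, not MSANet's actual strategy; the function name, block sizes, block count, and the random stand-in spectrogram are all assumptions.

```python
import numpy as np

def mask_blocks(spec, num_blocks, block_freq, block_time, rng):
    """Zero out random rectangular blocks of a (freq, time) spectrogram.

    Returns the masked copy and a boolean mask marking the hidden bins.
    """
    n_freq, n_time = spec.shape
    mask = np.zeros(spec.shape, dtype=bool)
    for _ in range(num_blocks):
        # Pick the top-left corner so the block fits inside the spectrogram.
        f0 = rng.integers(0, n_freq - block_freq + 1)
        t0 = rng.integers(0, n_time - block_time + 1)
        mask[f0:f0 + block_freq, t0:t0 + block_time] = True
    masked = np.where(mask, 0.0, spec)
    return masked, mask

rng = np.random.default_rng(0)
spec = rng.random((64, 100)) + 0.1  # hypothetical stand-in for a spectrogram
masked, mask = mask_blocks(spec, num_blocks=4, block_freq=8, block_time=10, rng=rng)
```

A reconstruction loss would typically be computed only on the bins where `mask` is true, so no labels are needed for pretraining.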

51 pages, 1481 KB

Open Access Article

A Hybrid Feature-Enhanced IndoBERT Framework with Controlled Semi-Supervised Learning for Low-Resource Indonesian Hate Speech Detection

by Shoffan Saifullah and Rafał Dreżewski

Appl. Sci. 2026, 16(13), 6478; https://doi.org/10.3390/app16136478 (registering DOI) - 29 Jun 2026

Abstract

Low-resource hate speech detection remains a challenging task for Indonesian social media due to limited labeled annotations, highly informal linguistic expressions, and substantial lexical variability. Under such conditions, purely supervised transformer models often suffer from unstable semantic generalization, while conventional pseudo-labeling methods are vulnerable to noisy unlabeled sample propagation. To address these limitations, this study proposes a hybrid feature-enhanced IndoBERT framework integrated with a controlled semi-supervised learning strategy. The proposed model combines contextual IndoBERT embeddings with abusive lexicon cues, handcrafted linguistic indicators, and TF-IDF–SVD statistical representations through a lightweight concatenation–projection feature fusion mechanism, while unlabeled data are incorporated via adaptive confidence thresholding and class-balanced pseudo-label selection to improve pseudo-label reliability. Extensive experiments were conducted under realistic low-resource supervision settings using only 5%, 10%, and 20% labeled data, and the proposed framework was systematically compared against representative baselines, including sparse lexical machine learning models, shallow neural architectures, multilingual transformers, IndoBERTweet, naive pseudo-labeling, and LLM-based prompting. The results show that model effectiveness is strongly supervision-dependent. Under the most extreme low-resource setting, compact statistical augmentation provides the most stable complementary signal, whereas under moderate low-resource supervision, the full hybrid representation combined with controlled semi-supervised learning yields the strongest and most consistent gains. The proposed Hybrid IndoBERT + controlled SSL framework outperforms all baselines at the 20% labeled setting, reaching an accuracy of 0.8654, Macro-F1 of 0.8633, and ROC-AUC of 0.9334. 
Additional analyses of pseudo-label reliability, calibration behavior, computational efficiency, and qualitative error patterns further show that the proposed framework improves low-resource robustness while maintaining comparable inference-time efficiency. These findings demonstrate that low-resource hate speech detection benefits most from the staged integration of contextual semantic modeling, interpretable linguistic cues, global lexical–statistical structure, and carefully regulated unlabeled data exploitation. Additional experiments using GPT-4o-mini and Llama-3.1-8B further demonstrate that the proposed framework remains competitive against general-purpose large language model prompting approaches under low-resource Indonesian hate speech detection scenarios. The proposed framework provides a practical and reproducible direction for hate speech detection in annotation-constrained social media environments.
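The controlled pseudo-labeling step described in this abstract, confidence thresholding followed by class-balanced selection, might look roughly like the sketch below. It is a simplification under stated assumptions: the threshold is fixed rather than adaptive, the per-class cap is a hypothetical parameter, and the probabilities are made up for illustration.

```python
import numpy as np

def select_pseudo_labels(probs, threshold, per_class):
    """Return (indices, labels) of confident, class-balanced pseudo-labels.

    probs: (n_samples, n_classes) predicted probabilities on unlabeled data.
    threshold: minimum confidence for a sample to be eligible.
    per_class: maximum number of samples kept for each class.
    """
    labels = np.argmax(probs, axis=1)
    conf = np.max(probs, axis=1)
    keep = []
    for c in range(probs.shape[1]):
        idx = np.where((labels == c) & (conf >= threshold))[0]
        # Most confident first, then cap the class at per_class samples.
        idx = idx[np.argsort(-conf[idx])][:per_class]
        keep.extend(idx.tolist())
    keep = np.array(sorted(keep), dtype=int)
    return keep, labels[keep]

# Hypothetical model outputs for six unlabeled posts (two classes).
probs = np.array([
    [0.95, 0.05],
    [0.60, 0.40],
    [0.92, 0.08],
    [0.97, 0.03],
    [0.10, 0.90],
    [0.45, 0.55],
])
idx, pseudo = select_pseudo_labels(probs, threshold=0.9, per_class=2)
```

Capping each class keeps a dominant class from flooding the pseudo-labeled set, which is one plausible way to limit the noisy-label propagation that the abstract attributes to naive pseudo-labeling.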

(This article belongs to the Special Issue Recent Applications of Machine Learning and LLMs in Natural Language Processing (NLP): 2nd Edition)

15 pages, 1563 KB

Open Access Article

Matriarchs and Metopism: An Analysis of the Wari Iconographic Representation of the Skull

by Louise Deglin

Humans 2026, 6(3), 22; https://doi.org/10.3390/humans6030022 (registering DOI) - 29 Jun 2026

Abstract

While skeletal imagery appears across various ancient Andean traditions, the Wari Empire (c. 600–1000 CE) developed a uniquely standardized and widespread skull motif, the uma tullu, distributed throughout its former territory. Through an analysis of 63 artifacts spanning ceramic, textile, and metal, this study identifies key diagnostic markers of the motif: the representation of the metopic suture and the application of red pigment. By cross-referencing these stylistic features with bioarchaeological data, the research posits that the uma tullu served as a central communicative device. In the absence of a formal script, this motif may have encoded imperial values and ancestral cult practices, facilitating ideological expansion and state identity. Ultimately, this work demonstrates how standardized iconography functioned as a system of graphic communication and ideological cohesion in the Middle Horizon Andes.

20 pages, 4127 KB

Open Access Article

Quantum Machine Learning for Water Pollution Profiling in the Rio Santiago Basin

by Alan Abraham-Mexicano, Carlos V. Muro-Medina, Valentin Flores-Payan, Elisa Ramos-Pinzon, Carolina L. Recio-Colmenares, Roxana B. Recio-Colmenares and Cesar A. Garcia-Garcia

Quantum Rep. 2026, 8(3), 60; https://doi.org/10.3390/quantum8030060 (registering DOI) - 29 Jun 2026

Abstract

The Rio Santiago basin is one of the most environmentally stressed river systems in Mexico, with persistent organic, nutrient, microbial, surfactant, and metal contamination. This study develops a near-term quantum machine learning workflow for environmental monitoring and water-pollution profiling using multivariate records from 13 stations between 2009 and 2022. QML is evaluated here because quantum feature maps can define nonlinear, interaction-rich kernels that remain executable on present quantum hardware, providing an alternative representation to compare with classical PCA, RBF, UMAP, and HDBSCAN baselines rather than a presumed computational advantage. After quality screening, log transformation, standardization, and domain-guided feature selection, pollution profiles are evaluated across PCA, RBF spectral clustering, UMAP/KMeans, UMAP/HDBSCAN, a simulated ZZ-style quantum feature-map kernel, and Qiskit Runtime hardware evaluations of the same kernel concept. The initial cleaned-data results show that classical PCA clustering identifies broad lower-load, high organic/surfactant, and rain-season solids/microbial profiles. UMAP/HDBSCAN provides the strongest cleaned full-sample nonlinear baseline, with a silhouette score of 0.568 after excluding 177 noise samples. The simulated quantum-kernel representation separates station-linked gradients, while matched n = 650 stability diagnostics show near-identical quantum-kernel clustering across random initializations (mean ARI = 0.994 for cleaned data) but retain the RBF kernel as the strongest nonlinear comparator. Two 24-sample Qiskit hardware runs and two matched 8-record hardware checks provide proof-of-execution evidence. The analysis is framed as a controlled representation study, not as a claim of quantum advantage.

(This article belongs to the Special Issue Beyond Classical Limits: Quantum Machine Learning for Multi-Field Research)

29 pages, 6355 KB

Open Access Article

SFEFeNet: A Structure-Frequency Mutual-Guided Lightweight Network for Remote Sensing Image Super-Resolution

by Runtao Liu, Yupeng Shang, Guoqing Zhang and Le Sun

Remote Sens. 2026, 18(13), 2102; https://doi.org/10.3390/rs18132102 (registering DOI) - 29 Jun 2026

Abstract

Remote sensing image super-resolution plays an important role in object recognition, urban monitoring, and fine-grained remote sensing interpretation. This paper studies lightweight single-image remote sensing image super-resolution, in which only one LR observation is available and the model must recover reliable structural details under a limited computational budget. Existing lightweight methods reduce parameter counts and computational complexity, but their limited representation capacity often causes blurred boundaries, broken road structures, and missing high-frequency details in buildings, roads, and texture-rich regions. To address these issues, we propose SFEFeNet, a Structure-Frequency Mutual-Guided Lightweight Network for remote sensing image super-resolution. First, we design a Lightweight Structure-Frequency Block (LSFB) to jointly model local spatial features, structural responses, and frequency responses with low computational overhead. Second, we introduce a Structure-Frequency Mutual Guidance (SFMG) module, where edge responses guide high-frequency component selection, and the selected high-frequency responses further refine edge-aware attention. Finally, we propose a Structure-Frequency Fusion Gate (SFFG) to adaptively integrate lightweight features, local spatial features, frequency-enhanced features, and structure-refined features. Experiments on RSSCN7, DOTA, and WHU-RS19 datasets evaluate SFEFeNet in terms of reconstruction quality, visual performance, and model complexity. Additional analyses further examine structural preservation, complex synthetic degradation, real-image generalization, and statistical stability. Notably, SFEFeNet-Lite contains 0.539 M parameters and 17.07 G FLOPs for $\times 2$, and 0.622 M parameters and 7.12 G FLOPs for $\times 4$, enabling effective structure-frequency feature modeling with lightweight computational cost.

(This article belongs to the Special Issue AI-Driven Hyperspectral Image Classification and Processing in Remote Sensing)

21 pages, 3138 KB

Open Access Article

TP-CanineNet: Temporal Context Contrastive Learning with Pseudo-Label Supervision for Abnormal Behavior Detection of Canine

by Xiangyun Guo, Xiaoya Kong, Chuiyu Kong, Jiashuo Feng and Yuxin Liu

Animals 2026, 16(13), 1997; https://doi.org/10.3390/ani16131997 (registering DOI) - 29 Jun 2026

Abstract

Canines exhibit various behavioral abnormalities, such as excessive barking, destructive behaviors, and indoor defecation, when left at home alone. Identifying these abnormal behaviors and implementing scientific and reasonable interventions can help improve canine welfare and promote harmonious coexistence between humans and companion animals. However, existing canine behavior recognition methods struggle to adapt to the strong temporal continuity and uneven motion amplitude of the abnormal behaviors exhibited by dogs left alone, resulting in inadequate temporal feature representation and low recognition accuracy. To address this issue, this study developed the TP-CanineNet model based on a Weakly Supervised Video Anomaly Detection (WS-VAD) framework. The model integrates a Temporal Context Aggregation (TCA) module to efficiently capture local–global temporal dependencies, suppress temporal noise, and enhance the representation of temporal features in dog behaviors. A Pseudo-Instance Discriminative Enhancement (PIDE) module is also adopted to strengthen the feature distinction between abnormal and normal behaviors. We constructed an Alone-Dog dataset comprising 430 video samples and 60 ground-truth labeled samples to validate the model’s effectiveness. Experimental results showed that the proposed model achieved a frame-level AUC of 85.19% and an AP of 72.55%, representing improvements of 2.20% and 8.33%, respectively, over the baseline model. The method can provide intelligent detection of the behaviors of domestic dogs left alone at home.

(This article belongs to the Section Companion Animals)

33 pages, 5003 KB

Open Access Article

SEMTRA: Global Semantic Transition and Rough-Set Rules for Auditable Post-Hoc Explainability

by Pavlo Radiuk, Oleksander Barmak and Iurii Krak

Mach. Learn. Knowl. Extr. 2026, 8(7), 181; https://doi.org/10.3390/make8070181 (registering DOI) - 29 Jun 2026

Abstract

Deep learning architectures generate highly effective but difficult-to-audit latent representations, creating a practical gap between predictive performance and verifiable explanations. Existing post hoc techniques often produce fragmented local attributions rather than dataset-level rulebooks. In this work, we propose Global SEMantic TRAnsition (SEMTRA), a post hoc framework that maps frozen representation features into semantic attributes, discretizes those attributes, and induces rough-set production rules with explicit coverage, conflict, fidelity, and abstention reporting. Evaluated on the Animals with Attributes 2 (AwA2) Protocol A, the semantic transition achieved a Mean Absolute Error (MAE) of $0.1029 \pm 0.0005$. The extracted rulebook covered 84.80% of test instances, yielding a covered accuracy of 39.73% and a covered fidelity to the base predictor of 40.48%. Under the Protocol B split, continuous semantic-prototype transfer reached an unseen-object accuracy of $44.02\% \pm 1.22\%$ as a semantic-transfer validation. Cross-domain validations using SUN and Derm7pt demonstrated that the audit protocol is portable yet strongly dataset-dependent. In the controlled synthetic benchmark, SEMTRA achieved a macro-F1 score of 0.879 at zero semantic noise and degraded to 0.838 at the highest evaluated noise level. Ultimately, SEMTRA serves as a transparent audit layer to expose the verifiable logical subset of a model, rather than replacing the underlying predictor.

(This article belongs to the Special Issue Explainable Artificial Intelligence: Theoretical Foundations and Methodological Advances)

20 pages, 2821 KB

Open Access Article

MD-Transformer: Multimodal Integration of ProtBERT Embeddings and Physicochemical Descriptors for Protein–Protein Interface Residue Prediction

by Jiahui Yang, Jihua Feng, Yuting Zhang and Zhongxing Chen

Int. J. Mol. Sci. 2026, 27(13), 5848; https://doi.org/10.3390/ijms27135848 (registering DOI) - 29 Jun 2026

Abstract

Accurate prediction of protein–protein interaction (PPI) interface residues is essential for understanding molecular recognition and supporting structure-guided design. To integrate contextual sequence representations with structure-related physicochemical information, we propose a multimodal framework termed MD-Transformer. The model combines residue-level ProtBERT embeddings with physicochemical descriptors, including B-factor, solvent-accessible surface area (SASA), and hydrophobicity. A hybrid fusion module first aligns heterogeneous features, followed by Transformer encoding and cross-modal attention for multimodal integration. Using the DB5.5 benchmark, physicochemical descriptors were Z-score normalized exclusively with training-set statistics. Under the complex-level split protocol (Official A), MD-Transformer achieved an AUPRC of 0.564, outperforming the ablation model without physicochemical descriptors by 0.159 and reducing false-positive predictions on exposed non-interface residues. Under the homology-aware split protocol (Official B v1), the model maintained an AUPRC of 0.480 and an MCC of 0.242, indicating retained predictive capability under reduced sequence similarity constraints. Under the same aligned evaluation workflow, PeSTo achieved an AUPRC of 0.264. Further SASA-stratified analyses identified SASA as a major contributor to suppressing false-positive predictions across residue exposure environments, while also revealing a precision-recall trade-off in highly exposed residues. These results suggest that contextual sequence representations and residue-level physicochemical descriptors provide complementary predictive signals.
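Normalizing descriptors with training-set statistics only, as this abstract describes, is a standard guard against test-set leakage. The sketch below shows the general pattern under assumed names and made-up descriptor values; it is not taken from the authors' code.

```python
import numpy as np

def fit_zscore(train):
    """Per-feature mean and standard deviation from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def apply_zscore(x, mean, std):
    """Standardize x with previously fitted statistics."""
    return (x - mean) / std

# Hypothetical descriptor columns: B-factor, SASA, hydrophobicity.
train = np.array([[20.0, 50.0, 1.8], [30.0, 10.0, -3.5], [40.0, 90.0, 4.2]])
test = np.array([[35.0, 120.0, 0.0]])

mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)  # reuses training statistics
```

Fitting the mean and standard deviation on the test split as well would let information about held-out complexes leak into the features, which is what the training-only protocol avoids.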

(This article belongs to the Section Molecular Informatics)

36 pages, 10488 KB

Open Access Article

CAMD-RTDETR: Real-Time Multi-Defect Detection Method for Tunnel Structures

by Yunyun Hao and Xiangyang Xu

Sensors 2026, 26(13), 4112; https://doi.org/10.3390/s26134112 (registering DOI) - 29 Jun 2026

Abstract

Intelligent tunnel defect detection is essential for structural safety and efficient operation and maintenance. However, manual inspection is inefficient, subjective, and risky, while existing deep learning methods often show unstable performance under practical conditions involving small targets, large-scale variations, and severe background interference, limiting their accuracy and real-time deployment on edge devices. To address these issues, this paper proposes CAMD-RTDETR, an end-to-end real-time multi-defect detection method based on RT-DETR. Cross-attention feature mining is introduced to enable bidirectional interaction between shallow spatial details and deep semantic information, enhancing the perception of weak-texture defects such as fine cracks. Multi-scale contextual pooling is designed to aggregate features from different receptive fields and improve the unified representation of cracks, seepage, and spalling with diverse morphologies. In addition, decoding enhancement and query optimization are incorporated to improve query updating and localization discrimination, thereby enhancing detection stability and boundary accuracy in complex tunnel scenes. Experiments on a field-collected tunnel defect dataset show that CAMD-RTDETR achieves an average inference latency of 15.855 ms per image and a processing speed of 63.06 FPS under the batch-size-1 testing setting. Compared with the baseline RT-DETR, Precision, Recall, mAP50, and mAP50-95 are improved by 6.3%, 13.5%, 14.7%, and 15.8%, respectively. Comparisons with seven representative detectors further show its superior accuracy and real-time performance, indicating its preliminary feasibility for edge-side inference and its potential for future integration into vehicle-mounted tunnel inspection systems.

(This article belongs to the Special Issue Sensors in Civil Structural Health Monitoring—2nd Edition)
