Search Results (1,130)

Search Parameters:
Keywords = attention state classification

34 pages, 4356 KB  
Article
Neural Efficiency and Attentional Instability in Gaming Disorder: A Task-Based Occipital EEG and Machine Learning Study
by Riaz Muhammad, Ezekiel Edward Nettey-Oppong, Muhammad Usman, Saeed Ahmed Khan Abro, Toufique Ahmed Soomro and Ahmed Ali
Bioengineering 2026, 13(2), 152; https://doi.org/10.3390/bioengineering13020152 - 28 Jan 2026
Abstract
Gaming Disorder (GD) is increasingly recognized as a behavioral addiction characterized by impaired control and functional impairment. While resting-state impairments are well understood, the neurophysiological dynamics during active gameplay remain underexplored. This study identified task-based occipital EEG biomarkers of GD and assessed their diagnostic utility. Occipital EEG (O1/O2) data from 30 participants (15 with GD, 15 controls) were collected during active mobile gaming. Spectral, temporal, and nonlinear complexity features were extracted. Feature relevance was ranked using Random Forest, and classification performance was evaluated with Leave-One-Subject-Out (LOSO) cross-validation to ensure subject-independent generalization across five models (Random Forest, KNN, SVM, Decision Tree, ANN). The GD group exhibited paradoxical “spectral slowing” during gameplay, characterized by increased Delta/Theta power and decreased Beta activity relative to controls. Beta variability was identified as a key biomarker reflecting altered attentional stability, while elevated Alpha power suggested potential neural habituation or sensory gating. The Decision Tree classifier was the most robust model, achieving a classification accuracy of 80.0%. Results suggest distinct neurophysiological patterns in GD, where increased low-frequency power may reflect automatized processing or “Neural Efficiency” despite active task engagement. These findings highlight the potential of occipital biomarkers as accessible and objective screening metrics for Gaming Disorder.
(This article belongs to the Special Issue AI in Biomedical Image Segmentation, Processing and Analysis)
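
The subject-independent claim rests on Leave-One-Subject-Out cross-validation: every fold trains on 29 subjects and tests on the held-out one, so no test epoch comes from a subject seen during training. A minimal sketch with scikit-learn’s LeaveOneGroupOut, using synthetic stand-ins for the EEG feature matrix, labels, and subject IDs (all data, sizes, and names below are illustrative, not the study’s code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-ins: 30 subjects x 20 epochs each, 12 features per epoch.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))           # spectral/temporal/complexity features
y = rng.integers(0, 2, size=600)         # 0 = control, 1 = GD (illustrative)
subjects = np.repeat(np.arange(30), 20)  # subject ID for every epoch

# LOSO: each fold holds out all epochs of exactly one subject.
logo = LeaveOneGroupOut()
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=logo, groups=subjects)
print(f"LOSO accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```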

20 pages, 363 KB  
Article
Analysis of Using Machine Learning Application Possibilities for the Detection and Classification of Topographic Objects
by Katarzyna Kryzia, Aleksandra Radziejowska, Justyna Adamczyk and Dominik Kryzia
ISPRS Int. J. Geo-Inf. 2026, 15(2), 59; https://doi.org/10.3390/ijgi15020059 - 27 Jan 2026
Abstract
The growing availability of spatial data from remote sensing, laser scanning (LiDAR), and photogrammetric techniques is driving the rapid development of methods for the automatic detection and classification of topographic objects. In recent years, both classical machine learning (ML) algorithms and deep learning (DL) methods have found wide application in the analysis of large and complex data sets. Despite significant achievements, the literature remains scattered, and a comprehensive review that systematically compares algorithm classes with respect to data modality, performance, and application context is still needed. This article provides a critical analysis of the current state of research on the use of ML and DL algorithms for the detection and classification of topographic objects. The theoretical foundations of selected methods, their applications to various data sources, and the accuracy and computational requirements reported in the literature are presented. Particular attention is paid to comparing classical ML algorithms (including SVM, RF, and KNN) with modern deep architectures (CNN, U-Net, ResNet) across data types such as satellite imagery, aerial orthophotos, and LiDAR point clouds, indicating their effectiveness for cartographic and elevation data. The article also discusses the main challenges related to data availability, model interpretability, and computational costs, and points to promising directions for further research. The results show that DL methods are frequently reported to achieve several to over ten percentage points higher segmentation and classification accuracy than classical ML approaches, depending on data type and object complexity, particularly for raster data and LiDAR point clouds. The conclusions emphasize the practical significance of these methods for spatial planning, infrastructure monitoring, and environmental management, as well as their potential for automating topographic analysis.
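
To make the review’s classical-ML baseline concrete, here is a hedged sketch of the kind of per-point classification pipeline it surveys, with random features standing in for LiDAR- or image-derived attributes (features, labels, and hyperparameters below are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for per-point features (e.g., height above ground,
# intensity, curvature) with binary topographic class labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # toy, linearly separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=100)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    print(name, f"{accuracy_score(y_te, clf.predict(X_te)):.3f}")
```
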
13 pages, 2027 KB  
Article
An Improved Diffusion Model for Generating Images of a Single Category of Food on a Small Dataset
by Zitian Chen, Zhiyong Xiao, Dinghui Wu and Qingbing Sang
Foods 2026, 15(3), 443; https://doi.org/10.3390/foods15030443 - 26 Jan 2026
Abstract
In the era of the digital food economy, high-fidelity food images are critical for applications ranging from visual e-commerce presentation to automated dietary assessment. However, developing robust computer vision systems for food analysis is often hindered by data scarcity for long-tail or regional dishes. To address this challenge, we propose a novel high-fidelity food image synthesis framework as an effective data augmentation tool. Unlike generic generative models, our method introduces an Ingredient-Aware Diffusion Model based on the Masked Diffusion Transformer (MaskDiT) architecture. Specifically, we design a Label and Ingredients Encoding (LIE) module and a Cross-Attention (CA) mechanism to explicitly model the relationship between food composition and visual appearance, simulating the “cooking” process digitally. Furthermore, to stabilize training on limited data samples, we incorporate a linear interpolation strategy into the diffusion process. Extensive experiments on the Food-101 and VireoFood-172 datasets demonstrate that our method achieves state-of-the-art generation quality even in data-scarce scenarios. Crucially, we validate the practical utility of our synthetic images: utilizing them for data augmentation improved the accuracy of downstream food classification tasks from 95.65% to 96.20%. This study provides a cost-effective solution for generating diverse, controllable, and realistic food data to advance smart food systems.
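
The Cross-Attention (CA) conditioning described above can be pictured as image tokens querying ingredient embeddings so that composition steers appearance. A minimal PyTorch sketch under that reading (module name and dimensions are our assumptions, not the paper’s code):

```python
import torch
import torch.nn as nn

class IngredientCrossAttention(nn.Module):
    """Image tokens (queries) attend over ingredient embeddings (keys/values)."""
    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, ingredient_emb):
        # img_tokens: (B, N, dim); ingredient_emb: (B, M, dim)
        attended, _ = self.attn(img_tokens, ingredient_emb, ingredient_emb)
        return self.norm(img_tokens + attended)  # residual connection + norm

ca = IngredientCrossAttention()
out = ca(torch.randn(2, 64, 256), torch.randn(2, 5, 256))
print(out.shape)  # torch.Size([2, 64, 256])
```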

24 pages, 10940 KB  
Article
A Few-Shot Object Detection Framework for Remote Sensing Images Based on Adaptive Decision Boundary and Multi-Scale Feature Enhancement
by Lijiale Yang, Bangjie Li, Dongdong Guan and Deliang Xiang
Remote Sens. 2026, 18(3), 388; https://doi.org/10.3390/rs18030388 - 23 Jan 2026
Abstract
Given the high cost of acquiring large-scale annotated datasets, few-shot object detection (FSOD) has emerged as an increasingly important research direction. However, existing FSOD methods face two critical challenges in remote sensing images (RSIs): (1) features of small targets are incompletely represented due to their extremely small scale and cluttered backgrounds, which weakens discriminability and leads to significant detection degradation; and (2) unified classification boundaries fail to handle the distinct confidence distributions of well-sampled base classes and sparsely sampled novel classes, leading to ineffective knowledge transfer. To address these issues, we propose TS-FSOD, a Transfer-Stable FSOD framework with two key innovations. First, the detector integrates a Feature Enhancement Module (FEM) that leverages hierarchical attention mechanisms to alleviate small-target feature attenuation, and an Adaptive Fusion Unit (AFU) that uses spatial-channel selection to strengthen target feature representations while mitigating background interference. Second, a Dynamic Temperature-scaling Learnable Classifier (DTLC) employs separate learnable temperature parameters for base and novel classes, combined with difficulty-aware weighting and dynamic adjustment, to adaptively calibrate decision boundaries for stable knowledge transfer. Experiments on the DIOR and NWPU VHR-10 datasets show that TS-FSOD achieves competitive or superior performance compared with state-of-the-art methods, with improvements of up to 4.30% mAP, excelling particularly in 3-shot and 5-shot scenarios.
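
The DTLC idea of separate learnable temperatures for base and novel classes can be sketched as a cosine classifier whose logits are scaled per class group; this is a simplified reading of the abstract (omitting the difficulty-aware weighting), not the authors’ implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTemperatureClassifier(nn.Module):
    """Cosine classifier with one learnable temperature per class group."""
    def __init__(self, feat_dim, n_base, n_novel):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_base + n_novel, feat_dim))
        self.t_base = nn.Parameter(torch.tensor(10.0))   # base-class temperature
        self.t_novel = nn.Parameter(torch.tensor(5.0))   # novel-class temperature
        self.n_base = n_base

    def forward(self, x):
        # Cosine similarity between features and class weight vectors.
        logits = F.linear(F.normalize(x, dim=-1),
                          F.normalize(self.weight, dim=-1))
        scale = torch.cat([self.t_base.expand(self.n_base),
                           self.t_novel.expand(logits.size(-1) - self.n_base)])
        return logits * scale  # separate calibration for base vs. novel classes

clf = DualTemperatureClassifier(feat_dim=128, n_base=15, n_novel=5)
print(clf(torch.randn(4, 128)).shape)  # torch.Size([4, 20])
```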

18 pages, 2210 KB  
Article
SPINET-KSP: A Multi-Modal LLM-Graph Foundation Model for Contextual Prediction of Kinase-Substrate-Phosphatase Triads
by Michael Olaolu Arowolo, Marian Emmanuel Okon, Davis Austria, Muhammad Azam and Sulaiman Olaniyi Abdulsalam
Kinases Phosphatases 2026, 4(1), 3; https://doi.org/10.3390/kinasesphosphatases4010003 - 22 Jan 2026
Abstract
Reversible protein phosphorylation is a key regulatory mechanism in cellular signalling and disease, governed by the opposing actions of kinases and phosphatases. Existing computational methods predict kinase–substrate or phosphatase–substrate interactions in isolation, lack specificity for biological conditions, and neglect triadic regulation. We present SPINET-KSP, a multi-modal LLM–Graph foundation model engineered for the prediction of kinase–substrate–phosphatase (KSP) triads with contextual awareness. SPINET-KSP integrates high-confidence interactomes (SIGNOR, BioGRID, STRING), structural contacts obtained from AlphaFold3, ESM-3 sequence embeddings, and a 512-dimensional cell-state manifold with 1612 quantitative phosphoproteomic conditions. A heterogeneous KSP graph is analysed using a cross-attention Graphormer with Reversible Triad Attention to model kinase–phosphatase antagonism. SPINET-KSP, pre-trained on 3.41 million validated phospho-sites using masked phosphorylation modelling and contrastive cell-state learning, achieves an AUROC of 0.852 for kinase-family classification (sensitivity 0.821, specificity 0.834, MCC 0.655) and a Pearson correlation coefficient of 0.712 for phospho-occupancy prediction. On independent 2025 mass spectrometry datasets, it identifies 72% of known cancer-resistance triads within the top 10 rankings and uncovers 247 additional triads validated by orthogonal proteomics. SPINET-KSP is, to our knowledge, the first foundation model for context-dependent reversible phosphorylation, enabling the targeting of dysregulated kinase–phosphatase pathways in disease.

23 pages, 13685 KB  
Article
CAT: Causal Attention with Linear Complexity for Efficient and Interpretable Hyperspectral Image Classification
by Ying Liu, Zhipeng Shen, Haojiao Yang, Waixi Liu and Xiaofei Yang
Remote Sens. 2026, 18(2), 358; https://doi.org/10.3390/rs18020358 - 21 Jan 2026
Abstract
Hyperspectral image (HSI) classification is pivotal in remote sensing, yet deep learning models, particularly Transformers, remain susceptible to spurious spectral–spatial correlations and suffer from limited interpretability. These issues stem from their inability to model the underlying causal structure of high-dimensional data. This paper introduces the Causal Attention Transformer (CAT), a novel architecture that integrates causal inference with a hierarchical CNN–Transformer backbone to address these limitations. CAT incorporates three key modules: (1) a Causal Attention Mechanism that enforces temporal and spatial causality via triangular masking and axial decomposition to eliminate spurious dependencies; (2) a Dual-Path Hierarchical Fusion module that adaptively integrates spectral and spatial causal features using learnable gating; and (3) a Linearized Causal Attention module that reduces computational complexity from O(N²) to O(N) via kernelized cumulative summation, enabling scalable high-resolution HSI processing. Extensive experiments on three benchmark datasets (Indian Pines, Pavia University, Houston2013) demonstrate that CAT achieves state-of-the-art performance, outperforming leading CNN and Transformer models in both accuracy and robustness. Furthermore, CAT provides inherently interpretable spectral–spatial causal maps, offering valuable insights for reliable remote sensing analysis.
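
The O(N²)-to-O(N) reduction via kernelized cumulative summation follows the standard linear-attention recipe: replace the softmax with a positive feature map φ(x) = elu(x) + 1 and keep causal prefix sums of φ(k)vᵀ. A hedged single-head sketch of that recipe, not the paper’s code:

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v):
    """Single-head causal attention in O(N) time via cumulative sums.

    q, k: (B, N, d); v: (B, N, e). phi(x) = elu(x) + 1 keeps weights positive.
    """
    q, k = F.elu(q) + 1, F.elu(k) + 1
    # Prefix sums over the causal window: S_i = sum_{j<=i} phi(k_j) v_j^T
    kv = torch.cumsum(torch.einsum("bnd,bne->bnde", k, v), dim=1)
    z = torch.cumsum(k, dim=1)                       # normalizer prefix sums
    num = torch.einsum("bnd,bnde->bne", q, kv)
    den = torch.einsum("bnd,bnd->bn", q, z).clamp_min(1e-6)
    return num / den.unsqueeze(-1)

out = causal_linear_attention(torch.randn(2, 16, 8),
                              torch.randn(2, 16, 8),
                              torch.randn(2, 16, 8))
print(out.shape)  # torch.Size([2, 16, 8])
```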

23 pages, 5756 KB  
Article
MG-HGLNet: A Mixed-Grained Hierarchical Geometric-Semantic Learning Framework with Dynamic Prototypes for Coronary Artery Lesions Assessment
by Xiangxin Wang, Yangfan Chen, Yi Wu, Yujia Zhou, Yang Chen and Qianjin Feng
Bioengineering 2026, 13(1), 118; https://doi.org/10.3390/bioengineering13010118 - 20 Jan 2026
Abstract
Automated assessment of coronary artery (CA) lesions via coronary computed tomography angiography (CCTA) is essential for the diagnosis of coronary artery disease (CAD). However, current deep learning approaches face several challenges, chiefly in modeling long-range anatomical dependencies, effectively decoupling plaque texture from stenosis geometry, and exploiting the mixed-grained annotations prevalent in clinical practice. To address these challenges, we propose a novel mixed-grained hierarchical geometric-semantic learning network (MG-HGLNet). Specifically, we introduce a topology-aware dual-stream encoding (TDE) module, which incorporates a bidirectional vessel Mamba (BiV-Mamba) encoder to capture global hemodynamic contexts and rectify spatial distortions inherent in curved planar reformation (CPR). Furthermore, a synergistic spectral–morphological decoupling (SSD) module is designed to disentangle task-specific features; it uses frequency-domain analysis to extract plaque spectral fingerprints while employing a texture-guided deformable attention mechanism to refine luminal boundaries. To mitigate the scarcity of fine-grained labels, we implement a mixed-grained supervision optimization (MSO) strategy that uses anatomy-aware dynamic prototypes and logical consistency constraints to leverage coarse branch-level labels effectively. Extensive experiments on an in-house dataset demonstrate that MG-HGLNet achieves a stenosis grading accuracy of 92.4% and a plaque classification accuracy of 91.5%. The results suggest that our framework not only outperforms state-of-the-art methods but also maintains robust performance under weakly supervised settings, offering a promising solution for label-efficient CAD diagnosis.

33 pages, 19776 KB  
Article
Multiparametric Vibration Diagnostics of Machine Tools Within a Digital Twin Framework Using Machine Learning
by Andrey Kurkin, Yuri Kabaldin, Maksim Zhelonkin, Sergey Mancerov, Maksim Anosov and Dmitriy Shatagin
Appl. Sci. 2026, 16(2), 982; https://doi.org/10.3390/app16020982 - 18 Jan 2026
Abstract
In the context of the digital transformation of industrial production, there is a growing need for intelligent maintenance and repair systems capable of ensuring the reliable operation of machine-tool equipment without operator involvement. This study reviews the current state and future development of diagnostic and condition-monitoring systems for metalworking machine tools. International standards and existing vibration-diagnostics solutions from domestic and international vendors are reviewed. Particular attention is paid to non-intrusive vibration diagnostics, digital twins, multiparametric analysis methods, and neural network approaches to failure prediction. The architecture of the developed system is presented; its concept complies fully with Russian and international vibration-diagnostics standards, and at its core the comprehensive digital twin relies on machine learning methods. The proposed architecture is a predictive-maintenance system built on interconnected digital-twin realizations: the dynamic machine passport of a unit, operational data, and a comprehensive digital twin of the machine-tool equipment. Neuromorphic computing on a hardware platform is considered a promising element for local condition classification and emergency protection. At the current stage of development, the operating principle and its integration into the control loop have been demonstrated, and the system is entering laboratory testing. It provides comprehensive assessment of the equipment’s technical condition from multiparametric data, short-term vibration trend forecasting using a Long Short-Term Memory network, and state classification using a Multilayer Perceptron model. The results of testing the system on a turning machining center are analyzed.
(This article belongs to the Special Issue Vibration-Based Diagnostics and Condition Monitoring)
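
The short-term vibration trend forecasting component can be sketched as a small LSTM that predicts the next value of a vibration indicator (e.g., an RMS level) from a sliding window; the layer sizes and single-feature setup below are illustrative assumptions, not the deployed system:

```python
import torch
import torch.nn as nn

class VibrationLSTM(nn.Module):
    """Predict the next vibration-level sample from the preceding window."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):             # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # one-step-ahead forecast

model = VibrationLSTM()
window = torch.randn(8, 128, 1)      # 8 windows of 128 vibration samples
print(model(window).shape)           # torch.Size([8, 1])
```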

28 pages, 19177 KB  
Article
Dual-Task Learning for Fine-Grained Bird Species and Behavior Recognition via Token Re-Segmentation, Multi-Scale Mixed Attention, and Feature Interleaving
by Cong Zhang, Zhichao Chen, Ye Lin, Xiuping Huang and Chih-Wei Lin
Appl. Sci. 2026, 16(2), 966; https://doi.org/10.3390/app16020966 - 17 Jan 2026
Abstract
In ecosystems, birds are important indicators that sensitively reflect changes in the ecological environment and its health. However, bird monitoring is challenging due to species diversity, variable behaviors, and distinct morphological characteristics. We therefore propose a parallel dual-branch hybrid CNN–Transformer architecture for feature extraction that simultaneously captures local and global image features, addressing the “local feature similarity” issue in the dual tasks of bird species and behavior recognition. The dual-task framework comprises three main components: the Token Re-segmentation Module (TRM), the Multi-scale Adaptive Module (MAM), and the Feature Interleaving Structure (FIS). The MAM fuses hybrid attention to handle birds at different scales: it models the interdependencies between the spatial and channel dimensions of features from different scales, enabling the model to adaptively choose scale-specific feature representations and accommodate inputs of different scales. In addition, we designed an efficient feature-sharing mechanism, FIS, between the parallel CNN branches. FIS delivers and fuses CNN feature maps across parallel layers in an interleaved fashion, combining them with the features of the corresponding Transformer layer to share local and global information at different depths and promote deep feature fusion across the parallel networks. Finally, the TRM addresses the challenge of visually similar but distinct bird species, and of similar poses with distinct behaviors, in two steps: it first locates discriminative regions and then performs fine segmentation on them. This enables the network to allocate relatively more attention to key areas while merging non-essential information and reducing interference from irrelevant details. Experiments on our self-built dataset demonstrate that, compared with state-of-the-art classification networks, the proposed network achieves the best performance: 79.70% accuracy in bird species recognition, 76.21% in behavior recognition, and the best dual-task recognition results.
(This article belongs to the Section Computing and Artificial Intelligence)
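
Stripped of the feature-extraction machinery, the dual-task objective reduces to two classification heads, one for species and one for behavior, sharing one backbone and trained with a joint loss. A minimal sketch under that assumption (dimensions and class counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTaskHead(nn.Module):
    """Shared features feed separate species and behavior classifiers."""
    def __init__(self, feat_dim=512, n_species=200, n_behaviors=10):
        super().__init__()
        self.species_head = nn.Linear(feat_dim, n_species)
        self.behavior_head = nn.Linear(feat_dim, n_behaviors)

    def forward(self, feats):
        return self.species_head(feats), self.behavior_head(feats)

head = DualTaskHead()
feats = torch.randn(4, 512)  # stand-in for backbone output
sp_logits, bh_logits = head(feats)
loss = (F.cross_entropy(sp_logits, torch.randint(0, 200, (4,)))
        + F.cross_entropy(bh_logits, torch.randint(0, 10, (4,))))
print(loss.item())
```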

20 pages, 5073 KB  
Article
SAWGAN-BDCMA: A Self-Attention Wasserstein GAN and Bidirectional Cross-Modal Attention Framework for Multimodal Emotion Recognition
by Ning Zhang, Shiwei Su, Haozhe Zhang, Hantong Yang, Runfang Hao and Kun Yang
Sensors 2026, 26(2), 582; https://doi.org/10.3390/s26020582 - 15 Jan 2026
Abstract
Emotion recognition from physiological signals is pivotal for advancing human–computer interaction, yet unimodal pipelines frequently underperform due to limited information, constrained data diversity, and suboptimal cross-modal fusion. To address these limitations, we propose the Self-Attention Wasserstein Generative Adversarial Network with Bidirectional Cross-Modal Attention (SAWGAN-BDCMA) framework. It organizes the learning process around three complementary components: (1) a Self-Attention Wasserstein GAN (SAWGAN) that synthesizes high-quality electroencephalography (EEG) and photoplethysmography (PPG) signals to expand diversity and alleviate distributional imbalance; (2) a dual-branch architecture that distills discriminative spatiotemporal representations within each modality; and (3) a Bidirectional Cross-Modal Attention (BDCMA) mechanism that enables deep two-way interaction and adaptive weighting for robust fusion. Evaluated on the DEAP and ECSMP datasets, SAWGAN-BDCMA significantly outperforms multiple contemporary methods, achieving 94.25% accuracy for binary and 87.93% for quaternary classification on DEAP, and 97.49% accuracy for six-class emotion recognition on ECSMP. Compared with state-of-the-art multimodal approaches, the proposed framework improves accuracy by 0.57% to 14.01% across tasks. These findings offer a robust solution to the long-standing challenges of data scarcity and modal imbalance, providing a solid theoretical and technical foundation for fine-grained emotion recognition and intelligent human–computer collaboration.
(This article belongs to the Special Issue Advanced Signal Processing for Affective Computing)
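
The BDCMA fusion can be pictured as two cross-attention passes, EEG attending to PPG and PPG attending to EEG, with the attended streams pooled and concatenated afterwards. A hedged PyTorch sketch of that two-way pattern, not the released model:

```python
import torch
import torch.nn as nn

class BidirectionalCrossModalAttention(nn.Module):
    """Two-way cross-attention between EEG and PPG token sequences."""
    def __init__(self, dim=128, n_heads=4):
        super().__init__()
        self.eeg_to_ppg = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ppg_to_eeg = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, eeg, ppg):
        # Each modality queries the other, then the results are fused.
        eeg_att, _ = self.eeg_to_ppg(eeg, ppg, ppg)
        ppg_att, _ = self.ppg_to_eeg(ppg, eeg, eeg)
        return torch.cat([eeg_att.mean(1), ppg_att.mean(1)], dim=-1)

fusion = BidirectionalCrossModalAttention()
fused = fusion(torch.randn(2, 32, 128), torch.randn(2, 32, 128))
print(fused.shape)  # torch.Size([2, 256])
```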

15 pages, 1262 KB  
Article
Structured Scene Parsing with a Hierarchical CLIP Model for Images
by Yunhao Sun, Xiaoao Chen, Heng Chen, Yiduo Liang and Ruihua Qi
Appl. Sci. 2026, 16(2), 788; https://doi.org/10.3390/app16020788 - 12 Jan 2026
Abstract
Visual Relationship Prediction (VRP) is crucial for advancing structured scene understanding, yet existing methods struggle with ineffective multimodal fusion, static relationship representations, and a lack of logical consistency. To address these limitations, this paper proposes a Hierarchical CLIP model (H-CLIP) for structured scene parsing. Our approach leverages a pre-trained CLIP backbone to extract aligned visual, textual, and spatial features for entities and their union regions. A multi-head self-attention mechanism then performs deep, dynamic multimodal fusion. The core innovation is a consistency and reversibility verification mechanism, which imposes algebraic constraints as a regularization loss to enforce logical coherence in the learned relation space. Extensive experiments on the Visual Genome dataset demonstrate the superiority of the proposed method. H-CLIP significantly outperforms state-of-the-art baselines on the predicate classification task, achieving a Recall@50 score of 64.31% and a Mean Recall@50 of 36.02%, thereby validating its effectiveness in generating accurate and logically consistent scene graphs even under long-tailed distributions.
(This article belongs to the Section Computing and Artificial Intelligence)
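
One plausible reading of the consistency and reversibility constraint is an algebraic penalty in a translation-style relation space, where a predicate and its inverse should compose to approximately zero. The sketch below is our assumption about such a mechanism, not the paper’s actual loss:

```python
import torch

def reversibility_loss(rel_fwd, rel_inv):
    """Penalize forward/inverse relation embeddings that fail to cancel.

    In a translation-style relation space (subject + relation ~ object),
    an inverse predicate should act as the negation of its forward form.
    """
    return (rel_fwd + rel_inv).norm(dim=-1).mean()

rel_on = torch.randn(16, 64, requires_grad=True)     # e.g., "on"
rel_under = torch.randn(16, 64, requires_grad=True)  # e.g., its inverse
loss = reversibility_loss(rel_on, rel_under)
loss.backward()  # usable as a regularization term alongside the task loss
print(loss.item())
```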

44 pages, 9272 KB  
Systematic Review
Toward a Unified Smart Point Cloud Framework: A Systematic Review of Definitions, Methods, and a Modular Knowledge-Integrated Pipeline
by Mohamed H. Salaheldin, Ahmed Shaker and Songnian Li
Buildings 2026, 16(2), 293; https://doi.org/10.3390/buildings16020293 - 10 Jan 2026
Abstract
Reality capture has made point clouds a primary spatial data source, yet limitations in processing and integration hinder their potential. Prior reviews focus on isolated phases; by contrast, Smart Point Clouds (SPCs)—point clouds augmented with semantics, relations, and query interfaces to enable reasoning—have received limited attention. This systematic review synthesizes state-of-the-art SPC terminology and methods to propose a modular pipeline. Following PRISMA, we searched Scopus, Web of Science, and Google Scholar up to June 2025 and included English-language studies in geomatics and engineering presenting novel SPC methods. Fifty-eight publications met the eligibility criteria: Direct (n = 22), Indirect (n = 22), and New Use (n = 14). We formalize an operative SPC definition—queryable, ontology-linked, provenance-aware—and map contributions across traditional point cloud processing stages, from acquisition to modeling. The evidence shows practical value in cultural heritage, urban planning, and AEC/FM via semantic queries, rule checks, and auditable updates. Comparative qualitative analysis reveals cross-study trends: higher and more uniform density stabilizes features but increases computation, and hybrid neuro-symbolic classification improves long-tail consistency; however, methodological heterogeneity precluded quantitative synthesis. We distill a configurable eight-module pipeline and identify open challenges in data at scale, domain transfer, temporal (4D) updates, surface exports, query usability, and sensor fusion. Finally, we recommend lightweight reporting standards to improve discoverability and reuse.
(This article belongs to the Section Construction Management, and Computers & Digitization)

31 pages, 10745 KB  
Article
CNN-GCN Coordinated Multimodal Frequency Network for Hyperspectral Image and LiDAR Classification
by Haibin Wu, Haoran Lv, Aili Wang, Siqi Yan, Gabor Molnar, Liang Yu and Minhui Wang
Remote Sens. 2026, 18(2), 216; https://doi.org/10.3390/rs18020216 - 9 Jan 2026
Abstract
Existing multimodal image classification methods often suffer from several key limitations: difficulty in balancing local detail against global topological relationships in hyperspectral image (HSI) feature extraction; insufficient multi-scale characterization of terrain features from light detection and ranging (LiDAR) elevation data; and neglect of deep inter-modal interactions in traditional fusion methods, often accompanied by high computational complexity. To address these issues, this paper proposes a comprehensive deep learning framework combining a convolutional neural network (CNN), a graph convolutional network (GCN), and the wavelet transform for the joint classification of HSI and LiDAR data. The framework includes several novel components: a Spectral Graph Mixer Block (SGMB), in which a CNN branch captures fine-grained spectral–spatial features via multi-scale convolutions while a parallel GCN branch models long-range contextual features through an enhanced gated graph network, a dual-path design that simultaneously extracts local detail and global topological features from HSI data; a Spatial Coordinate Block (SCB) to enhance spatial awareness and improve the perception of object contours and distribution patterns; a Multi-Scale Elevation Feature Extraction Block (MSFE) for capturing terrain representations across varying scales; and a Bidirectional Frequency Attention Encoder (BiFAE) to enable efficient, deep interaction between multimodal features. These modules work in concert as a cohesive end-to-end framework, which not only balances local details and global contexts more effectively but also enables deep yet computationally efficient feature interaction, significantly strengthening the discriminability and robustness of the learned representation. We evaluated the proposed method on three multimodal remote sensing datasets: Houston2013, Augsburg, and Trento. Quantitative results demonstrate that our framework outperforms state-of-the-art methods, achieving OA values of 98.93%, 88.05%, and 99.59% on the respective datasets.
(This article belongs to the Section AI Remote Sensing)
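
The GCN branch’s long-range context modeling rests on graph convolution: node features are propagated through a symmetrically normalized adjacency matrix. A minimal sketch of one generic layer (graph construction from pixels or superpixels is omitted, and the paper’s enhanced gated variant is not reproduced here):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: X' = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))   # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = deg.pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
        return torch.relu(self.lin(a_norm @ x))

adj = (torch.rand(50, 50) > 0.9).float()       # random sparse graph
adj = ((adj + adj.T) > 0).float()              # symmetrize
x = torch.randn(50, 16)                        # node features
print(GCNLayer(16, 32)(x, adj).shape)          # torch.Size([50, 32])
```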

20 pages, 2616 KB  
Article
MS-TSEFNet: Multi-Scale Spatiotemporal Efficient Feature Fusion Network
by Weijie Wu, Lifei Liu, Weijie Chen, Yixin Chen, Xingyu Wang, Andrzej Cichocki, Yunhe Lu and Jing Jin
Sensors 2026, 26(2), 437; https://doi.org/10.3390/s26020437 - 9 Jan 2026
Abstract
Motor imagery signal decoding is an important research direction in the field of brain–computer interfaces, which aim to infer an individual’s motor imagery state by analyzing electroencephalogram (EEG) signals. Deep learning, which can automatically extract features, has been increasingly applied to EEG classification. However, when processing complex EEG signals, existing decoding models cannot effectively fuse features at different levels, which limits classification performance. This study proposes a multi-scale spatiotemporal efficient feature fusion network (MS-TSEFNet), which learns the dynamic changes in EEG signals at different time scales through multi-scale convolution modules and combines a spatial attention mechanism to efficiently capture the spatial correlations between electrodes. In addition, the network adopts an efficient feature fusion strategy to deeply fuse features at different levels, improving the model’s expressive capacity. In motor imagery decoding, MS-TSEFNet shows higher accuracy and robustness: evaluated on the public BCIC-IV2a, BCIC-IV2b, and ECUST datasets, it reaches average classification accuracies of 80.31%, 86.69%, and 71.14%, respectively, outperforming current state-of-the-art algorithms. An ablation study further verified that each module contributes to the final performance; in particular, the combination of the multi-scale convolution module and the feature fusion module significantly improved the model’s ability to extract the spatiotemporal features of EEG signals.
(This article belongs to the Special Issue EEG Signal Processing Techniques and Applications—3rd Edition)
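
The multi-scale convolution idea, learning EEG dynamics at several temporal scales in parallel, can be sketched as parallel Conv1d branches with different kernel sizes whose outputs are concatenated; the channel counts and kernel sizes below are illustrative, not the paper’s configuration:

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Parallel temporal convolutions at several kernel sizes."""
    def __init__(self, n_channels=22, n_filters=16, kernels=(15, 31, 63)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(n_channels, n_filters, k, padding=k // 2)
            for k in kernels)

    def forward(self, x):             # x: (batch, electrodes, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

msc = MultiScaleTemporalConv()
eeg = torch.randn(4, 22, 1000)        # 4 trials, 22 electrodes, 1000 samples
print(msc(eeg).shape)                 # torch.Size([4, 48, 1000])
```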

24 pages, 4797 KB  
Article
PRTNet: Combustion State Recognition Model of Municipal Solid Waste Incineration Process Based on Enhanced Res-Transformer and Multi-Scale Feature Guided Aggregation
by Jian Zhang, Junyu Ge and Jian Tang
Sustainability 2026, 18(2), 676; https://doi.org/10.3390/su18020676 - 9 Jan 2026
Abstract
Accurate identification of the combustion state in municipal solid waste incineration (MSWI) processes is crucial for efficient, low-emission, and safe operation. However, existing methods often struggle to recognize the state stably and reliably because their feature extraction is inadequate for challenges such as complex flame morphology, blurred boundaries, and significant noise in flame images. To address this, we propose PRTNet, a novel hybrid-architecture model that enhances the accuracy and robustness of combustion state recognition through multi-scale feature enhancement and adaptive fusion mechanisms. First, a local-semantic enhanced residual network is constructed to establish spatial correlations between fine-grained textures and macroscopic combustion patterns. Subsequently, a feature-adaptive fusion Transformer is designed, which models long-range dependencies and high-frequency details in parallel via deformable attention and local convolutions, and adaptively fuses global and local features through a gating mechanism. Finally, a cross-scale feature-guided aggregation module fuses shallow detail information with deep semantic features under dual-attention guidance. Experiments on a flame image dataset from an MSWI plant in Beijing show that PRTNet achieves 96.29% accuracy in the combustion state classification task, with precision, recall, and F1-score all exceeding 96%, significantly outperforming numerous mainstream baseline models. Ablation studies further validate the effectiveness and synergy of each module. The proposed method provides a reliable solution for intelligent flame state recognition in complex industrial scenarios, contributing to intelligent and sustainable development of municipal solid waste incineration processes.
(This article belongs to the Special Issue Life Cycle and Sustainability Nexus in Solid Waste Management)
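
The gating mechanism that adaptively fuses global (Transformer) and local (convolutional) features can be sketched as a learned sigmoid gate over the concatenated features; this is an interpretation of the abstract, not the authors’ code:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Blend global and local features with a learned per-dimension gate."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, global_feat, local_feat):
        g = self.gate(torch.cat([global_feat, local_feat], dim=-1))
        return g * global_feat + (1 - g) * local_feat

fuse = GatedFusion()
out = fuse(torch.randn(4, 256), torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 256])
```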
