Search Results (300)

Search Parameters:
Keywords = latent deep features

28 pages, 3453 KB  
Article
Denoising Adaptive Multi-Branch Architecture for Detecting Cyber Attacks in Industrial Internet of Services
by Ghazia Qaiser and Siva Chandrasekaran
J. Cybersecur. Priv. 2026, 6(1), 26; https://doi.org/10.3390/jcp6010026 - 5 Feb 2026
Abstract
The emerging scope of the Industrial Internet of Services (IIoS) requires a robust intrusion detection system to detect malicious attacks. The increasing frequency of sophisticated, high-impact cyber attacks has caused financial losses and catastrophes in IIoS-based manufacturing industries, yet existing solutions often struggle to adapt and generalize to new attacks. This study proposes a unique approach for detecting known and zero-day network attacks in IIoS environments, called the Denoising Adaptive Multi-Branch Architecture (DA-MBA): a smart, conformal, and self-adjusting cyber attack detection framework featuring denoising representation learning, hybrid neural inference, and open-set uncertainty calibration. The model uses a denoising autoencoder (DAE) to generate noise-tolerant latent representations, which are processed by a hybrid multi-branch classifier combining dense and bidirectional recurrent layers to capture both static and temporal attack signatures. It addresses adaptability and generalizability by hybridizing a Multilayer Perceptron (MLP) with a bidirectional LSTM (BiLSTM): the hybrid fuses feed-forward transformations with sequence-aware modeling, capturing both direct feature interactions and underlying temporal, order-dependent patterns. Several techniques strengthen the dual-branch architecture, including class weighting and comprehensive hyperparameter optimization via Optuna, which collectively address imbalanced data, overfitting, and dynamically shifting threat vectors. Evaluated on two widely recognized IIoT datasets, Edge-IIoTset and WUSTL-IIoT-2021, DA-MBA achieves over 99% accuracy and a loss near 0.02, underscoring its effectiveness against the most sophisticated attacks and outperforming recent deep learning IDS baselines. By coupling feature denoising, multi-branch classification, and automated hyperparameter tuning, the architecture offers a scalable, interpretable, and risk-sensitive defense mechanism for IIoS, advancing secure, adaptive, and trustworthy industrial cyber-resilience. Full article
(This article belongs to the Special Issue Cyber Security and Digital Forensics—2nd Edition)
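The denoise-then-classify pipeline described above can be sketched in a few lines. This is a minimal NumPy illustration with random weights and hypothetical layer sizes, not the authors' DA-MBA implementation; a mean over timesteps stands in for the BiLSTM branch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Denoising autoencoder (single hidden layer, random weights for illustration).
n_features, n_latent = 40, 8          # hypothetical sizes, not from the paper
W_enc = rng.normal(size=(n_features, n_latent)) * 0.1
W_dec = rng.normal(size=(n_latent, n_features)) * 0.1

def dae_latent(x_clean, noise_std=0.1):
    """Corrupt the input and encode it; training would minimise ||decode(z) - x_clean||."""
    x_noisy = x_clean + rng.normal(scale=noise_std, size=x_clean.shape)
    z = relu(x_noisy @ W_enc)         # noise-tolerant latent representation
    x_rec = z @ W_dec                 # reconstruction target is the *clean* input
    return z, x_rec

def dual_branch_logits(z_window):
    """Dual-branch head over a latent sequence z_window: (timesteps, n_latent)."""
    mlp_out = relu(z_window[-1] @ rng.normal(size=(n_latent, 16)) * 0.1)           # static (MLP) branch
    seq_out = relu(z_window.mean(axis=0) @ rng.normal(size=(n_latent, 16)) * 0.1)  # stand-in for BiLSTM
    fused = np.concatenate([mlp_out, seq_out])
    return fused @ rng.normal(size=(32, 2)) * 0.1                                  # 2-class logits

x = rng.normal(size=(5, n_features))  # 5 timesteps of one flow
z, x_rec = dae_latent(x)
logits = dual_branch_logits(z)
```

In the real model the two branches are trained jointly with the reconstruction objective; here the forward pass only demonstrates how the latent sequence feeds both a dense and a sequence-aware path before fusion.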

9 pages, 1744 KB  
Proceeding Paper
Intelligent Password Guessing Using Feature-Guided Diffusion
by Yi-Ching Huang and Jhe-Wei Lin
Eng. Proc. 2025, 120(1), 51; https://doi.org/10.3390/engproc2025120051 - 5 Feb 2026
Abstract
In modern cybersecurity and deep learning, conditional password guessing plays a critical role in improving password-cracking efficiency by leveraging known patterns and constraints. In contrast with traditional brute-force or dictionary-based attacks, we developed an approach that adopts a latent diffusion model to simulate human password selection behavior, generating more realistic password candidates. We incorporated masked character inputs as conditions and applied advanced feature extraction to capture common patterns such as character substitutions and typing habits. Furthermore, we employed visualization techniques, including autoencoders and principal component analysis, to analyze password distributions, enhancing model interpretability and aiding both offensive and defensive security strategies. Full article
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

16 pages, 1033 KB  
Article
Harnessing Symmetry in Recurrence Plots: A Multi-Scale Detail Boosting Approach for Time Series Similarity Measurement
by Jiancheng Yin, Xuye Zhuang, Wentao Sui and Yunlong Sheng
Symmetry 2026, 18(2), 290; https://doi.org/10.3390/sym18020290 - 4 Feb 2026
Abstract
Time series similarity measurement is a fundamental task underpinning clustering, classification, and anomaly detection. Traditional approaches predominantly rely on one-dimensional data representations, which often fail to capture complex structural dependencies. To address this limitation, this paper proposes a novel similarity measurement framework based on two-dimensional image enhancement. The method initially transforms one-dimensional time series into recurrence plots (RPs), converting temporal dynamics into visually symmetric textures, enhancing the temporal information of the one-dimensional time series. To overcome the potential blurring of fine-grained information during transformation, multi-scale detail boosting (MSDB) is introduced to amplify the high-frequency components and textural details of the RP images. Subsequently, a pre-trained ResNet-18 network is utilized to extract deep visual features from the enhanced images, and the similarity is quantified using the Euclidean distance of these feature vectors. Extensive experiments on the UCR Time Series Classification Archive demonstrate that the proposed method effectively leverages image enhancement to reveal latent temporal patterns. This approach leverages the inherent symmetry properties embedded in recurrence plots. By enhancing the texture of these symmetrical structures, the proposed method provides a more robust and informative basis for similarity assessment. Full article
(This article belongs to the Section Mathematics)
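The recurrence-plot construction at the heart of this method is simple to state: threshold the pairwise distance matrix of the series. A minimal NumPy sketch for a scalar series (the epsilon threshold is a free parameter, and the resulting matrix is symmetric, which is the property the paper exploits):

```python
import numpy as np

def recurrence_plot(series, eps):
    """Binary RP: R[i, j] = 1 when |x_i - x_j| <= eps (symmetric by construction)."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # pairwise distances of the 1-D series
    return (dist <= eps).astype(np.uint8)

t = np.linspace(0, 4 * np.pi, 100)
rp = recurrence_plot(np.sin(t), eps=0.1)     # periodic signal -> diagonal texture
```

The paper then sharpens these textures with multi-scale detail boosting and feeds the images to a pre-trained ResNet-18; this sketch covers only the series-to-image step.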
24 pages, 1390 KB  
Article
NornirNet: A Deep Learning Framework to Distinguish Benign from Malignant Type II Endoleaks After Endovascular Aortic Aneurysm Repair Using Preoperative Imaging
by Francesco Andreoli, Fabio Mattiussi, Elias Wasseh, Andrea Leoncini, Ludovica Ettorre, Jacopo Galafassi, Maria Antonella Ruffino, Luca Giovannacci, Alessandro Robaldo and Giorgio Prouse
AI 2026, 7(2), 57; https://doi.org/10.3390/ai7020057 - 4 Feb 2026
Abstract
Background/Objectives: Type II endoleak (T2EL) remains the most frequent complication after endovascular aortic aneurysm repair (EVAR), with uncertain clinical relevance and management. While most resolve spontaneously, persistent T2ELs can lead to sac enlargement and rupture risk. This study proposes a deep learning framework for preoperative prediction of T2EL occurrence and severity using volumetric computed tomography angiography (CTA) data. Methods: A retrospective analysis of 277 patients undergoing standard EVAR (2010–2023) was performed. Preoperative CTA scans were processed for volumetric normalization and fed into a 3D convolutional neural network (CNN) trained to classify patients into three categories: no T2EL, benign T2EL, or malignant T2EL. The model was trained on 175 cases, validated on 72, and tested on an independent cohort of 30 patients. Performance metrics included accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Results: The CNN achieved an overall accuracy of 76.7% (95% CI: 0.63–0.90), a macro-averaged F1-score of 0.77, and an AUC of 0.93. Class-specific AUCs were 0.93 for no T2EL, 0.91 for benign, and 0.96 for malignant cases, confirming high discriminative capacity across outcomes. Most misclassifications occurred between adjacent categories. Conclusions: This study introduces the first end-to-end 3D CNN capable of predicting both the presence and severity of T2EL directly from preoperative CTA, without manual segmentation or handcrafted features. These findings suggest that preoperative imaging encodes latent structural information predictive of endoleak-driven sac reperfusion, potentially enabling personalized pre-emptive embolization strategies and tailored surveillance after EVAR. Full article
(This article belongs to the Special Issue The Future of Image Processing: Leveraging Pattern Recognition and AI)
28 pages, 922 KB  
Article
MAESTRO: A Multi-Scale Ensemble Framework with GAN-Based Data Refinement for Robust Malicious Tor Traffic Detection
by Jinbu Geng, Yu Xie, Jun Li, Xuewen Yu and Lei He
Mathematics 2026, 14(3), 551; https://doi.org/10.3390/math14030551 - 3 Feb 2026
Abstract
Malicious Tor traffic data contains deep domain-specific knowledge, which makes labeling challenging, and the lack of labeled data degrades the accuracy of learning-based detectors. Real-world deployments also exhibit severe class imbalance, where malicious traffic constitutes a small minority of network flows, which further reduces detection performance. In addition, Tor’s fixed 512-byte cell architecture removes packet-size diversity that many encrypted-traffic methods rely on, making feature extraction difficult. This paper proposes an efficient three-stage framework, MAESTRO v1.0, for malicious Tor traffic detection. In Stage 1, MAESTRO extracts multi-scale behavioral signatures by fusing temporal, positional, and directional embeddings at cell, direction, and flow granularities to mitigate feature homogeneity; it then compresses these representations with an autoencoder into compact latent features. In Stage 2, MAESTRO introduces an ensemble-based quality quantification method that combines five complementary anomaly detection models to produce robust discriminability scores for adaptive sample weighting, helping the classifier to emphasize high-quality samples. MAESTRO also trains three specialized GANs per minority class and applies strict five-model ensemble validation to synthesize diverse high-fidelity samples, addressing extreme class imbalance. We evaluate MAESTRO under systematic imbalance settings, ranging from the natural distribution to an extreme 1% malicious ratio. On the CCS’22 Tor malware dataset, MAESTRO achieves 92.38% accuracy, 64.79% recall, and 73.70% F1-score under the natural distribution, improving F1-score by up to 15.53% compared with state-of-the-art baselines. Under the 1% malicious setting, MAESTRO maintains 21.1% recall, which is 14.1 percentage points higher than the best baseline, while conventional methods drop below 10%. Full article
(This article belongs to the Special Issue New Advances in Network Security and Data Privacy)
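The Stage 2 idea of turning several anomaly detectors' outputs into per-sample training weights can be illustrated with a toy example: normalise each detector's scores, average them into a consensus, and down-weight anomalous (low-quality) samples. This is a hedged sketch of the general mechanism, not the authors' exact quantification method:

```python
import numpy as np

def quality_weights(score_matrix):
    """score_matrix: (n_detectors, n_samples) anomaly scores from complementary models.
    Min-max normalise each detector, average into a consensus score, and map
    low-anomaly (high-quality) samples to high training weights."""
    s = np.asarray(score_matrix, dtype=float)
    lo = s.min(axis=1, keepdims=True)
    hi = s.max(axis=1, keepdims=True)
    s_norm = (s - lo) / np.where(hi > lo, hi - lo, 1.0)
    consensus = s_norm.mean(axis=0)   # robust discriminability score per sample
    return 1.0 - consensus            # emphasise high-quality samples

rng = np.random.default_rng(1)
scores = rng.random((5, 200))         # five detectors, 200 flows (synthetic)
w = quality_weights(scores)           # weights in [0, 1], one per flow
```

A classifier would consume `w` as per-sample weights during training so that noisy or ambiguous flows contribute less to the loss.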

15 pages, 978 KB  
Article
SpectTrans: Joint Spectral–Temporal Modeling for Polyphonic Piano Transcription via Spectral Gating Networks
by Rui Cao, Yan Liang, Lei Feng and Yuanzi Li
Electronics 2026, 15(3), 665; https://doi.org/10.3390/electronics15030665 - 3 Feb 2026
Viewed by 97
Abstract
Automatic Music Transcription (AMT) plays a fundamental role in Music Information Retrieval (MIR) by converting raw audio signals into symbolic representations such as MIDI or musical scores. Despite advances in deep learning, accurately transcribing piano performances remains challenging due to dense polyphony, wide dynamic range, sustain pedal effects, and harmonic interactions between simultaneous notes. Existing approaches using convolutional and recurrent architectures, or autoregressive models, often fail to capture long-range temporal dependencies and global harmonic structures, while conventional Vision Transformers overlook the anisotropic characteristics of audio spectrograms, leading to harmonic neglect. In this work, we propose SpectTrans, a novel piano transcription framework that integrates a Spectral Gating Network with a multi-head self-attention Transformer to jointly model spectral and temporal dependencies. Latent CNN features are projected into the frequency domain via a Real Fast Fourier Transform, enabling adaptive filtering of overlapping harmonics and suppression of non-stationary noise, while deeper layers capture long-term melodic and chordal relationships. Experimental evaluation on polyphonic piano datasets demonstrates that this architecture produces acoustically coherent representations, improving the robustness and precision of transcription under complex performance conditions. These results suggest that combining frequency-domain refinement with global temporal modeling provides an effective strategy for high-fidelity AMT. Full article
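The frequency-domain filtering step can be sketched with NumPy's real FFT: project latent features along the time axis, multiply by a per-bin gate (learned in the real model, identity here), and invert. Shapes and the single-gate form are illustrative assumptions, not SpectTrans's configuration:

```python
import numpy as np

def spectral_gate(latent, gate):
    """Apply a per-frequency-bin gate to latent features.
    latent: (frames, channels); gate: (n_bins,) with n_bins = frames // 2 + 1."""
    spec = np.fft.rfft(latent, axis=0)         # frequency-domain view along time
    filtered = spec * gate[:, None]            # suppress/boost overlapping harmonic bins
    return np.fft.irfft(filtered, n=latent.shape[0], axis=0)

frames, channels = 64, 16
rng = np.random.default_rng(2)
feats = rng.normal(size=(frames, channels))
gate = np.ones(frames // 2 + 1)                # identity gate; training would learn this
out = spectral_gate(feats, gate)               # identity gate round-trips the input
```

With a learned gate, bins dominated by non-stationary noise or colliding harmonics get attenuated before the Transformer layers model longer-range structure.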

24 pages, 3150 KB  
Article
An Intrusion Detection Model Based on Equalization Loss and Spatio-Temporal Feature Extraction
by Miaolei Deng, Shaojun Fan, Yupei Kan and Chuanchuan Sun
Electronics 2026, 15(3), 646; https://doi.org/10.3390/electronics15030646 - 2 Feb 2026
Viewed by 152
Abstract
In recent years, the expansion of network scale and the diversification of attack methods pose dual challenges to intrusion detection systems in extracting effective features and addressing class imbalance. To address these issues, the Spatial–Temporal Equilibrium Graph Convolutional Network (STEGCN) is proposed. This model integrates Graph Convolutional Network (GCN) and Gated Recurrent Unit (GRU), leveraging GCN to extract high-order spatial features from network traffic data while capturing complex topological relationships and latent patterns. Meanwhile, GRU efficiently models the dynamic evolution of network traffic over time, accurately depicting temporal trends and anomaly patterns. The synergy of these two components provides a comprehensive representation of network behavior. To mitigate class imbalance in intrusion detection, the Equalization Loss v2 (EQLv2) is introduced. By dynamically adjusting gradient contributions, this function reduces the dominance of majority classes, thereby enhancing the model’s sensitivity to minority-class attacks. Experimental results demonstrate that STEGCN achieves superior detection performance on the UNSW-NB15 and CICIDS2017 datasets. Compared with traditional deep learning models, STEGCN shows significant improvements in accuracy and recall, particularly in detecting minority-class intrusions. Full article
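The GCN component propagates node features through a symmetrically normalised adjacency matrix. A standard single-layer sketch in NumPy with random weights (STEGCN's full architecture stacks this with a GRU and the EQLv2 loss, which this sketch omits):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: X' = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(3)
A = (rng.random((6, 6)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric adjacency, no self-loops
X = rng.normal(size=(6, 4))                    # features for 6 traffic nodes
W = rng.normal(size=(4, 8))
H = gcn_layer(A, X, W)                         # high-order spatial features
```

Each node's output row mixes its own features with its neighbours', which is how topological relationships in the traffic graph enter the representation.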

31 pages, 4489 KB  
Article
A Hybrid Intrusion Detection Framework Using Deep Autoencoder and Machine Learning Models
by Salam Allawi Hussein and Sándor R. Répás
AI 2026, 7(2), 39; https://doi.org/10.3390/ai7020039 - 25 Jan 2026
Viewed by 350
Abstract
This study provides a detailed comparative analysis of three hybrid intrusion detection methods aimed at strengthening network security through precise and adaptive threat identification. The proposed framework integrates an Autoencoder-Gaussian Mixture Model (AE-GMM) with two supervised learning techniques, XGBoost and Logistic Regression, combining deep feature extraction with interpretability and stable generalization. Although the downstream classifiers are trained in a supervised manner, the hybrid nature of the framework is preserved through unsupervised representation learning and probabilistic modeling in the AE-GMM stage. Two benchmark datasets were used for evaluation: NSL-KDD, representing traditional network behavior, and UNSW-NB15, reflecting modern and diverse traffic patterns. A consistent preprocessing pipeline, including normalization, feature selection, and dimensionality reduction, was applied to ensure fair comparison and efficient training. The experimental findings show that hybridizing deep learning with gradient-boosted and linear classifiers markedly enhances detection performance and resilience. The AE-GMM-XGBoost model achieved superior outcomes, reaching an F1-score above 0.94 ± 0.0021 and an AUC greater than 0.97 on both datasets, demonstrating high accuracy in distinguishing legitimate from malicious traffic. AE-GMM-Logistic Regression also achieved strong and balanced performance, recording an F1-score exceeding 0.91 ± 0.0020 with stable generalization across test conditions. Conversely, the standalone AE-GMM effectively captured deep latent patterns but exhibited lower recall, indicating limited sensitivity to subtle or emerging attacks. These results collectively confirm that integrating autoencoder-based representation learning with advanced supervised models significantly improves intrusion detection in complex network settings. The proposed framework therefore provides a solid and extensible basis for future research in explainable and federated intrusion detection, supporting the development of adaptive and proactive cybersecurity defenses. Full article

23 pages, 3926 KB  
Article
Spatiotemporal Correlation Hybrid Deep Learning Model for Dissolved Oxygen Prediction in Water
by Yajie Gu, Yin Zhao, Hao Wang and Fengliang Huang
Sustainability 2026, 18(2), 863; https://doi.org/10.3390/su18020863 - 14 Jan 2026
Viewed by 194
Abstract
Surface water is essential for sustaining ecosystems and supporting human socio-economic development, yet pollution from urbanization increasingly threatens its ecological sustainability. The accurate prediction of dissolved oxygen (DO), as an important indicator of water quality, is crucial for water resource protection. To address the methodological gaps in current research, we propose a hybrid deep learning model (GCG) that integrates spatiotemporal correlations to enhance DO prediction accuracy through the systematic exploitation of latent data dependencies. This study proposes a three-stage modeling framework: (1) A novel adjacency matrix construction methodology based on Pearson correlation coefficients is developed to quantify spatial correlations between monitoring stations, enabling spatial feature aggregation via graph convolutional networks (GCNs); (2) the spatially enhanced features are subsequently processed through 1D convolutional neural networks (CNNs) to capture temporal local patterns; (3) model performance is comprehensively evaluated using four metrics: R2, RMSE, MAE, and MAPE. The proposed model was implemented for DO prediction in Lake Taihu, China. Experimental results demonstrate that compared to conventional adjacency matrix construction methods, the Pearson correlation-based adjacency matrix confers advantages, achieving at least a 5% reduction in RMSE and over 10% improvement in MAE and MAPE. Furthermore, the GCG model outperformed the comparison model, with an R2 enhancement of 8%, while reducing RMSE and MAE by over 70% and 60%, respectively. These results validate the model’s effectiveness in mining spatiotemporal correlations for regional water quality forecasting, offering a reliable tool toward sustainable water monitoring and ecosystem-based management. Full article
(This article belongs to the Section Sustainable Water Management)
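Stage (1), the correlation-based adjacency matrix, is concrete enough to sketch directly: compute pairwise Pearson coefficients across monitoring stations and keep magnitudes above a threshold. The threshold value and station count here are placeholders, not the paper's:

```python
import numpy as np

def pearson_adjacency(readings, threshold=0.5):
    """readings: (n_stations, n_timesteps) DO series per monitoring station.
    Edge weight = |Pearson r| when it exceeds the threshold, else 0."""
    corr = np.corrcoef(readings)
    adj = np.where(np.abs(corr) >= threshold, np.abs(corr), 0.0)
    np.fill_diagonal(adj, 0.0)           # no self-edges; the GCN adds self-loops itself
    return adj

rng = np.random.default_rng(4)
base = rng.normal(size=200)              # shared hydrological signal (synthetic)
stations = np.stack([base + 0.1 * rng.normal(size=200) for _ in range(5)])
A = pearson_adjacency(stations, threshold=0.5)
```

The resulting weighted adjacency then drives spatial feature aggregation in the GCN, before the 1D CNN captures temporal local patterns.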

20 pages, 36648 KB  
Article
Global Lunar FeO Mapping via Wavelet–Autoencoder Feature Learning from M3 Hyperspectral Data
by Julia Fernández–Díaz, Fernando Sánchez Lasheras, Javier Gracia Rodríguez, Santiago Iglesias Álvarez, Antonio Luis Marqués Sierra and Francisco Javier de Cos Juez
Mathematics 2026, 14(2), 254; https://doi.org/10.3390/math14020254 - 9 Jan 2026
Viewed by 253
Abstract
Accurate global mapping of lunar iron oxide (FeO) abundance is essential for understanding the Moon’s geological evolution and for supporting future in situ resource utilization (ISRU). While hyperspectral data from the Moon Mineralogy Mapper (M3) provide a unique combination of high spectral dimensionality, hectometre-scale spatial resolution, and near-global coverage, existing FeO retrieval approaches struggle to fully exploit the high dimensionality, nonlinear spectral variability, and planetary-scale volume of the Global Mode dataset. To address these limitations, we present an integrated machine learning pipeline for estimating lunar FeO abundance from M3 hyperspectral observations. Unlike traditional methods based on raw reflectance or empirical spectral indices, the proposed framework combines Discrete Wavelet Transform (DWT), deep autoencoder-based feature compression, and ensemble regression to achieve robust and scalable FeO prediction. M3 spectra (83 bands, 475–3000 nm) are transformed using a Daubechies-4 (db4) DWT to extract 42 representative coefficients per pixel, capturing the dominant spectral information while filtering high-frequency noise. These features are further compressed into a six-dimensional latent space via a deep autoencoder and used as input to a Random Forest regressor, which outperforms kernel-based and linear Support Vector Regression (SVR) as well as Lasso regression in predictive accuracy and stability. The proposed model achieves an average prediction error of 1.204 wt.% FeO and demonstrates consistent performance across diverse lunar geological units. Applied to 806 orbital tracks (approximately 3.5 × 10⁹ pixels), covering more than 95% of the lunar surface, the pipeline produces a global FeO abundance map at 150 m per pixel resolution. These results demonstrate the potential of integrating multiscale wavelet representations with nonlinear feature learning to enable large-scale, geochemically constrained planetary mineral mapping. Full article
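The wavelet step can be illustrated with a one-level Haar transform in plain NumPy. Note that the paper uses Daubechies-4, which in practice would come from a wavelet library such as PyWavelets, so this Haar version is only a structural stand-in; an 83-band input happens to yield 42 approximation coefficients here as well:

```python
import numpy as np

def haar_dwt(signal):
    """One-level Haar DWT (a stand-in for the paper's db4 transform): returns
    approximation coefficients (low-frequency content) and detail coefficients
    (high-frequency content, where sensor noise tends to live)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])                # pad odd-length input to even length
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

spectrum = np.linspace(0.1, 0.4, 83)           # a synthetic 83-band reflectance spectrum
approx, detail = haar_dwt(spectrum)            # 42 coefficients each after padding
```

In the full pipeline the retained coefficients are further compressed by the autoencoder into six latent features before regression.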

27 pages, 13798 KB  
Article
A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data
by Ekaterina A. Lopukhova, Gulnaz M. Idrisova, Timur R. Mukhamadeev, Grigory S. Voronkov, Ruslan V. Kutluyarov and Elizaveta P. Topolskaya
J. Imaging 2026, 12(1), 36; https://doi.org/10.3390/jimaging12010036 - 8 Jan 2026
Viewed by 415
Abstract
The paper focuses on automated diagnosis of retinal diseases, particularly Age-related Macular Degeneration (AMD) and diabetic retinopathy (DR), using optical coherence tomography (OCT), while addressing three key challenges: disease comorbidity, severe class imbalance, and the lack of strictly paired OCT and fundus data. We propose a hierarchical modular deep learning system designed for multi-label OCT screening with conditional routing to specialized staging modules. To enable DR staging when fundus images are unavailable, we use cross-modal alignment between OCT and fundus representations. This approach involves training a latent bridge that projects OCT embeddings into the fundus feature space. We enhance clinical reliability through per-class threshold calibration and implement quality control checks for OCT-only DR staging. Experiments demonstrate robust multi-label performance (macro-F1 = 0.989 ± 0.006 after per-class threshold calibration) and reliable calibration (ECE = 2.1 ± 0.4%), and OCT-only DR staging is feasible in 96.1% of cases that meet the quality control criterion. Full article
(This article belongs to the Section Medical Imaging)
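At its simplest, the "latent bridge" is a learned projection from OCT embedding space into fundus feature space. A hedged sketch using a closed-form least-squares linear map on synthetic embeddings; the paper's bridge is trained end-to-end and its dimensions are not given here, so all sizes below are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
d_oct, d_fundus, n = 32, 24, 500               # hypothetical embedding sizes

# Synthetic "paired" embeddings: fundus features are a noisy linear image of OCT ones.
oct_emb = rng.normal(size=(n, d_oct))
true_map = rng.normal(size=(d_oct, d_fundus))
fundus_emb = oct_emb @ true_map + 0.01 * rng.normal(size=(n, d_fundus))

# Linear latent bridge: least-squares projection of OCT embeddings into the
# fundus feature space. A trained nonlinear bridge generalises this idea.
bridge, *_ = np.linalg.lstsq(oct_emb, fundus_emb, rcond=None)
projected = oct_emb @ bridge                   # fundus-space view of OCT inputs
```

Once such a bridge exists, a fundus-trained staging head can consume `projected` features even when no fundus image was acquired, which is exactly the OCT-only scenario the paper targets.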

43 pages, 1151 KB  
Review
Clustering of Temporal and Visual Data: Recent Advancements
by Priyanka Mudgal
Data 2026, 11(1), 7; https://doi.org/10.3390/data11010007 - 4 Jan 2026
Cited by 1 | Viewed by 411
Abstract
Clustering plays a central role in uncovering latent structure within both temporal and visual data. It enables critical insights in various domains including healthcare, finance, surveillance, autonomous systems, and many more. With the growing volume and complexity of time-series and image-based datasets, there is an increasing demand for robust, flexible, and scalable clustering algorithms. Although these modalities differ—time-series being inherently sequential and vision data being spatial—they exhibit common challenges such as high dimensionality, noise, variability in alignment and scale, and the need for interpretable groupings. This survey presents a comprehensive review of recent advancements in clustering methods that are adaptable to both time-series and vision data. We explore a wide spectrum of approaches, including distance-based techniques (e.g., DTW, EMD), feature-based methods, model-based strategies (e.g., GMMs, HMMs), and deep learning frameworks such as autoencoders, self-supervised learning, and graph neural networks. We also survey hybrid and ensemble models, as well as semi-supervised and active clustering methods that leverage minimal supervision for improved performance. By highlighting both the shared principles and the modality-specific adaptations of clustering strategies, this work outlines current capabilities and open challenges, and suggests future directions toward unified, multimodal clustering systems. Full article
(This article belongs to the Section Featured Reviews of Data Science Research)
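Among the distance-based techniques surveyed, DTW is the canonical example, and its dynamic-programming recurrence is short enough to show in full (pure Python, absolute-difference cost):

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW.
    D[i][j] = local cost + min over the three allowed predecessor cells."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

same = dtw_distance([1, 2, 3, 2, 1], [1, 2, 3, 2, 1])
shifted = dtw_distance([1, 2, 3, 2, 1], [1, 1, 2, 3, 2, 1])  # time-warped copy
```

Unlike Euclidean distance, DTW scores the warped copy as identical to the original, which is why it underpins so many time-series clustering methods despite its quadratic cost.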

16 pages, 1260 KB  
Article
DAR-Swin: Dual-Attention Revamped Swin Transformer for Intelligent Vehicle Perception Under NVH Disturbances
by Xinglong Zhang, Zhiguo Zhang, Huihui Zuo, Chaotan Xue, Zhenjiang Wu, Zhiyu Cheng and Yan Wang
Machines 2026, 14(1), 51; https://doi.org/10.3390/machines14010051 - 31 Dec 2025
Viewed by 281
Abstract
In recent years, deep learning-based image classification has made significant progress, especially in safety-critical perception fields such as intelligent vehicles. Factors such as vibrations caused by NVH (noise, vibration, and harshness), sensor noise, and road surface roughness pose challenges to robustness and real-time deployment. The Transformer architecture has become a fundamental component of high-performance models. However, in complex visual environments, shifted window attention mechanisms exhibit inherent limitations: although computationally efficient, local window constraints impede cross-region semantic integration, while deep feature processing obstructs robust representation learning. To address these challenges, we propose DAR-Swin (Dual-Attention Revamped Swin Transformer), enhancing the framework through two complementary attention mechanisms. First, Scalable Self-Attention universally substitutes the standard Window-based Multi-head Self-Attention via sub-quadratic complexity operators. These operators decouple spatial positions from feature associations, enabling position-adaptive receptive fields for comprehensive contextual modeling. Second, Latent Proxy Attention integrated before the classification head adopts a learnable spatial proxy to integrate global semantic information into a fixed-size representation, while preserving relational semantics and achieving linear computational complexity through efficient proxy interactions. Extensive experiments demonstrate significant improvements over Swin Transformer Base, achieving 87.3% top-1 accuracy on CIFAR-100 (+1.5% absolute improvement) and 57.0% mAP on COCO2017 (+1.3% absolute improvement). These characteristics are particularly important for the active and passive safety features of intelligent vehicles. Full article
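The Latent Proxy Attention idea, a fixed-size set of proxy tokens cross-attending to N patch tokens so cost grows linearly in N, can be sketched as plain single-head dot-product attention. Sizes and the single-head form are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def proxy_attention(tokens, proxies):
    """Cross-attention from M learnable proxy tokens to N patch tokens.
    tokens: (N, d); proxies: (M, d). Cost is O(N * M) with M << N, i.e. linear
    in N, and the output is a fixed-size (M, d) global summary."""
    d = tokens.shape[1]
    attn = softmax(proxies @ tokens.T / np.sqrt(d), axis=1)   # (M, N) weights
    return attn @ tokens                                       # global semantics in M slots

rng = np.random.default_rng(6)
patch_tokens = rng.normal(size=(196, 32))     # e.g. a 14x14 patch grid
proxy_tokens = rng.normal(size=(8, 32))       # learnable spatial proxies (hypothetical M = 8)
summary = proxy_attention(patch_tokens, proxy_tokens)
```

Because the summary size is fixed regardless of image resolution, a classification head placed after it sees a constant-size, globally informed representation.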

27 pages, 5157 KB  
Article
Remote Sensing Scene Classification via Multi-Feature Fusion Based on Discriminative Multiple Canonical Correlation Analysis
by Shavkat Fazilov, Ozod Yusupov, Yigitali Khandamov, Erali Eshonqulov, Jalil Khamidov and Khabiba Abdieva
AI 2026, 7(1), 5; https://doi.org/10.3390/ai7010005 - 23 Dec 2025
Cited by 1 | Viewed by 617
Abstract
Scene classification in remote sensing images is a pressing task that demands improved recognition accuracy in the face of complex spatial structures and high inter-class similarity. Although feature extraction using convolutional neural networks is highly effective, combining deep features obtained from different architectures in a semantically consistent manner remains an important scientific problem. In this study, a DMCCA + SVM model is proposed, in which Discriminative Multiple Canonical Correlation Analysis (DMCCA) is applied to fuse multi-source deep features, and final classification is performed using a Support Vector Machine (SVM). Unlike conventional fusion methods, DMCCA projects heterogeneous features into a unified low-dimensional latent space by maximizing within-class correlation and minimizing between-class correlation, resulting in a more separable and compact feature space. The proposed approach was evaluated on three widely used benchmark datasets—NWPU-RESISC45, AID, and PatternNet—and achieved accuracy scores of 92.75%, 93.92%, and 99.35%, respectively. The results showed that the model outperforms modern individual CNN architectures. Additionally, the model’s stability and generalization capability were confirmed through K-fold cross-validation. Overall, the proposed DMCCA + SVM model was experimentally validated as an effective and reliable solution for high-accuracy classification of remote sensing scenes.

19 pages, 1830 KB  
Article
Robust Target Association Method with Weighted Bipartite Graph Optimal Matching in Multi-Sensor Fusion
by Hanbao Wu, Wei Chen and Weiming Chen
Sensors 2026, 26(1), 49; https://doi.org/10.3390/s26010049 - 20 Dec 2025
Viewed by 407
Abstract
Accurate group target association is essential for multi-sensor multi-target tracking, particularly in heterogeneous radar systems where systematic biases, asynchronous observations, and dense formations frequently cause ambiguous or incorrect associations. Existing approaches often rely on strict spatial assumptions or pre-trained models, limiting their robustness when measurement distortions and sensor-specific deviations are present. To address these challenges, this work proposes a robust association framework that integrates deep feature embedding, density-adaptive clustering, and global graph-theoretic matching. The method first applies an autoencoder–HDBSCAN clustering scheme to extract stable latent representations and obtain adaptive group structures under nonlinear distortions and non-uniform target densities. A weighted bipartite graph is then constructed, and a globally optimal matching strategy is employed to compensate for heterogeneous systematic errors while preserving inter-group structural consistency. A mutual-support verification mechanism further enhances robustness against random disturbances. Monte Carlo experiments show that the proposed method maintains over 90% association accuracy even in dense scenarios with a target spacing of 1.4 km. Under various systematic bias conditions, it outperforms representative baselines such as Deep Association and JPDA by more than 20%. These results demonstrate the method’s robustness, adaptability, and suitability for practical multi-radar applications. The framework is training-free and easily deployable, offering a reliable solution for group target association in real-world multi-sensor fusion systems.
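The weighted-bipartite-graph matching step can be sketched with SciPy's Hungarian-algorithm solver: edge weights are pairwise distances between group centroids seen by two sensors, and a simple mean-alignment step stands in for systematic-bias compensation. This is a minimal illustration under strong assumptions (a purely translational bias, synthetic centroids), not the paper's full pipeline, which also includes clustering and mutual-support verification.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Two radars observe the same 4 target groups; radar B carries a constant
# systematic offset plus small measurement noise (all values illustrative).
rng = np.random.default_rng(7)
groups_a = rng.uniform(0, 50, (4, 2))                  # group centroids, sensor A (km)
bias = np.array([2.0, -1.5])                           # unknown translational bias
perm = rng.permutation(4)                              # unknown correspondence
groups_b = groups_a[perm] + bias + 0.1 * rng.standard_normal((4, 2))

# Coarse bias compensation: aligning the global centroid means cancels
# any translation-type systematic error between the two sensors.
groups_b_aligned = groups_b - (groups_b.mean(axis=0) - groups_a.mean(axis=0))

# Weighted bipartite graph: edge weight = pairwise Euclidean distance.
# linear_sum_assignment returns the globally optimal one-to-one matching.
cost = np.linalg.norm(groups_a[:, None, :] - groups_b_aligned[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)
print(dict(zip(rows.tolist(), cols.tolist())))         # recovered association A -> B
```

Global assignment is what distinguishes this from greedy nearest-neighbor association: a locally attractive but globally inconsistent pairing is rejected because the solver minimizes the total edge weight over all groups at once.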
