Search Results (35)

Search Parameters:
Keywords = MLP-Mixer

23 pages, 16353 KB  
Article
RepACNet: A Lightweight Reparameterized Asymmetric Convolution Network for Monocular Depth Estimation
by Wanting Jiang, Jun Li, Yaoqian Niu, Hao Chen and Shuang Peng
Sensors 2026, 26(4), 1199; https://doi.org/10.3390/s26041199 - 12 Feb 2026
Viewed by 334
Abstract
Monocular depth estimation (MDE) is a cornerstone task in 2D/3D scene reconstruction and recognition with widespread applications in autonomous driving, robotics, and augmented reality. However, existing state-of-the-art methods face a fundamental trade-off between computational efficiency and estimation accuracy, limiting their deployment in resource-constrained real-world scenarios; lightweight yet effective models are therefore of high interest for deployment on resource-constrained mobile devices. To address this problem, we present RepACNet, a novel lightweight network built on reparameterized asymmetric convolution designs and a CNN-based architecture that integrates MLP-Mixer components. First, we propose the Reparameterized Token Mixer with Asymmetric Convolution (RepTMAC), an efficient block that captures long-range dependencies while maintaining linear computational complexity. Unlike Transformer-based methods, our approach achieves global feature interaction with minimal overhead. Second, we introduce Squeeze-and-Excitation Consecutive Dilated Convolutions (SECDCs), which integrate adaptive channel attention with dilated convolutions to capture depth-specific features across multiple scales. We validate the effectiveness of our approach through extensive experiments on two widely recognized benchmarks, NYU Depth v2 and KITTI Eigen. The experimental results demonstrate that our model achieves competitive performance while maintaining significantly fewer parameters compared to state-of-the-art models.
(This article belongs to the Section Sensing and Imaging)
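For readers unfamiliar with the MLP-Mixer components that several listings in these results build on, here is a minimal NumPy sketch of a generic Mixer block (token mixing across patches, then channel mixing). It is an illustrative simplification, not the authors' RepTMAC block, and omits the LayerNorms and patch embedding of the full architecture:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, applied elementwise
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def mixer_block(x, w_tok1, w_tok2, w_ch1, w_ch2):
    """One simplified MLP-Mixer block.

    x: (num_patches, channels) array of patch embeddings.
    Token mixing applies an MLP across the patch (token) dimension;
    channel mixing applies an MLP across the channel dimension.
    Both use residual connections, as in the original architecture.
    """
    y = x + w_tok2 @ gelu(w_tok1 @ x)      # token mixing
    return y + gelu(y @ w_ch1) @ w_ch2     # channel mixing

rng = np.random.default_rng(0)
P, C, H = 16, 32, 64                       # patches, channels, hidden width
x = rng.normal(size=(P, C))
out = mixer_block(
    x,
    rng.normal(size=(H, P)) * 0.1, rng.normal(size=(P, H)) * 0.1,
    rng.normal(size=(C, H)) * 0.1, rng.normal(size=(H, C)) * 0.1,
)
print(out.shape)  # same shape as the input: (16, 32)
```

Because both MLPs are plain matrix products, the cost is linear in the number of patches times channels, which is the efficiency property the abstracts above appeal to.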

23 pages, 3990 KB  
Article
DB-MLP: A Lightweight Dual-Branch MLP for Road Roughness Classification Using Vehicle Sprung Mass Acceleration
by Defu Chen, Mingye Li, Guojun Chen, Junyu He and Xiaoai Lu
Sensors 2026, 26(3), 990; https://doi.org/10.3390/s26030990 - 3 Feb 2026
Viewed by 268
Abstract
Accurate identification of road roughness is pivotal for optimizing vehicle suspension control and enhancing passenger comfort. However, existing data-driven methods often struggle to balance classification accuracy with the strict computational constraints of real-time onboard monitoring. To address this challenge, this paper proposes a lightweight and robust road roughness classification framework utilizing a single sprung mass accelerometer. First, to overcome the scarcity of labeled real-world data and the limitations of linear models, a high-fidelity co-simulation platform combining CarSim and Simulink is established. This platform generates physically consistent vibration datasets covering ISO A–F roughness levels, effectively capturing nonlinear suspension dynamics. Second, we introduce DB-MLP, a novel Dual-Branch Multi-Layer Perceptron architecture. In contrast to computationally intensive Transformer or RNN-based models, DB-MLP employs a dual-branch strategy with multi-resolution temporal projection to efficiently capture multi-scale dependencies, and integrates dual-domain (time and position-wise) feature transformation blocks for robust feature extraction. Experimental results demonstrate that DB-MLP achieves a superior accuracy of 98.5% with only 0.58 million parameters. Compared to leading baselines such as TimeMixer and InceptionTime, our model reduces inference latency by approximately 20 times (0.007 ms/sample) while maintaining competitive performance on the specific road classification task. This study provides a cost-effective, high-precision solution suitable for real-time deployment on embedded vehicle systems. Full article
(This article belongs to the Section Vehicular Sensing)
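The "multi-resolution temporal projection" idea in the dual-branch design can be illustrated with a hypothetical sketch: each branch pools the acceleration signal at a different stride, so the MLP sees both fine and coarse temporal views. The function name and strides below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def multi_resolution_projection(x, strides=(1, 2, 4)):
    """Hypothetical sketch: view a 1-D signal at several temporal resolutions.

    Each branch average-pools the sequence with a different stride, giving
    downstream MLPs both fine and coarse views of the vibration dynamics.
    """
    views = []
    for s in strides:
        n = len(x) // s
        # truncate to a multiple of the stride, then pool in windows of size s
        views.append(x[: n * s].reshape(n, s).mean(axis=1))
    return views

x = np.arange(8, dtype=float)        # stand-in for sprung-mass acceleration
v1, v2, v4 = multi_resolution_projection(x)
print(len(v1), len(v2), len(v4))     # 8 4 2
```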

24 pages, 3522 KB  
Article
Deep Learning-Assisted Detection and Classification of Thymoma Tumors in CT Scans
by Murat Kılıç, Merve Bıyıklı, Salih Taha Alperen Özçelik, Hüseyin Üzen and Hüseyin Fırat
Diagnostics 2025, 15(24), 3191; https://doi.org/10.3390/diagnostics15243191 - 14 Dec 2025
Viewed by 602
Abstract
Background/Objectives: Thymoma is a rare epithelial neoplasm originating from the thymus gland, and its accurate detection and classification using computed tomography (CT) images remain diagnostically challenging due to subtle morphological similarities with other mediastinal pathologies. This study presents a deep learning (DL)-based model designed to improve diagnostic accuracy for both thymoma detection and subtype classification (benign vs. malignant). Methods: The proposed approach integrates a pre-trained VGG16 network for efficient feature extraction—capitalizing on its capacity to capture hierarchical spatial features—and an MLP-Mixer-based feature enhancement module, which effectively models both local and global feature dependencies without relying on conventional convolutional mechanisms. Additionally, customized preprocessing and post-processing methods are employed to enhance image quality and suppress redundant data. The model’s performance was evaluated on two classification tasks: distinguishing thymoma from healthy cases and discriminating between benign and malignant thymoma. Comparative analysis was conducted against state-of-the-art DL models including ResNet50, ResNet34, SEResNeXt50, InceptionResNetV2, MobileNetV2, VGG16, InceptionV3, and DenseNet121 using metrics such as F1 score, accuracy, recall, and precision. Results: The model proposed in this study obtained its best performance in thymoma vs. healthy classification, with an accuracy of 97.15% and F1 score of 80.99%. In the benign vs. malignant task, it attained an accuracy of 79.20% and an F1 score of 78.51%, outperforming all baseline methods. Conclusions: The integration of VGG16’s robust spatial feature extraction and the MLP-Mixer’s effective feature mixing demonstrates superior and balanced performance, highlighting the model’s potential for clinical decision support in thymoma diagnosis. Full article
(This article belongs to the Special Issue Diagnostic Imaging of Pulmonary Diseases)

17 pages, 4375 KB  
Article
Improving the Detection Performance of Cardiovascular Diseases from Heart Sound Signals with a New Deep Learning-Based Approach
by Ozgen Safak, Mehmet Tolga Hekim, Tolga Cakmak, Fatih Demir and Kursat Demir
Diagnostics 2025, 15(18), 2379; https://doi.org/10.3390/diagnostics15182379 - 18 Sep 2025
Viewed by 1056
Abstract
Background/Objectives: Cardiovascular diseases are among the leading causes of death worldwide. Early diagnosis of these conditions minimizes the risk of future death. Listening to heart sounds with a stethoscope is one of the easiest and fastest methods for diagnosing heart conditions. While heart sounds are a quick and easy diagnostic method, they require significant expert interpretation. Recently, artificial intelligence models trained based on these expert interpretations have become popular in the development of decision support systems. Methods: The proposed approach uses the popular 2016 PhysioNet/CinC Challenge dataset for PCG signals. Spectrogram image transformation was then performed to increase the representativeness of these signals. A deep learning-based model that allows for the simultaneous training of residual and attention blocks and the MLP-mixer model was used for feature extraction. A new algorithm combining the strengths of NCA and ReliefF algorithms was proposed to select the strongest features in the feature set. The SVM algorithm was used for classification. Results: With this proposed approach, over 98% success was achieved in all accuracy, sensitivity, specificity, precision, and F1-score metrics. Conclusions: As a result, an artificial intelligence-based decision support system that detects cardiovascular diseases with high accuracy is presented. Full article
(This article belongs to the Special Issue Artificial Intelligence in Cardiovascular and Stroke Imaging)

17 pages, 1455 KB  
Article
STID-Mixer: A Lightweight Spatio-Temporal Modeling Framework for AIS-Based Vessel Trajectory Prediction
by Leiyu Wang, Jian Zhang, Guangyin Jin and Xinyu Dong
Eng 2025, 6(8), 184; https://doi.org/10.3390/eng6080184 - 3 Aug 2025
Viewed by 1148
Abstract
The Automatic Identification System (AIS) has become a key data source for ship behavior monitoring and maritime traffic management, widely used in trajectory prediction and anomaly detection. However, AIS data suffer from issues such as spatial sparsity, heterogeneous features, variable message formats, and irregular sampling intervals, while vessel trajectories are characterized by strong spatial–temporal dependencies. These factors pose significant challenges for efficient and accurate modeling. To address this issue, we propose a lightweight vessel trajectory prediction framework that integrates Spatial–Temporal Identity encoding with an MLP-Mixer architecture. The framework discretizes spatial and temporal features into structured IDs and uses dual MLP modules to model temporal dependencies and feature interactions without relying on convolution or attention mechanisms. Experiments on a large-scale real-world AIS dataset demonstrate that the proposed STID-Mixer achieves superior accuracy, training efficiency, and generalization capability compared to representative baseline models. The method offers a compact and deployable solution for large-scale maritime trajectory modeling. Full article
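The Spatial–Temporal Identity encoding described above amounts to binning continuous coordinates and timestamps into discrete IDs that can index learned embedding tables. A hypothetical NumPy sketch (bin edges, grid resolution, and function name are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def spatial_temporal_ids(lat, lon, ts_hour, lat_bins, lon_bins, n_time=24):
    """Discretize continuous position/time into structured IDs (STID-style sketch).

    Latitude/longitude are binned onto a regular grid; timestamps map to
    hour-of-day IDs. The resulting IDs can index embedding tables.
    """
    lat_id = np.digitize(lat, lat_bins)
    lon_id = np.digitize(lon, lon_bins)
    time_id = ts_hour % n_time
    return lat_id, lon_id, time_id

# Illustrative 0.1-degree grid over a small coastal region
lat_bins = np.linspace(30.0, 32.0, 21)
lon_bins = np.linspace(120.0, 123.0, 31)
lat_id, lon_id, t_id = spatial_temporal_ids(
    np.array([30.55, 31.27]), np.array([121.04, 122.90]), np.array([3, 27]),
    lat_bins, lon_bins)
print(lat_id, lon_id, t_id)
```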

24 pages, 2324 KB  
Article
FUSE-Net: Multi-Scale CNN for NIR Band Prediction from RGB Using GNDVI-Guided Green Channel Enhancement
by Gwanghyeong Lee, Deepak Ghimire, Donghoon Kim, Sewoon Cho, Byoungjun Kim and Sunghwan Jeong
Sensors 2025, 25(13), 4076; https://doi.org/10.3390/s25134076 - 30 Jun 2025
Viewed by 1561
Abstract
Hyperspectral imaging (HSI) is a powerful tool for precision imaging tasks such as vegetation analysis, but its widespread use remains limited due to the high cost of equipment and challenges in data acquisition. To explore a more accessible alternative, we propose a Green Normalized Difference Vegetation Index (GNDVI)-guided green channel adjustment method, termed G-RGB, which enables the estimation of near-infrared (NIR) reflectance from standard RGB image inputs. The G-RGB method enhances the green channel to encode NIR-like information, generating a spectrally enriched representation. Building on this, we introduce FUSE-Net, a novel deep learning model that combines multi-scale convolutional layers and MLP-Mixer-based channel learning to effectively model spatial and spectral dependencies. For evaluation, we constructed a high-resolution RGB-HSI paired dataset by capturing basil leaves under controlled conditions. Through ablation studies and band combination analysis, we assessed the model’s ability to recover spectral information. The experimental results showed that the G-RGB input consistently outperformed unmodified RGB across multiple metrics, including mean squared error (MSE), peak signal-to-noise ratio (PSNR), spectral correlation coefficient (SCC), and structural similarity (SSIM), with the best performance observed when paired with FUSE-Net. While our method does not replace true NIR data, it offers a viable approximation during inference when only RGB images are available, supporting cost-effective analysis in scenarios where HSI systems are inaccessible. Full article
(This article belongs to the Section Intelligent Sensors)
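GNDVI itself is a standard vegetation index, computed like NDVI but with the green band in place of red. A small NumPy sketch of the formula that guides the G-RGB adjustment (the epsilon guard against division by zero is an implementation detail of this sketch, not the paper's):

```python
import numpy as np

def gndvi(nir, green, eps=1e-8):
    """Green Normalized Difference Vegetation Index: (NIR - G) / (NIR + G)."""
    nir = nir.astype(np.float64)
    green = green.astype(np.float64)
    return (nir - green) / (nir + green + eps)

# Illustrative reflectance values in [0, 1]
nir = np.array([[0.6, 0.5], [0.4, 0.7]])
green = np.array([[0.2, 0.25], [0.2, 0.1]])
print(gndvi(nir, green))
```

Healthy vegetation reflects strongly in NIR and absorbs in green, so values near 1 indicate dense canopy; this is the signal the enhanced green channel is meant to encode.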

19 pages, 4196 KB  
Article
Active and Inactive Tuberculosis Classification Using Convolutional Neural Networks with MLP-Mixer
by Beanbonyka Rim, Hyeonung Jang, Hongchang Lee and Wangsu Jeon
Bioengineering 2025, 12(6), 630; https://doi.org/10.3390/bioengineering12060630 - 9 Jun 2025
Cited by 7 | Viewed by 1616
Abstract
Early detection of tuberculosis plays a critical role in effective treatment management. Like active tuberculosis, early identification of inactive forms such as latent or healed tuberculosis is essential to prevent future reactivation. In this study, we developed a deep-learning-based binary classification model to distinguish between active and inactive tuberculosis cases. Our model architecture incorporated an EfficientNet backbone with an MLP-Mixer classification head and was fine-tuned on a dataset annotated by Cheonan Soonchunhyang Hospital. To enhance predictive performance, we applied transfer learning using weights pre-trained on the JFT-300M dataset via the Noisy Student training method. Unlike conventional models, our approach achieved competitive results, with an accuracy of 96.3%, a sensitivity of 95.9%, and a specificity of 96.6% on the test set. These promising outcomes suggest that our model could serve as a valuable asset to support clinical decision-making and streamline early screening workflows for latent tuberculosis. Full article
(This article belongs to the Section Biosignal Processing)

22 pages, 553 KB  
Article
Block-Wise Domain Adaptation for Workload Prediction from fNIRS Data
by Jiyang Wang, Ayse Altay, Leanne Hirshfield and Senem Velipasalar
Sensors 2025, 25(12), 3593; https://doi.org/10.3390/s25123593 - 7 Jun 2025
Cited by 1 | Viewed by 1210
Abstract
Functional near-infrared spectroscopy (fNIRS) is a non-intrusive way to measure cortical hemodynamic activity, and a diverse set of methods has been applied to predicting cognitive workload from fNIRS data. To be applicable in real-world settings, models must perform well across different sessions as well as different subjects. However, most existing works assume that training and testing data come from the same subjects and/or cannot generalize well to never-before-seen subjects. fNIRS data pose additional challenges: not only high variation in inter-subject data, but also variation in intra-subject data collected across different blocks of sessions. To address these challenges, we propose an effective method, referred to as block-wise domain adaptation (BWise-DA), which explicitly minimizes intra-session variance by treating different blocks from the same subject and session as different domains: we minimize the intra-class domain discrepancy and maximize the inter-class domain discrepancy accordingly. In addition, we propose an MLPMixer-based model for workload prediction. Experimental results demonstrate that the proposed model outperforms three different baseline models on three publicly available workload datasets; two of the datasets come from n-back tasks and one from finger-tapping. Moreover, the experimental results show that our proposed contrastive learning method can also be leveraged to improve the performance of the baseline models. We also present a visualization study showing that the models attend to the brain regions known to be involved in the respective tasks.
(This article belongs to the Section Biomedical Sensors)

17 pages, 11621 KB  
Article
An Automated Algorithm for Obstructive Sleep Apnea Detection Using a Wireless Abdomen-Worn Sensor
by Thi Hang Dang, Seong-mun Kim, Min-seong Choi, Sung-nam Hwan, Hyung-ki Min and Franklin Bien
Sensors 2025, 25(8), 2412; https://doi.org/10.3390/s25082412 - 10 Apr 2025
Cited by 3 | Viewed by 4782
Abstract
Obstructive sleep apnea (OSA) is common among older populations and individuals with cardiovascular diseases. OSA diagnosis is primarily conducted using polysomnography or recommended home sleep apnea test (HSAT) devices. Wireless wearable devices have emerged as promising tools for OSA screening and follow-up. This study introduces a novel automated algorithm for detecting OSA using abdominal movement signals and acceleration data collected by a wireless abdomen-worn sensor (Soomirang). Thirty-seven subjects underwent overnight monitoring using an HSAT device and the Soomirang system simultaneously. Normal and apnea events were classified using an MLP-Mixer deep learning model based on Soomirang data, which was also used to estimate total sleep time (ST). Pearson correlation and Bland–Altman analyses were conducted to evaluate the agreement in ST and the apnea–hypopnea index (AHI) between the HSAT device and the Soomirang. ST demonstrated a correlation of 0.9 with an average time difference of 7.5 min, while AHI showed a correlation of 0.95 with an average AHI difference of 3. The accuracy, sensitivity, and specificity of the Soomirang for detecting OSA at AHI ≥ 15 were 97.14%, 100%, and 95.45%, respectively. The proposed algorithm, utilizing data from a wireless abdomen-worn device, exhibited excellent performance in detecting moderate to severe OSA. The findings underscore the potential of a simple device as an accessible and effective tool for OSA screening and follow-up.
(This article belongs to the Section Wearables)
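The agreement analysis used in this study (Pearson correlation plus Bland–Altman bias and 95% limits of agreement) can be sketched in a few lines of NumPy. The AHI values below are made-up illustrative numbers, not the study's data:

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics between two measurement methods.

    Returns the bias (mean difference) and the 95% limits of agreement
    (bias +/- 1.96 * SD of the differences).
    """
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical AHI readings: reference HSAT device vs. a wearable sensor
ref = np.array([5.0, 12.0, 18.0, 30.0, 45.0, 8.0])
dev = np.array([6.5, 10.0, 21.0, 27.0, 48.0, 9.5])

r = np.corrcoef(ref, dev)[0, 1]          # Pearson correlation
bias, (lo, hi) = bland_altman(dev, ref)  # mean difference and 95% limits
print(round(r, 3), round(bias, 2))
```

A high correlation alone does not imply agreement; the Bland–Altman limits show how far an individual device reading may plausibly sit from the reference, which is why the abstract reports both.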

21 pages, 20129 KB  
Article
UMAP-Based All-MLP Marine Diesel Engine Fault Detection Method
by Shengli Dong, Jilong Liu, Bing Han, Shengzheng Wang, Hong Zeng and Meng Zhang
Electronics 2025, 14(7), 1293; https://doi.org/10.3390/electronics14071293 - 25 Mar 2025
Cited by 3 | Viewed by 1249
Abstract
This study presents an innovative approach for marine diesel engine fault detection, integrating unsupervised learning through Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction with time series prediction, offering significant improvements over existing methods. Unlike traditional model-based or expert-driven approaches, which struggle with complex nonlinear systems, or supervised data-driven methods limited by scarce labeled fault data, our unsupervised method establishes a normal operational baseline without requiring fault labels, enhancing applicability across diverse conditions. Leveraging UMAP’s nonlinear dimensionality reduction, the proposed method outperforms conventional linear techniques (e.g., PCA) by amplifying subtle system anomalies, enabling earlier detection of state transitions—up to two batches before deviations appear in traditional performance indicators (Ps)—thus improving fault detection sensitivity. To address nonlinear relationships in UMAP-reduced dimensions, the proposed TimeMixer-FI model enhances the TimeMixer architecture with MLP-Mixer layers. The TimeMixer-FI model demonstrates consistent improvements over the original TimeMixer across various sequence lengths, achieving an MSE reduction of 69.1% (from 0.0544 to 0.0168) and an MAE reduction of 46.3% (from 0.1023 to 0.0549) at an input sequence length of 60 time steps, thereby enhancing the reliability of the time series prediction baseline. Experimental results validate that this approach significantly enhances both the sensitivity and accuracy of early fault detection, providing a more robust and efficient solution for predictive maintenance in marine diesel engines. Full article

57 pages, 16680 KB  
Article
Generating High Spatial and Temporal Surface Albedo with Multispectral-Wavemix and Temporal-Shift Heatmaps
by Sagthitharan Karalasingham, Ravinesh C. Deo, Nawin Raj, David Casillas-Perez and Sancho Salcedo-Sanz
Remote Sens. 2025, 17(3), 461; https://doi.org/10.3390/rs17030461 - 29 Jan 2025
Cited by 3 | Viewed by 2308
Abstract
Surface albedo is a key variable influencing ground-reflected solar irradiance, which is a vital factor in boosting the energy gains of bifacial solar installations. Surface albedo is therefore crucial for estimating the photovoltaic power generation of both bifacial and tilted solar installations. Although it varies across daylight hours, seasons, and locations, surface albedo is assumed constant over time by various models. The lack of granular temporal observations is a major challenge for modeling intra-day albedo variability. Though satellite observations of surface reflectance, useful for estimating surface albedo, provide wide spatial coverage, they too lack temporal granularity. This paper therefore considers a novel approach to temporal downscaling using imaging time series of satellite-sensed surface reflectance and limited high-temporal-resolution ground observations from surface radiation (SURFRAD) monitoring stations. To increase information density for learning temporal patterns from an image series, and to exploit visual redundancy within such imagery for temporal downscaling, we introduce temporally shifted heatmaps as an advantageous alternative to Gramian Angular Field (GAF)-based image time series. Further, we propose Multispectral-WaveMix, a derivative of the mixer-based computer vision architecture, as a high-performance model that harnesses image time series for surface albedo forecasting. Multispectral-WaveMix models intra-day variations in surface albedo on a 1 min scale. The framework combines satellite-sensed multispectral surface reflectance imagery at a 30 m scale from the Landsat and Sentinel-2A/2B satellites with granular ground observations from SURFRAD surface radiation monitoring sites, treating them as image time series for image-to-image translation between remote-sensed imagery and ground observations. The proposed model, with temporally shifted heatmaps and Multispectral-WaveMix, was benchmarked against image-to-image MLP-Mix, MLP-Mix, and standard MLP models. Model predictions were also contrasted against ground observations from the monitoring sites and predictions from the National Solar Radiation Database (NSRDB). Multispectral-WaveMix outperformed the other models with a Cauchy loss of 0.00524, a signal-to-noise ratio (SNR) of 72.569, and a structural similarity index (SSIM) of 0.999, demonstrating the high potential of such modeling approaches for generating granular time series. Additional experiments explored the potential of the trained model as a domain-specific pre-trained alternative for temporal modeling of unseen locations. As bifacial solar installations gain dominance to meet the increasing demand for renewables, our proposed framework provides a hybrid modeling approach for building models from ground observations and satellite imagery for intra-day surface albedo monitoring, and hence for intra-day energy gain modeling and bifacial deployment planning.

19 pages, 4720 KB  
Article
Applying MLP-Mixer and gMLP to Human Activity Recognition
by Takeru Miyoshi, Makoto Koshino and Hidetaka Nambo
Sensors 2025, 25(2), 311; https://doi.org/10.3390/s25020311 - 7 Jan 2025
Cited by 5 | Viewed by 2985
Abstract
The development of deep learning has led to the proposal of various models for human activity recognition (HAR). Convolutional neural networks (CNNs), initially proposed for computer vision tasks, are examples of models applied to sensor data. Recently, high-performing models based on Transformers and multi-layer perceptrons (MLPs) have also been proposed. When applying these methods to sensor data, we often initialize hyperparameters with values optimized for image processing tasks as a starting point. We suggest that comparable accuracy could be achieved with fewer parameters for sensor data, which typically have lower dimensionality than image data. Reducing the number of parameters would decrease memory requirements and computational complexity by reducing the model size. We evaluated the performance of two MLP-based models, MLP-Mixer and gMLP, by reducing the values of hyperparameters in their MLP layers from those proposed in the respective original papers. The results of this study suggest that the performance of MLP-based models is positively correlated with the number of parameters. Furthermore, these MLP-based models demonstrate improved computational efficiency for specific HAR tasks compared to representative CNNs. Full article
(This article belongs to the Section Wearables)
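The hyperparameter-reduction experiment described above turns on how Mixer parameter counts scale with the token- and channel-MLP widths. A rough counting sketch follows; it covers only the per-block MLP weights and biases (ignoring LayerNorm, the patch embedding, and the classifier), and the widths shown are illustrative defaults, not the paper's settings:

```python
def mixer_params(num_patches, channels, d_token, d_channel, depth):
    """Approximate learnable parameters in the MLP blocks of an MLP-Mixer.

    Each block has a token MLP (P -> d_token -> P) and a channel MLP
    (C -> d_channel -> C); we count weights plus biases for both layers.
    """
    token_mlp = (num_patches * d_token + d_token
                 + d_token * num_patches + num_patches)
    channel_mlp = (channels * d_channel + d_channel
                   + d_channel * channels + channels)
    return depth * (token_mlp + channel_mlp)

# Halving both MLP widths roughly halves the per-block parameter count,
# which is the kind of reduction the study explores for sensor data.
full = mixer_params(196, 512, 256, 2048, 8)
small = mixer_params(196, 512, 128, 1024, 8)
print(full, small)
```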

27 pages, 12110 KB  
Article
Exploring the Impact of Additive Shortcuts in Neural Networks via Information Bottleneck-like Dynamics: From ResNet to Transformer
by Zhaoyan Lyu and Miguel R. D. Rodrigues
Entropy 2024, 26(11), 974; https://doi.org/10.3390/e26110974 - 14 Nov 2024
Cited by 3 | Viewed by 1904
Abstract
Deep learning has made significant strides, driving advances in areas like computer vision, natural language processing, and autonomous systems. In this paper, we further investigate the implications of the role of additive shortcut connections, focusing on models such as ResNet, Vision Transformers (ViTs), and MLP-Mixers, given that they are essential in enabling efficient information flow and mitigating optimization challenges such as vanishing gradients. In particular, capitalizing on our recent information bottleneck approach, we analyze how additive shortcuts influence the fitting and compression phases of training, crucial for generalization. We leverage Z-X and Z-Y measures as practical alternatives to mutual information for observing these dynamics in high-dimensional spaces. Our empirical results demonstrate that models with identity shortcuts (ISs) often skip the initial fitting phase and move directly into the compression phase, while non-identity shortcut (NIS) models follow the conventional two-phase process. Furthermore, we explore how IS models are still able to compress effectively, maintaining their generalization capacity despite bypassing the early fitting stages. These findings offer new insights into the dynamics of shortcut connections in neural networks, contributing to the optimization of modern deep learning architectures. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)

25 pages, 50041 KB  
Article
How Resilient Are Kolmogorov–Arnold Networks in Classification Tasks? A Robustness Investigation
by Ahmed Dawod Mohammed Ibrahum, Zhengyu Shang and Jang-Eui Hong
Appl. Sci. 2024, 14(22), 10173; https://doi.org/10.3390/app142210173 - 6 Nov 2024
Cited by 13 | Viewed by 5012
Abstract
Kolmogorov–Arnold Networks (KANs) are a novel class of neural network architectures based on the Kolmogorov–Arnold representation theorem, which has demonstrated potential advantages in accuracy and interpretability over Multilayer Perceptron (MLP) models. This paper comprehensively evaluates the robustness of various KAN architectures—including KAN, KAN-Mixer, KANConv_KAN, and KANConv_MLP—against adversarial attacks, which constitute a critical aspect that has been underexplored in current research. We compare these models with MLP-based architectures such as MLP, MLP-Mixer, and ConvNet_MLP across three traffic sign classification datasets: GTSRB, BTSD, and CTSD. The models were subjected to various adversarial attacks (FGSM, PGD, CW, and BIM) with varying perturbation levels and were trained under different strategies, including standard training, adversarial training, and Randomized Smoothing. Our experimental results demonstrate that KAN-based models, particularly the KAN-Mixer, exhibit superior robustness to adversarial attacks compared to their MLP counterparts. Specifically, the KAN-Mixer consistently achieved lower Success Attack Rates (SARs) and Degrees of Change (DoCs) across most attack types and datasets while maintaining high accuracy on clean data. For instance, under FGSM attacks with ϵ=0.01, the KAN-Mixer outperformed the MLP-Mixer by maintaining higher accuracy and lower SARs. Adversarial training and Randomized Smoothing further enhanced the robustness of KAN-based models, with t-SNE visualizations revealing more stable latent space representations under adversarial perturbations. These findings underscore the potential of KAN architectures to improve neural network security and reliability in adversarial settings. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
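FGSM, the simplest of the attacks evaluated above, perturbs the input by the sign of the loss gradient, scaled by ϵ. A self-contained sketch on a toy logistic classifier with an analytic gradient (the model and values are illustrative, not from the paper):

```python
import numpy as np

def fgsm(x, grad_x, eps):
    """Fast Gradient Sign Method: step the input along the sign of the
    loss gradient with magnitude eps per coordinate."""
    return x + eps * np.sign(grad_x)

# Toy logistic classifier: p = sigmoid(w @ x), cross-entropy loss for label 1
rng = np.random.default_rng(1)
w = rng.normal(size=8)
x = rng.normal(size=8)

p = 1 / (1 + np.exp(-w @ x))
grad_x = (p - 1) * w            # d(-log p)/dx for true label y = 1
x_adv = fgsm(x, grad_x, eps=0.1)

p_adv = 1 / (1 + np.exp(-w @ x_adv))
print(p_adv < p)  # the attack lowers the true-class probability
```

Because the perturbation is bounded by ϵ in every coordinate (an L-infinity ball), robustness results like those above are typically reported per ϵ level, e.g. ϵ = 0.01.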

26 pages, 3468 KB  
Article
MGCET: MLP-mixer and Graph Convolutional Enhanced Transformer for Hyperspectral Image Classification
by Mohammed A. A. Al-qaness, Guoyong Wu and Dalal AL-Alimi
Remote Sens. 2024, 16(16), 2892; https://doi.org/10.3390/rs16162892 - 8 Aug 2024
Cited by 7 | Viewed by 3343
Abstract
The vision transformer (ViT) has demonstrated performance comparable to that of convolutional neural networks (CNNs) in the hyperspectral image classification domain. This is achieved by transforming images into sequence data and mining global spectral-spatial information to establish remote dependencies. Nevertheless, both the ViT and CNNs have their own limitations. For instance, a CNN is constrained by the extent of its receptive field, which prevents it from fully exploiting global spatial-spectral features. Conversely, the ViT is prone to excessive distraction during the feature extraction process. To overcome the insufficient feature extraction caused by using a single paradigm, this paper proposes an MLP-mixer and graph convolutional enhanced transformer (MGCET), whose network consists of a spatial-spectral extraction block (SSEB), an MLP-mixer, and a graph convolutional enhanced transformer (GCET). First, spatial-spectral features are extracted using the SSEB; then local spatial-spectral features are fused with global spatial-spectral features by the MLP-mixer. Finally, graph convolution is embedded in multi-head self-attention (MHSA) to mine spatial relationships and similarity between pixels, further improving the modeling capability of the model. Experiments were conducted on four different HSI datasets, on which MGCET achieved overall accuracies (OAs) of 95.45%, 97.57%, 98.05%, and 98.52%.
