Sign in to use this feature.

Years

Between: -

Subjects

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Journals

Article Types

Countries / Regions

Search Results (55)

Search Parameters:
Keywords = CNNs meet transformers

Order results
Result details
Results per page
Select all
Export citation of selected articles as:
27 pages, 3305 KB  
Article
SatViT-Seg: A Transformer-Only Lightweight Semantic Segmentation Model for Real-Time Land Cover Mapping of High-Resolution Remote Sensing Imagery on Satellites
by Daoyu Shu, Zhan Zhang, Fang Wan, Wang Ru, Bingnan Yang, Yan Zhang, Jianzhong Lu and Xiaoling Chen
Remote Sens. 2026, 18(1), 1; https://doi.org/10.3390/rs18010001 - 19 Dec 2025
Viewed by 163
Abstract
The demand for real-time land cover mapping from high-resolution remote sensing (HR-RS) imagery motivates lightweight segmentation models running directly on satellites. By processing on-board and transmitting only fine-grained semantic products instead of massive raw imagery, these models provide timely support for disaster response, [...] Read more.
The demand for real-time land cover mapping from high-resolution remote sensing (HR-RS) imagery motivates lightweight segmentation models running directly on satellites. By processing on-board and transmitting only fine-grained semantic products instead of massive raw imagery, these models provide timely support for disaster response, environmental monitoring, and precision agriculture. Many recent methods combine convolutional neural networks (CNNs) with Transformers to balance local and global feature modeling, with convolutions as explicit information aggregation modules. Such heterogeneous hybrids may be unnecessary for lightweight models if similar aggregation can be achieved homogeneously, and operator inconsistency complicates optimization and hinders deployment on resource-constrained satellites. Meanwhile, lightweight Transformer components in these architectures often adopt aggressive channel compression and shallow contextual interaction to meet compute budgets, impairing boundary delineation and recognition of small or rare classes. To address this, we propose SatViT-Seg, a lightweight semantic segmentation model with a pure Vision Transformer (ViT) backbone. Unlike CNN-Transformer hybrids, SatViT-Seg adopts a homogeneous two-module design: a Local-Global Aggregation and Distribution (LGAD) module that uses window self-attention for local modeling and dynamically pooled global tokens with linear attention for long-range interaction, and a Bi-dimensional Attentive Feed-Forward Network (FFN) that enhances representation learning by modulating channel and spatial attention. This unified design overcomes common lightweight ViT issues such as channel compression and weak spatial correlation modeling. 
SatViT-Seg is implemented and evaluated in LuoJiaNET and PyTorch; comparative experiments with existing methods are run in PyTorch with unified training and data preprocessing for fairness, while the LuoJiaNET implementation highlights deployment-oriented efficiency on a graph-compiled runtime. Compared with the strongest baseline, SatViT-Seg improves mIoU by up to 1.81% while maintaining the lowest FLOPs among all methods. These results indicate that homogeneous Transformers offer strong potential for resource-constrained, on-board real-time land cover mapping in satellite missions. Full article
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)
Show Figures

Figure 1

19 pages, 2663 KB  
Article
Hyperspectral Imaging Combined with Deep Learning for the Detection of Mold Diseases on Paper Cultural Relics
by Ya Zhao, Qiankun Song, Tao Song, Shaojiang Dong, Qian Wu and Zourong Long
Heritage 2025, 8(12), 495; https://doi.org/10.3390/heritage8120495 - 23 Nov 2025
Viewed by 362
Abstract
Mold contamination is one of the critical factors threatening the safety of paper-based cultural relics. Current detection methods rely predominantly on offline analysis, facing challenges such as low efficiency and limited real-time accuracy, which hinder their effectiveness in meeting the technical requirements of [...] Read more.
Mold contamination is one of the critical factors threatening the safety of paper-based cultural relics. Current detection methods rely predominantly on offline analysis, facing challenges such as low efficiency and limited real-time accuracy, which hinder their effectiveness in meeting the technical requirements of cultural heritage preventive conservation. This study proposes a hyperspectral imaging (HSI)-deep learning integrated fungal segmentation framework for deterioration detection in paper-based artifacts. Firstly, the HSI data was reduced to three dimensions via Locally Linear Embedding (LLE) manifold learning to construct 3D pseudo-color imagery, effectively preserving discriminative spectral features between fungal colonies and substrates while eliminating spectral redundancy. Secondly, a hybrid architecture synergizing Feature Pyramid Networks (FPN) with Vision Transformers was developed for semantic segmentation, leveraging CNN’s local feature extraction and Transformer’s global context modeling to enhance fungal signature saliency and suppress background interference. Innovatively, a dynamic sparse attention mechanism is introduced, optimizing attention allocation through the TOP-K algorithm to screen regions richer in mold information spatially and spectrally, thereby improving segmentation accuracy. Semantic segmentation experiments were conducted on papers infected with different molds. The results demonstrate that the proposed method achieves excellent performance in mold segmentation, providing technical support for mold detection and preventive conservation of cultural relics. Full article
(This article belongs to the Section Cultural Heritage)
Show Figures

Figure 1

29 pages, 6436 KB  
Article
Deep Learning-Based Prediction of Commercial Aircraft Noise: A CNN–Transformer Hybrid Model Versus Support Vector Regression and Multi-Layer Perceptron
by Ömer Osman Dursun
Aerospace 2025, 12(11), 1031; https://doi.org/10.3390/aerospace12111031 - 20 Nov 2025
Viewed by 315
Abstract
The rapid growth of the aviation industry and increasing air traffic demand more careful attention to environmental concerns. Among these, aircraft noise is considered one of the main sources of environmental noise, especially after land-based transportation. The World Health Organization highlights noise pollution [...] Read more.
The rapid growth of the aviation industry and increasing air traffic demand more careful attention to environmental concerns. Among these, aircraft noise is considered one of the main sources of environmental noise, especially after land-based transportation. The World Health Organization highlights noise pollution as the second-most important environmental factor after air pollution, with serious consequences for public health. Long-term exposure to high noise levels has been linked to problems such as cardiovascular disease and sleep disruption. In response, ICAO has introduced stricter standards especially in Annex 16, Volume I requiring aircraft to meet tighter noise limits. This study focuses on estimating the noise levels of Airbus and Boeing aircraft during approach, lateral, and flyover phases. The models use parameters such as maximum take-off and landing weights, engine thrust, and bypass ratio. Three approaches are compared: Support vector regression (SVR), a classical machine learning method, multi-layer perceptron(MLP), and a CNN–Transformer hybrid model, which combines deep learning and attention-based techniques. Their predictive performances were evaluated using MSE, RMSE, MAE, MAPE, and R2. The CNN–Transformer showed better results in all metrics. At the flyover point, it reached an R2 of 0.981, compared to 0.898 for SVR and 0.919 for MLP. At the lateral point, its MAE dropped to 0.58, while SVR had 1.64 and MLP 1.17. The attention-based model found patterns that the traditional one missed. It gave better results in several cases. Apart from this, some technologies used to reduce noise may also help save fuel and increase energy efficiency. For example, engines with a high bypass ratio can lower both noise and emissions. These kinds of solutions connect performance with environmental benefits. These insights could be useful for those involved in airport planning, aircraft engine design, or regulatory planning. Full article
(This article belongs to the Section Air Traffic and Transportation)
Show Figures

Figure 1

22 pages, 2460 KB  
Article
AI-Driven Cybersecurity in IoT: Adaptive Malware Detection and Lightweight Encryption via TRIM-SEC Framework
by Ibrahim Mutambik
Sensors 2025, 25(22), 7072; https://doi.org/10.3390/s25227072 - 19 Nov 2025
Viewed by 681
Abstract
The explosive growth in Internet of Things (IoT) technologies has given rise to significant security concerns, especially with the emergence of sophisticated and zero-day malware attacks. Conventional malware detection methods based on static or dynamic analysis often fail to meet the real-time operational [...] Read more.
The explosive growth in Internet of Things (IoT) technologies has given rise to significant security concerns, especially with the emergence of sophisticated and zero-day malware attacks. Conventional malware detection methods based on static or dynamic analysis often fail to meet the real-time operational needs and limited-resource constraints typical of IoT systems. This paper proposes TRIM-SEC (Transformer-Integrated Malware Security and Encryption for IoT), a lightweight and scalable framework that unifies intelligent threat detection with secure data transmission. The framework begins with Autoencoder-Based Feature Denoising (AEFD) to eliminate noise and enhance input quality, followed by Principal Component Analysis (PCA) for efficient dimensionality reduction. Malware classification is performed using a Transformer-Augmented Neural Network (TANN), which leverages multi-head self-attention to capture both contextual and temporal dependencies, enabling accurate detection of diverse threats such as Zero-Day, botnets, and zero-day exploits. For secure communication, TRIM-SEC incorporates Lightweight Elliptic Curve Cryptography (LECC), enhanced with Particle Swarm Optimization (PSO) to generate cryptographic keys with minimal computational burden. The framework is rigorously evaluated against advanced baselines, including LSTM-based IDS, CNN-GRU hybrids, and blockchain-enhanced security models. Experimental results show that TRIM-SEC delivers higher detection accuracy, fewer false alarms, and reduced encryption latency, which makes it well-suited for real-time operation in smart IoT ecosystems. Its balanced integration of detection performance, cryptographic strength, and computational efficiency positions TRIM-SEC as a promising solution for securing next-generation IoT environments. Full article
Show Figures

Figure 1

22 pages, 2100 KB  
Article
Abrupt Change Detection of ECG by Spiking Neural Networks: Policy-Aware Operating Points for Edge-Level MI Screening
by Youngseok Lee
Appl. Sci. 2025, 15(22), 12210; https://doi.org/10.3390/app152212210 - 18 Nov 2025
Viewed by 539
Abstract
Electrocardiogram (ECG) monitoring on low-power edge devices requires models that balance accuracy, latency, and energy consumption. This study evaluates abrupt change detection in ECG using spiking neural networks (SNNs) trained on spike-encoded signals that preserve salient cardiac dynamics. This study used 4910 ECG [...] Read more.
Electrocardiogram (ECG) monitoring on low-power edge devices requires models that balance accuracy, latency, and energy consumption. This study evaluates abrupt change detection in ECG using spiking neural networks (SNNs) trained on spike-encoded signals that preserve salient cardiac dynamics. This study used 4910 ECG segments from 290 subjects (PTB Diagnostic Database; 2.5-s windows at 1 kHz), providing context for the reported results. Under a unified architecture, preprocessing pipeline, and training schedule, we compare two representative neuron models—leaky integrate-and-fire (LIF) and adaptive exponential integrate-and-fire (AdEx). We report balanced accuracy, sensitivity, inference latency, and an energy proxy based on spike-event counts, and we examine robustness to input noise and temporal distortions. Across operating points, AdEx yields the highest overall accuracy and sensitivity, whereas LIF achieves the lowest energy cost and shortest latency, favoring deployment on resource-constrained hardware. Both SNN variants substantially reduce computational events—hence estimated energy—relative to conventional artificial neural network baselines, supporting their suitability for real-time, on-device diagnostics. These findings provide practical guidance for selecting neuron dynamics and decision thresholds to meet target accuracy–sensitivity trade-offs under energy and latency budgets. Overall, combining spike-encoded ECG with appropriately chosen SNN dynamics enables reliable abrupt change detection with notable efficiency gains, offering a path toward scalable edge-level cardiovascular monitoring. While lightweight CNNs and shallow transformers are important references, to keep the scope focused on SNN design choices and policy-aware thresholding for edge constraints, we refrain from reporting additional ANN numbers here. A seed-controlled head-to-head benchmark is reserved for future work. Full article
(This article belongs to the Special Issue Research on Artificial Intelligence in Healthcare)
Show Figures

Figure 1

29 pages, 4325 KB  
Article
A 1-Dimensional Physiological Signal Prediction Method Based on Composite Feature Preprocessing and Multi-Scale Modeling
by Peiquan Chen, Jie Li, Bo Peng, Zhaohui Liu and Liang Zhou
Sensors 2025, 25(21), 6726; https://doi.org/10.3390/s25216726 - 3 Nov 2025
Viewed by 755
Abstract
The real-time, precise monitoring of physiological signals such as intracranial pressure (ICP) and arterial blood pressure (BP) holds significant clinical importance. However, traditional methods like invasive ICP monitoring and invasive arterial blood pressure measurement present challenges including complex procedures, high infection risks, and [...] Read more.
The real-time, precise monitoring of physiological signals such as intracranial pressure (ICP) and arterial blood pressure (BP) holds significant clinical importance. However, traditional methods like invasive ICP monitoring and invasive arterial blood pressure measurement present challenges including complex procedures, high infection risks, and difficulties in continuous measurement. Consequently, learning-based prediction utilizing observable signals (e.g., BP/pulse waves) has emerged as a crucial alternative approach. Existing models struggle to simultaneously capture multi-scale local features and long-range temporal dependencies, while their computational complexity remains prohibitively high for meeting real-time clinical demands. To address this, this paper proposes a physiological signal prediction method combining composite feature preprocessing with multiscale modeling. First, a seven-dimensional feature matrix is constructed based on physiological prior knowledge to enhance feature discriminative power and mitigate phase mismatch issues. Second, a network architecture CNN-LSTM-Attention (CBAnet), integrating multiscale convolutions, long short-term memory (LSTM), and attention mechanisms is designed to effectively capture both local waveform details and long-range temporal dependencies, thereby improving waveform prediction accuracy and temporal consistency. Experiments on GBIT-ABP, CHARIS, and our self-built PPG-HAF dataset show that CBAnet achieves competitive performance relative to bidirectional long short-term Memory (BiLSTM), convolutional neural network-long short-term memory network (CNN-LSTM), Transformer, and Wave-U-Net baselines across Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R2). This study provides a promising, efficient approach for non-invasive, continuous physiological parameter prediction. Full article
(This article belongs to the Section Biomedical Sensors)
Show Figures

Figure 1

17 pages, 1517 KB  
Article
Swin Transformer-Based Real-Time Multi-Tasking Image Detection in Industrial Automation Production Environments
by Haoxuan Li, Wei He and Anran Lan
Machines 2025, 13(10), 972; https://doi.org/10.3390/machines13100972 - 21 Oct 2025
Viewed by 751
Abstract
Automated production plays a vital role in the long-term development of industrial enterprises, and automated production has high requirements for defect detection of industrial parts. In this study, we construct a complex atom network based on Swin Transformer—selected for its window-based multi-head self-attention [...] Read more.
Automated production plays a vital role in the long-term development of industrial enterprises, and automated production has high requirements for defect detection of industrial parts. In this study, we construct a complex atom network based on Swin Transformer—selected for its window-based multi-head self-attention (W-MSA) and shifted window-based multi-head self-attention (SW-MSA) mechanisms, which enable efficient cross-window feature interaction and reduce computational complexity compared to vanilla Transformer or CNN-based methods in multi-task scenarios—and after repairing and recovering the abnormally generated and randomly masked images in the industrial automated production environment, we utilize the discriminative sub-network to achieve real-time abnormality image detection and classification. Then, the loss function optimization model is used to construct a real-time multi-task image detection model (MSTUnet) and design a real-time detection system in the industrial automation production environment. In the PE pipe image defect detection for industrial automated production, the average recognition rate of this paper’s detection model for six kinds of defects can reach 99.21%. Practical results show that the product excellence rate and qualification rate in the industrial automated production line equipped with this paper’s detection system reached 15.32% and 91.40%, respectively, and the production efficiency has been improved. The real-time multi-task image inspection technology and system proposed in this paper meet the requirements of industrial production for accurate, real-time and reliable, and can be practically applied in the industrial automation production environment, bringing good economic benefits. Full article
(This article belongs to the Section Automation and Control Systems)
Show Figures

Figure 1

29 pages, 3625 KB  
Article
Wind Farm Collector Line Fault Diagnosis and Location System Based on CNN-LSTM and ICEEMDAN-PE Combined with Wavelet Denoising
by Huida Duan, Song Bai, Zhipeng Gao and Ying Zhao
Electronics 2025, 14(17), 3347; https://doi.org/10.3390/electronics14173347 - 22 Aug 2025
Viewed by 703
Abstract
To enhance the accuracy and precision of fault diagnosis and location for the collector lines in wind farms under complex operating conditions, an intelligent combined method based on CNN-LSTM and ICEEMDAN-PE-improved wavelet threshold denoising is proposed. A wind power plant model is established [...] Read more.
To enhance the accuracy and precision of fault diagnosis and location for the collector lines in wind farms under complex operating conditions, an intelligent combined method based on CNN-LSTM and ICEEMDAN-PE-improved wavelet threshold denoising is proposed. A wind power plant model is established using the PSCADV46/EMTDC software. In response to the issue of indistinct fault current signal characteristics under complex fault conditions, a hybrid fault diagnosis model is constructed using CNN-LSTM. The convolutional neural network is utilized to extract the local time-frequency features of the current signals, while the long short-term memory network is employed to capture the dynamic time series patterns of faults. Combined with the improved phase-mode transformation, various types of faults are intelligently classified, effectively resolving the problem of fault feature extraction and achieving a fault diagnosis accuracy rate of 96.5%. To resolve the problem of small fault current amplitudes, low fault traveling wave amplitudes, and difficulty in accurate location due to noise interference in actual wind farms with high-resistance grounding faults, a combined denoising algorithm based on ICEEMDAN-PE-improved wavelet threshold is proposed. This algorithm, through the collaborative optimization of modal decomposition and entropy threshold, significantly improves the signal-to-noise ratio and reduces the root mean square error under simulated conditions with injected Gaussian white noise, stabilizing the fault location error within 0.5%. Extensive simulation results demonstrate that the fault diagnosis and location method proposed in this paper can effectively meet engineering requirements and provide reliable technical support for the intelligent operation and maintenance system of a wind farm. Full article
(This article belongs to the Special Issue Advanced Online Monitoring and Fault Diagnosis of Power Equipment)
Show Figures

Figure 1

27 pages, 5654 KB  
Article
Intelligent Detection and Description of Foreign Object Debris on Airport Pavements via Enhanced YOLOv7 and GPT-Based Prompt Engineering
by Hanglin Cheng, Ruoxi Zhang, Ruiheng Zhang, Yihao Li, Yang Lei and Weiguang Zhang
Sensors 2025, 25(16), 5116; https://doi.org/10.3390/s25165116 - 18 Aug 2025
Cited by 1 | Viewed by 1472
Abstract
Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framework that combines an enhanced YOLOv7-X detector, a cascaded YOLO-SAM segmentation [...] Read more.
Foreign Object Debris (FOD) on airport pavements poses a serious threat to aviation safety, making accurate detection and interpretable scene understanding crucial for operational risk management. This paper presents an integrated multi-modal framework that combines an enhanced YOLOv7-X detector, a cascaded YOLO-SAM segmentation module, and a structured prompt engineering mechanism to generate detailed semantic descriptions of detected FOD. Detection performance is improved through the integration of Coordinate Attention, Spatial–Depth Conversion (SPD-Conv), and a Gaussian Similarity IoU (GSIoU) loss, leading to a 3.9% gain in mAP@0.5 for small objects with only a 1.7% increase in inference latency. The YOLO-SAM cascade leverages high-quality masks to guide structured prompt generation, which incorporates spatial encoding, material attributes, and operational risk cues, resulting in a substantial improvement in description accuracy from 76.0% to 91.3%. Extensive experiments on a dataset of 12,000 real airport images demonstrate competitive detection and segmentation performance compared to recent CNN- and transformer-based baselines while achieving robust semantic generalization in challenging scenarios, such as complete darkness, low-light, high-glare nighttime conditions, and rainy weather. A runtime breakdown shows that the enhanced YOLOv7-X requires 40.2 ms per image, SAM segmentation takes 142.5 ms, structured prompt construction adds 23.5 ms, and BLIP-2 description generation requires 178.6 ms, resulting in an end-to-end latency of 384.8 ms per image. Although this does not meet strict real-time video requirements, it is suitable for semi-real-time or edge-assisted asynchronous deployment, where detection robustness and semantic interpretability are prioritized over ultra-low latency. 
The proposed framework offers a practical, deployable solution for airport FOD monitoring, combining high-precision detection with context-aware description generation to support intelligent runway inspection and maintenance decision-making. Full article
(This article belongs to the Special Issue AI and Smart Sensors for Intelligent Transportation Systems)
Show Figures

Figure 1

22 pages, 7620 KB  
Article
DSTANet: A Lightweight and High-Precision Network for Fine-Grained and Early Identification of Maize Leaf Diseases in Field Environments
by Xinyue Gao, Lili He, Yinchuan Liu, Jiaxin Wu, Yuying Cao, Shoutian Dong and Yinjiang Jia
Sensors 2025, 25(16), 4954; https://doi.org/10.3390/s25164954 - 10 Aug 2025
Viewed by 860
Abstract
Early and accurate identification of maize diseases is crucial for ensuring sustainable agricultural development. However, existing maize disease identification models face challenges including high inter-class similarity, intra-class variability, and limited capability in identifying early-stage symptoms. To address these limitations, we proposed DSTANet (decomposed [...] Read more.
Early and accurate identification of maize diseases is crucial for ensuring sustainable agricultural development. However, existing maize disease identification models face challenges including high inter-class similarity, intra-class variability, and limited capability in identifying early-stage symptoms. To address these limitations, we proposed DSTANet (decomposed spatial token aggregation network), a lightweight and high-performance model for maize leaf disease identification. In this study, we constructed a comprehensive maize leaf image dataset comprising six common disease types and healthy samples, with early and late stages of northern leaf blight and eyespot specifically differentiated. DSTANet employed MobileViT as the backbone architecture, combining the advantages of CNNs for local feature extraction with transformers for global feature modeling. To enhance lesion localization and mitigate interference from complex field backgrounds, DSFM (decomposed spatial fusion module) was introduced. Additionally, the MSTA (multi-scale token aggregator) was designed to leverage hidden-layer feature channels more effectively, improving information flow and preventing gradient vanishing. Experimental results showed that DSTANet achieved an accuracy of 96.11%, precision of 96.17%, recall of 96.11%, and F1-score of 96.14%. With only 1.9M parameters, 0.6 GFLOPs (floating point operations), and an inference speed of 170 images per second, the model meets real-time deployment requirements on edge devices. This study provided a novel and practical approach for fine-grained and early-stage maize disease identification, offering technical support for smart agriculture and precision crop management. Full article
(This article belongs to the Section Smart Agriculture)
Show Figures

Figure 1

24 pages, 6378 KB  
Article
Comparative Analysis of Ensemble Machine Learning Methods for Alumina Concentration Prediction
by Xiang Xia, Xiangquan Li, Yanhong Wang and Jianheng Li
Processes 2025, 13(8), 2365; https://doi.org/10.3390/pr13082365 - 25 Jul 2025
Viewed by 1121
Abstract
In the aluminum electrolysis production process, the traditional cell control method based on cell voltage and series current can no longer meet the goals of energy conservation, consumption reduction, and digital-intelligent transformation. Therefore, a new digital cell control technology that is centrally dependent [...] Read more.
In the aluminum electrolysis production process, the traditional cell control method based on cell voltage and series current can no longer meet the goals of energy conservation, consumption reduction, and digital-intelligent transformation. Therefore, a new digital cell control technology that is centrally dependent on various process parameters has become an urgent demand in the aluminum electrolysis industry. Among them, the real-time online measurement of alumina concentration is one of the key data points for implementing such technology. However, due to the harsh production environment and limitations of current sensor technologies, hardware-based detection of alumina concentration is difficult to achieve. To address this issue, this study proposes a soft-sensing model for alumina concentration based on a long short-term memory (LSTM) neural network optimized by a weighted average algorithm (WAA). The proposed method outperforms BiLSTM, CNN-LSTM, CNN-BiLSTM, CNN-LSTM-Attention, and CNN-BiLSTM-Attention models in terms of predictive accuracy. In comparison to LSTM models optimized using the Grey Wolf Optimizer (GWO), Harris Hawks Optimization (HHO), Optuna, Tornado Optimization Algorithm (TOC), and Whale Migration Algorithm (WMA), the WAA-enhanced LSTM model consistently achieves significantly better performance. This superiority is evidenced by lower MAE and RMSE values, along with higher R2 and accuracy scores. The WAA-LSTM model remains stable throughout the training process and achieves the lowest final loss, further confirming the accuracy and superiority of the proposed approach. Full article
Show Figures

Figure 1

18 pages, 533 KB  
Article
Comparative Analysis of Deep Learning Models for Intrusion Detection in IoT Networks
by Abdullah Waqas, Sultan Daud Khan, Zaib Ullah, Mohib Ullah and Habib Ullah
Computers 2025, 14(7), 283; https://doi.org/10.3390/computers14070283 - 17 Jul 2025
Viewed by 1467
Abstract
The Internet of Things (IoT) holds transformative potential in fields such as power grid optimization, defense networks, and healthcare. However, the constrained processing capacities and resource limitations of IoT networks make them especially susceptible to cyber threats. This study addresses the problem of [...] Read more.
The Internet of Things (IoT) holds transformative potential in fields such as power grid optimization, defense networks, and healthcare. However, the constrained processing capacities and resource limitations of IoT networks make them especially susceptible to cyber threats. This study addresses the problem of detecting intrusions in IoT environments by evaluating the performance of deep learning (DL) models under different data and algorithmic conditions. We conducted a comparative analysis of three widely used DL models—Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Bidirectional LSTM (biLSTM)—across four benchmark IoT intrusion detection datasets: BoTIoT, CiCIoT, ToNIoT, and WUSTL-IIoT-2021. Each model was assessed under balanced and imbalanced dataset configurations and evaluated using three loss functions (cross-entropy, focal loss, and dual focal loss). By analyzing model efficacy across these datasets, we highlight the importance of generalizability and adaptability to varied data characteristics that are essential for real-world applications. The results demonstrate that the CNN trained using the cross-entropy loss function consistently outperforms the other models, particularly on balanced datasets. On the other hand, LSTM and biLSTM show strong potential in temporal modeling, but their performance is highly dependent on the characteristics of the dataset. By analyzing the performance of multiple DL models under diverse datasets, this research provides actionable insights for developing secure, interpretable IoT systems that can meet the challenges of designing a secure IoT system. Full article
(This article belongs to the Special Issue Application of Deep Learning to Internet of Things Systems)
Show Figures

Figure 1

19 pages, 9631 KB  
Article
Res2Former: Integrating Res2Net and Transformer for a Highly Efficient Speaker Verification System
by Defu Chen, Yunlong Zhou, Xianbao Wang, Sheng Xiang, Xiaohu Liu and Yijian Sang
Electronics 2025, 14(12), 2489; https://doi.org/10.3390/electronics14122489 - 19 Jun 2025
Viewed by 1868
Abstract
Speaker verification (SV) is an exceptionally effective method of biometric authentication. However, its performance is heavily influenced by the effectiveness of the extracted speaker features and their suitability for use in resource-limited environments. Transformer models and convolutional neural networks (CNNs), leveraging self-attention mechanisms, [...] Read more.
Speaker verification (SV) is an exceptionally effective method of biometric authentication. However, its performance is heavily influenced by the effectiveness of the extracted speaker features and their suitability for resource-limited environments. Transformer models, leveraging self-attention mechanisms, and convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in most natural language processing (NLP) and image recognition tasks. However, previous studies indicate that standalone Transformer and CNN architectures present distinct challenges in speaker verification. Specifically, while Transformer models deliver good results, they fail to meet the requirements of low-resource scenarios and computational efficiency. On the other hand, CNNs perform well in resource-constrained environments but suffer from significantly reduced recognition accuracy. Several existing approaches, such as Conformer, combine Transformers and CNNs but still face challenges related to high resource consumption and low computational efficiency. To address these issues, we propose a novel solution that enhances the Transformer model by introducing multi-scale convolutional attention and a Global Response Normalization (GRN)-based feed-forward network, resulting in a lightweight backbone architecture called the lightweight simple transformer (LST). We further improve LST by incorporating the Res2Net structure from CNNs, yielding the Res2Former model—a low-parameter, high-precision SV model. In Res2Former, we design and implement a time-frequency adaptive feature fusion (TAFF) mechanism that enables fine-grained feature propagation by fusing features at different depths at the frame level. Additionally, holistic fusion is employed for global feature propagation across the model. To enhance performance, multiple convergence methods are introduced, improving the overall efficacy of the SV system.
Experimental results on the VoxCeleb1-O, VoxCeleb1-E, VoxCeleb1-H, and Cn-Celeb(E) datasets demonstrate that Res2Former achieves excellent performance, with the Large configuration attaining Equal Error Rate (EER)/Minimum Detection Cost Function (minDCF) scores of 0.81%/0.08, 0.98%/0.11, 1.81%/0.17, and 8.39%/0.46, respectively. Notably, the Base configuration of Res2Former, with only 1.73M parameters, also delivers competitive results. Full article
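The GRN-based feed-forward block mentioned above builds on the Global Response Normalization operation introduced in ConvNeXt V2: each channel is rescaled by its global L2 response relative to the mean response across channels. A rough sketch follows, assuming (time, channel)-shaped features and scalar gamma/beta; Res2Former's actual tensor layout and learned parameters may differ:

```python
import numpy as np

def global_response_normalization(x, gamma=1.0, beta=0.0, eps=1e-6):
    """GRN sketch: normalize each channel by its global L2 response,
    rescale relative to the mean response, and keep a residual path.
    x: array of shape (time, channels)."""
    g = np.linalg.norm(x, axis=0, keepdims=True)       # per-channel L2 norm
    n = g / (g.mean(axis=-1, keepdims=True) + eps)     # relative response
    return gamma * (x * n) + beta + x                  # residual connection

x = np.random.randn(100, 8)   # 100 frames, 8 channels (illustrative sizes)
y = global_response_normalization(x)
```

When all channels respond equally, the relative response is 1 and the block reduces to a plain residual doubling; channels with above-average response are amplified, which is the feature-competition effect GRN is designed to induce.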
(This article belongs to the Special Issue New Advances in Embedded Software and Applications)
21 pages, 512 KB  
Article
Enhancing Sign Language Recognition Performance Through Coverage-Based Dynamic Clip Generation
by Taewan Kim and Bongjae Kim
Appl. Sci. 2025, 15(11), 6372; https://doi.org/10.3390/app15116372 - 5 Jun 2025
Cited by 1 | Viewed by 1794
Abstract
Sign Language Recognition (SLR) has made substantial progress through advances in deep learning and video-based action recognition. Conventional SLR systems typically segment input videos into a fixed number of clips (e.g., five clips per video), regardless of the video’s actual length, to meet [...] Read more.
Sign Language Recognition (SLR) has made substantial progress through advances in deep learning and video-based action recognition. Conventional SLR systems typically segment input videos into a fixed number of clips (e.g., five clips per video), regardless of the video’s actual length, to meet the fixed-length input requirements of deep learning models. While this approach simplifies model design and training, it fails to account for temporal variations inherent in sign language videos. Specifically, applying a fixed number of clips to videos of varying lengths can lead to significant information loss: longer videos suffer from excessive frame skipping, causing the model to miss critical gestural cues, whereas shorter videos require frame duplication, introducing temporal redundancy that distorts motion dynamics. To address these limitations, we propose a dynamic clip generation method that adaptively adjusts the number of clips during inference based on a novel coverage metric. This metric quantifies how effectively a clip selection captures the temporal information in a given video, enabling the system to maintain both temporal fidelity and computational efficiency. Experimental results on benchmark SLR datasets using multiple models (including 3D CNNs, R(2+1)D, Video Swin Transformer, and Multiscale Vision Transformers) demonstrate that our method consistently outperforms fixed clip generation methods. Notably, our approach achieves 98.67% accuracy with the Video Swin Transformer while reducing inference time by 28.57%. These findings highlight the effectiveness of coverage-based dynamic clip generation in improving both accuracy and efficiency, particularly for videos with high temporal variability. Full article
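The paper defines its own coverage metric; the sketch below uses an illustrative substitute (fraction of distinct frames sampled by evenly spaced fixed-length clips) with assumed parameters `clip_len=16`, a 0.9 coverage target, and a cap of 16 clips, just to show how a clip count could be grown per video until the target is met:

```python
def coverage(num_frames: int, num_clips: int, clip_len: int) -> float:
    """Fraction of distinct frames sampled when num_clips windows of
    clip_len frames are spread evenly across the video. Illustrative
    definition only; the paper's exact metric may differ."""
    if num_clips * clip_len >= num_frames:
        return 1.0
    sampled = set()
    stride = (num_frames - clip_len) / max(num_clips - 1, 1)
    for i in range(num_clips):
        start = round(i * stride)
        sampled.update(range(start, start + clip_len))
    return len(sampled) / num_frames

def dynamic_num_clips(num_frames: int, clip_len: int = 16,
                      target: float = 0.9, max_clips: int = 16) -> int:
    """Grow the clip count until the coverage target is reached."""
    for k in range(1, max_clips + 1):
        if coverage(num_frames, k, clip_len) >= target:
            return k
    return max_clips
```

A short video meets the target with few clips, while a long video is allotted more, which is the adaptive behavior the abstract describes.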
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
19 pages, 2889 KB  
Article
APO-CViT: A Non-Destructive Estrus Detection Method for Breeding Pigs Based on Multimodal Feature Fusion
by Jinghan Cai, Wenzheng Liu, Tonghai Liu, Fanzhen Wang, Zhihan Li, Xue Wang and Hua Li
Animals 2025, 15(7), 1067; https://doi.org/10.3390/ani15071067 - 7 Apr 2025
Cited by 1 | Viewed by 1001
Abstract
Detecting estrus in sows is important for improving pig reproductive performance and pig farm production efficiency levels. Traditional estrus detection methods are highly subjective and inaccurate, making it difficult to meet the demands of modern farming. This research developed a multimodal feature fusion [...] Read more.
Detecting estrus in sows is important for improving pig reproductive performance and pig farm production efficiency. Traditional estrus detection methods are highly subjective and inaccurate, making it difficult to meet the demands of modern farming. This research developed a multimodal feature fusion method that combines audio and thermal infrared image data to enhance the accuracy and robustness of estrus monitoring in breeding pigs. We designed the Adaptive-PIG-OESTUS-CNN-ViT model, which uses thermal infrared images and audio as inputs to a network model. By integrating the Vision Transformer and convolutional neural networks, the model extracted and fused features from multimodal data. An adaptive cross-attention mechanism was employed to automatically learn feature vectors representing the combined thermal infrared and audio data, which were then fed into an improved DenseNet network to identify estrus and non-estrus states in breeding pigs. The model achieved an accuracy of 98.92%, a recall rate of 95.83%, and an F1-score of 97.35%, effectively performing non-destructive estrus detection in breeding pigs. Compared with traditional estrus detection methods, this approach more accurately integrated data from different modalities to distinguish the estrus state of breeding pigs, providing an efficient, objective, and non-destructive means for sow estrus detection. Full article
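The adaptive cross-attention fusion can be illustrated with bare scaled-dot-product cross-attention between the two modalities: queries come from thermal-image tokens, keys and values from audio tokens. The learned projection matrices and the adaptive weighting of APO-CViT are omitted, so this is only a structural sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, audio_feats):
    """Each thermal-image token attends over all audio tokens and gathers
    the audio evidence most relevant to it. Projection weights (W_q, W_k,
    W_v) are omitted for brevity; the paper learns these end to end."""
    d = img_feats.shape[-1]
    scores = img_feats @ audio_feats.T / np.sqrt(d)   # (n_img, n_audio)
    weights = softmax(scores, axis=-1)                # attention over audio
    return weights @ audio_feats                      # fused (n_img, d)

img = np.random.randn(4, 32)    # 4 thermal-image tokens (illustrative)
aud = np.random.randn(6, 32)    # 6 audio tokens (illustrative)
fused = cross_attention(img, aud)
```

The fused tokens keep the image-side sequence length, so they can be passed on to a downstream classifier such as the improved DenseNet mentioned in the abstract.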
(This article belongs to the Section Pigs)
