MDPI - Publisher of Open Access Journals

21 pages, 4949 KiB

Open Access Article

An Integrated Lightweight Neural Network Design and FPGA-Accelerated Edge Computing for Chili Pepper Variety and Origin Identification via an E-Nose

by Ziyu Guo, Yong Yin, Haolin Gu, Guihua Peng, Xueya Wang, Ju Chen and Jia Yan

Foods 2025, 14(15), 2612; https://doi.org/10.3390/foods14152612 - 25 Jul 2025

Viewed by 153

Abstract

A chili pepper variety and origin detection system that integrates a field-programmable gate array (FPGA) with an electronic nose (e-nose) is proposed in this paper to address the issues of variety confusion and origin ambiguity in the chili pepper market. The system uses the AIRSENSE PEN3 e-nose from Germany to collect gas data from thirteen different varieties of chili peppers and two specific varieties of chili peppers originating from seven different regions. Model training is conducted via the proposed lightweight convolutional neural network ChiliPCNN. By combining the strengths of a convolutional neural network (CNN) and a multilayer perceptron (MLP), the ChiliPCNN model achieves an efficient and accurate classification process, requiring only 268 parameters for chili pepper variety identification and 244 parameters for origin tracing, with 364 floating-point operations (FLOPs) and 340 FLOPs, respectively. The experimental results demonstrate that, compared with other advanced deep learning methods, the ChiliPCNN has superior classification performance and good stability. Specifically, ChiliPCNN achieves accuracy rates of 94.62% in chili pepper variety identification and 93.41% in origin tracing tasks involving Jiaoyang No. 6, with accuracy rates reaching as high as 99.07% for Xianjiao No. 301. These results fully validate the effectiveness of the model. To further increase the detection speed of the ChiliPCNN, its acceleration circuit is designed on the Xilinx Zynq7020 FPGA from the United States and optimized via fixed-point arithmetic and loop unrolling strategies. The optimized circuit reduces the latency to 5600 ns and consumes only 1.755 W of power, significantly improving the resource utilization rate and processing speed of the model. 
This system not only achieves rapid and accurate chili pepper variety and origin detection but also provides an efficient and reliable intelligent agricultural management solution, which is highly important for promoting the development of agricultural automation and intelligence.

(This article belongs to the Special Issue Development of Digital Equipment and Artificial Intelligence for Sustainable Food Systems)
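
The abstract names fixed-point arithmetic as one of the FPGA optimizations but does not state the word length used. The following is a minimal sketch, assuming a hypothetical format with 8 fractional bits, of how a floating-point dot product can be approximated with integer multiply-accumulate operations. The names `FRAC_BITS`, `to_fixed`, and `fixed_dot` are illustrative and do not come from the paper.

```python
FRAC_BITS = 8            # assumed number of fractional bits (not from the paper)
SCALE = 1 << FRAC_BITS   # 2**8 = 256


def to_fixed(x: float) -> int:
    """Quantize a float to a signed fixed-point integer by rounding."""
    return int(round(x * SCALE))


def fixed_dot(weights, inputs):
    """Dot product computed with integer multiply-accumulate only."""
    acc = 0
    for w, x in zip(weights, inputs):
        # Each product carries 2 * FRAC_BITS fractional bits.
        acc += to_fixed(w) * to_fixed(x)
    # Convert back to float only so the result can be inspected.
    return acc / (SCALE * SCALE)


def float_dot(weights, inputs):
    """Reference dot product in floating point."""
    return sum(w * x for w, x in zip(weights, inputs))
```

Values that are exact multiples of 1/256 pass through unchanged; other values pick up a small rounding error, which is the accuracy cost traded for cheaper integer hardware.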

16 pages, 2358 KiB

Open Access Article

A Hybrid Content-Aware Network for Single Image Deraining

by Guoqiang Chai, Rui Yang, Jin Ge and Yulei Chen

Computers 2025, 14(7), 262; https://doi.org/10.3390/computers14070262 - 4 Jul 2025

Viewed by 267

Abstract

Rain streaks degrade the quality of optical images and seriously affect the effectiveness of subsequent vision-based algorithms. Although convolutional neural networks (CNNs) and self-attention (SA) mechanisms have shown great success in single image deraining, unresolved issues remain regarding deraining performance and the large computational load. This paper combines the complementary advantages of CNN and SA and proposes a hybrid content-aware deraining network (CAD) to reduce complexity and generate high-quality results. Specifically, we construct the CADBlock, which comprises the content-aware convolution and attention mixer module (CAMM) and the multi-scale double-gated feed-forward module (MDFM). In CAMM, the attention mechanism is applied to intricate windows to generate abundant features, and simple convolution is applied to plain windows to reduce computational costs. In MDFM, multi-scale spatial features are fused through double gating to preserve local detail features and enhance image restoration capabilities. Furthermore, a four-token contextual attention module (FTCA) is introduced to explore the content information among neighbor keys to improve the representation ability. Both qualitative and quantitative validations on synthetic and real-world rain images demonstrate that the proposed CAD can achieve a competitive deraining performance.

(This article belongs to the Special Issue Machine Learning Applications in Pattern Recognition)

21 pages, 4725 KiB

Open Access Article

A Novel Open Circuit Fault Diagnosis for a Modular Multilevel Converter with Modal Time-Frequency Diagram and FFT-CNN-BIGRU Attention

by Ziyuan Zhai, Ning Wang, Siran Lu, Bo Zhou and Lei Guo

Machines 2025, 13(6), 533; https://doi.org/10.3390/machines13060533 - 19 Jun 2025

Viewed by 237

Abstract

Fault diagnosis is one of the most important issues for a modular multilevel converter (MMC). However, conventional solutions are deficient in two aspects. Firstly, they lack the necessary feature information. Secondly, they are incapable of performing open-circuit fault diagnosis of the modular multilevel converter with the requisite degree of accuracy. To solve these problems, an intelligent diagnosis method is proposed that integrates the modal time–frequency diagram and FFT-CNN-BiGRU-Attention. With the phase current and bridge arm voltage selected as the core fault parameters, the particle swarm optimization algorithm is used to optimize the variational mode decomposition parameters, and the fault signal is decomposed and reconstructed into sensitive feature components. The reconstructed signals are further transformed into modal time–frequency diagrams via continuous wavelet transform to fully retain the time–frequency domain features. In the model construction stage, the frequency-domain features are first extracted using the fast Fourier transform (FFT), and the local patterns are captured by a convolutional neural network (CNN); subsequently, the temporal correlations are analyzed using bidirectional gated recurrent units (BiGRUs), and an attention mechanism is introduced to strengthen the key features. Simulations show that the proposed method achieves 98.63% accuracy in locating faulty insulated gate bipolar transistors (IGBTs) in the sub-module, with second-level real-time response capability. Compared with recently published schemes, it maintains stable performance under complex working conditions such as noise interference and data imbalances, showing stronger robustness and practical value. This study provides a new idea for the intelligent operation and maintenance of power electronic devices, which can be extended to the fault diagnosis of other power equipment in the future.

(This article belongs to the Section Electromechanical Energy Conversion Systems)
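
The abstract describes extracting frequency-domain features with a Fourier transform before the CNN stage. As a hedged illustration of what such features look like, the sketch below computes a magnitude spectrum with the direct discrete Fourier transform definition. This is an O(n²) stand-in for the FFT, chosen to keep the example dependency-free. The test signal is synthetic and the function name `dft_magnitudes` is illustrative, not taken from the paper.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for k = 0..n-1 using the direct DFT definition."""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


# Synthetic signal: a sinusoid completing 3 cycles over 32 samples.
N = 32
signal = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
spectrum = dft_magnitudes(signal)

# Index of the strongest component in the lower half of the spectrum.
dominant_bin = max(range(N // 2), key=lambda k: spectrum[k])
```

For this input the energy concentrates in bin 3, matching the 3 cycles in the window; a real pipeline would feed such magnitudes, or a time–frequency image, to the downstream network.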

17 pages, 6081 KiB

Open Access Article

Research on Shale Oil Well Productivity Prediction Model Based on CNN-BiGRU Algorithm

by Yuan Pan, Xuewei Liu, Fuchun Tian, Liyong Yang, Xiaoting Gou, Yunpeng Jia, Quan Wang and Yingxi Zhang

Energies 2025, 18(10), 2523; https://doi.org/10.3390/en18102523 - 13 May 2025

Viewed by 366

Abstract

Unconventional reservoirs are characterized by intricate fluid-phase behaviors, and physics-based shale oil well productivity prediction models often exhibit substantial deviations due to oversimplified theoretical frameworks and challenges in parameter acquisition. Under these circumstances, data-driven approaches leveraging actual production datasets have emerged as viable alternatives for productivity forecasting. Nevertheless, conventional data-driven architectures suffer from structural simplicity, limited capacity for processing low-dimensional feature spaces, and exclusive applicability to intra-sequence learning paradigms (e.g., production-to-production sequence mapping). This fundamentally conflicts with the underlying principles of mechanistic modeling, which emphasize pressure-to-production sequence transformations. To address these limitations, we propose a hybrid deep learning architecture integrating convolutional neural networks with bidirectional gated recurrent units (CNN-BiGRU). The model incorporates dedicated input pathways: fully connected layers for feature embedding and convolutional operations for high-dimensional feature extraction. By implementing a sequence-to-sequence (seq2seq) architecture with encoder–decoder mechanisms, our framework enables cross-domain sequence learning, effectively bridging pressure dynamics with production profiles. The CNN-BiGRU model was implemented on the TensorFlow framework, with rigorous validation of model robustness and systematic evaluation of feature importance. Hyperparameter optimization via grid searching yielded optimal configurations, while field applications demonstrated operational feasibility. Comparative analysis revealed a mean relative error (MRE) of 16.11% between predicted and observed production values, substantiating the model’s predictive competence. This methodology establishes a novel paradigm for machine learning-driven productivity prediction in unconventional reservoir engineering. 

(This article belongs to the Section H: Geo-Energy)

20 pages, 6973 KiB

Open Access Article

Research on Water Quality Prediction Model Based on Spatiotemporal Weighted Fusion and Hierarchical Cross-Attention Mechanisms

by Jiaming Zhou, Ke Wei, Jiahuan Huang, Lin Yang and Junzhe Shi

Water 2025, 17(9), 1244; https://doi.org/10.3390/w17091244 - 22 Apr 2025

Viewed by 675

Abstract

In the context of drinking water safety assurance, water quality prediction faces challenges due to temporal fluctuations, seasonal cycles, and the impacts of sudden events. To address the cumulative prediction bias caused by the simplistic feature fusion of traditional methods, this study proposes a neural network architecture that integrates spatiotemporal features with a hierarchical cross-attention mechanism. The model constructs a parallel feature extraction framework that integrates BiGRUs (Bidirectional Gated Recurrent Units) and BiTCNs (Bidirectional Temporal Convolutional Networks). By incorporating a bidirectional spatiotemporal interaction mechanism, the model effectively captures long-term dependencies in time series and local associations in spatial topology. During the feature fusion phase, layer-by-layer weighting through learnable parameters enables adaptive spatiotemporal feature processing. A hierarchical cross-attention module is designed to achieve deep feature integration, enhancing the discriminative expression of spatial features while preserving the dynamics of time series. The experimental results demonstrate that when predicting water quality monitoring data from the Xidong Water Plant, this model excels in forecasting key indicators such as total phosphorus and total nitrogen. Compared to traditional hybrid models, it reduces the MSE (Mean Squared Error) by 33.35%, the MAE (Mean Absolute Error) by 38.05%, and the RMSE (Root Mean Square Error) by 19.35%, and increases the R² (coefficient of determination) by 2.15 percentage points. These results overcome the rigid and simplistic feature fusion of traditional methods and demonstrate the model’s advantages in prediction accuracy and generalization capability.

(This article belongs to the Section Water Quality and Contamination)
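
The four error measures quoted in the abstract have standard definitions. The sketch below computes them for a pair of small, made-up sequences. The function name `regression_metrics` and the sample values are illustrative only and are unrelated to the Xidong Water Plant data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE, and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because RMSE is the square root of MSE, a given percentage reduction in MSE on one series corresponds to a smaller percentage reduction in RMSE, which is why the two reported reductions differ.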

27 pages, 41478 KiB

Open Access Article

LO-MLPRNN: A Classification Algorithm for Multispectral Remote Sensing Images by Fusing Selective Convolution

by Xiangsuo Fan, Yan Zhang, Yong Peng, Qi Li, Xianqiang Wei, Jiabin Wang and Fadong Zou

Sensors 2025, 25(8), 2472; https://doi.org/10.3390/s25082472 - 14 Apr 2025

Viewed by 418

Abstract

To address the limitation of traditional deep learning algorithms in fully utilizing contextual information in multispectral remote sensing (RS) images, this paper proposes an improved vegetation cover classification algorithm called LO-MLPRNN, which integrates Large Selective Kernel Network (LSK) and Omni-Dimensional Dynamic Convolution (ODC) with a Multi-Layer Perceptron Recurrent Neural Network (MLPRNN). The algorithm employs parallel-connected ODC and LSK modules to adaptively adjust convolution kernel parameters across multiple dimensions and dynamically optimize spatial receptive fields, enabling multi-perspective feature fusion for efficient processing of multispectral band information. The extracted features are mapped to a high-dimensional space through a Gated Recurrent Unit (GRU) and fully connected layers, with nonlinear characteristics enhanced by activation functions, ultimately achieving pixel-level land cover classification. Experiments conducted on GF-2 (0.75 m) and Sentinel-2 (10 m) multispectral RS images from Liucheng County, Liuzhou City, Guangxi Province, demonstrate that LO-MLPRNN achieves overall accuracies of 99.11% and 99.43%, outperforming Vision Transformer (ViT) by 2.61% and 3.98%, respectively. Notably, the classification accuracy for sugarcane reaches 99.70% and 99.67% on the two datasets.

(This article belongs to the Special Issue Smart Image Recognition and Detection Sensors)

22 pages, 14368 KiB

Open Access Article

Global Ionospheric TEC Map Prediction Based on Multichannel ED-PredRNN

by Haijun Liu, Yan Ma, Huijun Le, Liangchao Li, Rui Zhou, Jian Xiao, Weifeng Shan, Zhongxiu Wu and Yalan Li

Atmosphere 2025, 16(4), 422; https://doi.org/10.3390/atmos16040422 - 4 Apr 2025

Viewed by 614

Abstract

High-precision total electron content (TEC) prediction can improve the accuracy of Global Navigation Satellite System (GNSS)-based applications. The existing deep learning models for TEC prediction mainly include long short-term memory (LSTM), convolutional long short-term memory (ConvLSTM), and their variants, which contain only one temporal memory. These models may produce fuzzy predictions because they neglect spatial memory, which is crucial for capturing the correlations of TEC within the TEC neighborhood. In this paper, we draw inspiration from the predictive recurrent neural network (PredRNN), which has dual memory states, to construct a TEC prediction model named Multichannel ED-PredRNN. The highlights of our work include the following: (1) for the first time, a dual memory mechanism was utilized in TEC prediction, which can more fully capture the temporal and spatial features; (2) we modified the n vs. n structure of the original PredRNN to an encoder–decoder structure, so as to handle the problem of unequal input and output lengths in TEC prediction; and (3) we expanded the feature channels by extending the Kp, Dst, and F10.7 indices to the same spatiotemporal resolution as global TEC maps and overlaying them to form multichannel features, so as to fully utilize the influence of solar and geomagnetic activities on TEC. The proposed Multichannel ED-PredRNN was compared with COPG, ConvLSTM, and convolutional gated recurrent unit (ConvGRU) from multiple perspectives on a six-year data set, including comparisons at different solar activities, time periods, latitude regions, single stations, and geomagnetic storm periods. The results show that in almost all cases, the proposed Multichannel ED-PredRNN outperforms the three comparative models, indicating that it can more fully utilize temporal and spatial features to improve the accuracy of TEC prediction.

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
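
Highlight (3) of the abstract describes bringing scalar indices (Kp, Dst, F10.7) to the resolution of the TEC maps and stacking them as extra channels. The abstract does not say how the extension is performed, so the sketch below assumes the simplest option: each index value is repeated over the whole latitude-longitude grid for one time step. The function names and the tiny 2 × 3 grid are illustrative assumptions.

```python
def broadcast_index(value, n_lat, n_lon):
    """Expand one scalar index value into a constant n_lat x n_lon map."""
    return [[value] * n_lon for _ in range(n_lat)]


def stack_channels(tec_map, kp, dst, f107):
    """Return a channel-first frame [TEC, Kp, Dst, F10.7] for one time step."""
    n_lat, n_lon = len(tec_map), len(tec_map[0])
    extra = [broadcast_index(v, n_lat, n_lon) for v in (kp, dst, f107)]
    return [tec_map] + extra


# Made-up values on a tiny grid, purely for illustration.
tec = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
frame = stack_channels(tec, kp=2.0, dst=-15.0, f107=120.0)
```

The result has four channels of identical spatial shape, which is the form a convolutional recurrent model expects as input.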

21 pages, 866 KiB

Open Access Article

An Event Recognition Method for a Φ-OTDR System Based on CNN-BiGRU Network Model with Attention

by Changli Li, Xiaoyu Chen and Yi Shi

Photonics 2025, 12(4), 313; https://doi.org/10.3390/photonics12040313 - 28 Mar 2025

Viewed by 628

Abstract

The phase-sensitive optical time domain reflectometry (Φ-OTDR) technique offers a method for distributed acoustic sensing (DAS) systems to detect external acoustic fluctuations and mechanical vibrations. By accurately identifying vibration events, DAS systems provide a non-invasive solution for security monitoring. However, limitations in temporal signal analysis and the lack of spatial features significantly impact classification accuracy in event recognition. To address these challenges, this paper proposes a network model for vibration-event recognition that integrates convolutional neural networks (CNNs), bidirectional gated recurrent units (BiGRUs), and attention mechanisms, referred to as CNN-BiGRU-Attention (CBA). First, the CBA model processes spatiotemporal matrices converted from raw signals, extracting low-level features through convolution and pooling. Subsequently, features are further extracted and separated along both the temporal and spatial dimensions. In the spatial-dimension branch, horizontal convolution and pooling generate enhanced spatial feature maps. In the temporal-dimension branch, vertical convolution and pooling are followed by BiGRU processing to capture dynamic changes in vibration events from both past and future contexts. Additionally, the attention mechanism focuses on extracted features in both dimensions. The features from the two dimensions are then fused using two cross-attention mechanisms. Finally, classification probabilities are output through a fully connected layer and a softmax activation function. In the experimental simulation section, the model is validated using real-world data. A comparison with four other typical models demonstrates that the proposed CBA model offers significant advantages in both recognition accuracy and robustness.

(This article belongs to the Special Issue Distributed Optical Fiber Sensing Technology)

28 pages, 4882 KiB

Open Access Article

A Daily Runoff Prediction Model for the Yangtze River Basin Based on an Improved Generative Adversarial Network

by Tong Liu, Xudong Cui and Li Mo

Sustainability 2025, 17(7), 2990; https://doi.org/10.3390/su17072990 - 27 Mar 2025

Viewed by 423

Abstract

Hydrological runoff prediction plays a crucial role in water resource management and sustainable development. However, it is often constrained by the nonlinearity, strong stochasticity, and high non-stationarity of hydrological data, as well as the limited accuracy of traditional forecasting methods. Although Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP) have been widely used for data augmentation to enhance predictive model training, their direct application as forecasting models remains limited. Additionally, the architectures of the generator and discriminator in WGAN-GP have not been fully optimized, and their potential in hydrological forecasting has not been thoroughly explored. Meanwhile, the strategy of jointly optimizing Variational Autoencoders (VAEs) with WGAN-GP is still in its infancy in this field. To address these challenges and promote more accurate and sustainable water resource planning, this study proposes a comprehensive forecasting model, VXWGAN-GP, which integrates Variational Autoencoders (VAEs), WGAN-GP, Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory Networks (BiLSTM), Gated Recurrent Units (GRUs), and Attention mechanisms. The VAE enhances feature representation by learning the data distribution and generating new features, which are then combined with the original features to improve predictive performance. The generator integrates GRU, BiLSTM, and Attention mechanisms: GRU captures short-term dependencies, BiLSTM captures long-term dependencies, and Attention focuses on critical time steps to generate forecasting results. The discriminator, based on CNN, evaluates the differences between the generated and real data through adversarial training, thereby optimizing the generator’s forecasting ability and achieving high-precision runoff prediction. 
This study conducts daily runoff prediction experiments at the Yichang, Cuntan, and Pingshan hydrological stations in the Yangtze River Basin. The results demonstrate that VXWGAN-GP significantly improves the quality of input features and enhances runoff prediction accuracy, offering a reliable tool for sustainable hydrological forecasting and water resource management. By providing more precise and robust runoff predictions, this model contributes to long-term water sustainability and resilience in hydrological systems.

19 pages, 13377 KiB

Open Access Article

Research on Offshore Vessel Trajectory Prediction Based on PSO-CNN-RGRU-Attention

by Wei Liu and Yu Cao

Appl. Sci. 2025, 15(7), 3625; https://doi.org/10.3390/app15073625 - 26 Mar 2025

Viewed by 378

Abstract

In busy offshore waters with high vessel density and intersecting shipping lanes, the risk of collisions and accidents is significantly increased. To address the insufficient feature extraction capability of traditional recurrent neural networks (RNNs) in ship trajectory prediction in busy nearshore areas, this paper proposes a hybrid model based on Particle Swarm Optimization (PSO), Convolutional Neural Networks (CNN), Residual Networks, Attention Mechanism, and Gated Recurrent Units (GRU), named PSO-CNN-RGRU-Attention. This study utilizes real Automatic Identification System (AIS) data, applies the PSO algorithm to optimize the model and determine the optimal parameters, and uses a sliding window method to construct the input and output sequences for prediction. The effectiveness and practicality of the model have been verified experimentally. Experimental results show that, compared to the PSO-CNN-GRU model, the proposed model improves longitude prediction by 7.8%, 3.4%, and 1.7% in terms of Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), respectively, and improves latitude prediction by 48.3%, 62.9%, and 39.2%, respectively. This contributes to enhancing the safety of ship navigation in the Bohai Strait.

(This article belongs to the Section Marine Science and Engineering)
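
The abstract mentions a sliding window method for building the inputs and outputs of the predictor but gives no window sizes. The sketch below shows the general idea under assumed lengths: each window of `input_len` consecutive points is paired with the `output_len` points that follow it. The function name, parameters, and sample sequence are illustrative, not taken from the paper.

```python
def sliding_windows(series, input_len, output_len, step=1):
    """Split a sequence into (input, target) pairs of consecutive points."""
    pairs = []
    last_start = len(series) - input_len - output_len
    for start in range(0, last_start + 1, step):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + output_len]
        pairs.append((x, y))
    return pairs


# A toy sequence standing in for one coordinate of a vessel track.
track = [0, 1, 2, 3, 4, 5]
samples = sliding_windows(track, input_len=3, output_len=1)
```

With six points, a window of three, and a one-step target, this yields three training pairs. A sequence shorter than `input_len + output_len` yields none.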

23 pages, 5392 KiB

Open Access Article

A Sliding Window-Based CNN-BiGRU Approach for Human Skeletal Pose Estimation Using mmWave Radar

by Yuquan Luo, Yuqiang He, Yaxin Li, Huaiqiang Liu, Jun Wang and Fei Gao

Sensors 2025, 25(4), 1070; https://doi.org/10.3390/s25041070 - 11 Feb 2025

Viewed by 1198

Abstract

In this paper, we present a low-cost, low-power millimeter-wave (mmWave) skeletal joint localization system. High-quality point cloud data are generated using the self-developed BHYY_MMW6044 59–64 GHz mmWave radar device. A sliding window mechanism is introduced to extend the single-frame point cloud into multi-frame time-series data, enabling the full utilization of temporal information. This is combined with convolutional neural networks (CNNs) for spatial feature extraction and a bidirectional gated recurrent unit (BiGRU) for temporal modeling. The proposed spatio-temporal information fusion framework for multi-frame point cloud data fully exploits spatio-temporal features, effectively alleviates the sparsity issue of radar point clouds, and significantly enhances the accuracy and robustness of pose estimation. Experimental results demonstrate that the proposed system accurately detects 25 skeletal joints, particularly improving the positioning accuracy of fine joints, such as the wrist, thumb, and fingertip, highlighting its potential for widespread application in human–computer interaction, intelligent monitoring, and motion analysis.

(This article belongs to the Section Radar Sensors)

20 pages, 8886 KiB

Open Access Article

Multi-Scale Hierarchical Feature Fusion for Infrared Small-Target Detection

by Yue Wang, Xinhong Wang, Shi Qiu, Xianghui Chen, Zhaoyan Liu, Chuncheng Zhou, Weiyuan Yao, Hongjia Cheng, Yu Zhang, Feihong Wang and Zhan Shu

Remote Sens. 2025, 17(3), 428; https://doi.org/10.3390/rs17030428 - 27 Jan 2025

Cited by 2 | Viewed by 1317

Abstract

Detecting small targets in infrared images presents significant challenges due to their tiny size and complex backgrounds, making this task a hotspot for research. Traditional methods rely on assumption-based modeling and manual design, struggling to handle the variability of real-world scenarios. Although convolutional neural networks (CNNs) increase robustness to diverse scenes with a data-driven paradigm, many CNN-based methods are insufficient in capturing fine-grained details necessary for small targets and are less effective during multi-scale feature fusion. To overcome these challenges, we propose the novel Wide-scale Gated Fully Fusion Network (WGFFNet) in this article, which contributes to infrared small-target detection (IRSTD). WGFFNet uses a classic encoder–decoder structure, where the designed stepped fusion block (SFB) embedded in the feature extraction stage captures finer local context across multiple scales during encoding, and along the decoding path, the multi-level features are progressively integrated by a Fully Gated Interaction (FGI) Module to enhance feature representation. The inclusion of a boundary difference loss further optimizes the edge details of targets. We conducted comprehensive experiments on two public infrared small-target datasets: SIRST-V2 and IRSTD-1k. Quantitative and qualitative results demonstrate that our WGFFNet outperforms representative methods when considering various evaluation metrics together, achieving an improved detection performance and computational efficiency for detecting small targets in infrared images.

17 pages, 5862 KiB

Open Access Article

A Short-Term Power Load Forecasting Method Using CNN-GRU with an Attention Mechanism

by Qingbo Hua, Zengliang Fan, Wei Mu, Jiqiang Cui, Rongxin Xing, Huabo Liu and Junwei Gao

Energies 2025, 18(1), 106; https://doi.org/10.3390/en18010106 - 30 Dec 2024

Cited by 9 | Viewed by 1839

Abstract

This paper proposes a short-term electric load forecasting method combining convolutional neural networks (CNNs) and gated recurrent units (GRUs) with an attention mechanism. By integrating CNNs and GRUs, the method leverages the strengths of CNNs in feature extraction and the advantages of GRUs in sequence modeling, enabling the model to interpret signal data more thoroughly and extract features from sequential data effectively. The attention mechanism allows the model to dynamically focus on important parts of the input data while ignoring the unimportant parts. This capability enables the model to utilize input information more efficiently, thereby enhancing model performance. This paper applies the proposed model to a dataset comprising regional electric load and meteorological data for experimentation. The results show that compared with other common models, the proposed model effectively reduces the mean absolute error and root mean square error (121.51 and 263.43, respectively) and accurately predicts the short-term change in regional power load.

(This article belongs to the Section F: Electrical Engineering)
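
The abstract says the attention mechanism lets the model focus on important parts of the input, without specifying the scoring function. The sketch below assumes one common form: each time step's hidden vector is scored by a dot product with a query vector, the scores are normalized with softmax, and the hidden vectors are summed with those weights. The names `attention_pool`, `hidden`, and `query` are hypothetical, and the numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(dot(h_t, query)) and sum the states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Three time steps of a 2-dimensional hidden state (e.g. GRU outputs).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden, query)
```

Here the third time step receives the largest weight because it aligns best with the query. In a trained model the query, or an equivalent scoring layer, would be learned rather than fixed.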

17 pages, 2272 KiB

Open Access Article

Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

by Okpala Chibuike and Xiaopeng Yang

Diagnostics 2024, 14(24), 2790; https://doi.org/10.3390/diagnostics14242790 - 12 Dec 2024

Cited by 4 | Viewed by 2771

Abstract

Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critical in identifying fine-grained patterns, while the local receptive fields of CNNs make it difficult to capture long-range dependencies and thus the spatial relationships across lung regions. Methods: In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts the local features, which are then passed through the ViT to capture the global dependencies. We implement a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize important features, thereby enhancing the overall feature representation. Furthermore, we incorporate a multi-scale fusion module (MSFM) to fuse features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

16 pages, 952 KiB

Open Access Feature Paper Article

SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals

by Davide Lillini, Carlo Aironi, Lucia Migliorelli, Leonardo Gabrielli and Stefano Squartini

Sensors 2024, 24(23), 7782; https://doi.org/10.3390/s24237782 - 5 Dec 2024

Viewed by 1251

Abstract

Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography (PSG), an intrusive procedure that depends on subjective assessment by expert clinicians. To address the limitations of PSG, we propose a decision support system, which uses a tracheal microphone for data collection and a deep learning (DL) approach, namely SiCRNN, to detect apnea events during overnight sleep recordings. Our proposed SiCRNN processes Mel spectrograms using a Siamese approach, integrating a convolutional neural network (CNN) backbone and a bidirectional gated recurrent unit (GRU). The final detection of apnea events is performed using an unsupervised clustering algorithm, specifically k-means. Multiple experimental runs were carried out to determine the optimal network configuration and the most suitable type and frequency range for the input data. Tests with data from eight patients showed that our method can achieve a Recall score of up to 95% for apnea events. We also compared the proposed approach to a fully convolutional baseline, recently introduced in the literature, highlighting the effectiveness of the Siamese training paradigm in improving the identification of SAS.

(This article belongs to the Special Issue Wearable Sensors and Artificial Intelligence for Measuring Human Vital Signs: 2nd Edition)

Search Results (72)
