Search Results (33)

Search Parameters:
Keywords = multi-FCN fusion

18 pages, 4927 KB  
Article
A Multi-Resolution Attention U-Net for Pavement Distress Segmentation in 3D Images: Architecture and Data-Driven Insights
by Haitao Gong, Jueqiang Tao, Xiaohua Luo and Feng Wang
Mathematics 2025, 13(17), 2752; https://doi.org/10.3390/math13172752 - 27 Aug 2025
Viewed by 1124
Abstract
High-resolution 3D pavement images have become a valuable data source for automated surface distress detection and assessment. However, accurately identifying and segmenting cracks from pavement images remains challenging due to factors such as low contrast and hair-like thinness. This study investigates key factors affecting segmentation performance and proposes a novel deep learning architecture designed to enhance segmentation robustness under these challenging conditions. The proposed model integrates a multi-resolution feature extraction stream with gated attention mechanisms to improve spatial awareness and selectively fuse information across feature levels. Extensive experiments on a 3D pavement dataset demonstrated that the proposed method outperformed several state-of-the-art architectures, including FCN, U-Net, DeepLab, DeepCrack, and CrackFormer. Compared with U-Net, it improved F1 from 0.733 to 0.780. The gains were most pronounced on thin cracks, with F1 rising from 0.531 to 0.626. Paired t-tests across folds showed that the method is statistically significantly better than U-Net and DeepCrack on Recall, IoU, Dice, and F1. These findings highlight the effectiveness of attention-guided, multi-scale feature fusion for robust crack segmentation using 3D pavement data.
(This article belongs to the Special Issue The Application of Deep Neural Networks in Image Processing)
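The abstract names gated attention over multi-scale features but gives no equations; the sketch below is a standard additive attention gate in the style of Attention U-Net, with all layer names and channel sizes as illustrative assumptions rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # project gating signal
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project skip features
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)        # scalar attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g, x):
        # g: decoder (gating) features, x: encoder skip features, same H x W assumed
        a = self.relu(self.w_g(g) + self.w_x(x))
        alpha = self.sigmoid(self.psi(a))  # (N, 1, H, W) weights in [0, 1]
        return x * alpha                   # suppress irrelevant skip activations

# toy usage
gate = AttentionGate(gate_ch=64, skip_ch=32, inter_ch=16)
g = torch.randn(1, 64, 128, 128)
x = torch.randn(1, 32, 128, 128)
print(gate(g, x).shape)  # torch.Size([1, 32, 128, 128])
```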

21 pages, 3621 KB  
Article
CSNet: A Remote Sensing Image Semantic Segmentation Network Based on Coordinate Attention and Skip Connections
by Jiahao Li, Hongguo Zhang, Liang Chen, Binbin He and Huaixin Chen
Remote Sens. 2025, 17(12), 2048; https://doi.org/10.3390/rs17122048 - 13 Jun 2025
Cited by 3 | Viewed by 2099
Abstract
In recent years, the continuous development of deep learning has significantly advanced its application in the field of remote sensing. However, the semantic segmentation of high-resolution remote sensing images remains challenging due to the presence of multi-scale objects and intricate spatial details, often leading to the loss of critical information during segmentation. To address this issue and enable fast and accurate segmentation of remote sensing images, we enhanced SegNet and named the resulting model CSNet. CSNet is built upon the SegNet architecture and incorporates a coordinate attention (CA) mechanism, which enables the network to focus on salient features and capture global spatial information, thereby improving segmentation accuracy and facilitating the recovery of spatial structures. Furthermore, skip connections are introduced between the encoder and decoder to directly transfer low-level features to the decoder. This promotes the fusion of semantic information at different levels, enhances the recovery of fine-grained details, and optimizes the gradient flow during training, effectively mitigating the vanishing gradient problem and improving training efficiency. Additionally, a hybrid loss function combining weighted cross-entropy and Dice loss is employed. To address the issue of class imbalance, several categories within the dataset are merged, and samples with an excessively high proportion of background pixels are removed. These strategies significantly enhance the segmentation performance, particularly for small-sample classes. Experimental results on the Five-Billion-Pixels dataset demonstrate that, while introducing only a modest increase in parameters compared to SegNet, CSNet achieves superior segmentation performance in terms of overall classification accuracy, boundary delineation, and detail preservation, outperforming established methods such as U-Net, FCN, DeepLabv3+, SegNet, ViT, HRNet, and BiFormer.
(This article belongs to the Section Remote Sensing Image Processing)
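Coordinate attention is a published mechanism (Hou et al., 2021), so a faithful minimal sketch is possible; the reduction ratio and normalization choices below are still assumptions about CSNet's particular configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate along width  -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate along height -> (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        xh = self.pool_h(x)                      # (N, C, H, 1): per-row context
        xw = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1): per-column context
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (N, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * ah * aw  # direction-aware channel reweighting

ca = CoordinateAttention(64)
print(ca(torch.randn(2, 64, 32, 48)).shape)  # torch.Size([2, 64, 32, 48])
```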

15 pages, 2557 KB  
Article
Precision-Driven Semantic Segmentation of Pipe Gallery Diseases Using PipeU-NetX: A Depthwise Separable Convolution Approach
by Wenbin Song, Hanqian Wu and Chunlin Pu
Computation 2025, 13(6), 143; https://doi.org/10.3390/computation13060143 - 10 Jun 2025
Viewed by 783
Abstract
To address the high labor cost, low detection efficiency, and insufficient accuracy of traditional pipe gallery disease detection methods, this paper proposes a deep learning-based pipe gallery disease segmentation model, PipeU-NetX. By introducing the innovative down-sampling module MD-U, up-sampling module SC-U, and feature fusion module FFM, the model optimizes the feature extraction and fusion process, reduces the loss of feature information, and achieves accurate segmentation of pipe gallery disease images. In comparison with the U-Net, FCN, and Deeplabv3+ models, PipeU-NetX achieved the best PA, MPA, FWIoU, and MIoU, which were 99.15%, 92.66%, 98.34%, and 87.63%, respectively. Compared with the benchmark model U-Net, the MIoU and MPA of the PipeU-NetX model increased by 4.64% and 3.92%, respectively, the number of parameters decreased by 23.71%, and the detection speed increased by 22.1%. The proposed PipeU-NetX model demonstrates strong multi-scale feature extraction and adaptive defect-region recognition, providing an effective solution for the intelligent monitoring of the pipe gallery environment and accurate disease segmentation.
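The title attributes the parameter savings to depthwise separable convolution; the block below is the textbook form of that operation (a per-channel depthwise conv followed by a 1x1 pointwise conv), not the paper's specific MD-U/SC-U/FFM modules, whose internals this listing does not describe.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # groups=in_ch makes each filter see a single input channel (depthwise)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # mix channels
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# roughly k*k*C + C*C' weights instead of k*k*C*C' for a standard conv
block = DepthwiseSeparableConv(16, 32)
print(block(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```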

18 pages, 4823 KB  
Article
ME-FCN: A Multi-Scale Feature-Enhanced Fully Convolutional Network for Building Footprint Extraction
by Hui Sheng, Yaoteng Zhang, Wei Zhang, Shiqing Wei, Mingming Xu and Yasir Muhammad
Remote Sens. 2024, 16(22), 4305; https://doi.org/10.3390/rs16224305 - 19 Nov 2024
Cited by 2 | Viewed by 1774
Abstract
The precise extraction of building footprints using remote sensing technology is increasingly critical for urban planning and development amid growing urbanization. However, given the complexity of building backgrounds, diverse scales, and varied appearances, accurately and efficiently extracting building footprints from various remote sensing images remains a significant challenge. In this paper, we propose a novel network architecture called ME-FCN, specifically designed to perceive and optimize multi-scale features to effectively address the challenge of extracting building footprints from complex remote sensing images. We introduce a Squeeze-and-Excitation U-Block (SEUB), which explores multi-scale semantic information in shallow feature maps in a cascaded fashion and incorporates channel attention to optimize features. In the network's deeper layers, we implement an Adaptive Multi-scale feature Enhancement Block (AMEB), which captures large receptive field information through concatenated atrous convolutions. Additionally, we develop a novel Dual Multi-scale Attention (DMSA) mechanism to further enhance the accuracy of cascaded features. DMSA captures multi-scale semantic features across both channel and spatial dimensions, suppresses redundant information, and realizes multi-scale feature interaction and fusion, thereby improving overall accuracy and efficiency. Comprehensive experiments on three datasets demonstrate that ME-FCN outperforms mainstream segmentation methods.
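The abstract says AMEB enlarges the receptive field through concatenated atrous convolutions; the sketch below stacks dilated convolutions serially and concatenates their outputs, with the dilation rates and the 1x1 fusion conv being guesses, since the actual block definition is not given here.

```python
import torch
import torch.nn as nn

class CascadedAtrous(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        outs, y = [], x
        for stage in self.stages:
            y = stage(y)       # each stage sees an ever larger receptive field
            outs.append(y)
        return self.fuse(torch.cat(outs, dim=1))  # merge all scales

print(CascadedAtrous(32)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```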

19 pages, 4803 KB  
Article
Rural Road Extraction in Xiong’an New Area of China Based on the RC-MSFNet Network Model
by Nanjie Yang, Weimeng Di, Qingyu Wang, Wansi Liu, Teng Feng and Xiaomin Tian
Sensors 2024, 24(20), 6672; https://doi.org/10.3390/s24206672 - 16 Oct 2024
Cited by 1 | Viewed by 1632
Abstract
High-resolution remote sensing imagery, reaching meter or sub-meter levels, provides essential data for extracting and identifying road information. However, rural roads are often narrow, elongated, and have blurred boundaries, with textures that resemble surrounding environments such as construction sites, vegetation, and farmland. These features often lead to incomplete extraction and low extraction accuracy of rural roads. To address these challenges, this study introduces the RC-MSFNet model, based on the U-Net architecture, to enhance rural road extraction performance. The RC-MSFNet model mitigates the vanishing gradient problem in deep networks by incorporating residual neural networks in the downsampling stage. In the upsampling stage, a connectivity attention mechanism is added after the dual convolution layers to improve the model's ability to capture road completeness and connectivity. Additionally, the bottleneck section replaces the traditional dual convolution layers with a multi-scale fusion atrous convolution module to capture features at various scales. The study focuses on rural roads in the Xiong'an New Area, China, using high-resolution imagery from China's Gaofen-2 satellite to construct the XARoads rural road dataset. Roads were extracted from the XARoads dataset and the public DeepGlobe dataset using the RC-MSFNet model and compared with models such as U-Net, FCN, SegNet, DeeplabV3+, R-Net, and RC-Net. Experimental results showed that: (1) the proposed method achieved precision (P), intersection over union (IOU), and completeness (COM) scores of 0.8350, 0.6523, and 0.7489, respectively, for rural road extraction in Xiong'an New Area, representing precision improvements of 3.8%, 6.78%, 7.85%, 2.14%, 0.58%, and 2.53% over U-Net, FCN, SegNet, DeeplabV3+, R-Net, and RC-Net; (2) the method excelled at extracting narrow roads and muddy roads with unclear boundaries, with fewer instances of omission or false extraction, demonstrating advantages in complex rural terrain and areas with indistinct road boundaries. Accurate rural road extraction can provide valuable reference data for urban development and planning in the Xiong'an New Area.
(This article belongs to the Section Sensor Networks)
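Since the abstract attributes RC-MSFNet's gradient stability to residual blocks in the downsampling stage, here is a minimal residual downsampling unit; the strides, widths, and projection shortcut are illustrative assumptions, not the paper's exact layout.

```python
import torch
import torch.nn as nn

class ResidualDown(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the shortcut matches the new shape
        self.skip = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))  # identity-style shortcut

print(ResidualDown(32, 64)(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 64, 32, 32])
```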

22 pages, 16731 KB  
Article
Advanced Global Prototypical Segmentation Framework for Few-Shot Hyperspectral Image Classification
by Kunming Xia, Guowu Yuan, Mengen Xia, Xiaosen Li, Jinkang Gui and Hao Zhou
Sensors 2024, 24(16), 5386; https://doi.org/10.3390/s24165386 - 21 Aug 2024
Cited by 3 | Viewed by 2183
Abstract
With the advancement of deep learning, related networks have shown strong performance for Hyperspectral Image (HSI) classification. However, these methods face two main challenges in HSI classification: (1) the inability to capture global information of HSI due to the restriction of patch input and (2) insufficient utilization of information from limited labeled samples. To overcome these challenges, we propose an Advanced Global Prototypical Segmentation (AGPS) framework. Within the AGPS framework, we design a patch-free feature extractor segmentation network (SegNet) based on a fully convolutional network (FCN), which processes the entire HSI to capture global information. To enrich the global information extracted by SegNet, we propose a Fusion of Lateral Connection (FLC) structure that fuses the low-level detailed features of the encoder output with the high-level features of the decoder output. Additionally, we propose an Atrous Spatial Pyramid Pooling-Position Attention (ASPP-PA) module to capture multi-scale spatial positional information. Finally, to explore more valuable information from limited labeled samples, we propose an advanced global prototypical representation learning strategy. Building upon the dual constraints of the global prototypical representation learning strategy, we introduce supervised contrastive learning (CL), which optimizes our network with three different constraints. Experimental results on three public datasets demonstrate that our method outperforms existing state-of-the-art methods.
(This article belongs to the Section Sensing and Imaging)
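The prototypical part of AGPS builds on the standard prototypical-learning rule: average each class's labeled embeddings into a prototype and classify by distance. The snippet below shows only that baseline rule; AGPS's additional global constraints and contrastive terms are not reproduced.

```python
import torch

def prototypes_from_support(features, labels, num_classes):
    # Class prototype = mean embedding of that class's labeled samples.
    return torch.stack([features[labels == c].mean(dim=0) for c in range(num_classes)])

def nearest_prototype_logits(features, protos):
    # Negative squared Euclidean distance acts as the classification logit,
    # the usual prototypical-network decision rule.
    return -torch.cdist(features, protos) ** 2

# toy example: 10 labeled embeddings, 3 classes, 16-d features
feats = torch.randn(10, 16)
labels = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])
protos = prototypes_from_support(feats, labels, num_classes=3)
pred = nearest_prototype_logits(torch.randn(5, 16), protos).argmax(dim=1)
print(pred)  # predicted class per query embedding
```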

15 pages, 3999 KB  
Article
DEU-Net: A Multi-Scale Fusion Staged Network for Magnetic Tile Defect Detection
by Yifan Huang, Zhiwen Huang and Tao Jin
Appl. Sci. 2024, 14(11), 4724; https://doi.org/10.3390/app14114724 - 30 May 2024
Cited by 5 | Viewed by 2032
Abstract
Surface defect detection is a critical task in the manufacturing industry to ensure product quality and machining efficiency. Image-based precise defect detection faces significant challenges because defects lack fixed shapes and detection is heavily influenced by lighting conditions. Addressing the efficiency demands of defect detection algorithms, which are often deployed on embedded devices, and the highly imbalanced pixel ratio between foreground and background, this paper introduces a multi-scale fusion staged U-shaped convolutional neural network (DEU-Net). The network provides segmentation results for defect anomalies while indicating the probability of defect presence, and it can be trained with fewer parameters, a crucial requirement for practical applications. The proposed model achieves an MIoU of 66.94 and an F1 score of 74.89 with a lower parameter count (36.675) and fewer FLOPs (19.714). Comparative analysis with FCN, U-Net, Deeplab v3+, U-Net++, Attention U-Net, and Trans U-Net demonstrates the superiority of the proposed approach in surface defect detection.
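For reference, the MIoU and F1 figures quoted above are conventionally derived from a confusion matrix as below; whether the paper micro- or macro-averages over classes is an assumption (macro here).

```python
import numpy as np

def miou_and_f1(conf):
    # conf[i, j] = pixels of true class i predicted as class j
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class but wrong
    fn = conf.sum(axis=1) - tp   # class pixels missed
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou.mean(), f1.mean()  # macro average over classes

# toy 2-class (background / defect) confusion matrix
conf = np.array([[9500, 120],
                 [  80, 300]])
miou, f1 = miou_and_f1(conf)
print(f"MIoU={miou:.4f}, F1={f1:.4f}")
```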

16 pages, 4121 KB  
Article
Short-Term Prediction of Time-Varying Passenger Flow for Intercity High-Speed Railways: A Neural Network Model Based on Multi-Source Data
by Huanyin Su, Shanglin Mo and Shuting Peng
Mathematics 2023, 11(16), 3446; https://doi.org/10.3390/math11163446 - 8 Aug 2023
Cited by 2 | Viewed by 2041
Abstract
The accurate prediction of passenger flow is crucial in improving the quality of service of intercity high-speed railways. At present, there are few studies on such predictions for railway origin–destination (O-D) pairs, and they usually consider only a single factor, yielding low prediction accuracy. In this paper, we propose a neural network model based on multi-source data (NN-MSD) to predict the O-D passenger flow of intercity high-speed railways at different times of the day in the short term, considering the factors of time, space, and weather. Firstly, the factors that influence time-varying passenger flow are analyzed based on multi-source data, and the cyclical characteristics, spatial and temporal fusion characteristics, and weather characteristics are extracted. Secondly, a neural network model comprising three modules is designed based on these characteristics: a fully connected network (FCN) model is used in the first module to process the classification data, a bi-directional Long Short-Term Memory (Bi-LSTM) model is used in the second module to process the time series data, and the outputs of the first two modules are concatenated and fused in the third module using an FCN model. Finally, an experimental analysis is performed for the Guangzhou–Zhuhai intercity high-speed railway in China, in which three groups of comparison experiments are designed. The results show that the proposed NN-MSD model can predict many O-D pairs with a high and stable accuracy, outperforming the baseline models, and that multi-source data are very helpful in improving the prediction accuracy.
(This article belongs to the Special Issue Advanced Methods in Intelligent Transportation Systems)
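The three-module structure described above translates naturally into code; the sketch below wires a small FCN over categorical features and a Bi-LSTM over the flow series into a fusing FCN. All dimensions, layer counts, and the use of the last LSTM step are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NNMSDSketch(nn.Module):
    def __init__(self, cat_dim=8, ts_dim=1, hidden=32):
        super().__init__()
        self.fcn1 = nn.Sequential(nn.Linear(cat_dim, hidden), nn.ReLU())  # module 1
        self.bilstm = nn.LSTM(ts_dim, hidden, batch_first=True,
                              bidirectional=True)                          # module 2
        self.fcn2 = nn.Sequential(                                          # module 3
            nn.Linear(hidden + 2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted O-D flow for one time slot
        )

    def forward(self, cat_feats, ts):
        a = self.fcn1(cat_feats)        # (N, hidden) from classification data
        out, _ = self.bilstm(ts)        # (N, T, 2*hidden) from the time series
        b = out[:, -1, :]               # last step summarizes the series
        return self.fcn2(torch.cat([a, b], dim=1))  # splice and fuse

model = NNMSDSketch()
flow = model(torch.randn(4, 8), torch.randn(4, 7, 1))
print(flow.shape)  # torch.Size([4, 1])
```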

15 pages, 4845 KB  
Article
Cattle Target Segmentation Method in Multi-Scenes Using Improved DeepLabV3+ Method
by Tao Feng, Yangyang Guo, Xiaoping Huang and Yongliang Qiao
Animals 2023, 13(15), 2521; https://doi.org/10.3390/ani13152521 - 4 Aug 2023
Cited by 9 | Viewed by 2988
Abstract
Obtaining animal regions and the relative position relationship of animals in a scene is conducive to further studying animal habits, which is of great significance for smart animal farming. However, complex breeding environments still make detection difficult. To address the poor segmentation performance and weak generalization ability of existing semantic segmentation models in complex scenes, a semantic segmentation model based on an improved DeepLabV3+ network (Imp-DeepLabV3+) was proposed. Firstly, the backbone network of the DeepLabV3+ model was replaced by MobileNetV2 to enhance the feature extraction capability of the model. Then, a layer-by-layer feature fusion method was adopted in the Decoder stage to integrate high-level semantic feature information with low-level high-resolution feature information at multiple scales to achieve a more precise up-sampling operation. Finally, the SENet module was further introduced into the network to enhance information interaction after feature fusion and improve the segmentation precision of the model on complex datasets. The experimental results demonstrate that the Imp-DeepLabV3+ model achieved a high pixel accuracy (PA) of 99.4%, a mean pixel accuracy (MPA) of 98.1%, and a mean intersection over union (MIoU) of 96.8%. Compared to the original DeepLabV3+ model, the segmentation performance of the improved model improved significantly. Moreover, the overall segmentation performance of the Imp-DeepLabV3+ model surpassed that of other commonly used semantic segmentation models, such as Fully Convolutional Networks (FCNs), Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP), and U-Net. Therefore, this study can be applied to the field of scene segmentation and is conducive to further analyzing individual information and promoting the development of intelligent animal farming.
(This article belongs to the Section Cattle)
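The SENet module mentioned above is the standard squeeze-and-excitation unit, so a faithful minimal version can be given; the reduction ratio of 16 is the usual default and an assumption about this paper's setting.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w  # excite: reweight channels by learned importance

se = SEBlock(64)
print(se(torch.randn(2, 64, 28, 28)).shape)  # torch.Size([2, 64, 28, 28])
```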

22 pages, 21052 KB  
Article
IRDC-Net: Lightweight Semantic Segmentation Network Based on Monocular Camera for Mobile Robot Navigation
by Thai-Viet Dang, Dinh-Manh-Cuong Tran and Phan Xuan Tan
Sensors 2023, 23(15), 6907; https://doi.org/10.3390/s23156907 - 3 Aug 2023
Cited by 23 | Viewed by 2974
Abstract
Computer vision plays a significant role in mobile robot navigation due to the wealth of information extracted from digital images. Mobile robots localize and move to the intended destination based on the captured images. Due to the complexity of the environment, obstacle avoidance still requires a complex sensor system with high computational efficiency. This study offers a real-time solution to the problem of extracting corridor scenes from a single image using a lightweight semantic segmentation model integrated with a quantization technique to reduce the number of training parameters and the computational cost. The proposed model consists of MobileNetV2 as the encoder and an FCN as the decoder (with multi-scale fusion). This combination allows us to significantly minimize computation time while achieving high precision. Moreover, in this study, we also propose using a balanced cross-entropy loss function to handle diverse datasets, especially those with class imbalance, and integrating a number of techniques, for example, the Adam optimizer and Gaussian filters, to enhance segmentation performance. The results demonstrate that our model outperforms baselines across different datasets. Moreover, when applied to practical experiments with a real mobile robot, the proposed model's performance remains consistent, supporting optimal path planning and allowing the mobile robot to efficiently and effectively avoid obstacles.
(This article belongs to the Special Issue Artificial Intelligence in Computer Vision: Methods and Applications)
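A common form of the balanced cross-entropy loss named above weights the positive and negative terms by the complementary class frequencies; the sketch below computes the balancing factor per batch, which is an assumption, as the abstract does not specify the exact formulation.

```python
import torch

def balanced_bce(logits, targets, eps=1e-6):
    # beta = fraction of background pixels; it upweights the rare positive class
    beta = 1.0 - targets.mean()
    p = torch.sigmoid(logits).clamp(eps, 1 - eps)
    loss = -(beta * targets * torch.log(p)
             + (1 - beta) * (1 - targets) * torch.log(1 - p))
    return loss.mean()

# toy usage: one 4x4 logit map vs. a binary corridor/obstacle mask
logits = torch.randn(1, 1, 4, 4)
mask = (torch.rand(1, 1, 4, 4) > 0.8).float()
print(balanced_bce(logits, mask))
```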

16 pages, 981 KB  
Article
Learning to Fuse Multiple Brain Functional Networks for Automated Autism Identification
by Chaojun Zhang, Yunling Ma, Lishan Qiao, Limei Zhang and Mingxia Liu
Biology 2023, 12(7), 971; https://doi.org/10.3390/biology12070971 - 8 Jul 2023
Cited by 7 | Viewed by 2224
Abstract
The functional connectivity network (FCN) has become a popular tool for identifying potential biomarkers of brain dysfunction, such as autism spectrum disorder (ASD). Due to its importance, researchers have proposed many methods to estimate FCNs from resting-state functional MRI (rs-fMRI) data. However, the existing FCN estimation methods usually capture only a single relationship between brain regions of interest (ROIs), e.g., linear correlation, nonlinear correlation, or higher-order correlation, thus failing to model the complex interactions among ROIs in the brain. Additionally, such traditional methods estimate FCNs in an unsupervised way, and the estimation process is independent of the downstream tasks, which makes it difficult to guarantee optimal performance for ASD identification. To address these issues, in this paper we propose a multi-FCN fusion framework for rs-fMRI-based ASD classification. Specifically, for each subject, we first estimate multiple FCNs using different methods to encode rich interactions among ROIs from different perspectives. Then, we use the label information (ASD vs. healthy control (HC)) to learn a set of fusion weights measuring the importance/discriminative power of those estimated FCNs. Finally, we apply the adaptively weighted fused FCN on the ABIDE dataset to identify subjects with ASD from HCs. The proposed FCN fusion framework is straightforward to implement and can significantly improve diagnostic accuracy compared to traditional and state-of-the-art methods.
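This entry is the query's namesake, and the pipeline it describes (several connectivity matrices per subject, label-guided fusion weights, a downstream classifier) admits a compact sketch. The softmax weighting and the linear classifier below are assumptions; the paper's actual constraints on the weights are not given in this abstract.

```python
import torch
import torch.nn as nn

class MultiFCNFusion(nn.Module):
    def __init__(self, num_fcns, num_rois, num_classes=2):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_fcns))       # softmax -> fusion weights
        self.clf = nn.Linear(num_rois * num_rois, num_classes)

    def forward(self, fcns):
        # fcns: (N, num_fcns, R, R) stack of connectivity matrices per subject,
        # e.g., Pearson, partial, and higher-order correlation estimates
        weights = torch.softmax(self.w, dim=0)             # learned importance per FCN
        fused = (weights.view(1, -1, 1, 1) * fcns).sum(dim=1)
        return self.clf(fused.flatten(1))                  # ASD vs. HC logits

model = MultiFCNFusion(num_fcns=3, num_rois=90)
logits = model(torch.randn(4, 3, 90, 90))
print(logits.shape)  # torch.Size([4, 2])
```

Training this end to end with a cross-entropy loss makes the fusion weights label-guided, matching the supervised flavor the abstract emphasizes over unsupervised FCN estimation.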

25 pages, 7447 KB  
Article
Synergy of Sentinel-1 and Sentinel-2 Imagery for Crop Classification Based on DC-CNN
by Kaixin Zhang, Da Yuan, Huijin Yang, Jianhui Zhao and Ning Li
Remote Sens. 2023, 15(11), 2727; https://doi.org/10.3390/rs15112727 - 24 May 2023
Cited by 13 | Viewed by 5436
Abstract
Over the years, remote sensing technology has become an important means of obtaining accurate agricultural production information, such as crop type distribution, due to its advantages of large coverage and a short observation period. Nowadays, the cooperative use of multi-source remote sensing imagery has become a new development trend in the field of crop classification. In this paper, the polarimetric components of Sentinel-1 (S-1), decomposed by a new model-based decomposition method adapted to dual-polarized SAR data, were introduced into crop classification for the first time. Furthermore, a Dual-Channel Convolutional Neural Network (DC-CNN) with feature extraction, feature fusion, and encoder-decoder modules for crop classification based on S-1 and Sentinel-2 (S-2) was constructed. The two branches can learn from each other by sharing parameters, so as to effectively integrate the features extracted from multi-source data and obtain a high-precision crop classification map. In the proposed method, the backscattering components (VV, VH) and polarimetric components (volume scattering, remaining scattering) were first obtained from S-1, and the multispectral feature was extracted from S-2. Four candidate combinations of multi-source features were formed from the above features, and the optimal one was selected experimentally. Next, the features of the optimal combination were input into the corresponding network branches. In the feature extraction module, the features with strong collaboration ability in multi-source data were learned by parameter sharing, and they were deeply fused in the feature fusion module and encoder-decoder module to obtain more accurate classification results. The experimental results showed that the polarimetric components, which increased the difference between crop categories and reduced the misclassification rate, played an important role in crop classification. Among the four candidate feature combinations, the combination of S-1 and S-2 features had a higher classification accuracy than any single data source, and the classification accuracy was highest when the two polarimetric components were utilized simultaneously. On the basis of the optimal combination of features, the effectiveness of the proposed method was verified. The classification accuracy of DC-CNN reached 98.40%, with Kappa scoring 0.98 and Macro-F1 scoring 0.98, compared to 2D-CNN (OA of 94.87%, Kappa of 0.92, and Macro-F1 of 0.95), FCN (OA of 96.27%, Kappa of 0.94, and Macro-F1 of 0.96), and SegNet (OA of 96.90%, Kappa of 0.95, and Macro-F1 of 0.97). The results of this study demonstrate that the proposed method has significant potential for crop classification.
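The two-branch, parameter-sharing idea above can be sketched as per-sensor stems feeding one shared trunk, with concatenation as the fusion step. Channel counts, depths, and the concat-then-classify head are all illustrative assumptions, not DC-CNN's published layout.

```python
import torch
import torch.nn as nn

class DualChannelSketch(nn.Module):
    def __init__(self, s1_ch=4, s2_ch=10, width=32, num_classes=5):
        super().__init__()
        self.stem_s1 = nn.Conv2d(s1_ch, width, 3, padding=1)  # SAR branch stem
        self.stem_s2 = nn.Conv2d(s2_ch, width, 3, padding=1)  # optical branch stem
        self.shared = nn.Sequential(                           # weights shared by both
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(2 * width, num_classes, 1)       # fuse + per-pixel classes

    def forward(self, s1, s2):
        f1 = self.shared(self.stem_s1(s1))  # same trunk weights for both sources
        f2 = self.shared(self.stem_s2(s2))
        return self.head(torch.cat([f1, f2], dim=1))

model = DualChannelSketch()
out = model(torch.randn(1, 4, 64, 64), torch.randn(1, 10, 64, 64))
print(out.shape)  # torch.Size([1, 5, 64, 64])
```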

17 pages, 4751 KB  
Article
Distract Your Attention: Multi-Head Cross Attention Network for Facial Expression Recognition
by Zhengyao Wen, Wenzhong Lin, Tao Wang and Ge Xu
Biomimetics 2023, 8(2), 199; https://doi.org/10.3390/biomimetics8020199 - 11 May 2023
Cited by 190 | Viewed by 12551
Abstract
This paper presents a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations in biological visual perception. Firstly, multiple facial expression classes share inherently similar underlying facial appearance, and their differences could be subtle. Secondly, facial expressions simultaneously exhibit themselves through multiple facial regions, and for recognition, a holistic approach by encoding high-order interactions among local features is required. To address these issues, this work proposes DAN with three key components: Feature Clustering Network (FCN), Multi-head Attention Network (MAN), and Attention Fusion Network (AFN). Specifically, FCN extracts robust features by adopting a large-margin learning objective to maximize class separability. In addition, MAN instantiates a number of attention heads to simultaneously attend to multiple facial areas and build attention maps on these regions. Further, AFN distracts these attentions to multiple locations before fusing the feature maps to a comprehensive one. Extensive experiments on three public datasets (including AffectNet, RAF-DB, and SFEW 2.0) verified that the proposed method consistently achieves state-of-the-art facial expression recognition performance. The DAN code is publicly available.
(This article belongs to the Special Issue Bio-Inspired Computing: Theories and Applications)
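As a rough illustration of the multi-head attention idea described above, the sketch below instantiates several convolutional heads that each produce a spatial attention map and averages the attended features; DAN's real MAN/AFN additionally use channel attention and a partition loss that keeps heads attending to distinct regions, which are omitted here.

```python
import torch
import torch.nn as nn

class MultiHeadSpatialAttention(nn.Module):
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, 3, padding=1), nn.Sigmoid(),
            )
            for _ in range(num_heads)
        )

    def forward(self, x):
        # each head yields one spatial map, ideally over a different facial region
        attended = [x * head(x) for head in self.heads]
        return torch.stack(attended, dim=0).mean(dim=0)  # simple AFN-like fusion

feat = torch.randn(2, 64, 14, 14)
print(MultiHeadSpatialAttention(64)(feat).shape)  # torch.Size([2, 64, 14, 14])
```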

22 pages, 9929 KB  
Article
High-Precision Segmentation of Buildings with Small Sample Sizes Based on Transfer Learning and Multi-Scale Fusion
by Xiaobin Xu, Haojie Zhang, Yingying Ran and Zhiying Tan
Remote Sens. 2023, 15(9), 2436; https://doi.org/10.3390/rs15092436 - 5 May 2023
Cited by 8 | Viewed by 3255
Abstract
In order to improve segmentation accuracy for buildings with small sample sizes, this paper proposes a building-segmentation network, ResFAUnet, with transfer learning and multi-scale feature fusion. The network is based on AttentionUnet. The backbone of the encoder is replaced by the ResNeXt101 network for feature extraction, and the attention mechanism of the skip connection is preserved to fuse the shallow features of the encoding part and the deep features of the decoding part. In the decoder, a feature-pyramid structure is used to fuse the feature maps of different scales, so that more features can be extracted from limited image samples. The proposed network is compared with classical semantic segmentation networks: Unet, SuUnet, FCN, and SegNet. The experimental results show that, on the dataset selected in this paper, the precision indicators of ResFAUnet improve by 4.77%, 2.3%, 2.11%, and 1.57%, respectively, compared with the four comparison networks.
(This article belongs to the Special Issue Remote Sensing and Machine Learning of Signal and Image Processing)
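The feature-pyramid fusion in the decoder can be sketched as the familiar top-down pathway: project each encoder scale with a 1x1 lateral conv, upsample the deeper map, and sum. The channel sizes below are placeholders; ResFAUnet's actual ResNeXt101 stage widths and attention gates are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusionDecoder(nn.Module):
    def __init__(self, in_chs=(256, 128, 64), out_ch=64):
        super().__init__()
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.smooth = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, feats):
        # feats: deepest first, each twice the resolution of the previous
        y = self.laterals[0](feats[0])
        for lateral, f in zip(self.laterals[1:], feats[1:]):
            y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
            y = y + lateral(f)          # top-down sum fusion of adjacent scales
        return self.smooth(y)

feats = [torch.randn(1, 256, 8, 8),
         torch.randn(1, 128, 16, 16),
         torch.randn(1, 64, 32, 32)]
print(PyramidFusionDecoder()(feats).shape)  # torch.Size([1, 64, 32, 32])
```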

20 pages, 5886 KB  
Article
A Novel Hybridoma Cell Segmentation Method Based on Multi-Scale Feature Fusion and Dual Attention Network
by Jianfeng Lu, Hangpeng Ren, Mengtao Shi, Chen Cui, Shanqing Zhang, Mahmoud Emam and Li Li
Electronics 2023, 12(4), 979; https://doi.org/10.3390/electronics12040979 - 16 Feb 2023
Cited by 41 | Viewed by 2862
Abstract
Hybridoma cell screening is usually performed manually by visual inspection during the production of monoclonal antibody drugs. This traditional screening method has certain limitations, such as low efficiency and subjectivity bias. Furthermore, most existing deep learning-based image segmentation methods have certain drawbacks due to the varied shapes and uneven spatial distribution of hybridoma cells. In this paper, we propose a deep hybridoma cell image segmentation method based on residual and attention U-Net (RA-UNet). Firstly, the feature maps of the five modules in the network encoder are used for multi-scale feature fusion in a feature-pyramid form and then spliced into the network decoder to enrich the semantic level of the feature maps in the decoder. Secondly, a dual attention mechanism module based on global and channel attention mechanisms is presented. The global attention mechanism (a non-local neural network) is connected to the network decoder to expand the receptive field of the feature map and bring richer information to the network. Then, the channel attention mechanism SENet (the squeeze-and-excitation network) is connected to the non-local attention mechanism. Consequently, important features are enhanced by learning the feature channel weights and secondary features are suppressed, improving cell segmentation performance and accuracy. Finally, the focal loss function is used to guide the network to learn the hard-to-classify cell categories. Furthermore, we evaluate the performance of the proposed RA-UNet method on a newly established hybridoma cell image dataset. Experimental results show that the proposed method is reliable and improves the efficiency of hybridoma cell segmentation compared with state-of-the-art networks such as FCN, UNet, and UNet++. Specifically, the proposed RA-UNet model achieves improvements of 0.8937%, 0.9926%, 0.9512%, and 0.9007% in terms of the Dice coefficient, PA, MPA, and MIoU, respectively.
(This article belongs to the Section Computer Science & Engineering)
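The focal loss used above is a published formulation (Lin et al., 2017), so a faithful binary version can be sketched; the gamma and alpha values below are the common defaults and are assumptions about this paper's settings.

```python
import torch

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    p = torch.sigmoid(logits)
    pt = p * targets + (1 - p) * (1 - targets)   # prob assigned to the true class
    at = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - pt)^gamma downweights easy examples, focusing on hard cells
    return (-at * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))).mean()

# toy usage: one 8x8 logit map vs. a binary cell mask
logits = torch.randn(1, 1, 8, 8)
mask = (torch.rand(1, 1, 8, 8) > 0.7).float()
print(focal_loss(logits, mask))
```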
