Search Results (205)

Search Parameters:
Keywords = mask region-based convolutional neural network

39 pages, 5615 KB  
Article
A Method for Reconstructing and Predicting the Volume of Bowl-Type Tableware and Its Application in Dietary Analysis
by Xu Ji, Kai Song, Lianzheng Sun, Haolin Lu, Hengyuan Zhang and Yiran Feng
Symmetry 2026, 18(1), 199; https://doi.org/10.3390/sym18010199 - 21 Jan 2026
Viewed by 75
Abstract
To overcome the low accuracy of conventional methods for estimating liquid volume and food nutrient content in bowl-type tableware, as well as the tool dependence and time-consuming nature of manual measurements, this study proposes an integrated approach that combines geometric reconstruction with deep learning–based segmentation. After a one-time camera calibration, only a frontal and a top-down image of a bowl are required. The pipeline automatically extracts key geometric information, including rim diameter, base diameter, bowl height, and the inner-wall profile, to complete geometric modeling and capacity computation. The estimated parameters are stored in a reusable bowl database, enabling repeated predictions of liquid volume and food nutrient content at different fill heights. We further propose Bowl Thick Net to predict bowl wall thickness with millimeter-level accuracy. In addition, we developed a Geometry-aware Feature Pyramid Network (GFPN) module and integrated it into an improved Mask R-CNN (Region-based Convolutional Neural Network) framework to enable precise segmentation of bowl contours. By integrating the contour mask with the predicted bowl wall thickness, precise geometric parameters for capacity estimation can be obtained. Liquid volume is then predicted using the geometric relationship of the liquid or food surface, while food nutrient content is estimated by coupling predicted food weight with a nutritional composition database. Experiments demonstrate an arithmetic mean error of −3.03% for bowl capacity estimation, a mean liquid-volume prediction error of 9.24%, and a mean nutrient-content (by weight) prediction error of 11.49% across eight food categories. Full article
(This article belongs to the Section Computer)
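The capacity computation treats the bowl as a solid of revolution about its vertical axis, integrating the extracted inner-wall profile. A minimal sketch of that stage in Python, assuming a hypothetical radius curve r(h) in place of the profile the pipeline extracts from the two calibrated images:

```python
# Sketch: bowl capacity as a solid of revolution, V = integral of pi * r(h)^2 dh.
# The inner-wall profile below is a hypothetical stand-in, not the paper's data.
import numpy as np

def capacity_ml(heights_mm, radii_mm):
    """Trapezoidal integration of pi * r(h)^2 over the inner-wall profile."""
    h = np.asarray(heights_mm, dtype=float)
    a = np.pi * np.asarray(radii_mm, dtype=float) ** 2   # cross-section area
    return float(np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(h))) / 1000.0  # mm^3 -> mL

def volume_at_fill_ml(heights_mm, radii_mm, fill_mm):
    """Liquid volume when filled to height fill_mm above the inner base."""
    h = np.asarray(heights_mm, dtype=float)
    keep = h <= fill_mm
    return capacity_ml(h[keep], np.asarray(radii_mm, dtype=float)[keep])

h = np.linspace(0.0, 50.0, 101)        # assumed inner height: 50 mm
r = 30.0 + 30.0 * np.sqrt(h / 50.0)    # assumed radius: 30 mm base -> 60 mm rim
print(f"capacity = {capacity_ml(h, r):.0f} mL, half fill = {volume_at_fill_ml(h, r, 25.0):.0f} mL")
```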

19 pages, 2524 KB  
Article
Brain Tumour Classification Model Based on Spatial Block–Residual Block Collaborative Architecture with Strip Pooling Feature Fusion
by Meilan Tang, Xinlian Zhou and Zhiyong Li
J. Imaging 2025, 11(12), 427; https://doi.org/10.3390/jimaging11120427 - 29 Nov 2025
Viewed by 395
Abstract
Precise classification of brain tumors is crucial for early diagnosis and treatment, but obtaining tumor masks is extremely challenging, limiting the application of traditional methods. This paper proposes a brain tumor classification model based on whole-brain images, combining a spatial block–residual block collaborative architecture with strip pooling feature fusion to achieve multi-scale feature representation without requiring tumor masks. The model extracts fine-grained morphological features through three shallow VGG spatial blocks while capturing global contextual information between tumors and surrounding tissues via four deep ResNet residual blocks. Residual connections mitigate gradient vanishing. To effectively fuse multi-level features, strip pooling modules are introduced after the third spatial block and fourth residual block, enabling cross-layer feature integration and particularly improving the representation of irregular tumor regions. The fused features undergo cross-scale concatenation, integrating both spatial perception and semantic information, and are ultimately classified via an end-to-end Softmax classifier. Experimental results demonstrate that the model achieves an accuracy of 97.29% in brain tumor image classification tasks, significantly outperforming traditional convolutional neural networks. These results validate its effectiveness in achieving high-precision, multi-scale feature learning and classification without brain tumor masks, indicating potential clinical value. Full article
(This article belongs to the Section Medical Imaging)
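Strip pooling is a published module (Hou et al., CVPR 2020) that this architecture builds on. A minimal PyTorch sketch of such a block follows; the channel count and kernel sizes are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of a strip pooling block: pool along each spatial axis, refine the two
# strips, broadcast them back, and gate the input features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.pool_col = nn.AdaptiveAvgPool2d((None, 1))   # (B, C, H, 1): pool over width
        self.pool_row = nn.AdaptiveAvgPool2d((1, None))   # (B, C, 1, W): pool over height
        self.conv_col = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.conv_row = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.fuse = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                  nn.BatchNorm2d(channels))

    def forward(self, x):
        _, _, h, w = x.shape
        col = self.conv_col(self.pool_col(x)).expand(-1, -1, h, w)
        row = self.conv_row(self.pool_row(x)).expand(-1, -1, h, w)
        return x * torch.sigmoid(self.fuse(F.relu(col + row)))  # gated features

print(StripPooling(64)(torch.randn(2, 64, 56, 56)).shape)  # torch.Size([2, 64, 56, 56])
```

Because each strip spans an entire row or column, the gate captures long, narrow context that square pooling windows miss, which is why it suits irregular tumor regions.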

16 pages, 10443 KB  
Article
A Machine Learning-Based Model for Classifying the Shape of Tomato
by Trang-Thi Ho, Rosdyana Mangir Irawan Kusuma, Van Lam Ho and Hsiang Yin Wen
AgriEngineering 2025, 7(11), 373; https://doi.org/10.3390/agriengineering7110373 - 5 Nov 2025
Viewed by 928
Abstract
Most fruit classification studies rely on color-based features, but shape-based analysis provides a promising alternative for distinguishing subtle variations within the same variety. Tomato shape classification is challenging due to irregular contours, variable imaging conditions, and difficulty in extracting consistent geometric features. In this study, we propose an efficient and structured workflow to address these challenges through contour-based analysis. The process begins with the application of a Mask Region-based Convolutional Neural Network (Mask R-CNN) model to accurately isolate tomatoes from the background. Subsequently, the segmented tomatoes are extracted and encoded using Elliptic Fourier Descriptors (EFDs) to capture detailed shape characteristics. These features are used to train a range of machine learning models, including Support Vector Machine (SVM), Random Forest, One-Dimensional Convolutional Neural Network (1D-CNN), and Bidirectional Encoder Representations from Transformers (BERT). Experimental results show that the Random Forest model achieved the highest accuracy of 79.4%. This approach offers a robust, interpretable, and quantitative framework for tomato shape classification, reducing manual labor and supporting practical agricultural applications. Full article
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
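A sketch of the shape-feature stage: normalized EFD coefficients (via the pyefd package) feeding a Random Forest, the pairing the paper reports as most accurate. The elliptical contours and two placeholder classes below stand in for real Mask R-CNN contours and tomato shape categories:

```python
# Sketch: Elliptic Fourier Descriptors -> Random Forest shape classifier.
import numpy as np
from pyefd import elliptic_fourier_descriptors   # pip install pyefd
from sklearn.ensemble import RandomForestClassifier

def efd_features(contour_xy, order=10):
    """Rotation/scale-normalized EFD coefficients as a flat shape vector."""
    coeffs = elliptic_fourier_descriptors(contour_xy, order=order, normalize=True)
    return coeffs.flatten()[3:]   # first three are fixed by the normalization

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
elong = rng.uniform(0.5, 1.0, size=80)                 # placeholder elongations
contours = [np.c_[np.cos(t), b * np.sin(t)] for b in elong]
X = np.stack([efd_features(c) for c in contours])
y = (elong < 0.75).astype(int)                         # 0 = round, 1 = oblong

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:60], y[:60])
print("held-out accuracy:", clf.score(X[60:], y[60:]))
```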

26 pages, 4332 KB  
Article
CDSANet: A CNN-ViT-Attention Network for Ship Instance Segmentation
by Weidong Zhu, Piao Wang and Kuifeng Luan
J. Imaging 2025, 11(11), 383; https://doi.org/10.3390/jimaging11110383 - 31 Oct 2025
Viewed by 669
Abstract
Ship instance segmentation in remote sensing images is essential for maritime applications such as intelligent surveillance and port management. However, this task remains challenging due to dense target distributions, large variations in ship scales and shapes, and limited high-quality datasets. The existing YOLOv8 framework mainly relies on convolutional neural networks and CIoU loss, which are less effective in modeling global–local interactions and producing accurate mask boundaries. To address these issues, we propose CDSANet, a novel one-stage ship instance segmentation network. CDSANet integrates convolutional operations, Vision Transformers, and attention mechanisms within a unified architecture. The backbone adopts a Convolutional Vision Transformer Attention (CVTA) module to enhance both local feature extraction and global context perception. The neck employs dynamic-weighted DOWConv to adaptively handle multi-scale ship instances, while SIoU loss improves localization accuracy and orientation robustness. Additionally, CBAM enhances the network’s focus on salient regions, and a MixUp-based augmentation strategy is used to improve model generalization. Extensive experiments on the proposed VLRSSD dataset demonstrate that CDSANet achieves state-of-the-art performance with a mask AP (50–95) of 75.9%, surpassing the YOLOv8 baseline by 1.8%. Full article
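CBAM is a standard published block (Woo et al., ECCV 2018); a compact PyTorch sketch follows, using the customary default reduction ratio and kernel size, which the abstract does not specify:

```python
# Sketch of CBAM: channel attention from pooled descriptors, then spatial
# attention from channelwise average/max maps.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(channels, channels // reduction, 1),
                                 nn.ReLU(),
                                 nn.Conv2d(channels // reduction, channels, 1))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))   # channel gate
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))  # spatial gate
        return x * sa

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```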

30 pages, 4298 KB  
Article
Integrating Convolutional, Transformer, and Graph Neural Networks for Precision Agriculture and Food Security
by Esraa A. Mahareek, Mehmet Akif Cifci and Abeer S. Desuky
AgriEngineering 2025, 7(10), 353; https://doi.org/10.3390/agriengineering7100353 - 19 Oct 2025
Cited by 1 | Viewed by 1884
Abstract
Ensuring global food security requires accurate and robust solutions for crop health monitoring, weed detection, and large-scale land-cover classification. To this end, we propose AgroVisionNet, a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs) for local feature extraction, Vision Transformers (ViTs) for capturing long-range global dependencies, and Graph Neural Networks (GNNs) for modeling spatial relationships between image regions. The framework was evaluated on five diverse benchmark datasets—PlantVillage (leaf-level disease detection), Agriculture-Vision (field-scale anomaly segmentation), BigEarthNet (satellite-based land-cover classification), UAV Crop and Weed (weed segmentation), and EuroSAT (multi-class land-cover recognition). Across these datasets, AgroVisionNet consistently outperformed strong baselines including ResNet-50, EfficientNet-B0, ViT, and Mask R-CNN. For example, it achieved 97.8% accuracy and 95.6% IoU on PlantVillage, 94.5% accuracy on Agriculture-Vision, 92.3% accuracy on BigEarthNet, 91.5% accuracy on UAV Crop and Weed, and 96.4% accuracy on EuroSAT. These results demonstrate the framework’s robustness across tasks ranging from fine-grained disease detection to large-scale anomaly mapping. The proposed hybrid approach addresses persistent challenges in agricultural imaging, including class imbalance, image quality variability, and the need for multi-scale feature integration. By combining complementary architectural strengths, AgroVisionNet establishes a new benchmark for deep learning applications in precision agriculture. Full article
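The GNN ingredient can be illustrated as one round of message passing over image-region features. The mean aggregation, region count, and random adjacency below are illustrative assumptions, not AgroVisionNet's actual graph construction:

```python
# Sketch: one graph-convolution step over region features with self-loops.
import torch
import torch.nn as nn

class RegionGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):                 # x: (N, dim); adj: (N, N) 0/1
        a = adj + torch.eye(adj.size(0))       # add self-loops
        h = (a @ x) / a.sum(1, keepdim=True)   # mean-aggregate neighbor features
        return torch.relu(self.lin(h))

regions = torch.randn(16, 256)                 # e.g., a 4x4 grid of patch features
adj = (torch.rand(16, 16) > 0.7).float()       # placeholder spatial adjacency
print(RegionGCNLayer(256)(regions, adj).shape) # torch.Size([16, 256])
```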

14 pages, 1787 KB  
Article
HE-DMDeception: Adversarial Attack Network for 3D Object Detection Based on Human Eye and Deep Learning Model Deception
by Pin Zhang, Yawen Liu, Heng Liu, Yichao Teng, Jiazheng Ni, Zhuansun Xiaobo and Jiajia Wang
Information 2025, 16(10), 867; https://doi.org/10.3390/info16100867 - 7 Oct 2025
Viewed by 768
Abstract
This paper presents HE-DMDeception, a novel adversarial attack network that integrates human visual deception with deep model deception to enhance the security of 3D object detection. Existing patch-based and camouflage methods can mislead deep learning models but struggle to generate visually imperceptible, high-quality textures. Our framework employs a CycleGAN-based camouflage network to generate highly camouflaged background textures, while a dedicated deception module disrupts non-maximum suppression (NMS) and attention mechanisms through optimized constraints that balance attack efficacy and visual fidelity. To overcome the scarcity of annotated vehicle data, an image segmentation module based on the pre-trained Segment Anything Model (SAM) is introduced, leveraging a two-stage training strategy combining semi-supervised self-training and supervised fine-tuning. Experimental results show that HE-DMDeception achieved minimum P@0.5 values of 50%, 55%, 20%, 25%, and 25% across the You Only Look Once version 8 (YOLOv8), Real-Time Detection Transformer (RT-DETR), Faster Region-based Convolutional Neural Network (Faster R-CNN), Single Shot MultiBox Detector (SSD), and Mask Region-based Convolutional Neural Network (Mask R-CNN) detection models, while maintaining high visual consistency with the original camouflage. These findings demonstrate the robustness and practicality of HE-DMDeception, offering new insights into adversarial attacks on 3D object detection. Full article
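The attack's core trade-off, suppressing detector confidence while staying close to the camouflage base texture, can be sketched as a gradient step on a weighted objective. The detector below is a toy stand-in and the weights are illustrative; the paper's deception module additionally targets NMS and attention mechanisms:

```python
# Sketch: minimize (max detection score + lambda * distance to base texture).
import torch

def attack_step(texture, base, detector, lam=0.1, lr=0.01):
    texture = texture.detach().requires_grad_(True)
    scores = detector(texture)                         # per-box confidences
    loss = scores.max() + lam * (texture - base).pow(2).mean()
    loss.backward()
    return (texture - lr * texture.grad).clamp(0, 1).detach()

detector = lambda tex: tex.mean().unsqueeze(0)  # toy: confidence ~ brightness
base = torch.rand(3, 64, 64)
tex = base.clone()
for _ in range(20):
    tex = attack_step(tex, base, detector)
print("score after attack:", detector(tex).item())
```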

20 pages, 162180 KB  
Article
Annotation-Efficient and Domain-General Segmentation from Weak Labels: A Bounding Box-Guided Approach
by Ammar M. Okran, Hatem A. Rashwan, Sylvie Chambon and Domenec Puig
Electronics 2025, 14(19), 3917; https://doi.org/10.3390/electronics14193917 - 1 Oct 2025
Cited by 2 | Viewed by 1088
Abstract
Manual pixel-level annotation remains a major bottleneck in deploying deep learning models for dense prediction and semantic segmentation tasks across domains. This challenge is especially pronounced in applications involving fine-scale structures, such as cracks in infrastructure or lesions in medical imaging, where annotations are time-consuming, expensive, and subject to inter-observer variability. To address these challenges, this work proposes a weakly supervised and annotation-efficient segmentation framework that integrates sparse bounding-box annotations with a limited subset of strong (pixel-level) labels to train robust segmentation models. The fundamental element of the framework is a lightweight Bounding Box Encoder that converts weak annotations into multi-scale attention maps. These maps guide a ConvNeXt-Base encoder, and a lightweight U-Net–style convolutional neural network (CNN) decoder—using nearest-neighbor upsampling and skip connections—reconstructs the final segmentation mask. This design enables the model to focus on semantically relevant regions without relying on full supervision, drastically reducing annotation cost while maintaining high accuracy. We validate our framework on two distinct domains, road crack detection and skin cancer segmentation, demonstrating that it achieves performance comparable to fully supervised segmentation models using only 10–20% of strong annotations. Given the ability of the proposed framework to generalize across varied visual contexts, it has strong potential as a general annotation-efficient segmentation tool for domains where strong labeling is costly or infeasible. Full article
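The box-to-attention idea can be sketched by rasterizing the weak box labels into a soft mask and pooling it to each decoder scale; the scales and average pooling here are assumptions, not the paper's exact Bounding Box Encoder:

```python
# Sketch: rasterize bounding boxes into a mask and derive multi-scale
# attention maps by average pooling.
import torch
import torch.nn.functional as F

def boxes_to_attention(boxes, hw, scales=(1, 2, 4)):
    """boxes: list of (x1, y1, x2, y2) in pixels; hw: (H, W) of the image."""
    h, w = hw
    m = torch.zeros(1, 1, h, w)
    for x1, y1, x2, y2 in boxes:
        m[..., y1:y2, x1:x2] = 1.0
    return [F.avg_pool2d(m, s) for s in scales]   # soft maps at each scale

maps = boxes_to_attention([(10, 20, 60, 80)], (128, 128))
print([tuple(t.shape) for t in maps])  # [(1,1,128,128), (1,1,64,64), (1,1,32,32)]
```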

18 pages, 13697 KB  
Article
A New Anticyclone Identification Method Based on Mask R-CNN Model and Its Application
by Yang Kong, Hao Wu, Ping Xia and Yumin Zhang
Atmosphere 2025, 16(10), 1140; https://doi.org/10.3390/atmos16101140 - 28 Sep 2025
Viewed by 518
Abstract
In recent decades, frequent cold waves and low-temperature events in mid-to-high latitude Eurasia have severely impacted socioeconomic activities in Northeast China. Accurately identifying anticyclones is essential due to their close relation to cold air activity. This study proposes a new anticyclone identification method using the Mask region-based convolutional neural network (Mask R-CNN) model to detect synoptic-scale anticyclones by capturing their two-dimensional structural features and investigating their relationship with snow-ice disasters in Northeast China. It is found that compared with traditional objective identification methods, the new method better captures the overall structural characteristics of anticyclones, significantly improving the description of large-scale, strong anticyclones. Specifically, it incorporates 7.3% of small-scale anticyclones into larger-scale systems. Anticyclones are closely correlated with local cooling and cold air mass changes over Northeast China, with 60% of anticyclones accompanying regional cold air mass accumulation and temperature drops. Two case studies of the rare rain-snow and cold wave events revealed that these events were preceded by the generation and eastward expansion of an upstream anticyclone identified by the new method. This demonstrates that the proposed method can effectively track anticyclones and the evolution of cold high-pressure systems, providing insights into extreme cold events. Full article
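Applying an off-the-shelf Mask R-CNN to a gridded meteorological field only requires replicating the normalized grid into a 3-channel pseudo-image. A minimal torchvision sketch under that assumption (random placeholder SLP grid and untrained weights; the paper fine-tunes on annotated anticyclones):

```python
# Sketch: run Mask R-CNN inference on a 1-degree sea-level-pressure grid.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights=None, num_classes=2).eval()  # bg + anticyclone
slp = torch.randn(181, 360)                         # placeholder SLP grid
norm = (slp - slp.mean()) / slp.std()               # standardize the field
img = norm.unsqueeze(0).expand(3, -1, -1)           # replicate into 3 channels
with torch.no_grad():
    pred = model([img])[0]                          # dict: boxes, labels, scores, masks
print(pred["masks"].shape)                          # (N, 1, 181, 360) instance masks
```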

26 pages, 7402 KB  
Article
Hybrid Architecture for Tight Sandstone: Automated Mineral Identification and Quantitative Petrology
by Lanfang Dong, Chenxu Sun, Xiaolu Yu, Xinming Zhang, Menglian Chen and Mingyang Xu
Minerals 2025, 15(9), 962; https://doi.org/10.3390/min15090962 - 11 Sep 2025
Viewed by 615
Abstract
This study proposes an integrated computer vision system for automated petrological analysis of tight sandstone micro-structures. The system combines the Segment Anything Model (SAM) for zero-shot segmentation, Mask R-CNN (Region-Based Convolutional Neural Network) instance segmentation, and an improved MetaFormer architecture with a Cascaded Group Attention (CGA) mechanism, together with a parameter analysis module, to form a hybrid deep learning system. This enables end-to-end mineral identification and multi-scale structural quantification of granulometric properties, grain contact relationships, and pore networks. The system is validated on proprietary tight sandstone datasets, SMISD (Sandstone Microscopic Image Segmentation Dataset) and SMIRD (Sandstone Microscopic Image Recognition Dataset). It achieves 92.1% mIoU segmentation accuracy and 90.7% mineral recognition accuracy while reducing processing time from more than 30 min to less than 2 min per sample. The system provides standardized reservoir characterization through automated generation of quantitative reports (Excel), analytical images (JPG), and structured data (JSON), demonstrating production-ready efficiency for tight sandstone evaluation. Full article
(This article belongs to the Section Mineral Exploration Methods and Applications)
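Downstream of segmentation, the parameter-analysis stage reduces to per-grain measurements over the instance masks. A sketch with scikit-image regionprops, using synthetic masks and an assumed pixel calibration in place of the SAM/Mask R-CNN output:

```python
# Sketch: granulometric parameters from a labeled mask image.
import numpy as np
from skimage.measure import regionprops

labels = np.zeros((256, 256), dtype=int)
labels[30:90, 40:110] = 1                    # two hypothetical grains
labels[120:200, 60:140] = 2

um_per_px = 2.5                              # assumed microscope calibration
for grain in regionprops(labels):
    d = grain.equivalent_diameter * um_per_px
    print(f"grain {grain.label}: area = {grain.area * um_per_px**2:.0f} um^2, "
          f"equiv. diameter = {d:.1f} um, eccentricity = {grain.eccentricity:.2f}")
```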

27 pages, 16753 KB  
Article
A 1°-Resolution Global Ionospheric TEC Modeling Method Based on a Dual-Branch Input Convolutional Neural Network
by Nian Liu, Yibin Yao and Liang Zhang
Remote Sens. 2025, 17(17), 3095; https://doi.org/10.3390/rs17173095 - 5 Sep 2025
Viewed by 1441
Abstract
Total Electron Content (TEC) is a fundamental parameter characterizing the electron density distribution in the ionosphere. Traditional global TEC modeling approaches predominantly rely on mathematical methods (such as spherical harmonic function fitting), often resulting in models suffering from excessive smoothing and low accuracy. While the 1° high-resolution global TEC model released by MIT offers improved temporal-spatial resolution, it exhibits regions of data gaps. Existing ionospheric image completion methods frequently employ Generative Adversarial Networks (GANs), which suffer from drawbacks such as complex model structures and lengthy training times. We propose a novel high-resolution global ionospheric TEC modeling method based on a Dual-Branch Convolutional Neural Network (DB-CNN) designed for the completion and restoration of incomplete 1°-resolution ionospheric TEC images. The novel model utilizes a dual-branch input structure: the background field, generated using the International Reference Ionosphere (IRI) model TEC maps, and the observation field, consisting of global incomplete TEC maps coupled with their corresponding mask maps. An asymmetric dual-branch parallel encoder, feature fusion, and residual decoder framework enables precise reconstruction of missing regions, ultimately generating a complete global ionospheric TEC map. Experimental results demonstrate that the model achieves Root Mean Square Errors (RMSE) of 0.30 TECU and 1.65 TECU in the observed and unobserved regions, respectively, in simulated data experiments. For measured experiments, the RMSE values are 1.39 TECU and 1.93 TECU in the observed and unobserved regions. Validation results utilizing Jason-3 altimeter-measured VTEC demonstrate that the model achieves stable reconstruction performance across all four seasons and various time periods. In key-day comparisons, its STD and RMSE consistently outperform those of the CODE global ionospheric model (GIM). Furthermore, a long-term evaluation from 2021 to 2024 reveals that, compared to the CODE model, the DB-CNN achieves average reductions of 38.2% in STD and 23.5% in RMSE. This study provides a novel dual-branch input convolutional neural network-based method for constructing 1°-resolution global ionospheric products, offering significant application value for enhancing GNSS positioning accuracy and space weather monitoring capabilities. Full article
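A minimal sketch of the dual-branch input structure: one branch encodes the IRI background map, the other the incomplete TEC map with its validity mask, and a decoder predicts a residual correction to the background. Layer widths and depths are illustrative and far shallower than the paper's network:

```python
# Sketch: dual-branch encoder -> feature fusion -> residual decoder for TEC maps.
import torch
import torch.nn as nn

def enc(in_ch):   # tiny convolutional encoder with one stride-2 downsampling
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

class DBCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.bg_enc = enc(1)    # branch 1: IRI background TEC
        self.obs_enc = enc(2)   # branch 2: incomplete TEC + validity mask
        self.dec = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, background, obs, mask):
        fused = torch.cat([self.bg_enc(background),
                           self.obs_enc(torch.cat([obs, mask], dim=1))], dim=1)
        return background + self.dec(fused)   # residual: correct the IRI prior

out = DBCNN()(torch.randn(1, 1, 180, 360), torch.randn(1, 1, 180, 360),
              torch.ones(1, 1, 180, 360))
print(out.shape)  # torch.Size([1, 1, 180, 360])
```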

54 pages, 2856 KB  
Review
Applications, Trends, and Challenges of Precision Weed Control Technologies Based on Deep Learning and Machine Vision
by Xiangxin Gao, Jianmin Gao and Waqar Ahmed Qureshi
Agronomy 2025, 15(8), 1954; https://doi.org/10.3390/agronomy15081954 - 13 Aug 2025
Cited by 6 | Viewed by 4938
Abstract
Advanced computer vision (CV) and deep learning (DL) are essential for sustainable agriculture via automated vegetation management. This paper methodically reviews advancements in these technologies for agricultural settings, analyzing their fundamental principles, designs, system integration, and practical applications. The amalgamation of transformer topologies with convolutional neural networks (CNNs) in models such as YOLO (You Only Look Once) and Mask R-CNN (Region-Based Convolutional Neural Network) markedly enhances target recognition and semantic segmentation. The integration of LiDAR (Light Detection and Ranging) with multispectral imagery significantly improves recognition accuracy in intricate situations. Moreover, the integration of deep learning models with control systems, which include laser modules, robotic arms, and precision spray nozzles, facilitates the development of intelligent robotic mowing systems that significantly diminish chemical herbicide consumption and enhance operational efficiency relative to conventional approaches. Significant obstacles persist, including restricted environmental adaptability, real-time processing limitations, and inadequate model generalization. Future directions entail the integration of varied data sources, the development of streamlined models, and the enhancement of intelligent decision-making systems, establishing a framework for the advancement of sustainable agricultural technology. Full article
(This article belongs to the Special Issue Research Progress in Agricultural Robots in Arable Farming)

14 pages, 2616 KB  
Article
Novel Throat-Attached Piezoelectric Sensors Based on Adam-Optimized Deep Belief Networks
by Ben Wang, Hua Xia, Yang Feng, Bingkun Zhang, Haoda Yu, Xulehan Yu and Keyong Hu
Micromachines 2025, 16(8), 841; https://doi.org/10.3390/mi16080841 - 22 Jul 2025
Viewed by 742
Abstract
This paper proposes an Adam-optimized Deep Belief Networks (Adam-DBNs) denoising method for throat-attached piezoelectric signals. The method aims to process mechanical vibration signals captured through polyvinylidene fluoride (PVDF) sensors attached to the throat region, which are typically contaminated by environmental noise and physiological noise. First, the short-time Fourier transform (STFT) is utilized to convert the original signals into the time–frequency domain. Subsequently, the masked time–frequency representation is reconstructed into the time domain through a diagonal average-based inverse STFT. To address complex nonlinear noise structures, a Deep Belief Network is further adopted to extract features and reconstruct clean signals, where the Adam optimization algorithm ensures the efficient convergence and stability of the training process. Compared with traditional Convolutional Neural Networks (CNNs), Adam-DBNs significantly improve waveform similarity by 6.77% and reduce the local noise energy residue by 0.099696. These results demonstrate that the Adam-DBNs method exhibits substantial advantages in signal reconstruction fidelity and residual noise suppression, providing an efficient and robust solution for throat-attached piezoelectric sensor signal enhancement tasks. Full article
(This article belongs to the Section E: Engineering and Technology)
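The time–frequency masking step the method builds on can be sketched with SciPy, whose istft performs the overlap-add reconstruction corresponding to the diagonal averaging described in the abstract. The threshold mask below is a crude stand-in for the DBN's learned reconstruction:

```python
# Sketch: STFT -> magnitude mask -> inverse STFT on a noisy 1-D signal.
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 150 * t)                  # throat-vibration stand-in
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(t.size)

f, frames, Z = stft(noisy, fs=fs, nperseg=256)
mask = np.abs(Z) > 0.5 * np.abs(Z).max(axis=1, keepdims=True)  # crude threshold
_, denoised = istft(Z * mask, fs=fs, nperseg=256)

n = min(clean.size, denoised.size)
print("waveform correlation:", np.corrcoef(clean[:n], denoised[:n])[0, 1])
```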

24 pages, 824 KB  
Article
MMF-Gait: A Multi-Model Fusion-Enhanced Gait Recognition Framework Integrating Convolutional and Attention Networks
by Kamrul Hasan, Khandokar Alisha Tuhin, Md Rasul Islam Bapary, Md Shafi Ud Doula, Md Ashraful Alam, Md Atiqur Rahman Ahad and Md. Zasim Uddin
Symmetry 2025, 17(7), 1155; https://doi.org/10.3390/sym17071155 - 19 Jul 2025
Cited by 1 | Viewed by 1523
Abstract
Gait recognition is a reliable biometric approach that uniquely identifies individuals based on their natural walking patterns. It is widely used because gait is difficult to camouflage and recognition does not require a person's cooperation. Face-based recognition systems often fail to determine an offender's identity when the face is concealed with a helmet or mask to evade identification. In such cases, gait-based recognition is ideal for identifying offenders, and most existing work leverages a deep learning (DL) model. However, a single model often fails to capture a comprehensive selection of refined patterns in input data when external factors are present, such as variation in viewing angle, clothing, and carrying conditions. In response, this paper introduces a fusion-based multi-model gait recognition framework that leverages the potential of convolutional neural networks (CNNs) and a vision transformer (ViT) in an ensemble manner to enhance gait recognition performance. Here, CNNs capture spatiotemporal features, while the ViT's multiple attention layers focus on particular regions of the gait image. The first step in this framework is to obtain the Gait Energy Image (GEI) by averaging a height-normalized gait silhouette sequence over a gait cycle, which preserves the left–right symmetry of the gait. After that, the GEI image is fed through multiple pre-trained models that are fine-tuned to extract deep spatiotemporal features. Three separate fusion strategies are then evaluated. The first is decision-level fusion (DLF), which takes each model's decision and employs majority voting for the final decision. The second is feature-level fusion (FLF), which combines the features from individual models through pointwise addition before performing gait recognition. Finally, a hybrid fusion combines DLF and FLF for gait recognition. The performance of the multi-model fusion-based framework was evaluated on three publicly available gait databases: CASIA-B, OU-ISIR D, and the OU-ISIR Large Population dataset. The experimental results demonstrate that the fusion-enhanced framework achieves superior performance. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
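The GEI step is simple to state in code: average height-normalized silhouettes over one gait cycle, as the abstract describes. A sketch with placeholder silhouettes in place of a real segmented walking sequence:

```python
# Sketch: Gait Energy Image = mean of resized binary silhouettes over a cycle.
import numpy as np
from skimage.transform import resize

def gait_energy_image(silhouettes, out_hw=(64, 64)):
    frames = [resize(s.astype(float), out_hw, anti_aliasing=True)
              for s in silhouettes]
    return np.mean(frames, axis=0)    # pixel value = fraction of cycle it is body

rng = np.random.default_rng(0)
seq = [rng.integers(0, 2, (128, 88)) for _ in range(30)]  # placeholder masks
gei = gait_energy_image(seq)
print(gei.shape, float(gei.min()), float(gei.max()))
```

Decision-level fusion then reduces to a majority vote over per-model predictions, and feature-level fusion to elementwise addition of the per-model feature vectors before classification.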

21 pages, 2471 KB  
Article
Attention-Based Mask R-CNN Enhancement for Infrared Image Target Segmentation
by Liang Wang and Kan Ren
Symmetry 2025, 17(7), 1099; https://doi.org/10.3390/sym17071099 - 9 Jul 2025
Cited by 2 | Viewed by 1888
Abstract
Image segmentation is an important method in the field of image processing, while infrared (IR) image segmentation is one of the challenges in this field due to the unique characteristics of IR data. Infrared imaging utilizes the infrared radiation emitted by objects to produce images, which can supplement the performance of visible-light images under adverse lighting conditions to some extent. However, the low spatial resolution and limited texture details in IR images hinder the achievement of high-precision segmentation. To address these issues, an attention mechanism based on symmetrical cross-channel interaction—motivated by symmetry principles in computer vision—was integrated into a Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework. A Bottleneck-enhanced Squeeze-and-Attention (BNSA) module was incorporated into the backbone network, and novel loss functions were designed for both the bounding box (Bbox) regression and mask prediction branches to enhance segmentation performance. Furthermore, a dedicated infrared image dataset was constructed to validate the proposed method. The experimental results demonstrate that the optimized model achieves higher segmentation accuracy and better segmentation performance compared to the original network and other mainstream segmentation models on our dataset, demonstrating how symmetrical design principles can effectively improve complex vision tasks. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)
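Cross-channel interaction of the kind the BNSA module builds on can be sketched as ECA-style channel gating: squeeze to a channel descriptor, run a small 1D convolution across channels, and gate. The kernel size is an assumption, and the paper's module additionally adds a bottleneck structure:

```python
# Sketch: channel gating via local cross-channel interaction (ECA-style).
import torch
import torch.nn as nn

class ChannelInteraction(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                           # x: (B, C, H, W)
        y = x.mean((2, 3))                          # squeeze to (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)    # interact across channels
        return x * torch.sigmoid(y)[..., None, None]

print(ChannelInteraction()(torch.randn(2, 64, 32, 32)).shape)
```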

22 pages, 47906 KB  
Article
Spatial Localization of Broadleaf Species in Mixed Forests in Northern Japan Using UAV Multi-Spectral Imagery and Mask R-CNN Model
by Nyo Me Htun, Toshiaki Owari, Satoshi N. Suzuki, Kenji Fukushi, Yuuta Ishizaki, Manato Fushimi, Yamato Unno, Ryota Konda and Satoshi Kita
Remote Sens. 2025, 17(13), 2111; https://doi.org/10.3390/rs17132111 - 20 Jun 2025
Cited by 1 | Viewed by 1757
Abstract
Precise spatial localization of broadleaf species is crucial for efficient forest management and ecological studies. This study presents an advanced approach for segmenting and classifying broadleaf tree species, including Japanese oak (Quercus crispula), in mixed forests using multi-spectral imagery captured by unmanned aerial vehicles (UAVs) and deep learning. High-resolution UAV images, including RGB and NIR bands, were collected from two study sites in Hokkaido, Japan: Sub-compartment 97g in the eastern region and Sub-compartment 68E in the central region. A Mask Region-based Convolutional Neural Network (Mask R-CNN) framework was employed to recognize and classify single tree crowns based on annotated training data. The workflow incorporated UAV-derived imagery and crown annotations, supporting reliable model development and evaluation. Results showed that combining multi-spectral bands (RGB and NIR) with canopy height model (CHM) data significantly improved classification performance at both study sites. In Sub-compartment 97g, the RGB + NIR + CHM achieved a precision of 0.76, recall of 0.74, and F1-score of 0.75, compared to 0.73, 0.74, and 0.73 using RGB alone; 0.68, 0.70, and 0.66 with RGB + NIR; and 0.63, 0.67, and 0.63 with RGB + CHM. Similarly, at Sub-compartment 68E, the RGB + NIR + CHM attained a precision of 0.81, recall of 0.78, and F1-score of 0.80, outperforming RGB alone (0.79, 0.79, 0.78), RGB + NIR (0.75, 0.74, 0.72), and RGB + CHM (0.76, 0.75, 0.74). These consistent improvements across diverse forest conditions highlight the effectiveness of integrating spectral (RGB and NIR) and structural (CHM) data. These findings underscore the value of integrating UAV multi-spectral imagery with deep learning techniques for reliable, large-scale identification of tree species and forest monitoring. Full article
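Feeding RGB + NIR + CHM to a Mask R-CNN amounts to stacking the rasters into a five-channel input and widening the first convolution (and the normalization statistics) to match. A torchvision sketch under those assumptions, with placeholder rasters, generic weights, and a binary class set rather than the paper's species list:

```python
# Sketch: 5-channel (RGB + NIR + CHM) input for torchvision's Mask R-CNN.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

rgb, nir, chm = torch.rand(3, 512, 512), torch.rand(1, 512, 512), torch.rand(1, 512, 512)
x = torch.cat([rgb, nir, chm], dim=0)              # (5, H, W)

model = maskrcnn_resnet50_fpn(weights=None, num_classes=2)
model.backbone.body.conv1 = torch.nn.Conv2d(5, 64, 7, stride=2, padding=3, bias=False)
model.transform.image_mean = [0.5] * 5             # normalization must match 5 channels
model.transform.image_std = [0.25] * 5
model.eval()
with torch.no_grad():
    pred = model([x])[0]
print(sorted(pred.keys()))                         # ['boxes', 'labels', 'masks', 'scores']
```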
